A lightweight, intuitive wrapper around `Intl.Segmenter` for seamless segment-aware string operations in TypeScript and JavaScript.
---
- Intuitive Intl.Segmenter Wrapper: Simplifies text segmentation with a clean API.
- Standards-Based: Built on native Intl.Segmenter for robust compatibility.
- Lightweight & Tree-Shakeable: Minimal footprint with optimal bundling (836B minified + gzipped).
- Highly Performant: Uses iterators for efficient, on-demand processing.
- Full TypeScript Support: Strict types for safe, predictable usage.
---
```shell
npm install segment-string
```
segment-string is a lightweight wrapper for Intl.Segmenter, designed to simplify locale-sensitive text segmentation in JavaScript and TypeScript. It lets you easily segment and manipulate text by graphemes, words, or sentences, ideal for handling complex cases like multi-character emojis or language-specific boundaries.
```typescript
import { SegmentString } from "segment-string";

const str = new SegmentString("Hello, world! 👩‍👩‍👧‍👦🌍🌈");

// Segment by grapheme
console.log([...str.graphemes()]); // ['H', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd', '!', ' ', '👩‍👩‍👧‍👦', '🌍', '🌈']
```
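To illustrate why grapheme-aware segmentation matters for multi-character emoji, here is a minimal sketch that uses only the native `Intl.Segmenter` API rather than this package. The helper name `measure` is hypothetical and exists only for this example.

```typescript
// Sketch using only native APIs; `measure` is a hypothetical helper name.
function measure(text: string): { utf16: number; codePoints: number; graphemes: number } {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    utf16: text.length, // UTF-16 code units
    codePoints: Array.from(text).length, // string iteration yields code points
    graphemes: Array.from(segmenter.segment(text)).length, // user-perceived characters
  };
}

// Family emoji: four people joined by three zero-width joiners (U+200D).
const family = "\u{1F469}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
console.log(measure(family)); // { utf16: 11, codePoints: 7, graphemes: 1 }
```

The three numbers differ because each person is a surrogate pair and the joiners are separate code points, while a reader sees a single character.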
---
The SegmentString class encapsulates a string and provides methods for segmentation, counting, and retrieving segments at specified indices with locale and granularity options.
```typescript
new SegmentString(str: string, locales?: Intl.LocalesArgument);
```
- str: The string to segment.
- locales: Optional locales argument for segmentation.
#### `segments(granularity: Granularity, options?: SegmentationOptions | WordSegmentationOptions): Iterable<string>`
Segments the string by the specified granularity and returns the segments as strings.
#### `rawSegments(granularity: Granularity, options?: SegmentationOptions | WordSegmentationOptions): Intl.Segments | Iterable<Intl.SegmentData>`
Returns raw `Intl.SegmentData` objects based on granularity and options.
#### `segmentCount(granularity: Granularity, options?: SegmentationOptions | WordSegmentationOptions): number`
Counts segments in the string based on the specified granularity.
#### `segmentAt(index: number, granularity: Granularity, options?: SegmentationOptions | WordSegmentationOptions): string | undefined`
Retrieves the segment at a specific index, supporting negative indices.
#### `rawSegmentAt(index: number, granularity: Granularity, options?: SegmentationOptions | WordSegmentationOptions): Intl.SegmentData | undefined`
Returns the raw segment data at a specific index, supporting negative indices.
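The index-based methods above accept negative indices. As a rough sketch of what that could look like on top of the native `Intl.Segmenter`, assuming negative indices count back from the end as in `Array.prototype.at`, consider the following. The names `segmentAtSketch` and `GranularitySketch` are hypothetical, and this is not the package's implementation.

```typescript
// Sketch with native Intl.Segmenter; all names here are hypothetical.
type GranularitySketch = "grapheme" | "word" | "sentence";

function segmentAtSketch(
  text: string,
  index: number,
  granularity: GranularitySketch,
  locales?: string | string[],
): string | undefined {
  const segmenter = new Intl.Segmenter(locales, { granularity });
  const all = Array.from(segmenter.segment(text), (s) => s.segment);
  // Assumption: negative indices count back from the end.
  const resolved = index < 0 ? all.length + index : index;
  return all[resolved]; // undefined when out of bounds
}

console.log(segmentAtSketch("a b c", 0, "word")); // 'a'
console.log(segmentAtSketch("a b c", -1, "word")); // 'c'
console.log(segmentAtSketch("a b c", 10, "word")); // undefined
```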
#### `graphemes(options?: SegmentationOptions): Iterable<string>`
Returns an iterable of grapheme segments as strings.
#### `rawGraphemes(options?: SegmentationOptions): Iterable<Intl.SegmentData>`
Returns an iterable of raw grapheme segments.
#### `graphemeCount(options?: SegmentationOptions): number`
Counts grapheme segments in the string.
#### `graphemeAt(index: number, options?: SegmentationOptions): string | undefined`
Returns the grapheme at a specific index, supporting negative indices.
#### `rawGraphemeAt(index: number, options?: SegmentationOptions): Intl.SegmentData | undefined`
Returns the raw grapheme data at a specific index, supporting negative indices.
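A common reason to iterate graphemes is truncating text without cutting a character in half. The sketch below shows the idea with the native `Intl.Segmenter` only; `truncateGraphemes` is a hypothetical helper, not part of this package.

```typescript
// Sketch with native Intl.Segmenter; `truncateGraphemes` is a hypothetical helper.
function truncateGraphemes(text: string, max: number): string {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let result = "";
  let taken = 0;
  for (const { segment } of segmenter.segment(text)) {
    if (taken >= max) break; // stop early; remaining graphemes are not visited
    result += segment;
    taken += 1;
  }
  return result;
}

// Two flag emoji (two regional indicators each) followed by "ok".
const flagsAndText = "\u{1F1EF}\u{1F1F5}\u{1F1EB}\u{1F1F7}ok";
console.log(truncateGraphemes(flagsAndText, 3).length); // 9 (two whole flags plus 'o')
console.log(flagsAndText.slice(0, 3).length); // 3 (ends in the middle of a surrogate pair)
```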
#### `words(options?: WordSegmentationOptions): Iterable<string>`
Returns an iterable of word segments, with optional filtering for word-like segments.
#### `rawWords(options?: WordSegmentationOptions): Iterable<Intl.SegmentData>`
Returns an iterable of raw word segments, with optional filtering for word-like segments.
#### `wordCount(options?: WordSegmentationOptions): number`
Counts word segments in the string.
#### `wordAt(index: number, options?: WordSegmentationOptions): string | undefined`
Returns the word at a specific index, supporting negative indices.
#### `rawWordAt(index: number, options?: WordSegmentationOptions): Intl.SegmentData | undefined`
Returns the raw word data at a specific index, supporting negative indices.
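The effect of filtering for word-like segments can be seen directly on the native API: with word granularity, `Intl.Segmenter` marks each segment with an `isWordLike` flag. The following sketch uses a hypothetical helper `wordsSketch` and is only meant to show what the filter changes, not how the package implements it.

```typescript
// Sketch with native Intl.Segmenter; `wordsSketch` is a hypothetical name.
function wordsSketch(text: string, wordLikeOnly: boolean, locales?: string): string[] {
  const segmenter = new Intl.Segmenter(locales, { granularity: "word" });
  const out: string[] = [];
  for (const data of segmenter.segment(text)) {
    // isWordLike is false for punctuation and whitespace segments.
    if (!wordLikeOnly || data.isWordLike) out.push(data.segment);
  }
  return out;
}

console.log(wordsSketch("Hello, world!", false)); // [ 'Hello', ',', ' ', 'world', '!' ]
console.log(wordsSketch("Hello, world!", true)); // [ 'Hello', 'world' ]
```

Without the filter, a "word count" includes punctuation and spaces (5 here); with it, only the two actual words remain.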
#### `sentences(options?: SegmentationOptions): Iterable<string>`
Returns an iterable of sentence segments.
#### `rawSentences(options?: SegmentationOptions): Iterable<Intl.SegmentData>`
Returns an iterable of raw sentence segments.
#### `sentenceCount(options?: SegmentationOptions): number`
Counts sentence segments in the string.
#### `sentenceAt(index: number, options?: SegmentationOptions): string | undefined`
Returns the sentence at a specific index, supporting negative indices.
#### `rawSentenceAt(index: number, options?: SegmentationOptions): Intl.SegmentData | undefined`
Returns the raw sentence data at a specific index, supporting negative indices.
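One detail worth knowing about sentence segmentation in the underlying `Intl.Segmenter` is that each segment keeps its trailing whitespace, so the segments always concatenate back to the original string. A small sketch with the native API (the helper name `sentencesSketch` is hypothetical):

```typescript
// Sketch with native Intl.Segmenter; `sentencesSketch` is a hypothetical name.
function sentencesSketch(text: string, locales?: string): string[] {
  const segmenter = new Intl.Segmenter(locales, { granularity: "sentence" });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

const sentenceParts = sentencesSketch("Hello world. How are you? Fine!", "en");
console.log(sentenceParts); // [ 'Hello world. ', 'How are you? ', 'Fine!' ]

// Trim each segment if the trailing whitespace is not wanted.
console.log(sentenceParts.map((part) => part.trim())); // [ 'Hello world.', 'How are you?', 'Fine!' ]
```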
#### `[Symbol.iterator](): Iterator<string>`
Returns an iterator over the graphemes of the string.
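Implementing `Symbol.iterator` is what lets an object be used with `for...of` and spread syntax. The class below is a minimal sketch of that pattern on top of the native `Intl.Segmenter`; `GraphemeIterableSketch` is a hypothetical name and not the package's `SegmentString` class.

```typescript
// Sketch only; `GraphemeIterableSketch` is a hypothetical class.
class GraphemeIterableSketch implements Iterable<string> {
  private readonly text: string;
  private readonly segmenter: Intl.Segmenter;

  constructor(text: string, locales?: string) {
    this.text = text;
    this.segmenter = new Intl.Segmenter(locales, { granularity: "grapheme" });
  }

  // Generator method: yields one grapheme at a time, on demand.
  *[Symbol.iterator](): Iterator<string> {
    for (const { segment } of this.segmenter.segment(this.text)) {
      yield segment;
    }
  }
}

// "e" followed by a combining acute accent, then "!".
const accented = [...new GraphemeIterableSketch("e\u0301!")];
console.log(accented.length); // 2, although the string has 3 UTF-16 code units
```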
---
```typescript
import { SegmentString } from "segment-string";

const text = new SegmentString("Hello, world! 👩‍👩‍👧‍👦🌍🌈");

// Segmenting by words: each segment is logged in turn
for (const word of text.words()) {
  console.log(word); // 'Hello', ',', ' ', 'world', '!', ' ', '👩‍👩‍👧‍👦', '🌍', '🌈'
}

// Segmenting graphemes and counting
console.log([...text.graphemes()]); // ['H', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd', '!', ' ', '👩‍👩‍👧‍👦', '🌍', '🌈']
console.log("Grapheme count:", text.graphemeCount()); // 17
console.log("String length:", text.toString().length); // 29

// Accessing a specific word
const secondWord = text.wordAt(1, { isWordLike: true }); // 'world'
console.log(secondWord);
```
---
Alternatively, the SegmentSplitter class allows you to create an instance that can be directly used with JavaScript's String.prototype.split method for basic segmentation.
```typescript
new SegmentSplitter(granularity, options?);
```
- granularity: Specifies the segmentation granularity level ('grapheme', 'word', or 'sentence').
- options: Optional settings to customize the segmentation for the given granularity.
```typescript
import { SegmentSplitter } from "segment-string";

const str = "Hello, world!";
const wordSplitter = new SegmentSplitter("word", { isWordLike: true });
const words = str.split(wordSplitter);
console.log(words); // ["Hello", "world"]
```
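Passing an object to `split` works because `String.prototype.split` delegates to the separator's `Symbol.split` method when one exists. The following is a minimal sketch of that mechanism using the native `Intl.Segmenter`. `WordSplitterSketch` is a hypothetical class, and the way it handles `limit` is an assumption rather than a description of `SegmentSplitter`.

```typescript
// Sketch only; `WordSplitterSketch` is a hypothetical class.
class WordSplitterSketch {
  private readonly segmenter = new Intl.Segmenter("en", { granularity: "word" });

  // Called by String.prototype.split(separator, limit).
  [Symbol.split](text: string, limit?: number): string[] {
    const words: string[] = [];
    for (const data of this.segmenter.segment(text)) {
      if (limit !== undefined && words.length >= limit) break;
      if (data.isWordLike) words.push(data.segment);
    }
    return words;
  }
}

console.log("Hello, world!".split(new WordSplitterSketch())); // [ 'Hello', 'world' ]
console.log("one two three".split(new WordSplitterSketch(), 2)); // [ 'one', 'two' ]
```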
---
```typescript
function getRawSegments(
  str: string,
  granularity: Granularity,
  options?: SegmentationOptions | WordSegmentationOptions,
): Intl.Segments | Iterable<Intl.SegmentData>;
```
- Description: Returns raw `Intl.SegmentData` objects based on granularity and options.
- Parameters:
  - `str`: The string to segment.
  - `granularity`: Specifies the segmentation level ('grapheme', 'word', or 'sentence').
  - `options`: Includes `locales` for specifying locale and `isWordLike` for filtering word-like segments.
- Returns: An iterable of raw `Intl.SegmentData`.
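For readers unfamiliar with `Intl.SegmentData`, each raw object produced by the native segmenter carries the segment text (`segment`), its UTF-16 offset in the input (`index`), the original string (`input`), and, for word granularity only, an `isWordLike` flag. A short sketch using the native API directly:

```typescript
// Native Intl.Segmenter only; no package functions are used here.
const wordSegmenter = new Intl.Segmenter("en", { granularity: "word" });
const rawData = Array.from(wordSegmenter.segment("Hi there"));

for (const data of rawData) {
  console.log(data.index, JSON.stringify(data.segment), data.isWordLike);
}
// 0 "Hi" true
// 2 " " false
// 3 "there" true
```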
```typescript
function getSegments(
  str: string,
  granularity: Granularity,
  options?: SegmentationOptions | WordSegmentationOptions,
): Iterable<string>;
```
- Description: Returns segments of the string as plain strings.
- Parameters: Same as `getRawSegments`.
- Returns: An iterable of segments as strings.
```typescript
function segmentCount(
  str: string,
  granularity: Granularity,
  options?: SegmentationOptions | WordSegmentationOptions,
): number;
```
- Description: Returns the count of segments based on granularity and options.
- Parameters: Same as `getRawSegments`.
- Returns: Number of segments.
```typescript
function rawSegmentAt(
  str: string,
  index: number,
  granularity: Granularity,
  options?: SegmentationOptions | WordSegmentationOptions,
): Intl.SegmentData | undefined;
```
- Description: Returns the raw segment data at a specified index, supporting negative indices.
- Parameters: Same as `getRawSegments`, plus an `index` parameter.
- Returns: The `Intl.SegmentData` at the specified index, or `undefined` if out of bounds.
```typescript
function segmentAt(
  str: string,
  index: number,
  granularity: Granularity,
  options?: SegmentationOptions | WordSegmentationOptions,
): string | undefined;
```
- Description: Returns the segment at a specified index, supporting negative indices.
- Parameters: Same as `getRawSegments`, plus an `index` parameter.
- Returns: The segment at the specified index, or `undefined` if out of bounds.
```typescript
function filterRawWordLikeSegments(
  segments: Intl.Segments,
): Iterable<Intl.SegmentData>;
```
- Description: Filters and returns an iterable of raw word-like segment data where `isWordLike` is true.
- Parameters:
  - `segments`: The segments to filter.
- Returns: An iterable of `Intl.SegmentData` for each word-like segment.
```typescript
function filterWordLikeSegments(segments: Intl.Segments): Iterable<string>;
```
- Description: Filters and returns an iterable of word-like segments as strings where `isWordLike` is true.
- Parameters:
  - `segments`: The segments to filter.
- Returns: An iterable of strings for each word-like segment.
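Returning an iterable rather than an array means a consumer can stop early without segmenting or filtering the rest of the input. The sketch below illustrates that idea with a generator over native `Intl.Segmenter` output. `wordLikeSketch` and `firstWordSketch` are hypothetical names, not the package's functions.

```typescript
// Sketch only; both function names are hypothetical.
function* wordLikeSketch(segments: Iterable<Intl.SegmentData>): Generator<string> {
  for (const data of segments) {
    if (data.isWordLike) yield data.segment; // skip punctuation and whitespace
  }
}

function firstWordSketch(text: string): string | undefined {
  const segmenter = new Intl.Segmenter("en", { granularity: "word" });
  for (const word of wordLikeSketch(segmenter.segment(text))) {
    return word; // returning here ends the generator after the first match
  }
  return undefined; // no word-like segment found
}

console.log(firstWordSketch("  ...Hello, world!")); // 'Hello'
console.log(firstWordSketch("?!")); // undefined
```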
---
> 💙 This package was templated with `create-typescript-app`.