Like `Intl.Segmenter`, but for paragraphs instead of graphemes/words/sentences
* How do I install it?
* How do I use it?
* Does it handle both Unix and Windows line endings?
* How do I specify custom paragraph separators?
* How do I trim leading and trailing spaces from segmented paragraphs?
* Can I save memory by only returning offset and length data for each segment?
* Is there a change log?
* How do I set up the dev environment?
* What versions of Node.js does it support?
* What license is it released under?
If you're using npm:

```
npm i paraseg --save
```

Or if you just want the git repo:

```
git clone git@gitlab.com:philbooth/paraseg.git
```
To use it, create a `ParagraphSegmenter` and iterate over the result of its `segment` method:

```ts
import * as assert from 'node:assert';
import { ParagraphSegmenter } from 'paraseg';

// Paragraphs are separated by a blank line.
const text = `The quick brown fox
jumps over the lazy dog.

How now brown cow?`;

const segmenter = new ParagraphSegmenter();
const paragraphs = [];
for (const paragraph of segmenter.segment(text)) {
  paragraphs.push(paragraph);
}

assert.equal(paragraphs.length, 2);
// Each segment keeps its trailing separator.
assert.equal(paragraphs[0].segment, 'The quick brown fox\njumps over the lazy dog.\n\n');
assert.equal(paragraphs[1].segment, 'How now brown cow?');
```

Yes, it handles both Unix and Windows line endings.
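
To illustrate what accepting both Unix and Windows line endings involves, here is a minimal standalone sketch. It is not paraseg's implementation: `splitParagraphs` is a hypothetical helper, and the rule it uses (a paragraph break is two or more consecutive newlines, in either convention) is an assumption made for the example.

```ts
// Hypothetical helper for illustration only; it is not part of paraseg.
// A paragraph break is a run of two or more newlines in either convention.
function splitParagraphs(text: string): string[] {
  const separator = /(?:\r\n){2,}|\n{2,}/g;
  const paragraphs: string[] = [];
  let start = 0;
  let match: RegExpExecArray | null;
  while ((match = separator.exec(text)) !== null) {
    // Keep the separator attached to the paragraph that precedes it.
    const end = match.index + match[0].length;
    paragraphs.push(text.slice(start, end));
    start = end;
  }
  // Whatever follows the last separator is the final paragraph.
  if (start < text.length) {
    paragraphs.push(text.slice(start));
  }
  return paragraphs;
}

const unixParagraphs = splitParagraphs('first\n\nsecond');
const windowsParagraphs = splitParagraphs('first\r\n\r\nsecond');
console.log(unixParagraphs.length, windowsParagraphs.length); // 2 2
```

The sketch does not try to handle text that mixes the two conventions inside a single break (for example `\n\r\n`); a real segmenter would need to decide how to treat that case.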

To specify custom paragraph separators, pass the `separators` option to the constructor. `separators` is an array of substring candidates to be matched as segmentation points.

```ts
import * as assert from 'node:assert';
import { ParagraphSegmenter } from 'paraseg';

// A single newline separates the two paragraphs here.
const text = `The quick brown fox jumps over the lazy dog.
How now brown cow?`;

const segmenter = new ParagraphSegmenter({
  separators: ['\n'],
});
const paragraphs = [];
for (const paragraph of segmenter.segment(text)) {
  paragraphs.push(paragraph);
}

assert.equal(paragraphs.length, 2);
assert.equal(paragraphs[0].segment, 'The quick brown fox jumps over the lazy dog.\n');
assert.equal(paragraphs[1].segment, 'How now brown cow?');
```

To trim leading and trailing whitespace from segmented paragraphs, pass the `trim` option to the constructor. `trim` is a boolean; set it to `true` to remove whitespace from the start and end of each paragraph.

```ts
import * as assert from 'node:assert';
import { ParagraphSegmenter } from 'paraseg';

// Note the space before "jumps" and the trailing space after "cow?".
const text = `The quick brown fox
 jumps over the lazy dog.

How now brown cow? `;

const segmenter = new ParagraphSegmenter({ trim: true });
const paragraphs = [];
for (const paragraph of segmenter.segment(text)) {
  paragraphs.push(paragraph);
}

assert.equal(paragraphs.length, 2);
// Only the start and end of each paragraph are trimmed; interior whitespace stays.
assert.equal(paragraphs[0].segment, 'The quick brown fox\n jumps over the lazy dog.');
assert.equal(paragraphs[1].segment, 'How now brown cow?');
```

Yes, you can save memory by returning only offset and length data for each segment: pass the `slim: true` option to the constructor.

```ts
import * as assert from 'node:assert';
import { ParagraphSegmenter } from 'paraseg';

const text = `The quick brown fox
jumps over the lazy dog.

How now brown cow?`;

const segmenter = new ParagraphSegmenter({ slim: true });
const paragraphs = [];
for (const paragraph of segmenter.segment(text)) {
  paragraphs.push(paragraph);
}

assert.equal(paragraphs.length, 2);
// Slim segments carry no `segment` string, only positions.
assert.equal(Object.keys(paragraphs[0]).includes('segment'), false);
assert.equal(Object.keys(paragraphs[1]).includes('segment'), false);
assert.equal(paragraphs[0].offset, 0);
assert.equal(paragraphs[1].offset, text.indexOf('How now brown cow?'));
assert.equal(paragraphs[0].length, paragraphs[1].offset);
assert.equal(paragraphs[1].length, text.length - paragraphs[0].length);
```

Note that `slim: true` is mutually exclusive with the `trim` option.
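
Because slim segments carry only `offset` and `length`, the paragraph text has to be recovered from the source string when it is needed. A minimal sketch of doing so follows. The `SlimSegment` interface and the `textOf` helper are hypothetical names introduced for illustration, not part of paraseg, and the segments are hand-built stand-ins rather than real segmenter output.

```ts
// Assumed shape, based on the `offset` and `length` properties
// checked in the slim example; not a type exported by paraseg.
interface SlimSegment {
  offset: number;
  length: number;
}

// Hypothetical helper: slice the paragraph text out of the source on demand.
function textOf(source: string, segment: SlimSegment): string {
  return source.slice(segment.offset, segment.offset + segment.length);
}

const source = 'The quick brown fox\njumps over the lazy dog.\n\nHow now brown cow?';

// Hand-built stand-ins for what a slim segmenter might yield.
const secondOffset = source.indexOf('How now brown cow?');
const slimSegments: SlimSegment[] = [
  { offset: 0, length: secondOffset },
  { offset: secondOffset, length: source.length - secondOffset },
];

const recovered = slimSegments.map((segment) => textOf(source, segment));
```

Slicing lazily like this keeps only one copy of the text in memory until a particular paragraph is actually needed.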

Yes, there is a change log.

To compile TypeScript:

```
make build
```

To lint the code:

```
make lint
```

To run the tests:

```
make test
```

Node versions 20 or greater are supported.

It is released under the MIT license.