split for files, streams, or any text really
split-anything lets you read files line by line synchronously. As a bonus,
you can use it
- on strings like String.split,
- as a stream transform like split2,
- to read files line by line _asynchronously_, or
- with your own interface around the underlying SplitAnything class.
What sets it apart from other text-splitting utilities is that it preserves line
endings by default, unlike String.split-based
solutions.
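To make that difference concrete, here is a small sketch in plain JavaScript, with no split-anything involved. It contrasts String.prototype.split with a split that keeps line endings. The lookbehind pattern only illustrates the idea and is not necessarily how the library does it internally.

```javascript
const text = 'so much depends\nupon\n'

// String.prototype.split drops the separator from every element
const dropped = text.split('\n')
// dropped is ['so much depends', 'upon', '']

// Splitting on the empty position right after each newline keeps the endings
// (lookbehind assertions need a reasonably recent JavaScript engine)
const kept = text.split(/(?<=\n)/)
// kept is ['so much depends\n', 'upon\n']

// Only the ending-preserving form can be joined back into the original text
console.log(dropped.join('') === text) // false
console.log(kept.join('') === text) // true
```

Keeping the endings means the pieces concatenate back to the exact input, which matters when a file mixes line-ending styles or lacks a final newline.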
String interface - splitStr
---------------------------
splitStr takes a string and returns an array of strings where each element
is a line:
~~~javascript
const tap = require('tap')
const { splitStr } = require('split-anything')
const input = `so much depends
upon

a red wheel
barrow ♥♥♥`
const output = [
'so much depends\n',
'upon\n',
'\n',
'a red wheel\n',
'barrow ♥♥♥'
]
tap.same(splitStr(input), output)
~~~
Usage: splitStr(text[, separator[, chomp]])
> - text _String_ the text to split into lines.
> - separator _RegExp_ the line boundary. Default: /\n/
> - chomp _boolean_ whether to remove line boundaries from the end of lines. Default: false
> - Returns: _Array_ text split into lines.
Set chomp to true if you don't want to keep line endings, e.g.
~~~javascript
tap.same(splitStr(input, /\n/, true), [
'so much depends',
'upon',
'',
'a red wheel',
'barrow ♥♥♥'
])
~~~
You can use arbitrary regular expressions as your "line boundary", e.g.
if you just want to get rid of empty lines:
~~~javascript
tap.same(splitStr(input, /(?<=\n)\n+/, true), [
'so much depends\nupon\n',
'a red wheel\nbarrow ♥♥♥'
])
~~~
File interface - SplitReader
----------------------------
~~~javascript
const fs = require('fs')
const tmp = require('tmp')
const { SplitReader } = require('split-anything')
const fName = tmp.tmpNameSync()
fs.writeFileSync(fName, input)
const reader = new SplitReader(fName)
const readOut = []
let line
while ((line = reader.readSync()) !== null) {
  readOut.push(line)
}
tap.same(readOut, output)
~~~
Usage: new SplitReader(file[, encoding[, separator[, chomp]]])
> - file _String or integer_ name of the file to read from, or a file descriptor.
> - encoding _String_ the file encoding. Default: 'utf8'
> - separator _RegExp_ the line boundary. Default: /\n/
> - chomp _boolean_ whether to remove line boundaries from the end of lines. Default: false
SplitReader.readSync([bufLength])
> - bufLength _integer_ the number of bytes to read. Default: 250.
> - Returns: _String_ the next line of text, or null if EOF is reached.
>
> If it's possible to avoid reading from the underlying file (i.e. if some
> lines have already been read and buffered), then this function doesn't read,
> and just returns the next line from the buffer. Conversely, if after reading
> bufLength bytes from the file it still hasn't found a complete line, then
> it reads again until it has a line to return.
~~~javascript
const fd = fs.openSync(fName, 'r')
const fdReader = new SplitReader(fd, 'utf8', /\s/, true)
const words = []
while ((line = fdReader.readSync(1)) !== null) {
  words.push(line)
}
tap.same(words, input.split(/\s/))
~~~
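The buffering behaviour described for readSync can be sketched with nothing but Node's built-in modules. This is a simplified illustration, not the library's implementation: makeLineReader and nextLine are hypothetical names, the separator is fixed to a newline, and line endings are always kept.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')
const { StringDecoder } = require('string_decoder')

// Hypothetical helper, not part of split-anything: returns a function that
// yields one line per call, or null once the file is exhausted.
function makeLineReader (fd, bufLength = 250) {
  const decoder = new StringDecoder('utf8') // copes with characters split across reads
  const chunk = Buffer.alloc(bufLength)
  let pending = ''
  let eof = false

  return function nextLine () {
    let end = pending.indexOf('\n')
    // Touch the file only while the buffered text holds no complete line
    while (end === -1 && !eof) {
      const bytesRead = fs.readSync(fd, chunk, 0, bufLength, null)
      if (bytesRead === 0) {
        eof = true
        pending += decoder.end()
      } else {
        pending += decoder.write(chunk.subarray(0, bytesRead))
      }
      end = pending.indexOf('\n')
    }
    if (end === -1) {
      // End of file: hand back whatever is left, then null
      if (pending === '') return null
      const last = pending
      pending = ''
      return last
    }
    const line = pending.slice(0, end + 1) // keep the line ending
    pending = pending.slice(end + 1)
    return line
  }
}

// Usage with a throwaway file and a deliberately tiny buffer
const demoFile = path.join(os.tmpdir(), `line-reader-demo-${process.pid}.txt`)
fs.writeFileSync(demoFile, 'a red wheel\nbarrow ♥♥♥')
const demoFd = fs.openSync(demoFile, 'r')
const nextLine = makeLineReader(demoFd, 4)
const demoLines = []
let demoLine
while ((demoLine = nextLine()) !== null) demoLines.push(demoLine)
fs.closeSync(demoFd)
fs.unlinkSync(demoFile)
console.log(demoLines) // the two lines, endings intact
```

The 4-byte buffer is smaller than either line and cuts through the 3-byte hearts, so the sketch exercises both the "read again until a line is complete" loop and the decoding of characters that straddle two reads.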
You can also read lines asynchronously:
~~~javascript
tap.test('async read', t => {
  t.plan(1)
  const fd = fs.openSync(fName, 'r')
  const reader = new SplitReader(fd)
  const lines = []
  const readTilEnd = () => reader.read().then(line => {
    if (line === null) {
      t.same(lines, output)
    } else {
      lines.push(line)
      readTilEnd()
    }
  })
  readTilEnd()
})
~~~
SplitReader.read([bufLength])
> - bufLength _integer_ the number of bytes to read. Default: 250.
> - Returns: _Promise_ resolves to the next line of text, or null if
> the end of file has been reached.
>
> This is the async counterpart of readSync.
It's safe to use both read and readSync on the same file:
~~~javascript
tap.test('mixed read', t => {
  const fd = fs.openSync(fName, 'r')
  const reader = new SplitReader(fd)
  reader.read(20).then(line => {
    t.equals(line, 'so much depends\n')
    t.equals(reader.readSync(10), 'upon\n')
    reader.read(5).then(line => {
      t.equals(line, '\n')
      t.equals(reader.readSync(1), 'a red wheel\n')
      reader.read(1).then(line => {
        t.equals(line, 'barrow ♥♥♥')
        t.equals(reader.readSync(1), null)
        t.end()
      })
    })
  })
})
~~~
Stream interface - SplitTransform
---------------------------------
~~~javascript
const { SplitTransform } = require('split-anything')
tap.test('You can do the same thing with a stream transform', t => {
  const actualOutput = []
  const tx = new SplitTransform()
  tx.on('data', line => actualOutput.push(line))
  tx.on('end', () => {
    t.plan(1)
    t.same(actualOutput, output)
  })
  tx.end(input)
})
~~~
Usage: new SplitTransform([separator[, chomp[, streamOptions]]])
> - separator _RegExp_ the line boundary. Default: /\n/
> - chomp _boolean_ whether to remove line boundaries from the end of lines. Default: false
> - streamOptions _Object_ options to pass to the stream.Transform
> constructor.
~~~javascript
tap.test('separator & chomp just like splitStr and SplitReader', t => {
  const actualOutput = []
  const tx = new SplitTransform(/\b\w{1,3}\s/, true, { highWaterMark: 2 })
  tx.on('data', line => actualOutput.push(line))
  tx.on('end', () => {
    t.plan(1)
    t.same(actualOutput, [
      'much depends\nupon\n\n',
      'wheel\nbarrow ♥♥♥'
    ])
  })
  tx.end(input)
})
~~~
Generic interface - SplitAnything
---------------------------------
splitStr, SplitTransform, and SplitReader are all wrappers around
SplitAnything. If you have some text to split but these interfaces don't
work for you, you can build your own by interacting with SplitAnything
directly.
~~~javascript
const { SplitAnything } = require('split-anything')
const sa = new SplitAnything()
sa.cat(input)
tap.equals(sa.getLine(true), 'so much depends\n')
tap.equals(sa.getLine(true), 'upon\n')
tap.equals(sa.getLine(true), '\n')
tap.equals(sa.getLine(true), 'a red wheel\n')
tap.equals(sa.getLine(true), 'barrow ♥♥♥')
tap.equals(sa.getLine(true), undefined)
~~~
new SplitAnything([separator[, chomp]])
> - separator _RegExp_ the line boundary. Default: /\n/
> - chomp _boolean_ whether to remove line boundaries from the end of lines. Default: false
SplitAnything.cat(str)
> - str _String_ the chunk of text to concatenate
> - Returns: this so you can chain calls
>
> Appends str to the internal text buffer.
~~~javascript
const sa1 = (new SplitAnything(/ /, true)).cat('1 2').cat('3 4')
tap.equals(sa1.getLine(true), '1')
tap.equals(sa1.getLine(true), '23')
tap.equals(sa1.getLine(true), '4')
~~~
SplitAnything.getLine([last])
> - last _boolean_ whether the buffered text is the end of the input. Default: false
> - Returns: _String_ the next complete line, or undefined if there isn't one.
>
> Returns the next complete line from the text passed to cat so far. While
> last is false, any text after the final line boundary counts as incomplete
> and is not returned, because SplitAnything expects you to cat more text.
> Once you've reached the end of the text you want to split, set last to
> true, and that remaining text will be counted as a complete line and
> returned when its turn comes.
~~~javascript
sa.cat('1\n2\n3')
tap.equals(sa.getLine(), '1\n')
tap.equals(sa.getLine(), '2\n')
tap.equals(sa.getLine(), undefined)
tap.equals(sa.getLine(true), '3')
tap.equals(sa.getLine(true), undefined)
sa.cat('4\n5\n')
tap.equals(sa.getLine(), '4\n')
tap.equals(sa.getLine(), '5\n')
tap.equals(sa.getLine(true), undefined)
~~~
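The cat and getLine semantics described above can be approximated in a few lines. The class below is a hypothetical stand-in written for illustration (MiniSplitter is not part of split-anything, and the real SplitAnything may work differently). It assumes the separator never matches an empty string, and it ignores boundaries that straddle two chunks.

```javascript
// Hypothetical stand-in for illustration; the real SplitAnything may differ
class MiniSplitter {
  constructor (separator = /\n/, chomp = false) {
    // Copy the pattern without the g/y flags so every search starts at index 0
    this.separator = new RegExp(separator.source, separator.flags.replace(/[gy]/g, ''))
    this.chomp = chomp
    this.buffer = ''
  }

  cat (str) {
    this.buffer += str
    return this // allow chained calls
  }

  getLine (last = false) {
    const match = this.separator.exec(this.buffer)
    if (match === null) {
      // No boundary buffered: the remainder is complete only if `last` is set
      if (!last || this.buffer === '') return undefined
      const rest = this.buffer
      this.buffer = ''
      return rest
    }
    const boundaryEnd = match.index + match[0].length
    // With chomp the line stops before the boundary, otherwise it includes it
    const line = this.buffer.slice(0, this.chomp ? match.index : boundaryEnd)
    this.buffer = this.buffer.slice(boundaryEnd)
    return line
  }
}

// Same inputs as the cat example: the chunks are joined before being split
const mini = new MiniSplitter(/ /, true).cat('1 2').cat('3 4')
const pieces = []
let piece
while ((piece = mini.getLine(true)) !== undefined) pieces.push(piece)
console.log(pieces) // [ '1', '23', '4' ]
```

A wrapper for a new kind of source then only has to call cat whenever text arrives, drain getLine() until it returns undefined, and make a final pass with getLine(true) once the source is exhausted.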
Contributing
------------
This project is left deliberately imperfect to encourage you to participate in
its development. If you make a Pull Request that
- explains and solves a problem,
- follows standard style, and
- maintains 100% test coverage
it _will_ be merged: this project follows the
C4 process.
To make sure your commits follow the style guide and pass all tests, you can add
./.pre-commit
to your git pre-commit hook.