# bfj

Big-friendly JSON. Asynchronous streaming functions for large JSON data sets.
* Why would I want those?
* Is it fast?
* What functions does it implement?
* How do I install it?
* How do I read a JSON file?
* How do I parse a stream of JSON?
* How do I selectively parse individual items from a JSON stream?
* How do I write a JSON file?
* How do I create a stream of JSON?
* How do I create a JSON string?
* What other methods are there?
  * bfj.walk (stream, options)
  * bfj.eventify (data, options)
* What options can I specify?
  * Options for parsing functions
  * Options for serialisation functions
* Is it possible to pause parsing or serialisation from calling code?
* Can it break long strings into chunks?
* Can it recursively parse JSON nested inside a JSON string?
* Can it handle newline-delimited JSON (NDJSON)?
* Is there a change log?
* How do I set up the dev environment?
* What versions of Node.js does it support?
* What license is it released under?
## Why would I want those?

If you need
to parse huge JSON strings
or stringify huge JavaScript data sets,
doing so synchronously
monopolises the event loop
and can lead to out-of-memory exceptions.
BFJ implements asynchronous functions
and uses pre-allocated fixed-length arrays
to try and alleviate those issues.
## Is it fast?

No.
BFJ yields frequently
to avoid monopolising the event loop,
interrupting its own execution
to let other event handlers run.
The frequency of those yields
can be controlled with the `yieldRate` option,
but fundamentally it is not designed for speed.
Furthermore,
when serialising data to a stream,
BFJ uses a fixed-length buffer
to avoid exhausting available memory.
Whenever that buffer is full,
serialisation is paused
until the receiving stream processes some more data,
regardless of the value of yieldRate.
You can control the size of the buffer
using the `bufferLength` option
but really,
if you need quick results,
BFJ is not for you.
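BFJ's yielding logic is internal to the library, but the trade-off that `yieldRate` controls can be illustrated with plain JavaScript. This is only a sketch of the general technique; `processAsync` is a hypothetical helper, not part of bfj's API:

```javascript
// Illustrative only: traverse an array asynchronously, yielding to the
// event loop after every `yieldRate` items so other handlers can run.
const processAsync = async (items, visit, yieldRate = 1024) => {
  for (let i = 0; i < items.length; i += 1) {
    visit(items[i]);
    if ((i + 1) % yieldRate === 0) {
      // Give queued I/O callbacks and timers a chance to run.
      await new Promise(resolve => setImmediate(resolve));
    }
  }
};
```

A smaller `yieldRate` means more of these yields (more responsive, slower overall); a larger one means fewer (less responsive, faster overall).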
## What functions does it implement?

Nine functions
are exported.
Five are
concerned with
parsing, or
turning JSON strings
into JavaScript data:
* `read`
asynchronously parses
a JSON file from disk.
* `parse` and `unpipe`
are for asynchronously parsing
streams of JSON.
* `match`
selectively parses individual items
from a JSON stream.
* `walk`
asynchronously walks
a stream,
emitting events
as it encounters
JSON tokens.
Analogous to a
[SAX parser][sax].
The other four functions
handle the reverse transformations,
serialising
JavaScript data
to JSON:

* `write`
asynchronously serialises data
to a JSON file on disk.
* `streamify`
asynchronously serialises data
to a stream of JSON.
* `stringify`
asynchronously serialises data
to a JSON string.
* `eventify`
asynchronously traverses
a data structure
depth-first,
emitting events
as it encounters items.
By default
it coerces
promises, buffers and iterables
to JSON-friendly values.
## How do I install it?

If you're using npm:

```
npm i bfj --save
```

Or if you just want
the git repo:

```
git clone git@gitlab.com:philbooth/bfj.git
```
## How do I read a JSON file?

```js
const bfj = require('bfj');

bfj.read(path, options)
  .then(data => {
    // :)
  })
  .catch(error => {
    // :(
  });
```
`read` returns a promise and
asynchronously parses
a JSON file
from disk.
It takes two arguments;
the path to the JSON file
and an options object.
If there are
no syntax errors,
the returned promise is resolved
with the parsed data.
If syntax errors occur,
the promise is rejected
with the first error.
## How do I parse a stream of JSON?

```js
const bfj = require('bfj');

// By passing a readable stream to bfj.parse():
bfj.parse(fs.createReadStream(path), options)
  .then(data => {
    // :)
  })
  .catch(error => {
    // :(
  });

// ...or by passing the result from bfj.unpipe() to stream.pipe():
request({ url }).pipe(bfj.unpipe((error, data) => {
  if (error) {
    // :(
  } else {
    // :)
  }
}));
```
* `parse` returns a promise
and asynchronously parses
a stream of JSON data.
It takes two arguments;
a [readable stream][readable]
from which
the JSON
will be parsed
and an options object.
If there are
no syntax errors,
the returned promise is resolved
with the parsed data.
If syntax errors occur,
the promise is rejected
with the first error.
* `unpipe` returns a [writable stream][writable]
that can be passed to [`stream.pipe`][pipe],
then parses JSON data
read from the stream.
It takes two arguments;
a callback function
that will be called
after parsing is complete
and an options object.
If there are no errors,
the callback is invoked
with the result as the second argument.
If errors occur,
the first error is passed to
the callback
as the first argument.
## How do I selectively parse individual items from a JSON stream?

```js
const bfj = require('bfj');

// Call match with your stream and a selector predicate/regex/JSONPath/string
const dataStream = bfj.match(jsonStream, selector, options);

// Get data out of the returned stream with event handlers
dataStream.on('data', item => { /* ... */ });
dataStream.on('end', () => { /* ... */ });
dataStream.on('error', () => { /* ... */ });
dataStream.on('dataError', () => { /* ... */ });

// ...or you can pipe it to another stream
dataStream.pipe(someOtherStream);
```
`match` returns a readable, object-mode stream
and asynchronously parses individual matching items
from an input JSON stream.
It takes three arguments:
a [readable stream][readable]
from which the JSON will be parsed;
a selector argument for determining matches,
which may be a string, a regular expression, a JSONPath expression, or a predicate function;
and an options object.
If the selector is a string,
it will be compared to property keys
to determine whether
each item in the data is a match.
If it is a regular expression,
the comparison will be made
by calling the [RegExp test method][regexp-test]
with the property key.
If it is a JSONPath expression,
it must start with `$.` to identify the root node
and only use child scope expressions for subsequent nodes.
Predicate functions will be called with three arguments:
`key`, `value` and `depth`.
If the result of the predicate is a truthy value
then the item will be deemed a match.
In addition to the regular options
accepted by other parsing functions,
you can also specify `minDepth`
to only apply the selector
to certain depths.
This can improve performance
and memory usage,
if you know that
you're not interested in
parsing top-level items.
If there are any syntax errors in the JSON,
a `dataError` event will be emitted.
If any other errors occur,
an `error` event will be emitted.
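A predicate selector is just a plain function of `(key, value, depth)`, so its matching logic can be understood and tested in isolation, without a stream. This example predicate is hypothetical, purely to show the shape of the arguments:

```javascript
// A predicate selector receives the property key (or array index),
// the candidate value and its depth in the JSON structure.
// This one matches numeric `price` properties below the root level.
const selector = (key, value, depth) =>
  key === 'price' && typeof value === 'number' && depth > 1;

// e.g. bfj.match(jsonStream, selector, options) would then emit
// only the items for which this returns a truthy value.
```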
## How do I write a JSON file?

```js
const bfj = require('bfj');

bfj.write(path, data, options)
  .then(() => {
    // :)
  })
  .catch(error => {
    // :(
  });
```
`write` returns a promise
and asynchronously serialises a data structure
to a JSON file on disk.
The promise is resolved
when the file has been written,
or rejected with the error
if writing failed.
It takes three arguments;
the path to the JSON file,
the data structure to serialise
and an options object.
## How do I create a stream of JSON?

```js
const bfj = require('bfj');

const stream = bfj.streamify(data, options);

// Get data out of the stream with event handlers
stream.on('data', chunk => { /* ... */ });
stream.on('end', () => { /* ... */ });
stream.on('error', () => { /* ... */ });
stream.on('dataError', () => { /* ... */ });

// ...or you can pipe it to another stream
stream.pipe(someOtherStream);
```
`streamify` returns a [readable stream][readable]
and asynchronously serialises
a data structure to JSON,
pushing the result
to the returned stream.
It takes two arguments;
the data structure to serialise
and an options object.
If a circular reference is encountered in the data
and `options.circular` is not set to `'ignore'`,
a `dataError` event will be emitted.
If any other errors occur,
an `error` event will be emitted.
## How do I create a JSON string?

```js
const bfj = require('bfj');

bfj.stringify(data, options)
  .then(json => {
    // :)
  })
  .catch(error => {
    // :(
  });
```
`stringify` returns a promise and
asynchronously serialises a data structure
to a JSON string.
The promise is resolved
to the JSON string
when serialisation is complete.
It takes two arguments;
the data structure to serialise
and an options object.
## What other methods are there?

### bfj.walk (stream, options)

```js
const bfj = require('bfj');

const emitter = bfj.walk(fs.createReadStream(path), options);

emitter.on(bfj.events.array, () => { /* ... */ });
emitter.on(bfj.events.object, () => { /* ... */ });
emitter.on(bfj.events.property, name => { /* ... */ });
emitter.on(bfj.events.string, value => { /* ... */ });
emitter.on(bfj.events.number, value => { /* ... */ });
emitter.on(bfj.events.literal, value => { /* ... */ });
emitter.on(bfj.events.endArray, () => { /* ... */ });
emitter.on(bfj.events.endObject, () => { /* ... */ });
emitter.on(bfj.events.error, error => { /* ... */ });
emitter.on(bfj.events.dataError, error => { /* ... */ });
emitter.on(bfj.events.end, () => { /* ... */ });
```
`walk` returns an [event emitter][eventemitter]
and asynchronously walks
a stream of JSON data,
emitting events
as it encounters
tokens.
It takes two arguments;
a [readable stream][readable]
from which
the JSON
will be read
and an options object.
The emitted events
are defined
as public properties
of an object,
`bfj.events`:
* `bfj.events.array`
indicates that
an array context
has been entered
by encountering
the `[` character.
* `bfj.events.endArray`
indicates that
an array context
has been left
by encountering
the `]` character.
* `bfj.events.object`
indicates that
an object context
has been entered
by encountering
the `{` character.
* `bfj.events.endObject`
indicates that
an object context
has been left
by encountering
the `}` character.
* bfj.events.property
indicates that
a property
has been encountered
in an object.
The listener
will be passed
the name of the property
as its argument
and the next event
to be emitted
will represent
the property's value.
* bfj.events.string
indicates that
a string
has been encountered.
The listener
will be passed
the value
as its argument.
* `bfj.events.stringChunk`
indicates that
a string chunk
has been encountered,
if the `stringChunkSize` option was set.
The listener
will be passed
the chunk
as its argument.
* bfj.events.number
indicates that
a number
has been encountered.
The listener
will be passed
the value
as its argument.
* `bfj.events.literal`
indicates that
a JSON literal
(either `true`, `false` or `null`)
has been encountered.
The listener
will be passed
the value
as its argument.
* `bfj.events.error`
indicates that
an error was caught
from one of the event handlers
in user code.
The listener
will be passed
the `Error` instance
as its argument.
* `bfj.events.dataError`
indicates that
a syntax error was encountered
in the incoming JSON stream.
The listener
will be passed
an `Error` instance
decorated with `actual`, `expected`, `lineNumber` and `columnNumber` properties
as its argument.
* bfj.events.end
indicates that
the end of the input
has been reached
and the stream is closed.
* `bfj.events.endLine`
indicates that a root-level newline character
has been encountered in an NDJSON stream.
Only emitted if the `ndjson` option is set.
If you are using `bfj.walk`
to sequentially parse items in an array,
you might also be interested in
the [bfj-collections] module.
### bfj.eventify (data, options)

```js
const bfj = require('bfj');

const emitter = bfj.eventify(data, options);

emitter.on(bfj.events.array, () => { /* ... */ });
emitter.on(bfj.events.object, () => { /* ... */ });
emitter.on(bfj.events.property, name => { /* ... */ });
emitter.on(bfj.events.string, value => { /* ... */ });
emitter.on(bfj.events.number, value => { /* ... */ });
emitter.on(bfj.events.literal, value => { /* ... */ });
emitter.on(bfj.events.endArray, () => { /* ... */ });
emitter.on(bfj.events.endObject, () => { /* ... */ });
emitter.on(bfj.events.error, error => { /* ... */ });
emitter.on(bfj.events.dataError, error => { /* ... */ });
emitter.on(bfj.events.end, () => { /* ... */ });
```
`eventify` returns an [event emitter][eventemitter]
and asynchronously traverses
a data structure depth-first,
emitting events as it
encounters items.
By default it coerces
promises, buffers and iterables
to JSON-friendly values.
It takes two arguments;
the data structure to traverse
and an options object.
The emitted events
are defined
as public properties
of an object,
`bfj.events`:
* bfj.events.array
indicates that
an array
has been encountered.
* bfj.events.endArray
indicates that
the end of an array
has been encountered.
* bfj.events.object
indicates that
an object
has been encountered.
* bfj.events.endObject
indicates that
the end of an object
has been encountered.
* bfj.events.property
indicates that
a property
has been encountered
in an object.
The listener
will be passed
the name of the property
as its argument
and the next event
to be emitted
will represent
the property's value.
* bfj.events.string
indicates that
a string
has been encountered.
The listener
will be passed
the value
as its argument.
* bfj.events.number
indicates that
a number
has been encountered.
The listener
will be passed
the value
as its argument.
* `bfj.events.literal`
indicates that
a JSON literal
(either `true`, `false` or `null`)
has been encountered.
The listener
will be passed
the value
as its argument.
* `bfj.events.error`
indicates that
an error was caught
from one of the event handlers
in user code.
The listener
will be passed
the `Error` instance
as its argument.
* `bfj.events.dataError`
indicates that
a circular reference was encountered in the data
and the `circular` option was not set to `'ignore'`.
The listener
will be passed
an `Error` instance
as its argument.
* bfj.events.end
indicates that
the end of the data
has been reached and
no further events
will be emitted.
## What options can I specify?

### Options for parsing functions

* `options.reviver`:
Transformation function,
invoked depth-first
against the parsed
data structure.
This option
is analogous to the
[reviver parameter for JSON.parse][reviver].
* `options.yieldRate`:
The number of data items to process
before yielding to the event loop.
Smaller values yield to the event loop more frequently,
meaning less time will be consumed by bfj per tick
but the overall parsing time will be slower.
Larger values yield to the event loop less often,
meaning slower tick times but faster overall parsing time.
The default value is `1024`.
* `options.ndjson`:
If set to `true`,
newline characters at the root level
will be treated as delimiters between
discrete chunks of JSON.
See NDJSON for more information.
* `options.stringChunkSize`:
For `bfj.walk` only,
set this to the character count
at which you wish to chunk strings.
Each chunk will be emitted as a `bfj.events.stringChunk` event,
followed by the regular `bfj.events.string` event
after all chunks are emitted.
* `options.recursive`:
For `bfj.match` only,
set this to `true`
if you wish to match against recursively JSON-parsed strings.
* `options.numbers`:
For `bfj.match` only,
set this to `true`
if you wish to match against numbers
with a string or regular expression
`selector` argument.
* `options.bufferLength`:
For `bfj.match` only,
the length of the match buffer.
Smaller values use less memory
but may result in a slower parse time.
The default value is `256`.
* `options.highWaterMark`:
For `bfj.match` only,
set this if you would like to
pass a value for the `highWaterMark` option
to the readable stream constructor.
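Since `options.reviver` is analogous to the `JSON.parse` reviver, the built-in behaviour is a good preview of what to expect. This sketch uses plain `JSON.parse`, not bfj; the property names are made up for illustration:

```javascript
// The reviver is called depth-first for each key/value pair,
// exactly as with JSON.parse; the value it returns replaces
// the parsed value at that position.
const json = '{"total":"42","note":"hi"}';
const data = JSON.parse(json, (key, value) =>
  key === 'total' ? Number(value) : value);
// data.total is now the number 42, data.note is untouched.
```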
### Options for serialisation functions

* `options.space`:
Indentation string
or the number of spaces
to indent
each nested level by.
This option
is analogous to the
[space parameter for JSON.stringify][space].
* `options.promises`:
By default,
promises are coerced
to their resolved value.
Set this property
to `'ignore'`
for improved performance
if you don't need
to coerce promises.
* `options.buffers`:
By default,
buffers are coerced
using their `toString` method.
Set this property
to `'ignore'`
for improved performance
if you don't need
to coerce buffers.
* `options.maps`:
By default,
maps are coerced
to plain objects.
Set this property
to `'ignore'`
for improved performance
if you don't need
to coerce maps.
* `options.iterables`:
By default,
other iterables
(i.e. not arrays, strings or maps)
are coerced
to arrays.
Set this property
to `'ignore'`
for improved performance
if you don't need
to coerce iterables.
* `options.circular`:
By default,
circular references
will cause the write
to fail.
Set this property
to `'ignore'`
if you'd prefer
to silently skip past
circular references
in the data.
* `options.bufferLength`:
The length of the write buffer.
Smaller values use less memory
but may result in a slower serialisation time.
The default value is `256`.
* `options.highWaterMark`:
Set this if you would like to
pass a value for the `highWaterMark` option
to the readable stream constructor.
* `options.yieldRate`:
The number of data items to process
before yielding to the event loop.
Smaller values yield to the event loop more frequently,
meaning less time will be consumed by bfj per tick
but the overall serialisation time will be slower.
Larger values yield to the event loop less often,
meaning slower tick times but faster overall serialisation time.
The default value is `1024`.
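Several of these serialisation options mirror behaviour you can preview with standard APIs. This sketch shows what `space` does, the default coercions (buffers via `toString`, maps to plain objects, iterables to arrays) expressed with equivalent built-ins, and the circular-reference failure that `circular: 'ignore'` avoids; it uses plain `JSON.stringify`, not bfj:

```javascript
// `space` behaves like JSON.stringify's space parameter.
const pretty = JSON.stringify({ a: 1 }, null, 2);

// The default coercions, expressed with standard APIs:
const fromBuffer = Buffer.from('hi').toString();          // buffers -> strings
const fromMap = Object.fromEntries(new Map([['a', 1]]));  // maps -> plain objects
const fromIterable = Array.from(new Set([1, 2]));         // iterables -> arrays

// Circular references make a plain stringify throw, which is why
// a bfj write fails unless `circular` is set to 'ignore'.
const data = {};
data.self = data;
let failed = false;
try {
  JSON.stringify(data);
} catch (error) {
  failed = true; // TypeError: converting circular structure to JSON
}
```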
## Is it possible to pause parsing or serialisation from calling code?

Yes it is!
Both `walk`
and `eventify`
decorate their returned event emitters
with a `pause` method
that will prevent any further events being emitted.
The `pause` method itself
returns a `resume` function
that you can call to indicate
that processing should continue.
For example:
```js
const bfj = require('bfj');
const emitter = bfj.walk(fs.createReadStream(path), options);

// Later, when you want to pause parsing:
const resume = emitter.pause();

// Then when you want to resume:
resume();
```
## Can it break long strings into chunks?

Yes.
If you pass the `stringChunkSize` option
to `bfj.walk`,
it will emit a `bfj.events.stringChunk` event
for each chunk of the string.
The regular `bfj.events.string` event
will still be emitted
after all the chunks.
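The chunking itself is simple to picture. Splitting a string into `stringChunkSize`-character pieces looks like this in plain JavaScript; `chunk` is a hypothetical helper for illustration, not a bfj export:

```javascript
// Split a string into pieces of at most `size` characters,
// the way stringChunkSize causes long strings to be emitted.
const chunk = (value, size) => {
  const chunks = [];
  for (let i = 0; i < value.length; i += size) {
    chunks.push(value.slice(i, i + size));
  }
  return chunks;
};
```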
## Can it recursively parse JSON nested inside a JSON string?

Yes.
If you pass the `recursive` option
to `bfj.match`,
it will recursively parse any string values
that satisfy the `selector` argument.
Note the same selector is applied
to every level of recursion,
so this works best in combination
with selectors that are predicate functions.
## Can it handle newline-delimited JSON (NDJSON)?

Yes.
If you pass the `ndjson` option
to `bfj.walk`, `bfj.match` or `bfj.parse`,
newline characters at the root level
will act as delimiters between
discrete JSON values:

* `bfj.walk` will emit a `bfj.events.endLine` event
each time it encounters a newline character.
* `bfj.match` will just ignore the newlines
while it continues looking for matching items.
* `bfj.parse` will resolve with the first value
and pause the underlying stream.
If it's called again with the same stream,
it will resume processing
and resolve with the second value.
To parse the entire stream,
calls should be made sequentially one-at-a-time
until the returned promise
resolves to `undefined`
(`undefined` is not a valid JSON token).

`bfj.unpipe` and `bfj.read` will not parse NDJSON.
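The NDJSON format itself is easy to demonstrate without bfj: each root-level line is one discrete JSON value. For small inputs, the synchronous equivalent of repeatedly calling `bfj.parse` over an NDJSON stream is a split plus `JSON.parse` per line:

```javascript
// NDJSON: one JSON value per root-level line.
const ndjson = '{"id":1}\n{"id":2}\n{"id":3}';

// Synchronous equivalent, suitable only for small inputs:
const values = ndjson
  .split('\n')
  .filter(line => line.length > 0)
  .map(line => JSON.parse(line));
```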
Yes.
Just pass the `Promise` option
to any method.
You might want to try this
if you get any out-of-memory errors
when parsing huge files.
## Is there a change log?

[Yes][history].
## How do I set up the dev environment?

The development environment
relies on [Node.js][node],
[ESLint],
[Mocha],
[Chai],
[Proxyquire] and
[Spooks].
Assuming that
you already have
node and NPM
set up,
you just need
to run
`npm install`
to install
all of the dependencies
as listed in `package.json`.
You can
lint the code
with the command
`npm run lint`.
You can
run the tests
with the command
`npm test`.
## What versions of Node.js does it support?

As of version 8.0.0,
only Node.js versions 18 or greater
are supported.
Between versions 3.0.0
and 6.1.2,
only Node.js versions 6 or greater
were supported.
Until version 2.1.2,
only Node.js versions 4 or greater
were supported.
## What license is it released under?

[MIT][license].
[sax]: http://en.wikipedia.org/wiki/Simple_API_for_XML
[bfj-collections]: https://github.com/hash-bang/bfj-collections
[eventemitter]: https://nodejs.org/api/events.html#events_class_eventemitter
[readable]: https://nodejs.org/api/stream.html#stream_readable_streams
[writable]: https://nodejs.org/api/stream.html#stream_writable_streams
[pipe]: https://nodejs.org/api/stream.html#stream_readable_pipe_destination_options
[regexp-test]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/test
[reviver]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/JSON/parse#Using_the_reviver_parameter
[space]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify#The_space_argument
[history]: HISTORY.md
[node]: https://nodejs.org/en/
[eslint]: http://eslint.org/
[mocha]: https://mochajs.org/
[chai]: http://chaijs.com/
[proxyquire]: https://github.com/thlorenz/proxyquire
[spooks]: https://gitlab.com/philbooth/spooks.js
[license]: COPYING