# strtok3

A promise-based streaming tokenizer.
The strtok3 module provides several methods for creating a tokenizer from various input sources.
strtok3 can read from:
- Blobs
- buffers (Uint8Array)
- files (Node.js only)
- Node.js readable streams
- WHATWG (web) readable streams

## Installation

```sh
npm install strtok3
```
## Compatibility

Starting with version 7, the module has migrated from CommonJS to a pure ECMAScript Module (ESM).
The distributed JavaScript codebase is compliant with the ECMAScript 2020 (11th Edition) standard.
Requires a modern browser, a Node.js (V8) ≥ 18 engine, or Bun (JavaScriptCore) ≥ 1.2.
For backward compatibility with TypeScript CommonJS projects, you can use load-esm, as sketched below.
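For example, a minimal sketch of loading strtok3 from a CommonJS context, assuming the loadEsm helper exported by load-esm:

```js
// Sketch: dynamically import the ESM-only strtok3 from CommonJS via load-esm.
import { loadEsm } from 'load-esm';

async function loadStrtok3() {
  // loadEsm performs a dynamic import(), avoiding require() on an ESM module
  const strtok3 = await loadEsm('strtok3');
  return strtok3;
}
```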
> [!NOTE]
> This module requires a Node.js ≥ 18 engine.
> It can also be used in a browser environment when bundled with a module bundler.
## Support the Project
If you find this project useful and would like to support its development, consider sponsoring or contributing:
- Become a sponsor to Borewit
- Buy me a coffee
## API Documentation

### strtok3 methods
Use one of the methods to instantiate an abstract tokenizer:
- fromBlob
- fromBuffer
- fromFile*
- fromStream*
- fromWebStream
> [!NOTE]
> fromFile and fromStream are only available when using this module with Node.js
All methods return a Tokenizer, either directly or via a promise.
#### fromBlob() function
Create a tokenizer from a Blob.
```ts
function fromBlob(blob: Blob, options?: ITokenizerOptions): BlobTokenizer
```
| Parameter | Optional | Type | Description |
|-----------|-----------|---------------------------------------------------|----------------------------------------------------------------------------------------|
| blob | no | Blob | Blob or File to read from |
| options | yes | ITokenizerOptions | Tokenizer options |
Returns a tokenizer.
```js
import { fromBlob } from 'strtok3';
import { openAsBlob } from 'node:fs';
import * as Token from 'token-types';

async function parse() {
  const blob = await openAsBlob('somefile.bin');
  const tokenizer = fromBlob(blob);
  const myUint8Number = await tokenizer.readToken(Token.UINT8);
  console.log(`My number: ${myUint8Number}`);
}

parse();
```
#### fromBuffer() function
Create a tokenizer from memory (Uint8Array or Node.js Buffer).
```ts
function fromBuffer(uint8Array: Uint8Array, options?: ITokenizerOptions): BufferTokenizer
```
| Parameter | Optional | Type | Description |
|------------|----------|--------------------------------------------------|-----------------------------------|
| uint8Array | no | Uint8Array | Buffer or Uint8Array to read from |
| options | yes | ITokenizerOptions | Tokenizer options |
Returns a tokenizer.
```js
import { fromBuffer } from 'strtok3';
import * as Token from 'token-types';

const buffer = new Uint8Array([0x2a]); // example data to tokenize
const tokenizer = fromBuffer(buffer);

async function parse() {
  const myUint8Number = await tokenizer.readToken(Token.UINT8);
  console.log(`My number: ${myUint8Number}`);
}

parse();
```
#### fromFile() function
Creates a tokenizer from a local file.
```ts
function fromFile(sourceFilePath: string): Promise<FileTokenizer>
```
| Parameter | Type | Description |
|----------------|----------|----------------------------|
| sourceFilePath | string | Path to file to read from |
> [!NOTE]
> - Only available for Node.js engines
> - fromFile automatically embeds file information (see IFileInfo)
Returns a Promise resolving to a tokenizer which can be used to parse the file.
```js
import { fromFile } from 'strtok3';
import * as Token from 'token-types';

async function parse() {
  const tokenizer = await fromFile('somefile.bin');
  try {
    const myNumber = await tokenizer.readToken(Token.UINT8);
    console.log(`My number: ${myNumber}`);
  } finally {
    await tokenizer.close(); // Close the file
  }
}

parse();
```
#### fromWebStream() function
Create a tokenizer from a WHATWG ReadableStream.
```ts
function fromWebStream(webStream: AnyWebByteStream, options?: ITokenizerOptions): ReadStreamTokenizer
```
| Parameter | Optional | Type | Description |
|----------------|----------|--------------------------------------------------------------------------|------------------------------------|
| webStream | no | ReadableStream | WHATWG ReadableStream to read from |
| options | yes | ITokenizerOptions | Tokenizer options |
Returns a tokenizer.
```js
import { fromWebStream } from 'strtok3';
import * as Token from 'token-types';

async function parse(readableStream) {
  const tokenizer = fromWebStream(readableStream);
  try {
    const myUint8Number = await tokenizer.readToken(Token.UINT8);
    console.log(`My number: ${myUint8Number}`);
  } finally {
    await tokenizer.close();
  }
}

// Any WHATWG ReadableStream will do, e.g. response.body from fetch() or blob.stream()
parse(new Blob([new Uint8Array([42])]).stream());
```
### Tokenizer object

The tokenizer is an abstraction of a stream, file, or Uint8Array, allowing _reading_ or _peeking_ from the stream.
It can also be translated into chunked reads, as done in @tokenizer/http.
#### Key Features:
- Supports seeking within the stream using tokenizer.ignore().
- Offers peek methods to preview data without advancing the read pointer.
- Maintains the read position via tokenizer.position.
#### Tokenizer functions
_Read_ methods advance the stream pointer, while _peek_ methods do not.
There are two kinds of functions:
1. read methods: read a token or buffer from the tokenizer. The tokenizer-stream position advances by the size of the token.
2. peek methods: same as read, but they do not advance the pointer, allowing you to read (peek) ahead.
#### readBuffer function
Read data from the _tokenizer_ into the provided buffer (Uint8Array).

```ts
readBuffer(buffer: Uint8Array, options?: IReadChunkOptions): Promise<number>
```
| Parameter | Type                 | Description                                   |
|-----------|----------------------|-----------------------------------------------|
| buffer    | Buffer \| Uint8Array | Target buffer to write the data read to       |
| options   | IReadChunkOptions    | Read behaviour options; see IReadChunkOptions |

Returns a promise resolving to the number of bytes read.
The number of bytes read may be less than requested if the mayBeLess flag is set.
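For example, a minimal sketch reading the first bytes of an in-memory source into a pre-allocated buffer:

```js
import { fromBuffer } from 'strtok3';

const tokenizer = fromBuffer(new Uint8Array([0x49, 0x44, 0x33, 0x04]));
const target = new Uint8Array(4); // target buffer to fill

// Fills `target` and advances tokenizer.position by the number of bytes read
const bytesRead = await tokenizer.readBuffer(target, { mayBeLess: true });
console.log(bytesRead, tokenizer.position); // 4 4
```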
#### peekBuffer function
Peek (read ahead) from the tokenizer into the buffer, without advancing the stream pointer.

```ts
peekBuffer(uint8Array: Uint8Array, options?: IReadChunkOptions): Promise<number>
```

| Parameter  | Type                 | Description                                       |
|------------|----------------------|---------------------------------------------------|
| uint8Array | Buffer \| Uint8Array | Target buffer to write the data read (peeked) to. |
| options    | IReadChunkOptions    | Read behaviour options; see IReadChunkOptions.    |

Returns a promise resolving to the number of bytes peeked.
The number of bytes may be less than requested if the mayBeLess flag was set.
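For example, a short sketch showing that peeking leaves the read position untouched:

```js
import { fromBuffer } from 'strtok3';

const tokenizer = fromBuffer(new Uint8Array([0x01, 0x02, 0x03]));
const peeked = new Uint8Array(2);

await tokenizer.peekBuffer(peeked); // peeked = [0x01, 0x02]
console.log(tokenizer.position);    // 0: the stream pointer did not advance
```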
#### readToken function
Read a token from the tokenizer-stream.
```ts
readToken<T>(token: IGetToken<T>, position: number = this.position): Promise<T>
```
| Parameter | Type         | Description                                                                                              |
|-----------|--------------|----------------------------------------------------------------------------------------------------------|
| token     | IGetToken<T> | Token to read from the tokenizer-stream.                                                                   |
| position? | number       | Offset where to begin reading within the file. If omitted, data is read from the current file position.   |

Returns a promise resolving to the token value read from the tokenizer-stream.
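For example, a sketch reading a big-endian 16-bit integer at an explicit position:

```js
import { fromBuffer } from 'strtok3';
import * as Token from 'token-types';

const tokenizer = fromBuffer(new Uint8Array([0x00, 0x00, 0x12, 0x34]));

// Read a UINT16_BE token starting at absolute offset 2
const value = await tokenizer.readToken(Token.UINT16_BE, 2);
console.log(value); // 4660 (0x1234)
```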
#### peekToken function
Peek a token from the tokenizer.
```ts
peekToken<T>(token: IGetToken<T>, position: number = this.position): Promise<T>
```
| Parameter | Type         | Description                                                                                              |
|-----------|--------------|----------------------------------------------------------------------------------------------------------|
| token     | IGetToken<T> | Token to peek from the tokenizer-stream.                                                                   |
| position? | number       | Offset where to begin reading within the file. If omitted, data is read from the current file position.   |

Returns a promise resolving to the token value peeked from the tokenizer.
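For example, a sketch peeking a token and then reading the same bytes again:

```js
import { fromBuffer } from 'strtok3';
import * as Token from 'token-types';

const tokenizer = fromBuffer(new Uint8Array([0xff]));

const peeked = await tokenizer.peekToken(Token.UINT8); // 255, position still 0
const read = await tokenizer.readToken(Token.UINT8);   // 255 again, position now 1
console.log(peeked === read); // true
```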
#### readNumber function
Read a numeric token from the tokenizer.
```ts
readNumber(token: IGetToken<number>): Promise<number>
```
| Parameter | Type              | Description                                      |
|-----------|-------------------|--------------------------------------------------|
| token     | IGetToken<number> | Numeric token to read from the tokenizer-stream. |

Returns a promise resolving to the numeric value read and decoded from the tokenizer-stream.
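For example, a sketch decoding a signed 8-bit value:

```js
import { fromBuffer } from 'strtok3';
import * as Token from 'token-types';

const tokenizer = fromBuffer(new Uint8Array([0xfe]));
const value = await tokenizer.readNumber(Token.INT8);
console.log(value); // -2
```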
#### ignore function
Advance the stream pointer by the given number of bytes.

```ts
ignore(length: number): Promise<number>
```
| Parameter | Type | Description |
|------------|--------|------------------------------------------------------------------|
| length | number | Number of bytes to ignore. Will advance the tokenizer.position |
Returns a promise resolving to the number of bytes ignored from the tokenizer-stream.
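For example, a sketch skipping a 4-byte header before reading the payload:

```js
import { fromBuffer } from 'strtok3';
import * as Token from 'token-types';

const tokenizer = fromBuffer(new Uint8Array([0, 0, 0, 0, 0x07]));

await tokenizer.ignore(4); // skip the 4-byte header
const payload = await tokenizer.readToken(Token.UINT8);
console.log(payload); // 7
```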
#### close function
Clean up resources, such as closing a file pointer, if applicable. Returns a promise that resolves once cleanup has completed.
#### Tokenizer attributes
- fileInfo
  Optional attribute describing the file information; see IFileInfo.
- position
  Pointer to the current position in the tokenizer stream.
  If a position is provided to a _read_ or _peek_ method, it should be equal to or greater than this value.
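For example, a sketch inspecting both attributes on a file-based tokenizer:

```js
import { fromFile } from 'strtok3';
import * as Token from 'token-types';

const tokenizer = await fromFile('somefile.bin');
try {
  console.log(tokenizer.fileInfo.size); // file size in bytes, embedded by fromFile
  await tokenizer.readToken(Token.UINT8);
  console.log(tokenizer.position);      // 1: one byte has been consumed
} finally {
  await tokenizer.close();
}
```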
### IReadChunkOptions interface
Each attribute is optional:
| Attribute | Type    | Description                                                                                                                              |
|-----------|---------|------------------------------------------------------------------------------------------------------------------------------------------|
| length    | number  | Requested number of bytes to read.                                                                                                         |
| position  | number  | Position where to peek from the file. If omitted, data is read from the current file position. May not be less than tokenizer.position.   |
| mayBeLess | boolean | If set, no error is thrown when fewer bytes than requested could be read.                                                                  |
Example usage:
```js
tokenizer.peekBuffer(buffer, {mayBeLess: true});
```
### IFileInfo interface
Provides optional metadata about the file being tokenized.
| Attribute | Type | Description |
|-----------|---------|---------------------------------------------------------------------------------------------------|
| size | number | File size in bytes |
| mimeType | string | MIME-type of file. |
| path | string | File path |
| url | string | File URL |
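File information can also be supplied up front when the source itself cannot provide it, e.g. when tokenizing a stream. A sketch, assuming ITokenizerOptions accepts a fileInfo attribute:

```js
import { fromWebStream } from 'strtok3';

const blob = new Blob([new Uint8Array(1024)]);

// Pass metadata the stream itself cannot provide (assumes ITokenizerOptions.fileInfo)
const tokenizer = fromWebStream(blob.stream(), {
  fileInfo: { size: blob.size, mimeType: 'application/octet-stream' }
});
console.log(tokenizer.fileInfo.size); // 1024
```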
### Token object
The token is essentially a description of what to read from the tokenizer-stream.
A basic set of token types can be found in token-types.
A token is anything that implements the following interface:
```ts
export interface IGetToken<T> {

  /**
   * Length in bytes of encoded value
   */
  len: number;

  /**
   * Decode value from buffer at offset
   * @param buf Buffer to read the decoded value from
   * @param off Decode offset
   */
  get(buf: Uint8Array, off: number): T;
}
```
The tokenizer reads token.len bytes from the tokenizer-stream into a buffer.
token.get is then called with that buffer; it is responsible for converting the buffer into the desired output type.
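For example, a hypothetical custom token decoding a 4-byte ASCII identifier (a FourCC, as found in RIFF files); FourCcToken here is illustrative and not part of token-types:

```js
import { fromBuffer } from 'strtok3';

// Hypothetical custom token: reads 4 bytes and decodes them as an ASCII string
const FourCcToken = {
  len: 4, // number of bytes to read from the stream
  get(buf, off) {
    return String.fromCharCode(buf[off], buf[off + 1], buf[off + 2], buf[off + 3]);
  }
};

const tokenizer = fromBuffer(new Uint8Array([0x52, 0x49, 0x46, 0x46])); // "RIFF"
console.log(await tokenizer.readToken(FourCcToken)); // "RIFF"
```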
### Converting a Web-API readable stream to a Node.js readable stream

To convert a Web-API readable stream into a Node.js readable stream, you can use readable-web-to-node-stream.
```js
import { fromStream } from 'strtok3';
import { ReadableWebToNodeStream } from 'readable-web-to-node-stream';

(async () => {
  const response = await fetch(url); // url: the resource to read from
  const readableWebStream = response.body; // Web-API readable stream
  const nodeStream = new ReadableWebToNodeStream(readableWebStream); // Convert to a Node.js readable stream
  const tokenizer = await fromStream(nodeStream); // And we now have a tokenizer for the Node.js stream
})();
```
## Dependencies

- @tokenizer/token: Provides token definitions and utilities used by strtok3 for interpreting binary data.