[Node.js CI](https://github.com/Borewit/tokenizer-s3/actions/workflows/nodejs-ci.yml)
[CodeQL](https://github.com/Borewit/tokenizer-s3/actions/workflows/github-code-scanning/codeql)
[NPM version](https://npmjs.org/package/@tokenizer/s3)
[npm downloads](https://npmcharts.com/compare/@tokenizer/s3,@tokenizer/range?start=300)
[Known Vulnerabilities](https://snyk.io/test/github/Borewit/tokenizer-s3?targetFile=package.json)

# @tokenizer/s3

The @tokenizer/s3 module integrates with Amazon Web Services (AWS) S3, letting you read and tokenize data from S3 objects either as a stream or in chunks with random access. It extends the strtok3 tokenizer with support for S3 data access.

## Features

- **Streaming Support:** Efficiently read and tokenize data from Amazon S3 objects using streaming, which is ideal for handling large files without loading them entirely into memory.
- **Integration with strtok3:** Works with the strtok3 tokenizer to process S3 data, making it easy to handle various tokenization tasks.
- **Flexible Access:** Choose between sequential streaming and chunked random access, and configure S3 access through your own `S3Client`.
- **Promise-Based API:** Uses a promise-based API for easy integration into modern asynchronous workflows.

## Installation

```shell
npm install @tokenizer/s3
```





## Sponsor

If you appreciate my work and want to support the development of open-source projects like music-metadata, file-type, and listFix(), consider becoming a sponsor or making a small contribution.

Your support helps sustain ongoing development and improvements.

Become a sponsor to Borewit










## API Documentation



### `makeChunkedTokenizerFromS3`



Initializes a tokenizer with random-access support from an Amazon S3 client, for use in extracting metadata from media files.



#### Function Signature

```ts
function makeChunkedTokenizerFromS3(s3: S3Client, objRequest: GetObjectRequest, options?: IS3Options): Promise<IRandomAccessTokenizer>
```



Reads the S3 object in chunks rather than as one continuous stream, which is what enables random access.



#### Parameters



- `s3` (`S3Client`):
  The S3 client used to make requests to Amazon S3.
  > [!NOTE]
  > To configure AWS client authentication, see Configuration and credential file settings.

- `objRequest` (`GetObjectRequest`):
  The S3 object request containing details about the S3 object to fetch,
  such as the bucket name and object key.

- `options` (`IS3Options`, optional)
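`objRequest` is a plain `GetObjectRequest` object; for a simple read, `Bucket` and `Key` are enough, as in the examples. If your application stores object locations as `s3://bucket/key` URIs, a small helper can convert them. The `s3UriToObjectRequest()` function below is a hypothetical sketch, not part of @tokenizer/s3; it assumes only the WHATWG `URL` parser built into Node.js.

```javascript
/**
 * Hypothetical helper (not part of @tokenizer/s3): convert an
 * `s3://bucket/key` URI into the { Bucket, Key } shape of a GetObjectRequest.
 * Keys containing `?` or `#` must be percent-encoded in the URI, otherwise
 * the URL parser treats them as query or fragment. Dot segments such as
 * `..` are normalized away by the URL parser.
 */
function s3UriToObjectRequest(uri) {
  const url = new URL(uri);
  if (url.protocol !== 's3:') {
    throw new TypeError(`Expected an s3:// URI, got: ${uri}`);
  }
  // The key is the path without its leading slash, with %-escapes decoded
  const key = decodeURIComponent(url.pathname.slice(1));
  if (!url.hostname || !key) {
    throw new TypeError(`URI must contain both a bucket and a key: ${uri}`);
  }
  return { Bucket: url.hostname, Key: key };
}
```

The result can be passed straight through, for example `makeChunkedTokenizerFromS3(s3, s3UriToObjectRequest('s3://my-bucket/path/to/song.mp3'))`, where the bucket and key are placeholders.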



#### Returns



- `Promise<IRandomAccessTokenizer>`:
  A promise that resolves to an instance of `IRandomAccessTokenizer`.
  This tokenizer can be used to extract metadata from the specified media file in the S3 object.
  It supports random-access reads.
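Random access means any byte range of the object can be fetched on its own. S3 supports this through the `Range` field of a `GetObjectRequest`, which takes an HTTP byte range whose end offset is inclusive. The `makeRangeRequest()` helper below is a hypothetical illustration of how a read of `length` bytes at `offset` maps onto such a request; it is not part of the @tokenizer/s3 API, and the library's internal chunking may differ.

```javascript
/**
 * Hypothetical helper (not part of @tokenizer/s3): build a GetObjectRequest
 * for `length` bytes starting at `offset`. HTTP byte ranges are inclusive
 * on both ends, hence the `- 1`.
 */
function makeRangeRequest(objRequest, offset, length) {
  if (!Number.isInteger(offset) || offset < 0) {
    throw new RangeError('offset must be a non-negative integer');
  }
  if (!Number.isInteger(length) || length <= 0) {
    throw new RangeError('length must be a positive integer');
  }
  // Copy the original request so the caller's object is not mutated
  return { ...objRequest, Range: `bytes=${offset}-${offset + length - 1}` };
}

// Request 512 bytes starting at offset 1024 (bucket and key are placeholders)
const req = makeRangeRequest({ Bucket: 'my-bucket', Key: 'audio.mp3' }, 1024, 512);
console.log(req.Range); // bytes=1024-1535
```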



### `makeStreamingTokenizerFromS3`



Initialize a tokenizer from an Amazon S3 client for use in extracting metadata from media files.



#### Function Signature

```ts
function makeStreamingTokenizerFromS3(s3: S3Client, objRequest: GetObjectRequest): Promise<ITokenizer>
```



Reads the S3 object as a stream.



#### Parameters



- `s3` (`S3Client`):
  The S3 client used to make requests to Amazon S3.
  > [!NOTE]
  > To configure AWS client authentication, see Configuration and credential file settings.

- `objRequest` (`GetObjectRequest`):
  The S3 object request containing details about the S3 object to fetch,
  such as the bucket name and object key.



#### Returns

 

- `Promise<ITokenizer>`:
  A promise that resolves to an instance of `ITokenizer`.
  This tokenizer can be used to extract metadata from the specified media file in the S3 object.
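The returned `ITokenizer` is strtok3's tokenizer interface, so strtok3's read methods can be used on it directly. As a hedged sketch, the hypothetical `startsWithId3()` helper below peeks at the first three bytes of an object to detect an ID3v2 tag (which begins with the ASCII bytes `ID3`) without advancing the read position. It assumes strtok3's `peekBuffer(buffer, options)` method and its `mayBeLess` option; check these against the strtok3 version you have installed.

```javascript
/**
 * Hypothetical helper: check whether a tokenizer starts with an ID3v2 tag.
 * Assumes a strtok3 tokenizer, e.g. one returned by
 * makeStreamingTokenizerFromS3(). peekBuffer() does not advance the read
 * position, so the tokenizer can still be handed to a parser afterwards.
 */
async function startsWithId3(tokenizer) {
  const header = new Uint8Array(3);
  // mayBeLess: resolve with fewer bytes instead of throwing on short objects
  const bytesRead = await tokenizer.peekBuffer(header, { mayBeLess: true });
  return bytesRead === 3
    && header[0] === 0x49 // 'I'
    && header[1] === 0x44 // 'D'
    && header[2] === 0x33; // '3'
}
```

For example: `const isId3 = await startsWithId3(await makeStreamingTokenizerFromS3(s3, objRequest));`.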



## Compatibility



Module: version 0.3.0 migrated from CommonJS to a pure ECMAScript Module (ESM).

The distributed JavaScript codebase is compliant with the ECMAScript 2020 (11th Edition) standard.



This module requires a Node.js ≥ 16 engine.

It can also be used in a browser environment when bundled with a module bundler.



For backward compatibility with TypeScript projects compiled to CommonJS, you can use load-esm.



## Examples



### Determine file type



Determine the file type (based on its content) of a file stored in Amazon S3:

```js
import { fileTypeFromTokenizer } from 'file-type';
import { fromEnv } from '@aws-sdk/credential-providers';
import { S3Client } from '@aws-sdk/client-s3';
import { makeChunkedTokenizerFromS3 } from '@tokenizer/s3';

(async () => {
  // Initialize S3 client
  const s3 = new S3Client({
    region: 'eu-west-2',
    credentials: fromEnv(),
  });

  // Initialize S3 tokenizer
  const s3Tokenizer = await makeChunkedTokenizerFromS3(s3, {
    Bucket: 'affectlab',
    Key: '1min_35sec.mp4'
  });

  // Figure out what kind of file it is
  const fileType = await fileTypeFromTokenizer(s3Tokenizer);
  console.log(fileType);
})();
```





See also the example in the file-type documentation.
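A tokenizer may hold on to resources (for example, an open S3 response stream), and parsers such as file-type often stop reading after the first few bytes. As a sketch, the hypothetical `withTokenizer()` helper below closes the tokenizer in a `finally` block once the work is done, even if it fails. It assumes the tokenizer exposes strtok3's `close()` method.

```javascript
/**
 * Hypothetical helper: run `task` with a tokenizer and always close it
 * afterwards, even if `task` throws. Assumes a strtok3 tokenizer with
 * close(), such as one returned by makeChunkedTokenizerFromS3() or
 * makeStreamingTokenizerFromS3().
 */
async function withTokenizer(tokenizerPromise, task) {
  const tokenizer = await tokenizerPromise;
  try {
    return await task(tokenizer);
  } finally {
    // Release underlying resources, whether task succeeded or failed
    await tokenizer.close();
  }
}
```

Applied to the example above: `const fileType = await withTokenizer(makeChunkedTokenizerFromS3(s3, { Bucket: 'affectlab', Key: '1min_35sec.mp4' }), fileTypeFromTokenizer);`.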



### Retrieve music metadata



Retrieve media metadata using music-metadata:

```js
import { makeChunkedTokenizerFromS3 } from '@tokenizer/s3';
import { S3Client } from '@aws-sdk/client-s3';
import { parseFromTokenizer } from 'music-metadata/lib/core';

/**
 * Retrieve metadata from an Amazon S3 object
 * @param s3 S3 client
 * @param objRequest S3 object request
 * @param options Options passed to music-metadata's parseFromTokenizer
 * @return Metadata
 */
async function parseS3Object(s3, objRequest, options) {
  const s3Tokenizer = await makeChunkedTokenizerFromS3(s3, objRequest);
  return parseFromTokenizer(s3Tokenizer, options);
}

(async () => {
  const s3 = new S3Client({});

  const metadata = await parseS3Object(s3, {
    Bucket: 'standing0media',
    Key: '01 Where The Highway Takes Me.mp3'
  });

  console.log(metadata);
})();
```

A module implementation of this example can be found in @music-metadata/s3.

## Dependency graph

!dependency graph
