HuggingFace tokenizer support for Chonkie - extends @chonkiejs/core with real tokenization
Install with npm:

```bash
npm i @chonkiejs/token @chonkiejs/core
```

Install with pnpm:

```bash
pnpm add @chonkiejs/token @chonkiejs/core
```

Install with yarn:

```bash
yarn add @chonkiejs/token @chonkiejs/core
```

Install with bun:

```bash
bun add @chonkiejs/token @chonkiejs/core
```
Simply install this package alongside @chonkiejs/core, then pass a HuggingFace tokenizer name:

```typescript
import { RecursiveChunker } from '@chonkiejs/core';

// Use GPT-2 tokenization (automatically uses @chonkiejs/token)
const chunker = await RecursiveChunker.create({
  tokenizer: 'Xenova/gpt2',
  chunkSize: 512
});

const chunks = await chunker.chunk('Your text here...');
```
Any HuggingFace model from transformers.js:
- Xenova/gpt2
- Xenova/gpt-4
- bert-base-uncased
- google-bert/bert-base-multilingual-cased
- And many more!
See: https://huggingface.co/models?library=transformers.js
With RecursiveChunker:

```typescript
import { RecursiveChunker } from '@chonkiejs/core';

const chunker = await RecursiveChunker.create({
  tokenizer: 'Xenova/gpt2',
  chunkSize: 512
});

const chunks = await chunker.chunk('Your document...');
```
With TokenChunker, including overlap between chunks:

```typescript
import { TokenChunker } from '@chonkiejs/core';

const chunker = await TokenChunker.create({
  tokenizer: 'bert-base-uncased',
  chunkSize: 256,
  chunkOverlap: 50
});

const chunks = await chunker.chunk('Your text...');
```
Using the tokenizer directly:

```typescript
import { HuggingFaceTokenizer } from '@chonkiejs/token';

const tokenizer = await HuggingFaceTokenizer.create('Xenova/gpt2');

const count = tokenizer.countTokens('Hello world!');
const tokens = tokenizer.encode('Hello world!');
const text = tokenizer.decode(tokens);

console.log(`Token count: ${count}`);
```
When you call `Tokenizer.create('gpt2')` in @chonkiejs/core:

1. Core tries to dynamically import `@chonkiejs/token`
2. If installed: uses `HuggingFaceTokenizer`
3. If not installed: shows a helpful error message
This keeps core lightweight while allowing advanced tokenization when needed!
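The fallback above can be sketched with a dynamic `import()` in a try/catch. This is a simplified illustration, not @chonkiejs/core's actual internals; `loadTokenizer` and `TokenCounter` are hypothetical names introduced here, while `HuggingFaceTokenizer.create` is the real entry point shown earlier:

```typescript
// Sketch of the optional-dependency pattern (illustrative only).
interface TokenCounter {
  countTokens(text: string): number;
}

async function loadTokenizer(model: string): Promise<TokenCounter> {
  try {
    // Dynamic import: resolved only at runtime, so the caller does not
    // hard-depend on @chonkiejs/token. Using a variable specifier keeps
    // the TypeScript compiler from requiring the module at build time.
    const specifier = '@chonkiejs/token';
    const mod = await import(specifier);
    return await mod.HuggingFaceTokenizer.create(model);
  } catch {
    // Package missing: surface an actionable message instead of a raw
    // module-resolution error.
    throw new Error(
      `Tokenizer '${model}' requires @chonkiejs/token. ` +
      `Install it with: npm i @chonkiejs/token`
    );
  }
}
```

Because the import happens lazily inside the function, projects that only use character-based chunking never pay the cost of loading the tokenizer package.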
Want to help grow Chonkie? Check out CONTRIBUTING.md to get started! Whether you're fixing bugs, adding features, improving docs, or simply leaving a ⭐️ on the repo, every contribution helps make Chonkie a better CHONK for everyone.
Remember: No contribution is too small for this tiny hippo!