Lightweight text preprocessing utilities for NLP in TypeScript.
text-prep-lite provides two core helpers:
- normalizeText – clean and normalise raw text into a predictable representation.
- tokenize – break text into lowercase word tokens.

It handles these common preprocessing steps with zero runtime dependencies.
```bash
npm install text-prep-lite
# or
yarn add text-prep-lite
```
---
Usage
```ts
import { normalizeText, tokenize } from "text-prep-lite";

const raw = " I can't believe it's not butter! 🧈 ";

const cleaned = normalizeText(raw, {
  expandContractions: true,
  removePunctuation: true,
  removeEmojis: true,
});
// → "i cannot believe it is not butter"

const tokens = tokenize(raw);
// → ["i", "can", "t", "believe", "it", "s", "not", "butter"]
```
---
API
normalizeText
Returns a cleaned version of input.
NormalizeOptions:
| Option | Default | Description |
|--------|---------|-------------|
| expandContractions | false | Expand contractions for the selected locale. |
| removePunctuation | false | Strip punctuation characters. |
| removeEmojis | false | Remove Unicode emoji characters. |
| locale | 'en' | BCP-47 language tag for locale-specific rules (see Supported locales below). |
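
Read as a TypeScript shape, the option table might look like the sketch below. The field names and defaults are taken from the table; `DEFAULTS` and `withDefaults` are hypothetical names used only for illustration and are not part of the documented API.

```ts
// Illustrative only: field names and defaults mirror the option table.
interface NormalizeOptions {
  expandContractions?: boolean;
  removePunctuation?: boolean;
  removeEmojis?: boolean;
  locale?: string; // BCP-47 tag such as "en" or "fr"
}

// Hypothetical constant holding the documented defaults.
const DEFAULTS: Required<NormalizeOptions> = {
  expandContractions: false,
  removePunctuation: false,
  removeEmojis: false,
  locale: "en",
};

// Hypothetical helper: fill in any option the caller left out.
function withDefaults(options: NormalizeOptions = {}): Required<NormalizeOptions> {
  return { ...DEFAULTS, ...options };
}

const resolved = withDefaults({ locale: "fr" });
// resolved.locale is "fr"; the three boolean flags stay false
```

Because every flag defaults to false, a call with no options would be expected to leave contractions, punctuation and emojis untouched.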
Supported locales
- en – English (default)
- sq – Albanian
- fr – French
- de – German
- he – Hebrew
- es – Spanish
- zh – Chinese (Mandarin)
- yue – Chinese (Cantonese)
```ts
// French example
normalizeText("C'est incroyable!", { expandContractions: true, locale: "fr" });
// → "ce est incroyable!" (punctuation kept in this call)
```
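
One way to picture locale-aware contraction expansion is a per-locale lookup table applied after lowercasing. The sketch below is an assumption about the approach, not the library's actual implementation: `CONTRACTIONS` and `expandContractionsSketch` are hypothetical names, and the tables hold only the entries needed to reproduce the examples in this README.

```ts
// Hypothetical lookup tables, keyed by locale and then by lowercase contraction.
const CONTRACTIONS: Record<string, Record<string, string>> = {
  en: { "can't": "cannot", "it's": "it is" },
  fr: { "c'est": "ce est" },
};

// Hypothetical helper: lowercase the text, then expand known contractions.
function expandContractionsSketch(text: string, locale: string = "en"): string {
  const table = CONTRACTIONS[locale] || {};
  let result = text.toLowerCase();
  for (const contraction of Object.keys(table)) {
    // split/join replaces every occurrence without needing regex escaping.
    result = result.split(contraction).join(table[contraction]);
  }
  return result;
}

expandContractionsSketch("C'est incroyable!", "fr");
// → "ce est incroyable!"
```

A plain substring replacement like this would also match inside longer words, so a fuller version would likely need word-boundary checks and handling for the typographic apostrophe (’).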
---
tokenize
1. Converts text to lowercase.
2. Removes punctuation & emojis.
3. Splits by whitespace / word boundaries.
Returns an array of tokens.
tokenize has no options – it always lowercases, strips punctuation & emojis, and splits on whitespace.
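
The three steps above can be sketched in a few lines. This is a minimal approximation under stated assumptions, not the library's source: `tokenizeSketch` is a hypothetical name, and treating "punctuation & emojis" as "anything that is not a Unicode letter, digit or whitespace" is a simplification chosen for the example.

```ts
// Matches runs of characters that are not letters, numbers or whitespace.
// Built with the RegExp constructor so the "u" flag is not checked against
// the TypeScript compile target; it needs a runtime with Unicode property
// escapes (ES2018+).
const NON_WORD_RUN = new RegExp("[^\\p{L}\\p{N}\\s]+", "gu");

// Hypothetical helper mirroring the documented steps.
function tokenizeSketch(text: string): string[] {
  const stripped = text
    .toLowerCase()               // 1. convert to lowercase
    .replace(NON_WORD_RUN, " ")  // 2. drop punctuation and emojis
    .trim();
  if (stripped.length === 0) {
    return [];
  }
  return stripped.split(/\s+/);  // 3. split on whitespace
}

tokenizeSketch(" I can't believe it's not butter! 🧈 ");
// → ["i", "can", "t", "believe", "it", "s", "not", "butter"]
```

Replacing punctuation with a space rather than deleting it is what turns "can't" into the two tokens "can" and "t", which matches the output shown under Usage.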
---
🔗 Related
1. 👉 Need word embeddings for semantic analysis?
Check out wink-embeddings-small-en-50d
2. 👉 Need a simple and robust PDF text extraction utility with a quality interface?
Check out [pdf-worker-package](https://www.npmjs.com/package/pdf-worker-package)
---
Development
```bash
# run tests
npm test
# build library
npm run build
```
---
License
MIT © Cavani21/thegreatbey
---
Contributing
1. Fork & clone the repo
2. npm i
3. npm test – run lint & unit tests