A TypeScript library for Vietnamese text processing including accent normalization, text masking, and string utilities
- ✅ Vietnamese Accent Normalization: Automatically correct Vietnamese accent placement according to grammar rules
- ✅ Text Masking: Mask sensitive information in strings
- ✅ Text Normalization: Convert text to lowercase, remove special characters
- ✅ TypeScript Support: Full TypeScript definitions included
- ✅ Minimal Dependencies: Only requires slugify for text normalization

```bash
# use npm
npm install normalize-vietnamese

# use yarn
yarn add normalize-vietnamese
```

```typescript
import Str from "normalize-vietnamese";
// or
import { Str } from "normalize-vietnamese";
```

The `normalizeVietnameseAccent` method corrects Vietnamese accent placement according to Vietnamese grammar rules:

```typescript
// Correct diphthongs (2 vowels) - accent on first vowel
Str.normalizeVietnameseAccent("toà"); // returns 'tòa'
Str.normalizeVietnameseAccent("thuỷ"); // returns 'thủy'

// Correct triphthongs (3 vowels) and vowel pairs before a final consonant - accent on second vowel
Str.normalizeVietnameseAccent("tòan"); // returns 'toàn'
Str.normalizeVietnameseAccent("khủyu"); // returns 'khuỷu'

// Exception: ê and ơ have priority regardless of position
Str.normalizeVietnameseAccent("thủơ"); // returns 'thuở'
Str.normalizeVietnameseAccent("chuỵên"); // returns 'chuyện'

// Handle special consonant clusters (gi, qu)
Str.normalizeVietnameseAccent("gìa"); // returns 'già'
Str.normalizeVietnameseAccent("qủa"); // returns 'quả'

// Process multiple words
Str.normalizeVietnameseAccent("tòa nhà toàn"); // returns 'tòa nhà toàn'
```

#### Vietnamese Accent Rules
1. Single vowel: Accent stays on the vowel
2. Two vowels (diphthong): Accent on first vowel, as in `tòa`
3. Three vowels (triphthong), or two vowels followed by a final consonant: Accent on second vowel, as in `khuỷu` and `toàn`
4. Exception: ê and ơ have priority regardless of position
5. Special consonants: `gi` and `qu` are treated as single consonants
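
The placement rules above can be sketched as a small function that picks which vowel in a syllable should carry the tone mark. This is a minimal illustration, not the library's implementation: `splitSyllable`, `toneIndex`, and `toneLetter` are hypothetical helpers, the input is assumed to be a single lowercase syllable with tone marks already removed, and the final-consonant case for two-vowel clusters is inferred from the `tòan` to `toàn` example.

```typescript
// Hypothetical helpers for illustration; not part of normalize-vietnamese.
// Input: one lowercase syllable without tone marks (e.g. "toan", "khuyu").

// Base vowels a ă â e ê i o ô ơ u ư y, written with escapes so the
// source file's encoding cannot alter them.
const VOWELS = new Set(Array.from("a\u0103\u00e2e\u00eaio\u00f4\u01a1u\u01b0y"));
const PRIORITY = new Set(["\u00ea", "\u01a1"]); // ê and ơ (rule 4)

interface Syllable {
  onset: string;
  vowels: string;
  coda: string;
}

function splitSyllable(syllable: string): Syllable {
  const chars = Array.from(syllable.normalize("NFC"));
  let start = chars.findIndex((c) => VOWELS.has(c));
  if (start === -1) return { onset: syllable, vowels: "", coda: "" };

  // Rule 5: "gi" and "qu" act as consonants when another vowel follows.
  const head = chars.slice(0, 2).join("");
  if ((head === "gi" || head === "qu") && VOWELS.has(chars[2] ?? "")) start = 2;

  let end = start;
  while (end < chars.length && VOWELS.has(chars[end] ?? "")) end++;

  return {
    onset: chars.slice(0, start).join(""),
    vowels: chars.slice(start, end).join(""),
    coda: chars.slice(end).join(""),
  };
}

// Index (within the vowel cluster) of the vowel that carries the tone mark.
function toneIndex(vowels: string, hasCoda: boolean): number {
  const v = Array.from(vowels);
  const priority = v.findIndex((c) => PRIORITY.has(c));
  if (priority !== -1) return priority; // rule 4
  if (v.length <= 1) return 0; // rule 1
  if (v.length === 2) return hasCoda ? 1 : 0; // rule 2, coda case is an assumption
  return 1; // rule 3
}

// Convenience wrapper: the letter that should receive the tone mark.
function toneLetter(syllable: string): string {
  const { vowels, coda } = splitSyllable(syllable);
  return Array.from(vowels)[toneIndex(vowels, coda.length > 0)] ?? "";
}

console.log(toneLetter("toa")); // "o" (tòa)
console.log(toneLetter("toan")); // "a" (toàn)
console.log(toneLetter("khuyu")); // "y" (khuỷu)
console.log(toneLetter("qua")); // "a" (quả)
```

The sketch only decides where the mark belongs; moving an existing mark onto that letter would additionally require splitting the tone mark from its base vowel, for example through Unicode decomposition.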

The `mask` method masks part of a string with asterisks:

```typescript
// Mask entire string
Str.mask("hello"); // returns '*'

// Mask with start and end positions
Str.mask("hello", 1, 4); // returns 'h*o'

// Negative positions (from end)
Str.mask("hello", -2, 4); // returns 'hel*o'
Str.mask("hello", 1, -1); // returns 'h*o'
```

The `normalize` method converts text to lowercase and removes special characters:

```typescript
// Convert to lowercase, remove special characters
Str.normalize("Hello World!"); // returns 'hello world'

// Handles Vietnamese characters
Str.normalize("Xin chào thế giới!"); // returns 'xin chao the gioi'
```

`Str.normalizeVietnameseAccent(text)`: Normalizes Vietnamese accent marks according to Vietnamese grammar rules.
- Parameters: `text` - The text to normalize
- Returns: The normalized text with proper accent placement
- Throws: Nothing; returns the original input if it is not a string

`Str.mask(text, start, end)`: Masks part of a string with asterisks.
- Parameters:
  - `text` - The text to mask
  - `start` - Start position (default: 0, supports negative values)
  - `end` - End position (default: 0, supports negative values)
- Returns: The masked string
- Throws: Nothing; returns the original text for invalid parameters
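
How negative positions map onto the string can be illustrated with a small sketch. It assumes, based on the examples in this README (`-2` on `"hello"` behaves like `3`, and `-1` behaves like `4`), that a negative value counts back from the end of the string. `resolvePosition` is a hypothetical helper, not part of the library's API, and clamping out-of-range values is an assumption of this sketch.

```typescript
// Hypothetical helper for illustration; not part of normalize-vietnamese.
function resolvePosition(position: number, length: number): number {
  // A negative position counts back from the end of the string.
  const resolved = position < 0 ? length + position : position;
  // Clamp into [0, length] (an assumption of this sketch).
  return Math.min(Math.max(resolved, 0), length);
}

console.log(resolvePosition(-2, "hello".length)); // 3
console.log(resolvePosition(-1, "hello".length)); // 4
console.log(resolvePosition(1, "hello".length)); // 1
```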

`Str.normalize(text)`: Normalizes text by converting to lowercase and removing special characters.
- Parameters: `text` - The text to normalize
- Returns: The normalized text
- Throws: Nothing; returns the original input if it is not a string
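
For readers curious how this kind of normalization can work, here is a minimal sketch that uses Unicode decomposition instead of slugify (which the library itself relies on). `normalizeSketch` is a hypothetical function; keeping only ASCII letters, digits, and spaces, and collapsing whitespace, are assumptions chosen to reproduce the two examples in this README. The library's actual output for other inputs may differ.

```typescript
// Hypothetical stand-in for illustration; the library itself uses slugify.
function normalizeSketch(text: string): string {
  return text
    .normalize("NFD") // split base letters from combining marks
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .replace(/\u0111/g, "d") // đ has no decomposition, map it by hand
    .replace(/\u0110/g, "D") // Đ likewise
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "") // remove special characters
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
}

console.log(normalizeSketch("Hello World!")); // "hello world"
console.log(normalizeSketch("Xin chào thế giới!")); // "xin chao the gioi"
```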

```bash
git clone https://github.com/nvminh461/normalize-vietnamese
cd normalize-vietnamese
npm install
```

```bash
# Build the library
npm run build
```

The library includes comprehensive tests covering all functionality:

```bash
npm test
```

## Requirements
- Node.js >= 14.0.0
- TypeScript >= 4.0.0 (for development)

## License
MIT

## Contributing
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)

## Changelog
- Initial release
- Vietnamese accent normalization
- Text masking functionality
- Text normalization utilities
- Full TypeScript support