Normalize Amharic text by mapping phonetically equivalent characters (homophones) to a canonical form. Useful for search, NLP, games, and text processing.

Amharic script has multiple characters that represent the same sound. For example:

| Sound | Characters | Normalized |
|-------|------------|------------|
| /h/ | ሀ ሐ ኀ | ሀ |
| /s/ | ሰ ሠ | ሰ |
| /ʔ/ | አ ዐ | አ |
| /ts'/ | ጸ ፀ | ጸ |

This causes problems when:

- Searching text - "ሰላም" won't match "ሠላም"
- Comparing strings - Two identical-sounding words appear different
- NLP preprocessing - Models treat homophones as different tokens
- Building games/quizzes - User input might use different character variants

This package solves these problems by normalizing all homophones to a single canonical form.
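As a rough illustration of the search case, the sketch below normalizes both the query and the text before comparing them. It does not call this package: `TOY_MAP`, `toyNormalize`, and `containsNormalized` are hypothetical stand-ins that cover only the base characters from the table above, so the snippet runs on its own.

```typescript
// Hypothetical stand-in map: only the base-form variants from the table above.
const TOY_MAP: Record<string, string> = {
  "ሐ": "ሀ",
  "ኀ": "ሀ",
  "ሠ": "ሰ",
  "ዐ": "አ",
  "ፀ": "ጸ",
};

// Hypothetical stand-in for a normalizer: replaces each mapped character in
// the Ethiopic block and leaves everything else untouched.
function toyNormalize(text: string): string {
  return text.replace(/[\u1200-\u137F]/g, (ch) => TOY_MAP[ch] ?? ch);
}

// Homophone-insensitive substring search: normalize both sides, then compare.
function containsNormalized(haystack: string, needle: string): boolean {
  return toyNormalize(haystack).includes(toyNormalize(needle));
}

console.log("Hello ሠላም!".includes("ሰላም")); // false: the raw comparison misses the variant
console.log(containsNormalized("Hello ሠላም!", "ሰላም")); // true
```

Normalizing both sides keeps the comparison symmetric, so it does not matter which variant the user or the stored text happens to use.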
```bash
npm install amharic-normalizer
```

```typescript
import { normalizeAmharic } from "amharic-normalizer";

// Normalize text
normalizeAmharic("ሠላም"); // → "ሰላም"
normalizeAmharic("ሐበሻ"); // → "ሀበሻ"
normalizeAmharic("ዐማርኛ"); // → "አማርኛ"

// Mixed text works too
normalizeAmharic("Hello ሠላም!"); // → "Hello ሰላም!"

// Compare normalized strings
const input1 = normalizeAmharic("ሰላም");
const input2 = normalizeAmharic("ሠላም");
console.log(input1 === input2); // true
```

```typescript
import { AMHARIC_NORMALIZATION_MAP } from "amharic-normalizer";

// Check if a character has a normalized form
console.log(AMHARIC_NORMALIZATION_MAP["ሠ"]); // "ሰ"
console.log(AMHARIC_NORMALIZATION_MAP["ሰ"]); // "ሰ"
```
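A character-to-canonical map can also be used to report which characters in a string are non-canonical, for instance to give feedback on user input. This is a minimal sketch, assuming a map with the same shape as `AMHARIC_NORMALIZATION_MAP`; `SAMPLE_MAP`, `VariantHit`, and `findVariants` are hypothetical names, and the sample map holds only a few entries so the snippet is self-contained.

```typescript
// Hypothetical sample with the same shape as the exported map (a few entries only).
const SAMPLE_MAP: Record<string, string> = {
  "ሠ": "ሰ",
  "ሰ": "ሰ",
  "ሐ": "ሀ",
  "ሀ": "ሀ",
};

interface VariantHit {
  index: number; // position of the character, counted in code points
  char: string; // the character as written
  canonical: string; // the form it would be normalized to
}

// Lists every character whose mapped form differs from the character itself.
function findVariants(text: string, map: Record<string, string>): VariantHit[] {
  const hits: VariantHit[] = [];
  Array.from(text).forEach((char, index) => {
    const canonical = map[char];
    if (canonical !== undefined && canonical !== char) {
      hits.push({ index, char, canonical });
    }
  });
  return hits;
}

console.log(findVariants("ሠላም", SAMPLE_MAP)); // one hit: index 0, "ሠ" maps to "ሰ"
console.log(findVariants("ሰላም", SAMPLE_MAP).length); // 0
```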
| Family | Variants | Canonical | Description |
|--------|----------|-----------|-------------|
| H | ሀ ሐ ኀ (+ all vowel forms) | ሀ series | Glottal fricative /h/ |
| S | ሰ ሠ (+ all vowel forms) | ሰ series | Voiceless alveolar fricative /s/ |
| A | አ ዐ (+ all vowel forms) | አ series | Glottal stop /ʔ/ |
| TS | ጸ ፀ (+ all vowel forms) | ጸ series | Ejective alveolar affricate /ts'/ |

Each family includes all 7 vowel forms (ə, u, i, a, e, ɨ, o).

`normalizeAmharic(text)` normalizes Amharic text by replacing phonetically equivalent characters with their canonical form.

Parameters:

- `text` - The Amharic text to normalize

Returns:

- The normalized text

`AMHARIC_NORMALIZATION_MAP` is a map of Amharic characters to their canonical forms. Characters not in the map are left unchanged.
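One way a map like this could be generated, rather than written out by hand, is from Unicode code points: in the Ethiopic block each of these consonants occupies a run of consecutive code points, with the 7 vowel forms at offsets 0 to 6. The sketch below is an assumption about how such a table can be built, not this package's actual source; `FAMILIES`, `VOWEL_FORMS`, and `buildHomophoneMap` are hypothetical names. Canonical characters are mapped to themselves to match the usage example, where looking up "ሰ" returns "ሰ".

```typescript
// [variant base character, canonical base character] pairs from the families table.
const FAMILIES: Array<[string, string]> = [
  ["ሐ", "ሀ"],
  ["ኀ", "ሀ"],
  ["ሠ", "ሰ"],
  ["ዐ", "አ"],
  ["ፀ", "ጸ"],
];

// The 7 vowel forms (ə, u, i, a, e, ɨ, o) sit at consecutive code points.
const VOWEL_FORMS = 7;

function buildHomophoneMap(families: Array<[string, string]>): Record<string, string> {
  const map: Record<string, string> = {};
  for (const [variant, canonical] of families) {
    const variantBase = variant.charCodeAt(0);
    const canonicalBase = canonical.charCodeAt(0);
    for (let offset = 0; offset < VOWEL_FORMS; offset++) {
      const target = String.fromCharCode(canonicalBase + offset);
      map[String.fromCharCode(variantBase + offset)] = target;
      map[target] = target; // canonical forms map to themselves
    }
  }
  return map;
}

const homophoneMap = buildHomophoneMap(FAMILIES);
console.log(homophoneMap["ሠ"]); // "ሰ"
console.log(Object.keys(homophoneMap).length); // 63 (35 variant forms + 28 canonical forms)
```

Generating the entries from offsets keeps the five variant series in step with their canonical series and avoids typing 63 visually similar characters by hand.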
License: MIT

Author: Eyasu Lingerih