Advanced profanity filter for Thai and English text with multiple detection methods
A comprehensive, modern profanity filter for Thai and English with advanced fuzzy matching, karaoke transliteration support, and multiple detection methods.
- Language-Aware Detection: Uses Google's CLD (Compact Language Detector) to detect language first
- Exact Matching: Direct word matching with case-insensitive options
- Fuzzy Matching: Levenshtein distance and string similarity algorithms
- Thai-Specific Support: Handles tone marks, vowel variations, and Thai script complexities
- Smart Karaoke Detection: Context-aware detection of Thai profanity in English characters
- Leetspeak Detection: Identifies common character substitutions (f*ck, sh1t, etc.)
- Variation Detection: Catches intentional misspellings and character repetitions
- Multi-language: Simultaneous Thai and English profanity detection

Configuration options:

- Language selection (Thai, English, or both)
- Adjustable similarity thresholds
- Custom bad word dictionaries
- Whitelist support
- Configurable replacement characters
- Performance optimization settings

```bash
npm install bad-words-thai
```

To build from source:

```bash
git clone https://github.com/obbaeiei/bad-words-thai.git
cd bad-words-thai
npm install
npm run build
```

```typescript
import { ProfanityFilter } from 'bad-words-thai';

const filter = new ProfanityFilter();

// Basic usage - now with language detection and default ignore list!
const result = filter.check('āđāļŦāļĩāđāļĒāđāļāđ this is fucking bad');
console.log(result.isClean); // false
console.log(result.detectedWords); // Array of detected profanity
console.log(result.cleanedText); // Censored version

// No more false positives - these work by default:
console.log(filter.check('i love you').isClean); // true
console.log(filter.check('หีบ').isClean); // true (หีบ = chest/box)
console.log(filter.check('āļāļąāļāļāļ').isClean); // true (compound context)
console.log(filter.check('สัสดี').isClean); // true (typo of สวัสดี)
```

```typescript
const filter = new ProfanityFilter({
  languages: ['thai', 'english'], // Languages to check
  detectKaraoke: true, // Enable karaoke detection
  levenshteinThreshold: 0.8, // Fuzzy matching sensitivity
  similarityThreshold: 0.7, // String similarity threshold
  customBadWords: ['mybadword'], // Add custom words
  whitelistWords: ['whitelist'], // Words to ignore completely
  ignoreList: ['หีบ', 'สัสดี', 'āļāļ'], // Default: ["หีบ", "สัสดี", "หน้าหีบ", "āļāļ"]
  replaceChar: '*', // Censorship character
  checkVariations: true, // Check word variations
  checkLeetspeak: true, // Check l33tspeak
  checkRepeatingChars: true, // Check repeated chars
  maxRepeatingChars: 2, // Max allowed repetitions
  caseInsensitive: true // Case sensitivity
});
```

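
How `levenshteinThreshold` is applied internally is not documented here. One common interpretation, assumed in this sketch, is a normalized similarity `1 - distance / maxLength` compared against the threshold. The `levenshtein`, `similarity`, and `isFuzzyMatch` helpers are illustrative names, not part of the package API, and the plain dynamic-programming distance stands in for fastest-levenshtein.

```typescript
// Plain dynamic-programming Levenshtein distance using a single row.
// Stand-in for fastest-levenshtein so the sketch has no dependencies.
function levenshtein(a: string, b: string): number {
  const row: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // dp[i-1][j-1]
    row[0] = i;            // dp[i][0]
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // dp[i-1][j]
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// Assumed normalization: 1 means identical, 0 means nothing in common.
function similarity(a: string, b: string): number {
  const maxLength = Math.max(a.length, b.length);
  return maxLength === 0 ? 1 : 1 - levenshtein(a, b) / maxLength;
}

// Hypothetical threshold check; the package may combine several measures.
function isFuzzyMatch(candidate: string, badWord: string, threshold: number): boolean {
  return similarity(candidate, badWord) >= threshold;
}

console.log(similarity('fack', 'fuck')); // 0.75
```

Under this assumption `'fack'` versus `'fuck'` scores 0.75, so it would pass a threshold of 0.7 but not 0.8. The package's actual results can differ, since it also uses a separate `similarityThreshold`.
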
```typescript
// Exact Thai profanity
filter.check('āđāļŦāļĩāđāļĒāđāļāđ'); // Detected

// With tone mark variations
filter.check('āđāļŦāļĩāđāļĒ'); // Detected as variation of เหี้ย

// Thai variations
filter.check('สาส'); // Detected as variation of สัส
```

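
Tone mark variation handling can be pictured as comparing words after removing the four Thai tone marks (U+0E48 to U+0E4B). This is a minimal sketch of that idea, not the package's implementation; `stripToneMarks` and `sameIgnoringTone` are hypothetical helper names.

```typescript
// Thai tone marks: mai ek, mai tho, mai tri, mai chattawa (U+0E48 to U+0E4B).
const THAI_TONE_MARKS = /[\u0E48-\u0E4B]/g;

// Remove tone marks so that spellings differing only in tone compare equal.
function stripToneMarks(text: string): string {
  return text.normalize('NFC').replace(THAI_TONE_MARKS, '');
}

// True when two words are identical once tone marks are ignored.
function sameIgnoringTone(a: string, b: string): boolean {
  return stripToneMarks(a) === stripToneMarks(b);
}

console.log(sameIgnoringTone('เหี่ย', 'เหี้ย')); // true (mai ek vs mai tho)
console.log(sameIgnoringTone('สาส', 'สัส'));   // false (vowels differ, not tone marks)
```

Vowel substitutions such as สาส for สัส are not tone mark differences, so they would need a separate variation table or fuzzy comparison.
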
```typescript
// Thai profanity in English characters
filter.check('You are such a hia'); // Detects 'hia' as เหี้ย
filter.check('kuay mueng sus'); // Detects multiple Thai words
```

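
One simple way to avoid matching romanized Thai inside longer English words is to compare whole alphabetic tokens against a lookup table. The sketch below assumes that approach. Its table contains only the `hia` mapping shown above, `findKaraoke` is a hypothetical helper, and the package's context-aware logic is likely more involved.

```typescript
interface KaraokeHit {
  token: string;    // Romanized word as it appears in the text
  thai: string;     // Thai word it maps to
  position: number; // Zero-based index of the token in the text
}

// Illustrative table; only the mapping documented above is included.
const KARAOKE_MAP = new Map<string, string>([['hia', 'เหี้ย']]);

// Match whole Latin-letter tokens only, so 'hia' inside 'Chiang' is ignored.
function findKaraoke(text: string): KaraokeHit[] {
  const hits: KaraokeHit[] = [];
  const tokenPattern = /[A-Za-z]+/g;
  let match: RegExpExecArray | null;
  while ((match = tokenPattern.exec(text)) !== null) {
    const thai = KARAOKE_MAP.get(match[0].toLowerCase());
    if (thai !== undefined) {
      hits.push({ token: match[0], thai, position: match.index });
    }
  }
  return hits;
}

console.log(findKaraoke('You are such a hia').length); // 1
console.log(findKaraoke('Chiang Mai').length);         // 0
```
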
```typescript
// Leetspeak
filter.check('fvck this sh1t'); // Detects f*ck and sh*t

// Similar words
filter.check('fack'); // Detected as similar to 'fuck'

// Repeating characters
filter.check('fuuuuck'); // Normalized and detected
```

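
Leetspeak and repeated-character handling are usually done as normalization passes before dictionary lookup. The following sketch shows one way to do that. The substitution table and the `decodeLeet` and `collapseRepeats` helpers are assumptions for illustration, and the package's real table and rules are not documented here.

```typescript
// Hypothetical substitution table covering a few common replacements.
const LEET_MAP: Record<string, string> = {
  '0': 'o', '1': 'i', '3': 'e', '4': 'a', '5': 's', '7': 't', '@': 'a', '$': 's',
};

// Lowercase the word and map each substituted character back to a letter.
function decodeLeet(word: string): string {
  return Array.from(word.toLowerCase(), (ch) => LEET_MAP[ch] ?? ch).join('');
}

// Keep at most `maxRepeat` consecutive copies of the same character.
function collapseRepeats(word: string, maxRepeat = 2): string {
  const pattern = new RegExp(`(.)\\1{${maxRepeat},}`, 'gu');
  return word.replace(pattern, (_match: string, ch: string) => ch.repeat(maxRepeat));
}

console.log(decodeLeet('sh1t'));            // "shit"
console.log(collapseRepeats('fuuuuck'));    // "fuuck"
console.log(collapseRepeats('fuuuuck', 1)); // "fuck"
```

With a limit of 2, `'fuuuuck'` becomes `'fuuck'`, which is still one edit away from the dictionary word, so a fuzzy comparison would be needed to finish the match. Inputs like `'fvck'` are untouched by this table and also rely on fuzzy matching.
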
```typescript
const result = filter.check('Hello เหี้ย world, fucking สัส!');
console.log(result.detectedWords);
// Shows Thai and English detections with language tags
```

```typescript
// NO configuration needed! Ignore list works by default
const filter = new ProfanityFilter();

// These are clean by default (built-in ignore list):
filter.check('หีบ'); // Clean (หีบ = chest/box)
filter.check('หน้าหีบ'); // Clean (หน้าหีบ = front of chest)
filter.check('สัสดี'); // Clean (สัสดี = typo of สวัสดี)
filter.check('āļāļąāļāļāļ'); // Clean (āļāļ in compound context)

// But standalone bad words are still detected:
filter.check('āđāļāļ'); // Bad (standalone profanity)
filter.check('สัส'); // Bad (not in default ignore list)

// You can customize the ignore list if needed:
const customFilter = new ProfanityFilter({
  ignoreList: ['หีบ', 'สัสดี', 'myword'] // Override defaults
});
```

```typescript
interface FilterResult {
  isClean: boolean; // True if no profanity detected
  detectedWords: DetectedWord[]; // Array of detected profanity
  cleanedText?: string; // Censored version (if enabled)
  severity: 'none' | 'mild' | 'moderate' | 'severe';
  confidence: number; // Overall detection confidence
}
```

```typescript
interface DetectedWord {
  word: string; // Original bad word from dictionary
  originalWord: string; // Word found in text
  position: number; // Position in text
  length: number; // Length of detected word
  method: DetectionMethod; // How it was detected
  confidence: number; // Detection confidence (0-1)
  language: 'thai' | 'english' | 'karaoke';
}
```

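
The `position` and `length` fields are enough to build a censored string by hand. This sketch assumes `position` is a zero-based index in UTF-16 code units, which the interface does not state. `Span` and `censor` are hypothetical names, and the package already provides `cleanedText` for the common case.

```typescript
// Minimal shape needed for masking; DetectedWord objects would satisfy it.
interface Span {
  position: number;
  length: number;
}

// Replace every character covered by a span with `replaceChar`.
function censor(text: string, spans: Span[], replaceChar = '*'): string {
  const chars = text.split('');
  for (const { position, length } of spans) {
    const start = Math.max(position, 0);
    const end = Math.min(position + length, chars.length);
    for (let i = start; i < end; i++) {
      chars[i] = replaceChar;
    }
  }
  return chars.join('');
}

console.log(censor('this is fucking bad', [{ position: 8, length: 7 }]));
// "this is ******* bad"
```
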
```typescript
// Check text for profanity
filter.check(text: string): FilterResult

// Dynamic word management
filter.addCustomWord(word: string): void
filter.removeCustomWord(word: string): void
filter.addWhitelistWord(word: string): void
filter.removeWhitelistWord(word: string): void
filter.addIgnoreWord(word: string): void
filter.removeIgnoreWord(word: string): void

// Update configuration
filter.updateOptions(options: Partial
```

- Optimized Algorithms: Uses fastest-levenshtein for distance calculations
- Smart Caching: Variation maps generated once and reused
- Parallel Detection: Multiple detection methods run concurrently
- Deduplication: Removes overlapping detections automatically
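
When several detection methods flag the same stretch of text, the overlapping results have to be reduced to one. The sketch below shows one plausible policy: keep the highest-confidence detection and drop anything that overlaps it. Both the policy and the `dedupeOverlaps` helper are assumptions, since the package's own rule is not described here.

```typescript
// Minimal shape needed for deduplication; DetectedWord objects would satisfy it.
interface Detection {
  position: number;
  length: number;
  confidence: number;
}

function dedupeOverlaps<T extends Detection>(detections: T[]): T[] {
  // Highest confidence first; on ties, prefer the longer span.
  const ranked = [...detections].sort(
    (a, b) => b.confidence - a.confidence || b.length - a.length,
  );
  const kept: T[] = [];
  for (const d of ranked) {
    // Two half-open ranges overlap when each starts before the other ends.
    const overlaps = kept.some(
      (k) => d.position < k.position + k.length && k.position < d.position + d.length,
    );
    if (!overlaps) kept.push(d);
  }
  // Return the survivors in text order.
  return kept.sort((a, b) => a.position - b.position);
}

const unique = dedupeOverlaps([
  { position: 0, length: 4, confidence: 0.75 }, // fuzzy hit
  { position: 0, length: 4, confidence: 1 },    // exact hit on the same span
  { position: 10, length: 3, confidence: 0.9 },
]);
console.log(unique.length); // 2
```
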

Run the test suite:

```bash
npm test
```

The suite covers:

- Basic functionality
- Thai-specific features
- Karaoke detection
- Fuzzy matching algorithms
- Configuration options
- Edge cases and performance

See examples/basic-usage.ts for comprehensive usage examples:

```bash
npm run dev
```

MIT License - see LICENSE file for details.
1. Fork the repository
2. Create your feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request
- Built with modern TypeScript and Jest
- Google CLD2 for language detection (98.82% accuracy)
- Uses fastest-levenshtein for optimal performance
- Incorporates string-similarity for advanced matching
- Informed by research into profanity filtering techniques from 2024-2025