High-performance extensible transliteration library with hub-and-spoke architecture
A transliteration library for Sanskrit and Indic scripts using a schema-driven architecture. Built with compile-time optimization and runtime schema loading.
Setup command:

```bash
./scripts/quick-start.sh
```

This sets up everything (Rust environment, Python bindings, and WASM support) and runs all tests.
For detailed setup instructions, see DEVELOPER_SETUP.md.
Documentation: See DOCUMENTATION_INDEX.md for guides and references.
---
- Schema-generated converters with compile-time optimization
- Zero runtime overhead from code generation
- Token-based conversion system for memory efficiency
Converters are generated at compile-time from declarative schemas:
```yaml
# schemas/slp1.yaml - Generates optimized SLP1 converter
metadata:
  name: "slp1"
  script_type: "roman"
  description: "Sanskrit Library Phonetic Basic"

target: "iso15919"

mappings:
  vowels:
    "A": "ā"
    "I": "ī"
    "U": "ū"
    # ... more mappings
```
```yaml
# schemas/bengali.yaml - Generates optimized Bengali converter
metadata:
  name: "bengali"
  script_type: "brahmic"
  description: "Bengali/Bangla script"

mappings:
  vowels:
    "অ": "अ"  # Bengali A → Devanagari A
    "আ": "आ"  # Bengali AA → Devanagari AA
    # ... more mappings
```
The build system automatically generates highly optimized converters:
```bash
# Build output showing schema processing
warning: Processing YAML schemas...
warning: Generating optimized converters with Handlebars templates...
warning: Created 18 schema-generated converters with O(1) lookups
```
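As a rough illustration of what a schema-generated converter boils down to, the sketch below flattens a schema-shaped dictionary into a single hash map and applies it with greedy longest-match tokenization. It is written in Python for brevity; `build_table` and `convert` are hypothetical names rather than Shlesha's API, and the real generated converters are Rust code.

```python
# Hypothetical sketch: not Shlesha's generated code.
SLP1_SCHEMA = {
    "metadata": {"name": "slp1", "script_type": "roman"},
    "target": "iso15919",
    "mappings": {
        "vowels": {"A": "ā", "I": "ī", "U": "ū"},
    },
}


def build_table(schema):
    """Flatten every mapping category into one dict (average O(1) lookups)."""
    table = {}
    for category in schema["mappings"].values():
        table.update(category)
    return table


def convert(text, table):
    """Greedy longest-match conversion; unmapped characters pass through."""
    max_len = max((len(key) for key in table), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate token first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            out.append(text[i])  # no mapping: keep the character as-is
            i += 1
    return "".join(out)


table = build_table(SLP1_SCHEMA)
print(convert("rAma", table))  # rāma
```

Longest-match ordering matters once a schema contains multi-character tokens (for example a digraph and its first letter), which is why the loop tries longer chunks before shorter ones.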
- Devanagari Hub: Central format for Indic scripts (तमिल → देवनागरी → गुजराती)
- ISO-15919 Hub: Central format for romanization schemes (ITRANS → ISO → IAST)
- Cross-Hub Conversion: Seamless Indic ↔ Roman via both hubs
- Direct Conversion: Bypass hubs when possible for maximum performance
The system determines the conversion path:
```rust
// Direct passthrough - zero conversion cost
transliterator.transliterate("धर्म", "devanagari", "devanagari")?; // instant

// Single hub - one conversion
transliterator.transliterate("धर्म", "devanagari", "iso")?; // deva→iso

// Cross-hub - optimized path
transliterator.transliterate("dharma", "itrans", "bengali")?; // itrans→iso→deva→bengali
```
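The routing idea can be sketched in a few lines of Python. This is only an illustration of hub-and-spoke path planning under assumed script groupings; `plan_path`, `family`, and the sets below are hypothetical, script aliases such as `iso` are not handled, and the direct-conversion shortcut mentioned above is not modeled.

```python
# Hypothetical sketch of hub-and-spoke routing; not Shlesha's implementation.
INDIC = {"devanagari", "bengali", "tamil", "gujarati"}
ROMAN = {"iso15919", "itrans", "slp1", "iast"}
HUBS = {"indic": "devanagari", "roman": "iso15919"}


def family(script):
    """Classify a script as belonging to the Indic or the Roman hub."""
    if script in INDIC:
        return "indic"
    if script in ROMAN:
        return "roman"
    raise ValueError(f"unknown script: {script}")


def plan_path(source, target):
    """Return the scripts the text passes through, endpoints included."""
    src_hub = HUBS[family(source)]
    dst_hub = HUBS[family(target)]
    if source == target:
        return [source]  # passthrough, nothing to convert
    path = [source]
    if source != src_hub:
        path.append(src_hub)  # spoke -> own hub
    if dst_hub != path[-1]:
        path.append(dst_hub)  # hub -> other hub (cross-hub case)
    if target != path[-1]:
        path.append(target)   # hub -> target spoke
    return path


print(plan_path("devanagari", "devanagari"))  # ['devanagari']
print(plan_path("devanagari", "iso15919"))    # ['devanagari', 'iso15919']
print(plan_path("itrans", "bengali"))         # ['itrans', 'iso15919', 'devanagari', 'bengali']
```

The number of hops is `len(path) - 1`, so a cross-hub conversion between two spokes costs at most three conversions in this model.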
- Devanagari (devanagari, deva) - Sanskrit, Hindi, Marathi
- Bengali (bengali, bn) - Bengali/Bangla script
- Tamil (tamil, ta) - Tamil script
- Telugu (telugu, te) - Telugu script
- Gujarati (gujarati, gu) - Gujarati script
- Kannada (kannada, kn) - Kannada script
- Malayalam (malayalam, ml) - Malayalam script
- Odia (odia, od) - Odia/Oriya script
- Gurmukhi (gurmukhi, pa) - Punjabi script
- Sinhala (sinhala, si) - Sinhala script
- Sharada (sharada, shrd) - Historical script of Kashmir, crucial for Vedic manuscripts
- Tibetan (tibetan, tibt, bo) - Important for Buddhist Vedic transmission
- Thai (thai, th) - Adapted from Grantha for Buddhist Vedic texts
- ISO-15919 (iso15919, iso) - International standard
- ITRANS (itrans) - Indian languages TRANSliteration
- SLP1 (slp1) - Sanskrit Library Phonetic Basic
- Harvard-Kyoto (harvard_kyoto, hk) - ASCII-based scheme
- Velthuis (velthuis) - TeX-compatible scheme
- WX (wx) - ASCII-based notation
- IAST (iast) - International Alphabet of Sanskrit Transliteration
- Kolkata (kolkata) - Regional romanization scheme
- Grantha (grantha) - Classical Sanskrit script

Usage Examples

```rust
use shlesha::Shlesha;

let transliterator = Shlesha::new();

// High-performance cross-script conversion
let result = transliterator.transliterate("धर्म", "devanagari", "gujarati")?;
println!("{}", result); // "ધર્મ"

// Roman to Indic conversion (the input uses IAST diacritics, so the source is "iast")
let result = transliterator.transliterate("dharmakṣetra", "iast", "tamil")?;
println!("{}", result); // "தர்மக்ஷேத்ர"

// Schema-generated converters in action
let result = transliterator.transliterate("dharmakSetra", "slp1", "iast")?;
println!("{}", result); // "dharmakśetra"
```
```python
import shlesha

# Create transliterator with all schema-generated converters
transliterator = shlesha.Shlesha()

# Fast schema-based conversion
result = transliterator.transliterate("ধর্ম", "bengali", "telugu")
print(result)  # "ధర్మ"

# Performance with metadata tracking
result = transliterator.transliterate_with_metadata("धर्मkr", "devanagari", "iast")
print(f"Output: {result.output}")  # "dharmakr"
print(f"Unknown tokens: {len(result.metadata.unknown_tokens)}")

# Runtime extensibility
scripts = shlesha.get_supported_scripts()
print(f"Supports {len(scripts)} scripts: {scripts}")
```
```bash
# Schema-generated high-performance conversion
# ("dharmakSetra" is Harvard-Kyoto spelling, where S = ṣ; in SLP1, S = ś)
shlesha transliterate --from harvard_kyoto --to devanagari "dharmakSetra"
# Output: धर्मक्षेत्र

# Cross-script conversion via dual hubs
shlesha transliterate --from itrans --to tamil "dharma"
# Output: தர்ம

# List all schema-generated + hand-coded scripts
shlesha scripts
# Output: bengali, devanagari, gujarati, harvard_kyoto, iast, iso15919, itrans, ...
```
```javascript
import init, { WasmShlesha } from './pkg/shlesha.js';

async function demo() {
  await init();
  const transliterator = new WasmShlesha();

  // Schema-generated converter performance in browser
  const result = transliterator.transliterate("કર્મ", "gujarati", "devanagari");
  console.log(result); // "कर्म"

  // Runtime script discovery
  const scripts = transliterator.listSupportedScripts();
  console.log(`${scripts.length} scripts available`);
}
```

Runtime Schema Loading
Shlesha supports runtime schema loading across all APIs to add custom scripts without recompilation.
```rust
use shlesha::Shlesha;

let mut transliterator = Shlesha::new();

// Load custom schema from YAML content
let custom_schema = r#"
metadata:
  name: "my_custom_script"
  script_type: "roman"
  has_implicit_a: false
  description: "My custom transliteration scheme"

target: "iso15919"

mappings:
  vowels:
    "a": "a"
    "e": "ē"
  consonants:
    "k": "k"
    "t": "ṭ"
"#;

// Load the schema at runtime
transliterator.load_schema_from_string(custom_schema, "my_custom_script")?;

// Use immediately without recompilation
let result = transliterator.transliterate("kate", "my_custom_script", "devanagari")?;
println!("{}", result); // "कटे" (k-a-ṭ-ē under the mappings above)

// Schema management
let info = transliterator.get_schema_info("my_custom_script").unwrap();
println!("Loaded {} with {} mappings", info.name, info.mapping_count);
```
```python
import shlesha

transliterator = shlesha.Shlesha()

# Load schema from YAML string
yaml_content = """
metadata:
  name: "custom_script"
  script_type: "roman"
  has_implicit_a: false
  description: "Custom transliteration"

target: "iso15919"
mappings:
  vowels:
    "a": "a"
  consonants:
    "k": "k"
"""

# Runtime loading
transliterator.load_schema_from_string(yaml_content, "custom_script")

# Immediate usage
result = transliterator.transliterate("ka", "custom_script", "devanagari")
print(result)  # "क"

# Schema info
info = transliterator.get_schema_info("custom_script")
print(f"Script: {info['name']}, Mappings: {info['mapping_count']}")

# Schema management
transliterator.remove_schema("custom_script")
transliterator.clear_runtime_schemas()
```
```javascript
import init, { WasmShlesha } from './pkg/shlesha.js';

async function loadCustomScript() {
  await init();
  const transliterator = new WasmShlesha();

  // Define custom schema (metadata mirrors the Python example above)
  const yamlContent = `
metadata:
  name: "custom_script"
  script_type: "roman"
  has_implicit_a: false
  description: "Custom transliteration"

target: "iso15919"
mappings:
  vowels:
    "a": "a"
  consonants:
    "k": "k"
`;

  // Load at runtime
  transliterator.loadSchemaFromString(yamlContent, "custom_script");

  // Use immediately
  const result = transliterator.transliterate("ka", "custom_script", "devanagari");
  console.log(result); // "क"

  // Get schema information
  const info = transliterator.getSchemaInfo("custom_script");
  console.log(`Name: ${info.name}, Mappings: ${info.mapping_count}`);
}
```
- ✅ Load from YAML strings - No file system required
- ✅ Load from file paths - For development workflows
- ✅ Schema validation - Automatic error checking
- ✅ Hot reloading - Add/remove schemas dynamically
- ✅ Schema introspection - Get metadata about loaded schemas
- ✅ Memory management - Clear schemas when done
- ✅ Cross-platform - Identical API across Rust, Python, WASM
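To make the "schema validation" item more concrete, here is a minimal sketch of the kind of structural checks a loader could run on an already-parsed schema. The required fields and the set of script types are assumptions drawn from the examples in this README (`roman`, `brahmic`); `validate_schema` is a hypothetical helper, not Shlesha's validator.

```python
# Hypothetical validation sketch; the actual rules Shlesha applies may differ.
REQUIRED_METADATA = ("name", "script_type")
KNOWN_SCRIPT_TYPES = {"roman", "brahmic"}


def validate_schema(schema):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []

    metadata = schema.get("metadata")
    if not isinstance(metadata, dict):
        errors.append("missing 'metadata' section")
        metadata = {}
    for key in REQUIRED_METADATA:
        if not metadata.get(key):
            errors.append(f"metadata.{key} is required")
    script_type = metadata.get("script_type")
    if script_type and script_type not in KNOWN_SCRIPT_TYPES:
        errors.append(f"unknown script_type: {script_type!r}")

    mappings = schema.get("mappings")
    if not isinstance(mappings, dict) or not mappings:
        errors.append("missing or empty 'mappings' section")
        mappings = {}
    for category, pairs in mappings.items():
        if not isinstance(pairs, dict):
            errors.append(f"mappings.{category} must be a key/value mapping")
            continue
        for source, target in pairs.items():
            if not isinstance(source, str) or not source:
                errors.append(f"mappings.{category}: empty or non-string key")
            if not isinstance(target, str):
                errors.append(f"mappings.{category}.{source}: value must be a string")
    return errors


good = {
    "metadata": {"name": "custom_script", "script_type": "roman"},
    "target": "iso15919",
    "mappings": {"vowels": {"a": "a"}, "consonants": {"k": "k"}},
}
bad = {"metadata": {"name": "x"}, "mappings": {}}

print(validate_schema(good))  # []
print(validate_schema(bad))
```

Collecting all problems in a list, rather than raising on the first one, lets a caller report every issue in a user-supplied schema at once.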
Development & Testing
```rust
// Test schema variations quickly
transliterator.load_schema_from_string(variant_a, "test_a")?;
transliterator.load_schema_from_string(variant_b, "test_b")?;
// Compare results immediately
```
Dynamic Applications
```python
# User uploads custom transliteration scheme
user_schema = request.files['schema'].read().decode('utf-8')
transliterator.load_schema_from_string(user_schema, user_id)
# Use immediately in application
```
Configuration-Driven Systems
```javascript
// Load schemas from configuration
config.schemas.forEach(schema => {
  transliterator.loadSchemaFromString(schema.content, schema.name);
});
```
Shlesha uses a hub-and-spoke architecture with schema-generated converters, trading some performance for extensibility compared to direct conversion approaches.
- Competitive with other transliteration libraries
- Schema-generated converters match hand-coded performance
- Optimized for both short and long text processing
| Aspect | Shlesha | Vidyut |
|--------|---------|---------|
| Performance | Hub-based | Direct conversion |
| Extensibility | Runtime schemas | Compile-time only |
| Script Support | 15+ (easily expandable) | Limited |
| Architecture | Hub-and-spoke | Direct conversion |
| Bindings | Rust/Python/WASM/CLI | Rust only |
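The extensibility side of this trade-off can be put in numbers with a back-of-the-envelope count. The sketch below assumes one converter per direction and two hubs; the functions are illustrative and not taken from the Shlesha codebase.

```python
def direct_converters(num_scripts):
    """One dedicated converter per ordered (source, target) pair."""
    return num_scripts * (num_scripts - 1)


def hub_converters(num_scripts, num_hubs=2):
    """Each non-hub script converts to and from its hub; hubs convert to each other."""
    spokes = num_scripts - num_hubs
    return 2 * spokes + num_hubs * (num_hubs - 1)


for n in (15, 22):
    print(n, direct_converters(n), hub_converters(n))
# 15 210 28
# 22 462 42
```

Under these assumptions, 15 scripts give 210 ordered pairs, which lines up with the "210+ pairs" figure quoted for the conversion test matrix, while the hub design needs only 28 converters. Adding one more script costs two new converters in the hub model instead of one per existing script in each direction.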
Adding support for new scripts with schemas:
```yaml
# schemas/new_script.yaml
metadata:
  name: "NewScript"
  description: "Description of the script"
  unicode_block: "NewScript"
  has_implicit_vowels: true

mappings:
  vowels:
    - source: "𑀅"  # New script character
      target: "अ"  # Devanagari equivalent
    # ... add more mappings
```
```bash
# Rebuild to include new script
cargo build

# New script automatically available!
```
Converters are generated using Handlebars templates for consistency:
```handlebars
{{!-- templates/indic_converter.hbs --}}
/// {{metadata.description}} converter generated from schema
pub struct {{pascal_case metadata.name}}Converter {
    {{snake_case metadata.name}}_to_deva_map: HashMap<char, char>,
    deva_to_{{snake_case metadata.name}}_map: HashMap<char, char>,
}

impl {{pascal_case metadata.name}}Converter {
    pub fn new() -> Self {
        // Generated O(1) lookup tables
        let mut {{snake_case metadata.name}}_to_deva = HashMap::new();
        {{#each character_mappings}}
        {{snake_case ../metadata.name}}_to_deva.insert('{{this.source}}', '{{this.target}}');
        {{/each}}
        // ... template continues
    }
}
```
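For readers unfamiliar with template-driven code generation, the following sketch shows the same idea with Python's standard `string.Template` standing in for Handlebars. The template text, `render_converter`, and the `character_mappings` layout are assumptions made for illustration; the output is Rust source text that is only printed, not compiled.

```python
from string import Template

# Hypothetical template, loosely modeled on the Handlebars example above.
CONVERTER_TEMPLATE = Template("""\
/// $description converter generated from schema
pub fn ${name}_to_deva_map() -> HashMap<char, char> {
    let mut map = HashMap::new();
$inserts
    map
}
""")


def render_converter(schema):
    """Render Rust source text for one schema (illustrative only)."""
    metadata = schema["metadata"]
    # One insert statement per character mapping, like the {{#each}} loop.
    inserts = "\n".join(
        f"    map.insert('{m['source']}', '{m['target']}');"
        for m in schema["character_mappings"]
    )
    return CONVERTER_TEMPLATE.substitute(
        description=metadata["description"],
        name=metadata["name"].lower(),
        inserts=inserts,
    )


bengali = {
    "metadata": {"name": "bengali", "description": "Bengali/Bangla script"},
    "character_mappings": [
        {"source": "অ", "target": "अ"},
        {"source": "আ", "target": "आ"},
    ],
}
print(render_converter(bengali))
```

Because the lookup table is emitted as source code at build time, the schema file itself does not need to be parsed when the compiled converter runs.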
- 127 tests covering all functionality
- Schema-generated converter tests for all 14 generated converters
- Performance regression tests ensuring schema-generated converters match hand-coded speed
- Cross-script conversion matrix testing all 210+ pairs
- Unknown character handling
```bash
# Test schema-generated converters maintain performance
cargo test --lib
```
Build Configuration & Features
```bash
# Default: Schema-generated + hand-coded converters
cargo build

# Development mode with schema recompilation
cargo build --features "schema-dev"

# Minimal build (hand-coded only)
cargo build --no-default-features --features "hand-coded-only"

# All features (Python + WASM + CLI)
cargo build --features "python,wasm,cli"
```
```rust
let mut transliterator = Shlesha::new();

// Load additional schemas at runtime (future feature)
transliterator.load_schema("path/to/new_script.yaml")?;

// Schema registry access
let scripts = transliterator.list_supported_scripts();
println!("Dynamically loaded: {:?}", scripts);
```

Advanced Features
```rust
// Track unknown characters and conversion details
let result = transliterator.transliterate_with_metadata("धर्मkr", "devanagari", "iast")?;

if let Some(metadata) = result.metadata {
    println!("Conversion: {} → {}", metadata.source_script, metadata.target_script);
    for unknown in metadata.unknown_tokens {
        println!("Unknown '{}' at position {}", unknown.token, unknown.position);
    }
}
```
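The behaviour described here, passing unknown characters through while recording where they occurred, can be sketched as follows. The class and function names are hypothetical and the toy table covers only two characters; positions are counted in code points, which may not match how Shlesha reports them.

```python
from dataclasses import dataclass, field


@dataclass
class UnknownToken:
    token: str
    position: int


@dataclass
class Result:
    output: str
    unknown_tokens: list = field(default_factory=list)


def convert_with_metadata(text, table):
    """Convert character by character, recording anything the table lacks."""
    out = []
    unknown = []
    for position, char in enumerate(text):
        if char in table:
            out.append(table[char])
        else:
            out.append(char)  # pass the character through unchanged
            unknown.append(UnknownToken(char, position))
    return Result("".join(out), unknown)


toy_table = {"k": "क", "g": "ग"}  # illustrative mapping only
result = convert_with_metadata("kxg", toy_table)
print(result.output)  # कxग
for item in result.unknown_tokens:
    print(f"Unknown '{item.token}' at position {item.position}")  # Unknown 'x' at position 1
```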
```rust
// Schema-aware script properties
let registry = ScriptConverterRegistry::default();

// Indic scripts have implicit vowels
assert!(registry.script_has_implicit_vowels("bengali").unwrap());
assert!(registry.script_has_implicit_vowels("devanagari").unwrap());

// Roman schemes don't
assert!(!registry.script_has_implicit_vowels("itrans").unwrap());
assert!(!registry.script_has_implicit_vowels("slp1").unwrap());
```
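Why the implicit-vowel flag matters can be seen in a small Devanagari-to-roman sketch: a bare consonant carries an inherent "a", a vowel sign replaces it, and a virama suppresses it. The tables below cover only a handful of characters and `deva_to_roman` is a hypothetical helper, not how Shlesha's converters are written.

```python
# Tiny illustrative tables; a real converter covers the full script.
CONSONANTS = {"क": "k", "ध": "dh", "म": "m", "र": "r"}
VOWEL_SIGNS = {"ा": "ā", "ि": "i", "े": "ē"}
VIRAMA = "्"


def deva_to_roman(text):
    """Romanize Devanagari, making the inherent vowel explicit."""
    out = []
    i = 0
    while i < len(text):
        char = text[i]
        if char not in CONSONANTS:
            out.append(char)  # not handled by this sketch: pass through
            i += 1
            continue
        out.append(CONSONANTS[char])
        following = text[i + 1] if i + 1 < len(text) else ""
        if following == VIRAMA:
            i += 2  # virama suppresses the inherent vowel
        elif following in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[following])  # vowel sign replaces it
            i += 2
        else:
            out.append("a")  # inherent vowel made explicit
            i += 1
    return "".join(out)


print(deva_to_roman("धर्म"))  # dharma
print(deva_to_roman("कि"))    # ki
print(deva_to_roman("क्"))    # k
```

A roman scheme has no such rule, since every vowel is written out, which is the distinction the registry query above exposes.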
```rust
// Fine-grained control over conversion paths
let hub = Hub::new();

// Direct hub operations
let iso_text = hub.deva_to_iso("धर्म")?;     // Devanagari → ISO
let deva_text = hub.iso_to_deva("dharma")?; // ISO → Devanagari

// Cross-hub conversion with metadata
let result = hub.deva_to_iso_with_metadata("धर्म")?;
```

Documentation
- Architecture Guide - Deep dive into hub-and-spoke design
- Schema Reference - Complete schema format documentation
- Performance Guide - Optimization techniques and benchmarks
- API Reference - Complete function and type reference
- Developer Setup - Development environment setup
- Release System - Automated release workflow overview
- Deployment Guide - Complete deployment and environment setup
- crates.io RC Support - Release candidate publishing guide
- Security Setup - Token management and environment security
- Contributing Guide - Guidelines for contributors
```bash
# Generate documentation
cargo doc --open

# Run all examples
cargo run --example shlesha_vs_vidyut_benchmark
cargo run --example roman_allocation_analysis

# Performance testing
cargo bench
```

Releases
Shlesha uses an automated release system for publishing to package registries:
```bash
# Guided release process
./scripts/release.sh
```
```bash
# Python (PyPI)
pip install shlesha

# WASM (npm)
npm install shlesha-wasm

# Rust (crates.io)
cargo add shlesha
```

See DEPLOYMENT.md for complete release documentation.
Contributions are welcome. The schema-driven architecture simplifies adding new scripts:
1. Add Schema: Create TOML/YAML mapping file
2. Test: Run test suite to verify
3. Benchmark: Ensure performance maintained
4. Submit: Open PR with schema and tests
See CONTRIBUTING.md for detailed guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
- Unicode Consortium for Indic script standards
- ISO-15919 for romanization standardization
- Sanskrit Library for SLP1 encoding schemes
- Vidyut Project for performance benchmarking standards
- Rust Community for excellent tools (PyO3, wasm-pack, handlebars)