Word frequency extension for GLOST - generates and formats frequency data
npm install glost-frequencybash
pnpm add glost-frequency
`
Usage
$3
`typescript
import { createFrequencyGeneratorExtension, createFrequencyEnhancerExtension } from "glost-frequency";
import { createThaiFrequencyProvider } from "glost-th/extensions";
// Create language-specific provider
const thaiProvider = createThaiFrequencyProvider(datasource);
// Create extensions
const generator = createFrequencyGeneratorExtension({
targetLanguage: "th",
provider: thaiProvider
});
const enhancer = createFrequencyEnhancerExtension({
normalize: true
});
// Process
const result = await processGLOSTWithExtensionsAsync(doc, [generator, enhancer]);
`
$3
If your documents already have frequency data, you can use just the enhancer:
`typescript
import { FrequencyEnhancerExtension } from "glost-frequency";
import { processGLOSTWithExtensions } from "glost-extensions";
// Synchronous processing (no generator needed)
const result = processGLOSTWithExtensions(document, [FrequencyEnhancerExtension]);
`
Frequency Levels
The extension uses four standard frequency levels:
- rare - Infrequently used words
- uncommon - Less common words
- common - Commonly used words
- very-common - Very frequently used words
Provider Pattern Benefits
1. Data Integrity: Only real corpus data, no guessing
2. Language-Specific: Each language can have optimized providers
3. Data Source Flexibility: Use different corpora or validated resources
4. Composability: Mix and match providers and enhancers
5. Testability: Mock providers for testing
6. Graceful Degradation: Returns undefined when no data available
API
$3
Creates extension that populates frequency data.
Options:
- targetLanguage - ISO-639-1 language code
- provider - FrequencyProvider instance
- skipExisting - Skip words with existing frequency (default: true)
$3
Creates extension that formats frequency data.
Options:
- normalize - Normalize frequency values (default: true)
- customMapping - Word → frequency mappings
$3
Implement the FrequencyProvider interface with real corpus data:
`typescript
import type { FrequencyProvider, FrequencyLevel } from "glost-frequency";
export function createMyFrequencyProvider(corpusData: Map): FrequencyProvider {
return {
async getFrequency(word, language) {
const count = corpusData.get(word);
if (!count) return undefined; // No data? Return undefined, don't guess!
// Map corpus counts to frequency levels based on your data
if (count > 10000) return "very-common";
if (count > 1000) return "common";
if (count > 100) return "uncommon";
return "rare";
}
};
}
`
$3
Convenience function that creates both generator and enhancer.
Returns: [generator, enhancer]
Migration from glost-extensions
Before (v0.1.x):
`typescript
import { FrequencyExtension } from "glost-extensions";
processGLOSTWithExtensions(doc, [FrequencyExtension]);
`
After (v0.2.0+):
`typescript
import { createFrequencyExtension } from "glost-frequency";
import { createThaiFrequencyProvider } from "glost-th/extensions";
// Use real corpus data provider
const provider = createThaiFrequencyProvider({
corpusData: thaiNationalCorpusFrequencies
});
const [generator, enhancer] = createFrequencyExtension({
targetLanguage: "th",
provider
});
await processGLOSTWithExtensionsAsync(doc, [generator, enhancer]);
``