Clause segmentation extension for GLOST - segments sentences into clauses
Language-specific rules are provided by separate packages:
- glost-en/segmenter - English segmentation rules
- glost-th/segmenter - Thai segmentation rules
- glost-ja/segmenter - Japanese segmentation rules (coming soon)
Installation
```bash
# Core segmenter (required)
npm install glost-clause-segmenter

# Language-specific provider (pick your language)
npm install glost-en  # English
npm install glost-th  # Thai
```
Usage
English
```typescript
import { createClauseSegmenterExtension } from "glost-clause-segmenter";
import { englishSegmenterProvider } from "glost-en/segmenter";

const segmenter = createClauseSegmenterExtension({
  targetLanguage: "en",
  provider: englishSegmenterProvider
});

// `document` is an existing GLOST document. processGLOSTWithExtensionsAsync
// is assumed to be imported from your GLOST extensions package.
const result = await processGLOSTWithExtensionsAsync(document, [segmenter]);
```
Thai
```typescript
import { createClauseSegmenterExtension } from "glost-clause-segmenter";
import { thaiSegmenterProvider } from "glost-th/segmenter";

const segmenter = createClauseSegmenterExtension({
  targetLanguage: "th",
  provider: thaiSegmenterProvider
});
```
Provider Interface
Language packages implement the ClauseSegmenterProvider interface:
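```typescript
interface ClauseSegmenterProvider {
  // Required: find clause boundaries in a tokenized sentence.
  segmentSentence(
    words: string[],
    language: string
  ): Promise<SegmentationResult>;

  // Optional: classify the mood of the sentence.
  detectMood?(
    sentenceText: string,
    language: string
  ): Promise<GrammaticalMood>;
}
```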
Custom provider
```typescript
import type { ClauseSegmenterProvider, SegmentationResult } from "glost-clause-segmenter";

const myCustomProvider: ClauseSegmenterProvider = {
  async segmentSentence(words, language): Promise<SegmentationResult> {
    const boundaries: SegmentationResult["boundaries"] = [];

    // Your language-specific logic here.
    // isSubordinator is a placeholder for your own marker check.
    for (let i = 0; i < words.length; i++) {
      const word = words[i];
      if (isSubordinator(word)) {
        boundaries.push({
          position: i,
          clauseType: "subordinate",
          marker: word,
          includeMarker: true
        });
      }
    }

    return { boundaries };
  },

  async detectMood(text, language) {
    // Optional: detect sentence mood
    return "declarative";
  }
};
```
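The custom provider example calls `isSubordinator` without defining it. One possible shape for that helper is sketched below. The function body and the `SUBORDINATORS` word list are illustrative assumptions, not the rules shipped in any GLOST language package.

```typescript
// Hypothetical marker list for illustration only; a real language package
// would maintain a fuller, linguistically reviewed list.
const SUBORDINATORS: ReadonlySet<string> = new Set([
  "because",
  "although",
  "if",
  "unless",
  "when",
  "while",
  "since",
]);

// Returns true when the word, ignoring case and surrounding punctuation,
// appears in the marker list above.
function isSubordinator(word: string): boolean {
  const normalized = word.toLowerCase().replace(/^[^a-z]+|[^a-z]+$/g, "");
  return SUBORDINATORS.has(normalized);
}
```

A set lookup keeps the check constant-time per word. Languages whose markers span several words would need a different strategy, such as matching against a window of tokens.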
API
createClauseSegmenterExtension(options)
Creates a clause segmenter extension.
Options:
- targetLanguage (required): Language code (e.g., "en", "th")
- provider (required): Language-specific segmenter provider
- includeMarkers: Whether to include markers in clause nodes (default: true)
Returns: GLOSTExtension
Types
ClauseBoundary
Detected clause boundary:
```typescript
interface ClauseBoundary {
  position: number;        // Word index
  clauseType: ClauseType;  // Type of clause
  marker: string;          // The conjunction/marker
  includeMarker?: boolean; // Whether to include marker
}
```
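To make the fields concrete, here is a minimal sketch of how a list of boundaries could be turned into clause spans. It is not the extension's actual transformation logic. It assumes that a boundary at `position` starts a new clause at that word index, that words before the first boundary form a "main" clause, and that `includeMarker: false` drops the marker word. `ClauseSpan` and `splitIntoClauses` are hypothetical names, and the types are local copies so the snippet stands alone.

```typescript
type ClauseType =
  | "main" | "subordinate" | "relative" | "causal"
  | "conditional" | "temporal" | "complement" | "coordinate";

interface ClauseBoundary {
  position: number;
  clauseType: ClauseType;
  marker: string;
  includeMarker?: boolean;
}

// Hypothetical output shape for this sketch.
interface ClauseSpan {
  clauseType: ClauseType;
  words: string[];
}

function splitIntoClauses(words: string[], boundaries: ClauseBoundary[]): ClauseSpan[] {
  // Work on a sorted copy so the caller's array is left untouched.
  const sorted = [...boundaries].sort((a, b) => a.position - b.position);
  const clauses: ClauseSpan[] = [];
  let start = 0;
  let currentType: ClauseType = "main";

  for (const boundary of sorted) {
    // Close the clause that ends just before this boundary, if it has words.
    if (boundary.position > start) {
      clauses.push({ clauseType: currentType, words: words.slice(start, boundary.position) });
    }
    // Skip the marker word only when includeMarker is explicitly false.
    start = boundary.includeMarker === false ? boundary.position + 1 : boundary.position;
    currentType = boundary.clauseType;
  }

  // Whatever remains belongs to the last clause type seen.
  if (start < words.length) {
    clauses.push({ clauseType: currentType, words: words.slice(start) });
  }
  return clauses;
}

const words = ["I", "stayed", "home", "because", "it", "rained"];
const clauses = splitIntoClauses(words, [
  { position: 3, clauseType: "causal", marker: "because", includeMarker: true },
]);
// clauses[0] is the main clause "I stayed home";
// clauses[1] is the causal clause "because it rained".
```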
ClauseType
```typescript
type ClauseType =
  | "main"         // Main clause
  | "subordinate"  // Subordinate clause
  | "relative"     // Relative clause
  | "causal"       // Causal clause (because, since)
  | "conditional"  // Conditional clause (if, unless)
  | "temporal"     // Temporal clause (when, while)
  | "complement"   // Complement clause (that, whether)
  | "coordinate";  // Coordinated clause (and, but, or)
```
GrammaticalMood
```typescript
type GrammaticalMood =
  | "declarative"    // Statement
  | "interrogative"  // Question
  | "imperative"     // Command
  | "conditional";   // Conditional statement
```
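As a rough illustration of what a provider's optional `detectMood` might compute, the sketch below classifies English sentences with a few surface cues. The function name `detectMoodNaive` and its rules are assumptions made for this example. Real mood detection needs far more than punctuation and leading words.

```typescript
type GrammaticalMood = "declarative" | "interrogative" | "imperative" | "conditional";

// Hypothetical English-only heuristic; the checks run from most to least specific.
function detectMoodNaive(sentenceText: string): GrammaticalMood {
  const text = sentenceText.trim();

  // A trailing question mark is treated as a question.
  if (text.endsWith("?")) return "interrogative";

  // A sentence opening with "if" or "unless" is treated as conditional.
  if (/^(if|unless)\b/i.test(text)) return "conditional";

  // A few common command openers; bare-verb commands are not covered.
  if (/^(please|let's|do not|don't)\b/i.test(text)) return "imperative";

  return "declarative";
}
```

Inside a provider, this synchronous helper could be wrapped as `async detectMood(text) { return detectMoodNaive(text); }` to match the interface's Promise return type.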
Philosophy
The clause segmenter package is language agnostic:
- ✅ Defines the provider interface
- ✅ Implements the transformation logic
- ✅ Handles document traversal
- ❌ NO language-specific rules
Language packages provide language-specific implementations:
- ✅ Clause markers (conjunctions, particles)
- ✅ Segmentation rules
- ✅ Mood detection
- ✅ Cultural/linguistic nuances
Benefits:
- Single extension works for all languages
- Data stays in language packages (single source of truth)
- Easy to add new languages
- Clear separation of concerns
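The separation described above can be sketched in a few lines: one language-agnostic function that only knows the provider interface, and two providers that differ only in their data. Everything here is an illustrative assumption rather than the package's code, including `providerFromMarkers`, `boundaryPositions`, and the tiny marker tables. The interfaces are simplified local copies so the snippet stands alone.

```typescript
interface ClauseBoundary {
  position: number;
  clauseType: string;
  marker: string;
  includeMarker?: boolean;
}
interface SegmentationResult {
  boundaries: ClauseBoundary[];
}
interface ClauseSegmenterProvider {
  segmentSentence(words: string[], language: string): Promise<SegmentationResult>;
}

// Language side (hypothetical): a provider is just marker data plus a lookup.
function providerFromMarkers(markers: Map<string, string>): ClauseSegmenterProvider {
  return {
    async segmentSentence(words) {
      const boundaries: ClauseBoundary[] = [];
      words.forEach((word, position) => {
        const clauseType = markers.get(word.toLowerCase());
        if (clauseType !== undefined) {
          boundaries.push({ position, clauseType, marker: word, includeMarker: true });
        }
      });
      return { boundaries };
    },
  };
}

// Core side (hypothetical): no language rules, only the provider contract.
async function boundaryPositions(
  provider: ClauseSegmenterProvider,
  language: string,
  sentences: string[][]
): Promise<number[][]> {
  const results: number[][] = [];
  for (const words of sentences) {
    const { boundaries } = await provider.segmentSentence(words, language);
    results.push(boundaries.map((b) => b.position));
  }
  return results;
}

const englishDemo = providerFromMarkers(new Map([["because", "causal"], ["if", "conditional"]]));
const thaiDemo = providerFromMarkers(new Map([["เพราะ", "causal"], ["ถ้า", "conditional"]]));
```

Calling `boundaryPositions(englishDemo, "en", ...)` and `boundaryPositions(thaiDemo, "th", ...)` runs the same core code path. Only the marker tables differ, which is the point of keeping the data in the language packages.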
Implementation Guide
To add clause segmentation support for your language:
1. Create a segmenter module in your language package:
```
glost-[lang]/
  src/
    segmenter/
      index.ts  # Your provider implementation
```
2. Implement the provider:
```typescript
import type { ClauseSegmenterProvider } from "glost-clause-segmenter";

export const myLanguageSegmenterProvider: ClauseSegmenterProvider = {
  async segmentSentence(words, language) {
    // Your segmentation logic goes here; return the detected boundaries.
    return { boundaries: [] };
  }
};
```
3. Export from package.json:
```json
{
  "exports": {
    "./segmenter": {
      "types": "./dist/segmenter/index.d.ts",
      "default": "./dist/segmenter/index.js"
    }
  }
}
```
4. Add dependency:
```json
{
  "dependencies": {
    "glost-clause-segmenter": "workspace:*"
  }
}
```