Core library for **thalo** (Thought And Lore Language). Provides parsing, semantic analysis, validation, and workspace management for `.thalo` files.
npm install @rejot-dev/thaloCore library for thalo (Thought And Lore Language). Provides parsing, semantic analysis,
validation, and workspace management for .thalo files.
- Parser: Parse .thalo files and extract thalo blocks from Markdown
- Incremental Parsing: Efficient incremental updates via Tree-sitter's edit API
- Fragment Parsing: Parse individual expressions (queries, values, type expressions)
- AST: Strongly-typed abstract syntax tree with inline syntax error nodes
- Semantic Analysis: Build semantic models with link indexes and schema resolution
- Schema: Entity schema registry with define-entity / alter-entity support
- Checker: Visitor-based validation rules with incremental checking support
- Services: Definition lookup, references, hover, query execution, and semantic tokens
- Source Mapping: Position translation for embedded blocks in Markdown files
``bash`
pnpm add @rejot-dev/thalo
`typescript
import { Workspace } from "@rejot-dev/thalo";
const workspace = new Workspace();
// Add documents (schemas first, then instances)
workspace.addDocument(schemaSource, { filename: "entities.thalo" });
workspace.addDocument(instanceSource, { filename: "entries.thalo" });
// Access semantic models
const model = workspace.getModel("entries.thalo");
console.log(model.ast.entries); // Array of parsed entries
// Query links across workspace
const linkDef = workspace.getLinkDefinition("my-link-id");
const refs = workspace.getLinkReferences("my-link-id");
`
The workspace supports efficient incremental updates:
`typescript
import { Workspace } from "@rejot-dev/thalo";
const workspace = new Workspace();
workspace.addDocument(source, { filename: "test.thalo" });
// Apply an incremental edit (more efficient than re-adding)
const invalidation = workspace.applyEdit(
"test.thalo",
startLine,
startColumn,
endLine,
endColumn,
newText,
);
// Check what was invalidated
if (invalidation.schemasChanged) {
// Entity schemas changed - may affect other files
}
if (invalidation.linksChanged) {
// Link definitions/references changed
}
// Get files affected by changes in a specific file
const affected = workspace.getAffectedFiles("test.thalo");
`
The checker reports both syntax errors (from AST) and semantic errors (from rules):
`typescript
import { Workspace, check } from "@rejot-dev/thalo";
const workspace = new Workspace();
workspace.addDocument(source, { filename: "test.thalo" });
const diagnostics = check(workspace);
for (const d of diagnostics) {
// Syntax errors have codes like "syntax-missing_timezone"
// Semantic errors have codes like "unknown-entity", "unresolved-link"
console.log(${d.severity}: ${d.message} (${d.code}));`
}
`typescript
import { check } from "@rejot-dev/thalo";
const diagnostics = check(workspace, {
rules: {
"unknown-field": "off", // Disable this rule
"unresolved-link": "error", // Upgrade to error
},
});
`
Parse individual expressions without a full document context:
`typescript
import { parseFragment, parseQuery } from "@rejot-dev/thalo";
import { createParser } from "@rejot-dev/thalo/native";
// Create a parser (native for Node.js, or web parser for browser)
const parser = createParser();
// Parse a query expression
const queryResult = parseQuery(parser, "lore where subject = ^self and #career");
if (queryResult.valid) {
console.log(queryResult.node.type); // "query"
}
// Parse a type expression
const typeResult = parseFragment(parser, "type_expression", 'string | "literal"');
`
Query entries in a workspace based on synthesis source definitions:
`typescript
import { Workspace, executeQuery } from "@rejot-dev/thalo";
const workspace = new Workspace();
// ... add documents ...
const results = executeQuery(workspace, {
entity: "lore",
conditions: [
{ kind: "field", field: "subject", value: "^self" },
{ kind: "tag", tag: "career" },
],
});
`
Handle positions in markdown files with embedded thalo blocks:
`typescript
import { parseDocument, findBlockAtPosition, toFileLocation } from "@rejot-dev/thalo";
const parsed = parseDocument(markdownSource, { filename: "notes.md" });
// Find which block contains a file position
const match = findBlockAtPosition(parsed.blocks, { line: 10, column: 5 });
if (match) {
// Convert block-relative location to file-absolute
const fileLocation = toFileLocation(match.block.sourceMap, entryLocation);
}
`
| Export | Description |
| --------------------------- | ---------------------------------------------------------- |
| @rejot-dev/thalo | Main entry (parser, workspace, checker, services, types) |@rejot-dev/thalo/native
| | Native parser factory (createParser) |@rejot-dev/thalo/web
| | WASM parser factory (createParser) for browsers |@rejot-dev/thalo/ast
| | AST types, builder, visitor, extraction |@rejot-dev/thalo/model
| | Workspace, Document, LineIndex, model types |@rejot-dev/thalo/semantic
| | SemanticModel, analyzer, link index types |@rejot-dev/thalo/schema
| | SchemaRegistry, EntitySchema types |@rejot-dev/thalo/checker
| | Validation rules, visitor pattern, check functions |@rejot-dev/thalo/services
| | Definition, references, hover, query execution, entity nav |
Both /native and /web exports provide a createParser() function that returns a ThaloParser
instance with a consistent API:
`typescript
// Native (Node.js) - synchronous
import { createParser } from "@rejot-dev/thalo/native";
const parser = createParser();
// Web (WASM) - asynchronous, requires WASM binaries
import { createParser } from "@rejot-dev/thalo/web";
const [treeSitterWasm, languageWasm] = await Promise.all([
fetch("/wasm/web-tree-sitter.wasm").then((r) => r.arrayBuffer()),
fetch("/wasm/tree-sitter-thalo.wasm").then((r) => r.arrayBuffer()),
]);
const parser = await createParser({
treeSitterWasm: new Uint8Array(treeSitterWasm),
languageWasm: new Uint8Array(languageWasm),
});
// Both return the same ThaloParser interface
const tree = parser.parse(source);
const doc = parser.parseDocument(source, { fileType: "thalo" });
const tree2 = parser.parseIncremental(newSource, tree);
`
For Vite projects, you can import the WASM files directly using Vite's ?url suffix:
`typescript
import { createParser } from "@rejot-dev/thalo/web";
import treeSitterWasmUrl from "web-tree-sitter/web-tree-sitter.wasm?url";
import languageWasmUrl from "@rejot-dev/tree-sitter-thalo/tree-sitter-thalo.wasm?url";
const [treeSitterWasm, languageWasm] = await Promise.all([
fetch(treeSitterWasmUrl).then((r) => r.arrayBuffer()),
fetch(languageWasmUrl).then((r) => r.arrayBuffer()),
]);
const parser = await createParser({
treeSitterWasm: new Uint8Array(treeSitterWasm),
languageWasm: new Uint8Array(languageWasm),
});
`
Both WASM files are re-exported from this package for convenience:
- @rejot-dev/thalo/web-tree-sitter.wasm — The web-tree-sitter runtime@rejot-dev/thalo/tree-sitter-thalo.wasm
- — The thalo language grammar
In environments like Cloudflare Workers, WASM imports are provided as pre-compiled
WebAssembly.Module instances rather than raw bytes. The web parser supports this directly:
`typescript
import { createParser } from "@rejot-dev/thalo/web";
import treeSitterWasm from "./tree-sitter.wasm"; // WebAssembly.Module
import languageWasm from "./tree-sitter-thalo.wasm"; // WebAssembly.Module
const parser = await createParser({ treeSitterWasm, languageWasm });
`
The main entry (@rejot-dev/thalo) provides convenience functions that use a singleton native
parser internally for backwards compatibility.
The checker validates documents using rules organized into categories:
- Instance: Validates instance entries — entity types, required fields, field types, sections
- Link: Validates link references and definitions — unresolved links, duplicate IDs
- Schema: Validates entity schema definitions — duplicates, alter/define ordering, defaults
- Metadata: Validates metadata values — empty values, date range formats
- Content: Validates entry content structure — duplicate section headings
Rules use a visitor pattern for efficient single-pass execution. Each rule declares a scope (entry,
document, or workspace) to enable incremental checking when documents change.
To see all available rules, use the CLI:
`bash`
thalo rules list
The library follows a clear pipeline from source to services:
``
Source → Tree-sitter → CST → AST Builder → AST → Semantic Analyzer → SemanticModel
↓ ↓
Formatter Checker
(CST only) (syntax + semantic)
↓
Services
The workspace supports incremental updates at multiple levels:
1. Document Level: The Document class wraps source text with a LineIndex for efficientupdateSemanticModel
position lookups and tracks Tree-sitter trees per block
2. Semantic Level: incrementally updates link indexes and schema entrieslinkDependencies
3. Workspace Level: Dependency tracking (, entityDependencies) enables
targeted invalidation
4. Rule Level: Rules declare their scope and dependencies, enabling incremental checking
| Error Type | Created In | Examples |
| ------------ | ------------- | --------------------------------------------- |
| Syntax | AST Builder | Missing timezone, missing entity name |
| Semantic | Checker Rules | Unresolved link, unknown field, type mismatch |
Syntax errors are inline SyntaxErrorNode instances in the AST. Semantic errors are diagnostics
from checker rules.
``
src/
├── parser.ts # Default parser using native singleton (backwards compat)
├── parser.native.ts # Native parser factory (createParser)
├── parser.web.ts # WASM parser factory (createParser)
├── parser.shared.ts # Shared parser logic (ThaloParser interface, createThaloParser)
├── fragment.ts # Parse individual expressions (query, value, type)
├── source-map.ts # Position mapping for embedded blocks
├── constants.ts # Language constants (directives, types)
├── ast/
│ ├── types.ts # AST node types (Entry, Timestamp, SyntaxErrorNode, etc.)
│ ├── extract.ts # Extract AST from tree-sitter CST
│ ├── builder.ts # Build decomposed types (timestamps with parts)
│ ├── visitor.ts # AST traversal utilities (walkAst, collectSyntaxErrors)
│ └── node-at-position.ts # Find semantic context at cursor position
├── semantic/
│ ├── types.ts # SemanticModel, LinkIndex, LinkDefinition, LinkReference
│ └── analyzer.ts # Build SemanticModel from AST (indexes links, collects schemas)
├── model/
│ ├── types.ts # Model types (ModelSchemaEntry for SchemaRegistry)
│ ├── document.ts # Document class with LineIndex and incremental edit support
│ ├── line-index.ts # Fast offset↔position conversion
│ └── workspace.ts # Multi-document workspace with dependency tracking
├── schema/
│ ├── types.ts # Schema types (EntitySchema, FieldSchema, TypeExpr)
│ └── registry.ts # Schema registry with define/alter resolution
├── checker/
│ ├── types.ts # Rule, diagnostic, and dependency types
│ ├── check.ts # Main check functions (check, checkDocument, checkIncremental)
│ ├── visitor.ts # RuleVisitor interface and visitor execution
│ ├── workspace-index.ts # Pre-computed indices for efficient rule execution
│ └── rules/ # Individual validation rules (27 rules)
└── services/
├── definition.ts # Go-to-definition lookup
├── references.ts # Find-all-references
├── hover.ts # Hover information (entries, types, directives)
├── query.ts # Query execution engine
├── entity-navigation.ts # Entity/tag/field/section navigation
└── semantic-tokens.ts # LSP semantic tokens
`bashBuild
pnpm build
Future Work
$3
The Tree-Sitter grammar parses all values into typed AST nodes:
- ✅ Type validation using AST structure (no regex parsing)
- ✅ Query parsing at the grammar level
- ✅ Proper array element validation
- ✅ Typed default values (
quoted_value, link, datetime_value)
- ✅ Query execution engine for synthesis entries
- ✅ Decomposed timestamps with date/time/timezone parts
- ✅ Inline syntax error nodes for recoverable parse errors
- ✅ Incremental parsing and semantic analysis
- ✅ Visitor-based rule system with dependency declarationsRemaining work:
1. Entity-only queries: The grammar requires
where for query expressions. Supporting
entity-only queries (e.g., sources: lore) would require grammar changes to avoid ambiguity.$3
1. Cross-file link resolution: Currently, link references are validated within the workspace.
Future work could include import/export semantics for external references.
$3
1. Cyclic reference detection: Links can form cycles; detection would prevent infinite loops.
2. Schema evolution validation:
alter-entity` changes could be validated for backwards3. Content structure validation: Validate that content sections match the schema's section
definitions.