SMT-backed semantic diff tool for TypeScript business logic. Proves equivalence or finds counterexamples.
npm install semprovestring, number, boolean
+, -, *, /
&&, ||, !, comparisons
if/else, return, ternary, variable diffs
for, while)
%
bash
npm install -g semprove
`
Windows users:
`powershell
npx -y semprove@beta --help
`
---
๐ Quick Start
$3
`bash
semprove compare \
--old src/pricing.old.ts \
--new src/pricing.ts \
--fn calculateDiscount
`
Output:
- artifacts/result.json: Verdict (UNSAT/COUNTEREXAMPLE/TIMEOUT)
- artifacts/counterexample.json: Exact input where functions diverge (if found)
- artifacts/trace.json: Old/new return values and event logs
$3
`bash
semprove pr --base main --head feature-branch
`
Output:
- artifacts/pr-summary.json: CTO-friendly metrics (functions analyzed, verdicts, p95 time)
- Per-function artifacts in artifacts/[file]/[function]/
---
๐ก๏ธ Production-minded Safeguards
$3
When Z3 proves equivalence (UNSAT), semprove now validates with fast-check sampling:
`bash
semprove compare --old old.ts --new new.ts --fn myFunc --paranoid
`
- Normal mode: 200 sampling runs
- Paranoid mode (--paranoid): 2000 sampling runs
- Confidence field: "unsat+sampled" vs "unsat-only"
Why? Sampling is a soundness smoke-test against lowering bugs; it cannot guarantee UNSAT is correct.
---
$3
Semprove explicitly refuses constructs that can diverge between Z3 and runtime:
โ == / != operators (use === / !==)
โ String concatenation with +
โ undefined / null keywords
โ NaN / Infinity identifiers
Error message:
`
UNSUPPORTED: == operator (use === instead).
Reason: Type coercion can diverge between Z3 model and runtime.
`
---
$3
For event-based logic (function calls), semprove classifies changes:
- Low: Only logging/metrics tags changed
- Med: Guard conditions changed
- High: Order/args changed on business logic
Field in result: "eventDiffSeverity": "low" | "med" | "high"
---
$3
Default Safety Posture:
- Runtime replay and sampling are disabled by default when external calls are detected.
- This prevents accidental database writes, API calls, or file system mutations in CI.
- To override (at your own risk), pass --allowSideEffects.
Float Handling:
- Runtime replay treats near-equal numeric results as equal (relative epsilon) to avoid float-noise false positives.
---
$3
After semprove pr, get a CTO-friendly summary:
`json
{
"functionsAnalyzed": 12,
"functionsSkipped": 3,
"skippedReasons": { "UNSUPPORTED_FEATURE": 2, "TIMEOUT": 1 },
"verdictCounts": { "unsat": 8, "found": 3, "approved": 2 },
"runtimeP95Ms": 4500,
"topDivergences": [
{
"fn": "calculateDiscount",
"file": "src/pricing.ts",
"counterexample": { "price": 100, "tier": "gold" },
"oldRetRuntime": 90,
"newRetRuntime": 85,
"eventDiffSeverity": "high",
"approved": false
}
]
}
`
---
๐ Supported TypeScript Fragment
Semprove supports a tight, well-defined subset optimized for pure business logic:
โ
Arithmetic: +, -, *, /
โ
Comparisons: >, <, >=, <=, ===, !==
โ
Logical: &&, ||, !
โ
Control flow: if/else, switch/case, ternary
โ
Function calls (as uninterpreted functions)
โ Loops (for, while)
โ Arrays, objects with dynamic keys
โ Async/await, promises
โ Type coercions (==, undefined, null, NaN)
Non-goal: general TypeScript / app code. Semprove targets deterministic pricing/policy/rules.
See: SUPPORTED_FRAGMENT.md for full details (~20 lines of language)
---
๐ฏ Presets
$3
For financial calculations (pricing, VAT, discounts):
`bash
semprove compare --old old.ts --new new.ts --fn calculatePrice --money
`
- Inputs: integers (cents)
- Intermediates: rationals (supports * 0.9, / 1.20)
- Bounds: min/max to prevent overflow
---
๐ง Options
$3
| Option | Default | Description |
|--------|---------|-------------|
| --timeoutMs | 60000 | Z3 solver timeout (ms) |
| --[no-]ints | true | Treat numbers as integers |
| --realInputs | false | Treat numbers as reals (no integer constraint) |
| --money | false | Preset for financial calculations |
| --min | -1000000 | Min bound for numeric inputs |
| --max | 1000000 | Max bound for numeric inputs |
| --paranoid | false | Increase sampling runs for UNSAT (2000 vs 200) |
| --nonBlockingTags | "" | Comma-separated event tags to ignore |
| --out | ./artifacts | Output directory for artifacts |
| --allowSideEffects | false | Allow runtime replay even if side effects detected |
$3
| Option | Default | Description |
|--------|---------|-------------|
| --base | HEAD~1 | Base git ref |
| --head | HEAD | Head git ref |
| --mode | refactor | refactor or bugfix |
---
๐ Commands
$3
Compare two specific functions:
`bash
semprove compare --old old.ts --new new.ts --fn myFunction
`
$3
Analyze all changed functions in a PR:
`bash
semprove pr --base main --head feature-branch --mode refactor
`
Generates:
- artifacts/pr-summary.json
- Per-function artifacts
$3
Approve a known divergence (for refactor mode):
`bash
semprove approve \
--fn calculateDiscount \
--from artifacts/src_pricing_ts/calculateDiscount/counterexample.json \
--reason "Intentional discount adjustment for Q1 promo" \
--approvedBy "john-doe"
`
Stores approval in .semprove/approvals.json
---
๐ Documentation
- PRODUCTION_RISKS.md: 5 real production risks and mitigations
- SUPPORTED_FRAGMENT.md: Exact TypeScript subset supported
- PRODUCTION_READY.md: Implementation summary and next steps
- SPRINT_A_BENCH.md: Benchmark plan (30 test functions)
- SPRINT_B_GITHUB_ACTION.md: GitHub Action integration
- SPRINT_C_DESIGN_PARTNERS.md: Design partner script
---
๐งช Benchmarks
Run the benchmark suite:
`bash
npm run build
node bench/run-bench.ts
`
Generates: bench/RESULTS.md with accuracy metrics
Current benchmarks:
- 10 refactors (should be UNSAT)
- 10 bugfixes (should find COUNTEREXAMPLE)
- 10 Copilot errors (should find COUNTEREXAMPLE)
---
๐๏ธ Architecture
1. Extract (src/extract.ts): Parse TypeScript functions with ts-morph
2. Symbolic (src/symbolic.ts): Lower to Z3 symbolic representation
3. Lower (src/lower.ts): Convert TS AST to Z3 constraints
4. Solve (src/solve.ts): Check equivalence with Z3, validate with sampling
5. Report (src/report.ts): Generate artifacts (JSON, markdown)
---
๐ How It Works
$3
`typescript
// OLD
export function calculateDiscount(price: number, tier: "gold" | "silver" | "none"): number {
if (tier === "gold") return price * 0.9;
return price;
}
// NEW
export function calculateDiscount(price: number, tier: "gold" | "silver" | "none"): number {
if (tier === "gold") return price * 0.85; // BUG: changed discount
return price;
}
`
Semprove output:
`json
{
"status": "COUNTEREXAMPLE_FOUND",
"counterexample": { "price": 100, "tier": "gold" },
"oldRet": "90",
"newRet": "85",
"oldRetRuntime": 90,
"newRetRuntime": 85,
"runtimeDiverged": true
}
`
Verdict: Functions diverge at { price: 100, tier: "gold" } โ
---
๐ Roadmap
$3
Create 30 benchmark functions, validate accuracy > 90%
$3
Build installable GitHub Action with PR comment integration
$3
Work with 3 companies to validate on real codebases
---
๐ค Contributing
Contributions welcome! Focus areas:
- More benchmark cases (pricing, policy, tax logic)
- Lowering improvements (support more TS constructs)
- Performance (faster Z3 queries, caching)
- UX (better error messages, visualization)
---
๐ License
MIT
---
๐ FAQ
$3
A: Not supported yet. Refactor into fixed-arity logic or extract loop-free helpers.
$3
A: No. Semprove targets pure business logic (pricing, policy, calculations).
$3
A: Property testing (fast-check) tests examples. Semprove does SMT-based equivalence checking within the supported fragment and bounds, or finds a concrete counterexample.
$3
A: In our dry run (20 functions, AMD Ryzen 9 5900X, 60s timeout): P95 ~1.3s, median ~200ms. Performance depends on function complexity, bounds, and hardware. Use --timeoutMs to adjust.
$3
A: Yes! Use semprove approve to mark expected changes in refactor mode.
---
Built with:
- Z3 (SMT solver)
- ts-morph (TypeScript AST)
- fast-check (Property testing)
- commander (CLI framework)
---
โ ๏ธ Known Issues
$3
- Z3 runs in WebAssembly with ~2GB heap limit.
- Large functions (>50 variables) or very large bounds may hit this limit.
- Mitigation:
- Reduce search bounds: --min -1000 --max 1000
- Use integer reasoning: --no-realInputs (default)
- Split complex functions into smaller helpers
- Use --timeoutMs 5000 to fail fast instead of hanging
$3
- Use -y flag with npx: npx -y semprove ...
- Long paths may require --% escape in some shells
- Git worktrees work correctly (tested on Windows 11)
$3
- First run may be slower (WASM initialization)
- Performance scales with function complexity and bounds
- Recommendation: Start with smaller bounds (--min, --max) for exploration
---
Questions? Open an issue or reach out!
๐ Security & Trust
- No network required for analysis.
- No telemetry (if ever added, it will be opt-in).
Next: Try semprove pr` on your codebase ๐