Semprove - SMT-Backed Semantic Diff for TypeScript

Semprove is an SMT-backed semantic diff tool for TypeScript business logic.
It can prove equivalence within a restricted TypeScript fragment, configured bounds, and modeling assumptions (e.g., external-call abstraction), or produce a concrete counterexample and replay it at runtime.

✅ Guarantees (read this first)

- UNSAT_IN_MODEL = no counterexample exists within configured bounds + supported fragment + modeling assumptions
- COUNTEREXAMPLE_FOUND = a concrete input where old/new diverge (runtime-replayed)
- TIMEOUT_OR_UNSUPPORTED = solver timed out or code uses unsupported constructs

---

🔍 Supported Fragment (Beta)

Semprove targets a strict subset of TypeScript for business logic.

✅ Supported:
- Primitive types: string, number, boolean
- Arithmetic: +, -, *, /
- Logic: &&, ||, !, comparisons
- Control flow: if/else, return, ternary, variable diffs
- Constraint: Must be a pure function with stable parameter names.

⚠️ Not Supported:
- Loops (for, while)
- Object literals / Arrays
- Async / Promises
- Parameter renaming
- Modulo %

See SUPPORTED_FRAGMENT.md for full details.

---

🎯 What It Does

Given two versions of a function (old and new), semprove:

1. Proves equivalence (UNSAT_IN_MODEL): No counterexample within bounds + fragment + modeling assumptions
2. Finds bugs (COUNTEREXAMPLE_FOUND): Shows exact input where functions diverge
3. Falls back gracefully (TIMEOUT_OR_UNSUPPORTED): If Z3 can't decide or features are unsupported, tries fast-check sampling

Safety note: Runtime replay is disabled by default when external calls are present.

Use cases (within supported fragment + bounds):
- Verifying refactors of pricing rules
- Validating policy/eligibility logic changes
- Catching off-by-one errors in tax calculations
- Proving discount tier adjustments are semantically equivalent

---

📦 Installation

Node LTS: Requires Node.js 20 or 22.

``

bash

npm install -g semprove





Windows users:

powershell

npx -y semprove@beta --help





---



🚀 Quick Start



$3

bash

semprove compare \

  --old src/pricing.old.ts \

  --new src/pricing.ts \

  --fn calculateDiscount





Output:

-

artifacts/result.json

: Verdict (UNSAT/COUNTEREXAMPLE/TIMEOUT)

-

artifacts/counterexample.json

: Exact input where functions diverge (if found)

-

artifacts/trace.json

: Old/new return values and event logs



$3

bash

semprove pr --base main --head feature-branch





Output:

-

artifacts/pr-summary.json

: CTO-friendly metrics (functions analyzed, verdicts, p95 time)

- Per-function artifacts in

artifacts/[file]/[function]/





---



🛡️ Production-minded Safeguards



$3



When Z3 proves equivalence (UNSAT), semprove now validates with fast-check sampling:

bash

semprove compare --old old.ts --new new.ts --fn myFunc --paranoid





- Normal mode: 200 sampling runs

- Paranoid mode (

--paranoid

): 2000 sampling runs

- Confidence field:

"unsat+sampled" vs "unsat-only"





Why? Sampling is a soundness smoke-test against lowering bugs; it cannot guarantee UNSAT is correct.



---



$3



Semprove explicitly refuses constructs that can diverge between Z3 and runtime:



❌

== / != operators (use === / !==

)  

❌ String concatenation with

❌

undefined / null

 keywords  

❌

NaN / Infinity

 identifiers  



Error message:



UNSUPPORTED: == operator (use === instead). 

Reason: Type coercion can diverge between Z3 model and runtime.





---



$3



For event-based logic (function calls), semprove classifies changes:



- Low: Only logging/metrics tags changed

- Med: Guard conditions changed

- High: Order/args changed on business logic



Field in result:

"eventDiffSeverity": "low" | "med" | "high"





---



$3



Default Safety Posture:

- Runtime replay and sampling are disabled by default when external calls are detected.

- This prevents accidental database writes, API calls, or file system mutations in CI.

- To override (at your own risk), pass

--allowSideEffects

.



Float Handling:

- Runtime replay treats near-equal numeric results as equal (relative epsilon) to avoid float-noise false positives.



---



$3



After

semprove pr

, get a CTO-friendly summary:

json

{

  "functionsAnalyzed": 12,

  "functionsSkipped": 3,

  "skippedReasons": { "UNSUPPORTED_FEATURE": 2, "TIMEOUT": 1 },

  "verdictCounts": { "unsat": 8, "found": 3, "approved": 2 },

  "runtimeP95Ms": 4500,

  "topDivergences": [

    {

      "fn": "calculateDiscount",

      "file": "src/pricing.ts",

      "counterexample": { "price": 100, "tier": "gold" },

      "oldRetRuntime": 90,

      "newRetRuntime": 85,

      "eventDiffSeverity": "high",

      "approved": false

    }

  ]

}





---



📏 Supported TypeScript Fragment



Semprove supports a tight, well-defined subset optimized for pure business logic:



✅ Arithmetic:

+, -, *, /

  

✅ Comparisons:

>, <, >=, <=, ===, !==

  

✅ Logical:

&&, ||, !

  

✅ Control flow:

if/else, switch/case

, ternary  

✅ Function calls (as uninterpreted functions)  



❌ Loops (

for, while

)  

❌ Arrays, objects with dynamic keys  

❌ Async/await, promises  

❌ Type coercions (

==, undefined, null, NaN

)  



Non-goal: general TypeScript / app code. Semprove targets deterministic pricing/policy/rules.



See: SUPPORTED_FRAGMENT.md for full details (~20 lines of language)



---



🎯 Presets



$3



For financial calculations (pricing, VAT, discounts):

bash

semprove compare --old old.ts --new new.ts --fn calculatePrice --money





- Inputs: integers (cents)

- Intermediates: rationals (supports

* 0.9, / 1.20

)

- Bounds: min/max to prevent overflow



---



🔧 Options



$3



| Option | Default | Description |

|--------|---------|-------------|

|

--timeoutMs | 60000

 | Z3 solver timeout (ms) |

|

--[no-]ints | true

 | Treat numbers as integers |

|

--realInputs | false

 | Treat numbers as reals (no integer constraint) |

|

--money | false

 | Preset for financial calculations |

|

--min | -1000000

 | Min bound for numeric inputs |

|

--max | 1000000

 | Max bound for numeric inputs |

|

--paranoid | false

 | Increase sampling runs for UNSAT (2000 vs 200) |

|

--nonBlockingTags | ""

 | Comma-separated event tags to ignore |

|

--out | ./artifacts

 | Output directory for artifacts |

|

--allowSideEffects | false

 | Allow runtime replay even if side effects detected |



$3



| Option | Default | Description |

|--------|---------|-------------|

|

--base | HEAD~1

 | Base git ref |

|

--head | HEAD

 | Head git ref |

|

--mode | refactor | refactor or bugfix

 |



---



📊 Commands



$3



Compare two specific functions:

bash

semprove compare --old old.ts --new new.ts --fn myFunction





$3



Analyze all changed functions in a PR:

bash

semprove pr --base main --head feature-branch --mode refactor





Generates:

-

artifacts/pr-summary.json



- Per-function artifacts



$3



Approve a known divergence (for refactor mode):

bash

semprove approve \

  --fn calculateDiscount \

  --from artifacts/src_pricing_ts/calculateDiscount/counterexample.json \

  --reason "Intentional discount adjustment for Q1 promo" \

  --approvedBy "john-doe"





Stores approval in

.semprove/approvals.json





---



📚 Documentation



- PRODUCTION_RISKS.md: 5 real production risks and mitigations

- SUPPORTED_FRAGMENT.md: Exact TypeScript subset supported

- PRODUCTION_READY.md: Implementation summary and next steps

- SPRINT_A_BENCH.md: Benchmark plan (30 test functions)

- SPRINT_B_GITHUB_ACTION.md: GitHub Action integration

- SPRINT_C_DESIGN_PARTNERS.md: Design partner script



---



🧪 Benchmarks



Run the benchmark suite:

bash

npm run build

node bench/run-bench.ts





Generates:

bench/RESULTS.md

 with accuracy metrics



Current benchmarks:

- 10 refactors (should be UNSAT)

- 10 bugfixes (should find COUNTEREXAMPLE)

- 10 Copilot errors (should find COUNTEREXAMPLE)



---



🏗️ Architecture



1. Extract (

src/extract.ts

): Parse TypeScript functions with ts-morph

2. Symbolic (

src/symbolic.ts

): Lower to Z3 symbolic representation

3. Lower (

src/lower.ts

): Convert TS AST to Z3 constraints

4. Solve (

src/solve.ts

): Check equivalence with Z3, validate with sampling

5. Report (

src/report.ts

): Generate artifacts (JSON, markdown)



---



🎓 How It Works



$3

typescript

// OLD

export function calculateDiscount(price: number, tier: "gold" | "silver" | "none"): number {

  if (tier === "gold") return price * 0.9;

  return price;

}



// NEW

export function calculateDiscount(price: number, tier: "gold" | "silver" | "none"): number {

  if (tier === "gold") return price * 0.85;  // BUG: changed discount

  return price;

}





Semprove output:

json

{

  "status": "COUNTEREXAMPLE_FOUND",

  "counterexample": { "price": 100, "tier": "gold" },

  "oldRet": "90",

  "newRet": "85",

  "oldRetRuntime": 90,

  "newRetRuntime": 85,

  "runtimeDiverged": true

}





Verdict: Functions diverge at

{ price: 100, tier: "gold" }

 ✅



---



🚀 Roadmap



$3

Create 30 benchmark functions, validate accuracy > 90%



$3

Build installable GitHub Action with PR comment integration



$3

Work with 3 companies to validate on real codebases



---



🤝 Contributing



Contributions welcome! Focus areas:



- More benchmark cases (pricing, policy, tax logic)

- Lowering improvements (support more TS constructs)

- Performance (faster Z3 queries, caching)

- UX (better error messages, visualization)



---



📄 License



MIT



---



🙋 FAQ



$3

A: Not supported yet. Refactor into fixed-arity logic or extract loop-free helpers.



$3

A: No. Semprove targets pure business logic (pricing, policy, calculations).



$3

A: Property testing (fast-check) tests examples. Semprove does SMT-based equivalence checking within the supported fragment and bounds, or finds a concrete counterexample.



$3

A: In our dry run (20 functions, AMD Ryzen 9 5900X, 60s timeout): P95 ~1.3s, median ~200ms. Performance depends on function complexity, bounds, and hardware. Use

--timeoutMs

 to adjust.



$3

A: Yes! Use

semprove approve

 to mark expected changes in refactor mode.



---



Built with:

- Z3 (SMT solver)

- ts-morph (TypeScript AST)

- fast-check (Property testing)

- commander (CLI framework)



---



⚠️ Known Issues



$3

- Z3 runs in WebAssembly with ~2GB heap limit.

- Large functions (>50 variables) or very large bounds may hit this limit.

- Mitigation:

  - Reduce search bounds:

--min -1000 --max 1000



  - Use integer reasoning:

--no-realInputs

 (default)

  - Split complex functions into smaller helpers

  - Use

--timeoutMs 5000

 to fail fast instead of hanging



$3

- Use

-y flag with npx: npx -y semprove ...



- Long paths may require

--%

 escape in some shells

- Git worktrees work correctly (tested on Windows 11)



$3

- First run may be slower (WASM initialization)

- Performance scales with function complexity and bounds

- Recommendation: Start with smaller bounds (

--min, --max

) for exploration



---



Questions? Open an issue or reach out!



🔐 Security & Trust



- No network required for analysis.

- No telemetry (if ever added, it will be opt-in).



Next: Try

semprove pr` on your codebase 🚀
# Semprove

Semprove - SMT-Backed Semantic Diff for TypeScript

✅ Guarantees (read this first)

🔍 Supported Fragment (Beta)

🎯 What It Does

📦 Installation

Node LTS: Requires Node.js 20 or 22.

``

bash

npm install -g semprove





Windows users:

powershell

npx -y semprove@beta --help





---



🚀 Quick Start



$3

bash

semprove compare \

  --old src/pricing.old.ts \

  --new src/pricing.ts \

  --fn calculateDiscount





Output:

-

artifacts/result.json

: Verdict (UNSAT/COUNTEREXAMPLE/TIMEOUT)

-

artifacts/counterexample.json

: Exact input where functions diverge (if found)

-

artifacts/trace.json

: Old/new return values and event logs



$3

bash

semprove pr --base main --head feature-branch





Output:

-

artifacts/pr-summary.json

: CTO-friendly metrics (functions analyzed, verdicts, p95 time)

- Per-function artifacts in

artifacts/[file]/[function]/





---



🛡️ Production-minded Safeguards



$3



When Z3 proves equivalence (UNSAT), semprove now validates with fast-check sampling:

bash

semprove compare --old old.ts --new new.ts --fn myFunc --paranoid





- Normal mode: 200 sampling runs

- Paranoid mode (

--paranoid

): 2000 sampling runs

- Confidence field:

"unsat+sampled" vs "unsat-only"





Why? Sampling is a soundness smoke-test against lowering bugs; it cannot guarantee UNSAT is correct.



---



$3



Semprove explicitly refuses constructs that can diverge between Z3 and runtime:



❌

== / != operators (use === / !==

)  

❌ String concatenation with

❌

undefined / null

 keywords  

❌

NaN / Infinity

 identifiers  



Error message:



UNSUPPORTED: == operator (use === instead). 

Reason: Type coercion can diverge between Z3 model and runtime.





---



$3



For event-based logic (function calls), semprove classifies changes:



- Low: Only logging/metrics tags changed

- Med: Guard conditions changed

- High: Order/args changed on business logic



Field in result:

"eventDiffSeverity": "low" | "med" | "high"





---



$3



Default Safety Posture:

- Runtime replay and sampling are disabled by default when external calls are detected.

- This prevents accidental database writes, API calls, or file system mutations in CI.

- To override (at your own risk), pass

--allowSideEffects

.



Float Handling:

- Runtime replay treats near-equal numeric results as equal (relative epsilon) to avoid float-noise false positives.



---



$3



After

semprove pr

, get a CTO-friendly summary:

json

{

  "functionsAnalyzed": 12,

  "functionsSkipped": 3,

  "skippedReasons": { "UNSUPPORTED_FEATURE": 2, "TIMEOUT": 1 },

  "verdictCounts": { "unsat": 8, "found": 3, "approved": 2 },

  "runtimeP95Ms": 4500,

  "topDivergences": [

    {

      "fn": "calculateDiscount",

      "file": "src/pricing.ts",

      "counterexample": { "price": 100, "tier": "gold" },

      "oldRetRuntime": 90,

      "newRetRuntime": 85,

      "eventDiffSeverity": "high",

      "approved": false

    }

  ]

}





---



📏 Supported TypeScript Fragment



Semprove supports a tight, well-defined subset optimized for pure business logic:



✅ Arithmetic:

+, -, *, /

  

✅ Comparisons:

>, <, >=, <=, ===, !==

  

✅ Logical:

&&, ||, !

  

✅ Control flow:

if/else, switch/case

, ternary  

✅ Function calls (as uninterpreted functions)  



❌ Loops (

for, while

)  

❌ Arrays, objects with dynamic keys  

❌ Async/await, promises  

❌ Type coercions (

==, undefined, null, NaN

)  



Non-goal: general TypeScript / app code. Semprove targets deterministic pricing/policy/rules.



See: SUPPORTED_FRAGMENT.md for full details (~20 lines of language)



---



🎯 Presets



$3



For financial calculations (pricing, VAT, discounts):

bash

semprove compare --old old.ts --new new.ts --fn calculatePrice --money





- Inputs: integers (cents)

- Intermediates: rationals (supports

* 0.9, / 1.20

)

- Bounds: min/max to prevent overflow



---



🔧 Options



$3



| Option | Default | Description |

|--------|---------|-------------|

|

--timeoutMs | 60000

 | Z3 solver timeout (ms) |

|

--[no-]ints | true

 | Treat numbers as integers |

|

--realInputs | false

 | Treat numbers as reals (no integer constraint) |

|

--money | false

 | Preset for financial calculations |

|

--min | -1000000

 | Min bound for numeric inputs |

|

--max | 1000000

 | Max bound for numeric inputs |

|

--paranoid | false

 | Increase sampling runs for UNSAT (2000 vs 200) |

|

--nonBlockingTags | ""

 | Comma-separated event tags to ignore |

|

--out | ./artifacts

 | Output directory for artifacts |

|

--allowSideEffects | false

 | Allow runtime replay even if side effects detected |



$3



| Option | Default | Description |

|--------|---------|-------------|

|

--base | HEAD~1

 | Base git ref |

|

--head | HEAD

 | Head git ref |

|

--mode | refactor | refactor or bugfix

 |



---



📊 Commands



$3



Compare two specific functions:

bash

semprove compare --old old.ts --new new.ts --fn myFunction





$3



Analyze all changed functions in a PR:

bash

semprove pr --base main --head feature-branch --mode refactor





Generates:

-

artifacts/pr-summary.json



- Per-function artifacts



$3



Approve a known divergence (for refactor mode):

bash

semprove approve \

  --fn calculateDiscount \

  --from artifacts/src_pricing_ts/calculateDiscount/counterexample.json \

  --reason "Intentional discount adjustment for Q1 promo" \

  --approvedBy "john-doe"





Stores approval in

.semprove/approvals.json





---



📚 Documentation



- PRODUCTION_RISKS.md: 5 real production risks and mitigations

- SUPPORTED_FRAGMENT.md: Exact TypeScript subset supported

- PRODUCTION_READY.md: Implementation summary and next steps

- SPRINT_A_BENCH.md: Benchmark plan (30 test functions)

- SPRINT_B_GITHUB_ACTION.md: GitHub Action integration

- SPRINT_C_DESIGN_PARTNERS.md: Design partner script



---



🧪 Benchmarks



Run the benchmark suite:

bash

npm run build

node bench/run-bench.ts





Generates:

bench/RESULTS.md

 with accuracy metrics



Current benchmarks:

- 10 refactors (should be UNSAT)

- 10 bugfixes (should find COUNTEREXAMPLE)

- 10 Copilot errors (should find COUNTEREXAMPLE)



---



🏗️ Architecture



1. Extract (

src/extract.ts

): Parse TypeScript functions with ts-morph

2. Symbolic (

src/symbolic.ts

): Lower to Z3 symbolic representation

3. Lower (

src/lower.ts

): Convert TS AST to Z3 constraints

4. Solve (

src/solve.ts

): Check equivalence with Z3, validate with sampling

5. Report (

src/report.ts

): Generate artifacts (JSON, markdown)



---



🎓 How It Works



$3

typescript

// OLD

export function calculateDiscount(price: number, tier: "gold" | "silver" | "none"): number {

  if (tier === "gold") return price * 0.9;

  return price;

}



// NEW

export function calculateDiscount(price: number, tier: "gold" | "silver" | "none"): number {

  if (tier === "gold") return price * 0.85;  // BUG: changed discount

  return price;

}





Semprove output:

json

{

  "status": "COUNTEREXAMPLE_FOUND",

  "counterexample": { "price": 100, "tier": "gold" },

  "oldRet": "90",

  "newRet": "85",

  "oldRetRuntime": 90,

  "newRetRuntime": 85,

  "runtimeDiverged": true

}





Verdict: Functions diverge at

{ price: 100, tier: "gold" }

 ✅



---



🚀 Roadmap



$3

Create 30 benchmark functions, validate accuracy > 90%



$3

Build installable GitHub Action with PR comment integration



$3

Work with 3 companies to validate on real codebases



---



🤝 Contributing



Contributions welcome! Focus areas:



- More benchmark cases (pricing, policy, tax logic)

- Lowering improvements (support more TS constructs)

- Performance (faster Z3 queries, caching)

- UX (better error messages, visualization)



---



📄 License



MIT



---



🙋 FAQ



$3

A: Not supported yet. Refactor into fixed-arity logic or extract loop-free helpers.



$3

A: No. Semprove targets pure business logic (pricing, policy, calculations).



$3

A: Property testing (fast-check) tests examples. Semprove does SMT-based equivalence checking within the supported fragment and bounds, or finds a concrete counterexample.



$3

A: In our dry run (20 functions, AMD Ryzen 9 5900X, 60s timeout): P95 ~1.3s, median ~200ms. Performance depends on function complexity, bounds, and hardware. Use

--timeoutMs

 to adjust.



$3

A: Yes! Use

semprove approve

 to mark expected changes in refactor mode.



---



Built with:

- Z3 (SMT solver)

- ts-morph (TypeScript AST)

- fast-check (Property testing)

- commander (CLI framework)



---



⚠️ Known Issues



$3

- Z3 runs in WebAssembly with ~2GB heap limit.

- Large functions (>50 variables) or very large bounds may hit this limit.

- Mitigation:

  - Reduce search bounds:

--min -1000 --max 1000



  - Use integer reasoning:

--no-realInputs

 (default)

  - Split complex functions into smaller helpers

  - Use

--timeoutMs 5000

 to fail fast instead of hanging



$3

- Use

-y flag with npx: npx -y semprove ...



- Long paths may require

--%

 escape in some shells

- Git worktrees work correctly (tested on Windows 11)



$3

- First run may be slower (WASM initialization)

- Performance scales with function complexity and bounds

- Recommendation: Start with smaller bounds (

--min, --max

) for exploration



---



Questions? Open an issue or reach out!



🔐 Security & Trust



- No network required for analysis.

- No telemetry (if ever added, it will be opt-in).



Next: Try

semprove pr` on your codebase 🚀
# Semprove