# openchem

A fast TypeScript / JavaScript chemistry toolkit for working with molecular structures: parsing & generation (SMILES, MOL, SDF), canonicalization, pattern matching (SMARTS), 2D rendering, molecular descriptors, and structural analysis.

openchem is a production-ready, TypeScript-first cheminformatics library that runs in both the browser and Node.js and keeps a small runtime footprint.

## Features

### Formats and Identifiers

- SMILES — Parse and generate canonical SMILES with full stereochemistry
- MOL files — V2000/V3000 format support with 2D coordinate generation
- SDF files — Multi-molecule files with property data
- InChI — Generate InChI and InChIKey identifiers
- IUPAC names — Bidirectional IUPAC ↔ SMILES conversion

### Structure Analysis

- Pattern matching — SMARTS substructure search
- Fingerprints — Morgan (ECFP) fingerprints with Tanimoto similarity
- Murcko scaffolds — Extract core scaffolds, generic frameworks, scaffold trees
- Tautomers — Complete enumeration (25 rules, 100% RDKit coverage) with RDKit-compatible scoring
- Ring systems — SSSR detection, fused/spiro/bridged classification
- Aromaticity — Hückel rule perception and kekulization
- Symmetry — Canonical ordering via modified Morgan algorithm
- Stereochemistry — Full support for tetrahedral centers, E/Z bonds, extended chirality

### Molecular Properties

- Basic — Formula, mass, atom/bond counts
- Structural — Valence electrons, amide bonds, spiro/bridgehead atoms, ring classifications
- Stereochemistry — Specified and unspecified stereocenter counting
- Drug-likeness — Lipinski's Rule of Five, Veber rules, BBB penetration
- Descriptors — TPSA, LogP, rotatable bonds, H-bond donors/acceptors
- Ring analysis — Saturated/aliphatic/heterocyclic ring counts

### Visualization

- 2D rendering — Publication-quality SVG with automatic layout
- Smart positioning — Overlap-aware fused ring placement
- Stereochemistry display — Wedge/hash bonds for chirality
- Customizable — Element colors, bond styles, canvas size

### Quality and Performance

- ⚡ Fast — Optimized coordinate generation, CSR graph for O(1) lookups
- 🔬 Accurate — 100% RDKit agreement on canonical SMILES (325/325 molecules)
- ✅ Well-tested — 2,093 passing tests including bulk RDKit comparisons
- 🎯 Production-ready — Used with real drugs, natural products, edge cases
- 📦 Lightweight — Minimal dependencies, works in browser and Node.js
- 🔒 TypeScript-first — Full type safety with excellent IDE support

## Quick Start

```bash
npm install openchem
# or: bun add openchem
```

```typescript
import { parseSMILES, renderSVG, Descriptors } from "openchem";

// Parse a molecule
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O").molecules[0];

// Render as SVG
const svg = renderSVG(aspirin);
console.log(svg.svg); // SVG markup ready for display

// Get all molecular properties at once
const props = Descriptors.all(aspirin);
console.log(props.formula); // "C9H8O4"
console.log(props.mass); // 180.16
console.log(props.logP); // 1.19
console.log(props.lipinskiPass); // true - aspirin is drug-like!

// Or get specific categories
const drugLike = Descriptors.drugLikeness(aspirin);
console.log(drugLike.lipinski.passes); // true
console.log(drugLike.lipinski.violations); // []
```

## HTML Playground

openchem includes an interactive HTML playground for testing SMILES parsing, molecular visualization, and descriptor calculation:

```bash
# Build the browser bundle and start a local server
bun run serve

# Then open http://localhost:3000/smiles-playground.html in your browser
```


The playground provides:
- 2D Structure Visualization — Clean SVG rendering of molecular structures
- Molecular Descriptors — Formula, mass, TPSA, rotatable bonds, etc.
- Drug-Likeness Checks — Lipinski's Rule of Five, Veber rules, BBB penetration
- Interactive Examples — Pre-loaded molecules like aspirin, caffeine, ibuprofen
The playground automatically detects if the full openchem library is available and falls back to approximate calculations if needed.

Note: The playground must be served over HTTP because browsers block ES module imports from pages opened directly from disk. Run `bun run serve` to start a local server, then open http://localhost:3000/smiles-playground.html in your browser.

## Model Context Protocol (MCP) Server

The MCP server for AI assistant integration is now available as a separate package: `@openchem/mcp`

### Installation

```bash
# Install MCP server
npm install -g @openchem/mcp

# Start server
openchem-mcp

# Server runs on http://localhost:3000
```

### Claude Desktop Configuration

Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "openchem": {
      "url": "http://localhost:3000/mcp"
    }
  }
}
```

Restart Claude Desktop and try: _"Analyze aspirin using SMILES CC(=O)Oc1ccccc1C(=O)O"_

### Available Tools

- `analyze` - Complete molecular analysis (40+ descriptors, drug-likeness, IUPAC name, optional rendering)
- `compare` - Molecular similarity (Morgan fingerprints, Tanimoto similarity, property comparison)
- `search` - Substructure matching (SMARTS patterns with match counts and indices)
- `render` - 2D structure visualization (publication-quality SVG)
- `convert` - Format conversion (canonical SMILES, IUPAC names, Murcko scaffolds)

### Documentation

- @openchem/mcp Package - Full MCP server documentation
- MCP Integration Guide - Complete integration guide (Claude Desktop, custom clients, deployment)
- MCP Server Reference - API documentation, tool schemas, examples

## Code Examples

```typescript
import {
  parseSMILES,
  generateSMILES,
  parseMolfile,
  generateMolfile,
  parseSDF,
  writeSDF,
  generateInChI,
  generateInChIKey,
} from "openchem";

// Parse SMILES into molecule structure
const result = parseSMILES("CC(=O)O"); // acetic acid
console.log(result.molecules[0].atoms.length); // 4 atoms
console.log(result.molecules[0].bonds.length); // 3 bonds

// Generate canonical SMILES
const canonical = generateSMILES(result.molecules[0]);
console.log(canonical); // "CC(=O)O"

// Parse MOL file (V2000 is a fixed-width format, so column spacing matters)
const molContent = `acetic acid
  openchem

  4  3  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500   -1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  2  0  0  0  0
  2  4  1  0  0  0  0
M  END
`;
const molResult = parseMolfile(molContent);
console.log(generateSMILES(molResult.molecule!)); // "CC(=O)O"

// Generate MOL file from SMILES
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
const molfile = generateMolfile(aspirin.molecules[0], { title: "aspirin" });
console.log(molfile); // Full MOL file with coordinates

// Parse SDF file
const sdfContent = `
  Mrv2311 02102409422D

  3  2  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END
>  <ID>
MOL001

>  <NAME>
Ethanol

$$$$
`;
const sdfResult = parseSDF(sdfContent);
console.log(sdfResult.records[0].molecule?.atoms.length); // 3
console.log(sdfResult.records[0].properties.NAME); // "Ethanol"

// Generate InChI from molecule
const inchi = await generateInChI(aspirin.molecules[0]);
console.log(inchi); // "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"

// Generate InChIKey
const inchikey = await generateInChIKey(inchi);
console.log(inchikey); // "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
```

### Fingerprints and Similarity

```typescript
import { parseSMILES, computeMorganFingerprint, tanimotoSimilarity } from 'openchem';

// Generate fingerprints for similarity comparison
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const ibuprofen = parseSMILES('CC(C)Cc1ccc(cc1)C(C)C(=O)O');

const fp1 = computeMorganFingerprint(aspirin.molecules[0], 2, 512);
const fp2 = computeMorganFingerprint(ibuprofen.molecules[0], 2, 512);

// Calculate structural similarity
const similarity = tanimotoSimilarity(fp1, fp2);
console.log(`Similarity: ${(similarity * 100).toFixed(1)}%`); // ~45.2%
```
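Tanimoto similarity is the number of bits set in both fingerprints divided by the number of bits set in either. The sketch below illustrates that formula on plain 0/1 arrays. The `tanimoto` helper and the array representation are assumptions made for this example, not openchem's internal implementation.

```typescript
// Hypothetical helper: Tanimoto coefficient for two equal-length bit vectors.
// Computes |A ∩ B| / |A ∪ B|, defined here as 0 when neither vector has a bit set.
function tanimoto(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) {
    throw new Error("fingerprints must have the same length");
  }
  let both = 0;
  let either = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] !== 0;
    const y = b[i] !== 0;
    if (x && y) both++;
    if (x || y) either++;
  }
  return either === 0 ? 0 : both / either;
}

const fpA = [1, 1, 0, 1, 0, 0, 1, 0];
const fpB = [1, 0, 0, 1, 1, 0, 1, 0];
console.log(tanimoto(fpA, fpB)); // 0.6 (3 shared bits / 5 bits set in either)
```

Identical fingerprints score 1 and fingerprints with no shared bits score 0, which is why the value is often reported as a percentage.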

### Murcko Scaffolds

Extract core molecular scaffolds for drug discovery and compound classification:

```typescript
import {
  parseSMILES,
  getMurckoScaffold,
  getBemisMurckoFramework,
  haveSameScaffold,
  generateSMILES,
} from "openchem";

// Extract scaffold (rings + linkers, remove side chains)
const ibuprofen = parseSMILES("CC(C)Cc1ccc(cc1)C(C)C(=O)O").molecules[0];
const scaffold = getMurckoScaffold(ibuprofen);
console.log(generateSMILES(scaffold)); // "c1ccc(cc1)" - benzene core

// Get generic framework (all atoms → carbon, all bonds → single)
const framework = getBemisMurckoFramework(ibuprofen);
console.log(generateSMILES(framework)); // "C1CCCCC1" - cyclohexane

// Compare scaffolds of similar drugs
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O").molecules[0];
console.log(haveSameScaffold(ibuprofen, aspirin)); // true - both have benzene scaffold
```

Applications:

- Compound library classification
- Lead series identification
- Scaffold hopping strategies
- Fragment-based drug design

### Tautomer Enumeration

Enumerate and score tautomers (keto-enol, imine-enamine, amide-imidol, etc.) with RDKit-compatible scoring:

```typescript
import { parseSMILES, enumerateTautomers, canonicalTautomer, generateSMILES } from "openchem";

// Enumerate tautomers for acetylacetone (pentane-2,4-dione)
const mol = parseSMILES("CC(=O)CC(=O)C").molecules[0];
const tautomers = enumerateTautomers(mol, { maxTautomers: 16 });

console.log(`Found ${tautomers.length} tautomers:`);
tautomers.forEach((t, i) => {
  console.log(`${i + 1}. ${t.smiles} (score: ${t.score})`);
});

// Get canonical tautomer (highest scoring)
const canonical = canonicalTautomer(mol);
console.log(`Canonical: ${generateSMILES(canonical)}`);
```

Supported tautomer types (26 rules, 100% RDKit coverage):

- 1,3 and 1,5 keto-enol (carbonyl ↔ enol, conjugated systems)
- Imine-enamine (C=N ↔ C-NH, including aromatic special cases)
- 1,5/1,7/1,9/1,11 aromatic heteroatom H shift (pyrrole, indole, large heterocycles)
- Furanone (lactone tautomerism in 5-membered rings)
- Amide-imidol (N-C=O ↔ N=C-OH)
- Lactam-lactim (cyclic amide ↔ cyclic imidate)
- Nitro-aci-nitro, nitroso-oxime, oxime/nitroso via phenol
- Thione-thiol (C=S ↔ C-SH)
- Guanidine, tetrazole, imidazole (heterocycle tautomerism)
- Phosphonic acid, sulfoxide (P/S heteroatom shifts)
- Edge cases: ketene/ynol, cyano/isocyanic acid, formamidinesulfinic acid, isocyanide

Scoring system (RDKit-compatible):

- +250 per all-carbon aromatic ring (benzene)
- +100 per heteroaromatic ring (pyridine)
- +25 for benzoquinone patterns
- +4 for oximes (C=N-OH)
- +2 for carbonyls (C=O, N=O, P=O)
- -10 per formal charge
- -4 for aci-nitro forms
- -1 per hydrogen on P, S, Se, Te
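The weights above combine into a single additive score. The sketch below shows that arithmetic under the assumption that the structural features have already been counted. The `TautomerFeatures` shape and `scoreTautomer` function are hypothetical names for illustration and are not part of openchem's API.

```typescript
// Hypothetical feature counts for one tautomer (how they are detected is out of scope here).
interface TautomerFeatures {
  carbocyclicAromaticRings: number; // all-carbon aromatic rings
  heteroaromaticRings: number;
  benzoquinonePatterns: number;
  oximes: number;
  carbonyls: number; // C=O, N=O, P=O
  formalCharges: number;
  aciNitroGroups: number;
  hydrogensOnPSSeTe: number; // hydrogens attached to P, S, Se, or Te
}

// Apply the listed weights; higher scores indicate the preferred tautomer.
function scoreTautomer(f: TautomerFeatures): number {
  return (
    250 * f.carbocyclicAromaticRings +
    100 * f.heteroaromaticRings +
    25 * f.benzoquinonePatterns +
    4 * f.oximes +
    2 * f.carbonyls -
    10 * f.formalCharges -
    4 * f.aciNitroGroups -
    1 * f.hydrogensOnPSSeTe
  );
}

const none: TautomerFeatures = {
  carbocyclicAromaticRings: 0,
  heteroaromaticRings: 0,
  benzoquinonePatterns: 0,
  oximes: 0,
  carbonyls: 0,
  formalCharges: 0,
  aciNitroGroups: 0,
  hydrogensOnPSSeTe: 0,
};

// Phenol (one benzene ring) versus its keto form, cyclohexa-2,4-dienone (one carbonyl).
const phenolScore = scoreTautomer({ ...none, carbocyclicAromaticRings: 1 });
const ketoScore = scoreTautomer({ ...none, carbonyls: 1 });
console.log(phenolScore, ketoScore); // 250 2
```

Because aromatic rings dominate the weights, the aromatic phenol form outscores its keto tautomer by a wide margin.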

Applications:

- Compound standardization for databases
- Virtual screening preparation
- pKa prediction support
- Tautomer-aware structure searching

### 2D Rendering

```typescript
import { parseSMILES, renderSVG } from "openchem";

// Render molecule as SVG
const caffeine = parseSMILES("CN1C=NC2=C1C(=O)N(C(=O)N2C)C");
const svgResult = renderSVG(caffeine.molecules[0], {
  width: 300,
  height: 200,
  showCarbonLabels: false,
  bondLength: 30,
});

console.log(svgResult.svg); // Complete SVG markup
console.log(`Canvas: ${svgResult.width}x${svgResult.height}`); // "300x200"
```

## Testing & RDKit Comparison

openchem has an extensive test suite (unit, integration, and RDKit comparison tests) that exercises parsing, generation, file round-trips, stereochemistry, aromatic perception, and molecular properties. Rather than rely on fragile hard-coded counts in the README, the project keeps comprehensive automated tests in the `test/` folder and runs RDKit parity checks as part of the comparison test suite when RDKit is available.

Highlights:

- Broad unit and integration coverage across parsers, generators, utils, and validators
- RDKit comparison tests for canonical SMILES and round-trip fidelity (these run when RDKit is available in the test environment)
- Tests are designed to be self-contained and to skip RDKit-specific checks when RDKit isn't present in the environment

For maintainers: update and run the test suite with `bun test`. Use `RUN_RDKIT_BULK=1` to enable the heavier RDKit bulk comparisons when you have RDKit available.

## Validation

openchem maintains broad automated test coverage across unit, integration, and RDKit comparison tests. The `test/` directory contains the authoritative suite; maintainers can run `bun test` locally and enable the heavier RDKit comparison runs with `RUN_RDKIT_BULK=1` when RDKit is available. Tests are designed to validate parsing, generation, round-tripping, stereochemistry, aromatic perception, and molecular properties without requiring hard-coded counts in the README.

## Installation

```bash
npm install openchem
# or
bun add openchem
# or
pnpm add openchem
```

## Usage

### Examples

For comprehensive working examples, see:

- `docs/examples/comprehensive-example.ts` - All major features (SMILES, properties, IUPAC, InChI, SVG, SMARTS, fingerprints)
- `docs/examples/example-iupac.ts` - IUPAC name generation and parsing (both directions)
- `docs/examples/example-aromaticity.ts` - Aromaticity perception using Hückel's rule
- `docs/examples/example-drug-likeness.ts` - Drug-likeness assessment (Lipinski, Veber, BBB)
- `docs/examples/example-murcko-scaffolds.ts` - Murcko scaffold extraction and analysis
- `docs/examples/example-tautomers.ts` - Tautomer enumeration and canonical selection
- `docs/examples/example-sdf-export.ts` - SDF file generation

Run any example:

```bash
bun run docs/examples/comprehensive-example.ts
```

### Heavy RDKit Comparison Tests

The repository contains two long-running RDKit comparison tests (the 10k SMILES suite and the bulk 300-SMILES suite). These tests are skipped by default to keep regular test runs fast.

To run them, set the `RUN_RDKIT_BULK` environment variable:

```bash
# Run heavy RDKit comparisons (rdkit-10k and rdkit-bulk)
RUN_RDKIT_BULK=1 bun test
```

Add `RUN_VERBOSE=1` for more detailed RDKit reporting during the run.

### Parsing SMILES

```typescript
import { parseSMILES } from "openchem";

// Simple molecule
const ethanol = parseSMILES("CCO");
console.log(ethanol.molecules[0].atoms.length); // 3

// Check for errors
const result = parseSMILES("invalid");
if (result.errors.length > 0) {
  console.error("Parse errors:", result.errors);
}

// Complex molecule with stereochemistry
const lAlanine = parseSMILES("C[C@H](N)C(=O)O");
const chiralCenter = lAlanine.molecules[0].atoms.find((a) => a.chiral);
console.log(chiralCenter?.chiral); // '@'
```

### Molecular Properties

openchem provides comprehensive molecular property calculations for drug discovery and cheminformatics applications.

#### Basic Properties

```typescript
import { parseSMILES, getMolecularFormula, getMolecularMass, getExactMass } from "openchem";

const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
const mol = aspirin.molecules[0];

// Get molecular formula (Hill notation)
const formula = getMolecularFormula(mol);
console.log(formula); // "C9H8O4"

// Get molecular mass (average atomic masses)
const mass = getMolecularMass(mol);
console.log(mass); // ~180.16

// Get exact mass (most abundant isotope)
const exactMass = getExactMass(mol);
console.log(exactMass); // ~180.042
```
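The difference between average and exact mass comes from which per-element mass is summed. The sketch below works this out from an element composition alone, along with Hill ordering. `hillFormula`, `massOf`, and the two small mass tables are assumptions for this illustration (standard atomic weights and most-abundant-isotope masses for four elements) and are not openchem code.

```typescript
// Standard atomic weights (average) and monoisotopic masses for a few elements.
const AVERAGE_MASS: Record<string, number> = { C: 12.011, H: 1.008, N: 14.007, O: 15.999 };
const MONOISOTOPIC_MASS: Record<string, number> = {
  C: 12.0,
  H: 1.00782503,
  N: 14.00307401,
  O: 15.99491462,
};

type Composition = Record<string, number>;

// Hill notation: C first, H second, then the other elements alphabetically.
// Without carbon, all elements (including H) are sorted alphabetically.
function hillFormula(comp: Composition): string {
  const entries = Object.entries(comp).filter(([, n]) => n > 0);
  const hasCarbon = entries.some(([el]) => el === "C");
  const rank = (el: string): number => (!hasCarbon ? 2 : el === "C" ? 0 : el === "H" ? 1 : 2);
  entries.sort(([a], [b]) => rank(a) - rank(b) || a.localeCompare(b));
  return entries.map(([el, n]) => (n === 1 ? el : `${el}${n}`)).join("");
}

// Sum per-element masses from the chosen table.
function massOf(comp: Composition, table: Record<string, number>): number {
  let total = 0;
  for (const [el, n] of Object.entries(comp)) {
    const m = table[el];
    if (m === undefined) throw new Error(`no mass for element ${el}`);
    total += m * n;
  }
  return total;
}

const aspirinComposition: Composition = { O: 4, H: 8, C: 9 };
console.log(hillFormula(aspirinComposition)); // "C9H8O4"
console.log(massOf(aspirinComposition, AVERAGE_MASS).toFixed(2)); // "180.16"
console.log(massOf(aspirinComposition, MONOISOTOPIC_MASS).toFixed(3)); // "180.042"
```

The average mass is the value to compare against a weighed sample, while the exact mass is the one that matches a mass spectrometry peak.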

#### Atom Counts and Structure

```typescript
import {
  parseSMILES,
  getHeavyAtomCount,
  getHeteroAtomCount,
  getRingCount,
  getAromaticRingCount,
  getRingInfo,
} from "openchem";

const ibuprofen = parseSMILES("CC(C)Cc1ccc(cc1)C(C)C(=O)O");
const mol = ibuprofen.molecules[0];

// Count heavy atoms (non-hydrogen)
console.log(getHeavyAtomCount(mol)); // 15 (13 carbons + 2 oxygens)

// Count heteroatoms (N, O, S, P, halogens, etc.)
console.log(getHeteroAtomCount(mol)); // 2

// Count total rings
console.log(getRingCount(mol)); // 1

// Count aromatic rings
console.log(getAromaticRingCount(mol)); // 1

// Get comprehensive ring information
const ringInfo = getRingInfo(mol);
console.log(ringInfo.numRings()); // 1
console.log(ringInfo.rings()); // [[6,7,8,9,10,11]] - atom IDs in the ring
```

#### Drug-Likeness Properties

```typescript
import {
  parseSMILES,
  getFractionCSP3,
  getHBondDonorCount,
  getHBondAcceptorCount,
  getTPSA,
} from "openchem";

const caffeine = parseSMILES("CN1C=NC2=C1C(=O)N(C(=O)N2C)C");
const mol = caffeine.molecules[0];

// Fraction of sp3 carbons (structural complexity)
console.log(getFractionCSP3(mol)); // 0.375 (3 methyl carbons out of 8 carbons)

// H-bond donors (N-H, O-H)
console.log(getHBondDonorCount(mol)); // 0

// H-bond acceptors (N, O atoms)
console.log(getHBondAcceptorCount(mol)); // 6

// Topological polar surface area (Ų)
// Critical for predicting oral bioavailability and BBB penetration
console.log(getTPSA(mol)); // 61.82
```

#### TPSA for Drug Design

TPSA (Topological Polar Surface Area) is essential for predicting drug properties:

```typescript
import { parseSMILES, getTPSA } from "openchem";

// Oral bioavailability: TPSA < 140 Ų
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
console.log(getTPSA(aspirin.molecules[0])); // 63.60 ✓ Good oral availability

// Blood-brain barrier penetration: TPSA < 90 Ų
const morphine = parseSMILES("CN1CC[C@]23[C@@H]4[C@H]1CC5=C2C(=C(C=C5)O)O[C@H]3[C@H](C=C4)O");
console.log(getTPSA(morphine.molecules[0])); // 52.93 ✓ CNS-active
```

#### Drug-Likeness Rule Checkers

```typescript
import {
  parseSMILES,
  checkLipinskiRuleOfFive,
  checkVeberRules,
  checkBBBPenetration,
} from "openchem";

// Lipinski's Rule of Five (oral drug-likeness)
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
const lipinski = checkLipinskiRuleOfFive(aspirin.molecules[0]);
console.log(lipinski.passes); // true
console.log(lipinski.properties);
// { molecularWeight: 180.04, hbondDonors: 1, hbondAcceptors: 4, logP: 1.31 }

// Veber Rules (oral bioavailability)
const veber = checkVeberRules(aspirin.molecules[0]);
console.log(veber.passes); // true
console.log(veber.properties); // { rotatableBonds: 3, tpsa: 63.60 }

// Blood-brain barrier penetration prediction
const caffeine = parseSMILES("CN1C=NC2=C1C(=O)N(C(=O)N2C)C");
const bbb = checkBBBPenetration(caffeine.molecules[0]);
console.log(bbb.likelyPenetration); // true (TPSA: 61.82 < 90)
```
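For readers unfamiliar with the rules themselves, the sketch below spells out the commonly cited thresholds as plain functions over precomputed descriptor values. `RuleInput`, `lipinskiViolations`, and `passesVeber` are hypothetical names for this illustration. openchem's own checkers may apply different cutoffs or tolerate a violation, so treat this as a reading aid rather than a description of the library.

```typescript
// Hypothetical input shape: descriptor values computed elsewhere.
interface RuleInput {
  molecularWeight: number; // Da
  logP: number;
  hbondDonors: number;
  hbondAcceptors: number;
  rotatableBonds: number;
  tpsa: number; // square angstroms
}

// Lipinski's Rule of Five: list every threshold the molecule exceeds.
function lipinskiViolations(m: RuleInput): string[] {
  const violations: string[] = [];
  if (m.molecularWeight > 500) violations.push("molecular weight > 500 Da");
  if (m.logP > 5) violations.push("logP > 5");
  if (m.hbondDonors > 5) violations.push("more than 5 H-bond donors");
  if (m.hbondAcceptors > 10) violations.push("more than 10 H-bond acceptors");
  return violations;
}

// Veber rules: at most 10 rotatable bonds and a TPSA of at most 140.
function passesVeber(m: RuleInput): boolean {
  return m.rotatableBonds <= 10 && m.tpsa <= 140;
}

// Approximate descriptor values for aspirin.
const aspirinValues: RuleInput = {
  molecularWeight: 180.16,
  logP: 1.31,
  hbondDonors: 1,
  hbondAcceptors: 4,
  rotatableBonds: 3,
  tpsa: 63.6,
};

console.log(lipinskiViolations(aspirinValues)); // []
console.log(passesVeber(aspirinValues)); // true
```

Returning the list of violations instead of a bare boolean makes it easy to report why a molecule fails.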

### SMILES Generation

```typescript
import { parseSMILES, generateSMILES } from "openchem";

// Generate canonical SMILES (default)
const input = "CC(C)CC";
const parsed = parseSMILES(input);
const canonical = generateSMILES(parsed.molecules[0]);
console.log(canonical); // "CCC(C)C" - canonicalized

// Stereo normalization matches RDKit
const trans1 = parseSMILES("C\\C=C\\C"); // trans (down markers)
console.log(generateSMILES(trans1.molecules[0])); // "C/C=C/C" - normalized to up markers

const trans2 = parseSMILES("C/C=C/C"); // trans (up markers)
console.log(generateSMILES(trans2.molecules[0])); // "C/C=C/C" - already normalized

// Generate simple (non-canonical) SMILES
const simple = generateSMILES(parsed.molecules[0], false);
console.log(simple); // "CC(C)CC" - preserves input order

// Explicit canonical generation
const explicitCanonical = generateSMILES(parsed.molecules[0], true);
console.log(explicitCanonical); // "CCC(C)C"

// Handle multiple disconnected molecules
const mixture = parseSMILES("CCO.O"); // ethanol + water
const output = generateSMILES(mixture.molecules);
console.log(output); // "CCO.O"
```

### SVG Rendering

Render molecules as 2D SVG structures with automatic coordinate generation. openchem provides deterministic layouts, fast performance, and excellent handling of rings, branches, and terminal atoms.

#### Basic SVG Rendering

```typescript
import { parseSMILES, renderSVG } from "openchem";

// Render from parsed molecule
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
const result = renderSVG(aspirin.molecules[0]);
console.log(result.svg); // SVG string ready for display
console.log(result.width); // Canvas width
console.log(result.height); // Canvas height

// Or render directly from SMILES (if parsing is included)
const renderResult = renderSVG("CCO");
if (renderResult.errors.length === 0) {
  console.log(renderResult.svg);
}

// Render multiple molecules in a grid
const molecules = [
  parseSMILES("CC(=O)O").molecules[0],
  parseSMILES("CCO").molecules[0],
  parseSMILES("CC(C)C").molecules[0],
];
const gridResult = renderSVG(molecules);
console.log(gridResult.svg); // Multi-molecule grid
```

#### SVG Rendering Options

```typescript
import { parseSMILES, renderSVG } from "openchem";
import type { SVGRendererOptions } from "openchem";

const benzene = parseSMILES("c1ccccc1");
const mol = benzene.molecules[0];

const options: SVGRendererOptions = {
  // Canvas sizing
  width: 400,
  height: 400,
  padding: 20,

  // Bond styling
  bondLineWidth: 2,
  bondLength: 40,
  bondColor: "#000000",

  // Atom & text styling
  fontSize: 14,
  fontFamily: "Arial, sans-serif",
  showCarbonLabels: false, // Hide C labels for cleaner appearance
  showImplicitHydrogens: false, // Hide implicit hydrogens

  // Color mapping by element
  atomColors: {
    C: "#222222",
    N: "#3050F8",
    O: "#FF0D0D",
    S: "#E6C200",
    F: "#50FF50",
    Cl: "#1FF01F",
    Br: "#A62929",
    I: "#940094",
  },

  // Background
  backgroundColor: "#FFFFFF",

  // Stereochemistry display
  showStereoBonds: true,

  // Layout & coordinate generation
  kekulize: true, // Convert aromatic to alternating single/double bonds (default: true)
  moleculeSpacing: 60, // Spacing between molecules in grid layouts
};

const result = renderSVG(mol, options);
console.log(result.svg); // Custom-styled SVG
```

#### Using Pre-computed Coordinates

```typescript
import { parseSMILES, renderSVG } from "openchem";

const ethanol = parseSMILES("CCO");
const mol = ethanol.molecules[0];

// Provide your own atom coordinates (useful for custom layouts)
const customCoords = [
  { x: 0, y: 0 }, // C
  { x: 40, y: 0 }, // C
  { x: 80, y: 0 }, // O
];

const result = renderSVG(mol, {
  atomCoordinates: customCoords,
  width: 200,
  height: 100,
});

console.log(result.svg);
```

#### Coordinate Generation Features

openchem's coordinate generator provides:

- Deterministic layouts - Same molecule always produces same coordinates
- Fast performance - Optimized for speed and quality
- Perfect terminal atom placement - OH, NH₂, and other terminal groups extend radially
- Ring system detection - Automatically detects and regularizes 5/6-membered rings, fused rings, spiro, and bridged systems
- Zero atom overlaps - Intelligent substituent placement prevents collisions
- Publication-quality output - Clean, chemically accurate 2D structures

```typescript
import { parseSMILES, renderSVG } from "openchem";

// Complex fused ring system
const naphthalene = parseSMILES("c1ccc2ccccc2c1");
const result = renderSVG(naphthalene.molecules[0], {
  width: 300,
  height: 300,
  bondLength: 35,
});

console.log(result.svg);
```
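Ring regularization generally means placing ring atoms on a regular polygon whose edge equals the bond length. The sketch below shows that geometry for a single isolated ring. `regularRing` is a hypothetical helper written for this illustration, and openchem's layout engine (fused, spiro, and bridged handling in particular) is certainly more involved.

```typescript
interface Point {
  x: number;
  y: number;
}

// Place n ring atoms on a regular polygon whose edge length equals bondLength.
// The circumradius follows from the chord formula: edge = 2 * R * sin(pi / n).
function regularRing(n: number, bondLength: number): Point[] {
  if (n < 3) throw new Error("a ring needs at least 3 atoms");
  const radius = bondLength / (2 * Math.sin(Math.PI / n));
  const points: Point[] = [];
  for (let i = 0; i < n; i++) {
    const angle = (2 * Math.PI * i) / n;
    points.push({ x: radius * Math.cos(angle), y: radius * Math.sin(angle) });
  }
  return points;
}

// Distance between two points, used to check the bond lengths.
function distance(p: Point, q: Point): number {
  return Math.hypot(p.x - q.x, p.y - q.y);
}

const hexagon = regularRing(6, 35);
console.log(hexagon.length); // 6
console.log(distance(hexagon[0], hexagon[1]).toFixed(3)); // "35.000"
```

Because the same inputs always give the same coordinates, a polygon-based placement like this is naturally deterministic.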

#### Error Handling

```typescript
import { renderSVG } from "openchem";

const result = renderSVG("C");
if (result.errors.length > 0) {
  console.error("SVG rendering errors:", result.errors);
} else {
  console.log(result.svg);
}
```

### SMARTS Pattern Matching

Match molecular patterns using SMARTS (SMILES Arbitrary Target Specification) notation.

```typescript
import { parseSMILES, parseSMARTS, matchSMARTS } from "openchem";

// Parse molecule and SMARTS pattern
const molecule = parseSMILES("CC(=O)Oc1ccccc1C(=O)O"); // aspirin
const pattern = parseSMARTS("[O;D1]"); // Oxygen with exactly one explicit connection

// Find matching atoms
const matches = matchSMARTS(molecule.molecules[0], pattern);
// Under standard SMARTS semantics this matches 3 atoms:
// the two carbonyl oxygens and the hydroxyl oxygen (the ester oxygen has degree 2)
console.log(matches.length);
console.log(matches); // One array of atom indices per match

// Example: Find aromatic rings
const aromaticRing = parseSMARTS("c1ccccc1"); // benzene pattern
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
const ringMatches = matchSMARTS(aspirin.molecules[0], aromaticRing);
console.log(ringMatches.length); // 1 (one benzene ring)

// Example: Find hydroxyl groups attached to carbon
const carboxylPattern = parseSMARTS("C[O;H1]"); // C-OH, as in COOH
const matches2 = matchSMARTS(aspirin.molecules[0], carboxylPattern);
console.log(matches2.length); // 1 (the carboxylic acid OH)

// Example: Find all heteroatoms
const heteroPattern = parseSMARTS("[!#6;!#1]"); // Any atom that is neither carbon nor hydrogen
const heteroMatches = matchSMARTS(aspirin.molecules[0], heteroPattern);
console.log(heteroMatches.length); // Number of heteroatoms
```

### Kekulization

Convert aromatic molecules to alternating single/double bond representations (Kekulé structures).

```typescript
import { parseSMILES, kekulize, generateSMILES, renderSVG } from "openchem";

// Parse aromatic molecule
const benzene = parseSMILES("c1ccccc1");
const mol = benzene.molecules[0];

// Convert to Kekulé structure
const kekuleMol = kekulize(mol);

// Generate SMILES from Kekulé form
const kekuleSMILES = generateSMILES(kekuleMol);
console.log(kekuleSMILES); // "C1=CC=CC=C1" or similar alternating structure

// SVG rendering automatically kekulizes (unless disabled)
const result = renderSVG(mol, {
  kekulize: true, // default: true
});
// Rendered SVG shows alternating single/double bonds
```

### LogP Calculation

Calculate LogP (partition coefficient) for predicting lipophilicity and membrane permeability.

```typescript
import { parseSMILES, computeLogP, crippenLogP } from "openchem";

const molecules = [
  "CC(=O)Oc1ccccc1C(=O)O", // aspirin
  "CC(C)Cc1ccc(cc1)C(C)C(=O)O", // ibuprofen
  "CC(=O)Nc1ccc(O)cc1", // acetaminophen
];

molecules.forEach((smiles) => {
  const mol = parseSMILES(smiles).molecules[0];

  // Wildman-Crippen method (more accurate)
  const logP = computeLogP(mol);
  console.log(`${smiles.substring(0, 10)}... LogP: ${logP.toFixed(2)}`);

  // Alternative: crippenLogP (alias)
  const logP2 = crippenLogP(mol);
  console.log(`  Crippen LogP: ${logP2.toFixed(2)}`);
});

// LogP guidelines for drug design
const caffeine = parseSMILES("CN1C=NC2=C1C(=O)N(C(=O)N2C)C");
const caffeineMol = caffeine.molecules[0];
const logpValue = computeLogP(caffeineMol);

console.log(`Caffeine LogP: ${logpValue.toFixed(2)}`);
if (logpValue > 5) {
  console.log("⚠️ High LogP - may have poor water solubility");
} else if (logpValue < 0) {
  console.log("✓ Good LogP - hydrophilic, good bioavailability");
} else {
  console.log("✓ Optimal LogP - good balance of lipophilicity and hydrophilicity");
}
```

### Molecule Structure

```typescript
import { parseSMILES, BondType } from "openchem";

const result = parseSMILES("C=C");
const mol = result.molecules[0];

// Access atoms
mol.atoms.forEach((atom) => {
  console.log(`${atom.symbol} (id: ${atom.id})`);
  console.log(`  Aromatic: ${atom.aromatic}`);
  console.log(`  Charge: ${atom.charge}`);
  console.log(`  Hydrogens: ${atom.hydrogens}`);
});

// Access bonds
mol.bonds.forEach((bond) => {
  console.log(`Bond ${bond.atom1}-${bond.atom2}`);
  console.log(`  Type: ${bond.type === BondType.DOUBLE ? "DOUBLE" : "SINGLE"}`);
});
```

## Running Tests

```bash
# Run all tests (includes RDKit comparisons)
bun test

# Run with Node.js
npm test

# Run specific test file
bun test test/parser.test.ts
```

Note: RDKit comparison tests require the `@rdkit/rdkit` package. Tests automatically skip RDKit validations if the package is unavailable. For full validation, run the tests with Node.js (RDKit's WebAssembly may not work in all Bun versions).

## API Reference

### Overview

openchem provides 40 functions organized into 8 categories:

Parsing & Generation (8)

- `parseSMILES` - Parse SMILES strings
- `generateSMILES` - Generate canonical/non-canonical SMILES
- `parseMolfile` - Parse MOL files (V2000/V3000)
- `generateMolfile` - Generate MOL files (V2000)
- `parseSDF` - Parse SDF files with properties
- `writeSDF` - Write SDF files with properties
- `generateInChI` - Generate InChI strings from molecules
- `generateInChIKey` - Generate InChIKey strings from molecules

Pattern Matching & Rendering (6)

- `renderSVG` - Render molecules as 2D SVG structures
- `parseSMARTS` - Parse SMARTS pattern strings
- `matchSMARTS` - Find SMARTS pattern matches in molecules
- `kekulize` - Convert aromatic to Kekulé structures
- `computeMorganFingerprint` - Generate Morgan fingerprints from molecules
- `tanimotoSimilarity` - Calculate Tanimoto similarity between fingerprints

Scaffold Analysis (5)

- `getMurckoScaffold` - Extract Murcko scaffold (rings + linkers)
- `getBemisMurckoFramework` - Generic scaffold (all C, single bonds)
- `getScaffoldTree` - Hierarchical scaffold decomposition
- `getGraphFramework` - Pure topology (all atoms → wildcard)
- `haveSameScaffold` - Compare two molecules' scaffolds

Tautomer Analysis (2)

- `enumerateTautomers` - Generate all tautomers with RDKit scoring
- `canonicalTautomer` - Select highest-scoring canonical tautomer

Basic Properties (3)

- `getMolecularFormula` - Hill notation formula
- `getMolecularMass` - Average molecular mass
- `getExactMass` - Exact mass (monoisotopic)

Lipophilicity (3)

- `computeLogP` - Wildman-Crippen partition coefficient
- `crippenLogP` - Alias for computeLogP
- `logP` - Alternative LogP calculation

Structural Properties (8)

- `getHeavyAtomCount` - Non-hydrogen atom count
- `getHeteroAtomCount` - Heteroatom count (N, O, S, etc.)
- `getRingCount` - Total ring count
- `getAromaticRingCount` - Aromatic ring count
- `getRingInfo` - Comprehensive ring information object
- `getFractionCSP3` - sp³ carbon fraction
- `getHBondDonorCount` - H-bond donor count
- `getHBondAcceptorCount` - H-bond acceptor count

Drug-Likeness (5)

- `getTPSA` - Topological polar surface area
- `getRotatableBondCount` - Rotatable bond count
- `checkLipinskiRuleOfFive` - Lipinski's Rule of Five
- `checkVeberRules` - Veber rules for bioavailability
- `checkBBBPenetration` - Blood-brain barrier prediction

---

### Function Reference

#### Parsing & Generation (6 functions)

##### parseSMILES(smiles: string): ParseResult

Parses a SMILES string into molecule structures.

Returns: ParseResult containing:

- `molecules: Molecule[]` - Array of parsed molecules
- `errors: string[]` - Parse/validation errors (empty if successful)

##### generateSMILES(input: Molecule | Molecule[], canonical?: boolean): string

Generates SMILES from molecule structure(s).

Parameters:

- `input` - Single molecule or array of molecules
- `canonical` - Generate canonical SMILES (default: true)

Returns: SMILES string (uses `.` to separate disconnected molecules)

Canonical SMILES features:

- RDKit-compatible atom ordering using modified Morgan algorithm
- Automatic E/Z double bond stereo normalization
- Deterministic output for identical molecules
- Preserves tetrahedral and double bond stereochemistry
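Morgan-style canonical ordering rests on iteratively refining atom classes until topologically equivalent atoms share a class and no further splits occur. The sketch below shows one simple variant of that refinement on a bare adjacency list. `refineClasses` and `rankOf` are hypothetical names, and this is a teaching sketch under simplifying assumptions (no elements, charges, bond orders, or tie-breaking), not the modified algorithm openchem or RDKit actually uses.

```typescript
// Map each distinct key to a dense integer rank (ranks follow sorted key order).
function rankOf(keys: string[]): number[] {
  const distinct = [...new Set(keys)].sort();
  return keys.map((k) => distinct.indexOf(k));
}

// Refine atom classes: start from atom degree, then repeatedly combine each
// atom's class with the sorted classes of its neighbors until nothing splits.
function refineClasses(adjacency: number[][]): number[] {
  let ranks = rankOf(adjacency.map((nbrs) => String(nbrs.length)));
  for (;;) {
    const keys = adjacency.map((nbrs, i) => {
      const around = nbrs.map((j) => ranks[j]).sort((a, b) => a - b);
      return `${ranks[i]}|${around.join(",")}`;
    });
    const next = rankOf(keys);
    // Each key contains the atom's old class, so classes can only split.
    if (new Set(next).size === new Set(ranks).size) return ranks;
    ranks = next;
  }
}

// 2-methylbutane, CC(C)CC: atom 1 is the branch point, atoms 0 and 2 are its two methyls.
const isopentane = [[1], [0, 2, 3], [1], [1, 4], [3]];
console.log(refineClasses(isopentane)); // [1, 3, 1, 2, 0]
```

Atoms 0 and 2 end in the same class because they are symmetry-equivalent, while the terminal methyl at atom 4 is separated from them after one refinement round. A full canonicalizer would then break the remaining ties deterministically to obtain a total order.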

##### generateMolfile(molecule: Molecule, options?: MolGeneratorOptions): string

Generates a MOL file (V2000 format) from a molecule structure. Matches RDKit's output structure for compatibility with cheminformatics tools.

Parameters:

- `molecule` - Molecule structure to convert
- `options` - Optional configuration:
  - `title?: string` - Molecule title (default: empty)
  - `programName?: string` - Program name in header (default: "openchem")
  - `dimensionality?: '2D' | '3D'` - Coordinate system (default: "2D")
  - `comment?: string` - Comment line (default: empty)

Returns: MOL file content as string with V2000 format

Features:

- V2000 MOL format compatible with RDKit and other tools
- 2D coordinate generation using circular layout
- Proper atom/bond type mapping (aromatic, charged, isotopic)
- Stereochemistry support (chiral centers, E/Z double bonds)
- Fixed-width formatting matching RDKit output

Example:

```typescript
import { parseSMILES, generateMolfile } from "openchem";

const result = parseSMILES("CCO");
const molfile = generateMolfile(result.molecules[0]);
console.log(molfile);
// Output: MOL file with header, atom coordinates, bond connectivity, etc.
```

##### parseMolfile(input: string): MolfileParseResult

Parses a MOL file (MDL Molfile format) into a molecule structure. Supports both V2000 and V3000 formats with comprehensive validation.

Parameters:

- input — MOL file content as a string

Returns: MolfileParseResult containing:

- `molfile: MolfileData | null` - Raw MOL file data structure (or null on critical errors)
- `molecule: Molecule | null` - Parsed molecule with enriched properties (or null on errors)
- `errors: ParseError[]` - Array of parse/validation errors (empty if successful)

Supported formats:

- V2000: Classic fixed-width format (most common)
- V3000: Extended format with additional features

Validation features:

- Validates atom/bond counts match declared values
- Checks bond references point to valid atoms
- Validates numeric fields (coordinates, counts, bond types)
- Detects malformed data (NaN, negative counts, invalid types)
- Returns errors without throwing exceptions
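To make the first two checks concrete, the sketch below validates a V2000 block by reading the fixed-width counts line and confirming that each bond line refers to an existing atom. `validateV2000` is a hypothetical function for illustration only. It covers a small subset of what a real parser checks and collects errors instead of throwing, in the spirit of the list above.

```typescript
// Return a list of problems found in a V2000 molfile (empty when none are found).
function validateV2000(molfile: string): string[] {
  const errors: string[] = [];
  const lines = molfile.split(/\r?\n/);

  // Line 4 is the counts line: atom count in columns 1-3, bond count in columns 4-6.
  const counts = lines[3] ?? "";
  const atomCount = Number.parseInt(counts.slice(0, 3), 10);
  const bondCount = Number.parseInt(counts.slice(3, 6), 10);
  if (!Number.isInteger(atomCount) || atomCount < 0 || !Number.isInteger(bondCount) || bondCount < 0) {
    return ["counts line is missing or malformed"];
  }
  if (lines.length < 4 + atomCount + bondCount) {
    errors.push("file is shorter than the declared atom and bond counts");
  }

  // Bond lines follow the atom block; atom references are 1-based.
  const bondLines = lines.slice(4 + atomCount, 4 + atomCount + bondCount);
  bondLines.forEach((line, i) => {
    const first = Number.parseInt(line.slice(0, 3), 10);
    const second = Number.parseInt(line.slice(3, 6), 10);
    for (const ref of [first, second]) {
      if (!(ref >= 1 && ref <= atomCount)) {
        errors.push(`bond ${i + 1} references invalid atom ${ref}`);
      }
    }
  });
  return errors;
}

const ethanolMol = [
  "ethanol",
  "  openchem",
  "",
  "  3  2  0  0  0  0  0  0  0  0999 V2000",
  "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
  "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
  "    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
  "  1  2  1  0  0  0  0",
  "  2  3  1  0  0  0  0",
  "M  END",
].join("\n");

console.log(validateV2000(ethanolMol)); // []
console.log(validateV2000(ethanolMol.replace("  2  3  1", "  2  7  1")));
// ["bond 2 references invalid atom 7"]
```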

Parsed features:

- Atom coordinates (2D/3D) - Element symbols (organic and periodic table) - Charges (both atom block and M CHG property) - Isotopes (both mass diff and M ISO property) - Bond types (single, double, triple, aromatic) - Stereochemistry (bond wedges, chiral centers) - Atom mapping (reaction mapping)

Limitations:

- SGroups are parsed but not converted to molecule structure
- Query atoms/bonds are not supported

Example:

```typescript
import { parseMolfile, generateSMILES } from "openchem";

const molContent = `ethanol
  openchem

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END
`;

const result = parseMolfile(molContent);
if (result.errors.length === 0) {
  console.log(result.molecule?.atoms.length); // 3
  console.log(result.molecule?.bonds.length); // 2

  // Convert to SMILES
  const smiles = generateSMILES(result.molecule!);
  console.log(smiles); // "CCO"
}

// Error handling
const invalid = parseMolfile("invalid content");
if (invalid.errors.length > 0) {
  console.error("Parse errors:", invalid.errors);
}
```

Round-trip workflow:

```typescript
import { parseSMILES, generateMolfile, parseMolfile, generateSMILES } from "openchem";

// SMILES → MOL → SMILES round-trip
const original = "CC(=O)O"; // acetic acid
const mol = parseSMILES(original).molecules[0];
const molfile = generateMolfile(mol);
const parsed = parseMolfile(molfile);
const roundtrip = generateSMILES(parsed.molecule!);
console.log(roundtrip); // "CC(=O)O"
```

##### parseSDF(input: string): SDFParseResult

Parses an SDF (Structure-Data File) into molecule structures with associated properties. SDF files can contain multiple molecules, each with a MOL block and optional property fields.

Parameters:

- input — SDF file content as a string

Returns: SDFParseResult containing:

- records: SDFRecord[] — Array of parsed records
- errors: ParseError[] — Global parse errors (empty if successful)

Record structure (SDFRecord):

- molecule: Molecule | null — Parsed molecule (null on parse errors)
- molfile: MolfileData | null — Raw MOL file data (null on parse errors)
- properties: Record — Property name-value pairs
- errors: ParseError[] — Record-specific errors (empty if successful)

Features:

- Multi-record parsing (splits on the $$$$ delimiter)
- Property block parsing (> <NAME> header format)
- Multi-line property values with blank line handling
- Empty property names and values
- Windows (CRLF) and Unix (LF) line endings
- Tolerant parsing: continues after invalid records
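The record splitting and property-block handling described above can be sketched roughly as follows. This is a simplified illustration, not openchem's parser: `splitSDF`, `toRecord`, and `RawSDFRecord` are hypothetical names, the MOL block is kept as raw text, and each property value is assumed to end at the first blank line.

```typescript
// Hypothetical sketch; openchem's real parser does more validation.
interface RawSDFRecord {
  molBlock: string;
  properties: Record<string, string>;
}

function toRecord(lines: string[]): RawSDFRecord {
  // The MOL block runs up to and including the "M  END" line.
  const endIdx = lines.findIndex((l) => l.startsWith("M  END"));
  const molEnd = endIdx === -1 ? lines.length : endIdx + 1;
  const properties: Record<string, string> = {};
  let i = molEnd;
  while (i < lines.length) {
    // Property headers look like "> <NAME>".
    const header = /^>.*<([^>]*)>/.exec(lines[i]);
    i++;
    if (!header) continue;
    const value: string[] = [];
    while (i < lines.length && lines[i].trim() !== "") value.push(lines[i++]);
    properties[header[1]] = value.join("\n");
  }
  return { molBlock: lines.slice(0, molEnd).join("\n"), properties };
}

function splitSDF(input: string): RawSDFRecord[] {
  // Normalize Windows line endings before splitting into lines.
  const lines = input.replace(/\r\n/g, "\n").split("\n");
  const records: RawSDFRecord[] = [];
  let current: string[] = [];
  for (const line of lines) {
    if (line.trim() === "$$$$") {
      records.push(toRecord(current));
      current = [];
    } else {
      current.push(line);
    }
  }
  return records;
}

const sample = [
  "",
  "  sketch",
  "",
  "  1  0  0  0  0  0            999 V2000",
  "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
  "M  END",
  "> <NAME>",
  "Methane",
  "",
  "$$$$",
  "",
].join("\r\n");
const parsedRecords = splitSDF(sample);
console.log(parsedRecords.length, parsedRecords[0].properties.NAME);
```

Text after the last `$$$$` line is ignored in this sketch; a tolerant parser might instead report it as an incomplete record.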

Example (single record):

```typescript
import { parseSDF, generateSMILES } from "openchem";

const sdfContent = `
  Mrv2311 02102409422D

  3  2  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END
> <ID>
MOL001

> <NAME>
Ethanol

> <FORMULA>
C2H6O

$$$$
`;

const result = parseSDF(sdfContent);
if (result.errors.length === 0) {
  const record = result.records[0];
  console.log(record.molecule?.atoms.length); // 3
  console.log(record.properties.ID); // "MOL001"
  console.log(record.properties.NAME); // "Ethanol"
  console.log(record.properties.FORMULA); // "C2H6O"

  // Convert to SMILES
  const smiles = generateSMILES(record.molecule!);
  console.log(smiles); // "CCO"
}

// Error handling
if (result.records[0].errors.length > 0) {
  console.error("Record errors:", result.records[0].errors);
}
```

Example (multiple records):

```typescript
import { parseSDF } from "openchem";

const multiRecordSDF = `
  Mrv2311 02102409422D

  1  0  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <ID>
1

> <NAME>
Methane

$$$$

  Mrv2311 02102409422D

  2  1  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  END
> <ID>
2

> <NAME>
Ethane

$$$$
`;

const result = parseSDF(multiRecordSDF);
console.log(result.records.length); // 2
console.log(result.records[0].properties.NAME); // "Methane"
console.log(result.records[1].properties.NAME); // "Ethane"
```

Round-trip workflow:

```typescript
import { parseSMILES, writeSDF, parseSDF, generateSMILES } from "openchem";

// SMILES → SDF → SMILES round-trip
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O").molecules[0];
const sdfResult = writeSDF({
  molecule: aspirin,
  properties: { NAME: "aspirin", FORMULA: "C9H8O4" },
});

const parsed = parseSDF(sdfResult.sdf);
const roundtrip = generateSMILES(parsed.records[0].molecule!);
console.log(roundtrip); // "CC(=O)Oc1ccccc1C(=O)O"
console.log(parsed.records[0].properties.NAME); // "aspirin"
```

##### generateInChI(molecule: Molecule): Promise<string>

Generates an InChI (International Chemical Identifier) string from a molecule structure. InChI provides a unique, canonical representation of chemical structures that can be used for database lookups and structure comparison.

Returns: Promise resolving to InChI string

Example:

```typescript
import { parseSMILES, generateInChI } from "openchem";

const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
const inchi = await generateInChI(aspirin.molecules[0]);
console.log(inchi);
// "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"
```

##### generateInChIKey(inchi: string): Promise<string>

Generates an InChIKey (a hashed, fixed-length version of InChI) from an InChI string. InChIKeys are commonly used for database indexing and fast lookups.

Parameters:

- inchi — InChI string to convert

Returns: Promise resolving to InChIKey string (27 characters)

Example:

```typescript
const inchikey = await generateInChIKey(inchi);
console.log(inchikey); // "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
```

##### writeSDF(records: SDFRecord | SDFRecord[], options?: SDFWriterOptions): SDFWriterResult

Writes molecules to SDF (Structure-Data File) format. Supports single or multiple records with optional property data. SDF files are commonly used for storing chemical databases and transferring molecular data between cheminformatics tools.

Parameters:

- records — Single record or array of records to write
- options — Optional configuration (same as MolGeneratorOptions):
  - title?: string — Default title for records (default: empty)
  - programName?: string — Program name in headers (default: "openchem")
  - dimensionality?: '2D' | '3D' — Coordinate system (default: "2D")
  - comment?: string — Default comment (default: empty)

Returns: SDFWriterResult containing:

- sdf: string — Complete SDF file content
- errors: string[] — Any errors encountered (empty if successful)

Record format:

```typescript
interface SDFRecord {
  molecule: Molecule;
  properties?: Record<string, string | number | boolean>;
}
```

SDF structure:

- MOL block (V2000 format) for each molecule
- Property fields (> <NAME> header, value, blank line)
- Record separator ($$$$)

Example (single molecule):

```typescript
import { parseSMILES, writeSDF } from "openchem";

const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
const result = writeSDF({
  molecule: aspirin.molecules[0],
  properties: {
    NAME: "aspirin",
    MOLECULAR_FORMULA: "C9H8O4",
    MOLECULAR_WEIGHT: 180.16,
  },
});

console.log(result.sdf);
// Output: SDF file with MOL block + properties + $$$$
```

Example (multiple molecules):

```typescript
import { parseSMILES, writeSDF } from "openchem";

const drugs = [
  { smiles: "CC(=O)Oc1ccccc1C(=O)O", name: "aspirin" },
  { smiles: "CC(C)Cc1ccc(cc1)C(C)C(=O)O", name: "ibuprofen" },
  { smiles: "CC(=O)Nc1ccc(O)cc1", name: "acetaminophen" },
];

const records = drugs.map((drug) => {
  const mol = parseSMILES(drug.smiles).molecules[0];
  return {
    molecule: mol,
    properties: {
      NAME: drug.name,
      SMILES: drug.smiles,
    },
  };
});

const result = writeSDF(records, { programName: "my-drug-tool" });
console.log(result.sdf);
// Output: Multi-record SDF with all 3 molecules
```

Property formatting:

- Strings: Written as-is
- Numbers: Converted to strings
- Booleans: "true" or "false"
- Property names are case-sensitive
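The formatting rules above amount to a string conversion followed by the standard SDF data-item layout (header line, value, blank line). A minimal sketch, assuming a hypothetical `formatSDFProperty` helper that is not part of openchem's public API:

```typescript
// Hypothetical helper for illustration; not openchem's writer.
type SDFPropertyValue = string | number | boolean;

function formatSDFProperty(name: string, value: SDFPropertyValue): string {
  // String() leaves strings unchanged and renders booleans as "true"/"false".
  // The trailing blank line terminates the data item.
  return `> <${name}>\n${String(value)}\n\n`;
}

console.log(formatSDFProperty("NAME", "aspirin"));
console.log(formatSDFProperty("ACTIVE", true));
```

Because names are written verbatim, `NAME` and `name` produce distinct property fields.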

Compatibility:

- Output compatible with RDKit, OpenBabel, ChemDraw, and other tools
- Standard SDF format (V2000 MOL blocks)
- Properties follow MDL SDF specification

---

#### Pattern Matching & Rendering (6 functions)

##### renderSVG(input: string | Molecule | Molecule[] | ParseResult, options?: SVGRendererOptions): SVGRenderResult

Renders molecules as 2D SVG structures with automatic coordinate generation using webcola collision prevention.

Parameters:

- input — SMILES string, single molecule, array of molecules, or ParseResult
- options — Optional rendering configuration (see SVGRendererOptions below)

Returns: SVGRenderResult containing:

- svg: string — SVG markup ready for display
- width: number — Canvas width in pixels
- height: number — Canvas height in pixels
- errors: string[] — Any rendering errors (empty if successful)

SVGRendererOptions:

- width?: number — Canvas width (default: 300)
- height?: number — Canvas height (default: 300)
- bondLineWidth?: number — Bond line thickness (default: 2)
- bondLength?: number — Target bond length in pixels (default: 40)
- fontSize?: number — Atom label font size (default: 12)
- fontFamily?: string — Font family (default: "Arial, sans-serif")
- padding?: number — Canvas padding (default: 20)
- showCarbonLabels?: boolean — Show C atom labels (default: false)
- showImplicitHydrogens?: boolean — Show implicit hydrogens (default: false)
- kekulize?: boolean — Convert aromatic to Kekulé (default: true)
- atomColors?: Record — Element-specific colors
- backgroundColor?: string — Background color (default: "#FFFFFF")
- bondColor?: string — Bond color (default: "#000000")
- showStereoBonds?: boolean — Show wedge/hash bonds (default: true)
- atomCoordinates?: AtomCoordinates[] — Pre-computed coordinates
- webcolaIterations?: number — Collision prevention iterations (default: 100)
- deterministicChainPlacement?: boolean — Deterministic layouts (default: false)
- moleculeSpacing?: number — Space between molecules in grid (default: 60)

Features:

- Automatic 2D coordinate generation with collision prevention
- Ring regularization for 5- and 6-membered rings
- Fused ring system handling
- Stereochemistry display (wedge/hash bonds)
- Element-specific atom coloring
- Publication-quality output

##### parseSMARTS(smarts: string): ParseResult

Parses a SMARTS pattern string into a pattern molecule structure.

Returns: ParseResult containing:

- molecules: Molecule[] — Array with pattern molecule
- errors: string[] — Parse errors (empty if successful)

SMARTS support:

- Logical operators: ! (not), & (and), , (or)
- Atom properties: [D1] (degree), [H1] (hydrogen count), [v3] (valence)
- Connectivity: [#6X4] (carbon with four total connections)
- Aromatic matching: [c] (aromatic carbon) or [a] (any aromatic atom)

##### matchSMARTS(molecule: Molecule, pattern: ParseResult): number[][]

Finds all matches of a SMARTS pattern in a molecule.

Parameters:

- molecule — Target molecule to search
- pattern — SMARTS pattern (from parseSMARTS())

Returns: Array of matches, where each match is an array of atom indices

Example:

```typescript
import { parseSMILES, parseSMARTS, matchSMARTS } from "openchem";

const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O").molecules[0];
const carbonyl = parseSMARTS("C=O").molecules[0];

const matches = matchSMARTS(aspirin, carbonyl);
// matches: [[1, 2], [10, 11]] (two carbonyl groups, atoms indexed in SMILES order)
```

##### kekulize(molecule: Molecule): Molecule

Converts aromatic molecules to alternating single/double bond (Kekulé) representation.

Returns: New molecule with aromatic bonds replaced by alternating single/double bonds

Example:

```typescript
import { parseSMILES, kekulize, generateSMILES } from "openchem";

const benzene = parseSMILES("c1ccccc1");
const kek = kekulize(benzene.molecules[0]);
console.log(generateSMILES(kek)); // "C1=CC=CC=C1"
```

##### computeMorganFingerprint(molecule: Molecule, radius?: number, fpSize?: number): Uint8Array

Generates a Morgan fingerprint (ECFP-like) for molecular similarity searching and compound classification. Uses a modified Morgan algorithm with atom typing and circular neighborhoods.

Parameters:

- molecule — Molecule to fingerprint
- radius — Fingerprint radius (default: 2, equivalent to ECFP4)
- fpSize — Fingerprint size in bits (default: 2048, RDKit standard)

Returns: Uint8Array containing the fingerprint bits

Example:

```typescript
import { parseSMILES, computeMorganFingerprint } from "openchem";

const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
const fingerprint = computeMorganFingerprint(aspirin.molecules[0], 2, 512);
console.log(fingerprint.length); // 64 (512 bits / 8 bits per byte)
```

##### tanimotoSimilarity(fp1: Uint8Array, fp2: Uint8Array): number

Calculates the Tanimoto similarity coefficient between two Morgan fingerprints. Measures structural similarity on a scale from 0 (no similarity) to 1 (identical).

Parameters:

- fp1 — First fingerprint
- fp2 — Second fingerprint

Returns: Similarity score between 0 and 1

Example:

```typescript
const similarity = tanimotoSimilarity(fingerprint1, fingerprint2);
console.log(`Similarity: ${(similarity * 100).toFixed(1)}%`);
```
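For readers who want to see what the coefficient measures, the sketch below computes Tanimoto similarity as shared on-bits divided by total on-bits. It assumes bit-packed fingerprints (eight bits per byte, consistent with the 64-byte length shown for a 512-bit fingerprint); the `tanimoto` and `popcount` functions are illustrative stand-ins, not openchem's implementation.

```typescript
// Illustrative sketch; assumes bit-packed Uint8Array fingerprints.
function popcount(byte: number): number {
  let count = 0;
  for (let b = byte; b !== 0; b >>= 1) count += b & 1;
  return count;
}

function tanimoto(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) throw new Error("fingerprints must have equal length");
  let both = 0; // bits set in both fingerprints
  let either = 0; // bits set in at least one fingerprint
  for (let i = 0; i < a.length; i++) {
    both += popcount(a[i] & b[i]);
    either += popcount(a[i] | b[i]);
  }
  // Two empty fingerprints are treated as dissimilar here by convention.
  return either === 0 ? 0 : both / either;
}

const fpA = Uint8Array.from([0b00001111, 0b00000001]);
const fpB = Uint8Array.from([0b00000011, 0b00000001]);
console.log(tanimoto(fpA, fpB)); // 0.6 (3 shared bits / 5 bits set in either)
```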

---

#### Scaffold Analysis (5 functions)

##### getMurckoScaffold(molecule: Molecule, options?: MurckoOptions): Molecule

Extracts the Murcko scaffold from a molecule — the core ring systems and linkers connecting them, with all terminal side chains removed. This is the standard scaffold used in medicinal chemistry for compound classification.

Parameters:

- molecule — Molecule to analyze
- options.includeLinkers — Include linker atoms between rings (default: true)

Returns: New Molecule containing only the scaffold

Example:

```typescript
import { parseSMILES, getMurckoScaffold, generateSMILES } from "openchem";

const ibuprofen = parseSMILES("CC(C)Cc1ccc(cc1)C(C)C(=O)O").molecules[0];
const scaffold = getMurckoScaffold(ibuprofen);
console.log(generateSMILES(scaffold)); // "c1ccccc1" - benzene core
```

##### getBemisMurckoFramework(molecule: Molecule): Molecule

Generates a generic Bemis-Murcko framework — the scaffold with all atoms converted to carbon and all bonds converted to single bonds. Useful for identifying compounds with similar topology but different heteroatom patterns.

Returns: New Molecule with generic framework

Example:

```typescript
import { parseSMILES, getBemisMurckoFramework, generateSMILES } from "openchem";

const pyridine = parseSMILES("c1ccncc1").molecules[0];
const framework = getBemisMurckoFramework(pyridine);
console.log(generateSMILES(framework)); // "C1CCCCC1" - cyclohexane
```

##### getScaffoldTree(molecule: Molecule): Molecule[]

Generates a hierarchical scaffold tree by iteratively removing rings from the Murcko scaffold. Returns scaffolds ordered from most specific (full scaffold) to least specific (single ring).

Returns: Array of Molecule objects representing scaffolds at different levels

Example:

```typescript
import { parseSMILES, getScaffoldTree, generateSMILES } from "openchem";

const mol = parseSMILES("c1ccc2ccccc2c1").molecules[0]; // Naphthalene
const tree = getScaffoldTree(mol);
console.log(tree.length); // 2 levels: full naphthalene, then single benzene
tree.forEach((scaffold, idx) => {
  console.log(`Level ${idx}: ${generateSMILES(scaffold)}`);
});
```

##### getGraphFramework(molecule: Molecule): Molecule

Generates a pure topological framework with all atoms converted to wildcard atoms (*). This represents the molecular graph structure without any atom type information.

Returns: New Molecule with graph framework

Example:

```typescript
import { parseSMILES, getGraphFramework, generateSMILES } from "openchem";

const caffeine = parseSMILES("CN1C=NC2=C1C(=O)N(C(=O)N2C)C").molecules[0];
const graph = getGraphFramework(caffeine);
console.log(generateSMILES(graph)); // every atom is written as "*" - pure topology
```

##### haveSameScaffold(mol1: Molecule, mol2: Molecule): boolean

Compares two molecules to determine if they share the same Murcko scaffold. Useful for compound series analysis and lead identification.

Returns: true if scaffolds match, false otherwise

Example:

```typescript
import { parseSMILES, haveSameScaffold } from "openchem";

const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O").molecules[0];
const ibuprofen = parseSMILES("CC(C)Cc1ccc(cc1)C(C)C(=O)O").molecules[0];
console.log(haveSameScaffold(aspirin, ibuprofen)); // true - both have a benzene scaffold
```

---

#### Tautomer Analysis (2 functions)

##### enumerateTautomers(molecule: Molecule, options?: TautomerOptions): TautomerResult[]

Enumerates all tautomers for a molecule using transform-based enumeration with RDKit-compatible scoring.

Options:

- maxTautomers?: number — Maximum tautomers to generate (default: 256)
- maxTransforms?: number — Maximum transform operations (default: 1024)
- phases?: number[] — Rule phases to apply (default: [1, 2, 3])
- useFingerprintDedup?: boolean — Use fingerprint deduplication (default: true)

Returns: Array of TautomerResult objects with:

- smiles: string — SMILES representation
- molecule: Molecule — Molecule object
- score: number — Stability score (higher = more stable)
- ruleIds: string[] — Applied transformation rules

Scoring system (RDKit-inspired):

- +250 per all-carbon aromatic ring
- +100 per heteroaromatic ring
- +25 for benzoquinone
- +4 for oximes (C=N-OH)
- +2 for carbonyls (C=O, N=O, P=O)
- -10 per formal charge
- -4 for aci-nitro
- -1 per H on P, S, Se, Te
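The scoring table above is a weighted sum over structural features. The sketch below shows only that arithmetic, assuming the feature counts have already been obtained by some substructure search; `TautomerFeatures` and `scoreTautomer` are hypothetical names, and how openchem detects each feature is not shown here.

```typescript
// Hypothetical feature counts; detection itself is outside this sketch.
interface TautomerFeatures {
  carbocyclicAromaticRings: number;
  heteroaromaticRings: number;
  benzoquinones: number;
  oximes: number;
  carbonyls: number;
  formalCharges: number;
  aciNitroGroups: number;
  hydrogensOnPSSeTe: number;
}

const noFeatures: TautomerFeatures = {
  carbocyclicAromaticRings: 0,
  heteroaromaticRings: 0,
  benzoquinones: 0,
  oximes: 0,
  carbonyls: 0,
  formalCharges: 0,
  aciNitroGroups: 0,
  hydrogensOnPSSeTe: 0,
};

// Apply the weights from the scoring table.
function scoreTautomer(f: TautomerFeatures): number {
  return (
    250 * f.carbocyclicAromaticRings +
    100 * f.heteroaromaticRings +
    25 * f.benzoquinones +
    4 * f.oximes +
    2 * f.carbonyls -
    10 * f.formalCharges -
    4 * f.aciNitroGroups -
    1 * f.hydrogensOnPSSeTe
  );
}

// Acetylacetone: the diketo form has two carbonyls, a monoenol form has one.
const diketoScore = scoreTautomer({ ...noFeatures, carbonyls: 2 });
const monoenolScore = scoreTautomer({ ...noFeatures, carbonyls: 1 });
console.log(diketoScore, monoenolScore); // 4 2
```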

Example:

```typescript
import { parseSMILES, enumerateTautomers } from "openchem";

const mol = parseSMILES("CC(=O)CC(=O)C").molecules[0]; // acetylacetone
const tautomers = enumerateTautomers(mol, { maxTautomers: 16 });

console.log(`Found ${tautomers.length} tautomers:`);
tautomers.forEach((t, i) => {
  console.log(`${i + 1}. ${t.smiles} (score: ${t.score})`);
});
// 1. CC(=O)CC(=O)C (score: 4) - diketo form
// 2. CC(=O)C=C(C)O (score: 2) - monoenol form
// 3. CC(O)=CC(=O)C (score: 2) - monoenol form
```

##### canonicalTautomer(molecule: Molecule): Molecule

Selects the canonical (most stable) tautomer based on scoring.

Returns: The highest-scoring tautomer as a Molecule

Example:

```typescript
import { parseSMILES, canonicalTautomer, generateSMILES } from "openchem";

const mol = parseSMILES("CC(=O)CC(=O)C").molecules[0];
const canonical = canonicalTautomer(mol);
console.log(generateSMILES(canonical)); // "CC(=O)CC(=O)C" - diketo form preferred
```

---

#### Lipophilicity (3 functions)

##### computeLogP(molecule: Molecule): number

Calculates the LogP (partition coefficient) using the Wildman-Crippen method. LogP predicts lipophilicity and membrane permeability.

Returns: LogP value as a number

Interpretation:

- LogP < 0: Hydrophilic (water-loving)
- 0 ≤ LogP ≤ 5: Optimal range for most drugs
- LogP > 5: Lipophilic (fat-loving), may have poor water solubility

Example:

```typescript
import { parseSMILES, computeLogP } from "openchem";

const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O").molecules[0];
console.log(computeLogP(aspirin)); // 1.31 (good bioavailability)
```

##### crippenLogP(molecule: Molecule): number

Alias for computeLogP(). Alternative name for the Wildman-Crippen LogP calculation.

##### logP(molecule: Molecule): number

Alternative LogP calculation method. May use different fragment contributions than Crippen.

---

#### Basic Properties (3 functions)

##### getMolecularFormula(molecule: Molecule): string

Returns the molecular formula in Hill notation (C first, then H, then alphabetical).

Example: C9H8O4 for aspirin
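To make the ordering rule concrete, here is a small sketch that builds a Hill-notation formula from element counts. The `hillFormula` function is a hypothetical helper, not openchem's implementation. It also applies the usual Hill convention that formulas without carbon are ordered purely alphabetically, which is an assumption about behavior the text above does not spell out.

```typescript
// Hypothetical helper for illustration; not openchem's implementation.
function hillFormula(counts: Record<string, number>): string {
  const symbols = Object.keys(counts).filter((s) => counts[s] > 0);
  const hasCarbon = symbols.indexOf("C") !== -1;
  // With carbon present: C first, H second, everything else alphabetical.
  // Without carbon: all elements (including H) alphabetical.
  const rank = (s: string): number => (hasCarbon ? (s === "C" ? 0 : s === "H" ? 1 : 2) : 2);
  symbols.sort((a, b) => rank(a) - rank(b) || (a < b ? -1 : a > b ? 1 : 0));
  // A count of 1 is left implicit, as in "HCl" or "CH4O".
  return symbols.map((s) => (counts[s] === 1 ? s : `${s}${counts[s]}`)).join("");
}

console.log(hillFormula({ O: 4, H: 8, C: 9 })); // "C9H8O4" (aspirin)
```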

##### getMolecularMass(molecule: Molecule): number

Returns the molecular mass using average atomic masses from the periodic table.

Example: 180.16 for aspirin

##### getExactMass(molecule: Molecule): number

Returns the exact mass using the most abundant isotope for each element.

Example: 180.042 for aspirin

---

#### Structural Properties (8 functions)

##### getHeavyAtomCount(molecule: Molecule): number

Returns the count of non-hydrogen atoms.

Example: 15 for ibuprofen (13 carbons and 2 oxygens)

##### getHeteroAtomCount(molecule: Molecule): number

Returns the count of heteroatoms (any atom except C and H). Includes N, O, S, P, halogens, etc.

Example: 4 for aspirin (four oxygen atoms: two in the ester group, two in the COOH group)

##### getRingCount(molecule: Molecule): number

Returns the total number of rings in the molecule using cycle detection.

Example: 2 for naphthalene (2 fused rings)

##### getAromaticRingCount(molecule: Molecule): number

Returns the number of aromatic rings.

Example: 1 for benzene, 2 for naphthalene

##### getRingInfo(molecule: Molecule): RingInformation

Returns a comprehensive ring information object providing access to SSSR (Smallest Set of Smallest Rings) and ring membership queries. Similar to RDKit's GetRingInfo() functionality.

Methods:

- numRings() - Number of rings in SSSR
- rings() - Array of rings (each ring is an atom ID array)
- isAtomInRing(atomIdx) - Check if atom is in any ring
- isBondInRing(atom1, atom2) - Check if bond is in any ring
- atomRingMembership(atomIdx) - Ring membership count for atom ([Rn] in SMARTS)
- atomRings(atomIdx) - All rings containing a specific atom
- ringAtoms(ringIdx) - Atoms in a specific ring
- ringBonds(ringIdx) - Bonds in a specific ring

Example:

```typescript
const ringInfo = getRingInfo(mol);
console.log(ringInfo.numRings()); // 2
console.log(ringInfo.isAtomInRing(5)); // true
console.log(ringInfo.atomRingMembership(3)); // 2 (bridgehead atom)
```

##### getFractionCSP3(molecule: Molecule): number

Returns the fraction of sp³-hybridized carbons (saturated carbons) relative to total carbons. Higher values indicate greater structural complexity and 3D character. Range: 0.0 to 1.0.

Example: 0.375 for caffeine (3 of 8 carbons are sp³), 0.46 for ibuprofen (6 of 13)

##### getHBondDonorCount(molecule: Molecule): number

Returns the count of hydrogen bond donors (N-H and O-H groups).

Example: 1 for aspirin (carboxylic acid O-H), 0 for caffeine

##### getHBondAcceptorCount(molecule: Molecule): number

Returns the count of hydrogen bond acceptors (N and O atoms).

Example: 4 for aspirin, 6 for caffeine

---

#### Drug-Likeness Properties (5 functions)

##### getTPSA(molecule: Molecule): number

Returns the Topological Polar Surface Area in Ų (square Ångströms) using the Ertl et al. fragment-based algorithm. TPSA is a key descriptor for predicting drug absorption and bioavailability.

Guidelines:

- TPSA < 140 Ų: Good oral bioavailability
- TPSA < 90 Ų: Likely blood-brain barrier penetration
- TPSA > 140 Ų: Poor membrane permeability

Example: 63.60 for aspirin (good oral availability), 52.93 for morphine (CNS-active)

##### getRotatableBondCount(molecule: Molecule): number

Returns the count of rotatable bonds (single non-ring bonds between non-terminal heavy atoms). Used in Veber rules for predicting oral bioavailability.

Example: 3 for aspirin, 4 for ibuprofen

##### checkLipinskiRuleOfFive(molecule: Molecule): LipinskiResult

Evaluates Lipinski's Rule of Five for oral drug-likeness. Returns result object with:

- passes: boolean indicating if all rules pass
- violations: array of violation messages
- properties: { molecularWeight, hbondDonors, hbondAcceptors, logP }

Rules:

- Molecular weight ≤ 500 Da
- H-bond donors ≤ 5
- H-bond acceptors ≤ 10
- LogP ≤ 5
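The four thresholds above can be checked with a few comparisons once the descriptor values are known. The sketch below works on plain numbers rather than a Molecule; `checkLipinski`, its input type, and the violation message wording are assumptions made for illustration and may not match what `checkLipinskiRuleOfFive` returns.

```typescript
// Illustrative sketch operating on precomputed descriptor values.
interface LipinskiProperties {
  molecularWeight: number;
  hbondDonors: number;
  hbondAcceptors: number;
  logP: number;
}

interface LipinskiCheck {
  passes: boolean;
  violations: string[];
  properties: LipinskiProperties;
}

function checkLipinski(p: LipinskiProperties): LipinskiCheck {
  const violations: string[] = [];
  if (p.molecularWeight > 500) violations.push("Molecular weight > 500 Da");
  if (p.hbondDonors > 5) violations.push("H-bond donors > 5");
  if (p.hbondAcceptors > 10) violations.push("H-bond acceptors > 10");
  if (p.logP > 5) violations.push("LogP > 5");
  return { passes: violations.length === 0, violations, properties: p };
}

// Aspirin values as quoted elsewhere in this document.
const aspirinCheck = checkLipinski({
  molecularWeight: 180.16,
  hbondDonors: 1,
  hbondAcceptors: 4,
  logP: 1.31,
});
console.log(aspirinCheck.passes, aspirinCheck.violations); // true []
```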

##### checkVeberRules(molecule: Molecule): VeberResult

Evaluates Veber rules for oral bioavailability. Returns result object with:

- passes: boolean indicating if all rules pass
- violations: array of violation messages
- properties: { rotatableBonds, tpsa }

Rules:

- Rotatable bonds ≤ 10
- TPSA ≤ 140 Ų

##### checkBBBPenetration(molecule: Molecule): BBBResult

Predicts blood-brain barrier penetration. Returns result object with:

- likelyPenetration: boolean (true if TPSA < 90 Ų)
- tpsa: TPSA value
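Both the Veber rules and the TPSA-based blood-brain barrier heuristic reduce to threshold comparisons. A minimal sketch on precomputed values, using hypothetical function names (`checkVeber`, `predictBBB`) that are not openchem's exports:

```typescript
// Illustrative sketch on precomputed descriptor values.
interface VeberCheck {
  passes: boolean;
  violations: string[];
  properties: { rotatableBonds: number; tpsa: number };
}

function checkVeber(rotatableBonds: number, tpsa: number): VeberCheck {
  const violations: string[] = [];
  if (rotatableBonds > 10) violations.push("Rotatable bonds > 10");
  if (tpsa > 140) violations.push("TPSA > 140");
  return { passes: violations.length === 0, violations, properties: { rotatableBonds, tpsa } };
}

function predictBBB(tpsa: number): { likelyPenetration: boolean; tpsa: number } {
  // The heuristic described above: penetration is likely when TPSA < 90.
  return { likelyPenetration: tpsa < 90, tpsa };
}

// Aspirin values as quoted in this document: 3 rotatable bonds, TPSA 63.60.
const aspirinVeber = checkVeber(3, 63.6);
const aspirinBBB = predictBBB(63.6);
console.log(aspirinVeber.passes, aspirinBBB.likelyPenetration); // true true
```

A TPSA cutoff alone is a coarse filter; it says nothing about efflux transport or other factors that affect real brain exposure.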

---

```typescript
interface Molecule {
  atoms: Atom[];
  bonds: Bond[];
}

interface Atom {
  id: number;
  symbol: string;
  aromatic: boolean;
  hydrogens: number;
  charge: number;
  isotope: number | null;
  chiral: string | null;
  atomClass: number | null;
  isBracket: boolean;
  atomicNumber: number;
}

interface Bond {
  atom1: number;
  atom2: number;
  type: BondType;
  stereo: StereoType;
}

enum BondType {
  SINGLE = 1,
  DOUBLE = 2,
  TRIPLE = 3,
  QUADRUPLE = 4,
  AROMATIC = 5,
}
```

Performance

openchem is designed for production use with real-world performance:

- Parsing: ~1-10 ms per molecule (depending on complexity)
- Generation: ~1-5 ms per molecule
- Memory: Minimal overhead, compact AST representation
- Zero dependencies: No external runtime dependencies

Benchmark with 325 diverse molecules including commercial drugs: Average parse + generate round-trip < 5ms

Architecture

openchem uses a post-processing enrichment system that pre-computes expensive molecular properties during parsing. This design significantly improves performance for downstream property queries while maintaining code simplicity.

#### Why Pre-compute Properties?

Molecular property calculations like ring finding, hybridization determination, and rotatable bond classification are computationally expensive (O(n²) complexity). Without pre-computation:

1. Redundant calculations: Ring finding would run every time you query ring count, aromatic rings, or check if atoms/bonds are in rings
2. Performance penalty: Property queries would dominate runtime, especially for drug-likeness checks that need multiple properties
3. Code complexity: Every property function would need to duplicate expensive logic

The Solution: Compute once during parsing, cache results, use everywhere.

#### Key Components

- types.ts — Extended with optional cached properties on Atom, Bond, and Molecule interfaces
- src/utils/molecule-enrichment.ts — Post-processing module that enriches molecules after parsing
- src/parser.ts — Calls enrichMolecule() after validation phase at line 451
- src/utils/molecular-properties.ts — Uses cached properties when available, falls back to computation

#### Cached Properties

- Atom: degree (neighbor count), isInRing, ringIds[], hybridization (sp/sp²/sp³)
- Bond: isInRing, ringIds[], isRotatable
- Molecule: rings[][] (all rings as atom IDs), ringInfo (lookup maps)

#### Performance Impact

Benchmark Results (10,000 molecules, 7 properties each):

- Parse time: 1.22 ms/molecule (includes enrichment)
- Property query time: 0.006 ms/molecule (0.5% of parse time)
- Rotatable bond queries: ~3.1 million ops/second (simple array filter vs 47-line calculation)

Complexity Improvements:

- Ring finding: Once per molecule (O(n²)) → subsequent queries O(1)
- Rotatable bonds: O(n×m) nested loops → O(n) array filter
- Property queries: 200× faster on average

#### Immutability Contract

Important: Molecules are immutable after parsing. All enriched properties remain valid for the lifetime of the molecule object. This design:

- Prevents stale cached properties (no mutation = no invalidation needed)
- Enables safe sharing across threads/workers
- Simplifies reasoning about molecule state

If you need to modify a molecule, create a new one by parsing updated SMILES.

#### Design Notes

- Ring analysis (analyzeRings()) runs only during enrichment
- Downstream property functions check cached values first, fall back to computation if missing
- Backward compatible: cached properties are optional (?:) with defensive fallbacks
- New code should always use cached properties when available
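The "check cached values first, fall back to computation" pattern from the notes above can be sketched as follows. The types and function names here (`SketchBond`, `SketchMolecule`, `computeIsRotatable`, `countRotatableBonds`) are hypothetical and much smaller than openchem's real interfaces; the fallback body is a placeholder, not the library's rotatable-bond logic.

```typescript
// Hypothetical types for illustration; openchem's interfaces have more fields.
interface SketchBond {
  atom1: number;
  atom2: number;
  isRotatable?: boolean; // cached during enrichment, may be absent
}

interface SketchMolecule {
  bonds: SketchBond[];
}

let fallbackCalls = 0;

// Stand-in for the expensive calculation; a real one would inspect
// bond order, ring membership, and neighboring atoms.
function computeIsRotatable(_mol: SketchMolecule, _bond: SketchBond): boolean {
  fallbackCalls++;
  return false;
}

function countRotatableBonds(mol: SketchMolecule): number {
  // Prefer the cached flag; compute only when enrichment has not run.
  return mol.bonds.filter((bond) =>
    bond.isRotatable !== undefined ? bond.isRotatable : computeIsRotatable(mol, bond),
  ).length;
}

const enriched: SketchMolecule = {
  bonds: [
    { atom1: 0, atom2: 1, isRotatable: false },
    { atom1: 1, atom2: 2, isRotatable: true },
    { atom1: 2, atom2: 3, isRotatable: true },
  ],
};
console.log(countRotatableBonds(enriched)); // 2, served entirely from cached flags
```

Because the cached field is optional, molecules built without enrichment still work; they simply take the slower path.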

Edge Cases & Limitations

openchem handles 100% of tested SMILES correctly (325/325 in bulk validation).

Key implementation details:

- Stereo normalization: Trans alkenes are automatically normalized to use / (up) markers on both ends to match RDKit's canonical form. For example, C\C=C\C and C/C=C/C both represent trans configuration and canonicalize to C/C=C/C.

- Canonical ordering: Atoms are ordered using a modified Morgan algorithm matching RDKit's approach, with tie-breaking by atomic number, degree, and other properties.

- Aromatic validation: Aromatic notation (lowercase letters) is accepted as specified in SMILES. The parser validates that aromatic atoms are in rings but accepts aromatic notation without strict Hückel rule enforcement, matching RDKit's behavior for broader compatibility.

This implementation has been validated against RDKit's canonical SMILES output for diverse molecule sets including stereocenters, complex rings, heteroatoms, and 25 commercial pharmaceutical drugs.

OpenSMILES Specification Compliance

openchem implements the OpenSMILES specification with high fidelity while prioritizing RDKit compatibility for real-world interoperability. In specific areas where the OpenSMILES specification provides recommendations rather than strict requirements, openchem follows RDKit's implementation choices to ensure 100% parity with the industry-standard cheminformatics toolkit.


OpenSMILES Recommendation: Start traversal on heteroatoms first, then terminals.

- Example preference: OCCC over CCCO for propanol
- Rationale: Heteroatoms are "more interesting" chemically

openchem Implementation: Canonical labels first, heteroatoms as tie-breaker.

- Example: Both OCCC and CCCO canonicalize to CCCO
- Rationale: Ensures 100% deterministic output for identical molecules

Why RDKit's Approach:

1. Determinism: Canonical labels guarantee the same molecule always produces identical output, regardless of input order
2. Interoperability: 100% agreement with RDKit enables seamless integration with existing cheminformatics pipelines and databases
3. Real-world usage: Major chemical databases (PubChem, ChEMBL) prioritize canonical determinism over heteroatom preference
4. Chemical equivalence: Both OCCC and CCCO represent the same molecule; the output difference is purely cosmetic

Impact: Minimal — affects only the order atoms appear in canonical output, not chemical meaning or validity. All SMILES remain valid OpenSMILES syntax.

$3

**OpenSMILES S

Quick Start

```bash
npm install openchem
# or:
bun add openchem
```

```typescript
import { parseSMILES, renderSVG, Descriptors } from "openchem";

// Parse a molecule
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O").molecules[0];

// Render as SVG
const svg = renderSVG(aspirin);
console.log(svg.svg); // SVG markup ready for display

// Or get specific categories
const drugLike = Descriptors.drugLikeness(aspirin);
console.log(drugLike.lipinski.passes); // true
console.log(drugLike.lipinski.violations); // []
```

HTML Playground

openchem includes an interactive HTML playground for testing SMILES parsing, molecular visualization, and descriptor calculation:

```bash
# Build the browser bundle and start a local server
bun run serve
```

Then open http://localhost:3000/smiles-playground.html in your browser.


The playground provides:
- 2D Structure Visualization — Clean SVG rendering of molecular structures
- Molecular Descriptors — Formula, mass, TPSA, rotatable bonds, etc.
- Drug-Likeness Checks — Lipinski's Rule of Five, Veber rules, BBB penetration
- Interactive Examples — Pre-loaded molecules like aspirin, caffeine, ibuprofen
The playground automatically detects if the full openchem library is available and falls back to approximate calculations if needed.

Model Context Protocol (MCP) Server

The MCP server for AI assistant integration is now available as a separate package: @openchem/mcp

```bash
# Install MCP server
npm install -g @openchem/mcp

# Start server
openchem-mcp
# Server runs on http://localhost:3000
```


Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

```json
{
  "mcpServers": {
    "openchem": {
      "url": "http://localhost:3000/mcp"
    }
  }
}
```

Restart Claude Desktop and try: _"Analyze aspirin using SMILES CC(=O)Oc1ccccc1C(=O)O"_

Code Examples

`typescript import { parseSMILES, generateSMILES, parseMolfile, generateMolfile, parseSDF, writeSDF, } from "openchem";

// Generate canonical SMILES const canonical = generateSMILES(result.molecules[0]); console.log(canonical); // "CC(=O)O"

// Parse MOL file const molContent =
acetic acid
openchem

// Parse SDF file const sdfContent =
Mrv2311 02102409422D

>
Ethanol

$$$$
; const sdfResult = parseSDF(sdfContent); console.log(sdfResult.records[0].molecule?.atoms.length); // 3 console.log(sdfResult.records[0].properties.NAME); // "Ethanol"`

// Generate InChI from molecule const inchi = await generateInChI(aspirin.molecules[0]); console.log(inchi); // "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"

// Generate InChIKey const inchikey = await generateInChIKey(inchi); console.log(inchikey); // "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"

```typescript
import { parseSMILES, computeMorganFingerprint, tanimotoSimilarity } from "openchem";

// Generate fingerprints for similarity comparison
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
const ibuprofen = parseSMILES("CC(C)Cc1ccc(cc1)C(C)C(=O)O");

const fp1 = computeMorganFingerprint(aspirin.molecules[0], 2, 512);
const fp2 = computeMorganFingerprint(ibuprofen.molecules[0], 2, 512);

// Calculate structural similarity
const similarity = tanimotoSimilarity(fp1, fp2);
console.log(`Similarity: ${(similarity * 100).toFixed(1)}%`); // ~45.2%
```

`$3`

Extract core molecular scaffolds for drug discovery and compound classification:

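```typescript
import { parseSMILES, getMurckoScaffold, getBemisMurckoFramework, generateSMILES } from "openchem";

const ibuprofen = parseSMILES("CC(C)Cc1ccc(cc1)C(C)C(=O)O").molecules[0];

// Get Murcko scaffold (side chains removed)
const scaffold = getMurckoScaffold(ibuprofen);
console.log(generateSMILES(scaffold)); // "c1ccccc1" - benzene core

// Get generic framework (all atoms → carbon, all bonds → single)
const framework = getBemisMurckoFramework(ibuprofen);
console.log(generateSMILES(framework)); // "C1CCCCC1" - cyclohexane
```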

Applications:

- Compound library classification
- Lead series identification
- Scaffold hopping strategies
- Fragment-based drug design

`$3`

Enumerate and score tautomers (keto-enol, imine-enamine, amide-imidol, etc.) with RDKit-compatible scoring:

```typescript
import { parseSMILES, enumerateTautomers, canonicalTautomer, generateSMILES } from "openchem";

// Enumerate tautomers for acetylacetone (pentane-2,4-dione)
const mol = parseSMILES("CC(=O)CC(=O)C").molecules[0];
const tautomers = enumerateTautomers(mol, { maxTautomers: 16 });

console.log(`Found ${tautomers.length} tautomers:`);
tautomers.forEach((t, i) => {
  console.log(`${i + 1}. ${t.smiles} (score: ${t.score})`);
});

// Get canonical tautomer (highest scoring)
const canonical = canonicalTautomer(mol);
console.log(`Canonical: ${generateSMILES(canonical)}`);
```

Supported tautomer types (25 rules, 100% RDKit coverage):

Scoring system (RDKit-compatible):

Applications:

- Compound standardization for databases
- Virtual screening preparation
- pKa prediction support
- Tautomer-aware structure searching

`$3`

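```typescript
import { parseSMILES, renderSVG } from "openchem";

const benzene = parseSMILES("c1ccccc1");
const svgResult = renderSVG(benzene.molecules[0], { width: 300, height: 200 });

console.log(svgResult.svg); // Complete SVG markup
console.log(`Canvas: ${svgResult.width}x${svgResult.height}`); // "300x200"
```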

`Testing & RDKit comparison`

Highlights:

For maintainers: update and run the test suite with `bun test`. Set `RUN_RDKIT_BULK=1` to enable the heavier RDKit bulk comparisons when you have RDKit available.

`Validation`

`Installation`

```bash
npm install openchem
# or
bun add openchem
# or
pnpm add openchem
```

`Usage`

`$3`

For comprehensive working examples, see:

Run any example:

```bash
bun run docs/examples/comprehensive-example.ts
```

`$3`

The repository contains two long-running RDKit comparison tests (the 10k SMILES suite and the bulk 300-SMILES suite). These tests are skipped by default to keep regular test runs fast.

To run them set the RUN_RDKIT_BULK environment variable:

```bash
# Run heavy RDKit comparisons (rdkit-10k and rdkit-bulk)
RUN_RDKIT_BULK=1 bun test
```

Add RUN_VERBOSE=1 for more detailed RDKit reporting during the run.

```typescript
import { parseSMILES } from "openchem";

// Simple molecule
const ethanol = parseSMILES("CCO");
console.log(ethanol.molecules[0].atoms.length); // 3

// Check for errors
const result = parseSMILES("invalid");
if (result.errors.length > 0) {
  console.error("Parse errors:", result.errors);
}
```

`$3`

openchem provides comprehensive molecular property calculations for drug discovery and cheminformatics applications.

#### Basic Properties

```typescript
import { parseSMILES, getMolecularFormula, getMolecularMass, getExactMass } from "openchem";

const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
const mol = aspirin.molecules[0];

// Get molecular formula (Hill notation)
const formula = getMolecularFormula(mol);
console.log(formula); // "C9H8O4"

// Get molecular mass (average atomic masses)
const mass = getMolecularMass(mol);
console.log(mass); // ≈ 180.16

// Get exact mass (most abundant isotope of each element)
const exactMass = getExactMass(mol);
console.log(exactMass); // ≈ 180.042
```

#### Atom Counts and Structure

```typescript
import {
  parseSMILES,
  getHeavyAtomCount,
  getHeteroAtomCount,
  getRingCount,
  getAromaticRingCount,
  getRingInfo,
} from "openchem";

const ibuprofen = parseSMILES("CC(C)Cc1ccc(cc1)C(C)C(=O)O");
const mol = ibuprofen.molecules[0];

// Count heavy atoms (non-hydrogen): 13 carbons + 2 oxygens
console.log(getHeavyAtomCount(mol)); // 15

// Count heteroatoms (N, O, S, P, halogens, etc.)
console.log(getHeteroAtomCount(mol)); // 2

// Count total rings
console.log(getRingCount(mol)); // 1

// Count aromatic rings
console.log(getAromaticRingCount(mol)); // 1

// Get comprehensive ring information
const ringInfo = getRingInfo(mol);
console.log(ringInfo.numRings()); // 1
console.log(ringInfo.rings()); // [[6,7,8,9,10,11]] - atom IDs in the ring
```

#### Drug-Likeness Properties

```typescript
import {
  parseSMILES,
  getFractionCSP3,
  getHBondDonorCount,
  getHBondAcceptorCount,
  getTPSA,
} from "openchem";

const caffeine = parseSMILES("CN1C=NC2=C1C(=O)N(C(=O)N2C)C");
const mol = caffeine.molecules[0];

// Fraction of sp3 carbons (structural complexity): 3 methyl carbons out of 8
console.log(getFractionCSP3(mol)); // 0.375

// H-bond donors (N-H, O-H)
console.log(getHBondDonorCount(mol)); // 0

// H-bond acceptors (N, O atoms)
console.log(getHBondAcceptorCount(mol)); // 6

// Topological polar surface area (Ų)
// Critical for predicting oral bioavailability and BBB penetration
console.log(getTPSA(mol)); // 61.82
```

#### TPSA for Drug Design

TPSA (Topological Polar Surface Area) is essential for predicting drug properties:

```typescript
import { parseSMILES, getTPSA } from "openchem";

// Oral bioavailability: TPSA < 140 Ų
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
console.log(getTPSA(aspirin.molecules[0])); // 63.60 ✓ Good oral availability
```

#### Drug-Likeness Rule Checkers

```typescript
import {
  parseSMILES,
  checkLipinskiRuleOfFive,
  checkVeberRules,
  checkBBBPenetration,
} from "openchem";

const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");

// Veber Rules (oral bioavailability)
const veber = checkVeberRules(aspirin.molecules[0]);
console.log(veber.passes); // true
console.log(veber.properties); // { rotatableBonds: 3, tpsa: 63.60 }
```

`$3`

```typescript
import { parseSMILES, generateSMILES } from "openchem";

// Stereo normalization matches RDKit
const trans1 = parseSMILES("C\\C=C\\C"); // trans (down markers)
console.log(generateSMILES(trans1.molecules[0])); // "C/C=C/C" - normalized to up markers

const trans2 = parseSMILES("C/C=C/C"); // trans (up markers)
console.log(generateSMILES(trans2.molecules[0])); // "C/C=C/C" - already normalized

// Generate simple (non-canonical) SMILES
const parsed = parseSMILES("CC(C)CC");
const simple = generateSMILES(parsed.molecules[0], false);
console.log(simple); // "CC(C)CC" - preserves input order

// Explicit canonical generation
const explicitCanonical = generateSMILES(parsed.molecules[0], true);
console.log(explicitCanonical); // "CCC(C)C"

// Handle multiple disconnected molecules
const mixture = parseSMILES("CCO.O"); // ethanol + water
const output = generateSMILES(mixture.molecules);
console.log(output); // "CCO.O"
```

`$3`

Render molecules as 2D SVG structures with automatic coordinate generation. openchem provides deterministic layouts, fast performance, and excellent handling of rings, branches, and terminal atoms.

#### Basic SVG Rendering

```typescript
import { parseSMILES, renderSVG } from "openchem";

// Or render directly from SMILES (if parsing is included)
const renderResult = renderSVG("CCO");
if (renderResult.errors.length === 0) {
  console.log(renderResult.svg);
}
```

#### SVG Rendering Options

```typescript
import { parseSMILES, renderSVG } from "openchem";
import type { SVGRendererOptions } from "openchem";

const benzene = parseSMILES("c1ccccc1");
const mol = benzene.molecules[0];

const options: SVGRendererOptions = {
  // Canvas sizing
  width: 400,
  height: 400,
  padding: 20,

  // Bond styling
  bondLineWidth: 2,
  bondLength: 40,
  bondColor: "#000000",

  // Atom & text styling
  fontSize: 14,
  fontFamily: "Arial, sans-serif",
  showCarbonLabels: false, // Hide C labels for cleaner appearance
  showImplicitHydrogens: false, // Hide implicit hydrogens

  // Color mapping by element
  atomColors: {
    C: "#222222",
    N: "#3050F8",
    O: "#FF0D0D",
    S: "#E6C200",
    F: "#50FF50",
    Cl: "#1FF01F",
    Br: "#A62929",
    I: "#940094",
  },

  // Background
  backgroundColor: "#FFFFFF",

  // Stereochemistry display
  showStereoBonds: true,

  // Layout & coordinate generation
  kekulize: true, // Convert aromatic to alternating single/double bonds (default: true)
  moleculeSpacing: 60, // Spacing between molecules in grid layouts
};

const result = renderSVG(mol, options);
console.log(result.svg); // Custom-styled SVG
```

#### Using Pre-computed Coordinates

```typescript
import { parseSMILES, renderSVG } from "openchem";

const ethanol = parseSMILES("CCO");
const mol = ethanol.molecules[0];

// Provide your own atom coordinates (useful for custom layouts)
const customCoords = [
  { x: 0, y: 0 }, // C
  { x: 40, y: 0 }, // C
  { x: 80, y: 0 }, // O
];

const result = renderSVG(mol, {
  atomCoordinates: customCoords,
  width: 200,
  height: 100,
});

console.log(result.svg);
```

#### Coordinate Generation Features

openchem's coordinate generator provides:

```typescript
import { parseSMILES, renderSVG } from "openchem";

// Complex fused ring system
const naphthalene = parseSMILES("c1ccc2ccccc2c1");
const result = renderSVG(naphthalene.molecules[0], {
  width: 300,
  height: 300,
  bondLength: 35,
});

console.log(result.svg);
```

#### Error Handling

```typescript
import { renderSVG } from "openchem";

const result = renderSVG("C");
if (result.errors.length > 0) {
  console.error("SVG rendering errors:", result.errors);
} else {
  console.log(result.svg);
}
```

`$3`

Match molecular patterns using SMARTS (SMILES Arbitrary Target Specification) notation.

```typescript
import { parseSMILES, parseSMARTS, matchSMARTS } from "openchem";

// Parse molecule and SMARTS pattern
const molecule = parseSMILES("CC(=O)Oc1ccccc1C(=O)O"); // aspirin
const pattern = parseSMARTS("[O;X1]"); // Oxygen with exactly one connection (carbonyl oxygen)

// Find matching atoms
const matches = matchSMARTS(molecule.molecules[0], pattern);
console.log(matches.length); // 2 (two carbonyl oxygens)
console.log(matches); // [[2], [11]] (atom indices, zero-based in SMILES order)
```

`$3`

Convert aromatic molecules to alternating single/double bond representations (Kekulé structures).

```typescript
import { parseSMILES, kekulize, generateSMILES, renderSVG } from "openchem";

// Parse aromatic molecule
const benzene = parseSMILES("c1ccccc1");
const mol = benzene.molecules[0];

// Convert to Kekulé structure
const kekuleMol = kekulize(mol);

// Generate SMILES from Kekulé form
const kekuleSMILES = generateSMILES(kekuleMol);
console.log(kekuleSMILES); // "C1=CC=CC=C1" or similar alternating structure

// SVG rendering automatically kekulizes (unless disabled)
const result = renderSVG(mol, {
  kekulize: true, // default: true
});
// Rendered SVG shows alternating single/double bonds
```

`$3`

Calculate LogP (partition coefficient) for predicting lipophilicity and membrane permeability.

```typescript
import { parseSMILES, computeLogP, crippenLogP } from "openchem";

const molecules = [
  "CC(=O)Oc1ccccc1C(=O)O", // aspirin
  "CC(C)Cc1ccc(cc1)C(C)C(=O)O", // ibuprofen
  "CC(=O)Nc1ccc(O)cc1", // acetaminophen
];

molecules.forEach((smiles) => {
  const mol = parseSMILES(smiles).molecules[0];

  // Wildman-Crippen method
  const logP = computeLogP(mol);
  console.log(`${smiles.substring(0, 10)}... LogP: ${logP.toFixed(2)}`);

  // Alternative: crippenLogP (alias)
  const logP2 = crippenLogP(mol);
  console.log(` Crippen LogP: ${logP2.toFixed(2)}`);
});

// LogP guidelines for drug design
const caffeine = parseSMILES("CN1C=NC2=C1C(=O)N(C(=O)N2C)C");
const caffeineMol = caffeine.molecules[0];
const logpValue = computeLogP(caffeineMol);
```

`$3`

```typescript
import { parseSMILES, BondType } from "openchem";

const result = parseSMILES("C=C");
const mol = result.molecules[0];

// Access bonds
mol.bonds.forEach((bond) => {
  console.log(`Bond ${bond.atom1}-${bond.atom2}`);
  console.log(` Type: ${bond.type === BondType.DOUBLE ? "DOUBLE" : "SINGLE"}`);
});
```

`Running Tests`

```bash
# Run all tests (includes RDKit comparisons)
bun test

# Run with Node.js
npm test

# Run specific test file
bun test test/parser.test.ts
```

`API Reference`

`$3`

openchem provides 40 functions organized into 8 categories:

Parsing & Generation (8)

Pattern Matching & Rendering (6)

Scaffold Analysis (5)

Tautomer Analysis (2)

- `enumerateTautomers` - Generate all tautomers with RDKit scoring
- `canonicalTautomer` - Select highest-scoring canonical tautomer

Basic Properties (3)

- `getMolecularFormula` - Hill notation formula
- `getMolecularMass` - Average molecular mass
- `getExactMass` - Exact mass (monoisotopic)

Lipophilicity (3)

- `computeLogP` - Wildman-Crippen partition coefficient
- `crippenLogP` - Alias for `computeLogP`
- `logP` - Alternative LogP calculation

Structural Properties (8)

Drug-Likeness (5)

---

`$3`

#### Parsing & Generation (8 functions)

##### parseSMILES(smiles: string): ParseResult

Parses a SMILES string into molecule structures.

Returns: ParseResult containing:

- `molecules: Molecule[]` — Array of parsed molecules
- `errors: string[]` — Parse/validation errors (empty if successful)

##### generateSMILES(input: Molecule | Molecule[], canonical?: boolean): string

Generates SMILES from molecule structure(s).

Parameters:

- `input` — Single molecule or array of molecules
- `canonical` — Generate canonical SMILES (default: `true`)

Returns: SMILES string (uses . to separate disconnected molecules)

Canonical SMILES features:

##### generateMolfile(molecule: Molecule, options?: MolGeneratorOptions): string

Generates a MOL file (V2000 format) from a molecule structure. Matches RDKit's output structure for compatibility with cheminformatics tools.

Parameters:

Returns: MOL file content as string with V2000 format

Features:

Example:

```typescript
import { parseSMILES, generateMolfile } from "openchem";

const result = parseSMILES("CCO");
const molfile = generateMolfile(result.molecules[0]);
console.log(molfile);
// Output: MOL file with header, atom coordinates, bond connectivity, etc.
```

##### parseMolfile(input: string): MolfileParseResult

Parses a MOL file (MDL Molfile format) into a molecule structure. Supports both V2000 and V3000 formats with comprehensive validation.

Parameters:

- input — MOL file content as a string

Returns: MolfileParseResult containing:

Supported formats:

- V2000: Classic fixed-width format (most common)
- V3000: Extended format with additional features

Validation features:

Parsed features:

Limitations:

- SGroups are parsed but not converted to molecule structure
- Query atoms/bonds are not supported

Example:

`typescript import { parseMolfile, generateSMILES } from "openchem";

const molContent =
ethanol
openchem

const result = parseMolfile(molContent); if (result.errors.length === 0) { console.log(result.molecule?.atoms.length); // 3 console.log(result.molecule?.bonds.length); // 2

// Convert to SMILES const smiles = generateSMILES(result.molecule!); console.log(smiles); // "CCO" }

// Error handling const invalid = parseMolfile("invalid content"); if (invalid.errors.length > 0) { console.error("Parse errors:", invalid.errors); }`

Round-trip workflow:

`typescript import { parseSMILES, generateMolfile, parseMolfile, generateSMILES } from "openchem";

##### parseSDF(input: string): SDFParseResult

Parses an SDF (Structure-Data File) into molecule structures with associated properties. SDF files can contain multiple molecules, each with a MOL block and optional property fields.

Parameters:

- input — SDF file content as a string

Returns: SDFParseResult containing:

- `records: SDFRecord[]` — Array of parsed records
- `errors: ParseError[]` — Global parse errors (empty if successful)

Record structure (SDFRecord):

Features:

Example (single record):

`typescript import { parseSDF, generateSMILES } from "openchem";

const sdfContent =
Mrv2311 02102409422D

>
Ethanol

>
C2H6O

$$$$
;

// Convert to SMILES const smiles = generateSMILES(record.molecule!); console.log(smiles); // "CCO" }

// Error handling if (result.records[0].errors.length > 0) { console.error("Record errors:", result.records[0].errors); }`

Example (multiple records):

`typescript import { parseSDF } from "openchem";

const multiRecordSDF =
Mrv2311 02102409422D

1 0 0 0 0 0 999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
M END
>
1

>
Methane

$$$$

Mrv2311 02102409422D

2 1 0 0 0 0 999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.5000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
M END
>
2

>
Ethane

$$$$
;

Round-trip workflow:

`typescript import { parseSMILES, writeSDF, parseSDF, generateSMILES } from "openchem";

##### `generateInChI(molecule: Molecule): Promise<string>`

Returns: Promise resolving to InChI string

Example:

`typescript import { parseSMILES, generateInChI } from "openchem";

##### `generateInChIKey(inchi: string): Promise<string>`

Generates an InChIKey (a hashed, fixed-length version of InChI) from an InChI string. InChIKeys are commonly used for database indexing and fast lookups.

Parameters:

- inchi — InChI string to convert

Returns: Promise resolving to InChIKey string (27 characters)

Example:

```typescript
const inchikey = await generateInChIKey(inchi);
console.log(inchikey); // "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
```
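The 27-character length comes from the standard InChIKey layout: a 14-letter connectivity block, a hyphen, a 10-letter block, a hyphen, and a single trailing letter. A minimal sketch of a shape check follows; `looksLikeInChIKey` is a hypothetical helper written for illustration and is not an openchem export.

```typescript
// Hypothetical helper (not part of openchem): checks only the *shape* of a key,
// not whether it corresponds to a real structure.
const INCHIKEY_PATTERN = /^[A-Z]{14}-[A-Z]{10}-[A-Z]$/;

function looksLikeInChIKey(value: string): boolean {
  return INCHIKEY_PATTERN.test(value);
}

console.log(looksLikeInChIKey("BSYNRYMUTXBXSQ-UHFFFAOYSA-N")); // true
console.log(looksLikeInChIKey("BSYNRYMUTXBXSQ-UHFFFAOYSA")); // false (last block missing)
```

A check like this can be useful before using keys as database lookup values, since a truncated key fails fast instead of silently missing.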

##### writeSDF(records: SDFRecord | SDFRecord[], options?: SDFWriterOptions): SDFWriterResult

Parameters:

Returns: SDFWriterResult containing:

- `sdf: string` — Complete SDF file content
- `errors: string[]` — Any errors encountered (empty if successful)

Record format:

```typescript
interface SDFRecord {
  molecule: Molecule;
  properties?: Record<string, string | number | boolean>;
}
```

SDF structure:

- MOL block (V2000 format) for each molecule
- Property fields (`> <NAME>` header, value, blank line)
- Record separator (`$$$$`)

Example (single molecule):

`typescript import { parseSMILES, writeSDF } from "openchem";

console.log(result.sdf); // Output: SDF file with MOL block + properties + $$$$`

Example (multiple molecules):

```typescript
import { parseSMILES, writeSDF } from "openchem";

const drugs = [
  { smiles: "CC(=O)Oc1ccccc1C(=O)O", name: "aspirin" },
  { smiles: "CC(C)Cc1ccc(cc1)C(C)C(=O)O", name: "ibuprofen" },
  { smiles: "CC(=O)Nc1ccc(O)cc1", name: "acetaminophen" },
];

const records = drugs.map((drug) => {
  const mol = parseSMILES(drug.smiles).molecules[0];
  return {
    molecule: mol,
    properties: {
      NAME: drug.name,
      SMILES: drug.smiles,
    },
  };
});

const result = writeSDF(records, { programName: "my-drug-tool" });
console.log(result.sdf);
// Output: Multi-record SDF with all 3 molecules
```

Property formatting:

- Strings: Written as-is
- Numbers: Converted to strings
- Booleans: `"true"` or `"false"`
- Property names are case-sensitive
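To make the data-block layout concrete, here is a rough sketch of how the property portion of one record could be serialized under the formatting rules listed here. `formatSdfDataBlock` is a hypothetical function for illustration only; it is not how `writeSDF` is implemented, and it omits the MOL block that precedes the properties in a real record.

```typescript
// Hypothetical sketch (not openchem's writer): serializes only the property
// fields and the record separator of one SDF record.
type SdfPropertyValue = string | number | boolean;

function formatSdfDataBlock(properties: Record<string, SdfPropertyValue>): string {
  const lines: string[] = [];
  for (const name of Object.keys(properties)) {
    lines.push(`> <${name}>`); // data header
    lines.push(String(properties[name])); // numbers and booleans become strings
    lines.push(""); // blank line terminates the field
  }
  lines.push("$$$$"); // record separator
  return lines.join("\n") + "\n";
}

console.log(formatSdfDataBlock({ NAME: "aspirin", ACTIVE: true }));
```

Because names are emitted verbatim, `NAME` and `name` would end up as two distinct fields, which is what case-sensitivity means in practice.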

Compatibility:

- Output compatible with RDKit, OpenBabel, ChemDraw, and other tools
- Standard SDF format (V2000 MOL blocks)
- Properties follow the MDL SDF specification

---

#### Pattern Matching & Rendering (6 functions)

##### renderSVG(input: string | Molecule | Molecule[] | ParseResult, options?: SVGRendererOptions): SVGRenderResult

Renders molecules as 2D SVG structures with automatic coordinate generation using webcola collision prevention.

Parameters:

- `input` — SMILES string, single molecule, array of molecules, or `ParseResult`
- `options` — Optional rendering configuration (see `SVGRendererOptions` below)

Returns: SVGRenderResult containing:

SVGRendererOptions:

Features:

##### parseSMARTS(smarts: string): ParseResult

Parses a SMARTS pattern string into a pattern molecule structure.

Returns: ParseResult containing:

- `molecules: Molecule[]` — Array with pattern molecule
- `errors: string[]` — Parse errors (empty if successful)

SMARTS support:

##### matchSMARTS(molecule: Molecule, pattern: ParseResult): number[][]

Finds all matches of a SMARTS pattern in a molecule.

Parameters:

- `molecule` — Target molecule to search
- `pattern` — SMARTS pattern (from `parseSMARTS()`)

Returns: Array of matches, where each match is an array of atom indices

Example:

```typescript
import { parseSMILES, parseSMARTS, matchSMARTS } from "openchem";

const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O").molecules[0];
const carbonyl = parseSMARTS("C=O"); // pass the ParseResult, per the signature above

const matches = matchSMARTS(aspirin, carbonyl);
// matches: [[1, 2], [10, 11]] (two carbonyl groups, zero-based indices in SMILES order)
```

##### kekulize(molecule: Molecule): Molecule

Converts aromatic molecules to alternating single/double bond (Kekulé) representation.

Returns: New molecule with aromatic bonds replaced by alternating single/double bonds

Example:

```typescript
import { parseSMILES, kekulize, generateSMILES } from "openchem";

const benzene = parseSMILES("c1ccccc1");
const kek = kekulize(benzene.molecules[0]);
console.log(generateSMILES(kek)); // "C1=CC=CC=C1"
```

##### computeMorganFingerprint(molecule: Molecule, radius?: number, fpSize?: number): Uint8Array

Generates a Morgan fingerprint (ECFP-like) for molecular similarity searching and compound classification. Uses a modified Morgan algorithm with atom typing and circular neighborhoods.

Parameters:

- `molecule` — Molecule to fingerprint
- `radius` — Fingerprint radius (default: 2, equivalent to ECFP4)
- `fpSize` — Fingerprint size in bits (default: 2048, RDKit standard)

Returns: Uint8Array containing the fingerprint bits

Example:

```typescript
import { parseSMILES, computeMorganFingerprint } from "openchem";

const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
const fingerprint = computeMorganFingerprint(aspirin.molecules[0], 2, 512);
console.log(fingerprint.length); // 64 (512 bits / 8 bits per byte)
```

##### tanimotoSimilarity(fp1: Uint8Array, fp2: Uint8Array): number

Calculates the Tanimoto similarity coefficient between two Morgan fingerprints. Measures structural similarity on a scale from 0 (no similarity) to 1 (identical).

Parameters:

- `fp1` — First fingerprint
- `fp2` — Second fingerprint

Returns: Similarity score between 0 and 1

Example:

```typescript
const similarity = tanimotoSimilarity(fingerprint1, fingerprint2);
console.log(`Similarity: ${(similarity * 100).toFixed(1)}%`);
```
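For readers unfamiliar with the coefficient: Tanimoto similarity is the number of bits set in both fingerprints divided by the number of bits set in either. The sketch below is an independent illustration, not openchem's implementation. It assumes bits are packed eight per byte in the `Uint8Array` and returns 0 when both fingerprints are empty (a convention chosen here; other tools may differ). `popcount` and `tanimotoSketch` are hypothetical names.

```typescript
// Illustrative only: a standalone Tanimoto on bit-packed fingerprints.
function popcount(byte: number): number {
  let count = 0;
  let b = byte & 0xff;
  while (b !== 0) {
    b &= b - 1; // clear the lowest set bit
    count++;
  }
  return count;
}

function tanimotoSketch(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) {
    throw new Error("Fingerprints must have the same length");
  }
  let both = 0;
  let either = 0;
  for (let i = 0; i < a.length; i++) {
    both += popcount(a[i] & b[i]); // bits set in both
    either += popcount(a[i] | b[i]); // bits set in at least one
  }
  return either === 0 ? 0 : both / either;
}

const fpA = Uint8Array.from([0b00001111, 0b00000000]);
const fpB = Uint8Array.from([0b00111100, 0b00000000]);
console.log(tanimotoSketch(fpA, fpB).toFixed(3)); // "0.333" (2 shared bits / 6 total bits)
```

Comparing fingerprints of different sizes is meaningless, which is why the sketch rejects mismatched lengths instead of padding.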

---

#### Scaffold Analysis (5 functions)

##### getMurckoScaffold(molecule: Molecule, options?: MurckoOptions): Molecule

Parameters:

- `molecule` — Molecule to analyze
- `options.includeLinkers` — Include linker atoms between rings (default: `true`)

Returns: New Molecule containing only the scaffold

Example:

```typescript
import { parseSMILES, getMurckoScaffold, generateSMILES } from "openchem";

const ibuprofen = parseSMILES("CC(C)Cc1ccc(cc1)C(C)C(=O)O").molecules[0];
const scaffold = getMurckoScaffold(ibuprofen);
console.log(generateSMILES(scaffold)); // "c1ccccc1" - benzene core
```

##### getBemisMurckoFramework(molecule: Molecule): Molecule

Returns: New Molecule with generic framework

Example:

```typescript
import { parseSMILES, getBemisMurckoFramework, generateSMILES } from "openchem";

const pyridine = parseSMILES("c1ccncc1").molecules[0];
const framework = getBemisMurckoFramework(pyridine);
console.log(generateSMILES(framework)); // "C1CCCCC1" - cyclohexane
```

##### getScaffoldTree(molecule: Molecule): Molecule[]

Generates a hierarchical scaffold tree by iteratively removing rings from the Murcko scaffold. Returns scaffolds ordered from most specific (full scaffold) to least specific (single ring).

Returns: Array of Molecule objects representing scaffolds at different levels

Example:

`typescript import { parseSMILES, getScaffoldTree, generateSMILES } from "openchem";

##### getGraphFramework(molecule: Molecule): Molecule

Generates a pure topological framework with all atoms converted to wildcard atoms (*). This represents the molecular graph structure without any atom type information.

Returns: New Molecule with graph framework

Example:

```typescript
import { parseSMILES, getGraphFramework, generateSMILES } from "openchem";

const caffeine = parseSMILES("CN1C=NC2=C1C(=O)N(C(=O)N2C)C").molecules[0];
const graph = getGraphFramework(caffeine);
console.log(generateSMILES(graph)); // SMILES made of wildcard atoms (*) only - pure topology
```

##### haveSameScaffold(mol1: Molecule, mol2: Molecule): boolean

Compares two molecules to determine if they share the same Murcko scaffold. Useful for compound series analysis and lead identification.

Returns: true if scaffolds match, false otherwise

Example:

`typescript import { parseSMILES, haveSameScaffold } from "openchem";

---

#### Tautomer Analysis (2 functions)

##### enumerateTautomers(molecule: Molecule, options?: TautomerOptions): TautomerResult[]

Enumerates all tautomers for a molecule using transform-based enumeration with RDKit-compatible scoring.

Options:

Returns: Array of TautomerResult objects with:

- `smiles: string` — SMILES representation
- `molecule: Molecule` — Molecule object
- `score: number` — Stability score (higher = more stable)
- `ruleIds: string[]` — Applied transformation rules

Scoring system (RDKit-inspired):

Example:

`typescript import { parseSMILES, enumerateTautomers } from "openchem";

const mol = parseSMILES("CC(=O)CC(=O)C").molecules[0]; // acetylacetone const tautomers = enumerateTautomers(mol, { maxTautomers: 16 });

##### canonicalTautomer(molecule: Molecule): Molecule

Selects the canonical (most stable) tautomer based on scoring.

Returns: The highest-scoring tautomer as a Molecule

Example:

```typescript
import { parseSMILES, canonicalTautomer, generateSMILES } from "openchem";

const mol = parseSMILES("CC(=O)CC(=O)C").molecules[0];
const canonical = canonicalTautomer(mol);
console.log(generateSMILES(canonical)); // "CC(=O)CC(=O)C" - diketo form preferred
```

---

#### Lipophilicity (3 functions)

##### computeLogP(molecule: Molecule): number

Calculates the LogP (partition coefficient) using the Wildman-Crippen method. LogP predicts lipophilicity and membrane permeability.

Returns: LogP value as a number

Interpretation:

- LogP < 0: Hydrophilic (water-loving)
- 0 ≤ LogP ≤ 5: Optimal range for most drugs
- LogP > 5: Lipophilic (fat-loving), may have poor water solubility

Example:

```typescript
import { parseSMILES, computeLogP } from "openchem";

const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O").molecules[0];
console.log(computeLogP(aspirin)); // 1.31 (good bioavailability)
```

##### crippenLogP(molecule: Molecule): number

Alias for computeLogP(). Alternative name for the Wildman-Crippen LogP calculation.

##### logP(molecule: Molecule): number

Alternative LogP calculation method. May use different fragment contributions than Crippen.

---

#### Basic Properties (3 functions)

##### getMolecularFormula(molecule: Molecule): string

Returns the molecular formula in Hill notation (C first, then H, then alphabetical).

Example: C9H8O4 for aspirin
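The Hill ordering rule can be sketched in a few lines. This is an illustration of the convention, not openchem's code: `hillFormula` is a hypothetical helper that takes pre-counted elements. Note the standard special case: when no carbon is present, all elements (including hydrogen) are simply sorted alphabetically.

```typescript
// Hypothetical helper (not an openchem export): formats element counts in Hill order.
function hillFormula(counts: Record<string, number>): string {
  const symbols = Object.keys(counts).filter((s) => counts[s] > 0);
  let ordered: string[];
  if (symbols.indexOf("C") !== -1) {
    // Carbon first, then hydrogen, then everything else alphabetically.
    const rest = symbols.filter((s) => s !== "C" && s !== "H").sort();
    ordered = ["C"];
    if (symbols.indexOf("H") !== -1) ordered.push("H");
    ordered = ordered.concat(rest);
  } else {
    // No carbon: strictly alphabetical, hydrogen included.
    ordered = symbols.slice().sort();
  }
  return ordered.map((s) => (counts[s] === 1 ? s : `${s}${counts[s]}`)).join("");
}

console.log(hillFormula({ O: 4, H: 8, C: 9 })); // "C9H8O4" (aspirin)
console.log(hillFormula({ O: 1, H: 2 })); // "H2O"
```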

##### getMolecularMass(molecule: Molecule): number

Returns the molecular mass using average atomic masses from the periodic table.

Example: approximately 180.16 for aspirin

##### getExactMass(molecule: Molecule): number

Returns the exact mass using the most abundant isotope for each element.

Example: 180.042 for aspirin
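The difference between the two mass functions is easiest to see by doing the arithmetic by hand. The sketch below is illustrative and independent of openchem: it sums standard average atomic masses and monoisotopic masses for C9H8O4. The mass tables contain only the few elements needed here, and `massFromCounts` is a hypothetical helper.

```typescript
// Illustrative tables (subset of elements), not openchem's internal data.
const AVERAGE_MASS: Record<string, number> = { H: 1.008, C: 12.011, N: 14.007, O: 15.999 };
const MONOISOTOPIC_MASS: Record<string, number> = {
  H: 1.00782503,
  C: 12.0,
  N: 14.003074,
  O: 15.9949146,
};

function massFromCounts(counts: Record<string, number>, table: Record<string, number>): number {
  let total = 0;
  for (const symbol of Object.keys(counts)) {
    const mass = table[symbol];
    if (mass === undefined) throw new Error(`No mass listed for ${symbol}`);
    total += mass * counts[symbol];
  }
  return total;
}

const aspirinCounts = { C: 9, H: 8, O: 4 };
console.log(massFromCounts(aspirinCounts, AVERAGE_MASS).toFixed(3)); // "180.159"
console.log(massFromCounts(aspirinCounts, MONOISOTOPIC_MASS).toFixed(3)); // "180.042"
```

The average value is what you weigh on a balance; the monoisotopic value is what a high-resolution mass spectrometer reports for the main peak.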

---

#### Structural Properties (8 functions)

##### getHeavyAtomCount(molecule: Molecule): number

Returns the count of non-hydrogen atoms.

Example: 15 for ibuprofen (13 carbons and 2 oxygens)

##### getHeteroAtomCount(molecule: Molecule): number

Returns the count of heteroatoms (any atom except C and H). Includes N, O, S, P, halogens, etc.

Example: 4 for aspirin (four oxygen atoms), 2 for ibuprofen (the two oxygen atoms of the COOH group)

##### getRingCount(molecule: Molecule): number

Returns the total number of rings in the molecule using cycle detection.

Example: 2 for naphthalene (2 fused rings)
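As a sanity check on ring counts, the size of an SSSR equals the cyclomatic number of the molecular graph: bonds − atoms + connected components. The sketch below uses that identity with a small union-find. It is a different technique from the cycle detection mentioned above and is not openchem's implementation; `ringCountSketch` is a hypothetical name.

```typescript
// Illustrative only: ring count via the cyclomatic number (bonds - atoms + components).
function ringCountSketch(atomCount: number, bonds: Array<[number, number]>): number {
  const parent: number[] = [];
  for (let i = 0; i < atomCount; i++) parent.push(i);

  // Union-find "find" with path halving.
  const find = (x: number): number => {
    while (parent[x] !== x) {
      parent[x] = parent[parent[x]];
      x = parent[x];
    }
    return x;
  };

  let components = atomCount;
  for (let i = 0; i < bonds.length; i++) {
    const rootA = find(bonds[i][0]);
    const rootB = find(bonds[i][1]);
    if (rootA !== rootB) {
      parent[rootA] = rootB; // merge two fragments
      components--;
    }
  }
  return bonds.length - atomCount + components;
}

// Naphthalene (c1ccc2ccccc2c1): 10 atoms, 9 chain bonds plus 2 ring closures.
const naphthaleneBonds: Array<[number, number]> = [
  [0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 9],
  [3, 8], [0, 9],
];
console.log(ringCountSketch(10, naphthaleneBonds)); // 2
```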

##### getAromaticRingCount(molecule: Molecule): number

Returns the number of aromatic rings.

Example: 1 for benzene, 2 for naphthalene

##### getRingInfo(molecule: Molecule): RingInformation

Returns a comprehensive ring information object providing access to SSSR (Smallest Set of Smallest Rings) and ring membership queries. Similar to RDKit's GetRingInfo() functionality.

Methods:

Example:

##### getFractionCSP3(molecule: Molecule): number

Returns the fraction of sp³-hybridized carbons (saturated carbons) relative to total carbons. Higher values indicate greater structural complexity and 3D character. Range: 0.0 to 1.0.

Example: 0.375 for caffeine (3 of 8 carbons), 0.46 for ibuprofen (6 of 13 carbons)

##### getHBondDonorCount(molecule: Molecule): number

Returns the count of hydrogen bond donors (N-H and O-H groups).

Example: 1 for aspirin (carboxylic acid O-H), 0 for caffeine

##### getHBondAcceptorCount(molecule: Molecule): number

Returns the count of hydrogen bond acceptors (N and O atoms).

Example: 4 for aspirin, 6 for caffeine

---

#### Drug-Likeness Properties (5 functions)

##### getTPSA(molecule: Molecule): number

Returns the Topological Polar Surface Area in Ų (square Ångströms) using the Ertl et al. fragment-based algorithm. TPSA is a key descriptor for predicting drug absorption and bioavailability.

Guidelines:

- TPSA < 140 Ų: Good oral bioavailability
- TPSA < 90 Ų: Likely blood-brain barrier penetration
- TPSA > 140 Ų: Poor membrane permeability

Example: 63.60 for aspirin (good oral availability), 52.93 for morphine (CNS-active)

##### getRotatableBondCount(molecule: Molecule): number

Returns the count of rotatable bonds (single non-ring bonds between non-terminal heavy atoms). Used in Veber rules for predicting oral bioavailability.

Example: 3 for aspirin, 4 for ibuprofen

##### checkLipinskiRuleOfFive(molecule: Molecule): LipinskiResult

Evaluates Lipinski's Rule of Five for oral drug-likeness. Returns result object with:

- `passes`: boolean indicating if all rules pass
- `violations`: array of violation messages
- `properties`: `{ molecularWeight, hbondDonors, hbondAcceptors, logP }`

Rules:

- Molecular weight ≤ 500 Da
- H-bond donors ≤ 5
- H-bond acceptors ≤ 10
- LogP ≤ 5
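The four thresholds translate directly into code. The following is a minimal sketch that works on already-computed descriptor values; it is not the library's `checkLipinskiRuleOfFive`, and the names `LipinskiInputs` and `lipinskiSketch` are hypothetical. The aspirin values are the ones quoted elsewhere in this README; the second input is made up to trigger violations.

```typescript
// Illustrative only: applies the four thresholds to precomputed descriptors.
interface LipinskiInputs {
  molecularWeight: number;
  hbondDonors: number;
  hbondAcceptors: number;
  logP: number;
}

interface LipinskiSketchResult {
  passes: boolean;
  violations: string[];
}

function lipinskiSketch(p: LipinskiInputs): LipinskiSketchResult {
  const violations: string[] = [];
  if (p.molecularWeight > 500) violations.push("Molecular weight > 500 Da");
  if (p.hbondDonors > 5) violations.push("H-bond donors > 5");
  if (p.hbondAcceptors > 10) violations.push("H-bond acceptors > 10");
  if (p.logP > 5) violations.push("LogP > 5");
  return { passes: violations.length === 0, violations };
}

const aspirinLike = lipinskiSketch({ molecularWeight: 180.16, hbondDonors: 1, hbondAcceptors: 4, logP: 1.31 });
console.log(aspirinLike.passes); // true

// Made-up descriptor values for a large, polar compound.
const tooLarge = lipinskiSketch({ molecularWeight: 812.0, hbondDonors: 6, hbondAcceptors: 12, logP: 3.2 });
console.log(tooLarge.violations.length); // 3
```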

##### checkVeberRules(molecule: Molecule): VeberResult

Evaluates Veber rules for oral bioavailability. Returns result object with:

- `passes`: boolean indicating if all rules pass
- `violations`: array of violation messages
- `properties`: `{ rotatableBonds, tpsa }`

Rules:

- Rotatable bonds ≤ 10
- TPSA ≤ 140 Ų

##### checkBBBPenetration(molecule: Molecule): BBBResult

Predicts blood-brain barrier penetration. Returns result object with:

- `likelyPenetration`: boolean (`true` if TPSA < 90 Ų)
- `tpsa`: TPSA value
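The Veber and BBB checks reduce to simple threshold comparisons once rotatable bonds and TPSA are known. The sketch below shows that logic on plain numbers. It is not the library's `checkVeberRules` or `checkBBBPenetration`; `veberSketch` and `bbbSketch` are hypothetical names, and the aspirin inputs are the values quoted in this README.

```typescript
// Illustrative only: threshold logic on precomputed descriptor values.
function veberSketch(rotatableBonds: number, tpsa: number): { passes: boolean; violations: string[] } {
  const violations: string[] = [];
  if (rotatableBonds > 10) violations.push("Rotatable bonds > 10");
  if (tpsa > 140) violations.push("TPSA > 140 Å²");
  return { passes: violations.length === 0, violations };
}

function bbbSketch(tpsa: number): { likelyPenetration: boolean; tpsa: number } {
  return { likelyPenetration: tpsa < 90, tpsa };
}

console.log(veberSketch(3, 63.6).passes); // true (aspirin)
console.log(bbbSketch(63.6).likelyPenetration); // true
console.log(bbbSketch(120).likelyPenetration); // false
```

A TPSA-only cutoff is a coarse heuristic, so a `true` result is best read as "not ruled out" rather than as a prediction of CNS activity.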

---

`$3`

```typescript
interface Molecule {
  atoms: Atom[];
  bonds: Bond[];
}

interface Bond {
  atom1: number;
  atom2: number;
  type: BondType;
  stereo: StereoType;
}

enum BondType {
  SINGLE = 1,
  DOUBLE = 2,
  TRIPLE = 3,
  QUADRUPLE = 4,
  AROMATIC = 5,
}
```

`Performance`

openchem is designed for production use with real-world performance:

Benchmark with 325 diverse molecules including commercial drugs: Average parse + generate round-trip < 5ms

`Architecture`

`$3`

#### Why Pre-compute Properties?

Molecular property calculations like ring finding, hybridization determination, and rotatable bond classification are computationally expensive (O(n²) complexity). Without pre-computation:

The Solution: Compute once during parsing, cache results, use everywhere.

#### Key Components

#### Cached Properties

#### Performance Impact

Benchmark Results (10,000 molecules, 7 properties each):

Complexity Improvements:

- Ring finding: Once per molecule (O(n²)) → subsequent queries O(1)
- Rotatable bonds: O(n×m) nested loops → O(n) array filter
- Property queries: 200× faster on average

#### Immutability Contract

Important: Molecules are immutable after parsing. All enriched properties remain valid for the lifetime of the molecule object. This design:

- Prevents stale cached properties (no mutation = no invalidation needed)
- Enables safe sharing across threads/workers
- Simplifies reasoning about molecule state

If you need to modify a molecule, create a new one by parsing updated SMILES.
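To illustrate why immutability makes cached properties safe, here is a small sketch of the compute-once-then-freeze idea. The types and the `enrich` function are hypothetical and deliberately simplified; they are not openchem's internal data structures or enrichment pipeline.

```typescript
// Illustrative types only: NOT openchem's internal representation.
interface RawMolecule {
  atoms: string[];
  bonds: Array<[number, number]>;
}

interface EnrichedMolecule {
  readonly atoms: ReadonlyArray<string>;
  readonly bonds: ReadonlyArray<readonly [number, number]>;
  readonly degree: ReadonlyArray<number>; // cached per-atom neighbor count
  readonly heavyAtomCount: number; // cached scalar property
}

function enrich(raw: RawMolecule): EnrichedMolecule {
  // Compute derived properties exactly once, at construction time.
  const degree: number[] = raw.atoms.map(() => 0);
  for (let i = 0; i < raw.bonds.length; i++) {
    degree[raw.bonds[i][0]] += 1;
    degree[raw.bonds[i][1]] += 1;
  }
  const heavyAtomCount = raw.atoms.filter((symbol) => symbol !== "H").length;

  // Freeze copies so later code cannot invalidate the cached values.
  return Object.freeze({
    atoms: Object.freeze(raw.atoms.slice()),
    bonds: Object.freeze(raw.bonds.map((bond) => Object.freeze([bond[0], bond[1]] as [number, number]))),
    degree: Object.freeze(degree),
    heavyAtomCount,
  });
}

const ethanol = enrich({ atoms: ["C", "C", "O"], bonds: [[0, 1], [1, 2]] });
console.log(ethanol.degree); // [ 1, 2, 1 ]
console.log(Object.isFrozen(ethanol)); // true
```

Because nothing can change after `enrich` returns, every later read of `degree` or `heavyAtomCount` is a plain property access with no invalidation logic.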

#### Design Notes

`Edge Cases & Limitations`

openchem handles 100% of tested SMILES correctly (325/325 in bulk validation).

Key implementation details:

- Canonical ordering: Atoms are ordered using a modified Morgan algorithm matching RDKit's approach, with tie-breaking by atomic number, degree, and other properties.

This implementation has been validated against RDKit's canonical SMILES output for diverse molecule sets including stereocenters, complex rings, heteroatoms, and 25 commercial pharmaceutical drugs.

`OpenSMILES Specification Compliance`

`$3`

OpenSMILES Recommendation: Start traversal on heteroatoms first, then terminals.

- Example preference: `OCCC` over `CCCO` for propanol
- Rationale: Heteroatoms are "more interesting" chemically

openchem Implementation: Canonical labels first, heteroatoms as tie-breaker.

- Example: Both `OCCC` and `CCCO` canonicalize to `CCCO`
- Rationale: Ensures 100% deterministic output for identical molecules

Why RDKit's Approach:

Impact: Minimal — affects only the order atoms appear in canonical output, not chemical meaning or validity. All SMILES remain valid OpenSMILES syntax.

$3

**OpenSMILES S