A fast TypeScript / JavaScript chemistry toolkit for working with molecular structures: parsing & generation (SMILES, MOL, SDF), canonicalization, pattern matching (SMARTS), 2D rendering, molecular descriptors, and structural analysis.
Production-ready, TypeScript-first library for cheminformatics — works in both browser and Node.js. openchem keeps a small runtime footprint.
- SMILES — Parse and generate canonical SMILES with full stereochemistry
- MOL files — V2000/V3000 format support with 2D coordinate generation
- SDF files — Multi-molecule files with property data
- InChI — Generate InChI and InChIKey identifiers
- IUPAC names — Bidirectional IUPAC ↔ SMILES conversion
- Pattern matching — SMARTS substructure search
- Fingerprints — Morgan (ECFP) fingerprints with Tanimoto similarity
- Murcko scaffolds — Extract core scaffolds, generic frameworks, scaffold trees
- Tautomers — Complete enumeration (25 rules, 100% RDKit coverage) with RDKit-compatible scoring
- Ring systems — SSSR detection, fused/spiro/bridged classification
- Aromaticity — Hückel rule perception and kekulization
- Symmetry — Canonical ordering via modified Morgan algorithm
- Stereochemistry — Full support for tetrahedral centers, E/Z bonds, extended chirality
- Basic — Formula, mass, atom/bond counts
- Structural — Valence electrons, amide bonds, spiro/bridgehead atoms, ring classifications
- Stereochemistry — Specified and unspecified stereocenter counting
- Drug-likeness — Lipinski's Rule of Five, Veber rules, BBB penetration
- Descriptors — TPSA, LogP, rotatable bonds, H-bond donors/acceptors
- Ring analysis — Saturated/aliphatic/heterocyclic ring counts
- 2D rendering — Publication-quality SVG with automatic layout
- Smart positioning — Overlap-aware fused ring placement
- Stereochemistry display — Wedge/hash bonds for chirality
- Customizable — Element colors, bond styles, canvas size
- ⚡ Fast — Optimized coordinate generation, CSR graph for O(1) lookups
- 🔬 Accurate — 100% RDKit agreement on canonical SMILES (325/325 molecules)
- ✅ Well-tested — 2,093 passing tests including bulk RDKit comparisons
- 🎯 Production-ready — Used with real drugs, natural products, edge cases
- 📦 Lightweight — Minimal dependencies, works in browser and Node.js
- 🔒 TypeScript-first — Full type safety with excellent IDE support
```bash
npm install openchem
# or
bun add openchem
```
```typescript
import { parseSMILES, renderSVG, Descriptors } from "openchem";

// Parse a molecule
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O").molecules[0];

// Render as SVG
const svg = renderSVG(aspirin);
console.log(svg.svg); // SVG markup ready for display

// Get all molecular properties at once
const props = Descriptors.all(aspirin);
console.log(props.formula); // "C9H8O4"
console.log(props.mass); // 180.16
console.log(props.logP); // 1.19
console.log(props.lipinskiPass); // true - aspirin is drug-like!

// Or get specific categories
const drugLike = Descriptors.drugLikeness(aspirin);
console.log(drugLike.lipinski.passes); // true
console.log(drugLike.lipinski.violations); // []
```
openchem includes an interactive HTML playground for testing SMILES parsing, molecular visualization, and descriptor calculation:
```bash
# Build the browser bundle and start a local server
bun run serve
```
The playground provides:
- 2D Structure Visualization — Clean SVG rendering of molecular structures
- Molecular Descriptors — Formula, mass, TPSA, rotatable bonds, etc.
- Drug-Likeness Checks — Lipinski's Rule of Five, Veber rules, BBB penetration
- Interactive Examples — Pre-loaded molecules like aspirin, caffeine, ibuprofen
The playground automatically detects if the full openchem library is available and falls back to approximate calculations if needed.
Note: The HTML playground requires a web server to load the openchem library due to ES module security restrictions. Use `bun run serve` to start a local server, then open http://localhost:3000/smiles-playground.html in your browser.

## Model Context Protocol (MCP) Server
The MCP server for AI assistant integration is now available as a separate package: @openchem/mcp
```bash
# Install MCP server
npm install -g @openchem/mcp

# Start server
openchem-mcp
# Server runs on http://localhost:3000
```

Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "openchem": {
      "url": "http://localhost:3000/mcp"
    }
  }
}
```

Restart Claude Desktop and try: _"Analyze aspirin using SMILES CC(=O)Oc1ccccc1C(=O)O"_
- analyze — Complete molecular analysis (40+ descriptors, drug-likeness, IUPAC name, optional rendering)
- compare — Molecular similarity (Morgan fingerprints, Tanimoto similarity, property comparison)
- search — Substructure matching (SMARTS patterns with match counts and indices)
- render — 2D structure visualization (publication-quality SVG)
- convert — Format conversion (canonical SMILES, IUPAC names, Murcko scaffolds)
- @openchem/mcp Package — Full MCP server documentation
- MCP Integration Guide — Complete integration guide (Claude Desktop, custom clients, deployment)
- MCP Server Reference — API documentation, tool schemas, examples
## Code Examples
```typescript
import {
  parseSMILES,
  generateSMILES,
  parseMolfile,
  generateMolfile,
  parseSDF,
  writeSDF,
  generateInChI,
  generateInChIKey,
} from "openchem";

// Parse SMILES into molecule structure
const result = parseSMILES("CC(=O)O"); // acetic acid
console.log(result.molecules[0].atoms.length); // 4 atoms
console.log(result.molecules[0].bonds.length); // 3 bonds

// Generate canonical SMILES
const canonical = generateSMILES(result.molecules[0]);
console.log(canonical); // "CC(=O)O"

// Parse MOL file (V2000 is fixed-width; the counts line must be line 4)
const molContent = `acetic acid


  4  3  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500   -1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  2  0  0  0  0
  2  4  1  0  0  0  0
M  END`;
const molResult = parseMolfile(molContent);
console.log(generateSMILES(molResult.molecule!)); // "CC(=O)O"

// Generate MOL file from SMILES
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
const molfile = generateMolfile(aspirin.molecules[0], { title: "aspirin" });
console.log(molfile); // Full MOL file with coordinates

// Parse SDF file
const sdfContent = `
  Mrv2311 02102409422D

  3  2  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END
> <ID>
MOL001

> <NAME>
Ethanol

$$$$`;
const sdfResult = parseSDF(sdfContent);
console.log(sdfResult.records[0].molecule?.atoms.length); // 3
console.log(sdfResult.records[0].properties.NAME); // "Ethanol"

// Generate InChI from molecule
const inchi = await generateInChI(aspirin.molecules[0]);
console.log(inchi); // "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"

// Generate InChIKey
const inchikey = await generateInChIKey(inchi);
console.log(inchikey); // "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
```
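To show what the property block of an SDF record looks like to a parser, here is a minimal standalone sketch. It does not use openchem, and `parseSDFProperties` is a hypothetical helper, not part of the library. It assumes each record's MOL block ends with an `M  END` line, reads `> <FIELD>` header lines, collects value lines up to the next blank line, and treats `$$$$` as the record separator.

```typescript
// Property name -> value (multi-line values are joined with "\n").
type SDFProperties = Record<string, string>;

// Hypothetical helper: extract the data fields of every record in an SDF string.
function parseSDFProperties(sdf: string): SDFProperties[] {
  const records: SDFProperties[] = [];
  // Records are terminated by a line containing only "$$$$".
  for (const block of sdf.split(/^\$\$\$\$[ \t]*$/m)) {
    if (block.trim() === "") continue; // text after the final delimiter
    const props: SDFProperties = {};
    const lines = block.split(/\r?\n/);
    // Data items start after the MOL block, which ends with "M  END".
    const molEnd = lines.findIndex((line) => line.startsWith("M  END"));
    let i = molEnd === -1 ? lines.length : molEnd + 1;
    while (i < lines.length) {
      // A data header starts with ">" and carries the field name in angle brackets.
      const header = /^>.*<([^>]+)>/.exec(lines[i]);
      i++;
      if (header === null) continue;
      const valueLines: string[] = [];
      // The value runs until the next blank line.
      while (i < lines.length && lines[i].trim() !== "") {
        valueLines.push(lines[i]);
        i++;
      }
      props[header[1]] = valueLines.join("\n");
    }
    records.push(props);
  }
  return records;
}

// Zero-atom MOL block, just to keep the sample short.
const sampleSDF = [
  "ethanol",
  "",
  "",
  "  0  0  0  0  0  0  0  0  0  0999 V2000",
  "M  END",
  "> <NAME>",
  "Ethanol",
  "",
  "> <NOTE>",
  "first line",
  "second line",
  "",
  "$$$$",
  "",
].join("\n");

console.log(JSON.stringify(parseSDFProperties(sampleSDF)));
// [{"NAME":"Ethanol","NOTE":"first line\nsecond line"}]
```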
```typescript
import { parseSMILES, computeMorganFingerprint, tanimotoSimilarity } from 'openchem';

// Generate fingerprints for similarity comparison
const aspirin = parseSMILES('CC(=O)Oc1ccccc1C(=O)O');
const ibuprofen = parseSMILES('CC(C)Cc1ccc(cc1)C(C)C(=O)O');
const fp1 = computeMorganFingerprint(aspirin.molecules[0], 2, 512);
const fp2 = computeMorganFingerprint(ibuprofen.molecules[0], 2, 512);

// Calculate structural similarity
const similarity = tanimotoSimilarity(fp1, fp2);
console.log(`Similarity: ${(similarity * 100).toFixed(1)}%`); // ~45.2%
```
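For intuition about the number printed above: the Tanimoto coefficient is the count of bits set in both fingerprints divided by the count of bits set in either. The standalone sketch below represents a fingerprint as a set of on-bit indices, which is an assumption made for illustration; it does not describe how openchem stores fingerprints internally.

```typescript
// Tanimoto (Jaccard) coefficient on fingerprints given as sets of on-bit indices:
// |A ∩ B| / |A ∪ B|, where |A ∪ B| = |A| + |B| - |A ∩ B|.
function tanimoto(a: ReadonlySet<number>, b: ReadonlySet<number>): number {
  let common = 0;
  a.forEach((bit) => {
    if (b.has(bit)) common++;
  });
  const union = a.size + b.size - common;
  // Two empty fingerprints: return 0 to avoid dividing by zero (a convention chosen here).
  return union === 0 ? 0 : common / union;
}

const fpA = new Set([1, 5, 9, 42]);
const fpB = new Set([1, 9, 42, 100, 200]);
console.log(tanimoto(fpA, fpB).toFixed(3)); // 0.500 (3 shared bits / 6 bits set overall)
```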
Extract core molecular scaffolds for drug discovery and compound classification:
```typescript
import {
  parseSMILES,
  getMurckoScaffold,
  getBemisMurckoFramework,
  generateSMILES,
  haveSameScaffold,
} from "openchem";

// Extract scaffold (rings + linkers, remove side chains)
const ibuprofen = parseSMILES("CC(C)Cc1ccc(cc1)C(C)C(=O)O").molecules[0];
const scaffold = getMurckoScaffold(ibuprofen);
console.log(generateSMILES(scaffold)); // "c1ccccc1" - benzene core

// Get generic framework (all atoms → carbon, all bonds → single)
const framework = getBemisMurckoFramework(ibuprofen);
console.log(generateSMILES(framework)); // "C1CCCCC1" - cyclohexane

// Compare scaffolds of similar drugs
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O").molecules[0];
console.log(haveSameScaffold(ibuprofen, aspirin)); // true - both have benzene scaffold
```
Applications:
- Compound library classification
- Lead series identification
- Scaffold hopping strategies
- Fragment-based drug design
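One common way to approximate a Murcko scaffold is to repeatedly strip terminal (degree-1) atoms until only ring atoms and the linkers between rings remain. The sketch below applies that pruning to a plain adjacency list. It is not openchem's `getMurckoScaffold`: it ignores bond orders, so exocyclic double-bonded atoms (such as a ring C=O oxygen), which some Murcko implementations keep, are removed here.

```typescript
// Adjacency list: neighbors[i] lists the atom indices bonded to atom i.
type Adjacency = number[][];

// Returns the indices of atoms that survive repeated removal of degree-1 atoms.
function murckoCore(neighbors: Adjacency): number[] {
  const n = neighbors.length;
  const degree = neighbors.map((nbrs) => nbrs.length);
  const removed = new Array<boolean>(n).fill(false);
  // Start from every current leaf; removing a leaf can turn its neighbor into a leaf.
  const queue: number[] = [];
  for (let i = 0; i < n; i++) {
    if (degree[i] <= 1) queue.push(i);
  }
  while (queue.length > 0) {
    const atom = queue.pop()!;
    if (removed[atom]) continue;
    removed[atom] = true;
    for (const nbr of neighbors[atom]) {
      if (!removed[nbr]) {
        degree[nbr]--;
        if (degree[nbr] === 1) queue.push(nbr);
      }
    }
  }
  const core: number[] = [];
  for (let i = 0; i < n; i++) {
    if (!removed[i]) core.push(i);
  }
  return core;
}

// Ibuprofen, SMILES "CC(C)Cc1ccc(cc1)C(C)C(=O)O", heavy atoms numbered in SMILES order.
const ibuprofenBonds: [number, number][] = [
  [0, 1], [1, 2], [1, 3], [3, 4], [4, 5], [5, 6], [6, 7],
  [7, 8], [8, 9], [9, 4], [7, 10], [10, 11], [10, 12], [12, 13], [12, 14],
];
const ibuprofenAdj: Adjacency = Array.from({ length: 15 }, (): number[] => []);
for (const [a, b] of ibuprofenBonds) {
  ibuprofenAdj[a].push(b);
  ibuprofenAdj[b].push(a);
}
console.log(JSON.stringify(murckoCore(ibuprofenAdj))); // [4,5,6,7,8,9] (the benzene ring)
```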
Enumerate and score tautomers (keto-enol, imine-enamine, amide-imidol, etc.) with RDKit-compatible scoring:
```typescript
import { parseSMILES, enumerateTautomers, canonicalTautomer, generateSMILES } from "openchem";

// Enumerate tautomers for acetylacetone (pentane-2,4-dione)
const mol = parseSMILES("CC(=O)CC(=O)C").molecules[0];
const tautomers = enumerateTautomers(mol, { maxTautomers: 16 });
console.log(`Found ${tautomers.length} tautomers:`);
tautomers.forEach((t, i) => {
  console.log(`${i + 1}. ${t.smiles} (score: ${t.score})`);
});

// Get canonical tautomer (highest scoring)
const canonical = canonicalTautomer(mol);
console.log(`Canonical: ${generateSMILES(canonical)}`);
```
Supported tautomer types (26 rules, 100% RDKit coverage):
- 1,3 and 1,5 keto-enol (carbonyl ↔ enol, conjugated systems)
- Imine-enamine (C=N ↔ C-NH, including aromatic special cases)
- 1,5/1,7/1,9/1,11 aromatic heteroatom H shift (pyrrole, indole, large heterocycles)
- Furanone (lactone tautomerism in 5-membered rings)
- Amide-imidol (N-C=O ↔ N=C-OH)
- Lactam-lactim (cyclic amide ↔ cyclic imidate)
- Nitro-aci-nitro, nitroso-oxime, oxim/nitroso via phenol
- Thione-thiol (C=S ↔ C-SH)
- Guanidine, tetrazole, imidazole (heterocycle tautomerism)
- Phosphonic acid, sulfoxide (P/S heteroatom shifts)
- Edge cases: keten/ynol, cyano/isocyanic acid, formamidinesulfinic acid, isocyanide
Scoring system (RDKit-compatible):
- +250 per all-carbon aromatic ring (benzene)
- +100 per heteroaromatic ring (pyridine)
- +25 for benzoquinone patterns
- +4 for oximes (C=N-OH)
- +2 for carbonyls (C=O, N=O, P=O)
- -10 per formal charge
- -4 for aci-nitro forms
- -1 per hydrogen on P, S, Se, Te
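To illustrate how the additive scores above combine, the sketch below totals the weights for a tautomer described by feature counts. The `TautomerFeatures` shape is hypothetical: openchem (like RDKit) derives these counts from substructure matches, which the sketch does not do, and applying every weight once per occurrence is an assumption for the entries not phrased "per ...".

```typescript
// Hypothetical feature counts for one tautomer.
interface TautomerFeatures {
  carbocyclicAromaticRings: number; // all-carbon aromatic rings (e.g. benzene)
  heteroaromaticRings: number; // aromatic rings containing a heteroatom (e.g. pyridine)
  benzoquinones: number;
  oximes: number; // C=N-OH
  carbonyls: number; // C=O, N=O, P=O
  formalCharges: number;
  aciNitro: number;
  hydrogensOnPSSeTe: number; // H atoms attached to P, S, Se or Te
}

// Weighted sum using the scores listed above.
function scoreTautomer(f: TautomerFeatures): number {
  return (
    250 * f.carbocyclicAromaticRings +
    100 * f.heteroaromaticRings +
    25 * f.benzoquinones +
    4 * f.oximes +
    2 * f.carbonyls -
    10 * f.formalCharges -
    4 * f.aciNitro -
    1 * f.hydrogensOnPSSeTe
  );
}

const noFeatures: TautomerFeatures = {
  carbocyclicAromaticRings: 0, heteroaromaticRings: 0, benzoquinones: 0, oximes: 0,
  carbonyls: 0, formalCharges: 0, aciNitro: 0, hydrogensOnPSSeTe: 0,
};
// Phenol (aromatic enol form) vs. its keto tautomer, cyclohexa-2,4-dien-1-one.
const phenolForm: TautomerFeatures = { ...noFeatures, carbocyclicAromaticRings: 1 };
const ketoForm: TautomerFeatures = { ...noFeatures, carbonyls: 1 };
console.log(scoreTautomer(phenolForm), scoreTautomer(ketoForm)); // 250 2
```

The aromatic ring bonus dominates, so the phenol form would be chosen as the canonical (highest-scoring) tautomer.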
Applications:
- Compound standardization for databases
- Virtual screening preparation
- pKa prediction support
- Tautomer-aware structure searching
```typescript
import { parseSMILES, renderSVG } from "openchem";

// Render molecule as SVG
const caffeine = parseSMILES("CN1C=NC2=C1C(=O)N(C(=O)N2C)C");
const svgResult = renderSVG(caffeine.molecules[0], {
  width: 300,
  height: 200,
  showCarbonLabels: false,
  bondLength: 30,
});
console.log(svgResult.svg); // Complete SVG markup
console.log(`Canvas: ${svgResult.width}x${svgResult.height}`); // "Canvas: 300x200"
```
openchem has an extensive test suite (unit, integration, and RDKit comparison tests) that exercises parsing, generation, file round-trips, stereochemistry, aromatic perception, and molecular properties. Rather than rely on fragile hard-coded counts in the README, the project keeps comprehensive automated tests in the test/ folder and runs RDKit parity checks as part of the comparison test suite when RDKit is available.
Highlights:
- Broad unit and integration coverage across parsers, generators, utils, and validators
- RDKit comparison tests for canonical SMILES and round-trip fidelity (these run when RDKit is available in the test environment)
- Tests are designed to be self-contained and to skip RDKit-specific checks when RDKit isn't present in the environment
For maintainers: update and run the test suite with bun test. Use RUN_RDKIT_BULK=1 to enable the heavier RDKit bulk comparisons when you have RDKit available.
```bash
npm install openchem
# or
bun add openchem
# or
pnpm add openchem
```
For comprehensive working examples, see:
- docs/examples/comprehensive-example.ts — All major features (SMILES, properties, IUPAC, InChI, SVG, SMARTS, fingerprints)
- docs/examples/example-iupac.ts — IUPAC name generation and parsing (both directions)
- docs/examples/example-aromaticity.ts — Aromaticity perception using Hückel's rule
- docs/examples/example-drug-likeness.ts — Drug-likeness assessment (Lipinski, Veber, BBB)
- docs/examples/example-murcko-scaffolds.ts — Murcko scaffold extraction and analysis
- docs/examples/example-tautomers.ts — Tautomer enumeration and canonical selection
- docs/examples/example-sdf-export.ts — SDF file generation
Run any example:

```bash
bun run docs/examples/comprehensive-example.ts
```
The repository contains two long-running RDKit comparison tests (the 10k SMILES suite and the bulk 300-SMILES suite). These tests are skipped by default to keep regular test runs fast.
To run them set the RUN_RDKIT_BULK environment variable:
```bash
# Run heavy RDKit comparisons (rdkit-10k and rdkit-bulk)
RUN_RDKIT_BULK=1 bun test
```
Add RUN_VERBOSE=1 for more detailed RDKit reporting during the run.
```typescript
import { parseSMILES } from "openchem";

// Simple molecule
const ethanol = parseSMILES("CCO");
console.log(ethanol.molecules[0].atoms.length); // 3

// Check for errors
const result = parseSMILES("invalid");
if (result.errors.length > 0) {
  console.error("Parse errors:", result.errors);
}

// Complex molecule with stereochemistry
const lAlanine = parseSMILES("C[C@H](N)C(=O)O");
const chiralCenter = lAlanine.molecules[0].atoms.find((a) => a.chiral);
console.log(chiralCenter?.chiral); // '@'
```
openchem provides comprehensive molecular property calculations for drug discovery and cheminformatics applications.
#### Basic Properties
```typescript
import { parseSMILES, getMolecularFormula, getMolecularMass, getExactMass } from "openchem";

const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
const mol = aspirin.molecules[0];

// Get molecular formula (Hill notation)
const formula = getMolecularFormula(mol);
console.log(formula); // "C9H8O4"

// Get molecular mass (average atomic masses)
const mass = getMolecularMass(mol);
console.log(mass); // ~180.16

// Get exact mass (monoisotopic: most abundant isotope of each element)
const exactMass = getExactMass(mol);
console.log(exactMass); // ~180.042
```
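To make the difference between the two masses concrete, the sketch below builds a Hill-notation formula and both mass values from element counts. The mass tables are abbreviated to H, C, N and O (standard atomic weights for the average mass, most-abundant-isotope masses for the exact mass); they are an illustration, not openchem's internal tables.

```typescript
type ElementCounts = Record<string, number>;

// Standard atomic weights (average over natural isotopes), abbreviated table.
const AVERAGE_MASS: Record<string, number> = { H: 1.008, C: 12.011, N: 14.007, O: 15.999 };
// Mass of the most abundant isotope of each element (1H, 12C, 14N, 16O).
const MONOISOTOPIC_MASS: Record<string, number> = {
  H: 1.00782503, C: 12.0, N: 14.00307401, O: 15.99491462,
};

// Hill notation: C first, then H, then the rest alphabetically; no carbon means all alphabetical.
function hillFormula(counts: ElementCounts): string {
  const present = Object.keys(counts).filter((el) => counts[el] > 0);
  let ordered: string[];
  if (present.indexOf("C") !== -1) {
    const others = present.filter((el) => el !== "C" && el !== "H").sort();
    ordered = present.indexOf("H") !== -1 ? ["C", "H"].concat(others) : ["C"].concat(others);
  } else {
    ordered = present.sort();
  }
  // A count of 1 is written without a number.
  return ordered.map((el) => (counts[el] === 1 ? el : `${el}${counts[el]}`)).join("");
}

function totalMass(counts: ElementCounts, table: Record<string, number>): number {
  let total = 0;
  for (const el of Object.keys(counts)) {
    const m = table[el];
    if (m === undefined) throw new Error(`No mass for element ${el}`);
    total += m * counts[el];
  }
  return total;
}

const aspirinCounts: ElementCounts = { C: 9, H: 8, O: 4 };
console.log(hillFormula(aspirinCounts)); // C9H8O4
console.log(totalMass(aspirinCounts, AVERAGE_MASS).toFixed(2)); // 180.16
console.log(totalMass(aspirinCounts, MONOISOTOPIC_MASS).toFixed(3)); // 180.042
```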
#### Atom Counts and Structure
```typescript
import {
  parseSMILES,
  getHeavyAtomCount,
  getHeteroAtomCount,
  getRingCount,
  getAromaticRingCount,
  getRingInfo,
} from "openchem";

const ibuprofen = parseSMILES("CC(C)Cc1ccc(cc1)C(C)C(=O)O");
const mol = ibuprofen.molecules[0];

// Count heavy atoms (non-hydrogen): 13 C + 2 O
console.log(getHeavyAtomCount(mol)); // 15

// Count heteroatoms (N, O, S, P, halogens, etc.)
console.log(getHeteroAtomCount(mol)); // 2

// Count total rings
console.log(getRingCount(mol)); // 1

// Count aromatic rings
console.log(getAromaticRingCount(mol)); // 1

// Get comprehensive ring information
const ringInfo = getRingInfo(mol);
console.log(ringInfo.numRings()); // 1
console.log(ringInfo.rings()); // e.g. [[4, 5, 6, 7, 8, 9]] - IDs of the ring atoms (0-based, SMILES order)
```
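Counting rings does not require finding them: the number of rings in the smallest set of smallest rings (SSSR) equals the cyclomatic number, bonds minus atoms plus connected components. The standalone sketch below computes that with a small union-find over a bond list; it only counts rings, and listing the ring atoms is what `getRingInfo` is for.

```typescript
// Number of independent rings = bonds - atoms + connected components.
function ringCount(atomCount: number, bonds: ReadonlyArray<[number, number]>): number {
  const parent = Array.from({ length: atomCount }, (_, i) => i);
  const find = (x: number): number => {
    while (parent[x] !== x) {
      parent[x] = parent[parent[x]]; // path halving keeps the trees shallow
      x = parent[x];
    }
    return x;
  };
  let components = atomCount;
  for (const [a, b] of bonds) {
    const ra = find(a);
    const rb = find(b);
    if (ra !== rb) {
      parent[ra] = rb; // merge two components
      components--;
    }
  }
  return bonds.length - atomCount + components;
}

// Naphthalene, SMILES "c1ccc2ccccc2c1": 10 atoms, 11 bonds, 1 component.
const naphthaleneBonds: [number, number][] = [
  [0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 3], [8, 9], [9, 0],
];
console.log(ringCount(10, naphthaleneBonds)); // 2
```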
#### Drug-Likeness Properties
```typescript
import {
  parseSMILES,
  getFractionCSP3,
  getHBondDonorCount,
  getHBondAcceptorCount,
  getTPSA,
} from "openchem";

const caffeine = parseSMILES("CN1C=NC2=C1C(=O)N(C(=O)N2C)C");
const mol = caffeine.molecules[0];

// Fraction of sp3 carbons (structural complexity): 3 methyl carbons out of 8
console.log(getFractionCSP3(mol)); // 0.375

// H-bond donors (N-H, O-H)
console.log(getHBondDonorCount(mol)); // 0

// H-bond acceptors (N, O atoms)
console.log(getHBondAcceptorCount(mol)); // 6

// Topological polar surface area (Ų)
// Critical for predicting oral bioavailability and BBB penetration
console.log(getTPSA(mol)); // 61.82
```
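Fraction Csp3 is the number of sp3-hybridized carbons divided by the total carbon count. Caffeine has eight carbons, of which only the three N-methyl carbons are sp3, so Fsp3 = 3/8 = 0.375. This sketch assumes each carbon's hybridization has already been perceived; the `Hybridization` type is hypothetical and not an openchem export.

```typescript
type Hybridization = "sp" | "sp2" | "sp3";

// Fraction of carbon atoms that are sp3-hybridized (Fsp3).
function fractionCSP3(carbonHybridizations: readonly Hybridization[]): number {
  if (carbonHybridizations.length === 0) return 0; // no carbons: defined as 0 in this sketch
  const sp3 = carbonHybridizations.filter((h) => h === "sp3").length;
  return sp3 / carbonHybridizations.length;
}

// Caffeine: three N-CH3 carbons (sp3) and five ring/carbonyl carbons (sp2).
const caffeineCarbons: Hybridization[] = ["sp3", "sp3", "sp3", "sp2", "sp2", "sp2", "sp2", "sp2"];
console.log(fractionCSP3(caffeineCarbons)); // 0.375
```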
#### TPSA for Drug Design
TPSA (Topological Polar Surface Area) is essential for predicting drug properties:
```typescript
import { parseSMILES, getTPSA } from "openchem";

// Oral bioavailability: TPSA < 140 Ų
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
console.log(getTPSA(aspirin.molecules[0])); // 63.60 ✓ Good oral availability

// Blood-brain barrier penetration: TPSA < 90 Ų
const morphine = parseSMILES("CN1CC[C@]23[C@@H]4[C@H]1CC5=C2C(=C(C=C5)O)O[C@H]3[C@H](C=C4)O");
console.log(getTPSA(morphine.molecules[0])); // 52.93 ✓ CNS-active
```
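The two cut-offs above can be written as a tiny classifier. The thresholds (140 Ų for oral absorption, 90 Ų for blood-brain barrier penetration) are the ones quoted in this section; treating them as strict inequalities is a choice made for this sketch, and `assessTPSA` is a hypothetical helper rather than an openchem function.

```typescript
// Flags derived from the TPSA cut-offs quoted above (values in Å²).
interface TpsaAssessment {
  likelyOrallyAbsorbed: boolean; // TPSA < 140
  likelyBBBPenetrant: boolean; // TPSA < 90
}

function assessTPSA(tpsa: number): TpsaAssessment {
  return {
    likelyOrallyAbsorbed: tpsa < 140,
    likelyBBBPenetrant: tpsa < 90,
  };
}

console.log(JSON.stringify(assessTPSA(52.93))); // morphine: {"likelyOrallyAbsorbed":true,"likelyBBBPenetrant":true}
console.log(JSON.stringify(assessTPSA(110))); // hypothetical value: {"likelyOrallyAbsorbed":true,"likelyBBBPenetrant":false}
```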
#### Drug-Likeness Rule Checkers
```typescript
import {
  parseSMILES,
  checkLipinskiRuleOfFive,
  checkVeberRules,
  checkBBBPenetration,
} from "openchem";

// Lipinski's Rule of Five (oral drug-likeness)
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
const lipinski = checkLipinskiRuleOfFive(aspirin.molecules[0]);
console.log(lipinski.passes); // true
console.log(lipinski.properties);
// { molecularWeight: 180.04, hbondDonors: 1, hbondAcceptors: 4, logP: 1.31 }

// Veber Rules (oral bioavailability)
const veber = checkVeberRules(aspirin.molecules[0]);
console.log(veber.passes); // true
console.log(veber.properties);
// { rotatableBonds: 3, tpsa: 63.60 }

// Blood-brain barrier penetration prediction
const caffeine = parseSMILES("CN1C=NC2=C1C(=O)N(C(=O)N2C)C");
const bbb = checkBBBPenetration(caffeine.molecules[0]);
console.log(bbb.likelyPenetration); // true (TPSA: 61.82 < 90)
```
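For reference, the criteria behind these checkers can be written out directly. This sketch uses the commonly cited thresholds (Lipinski: MW ≤ 500, H-bond donors ≤ 5, acceptors ≤ 10, logP ≤ 5; Veber: at most 10 rotatable bonds and TPSA ≤ 140 Ų). Tools differ on whether one Lipinski violation still counts as a pass, so the sketch only reports violations; `lipinskiViolations` and `passesVeber` are hypothetical helpers, and openchem's exact pass criteria may differ.

```typescript
interface LipinskiInput {
  molecularWeight: number;
  hbondDonors: number;
  hbondAcceptors: number;
  logP: number;
}

// List which Rule-of-Five criteria are exceeded.
function lipinskiViolations(p: LipinskiInput): string[] {
  const violations: string[] = [];
  if (p.molecularWeight > 500) violations.push("molecularWeight > 500");
  if (p.hbondDonors > 5) violations.push("hbondDonors > 5");
  if (p.hbondAcceptors > 10) violations.push("hbondAcceptors > 10");
  if (p.logP > 5) violations.push("logP > 5");
  return violations;
}

// Veber: both conditions must hold.
function passesVeber(rotatableBonds: number, tpsa: number): boolean {
  return rotatableBonds <= 10 && tpsa <= 140;
}

// Property values quoted for aspirin in the example above.
const aspirinProps: LipinskiInput = { molecularWeight: 180.04, hbondDonors: 1, hbondAcceptors: 4, logP: 1.31 };
const aspirinViolations = lipinskiViolations(aspirinProps);
console.log(aspirinViolations.length === 0, aspirinViolations); // true []
console.log(passesVeber(3, 63.6)); // true
```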
```typescript
import { parseSMILES, generateSMILES } from "openchem";

// Generate canonical SMILES (default)
const input = "CC(C)CC";
const parsed = parseSMILES(input);
const canonical = generateSMILES(parsed.molecules[0]);
console.log(canonical); // "CCC(C)C" - canonicalized

// Stereo normalization matches RDKit
const trans1 = parseSMILES("C\\C=C\\C"); // trans (down markers)
console.log(generateSMILES(trans1.molecules[0])); // "C/C=C/C" - normalized to up markers
const trans2 = parseSMILES("C/C=C/C"); // trans (up markers)
console.log(generateSMILES(trans2.molecules[0])); // "C/C=C/C" - already normalized

// Generate simple (non-canonical) SMILES
const simple = generateSMILES(parsed.molecules[0], false);
console.log(simple); // "CC(C)CC" - preserves input order

// Explicit canonical generation
const explicitCanonical = generateSMILES(parsed.molecules[0], true);
console.log(explicitCanonical); // "CCC(C)C"

// Handle multiple disconnected molecules
const mixture = parseSMILES("CCO.O"); // ethanol + water
const output = generateSMILES(mixture.molecules);
console.log(output); // "CCO.O"
```
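Canonical SMILES generation starts from Morgan's idea of iteratively refining atom invariants until symmetry-equivalent atoms share a class and all others are told apart. The sketch below shows only that refinement step, as a color-refinement variant (new key = own class plus sorted neighbor classes), starting from atom degree. It is not openchem's modified Morgan algorithm, which uses richer invariants and tie-breaking.

```typescript
// Iteratively refine atom classes: each round, an atom's key is its current class
// plus the sorted classes of its neighbors; keys are renumbered 0..k-1.
// Stops once the number of distinct classes no longer grows.
function refineAtomClasses(neighbors: number[][], initial: number[]): number[] {
  let classes = rank(initial.map(String));
  let distinct = countDistinct(classes);
  for (;;) {
    const keys = neighbors.map((nbrs, i) => {
      const nbrClasses = nbrs.map((j) => classes[j]).sort((a, b) => a - b);
      return `${classes[i]}|${nbrClasses.join(",")}`;
    });
    const next = rank(keys);
    const nextDistinct = countDistinct(next);
    if (nextDistinct <= distinct) return classes; // partition is stable
    classes = next;
    distinct = nextDistinct;
  }
}

// Map each key to its position in the sorted list of unique keys.
function rank(keys: string[]): number[] {
  const unique = Array.from(new Set(keys)).sort();
  return keys.map((k) => unique.indexOf(k));
}

function countDistinct(values: number[]): number {
  return new Set(values).size;
}

// 2-methylbutane, SMILES "CC(C)CC": bonds 0-1, 1-2, 1-3, 3-4.
const methylbutane = [[1], [0, 2, 3], [1], [1, 4], [3]];
const degrees = methylbutane.map((nbrs) => nbrs.length);
console.log(JSON.stringify(refineAtomClasses(methylbutane, degrees))); // [1,3,1,2,0]
```

Atoms 0 and 2 (the two methyls on the branching carbon) end up in the same class, while the terminal methyl of the ethyl group (atom 4) is distinguished from them even though all three start with degree 1.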
Render molecules as 2D SVG structures with automatic coordinate generation. openchem provides deterministic layouts, fast performance, and excellent handling of rings, branches, and terminal atoms.
#### Basic SVG Rendering
```typescript
import { parseSMILES, renderSVG } from "openchem";

// Render from parsed molecule
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
const result = renderSVG(aspirin.molecules[0]);
console.log(result.svg); // SVG string ready for display
console.log(result.width); // Canvas width
console.log(result.height); // Canvas height

// Or render directly from SMILES (if parsing is included)
const renderResult = renderSVG("CCO");
if (renderResult.errors.length === 0) {
  console.log(renderResult.svg);
}

// Render multiple molecules in a grid
const molecules = [
  parseSMILES("CC(=O)O").molecules[0],
  parseSMILES("CCO").molecules[0],
  parseSMILES("CC(C)C").molecules[0],
];
const gridResult = renderSVG(molecules);
console.log(gridResult.svg); // Multi-molecule grid
```
#### SVG Rendering Options
```typescript
import { parseSMILES, renderSVG } from "openchem";
import type { SVGRendererOptions } from "openchem";

const benzene = parseSMILES("c1ccccc1");
const mol = benzene.molecules[0];

const options: SVGRendererOptions = {
  // Canvas sizing
  width: 400,
  height: 400,
  padding: 20,

  // Bond styling
  bondLineWidth: 2,
  bondLength: 40,
  bondColor: "#000000",

  // Atom & text styling
  fontSize: 14,
  fontFamily: "Arial, sans-serif",
  showCarbonLabels: false, // Hide C labels for cleaner appearance
  showImplicitHydrogens: false, // Hide implicit hydrogens

  // Color mapping by element
  atomColors: {
    C: "#222222",
    N: "#3050F8",
    O: "#FF0D0D",
    S: "#E6C200",
    F: "#50FF50",
    Cl: "#1FF01F",
    Br: "#A62929",
    I: "#940094",
  },

  // Background
  backgroundColor: "#FFFFFF",

  // Stereochemistry display
  showStereoBonds: true,

  // Layout & coordinate generation
  kekulize: true, // Convert aromatic to alternating single/double bonds (default: true)
  moleculeSpacing: 60, // Spacing between molecules in grid layouts
};

const result = renderSVG(mol, options);
console.log(result.svg); // Custom-styled SVG
```
#### Using Pre-computed Coordinates
```typescript
import { parseSMILES, renderSVG } from "openchem";

const ethanol = parseSMILES("CCO");
const mol = ethanol.molecules[0];

// Provide your own atom coordinates (useful for custom layouts)
const customCoords = [
  { x: 0, y: 0 }, // C
  { x: 40, y: 0 }, // C
  { x: 80, y: 0 }, // O
];

const result = renderSVG(mol, {
  atomCoordinates: customCoords,
  width: 200,
  height: 100,
});
console.log(result.svg);
```
#### Coordinate Generation Features
openchem's coordinate generator provides:
- Deterministic layouts — Same molecule always produces same coordinates
- Fast performance — Optimized for speed and quality
- Perfect terminal atom placement — OH, NH₂, and other terminal groups extend radially
- Ring system detection — Automatically detects and regularizes 5/6-membered rings, fused rings, spiro, and bridged systems
- Zero atom overlaps — Intelligent substituent placement prevents collisions
- Publication-quality output — Clean, chemically accurate 2D structures
```typescript
import { parseSMILES, renderSVG } from "openchem";

// Complex fused ring system
const naphthalene = parseSMILES("c1ccc2ccccc2c1");
const result = renderSVG(naphthalene.molecules[0], {
  width: 300,
  height: 300,
  bondLength: 35,
});
console.log(result.svg);
```
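Regularizing a ring amounts to placing its atoms on a regular polygon whose side equals the target bond length; the circumradius is R = L / (2 sin(π/n)). The standalone sketch below computes such coordinates. `regularRingCoordinates` is a hypothetical helper, and it does not reproduce openchem's fused-ring placement or overlap handling.

```typescript
interface Point {
  x: number;
  y: number;
}

// Place an n-membered ring as a regular polygon with the given bond length.
function regularRingCoordinates(n: number, bondLength: number, center: Point = { x: 0, y: 0 }): Point[] {
  if (n < 3) throw new Error("A ring needs at least 3 atoms");
  const radius = bondLength / (2 * Math.sin(Math.PI / n)); // circumradius of the polygon
  const coords: Point[] = [];
  for (let i = 0; i < n; i++) {
    const angle = (2 * Math.PI * i) / n;
    coords.push({ x: center.x + radius * Math.cos(angle), y: center.y + radius * Math.sin(angle) });
  }
  return coords;
}

const hexagon = regularRingCoordinates(6, 35);
// Adjacent ring atoms are exactly one bond length apart (up to floating-point error).
const firstBond = Math.hypot(hexagon[1].x - hexagon[0].x, hexagon[1].y - hexagon[0].y);
console.log(firstBond.toFixed(6)); // 35.000000
```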
#### Error Handling
```typescript
import { renderSVG } from "openchem";

const result = renderSVG("C");
if (result.errors.length > 0) {
  console.error("SVG rendering errors:", result.errors);
} else {
  console.log(result.svg);
}
```
Match molecular patterns using SMARTS (SMILES Arbitrary Target Specification) notation.
```typescript
import { parseSMILES, parseSMARTS, matchSMARTS } from "openchem";

// Parse molecule and SMARTS pattern
const molecule = parseSMILES("CC(=O)Oc1ccccc1C(=O)O"); // aspirin
const pattern = parseSMARTS("[O;D1]"); // Oxygen with exactly one explicit connection (C=O and O-H oxygens)

// Find matching atoms
const matches = matchSMARTS(molecule.molecules[0], pattern);
console.log(matches.length); // 3 (two carbonyl oxygens + the carboxylic acid OH oxygen)
console.log(matches); // e.g. [[2], [11], [12]] (atom indices)

// Example: Find aromatic rings
const aromaticRing = parseSMARTS("c1ccccc1"); // benzene pattern
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
const ringMatches = matchSMARTS(aspirin.molecules[0], aromaticRing);
console.log(ringMatches.length); // 1 (one benzene ring)

// Example: Find carboxylic acid groups
const carboxylPattern = parseSMARTS("C(=O)[O;H1]"); // COOH
const matches2 = matchSMARTS(aspirin.molecules[0], carboxylPattern);
console.log(matches2.length); // 1 (one carboxylic acid)

// Example: Find all heteroatoms
const heteroPattern = parseSMARTS("[!#6;!#1]"); // Any atom that is neither carbon (aliphatic or aromatic) nor hydrogen
const heteroMatches = matchSMARTS(aspirin.molecules[0], heteroPattern);
console.log(heteroMatches.length); // Number of heteroatoms (4 oxygens in aspirin)
```
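To see what the `D` primitive in `[O;D1]` counts, the sketch below evaluates the same condition by hand on aspirin's heavy-atom graph. `D` is the number of explicit connections, so implicit hydrogens are not counted and both carbonyl oxygens and the carboxylic OH oxygen qualify. Atom indices follow the order of atoms in the SMILES string; `matchOxygenDegreeOne` is a hand-written check for illustration, not openchem's SMARTS engine.

```typescript
// Aspirin, SMILES "CC(=O)Oc1ccccc1C(=O)O", heavy atoms in SMILES order.
const aspirinSymbols = ["C", "C", "O", "O", "C", "C", "C", "C", "C", "C", "C", "O", "O"];
const aspirinBonds: [number, number][] = [
  [0, 1], [1, 2], [1, 3], [3, 4], [4, 5], [5, 6], [6, 7],
  [7, 8], [8, 9], [9, 4], [9, 10], [10, 11], [10, 12],
];

// [O;D1]: an oxygen atom with exactly one explicit connection.
function matchOxygenDegreeOne(symbols: string[], bonds: [number, number][]): number[] {
  const degree = new Array<number>(symbols.length).fill(0);
  for (const [a, b] of bonds) {
    degree[a]++;
    degree[b]++;
  }
  const hits: number[] = [];
  symbols.forEach((symbol, i) => {
    if (symbol === "O" && degree[i] === 1) hits.push(i);
  });
  return hits;
}

console.log(JSON.stringify(matchOxygenDegreeOne(aspirinSymbols, aspirinBonds))); // [2,11,12]
```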
Convert aromatic molecules to alternating single/double bond representations (Kekulé structures).
```typescript
import { parseSMILES, kekulize, generateSMILES, renderSVG } from "openchem";

// Parse aromatic molecule
const benzene = parseSMILES("c1ccccc1");
const mol = benzene.molecules[0];

// Convert to Kekulé structure
const kekuleMol = kekulize(mol);

// Generate SMILES from Kekulé form
const kekuleSMILES = generateSMILES(kekuleMol);
console.log(kekuleSMILES); // "C1=CC=CC=C1" or similar alternating structure

// SVG rendering automatically kekulizes (unless disabled)
const result = renderSVG(mol, {
  kekulize: true, // default: true
});
// Rendered SVG shows alternating single/double bonds
```
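Kekulization can be viewed as finding a perfect matching on the aromatic atoms: every atom that needs a double bond is paired with exactly one aromatic neighbor. The backtracking sketch below does this for a plain adjacency list. It assumes every atom passed in needs exactly one double bond, which holds for benzene-like carbons but not for pyrrole-type [nH] atoms, and `kekulizeRing` is a hypothetical helper, not openchem's `kekulize`.

```typescript
// Returns the double bonds as [a, b] pairs, or null if no Kekulé structure exists.
function kekulizeRing(neighbors: number[][]): [number, number][] | null {
  const partner = new Array<number>(neighbors.length).fill(-1);

  function solve(): boolean {
    const atom = partner.indexOf(-1); // first atom without a double bond yet
    if (atom === -1) return true; // every atom is matched
    for (const nbr of neighbors[atom]) {
      if (partner[nbr] === -1) {
        partner[atom] = nbr;
        partner[nbr] = atom;
        if (solve()) return true;
        partner[atom] = -1; // backtrack
        partner[nbr] = -1;
      }
    }
    return false; // this atom cannot get a double bond
  }

  if (!solve()) return null;
  const doubles: [number, number][] = [];
  partner.forEach((p, i) => {
    if (i < p) doubles.push([i, p]);
  });
  return doubles;
}

const benzeneNeighbors = [[1, 5], [0, 2], [1, 3], [2, 4], [3, 5], [4, 0]];
console.log(JSON.stringify(kekulizeRing(benzeneNeighbors))); // [[0,1],[2,3],[4,5]]
```

An odd-membered ring in which every atom needs a double bond has no perfect matching, so the sketch returns `null` for it.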
Calculate LogP (partition coefficient) for predicting lipophilicity and membrane permeability.
```typescript
import { parseSMILES, computeLogP, crippenLogP } from "openchem";

const molecules = [
  "CC(=O)Oc1ccccc1C(=O)O", // aspirin
  "CC(C)Cc1ccc(cc1)C(C)C(=O)O", // ibuprofen
  "CC(=O)Nc1ccc(O)cc1", // acetaminophen
];

molecules.forEach((smiles) => {
  const mol = parseSMILES(smiles).molecules[0];

  // Wildman-Crippen atom-contribution method
  const logP = computeLogP(mol);
  console.log(`${smiles.substring(0, 10)}... LogP: ${logP.toFixed(2)}`);

  // Alternative: crippenLogP (alias of computeLogP, so the value is identical)
  const logP2 = crippenLogP(mol);
  console.log(`  Crippen LogP: ${logP2.toFixed(2)}`);
});

// LogP guidelines for drug design
const caffeine = parseSMILES("CN1C=NC2=C1C(=O)N(C(=O)N2C)C");
const caffeineMol = caffeine.molecules[0];
const logpValue = computeLogP(caffeineMol);
console.log(`Caffeine LogP: ${logpValue.toFixed(2)}`);

if (logpValue > 5) {
  console.log("⚠️ High LogP - may have poor water solubility");
} else if (logpValue < 0) {
  console.log("⚠️ Low LogP - hydrophilic, may have poor membrane permeability");
} else {
  console.log("✓ Optimal LogP - good balance of lipophilicity and hydrophilicity");
}
```
```typescript
import { parseSMILES, BondType } from "openchem";

const result = parseSMILES("C=C");
const mol = result.molecules[0];

// Access atoms
mol.atoms.forEach((atom) => {
  console.log(`${atom.symbol} (id: ${atom.id})`);
  console.log(`  Aromatic: ${atom.aromatic}`);
  console.log(`  Charge: ${atom.charge}`);
  console.log(`  Hydrogens: ${atom.hydrogens}`);
});

// Access bonds
mol.bonds.forEach((bond) => {
  console.log(`Bond ${bond.atom1}-${bond.atom2}`);
  console.log(`  Type: ${bond.type === BondType.DOUBLE ? "DOUBLE" : "SINGLE"}`);
});
```
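The feature list at the top of this README mentions a CSR graph for fast neighbor lookups. As a rough sketch of that idea, an undirected bond list can be packed into two flat arrays: `offsets`, where atom `i`'s neighbors occupy `[offsets[i], offsets[i + 1])`, and `neighbors`. The `BondLike` type mirrors the `atom1`/`atom2` fields used above, but it, the helper names, and the assumption of contiguous 0-based atom IDs are illustrative, not openchem's internal structure.

```typescript
interface BondLike {
  atom1: number;
  atom2: number;
}

interface CSRGraph {
  offsets: Int32Array; // neighbors of atom i live in [offsets[i], offsets[i + 1])
  neighbors: Int32Array; // all neighbor lists, concatenated
}

function buildCSR(atomCount: number, bonds: readonly BondLike[]): CSRGraph {
  const offsets = new Int32Array(atomCount + 1);
  // Count degrees shifted by one, so a prefix sum turns them into start offsets.
  for (const b of bonds) {
    offsets[b.atom1 + 1]++;
    offsets[b.atom2 + 1]++;
  }
  for (let i = 0; i < atomCount; i++) offsets[i + 1] += offsets[i];
  const neighbors = new Int32Array(offsets[atomCount]);
  const cursor = offsets.slice(0, atomCount); // next free slot for each atom
  for (const b of bonds) {
    neighbors[cursor[b.atom1]++] = b.atom2;
    neighbors[cursor[b.atom2]++] = b.atom1;
  }
  return { offsets, neighbors };
}

// O(1) to locate an atom's neighbor slice; the subarray is a view, not a copy.
function neighborsOf(g: CSRGraph, atom: number): Int32Array {
  return g.neighbors.subarray(g.offsets[atom], g.offsets[atom + 1]);
}

// Ethanol heavy atoms C(0)-C(1)-O(2).
const ethanolGraph = buildCSR(3, [{ atom1: 0, atom2: 1 }, { atom1: 1, atom2: 2 }]);
console.log(JSON.stringify(Array.from(neighborsOf(ethanolGraph, 1)))); // [0,2]
```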
```bash
# Run all tests (includes RDKit comparisons)
bun test
```

Note: RDKit comparison tests require the @rdkit/rdkit package. Tests will automatically skip RDKit validations if the package is unavailable. For full validation, ensure you're running tests with Node.js (RDKit's WebAssembly may not work in all Bun versions).

## API Reference
The summary below lists 40 functions organized into 8 categories:
Parsing & Generation (8)

- parseSMILES - Parse SMILES strings
- generateSMILES - Generate canonical/non-canonical SMILES
- parseMolfile - Parse MOL files (V2000/V3000)
- generateMolfile - Generate MOL files (V2000)
- parseSDF - Parse SDF files with properties
- writeSDF - Write SDF files with properties
- generateInChI - Generate InChI strings from molecules
- generateInChIKey - Generate InChIKey strings from molecules

Pattern Matching & Rendering (6)

- renderSVG - Render molecules as 2D SVG structures
- parseSMARTS - Parse SMARTS pattern strings
- matchSMARTS - Find SMARTS pattern matches in molecules
- kekulize - Convert aromatic to Kekulé structures
- computeMorganFingerprint - Generate Morgan fingerprints from molecules
- tanimotoSimilarity - Calculate Tanimoto similarity between fingerprints

Scaffold Analysis (5)

- getMurckoScaffold - Extract Murcko scaffold (rings + linkers)
- getBemisMurckoFramework - Generic scaffold (all C, single bonds)
- getScaffoldTree - Hierarchical scaffold decomposition
- getGraphFramework - Pure topology (all atoms → wildcard)
- haveSameScaffold - Compare two molecules' scaffolds

Tautomer Analysis (2)

- enumerateTautomers - Generate all tautomers with RDKit scoring
- canonicalTautomer - Select highest-scoring canonical tautomer

Basic Properties (3)

- getMolecularFormula - Hill notation formula
- getMolecularMass - Average molecular mass
- getExactMass - Exact mass (monoisotopic)

Lipophilicity (3)

- computeLogP - Wildman-Crippen partition coefficient
- crippenLogP - Alias for computeLogP
- logP - Alternative LogP calculation

Structural Properties (8)

- getHeavyAtomCount - Non-hydrogen atom count
- getHeteroAtomCount - Heteroatom count (N, O, S, etc.)
- getRingCount - Total ring count
- getAromaticRingCount - Aromatic ring count
- getRingInfo - Comprehensive ring information object
- getFractionCSP3 - sp³ carbon fraction
- getHBondDonorCount - H-bond donor count
- getHBondAcceptorCount - H-bond acceptor count

Drug-Likeness (5)

- getTPSA - Topological polar surface area
- getRotatableBondCount - Rotatable bond count
- checkLipinskiRuleOfFive - Lipinski's Rule of Five
- checkVeberRules - Veber rules for bioavailability
- checkBBBPenetration - Blood-brain barrier prediction

---
#### Parsing & Generation (6 functions)
##### parseSMILES(smiles: string): ParseResult

Parses a SMILES string into molecule structures.

Returns: ParseResult containing:

- molecules: Molecule[] — Array of parsed molecules
- errors: string[] — Parse/validation errors (empty if successful)

##### generateSMILES(input: Molecule | Molecule[], canonical?: boolean): string

Generates SMILES from molecule structure(s).

Parameters:

- input — Single molecule or array of molecules
- canonical — Generate canonical SMILES (default: true)

Returns: SMILES string (uses `.` to separate disconnected molecules)

Canonical SMILES features:
- RDKit-compatible atom ordering using modified Morgan algorithm
- Automatic E/Z double bond stereo normalization
- Deterministic output for identical molecules
- Preserves tetrahedral and double bond stereochemistry
##### generateMolfile(molecule: Molecule, options?: MolGeneratorOptions): string

Generates a MOL file (V2000 format) from a molecule structure. Matches RDKit's output structure for compatibility with cheminformatics tools.

Parameters:

- molecule — Molecule structure to convert
- options — Optional configuration:
  - title?: string — Molecule title (default: empty)
  - programName?: string — Program name in header (default: "openchem")
  - dimensionality?: '2D' | '3D' — Coordinate system (default: "2D")
  - comment?: string — Comment line (default: empty)

Returns: MOL file content as string with V2000 format
Features:
- V2000 MOL format compatible with RDKit and other tools
- 2D coordinate generation using circular layout
- Proper atom/bond type mapping (aromatic, charged, isotopic)
- Stereochemistry support (chiral centers, E/Z double bonds)
- Fixed-width formatting matching RDKit output
Example:
```typescript
import { parseSMILES, generateMolfile } from "openchem";

const result = parseSMILES("CCO");
const molfile = generateMolfile(result.molecules[0]);
console.log(molfile);
// Output: MOL file with header, atom coordinates, bond connectivity, etc.
```

##### parseMolfile(input: string): MolfileParseResult

Parses a MOL file (MDL Molfile format) into a molecule structure. Supports both V2000 and V3000 formats with comprehensive validation.

Parameters:

- input — MOL file content as a string

Returns: MolfileParseResult containing:

- molfile: MolfileData | null — Raw MOL file data structure (or null on critical errors)
- molecule: Molecule | null — Parsed molecule with enriched properties (or null on errors)
- errors: ParseError[] — Array of parse/validation errors (empty if successful)

Supported formats:
- V2000: Classic fixed-width format (most common)
- V3000: Extended format with additional features
Validation features:
- Validates atom/bond counts match declared values
- Checks bond references point to valid atoms
- Validates numeric fields (coordinates, counts, bond types)
- Detects malformed data (NaN, negative counts, invalid types)
- Returns errors without throwing exceptions
Parsed features:
- Atom coordinates (2D/3D)
- Element symbols (organic and periodic table)
- Charges (both atom block and M CHG property)
- Isotopes (both mass diff and M ISO property)
- Bond types (single, double, triple, aromatic)
- Stereochemistry (bond wedges, chiral centers)
- Atom mapping (reaction mapping)
Limitations:
- SGroups are parsed but not converted to molecule structure
- Query atoms/bonds not supported
Example:
```typescript
import { parseMolfile, generateSMILES } from "openchem";
// Ethanol as a V2000 MOL block: three header lines (left blank here), counts line, atoms, bonds
const molContent = `


  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END`;
const result = parseMolfile(molContent);
if (result.errors.length === 0) {
  console.log(result.molecule?.atoms.length); // 3
  console.log(result.molecule?.bonds.length); // 2
  // Convert to SMILES
  const smiles = generateSMILES(result.molecule!);
  console.log(smiles); // "CCO"
}
// Error handling
const invalid = parseMolfile("invalid content");
if (invalid.errors.length > 0) {
  console.error("Parse errors:", invalid.errors);
}
```
Round-trip workflow:
```typescript
import { parseSMILES, generateMolfile, parseMolfile, generateSMILES } from "openchem";
// SMILES → MOL → SMILES round-trip
const original = "CC(=O)O"; // acetic acid
const mol = parseSMILES(original).molecules[0];
const molfile = generateMolfile(mol);
const parsed = parseMolfile(molfile);
const roundtrip = generateSMILES(parsed.molecule!);
console.log(roundtrip); // "CC(=O)O"
```
##### parseSDF(input: string): SDFParseResult
Parses an SDF (Structure-Data File) into molecule structures with associated properties. SDF files can contain multiple molecules, each with a MOL block and optional property fields.
Parameters:
- input — SDF file content as a string
Returns: SDFParseResult containing:
- records: SDFRecord[] — Array of parsed records
- errors: ParseError[] — Global parse errors (empty if successful)
Record structure (SDFRecord):
- molecule: Molecule | null — Parsed molecule (null on parse errors)
- molfile: MolfileData | null — Raw MOL file data (null on parse errors)
- properties: Record<string, string> — Property name-value pairs
- errors: ParseError[] — Record-specific errors (empty if successful)
Features:
- Multi-record parsing (splits on `$$$$` delimiter)
- Property block parsing (`> <NAME>` format)
- Multi-line property values with blank line handling
- Empty property names and values
- Windows (CRLF) and Unix (LF) line endings
- Tolerant parsing: continues after invalid records
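The record-splitting and property-block rules listed above can be sketched in a few lines. `splitSDF` and `RawRecord` are hypothetical names for illustration only: the sketch handles `$$$$` delimiters, `> <NAME>` headers, multi-line values ended by a blank line, empty names and values, and CRLF input, but it does not parse the MOL block or implement openchem's error recovery.

```typescript
// Hypothetical sketch: split SDF text into records and read "> <NAME>" data fields.
interface RawRecord {
  molBlock: string;
  properties: Record<string, string>;
}

function splitSDF(sdf: string): RawRecord[] {
  const records: RawRecord[] = [];
  // Normalize CRLF, then split on "$$$$" lines (the delimiter also consumes its surrounding newlines).
  const chunks = sdf.replace(/\r\n/g, "\n").split(/\n\$\$\$\$[^\n]*(?:\n|$)/);
  for (const chunk of chunks) {
    if (chunk.trim() === "") continue; // nothing after the final $$$$
    const lines = chunk.split("\n");
    const end = lines.findIndex((l) => l.startsWith("M  END"));
    const molBlock = lines.slice(0, end + 1).join("\n");
    const properties: Record<string, string> = {};
    let name: string | null = null;
    let value: string[] = [];
    for (const line of lines.slice(end + 1)) {
      const header = /^>.*<([^>]*)>/.exec(line);
      if (header) {
        name = header[1]; // empty names ("> <>") are allowed
        value = [];
      } else if (name !== null && line === "") {
        properties[name] = value.join("\n"); // a blank line ends the field
        name = null;
      } else if (name !== null) {
        value.push(line); // multi-line values accumulate until the blank line
      }
    }
    if (name !== null) properties[name] = value.join("\n");
    records.push({ molBlock, properties });
  }
  return records;
}
```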
Example (single record):
```typescript
import { parseSDF, generateSMILES } from "openchem";
const sdfContent = `
  Mrv2311 02102409422D

  3  2  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END
> <ID>
MOL001

> <NAME>
Ethanol

> <FORMULA>
C2H6O

$$$$`;
const result = parseSDF(sdfContent);
if (result.errors.length === 0) {
  const record = result.records[0];
  console.log(record.molecule?.atoms.length); // 3
  console.log(record.properties.ID); // "MOL001"
  console.log(record.properties.NAME); // "Ethanol"
  console.log(record.properties.FORMULA); // "C2H6O"
  // Convert to SMILES
  const smiles = generateSMILES(record.molecule!);
  console.log(smiles); // "CCO"
}
// Error handling
if (result.records[0].errors.length > 0) {
  console.error("Record errors:", result.records[0].errors);
}
```
Example (multiple records):
```typescript
import { parseSDF } from "openchem";
const multiRecordSDF = `
  Mrv2311 02102409422D

  1  0  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <ID>
1

> <NAME>
Methane

$$$$

  Mrv2311 02102409422D

  2  1  0  0  0  0            999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  END
> <ID>
2

> <NAME>
Ethane

$$$$`;
const result = parseSDF(multiRecordSDF);
console.log(result.records.length); // 2
console.log(result.records[0].properties.NAME); // "Methane"
console.log(result.records[1].properties.NAME); // "Ethane"
```
Round-trip workflow:
```typescript
import { parseSMILES, writeSDF, parseSDF, generateSMILES } from "openchem";
// SMILES → SDF → SMILES round-trip
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O").molecules[0];
const sdfResult = writeSDF({
  molecule: aspirin,
  properties: { NAME: "aspirin", FORMULA: "C9H8O4" },
});
const parsed = parseSDF(sdfResult.sdf);
const roundtrip = generateSMILES(parsed.records[0].molecule!);
console.log(roundtrip); // "CC(=O)Oc1ccccc1C(=O)O"
console.log(parsed.records[0].properties.NAME); // "aspirin"
```
##### generateInChI(molecule: Molecule): Promise<string>
Generates an InChI (International Chemical Identifier) string from a molecule structure. InChI provides a unique, canonical representation of chemical structures that can be used for database lookups and structure comparison.
Returns: Promise resolving to InChI string
Example:
```typescript
import { parseSMILES, generateInChI } from "openchem";
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
const inchi = await generateInChI(aspirin.molecules[0]);
console.log(inchi); // "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"
```
##### generateInChIKey(inchi: string): Promise<string>
Generates an InChIKey (a hashed, fixed-length version of InChI) from an InChI string. InChIKeys are commonly used for database indexing and fast lookups.
Parameters:
- inchi — InChI string to convert
Returns: Promise resolving to InChIKey string (27 characters)
Example:
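```typescript
import { parseSMILES, generateInChI, generateInChIKey } from "openchem";
const inchi = await generateInChI(parseSMILES("CC(=O)Oc1ccccc1C(=O)O").molecules[0]);
const inchikey = await generateInChIKey(inchi);
console.log(inchikey); // "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
```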
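The 27-character length follows from the standard InChIKey layout: a 14-character block hashed from the connectivity layer, a hyphen, a 10-character block (8 hash characters for the remaining layers such as stereochemistry and isotopes, then a standard/non-standard flag and a version character), a hyphen, and one protonation character. The sketch below checks only that shape and splits the blocks; it does not recompute the hash, and `isInChIKeyShape` / `describeInChIKey` are hypothetical helpers, not openchem exports.

```typescript
// Hypothetical helper: checks InChIKey *shape* only (14-10-1 uppercase blocks), not the hash.
function isInChIKeyShape(key: string): boolean {
  return /^[A-Z]{14}-[A-Z]{10}-[A-Z]$/.test(key);
}

// Hypothetical helper: label the blocks of a well-formed key.
function describeInChIKey(key: string) {
  if (!isInChIKeyShape(key)) return null;
  const [connectivity, rest, protonation] = key.split("-");
  return {
    connectivityHash: connectivity, // 14 chars: hash of the skeleton/connectivity layer
    otherLayersHash: rest.slice(0, 8), // 8 chars: hash of stereo, isotope, and other layers
    standard: rest[8] === "S", // "S" marks a key derived from standard InChI
    version: rest[9], // "A" = InChI version 1
    protonation, // "N" = neutral (no protonation change)
  };
}

console.log(describeInChIKey("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"));
```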
##### writeSDF(records: SDFRecord | SDFRecord[], options?: SDFWriterOptions): SDFWriterResult
Writes molecules to SDF (Structure-Data File) format. Supports single or multiple records with optional property data. SDF files are commonly used for storing chemical databases and transferring molecular data between cheminformatics tools.
Parameters:
- records — Single record or array of records to write
- options — Optional configuration (same as MolGeneratorOptions):
  - title?: string — Default title for records (default: empty)
  - programName?: string — Program name in headers (default: "openchem")
  - dimensionality?: '2D' | '3D' — Coordinate system (default: "2D")
  - comment?: string — Default comment (default: empty)
Returns: SDFWriterResult containing:
- sdf: string — Complete SDF file content
- errors: string[] — Any errors encountered (empty if successful)
Record format:
```typescript
interface SDFRecord {
  molecule: Molecule;
  properties?: Record<string, string | number | boolean>;
}
```
SDF structure:
- MOL block (V2000 format) for each molecule
- Property fields (`> <NAME>` header, value, blank line)
- Record separator (`$$$$`)
Example (single molecule):
```typescript
import { parseSMILES, writeSDF } from "openchem";
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
const result = writeSDF({
  molecule: aspirin.molecules[0],
  properties: {
    NAME: "aspirin",
    MOLECULAR_FORMULA: "C9H8O4",
    MOLECULAR_WEIGHT: 180.159,
  },
});
console.log(result.sdf);
// Output: SDF file with MOL block + properties + $$$$
```
Example (multiple molecules):
```typescript
import { parseSMILES, writeSDF } from "openchem";
const drugs = [
  { smiles: "CC(=O)Oc1ccccc1C(=O)O", name: "aspirin" },
  { smiles: "CC(C)Cc1ccc(cc1)C(C)C(=O)O", name: "ibuprofen" },
  { smiles: "CC(=O)Nc1ccc(O)cc1", name: "acetaminophen" },
];
const records = drugs.map((drug) => {
  const mol = parseSMILES(drug.smiles).molecules[0];
  return {
    molecule: mol,
    properties: {
      NAME: drug.name,
      SMILES: drug.smiles,
    },
  };
});
const result = writeSDF(records, { programName: "my-drug-tool" });
console.log(result.sdf);
// Output: Multi-record SDF with all 3 molecules
```
Property formatting:
- Strings: Written as-is
- Numbers: Converted to strings
- Booleans: "true" or "false"
- Property names are case-sensitive
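The formatting rules above map naturally onto `String(value)`. As a rough sketch of the text that follows each record's MOL block (the MOL block itself is omitted), assuming one `> <NAME>` header, the value, and a blank line per field, then the `$$$$` separator. `formatRecordTail` is a hypothetical helper, not part of openchem.

```typescript
type PropertyValue = string | number | boolean;

// Hypothetical sketch: serialize data fields plus the record separator.
function formatRecordTail(properties: Record<string, PropertyValue>): string {
  let out = "";
  for (const [name, value] of Object.entries(properties)) {
    // Names are written verbatim (case-sensitive); String() leaves strings as-is,
    // renders numbers in JavaScript's default form, and booleans as "true"/"false".
    out += `> <${name}>\n${String(value)}\n\n`;
  }
  return out + "$$$$\n";
}

console.log(formatRecordTail({ NAME: "aspirin", MOLECULAR_WEIGHT: 180.159, ORAL: true }));
```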
Compatibility:
- Output compatible with RDKit, OpenBabel, ChemDraw, and other tools
- Standard SDF format (V2000 MOL blocks)
- Properties follow MDL SDF specification
---
#### Pattern Matching & Rendering (6 functions)
##### renderSVG(input: string | Molecule | Molecule[] | ParseResult, options?: SVGRendererOptions): SVGRenderResult
Renders molecules as 2D SVG structures with automatic coordinate generation using webcola collision prevention.
Parameters:
- input — SMILES string, single molecule, array of molecules, or ParseResult
- options — Optional rendering configuration (see SVGRendererOptions below)
Returns: SVGRenderResult containing:
- svg: string — SVG markup ready for display
- width: number — Canvas width in pixels
- height: number — Canvas height in pixels
- errors: string[] — Any rendering errors (empty if successful)
SVGRendererOptions:
- width?: number — Canvas width (default: 300)
- height?: number — Canvas height (default: 300)
- bondLineWidth?: number — Bond line thickness (default: 2)
- bondLength?: number — Target bond length in pixels (default: 40)
- fontSize?: number — Atom label font size (default: 12)
- fontFamily?: string — Font family (default: "Arial, sans-serif")
- padding?: number — Canvas padding (default: 20)
- showCarbonLabels?: boolean — Show C atom labels (default: false)
- showImplicitHydrogens?: boolean — Show implicit hydrogens (default: false)
- kekulize?: boolean — Convert aromatic to Kekulé (default: true)
- atomColors?: Record<string, string> — Element-specific colors
- backgroundColor?: string — Background color (default: "#FFFFFF")
- bondColor?: string — Bond color (default: "#000000")
- showStereoBonds?: boolean — Show wedge/hash bonds (default: true)
- atomCoordinates?: AtomCoordinates[] — Pre-computed coordinates
- webcolaIterations?: number — Collision prevention iterations (default: 100)
- deterministicChainPlacement?: boolean — Deterministic layouts (default: false)
- moleculeSpacing?: number — Space between molecules in grid (default: 60)
Features:
- Automatic 2D coordinate generation with collision prevention
- Ring regularization for 5- and 6-membered rings
- Fused ring system handling
- Stereochemistry display (wedge/hash bonds)
- Element-specific atom coloring
- Publication-quality output
##### parseSMARTS(smarts: string): ParseResult
Parses a SMARTS pattern string into a pattern molecule structure.
Returns: ParseResult containing:
- molecules: Molecule[] — Array with the pattern molecule
- errors: string[] — Parse errors (empty if successful)
SMARTS support:
- Logical operators: `!` (not), `&` (and), `,` (or)
- Atom properties: `[D1]` (degree), `[H1]` (total H count), `[v3]` (valence)
- Connectivity: `[#6X4]` (carbon with 4 total connections, implicit H included)
- Aromatic matching: `[c]` (aromatic carbon) or `[a]` (any aromatic atom)
##### matchSMARTS(molecule: Molecule, pattern: ParseResult): number[][]
Finds all matches of a SMARTS pattern in a molecule.
Parameters:
- molecule — Target molecule to search
- pattern — SMARTS pattern (from parseSMARTS())
Returns: Array of matches, where each match is an array of atom indices
Example:
```typescript
import { parseSMILES, parseSMARTS, matchSMARTS } from "openchem";
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O").molecules[0];
const carbonyl = parseSMARTS("C=O"); // aliphatic carbon double-bonded to oxygen
const matches = matchSMARTS(aspirin, carbonyl);
// matches: [[1, 2], [10, 11]] (acetyl C=O and carboxylic acid C=O)
```
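To show what the `number[][]` result means, here is a deliberately tiny matcher for a single-bond pattern such as `C=O`: each match lists target atom indices in pattern-atom order. This is a toy for illustration, assuming hypothetical `ToyAtom`/`ToyBond` shapes and a hypothetical `matchBondPattern` function; real SMARTS matching is a general subgraph search and is not shown. Note that when both pattern atoms are identical (e.g. `C=C`), this sketch reports each bond in both orientations.

```typescript
// Toy molecule shape for this sketch only (not openchem's Molecule/Bond types).
interface ToyAtom { symbol: string; aromatic: boolean }
interface ToyBond { atom1: number; atom2: number; order: number } // 1.5 marks aromatic here

// Hypothetical: find every [i, j] where atom i matches `first`, atom j matches `second`,
// and they share a bond of the given order. Aliphatic atoms only, like SMARTS "C" and "O".
function matchBondPattern(atoms: ToyAtom[], bonds: ToyBond[], first: string, second: string, order: number): number[][] {
  const matches: number[][] = [];
  for (const b of bonds) {
    if (b.order !== order) continue;
    // Try both orientations: the pattern is ordered, bonds are not.
    for (const [i, j] of [[b.atom1, b.atom2], [b.atom2, b.atom1]]) {
      const a = atoms[i];
      const c = atoms[j];
      if (!a.aromatic && a.symbol === first && !c.aromatic && c.symbol === second) matches.push([i, j]);
    }
  }
  return matches.sort((x, y) => x[0] - y[0]);
}

// Aspirin, CC(=O)Oc1ccccc1C(=O)O, with atoms numbered in SMILES order.
const atoms: ToyAtom[] = "CCOO".split("").map((s) => ({ symbol: s, aromatic: false }))
  .concat(Array.from({ length: 6 }, () => ({ symbol: "C", aromatic: true })))
  .concat("COO".split("").map((s) => ({ symbol: s, aromatic: false })));
const bonds: ToyBond[] = [
  { atom1: 0, atom2: 1, order: 1 }, { atom1: 1, atom2: 2, order: 2 }, { atom1: 1, atom2: 3, order: 1 },
  { atom1: 3, atom2: 4, order: 1 },
  ...[4, 5, 6, 7, 8, 9].map((a, k, ring) => ({ atom1: a, atom2: ring[(k + 1) % 6], order: 1.5 })),
  { atom1: 9, atom2: 10, order: 1 }, { atom1: 10, atom2: 11, order: 2 }, { atom1: 10, atom2: 12, order: 1 },
];
console.log(JSON.stringify(matchBondPattern(atoms, bonds, "C", "O", 2))); // [[1,2],[10,11]]
```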
##### kekulize(molecule: Molecule): Molecule
Converts aromatic molecules to alternating single/double bond (Kekulé) representation.
Returns: New molecule with aromatic bonds replaced by alternating single/double bonds
Example:
```typescript
import { parseSMILES, kekulize, generateSMILES } from "openchem";
const benzene = parseSMILES("c1ccccc1");
const kek = kekulize(benzene.molecules[0]);
console.log(generateSMILES(kek)); // "C1=CC=CC=C1"
```
##### computeMorganFingerprint(molecule: Molecule, radius?: number, fpSize?: number): Uint8Array
Generates a Morgan fingerprint (ECFP-like) for molecular similarity searching and compound classification. Uses a modified Morgan algorithm with atom typing and circular neighborhoods.
Parameters:
- molecule — Molecule to fingerprint
- radius — Fingerprint radius (default: 2, equivalent to ECFP4)
- fpSize — Fingerprint size in bits (default: 2048, RDKit standard)
Returns: Uint8Array containing the fingerprint bits, packed 8 bits per byte (length fpSize / 8)
Example:
```typescript
import { parseSMILES, computeMorganFingerprint } from "openchem";
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O");
const fingerprint = computeMorganFingerprint(aspirin.molecules[0], 2, 512);
console.log(fingerprint.length); // 64 (512 bits packed 8 per byte)
```
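The byte length above comes from packing bits. As a sketch of the last step of an ECFP-style fingerprint, hashed atom-environment identifiers are folded into `fpSize` bits by taking them modulo `fpSize`, so distinct identifiers can collide on the same bit. The circular-neighborhood hashing itself is not shown; the identifiers are assumed to be given, `foldIntoBits` and `isBitSet` are hypothetical names, and the least-significant-bit-first order within each byte is an assumption that may differ from openchem's layout.

```typescript
// Hypothetical: fold 32-bit environment identifiers into an fpSize-bit vector packed 8 bits/byte.
function foldIntoBits(identifiers: number[], fpSize = 2048): Uint8Array {
  if (fpSize % 8 !== 0) throw new Error("fpSize must be a multiple of 8");
  const bytes = new Uint8Array(fpSize / 8);
  for (const id of identifiers) {
    const bit = (id >>> 0) % fpSize; // treat the hash as unsigned, then fold into range
    bytes[bit >> 3] |= 1 << (bit & 7); // byte index, then bit position within that byte
  }
  return bytes;
}

function isBitSet(bytes: Uint8Array, bit: number): boolean {
  return (bytes[bit >> 3] & (1 << (bit & 7))) !== 0;
}

// 515 % 512 === 3, so identifiers 3 and 515 collide on the same bit.
const fp = foldIntoBits([3, 515, 1000], 512);
console.log(fp.length, isBitSet(fp, 3), isBitSet(fp, 488)); // 64 true true
```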
##### tanimotoSimilarity(fp1: Uint8Array, fp2: Uint8Array): number
Calculates the Tanimoto similarity coefficient between two Morgan fingerprints. Measures structural similarity on a scale from 0 (no similarity) to 1 (identical).
Parameters:
- fp1 — First fingerprint
- fp2 — Second fingerprint
Returns: Similarity score between 0 and 1
Example:
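```typescript
import { parseSMILES, computeMorganFingerprint, tanimotoSimilarity } from "openchem";
const [fingerprint1, fingerprint2] = ["CC(=O)Oc1ccccc1C(=O)O", "CC(C)Cc1ccc(cc1)C(C)C(=O)O"].map((smiles) => computeMorganFingerprint(parseSMILES(smiles).molecules[0]));
const similarity = tanimotoSimilarity(fingerprint1, fingerprint2);
console.log(`Similarity: ${(similarity * 100).toFixed(1)}%`);
```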
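For packed bit vectors, the Tanimoto coefficient is the number of bits set in both fingerprints divided by the number set in either, |A ∩ B| / |A ∪ B|. A minimal sketch, assuming both inputs use the same size and packing; `tanimotoBits` is a hypothetical name rather than openchem's implementation, and it returns 0 for two all-zero fingerprints, a convention other tools may handle differently.

```typescript
// Count set bits in one byte.
function popcount8(x: number): number {
  let n = 0;
  while (x) {
    n += x & 1;
    x >>>= 1;
  }
  return n;
}

// Hypothetical sketch: Tanimoto = |A AND B| / |A OR B| over packed bytes.
function tanimotoBits(fp1: Uint8Array, fp2: Uint8Array): number {
  if (fp1.length !== fp2.length) throw new Error("fingerprints must be the same size");
  let both = 0;
  let either = 0;
  for (let i = 0; i < fp1.length; i++) {
    both += popcount8(fp1[i] & fp2[i]);
    either += popcount8(fp1[i] | fp2[i]);
  }
  return either === 0 ? 0 : both / either;
}

// A has bits {0, 1, 3}; B has bits {0, 1, 8}: 2 shared out of 4 distinct.
console.log(tanimotoBits(Uint8Array.from([0b1011, 0]), Uint8Array.from([0b0011, 0b1]))); // 0.5
```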
---
#### Scaffold Analysis (5 functions)
##### getMurckoScaffold(molecule: Molecule, options?: MurckoOptions): Molecule
Extracts the Murcko scaffold from a molecule — the core ring systems and linkers connecting them, with all terminal side chains removed. This is the standard scaffold used in medicinal chemistry for compound classification.
Parameters:
- molecule — Molecule to analyze
- options.includeLinkers — Include linker atoms between rings (default: true)
Returns: New Molecule containing only the scaffold
Example:
```typescript
import { parseSMILES, getMurckoScaffold, generateSMILES } from "openchem";
const ibuprofen = parseSMILES("CC(C)Cc1ccc(cc1)C(C)C(=O)O").molecules[0];
const scaffold = getMurckoScaffold(ibuprofen);
console.log(generateSMILES(scaffold)); // "c1ccccc1" - benzene core
```
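The core idea of a Murcko scaffold can be sketched as repeated pruning: any atom with at most one remaining neighbor is a side-chain terminus, so peel it off and re-check its neighbor. What survives is ring atoms plus the linker paths between rings. This simplified sketch works on a plain adjacency list and keeps linkers unconditionally; unlike typical implementations, it also strips exocyclic double-bonded atoms (e.g. a ring C=O). `murckoAtoms` and `adjacencyFromEdges` are hypothetical helpers, not openchem's algorithm.

```typescript
// Build an adjacency list (atom index -> neighbor indices) from an edge list.
function adjacencyFromEdges(atomCount: number, edges: Array<[number, number]>): number[][] {
  const adjacency = Array.from({ length: atomCount }, (): number[] => []);
  for (const [a, b] of edges) {
    adjacency[a].push(b);
    adjacency[b].push(a);
  }
  return adjacency;
}

// Hypothetical sketch: peel off terminal atoms until only rings and linkers remain.
function murckoAtoms(adjacency: number[][]): number[] {
  const degree = adjacency.map((nbrs) => nbrs.length);
  const removed = new Array<boolean>(adjacency.length).fill(false);
  const queue = degree.flatMap((d, i) => (d <= 1 ? [i] : [])); // initial terminal atoms
  while (queue.length > 0) {
    const atom = queue.pop()!;
    if (removed[atom]) continue;
    removed[atom] = true;
    for (const nbr of adjacency[atom]) {
      if (removed[nbr]) continue;
      degree[nbr] -= 1;
      if (degree[nbr] <= 1) queue.push(nbr); // neighbor became terminal
    }
  }
  return adjacency.map((_, i) => i).filter((i) => !removed[i]);
}

// Ibuprofen, CC(C)Cc1ccc(cc1)C(C)C(=O)O, atoms in SMILES order.
const ibuprofenEdges: Array<[number, number]> = [
  [0, 1], [1, 2], [1, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 9], [9, 4],
  [7, 10], [10, 11], [10, 12], [12, 13], [12, 14],
];
console.log(murckoAtoms(adjacencyFromEdges(15, ibuprofenEdges))); // the benzene ring atoms 4-9
```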
##### getBemisMurckoFramework(molecule: Molecule): Molecule
Generates a generic Bemis-Murcko framework — the scaffold with all atoms converted to carbon and all bonds converted to single bonds. Useful for identifying compounds with similar topology but different heteroatom patterns.
Returns: New Molecule with generic framework
Example:
```typescript
import { parseSMILES, getBemisMurckoFramework, generateSMILES } from "openchem";
const pyridine = parseSMILES("c1ccncc1").molecules[0];
const framework = getBemisMurckoFramework(pyridine);
console.log(generateSMILES(framework)); // "C1CCCCC1" - cyclohexane
```
##### getScaffoldTree(molecule: Molecule): Molecule[]
Generates a hierarchical scaffold tree by iteratively removing rings from the Murcko scaffold. Returns scaffolds ordered from most specific (full scaffold) to least specific (single ring).
Returns: Array of Molecule objects representing scaffolds at different levels
Example:
```typescript
import { parseSMILES, getScaffoldTree, generateSMILES } from "openchem";
const mol = parseSMILES("c1ccc2ccccc2c1").molecules[0]; // Naphthalene
const tree = getScaffoldTree(mol);
console.log(tree.length); // 2 levels: full naphthalene, then single benzene
tree.forEach((scaffold, idx) => {
  console.log(`Level ${idx}: ${generateSMILES(scaffold)}`);
});
```
##### getGraphFramework(molecule: Molecule): Molecule
Generates a pure topological framework with all atoms converted to wildcard atoms (*). This represents the molecular graph structure without any atom type information.
Returns: New Molecule with graph framework
Example:
```typescript
import { parseSMILES, getGraphFramework, generateSMILES } from "openchem";
const caffeine = parseSMILES("CN1C=NC2=C1C(=O)N(C(=O)N2C)C").molecules[0];
const graph = getGraphFramework(caffeine);
console.log(generateSMILES(graph)); // a SMILES string built only from * atoms (pure topology)
```
##### haveSameScaffold(mol1: Molecule, mol2: Molecule): boolean
Compares two molecules to determine if they share the same Murcko scaffold. Useful for compound series analysis and lead identification.
Returns: true if scaffolds match, false otherwise
Example:
```typescript
import { parseSMILES, haveSameScaffold } from "openchem";
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O").molecules[0];
const ibuprofen = parseSMILES("CC(C)Cc1ccc(cc1)C(C)C(=O)O").molecules[0];
console.log(haveSameScaffold(aspirin, ibuprofen)); // true - both benzene scaffold
```
---
#### Tautomer Analysis (2 functions)
##### enumerateTautomers(molecule: Molecule, options?: TautomerOptions): TautomerResult[]
Enumerates all tautomers for a molecule using transform-based enumeration with RDKit-compatible scoring.
Options:
- maxTautomers?: number — Maximum tautomers to generate (default: 256)
- maxTransforms?: number — Maximum transform operations (default: 1024)
- phases?: number[] — Rule phases to apply (default: [1, 2, 3])
- useFingerprintDedup?: boolean — Use fingerprint deduplication (default: true)
Returns: Array of TautomerResult objects with:
- smiles: string — SMILES representation
- molecule: Molecule — Molecule object
- score: number — Stability score (higher = more stable)
- ruleIds: string[] — Applied transformation rules
Scoring system (RDKit-inspired):
- +250 per all-carbon aromatic ring
- +100 per heteroaromatic ring
- +25 for benzoquinone
- +4 for oximes (C=N-OH)
- +2 for carbonyls (C=O, N=O, P=O)
- -10 per formal charge
- -4 for aci-nitro
- -1 per H on P, S, Se, Te
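The scoring rules above are a weighted sum over structural features. The sketch below applies the listed weights to feature counts that are assumed to have been detected elsewhere (recognizing these groups in a molecule is out of scope here). `TautomerFeatures`, `NO_FEATURES`, and `tautomerScore` are hypothetical names; the sketch reproduces the diketo/enol scores shown for acetylacetone in the example that follows.

```typescript
// Hypothetical feature counts; detecting them in a real molecule is not shown.
interface TautomerFeatures {
  carbocyclicAromaticRings: number; // all-carbon aromatic rings
  heteroaromaticRings: number;
  benzoquinones: number;
  oximes: number; // C=N-OH
  carbonyls: number; // C=O, N=O, P=O
  formalCharges: number; // counted per formal charge, as listed above
  aciNitro: number;
  hOnPSSeTe: number; // hydrogens on P, S, Se, Te
}

const NO_FEATURES: TautomerFeatures = {
  carbocyclicAromaticRings: 0, heteroaromaticRings: 0, benzoquinones: 0, oximes: 0,
  carbonyls: 0, formalCharges: 0, aciNitro: 0, hOnPSSeTe: 0,
};

// Weighted sum using the weights listed above.
function tautomerScore(f: TautomerFeatures): number {
  return 250 * f.carbocyclicAromaticRings + 100 * f.heteroaromaticRings + 25 * f.benzoquinones
    + 4 * f.oximes + 2 * f.carbonyls - 10 * f.formalCharges - 4 * f.aciNitro - 1 * f.hOnPSSeTe;
}

console.log(tautomerScore({ ...NO_FEATURES, carbonyls: 2 })); // 4: acetylacetone diketo form
console.log(tautomerScore({ ...NO_FEATURES, carbonyls: 1 })); // 2: enol form
```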
Example:
```typescript
import { parseSMILES, enumerateTautomers } from "openchem";
const mol = parseSMILES("CC(=O)CC(=O)C").molecules[0]; // acetylacetone
const tautomers = enumerateTautomers(mol, { maxTautomers: 16 });
console.log(`Found ${tautomers.length} tautomers:`);
tautomers.forEach((t, i) => {
  console.log(`${i + 1}. ${t.smiles} (score: ${t.score})`);
});
// Example output (first entries):
// 1. CC(=O)CC(=O)C (score: 4) - diketo form
// 2. CC(=O)C=C(C)O (score: 2) - enol form
```
##### canonicalTautomer(molecule: Molecule): Molecule
Selects the canonical (most stable) tautomer based on scoring.
Returns: The highest-scoring tautomer as a Molecule
Example:
```typescript
import { parseSMILES, canonicalTautomer, generateSMILES } from "openchem";
const mol = parseSMILES("CC(=O)CC(=O)C").molecules[0];
const canonical = canonicalTautomer(mol);
console.log(generateSMILES(canonical)); // "CC(=O)CC(=O)C" - diketo form preferred
```
---
#### Lipophilicity (3 functions)
##### computeLogP(molecule: Molecule): number
Calculates the LogP (partition coefficient) using the Wildman-Crippen method. LogP predicts lipophilicity and membrane permeability.
Returns: LogP value as a number
Interpretation:
- LogP < 0: Hydrophilic (water-loving)
- 0 ≤ LogP ≤ 5: Optimal range for most drugs
- LogP > 5: Lipophilic (fat-loving), may have poor water solubility
Example:
```typescript
import { parseSMILES, computeLogP } from "openchem";
const aspirin = parseSMILES("CC(=O)Oc1ccccc1C(=O)O").molecules[0];
console.log(computeLogP(aspirin)); // 1.31 (good bioavailability)
```
##### crippenLogP(molecule: Molecule): number
Alias for computeLogP(). Alternative name for the Wildman-Crippen LogP calculation.
##### logP(molecule: Molecule): number
Alternative LogP calculation method. May use different fragment contributions than Crippen.
---
#### Basic Properties (3 functions)
##### getMolecularFormula(molecule: Molecule): string
Returns the molecular formula in Hill notation (C first, then H, then alphabetical).
Example: C9H8O4 for aspirin
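Hill ordering is easy to express from element counts. The sketch below follows the rule stated above for carbon-containing formulas and, as an assumption about carbon-free inputs, applies the standard Hill convention of fully alphabetical order (H included); whether openchem does the same for carbon-free molecules is not stated here. `hillFormula` is a hypothetical helper.

```typescript
// Hypothetical sketch: format element counts in Hill order.
function hillFormula(counts: Record<string, number>): string {
  const elements = Object.keys(counts).filter((el) => counts[el] > 0);
  const part = (el: string) => el + (counts[el] === 1 ? "" : String(counts[el]));
  if (!elements.includes("C")) {
    // No carbon: purely alphabetical, hydrogen included.
    return elements.sort().map(part).join("");
  }
  // Carbon present: C first, then H, then the rest alphabetically.
  const rest = elements.filter((el) => el !== "C" && el !== "H").sort();
  return ["C", ...(elements.includes("H") ? ["H"] : []), ...rest].map(part).join("");
}

console.log(hillFormula({ O: 4, H: 8, C: 9 })); // "C9H8O4" (aspirin)
console.log(hillFormula({ Br: 1, C: 2, H: 5 })); // "C2H5Br"
```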
##### getMolecularMass(molecule: Molecule): number
Returns the molecular mass using average atomic masses from the periodic table.
Example: 180.159 for aspirin
##### getExactMass(molecule: Molecule): number
Returns the exact mass using the most abundant isotope for each element.
Example: 180.042 for aspirin
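The two masses differ only in the per-element table. A worked check for aspirin (C9H8O4), using IUPAC conventional atomic weights for the average mass and monoisotopic masses for the exact mass (for C, H, and O the monoisotopic isotope is also the most abundant). The tables cover only these three elements, and `massFromCounts` is a hypothetical helper.

```typescript
// Average (conventional atomic weight) and monoisotopic masses, C/H/O only.
const AVERAGE: Record<string, number> = { C: 12.011, H: 1.008, O: 15.999 };
const MONOISOTOPIC: Record<string, number> = { C: 12.0, H: 1.00782503223, O: 15.99491461957 };

// Hypothetical helper: sum count x mass over the formula.
function massFromCounts(counts: Record<string, number>, table: Record<string, number>): number {
  let total = 0;
  for (const [el, n] of Object.entries(counts)) {
    const m = table[el];
    if (m === undefined) throw new Error(`no mass for ${el}`);
    total += n * m;
  }
  return total;
}

const aspirin = { C: 9, H: 8, O: 4 };
console.log(massFromCounts(aspirin, AVERAGE).toFixed(3)); // "180.159"
console.log(massFromCounts(aspirin, MONOISOTOPIC).toFixed(3)); // "180.042"
```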
---
#### Structural Properties (8 functions)
##### getHeavyAtomCount(molecule: Molecule): number
Returns the count of non-hydrogen atoms.
Example: 15 for ibuprofen (13 carbons + 2 oxygens)
##### getHeteroAtomCount(molecule: Molecule): number
Returns the count of heteroatoms (any atom except C and H). Includes N, O, S, P, halogens, etc.
Example: 4 for aspirin (4 oxygen atoms: two in the ester group, two in the carboxylic acid)
##### getRingCount(molecule: Molecule): number
Returns the total number of rings in the molecule using cycle detection.
Example: 2 for naphthalene (2 fused rings)
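The number of rings in an SSSR is fixed by graph topology: it equals the cyclomatic number, bonds − atoms + connected components. The sketch computes that count with a union-find pass as an independent cross-check; it yields the count only, not the rings themselves, and `ringCount` is a hypothetical helper rather than openchem's cycle-detection code.

```typescript
// Hypothetical sketch: SSSR size = bonds - atoms + connected components.
function ringCount(atomCount: number, bonds: Array<[number, number]>): number {
  const parent = Array.from({ length: atomCount }, (_, i) => i);
  const find = (x: number): number => {
    while (parent[x] !== x) {
      parent[x] = parent[parent[x]]; // path halving keeps trees shallow
      x = parent[x];
    }
    return x;
  };
  let components = atomCount;
  for (const [a, b] of bonds) {
    const ra = find(a);
    const rb = find(b);
    if (ra !== rb) {
      parent[ra] = rb; // merging two components
      components -= 1;
    }
  }
  return bonds.length - atomCount + components;
}

// Naphthalene, c1ccc2ccccc2c1: 10 atoms, 11 bonds, 1 component.
const naphthalene: Array<[number, number]> = [
  [0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 3], [8, 9], [9, 0],
];
console.log(ringCount(10, naphthalene)); // 2
```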
##### getAromaticRingCount(molecule: Molecule): number
Returns the number of aromatic rings.
Example: 1 for benzene, 2 for naphthalene
##### getRingInfo(molecule: Molecule): RingInformation
Returns a comprehensive ring information object providing access to SSSR (Smallest Set of Smallest Rings) and ring membership queries. Similar to RDKit's GetRingInfo() functionality.
Methods:
- numRings() - Number of rings in the SSSR
- rings() - Array of rings (each ring is an array of atom IDs)
- isAtomInRing(atomIdx) - Check if atom is in any ring
- isBondInRing(atom1, atom2) - Check if bond is in any ring
- atomRingMembership(atomIdx) - Ring membership count for atom ([Rn] in SMARTS)
- atomRings(atomIdx) - All rings containing a specific atom
- ringAtoms(ringIdx) - Atoms in a specific ring
- ringBonds(ringIdx) - Bonds in a specific ring
Example:
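```typescript
import { parseSMILES, getRingInfo } from "openchem";
const mol = parseSMILES("c1ccc2ccccc2c1").molecules[0]; // naphthalene
const ringInfo = getRingInfo(mol);
console.log(ringInfo.numRings()); // 2
console.log(ringInfo.isAtomInRing(5)); // true
console.log(ringInfo.atomRingMembership(3)); // 2 (ring-fusion atom)
```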
##### getFractionCSP3(molecule: Molecule): number
Returns the fraction of sp³-hybridized carbons (saturated carbons) relative to total carbons. Higher values indicate greater structural complexity and 3D character. Range: 0.0 to 1.0.
Example: 0.375 for caffeine (3 of 8 carbons), 0.46 for ibuprofen (6 of 13 carbons)
##### getHBondDonorCount(molecule: Molecule): number
Returns the count of hydrogen bond donors (N-H and O-H groups).
Example: 1 for aspirin (carboxylic acid O-H), 0 for caffeine
##### getHBondAcceptorCount(molecule: Molecule): number
Returns the count of hydrogen bond acceptors (N and O atoms).
Example: 4 for aspirin, 6 for caffeine
---
#### Drug-Likeness Properties (5 functions)
##### getTPSA(molecule: Molecule): number
Returns the Topological Polar Surface Area in Ų (square Ångströms) using the Ertl et al. fragment-based algorithm. TPSA is a key descriptor for predicting drug absorption and bioavailability.
Guidelines:
- TPSA < 140 Ų: Good oral bioavailability
- TPSA < 90 Ų: Likely blood-brain barrier penetration
- TPSA > 140 Ų: Poor membrane permeability
Example: 63.60 for aspirin (good oral availability), 52.93 for morphine (CNS-active)
##### getRotatableBondCount(molecule: Molecule): number
Returns the count of rotatable bonds (single non-ring bonds between non-terminal heavy atoms). Used in Veber rules for predicting oral bioavailability.
Example: 3 for aspirin, 4 for ibuprofen
##### checkLipinskiRuleOfFive(molecule: Molecule): LipinskiResult
Evaluates Lipinski's Rule of Five for oral drug-likeness. Returns result object with:
- passes: boolean indicating if all rules pass
- violations: array of violation messages
- properties: { molecularWeight, hbondDonors, hbondAcceptors, logP }
Rules:
- Molecular weight ≤ 500 Da
- H-bond donors ≤ 5
- H-bond acceptors ≤ 10
- LogP ≤ 5
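The four thresholds translate directly into comparisons, with `passes` true only when no rule is violated, as described above. A minimal sketch over precomputed properties; `LipinskiInputs` and `lipinskiViolations` are hypothetical names, and the message wording is illustrative rather than openchem's.

```typescript
interface LipinskiInputs {
  molecularWeight: number;
  hbondDonors: number;
  hbondAcceptors: number;
  logP: number;
}

// Hypothetical sketch of the four Rule-of-Five checks listed above.
function lipinskiViolations(p: LipinskiInputs): string[] {
  const violations: string[] = [];
  if (p.molecularWeight > 500) violations.push(`molecular weight ${p.molecularWeight} > 500 Da`);
  if (p.hbondDonors > 5) violations.push(`H-bond donors ${p.hbondDonors} > 5`);
  if (p.hbondAcceptors > 10) violations.push(`H-bond acceptors ${p.hbondAcceptors} > 10`);
  if (p.logP > 5) violations.push(`LogP ${p.logP} > 5`);
  return violations;
}

// Aspirin values quoted elsewhere in this document.
const aspirinViolations = lipinskiViolations({ molecularWeight: 180.159, hbondDonors: 1, hbondAcceptors: 4, logP: 1.31 });
console.log(aspirinViolations.length === 0); // true: passes
```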
##### checkVeberRules(molecule: Molecule): VeberResult
Evaluates Veber rules for oral bioavailability. Returns result object with:
- passes: boolean indicating if all rules pass
- violations: array of violation messages
- properties: { rotatableBonds, tpsa }
Rules:
- Rotatable bonds ≤ 10
- TPSA ≤ 140 Ų
##### checkBBBPenetration(molecule: Molecule): BBBResult
Predicts blood-brain barrier penetration. Returns result object with:
- likelyPenetration: boolean (true if TPSA < 90 Ų)
- tpsa: TPSA value
---
```typescript
interface Molecule {
  atoms: Atom[];
  bonds: Bond[];
}
interface Atom {
  id: number;
  symbol: string;
  aromatic: boolean;
  hydrogens: number;
  charge: number;
  isotope: number | null;
  chiral: string | null;
  atomClass: number | null;
  isBracket: boolean;
  atomicNumber: number;
}
interface Bond {
  atom1: number;
  atom2: number;
  type: BondType;
  stereo: StereoType;
}
enum BondType {
  SINGLE = 1,
  DOUBLE = 2,
  TRIPLE = 3,
  QUADRUPLE = 4,
  AROMATIC = 5,
}
```
openchem is designed for production use with real-world performance:
- Parsing: ~1-10ms per molecule (depending on complexity)
- Generation: ~1-5ms per molecule
- Memory: Minimal overhead, compact AST representation
- Zero dependencies: No external runtime dependencies
Benchmark with 325 diverse molecules including commercial drugs: Average parse + generate round-trip < 5ms
openchem uses a post-processing enrichment system that pre-computes expensive molecular properties during parsing. This design significantly improves performance for downstream property queries while maintaining code simplicity.
#### Why Pre-compute Properties?
Molecular property calculations like ring finding, hybridization determination, and rotatable bond classification are computationally expensive (O(n²) complexity). Without pre-computation:
1. Redundant calculations: Ring finding would run every time you query ring count, aromatic rings, or check if atoms/bonds are in rings
2. Performance penalty: Property queries would dominate runtime, especially for drug-likeness checks that need multiple properties
3. Code complexity: Every property function would need to duplicate expensive logic
The Solution: Compute once during parsing, cache results, use everywhere.
#### Key Components
- types.ts — Extended with optional cached properties on Atom, Bond, and Molecule interfaces
- src/utils/molecule-enrichment.ts — Post-processing module that enriches molecules after parsing
- src/parser.ts — Calls enrichMolecule() after the validation phase (at line 451)
- src/utils/molecular-properties.ts — Uses cached properties when available, falls back to computation
#### Cached Properties
- Atom: degree (neighbor count), isInRing, ringIds[], hybridization (sp/sp²/sp³)
- Bond: isInRing, ringIds[], isRotatable
- Molecule: rings[][] (all rings as atom IDs), ringInfo (lookup maps)
#### Performance Impact
Benchmark Results (10,000 molecules, 7 properties each):
- Parse time: 1.22 ms/molecule (includes enrichment)
- Property query time: 0.006 ms/molecule (0.5% of parse time)
- Rotatable bond queries: ~3.1 million ops/second (simple array filter vs 47-line calculation)
Complexity Improvements:
- Ring finding: Once per molecule (O(n²)) → subsequent queries O(1)
- Rotatable bonds: O(n×m) nested loops → O(n) array filter
- Property queries: 200× faster on average
#### Immutability Contract
Important: Molecules are immutable after parsing. All enriched properties remain valid for the lifetime of the molecule object. This design:
- Prevents stale cached properties (no mutation = no invalidation needed)
- Enables safe sharing across threads/workers
- Simplifies reasoning about molecule state
If you need to modify a molecule, create a new one by parsing updated SMILES.
#### Design Notes
- Ring analysis (analyzeRings()) runs only during enrichment
- Downstream property functions check cached values first, fall back to computation if missing
- Backward compatible: cached properties are optional (`?:`) with defensive fallbacks
- New code should always use cached properties when available
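The "cached value first, computation as fallback" rule can be sketched with toy types. `SketchAtom`, `SketchMolecule`, `enrichDegrees`, and `getDegree` are hypothetical and only mirror the pattern described above, not openchem's types.ts or molecular-properties.ts. In keeping with the immutability contract, enrichment returns a new molecule instead of mutating its input.

```typescript
// Toy types for this sketch only: the cached field is optional, as in the design notes.
interface SketchAtom { id: number; degree?: number }
interface SketchBond { atom1: number; atom2: number }
interface SketchMolecule { atoms: SketchAtom[]; bonds: SketchBond[] }

// Enrichment pass: compute every degree once and return a new molecule (input untouched).
function enrichDegrees(mol: SketchMolecule): SketchMolecule {
  const degree = new Map<number, number>();
  for (const b of mol.bonds) {
    degree.set(b.atom1, (degree.get(b.atom1) ?? 0) + 1);
    degree.set(b.atom2, (degree.get(b.atom2) ?? 0) + 1);
  }
  return { ...mol, atoms: mol.atoms.map((a) => ({ ...a, degree: degree.get(a.id) ?? 0 })) };
}

// Query: prefer the cached value; fall back to scanning bonds if it is missing.
function getDegree(mol: SketchMolecule, atomId: number): number {
  const atom = mol.atoms.find((a) => a.id === atomId);
  if (atom?.degree !== undefined) return atom.degree;
  return mol.bonds.filter((b) => b.atom1 === atomId || b.atom2 === atomId).length;
}

// Ethanol heavy atoms: C0-C1-O2.
const raw: SketchMolecule = { atoms: [{ id: 0 }, { id: 1 }, { id: 2 }], bonds: [{ atom1: 0, atom2: 1 }, { atom1: 1, atom2: 2 }] };
const enriched = enrichDegrees(raw);
console.log(getDegree(raw, 1), getDegree(enriched, 1)); // 2 2 (fallback vs cached)
```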
openchem handles 100% of tested SMILES correctly (325/325 in bulk validation).
Key implementation details:
- Stereo normalization: Trans alkenes are automatically normalized to use / (up) markers on both ends to match RDKit's canonical form. For example, C\C=C\C and C/C=C/C both represent trans configuration and canonicalize to C/C=C/C.
- Canonical ordering: Atoms are ordered using a modified Morgan algorithm matching RDKit's approach, with tie-breaking by atomic number, degree, and other properties.
- Aromatic validation: Aromatic notation (lowercase letters) is accepted as specified in SMILES. The parser validates that aromatic atoms are in rings but accepts aromatic notation without strict Hückel rule enforcement, matching RDKit's behavior for broader compatibility.
This implementation has been validated against RDKit's canonical SMILES output for diverse molecule sets including stereocenters, complex rings, heteroatoms, and 25 commercial pharmaceutical drugs.
openchem implements the OpenSMILES specification with high fidelity while prioritizing RDKit compatibility for real-world interoperability. In specific areas where the OpenSMILES specification provides recommendations rather than strict requirements, openchem follows RDKit's implementation choices to ensure 100% parity with the industry-standard cheminformatics toolkit.
OpenSMILES Recommendation: Start traversal on heteroatoms first, then terminals.
- Example preference: OCCC over CCCO for propanol
- Rationale: Heteroatoms are "more interesting" chemically
openchem Implementation: Canonical labels first, heteroatoms as tie-breaker.
- Example: Both OCCC and CCCO canonicalize to CCCO
- Rationale: Ensures 100% deterministic output for identical molecules
Why RDKit's Approach:
1. Determinism: Canonical labels guarantee the same molecule always produces identical output, regardless of input order
2. Interoperability: 100% agreement with RDKit enables seamless integration with existing cheminformatics pipelines and databases
3. Real-world usage: Major chemical databases (PubChem, ChEMBL) prioritize canonical determinism over heteroatom preference
4. Chemical equivalence: Both OCCC and CCCO represent the same molecule; the output difference is purely cosmetic
Impact: Minimal — affects only the order atoms appear in canonical output, not chemical meaning or validity. All SMILES remain valid OpenSMILES syntax.
**OpenSMILES S