High-performance HTML to Markdown converter - Node.js native bindings
npm install html-to-markdown-node

> npm package: html-to-markdown-node (this README).
> Use html-to-markdown-wasm for the portable WASM build.
Native Node.js and Bun bindings for html-to-markdown using NAPI-RS v3.
Built on the shared Rust engine that powers the Python wheels, Ruby gem, PHP extension, WebAssembly package, and CLI – ensuring identical Markdown output across every language target.
High-performance HTML to Markdown conversion using native Rust code compiled to platform-specific binaries.










Native NAPI-RS bindings deliver the fastest HTML to Markdown conversion available in JavaScript.
| Document Type | ops/sec | Notes |
| -------------------------- | ---------- | ------------------ |
| Small (5 paragraphs) | 86,233 | Simple documents |
| Medium (25 paragraphs) | 18,979 | Nested formatting |
| Large (100 paragraphs) | 4,907 | Complex structures |
| Tables (20 tables) | 5,003 | Table processing |
| Lists (500 items) | 1,819 | Nested lists |
| Wikipedia (129KB) | 1,125 | Real-world content |
| Wikipedia (653KB) | 156 | Large documents |

Average: ~18,162 ops/sec across varied workloads.

- vs WASM: ~1.17× faster (native has zero startup time, direct memory access)
- vs Python: ~7.4× faster (avoids FFI overhead)
- Best for: Node.js and Bun server-side applications requiring maximum throughput

The shared benchmark harness lives in tools/benchmark-harness. Node keeps pace with the Rust CLI across the board:

| Document | Size | ops/sec (Node) |
| ---------------------- | ------ | -------------- |
| Lists (Timeline) | 129 KB | 3,137 |
| Tables (Countries) | 360 KB | 932 |
| Medium (Python) | 657 KB | 460 |
| Large (Rust) | 567 KB | 554 |
| Small (Intro) | 463 KB | 627 |
| hOCR German PDF | 44 KB | 8,724 |
| hOCR Invoice | 4 KB | 96,138 |
| hOCR Embedded Tables | 37 KB | 9,591 |
> Run task bench:harness -- --frameworks node to regenerate these numbers.
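The figures above come from the project's own harness. For a rough comparable measurement in your own environment, a minimal sketch like the following times any synchronous function. `measureOpsPerSec` and the stand-in workload are illustrative names, not part of the package, and a simple loop like this will be noisier than a dedicated harness.

```javascript
// Hypothetical helper (not part of html-to-markdown-node): estimates
// operations per second for any synchronous function.
function measureOpsPerSec(fn, input, minTimeMs = 200) {
  // Warm up so JIT compilation does not dominate the measurement.
  for (let i = 0; i < 10; i++) fn(input);

  let iterations = 0;
  let elapsed = 0;
  const start = performance.now();
  do {
    fn(input);
    iterations++;
    elapsed = performance.now() - start;
  } while (elapsed < minTimeMs);

  // Convert "iterations per elapsed milliseconds" to operations per second.
  return iterations / (elapsed / 1000);
}

// Stand-in workload so the sketch runs without the native package installed.
// With the package available, pass `convert` from 'html-to-markdown-node' instead.
const standIn = (html) => html.replace(/<[^>]+>/g, '');
const sampleHtml = '<p>This is fast!</p>'.repeat(5);

const opsPerSec = measureOpsPerSec(standIn, sampleHtml, 50);
console.log(`${Math.round(opsPerSec)} ops/sec`);
```

Longer measurement windows (a larger `minTimeMs`) reduce noise at the cost of run time.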
```bash
npm install html-to-markdown-node
# or
yarn add html-to-markdown-node
# or
pnpm add html-to-markdown-node
```

```bash
bun add html-to-markdown-node
```

```javascript
import { convert } from 'html-to-markdown-node';

const html = '<p>This is fast!</p>';
const markdown = convert(html);
console.log(markdown);
```
```typescript
import { convert } from 'html-to-markdown-node';

const markdown = convert(html, {
  headingStyle: 'Atx',
  codeBlockStyle: 'Backticks',
  listIndentWidth: 2,
  bullets: '-',
  wrap: true,
  wrapWidth: 80
});
```
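When most calls share the same settings, one approach is to keep a single base options object and override individual fields per call. The sketch below uses the option names from the example above; `withOverrides` is a hypothetical helper, not something the package exports.

```javascript
// Base options shared by every call; the names are taken from the example above.
const baseOptions = {
  headingStyle: 'Atx',
  codeBlockStyle: 'Backticks',
  listIndentWidth: 2,
  bullets: '-',
  wrap: true,
  wrapWidth: 80,
};

// Hypothetical helper: returns a new options object and leaves the base untouched.
function withOverrides(base, overrides = {}) {
  return { ...base, ...overrides };
}

const narrow = withOverrides(baseOptions, { wrapWidth: 60 });
const unwrapped = withOverrides(baseOptions, { wrap: false });

console.log(narrow.wrapWidth, unwrapped.wrap, baseOptions.wrapWidth);
// prints: 60 false 80
```

Each resulting object can then be passed as the second argument to `convert`.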
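```typescript
import { convert } from 'html-to-markdown-node';

const html = `
<h1>Report</h1>
<table>
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>Foo</td><td>Bar</td></tr>
</table>
`;

const markdown = convert(html, {
  preserveTags: ['table'] // Keep tables as HTML
});
// # Report
//
// <table>
//   <tr><th>Name</th><th>Value</th></tr>
//   <tr><td>Foo</td><td>Bar</td></tr>
// </table>
```

TypeScript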
Full TypeScript definitions included:
```typescript
import { convert, convertWithInlineImages, type JsConversionOptions } from 'html-to-markdown-node';

const options: JsConversionOptions = {
  headingStyle: 'Atx',
  codeBlockStyle: 'Backticks',
  listIndentWidth: 2,
  bullets: '-',
  wrap: true,
  wrapWidth: 80
};

const markdown = convert('<h1>Hello</h1>', options);
```
Avoid re-parsing the same options object on every call (benchmarks, tight render loops) by creating a reusable handle:
```ts
import {
  createConversionOptionsHandle,
  convertWithOptionsHandle,
} from 'html-to-markdown-node';

const handle = createConversionOptionsHandle({ hocrSpatialTables: false });
const markdown = convertWithOptionsHandle('<h1>Handles</h1>', handle);
```
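One way to make sure the handle really is created only once is to hide it behind a small closure. The sketch below is an assumption about how you might structure this, not an API the package provides: `makeConverter` is a hypothetical name, and the two functions are injected so the block runs without the native binary. In real use they would be `createConversionOptionsHandle` and `convertWithOptionsHandle`.

```javascript
// Hypothetical wrapper: builds the options handle once and reuses it.
function makeConverter(createHandle, convertWithHandle, options) {
  let handle = null; // created lazily on first use
  return function convertCached(html) {
    if (handle === null) handle = createHandle(options);
    return convertWithHandle(html, handle);
  };
}

// Stand-ins that only count calls, to show the handle is created once.
let handlesCreated = 0;
const fakeCreate = (opts) => {
  handlesCreated++;
  return { opts };
};
const fakeConvert = (html, handle) => `${html.length} chars, wrap=${handle.opts.wrap}`;

const toMarkdown = makeConverter(fakeCreate, fakeConvert, { wrap: false });
const outputs = ['<p>a</p>', '<p>bb</p>', '<p>ccc</p>'].map(toMarkdown);

console.log(outputs);
console.log(`handles created: ${handlesCreated}`); // prints "handles created: 1"
```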
Skip the intermediate UTF-16 string allocation by feeding `Buffer`/`Uint8Array` inputs directly. This is handy for benchmark harnesses or when you already have raw bytes:

```ts
import {
  convertBuffer,
  convertInlineImagesBuffer,
  convertBufferWithOptionsHandle,
  createConversionOptionsHandle,
} from 'html-to-markdown-node';
import { readFileSync } from 'node:fs';

const html = readFileSync('fixtures/lists.html'); // Buffer
const markdown = convertBuffer(html);

const handle = createConversionOptionsHandle({ headingStyle: 'Atx' });
const markdownFromHandle = convertBufferWithOptionsHandle(html, handle);

// Inline images work too:
const extraction = convertInlineImagesBuffer(html, null, {
  maxDecodedSizeBytes: 5 * 1024 * 1024, // 5MB
});
```

Inline Images
Extract and decode inline images (data URIs, SVG):
```typescript
import { writeFileSync } from 'node:fs';
import { convertWithInlineImages } from 'html-to-markdown-node';

// HTML containing an inline image as a data URI (payload elided)
const html = '<img src="data:image/png;base64,..." alt="...">';

const result = convertWithInlineImages(html, null, {
  maxDecodedSizeBytes: 5 * 1024 * 1024, // 5MB
  inferDimensions: true,
  filenamePrefix: 'img_',
  captureSvg: true
});

console.log(result.markdown);
console.log(`Extracted ${result.inlineImages.length} images`);

for (const img of result.inlineImages) {
  console.log(`${img.filename}: ${img.format}, ${img.data.length} bytes`);
  // Save image data to disk
  writeFileSync(img.filename, img.data);
}
```

Supported Platforms
Pre-built native binaries are provided for:
| Platform | Architectures |
| ----------- | --------------------------------------------------- |
| macOS | x64 (Intel), ARM64 (Apple Silicon) |
| Linux | x64 (glibc/musl), ARM64 (glibc/musl), ARMv7 (glibc) |
| Windows | x64, ARM64 |
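If you want to check at runtime whether the current machine matches one of the rows above, a lookup keyed on Node's `process.platform` and `process.arch` is one option. This is a sketch under assumptions: `hasPrebuiltBinary` is a hypothetical helper, and it does not distinguish glibc from musl.

```javascript
// Pre-built targets from the table above, keyed by process.platform values.
// Architectures use process.arch naming; 'arm' is how Node reports 32-bit ARM,
// which includes ARMv7. The libc flavour (glibc vs musl) is not checked here.
const PREBUILT = new Map([
  ['darwin', ['x64', 'arm64']],
  ['linux', ['x64', 'arm64', 'arm']],
  ['win32', ['x64', 'arm64']],
]);

// Hypothetical helper, not part of the package.
function hasPrebuiltBinary(platform = process.platform, arch = process.arch) {
  return (PREBUILT.get(platform) ?? []).includes(arch);
}

console.log(`this machine: ${hasPrebuiltBinary()}`);
console.log(`win32/ia32: ${hasPrebuiltBinary('win32', 'ia32')}`); // prints "win32/ia32: false"
```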
- ✅ Node.js 18+ (LTS)
- ✅ Bun 1.0+ (full NAPI-RS support)
- ❌ Deno (use html-to-markdown-wasm instead)
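For code that has to run in more than one runtime, the list above can be turned into a small selection function. The following is a sketch, not part of either package: `pickPackage` is a hypothetical name, and falling back to the WASM build on Node.js versions older than 18 is an assumption rather than something this README states.

```javascript
// Hypothetical helper: picks a package name from the runtime's globals.
// `env` defaults to globalThis and is a parameter only so it can be tested.
function pickPackage(env = globalThis) {
  // Check Deno and Bun first, since both can also expose a Node-style `process`.
  if (env.Deno !== undefined) return 'html-to-markdown-wasm';
  if (env.Bun !== undefined) return 'html-to-markdown-node';

  const nodeVersion = env.process?.versions?.node;
  if (nodeVersion !== undefined) {
    const major = Number(nodeVersion.split('.')[0]);
    // Node.js 18+ is listed as supported; the fallback for older versions
    // is an assumption made for this sketch.
    return major >= 18 ? 'html-to-markdown-node' : 'html-to-markdown-wasm';
  }

  // Browsers and other runtimes.
  return 'html-to-markdown-wasm';
}

console.log(pickPackage());
```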
When to Use

Choose `html-to-markdown-node` when:

- ✅ Running in Node.js or Bun
- ✅ Maximum performance is required
- ✅ Server-side conversion at scale

Use `html-to-markdown-wasm` for:

- 🌐 Browser/client-side conversion
- 🦕 Deno runtime
- ☁️ Edge runtimes (Cloudflare Workers, Deno Deploy)
- 📦 Universal packages

Other runtimes:

- 🐍 Python: html-to-markdown
- 💎 Ruby: html-to-markdown
- 🐘 PHP: goldziher/html-to-markdown
- 🌐 WebAssembly: html-to-markdown-wasm

Configuration Options
See ConversionOptions for all available options including:

- Heading styles (ATX, underlined, ATX closed)
- Code block styles (indented, backticks, tildes)
- List formatting (indent width, bullet characters)
- Text escaping and formatting
- Tag preservation (`preserveTags`) and stripping (`stripTags`)
- Preprocessing for web scraping
- hOCR table extraction
- And more...

Examples
Keep specific HTML tags in their original form instead of converting to Markdown:
```typescript
import { convert } from 'html-to-markdown-node';

const html = `
<p>Before table</p>
<table>
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>Item 1</td><td>100</td></tr>
</table>
<p>After table</p>
`;

const markdown = convert(html, {
  preserveTags: ['table']
});

// Result includes the table as HTML:
// "Before table\n\n<table>...</table>\n\nAfter table\n"
```

Combine with `stripTags` for fine-grained control:

```typescript
const markdown = convert(html, {
  preserveTags: ['table', 'form'], // Keep these as HTML
  stripTags: ['script', 'style'] // Remove these entirely
});
```
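This README does not say what happens when the same tag appears in both `preserveTags` and `stripTags`, so it may be worth catching that case before calling `convert`. A minimal sketch, assuming a case-insensitive comparison is what you want; `findTagConflicts` is a hypothetical helper, not part of the package.

```javascript
// Hypothetical guard: reports tags that appear in both lists,
// compared case-insensitively.
function findTagConflicts(options) {
  const preserve = new Set((options.preserveTags ?? []).map((t) => t.toLowerCase()));
  return (options.stripTags ?? [])
    .map((t) => t.toLowerCase())
    .filter((t) => preserve.has(t));
}

const options = {
  preserveTags: ['table', 'form'],
  stripTags: ['script', 'style', 'FORM'],
};

const conflicts = findTagConflicts(options);
if (conflicts.length > 0) {
  console.warn(`Tags in both preserveTags and stripTags: ${conflicts.join(', ')}`);
}
```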
```javascript
const { convert } = require('html-to-markdown-node');

// CommonJS has no top-level await, so the fetch runs inside an async function
async function main() {
  const scrapedHtml = await fetch('https://example.com').then(r => r.text());
  const markdown = convert(scrapedHtml, {
    preprocessing: {
      enabled: true,
      preset: 'Aggressive',
      removeNavigation: true,
      removeForms: true
    },
    headingStyle: 'Atx',
    codeBlockStyle: 'Backticks'
  });
  console.log(markdown);
}

main().catch(console.error);
```
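When scraping, it can help to confirm that a response is successful and actually HTML before handing it to the converter. The sketch below is one possible pre-check, not something the package provides: `looksLikeHtml` and `fetchAsMarkdown` are hypothetical names, and `convert` is passed in as a parameter so the block does not depend on the native binary being installed.

```javascript
// Hypothetical pre-check: decides whether a fetched response looks like
// HTML worth converting.
function looksLikeHtml(status, contentType) {
  if (status < 200 || status >= 300) return false;
  if (!contentType) return false;
  // Strip parameters such as "; charset=utf-8" before comparing.
  const mediaType = contentType.split(';')[0].trim().toLowerCase();
  return mediaType === 'text/html' || mediaType === 'application/xhtml+xml';
}

// Sketch of use with the built-in fetch (Node.js 18+).
async function fetchAsMarkdown(url, convert, options) {
  const response = await fetch(url);
  if (!looksLikeHtml(response.status, response.headers.get('content-type'))) {
    throw new Error(`Not an HTML response: ${response.status} ${url}`);
  }
  return convert(await response.text(), options);
}

console.log(looksLikeHtml(200, 'text/html; charset=utf-8')); // prints true
console.log(looksLikeHtml(200, 'application/json')); // prints false
```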
```javascript
const { convert } = require('html-to-markdown-node');
const fs = require('fs');

// OCR output from Tesseract in hOCR format
const hocrHtml = fs.readFileSync('scan.hocr', 'utf8');

// Automatically detects hOCR and reconstructs tables
const markdown = convert(hocrHtml, {
  hocrSpatialTables: true // Enable spatial table reconstruction
});
```

- GitHub Repository
- Full Documentation
- WASM Package
- Python Package
- Rust Crate
MIT