# High-performance HTML to Markdown converter: WebAssembly bindings
`npm install @kreuzberg/html-to-markdown-wasm`

> npm package: `@kreuzberg/html-to-markdown-wasm` (this README).
> Use `@kreuzberg/html-to-markdown-node` when you only target Node.js or Bun and want native performance.
Universal HTML to Markdown converter using WebAssembly.
Powered by the same Rust engine as the Node.js, Python, Ruby, and PHP bindings, so Markdown output stays identical regardless of runtime.
Runs anywhere: Node.js, Deno, Bun, browsers, and edge runtimes.

> ⚠️ BREAKING CHANGE: Package Namespace Update
>
> In v2.19.0, the npm package namespace changed from html-to-markdown-wasm to @kreuzberg/html-to-markdown-wasm to reflect the new Kreuzberg.dev organization.
Before (v2.18.x):

```bash
npm install html-to-markdown-wasm
```

After (v2.19.0+):

```bash
npm install @kreuzberg/html-to-markdown-wasm
```

Imports before:

```typescript
import { convert } from 'html-to-markdown-wasm';
// or
import { convert } from "npm:html-to-markdown-wasm"; // Deno
```

Imports after:

```typescript
import { convert } from '@kreuzberg/html-to-markdown-wasm';
// or
import { convert } from "npm:@kreuzberg/html-to-markdown-wasm"; // Deno
```

CDN imports before:

```javascript
import init, { convert } from 'https://unpkg.com/html-to-markdown-wasm/dist-web/html_to_markdown_wasm.js';
```

CDN imports after:

```javascript
import init, { convert } from 'https://unpkg.com/@kreuzberg/html-to-markdown-wasm/dist-web/html_to_markdown_wasm.js';
```
- Package renamed from html-to-markdown-wasm to @kreuzberg/html-to-markdown-wasm
- All APIs remain identical
- Full backward compatibility after updating package name and imports
---
## Performance

The WebAssembly bindings perform consistently across all JavaScript runtimes. Throughput by document type (operations per second, higher is better):
| Document Type | ops/sec | Notes |
| -------------------------- | ---------- | ------------------ |
| Small (5 paragraphs) | 70,300 | Simple documents |
| Medium (25 paragraphs) | 15,282 | Nested formatting |
| Large (100 paragraphs) | 3,836 | Complex structures |
| Tables (20 tables) | 3,748 | Table processing |
| Lists (500 items) | 1,391 | Nested lists |
| Wikipedia (129KB) | 1,022 | Real-world content |
| Wikipedia (653KB) | 147 | Large documents |
Average: ~15,536 ops/sec across varied workloads.
- vs Native NAPI: ~1.17× slower (WASM has minimal overhead)
- vs Python: ~6.3× faster (no FFI overhead)
- Best for: Universal deployment (browsers, Deno, edge runtimes, cross-platform apps)
Numbers captured via the shared fixture harness in tools/benchmark-harness:
| Document | Size | ops/sec (WASM) |
| ---------------------- | ------ | -------------- |
| Lists (Timeline) | 129 KB | 882 |
| Tables (Countries) | 360 KB | 242 |
| Medium (Python) | 657 KB | 121 |
| Large (Rust) | 567 KB | 124 |
| Small (Intro) | 463 KB | 163 |
| hOCR German PDF | 44 KB | 1,637 |
| hOCR Invoice | 4 KB | 7,775 |
| hOCR Embedded Tables | 37 KB | 1,667 |
> Expect slightly higher numbers in long-lived browser/Deno workers once the WASM module is warm.
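To get comparable numbers for your own documents and hardware, a simple wall-clock loop is often enough. The sketch below is not the project's `tools/benchmark-harness`; `measureOpsPerSec` and its options are hypothetical names, and the converter is passed in as a plain function so the timer does not depend on any particular build target.

```typescript
// Hypothetical helper: measures throughput of any string -> string converter.
type ConvertFn = (html: string) => string;

interface BenchOptions {
  warmupIterations?: number; // untimed calls that absorb first-call costs
  durationMs?: number; // minimum wall-clock time for the timed loop
}

function measureOpsPerSec(convert: ConvertFn, html: string, options: BenchOptions = {}): number {
  const { warmupIterations = 50, durationMs = 1000 } = options;

  // Warm up so one-time costs (module instantiation, JIT) are not timed.
  for (let i = 0; i < warmupIterations; i++) convert(html);

  let iterations = 0;
  let elapsed = 0;
  const start = Date.now();
  // Run whole iterations until the time budget has been spent.
  do {
    convert(html);
    iterations++;
    elapsed = Date.now() - start;
  } while (elapsed < durationMs);

  // Guard against a zero elapsed time when durationMs is 0.
  return (iterations / Math.max(elapsed, 1)) * 1000;
}
```

For example, pass `(html) => convert(html)` with `convert` imported from `@kreuzberg/html-to-markdown-wasm`. The warm-up loop keeps first-call costs out of the result, in line with the warm-module note above.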
## Installation

```bash
npm install @kreuzberg/html-to-markdown-wasm
# or
yarn add @kreuzberg/html-to-markdown-wasm
# or
pnpm add @kreuzberg/html-to-markdown-wasm
```

Deno:

```typescript
// Via npm specifier
import { convert } from "npm:@kreuzberg/html-to-markdown-wasm";
```
## Quick Start

```javascript
import { convert } from '@kreuzberg/html-to-markdown-wasm';

const html = '<p>This is fast!</p>';
const markdown = convert(html);
console.log(markdown);
```

> Heads up for edge runtimes: Cloudflare Workers, Vite dev servers, and other environments that instantiate `.wasm` files asynchronously must call `await initWasm()` (or `await wasmReady`) once during startup before invoking `convert`. Traditional bundlers (Webpack, Rollup) and Deno/Node imports continue to work without manual initialization.

Working examples:

- Browser with Rollup: uses the `dist-web` target in the browser
- Node.js: uses the `dist-node` target
- Cloudflare Workers: uses the bundler target with Wrangler
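If several code paths may call `convert` before startup has finished, a memoised initialiser keeps the "initialize once" rule from the note above in one place. This is a minimal sketch assuming `initWasm()` returns a promise (as the `await initWasm()` call above implies); `onceAsync` is a hypothetical helper, not a package export.

```typescript
// Hypothetical helper: wraps an async init function so it runs at most once.
// Every caller awaits the same promise while initialization is in flight.
function onceAsync<T>(init: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = init().catch((err) => {
        // Forget the failed attempt so a later call can retry.
        pending = undefined;
        throw err;
      });
    }
    return pending;
  };
}
```

Usage under that assumption: `const ensureWasm = onceAsync(() => initWasm());`, then `await ensureWasm()` before calling `convert`. Resetting on failure is a design choice: a transient error during instantiation can be retried instead of failing every later request.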
### Reusing Options Handles

```ts
import {
  convertWithOptionsHandle,
  createConversionOptionsHandle,
} from '@kreuzberg/html-to-markdown-wasm';

// Create the options handle once and reuse it across conversions.
const handle = createConversionOptionsHandle({ hocrSpatialTables: false });
const markdown = convertWithOptionsHandle('<p>Reusable</p>', handle);
```

### Converting Bytes
When you already have raw bytes (e.g., `fs.readFileSync` output or Fetch API responses), skip the round trip through `TextDecoder` by calling the byte-friendly helpers:

```ts
import {
  convertBytes,
  convertBytesWithOptionsHandle,
  createConversionOptionsHandle,
  convertBytesWithInlineImages,
} from '@kreuzberg/html-to-markdown-wasm';
import { readFileSync } from 'node:fs';

const htmlBytes = readFileSync('input.html'); // Buffer is a Uint8Array subclass
const markdown = convertBytes(htmlBytes);

const handle = createConversionOptionsHandle({ headingStyle: 'atx' });
const markdownFromHandle = convertBytesWithOptionsHandle(htmlBytes, handle);

const inlineExtraction = convertBytesWithInlineImages(htmlBytes, null, {
  maxDecodedSizeBytes: 5 * 1024 * 1024,
});
```

### With Options
```typescript
import { convert } from '@kreuzberg/html-to-markdown-wasm';

const html = '<h1>Title</h1><p>Some text.</p>';
const markdown = convert(html, {
  headingStyle: 'atx',
  codeBlockStyle: 'backticks',
  listIndentWidth: 2,
  bullets: '-',
  wrap: true,
  wrapWidth: 80,
});
```

### Preserving Tags
```typescript
import { convert } from '@kreuzberg/html-to-markdown-wasm';

const html = `
<table>
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>Foo</td><td>Bar</td></tr>
</table>
`;

const markdown = convert(html, {
  preserveTags: ['table'], // Keep tables as HTML
});
```

### Deno
```typescript
import { convert } from "npm:@kreuzberg/html-to-markdown-wasm";

const html = await Deno.readTextFile("input.html");
const markdown = convert(html, { headingStyle: "atx" });
await Deno.writeTextFile("output.md", markdown);
```

> Performance tip: For Node.js/Bun, use `@kreuzberg/html-to-markdown-node` for 1.17× better performance with native bindings.
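The byte helpers also pair with the Fetch API in Deno, browsers, and Workers: reading a response with `arrayBuffer()` yields bytes that can go straight to `convertBytes` instead of `response.text()` followed by `convert`. The sketch below assumes only an object with an `arrayBuffer()` method (a `Response` or `Blob` qualifies); `readBodyBytes` and its `maxBytes` guard are illustrative, not part of the package.

```typescript
// Anything with an arrayBuffer() method, e.g. a Fetch API Response or a Blob.
interface ByteSource {
  arrayBuffer(): Promise<ArrayBuffer>;
}

// Hypothetical helper: returns the body as a Uint8Array for the byte helpers.
// The limit is checked after the body has been fully buffered.
async function readBodyBytes(source: ByteSource, maxBytes = 10 * 1024 * 1024): Promise<Uint8Array> {
  const buffer = await source.arrayBuffer();
  if (buffer.byteLength > maxBytes) {
    throw new Error(`body is ${buffer.byteLength} bytes, limit is ${maxBytes}`);
  }
  return new Uint8Array(buffer);
}
```

For example, `convertBytes(await readBodyBytes(await fetch(url)))`. Because the check runs after buffering, it caps what is handed to the converter, not what is downloaded.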
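### Browser (ESM)

```html
<title>HTML to Markdown</title>
<script type="module">
  import init, { convert } from 'https://unpkg.com/@kreuzberg/html-to-markdown-wasm/dist-web/html_to_markdown_wasm.js';
  await init(); // initialize the dist-web module before the first conversion
  console.log(convert('<h1>Hello</h1>'));
</script>
```

### Bundlers (Webpack, Rollup)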
```typescript
import { convert } from '@kreuzberg/html-to-markdown-wasm';

const markdown = convert('<h1>Hello</h1>', {
  headingStyle: 'atx',
  codeBlockStyle: 'backticks',
});
```

### Cloudflare Workers
```typescript
import { convert, initWasm, wasmReady } from '@kreuzberg/html-to-markdown-wasm';

// Cloudflare Workers and other edge runtimes instantiate WASM asynchronously.
// Kick off initialization once at module scope.
const ready = wasmReady ?? initWasm();

export default {
  async fetch(request: Request): Promise<Response> {
    await ready;
    const html = await request.text();
    const markdown = convert(html, { headingStyle: 'atx' });
    return new Response(markdown, {
      headers: { 'Content-Type': 'text/markdown' },
    });
  },
};
```

> See the full Cloudflare Workers example with Wrangler configuration.
## TypeScript

Full TypeScript support with type definitions:

```typescript
import {
  convert,
  convertWithInlineImages,
  WasmInlineImageConfig,
  type WasmConversionOptions,
} from '@kreuzberg/html-to-markdown-wasm';

const options: WasmConversionOptions = {
  headingStyle: 'atx',
  codeBlockStyle: 'backticks',
  listIndentWidth: 2,
  wrap: true,
  wrapWidth: 80,
};

const markdown = convert('<h1>Hello</h1>', options);
```

## Inline Images
Extract and decode inline images (data URIs, SVG):

```typescript
import { convertWithInlineImages, WasmInlineImageConfig } from '@kreuzberg/html-to-markdown-wasm';

const html = '<p><img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNkYPhfDwAChwGA60e6kgAAAABJRU5ErkJggg==" alt="Pixel"></p>';

const config = new WasmInlineImageConfig(5 * 1024 * 1024); // 5 MB max
config.inferDimensions = true;
config.filenamePrefix = 'img_';
config.captureSvg = true;

const result = convertWithInlineImages(html, null, config);
console.log(result.markdown);
console.log(`Extracted ${result.inlineImages.length} images`);

for (const img of result.inlineImages) {
  console.log(`${img.filename}: ${img.format}, ${img.data.length} bytes`);
  // img.data is a Uint8Array - save to file or upload
}
```

## Metadata Extraction
Extract document metadata (headers, links, images, structured data) alongside Markdown conversion:

```typescript
import { convertWithMetadata, WasmMetadataConfig } from '@kreuzberg/html-to-markdown-wasm';

const html = `
<html lang="en">
  <head><title>My Article</title></head>
  <body>
    <h1>Main Title</h1>
    <p>Content with <a href="https://example.com">a link</a></p>
    <img src="https://example.com/image.jpg" alt="Example image">
  </body>
</html>
`;

const config = new WasmMetadataConfig();
config.extractHeaders = true;
config.extractLinks = true;
config.extractImages = true;
config.extractStructuredData = true;
config.maxStructuredDataSize = 1_000_000; // 1MB limit

const result = convertWithMetadata(html, null, config);
console.log(result.markdown);

console.log('Document metadata:', result.metadata.document);
// {
//   title: 'My Article',
//   language: 'en',
//   ...
// }

console.log('Headers:', result.metadata.headers);
// [
//   { level: 1, text: 'Main Title', id: undefined, depth: 0, htmlOffset: ... }
// ]

console.log('Links:', result.metadata.links);
// [
//   {
//     href: 'https://example.com',
//     text: 'a link',
//     linkType: 'external',
//     rel: [],
//     ...
//   }
// ]

console.log('Images:', result.metadata.images);
// [
//   {
//     src: 'https://example.com/image.jpg',
//     alt: 'Example image',
//     imageType: 'external',
//     ...
//   }
// ]
```

### Metadata Configuration
The `WasmMetadataConfig` class controls what metadata is extracted:

```typescript
import { WasmMetadataConfig } from '@kreuzberg/html-to-markdown-wasm';

const config = new WasmMetadataConfig();

// Enable/disable extraction types
config.extractHeaders = true; // <h1>-<h6> elements
config.extractLinks = true; // <a> elements with link type classification
config.extractImages = true; // <img> and <svg> elements

// Limit structured data size to prevent memory exhaustion
config.maxStructuredDataSize = 1_000_000; // 1MB default
```

### Metadata Structure
The returned metadata object includes:
- document: Document-level metadata (title, description, keywords, language, OG tags, Twitter cards, etc.)
- headers: Array of header elements with level, text, id, and document position
- links: Array of links with href, text, type (anchor/internal/external/email/phone), and rel attributes
- images: Array of images with src, alt text, dimensions, and type classification (dataUri/external/relative/svg)
- structuredData: Array of JSON-LD, Microdata, and RDFa blocks
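Because the metadata comes back as plain arrays, post-processing is ordinary TypeScript. The sketch below counts links by type and builds an indented outline from the headers. The `MetadataLike` types and `summarizeMetadata` are hypothetical and mirror only the fields listed above; check the package's type definitions for the real shapes.

```typescript
// Hypothetical local types mirroring only the fields described above.
type LinkType = "anchor" | "internal" | "external" | "email" | "phone";

interface LinkMetadata {
  href: string;
  text: string;
  linkType: LinkType;
}

interface HeaderMetadata {
  level: number;
  text: string;
}

interface MetadataLike {
  headers: HeaderMetadata[];
  links: LinkMetadata[];
}

// Count links per type and build a plain-text outline from the headers.
function summarizeMetadata(metadata: MetadataLike): { linkCounts: Record<LinkType, number>; outline: string[] } {
  const linkCounts: Record<LinkType, number> = { anchor: 0, internal: 0, external: 0, email: 0, phone: 0 };
  for (const link of metadata.links) {
    linkCounts[link.linkType] += 1;
  }
  // Indent two spaces per level below h1.
  const outline = metadata.headers.map((h) => `${"  ".repeat(Math.max(h.level - 1, 0))}${h.text}`);
  return { linkCounts, outline };
}
```

With a real result, call `summarizeMetadata(result.metadata)` after `convertWithMetadata(...)`, assuming the shipped types are compatible with these local ones.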
### Metadata from Bytes

Convert bytes directly with metadata extraction:

```typescript
import { convertBytesWithMetadata, WasmMetadataConfig } from '@kreuzberg/html-to-markdown-wasm';
import { readFileSync } from 'node:fs';

const htmlBytes = readFileSync('article.html');
const config = new WasmMetadataConfig();
const result = convertBytesWithMetadata(htmlBytes, null, config);

console.log(result.markdown);
console.log(result.metadata);
```

## Build Targets
Three build targets are provided for different environments:
| Target  | Path                                         | Use Case                       |
| ------- | -------------------------------------------- | ------------------------------ |
| Bundler | `@kreuzberg/html-to-markdown-wasm`           | Webpack, Vite, Rollup, esbuild |
| Node.js | `@kreuzberg/html-to-markdown-wasm/dist-node` | Node.js, Bun (CommonJS/ESM)    |
| Web     | `@kreuzberg/html-to-markdown-wasm/dist-web`  | Direct browser ESM imports     |

## Runtime Compatibility

| Runtime               | Support                      | Package          |
| --------------------- | ---------------------------- | ---------------- |
| ✅ Node.js 18+        | Full support                 | `dist-node`      |
| ✅ Deno               | Full support                 | `npm:` specifier |
| ✅ Bun                | Full support (prefer native) | Default export   |
| ✅ Browsers           | Full support                 | `dist-web`       |
| ✅ Cloudflare Workers | Full support                 | Default export   |
| ✅ Deno Deploy        | Full support                 | `npm:` specifier |

## When to Use
Choose `@kreuzberg/html-to-markdown-wasm` when:

- 🌐 Running in browsers or edge runtimes
- 🦕 Using Deno
- ☁️ Deploying to Cloudflare Workers or Deno Deploy
- 📦 Building universal libraries
- 🔄 You need consistent behavior across all platforms
Use @kreuzberg/html-to-markdown-node for:
- ⚡ Maximum performance in Node.js/Bun (~3× faster)
- 🖥️ Server-side only applications
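Libraries that run in several environments sometimes pick the binding at runtime. The sketch below uses common global-object heuristics (`Bun`, `Deno`, `process.versions.node`) to choose between the two packages; `detectRuntime` and `preferredPackage` are hypothetical helpers, and the checks are heuristics rather than an API of either package.

```typescript
type RuntimeName = "bun" | "deno" | "node" | "other";

// Heuristic runtime detection from a globalThis-like object.
// Bun and Deno are checked first because both can also expose Node-style `process` globals.
function detectRuntime(
  g: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): RuntimeName {
  if (typeof g.Bun !== "undefined") return "bun";
  if (typeof g.Deno !== "undefined") return "deno";
  const proc = g.process as { versions?: { node?: string } } | undefined;
  if (proc?.versions?.node) return "node";
  return "other"; // browsers, Cloudflare Workers, other edge runtimes
}

// Node.js and Bun get the native binding; everything else uses WASM.
function preferredPackage(runtime: RuntimeName): string {
  return runtime === "node" || runtime === "bun"
    ? "@kreuzberg/html-to-markdown-node"
    : "@kreuzberg/html-to-markdown-wasm";
}
```

A caller could then `await import(preferredPackage(detectRuntime()))`. Bundlers cannot follow a computed specifier, so browser and Worker builds should import `@kreuzberg/html-to-markdown-wasm` statically, and edge runtimes still need the `initWasm()` step.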
## Visitor Pattern Support
The WebAssembly binding does not support the visitor pattern. The visitor pattern requires callbacks and stateful execution across the WebAssembly/JavaScript boundary, which has fundamental limitations:
### Why the Visitor Pattern Is Unsupported

1. Memory safety across the FFI boundary: the WASM/JS boundary cannot safely pass mutable function callbacks that maintain state across multiple invocations
2. Single-threaded execution model: WASM runs on a single thread with no equivalent to Node.js's `ThreadsafeFunction` FFI primitive
3. No callback marshaling: JavaScript callbacks cannot be directly invoked from within WASM without significant overhead and memory leaks
4. Serialization overhead: converting context objects between WASM and JS for each visitor callback would eliminate the performance benefits

### Alternatives
Choose one of these approaches:
#### 1. Use Node.js Binding (Recommended)
For best performance with visitor support, use the native Node.js binding:
```typescript
import { convertWithVisitor, type Visitor } from '@kreuzberg/html-to-markdown-node';

const html = '<a href="https://example.com">Example</a>';

const visitor: Visitor = {
  visitLink(ctx, href, text, title) {
    // Your visitor logic here
    return { type: 'continue' };
  },
};

const markdown = convertWithVisitor(html, { visitor });
```

Performance: ~3× faster than WASM, with full visitor pattern support.

Use when: running on Node.js or Bun server-side.
#### 2. Use Server-Side Bindings
For other platforms, use Python, Ruby, or PHP bindings with visitor support:
Python:

```python
from html_to_markdown import convert_with_visitor


class MyVisitor:
    def visit_link(self, ctx, href, text, title):
        # Your visitor logic here
        return {"type": "continue"}


html = '<a href="https://example.com">Example</a>'
markdown = convert_with_visitor(html, visitor=MyVisitor())
```

Ruby:
```ruby
require 'html_to_markdown'

class MyVisitor
  def visit_link(ctx, href, text, title)
    { type: :continue }
  end
end

html = '<a href="https://example.com">Example</a>'
markdown = HtmlToMarkdown.convert_with_visitor(html, visitor: MyVisitor.new)
```

PHP:
```php
<?php

use HtmlToMarkdown\Converter;

class MyVisitor {
    public function visitLink(array $ctx, string $href, string $text, ?string $title): array {
        return ['type' => 'continue'];
    }
}

$html = '<a href="https://example.com">Example</a>';
$markdown = Converter::convertWithVisitor($html, new MyVisitor());
```

#### 3. Preprocess HTML Before Conversion
For simple transformations, manipulate the HTML before passing to WASM:
```typescript
import { convert } from '@kreuzberg/html-to-markdown-wasm';

const html = '<a href="https://old-cdn.com/page">Page</a>';

// Rewrite URLs before conversion
const processedHtml = html.replace(/https:\/\/old-cdn\.com/g, 'https://new-cdn.com');
const markdown = convert(processedHtml);
```

Use when: only simple text replacements are needed.
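A plain `html.replace` also rewrites matching text in the visible content. When only link and image targets should change, the replacement can be limited to `href` and `src` attributes. This is a regex-based sketch assuming quoted attribute values; `rewriteAttributeUrls` is a hypothetical helper, and a real HTML parser (for example `DOMParser` in browsers) is more robust for messy markup.

```typescript
// Hypothetical helper: rewrites a URL prefix only inside quoted href/src attributes.
// Note: `\b` also matches the `src` in attributes such as `data-src`.
function rewriteAttributeUrls(html: string, fromPrefix: string, toPrefix: string): string {
  const attr = /\b(href|src)\s*=\s*(["'])(.*?)\2/gi;
  return html.replace(attr, (match: string, name: string, quote: string, value: string) =>
    value.startsWith(fromPrefix)
      ? `${name}=${quote}${toPrefix}${value.slice(fromPrefix.length)}${quote}`
      : match, // leave attributes with other URLs unchanged
  );
}
```

The result is then passed to `convert` exactly as in the example above; visible text that happens to mention the old host stays as written.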
#### 4. Post-Process Markdown
Transform the output Markdown after conversion:
```typescript
import { convert } from '@kreuzberg/html-to-markdown-wasm';

const html = '<a href="https://old-cdn.com/page">Page</a>';
const markdown = convert(html);

// Post-process the markdown: point links and images ((!?) keeps the image marker) at the new CDN
const transformed = markdown.replace(
  /(!?)\[(.+?)\]\(https:\/\/old-cdn\.com/g,
  '$1[$2](https://new-cdn.com'
);
```

Use when: transformations can be applied to the final Markdown output.
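For rewrites that depend on each individual link, a callback over every inline link gives some of the flexibility of a `visitLink` visitor in runtimes where only the WASM binding is available. `mapMarkdownLinks` is a hypothetical helper operating on the converter's output; it only understands inline `[text](url)` and `[text](url "title")` links, so reference-style links and URLs containing `)` are left untouched.

```typescript
// Callback receives the link target, its text, and whether it is an image.
type LinkRewriter = (url: string, text: string, isImage: boolean) => string;

// Hypothetical helper: applies `rewrite` to every inline Markdown link or image target.
function mapMarkdownLinks(markdown: string, rewrite: LinkRewriter): string {
  // Groups: optional "!", link text, URL (no spaces or ")"), optional quoted title.
  const inlineLink = /(!?)\[([^\]]*)\]\(([^)\s]+)((?:\s+"[^"]*")?)\)/g;
  return markdown.replace(
    inlineLink,
    (_match: string, bang: string, text: string, url: string, title: string) =>
      `${bang}[${text}](${rewrite(url, text, bang === "!")}${title})`,
  );
}
```

For example, `mapMarkdownLinks(convert(html), (url) => url.replace(/^https:\/\/old-cdn\.com/, 'https://new-cdn.com'))` performs the same CDN rewrite as above, but the callback could equally consult a lookup table or skip images.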
### Binding Comparison
| Binding | Visitor Support | Best For |
|---------|-----------------|----------|
| Rust | ✅ Yes | Core library, performance-critical code |
| Python | ✅ Yes (sync & async) | Server-side, bulk processing |
| TypeScript/Node.js | ✅ Yes (sync & async) | Server-side Node.js/Bun, best performance |
| Ruby | ✅ Yes | Server-side Ruby on Rails, Sinatra |
| PHP | ✅ Yes | Server-side PHP, content management |
| Go | ❌ No | Basic conversion only |
| Java | ❌ No | Basic conversion only |
| C# | ❌ No | Basic conversion only |
| Elixir | ❌ No | Basic conversion only |
| WebAssembly | ❌ No | Browser, Edge, Deno (see alternatives above) |
For comprehensive visitor pattern documentation with examples, see Visitor Pattern Guide.
## Configuration Options

See the TypeScript definitions for all available options:

- Heading styles (`atx`, `underlined`, `atxClosed`)
- Code block styles (`indented`, `backticks`, `tildes`)
- List formatting (indent width, bullet characters)
- Text escaping and formatting
- Tag preservation (`preserveTags`) and stripping (`stripTags`)
- Preprocessing for web scraping
- hOCR table extraction
- And more...

## Examples

### Preserving HTML Tags
Keep specific HTML tags in their original form:
```typescript
import { convert } from '@kreuzberg/html-to-markdown-wasm';

const html = `
<p>Before table</p>
<table>
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>Item 1</td><td>100</td></tr>
</table>
<p>After table</p>
`;

const markdown = convert(html, {
  preserveTags: ['table'],
});
// Result includes the table as HTML
```

Combine with `stripTags`:

```typescript
const markdown = convert(html, {
  preserveTags: ['table', 'form'], // Keep as HTML
  stripTags: ['script', 'style'], // Remove entirely
});
```

### Deno HTTP Server
```typescript
import { convert } from "npm:@kreuzberg/html-to-markdown-wasm";

// The handler must be async because it awaits the request body.
Deno.serve(async (req) => {
  const url = new URL(req.url);
  if (url.pathname === "/convert" && req.method === "POST") {
    const html = await req.text();
    const markdown = convert(html, { headingStyle: "atx" });
    return new Response(markdown, {
      headers: { "Content-Type": "text/markdown" },
    });
  }
  return new Response("Not found", { status: 404 });
});
```

### Web Scraping
```typescript
import { convert } from "npm:@kreuzberg/html-to-markdown-wasm";

const response = await fetch("https://example.com");
const html = await response.text();

const markdown = convert(html, {
  preprocessing: {
    enabled: true,
    preset: "aggressive",
    removeNavigation: true,
    removeForms: true,
  },
  headingStyle: "atx",
  codeBlockStyle: "backticks",
});

console.log(markdown);
```

## Other Runtimes
The same Rust engine ships as native bindings for other ecosystems:
- Node.js / Bun: `@kreuzberg/html-to-markdown-node`
- 🐍 Python: `html-to-markdown`
- 💎 Ruby: `html-to-markdown`
- 🐘 PHP: `kreuzberg-dev/html-to-markdown`
- 🦀 Rust crate & CLI: `html-to-markdown-rs`

## Links

- GitHub Repository
- Full Documentation
- Native Node Package
- Python Package
- PHP Extension & Helpers
- Rust Crate

## License

MIT