# DocFlow

> A developer-friendly transformation engine for programmatic document manipulation, built on SuperDoc

Transform documents programmatically with a clean, chainable API. Built on top of the SuperDoc Editor, DocFlow makes it easy to load, query, transform, and export documents in multiple formats.
## Features

- 🔄 **Format Conversion**: DOCX ↔ Markdown ↔ HTML ↔ JSON ↔ Plain Text
- 📋 **Table Support**: DOCX tables export cleanly to Markdown with formatting preserved
- 🔍 **Powerful Queries**: CSS-like selectors and predicate functions
- ⚡ **Batch Processing**: Process multiple documents concurrently
- 🎯 **Type-Safe**: Full TypeScript support
- 🔗 **Chainable API**: Fluent, readable transformations
- 🚫 **Error Handling**: Multiple error-handling modes
- 📦 **Headless**: Runs in Node.js without a browser
## Installation

```bash
npm install @paulmeller/docflow
```
## Quick Start

```javascript
import DocFlow from '@paulmeller/docflow';

// Simple transformation
await new DocFlow()
  .load('document.docx')
  .transform('heading[level=2]', node => ({
    ...node,
    text: node.text.toUpperCase()
  }))
  .save('output.docx');
```
## load() vs loadContent()

**DO NOT confuse these two methods:**

| Method | Purpose | First Parameter | When to Use |
|--------|---------|-----------------|-------------|
| `load()` | Load from file path or buffer | File path string or Buffer | Reading files from disk |
| `loadContent()` | Load from content string | Content string (markdown/HTML/JSON) | When you have content in memory |
```javascript
// ❌ WRONG - This treats the JSON string as a file path!
const jsonString = JSON.stringify(docJson);
await doc.load(jsonString, { format: 'json' });
// Error: ENAMETOOLONG or "file not found"

// ❌ WRONG - This treats the markdown as a file path!
const markdown = '# Hello\n\nWorld';
await doc.load(markdown, { format: 'markdown' });
```
```javascript
// ✅ CORRECT - Use load() for file paths
await doc.load('document.docx');
await doc.load('data.json', { format: 'json' });

// ✅ CORRECT - Use loadContent() for content strings
const jsonString = JSON.stringify(docJson);
await doc.loadContent(jsonString);

const markdown = '# Hello\n\nWorld';
await doc.loadContent(markdown);
```
## Loading Documents

Load from a file, buffer, or content string:
```javascript
const doc = new DocFlow();

// From file path
await doc.load('document.docx');

// From buffer (binary data)
const buffer = fs.readFileSync('document.docx');
await doc.load(buffer);

// From JSON buffer with explicit format
const jsonBuffer = Buffer.from(jsonString, 'utf-8');
await doc.load(jsonBuffer, { format: 'json' });

// From content string (auto-detects markdown, HTML, JSON, or text)
await doc.loadContent('<p>Content</p>');
```

## Query
Find nodes using selectors:
```javascript
// CSS-like selectors
const headings = doc.query('heading[level=2]');
const paragraphs = doc.query('paragraph');

// Predicate functions
const longParagraphs = doc.query(node =>
  node.type === 'paragraph' &&
  node.content?.length > 100
);

// Work with results
console.log(headings.count);   // Number of matches
console.log(headings.text());  // Concatenated text
console.log(headings.first()); // First match
console.log(headings.last());  // Last match
```

## Transform
Modify document structure:
```javascript
import { Transforms } from '@paulmeller/docflow';

// Transform matching nodes
await doc.transform('heading[level=2]', node => ({
  ...node,
  text: Transforms.toTitleCase(node.text)
}));

// Transform with built-in helpers
await doc.transform('text', Transforms.replace(/foo/g, 'bar'));
await doc.transform('text', Transforms.addPrefix('> '));

// Transform the entire document
await doc.transformDocument(json => {
  // Modify the whole document structure
  return modifiedJSON;
});
```

## Export
Export to various formats:
```javascript
// Export to buffer/string
const docx = await doc.export('docx');
const html = await doc.export('html');
const markdown = await doc.export('markdown');
const json = await doc.export('json');
const text = await doc.export('text');

// Save directly to file
await doc.save('output.docx');
await doc.save('output.md', 'markdown');
await doc.save('output.html', 'html');
await doc.save('output.json', 'json');
await doc.save('output.txt', 'text');
```

## Examples

### Format Conversion
```javascript
import DocFlow from '@paulmeller/docflow';

// DOCX to Markdown
await new DocFlow()
  .load('document.docx')
  .save('document.md', 'markdown');

// Markdown to DOCX
await new DocFlow()
  .load('document.md')
  .save('document.docx');

// DOCX with tables to Markdown (tables preserved with formatting)
await new DocFlow()
  .load('report-with-tables.docx')
  .save('report.md', 'markdown');

// Extract plain text for analysis
const plainText = await new DocFlow()
  .load('document.docx')
  .export('text');
console.log(plainText); // All text without formatting
```

### JSON Workflows
```javascript
import fs from 'node:fs/promises';
import DocFlow from '@paulmeller/docflow';

// Export DOCX to JSON (for storage or transmission)
const doc1 = new DocFlow();
await doc1.load('report.docx');
const jsonData = await doc1.export('json');
await fs.writeFile('report.json', JSON.stringify(jsonData, null, 2));

// Import JSON and convert to Markdown
const doc2 = new DocFlow();
await doc2.load('report.json'); // Auto-detects the .json extension
const markdown = await doc2.export('markdown');

// Load JSON from a buffer (e.g., from an API or virtual filesystem)
const jsonString = JSON.stringify(jsonData);
const buffer = Buffer.from(jsonString, 'utf-8');
const doc3 = new DocFlow();
await doc3.load(buffer, { format: 'json' }); // Explicit format for buffers
const html = await doc3.export('html');

// Load JSON from a content string (when you have it in memory)
const doc4 = new DocFlow();
await doc4.loadContent(jsonString); // Auto-detects JSON
const docx = await doc4.export('docx');
```

### Standardize Headings
```javascript
import { DocFlow, Transforms } from '@paulmeller/docflow';

const doc = new DocFlow();
await doc
  .load('document.docx')
  .transform('heading[level=1]', node => ({
    ...node,
    text: Transforms.toTitleCase(node.text)
  }))
  .transform('heading[level=2]', node => ({
    ...node,
    text: Transforms.toSentenceCase(node.text)
  }))
  .save('standardized.docx');
```

### Fill a Template
```javascript
const doc = new DocFlow();
await doc
  .load('template.docx')
  .transform('text', Transforms.replace(/\{name\}/g, 'John Doe'))
  .transform('text', Transforms.replace(/\{date\}/g, new Date().toLocaleDateString()))
  .transform('text', Transforms.replace(/\{company\}/g, 'Acme Corp'))
  .save('filled-document.docx');
```

### Analyze Content
```javascript
const doc = new DocFlow();
await doc.load('document.docx');

// Find all headings
const headings = doc.query('heading');
console.log(`Document has ${headings.count} headings`);

// Find specific content
const importantSections = doc.query(node =>
  node.type === 'heading' &&
  node.text?.toLowerCase().includes('important')
);
console.log('Important sections:');
importantSections.map(h => console.log(`- ${h.text}`));
```

### Batch Processing
```javascript
import { BatchProcessor } from '@paulmeller/docflow';

const batch = new BatchProcessor({
  concurrency: 5
});

const results = await batch.process(
  ['doc1.docx', 'doc2.docx', 'doc3.docx'],
  async (doc, file) => {
    await doc
      .transform('heading[level=1]', node => ({
        ...node,
        attrs: { ...node.attrs, level: 2 }
      }))
      .save(file.replace('.docx', '-updated.docx'));
    return { processed: true };
  }
);

console.log(`✅ Processed ${results.successful} files`);
console.log(`❌ Failed ${results.failed} files`);
```

### Multi-Step Pipelines
```javascript
const doc = new DocFlow();

await doc
  .load('input.docx')
  // Step 1: Standardize heading levels
  .transform('heading[level=1]', node => ({
    ...node,
    text: node.text.toUpperCase()
  }))
  // Step 2: Remove empty paragraphs
  .transformDocument(json => {
    json.content = json.content.filter(node =>
      node.type !== 'paragraph' || node.content?.length > 0
    );
    return json;
  })
  // Step 3: Add metadata
  .transformDocument(json => ({
    ...json,
    attrs: {
      ...json.attrs,
      processed: true,
      timestamp: Date.now()
    }
  }))
  .save('processed.docx');
```

### Document Structure Analysis
```javascript
const doc = new DocFlow();
await doc.load('document.docx');
const json = doc.toJSON();

// Analyze structure
const stats = {
  headings: doc.query('heading').count,
  paragraphs: doc.query('paragraph').count,
  h1: doc.query('heading[level=1]').count,
  h2: doc.query('heading[level=2]').count,
  h3: doc.query('heading[level=3]').count
};
console.log('Document Structure:');
console.log(JSON.stringify(stats, null, 2));

// Table of contents
const toc = doc.query('heading')
  .map(h => `${' '.repeat(h.attrs.level - 1)}- ${h.text}`)
  .join('\n');

console.log('\nTable of Contents:');
console.log(toc);
```

### Error Handling
```javascript
// Mode 1: Throw on error (default)
try {
  await new DocFlow()
    .load('missing.docx')
    .save('output.docx');
} catch (error) {
  console.error('Failed:', error.message);
}

// Mode 2: Collect errors
const doc = new DocFlow({ errorMode: 'collect' });
await doc
  .load('input.docx')
  .transform('invalid-selector', node => node)
  .save('output.docx');

const errors = doc.getErrors();
if (errors.length > 0) {
  console.error('Errors occurred:', errors);
}

// Mode 3: Silent (ignore errors)
const doc2 = new DocFlow({ errorMode: 'silent' });
await doc2.load('maybe-exists.docx').save('output.docx');
```

### Custom Transformations
```javascript
// Define a custom transformation
function addTimestamp(node) {
  if (node.type === 'paragraph') {
    return {
      ...node,
      attrs: {
        ...node.attrs,
        timestamp: new Date().toISOString()
      }
    };
  }
  return node;
}

// Apply it
await doc
  .load('document.docx')
  .transform('paragraph', addTimestamp)
  .save('timestamped.docx');
```

### Conditional Processing
```javascript
const doc = new DocFlow();
await doc.load('document.docx');

// Only process if conditions are met
const validation = doc.validate();
if (validation.valid) {
  const headings = doc.query('heading');
  if (headings.count > 0) {
    await doc
      .transform('heading', node => ({
        ...node,
        text: `📌 ${node.text}`
      }))
      .save('processed.docx');
  }
}
```

## Format Auto-Detection
DocFlow automatically detects formats to make your code simpler and more intuitive.
### load()

- Auto-detects format from the file extension: `.docx`, `.md`, `.html`, `.json`, `.txt`
- Use the `format` option to override: `await doc.load('file.txt', { format: 'markdown' })`
- For Buffers, the format defaults to `'docx'` unless you specify otherwise

```javascript
// Auto-detected from extension
await doc.load('document.docx'); // → format: 'docx'
await doc.load('data.json');     // → format: 'json'
await doc.load('readme.md');     // → format: 'markdown'

// Explicit format for buffers
const buffer = Buffer.from(jsonString, 'utf-8');
await doc.load(buffer, { format: 'json' });
```

### loadContent()
- Auto-detects JSON: looks for the `{"type": "doc", ...}` pattern
- Auto-detects HTML: looks for tags such as `<p>`, `<div>`, `<h1>`, etc.
- Auto-detects Markdown: everything else is treated as markdown
- No `format` parameter needed - detection is automatic
```javascript
// All auto-detected
await doc.loadContent('# Heading\n\nText');    // → markdown
await doc.loadContent('<h1>Title</h1>');       // → HTML
await doc.loadContent('{"type": "doc", ...}'); // → JSON
```

## API Reference

### DocFlow
#### Constructor
```typescript
new DocFlow(options?: {
  headless?: boolean;        // Default: true
  validateSchema?: boolean;  // Default: true
  errorMode?: 'throw' | 'collect' | 'silent'; // Default: 'throw'
})
```
#### Methods
##### load(source, options?)
Load a document from file or buffer.
```typescript
load(source: string | Buffer, options?: {
  format?: 'docx' | 'html' | 'markdown' | 'json' | 'text'
}): Promise<DocFlow>
```
##### loadContent(content)
Load content directly. Automatically detects if content is markdown, HTML, or plain text. Auto-initializes a blank document if none exists.
```typescript
loadContent(content: string): Promise<DocFlow>
```
##### toJSON()
Get document as ProseMirror JSON.
```typescript
toJSON(): ProseMirrorJSON
```
##### query(selector)
Query document structure.
```typescript
query(selector: string | Function): QueryResult
```

Selectors:
- `"heading"` - All heading nodes
- `"heading[level=2]"` - H2 headings
- `"paragraph"` - All paragraphs
- `node => condition` - Custom predicate
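The string selector grammar is small; a hypothetical matcher for the `type[attr=value]` shape might look like this (illustrative only, not DocFlow's parser):

```javascript
// Illustrative matcher for the "type[attr=value]" selector shape; not DocFlow's parser.
function matchesSelector(node, selector) {
  const m = /^(\w+)(?:\[(\w+)=(\w+)\])?$/.exec(selector);
  if (!m) return false;
  const [, type, attr, value] = m;
  // The bare name must match the node type
  if (node.type !== type) return false;
  // Optional attribute filter, e.g. [level=2]; compare as strings
  if (attr && String(node.attrs?.[attr]) !== value) return false;
  return true;
}
```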
##### transform(selector, transformer)
Transform matching nodes.
```typescript
transform(
  selector: string | Function,
  transformer: (node: Node) => Node | Promise<Node>
): Promise<DocFlow>
```
##### transformDocument(transformer)
Transform entire document.
```typescript
transformDocument(
  transformer: (json: ProseMirrorJSON) => ProseMirrorJSON | Promise<ProseMirrorJSON>
): Promise<DocFlow>
```
##### export(format?, options?)
Export to format.
```typescript
export(
  format?: 'docx' | 'html' | 'markdown' | 'json' | 'text',
  options?: ExportOptions
): Promise<Buffer | string | object>
```
##### save(filepath, format?)
Save to file.
```typescript
save(filepath: string, format?: string): Promise<void>
```
##### validate()
Validate document.
```typescript
validate(): {
  valid: boolean;
  errors: string[];
  warnings: string[];
  document: ProseMirrorJSON;
}
```
##### getHistory()
Get operation history.
```typescript
getHistory(): Operation[]
```
##### getErrors()
Get collected errors (when errorMode: 'collect').
```typescript
getErrors(): ErrorRecord[]
```
### QueryResult

#### Properties

- `count`: Number of matching nodes
- `nodes`: Array of found nodes
- `selector`: Selector used
#### Methods
##### first()
Get first match.
```typescript
first(): Node | null
```
##### last()
Get last match.
```typescript
last(): Node | null
```
##### map(fn)
Map over results.
```typescript
map<T>(fn: (node, index) => T): T[]
```
##### filter(fn)
Filter results.
```typescript
filter(fn: (node, index) => boolean): QueryResult
```
##### text()
Get concatenated text content.
```typescript
text(): string
```
##### transform(transformer)
Transform all found nodes.
```typescript
transform(transformer: Function): Promise<void>
```
### BatchProcessor

Process multiple documents concurrently.
```typescript
const batch = new BatchProcessor({
  concurrency: 5,
  errorMode: 'collect'
});

const results = await batch.process(
  ['file1.docx', 'file2.docx'],
  async (doc, file) => {
    // Process each document
    await doc.transform(...).save(...);
    return { success: true };
  }
);
```
### Transforms

Built-in transformation helpers.
```typescript
import { Transforms } from '@paulmeller/docflow';

Transforms.toTitleCase(text)      // "hello world" → "Hello World"
Transforms.toSentenceCase(text)   // "HELLO WORLD" → "Hello world"
Transforms.replace(pattern, repl) // Replace matching text
Transforms.addPrefix(prefix)      // Add a prefix to text
Transforms.remove(condition)      // Remove matching nodes
```
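For intuition, the two case helpers behave roughly like the plain functions below. This is a plausible sketch of the documented behavior, not DocFlow's source:

```javascript
// Plausible implementations of the case helpers above; not DocFlow's source.
function toTitleCase(text) {
  // Capitalize the first letter of each word, lowercase the rest
  return text.replace(/\w\S*/g, w => w.charAt(0).toUpperCase() + w.slice(1).toLowerCase());
}

function toSentenceCase(text) {
  // Lowercase everything, then capitalize only the first character
  const lower = text.toLowerCase();
  return lower.charAt(0).toUpperCase() + lower.slice(1);
}

console.log(toTitleCase('hello world'));    // "Hello World"
console.log(toSentenceCase('HELLO WORLD')); // "Hello world"
```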
## Document Model

Documents are represented as ProseMirror JSON:

```json
{
  "type": "doc",
  "content": [
    {
      "type": "heading",
      "attrs": { "level": 1 },
      "content": [
        { "type": "text", "text": "Title" }
      ]
    },
    {
      "type": "paragraph",
      "content": [
        { "type": "text", "text": "Content" }
      ]
    }
  ]
}
```
Node types:

- `doc` - Root document
- `heading` - Heading (attrs: `level`)
- `paragraph` - Paragraph
- `text` - Text content
- `blockquote` - Block quote
- `codeBlock` - Code block
- `bulletList` - Bullet list
- `orderedList` - Numbered list
- `listItem` - List item
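Because the model is plain JSON, it can be traversed with ordinary recursion, no DocFlow APIs required. A small self-contained sketch that collects all text from a document shaped like the one above:

```javascript
// Recursively collect all text nodes from a ProseMirror-style JSON document.
function collectText(node, out = []) {
  if (node.type === 'text' && node.text) out.push(node.text);
  for (const child of node.content || []) collectText(child, out);
  return out;
}

const doc = {
  type: 'doc',
  content: [
    { type: 'heading', attrs: { level: 1 }, content: [{ type: 'text', text: 'Title' }] },
    { type: 'paragraph', content: [{ type: 'text', text: 'Content' }] }
  ]
};

console.log(collectText(doc).join(' ')); // "Title Content"
```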
## Best Practices

### Chain Operations

```javascript
// ✅ Good - Chainable, readable
await doc
  .load('input.docx')
  .transform('heading', ...)
  .transform('paragraph', ...)
  .save('output.docx');

// ❌ Avoid - Verbose
await doc.load('input.docx');
await doc.transform('heading', ...);
await doc.transform('paragraph', ...);
await doc.save('output.docx');
```
### Use Specific Selectors

```javascript
// ✅ Good - Specific
doc.query('heading[level=2]')

// ❌ Avoid - Too broad
doc.query('heading').filter(h => h.attrs.level === 2)
```
### Handle Errors

```javascript
// ✅ Good - Handle errors
try {
  await doc.load('file.docx');
} catch (error) {
  console.error('Failed to load:', error);
}

// Or use collect mode
const doc = new DocFlow({ errorMode: 'collect' });
await doc.load('file.docx');
if (doc.getErrors().length > 0) {
  // Handle errors
}
```
### Validate After Transforming

```javascript
// ✅ Good - Validate after transformation
await doc.transform(...);
const validation = doc.validate();
if (!validation.valid) {
  console.error('Validation failed:', validation.errors);
}
```
### Batch Multiple Files

```javascript
// ✅ Good - Concurrent processing
const batch = new BatchProcessor({ concurrency: 5 });
await batch.process(files, pipeline);

// ❌ Avoid - Sequential processing
for (const file of files) {
  await new DocFlow().load(file).transform(...).save(...);
}
```
## TypeScript

Full TypeScript support included:

```typescript
import DocFlow, { QueryResult, Transforms } from '@paulmeller/docflow';

const doc = new DocFlow({
  errorMode: 'collect'
});

await doc.load('document.docx');
const headings: QueryResult = doc.query('heading');
const count: number = headings.count;
```
## Known Limitations

Due to limitations in the underlying SuperDoc library's DOCX conversion engine:

- **Lists**: DOCX export/import loses list items beyond the first item (a confirmed SuperDoc limitation, even with the proper command API)
- **Tables**: ✅ Work reliably in DOCX round-trips
- **Recommendation**: For list transformations, use Markdown/HTML conversions instead of DOCX round-trips

What works well:

- ✅ Markdown ↔ HTML ↔ Markdown: Full fidelity for lists, tables, links, formatting
- ✅ HTML → Markdown: Complete preservation of all content
- ✅ DOCX → Markdown/HTML: One-way conversion works well for extracting content
- ✅ JSON export: Ideal for analyzing document structure
```javascript
// ✅ GOOD: Use HTML/Markdown for list transformations
await doc.createBlank();
await doc.loadContent('- Item 1\n- Item 2\n- Item 3');
const html = await doc.export('html');   // Preserves all items
const md = await doc.export('markdown'); // Preserves all items

// ⚠️ LIMITED: DOCX round-trips may lose list structure
await doc.load('document.docx');
const docx = await doc.export('docx');
await doc.load(docx); // May lose list items 2+
```
For applications requiring full DOCX round-trip fidelity with complex lists and tables, consider using Microsoft Word's native APIs or alternative DOCX libraries.
Upstream Issue Tracker: These limitations originate from the SuperDoc library. You can track progress at SuperDoc GitHub Issues.
## Performance Tips

1. Use batch processing for multiple files
2. Reuse DocFlow instances when possible
3. Use specific selectors to minimize traversal
4. Validate only when necessary (disable with `validateSchema: false`)
5. Handle large documents with streaming (future feature)
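As a rough illustration of tip 1, concurrency-limited processing can be sketched in plain Node.js. This is a hypothetical worker-pool pattern, not BatchProcessor's actual internals:

```javascript
// Minimal worker-pool style concurrency limiter; illustrative only.
async function processWithLimit(items, limit, worker) {
  const results = [];
  let next = 0;
  // Each "lane" repeatedly claims the next unprocessed index
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, lane));
  return results;
}
```

With `limit` lanes running, at most `limit` `worker` calls are in flight at once, while results stay in input order.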
## Troubleshooting

### "ENAMETOOLONG" or "file not found" when loading JSON

Error message:

```
Error: ENAMETOOLONG: name too long, open '{"type": "doc", "content": [...]}'
```

**Cause**: You're passing content to `load()` instead of a file path.

**Fix**: Use `loadContent()` for content strings:
```javascript
// ❌ WRONG - Treats the JSON string as a file path
const json = await doc.export('json');
await doc.load(JSON.stringify(json), { format: 'json' });

// ✅ CORRECT - Use loadContent() for content
const json = await doc.export('json');
await doc.loadContent(JSON.stringify(json));

// OR use load() with a buffer
const buffer = Buffer.from(JSON.stringify(json), 'utf-8');
await doc.load(buffer, { format: 'json' });
```
### Document won't load

```javascript
// Check that the file exists
if (fs.existsSync('document.docx')) {
  await doc.load('document.docx');
}

// Check the format
const format = path.extname('document.docx');
await doc.load('document.docx', { format: 'docx' });
```
### Transformations aren't applied

```javascript
// Verify the selector matches
const matches = doc.query('heading[level=2]');
console.log(`Found ${matches.count} matches`);

// Check the transformation logic
await doc.transform('heading', node => {
  console.log('Transforming:', node);
  return { ...node, text: node.text.toUpperCase() };
});
```
### Export produces the wrong format

```javascript
// Explicitly specify the format
await doc.save('output.docx', 'docx');

// Check supported formats
const formats = ['docx', 'html', 'markdown', 'json', 'text'];
```
## License

MIT

## Contributing

Contributions welcome! Please read CONTRIBUTING.md for guidelines.

## Credits

Built on SuperDoc by Harbour Enterprises.

## Support

- 📖 Documentation
- 💬 Discussions
- 🐛 Issues