Transmutation Lite

> Simplified TypeScript document converter for common formats to Markdown

![Tests](https://github.com/hivellm/transmutation-lite/actions/workflows/test.yml)
![Lint](https://github.com/hivellm/transmutation-lite/actions/workflows/lint.yml)
![Build](https://github.com/hivellm/transmutation-lite/actions/workflows/build.yml)
![License: MIT](https://opensource.org/licenses/MIT)

Version: 0.6.1
Status: ✅ Production Ready - 177/177 Tests Passing (100%)

Overview

Transmutation Lite is a lightweight, TypeScript-based document conversion library that converts common document formats (PDF, DOCX, XLSX, PPTX, HTML, TXT) to Markdown. Designed specifically for integration with the HiveLLM Classify project, it provides fast, reliable conversions without the complexity of OCR, audio, video, or archive processing.

> ⚠️ Important: This is a simplified version optimized for use cases that do not require high precision, such as document classification. For production applications requiring high-quality conversion, advanced features, and superior performance, we strongly recommend using the full Transmutation Rust library, which offers:
> - 98x faster performance than Docling
> - 80%+ conversion quality with precision mode
> - Advanced OCR, audio, and video processing
> - Enterprise-grade reliability and accuracy
>
> Use Transmutation Lite for: Document classification, quick previews, prototyping
> Use Transmutation (Rust) for: RAG systems, production pipelines, high-quality document processing

Features

$3

- ✅ PDF to Markdown - Using pdf-parse-new for reliable text extraction
- ✅ DOCX to Markdown - Using mammoth for Word documents
- ✅ XLSX to Markdown - Using xlsx for Excel spreadsheets (tables)
- ✅ PPTX to Markdown - Basic text extraction from PowerPoint
- ✅ HTML to Markdown - Using turndown for clean conversion
- ✅ TXT to Markdown - Plain text normalization
- ✅ CLI & Library - Use as command-line tool or Node.js library
- ✅ TypeScript - Full type safety and IntelliSense support

$3

- ✅ Result Caching - LRU cache with SHA-256 content hashing
- ✅ Input Validation - Comprehensive validation with security checks
- ✅ Error Handling - Clear, actionable error messages
- ✅ Logging System - Configurable logging (DEBUG, INFO, WARN, ERROR)
- ✅ Metrics Collection - Production monitoring and analytics
- ✅ Batch Processing - Parallel file conversion
- ✅ Fast & Lightweight - No heavy dependencies or external tools

$3

- ✅ 177 Tests Passing - 100% success rate, full coverage
- ✅ CI/CD Ready - GitHub Actions workflows
- ✅ 5 Comprehensive Examples - Complete usage demonstrations
- ✅ Security - Path traversal protection, buffer limits (500MB)
- ✅ Complete Documentation - API docs, guides, examples

Quick Start

$3

npm installation (recommended):
``bash npm install @hivellm/transmutation-lite`

Local usage (within monorepo):`bash

`From classify project`


npm install
Uses: "file:../transmutation-lite" dependency

$3

`typescript import { Converter, Logger, LogLevel } from '@hivellm/transmutation-lite';

// Simple conversion const converter = new Converter(); const result = await converter.convertFile('./document.pdf'); console.log(result.markdown);

// With caching and logging const converter = new Converter({ enableCache: true, cacheSize: 100, logger: new Logger({ level: LogLevel.INFO }), collectMetrics: true, });

const result = await converter.convertFile('./document.docx', { preserveFormatting: true, maxPages: 10, });

console.log('Format:', result.metadata.format); console.log('Pages:', result.metadata.pageCount); console.log('Time:', result.conversionTimeMs, 'ms');

// Get metrics const metrics = converter.getMetricsSummary(); console.log('Success Rate:', metrics.successRate); console.log('Cache Hit Rate:', metrics.cacheHitRate);`

`$3`

`bash

`Convert single file`


npx transmutation-lite convert document.pdf -o output.md
Convert with options

npx transmutation-lite convert report.docx -o report.md --max-pages 5
Batch conversion

npx transmutation-lite batch ./documents -o ./output --parallel 4
Recursive batch conversion

npx transmutation-lite batch ./docs -o ./markdown --recursive
List supported formats

npx transmutation-lite formats


API Reference
$3
Convenience function to convert a single file.

Parameters: -filePath(string): Path to the file to convert -options (ConversionOptions): Optional conversion options

Returns: Promise

`typescript const result = await convert('./document.pdf', { preserveFormatting: true, maxPages: 10, });`

`$3`

Main converter class that manages all format converters.

#### Methods

##### convertFile(filePath, options?)

Convert a file to Markdown.

`typescript const converter = new Converter(); const result = await converter.convertFile('./document.docx');`

##### convertBuffer(buffer, format, options?)

Convert a buffer directly.

`typescript const buffer = await readFile('./document.pdf'); const result = await converter.convertBuffer(buffer, DocumentFormat.PDF);`

##### detectFormat(filePath)

Detect document format from file extension.

`typescript const format = converter.detectFormat('./document.pdf'); // Returns: DocumentFormat.PDF`

##### isSupported(filePath)

Check if a file format is supported.

`typescript const supported = converter.isSupported('./document.pdf'); // Returns: true`

##### getSupportedFormats()

Get list of supported formats.

`typescript const formats = converter.getSupportedFormats(); // Returns: [DocumentFormat.PDF, DocumentFormat.DOCX, ...]`

`$3`

#### ConversionOptions

`typescript interface ConversionOptions { preserveFormatting?: boolean; // Default: true extractImages?: boolean; // Default: false (not implemented) maxPages?: number; // Default: 0 (all pages) formatOptions?: Record; }`

#### ConversionResult

`typescript interface ConversionResult { markdown: string; // Converted markdown content metadata: DocumentMetadata; // Document metadata conversionTimeMs: number; // Conversion time in milliseconds warnings?: string[]; // Any warnings during conversion }`

#### DocumentMetadata

`typescript interface DocumentMetadata { format: DocumentFormat; // Original format fileSize: number; // File size in bytes pageCount?: number; // Number of pages/sheets/slides title?: string; // Document title author?: string; // Document author createdAt?: Date; // Creation date extra?: Record; // Format-specific metadata }`

#### DocumentFormat Enum

`typescript enum DocumentFormat { PDF = 'pdf', DOCX = 'docx', XLSX = 'xlsx', PPTX = 'pptx', TXT = 'txt', HTML = 'html', UNKNOWN = 'unknown', }`

`Supported Formats`

| Format | Extension | Library | Quality | Notes | |--------|-----------|---------|---------|-------| | PDF |.pdf| pdf-parse | Basic | Text extraction only, no images, limited formatting | | DOCX |.docx| mammoth | Good | Formatting support, quality depends on mammoth | | XLSX |.xlsx, .xls| xlsx | Good | Converts to Markdown tables | | PPTX |.pptx, .ppt| jszip (basic) | ⚠️ Limited | Basic text extraction only, experimental | | HTML |.html, .htm| turndown | Good | Clean Markdown conversion | | TXT |.txt, .md | native | Good | Direct text handling |

`Integration with Classify`

Transmutation Lite is designed to integrate seamlessly with the HiveLLM Classify project:

`typescript import { convert } from '@hivellm/transmutation-lite'; import { ClassifyClient } from '@hivellm/classify';

// Convert document to Markdown const conversionResult = await convert('./contract.pdf');

// Classify the markdown content const classifier = new ClassifyClient({ provider: 'deepseek', apiKey: process.env.DEEPSEEK_API_KEY, });

const classificationResult = await classifier.classifyText( conversionResult.markdown );

console.log('Domain:', classificationResult.classification.domain); console.log('Type:', classificationResult.classification.doc_type);`

`CLI Commands`

`$3`

Convert a single file to Markdown.

Options: --o, --output - Output file path (default: .md) --m, --max-pages - Maximum pages/sheets to process ---no-preserve-formatting - Disable formatting preservation

Examples:

`bash

`Basic conversion`


transmutation-lite convert document.pdf
Specify output

transmutation-lite convert document.pdf -o output/document.md
Limit pages

transmutation-lite convert large.pdf --max-pages 10
Disable formatting

transmutation-lite convert document.docx --no-preserve-formatting


$3
Convert all supported files in a directory.

Options: --o, --output - Output directory (default: /output) --r, --recursive- Process subdirectories recursively --m, --max-pages - Maximum pages/sheets to process ---parallel - Number of parallel conversions (default: 4) ---no-preserve-formatting - Disable formatting preservation

Examples:

`bash

`Basic batch conversion`


transmutation-lite batch ./documents
Recursive with custom output

transmutation-lite batch ./docs -o ./markdown --recursive
High parallelism

transmutation-lite batch ./files --parallel 8
Limit pages for all files

transmutation-lite batch ./pdfs --max-pages 5


$3
List all supported file formats.

`bash transmutation-lite formats`

`Examples`

`$3`

`typescript import { Converter } from '@hivellm/transmutation-lite';

const converter = new Converter(); const result = await converter.convertFile('./research-paper.pdf');

console.log('Title:', result.metadata.title); console.log('Author:', result.metadata.author); console.log('Pages:', result.metadata.pageCount); console.log('Created:', result.metadata.createdAt); console.log('Producer:', result.metadata.extra?.producer);`

`$3`

`typescript import { Converter } from '@hivellm/transmutation-lite'; import { readdir } from 'fs/promises'; import { join } from 'path';

const converter = new Converter(); const files = await readdir('./documents');

let completed = 0; for (const file of files) { const filePath = join('./documents', file); if (converter.isSupported(filePath)) { try { const result = await converter.convertFile(filePath); await writeFile(./output/${file}.md, result.markdown ); completed++; console.log(✅ [${completed}/${files.length}] ${file}); } catch (error) { console.error(❌ Failed: ${file}, error); } } }`

`$3`

`typescript import { convert } from '@hivellm/transmutation-lite';

const result = await convert('./data.xlsx');

// Output will be Markdown tables: // # Sheet1 // // | Column A | Column B | Column C | // | --- | --- | --- | // | Value 1 | Value 2 | Value 3 | // | Value 4 | Value 5 | Value 6 |

console.log(result.markdown);`

`$3`

`typescript import { convert, ConversionError } from '@hivellm/transmutation-lite';

try { const result = await convert('./document.pdf'); console.log(result.markdown); } catch (error) { if (error instanceof ConversionError) { console.error('Conversion failed:', error.message); console.error('Format:', error.format); console.error('Cause:', error.cause); } else { console.error('Unexpected error:', error); } }`

`Performance`

> Note: Performance estimates below are theoretical. Real-world performance may vary based on document complexity and system specifications.

Estimated conversion times:

| Format | File Size | Pages | Estimated Time | Notes | |--------|-----------|-------|----------------|-------| | PDF | 2 MB | 15 | ~200-500ms | Text extraction only, depends on pdf-parse | | DOCX | 500 KB | 20 | ~150-300ms | Depends on document complexity | | XLSX | 1 MB | 10 sheets | ~100-200ms | Table conversion | | PPTX | 3 MB | 30 slides | ~300-600ms | Basic text extraction via jszip | | HTML | 200 KB | - | ~50-100ms | Turndown conversion |

Important: For production use requiring guaranteed performance and quality, use the full Transmutation Rust library.

`Limitations`

- No OCR - PDF images are not extracted or processed - No Audio/Video - Audio and video files are not supported - No Archives - ZIP, TAR, and other archives are not supported - PPTX Quality - PowerPoint conversion is basic (text-only) - Images - Image extraction is not implemented in this lite version - Lower Precision - Text extraction quality is lower than the Rust version

`$3`

| Feature | Transmutation Lite (TypeScript) | Transmutation (Rust) | |---------|--------------------------------|---------------------| | Best For | Document classification, quick previews | RAG systems, production pipelines | | Precision | ⚠️ Basic (untested, depends on libraries) | ✅ Excellent (80%+ tested) | | Performance | Moderate (Node.js) | ✅ 98x faster than Docling (tested) | | Memory | Moderate (Node.js overhead) | ✅ Very Low (~20MB tested) | | OCR Support | ❌ No | ✅ Yes (Tesseract) | | Audio/Video | ❌ No | ✅ Yes (Whisper) | | Archives | ❌ No | ✅ Yes (ZIP, TAR, etc.) | | Setup | ✅ npm install (local) | Requires Rust/binary | | Integration | ✅ Easy (Node.js) | Moderate (CLI/FFI) | | Status | ✅ Production Ready | ✅ Production Ready |

Recommendation: For production use cases requiring high-quality document conversion, we strongly recommend using the full Transmutation Rust library. Transmutation Lite is ideal for: - Quick document classification (like HiveLLM Classify) - Prototyping and development - Simple text extraction where precision is not critical - Node.js-only environments without Rust toolchain

`Development`

`$3`

`bash npm run build`

`$3`

`bash npm test`

`$3`

`bash npm run lint npm run lint:fix`

`$3`

`bash npm run type-check`

`Project Structure`

`transmutation-lite/ ├── src/ │ ├── converters/ │ │ ├── base.ts # Base converter interface │ │ ├── pdf.ts # PDF converter │ │ ├── docx.ts # DOCX converter │ │ ├── xlsx.ts # XLSX converter │ │ ├── pptx.ts # PPTX converter │ │ ├── html.ts # HTML converter │ │ ├── txt.ts # TXT converter │ │ └── index.ts # Converter exports │ ├── types.ts # Type definitions │ ├── index.ts # Main library export │ └── cli.ts # CLI implementation ├── tests/ # Test files ├── dist/ # Compiled output ├── package.json ├── tsconfig.json └── README.md``

Related Projects

- Classify - Intelligent document classification using LLMs (uses Transmutation Lite)
- Transmutation ⭐ - Recommended: Full-featured Rust document converter with 98x performance and 80%+ quality
- Vectorizer - Vector database and search engine (uses full Transmutation)
- Nexus - Graph database with vector search

License

MIT

Contributing

This project follows the HiveLLM ecosystem standards:

1. TypeScript 5.x with strict mode
2. Tests with vitest (currently basic coverage)
3. Clear documentation
4. Semantic versioning
5. Conventional commits

Current Status: Production ready - 177/177 tests passing, fully documented, ready for npm publication.

Changelog

See CHANGELOG.md for version history and release notes.

---

Contact: HiveLLM Development Team

Transmutation Lite

> Simplified TypeScript document converter for common formats to Markdown

Version: 0.6.1
Status: ✅ Production Ready - 177/177 Tests Passing (100%)

Overview

Features

$3

Quick Start

$3

npm installation (recommended):
``bash npm install @hivellm/transmutation-lite`

Local usage (within monorepo):`bash

`From classify project`


npm install
Uses: "file:../transmutation-lite" dependency

$3

`typescript import { Converter, Logger, LogLevel } from '@hivellm/transmutation-lite';

// Simple conversion const converter = new Converter(); const result = await converter.convertFile('./document.pdf'); console.log(result.markdown);

// With caching and logging const converter = new Converter({ enableCache: true, cacheSize: 100, logger: new Logger({ level: LogLevel.INFO }), collectMetrics: true, });

const result = await converter.convertFile('./document.docx', { preserveFormatting: true, maxPages: 10, });

console.log('Format:', result.metadata.format); console.log('Pages:', result.metadata.pageCount); console.log('Time:', result.conversionTimeMs, 'ms');

// Get metrics const metrics = converter.getMetricsSummary(); console.log('Success Rate:', metrics.successRate); console.log('Cache Hit Rate:', metrics.cacheHitRate);`

`$3`

`bash

`Convert single file`


npx transmutation-lite convert document.pdf -o output.md
Convert with options

npx transmutation-lite convert report.docx -o report.md --max-pages 5
Batch conversion

npx transmutation-lite batch ./documents -o ./output --parallel 4
Recursive batch conversion

npx transmutation-lite batch ./docs -o ./markdown --recursive
List supported formats

npx transmutation-lite formats


API Reference
$3
Convenience function to convert a single file.

Parameters: -filePath(string): Path to the file to convert -options (ConversionOptions): Optional conversion options

Returns: Promise

`typescript const result = await convert('./document.pdf', { preserveFormatting: true, maxPages: 10, });`

`$3`

Main converter class that manages all format converters.

#### Methods

##### convertFile(filePath, options?)

Convert a file to Markdown.

`typescript const converter = new Converter(); const result = await converter.convertFile('./document.docx');`

##### convertBuffer(buffer, format, options?)

Convert a buffer directly.

`typescript const buffer = await readFile('./document.pdf'); const result = await converter.convertBuffer(buffer, DocumentFormat.PDF);`

##### detectFormat(filePath)

Detect document format from file extension.

`typescript const format = converter.detectFormat('./document.pdf'); // Returns: DocumentFormat.PDF`

##### isSupported(filePath)

Check if a file format is supported.

`typescript const supported = converter.isSupported('./document.pdf'); // Returns: true`

##### getSupportedFormats()

Get list of supported formats.

`typescript const formats = converter.getSupportedFormats(); // Returns: [DocumentFormat.PDF, DocumentFormat.DOCX, ...]`

`$3`

#### ConversionOptions

#### ConversionResult

#### DocumentMetadata

#### DocumentFormat Enum

`typescript enum DocumentFormat { PDF = 'pdf', DOCX = 'docx', XLSX = 'xlsx', PPTX = 'pptx', TXT = 'txt', HTML = 'html', UNKNOWN = 'unknown', }`

`Supported Formats`

`Integration with Classify`

Transmutation Lite is designed to integrate seamlessly with the HiveLLM Classify project:

`typescript import { convert } from '@hivellm/transmutation-lite'; import { ClassifyClient } from '@hivellm/classify';

// Convert document to Markdown const conversionResult = await convert('./contract.pdf');

// Classify the markdown content const classifier = new ClassifyClient({ provider: 'deepseek', apiKey: process.env.DEEPSEEK_API_KEY, });

const classificationResult = await classifier.classifyText( conversionResult.markdown );

console.log('Domain:', classificationResult.classification.domain); console.log('Type:', classificationResult.classification.doc_type);`

`CLI Commands`

`$3`

Convert a single file to Markdown.

Options: --o, --output - Output file path (default: .md) --m, --max-pages - Maximum pages/sheets to process ---no-preserve-formatting - Disable formatting preservation

Examples:

`bash

`Basic conversion`


transmutation-lite convert document.pdf
Specify output

transmutation-lite convert document.pdf -o output/document.md
Limit pages

transmutation-lite convert large.pdf --max-pages 10
Disable formatting

transmutation-lite convert document.docx --no-preserve-formatting


$3
Convert all supported files in a directory.

Examples:

`bash

`Basic batch conversion`


transmutation-lite batch ./documents
Recursive with custom output

transmutation-lite batch ./docs -o ./markdown --recursive
High parallelism

transmutation-lite batch ./files --parallel 8
Limit pages for all files

transmutation-lite batch ./pdfs --max-pages 5


$3
List all supported file formats.

`bash transmutation-lite formats`

`Examples`

`$3`

`typescript import { Converter } from '@hivellm/transmutation-lite';

const converter = new Converter(); const result = await converter.convertFile('./research-paper.pdf');

`$3`

`typescript import { Converter } from '@hivellm/transmutation-lite'; import { readdir } from 'fs/promises'; import { join } from 'path';

const converter = new Converter(); const files = await readdir('./documents');

`$3`

`typescript import { convert } from '@hivellm/transmutation-lite';

const result = await convert('./data.xlsx');

// Output will be Markdown tables: // # Sheet1 // // | Column A | Column B | Column C | // | --- | --- | --- | // | Value 1 | Value 2 | Value 3 | // | Value 4 | Value 5 | Value 6 |

console.log(result.markdown);`

`$3`

`typescript import { convert, ConversionError } from '@hivellm/transmutation-lite';

`Performance`

> Note: Performance estimates below are theoretical. Real-world performance may vary based on document complexity and system specifications.

Estimated conversion times:

Important: For production use requiring guaranteed performance and quality, use the full Transmutation Rust library.

`Limitations`

`$3`

`Development`

`$3`

`bash npm run build`

`$3`

`bash npm test`

`$3`

`bash npm run lint npm run lint:fix`

`$3`

`bash npm run type-check`

`Project Structure`

Related Projects

License

MIT

Contributing

This project follows the HiveLLM ecosystem standards:

1. TypeScript 5.x with strict mode
2. Tests with vitest (currently basic coverage)
3. Clear documentation
4. Semantic versioning
5. Conventional commits

Current Status: Production ready - 177/177 tests passing, fully documented, ready for npm publication.

Changelog

See CHANGELOG.md for version history and release notes.

---

Contact: HiveLLM Development Team

@hivehub/transmutation-lite

Transmutation Lite

Overview

Features

$3

$3

$3

Quick Start

$3

From classify project

Uses: "file:../transmutation-lite" dependency

$3

$3

Convert single file

Convert with options

Batch conversion

Recursive batch conversion

List supported formats

API Reference

$3

$3

$3

Supported Formats

Integration with Classify

CLI Commands

$3

Basic conversion

Specify output

Limit pages

Disable formatting

$3

Basic batch conversion

Recursive with custom output

High parallelism

Limit pages for all files

$3

Examples

$3

$3

$3

$3

Performance

Limitations

$3

Development

$3

$3

$3

$3

Project Structure

Related Projects

License

Contributing

Changelog

@hivehub/transmutation-lite

Transmutation Lite

Overview

Features

$3

$3

$3

Quick Start

$3

From classify project

Uses: "file:../transmutation-lite" dependency

$3

$3

Convert single file

Convert with options

Batch conversion

Recursive batch conversion

List supported formats

API Reference

$3

$3

$3

Supported Formats

Integration with Classify

CLI Commands

$3

`From classify project`

`$3`

`Convert single file`

`$3`

`$3`

`Supported Formats`

`Integration with Classify`

`CLI Commands`

`$3`

`Basic conversion`

`Basic batch conversion`

`Examples`

`$3`

`$3`

`$3`

`$3`

`Performance`

`Limitations`

`$3`

`Development`

`$3`

`$3`

`$3`

`$3`

`Project Structure`

`From classify project`

`$3`

`Convert single file`

`$3`

`$3`

`Supported Formats`

`Integration with Classify`

`CLI Commands`

`$3`

`Basic conversion`

`Basic batch conversion`

`Examples`

`$3`

`$3`

`$3`

`$3`

`Performance`

`Limitations`

`$3`

`Development`

`$3`

`$3`

`$3`

`$3`

`Project Structure`