majk-chat-document-tools

Document processing package for majk-chat that adds tools for parsing and analyzing PDF, Excel, Word, PowerPoint, and CSV files.

Features

- Universal Document Analyzer: Automatically detects the file type and routes each file to the appropriate parser
- PDF Parser: Extract text content and metadata from PDF files
- Excel Parser: Parse XLSX/XLS files with sheet analysis and data extraction
- Word Parser: Extract text from DOCX files with multiple output formats
- PowerPoint Parser: Extract slide content, notes, and presentation structure
- CSV Parser: CSV parsing with delimiter detection, type inference, and column analysis

Supported File Formats

| Format | Extensions | Features |
|--------|------------|----------|
| PDF | .pdf | Text extraction, metadata, page-specific parsing |
| Excel | .xlsx, .xls | Sheet parsing, data analysis, multiple output formats |
| Word | .docx | Text/HTML/Markdown output, style extraction |
| PowerPoint | .pptx | Slide content, speaker notes, presentation metadata |
| CSV | .csv, .tsv | Auto-delimiter detection, type inference, column analysis |
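The extension-based routing performed by the Universal Document Analyzer can be approximated with a lookup built from the table above. This is a minimal sketch, not the package's implementation: `detectFormat` and the `DocumentFormat` values are hypothetical names, and the real analyzer may also inspect file contents rather than trusting the extension.

```typescript
import * as path from 'node:path';

// Hypothetical format identifiers (not the package's internal names).
type DocumentFormat = 'pdf' | 'excel' | 'word' | 'powerpoint' | 'csv';

// Extension-to-format map taken from the Supported File Formats table.
const EXTENSION_MAP: Record<string, DocumentFormat> = {
  '.pdf': 'pdf',
  '.xlsx': 'excel',
  '.xls': 'excel',
  '.docx': 'word',
  '.pptx': 'powerpoint',
  '.csv': 'csv',
  '.tsv': 'csv',
};

// Hypothetical helper: returns the format for a path based on its
// (case-insensitive) extension, or undefined when it is unsupported.
function detectFormat(filePath: string): DocumentFormat | undefined {
  const ext = path.extname(filePath).toLowerCase();
  return EXTENSION_MAP[ext];
}
```

A caller would then dispatch on the returned value, for example sending `'csv'` to the CSV parser and reporting an unsupported-format error for `undefined`.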

Installation

```bash
npm install @majkapp/majk-chat-document-tools
```

Usage

Basic Usage

```typescript
import { DocumentAnalyzerTool } from '@majkapp/majk-chat-document-tools';

const analyzer = new DocumentAnalyzerTool();

// Automatically detect and parse any supported document.
// `context` is the tool execution context; how you obtain it depends on your majk-chat setup.
const result = await analyzer.execute({
  file_path: './document.pdf',
  analysis_type: 'auto',
  include_metadata: true
}, context);
```

Individual Parsers

```typescript
import { PdfParserTool, ExcelParserTool, WordParserTool, PowerPointParserV2Tool, CsvParserTool } from '@majkapp/majk-chat-document-tools';

// PDF parsing
const pdfParser = new PdfParserTool();
const pdfResult = await pdfParser.execute({
  file_path: './report.pdf',
  page_range: { start: 1, end: 5 },
  extract_metadata: true
}, context);

// Excel parsing
const excelParser = new ExcelParserTool();
const excelResult = await excelParser.execute({
  file_path: './data.xlsx',
  sheet_name: 'Sales Data',
  output_format: 'json',
  max_rows: 1000
}, context);
```

Integration with majk-chat

```typescript
import { MajkChatBuilder } from '@majkapp/majk-chat-core';
import { registerDocumentTools } from '@majkapp/majk-chat-document-tools';

const builder = new MajkChatBuilder()
  .withProvider('anthropic')
  .withModel('claude-3-5-sonnet-20241022');

// Register all document tools
registerDocumentTools(builder.getToolRegistry());

const chat = builder.build();
```

Tool Specifications

Universal Document Analyzer (`DocumentAnalyzerTool`)

Automatically detects file type and applies the appropriate parser.

Parameters:

- `file_path` (required): Path to document file
- `analysis_type`: `auto` | `text_only` | `structured` | `metadata`
- `max_text_length`: Maximum text extraction length (default: 50000)
- `include_metadata`: Extract document metadata (default: true)
- `output_format`: `json` | `summary` | `detailed`

PDF Parser (`PdfParserTool`)

Parameters:

- `file_path` (required): Path to PDF file
- `page_range`: `{ start?: number, end?: number }`
- `extract_metadata`: Extract PDF metadata (default: true)
- `max_text_length`: Text length limit (default: 50000)
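The `page_range` object leaves both bounds optional. A minimal sketch of how such a range might be resolved against a document's page count follows, assuming 1-based, inclusive page numbers and that a missing bound means "first page" or "last page"; the package does not document these semantics here, and `resolvePageRange` is a hypothetical helper.

```typescript
// Mirrors the documented page_range shape.
interface PageRange {
  start?: number;
  end?: number;
}

// Hypothetical helper (not part of the package API).
// Assumes 1-based, inclusive bounds; missing bounds default to the
// first and last page, and both bounds are clamped into [1, pageCount].
function resolvePageRange(range: PageRange | undefined, pageCount: number): number[] {
  if (pageCount <= 0) return [];
  const start = Math.max(1, Math.min(range?.start ?? 1, pageCount));
  const end = Math.max(1, Math.min(range?.end ?? pageCount, pageCount));
  const pages: number[] = [];
  for (let p = start; p <= end; p++) pages.push(p);
  return pages; // empty when start > end
}
```

Clamping rather than throwing keeps a request such as `{ end: 50 }` usable on a short document; a stricter design could reject out-of-range bounds instead.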

Excel Parser (`ExcelParserTool`)

Parameters:

- `file_path` (required): Path to Excel file
- `sheet_name`: Specific sheet to parse
- `range`: Excel range (e.g., `"A1:D10"`)
- `header_row`: Header row number (default: 1)
- `max_rows`: Maximum rows to parse (default: 1000)
- `output_format`: `json` | `csv` | `table`
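The `range` parameter uses Excel's A1 notation. The sketch below converts a range such as `"A1:D10"` into zero-based row and column indices; `parseRange`, `parseCell`, and `columnToIndex` are illustrative names, not exports of this package.

```typescript
// Zero-based cell coordinates.
interface CellAddress {
  row: number;
  col: number;
}

// Hypothetical helper: converts a column label ("A", "Z", "AA") to a zero-based index.
// Labels are bijective base-26: A=1 ... Z=26, AA=27, so subtract 1 at the end.
function columnToIndex(label: string): number {
  const upper = label.toUpperCase();
  let n = 0;
  for (let i = 0; i < upper.length; i++) {
    n = n * 26 + (upper.charCodeAt(i) - 64); // 'A' (char code 65) maps to 1
  }
  return n - 1;
}

// Hypothetical helper: parses one reference like "D10", allowing "$" anchors.
function parseCell(ref: string): CellAddress {
  const match = /^\$?([A-Za-z]+)\$?(\d+)$/.exec(ref.trim());
  if (!match) throw new Error(`Invalid cell reference: ${ref}`);
  return { col: columnToIndex(match[1]), row: parseInt(match[2], 10) - 1 };
}

// Hypothetical helper: parses "A1:D10" (or a single cell) into start/end corners.
function parseRange(range: string): { start: CellAddress; end: CellAddress } {
  const [first, second = first] = range.split(':');
  return { start: parseCell(first), end: parseCell(second) };
}
```

The `xlsx` dependency provides the same conversion as `XLSX.utils.decode_range`, which returns the corners as `{ s: { c, r }, e: { c, r } }` with zero-based indices.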

Word Parser (`WordParserTool`)

Parameters:

- `file_path` (required): Path to Word file
- `output_format`: `plain` | `html` | `markdown`
- `include_images`: Process image references (default: false)
- `max_text_length`: Text length limit (default: 50000)
- `extract_styles`: Extract style information (default: false)

PowerPoint Parser (`PowerPointParserV2Tool`)

Parameters:

- `file_path` (required): Path to PowerPoint file
- `include_slide_notes`: Extract speaker notes (default: true)
- `slide_numbers`: Array of specific slides to extract
- `max_text_length`: Text length limit (default: 50000)
- `extract_slide_titles`: Extract slide titles (default: true)
- `include_shapes`: Include shape details (default: false)

CSV Parser (`CsvParserTool`)

Parameters:

- `file_path` (required): Path to CSV file
- `delimiter`: Column delimiter (auto-detected if not provided)
- `has_headers`: Whether first row contains headers (auto-detected)
- `encoding`: File encoding (`utf8` | `ascii` | `latin1`)
- `max_rows`: Maximum rows to parse (default: 5000)
- `output_format`: `json` | `table` | `summary`
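When `delimiter` and column types are not given, they have to be inferred from the data. One simple approach, sketched here under stated assumptions, is to pick the candidate delimiter that occurs a consistent, non-zero number of times on every sampled line, and to give each column the narrowest type that fits all of its non-empty values. `detectDelimiter`, `inferColumnType`, the candidate list, and the type names are hypothetical; the sketch also ignores quoted fields, which a real parser such as `csv-parser` must handle.

```typescript
// Hypothetical candidate list: comma, tab (.tsv), semicolon, pipe.
const CANDIDATES = [',', '\t', ';', '|'];

// Hypothetical helper: picks the candidate that appears the same non-zero
// number of times on every sampled line, falling back to ','.
// Naive: delimiters inside quoted fields are counted too.
function detectDelimiter(sample: string, maxLines = 10): string {
  const lines = sample.split(/\r?\n/).filter((l) => l.length > 0).slice(0, maxLines);
  let best = ',';
  let bestCount = 0;
  for (const d of CANDIDATES) {
    const counts = lines.map((l) => l.split(d).length - 1);
    const consistent = counts.length > 0 && counts.every((c) => c === counts[0]);
    if (consistent && counts[0] > bestCount) {
      best = d;
      bestCount = counts[0];
    }
  }
  return best;
}

type ColumnType = 'number' | 'boolean' | 'date' | 'string' | 'empty';

// Hypothetical helper: infers one type per column from its non-empty values,
// trying the narrowest types first.
function inferColumnType(values: string[]): ColumnType {
  const present = values.map((v) => v.trim()).filter((v) => v !== '');
  if (present.length === 0) return 'empty';
  if (present.every((v) => /^-?\d+(\.\d+)?$/.test(v))) return 'number';
  if (present.every((v) => /^(true|false)$/i.test(v))) return 'boolean';
  if (present.every((v) => /^\d{4}-\d{2}-\d{2}/.test(v) && !isNaN(Date.parse(v)))) return 'date';
  return 'string';
}
```

Requiring the same count on every line is what separates a real delimiter from a character that merely appears in the data, such as a comma inside free text on a single row.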

Context Management Integration

All parsers are designed to integrate with majk-chat's context management system:

- Smart Truncation: Automatically truncates large documents while preserving structure
- Incremental Reading: Supports offset/limit reading for large files via `read_tool_result`
- Memory Efficient: Processes documents in chunks to avoid memory issues
- Token Optimization: Formats output to minimize token usage while preserving information
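Smart truncation and incremental reading can be illustrated with two small helpers. This is a sketch, not the package's code: `truncateText`, `readSlice`, and the `TruncatedText` shape are hypothetical, the 50000 default mirrors the documented `max_text_length` default, and backing up to the last blank line (paragraph break) is an assumed interpretation of "preserving structure".

```typescript
// Hypothetical result shape (not the package's actual output format).
interface TruncatedText {
  text: string;
  truncated: boolean;
  totalLength: number;
}

// Hypothetical helper: cuts text to at most maxLength characters, preferring
// to stop at the last paragraph break so whole paragraphs are kept.
function truncateText(text: string, maxLength = 50000): TruncatedText {
  if (text.length <= maxLength) {
    return { text, truncated: false, totalLength: text.length };
  }
  const slice = text.slice(0, maxLength);
  const lastBreak = slice.lastIndexOf('\n\n');
  // Back up to the paragraph break only if that keeps at least half the budget.
  const cut = lastBreak >= maxLength / 2 ? lastBreak : maxLength;
  return { text: text.slice(0, cut), truncated: true, totalLength: text.length };
}

// Hypothetical helper: returns characters [offset, offset + limit),
// the kind of window an offset/limit read would request.
function readSlice(text: string, offset: number, limit: number): string {
  const start = Math.max(0, offset);
  return text.slice(start, start + Math.max(0, limit));
}
```

Reporting `totalLength` alongside the truncated text lets a caller decide whether to fetch the remainder in further offset/limit reads.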

Error Handling

All tools detect and report the following error cases:

- File Not Found: Clear error messages with resolved paths
- Permission Denied: Specific permission error reporting
- Invalid Format: Format validation with supported format guidance
- Parsing Errors: Detailed parsing error information with context
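The first three error cases map naturally onto Node.js filesystem error codes and an extension check. The sketch below is illustrative only: `describeFileError`, `checkFormat`, `FsError`, and the message wording are hypothetical, while `ENOENT`, `EACCES`, and `EPERM` are standard Node.js `fs` error codes.

```typescript
import * as path from 'node:path';

// Extensions from the Supported File Formats table.
const SUPPORTED = ['.pdf', '.xlsx', '.xls', '.docx', '.pptx', '.csv', '.tsv'];

// Minimal view of a Node.js filesystem error.
interface FsError {
  code?: string;
  message?: string;
}

// Hypothetical helper: turns a filesystem error into a message naming the resolved path.
function describeFileError(err: FsError, filePath: string): string {
  const resolved = path.resolve(filePath);
  switch (err.code) {
    case 'ENOENT':
      return `File not found: ${resolved}`;
    case 'EACCES':
    case 'EPERM':
      return `Permission denied: ${resolved}`;
    default:
      return `Could not read ${resolved}: ${err.message ?? 'unknown error'}`;
  }
}

// Hypothetical helper: returns an error message listing the supported formats,
// or null when the extension is supported.
function checkFormat(filePath: string): string | null {
  const ext = path.extname(filePath).toLowerCase();
  if (SUPPORTED.indexOf(ext) !== -1) return null;
  return `Unsupported format "${ext || '(none)'}". Supported: ${SUPPORTED.join(', ')}`;
}
```

Resolving the path before reporting it helps when a relative `file_path` is interpreted against an unexpected working directory.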

Dependencies

- `pdf-parse`: PDF text extraction
- `xlsx`: Excel/XLSX parsing
- `mammoth`: Word document processing
- `node-pptx-parser`: PowerPoint parsing
- `csv-parser`: CSV parsing and analysis

License

MIT
