Convert PDF and DOCX files to Markdown using AI
`npm install f2md`

Convert PDF, DOCX, and image files to Markdown using AI. This CLI tool extracts text and images and preserves table structure while converting documents to clean, well-formatted Markdown. It also supports OCR text extraction from images.
- PDF Support - Full text extraction, image extraction, and page screenshots for layout understanding
- DOCX Support - Text and image extraction with structure preservation
- Image OCR - Extract text from images (PNG, JPG, JPEG, GIF, WEBP) using AI-powered OCR
- AI-Powered Conversion - Uses Google's Gemini AI to intelligently convert content to Markdown
- Interactive CLI - Friendly prompts using clack.js
- Easy Setup - Built-in configuration wizard for API keys
```bash
npx f2md document.pdf
```

```bash
bunx f2md document.pdf
```

```bash
pnpm dlx f2md document.pdf
```

```bash
npm install -g f2md
# or
bun install -g f2md
```
Before using the tool, you need to configure your Google AI API key.
```bash
f2md setup
# or with npx
npx f2md setup
```
The setup wizard will:
1. Show you where to get a Google AI API key (https://aistudio.google.com/apikey)
2. Prompt you to enter your API key
3. Ask where to save it (local project or global for all projects)
Alternatively, set the environment variable:
```bash
export GOOGLE_GENERATIVE_AI_API_KEY="your-api-key-here"
```

Or create a `.env` file in your project:

```
GOOGLE_GENERATIVE_AI_API_KEY=your-api-key-here
```
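If you call the tool from your own script, it can help to check that a key is available before starting a conversion. The following is a minimal sketch, not f2md's actual lookup logic: `parseDotEnv` and `resolveApiKey` are hypothetical helpers, and the precedence (exported variable first, then `.env`) is an assumption.

```typescript
const KEY_NAME = "GOOGLE_GENERATIVE_AI_API_KEY";

// Parse simple KEY=value lines. Blank lines and "#" comments are skipped,
// and one pair of surrounding quotes is stripped from the value.
function parseDotEnv(text: string): Record<string, string> {
  const values: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=", or no "=" at all
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const doubleQuoted = value.startsWith('"') && value.endsWith('"');
    const singleQuoted = value.startsWith("'") && value.endsWith("'");
    if (value.length >= 2 && (doubleQuoted || singleQuoted)) {
      value = value.slice(1, -1);
    }
    values[key] = value;
  }
  return values;
}

// Assumed precedence: an exported environment variable wins over the .env file.
function resolveApiKey(
  env: Record<string, string | undefined>,
  dotEnvText: string = "",
): string | undefined {
  const fromEnv = env[KEY_NAME];
  if (fromEnv) return fromEnv;
  const fromFile = parseDotEnv(dotEnvText)[KEY_NAME];
  return fromFile ? fromFile : undefined;
}
```

In a Node.js or Bun script you would typically pass `process.env` as the first argument and the contents of your `.env` file as the second.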
Run `f2md` with no arguments to start interactive mode:

```bash
f2md
```
The tool will prompt you for:
- Input file path (PDF, DOCX, or image)
- Output file path
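The input path can point at a PDF, a DOCX, or an image file. If you want to validate paths before handing them to the tool, a small classifier by file extension is enough. This is a sketch under stated assumptions: `detectInputKind` is a hypothetical helper, not part of f2md, and treating extensions as case-insensitive is a guess about how the tool behaves.

```typescript
type InputKind = "pdf" | "docx" | "image";

// Image extensions taken from the feature list in this README.
const IMAGE_EXTENSIONS = new Set([".png", ".jpg", ".jpeg", ".gif", ".webp"]);

// Classify a path by its extension; returns undefined for unsupported files.
function detectInputKind(filePath: string): InputKind | undefined {
  // Keep only the file name so dots in directory names are ignored.
  const lastSeparator = Math.max(filePath.lastIndexOf("/"), filePath.lastIndexOf("\\"));
  const name = filePath.slice(lastSeparator + 1);
  const dot = name.lastIndexOf(".");
  if (dot <= 0) return undefined; // no extension, or a dotfile such as ".env"
  const ext = name.slice(dot).toLowerCase();
  if (ext === ".pdf") return "pdf";
  if (ext === ".docx") return "docx";
  return IMAGE_EXTENSIONS.has(ext) ? "image" : undefined;
}
```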
```bash
# Convert with auto-generated output name
f2md document.pdf
```
### Supported Formats

- PDF (`.pdf`)
- Word Documents (`.docx`)
- Images (`.png`, `.jpg`, `.jpeg`, `.gif`, `.webp`) - OCR text extraction

## Options
```bash
f2md --help     # Show help
f2md --version  # Show version
f2md setup      # Configure API key
```

## How It Works

### Document Conversion (PDF and DOCX)
1. Extraction - Reads the input file and extracts text, images, and layout information
2. Processing - For PDFs, captures page screenshots to understand visual layout
3. AI Conversion - Sends extracted content to Google's Gemini AI model
4. Markdown Generation - Receives AI-generated Markdown with proper formatting
5. Cleanup - Removes unused images and saves the final output
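The cleanup step can be pictured as a comparison between the images that were extracted and the images the generated Markdown actually references. The sketch below illustrates that idea only; `findUnusedImages` is a hypothetical helper, and matching by base file name is an assumption rather than a description of f2md's implementation.

```typescript
// Return the extracted image file names that the Markdown never references.
function findUnusedImages(markdown: string, imageNames: string[]): string[] {
  const referenced = new Set<string>();
  // Matches Markdown image syntax: ![alt](target "optional title")
  const imagePattern = /!\[[^\]]*\]\(\s*<?([^)\s>]+)>?[^)]*\)/g;
  let match: RegExpExecArray | null;
  while ((match = imagePattern.exec(markdown)) !== null) {
    const target = match[1];
    if (!target) continue;
    // Compare by base name so "images/a.png" and "./a.png" both count as "a.png".
    const parts = target.split("/");
    referenced.add(parts[parts.length - 1] ?? target);
  }
  return imageNames.filter((name) => !referenced.has(name));
}
```

A caller would then delete the returned files from the output image directory and keep the rest.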
### Image OCR
1. Image Processing - Reads the image file and encodes it for AI processing
2. OCR Analysis - Sends the image to Google's Gemini AI with specialized prompts for text extraction
3. Text Extraction - AI extracts all visible text while preserving structure (headings, lists, tables)
4. Markdown Generation - Converts extracted content to well-formatted Markdown
5. Output - Saves the final Markdown file
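For the image-processing step, one common way to encode an image for a multimodal API request is base64 together with its MIME type. The sketch below shows that encoding as a data URL. It is an illustration only: `mimeTypeFor` and `toDataUrl` are hypothetical names, the exact payload f2md sends to Gemini is not documented here, and the code assumes a runtime with a global `btoa` (recent Node.js, Bun, or a browser).

```typescript
// MIME types for the image extensions this README lists as supported.
const MIME_TYPES: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".gif": "image/gif",
  ".webp": "image/webp",
};

// Look up the MIME type from the file extension (case-insensitive).
function mimeTypeFor(fileName: string): string | undefined {
  const dot = fileName.lastIndexOf(".");
  return dot < 0 ? undefined : MIME_TYPES[fileName.slice(dot).toLowerCase()];
}

// Encode raw image bytes as a base64 data URL.
function toDataUrl(bytes: Uint8Array, fileName: string): string {
  const mime = mimeTypeFor(fileName);
  if (mime === undefined) {
    throw new Error(`Unsupported image type: ${fileName}`);
  }
  // Build the binary string piecewise; spreading a large array into
  // String.fromCharCode can exceed the engine's argument limit.
  let binary = "";
  bytes.forEach((byte) => {
    binary += String.fromCharCode(byte);
  });
  return `data:${mime};base64,${btoa(binary)}`;
}
```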
## Development

### Prerequisites
- Bun installed
### Setup

```bash
# Clone the repository
git clone <repository-url>
cd f2md

# Install dependencies
bun install

# Run in development mode
bun run dev
```

### Build
```bash
bun run build
```

### Project Structure
```
src/
  cli.ts      - CLI entry point with clack prompts
  convert.ts  - Core conversion logic
  index.ts    - Public API exports
dist/         - Built output (generated)
```

## API Usage
You can also use this as a library in your Node.js/Bun projects:
```typescript
import { convert } from "f2md";

const result = await convert("input.pdf", "output.md", {
  onProgress: (message) => console.log(message),
  respectPages: false,
});

console.log(`Saved to: ${result.outputPath}`);
console.log(`Images saved: ${result.imagesSaved}`);
console.log(`Images cleaned: ${result.imagesDeleted}`);
```
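When converting several files through the library, a thin wrapper can run the jobs in sequence and total the image counters. The following is a sketch under stated assumptions: the `ConvertFn` and `ConvertResult` types are inferred only from the call in this README (the real typings may differ, and treating the image counters as numbers is a guess), and `convertAll` and `summarize` are hypothetical helpers.

```typescript
// Result shape inferred from the fields used in this README's example.
interface ConvertResult {
  outputPath: string;
  imagesSaved: number;
  imagesDeleted: number;
}

// Assumed signature of f2md's `convert`, based only on the documented call.
type ConvertFn = (
  input: string,
  output: string,
  options?: { onProgress?: (message: string) => void; respectPages?: boolean },
) => Promise<ConvertResult>;

interface Job {
  input: string;
  output: string;
}

// Run jobs one at a time so progress messages from different files do not interleave.
async function convertAll(jobs: Job[], convertFn: ConvertFn): Promise<ConvertResult[]> {
  const results: ConvertResult[] = [];
  for (const job of jobs) {
    const result = await convertFn(job.input, job.output, {
      onProgress: (message) => console.log(`[${job.input}] ${message}`),
    });
    results.push(result);
  }
  return results;
}

// Total the image counters across a batch of results.
function summarize(results: ConvertResult[]): { files: number; imagesSaved: number; imagesDeleted: number } {
  let imagesSaved = 0;
  let imagesDeleted = 0;
  for (const result of results) {
    imagesSaved += result.imagesSaved;
    imagesDeleted += result.imagesDeleted;
  }
  return { files: results.length, imagesSaved, imagesDeleted };
}
```

With the real library you would pass the imported `convert` as `convertFn`, for example `summarize(await convertAll(jobs, convert))`.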
## License

MIT