A CLI tool that converts PDF files into Markdown using Azure Document Intelligence for text extraction and OpenAI for Markdown formatting.
To install it locally, run `npm install @transformgovsg/pdf2md`.
- 📜 Converts PDFs into Markdown format
- 🧠 Uses Azure Document Intelligence for text extraction
- 🤖 Enhances Markdown formatting with OpenAI LLM
- 🛠️ Simple CLI usage
Before using pdf2md, ensure you have the following:
- ✅ Node.js installed
- ✅ Azure Document Intelligence API credentials
- ✅ OpenAI API credentials
- ✅ The following environment variables, set in your system or in a `.env` file in the current working directory:
```sh
AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=
AZURE_DOCUMENT_INTELLIGENCE_API_KEY=
OPENAI_API_BASE_URL=
OPENAI_API_KEY=
OPENAI_CHAT_MODEL=
```
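A quick pre-flight check can catch a missing variable before any API call is made. The sketch below is a hypothetical helper, not part of pdf2md: the function name `check_pdf2md_env` is an assumption, and it only inspects variables exported in the current shell, so values that live solely in a `.env` file will be reported as missing.

```sh
# Hypothetical pre-flight check; not shipped with pdf2md.
# Prints each required variable that is unset or empty in the current shell
# and returns a non-zero status if any are missing.
check_pdf2md_env() {
  missing=0
  for name in \
    AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT \
    AZURE_DOCUMENT_INTELLIGENCE_API_KEY \
    OPENAI_API_BASE_URL \
    OPENAI_API_KEY \
    OPENAI_CHAT_MODEL
  do
    # Indirect lookup of the variable named by $name (POSIX sh has no ${!name}).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

With the function defined, `check_pdf2md_env && pnpm dlx @transformgovsg/pdf2md ./path/to/document.pdf` would start the conversion only when all five variables are present in the shell.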
To convert a PDF file to Markdown without installing locally, run:
```sh
pnpm dlx @transformgovsg/pdf2md
```
Alternatively, using yarn or npm:
```sh
yarn dlx @transformgovsg/pdf2md
```
```sh
npx @transformgovsg/pdf2md
```
Example:
```sh
pnpm dlx @transformgovsg/pdf2md ./path/to/document.pdf
```
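To convert several files, the same command can be wrapped in a loop. This is a minimal sketch under stated assumptions: `convert_all_pdfs` and the `PDF2MD_CMD` override are hypothetical names introduced here for illustration, and the loop simply invokes the CLI once per file without assuming anything about where pdf2md writes its output.

```sh
# Hypothetical helper: run the CLI once for every PDF in a directory.
# PDF2MD_CMD is an assumed override for this sketch, not a pdf2md option;
# it defaults to the pnpm invocation from the example.
convert_all_pdfs() {
  dir=${1:-.}
  for pdf in "$dir"/*.pdf; do
    # When no PDF matches, the pattern stays literal; skip it.
    [ -e "$pdf" ] || continue
    # Left unquoted on purpose so the command splits into words.
    ${PDF2MD_CMD:-pnpm dlx @transformgovsg/pdf2md} "$pdf" || return 1
  done
}
```

Running `convert_all_pdfs ./docs` would process each `*.pdf` in `./docs` in turn and stop at the first failure.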
This project is licensed under the AGPL-3.0-only License.