A PDF to Markdown Converter
JavaScript npm library to parse PDF files and convert them into Markdown.

$ npm install @whalecloud/pdf2md
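Forked from https://github.com/opengovsg/pdf2md/ at v0.1.27.
Added features:
1. Detects the table of contents and, based on it, adds the corresponding heading level (h-x) to each section title.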
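The fork's actual detection logic is not described here. As a rough illustration of the idea only, the sketch below assumes that table-of-contents entries carry dotted section numbers (for example `2.1 Installation ..... 7`) and derives the heading level from the numbering depth. The helpers `headingLevel` and `tocLineToHeading` are hypothetical names and are not part of the library's API.

```js
// Hypothetical helper: derive a heading level from a dotted section number,
// e.g. "2" -> 1, "2.1" -> 2, "2.1.3" -> 3 (capped at 6, the deepest Markdown heading).
function headingLevel (sectionNumber) {
  const depth = sectionNumber.split('.').filter(Boolean).length
  return Math.min(Math.max(depth, 1), 6)
}

// Hypothetical helper: turn a TOC line such as "2.1 Installation ..... 7"
// into a Markdown heading. Returns null when the line has no leading section number.
function tocLineToHeading (line) {
  // Groups: (1) dotted section number, (2) title; trailing dot leaders and page number are dropped.
  const match = line.trim().match(/^(\d+(?:\.\d+)*)\.?\s+(.*?)(?:\s*\.{2,}\s*\d+)?$/)
  if (!match) return null
  const [, number, title] = match
  return '#'.repeat(headingLevel(number)) + ' ' + number + ' ' + title
}
```

A real document may use unnumbered TOC entries or indentation instead of numbering, in which case the level would have to be inferred some other way.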
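```js
const fs = require('fs')
const path = require('path')
// Package name as in the upstream example; if you installed this fork,
// require '@whalecloud/pdf2md' instead.
const pdf2md = require('@opendocsg/pdf2md')

// filePath, callbacks, allOutputPaths and i are supplied by the calling code.
const pdfBuffer = fs.readFileSync(filePath)
pdf2md(pdfBuffer, callbacks)
  .then(text => {
    const outputFile = allOutputPaths[i] + '.md'
    console.log(`Writing to ${outputFile}...`)
    fs.writeFileSync(path.resolve(outputFile), text)
    console.log('Done.')
  })
  .catch(err => {
    console.error(err)
  })
```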
$ cd [project_folder]
$ npx @opendocsg/pdf2md --inputFolderPath=[your input folder path] --outputFolderPath=[your output folder path] --recursive
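To picture what the `--recursive` flag changes, here is a minimal sketch of a folder walk. It is not the CLI's actual implementation: `listPdfFiles` and `toMarkdownPath` are hypothetical helpers, and mirroring the input folder layout under the output folder is an assumption.

```js
const fs = require('fs')
const path = require('path')

// Hypothetical helper: collect .pdf files under inputDir.
// When `recursive` is false, only the top-level folder is scanned.
function listPdfFiles (inputDir, recursive) {
  const found = []
  for (const entry of fs.readdirSync(inputDir, { withFileTypes: true })) {
    const fullPath = path.join(inputDir, entry.name)
    if (entry.isDirectory()) {
      if (recursive) found.push(...listPdfFiles(fullPath, true))
    } else if (path.extname(entry.name).toLowerCase() === '.pdf') {
      found.push(fullPath)
    }
  }
  return found.sort()
}

// Hypothetical helper: mirror the input folder layout under outputDir
// (an assumption), swapping the .pdf extension for .md.
function toMarkdownPath (inputDir, outputDir, pdfPath) {
  const parsed = path.parse(path.relative(inputDir, pdfPath))
  return path.join(outputDir, parsed.dir, parsed.name + '.md')
}
```

Each path returned by `listPdfFiles` could then be read with `fs.readFileSync` and passed to `pdf2md(pdfBuffer)`, with the result written to the path given by `toMarkdownPath`.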
If you are converting a large number of files recursively, you might encounter the error "Allocation failed - JavaScript heap out of memory". In that case, run the CLI script directly with node and a larger heap limit:
```
$ node --max-old-space-size=4096 lib/pdf2md-cli.js --inputFolderPath=[your input folder path] --outputFolderPath=[your output folder path] --recursive
```
Options:
1. Input folder path (should exist)
2. Output folder path (should exist)
3. Recursive - also convert PDFs in nested subfolders. Pass the `--recursive` flag to enable this, and omit it otherwise.
pdf-to-markdown - original project by Johannes Zillmann
pdf.js - Mozilla's PDF parsing & rendering platform which is used as a raw parser