# ocr-server

A mobile web app that captures photos and extracts text using a local llama.cpp LLM server. Features a queued processing system, real-time status updates, and an accordion-style UI for viewing results.

```bash
npm install ocr-server
```

## Features
- 📱 Mobile-optimized camera interface with live preview
- ⏳ Queued processing - capture multiple images, process one at a time
- 🔄 Real-time status updates via polling
- ✅ Visual status indicators: pending, processing, complete, error
- 📂 Expandable accordion UI for viewing OCR results
- 🔒 Self-signed HTTPS (required for mobile camera access)
- ⚡ CLI options for configuration
## Prerequisites

- Node.js (>=18.0.0)
- llama.cpp server with vision model
## Installation

### Global Installation

```bash
npm install -g ocr-server
```

### From Source
```bash
# Clone the repository
git clone
cd ocr-server

# Install dependencies
npm install
```

## Usage
### Set up llama.cpp
#### Download and setup NuMarkdown model
Download NuMarkdown model files from Hugging Face:
```bash
wget https://huggingface.co/mradermacher/NuMarkdown-8B-Thinking-GGUF/resolve/main/NuMarkdown-8B-Thinking.f16.gguf
wget https://huggingface.co/mradermacher/NuMarkdown-8B-Thinking-GGUF/resolve/main/NuMarkdown-8B-Thinking.mmproj-f16.gguf
```

Start the llama.cpp server with NuMarkdown:
```bash
llama-server -m NuMarkdown-8B-Thinking.f16.gguf --mmproj NuMarkdown-8B-Thinking.mmproj-f16.gguf --port 8080
```

> Note: You can use any compatible vision model with llama.cpp. Simply replace the model paths with your own.
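llama-server exposes an OpenAI-compatible chat endpoint, which is the usual way to send an image to a vision model. The sketch below shows one plausible request-body shape for an OCR call; the prompt text and helper name are illustrative assumptions, not this project's actual code.

```typescript
// Illustrative sketch: build a vision request body for llama-server's
// OpenAI-compatible /v1/chat/completions endpoint. The prompt wording and
// function name are hypothetical, not taken from this project's source.
function buildOcrRequest(imageBase64: string): string {
  return JSON.stringify({
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Extract all text from this image as Markdown." },
          {
            type: "image_url",
            image_url: { url: `data:image/jpeg;base64,${imageBase64}` },
          },
        ],
      },
    ],
  });
}
```

A frame captured in the browser typically arrives as a base64 data URL, so a backend mostly just forwards it in this shape.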
### Start the OCR Server
```bash
# If installed globally, run directly:
ocr-server --help

# Default settings (port 5666, connects to llama.cpp on localhost:8080)
ocr-server

# Custom port
ocr-server --port 3000

# Connect to a remote llama.cpp server
ocr-server --llama-host 192.168.1.100 --llama-port 8080

# Bind to a specific host
ocr-server --host 127.0.0.1 --port 3000
```

### CLI Options
| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| `--host` | `-h` | Host to bind this server | `0.0.0.0` |
| `--port` | `-p` | Port for this HTTPS server | `5666` |
| `--llama-host` | - | Host of llama.cpp server | `localhost` |
| `--llama-port` | - | Port of llama.cpp server | `8080` |
| `--help` | - | Show help message | - |
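A minimal sketch of how flags with these defaults could be parsed; `parseArgs` and its field names are hypothetical illustrations, not the project's actual implementation:

```typescript
// Hypothetical CLI parser mirroring the options table above.
// Field names and structure are assumptions, not the real server.ts code.
interface Options {
  host: string;
  port: number;
  llamaHost: string;
  llamaPort: number;
}

function parseArgs(argv: string[]): Options {
  // Defaults match the table: 0.0.0.0:5666, llama.cpp on localhost:8080.
  const opts: Options = {
    host: "0.0.0.0",
    port: 5666,
    llamaHost: "localhost",
    llamaPort: 8080,
  };
  for (let i = 0; i < argv.length; i++) {
    const next = () => argv[++i] ?? "";
    switch (argv[i]) {
      case "--host":
      case "-h":
        opts.host = next();
        break;
      case "--port":
      case "-p":
        opts.port = Number(next());
        break;
      case "--llama-host":
        opts.llamaHost = next();
        break;
      case "--llama-port":
        opts.llamaPort = Number(next());
        break;
    }
  }
  return opts;
}
```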
### Using the App

1. Open your mobile browser to `https://<server-ip>:5666`
2. Accept the self-signed certificate warning (required for camera access)
3. Grant camera permissions when prompted
4. Point the camera at text and tap "Capture & Process"
5. Watch the queue status update in real-time
6. Tap completed jobs to expand and view the OCR results

## Output
OCR results are saved to the current directory as:

```
ocr_YYYY-MM-DD_HH-mm-ss.md
```
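The timestamp pattern above can be produced with a small helper; this `ocrFilename` function is an illustrative sketch, not the app's actual code:

```typescript
// Illustrative sketch: build an output filename like ocr_2024-01-31_14-05-09.md.
// The helper name is hypothetical, not taken from this project's source.
function ocrFilename(d: Date): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  const date = `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}`;
  const time = `${pad(d.getHours())}-${pad(d.getMinutes())}-${pad(d.getSeconds())}`;
  return `ocr_${date}_${time}.md`;
}
```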
## How It Works
1. Frontend: Captures camera frames and sends them to the backend via `POST /api/ocr`
2. Backend: Images are added to an in-memory queue and a job ID is returned immediately
3. Worker: A background processor handles one image at a time via the llama.cpp API
4. Real-time updates: The frontend polls `GET /api/jobs` every 2 seconds
5. Save: Results are written to timestamped `.md` files with tags stripped
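The steps above can be sketched as a small in-memory queue with a single background worker. The class, field names, and statuses below are assumptions mirroring the UI indicators, not the real `server.ts`:

```typescript
// Hypothetical sketch of the queued-worker flow described above.
// Statuses mirror the UI indicators: pending, processing, complete, error.
type JobStatus = "pending" | "processing" | "complete" | "error";

interface Job {
  id: number;
  status: JobStatus;
  result?: string;
}

class OcrQueue {
  private jobs: Job[] = [];
  private nextId = 1;
  private draining = false;

  // `run` stands in for the llama.cpp OCR call.
  constructor(private run: (id: number) => Promise<string>) {}

  // Add a job and return its ID immediately, like POST /api/ocr.
  enqueue(): number {
    const job: Job = { id: this.nextId++, status: "pending" };
    this.jobs.push(job);
    void this.drain(); // kick the worker; do not wait for it
    return job.id;
  }

  // Process pending jobs one at a time, in arrival order.
  private async drain(): Promise<void> {
    if (this.draining) return;
    this.draining = true;
    for (let job = this.next(); job; job = this.next()) {
      job.status = "processing";
      try {
        job.result = await this.run(job.id);
        job.status = "complete";
      } catch {
        job.status = "error";
      }
    }
    this.draining = false;
  }

  private next(): Job | undefined {
    return this.jobs.find((j) => j.status === "pending");
  }

  // Snapshot for the GET /api/jobs poll.
  list(): Job[] {
    return this.jobs.map((j) => ({ ...j }));
  }
}
```

Because `enqueue` only kicks off the worker and returns, the upload endpoint stays fast, and the polling endpoint just serializes `list()`.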
## Development

```bash
# Run type checking
tsc --noEmit server.ts

# Start with hot reload
npm run dev
```

## Publishing to npm
```bash
# Build the package
npm run build

# Publish to npm
npm publish
```

## License

MIT