# ocr-server

A mobile web app that captures photos and extracts text using a local llama.cpp LLM server. Features a queued processing system, real-time status updates, and an accordion-style UI for viewing results.

```bash
npm install ocr-server
```

## Features
- 📱 Mobile-optimized camera interface with live preview
- ⏳ Queued processing - capture multiple images, process one at a time
- 🔄 Real-time status updates via polling
- ✅ Visual status indicators: pending, processing, complete, error
- 📂 Expandable accordion UI for viewing OCR results
- 🔒 Self-signed HTTPS (required for mobile camera access)
- ⚡ CLI options for configuration
## Prerequisites

- Node.js (>=18.0.0)
- llama.cpp server with vision model
## Installation

### Global Installation

```bash
npm install -g ocr-server
```

### From Source
```bash
# Clone the repository
git clone
cd ocr-server

# Install dependencies
npm install
```

## Usage
### Set up llama.cpp
#### Download and setup NuMarkdown model
Download NuMarkdown model files from Hugging Face:
```bash
wget https://huggingface.co/mradermacher/NuMarkdown-8B-Thinking-GGUF/resolve/main/NuMarkdown-8B-Thinking.f16.gguf
wget https://huggingface.co/mradermacher/NuMarkdown-8B-Thinking-GGUF/resolve/main/NuMarkdown-8B-Thinking.mmproj-f16.gguf
```

Start the llama.cpp server with NuMarkdown:
```bash
llama-server -m NuMarkdown-8B-Thinking.f16.gguf --mmproj NuMarkdown-8B-Thinking.mmproj-f16.gguf --port 8080
```

> Note: You can use any compatible vision model with llama.cpp. Simply replace the model paths with your own.
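llama-server exposes an OpenAI-compatible chat endpoint, which is the usual way to send an image to a vision model. The sketch below shows one plausible request-body shape for an OCR call; the prompt text and helper name are illustrative assumptions, not this project's actual code.

```typescript
// Illustrative sketch: build a vision request body for llama-server's
// OpenAI-compatible /v1/chat/completions endpoint. The prompt wording and
// function name are hypothetical, not taken from this project's source.
function buildOcrRequest(imageBase64: string): string {
  return JSON.stringify({
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Extract all text from this image as Markdown." },
          {
            type: "image_url",
            image_url: { url: `data:image/jpeg;base64,${imageBase64}` },
          },
        ],
      },
    ],
  });
}
```

A frame captured in the browser typically arrives as a base64 data URL, so a backend mostly just forwards it in this shape.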
### Start the OCR Server
```bash
# If installed globally, run directly:
ocr-server --help

# Default settings (port 5666, connects to llama.cpp on localhost:8080)
ocr-server

# Custom port
ocr-server --port 3000

# Connect to a remote llama.cpp server
ocr-server --llama-host 192.168.1.100 --llama-port 8080

# Bind to a specific host
ocr-server --host 127.0.0.1 --port 3000
```

### CLI Options
| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| `--host` | `-h` | Host to bind this server | `0.0.0.0` |
| `--port` | `-p` | Port for this HTTPS server | `5666` |
| `--llama-host` | - | Host of llama.cpp server | `localhost` |
| `--llama-port` | - | Port of llama.cpp server | `8080` |
| `--help` | - | Show help message | - |
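A minimal sketch of how flags with these defaults could be parsed; `parseArgs` and its field names are hypothetical illustrations, not the project's actual implementation:

```typescript
// Hypothetical CLI parser mirroring the options table above.
// Field names and structure are assumptions, not the real server.ts code.
interface Options {
  host: string;
  port: number;
  llamaHost: string;
  llamaPort: number;
}

function parseArgs(argv: string[]): Options {
  // Defaults match the table: 0.0.0.0:5666, llama.cpp on localhost:8080.
  const opts: Options = {
    host: "0.0.0.0",
    port: 5666,
    llamaHost: "localhost",
    llamaPort: 8080,
  };
  for (let i = 0; i < argv.length; i++) {
    const next = () => argv[++i] ?? "";
    switch (argv[i]) {
      case "--host":
      case "-h":
        opts.host = next();
        break;
      case "--port":
      case "-p":
        opts.port = Number(next());
        break;
      case "--llama-host":
        opts.llamaHost = next();
        break;
      case "--llama-port":
        opts.llamaPort = Number(next());
        break;
    }
  }
  return opts;
}
```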
### Using the App

1. Open your mobile browser to `https://<server-ip>:5666`
2. Accept the self-signed certificate warning (required for camera access)
3. Grant camera permissions when prompted
4. Point the camera at text and tap "Capture & Process"
5. Watch the queue status update in real-time
6. Tap completed jobs to expand and view the OCR results

## Output
OCR results are saved to the current directory as:

```
ocr_YYYY-MM-DD_HH-mm-ss.md
```
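The timestamp pattern above can be produced with a small helper; this `ocrFilename` function is an illustrative sketch, not the app's actual code:

```typescript
// Illustrative sketch: build an output filename like ocr_2024-01-31_14-05-09.md.
// The helper name is hypothetical, not taken from this project's source.
function ocrFilename(d: Date): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  const date = `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}`;
  const time = `${pad(d.getHours())}-${pad(d.getMinutes())}-${pad(d.getSeconds())}`;
  return `ocr_${date}_${time}.md`;
}
```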
## How It Works
1. Frontend: Captures camera frames and sends them to the backend via `POST /api/ocr`
2. Backend: Images are added to an in-memory queue and a job ID is returned immediately
3. Worker: A background processor handles one image at a time via the llama.cpp API
4. Real-time updates: The frontend polls `GET /api/jobs` every 2 seconds
5. Save: Results are written to timestamped `.md` files with tags stripped
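The steps above can be sketched as a small in-memory queue with a single background worker. The class, field names, and statuses below are assumptions mirroring the UI indicators, not the real `server.ts`:

```typescript
// Hypothetical sketch of the queued-worker flow described above.
// Statuses mirror the UI indicators: pending, processing, complete, error.
type JobStatus = "pending" | "processing" | "complete" | "error";

interface Job {
  id: number;
  status: JobStatus;
  result?: string;
}

class OcrQueue {
  private jobs: Job[] = [];
  private nextId = 1;
  private draining = false;

  // `run` stands in for the llama.cpp OCR call.
  constructor(private run: (id: number) => Promise<string>) {}

  // Add a job and return its ID immediately, like POST /api/ocr.
  enqueue(): number {
    const job: Job = { id: this.nextId++, status: "pending" };
    this.jobs.push(job);
    void this.drain(); // kick the worker; do not wait for it
    return job.id;
  }

  // Process pending jobs one at a time, in arrival order.
  private async drain(): Promise<void> {
    if (this.draining) return;
    this.draining = true;
    for (let job = this.next(); job; job = this.next()) {
      job.status = "processing";
      try {
        job.result = await this.run(job.id);
        job.status = "complete";
      } catch {
        job.status = "error";
      }
    }
    this.draining = false;
  }

  private next(): Job | undefined {
    return this.jobs.find((j) => j.status === "pending");
  }

  // Snapshot for the GET /api/jobs poll.
  list(): Job[] {
    return this.jobs.map((j) => ({ ...j }));
  }
}
```

Because `enqueue` only kicks off the worker and returns, the upload endpoint stays fast, and the polling endpoint just serializes `list()`.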
## Development

```bash
# Run type checking
tsc --noEmit server.ts

# Start with hot reload
npm run dev
```

## Publishing to npm
```bash
# Build the package
npm run build

# Publish to npm
npm publish
```

## License

MIT