# Document processing application with CLI and API interfaces

A TypeScript implementation using the ZeroX library for OCR and document extraction with vision models.
## Prerequisites

- Node.js (v16 or higher)
- npm or pnpm
- OpenAI API key

## Installation

1. Clone this repository
2. Install dependencies:
   ```bash
   npm install
   # or using pnpm
   pnpm install
   ```

3. Set up your configuration:

   Copy the template file and edit it with your actual values:

   ```bash
   cp .env.template .env
   ```

   Then open the `.env` file and add your OpenAI API key and other configuration options.
Alternatively, you can set the environment variables directly in your shell:

```bash
# On Linux/macOS
export OPENAI_API_KEY=your_api_key_here
export AUTH_ENABLED=true
export AUTH_USERNAME=your_username
export AUTH_PASSWORD=your_password
```
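Whether the values come from `.env` or the shell, it helps to fail fast at startup when a required variable is missing. A minimal sketch — the `requireEnv` helper is hypothetical and not part of this repository:

```typescript
// Hypothetical helper (not in this repo): read an env var, fall back to a
// default, or fail fast when neither is available.
function requireEnv(name: string, fallback?: string): string {
  const value = process.env[name] ?? fallback;
  if (value === undefined) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example usage: OPENAI_API_KEY has no sensible default, MODEL does.
// const apiKey = requireEnv("OPENAI_API_KEY");
// const model = requireEnv("MODEL", "gpt-4o-mini");
```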
## Usage
To run the OCR example manually after setting up your .env file or environment variables:
```bash
npm run start
# or using pnpm
pnpm start
```

This will:
1. Process a sample PDF document (CS101.pdf) using the specified model
2. Extract text while maintaining the original formatting
3. Save the results to the configured output directory
4. Display a sample of the extracted content
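
The steps above boil down to building a ZeroX options object from the environment. A hedged sketch of what `src/main.ts` plausibly assembles, using the option names from the Configuration Options section and defaults from the Environment Variables table — not the verbatim contents of the file:

```typescript
// Hedged sketch: a ZeroX-style options object assembled from env vars,
// with the defaults documented in this README. The exact shape accepted
// by the library is defined by the ZeroX docs.
const options = {
  filePath: "CS101.pdf", // the sample document processed by npm start
  model: process.env.MODEL ?? "gpt-4o-mini",
  outputDir: process.env.OUTPUT_DIR ?? "./output",
  maintainFormat: (process.env.MAINTAIN_FORMAT ?? "true") === "true",
  concurrency: Number(process.env.CONCURRENCY ?? "5"),
  cleanup: true,
  credentials: { apiKey: process.env.OPENAI_API_KEY },
};
```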
## API Authentication
The API includes basic authentication to secure your endpoints in production environments.
### Configuration
Authentication is controlled by the following environment variables:
| Variable        | Description                              | Default |
| --------------- | ---------------------------------------- | ------- |
| `AUTH_ENABLED`  | Set to `true` to enable authentication   | `false` |
| `AUTH_USERNAME` | Username for basic authentication        | None    |
| `AUTH_PASSWORD` | Password for basic authentication        | None    |

When authentication is enabled, all API endpoints under `/api/*` will require basic authentication.

### Making Authenticated Requests
When authentication is enabled, you need to include the Authorization header with your requests:
```bash
# Using curl
curl -X GET "http://localhost:3000/api/health" \
  -H "Authorization: Basic $(echo -n 'username:password' | base64)"
```

```javascript
// Using JavaScript fetch
fetch('http://localhost:3000/api/health', {
  headers: {
    'Authorization': 'Basic ' + btoa('username:password')
  }
})
```
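
Both snippets construct the same header value: `Basic ` followed by the base64 encoding of `username:password`. In Node, where `Buffer` is the idiomatic way to base64-encode, the equivalent looks like this:

```typescript
// Build the Basic auth header value used in the examples above.
function basicAuthHeader(username: string, password: string): string {
  return "Basic " + Buffer.from(`${username}:${password}`).toString("base64");
}

// basicAuthHeader("user", "pass") → "Basic dXNlcjpwYXNz"
```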
## Environment Variables
The following environment variables can be set in your `.env` file:

| Variable          | Description                             | Default       |
| ----------------- | --------------------------------------- | ------------- |
| `OPENAI_API_KEY`  | Your OpenAI API key                     | (required)    |
| `MODEL`           | The AI model to use                     | `gpt-4o-mini` |
| `OUTPUT_DIR`      | Directory to save results               | `./output`    |
| `MAINTAIN_FORMAT` | Whether to maintain document formatting | `true`        |
| `CONCURRENCY`     | Number of concurrent processes          | `5`           |
| `AUTH_ENABLED`    | Enable API authentication               | `false`       |
| `AUTH_USERNAME`   | Username for API authentication         | None          |
| `AUTH_PASSWORD`   | Password for API authentication         | None          |

## Customization
You can modify the `src/main.ts` file to:

- Change the input document path
- Use a different model
- Adjust processing parameters
- Process specific pages instead of the entire document
- Change output options
## Configuration Options
The ZeroX library supports various configuration options:
- `filePath`: Path or URL to the document to process
- `model`: AI model to use for extraction
- `outputDir`: Directory to save results
- `pagesToConvertAsImages`: Page numbers to process (`undefined` for all)
- `maintainFormat`: Whether to maintain document formatting
- `cleanup`: Whether to clean up temporary files
- `concurrency`: Number of concurrent processes to run
- `credentials`: Authentication credentials for the AI provider (required)

## AI Providers
ZeroX supports multiple AI providers:
1. OpenAI (models like gpt-4o, gpt-4o-mini)
2. Google (Gemini models)
3. Azure OpenAI
4. AWS Bedrock (Claude models)
Each provider requires specific credentials. Refer to the ZeroX documentation for details.
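
Because each provider expects a different credentials object, selection is usually a simple switch. The sketch below is illustrative only: the field and environment-variable names are plausible conventions, not taken from this repo or the ZeroX docs — check the ZeroX documentation for the exact shapes:

```typescript
// Illustrative only: plausible credential shapes per provider. Field and
// env-var names here are assumptions, not confirmed ZeroX API.
type Provider = "openai" | "google" | "azure" | "bedrock";

function credentialsFor(provider: Provider): Record<string, string | undefined> {
  switch (provider) {
    case "openai":
      return { apiKey: process.env.OPENAI_API_KEY };
    case "google":
      return { apiKey: process.env.GOOGLE_API_KEY };
    case "azure":
      return {
        apiKey: process.env.AZURE_OPENAI_API_KEY,
        endpoint: process.env.AZURE_OPENAI_ENDPOINT,
      };
    case "bedrock":
      return {
        accessKeyId: process.env.AWS_ACCESS_KEY_ID,
        secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
        region: process.env.AWS_REGION,
      };
  }
}
```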
## License

ISC
# Document Processing CLI
## Installation
```bash
npm install
npm run build
```

## Usage
### Basic Processing

```bash
node dist/cli.js process --file document.pdf --output ./output
```

### Chunking
```bash
node dist/cli.js process --file document.pdf --chunk --max-tokens 500 --overlap 50 --output ./output
```
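
To make the `--max-tokens`/`--overlap` semantics concrete, here is a hedged sketch of overlap-based chunking. The CLI's real tokenizer is presumably model-specific; whitespace splitting stands in for it here, and the helper name is hypothetical:

```typescript
// Hedged sketch of chunking with overlap. Each chunk holds up to
// maxTokens tokens, and consecutive chunks share `overlap` tokens.
function chunkTokens(text: string, maxTokens: number, overlap: number): string[] {
  if (overlap >= maxTokens) {
    throw new RangeError("overlap must be smaller than maxTokens");
  }
  const tokens = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = maxTokens - overlap; // how far the window advances each time
  for (let i = 0; i < tokens.length; i += step) {
    chunks.push(tokens.slice(i, i + maxTokens).join(" "));
    if (i + maxTokens >= tokens.length) break; // last window reached the end
  }
  return chunks;
}
```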
### Embedding Providers
#### OpenAI (Default)
```bash
node dist/cli.js process --file document.pdf --chunk --embedding-provider openai --output ./output
```

#### Azure OpenAI
```bash
node dist/cli.js process --file document.pdf --chunk \
  --embedding-provider azure \
  --azure-api-key "your-azure-api-key" \
  --azure-base-url "https://your-resource.openai.azure.com/openai/" \
  --azure-api-version "2024-02-01" \
  --azure-deployment "your-deployment-name" \
  --output ./output
```

### Translation
```bash
node dist/cli.js process --file document.pdf \
  --chunk --max-tokens 500 --overlap 50 \
  --language "es" \
  --embedding-provider azure \
  --azure-api-key "your-azure-api-key" \
  --azure-base-url "https://your-resource.openai.azure.com/openai/" \
  --azure-deployment "your-deployment-name" \
  --output ./output
```

## CLI Options
- `--file`: Path to the document file (required)
- `--extension`: File extension override
- `--language`: Target language for translation (ISO 639-1 code)
- `--output`: Output directory (default: `./output`)
- `--chunk`: Enable document chunking
- `--max-tokens`: Maximum tokens per chunk
- `--overlap`: Overlap tokens between chunks
- `--existing-tags`: Comma-separated list of existing tags
- `--embedding-model-provider`: Embedding provider: 'openai' or 'azure' (default: 'openai')
- `--embedding-api-key`: ...
- `--embedding-endpoint`: ...
- `--embedding-deployment`: ...
- `--llm-model-provider`: LLM provider: 'openai' or 'azure' (default: 'openai')
- `--llm-api-key`: ...
- `--llm-endpoint`: ...
- `--llm-deployment`: ...

## Environment Variables
- `OPENAI_API_KEY`: Required for OpenAI provider
- `MODEL`: OpenAI model to use (default: `gpt-4o-mini`)