Optical Character Recognition for Identity Document of the European Union

ocrideu is a JavaScript/Node.js library that parses machine-readable information from European (EU/EEA) identity documents. It performs Optical Character Recognition (OCR) on the Machine-Readable Zone (MRZ), and also extracts data from QR codes and barcodes, following the ICAO 9303 standard for TD1 size documents.
It is most useful when used with an image of the back of an identity card, as the front (for most European documents) does not contain any machine-readable information.
Key features include:
* MRZ Parsing: Reads the ICAO 9303 compliant Machine-Readable Zone.
* QR Code & Barcode Scanning: Attempts to read all QR codes and barcodes on the image.
* Automatic Image Processing: The library automatically attempts to rotate the image upright, optimize it for clarity (brightness/contrast), and crop it to the document boundaries.
* OCR Correction: It applies automatic correction logic to the raw OCR reading of the MRZ to fix common scanning errors.
This library does not parse human-readable information and does not perform any validation or verification of the document's authenticity. Its sole purpose is to parse the existing machine-readable data.
- Install
- Usage
- Configuration Options
- Output Format
- API
- Maintainers
- Contributing
- License
This project is available on npm.
```bash
npm i @tuchsoft/ocrideu
```
Import the parse function and pass it a Buffer containing the image of the identity document.
```javascript
import { parse } from '@tuchsoft/ocrideu';
// CommonJS alternative:
// const { parse } = require('@tuchsoft/ocrideu');
import { readFileSync } from 'fs';

async function parseDocument(imagePath) {
    try {
        // Load the image file into a Buffer
        const imageBuffer = readFileSync(imagePath);
        // Pass the buffer to the parse function
        const result = await parse(imageBuffer, {
            // Optional configuration options
            // See "Configuration Options" section below
        });
        console.log(result);
    } catch (error) {
        console.error('Failed to parse document:', error);
    }
}

// Example: Parse the back of an identity card
parseDocument('ocrideu/test/data/est.jpg');
```
The parse function automatically handles:
* Image type detection
* Image rotation (autorotate)
* Image optimization (brightness, contrast)
* Image cropping (autocrop)
* MRZ OCR and parsing (via tesseract.js and mrz)
* Barcode and QR code scanning
* MRZ OCR correction (attempts to fix common OCR errors)
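To give an idea of what MRZ line correction can involve, here is a minimal sketch that pads or trims a line to the 30 characters ICAO 9303 prescribes for each TD1 line. It is not the library's actual correction code: `normalizeTd1Line` and the sample name line are hypothetical and used for illustration only.

```javascript
// Each of the three MRZ lines on a TD1 document is 30 characters long (ICAO 9303).
const TD1_LINE_LENGTH = 30;

// Hypothetical helper, NOT part of ocrideu: brings a raw OCR line to TD1 length
// by adding or removing the "<" filler character at the end of the line.
function normalizeTd1Line(rawLine) {
    // OCR output often contains stray whitespace and lowercase letters.
    let line = rawLine.replace(/\s+/g, '').toUpperCase();
    if (line.length < TD1_LINE_LENGTH) {
        // Too short: pad with fillers.
        return line.padEnd(TD1_LINE_LENGTH, '<');
    }
    // Too long: drop trailing fillers only, never data characters.
    while (line.length > TD1_LINE_LENGTH && line.endsWith('<')) {
        line = line.slice(0, -1);
    }
    return line;
}

console.log(normalizeTd1Line('DOE<<JANE'));
console.log(normalizeTd1Line('IDESTAS0002261938001085718<<<<<<').length); // 30
```

Padding at the end is only safe for lines that end in fillers (typically the first and third line); the second TD1 line ends with the composite check digit, so a real implementation would need to treat it differently.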
You can pass an options object as the second argument to the parse function. Any options you provide will be merged with the defaults.
```javascript
export const defaultOptions = {
    mrz: {
        fixLines: true, // Attempt to fix the parsed lines by adding or removing "<".
        fixChar: true, // Attempt to fix characters in the wrong place (see https://www.npmjs.com/package/mrz).
        fixPatterns: true // Attempt to fix known difficult patterns (e.g. "<<...").
    },
    image: {
        optimize: { // Optimization options, use false to disable.
            ocr: {
                medianBlurSize: 1, // Kernel size for median blur, or false to disable.
                adaptiveThresholdBlockSize: 201, // Block size for adaptive thresholding.
                adaptiveThresholdC: 25 // Constant subtracted from the mean or weighted mean.
            },
            code: {
                medianBlurSize: false, // Kernel size for median blur, or false to disable.
                adaptiveThresholdBlockSize: 131, // Block size for adaptive thresholding.
                adaptiveThresholdC: 5 // Constant subtracted from the mean or weighted mean.
            }
        },
        autorotate: true, // Rotates the image upright (OCR will fail if not upright).
        autocrop: true, // Autocrops the image for better results.
        addBorder: 0.05 // Add a white border of x px to the image (improves OCR results).
        // 'ext' and 'mime' are dynamically added later
    },
    barcode: {
        multiple: true, // Searches for multiple barcodes (slower).
        tryHard: true // Tries harder to scan barcodes, requires `multiple` (very slow).
    },
    qr: true,
    output: {
        hashes: true, // Include computed file hashes.
        execution: true, // Include execution time metrics.
        side: true, // Include detected document side (front/back).
        image: false // Include the processed image (base64).
    },
    debug: false /*{
        path: './debug', // Output directory for debug files.
        ocr: false // Print tesseract debug output.
    }*/
};
```
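As an illustration of overriding the defaults, the sketch below builds an options object that turns off the slower barcode modes and QR scanning. `fastOptions` and `parseFast` are hypothetical names, the speed gain is not measured here, and `parse` is passed in as an argument only so the snippet does not depend on how the package is loaded.

```javascript
// Hypothetical preset: every key used here appears in defaultOptions above.
const fastOptions = {
    barcode: {
        multiple: false, // Stop after the first barcode.
        tryHard: false   // Skip the very slow scanning mode.
    },
    qr: false,           // Do not look for QR codes.
    output: {
        hashes: false,
        execution: false,
        side: true,
        image: false
    }
};

// Hypothetical wrapper: `parse` is the function exported by @tuchsoft/ocrideu.
async function parseFast(parse, imageBuffer) {
    return parse(imageBuffer, fastOptions);
}
```

Both `barcode` keys and all `output` keys are listed explicitly, so the result does not rely on assumptions about how nested objects are merged with the defaults.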
The parse function returns a promise that resolves to a comprehensive result object.
```json
{
    "qr": "https://www2.politsei.ee/qr/?qr=AS0002261",
    "barcodes": [
        "38001085718",
        "AS0002261"
    ],
    "mrz": {
        "documentCode": "ID",
        "issuingState": "EST",
        "documentNumber": "AS0002261",
        "documentNumberCheckDigit": "9",
        "optional1": "38001085718",
        "birthDate": "800108",
        "birthDateCheckDigit": "1",
        "sex": "male",
        "expirationDate": "260628",
        "expirationDateCheckDigit": "8",
        "nationality": "EST",
        "optional2": "",
        "compositeCheckDigit": "1",
        "lastName": "JOEORG",
        "firstName": "JAAK KRISTJAN",
        "valid": true,
        "raw": "IDESTAS0002261938001085718<<<<\n8001081M2606288EST<<<<<<<<<<<1\nJOEORG<..."
    },
    "hashes": {
        "image": {
            "md5": "cbbb0d0087ef5d505c049fd937e7aa2e",
            "sha1": "30cddf06e09868dd748e51b48c278f4cad65aeca",
            "crc32": "879451e2"
        },
        "document": {
            "md5": "c99cf2ff79191281b18900df672aa636",
            "sha1": "303e908ec6ca129b236a41d614681cacbec73af4",
            "crc32": "99f6eea4"
        }
    },
    "side": "back",
    "expired": false,
    "execution": {
        "duration": 1381,
        "start": 1763344610408,
        "end": 1763344611789,
        "id": "6d05d12a-04da-4f21-af98-e6813a007c7f"
    },
    "image": {
        "ocr": "/9j/4AAQSkZJRgABAQAAA...",
        "code": "6ery8/T19vf4+fr/2gAM...",
        "mime": "image/jpeg",
        "ext": "jpg"
    }
}
```
* `mrz`: The parsed MRZ data from the mrz package. `null` if not found.
* `barcodes`: An array of detected barcodes from @ericblade/quagga2. `null` if none found or disabled.
* `qr`: An array of detected QR codes from @zxing/library. `null` if none found or disabled.
* `hashes`: Included if `output.hashes` is true. Contains hashes of the original image and the raw MRZ string.
* `side`: Included if `output.side` is true. "back" if MRZ was found, "front" otherwise.
* `image`: Included if `output.image` is true. A base64-encoded string of the processed (rotated/optimized/cropped) image.
* `execution`: Included if `output.execution` is true. Contains timing and job ID.
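The check-digit fields in the `mrz` object follow the ICAO 9303 scheme: each character is converted to a number, multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The sketch below recomputes the digits from the sample output above. It is a standalone illustration, not code from ocrideu or the mrz package, and `charValue` and `checkDigit` are hypothetical helper names.

```javascript
// ICAO 9303 character values: digits keep their value, A-Z map to 10-35, "<" counts as 0.
function charValue(ch) {
    if (ch === '<') return 0;
    if (ch >= '0' && ch <= '9') return ch.charCodeAt(0) - 48;
    if (ch >= 'A' && ch <= 'Z') return ch.charCodeAt(0) - 55;
    throw new Error(`Invalid MRZ character: ${ch}`);
}

// Weighted sum with weights 7, 3, 1 repeating, reduced modulo 10.
function checkDigit(field) {
    const weights = [7, 3, 1];
    let sum = 0;
    for (let i = 0; i < field.length; i++) {
        sum += charValue(field[i]) * weights[i % 3];
    }
    return String(sum % 10);
}

console.log(checkDigit('AS0002261')); // "9", matches documentNumberCheckDigit
console.log(checkDigit('800108'));    // "1", matches birthDateCheckDigit
console.log(checkDigit('260628'));    // "8", matches expirationDateCheckDigit
```

A matching check digit only shows that the field was read consistently; it says nothing about the authenticity of the document.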
`parse(image, options)` is the primary function exported by the library.
* `image`: A Buffer containing the raw image data (e.g., from `fs.readFileSync`).
* `options`: (Optional) A partial ParseOptions object to override default settings. See Configuration Options for details.
Returns a Promise that resolves to the Output Format object.
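Because `mrz`, `barcodes` and `qr` can be `null`, and the optional keys depend on the `output` settings, calling code should guard its field access. The following sketch shows one possible way to reduce a result to a few fields; `summarize` is a hypothetical helper, and the field names are taken from the sample output in this README.

```javascript
// Hypothetical helper, not part of ocrideu: extracts a few fields from a parse() result.
function summarize(result) {
    const mrz = result.mrz ?? null; // null when no MRZ was found
    return {
        // `side` is only present when output.side is true, so fall back to the MRZ check.
        side: result.side ?? (mrz ? 'back' : 'front'),
        holder: mrz ? `${mrz.firstName} ${mrz.lastName}` : null,
        documentNumber: mrz ? mrz.documentNumber : null,
        mrzValid: mrz ? mrz.valid === true : false,
        // The sample output shows `qr` as a single string, so accept a string or an array.
        codes: [...(result.barcodes ?? []), ...[].concat(result.qr ?? [])]
    };
}

const sample = {
    qr: 'https://www2.politsei.ee/qr/?qr=AS0002261',
    barcodes: ['38001085718', 'AS0002261'],
    mrz: { documentNumber: 'AS0002261', lastName: 'JOEORG', firstName: 'JAAK KRISTJAN', valid: true },
    side: 'back'
};
console.log(summarize(sample));
```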
* Mattia Bonzi (mattia@mattiabonzi.it)
* TuchSoft (info@tuchsoft.com | tuchsoft.com)
We welcome contributions and pull requests!
For questions, bug reports, or feature requests, please open an issue on the project's GitHub repository. You can also contact the maintainers directly.
The barcode and QR code parsing implementation needs to be rewritten. The current implementation does not always succeed, and it is slow. For comparison, in Python, parsing all the barcodes on a card (such as the 2020 Estonian model, which has 3 barcodes) takes milliseconds, without the need to crop or rotate the image. If you can contribute to this part, any help will be appreciated.
All credit for the Jscanify code goes to the original authors.
© 2024 Mattia Bonzi, TuchSoft
Licensed under the GPL-3.0-or-later (GNU General Public License v3.0 or later).