![License: GPL-3.0](#)
![Copyright: Tuchsoft](https://tuchsoft.com)

Ocrideu (Optical Character Recognition for Identity Document of the European Union)

ocrideu is a JavaScript/Node.js library that parses machine-readable information from European (EU/EEA) identity documents. It performs Optical Character Recognition (OCR) on the Machine-Readable Zone (MRZ), and also extracts data from QR codes and barcodes, following the ICAO 9303 standard for TD1 size documents.

It works best with an image of the back of an identity card, because the front of most European documents does not contain any machine-readable information.

Key features include:

* MRZ Parsing: Reads the ICAO 9303 compliant Machine-Readable Zone.
* QR Code & Barcode Scanning: Attempts to read all QR codes and barcodes on the image.
* Automatic Image Processing: The library automatically attempts to rotate the image upright, optimize it for clarity (brightness/contrast), and crop it to the document boundaries.
* OCR Correction: It applies automatic correction logic to the raw OCR reading of the MRZ to fix common scanning errors.
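
To give a feel for the kind of correction involved, the sketch below repairs one typical OCR error: a wrong number of "<" filler characters, which changes the line length. It is only an illustration, not the library's actual logic. The helper name `fixLineLength` is hypothetical, and the approach (resizing the longest filler run until the line has the 30 characters of a TD1 line) is an assumption.

```javascript
// TD1 documents carry three MRZ lines of exactly 30 characters (ICAO 9303).
const TD1_LINE_LENGTH = 30;

// Hypothetical helper (not part of the ocrideu API): grow or shrink the
// longest run of '<' so that the line reaches the expected length.
function fixLineLength(line, expected = TD1_LINE_LENGTH) {
  const diff = expected - line.length;
  if (diff === 0) return line;

  // Find the longest run of filler characters.
  let best = null;
  for (const match of line.matchAll(/<+/g)) {
    if (!best || match[0].length > best[0].length) best = match;
  }
  if (!best) return line; // no filler to adjust, leave the line untouched

  const newRunLength = best[0].length + diff;
  if (newRunLength < 0) return line; // cannot remove more fillers than exist

  const before = line.slice(0, best.index);
  const after = line.slice(best.index + best[0].length);
  return before + '<'.repeat(newRunLength) + after;
}

// Two fillers were lost during OCR (28 characters instead of 30).
const damaged = '8001081M2606288EST' + '<'.repeat(9) + '1';
console.log(fixLineLength(damaged)); // 8001081M2606288EST<<<<<<<<<<<1
```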

This library does not parse human-readable information and does not perform any validation or verification of the document's authenticity. Its sole purpose is to parse the existing machine-readable data.

- Install
- Usage
- Configuration Options
- Output Format
- API
- Maintainers
- Contributing
- License

Install

This project is available on npm.

```bash
npm i @tuchsoft/ocrideu
```

Usage

Import the parse function and pass it a Buffer containing the image of the identity document.

```javascript
import { parse } from '@tuchsoft/ocrideu';
// const { parse } = require('@tuchsoft/ocrideu');
import { readFileSync } from 'fs';

async function parseDocument(imagePath) {
  try {
    // Load the image file into a Buffer
    const imageBuffer = readFileSync(imagePath);

    // Pass the buffer to the parse function
    const result = await parse(imageBuffer, {
      // Optional configuration options
      // See "Configuration Options" section below
    });

    console.log(result);
  } catch (error) {
    console.error('Failed to parse document:', error);
  }
}

// Example: Parse the back of an identity card
parseDocument('ocrideu/test/data/est.jpg');
```

The parse function automatically handles:

* Image type detection
* Image rotation (autorotate)
* Image optimization (brightness, contrast)
* Image cropping (autocrop)
* MRZ OCR and parsing (via tesseract.js and mrz)
* Barcode and QR code scanning
* MRZ OCR correction (attempts to fix common OCR errors)

Configuration Options

You can pass an options object as the second argument to the parse function. Any options you provide will be merged with the defaults.

Default options

```javascript
export const defaultOptions = {
  mrz: {
    fixLines: true,    // Attempt to fix the parsed lines by adding or removing "<"
    fixChar: true,     // Attempt to fix chars in the wrong place (@see https://www.npmjs.com/package/mrz)
    fixPatterns: true, // Attempt to fix known difficult patterns (e.g. "<<…")
  },
  image: {
    optimize: { // Optimization options, use false to disable
      ocr: {
        medianBlurSize: 1,               // Kernel size for median blur, or false to disable.
        adaptiveThresholdBlockSize: 201, // Block size for adaptive thresholding.
        adaptiveThresholdC: 25,          // Constant subtracted from the mean or weighted mean.
      },
      code: {
        medianBlurSize: false,           // Kernel size for median blur, or false to disable.
        adaptiveThresholdBlockSize: 131, // Block size for adaptive thresholding.
        adaptiveThresholdC: 5            // Constant subtracted from the mean or weighted mean.
      }
    },
    autorotate: true, // Rotates the image upright (OCR will fail if not upright).
    autocrop: true,   // Autocrop the image for better results
    addBorder: 0.05   // Add a white border of x px to the image (better OCR results)
    // 'ext' and 'mime' are dynamically added later
  },
  barcode: {
    multiple: true, // Searches for multiple barcodes (slower).
    tryHard: true,  // Tries its best to scan barcodes, requires multiple (very slow).
  },
  qr: true,
  output: {
    hashes: true,    // Include computed file hashes.
    execution: true, // Include execution time metrics.
    side: true,      // Include detected document side (Front/Back).
    image: false     // Include the processed image (base64).
  },
  debug: false /*{
    path: './debug' // Output directory for debug files.
    ocr: false // Print tesseract debug
  }*/
}
```
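
The README says only that the provided options are "merged with the defaults"; it does not say whether nested objects are merged key by key. The sketch below assumes a recursive merge, so overriding `barcode.tryHard` would keep `barcode.multiple` at its default. `mergeOptions` and `isPlainObject` are hypothetical helpers written for this illustration, not functions exported by the library, and the `defaults` object is a shortened copy of the defaults above.

```javascript
// Hypothetical helper: true for object literals, false for arrays and null.
function isPlainObject(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

// Hypothetical recursive merge: overrides win, untouched keys keep defaults.
function mergeOptions(defaults, overrides = {}) {
  const merged = { ...defaults };
  for (const [key, value] of Object.entries(overrides)) {
    merged[key] =
      isPlainObject(value) && isPlainObject(defaults[key])
        ? mergeOptions(defaults[key], value)
        : value;
  }
  return merged;
}

// Shortened copy of the documented defaults.
const defaults = {
  barcode: { multiple: true, tryHard: true },
  qr: true,
  output: { hashes: true, execution: true, side: true, image: false },
};

// Disable the slow barcode mode and ask for the processed image.
const effective = mergeOptions(defaults, {
  barcode: { tryHard: false },
  output: { image: true },
});

console.log(effective.barcode); // { multiple: true, tryHard: false }
```

If the library's merge turns out to be shallow, the safe choice is to pass complete nested objects (for example both `multiple` and `tryHard`) rather than a single key.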

Output Format

The parse function returns a promise that resolves to a comprehensive result object.

```json
{
  "qr": "https://www2.politsei.ee/qr/?qr=AS0002261",
  "barcodes": [
    "38001085718",
    "AS0002261"
  ],
  "mrz": {
    "documentCode": "ID",
    "issuingState": "EST",
    "documentNumber": "AS0002261",
    "documentNumberCheckDigit": "9",
    "optional1": "38001085718",
    "birthDate": "800108",
    "birthDateCheckDigit": "1",
    "sex": "male",
    "expirationDate": "260628",
    "expirationDateCheckDigit": "8",
    "nationality": "EST",
    "optional2": "",
    "compositeCheckDigit": "1",
    "lastName": "JOEORG",
    "firstName": "JAAK KRISTJAN",
    "valid": true,
    "raw": "IDESTAS0002261938001085718<<<<\n8001081M2606288EST<<<<<<<<<<<1\nJOEORG<…"
  },
  "hashes": {
    "image": {
      "md5": "cbbb0d0087ef5d505c049fd937e7aa2e",
      "sha1": "30cddf06e09868dd748e51b48c278f4cad65aeca",
      "crc32": "879451e2"
    },
    "document": {
      "md5": "c99cf2ff79191281b18900df672aa636",
      "sha1": "303e908ec6ca129b236a41d614681cacbec73af4",
      "crc32": "99f6eea4"
    }
  },
  "side": "back",
  "expired": false,
  "execution": {
    "duration": 1381,
    "start": 1763344610408,
    "end": 1763344611789,
    "id": "6d05d12a-04da-4f21-af98-e6813a007c7f"
  },
  "image": {
    "ocr": "/9j/4AAQSkZJRgABAQAAA...",
    "code": "6ery8/T19vf4+fr/2gAM...",
    "mime": "image/jpeg",
    "ext": "jpg"
  }
}
```

* mrz: The parsed MRZ data from the mrz package. null if not found.
* barcodes: An array of detected barcodes from @ericblade/quagga2. null if none found or disabled.
* qr: An array of detected QR codes from @zxing/library. null if none found or disabled.
* hashes: Included if output.hashes is true. Contains hashes of the original image and the raw MRZ string.
* side: Included if output.side is true. "back" if MRZ was found, "front" otherwise.
* image: Included if output.image is true. A base64-encoded string of the processed (rotated/optimized/cropped) image.
* execution: Included if output.execution is true. Contains timing and job ID.
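
The check digit fields in the mrz object follow the ICAO 9303 scheme: each character is mapped to a number (digits keep their value, A to Z map to 10 to 35, "<" counts as 0), multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The sketch below reproduces that calculation for the sample values above. It is a standalone illustration of the standard, not code taken from the library, and `charValue` and `checkDigit` are names chosen for this example.

```javascript
// Map one MRZ character to its numeric value (ICAO 9303).
function charValue(ch) {
  if (ch === '<') return 0;
  if (ch >= '0' && ch <= '9') return ch.charCodeAt(0) - 48; // '0' is code 48
  if (ch >= 'A' && ch <= 'Z') return ch.charCodeAt(0) - 55; // 'A' (65) maps to 10
  throw new Error(`Invalid MRZ character: ${ch}`);
}

// Compute the check digit of an MRZ field with the weights 7, 3, 1.
function checkDigit(field) {
  const weights = [7, 3, 1];
  let sum = 0;
  for (let i = 0; i < field.length; i++) {
    sum += charValue(field[i]) * weights[i % 3];
  }
  return String(sum % 10);
}

console.log(checkDigit('AS0002261')); // "9" (documentNumberCheckDigit)
console.log(checkDigit('800108'));    // "1" (birthDateCheckDigit)
console.log(checkDigit('260628'));    // "8" (expirationDateCheckDigit)
```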

API

parse(image, options)

The primary function exported by the library.

* image: A Buffer containing the raw image data (e.g., from fs.readFileSync).
* options: (Optional) A partial ParseOptions object to override default settings. See Configuration Options for details.

Returns a Promise that resolves to the Output Format object.
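
Because mrz, barcodes and qr can each be null, calling code usually needs a few guards before using the result. The sketch below shows one way to reduce a result object to the fields an application might care about. `summarize` is a hypothetical helper, not part of the library, and it assumes the result has the shape shown in the Output Format section.

```javascript
// Hypothetical helper: condense a parse() result into a small summary.
function summarize(result) {
  if (!result.mrz) {
    // No MRZ: the image was the front side or could not be read.
    return { ok: false, side: result.side ?? null };
  }
  const { documentNumber, firstName, lastName, issuingState, valid } = result.mrz;
  return {
    ok: valid === true,
    side: result.side ?? null,
    documentNumber,
    holder: `${firstName} ${lastName}`,
    issuingState,
    expired: result.expired === true,
    barcodes: result.barcodes ?? [], // null when none found or disabled
  };
}

// Shortened version of the sample output above.
const sample = {
  barcodes: ['38001085718', 'AS0002261'],
  mrz: {
    documentNumber: 'AS0002261',
    issuingState: 'EST',
    lastName: 'JOEORG',
    firstName: 'JAAK KRISTJAN',
    valid: true,
  },
  side: 'back',
  expired: false,
};

console.log(summarize(sample));
console.log(summarize({ mrz: null, barcodes: null, qr: null, side: 'front' }));
```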

Maintainers

* Mattia Bonzi (mattia@mattiabonzi.it)
* TuchSoft (info@tuchsoft.com | tuchsoft.com)

Contributing

We welcome contributions and pull requests!

For questions, bug reports, or feature requests, please open an issue on the project's GitHub repository. You can also contact the maintainers directly.

The barcode and QR code parsing implementation needs to be rewritten. The current implementation does not succeed on every image, and it is slow. For comparison, in Python, parsing all the barcodes on a card (like the Estonian 2020 model, which has 3 barcodes) takes milliseconds, without any need to crop or rotate the image. If you can contribute to this part, any help will be appreciated.

Known limitations

* Document Side Identification: Any image that is not successfully identified as the back of an identity document (i.e., does not contain a Machine-Readable Zone or MRZ) will be marked as the "front" side.
* Barcode and QR Code Reliability: There is no guarantee that all QR codes and barcodes will be read correctly.
* Performance with "Try Hard" Option: When the barcode.tryHard configuration option is enabled, the execution time for parsing becomes significantly slower.
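
One way to limit the cost of tryHard is to attempt a fast pass first and repeat the parse with tryHard only when no barcode was found. The sketch below illustrates that idea with the documented barcode.multiple and barcode.tryHard options. `parseWithFallback` is a hypothetical wrapper, the parse function is injected so the example runs without the library (the stub stands in for it), and the second pass repeats the whole pipeline, so it only pays off when most images succeed on the first pass.

```javascript
// Hypothetical wrapper: fast pass first, slow tryHard pass only if needed.
async function parseWithFallback(parseFn, imageBuffer) {
  const fast = await parseFn(imageBuffer, {
    barcode: { multiple: true, tryHard: false },
  });
  if (fast.barcodes && fast.barcodes.length > 0) {
    return fast;
  }
  // Nothing found: retry with the slow, exhaustive barcode search.
  return parseFn(imageBuffer, {
    barcode: { multiple: true, tryHard: true },
  });
}

// Stub standing in for the library's parse(): it only "finds" a barcode
// when tryHard is enabled, which forces the fallback path.
async function stubParse(imageBuffer, options) {
  return { barcodes: options.barcode.tryHard ? ['AS0002261'] : null };
}

parseWithFallback(stubParse, Buffer.from('')).then((result) => {
  console.log(result.barcodes); // [ 'AS0002261' ]
});
```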

Dependencies

This library relies on Tesseract.js for Optical Character Recognition (OCR), along with libraries for image manipulation (OpenCV), QR code scanning (@zxing/library), barcode scanning (@ericblade/quagga2), and MRZ parsing (mrz), among others. All required dependencies are listed in package.json and should be installed automatically.

This library includes a slightly modified version of Jscanify.

All credit for the Jscanify code goes to the original authors.

License

Licensed under the GPL-3.0-or-later (GNU General Public License v3.0 or later).
