n8n-nodes-azure-document-intelligence

This is an n8n community node that integrates Azure Document Intelligence (formerly Form Recognizer) into your n8n workflows.

Azure Document Intelligence is a cloud-based service that uses machine learning models to extract text, key-value pairs, tables, and structures from documents. It is well suited to automated document processing, form recognition, invoice extraction, and OCR tasks.

n8n is a fair-code licensed workflow automation platform.

Disclaimer

This is an unofficial community node and is not affiliated with, endorsed by, or supported by Microsoft Corporation or n8n GmbH.

Azure, Azure Document Intelligence, Form Recognizer, and related trademarks are property of Microsoft Corporation. Users must comply with Microsoft's Azure AI Services terms and conditions.

This package is provided "as is" under the MIT License without warranty of any kind.

Table of Contents

- Installation
- Features
- Credentials
- Usage
- Supported Models
- Parameters
- Multiple Outputs
- Examples
- Resources
- Version History

Installation

Follow the installation guide in the n8n community nodes documentation.

Install from npm:

```bash
npm install n8n-nodes-azure-document-intelligence
```

Or install manually from source:

```bash
# Clone this repository
git clone https://github.com/mlangcode/n8n-nodes-azure-document-intelligence.git
cd n8n-nodes-azure-document-intelligence

# Install dependencies and build
npm install
npm run build

# Link to your local n8n
npm link
cd ~/.n8n
npm link n8n-nodes-azure-document-intelligence

# Restart n8n
```

Features
✅ Multiple Prebuilt Models: Support for 9 prebuilt models (read, layout, invoice, receipt, ID, business card, etc.)  
✅ Flexible Input: Binary data, URL, or base64-encoded content  
✅ Three Outputs: Separate outputs for content, structured data, and tables  
✅ Markdown Support: Extract documents in markdown or plain text format  
✅ Table Processing: Automatically identifies headers and converts tables to structured data  
✅ Page Selection: Analyze specific pages from multi-page documents  
✅ Locale Support: Specify language hints for better recognition  
✅ Long-Running Operations: Automatic polling for document analysis completion  
✅ Error Handling: Comprehensive error messages and validation  
✅ Binary Data Support: Seamlessly integrate with n8n's binary data field  

Credentials

This node uses Azure Document Intelligence credentials with the following fields:

- Endpoint: Your Azure Document Intelligence endpoint URL (e.g., `https://your-resource.cognitiveservices.azure.com`)
- API Key: Your Azure Document Intelligence subscription key
- API Version: The API version to use (default: `2024-11-30`)

To set up the credentials:

1. In n8n, go to Credentials → New
2. Search for "Azure Document Intelligence"
3. Fill in your endpoint URL and API key
4. Click Save
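
As a rough illustration of how the three credential fields fit together, the sketch below builds the analyze request URL and headers in the way the public Azure Document Intelligence REST API documents them for version `2024-11-30`. The `AdiCredentials` type and the `buildAnalyzeRequest` function are hypothetical names introduced here, and the node's internal implementation may differ.

```typescript
// Hypothetical shape mirroring the three credential fields described above.
interface AdiCredentials {
  endpoint: string;
  apiKey: string;
  apiVersion: string;
}

interface AnalyzeRequest {
  url: string;
  headers: Record<string, string>;
}

// Builds the URL and headers for an analyze call against a prebuilt model.
// The path and header name follow the public REST API documentation;
// whether the node builds its requests exactly this way is an assumption.
function buildAnalyzeRequest(creds: AdiCredentials, modelId: string): AnalyzeRequest {
  // Strip trailing slashes so the path does not contain "//".
  const base = creds.endpoint.replace(/\/+$/, "");
  const url =
    `${base}/documentintelligence/documentModels/${encodeURIComponent(modelId)}:analyze` +
    `?api-version=${encodeURIComponent(creds.apiVersion)}`;
  return {
    url,
    headers: {
      "Ocp-Apim-Subscription-Key": creds.apiKey,
      // JSON applies to URL or base64 input; a binary upload would send
      // the file's own MIME type instead.
      "Content-Type": "application/json",
    },
  };
}

const exampleRequest = buildAnalyzeRequest(
  {
    endpoint: "https://your-resource.cognitiveservices.azure.com/",
    apiKey: "<your-api-key>",
    apiVersion: "2024-11-30",
  },
  "prebuilt-layout",
);
console.log(exampleRequest.url);
```

Seeing the final URL can help when checking an endpoint value that fails to authenticate.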

Usage

1. Add the "Azure Document Intelligence" node to your workflow
2. Configure your Azure Document Intelligence credentials
3. Select the appropriate prebuilt model for your document type
4. Choose input source (binary data, URL, or base64)
5. Configure additional options as needed

The node subtitle displays the selected model for easy identification.

Supported Models

The node supports the following prebuilt models:

- Read (OCR): Basic optical character recognition for extracting printed and handwritten text
- Layout: Extract text, tables, selection marks, and document structure
- General Document: Extract key-value pairs, entities, and general structure from any document type
- Invoice: Extract vendor name, invoice date, total, line items, and other invoice fields
- Receipt: Extract merchant name, transaction date, total, and line items from receipts
- ID Document: Extract information from passports, driver's licenses, and identity cards
- Business Card: Extract contact information including names, companies, emails, and phone numbers
- Health Insurance Card (US): Extract member information, group numbers, and insurance details
- W-2 Tax Form (US): Extract employer information, wages, and tax withholding data

Parameters

- Model: Select the prebuilt model appropriate for your document type
- Input Source: Choose how to provide the document:
  - Binary Data: Use document from a previous node's binary field
  - URL: Provide a public URL to the document
  - Base64: Provide base64-encoded document content

#### Binary Data

- Binary Property: Name of the binary property (default: `data`)

#### URL

- Document URL: Public URL to the document

#### Base64

- Base64 Content: Base64-encoded string of the document

Additional Options

- Content Type: Specify the document MIME type (PDF, JPEG, PNG, TIFF, BMP, HEIF)
- Output Content Format: Choose between `text` or `markdown` for extracted content (for read/layout models)
- Pages: Specify which pages to analyze (e.g., `1-3,5` or `1,3,5-7`)
- Locale: Language hint for text recognition (e.g., `en-US`, `de-DE`, `fr-FR`)
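
The page selection format (`1-3,5` or `1,3,5-7`) can be checked before a document is sent for analysis. The following is a minimal sketch of such a check. `parsePages` is a hypothetical helper, not part of the node, and it assumes page numbers start at 1.

```typescript
// Expands a page selection such as "1-3,5" into sorted, de-duplicated page numbers.
// Throws on malformed input so that problems surface before any request is made.
function parsePages(spec: string): number[] {
  const pages: number[] = [];
  for (const part of spec.split(",")) {
    const token = part.trim();
    // Accept either a single page ("5") or a range ("1-3").
    const match = /^(\d+)(?:-(\d+))?$/.exec(token);
    if (!match) {
      throw new Error(`Invalid page selection: "${token}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range: "${token}"`);
    }
    for (let page = start; page <= end; page++) {
      if (pages.indexOf(page) === -1) {
        pages.push(page);
      }
    }
  }
  return pages.sort((a, b) => a - b);
}

console.log(parsePages("1-3,5"));   // [1, 2, 3, 5]
console.log(parsePages("1,3,5-7")); // [1, 3, 5, 6, 7]
```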

Multiple Outputs

The node has three outputs for flexible workflow routing:

Content Output

Contains: Raw text or markdown content extracted from the document

```json
{
  "content": "# Invoice\n\nVendor: Acme Corp...",
  "contentLength": 1234,
  "model": "prebuilt-layout"
}
```

Use this for:

- Text extraction and OCR workflows
- Full document content for further processing
- Feeding to LLMs or text analysis nodes

Structured Data Output

Contains: Extracted fields, key-value pairs, and structured information

```json
{
  "model": "prebuilt-invoice",
  "pageCount": 2,
  "documents": [
    {
      "docType": "invoice",
      "fields": {
        "VendorName": { "content": "Acme Corp", "confidence": 0.99 },
        "InvoiceTotal": { "content": "1,234.56", "confidence": 0.98 },
        "InvoiceDate": { "content": "2024-01-15", "confidence": 0.97 }
      }
    }
  ],
  "pages": [...]
}
```

Use this for:

- Extracting specific fields (invoice data, receipt information)
- Key-value pair extraction
- Document field validation and processing
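
For field validation, a downstream step might keep only the fields whose confidence meets a threshold. The sketch below assumes the structured data has the shape of the invoice sample above. The type names and `pickConfidentFields` are hypothetical, and the logic would need adapting to run inside an n8n Code node.

```typescript
// Types modelled on the structured data sample; real output may carry more properties.
interface ExtractedField {
  content: string;
  confidence: number;
}

interface StructuredOutput {
  documents: { docType: string; fields: Record<string, ExtractedField> }[];
}

// Returns a field-name to content map for the first document,
// keeping only fields at or above the given confidence.
function pickConfidentFields(
  output: StructuredOutput,
  minConfidence: number,
): Record<string, string> {
  const result: Record<string, string> = {};
  const firstDocument = output.documents[0];
  if (!firstDocument) {
    return result;
  }
  for (const key of Object.keys(firstDocument.fields)) {
    const field = firstDocument.fields[key];
    if (field && field.confidence >= minConfidence) {
      result[key] = field.content;
    }
  }
  return result;
}

const sampleOutput: StructuredOutput = {
  documents: [
    {
      docType: "invoice",
      fields: {
        VendorName: { content: "Acme Corp", confidence: 0.99 },
        InvoiceTotal: { content: "1,234.56", confidence: 0.98 },
        InvoiceDate: { content: "2024-01-15", confidence: 0.97 },
      },
    },
  ],
};

// With a 0.98 threshold, InvoiceDate (0.97) is dropped.
console.log(pickConfidentFields(sampleOutput, 0.98));
```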

Tables Output

Contains: Processed tables with identified headers and structured row data

```json
{
  "tableCount": 2,
  "tables": [
    {
      "headers": ["Item", "Quantity", "Price", "Total"],
      "dataRows": [
        { "Item": "Widget A", "Quantity": "5", "Price": "$10.00", "Total": "$50.00" },
        { "Item": "Widget B", "Quantity": "3", "Price": "$15.00", "Total": "$45.00" }
      ]
    }
  ],
  "model": "prebuilt-layout"
}
```

Use this for:

- Extracting tabular data from documents
- Processing invoice line items
- Converting document tables to structured data for databases
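
Because each table arrives as `headers` plus `dataRows`, turning it into CSV for a spreadsheet or database import takes only a few lines. This is a minimal sketch assuming the table shape shown in the sample above. `ProcessedTable`, `csvEscape`, and `tableToCsv` are hypothetical names.

```typescript
// Shape of one entry in the "tables" array, based on the sample above.
interface ProcessedTable {
  headers: string[];
  dataRows: Record<string, string>[];
}

// Quotes a value when it contains a comma, a double quote, or a line break.
function csvEscape(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Emits a header line followed by one line per data row,
// using the header order to pick each cell.
function tableToCsv(table: ProcessedTable): string {
  const lines = [table.headers.map(csvEscape).join(",")];
  for (const row of table.dataRows) {
    lines.push(table.headers.map((header) => csvEscape(row[header] ?? "")).join(","));
  }
  return lines.join("\n");
}

const sampleTable: ProcessedTable = {
  headers: ["Item", "Quantity", "Price", "Total"],
  dataRows: [
    { Item: "Widget A", Quantity: "5", Price: "$10.00", Total: "$50.00" },
    { Item: "Widget B", Quantity: "3", Price: "$15.00", Total: "$45.00" },
  ],
};

console.log(tableToCsv(sampleTable));
```

Cells missing from a row are written as empty strings, so every line keeps the same number of columns.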

Error Handling

When errors occur and "Continue on Fail" is enabled:

- Error details are sent to all three outputs
- The details include HTTP status codes and error messages
- The workflow continues instead of stopping

Examples

Workflow:

```
HTTP Request (download PDF)
→ Azure Document Intelligence
    Model: Read (OCR)
    Input Source: Binary Data
    Binary Property: data
→ [Content Output]
→ Process extracted text
```

---

Workflow:

```
HTTP Request (get invoice PDF)
→ Azure Document Intelligence
    Model: Invoice
    Input Source: Binary Data
→ [Structured Data Output]
→ Code Node: Extract $.documents[0].fields
→ Store in database
```

Extracted Fields:

- VendorName
- CustomerName
- InvoiceDate
- InvoiceTotal
- DueDate
- Line items

---

Workflow:

```
Read Binary File (read document)
→ Azure Document Intelligence
    Model: Layout
    Input Source: Binary Data
    Output Format: Markdown
→ [Tables Output]
→ Code Node: Process table rows
→ Send to Google Sheets
```

---

Workflow:

```
Azure Document Intelligence
    Model: Read (OCR)
    Input Source: URL
    Document URL: https://example.com/document.pdf
→ [Content Output]
→ Send extracted text to analysis
```

---

Workflow:

```
Webhook (receive uploaded image)
→ Azure Document Intelligence
    Model: Business Card
    Input Source: Binary Data
→ [Structured Data Output]
→ Extract contact fields:
    - Name
    - Company
    - Email
    - Phone
→ Add to CRM
```

---

Workflow:

```
Azure Document Intelligence
    Model: Layout
    Input Source: Binary Data
    Pages: 1-5,10
    Output Format: Markdown
→ Process only specified pages
```

---

Workflow:

```
Email Trigger (receipt attachments)
→ Azure Document Intelligence
    Model: Receipt
    Input Source: Binary Data
→ [Structured Data Output]
→ Extract:
    - MerchantName
    - TransactionDate
    - Total
    - Items
→ Log to expense tracking system
```

---

Resources

- n8n community nodes documentation
- Azure Document Intelligence documentation
- Azure Document Intelligence models
- Azure AI Services

Compatibility

- Requires n8n version 1.60.0 or later
- Compatible with Azure Document Intelligence API version 2024-11-30 (GA)
- Supports all Azure Document Intelligence prebuilt models

Supported Document Types

- PDF (`application/pdf`)
- JPEG (`image/jpeg`)
- PNG (`image/png`)
- TIFF (`image/tiff`)
- BMP (`image/bmp`)
- HEIF (`image/heif`)

Troubleshooting

- Verify your API key is correct
- Ensure the endpoint URL is correct and includes `https://`

- Check that the model name is spelled correctly
- Verify your Azure region supports the selected prebuilt model

- Ensure the previous node outputs binary data
- Verify the binary property name matches (default: "data")

- The document may be corrupted or in an unsupported format
- Try converting the document to PDF first

- Document analysis can take 10-60 seconds depending on document size
- Multi-page documents take longer to process
- The node automatically polls until completion
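
To give a sense of what automatic polling involves, here is a minimal sketch of a poll loop. It assumes the status values documented for the public REST API (`notStarted`, `running`, `succeeded`, `failed`) and takes the status check as a function so that it runs without a network call. `pollUntilDone`, `PollOptions`, and the fake status source are hypothetical, and the node's own polling logic may differ.

```typescript
type OperationStatus = "notStarted" | "running" | "succeeded" | "failed";

interface PollOptions {
  intervalMs: number;  // wait between status checks
  maxAttempts: number; // give up after this many checks
}

// Calls checkStatus repeatedly until the operation reaches a final state
// or the attempt budget is used up.
async function pollUntilDone(
  checkStatus: () => Promise<OperationStatus>,
  options: PollOptions,
): Promise<OperationStatus> {
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    const current = await checkStatus();
    if (current === "succeeded" || current === "failed") {
      return current;
    }
    // Not finished yet: wait before asking again.
    await new Promise<void>((resolve) => setTimeout(resolve, options.intervalMs));
  }
  throw new Error(`Analysis did not finish after ${options.maxAttempts} attempts`);
}

// Fake status source standing in for a GET request on the operation URL.
const fakeStatuses: OperationStatus[] = ["notStarted", "running", "succeeded"];
let fakeCallCount = 0;
const fakeCheck = async (): Promise<OperationStatus> => {
  const index = Math.min(fakeCallCount, fakeStatuses.length - 1);
  fakeCallCount++;
  return fakeStatuses[index];
};

pollUntilDone(fakeCheck, { intervalMs: 10, maxAttempts: 5 }).then((finalStatus) => {
  console.log(finalStatus);
});
```

Capping the number of attempts keeps a stuck analysis from blocking the workflow indefinitely.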

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT

Version History

- Initial release with Azure Document Intelligence support
- Support for 9 prebuilt models (read, layout, document, invoice, receipt, ID, business card, health insurance, W-2)
- Three outputs: Content, Structured Data, and Tables
- Flexible input methods: Binary data, URL, and base64
- Automatic table processing with header identification
- Markdown and text output formats
- Page selection and locale support
- Long-running operation polling
- Comprehensive error handling

Author

mlangcode

Support

For issues, questions, or contributions, please visit the GitHub repository.
