# n8n-nodes-vector-store-processor

An n8n node for intelligent document chunking and processing for vector store ingestion, with a Smart Qdrant Vector Store supporting Ollama and OpenAI embeddings.
This is an n8n community node that intelligently processes and chunks documents for vector store ingestion, with enhanced structure analysis and markdown support. Perfect for RAG (Retrieval-Augmented Generation) workflows and AI applications.
n8n is a fair-code licensed workflow automation platform.
## Features

- **Intelligent Document Chunking**: Splits documents into semantically meaningful chunks optimized for vector embeddings
- **Markdown Support**: Parses markdown headings, lists, and structure for better organization
- **Structure Analysis**: Automatically detects chapters, sections, and content hierarchy
- **Flexible Processing Modes**:
  - Run once for all items (combine multiple documents into one knowledge base)
  - Run once for each item (process documents separately)
- **Rich Metadata**: Includes document title, chapter, section, content type, chunk indices, and more
- **ASCII Sanitization**: Ensures namespace compatibility with all vector stores
- **Binary File Support**: Process text from binary files (.txt, .md, .pdf) or text fields
- **Configurable Chunk Size**: Control the maximum size of text chunks for optimal embedding
- **Global Chunk Indexing**: Maintains sequential chunk numbering across entire documents
- **Content Type Classification**: Automatically categorizes content (examples, basics, advanced, etc.)
## Installation

Follow the installation guide in the [n8n community nodes documentation](https://docs.n8n.io/integrations/community-nodes/installation/):
1. Go to **Settings > Community Nodes**
2. Select **Install**
3. Enter `n8n-nodes-vector-store-processor` in **Enter npm package name**
4. Agree to the risks and select **Install**
To install manually, navigate to your n8n installation directory and run:

```bash
npm install n8n-nodes-vector-store-processor
```
**IMPORTANT:** For optimal memory management when using the Smart Qdrant Vector Store node with large documents, you must start n8n with the `--expose-gc` flag to enable garbage collection.

For a systemd service (recommended):

```bash
sudo systemctl edit n8n
```

Add this line under `[Service]`:

```ini
Environment="NODE_OPTIONS=--expose-gc"
```
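If you run n8n directly rather than as a systemd service, the same flag can be passed on the command line (a minimal sketch; adapt it to however you launch n8n, e.g. pm2 or Docker):

```bash
# Direct launch with garbage collection exposed (adjust to your setup)
NODE_OPTIONS="--expose-gc" n8n start
```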
**Why is this needed?**
- The Smart Qdrant Vector Store processes documents in batches and triggers garbage collection after each batch
- This prevents memory buildup when processing large documents or many documents
- Without `--expose-gc`, memory is still managed by Node.js, but less efficiently
- The "Clear Memory" option in the node works best with this flag enabled
## Operations

The Vector Store Processor node provides the following configuration options:
### Processing Modes
- **Run Once for All Items**: Combines all input items into a single document before processing
- **Run Once for Each Item**: Processes each input item as a separate document

### Input Types
- **Text Field**: Process text from a JSON field
- **Binary File**: Process text from a binary file (supports .txt, .md, .pdf text extraction, etc.)

### Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| Mode | Options | Run Once for Each Item | Processing mode |
| Input Type | Options | Text Field | Source of text data |
| Text Field | String | text | Name of the field containing text (when Input Type is Text Field) |
| Binary Property | String | data | Name of the binary property (when Input Type is Binary File) |
| Document Title | String | (auto-detect) | Override the document title |
| Chunk Size | Number | 2000 | Maximum characters per chunk |
| Namespace | String | (auto-generate) | Namespace for vector store organization |
| Parse Markdown | Boolean | true | Enable markdown structure parsing |
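For example, with the defaults (**Input Type**: Text Field, **Text Field**: `text`), an incoming item only needs a `text` property; the optional `title` field shown here is one of the metadata fields the node can read for title detection:

```json
{
  "text": "# My Document\n\n## Introduction\n\nSome content to be chunked...",
  "title": "My Document"
}
```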
## Usage
### Process a Single Document
1. Add a "Read Binary File" node or "HTTP Request" node to get your document
2. Add the "Vector Store Processor" node
3. Configure:
   - **Mode**: Run Once for Each Item
   - **Input Type**: Binary File (or Text Field if you have text)
   - **Parse Markdown**: true
   - **Chunk Size**: 2000
4. Connect to a Vector Store node (Pinecone, Qdrant, Supabase Vector, etc.)
5. Connect to an embeddings node (OpenAI Embeddings, etc.)
### Combine Multiple Documents
1. Add a node that outputs multiple items (e.g., "Read Files From Folder")
2. Add the "Vector Store Processor" node
3. Configure:
   - **Mode**: Run Once for All Items
   - **Input Type**: Binary File
   - **Chunk Size**: 2000
4. All documents will be combined and chunked together as one knowledge base
5. Connect to your vector store for ingestion
## Output
Each chunk produces an output item with:
```json
{
  "pageContent": "This is the actual text content of the chunk...",
  "metadata": {
    "document_title": "My Document",
    "chapter": "Introduction",
    "section": "Getting Started",
    "content_type": "overview",
    "chunk_index": 0,
    "local_chunk_index": 0,
    "chapter_index": 0,
    "total_chunks": 15,
    "namespace": "my-document",
    "source_file": "document.md",
    "character_count": 1850,
    "processing_timestamp": "2025-01-15T10:30:00.000Z"
  },
  "document_title": "My Document",
  "document_title_clean": "my-document",
  "chapter": "Introduction",
  "section": "Getting Started",
  "chapter_clean": "introduction",
  "section_clean": "getting-started",
  "namespace": "my-document"
}
```
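The `*_clean` fields are ASCII-sanitized slugs suitable for namespaces. A minimal sketch of this kind of sanitization, inferred from the output above rather than taken from the node's source:

```typescript
// Hypothetical slug helper: lowercase, strip accents/non-ASCII, hyphenate.
function toCleanSlug(value: string): string {
  return value
    .normalize('NFKD')                // split accented characters into base + mark
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')      // collapse non-alphanumerics to hyphens
    .replace(/^-+|-+$/g, '');         // trim leading/trailing hyphens
}

// toCleanSlug("My Document")     -> "my-document"
// toCleanSlug("Getting Started") -> "getting-started"
```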
## Markdown Support

When **Parse Markdown** is enabled, the node recognizes:
- **Headings**: `#`, `##`, `###`, etc., for chapter and section detection

## How It Works

1. **Title Extraction**: Automatically detects the document title from:
   - Metadata fields (`title`, `info.Title`, `metadata['dc:title']`)
   - File name
   - First heading in markdown
   - First meaningful line of text
2. **Structure Analysis**:
   - Detects chapters (H1, H2 headings or specific patterns)
   - Identifies sections (H3-H6 headings or subsection patterns)
   - Classifies content type (examples, basics, advanced, etc.)
3. **Intelligent Chunking** (see the sketch after this list):
   - Splits by paragraphs first
   - Falls back to sentence splitting for long paragraphs
   - Respects chunk size limits
   - Filters out very short chunks
4. **Metadata Enrichment**:
   - Global chunk indexing across the entire document
   - Local chunk indexing within sections
   - Content type classification
   - Timestamp and source tracking
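A simplified sketch of the paragraph-first strategy described in step 3. It is illustrative only; the node's actual splitting rules may differ, and `minChunkLength` is an assumed parameter name:

```typescript
// Paragraphs first, sentence fallback for long paragraphs,
// respect maxChunkSize, drop very short fragments.
function chunkText(text: string, maxChunkSize = 2000, minChunkLength = 50): string[] {
  const pieces = text
    .split(/\n\s*\n/) // paragraph boundaries
    .flatMap(p =>
      p.length > maxChunkSize
        ? p.split(/(?<=[.!?])\s+/) // sentence fallback
        : [p],
    );

  const chunks: string[] = [];
  let current = '';
  for (const piece of pieces) {
    // Flush the current chunk before it would exceed the limit.
    if (current && current.length + piece.length + 2 > maxChunkSize) {
      chunks.push(current.trim());
      current = '';
    }
    current += (current ? '\n\n' : '') + piece;
  }
  if (current) chunks.push(current.trim());

  return chunks.filter(c => c.length >= minChunkLength); // drop very short chunks
}
```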
## Compatibility

- Tested with n8n version 1.0.0+
- Works with all vector store nodes (Pinecone, Qdrant, Supabase, etc.)
- Compatible with LangChain nodes
## Resources

- [n8n community nodes documentation](https://docs.n8n.io/integrations/community-nodes/)
- [n8n documentation](https://docs.n8n.io/)
- [n8n community forum](https://community.n8n.io/)