# n8n-nodes-vector-store-processor

An n8n node for intelligent document chunking and processing for vector store ingestion, with a Smart Qdrant Vector Store supporting Ollama and OpenAI embeddings.
This is an n8n community node that intelligently processes and chunks documents for vector store ingestion, with enhanced structure analysis and markdown support. Perfect for RAG (Retrieval-Augmented Generation) workflows and AI applications.
n8n is a fair-code licensed workflow automation platform.
## Features

- **Intelligent Document Chunking**: Splits documents into semantically meaningful chunks optimized for vector embeddings
- **Markdown Support**: Parses markdown headings, lists, and structure for better organization
- **Structure Analysis**: Automatically detects chapters, sections, and content hierarchy
- **Flexible Processing Modes**:
  - Run once for all items (combine multiple documents into one knowledge base)
  - Run once for each item (process documents separately)
- **Rich Metadata**: Includes document title, chapter, section, content type, chunk indices, and more
- **ASCII Sanitization**: Ensures namespace compatibility with all vector stores
- **Binary File Support**: Process text from binary files (.txt, .md, .pdf) or text fields
- **Configurable Chunk Size**: Control the maximum size of text chunks for optimal embedding
- **Global Chunk Indexing**: Maintains sequential chunk numbering across entire documents
- **Content Type Classification**: Automatically categorizes content (examples, basics, advanced, etc.)
## Installation

Follow the installation guide in the [n8n community nodes documentation](https://docs.n8n.io/integrations/community-nodes/installation/):
1. Go to **Settings > Community Nodes**
2. Select **Install**
3. Enter `n8n-nodes-vector-store-processor` in **Enter npm package name**
4. Agree to the risks and select **Install**
To install manually, navigate to your n8n installation directory and run:

```bash
npm install n8n-nodes-vector-store-processor
```
**IMPORTANT:** For optimal memory management when using the Smart Qdrant Vector Store node with large documents, you must start n8n with the `--expose-gc` flag to enable garbage collection.

For a systemd service (recommended):

```bash
sudo systemctl edit n8n
```

Add this line under `[Service]`:

```ini
Environment="NODE_OPTIONS=--expose-gc"
```
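If you run n8n directly rather than as a systemd service, the same flag can be passed on the command line (a minimal sketch; adapt it to however you launch n8n, e.g. pm2 or Docker):

```bash
# Direct launch with garbage collection exposed (adjust to your setup)
NODE_OPTIONS="--expose-gc" n8n start
```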
**Why is this needed?**
- The Smart Qdrant Vector Store processes documents in batches and triggers garbage collection after each batch
- This prevents memory buildup when processing large documents or many documents
- Without `--expose-gc`, memory is still managed by Node.js, but less efficiently
- The "Clear Memory" option in the node works best with this flag enabled
## Operations

The Vector Store Processor node provides the following configuration options:
### Processing Modes
- **Run Once for All Items**: Combines all input items into a single document before processing
- **Run Once for Each Item**: Processes each input item as a separate document

### Input Types
- **Text Field**: Process text from a JSON field
- **Binary File**: Process text from a binary file (supports .txt, .md, .pdf text extraction, etc.)

### Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| Mode | Options | Run Once for Each Item | Processing mode |
| Input Type | Options | Text Field | Source of text data |
| Text Field | String | text | Name of the field containing text (when Input Type is Text Field) |
| Binary Property | String | data | Name of the binary property (when Input Type is Binary File) |
| Document Title | String | (auto-detect) | Override the document title |
| Chunk Size | Number | 2000 | Maximum characters per chunk |
| Namespace | String | (auto-generate) | Namespace for vector store organization |
| Parse Markdown | Boolean | true | Enable markdown structure parsing |
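For example, with the defaults (**Input Type**: Text Field, **Text Field**: `text`), an incoming item only needs a `text` property; the optional `title` field shown here is one of the metadata fields the node can read for title detection:

```json
{
  "text": "# My Document\n\n## Introduction\n\nSome content to be chunked...",
  "title": "My Document"
}
```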
## Usage
### Process a Single Document
1. Add a "Read Binary File" node or "HTTP Request" node to get your document
2. Add the "Vector Store Processor" node
3. Configure:
   - **Mode**: Run Once for Each Item
   - **Input Type**: Binary File (or Text Field if you have text)
   - **Parse Markdown**: true
   - **Chunk Size**: 2000
4. Connect to a Vector Store node (Pinecone, Qdrant, Supabase Vector, etc.)
5. Connect to an embeddings node (OpenAI Embeddings, etc.)
### Combine Multiple Documents
1. Add a node that outputs multiple items (e.g., "Read Files From Folder")
2. Add the "Vector Store Processor" node
3. Configure:
   - **Mode**: Run Once for All Items
   - **Input Type**: Binary File
   - **Chunk Size**: 2000
4. All documents will be combined and chunked together as one knowledge base
5. Connect to your vector store for ingestion
## Output
Each chunk produces an output item with:
```json
{
  "pageContent": "This is the actual text content of the chunk...",
  "metadata": {
    "document_title": "My Document",
    "chapter": "Introduction",
    "section": "Getting Started",
    "content_type": "overview",
    "chunk_index": 0,
    "local_chunk_index": 0,
    "chapter_index": 0,
    "total_chunks": 15,
    "namespace": "my-document",
    "source_file": "document.md",
    "character_count": 1850,
    "processing_timestamp": "2025-01-15T10:30:00.000Z"
  },
  "document_title": "My Document",
  "document_title_clean": "my-document",
  "chapter": "Introduction",
  "section": "Getting Started",
  "chapter_clean": "introduction",
  "section_clean": "getting-started",
  "namespace": "my-document"
}
```
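The `*_clean` fields are ASCII-sanitized slugs suitable for namespaces. A minimal sketch of this kind of sanitization, inferred from the output above rather than taken from the node's source:

```typescript
// Hypothetical slug helper: lowercase, strip accents/non-ASCII, hyphenate.
function toCleanSlug(value: string): string {
  return value
    .normalize('NFKD')                // split accented characters into base + mark
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')      // collapse non-alphanumerics to hyphens
    .replace(/^-+|-+$/g, '');         // trim leading/trailing hyphens
}

// toCleanSlug("My Document")     -> "my-document"
// toCleanSlug("Getting Started") -> "getting-started"
```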
## Markdown Support

When **Parse Markdown** is enabled, the node recognizes:
- **Headings**: `#`, `##`, `###`, etc., for chapter and section detection

## How It Works

1. **Title Extraction**: Automatically detects the document title from:
   - Metadata fields (`title`, `info.Title`, `metadata['dc:title']`)
   - File name
   - First heading in markdown
   - First meaningful line of text
2. **Structure Analysis**:
   - Detects chapters (H1, H2 headings or specific patterns)
   - Identifies sections (H3-H6 headings or subsection patterns)
   - Classifies content type (examples, basics, advanced, etc.)
3. **Intelligent Chunking** (see the sketch after this list):
   - Splits by paragraphs first
   - Falls back to sentence splitting for long paragraphs
   - Respects chunk size limits
   - Filters out very short chunks
4. **Metadata Enrichment**:
   - Global chunk indexing across the entire document
   - Local chunk indexing within sections
   - Content type classification
   - Timestamp and source tracking
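A simplified sketch of the paragraph-first strategy described in step 3. It is illustrative only; the node's actual splitting rules may differ, and `minChunkLength` is an assumed parameter name:

```typescript
// Paragraphs first, sentence fallback for long paragraphs,
// respect maxChunkSize, drop very short fragments.
function chunkText(text: string, maxChunkSize = 2000, minChunkLength = 50): string[] {
  const pieces = text
    .split(/\n\s*\n/) // paragraph boundaries
    .flatMap(p =>
      p.length > maxChunkSize
        ? p.split(/(?<=[.!?])\s+/) // sentence fallback
        : [p],
    );

  const chunks: string[] = [];
  let current = '';
  for (const piece of pieces) {
    // Flush the current chunk before it would exceed the limit.
    if (current && current.length + piece.length + 2 > maxChunkSize) {
      chunks.push(current.trim());
      current = '';
    }
    current += (current ? '\n\n' : '') + piece;
  }
  if (current) chunks.push(current.trim());

  return chunks.filter(c => c.length >= minChunkLength); // drop very short chunks
}
```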
## Compatibility

- Tested with n8n version 1.0.0+
- Works with all vector store nodes (Pinecone, Qdrant, Supabase, etc.)
- Compatible with LangChain nodes
## Resources

- [n8n community nodes documentation](https://docs.n8n.io/integrations/community-nodes/)
- [n8n documentation](https://docs.n8n.io/)
- [n8n community forum](https://community.n8n.io/)