A complete Retrieval-Augmented Generation (RAG) system using pgvector, LangChain, and LangGraph for Node.js applications, with dynamic embedding and model providers, structured data queries, and chat history. Supports OpenAI, Anthropic, HuggingFace, Azure, and Google AI.
## 📦 Installation

`bash
npm install rag-system-pgvector
`
Choose your AI provider (one or more):

`bash
npm install @langchain/openai # For OpenAI
npm install @langchain/anthropic # For Anthropic Claude
npm install @langchain/azure-openai # For Azure OpenAI
npm install @langchain/google-genai # For Google AI
npm install @langchain/community # For HuggingFace, Ollama, etc.
`
## 🚀 Quick Start
### Basic Usage with OpenAI
`javascript
import { RAGSystem } from 'rag-system-pgvector';
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';
// Create provider instances
const embeddings = new OpenAIEmbeddings({
openAIApiKey: 'your-openai-api-key',
modelName: 'text-embedding-ada-002',
});
const llm = new ChatOpenAI({
openAIApiKey: 'your-openai-api-key',
modelName: 'gpt-4',
temperature: 0.7,
});
// Initialize RAG system
const rag = new RAGSystem({
database: {
host: 'localhost',
database: 'your_db',
username: 'postgres',
password: 'your_password'
},
embeddings: embeddings,
llm: llm,
embeddingDimensions: 1536,
});
await rag.initialize();
// Add documents and query
await rag.addDocuments(['./docs/file1.pdf', './docs/file2.txt']);
// Simple query
const result = await rag.query("What is the main topic?");
console.log(result.answer);
// Query with structured data for precise responses
const structuredResult = await rag.query("Tell me about iPhone features", {
structuredData: {
intent: "product_information",
entities: { product: "iPhone", category: "smartphone" },
constraints: ["Focus on latest features", "Include specifications"],
responseFormat: "structured_list"
}
});
console.log(structuredResult.answer);
`
### Mixing Providers (OpenAI Embeddings + Anthropic Chat)
`javascript
import { RAGSystem } from 'rag-system-pgvector';
import { OpenAIEmbeddings } from '@langchain/openai';
import { ChatAnthropic } from '@langchain/anthropic';
// Use OpenAI for embeddings, Anthropic for chat
const embeddings = new OpenAIEmbeddings({
openAIApiKey: 'your-openai-api-key',
modelName: 'text-embedding-ada-002',
});
const llm = new ChatAnthropic({
anthropicApiKey: 'your-anthropic-api-key',
modelName: 'claude-3-haiku-20240307',
temperature: 0.7,
});
const rag = new RAGSystem({
database: { /* your config */ },
embeddings: embeddings,
llm: llm,
embeddingDimensions: 1536,
});
`
### Local Models (HuggingFace Embeddings + Ollama)
`javascript
import { RAGSystem } from 'rag-system-pgvector';
import { HuggingFaceTransformersEmbeddings } from '@langchain/community/embeddings/hf_transformers';
import { Ollama } from '@langchain/community/llms/ollama';
// Use local models (no API keys required)
const embeddings = new HuggingFaceTransformersEmbeddings({
modelName: 'sentence-transformers/all-MiniLM-L6-v2',
});
const llm = new Ollama({
baseUrl: 'http://localhost:11434',
model: 'llama2',
});
const rag = new RAGSystem({
database: { /* your config */ },
embeddings: embeddings,
llm: llm,
embeddingDimensions: 384, // all-MiniLM-L6-v2 dimensions
});
`
### Processing Documents from Buffers
`javascript
import fs from 'fs';
import { DocumentProcessor } from 'rag-system-pgvector/utils';
const processor = new DocumentProcessor();
// Process document from Buffer
const buffer = fs.readFileSync('document.pdf');
const result = await processor.processDocumentFromBuffer(
buffer,
'document.pdf',
'pdf',
{ source: 'api-upload', category: 'research' }
);
console.log(result.chunks); // Processed chunks with embeddings
`
### Processing Documents from URLs
`javascript
import { DocumentProcessor } from 'rag-system-pgvector/utils';
const processor = new DocumentProcessor();
// Process single URL
const result = await processor.processDocumentFromUrl(
'https://example.com/document.pdf',
{ source: 'web-crawl', priority: 'high' }
);
// Process multiple URLs
const urls = [
'https://example.com/doc1.pdf',
'https://example.com/doc2.html',
'https://example.com/doc3.md'
];
const results = await processor.processDocumentsFromUrls(urls, {
source: 'batch-import',
maxConcurrent: 3
});
console.log(`Processed ${results.successful.length} documents`);
`
## 🎯 Structured Data Queries (New in v2.2.0)
The RAG system now supports structured JSON data alongside natural language queries for more precise and contextual responses.
### Product Information Query
`javascript
const result = await rag.query("Tell me about iPhone features", {
structuredData: {
intent: "product_information",
entities: {
product: "iPhone",
category: "smartphone",
brand: "Apple"
},
constraints: [
"Focus on latest model features",
"Include technical specifications"
],
context: {
userType: "potential_buyer",
priceRange: "premium"
},
responseFormat: "structured_list"
}
});
`
### Troubleshooting Query
`javascript
const result = await rag.query("My device won't connect to WiFi", {
structuredData: {
intent: "troubleshooting",
entities: {
issue_type: "connectivity",
device_category: "mobile",
problem_area: "wifi"
},
constraints: [
"Provide step-by-step solution",
"Include alternative methods"
],
responseFormat: "step_by_step_guide"
}
});
`
### Comparison Query
`javascript
const result = await rag.query("Compare iPhone vs Samsung Galaxy", {
structuredData: {
intent: "comparison",
entities: {
item1: "iPhone",
item2: "Samsung Galaxy"
},
constraints: [
"Compare key specifications",
"Highlight main differences"
],
responseFormat: "comparison_table"
}
});
`
### Follow-up Questions with Chat History
`javascript
const result = await rag.query("What about the camera quality?", {
chatHistory: [
{ role: 'user', content: 'Tell me about iPhone features' },
{ role: 'assistant', content: 'The iPhone offers excellent features...' }
],
structuredData: {
intent: "follow_up_question",
entities: {
topic: "camera",
context_reference: "previous_iphone_discussion"
},
responseFormat: "detailed_explanation"
}
});
`
### StructuredData Interface
`typescript
interface StructuredData {
intent: string; // Query intent/category (required)
entities?: { // Named entities and values
[key: string]: string | number;
};
constraints?: string[]; // Requirements/constraints
context?: { // Additional context
[key: string]: string | number | boolean;
};
responseFormat?: string; // Desired response format
}
`
### Common Intent Types
- `product_information`: Product details and specifications
- `troubleshooting`: Problem-solving and technical support
- `comparison`: Comparing multiple items
- `how_to_guide`: Step-by-step instructions (see the example below)
- `explanation`: Detailed explanations
- `follow_up_question`: Context-aware follow-ups
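A `how_to_guide` query follows the same pattern as the examples above; here is a minimal sketch (the query text and constraint are illustrative):

`javascript
// Illustrative how_to_guide query; the intent and responseFormat
// values come from the lists in this section.
const guide = await rag.query("How do I set up the device?", {
  structuredData: {
    intent: "how_to_guide",
    constraints: ["Keep steps short"],
    responseFormat: "step_by_step_guide"
  }
});
console.log(guide.answer);
`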
### Supported Response Formats
- `structured_list`: Organized bullet points
- `step_by_step_guide`: Numbered instructions
- `comparison_table`: Side-by-side comparison
- `detailed_explanation`: Comprehensive explanation
- `bullet_points`: Simple bullet format
- `json_format`: Structured JSON response
### Document Filtering by User and Knowledgebot
`javascript
import { RAGSystem } from 'rag-system-pgvector';
import { DocumentProcessor } from 'rag-system-pgvector/utils';
const rag = new RAGSystem(config);
const processor = new DocumentProcessor();
await rag.initialize();
// Add documents with user/knowledgebot metadata
const documentData = await processor.processDocumentFromBuffer(
buffer,
'user-manual.pdf',
'pdf',
{
userId: 'user_123',
knowledgebotId: 'tech_support_bot',
department: 'engineering',
priority: 'high'
}
);
await rag.documentStore.saveDocument(documentData);
// Query with user filtering
const userResults = await rag.query('What technical info is available?', {
userId: 'user_123',
limit: 5
});
// Query with knowledgebot filtering
const botResults = await rag.query('Help with technical issues', {
knowledgebotId: 'tech_support_bot'
});
// Query with multiple filters
const filteredResults = await rag.query('Show important documents', {
userId: 'user_123',
filter: {
priority: 'high',
department: 'engineering'
}
});
// Direct search with filtering
const searchResults = await rag.searchDocumentsByUserId(
'documentation',
'user_123'
);
// Get all documents for a specific user
const userDocs = await rag.getDocumentsByUserId('user_123');
`
### Chat History
Enable multi-turn conversations with persistent chat history stored in PostgreSQL.
#### Basic Chat History
`javascript
// First query
const result1 = await rag.query('What is machine learning?');
// Follow-up with context
const result2 = await rag.query('Can you give me examples?', {
chatHistory: result1.chatHistory
});
// Another follow-up
const result3 = await rag.query('Which one is most popular?', {
chatHistory: result2.chatHistory
});
`
#### Session Persistence
`javascript
const sessionId = 'user_conversation_123';
// Query with automatic session save/load
const result = await rag.query('What is machine learning?', {
sessionId: sessionId,
persistSession: true, // Auto-save after query
userId: 'user_456',
knowledgebotId: 'tech_bot'
});
// Continue conversation (automatically loads history)
const result2 = await rag.query('Tell me more', {
sessionId: sessionId,
persistSession: true
});
// Load session manually
const session = await rag.loadSession(sessionId);
console.log(`Session has ${session.messageCount} messages`);
// Get all user sessions
const userSessions = await rag.getUserSessions('user_456');
console.log(`User has ${userSessions.length} sessions`);
// Get session statistics
const stats = await rag.getSessionStats({ userId: 'user_456' });
console.log(`Total messages: ${stats.totalMessages}`);
`
#### History Summarization
`javascript
// Long conversations are automatically managed
const result = await rag.query('Complex question', {
sessionId: sessionId,
persistSession: true,
maxHistoryLength: 20 // Keeps recent 20 messages
});
`
#### Testing Chat Features
`bash
# Basic chat history
npm run test:chat:basic

# Session management
npm run test:chat:session

# History summarization
npm run test:chat:summarization

# Session persistence
npm run test:chat:persistence
`
Documentation:
- 📖 Chat History Implementation Guide
- 📖 Session Persistence Guide
- 📖 Chat History Summarization
## 📖 API Documentation
### DocumentProcessor
The DocumentProcessor class provides powerful document processing capabilities for files, buffers, and URLs.
#### Buffer Processing Methods
##### processDocumentFromBuffer(buffer, fileName, fileType, metadata = {})
Process a document directly from a memory buffer.
`javascript
import { DocumentProcessor } from 'rag-system-pgvector/utils';
const processor = new DocumentProcessor();
const buffer = Buffer.from('This is a test document', 'utf8');
const result = await processor.processDocumentFromBuffer(
buffer,
'test.txt',
'txt',
{ source: 'api', category: 'test' }
);
// Returns:
// {
// title: 'Test Document',
// content: 'This is a test document',
// chunks: [...], // Array of processed chunks with embeddings
// metadata: { ... },
// fileType: 'txt',
// filePath: 'test.txt'
// }
`
Parameters:
- buffer (Buffer): The document content as a Buffer object
- fileName (string): Name of the file (used for metadata)
- fileType (string): File type ('pdf', 'docx', 'txt', 'html', 'md', 'json')
- metadata (object): Additional metadata to attach to the document
Supported Buffer Types:
- TXT: Plain text files
- HTML: HTML documents (extracts text content)
- Markdown: Markdown files
- JSON: JSON files (converts to readable text)
##### extractTextFromBuffer(buffer, fileType)
Extract raw text from a buffer without processing into chunks.
`javascript
const text = await processor.extractTextFromBuffer(buffer, 'html');
console.log(text); // Extracted plain text
`
#### URL Processing Methods
##### processDocumentFromUrl(url, metadata = {})
Download and process a document from a URL.
`javascript
const result = await processor.processDocumentFromUrl(
'https://example.com/document.pdf',
{
source: 'web-crawl',
priority: 'high',
category: 'research'
}
);
// Automatically detects file type from URL and content headers
// Downloads to temp directory and processes
`
Parameters:
- url (string): HTTP/HTTPS URL to download from
- metadata (object): Additional metadata for the document
Features:
- Automatic file type detection from URL extension and Content-Type headers
- Temporary file handling (auto-cleanup)
- Support for redirects and various HTTP response types
- Comprehensive error handling (see the sketch below)
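For example, a failed download surfaces as a normal exception (a minimal sketch; the exact error messages are implementation-specific):

`javascript
try {
  await processor.processDocumentFromUrl('https://example.com/missing.pdf');
} catch (error) {
  // Network failures, HTTP errors, and unsupported types all land here
  console.error('URL processing failed:', error.message);
}
`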
##### processDocumentsFromUrls(urls, options = {})
Process multiple URLs in parallel with concurrency control.
`javascript
const urls = [
'https://site1.com/doc1.pdf',
'https://site2.com/doc2.html',
'https://site3.com/doc3.md'
];
const results = await processor.processDocumentsFromUrls(urls, {
maxConcurrent: 3, // Process up to 3 URLs simultaneously
metadata: { batch: 'import-2024' },
timeout: 30000, // 30 second timeout per URL
retries: 2 // Retry failed downloads
});
// Returns:
// {
// successful: [...], // Array of successfully processed documents
// failed: [...], // Array of failed URLs with error details
// total: 3,
// successCount: 2,
// failureCount: 1
// }
`
Options:
- maxConcurrent (number): Maximum concurrent downloads (default: 5)
- metadata (object): Metadata applied to all documents
- timeout (number): Timeout per URL in milliseconds
- retries (number): Number of retry attempts for failed downloads (see the sketch below for inspecting failures)
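Based on the return shape shown above, a batch run can be summarized and its failures inspected (a sketch; the exact shape of each `failed` entry is assumed to carry the URL and error details):

`javascript
const { successful, failed } = await processor.processDocumentsFromUrls(urls, {
  maxConcurrent: 3,
  retries: 2
});
console.log(`Processed ${successful.length} of ${urls.length} documents`);
for (const failure of failed) {
  // Log each failure for a later retry or manual review
  console.warn('Failed URL:', failure);
}
`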
#### Error Handling
All methods include comprehensive error handling:
`javascript
try {
const result = await processor.processDocumentFromBuffer(buffer, 'test.pdf', 'pdf');
} catch (error) {
if (error.message.includes('Buffer is empty')) {
console.log('Empty buffer provided');
} else if (error.message.includes('Unsupported file type')) {
console.log('File type not supported for buffer processing');
} else {
console.log('Processing error:', error.message);
}
}
`
#### Integration with RAG System
Use processed documents with the RAG system:
`javascript
import fs from 'fs';
import { RAGSystem } from 'rag-system-pgvector';
import { DocumentProcessor } from 'rag-system-pgvector/utils';
const rag = new RAGSystem(config);
const processor = new DocumentProcessor();
await rag.initialize();
// Process from buffer
const buffer = fs.readFileSync('document.pdf');
const processed = await processor.processDocumentFromBuffer(buffer, 'doc.pdf', 'pdf');
// Add to RAG system
await rag.documentStore.saveDocument(processed);
// Process from URL and add to RAG
const urlProcessed = await processor.processDocumentFromUrl('https://example.com/doc.html');
await rag.documentStore.saveDocument(urlProcessed);
// Now query across all documents
const answer = await rag.query('What information is available?');
`
## 🌐 With Web Interface
`javascript
const rag = new RAGSystem({
// ... configuration
server: { port: 3000, enableWebUI: true }
});
await rag.initialize();
await rag.startServer();
// Visit http://localhost:3000
`
## 📚 Documentation
- 📖 Complete Package Documentation - Full API reference and examples
- 🔧 Integration Guide - Step-by-step integration examples
- 🎯 Examples - Ready-to-run examples
## ⚡ Quick Examples
Run the included examples:
`bash
# Basic usage example
npm run example:basic

# Web server example
npm run example:server

# Advanced integration example
npm run example:advanced

# Usage patterns overview
npm run example:patterns
`
## 🛠️ Development & Contributing
For local development and contributions:
### Prerequisites
- Node.js v18+
- PostgreSQL v12+ with pgvector extension
- An API key for your chosen AI provider (e.g., OpenAI)
### Local Setup
`bash
# Clone and install
git clone https://github.com/yourusername/rag-system-pgvector.git
cd rag-system-pgvector
npm install

# Configure environment
cp .env.example .env
# Edit .env with your credentials

# Initialize database
npm run setup

# Start development
npm run dev
`
### Running Examples
`bash
# Run examples
npm run example:basic

# Run with web interface
npm run example:server
`
### REST API Endpoints
#### Upload Document
`bash
curl -X POST http://localhost:3000/documents/upload \
-F "document=@path/to/your/document.pdf" \
-F "title=My Document"
`
#### Process Document from File Path
`bash
curl -X POST http://localhost:3000/documents/process \
-H "Content-Type: application/json" \
-d '{
"filePath": "/path/to/document.pdf",
"title": "My Document"
}'
`
#### Search/Query
`bash
curl -X POST http://localhost:3000/search \
-H "Content-Type: application/json" \
-d '{
"query": "What is the main topic of the document?",
"sessionId": "optional-session-id"
}'
`
#### Get All Documents
`bash
curl http://localhost:3000/documents
`
#### Get Specific Document
`bash
curl http://localhost:3000/documents/{document-id}
`
#### Delete Document
`bash
curl -X DELETE http://localhost:3000/documents/{document-id}
`
### CLI Commands
#### Process Documents from Directory
`bash
npm run process-docs /path/to/documents/folder
`
#### Interactive Search
`bash
npm run search
`
#### Single Query Search
`bash
npm run search "Your question here"
`
## 🏗️ Architecture
### Core Components
1. Document Processor (src/utils/documentProcessor.js)
- Extracts text from various file formats
- Splits documents into chunks with configurable overlap
- Generates embeddings using OpenAI
2. Document Store (src/services/documentStore.js)
- Manages document and chunk storage in PostgreSQL
- Performs vector similarity search using pgvector
- Handles CRUD operations
3. RAG Workflow (src/workflows/ragWorkflow.js)
- LangGraph-based workflow orchestration
- Three-step process: Retrieve → Rerank → Generate
- Supports conversational context
4. API Server (src/index.js)
- Express.js REST API
- File upload handling
- Conversation session management
### Database Schema
`sql
-- Documents table
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
title VARCHAR(255) NOT NULL,
content TEXT NOT NULL,
file_path VARCHAR(500),
file_type VARCHAR(50),
metadata JSONB DEFAULT '{}',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Document chunks with embeddings
CREATE TABLE document_chunks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
document_id UUID REFERENCES documents(id) ON DELETE CASCADE,
chunk_index INTEGER NOT NULL,
content TEXT NOT NULL,
embedding vector(1536),
metadata JSONB DEFAULT '{}',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Search sessions for tracking
CREATE TABLE search_sessions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
query TEXT NOT NULL,
results JSONB,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Chat Sessions for conversation persistence (NEW)
CREATE TABLE chat_sessions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
session_id VARCHAR(255) UNIQUE NOT NULL,
user_id VARCHAR(255),
knowledgebot_id VARCHAR(255),
history JSONB DEFAULT '[]'::jsonb,
metadata JSONB DEFAULT '{}'::jsonb,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_activity TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
message_count INTEGER DEFAULT 0
);
-- Indexes for chat sessions
CREATE INDEX idx_chat_sessions_session_id ON chat_sessions(session_id);
CREATE INDEX idx_chat_sessions_user_id ON chat_sessions(user_id);
CREATE INDEX idx_chat_sessions_knowledgebot_id ON chat_sessions(knowledgebot_id);
CREATE INDEX idx_chat_sessions_last_activity ON chat_sessions(last_activity);
`
### RAG Workflow
`mermaid
graph TD
A[Query Input] --> B[Retrieve Node]
B --> C[Rerank Node]
C --> D[Generate Node]
D --> E[Response Output]
B --> F[Vector Search]
F --> G[Similar Chunks]
C --> H[Score Ranking]
H --> I[Top Chunks]
D --> J[LLM Generation]
J --> K[Contextual Response]
`
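For intuition, a three-node graph like this can be wired with `@langchain/langgraph` roughly as follows. This is a hypothetical sketch, not the package's internal code; `vectorSearch`, `rankByScore`, and `generateAnswer` are placeholder functions:

`javascript
import { StateGraph, Annotation, START, END } from '@langchain/langgraph';

// Shared state passed between nodes
const RagState = Annotation.Root({
  query: Annotation(),
  chunks: Annotation(),
  answer: Annotation(),
});

const workflow = new StateGraph(RagState)
  // Each node returns a partial state update
  .addNode('retrieve', async (s) => ({ chunks: await vectorSearch(s.query) }))
  .addNode('rerank', async (s) => ({ chunks: rankByScore(s.chunks) }))
  .addNode('generate', async (s) => ({ answer: await generateAnswer(s) }))
  .addEdge(START, 'retrieve')
  .addEdge('retrieve', 'rerank')
  .addEdge('rerank', 'generate')
  .addEdge('generate', END)
  .compile();

const { answer } = await workflow.invoke({ query: 'What is the main topic?' });
`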
## 🔧 Configuration
The RAG system is highly configurable. You can customize every aspect of its behavior through the constructor configuration object.
### Complete Configuration Example
`javascript
import { RAGSystem } from 'rag-system-pgvector';
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';
const rag = new RAGSystem({
// ========================================
// 1. Database Configuration (Required)
// ========================================
database: {
host: 'localhost', // Database host
port: 5432, // Database port
database: 'rag_db', // Database name
username: 'postgres', // Database user
password: 'your_password', // Database password
// Connection Pool Settings
max: 10, // Max connections in pool
min: 0, // Min connections in pool
maxUses: Infinity, // Max uses per connection
allowExitOnIdle: false, // Allow pool to close when idle
maxLifetimeSeconds: 0, // Max connection lifetime (0 = unlimited)
idleTimeoutMillis: 10000 // Idle timeout (10 seconds)
},
// ========================================
// 2. AI Provider Configuration (Required)
// ========================================
embeddings: new OpenAIEmbeddings({
openAIApiKey: process.env.OPENAI_API_KEY,
modelName: 'text-embedding-ada-002'
}),
llm: new ChatOpenAI({
openAIApiKey: process.env.OPENAI_API_KEY,
modelName: 'gpt-4',
temperature: 0.7
}),
// ========================================
// 3. Embedding Configuration
// ========================================
embeddingDimensions: 1536, // Dimensions for embeddings
// OpenAI ada-002: 1536
// HuggingFace MiniLM: 384
// Other models: see the dimensions table below
// ========================================
// 4. Vector Store Configuration
// ========================================
vectorStore: {
tableName: 'document_chunks_vector',
vectorColumnName: 'embedding',
contentColumnName: 'content',
metadataColumnName: 'metadata'
},
// ========================================
// 5. Document Processing Configuration
// ========================================
processing: {
chunkSize: 1000, // Characters per chunk
chunkOverlap: 200 // Overlap between chunks
},
// ========================================
// 6. Chat History Configuration (NEW)
// ========================================
chatHistory: {
enabled: true, // Enable chat history feature
maxMessages: 20, // Max messages before management kicks in
maxTokens: 3000, // Max tokens in chat history
summarizeThreshold: 30, // Trigger summarization after N messages
keepRecentCount: 10, // Recent messages to preserve
alwaysKeepFirst: true, // Always keep conversation starter
persistSessions: true, // Store sessions in database
sessionTimeout: 3600000 // Session timeout (1 hour in ms)
}
});
await rag.initialize();
`
### Configuration Sections
#### 1. Database Configuration
Controls PostgreSQL connection and pool behavior:
`javascript
database: {
host: 'localhost', // Where PostgreSQL is running
port: 5432, // PostgreSQL port (default: 5432)
database: 'rag_db', // Your database name
username: 'postgres', // Database user
password: 'your_password', // User password
// Pool Settings (Advanced)
max: 10, // Maximum concurrent connections
min: 0, // Minimum idle connections
idleTimeoutMillis: 10000 // Close idle connections after 10s
}
`
Best Practices:
- Use environment variables for sensitive data
- Set max based on your application's concurrency needs
- Monitor connection pool usage in production
#### 2. AI Provider Configuration
Specify your embedding and language model providers:
OpenAI Example:
`javascript
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';
embeddings: new OpenAIEmbeddings({
openAIApiKey: process.env.OPENAI_API_KEY,
modelName: 'text-embedding-ada-002'
}),
llm: new ChatOpenAI({
openAIApiKey: process.env.OPENAI_API_KEY,
modelName: 'gpt-4',
temperature: 0.7
})
`
Anthropic Example:
`javascript
import { OpenAIEmbeddings } from '@langchain/openai';
import { ChatAnthropic } from '@langchain/anthropic';
embeddings: new OpenAIEmbeddings({
openAIApiKey: process.env.OPENAI_API_KEY,
modelName: 'text-embedding-ada-002'
}),
llm: new ChatAnthropic({
anthropicApiKey: process.env.ANTHROPIC_API_KEY,
modelName: 'claude-3-sonnet-20240229',
temperature: 0.7
})
`
Local Models Example:
`javascript
import { HuggingFaceTransformersEmbeddings } from '@langchain/community/embeddings/hf_transformers';
import { Ollama } from '@langchain/community/llms/ollama';
embeddings: new HuggingFaceTransformersEmbeddings({
modelName: 'sentence-transformers/all-MiniLM-L6-v2'
}),
llm: new Ollama({
baseUrl: 'http://localhost:11434',
model: 'llama2'
})
`
#### 3. Embedding Dimensions
Match this to your embedding model's output dimensions:
| Model | Dimensions | Provider |
|-------|------------|----------|
| text-embedding-ada-002 | 1536 | OpenAI |
| all-MiniLM-L6-v2 | 384 | HuggingFace |
| text-embedding-3-small | 1536 | OpenAI |
| text-embedding-3-large | 3072 | OpenAI |
`javascript
embeddingDimensions: 1536 // Must match your embedding model
`
Important: If you change embedding models, you must recreate the database schema!
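For example, moving from `text-embedding-ada-002` (1536 dimensions) to `all-MiniLM-L6-v2` (384) means re-initializing with the new dimension and re-ingesting documents so stored vectors match the new model. A sketch, assuming the old vector table has been dropped or recreated first:

`javascript
import { RAGSystem } from 'rag-system-pgvector';
import { HuggingFaceTransformersEmbeddings } from '@langchain/community/embeddings/hf_transformers';

const rag = new RAGSystem({
  database: { /* your config */ },
  embeddings: new HuggingFaceTransformersEmbeddings({
    modelName: 'sentence-transformers/all-MiniLM-L6-v2',
  }),
  llm, // your chat model, unchanged
  embeddingDimensions: 384, // must match the new embedding model
});
await rag.initialize();
await rag.addDocuments(['./docs/file1.pdf']); // re-embed existing content
`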
#### 4. Vector Store Configuration
Customize the vector store table structure:
`javascript
vectorStore: {
tableName: 'document_chunks_vector', // Table name for vectors
vectorColumnName: 'embedding', // Column for embeddings
contentColumnName: 'content', // Column for text content
metadataColumnName: 'metadata' // Column for metadata
}
`
Most users can use the defaults.
#### 5. Document Processing
Control how documents are chunked:
`javascript
processing: {
chunkSize: 1000, // Characters per chunk (500-2000 recommended)
chunkOverlap: 200 // Overlap between chunks (10-20% of chunkSize)
}
`
Guidelines:
- Small chunks (500): Better precision, more chunks, higher cost
- Large chunks (2000): Better context, fewer chunks, lower cost
- Overlap: Prevents context loss at boundaries (typically 10-20%)
Examples:
`javascript
// For technical documentation (needs precision)
processing: { chunkSize: 800, chunkOverlap: 150 }
// For books/long content (needs context)
processing: { chunkSize: 1500, chunkOverlap: 300 }
// For code documentation (needs structure)
processing: { chunkSize: 1000, chunkOverlap: 200 }
`
#### 6. Chat History Configuration (NEW in v2.3.0)
Control conversation history management:
`javascript
chatHistory: {
enabled: true, // Enable/disable chat history
maxMessages: 20, // Start management after N messages
maxTokens: 3000, // Maximum tokens in history
summarizeThreshold: 30, // Summarize after N messages
keepRecentCount: 10, // Recent messages to always keep
alwaysKeepFirst: true, // Keep conversation starter
persistSessions: true, // Store in database
sessionTimeout: 3600000 // 1 hour timeout (in milliseconds)
}
`
Chat History Options Explained:
- enabled: Master switch for chat history feature
- maxMessages: Soft limit before history management activates
- maxTokens: Hard limit on token count (prevents API errors)
- summarizeThreshold: When to trigger LLM-based summarization
- keepRecentCount: Recent messages to preserve during summarization
- alwaysKeepFirst: Preserve conversation context from the beginning
- persistSessions: Save sessions to database for persistence
- sessionTimeout: Milliseconds before session is considered inactive
Preset Configurations:
`javascript
// Minimal (cost-effective)
chatHistory: {
enabled: true,
maxMessages: 10,
maxTokens: 1500,
summarizeThreshold: 15,
keepRecentCount: 5,
persistSessions: false
}
// Balanced (recommended)
chatHistory: {
enabled: true,
maxMessages: 20,
maxTokens: 3000,
summarizeThreshold: 30,
keepRecentCount: 10,
persistSessions: true
}
// Maximum context (for complex conversations)
chatHistory: {
enabled: true,
maxMessages: 40,
maxTokens: 6000,
summarizeThreshold: 50,
keepRecentCount: 20,
persistSessions: true
}
// Disabled (for single-shot queries)
chatHistory: {
enabled: false
}
`
### Environment Variables
Create a .env file for sensitive configuration:
`env
# Database
DB_HOST=localhost
DB_PORT=5432
DB_NAME=rag_db
DB_USER=postgres
DB_PASSWORD=your_secure_password

# OpenAI
OPENAI_API_KEY=sk-...

# Anthropic (optional)
ANTHROPIC_API_KEY=sk-ant-...

# Azure (optional)
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://...

# Processing (optional)
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
EMBEDDING_DIMENSIONS=1536
`
Then use in your code:
`javascript
import 'dotenv/config';
import { RAGSystem } from 'rag-system-pgvector';
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';
const rag = new RAGSystem({
database: {
host: process.env.DB_HOST,
port: parseInt(process.env.DB_PORT),
database: process.env.DB_NAME,
username: process.env.DB_USER,
password: process.env.DB_PASSWORD
},
embeddings: new OpenAIEmbeddings({
openAIApiKey: process.env.OPENAI_API_KEY
}),
llm: new ChatOpenAI({
openAIApiKey: process.env.OPENAI_API_KEY
}),
embeddingDimensions: parseInt(process.env.EMBEDDING_DIMENSIONS || '1536')
});
`
### Query-Time Options
You can also configure behavior at query time:
`javascript
const result = await rag.query('Your question', {
// Filtering
userId: 'user_123', // Filter by user
knowledgebotId: 'bot_456', // Filter by bot
filter: { category: 'tech' }, // Custom metadata filters
// Retrieval
limit: 10, // Number of chunks to retrieve
threshold: 0.5, // Similarity threshold (0-1)
// Chat History
chatHistory: previousHistory, // Previous conversation
maxHistoryLength: 15, // Override default history length
sessionId: 'session_789', // Session identifier
persistSession: true, // Save session to database
// Context
context: additionalContext, // Extra context to include
metadata: { source: 'api' } // Custom metadata
});
`
### Configuration Best Practices
1. Security: Never hardcode API keys or passwords
2. Environment-Specific: Use different configs for dev/staging/prod
3. Performance: Monitor and adjust based on usage patterns
4. Cost: Balance context size with API costs
5. Testing: Test with different configurations to find optimal settings
## 📊 Performance Optimization
### Database Indexes
The system creates optimized indexes:
`sql
-- For vector similarity search
CREATE INDEX idx_document_chunks_embedding
ON document_chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- For document relationships
CREATE INDEX idx_document_chunks_document_id
ON document_chunks(document_id);
`
### Chunking Strategy
- Recursive Character Text Splitter: Preserves semantic boundaries
- Configurable overlap: Ensures context continuity
- Multiple separators: Prioritizes paragraph, then sentence, then word boundaries (see the sketch below)
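To illustrate, the same strategy can be reproduced with LangChain's splitter directly (a sketch; on older LangChain versions the import path is `langchain/text_splitter`):

`javascript
import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';

const documentText = 'First paragraph...\n\nSecond paragraph...';

// Tries paragraph breaks first, then sentence and word boundaries,
// keeping a 200-character overlap between adjacent chunks.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
const chunks = await splitter.splitText(documentText);
`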
## 🧪 Testing
### Process Test Documents
`bash
# Create a test documents directory
mkdir test-docs

# Add some test files (PDF, DOCX, TXT, etc.), then process them
npm run process-docs ./test-docs
`
### Test Search
`bash
# Interactive search
npm run search

# Or a single query
npm run search "What is machine learning?"
`
## 🐛 Troubleshooting
### Common Issues
1. pgvector extension not found
`sql
-- Install pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
`
2. OpenAI API quota exceeded
- Check your OpenAI API usage
- Consider using alternative embedding models
3. Large document processing fails
- Increase chunk size or reduce document size
- Check memory limits
4. Poor search results
- Lower the similarity threshold (see the sketch after this list)
- Adjust chunk size and overlap
- Verify document content quality
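For issue 4, the query-time options from the Configuration section can be tuned per query (a sketch):

`javascript
// Retrieve more chunks and relax the similarity cutoff
const result = await rag.query('Your question', {
  limit: 10,      // more candidate chunks
  threshold: 0.3, // lower threshold admits weaker matches
});
`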
### Debug Mode
Enable verbose logging by setting:
`env
NODE_ENV=development
`