Official TypeScript/JavaScript SDK for CortexDB - Multi-modal RAG Platform with advanced document processing
npm install @dooor-ai/cortexdb```
██████╗ ██████╗ ██████╗ ██████╗ ██████╗
██╔══██╗██╔═══██╗██╔═══██╗██╔═══██╗██╔══██╗
██║ ██║██║ ██║██║ ██║██║ ██║██████╔╝
██║ ██║██║ ██║██║ ██║██║ ██║██╔══██╗
██████╔╝╚██████╔╝╚██████╔╝╚██████╔╝██║ ██║
╚═════╝ ╚═════╝ ╚═════╝ ╚═════╝ ╚═╝ ╚═╝
Official TypeScript/JavaScript SDK for CortexDB


---
CortexDB is a multi-modal RAG (Retrieval Augmented Generation) platform that combines traditional database capabilities with vector search and advanced document processing. It enables you to:
- Store structured and unstructured data in a unified database
- Automatically extract text from documents (PDF, DOCX, XLSX) using Docling
- Generate embeddings for semantic search using various providers (OpenAI, Gemini, etc.)
- Perform hybrid search combining filters with vector similarity
- Build RAG applications with automatic chunking and vectorization

CortexDB handles the complex infrastructure of vector databases (Qdrant), object storage (MinIO), and traditional databases (PostgreSQL) behind a simple API.

- Multi-modal document processing: Upload PDFs, DOCX, XLSX files and automatically extract text with OCR fallback
- Semantic search: Vector-based search using embeddings from OpenAI, Gemini, or custom providers
- Automatic chunking: Smart text splitting optimized for RAG applications
- Flexible schema: Define collections with typed fields (string, number, boolean, file, array)
- Hybrid queries: Combine exact filters with semantic search
- Storage control: Choose where each field is stored (PostgreSQL, Qdrant, MinIO)
- Type-safe: Full TypeScript support with comprehensive type definitions
- Modern API: Async/await using native fetch (Node.js 18+)
- Infra management: Database (client.databases) and embedding provider (client.embeddingProviders) APIs built-in
- 🆕 TypeScript Decorators: Define schemas using decorators (like TypeORM) with full IDE support - see Schema Decorators Guide

```bash
npm install @dooor-ai/cortexdb
```

Or with yarn:

```bash
yarn add @dooor-ai/cortexdb
```

Or with pnpm:

```bash
pnpm add @dooor-ai/cortexdb
```

```typescript
import { CortexClient, FieldType, StoreLocation } from '@dooor-ai/cortexdb';

async function main() {
  // Initialize with database in connection string
  const client = new CortexClient('cortexdb://my-api-key@localhost:8000/production');

  // Create a collection with vectorization enabled
  await client.collections.create(
    'documents',
    [
      { name: 'title', type: FieldType.STRING },
      { name: 'content', type: FieldType.TEXT, vectorize: true },
      { name: 'published_at', type: FieldType.DATETIME, store_in: [StoreLocation.POSTGRES] }
    ],
    'your-embedding-provider-id' // Required when vectorize=true
    // database parameter is optional here since we set 'production' as default
  );

  // Create a record
  const record = await client.records.create('documents', {
    title: 'Introduction to AI',
    content: 'Artificial intelligence is transforming how we build software...'
  });

  // Semantic search - finds relevant content by meaning, not just keywords
  const results = await client.records.search(
    'documents',
    'How is AI changing software development?',
    undefined, // filters
    10 // limit - database parameter optional since we have default
  );

  results.results.forEach(result => {
    console.log(`Score: ${result.score.toFixed(4)}`);
    console.log(`Title: ${result.record.data.title}`);
    console.log(`Content: ${result.record.data.content}\n`);
  });

  await client.close();
}

main();
```

The SDK becomes fully type-safe once you apply your YAML schema with the Dooor CLI:

```bash
npx dooor schema apply # reads dooor/schemas by default and generates types in dooor/generated/
```

This command creates `dooor/generated/cortex-schema.ts` and automatically augments the SDK types. After the file exists in your project, you can keep importing `CortexClient` from `@dooor-ai/cortexdb`; TypeScript will infer the fields and collections defined in your YAML. Invalid field names or missing required properties inside `client.records.create('my_collection', {...})` now trigger compile-time errors, Prisma-style.

If you need an explicit factory, the generated file also exports `createCortexClient()` and `TypedCortexClient` helpers.

> ℹ️ The CLI also drops a lightweight `.d.ts` shim in `node_modules/@dooor-ai/cortexdb/generated/schema.d.ts`, so TypeScript picks up your schema automatically, with no need to tweak `tsconfig.json`.

Once the schema is generated, you can call collections with property access instead of passing strings:

```ts
// Fully typed
const record = await client.records.tool_calls.create({
  chatId: "chat-123",
  description: "RAG invocation summary",
  createdAt: new Date().toISOString(),
});

// String form still available when you need something dynamic
await client.records.create("tool_calls", {
  chatId,
  description,
  createdAt,
});
```

```typescript
import { CortexClient } from '@dooor-ai/cortexdb';

// Each variant below is an alternative; pick one. Distinct names are used
// only so the snippet stays valid as a single file.

// Using connection string with database (recommended)
const client = new CortexClient('cortexdb://my-api-key@localhost:8000/production');

// Without database in connection string (must pass database to each method)
const clientWithoutDb = new CortexClient('cortexdb://my-api-key@localhost:8000');

// Production (HTTPS auto-detected)
const prodClient = new CortexClient('cortexdb://my-key@api.cortexdb.com/production');

// Using options object (alternative)
const clientFromOptions = new CortexClient({
  baseUrl: 'http://localhost:8000',
  apiKey: 'your-api-key',
  database: 'production', // Optional: set default database
  timeout: 1800000, // Optional: override timeout (default = 30 min to cover large uploads)
  waitUntilComplete: true, // Optional: keep SDK waiting for async ingestion to finish (default = true)
});
```

Connection String Format:

`cortexdb://[api_key@]host[:port][/database]`

Benefits:

- Single string configuration
- Easy to store in environment variables
- Familiar pattern (like PostgreSQL, MongoDB, Redis)
- Auto-detects HTTP vs HTTPS
- Optional database specification for multi-tenant isolation

Database Parameter:

- If you specify a database in the connection string or options, it becomes the default for all operations
- You can override the default database on a per-method basis
- If no default database is set, you must pass the database parameter to each method
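
To make the connection string format and the database precedence rules concrete, here is a minimal sketch. `parseConnectionString` and `resolveDatabase` are hypothetical helpers written for illustration; they are not part of `@dooor-ai/cortexdb`, and the SDK's real parsing (including how it auto-detects HTTP vs HTTPS, which is not modeled here) may differ.

```typescript
// Hypothetical helpers for illustration only; not exported by the SDK.
interface ParsedConnection {
  apiKey?: string;
  host: string;
  port?: number;
  database?: string;
}

// Parses cortexdb://[api_key@]host[:port][/database]
function parseConnectionString(conn: string): ParsedConnection {
  const match =
    /^cortexdb:\/\/(?:([^@\/]+)@)?([^:\/@]+)(?::(\d+))?(?:\/([^\/]+))?\/?$/.exec(conn);
  if (!match) {
    throw new Error(`Invalid connection string: ${conn}`);
  }
  const [, apiKey, host, port, database] = match;
  return {
    apiKey,
    host,
    port: port ? Number(port) : undefined,
    database,
  };
}

// A per-call database wins over the client default; with neither, fail loudly.
function resolveDatabase(perCall?: string, clientDefault?: string): string {
  const database = perCall ?? clientDefault;
  if (!database) {
    throw new Error('No database: pass one to the method or set a default');
  }
  return database;
}

const parsed = parseConnectionString('cortexdb://my-api-key@localhost:8000/production');
console.log(parsed);
console.log(resolveDatabase(undefined, parsed.database)); // falls back to the default
console.log(resolveDatabase('staging', parsed.database)); // per-call override
```

The optional groups in the pattern mirror the square brackets in the format: the API key, port, and database can each be omitted independently.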

Large documents (PDFs, DOCX files, etc.) are ingested asynchronously to avoid timeouts. When you call `client.records.create(...)`, the gateway responds immediately with a payload like:

```json
{
  "id": "rec_123",
  "status": "pending",
  "processing_state": {
    "record_id": "rec_123",
    "status": "pending",
    "processed_chunks": 0,
    "total_chunks": 0
  }
}
```

By default the SDK keeps polling the processing_state endpoint until the background worker finishes and only then resolves with the final CreateRecordResponse. That preserves backward compatibility with existing backends that expect a fully processed record once create() returns.
You can control this behavior:

```ts
// Return immediately (HTTP 202) and poll manually later
const pending = await client.records.create(
  'documents',
  { title: 'Async', content: '...' },
  undefined,
  { waitUntilComplete: false }
);

// Later in your workflow…
const status = await client.records.getStatus('documents', pending.id);
if (status?.status === 'completed') {
  const finalRecord = await client.records.waitForCompletion('documents', pending.id);
}
```

Useful options:

- `waitUntilComplete` (default `true`): let the SDK poll automatically.
- `pollingIntervalMs` (default `5000`): change how often the SDK checks status.
- `timeoutMs` (default 30 min): upper bound for the auto-poll loop.

Under the hood the SDK calls `GET /records/{id}/status` until the worker updates the `processing_state` to `completed` or `failed`. You can also call that endpoint directly via `client.records.getStatus(...)` to drive custom progress indicators.
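
The auto-poll behavior described above can be sketched as a small loop. This is an illustration under stated assumptions, not the SDK's internal code: `pollUntilDone`, `makeFakeStatus`, and the `ProcessingStatus` type are hypothetical names, and the status source is simulated so the snippet runs without a gateway.

```typescript
// Hypothetical sketch of a poll loop; the SDK's internals may differ.
type ProcessingStatus = 'pending' | 'completed' | 'failed';

interface PollOptions {
  pollingIntervalMs: number; // how often to check status
  timeoutMs: number;         // upper bound for the whole loop
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

async function pollUntilDone(
  getStatus: () => Promise<ProcessingStatus>,
  { pollingIntervalMs, timeoutMs }: PollOptions,
): Promise<ProcessingStatus> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const status = await getStatus();
    if (status === 'completed' || status === 'failed') {
      return status; // terminal state reached
    }
    // Give up if the next check would land past the deadline.
    if (Date.now() + pollingIntervalMs > deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms (last status: ${status})`);
    }
    await sleep(pollingIntervalMs);
  }
}

// Simulated worker that reports "completed" on the Nth status check.
function makeFakeStatus(checksUntilDone: number): () => Promise<ProcessingStatus> {
  let calls = 0;
  return async () => (++calls >= checksUntilDone ? 'completed' : 'pending');
}

pollUntilDone(makeFakeStatus(3), { pollingIntervalMs: 10, timeoutMs: 1000 })
  .then(status => console.log(`Final status: ${status}`));
```

In a real application the `getStatus` callback would wrap `client.records.getStatus(...)`, which makes it straightforward to report progress between checks.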

```typescript
// Create database
await client.databases.create({ name: 'ai_docs', description: 'Knowledge base' });

// List databases
const databases = await client.databases.list();

// Delete database
await client.databases.delete('ai_docs');
```

```typescript
// Register an embedding provider
await client.embeddingProviders.create({
  name: 'Gemini Flash',
  provider: 'gemini',
  embedding_model: 'models/text-embedding-004',
  api_key: process.env.GEMINI_API_KEY!,
});

// List embedding providers
const providers = await client.embeddingProviders.list();
```

Collections define the schema for your data. Each collection can have multiple fields with different types and storage options.

```typescript
import { CortexClient, FieldType, StoreLocation } from '@dooor-ai/cortexdb';

// Create collection with vectorization (database required)
const collection = await client.collections.create(
  'articles',
  [
    {
      name: 'title',
      type: FieldType.STRING
    },
    {
      name: 'content',
      type: FieldType.TEXT,
      vectorize: true // Enable semantic search on this field
    },
    {
      name: 'year',
      type: FieldType.INT,
      store_in: [StoreLocation.POSTGRES, StoreLocation.QDRANT_PAYLOAD]
    }
  ],
  'embedding-provider-id', // Required when any field has vectorize=true
  'production' // Database name (or omit if default database is set)
);

// List collections (uses default database if set, or pass specific database)
const collections = await client.collections.list('production');

// Get collection schema
const schema = await client.collections.get('articles', 'production');

// Delete collection and all its records
await client.collections.delete('articles', 'production');

// If you set a default database in the client, you can omit it:
const defaultClient = new CortexClient('cortexdb://key@host:8000/production');
const defaultCollections = await defaultClient.collections.list(); // Uses 'production'
```

Records are the actual data stored in collections. They must match the collection schema.

```typescript
import fs from 'node:fs';

// Create record (with optional file upload and database)
const created = await client.records.create(
  'articles',
  {
    title: 'Machine Learning Basics',
    content: 'Machine learning is a subset of AI focused on learning from data...',
    year: 2024,
  },
  {
    attachment: fs.readFileSync('ml-intro.pdf'),
  },
  'production' // Database name
);

// Get record by ID
const fetched = await client.records.get('articles', created.id, 'production');

// Update record
const updated = await client.records.update('articles', created.id, {
  year: 2025,
}, 'production');

// Delete record
await client.records.delete('articles', created.id, 'production');

// List records with filters/pagination
const results = await client.records.list('articles', {
  limit: 10,
  offset: 0,
  filters: { year: { $gte: 2023 } },
});
```

#### Tags (metadata)
Tags are stored as metadata (not part of your schema) and preserve casing. Updating tags replaces the entire set.
Limits: max 10 tags per record, each tag up to 50 characters.
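
The limits above could be checked client-side before a request is sent. The sketch below is a hypothetical helper (`validateTags` is not an SDK function), and it assumes "characters" means JavaScript string length; the server remains the source of truth for what it accepts.

```typescript
// Hypothetical pre-flight check; limits taken from the text above.
const MAX_TAGS = 10;
const MAX_TAG_LENGTH = 50;

// Returns a list of human-readable problems; an empty list means the tags pass.
function validateTags(tags: string[]): string[] {
  const errors: string[] = [];
  if (tags.length > MAX_TAGS) {
    errors.push(`Too many tags: ${tags.length} (max ${MAX_TAGS})`);
  }
  for (const tag of tags) {
    // Assumption: length is measured in UTF-16 code units, as String.length does.
    if (tag.length > MAX_TAG_LENGTH) {
      errors.push(`Tag too long (${tag.length} > ${MAX_TAG_LENGTH}): ${tag.slice(0, 20)}...`);
    }
  }
  return errors;
}

console.log(validateTags(['Mercado', 'Q1-2024']));
console.log(validateTags(['x'.repeat(51)]));
```

Because updating tags replaces the entire set, a check like this applies to the full list on every update rather than only to newly added tags.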

```typescript
// Create with tags (options param; database optional)
const tagged = await client.records.create(
  'articles',
  {
    title: 'Mercado outlook',
    content: 'Q1 macro trends and earnings guidance...',
  },
  undefined,
  { tags: ['Mercado', 'Q1-2024'] }
);

// Update tags (replaces existing tags; pass [] to clear)
await client.records.update(
  'articles',
  tagged.id,
  { year: 2025 },
  { tags: ['Mercado', 'Q2-2024'] }
);

// Filter by tags
const withTag = await client.records.list('articles', {
  filters: { tags: { $in: ['Mercado'] } },
});
// If your schema already has a "tags" field, use "__tags" to filter metadata tags instead.
```

Install the CLI (recommended in devDependencies):

```bash
npm install --save-dev dooor
```

Use the unified dooor CLI to synchronize declarative schemas.
Also install the "Dooor Tools" extension in VS Code/Cursor for real-time validation (Open VSX).

```bash
# Check differences between local YAML and CortexDB
npx dooor schema diff --dir dooor/schemas
```

### Type Generation

After synchronizing the schema, the CLI generates `dooor/generated/cortex-schema.ts` with derived types. Provide this schema to the SDK to get Prisma-like autocomplete and validation:

```ts
import { CortexClient } from '@dooor-ai/cortexdb';
import type {
  CortexGeneratedSchema,
  CollectionCreateInput,
} from '../dooor/generated/cortex-schema';

const client = new CortexClient<CortexGeneratedSchema>(
  process.env.CORTEXDB_CONNECTION!,
);

// chatId, workspaceId, toolName, description, and toolOutput come from your application code
const payload: CollectionCreateInput<'tool_calls'> = {
  chatId,
  workspaceId,
  toolName,
  description,
  toolOutput,
  createdAt: new Date().toISOString(),
};

await client.records.create('tool_calls', payload);
```

Generics propagate to `records.update`, `records.list`, `records.get`, and `records.search`. If you prefer the old dynamic mode, instantiate `new CortexClient()` without the generic parameter.

Set `CORTEXDB_CONNECTION` (e.g., `cortexdb://key@host:8000`) or the `CORTEXDB_BASE_URL` + `CORTEXDB_API_KEY` variables before running commands. If no directory is specified, the CLI automatically looks in `dooor/schemas`.

To avoid repeating flags, configure `dooor/config.yaml` at the project root:

```yaml
cortexdb:
  connection: env(CORTEXDB_CONNECTION)
  defaultEmbeddingProvider: default-provider

schema:
  dir: dooor/schemas
  typesOut: dooor/generated/cortex-schema.ts
```

You can override with `dooor/config.local.yaml` or point to another path via `DOOOR_CONFIG`.

### Semantic Search

Semantic search finds records by meaning, not just exact keyword matches. It uses vector embeddings to understand context.
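
To make "search by meaning" concrete, the sketch below ranks toy documents by cosine similarity between embedding vectors and applies an exact-match filter first, which is the general shape of a hybrid query. It is an illustration only: the two-dimensional vectors are made up, `hybridSearch` is a hypothetical function, and CortexDB performs the real work server-side in Qdrant with provider-generated embeddings.

```typescript
// Toy in-memory illustration; not how the SDK or server is implemented.
interface Doc {
  id: string;
  year: number;
  embedding: number[];
}

// Cosine similarity: 1 means same direction, 0 means unrelated (orthogonal).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // guard against zero vectors
}

// Apply the exact filter first, then rank the survivors by similarity.
function hybridSearch(
  docs: Doc[],
  queryEmbedding: number[],
  filter: (doc: Doc) => boolean,
  limit: number,
): Array<{ id: string; score: number }> {
  return docs
    .filter(filter)
    .map(doc => ({ id: doc.id, score: cosineSimilarity(doc.embedding, queryEmbedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

const docs: Doc[] = [
  { id: 'a', year: 2024, embedding: [1, 0] },
  { id: 'b', year: 2023, embedding: [0.9, 0.1] },
  { id: 'c', year: 2024, embedding: [0, 1] },
];

// Document "b" is close to the query but is excluded by the year filter.
const hits = hybridSearch(docs, [1, 0], doc => doc.year === 2024, 10);
console.log(hits);
```

The SDK calls below do the same thing at scale: the query string is embedded by the configured provider, and filters narrow the candidate set.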

```typescript
// Basic semantic search
const results = await client.records.search(
  'articles',
  'machine learning fundamentals',
  undefined,
  10
);

// Search with filters - combine semantic search with exact matches
const filteredResults = await client.records.search(
  'articles',
  'neural networks',
  {
    year: 2024,
    category: 'AI'
  },
  5
);

// Process results - ordered by relevance score
filteredResults.results.forEach(result => {
  console.log(`Score: ${result.score.toFixed(4)}`); // Higher = more relevant
  console.log(`Title: ${result.record.data.title}`);
  console.log(`Year: ${result.record.data.year}`);
});
```

### Document Processing

CortexDB can process documents and automatically extract text for vectorization.

```typescript
// Create collection with file field
await client.collections.create(
  'documents',
  [
    { name: 'title', type: FieldType.STRING },
    {
      name: 'document',
      type: FieldType.FILE,
      vectorize: true // Extract text and create embeddings
    }
  ],
  'embedding-provider-id'
);

// Note: File upload support is currently available in the REST API
// TypeScript SDK file upload will be added in a future version
```

### Filtering

```typescript
// Exact match filters
const results = await client.records.list('articles', {
  filters: {
    category: 'technology',
    published: true,
    year: 2024
  }
});

// Combine multiple filters
const filtered = await client.records.list('articles', {
  filters: {
    year: 2024,
    category: 'AI',
    author: 'John Doe'
  },
  limit: 20
});
```

## Error Handling

The SDK provides specific error types for different failure scenarios.

```typescript
import {
  CortexDBError,
  CortexDBNotFoundError,
  CortexDBValidationError,
  CortexDBConnectionError,
  CortexDBTimeoutError
} from '@dooor-ai/cortexdb';

try {
  const record = await client.records.get('articles', 'invalid-id');
} catch (error) {
  if (error instanceof CortexDBNotFoundError) {
    console.log('Record not found');
  } else if (error instanceof CortexDBValidationError) {
    console.log('Invalid data:', error.message);
  } else if (error instanceof CortexDBConnectionError) {
    console.log('Connection failed:', error.message);
  } else if (error instanceof CortexDBTimeoutError) {
    console.log('Request timed out:', error.message);
  } else if (error instanceof CortexDBError) {
    console.log('General error:', error.message);
  }
}
```

## Examples

See the `examples/` directory for complete working examples:

- `quickstart.ts` - Complete walkthrough of SDK features
- `search.ts` - Semantic search with filters and providers
- `basic.ts` - Basic CRUD operations

Run examples:

```bash
npx ts-node -O '{"module":"commonjs"}' examples/quickstart.ts
```

## Development

### Setup

```bash
# Clone repository
git clone https://github.com/yourusername/cortexdb
cd cortexdb/clients/typescript

# Install dependencies
npm install

# Build
npm run build
```

### Scripts

```bash
# Build TypeScript
npm run build

# Build in watch mode
npm run build:watch

# Clean build artifacts
npm run clean

# Lint code
npm run lint

# Format code
npm run format
```

## Requirements

- Node.js >= 18.0.0 (for native fetch support)
- CortexDB gateway running locally or remotely
- Embedding provider configured (OpenAI, Gemini, etc.) if using vectorization

## Architecture

CortexDB integrates multiple technologies:
- PostgreSQL: Stores structured data and metadata
- Qdrant: Vector database for semantic search
- MinIO: Object storage for files
- Docling: Advanced document processing and text extraction

The SDK abstracts this complexity into a simple, unified API.

## Advanced RAG Strategies (v0.4.0+)

CortexDB now supports multiple RAG strategies to improve search quality and relevance. Choose the strategy that best fits your use case:

### Available Strategies

- SIMPLE: Basic vector similarity search (default)
- MULTI_QUERY: Generate multiple query variations and combine results using Reciprocal Rank Fusion
- HYDE: Generate hypothetical documents and use them for improved retrieval
- RERANK: Use LLM to rerank search results by relevance
- FUSION: Combine multi-query expansion with LLM reranking
- CONTEXTUAL_QUERY: Reformulate queries based on conversation context
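
MULTI_QUERY and FUSION combine several ranked lists with Reciprocal Rank Fusion (RRF). As a rough sketch of the standard technique: each document earns `1 / (k + rank)` from every list it appears in, and the sums decide the final order. The function name and the constant `k = 60` (a commonly used default) are assumptions for illustration; the value and details CortexDB uses server-side are not stated here.

```typescript
// Sketch of standard Reciprocal Rank Fusion; not CortexDB's server code.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60, // assumed smoothing constant; dampens the influence of top ranks
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Three result lists, e.g. one per generated query variation.
const fused = reciprocalRankFusion([
  ['a', 'b', 'c'],
  ['b', 'a', 'd'],
  ['b', 'c', 'a'],
]);
console.log(fused.map(r => r.id));
```

Document `b` ends up first because it ranks at or near the top of every list, even though `a` wins the first one. RRF only needs rank positions, so lists whose raw scores are not comparable can still be merged.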

### AI Provider Configuration

Before using advanced strategies, configure an AI provider:

```typescript
// Create an AI provider for query expansion/reranking
const aiProvider = await client.aiProviders.create({
  name: "Gemini Flash",
  provider: "gemini",
  api_key: "your-gemini-api-key",
  model: "gemini-2.5-flash",
  enabled: true,
});

// List providers
const providers = await client.aiProviders.list();

// Update provider
await client.aiProviders.update(aiProvider.id, {
  model: "gemini-2.0-flash",
});
```

### Advanced Search

```typescript
import { RAGStrategy } from '@dooor-ai/cortexdb';

// Simple search (default)
const simpleResults = await client.records.searchAdvanced('documents', {
  query: 'What is machine learning?',
  limit: 10,
  strategy: RAGStrategy.SIMPLE,
});

// Multi-query with automatic query expansion
const multiQueryResults = await client.records.searchAdvanced('documents', {
  query: 'What is machine learning?',
  limit: 10,
  strategy: RAGStrategy.MULTI_QUERY,
  strategyConfig: {
    num_queries: 5, // Generate 5 query variations
  },
  aiProviderName: "Gemini Flash", // Use provider by name
});

// HyDE: Generate hypothetical document for better retrieval
const hydeResults = await client.records.searchAdvanced('documents', {
  query: 'Explain neural networks',
  limit: 10,
  strategy: RAGStrategy.HYDE,
  strategyConfig: {
    document_length: 200, // Length of hypothetical document
  },
  aiProviderName: "Gemini Flash",
});

// Rerank: Use LLM to reorder results by relevance
const rerankResults = await client.records.searchAdvanced('documents', {
  query: 'Benefits of deep learning',
  limit: 10,
  strategy: RAGStrategy.RERANK,
  strategyConfig: {
    initial_k: 50, // Fetch 50 results then rerank to top 10
  },
  aiProviderName: "Gemini Flash",
});

// Fusion: Best of both worlds (multi-query + reranking)
const fusionResults = await client.records.searchAdvanced('documents', {
  query: 'How does AI work?',
  limit: 10,
  strategy: RAGStrategy.FUSION,
  strategyConfig: {
    num_queries: 5,
    initial_k: 50,
  },
  aiProviderName: "Gemini Flash",
});

// Contextual: Reformulate query based on conversation history
const contextualResults = await client.records.searchAdvanced('documents', {
  query: 'What about its applications?',
  limit: 10,
  strategy: RAGStrategy.CONTEXTUAL_QUERY,
  strategyConfig: {
    context: [
      'Previous: What is machine learning?',
      'Answer: Machine learning is a subset of AI...',
    ],
  },
  aiProviderName: "Gemini Flash",
});

// Access results
fusionResults.results.forEach(result => {
  console.log(`Score: ${result.score}`);
  console.log(`Content: ${result.record.content}`);
  console.log(`Strategy used: ${fusionResults.strategy_used}`);
});
```

### Collection Delegates

The advanced search is also available on collection delegates:

```typescript
// Using the facade pattern
const results = await client.records.documents.searchAdvanced({
  query: 'Machine learning applications',
  strategy: RAGStrategy.FUSION,
  aiProviderName: "Gemini Flash",
});
```

### Keyword Search

For finding documents by exact keyword matches (not semantic), use `findByKeyword`:

```typescript
// Find all documents containing "contract" in any text field
const results = await client.records.findByKeyword('documents', {
  keywords: ['contract'],
});

// Find documents containing ALL keywords (AND)
const allKeywords = await client.records.findByKeyword('documents', {
  keywords: ['contract', 'signature'],
  matchAll: true, // must contain BOTH keywords
});

// Find documents containing ANY keyword (OR) - default
const anyKeyword = await client.records.findByKeyword('documents', {
  keywords: ['contract', 'agreement'],
  matchAll: false,
});

// Search only in specific fields
const inFields = await client.records.findByKeyword('documents', {
  keywords: ['quarterly'],
  fields: ['title', 'content'], // only search these fields
});

// Combine with filters and tags
const filtered = await client.records.findByKeyword('documents', {
  keywords: ['report'],
  filters: {
    status: 'published',
    __tags: { $in: ['finance'] }
  },
  limit: 50,
  offset: 0,
});
```

Response:

```typescript
{
  results: [{ id, record, files?, previews?, tags? }],
  total: number,
  keywords: string[],
  match_all: boolean
}
```

### Performance Considerations

- SIMPLE: Fastest, use for basic semantic search
- MULTI_QUERY: 5x slower than simple (generates 5 queries)
- HYDE: Similar to multi-query, good for questions
- RERANK: Moderate cost, great for accuracy improvement
- FUSION: Highest cost and latency, best quality
- CONTEXTUAL_QUERY: Use for conversational interfaces

For more details, see RAG Strategies Documentation.

MIT License - see LICENSE for details.

- CortexDB Python SDK - Python client for CortexDB
- CortexDB Documentation - Complete platform documentation