A REST wrapper for SAP AI Core Vector API with document grounding capabilities
## Installation

```bash
npm install @timangames/vector-grounding-service
```

Semantic chunking is powered by the `semantic-chunking` library.

## Local Setup

1. Clone the repository:

```bash
git clone <repository-url>
cd vector-grounding-service
```
2. Install dependencies:
```bash
npm install
```
3. Configure environment variables:
```bash
cp .env.example .env
```
Edit `.env` with your SAP AI Core service key:

```env
# SAP AI Core Configuration
SAP_AI_CORE_RESOURCE_GROUP=your-resource-group

# SAP AI Core Service Key (complete JSON from SAP BTP)
AICORE_SERVICE_KEY='{
  "clientid": "your-client-id",
  "clientsecret": "your-client-secret",
  "url": "https://your-auth-url.authentication.region.hana.ondemand.com",
  "identityzone": "your-identity-zone",
  "identityzoneid": "your-identity-zone-id",
  "appname": "your-app-name",
  "credential-type": "binding-secret",
  "serviceurls": {
    "AI_API_URL": "https://api.ai.prod.region.aws.ml.hana.ondemand.com"
  }
}'

# Server Configuration
PORT=3000
NODE_ENV=development
```
4. Start the service:
```bash
npm start
```
The service will be available at `http://localhost:3000`.
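To confirm the service started correctly, you can hit the health endpoint (documented under Health Check below):

```bash
curl http://localhost:3000/health
```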
## Using as an npm Package
You can also use this service as a library in your own Node.js projects:
### Installation
```bash
npm install @timangames/vector-grounding-service
```
### Module Compatibility
This package supports both ES Modules and CommonJS for maximum compatibility. See COMPATIBILITY.md for detailed usage instructions.
#### ES Module Usage (Recommended)
```javascript
import { DocumentProcessor, SapAiService } from '@timangames/vector-grounding-service';

// Create instances
const documentProcessor = new DocumentProcessor();
const sapAiService = new SapAiService();

// Process a file
const result = await documentProcessor.processFileForGrounding(file);

// Upload to SAP AI Core
const uploadResult = await sapAiService.createDocumentsInBatches(
  collectionId,
  result.documents
);
```
#### CommonJS Usage
```javascript
const { load } = require('@timangames/vector-grounding-service');

async function main() {
  const { DocumentProcessor, SapAiService } = await load();
  const documentProcessor = new DocumentProcessor();
  const sapAiService = new SapAiService();
  // Use the services...
}
```
### Usage Methods
#### Method 1: Import Individual Classes
```javascript
import { DocumentProcessor, SapAiService } from '@timangames/vector-grounding-service';

// Create instances
const documentProcessor = new DocumentProcessor();
const sapAiService = new SapAiService();

// Process a file
const result = await documentProcessor.processFileForGrounding(file);

// Upload to SAP AI Core
const uploadResult = await sapAiService.createDocumentsInBatches(
  collectionId,
  result.documents
);
```
#### Method 2: Use Convenience Function
```javascript
import { createVectorGroundingService } from '@timangames/vector-grounding-service';

const service = createVectorGroundingService();

// Process and upload in one step
const result = await service.processAndUpload(file, collectionId, {
  chunkingOptions: {
    maxTokenSize: 2000,
    similarityThreshold: 0.3
  },
  batchOptions: {
    initialBatchSize: 3,
    maxPayloadSizeMB: 2
  }
});
```
#### Method 3: Default Import
```javascript
import VectorGroundingService from '@timangames/vector-grounding-service';

const { DocumentProcessor, SapAiService } = VectorGroundingService;
const documentProcessor = new DocumentProcessor();
```
### API Reference
#### DocumentProcessor Class
- `processFile(file)` - Extract text from PDF, DOCX, TXT, CSV, and XLSX/XLS files
- `chunkText(textContent, fileName, options)` - Chunk text using semantic chunking
- `processFileForGrounding(file, chunkingOptions)` - Complete file processing pipeline
- `getSupportedTypes()` - Get list of supported file types
- `isValidFileType(filename)` - Check if file type is supported
#### SapAiService Class
- `createCollection(collectionData)` - Create a new collection
- `getCollections(options)` - Get all collections
- `getCollection(collectionId)` - Get specific collection
- `deleteCollection(collectionId)` - Delete a collection
- `createDocuments(collectionId, documents)` - Create documents
- `createDocumentsInBatches(collectionId, documents, options)` - Batch upload with error handling
- `getDocuments(collectionId, options)` - Get documents in collection
- `getDocument(collectionId, documentId)` - Get specific document
- `updateDocument(collectionId, documentId, documentData)` - Update a document
- `deleteDocument(collectionId, documentId)` - Delete a document
- `vectorSearch(query, options)` - Perform vector search
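To see a few of these methods working together, here is a hedged sketch; the exact option and response shapes may differ in your installed version:

```javascript
import { SapAiService } from '@timangames/vector-grounding-service';

const sapAi = new SapAiService();

// Create a collection, then search it (assumes an ApiResponse-style { data } wrapper)
const created = await sapAi.createCollection({ title: 'Knowledge Base' });
const collectionId = created.data.id;

const search = await sapAi.vectorSearch('What is machine learning?', {
  collectionIds: [collectionId]
});
```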
### File Object Format
The file object should have this structure (compatible with multer):
```javascript
const file = {
  originalname: 'document.pdf',
  buffer: Buffer.from(fileContent),
  size: fileSize
};
```
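For example, a multer-compatible file object can be built from a file on disk; a minimal sketch using Node's built-in `fs` module:

```javascript
import { readFileSync } from 'fs';

// Read the file into a Buffer and mirror multer's shape
const buffer = readFileSync('./documents/report.pdf');

const file = {
  originalname: 'report.pdf',
  buffer,
  size: buffer.length
};
```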
### Environment Setup
When using as a library, make sure to set these environment variables in your project:
```env
SAP_AI_CORE_RESOURCE_GROUP=your-resource-group
DEFAULT_MAX_TOKEN_SIZE=2000
DEFAULT_SIMILARITY_THRESHOLD=0.3
DEFAULT_EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2
MAX_CHUNKS_PER_DOCUMENT=50
MAX_DOCUMENT_SIZE_MB=5
```
### More Examples
See `example-usage.js` and `test-library.js` in the repository for complete working examples.
## Authentication
The service uses the official SAP AI SDK, which automatically handles authentication using the service key provided in the `AICORE_SERVICE_KEY` environment variable. No additional authentication setup is required.
The SDK will:
- Automatically extract credentials from the service key
- Handle OAuth2 token management
- Manage token refresh automatically
- Pass the AI-Resource-Group header correctly
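In practice, the only setup is making sure the environment variables are loaded before the service is instantiated; a minimal sketch, assuming you use `dotenv`:

```javascript
import 'dotenv/config'; // loads AICORE_SERVICE_KEY and SAP_AI_CORE_RESOURCE_GROUP from .env
import { SapAiService } from '@timangames/vector-grounding-service';

// No explicit credentials are passed; the SAP AI SDK picks them up from the environment
const sapAiService = new SapAiService();
const collections = await sapAiService.getCollections();
```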
## API Documentation
### Collections
#### Create Collection
```http
POST /api/collections
Content-Type: application/json

{
  "title": "My Document Collection",
  "embeddingConfig": {
    "modelName": "text-embedding-3-small"
  },
  "metadata": [
    {
      "key": "purpose",
      "value": ["knowledge-base"]
    }
  ]
}
```
#### Get All Collections
```http
GET /api/collections?$top=10&$skip=0&$count=true
```
#### Get Collection
```http
GET /api/collections/{collectionId}
```
#### Delete Collection
```http
DELETE /api/collections/{collectionId}
```
### Documents
#### Upload Documents
```http
POST /api/collections/{collectionId}/documents
Content-Type: multipart/form-data

files: [file1.pdf, file2.docx, file3.txt]
maxTokenSize: 500
similarityThreshold: 0.5
embeddingModel: Xenova/all-MiniLM-L6-v2
```
#### Get Documents
```http
GET /api/collections/{collectionId}/documents?$top=10&$skip=0
```
#### Get Document
```http
GET /api/collections/{collectionId}/documents/{documentId}
```
#### Update Document
```http
PATCH /api/collections/{collectionId}/documents/{documentId}
Content-Type: application/json

{
  "documents": [
    {
      "id": "document-id",
      "metadata": [
        {
          "key": "updated",
          "value": ["2024-01-01"]
        }
      ],
      "chunks": [
        {
          "content": "Updated document content",
          "metadata": [
            {
              "key": "index",
              "value": ["1"]
            }
          ]
        }
      ]
    }
  ]
}
```
#### Delete Document
```http
DELETE /api/collections/{collectionId}/documents/{documentId}
```
### Search
#### Vector Search
```http
POST /api/search
Content-Type: application/json

{
  "query": "What is machine learning?",
  "filters": [
    {
      "id": "my-filter",
      "collectionIds": ["*"],
      "configuration": {
        "maxChunkCount": 10
      },
      "documentMetadata": [
        {
          "key": "fileType",
          "value": ["pdf"]
        }
      ]
    }
  ]
}
```
## Configuration
### Environment Variables
| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `SAP_AI_CORE_RESOURCE_GROUP` | SAP AI Core resource group | - | ✅ |
| `AICORE_SERVICE_KEY` | Complete SAP AI Core service key JSON | - | ✅ |
| `PORT` | Server port | 3000 | ❌ |
| `NODE_ENV` | Environment | development | ❌ |
| `MAX_FILE_SIZE` | Max upload size (bytes) | 10485760 (10 MB) | ❌ |
| `DEFAULT_MAX_TOKEN_SIZE` | Default chunk size | 500 | ❌ |
| `DEFAULT_SIMILARITY_THRESHOLD` | Default similarity threshold | 0.5 | ❌ |
| `DEFAULT_EMBEDDING_MODEL` | Default embedding model | Xenova/all-MiniLM-L6-v2 | ❌ |
### Getting Your SAP AI Core Service Key
1. Go to SAP BTP Cockpit
2. Navigate to your subaccount
3. Go to Services → Instances and Subscriptions
4. Find your AI Core service instance
5. Click on the service key
6. Copy the entire JSON object and paste it as the value for `AICORE_SERVICE_KEY`
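If your AI Core instance lives in a Cloud Foundry space, the same key can also be read from the CF CLI; the instance and key names below are placeholders:

```bash
cf service-key my-aicore-instance my-service-key
```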
### Supported File Types
- **PDF** (`.pdf`) - Extracted using `pdf-parse` with dynamic imports
- **Word Documents** (`.docx`) - Extracted using `mammoth`
- **Text Files** (`.txt`) - Direct text processing
- **CSV Files** (`.csv`) - Parsed and converted to structured text format
- **Excel Files** (`.xlsx`, `.xls`) - All sheets processed with structured data extraction
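You can check support programmatically before processing; a short sketch using the `DocumentProcessor` helpers listed earlier (the exact return shapes are assumptions):

```javascript
import { DocumentProcessor } from '@timangames/vector-grounding-service';

const processor = new DocumentProcessor();

console.log(processor.getSupportedTypes());           // e.g. the list of supported extensions
console.log(processor.isValidFileType('report.pdf')); // true for supported types
```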
### Semantic Chunking
The service uses semantic chunking to create meaningful text segments; the available options are listed below (a usage sketch follows the list):
- `maxTokenSize`: Maximum tokens per chunk (50-2500)
- `similarityThreshold`: Similarity threshold for grouping (0.1-1.0)
- `embeddingModel`: Model for generating embeddings
- `returnEmbedding`: Include embeddings in response
- `returnTokenLength`: Include token counts
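A hedged sketch of passing these options to `chunkText`; option names follow the list above, but the exact return shape may vary by version:

```javascript
import { DocumentProcessor } from '@timangames/vector-grounding-service';

const processor = new DocumentProcessor();
const textContent = 'Long document text extracted earlier...';

const chunks = await processor.chunkText(textContent, 'report.pdf', {
  maxTokenSize: 500,
  similarityThreshold: 0.5,
  embeddingModel: 'Xenova/all-MiniLM-L6-v2',
  returnEmbedding: false,
  returnTokenLength: true
});
```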
## Development
### Development Mode
```bash
npm run dev
```
### Available Scripts
| Script | Command | Description |
|--------|---------|-------------|
| `npm start` | `node src/index.js` | Start the production server |
| `npm run dev` | `node --watch src/index.js` | Start development server with auto-reload |
| `npm test` | Currently shows placeholder message | Run tests (use individual test files instead) |
| `npm run upload` | `node scripts/upload-files.js` | Upload files via command line |
| `npm run upload:example` | `node scripts/example-upload.js` | Run example upload script |
| `npm run upload:help` | Show upload script help | Display upload command options |
**Note:** The `npm test` command currently shows a placeholder. Use the individual test files directly (e.g., `node test-service.js`) for testing functionality.
### Project Structure
```
src/
├── controllers/              # Request handlers
│   ├── collectionsController.js
│   ├── documentsController.js
│   └── searchController.js
├── services/                 # Business logic
│   ├── sapAiService.js       # SAP AI SDK wrapper
│   └── documentProcessor.js  # File processing & chunking
├── middleware/               # Express middleware
│   ├── errorHandler.js
│   └── requestLogger.js
├── routes/                   # API routes
│   ├── collections.js
│   ├── documents.js
│   └── search.js
└── index.js                  # Application entry point
scripts/                      # Utility scripts
├── upload-files.js           # Command-line file upload utility
└── example-upload.js         # Programmatic upload example
test/                         # Test data and files
└── data/                     # Sample test files
Root-level test files:
├── test-service.js                    # Main service test suite
├── test-batching-approach.js          # Batching functionality tests
├── test-improved-chunking.js          # Chunking algorithm tests
├── test-large-file.js                 # Large file processing tests
├── test-optimized-large-file.js       # Optimized large file tests
└── test-splitting-configurations.js   # Text splitting configuration tests
```
### Key Components
- **SAP AI Service**: Simple wrapper for SAP AI SDK document grounding operations using the official pattern
- **Document Processor**: Handles file parsing and semantic chunking with dynamic imports
- **Controllers**: Handle HTTP requests and responses with proper error handling
- **Middleware**: Error handling, logging, and request validation
### SAP AI SDK Integration
The service uses the official SAP AI SDK pattern:
```javascript
import { VectorApi } from '@sap-ai-sdk/document-grounding';

// Simple API calls with AI-Resource-Group header
const response = await VectorApi.getAllCollections(options, {
  'AI-Resource-Group': resourceGroup
}).execute();
```
Key benefits:
- ✅ Automatic authentication handling
- ✅ Built-in token management
- ✅ Proper header passing
- ✅ No complex destination configuration needed
## Error Handling
The service provides comprehensive error handling with structured responses:
```json
{
  "error": "Bad Request",
  "message": "Collection title is required",
  "timestamp": "2024-01-01T12:00:00.000Z",
  "path": "/api/collections",
  "method": "POST"
}
```
Common error scenarios:
- Invalid service key configuration
- Missing AI-Resource-Group
- File upload size limits
- Unsupported file types
- SAP AI Core API errors
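A client can rely on that structured shape when reporting failures; a minimal sketch using `fetch` against the response format shown above:

```javascript
const response = await fetch('http://localhost:3000/api/collections', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({}) // missing "title" triggers a 400 Bad Request
});

if (!response.ok) {
  const err = await response.json();
  console.error(`${err.error}: ${err.message} (${err.method} ${err.path})`);
}
```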
## Health Check
Check service health:
```http
GET /health
```
Response:
```json
{
  "status": "healthy",
  "timestamp": "2024-01-01T12:00:00.000Z",
  "version": "1.0.0"
}
```
## File Upload Scripts
The service includes convenient scripts for uploading files programmatically:
### Command-Line Upload
Upload files directly from the command line:
```bash
# Using npm scripts
npm run upload <collection-id> <file1> [file2] [file3] ...
npm run upload:help   # Show help

# Direct usage
node scripts/upload-files.js <collection-id> <file1> [file2] [file3] ...
```
#### Examples
```bash
# Upload single file
npm run upload my-collection-id ./documents/report.pdf

# Upload multiple files
npm run upload my-collection-id ./docs/file1.pdf ./docs/file2.docx ./docs/file3.txt

# Upload with custom chunking options
node scripts/upload-files.js my-collection-id ./docs/report.pdf \
  --max-token-size 1024 \
  --similarity-threshold 0.7 \
  --service-url http://localhost:3000

# Upload with strict validation
node scripts/upload-files.js my-collection-id ./docs/*.pdf --strict
```
#### Command Line Options
| Option | Description | Default |
|--------|-------------|---------|
| `--max-token-size` | Maximum tokens per chunk | 512 |
| `--similarity-threshold` | Similarity threshold (0-1) | 0.5 |
| `--embedding-model` | ONNX embedding model | - |
| `--service-url` | Service URL | http://localhost:3000 |
| `--strict` | Fail on any file validation error | false |
| `--help` | Show help message | - |
### Programmatic Upload
For programmatic usage, use the example script:
```bash
# Run the example (customize first)
npm run upload:example

# Show customization instructions
node scripts/example-upload.js --instructions
```
#### Customizing the Example Script
Edit `scripts/example-upload.js`:
```javascript
// Update file paths
const filePaths = [
  './path/to/your/document.pdf',
  './path/to/your/report.docx',
  './path/to/your/notes.txt'
];

// Set your collection ID
const collectionId = 'my-actual-collection-id';

// Configure options
const options = {
  maxTokenSize: 512,
  similarityThreshold: 0.5,
  serviceUrl: 'http://localhost:3000',
  strict: false
};
```
#### Using the `FileUploader` Class
You can also import and use the `FileUploader` class directly:
```javascript
import FileUploader from './scripts/upload-files.js';

const uploader = new FileUploader('http://localhost:3000');

const result = await uploader.uploadFiles(
  'my-collection-id',
  ['./file1.pdf', './file2.docx'],
  {
    maxTokenSize: 512,
    similarityThreshold: 0.5
  }
);

console.log(`Uploaded ${result.processedFiles.length} files`);
```
### Script Features
- ✅ **File Validation**: Checks file existence and supported types
- ✅ **Progress Tracking**: Shows upload progress with emojis
- ✅ **Error Handling**: Graceful error handling with troubleshooting tips
- ✅ **Flexible Options**: Customizable chunking and processing options
- ✅ **Batch Processing**: Upload multiple files in a single request
- ✅ **Detailed Results**: Shows chunk counts and processing results
### Supported File Types
The upload scripts support the same file types as the API:
- PDF (`.pdf`)
- Word Documents (`.docx`)
- Text Files (`.txt`)
- CSV Files (`.csv`)
- Excel Files (`.xlsx`, `.xls`)
## Testing
### Running the Test Suite
The service includes a comprehensive test suite to verify functionality:
```bash
# Run the main test suite
node test-service.js
```
This test suite will:
- ✅ Check service health
- ✅ Test collection creation and retrieval
- ✅ Test document upload with sample content
- ✅ Test vector search functionality
### Specialized Tests
The project includes specialized test files for different aspects:
```bash
# Test chunking algorithms and configurations
node test-improved-chunking.js
node test-splitting-configurations.js

# Test large file processing
node test-large-file.js
node test-optimized-large-file.js

# Test batching approaches
node test-batching-approach.js
```
### Manual Testing
Test the service manually with curl:
```bash
# Get all collections
curl -X GET "http://localhost:3000/api/collections" \
  -H "Content-Type: application/json"

# Get specific collection
curl -X GET "http://localhost:3000/api/collections/{collection-id}" \
  -H "Content-Type: application/json"

# Create collection
curl -X POST "http://localhost:3000/api/collections" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Test Collection",
    "embeddingConfig": {
      "modelName": "text-embedding-3-small"
    }
  }'

# Upload files using curl
curl -X POST "http://localhost:3000/api/collections/{collection-id}/documents" \
  -F "files=@./documents/report.pdf" \
  -F "files=@./documents/manual.docx" \
  -F "maxTokenSize=512" \
  -F "similarityThreshold=0.5"
```
### Test Data
Sample test files are available in the `test/data/` directory for testing document processing functionality.
## TypeScript Support
This package includes comprehensive TypeScript definitions with strict type safety, providing an excellent developer experience when using the Vector Grounding Service in TypeScript projects. All types are fully specified without resorting to `any`, ensuring maximum type safety and IntelliSense support.
### Installation
The TypeScript definitions are included automatically when you install the package:
```bash
npm install @timangames/vector-grounding-service
```
For development, you may also want to install the Express and Multer types:
```bash
npm install --save-dev @types/express @types/multer
```
### Usage Examples
#### Basic Import and Setup
```typescript
import {
  DocumentProcessor,
  SapAiService,
  createVectorGroundingService,
  ChunkingOptions,
  BatchOptions
} from '@timangames/vector-grounding-service';

// Create service instances
const documentProcessor = new DocumentProcessor();
const sapAiService = new SapAiService();
const vectorService = createVectorGroundingService();
```
#### Processing Files
```typescript
import { DocumentProcessor, MulterFile, ChunkingOptions, ProcessingResult } from '@timangames/vector-grounding-service';

async function processDocument(file: MulterFile): Promise<ProcessingResult> {
  const processor = new DocumentProcessor();

  const chunkingOptions: ChunkingOptions = {
    maxTokenSize: 2000,
    similarityThreshold: 0.3,
    logging: true
  };

  return await processor.processFileForGrounding(file, chunkingOptions);
}
```
#### Collections, Uploads, and Search
```typescript
import {
  SapAiService,
  CollectionData,
  BatchOptions,
  ApiResponse,
  Collection,
  GetCollectionsOptions,
  VectorSearchResponse,
  SearchOptions
} from '@timangames/vector-grounding-service';

async function uploadDocuments() {
  const sapAi = new SapAiService();

  // Create a collection with full type safety
  const collectionData: CollectionData = {
    name: "My Document Collection",
    description: "Collection for RAG documents",
    metadata: {
      "project": "my-project",
      "version": "1.0"
    },
    embeddingModel: "text-embedding-ada-002",
    vectorDimension: 1536
  };

  // Properly typed API response
  const collectionResponse: ApiResponse<Collection> = await sapAi.createCollection(collectionData);
  const collection = collectionResponse.data;

  // Upload documents with batch options
  // (`documents` is a SapDocument[] prepared earlier, e.g. by DocumentProcessor)
  const batchOptions: BatchOptions = {
    initialBatchSize: 3,
    maxPayloadSizeMB: 2,
    maxRetries: 3
  };

  const result = await sapAi.createDocumentsInBatches(
    collection.id,
    documents,
    batchOptions
  );

  // Perform vector search with typed options and response
  const searchOptions: SearchOptions = {
    collectionIds: [collection.id],
    limit: 10,
    threshold: 0.7,
    includeMetadata: true
  };

  const searchResponse: ApiResponse<VectorSearchResponse> = await sapAi.vectorSearch(
    "search query",
    searchOptions
  );

  console.log(`Found ${searchResponse.data.results.length} results`);
}
```
#### Convenience Service
```typescript
import { createVectorGroundingService, MulterFile } from '@timangames/vector-grounding-service';

async function processAndUpload(file: MulterFile, collectionId: string) {
  const service = createVectorGroundingService();

  const result = await service.processAndUpload(file, collectionId, {
    chunkingOptions: {
      maxTokenSize: 1500,
      similarityThreshold: 0.4
    },
    batchOptions: {
      initialBatchSize: 5,
      maxPayloadSizeMB: 3
    }
  });

  console.log(`Processed ${result.chunkCount} chunks in ${result.documentCount} documents`);
  console.log(`Uploaded ${result.uploadResult.totalDocuments} documents in ${result.uploadResult.totalBatches} batches`);
}
```
### Available Types
#### Core Types
- `MulterFile` - File object interface compatible with Express.js Multer
- `FileMetadata` - Metadata extracted from processed files with specific PDF/DOCX info types
- `PdfInfo` - Structured PDF metadata (title, author, dates, etc.)
- `DocxMessage` - Typed DOCX processing messages (warning/error/info)
- `ChunkingOptions` - Configuration for text chunking
- `BatchOptions` - Configuration for batch uploading
- `ProcessingResult` - Result of file processing operations
- `BatchUploadResult` - Result of batch upload operations with typed responses
#### SAP AI Core Types
- `SapDocument` - Document structure for SAP AI Core
- `SapChunk` - Individual chunk within a document
- `MetadataItem` - Key-value metadata structure
- `CollectionData` - Collection creation data with metadata and embedding options
- `Collection` - Full collection object with timestamps and counts
- `Document` - Complete document object with metadata and chunks
- `SearchOptions` - Vector search configuration with specific options
- `SearchResult` - Individual search result with score and metadata
- `VectorSearchResponse` - Complete search response with results and execution info
#### API Response Types
- `ApiResponse<T>` - Generic API response wrapper with status and data
- `DocumentCreationResponse` - Response for document creation operations
- `UpdateDocumentData` - Data structure for document updates
#### Configuration Types
- `VectorGroundingServiceConfig` - Configuration for the convenience service
- `GetCollectionsOptions` - Options for fetching collections
- `GetDocumentsOptions` - Options for fetching documents
#### Service Classes
- `DocumentProcessor` - File processing and chunking service
- `SapAiService` - SAP AI Core integration service with fully typed methods
- `VectorGroundingService` - Combined convenience service
### Type Safety Features
- ✅ **No `any` types** - All interfaces use specific, well-defined types
- ✅ **Strict API responses** - All API methods return properly typed `ApiResponse` objects
- ✅ **Comprehensive options** - All configuration objects have specific properties
- ✅ **Union types** - Enums and literal types for better validation (e.g., `'warning' | 'error' | 'info'`)
- ✅ **Generic types** - Flexible yet type-safe generic interfaces where appropriate
- ✅ **Optional properties** - Clear distinction between required and optional fields
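As an illustration of how such a literal union narrows in a switch, here is a sketch with a local type mirroring `DocxMessage`; the `type` and `message` field names are assumptions, only the `'warning' | 'error' | 'info'` union is documented:

```typescript
// Hypothetical shape mirroring DocxMessage, for illustration only
type Message = { type: 'warning' | 'error' | 'info'; message: string };

function logDocxMessage(msg: Message) {
  switch (msg.type) {
    case 'error':
      console.error(msg.message);
      break;
    case 'warning':
      console.warn(msg.message);
      break;
    case 'info':
      console.info(msg.message);
      break;
  }
}
```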
### Type Validation
The package includes scripts to validate type definitions:
```bash
# Validate type definitions
npm run build:types

# Types are automatically validated before publishing
npm run prepublishOnly
```
**Note:** The type definitions are manually crafted (not auto-generated) and validated before publishing to ensure they remain accurate and comprehensive.
### Development
When contributing to this package, ensure that:
1. All public methods have proper type annotations
2. Interfaces are exported for consumer use
3. Type definitions are tested with `npx tsc --noEmit`
4. Documentation includes TypeScript examples
The type definitions are located in `types/index.d.ts` and are automatically included when the package is installed.
## Troubleshooting
### Common Issues
1. "Missing header parameter 'AI-Resource-Group'"
- Ensure SAP_AI_CORE_RESOURCE_GROUP is set in your .env file
- Verify the resource group exists in your SAP AI Core instance
2. Authentication errors
- Check that AICORE_SERVICE_KEY contains valid JSON
- Verify the service key has the necessary permissions
- Ensure the service key is not expired
3. "Service key not found"
- Make sure AICORE_SERVICE_KEY is properly formatted JSON
- Check that all required fields are present in the service key
4. File upload errors
- Check file size limits (MAX_FILE_SIZE`)