A REST wrapper for SAP AI Core Vector API with document grounding capabilities
## Installation

```bash
npm install @timangames/vector-grounding-service
```

Semantic chunking is powered by the `semantic-chunking` library.

## Local Setup

1. Clone the repository:

```bash
git clone <repository-url>
cd vector-grounding-service
```
2. Install dependencies:
```bash
npm install
```
3. Configure environment variables:
```bash
cp .env.example .env
```
Edit `.env` with your SAP AI Core service key:

```env
# SAP AI Core Configuration
SAP_AI_CORE_RESOURCE_GROUP=your-resource-group

# SAP AI Core Service Key (complete JSON from SAP BTP)
AICORE_SERVICE_KEY='{
  "clientid": "your-client-id",
  "clientsecret": "your-client-secret",
  "url": "https://your-auth-url.authentication.region.hana.ondemand.com",
  "identityzone": "your-identity-zone",
  "identityzoneid": "your-identity-zone-id",
  "appname": "your-app-name",
  "credential-type": "binding-secret",
  "serviceurls": {
    "AI_API_URL": "https://api.ai.prod.region.aws.ml.hana.ondemand.com"
  }
}'

# Server Configuration
PORT=3000
NODE_ENV=development
```
4. Start the service:
```bash
npm start
```
The service will be available at `http://localhost:3000`.
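To confirm the service started correctly, you can hit the health endpoint (documented under Health Check below):

```bash
curl http://localhost:3000/health
```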
## Using as an npm Package
You can also use this service as a library in your own Node.js projects:
### Installation
```bash
npm install @timangames/vector-grounding-service
```
### Module Compatibility
This package supports both ES Modules and CommonJS for maximum compatibility. See COMPATIBILITY.md for detailed usage instructions.
#### ES Module Usage (Recommended)
```javascript
import { DocumentProcessor, SapAiService } from '@timangames/vector-grounding-service';

// Create instances
const documentProcessor = new DocumentProcessor();
const sapAiService = new SapAiService();

// Process a file
const result = await documentProcessor.processFileForGrounding(file);

// Upload to SAP AI Core
const uploadResult = await sapAiService.createDocumentsInBatches(
  collectionId,
  result.documents
);
```
#### CommonJS Usage
```javascript
const { load } = require('@timangames/vector-grounding-service');

async function main() {
  const { DocumentProcessor, SapAiService } = await load();
  const documentProcessor = new DocumentProcessor();
  const sapAiService = new SapAiService();
  // Use the services...
}
```
### Usage Methods
#### Method 1: Import Individual Classes
```javascript
import { DocumentProcessor, SapAiService } from '@timangames/vector-grounding-service';

// Create instances
const documentProcessor = new DocumentProcessor();
const sapAiService = new SapAiService();

// Process a file
const result = await documentProcessor.processFileForGrounding(file);

// Upload to SAP AI Core
const uploadResult = await sapAiService.createDocumentsInBatches(
  collectionId,
  result.documents
);
```
#### Method 2: Use Convenience Function
```javascript
import { createVectorGroundingService } from '@timangames/vector-grounding-service';

const service = createVectorGroundingService();

// Process and upload in one step
const result = await service.processAndUpload(file, collectionId, {
  chunkingOptions: {
    maxTokenSize: 2000,
    similarityThreshold: 0.3
  },
  batchOptions: {
    initialBatchSize: 3,
    maxPayloadSizeMB: 2
  }
});
```
#### Method 3: Default Import
```javascript
import VectorGroundingService from '@timangames/vector-grounding-service';

const { DocumentProcessor, SapAiService } = VectorGroundingService;
const documentProcessor = new DocumentProcessor();
```
### API Reference
#### DocumentProcessor Class
- `processFile(file)` - Extract text from PDF, DOCX, TXT, CSV, and XLSX/XLS files
- `chunkText(textContent, fileName, options)` - Chunk text using semantic chunking
- `processFileForGrounding(file, chunkingOptions)` - Complete file processing pipeline
- `getSupportedTypes()` - Get list of supported file types
- `isValidFileType(filename)` - Check if file type is supported
#### SapAiService Class
- `createCollection(collectionData)` - Create a new collection
- `getCollections(options)` - Get all collections
- `getCollection(collectionId)` - Get specific collection
- `deleteCollection(collectionId)` - Delete a collection
- `createDocuments(collectionId, documents)` - Create documents
- `createDocumentsInBatches(collectionId, documents, options)` - Batch upload with error handling
- `getDocuments(collectionId, options)` - Get documents in collection
- `getDocument(collectionId, documentId)` - Get specific document
- `updateDocument(collectionId, documentId, documentData)` - Update a document
- `deleteDocument(collectionId, documentId)` - Delete a document
- `vectorSearch(query, options)` - Perform vector search
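To see a few of these methods working together, here is a hedged sketch; the exact option and response shapes may differ in your installed version:

```javascript
import { SapAiService } from '@timangames/vector-grounding-service';

const sapAi = new SapAiService();

// Create a collection, then search it (assumes an ApiResponse-style { data } wrapper)
const created = await sapAi.createCollection({ title: 'Knowledge Base' });
const collectionId = created.data.id;

const search = await sapAi.vectorSearch('What is machine learning?', {
  collectionIds: [collectionId]
});
```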
### File Object Format
The file object should have this structure (compatible with multer):
```javascript
const file = {
  originalname: 'document.pdf',
  buffer: Buffer.from(fileContent),
  size: fileSize
};
```
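For example, a multer-compatible file object can be built from a file on disk; a minimal sketch using Node's built-in `fs` module:

```javascript
import { readFileSync } from 'fs';

// Read the file into a Buffer and mirror multer's shape
const buffer = readFileSync('./documents/report.pdf');

const file = {
  originalname: 'report.pdf',
  buffer,
  size: buffer.length
};
```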
### Environment Setup
When using as a library, make sure to set these environment variables in your project:
```env
SAP_AI_CORE_RESOURCE_GROUP=your-resource-group
DEFAULT_MAX_TOKEN_SIZE=2000
DEFAULT_SIMILARITY_THRESHOLD=0.3
DEFAULT_EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2
MAX_CHUNKS_PER_DOCUMENT=50
MAX_DOCUMENT_SIZE_MB=5
```
### More Examples
See `example-usage.js` and `test-library.js` in the repository for complete working examples.
## Authentication
The service uses the official SAP AI SDK, which automatically handles authentication using the service key provided in the `AICORE_SERVICE_KEY` environment variable. No additional authentication setup is required.
The SDK will:
- Automatically extract credentials from the service key
- Handle OAuth2 token management
- Manage token refresh automatically
- Pass the AI-Resource-Group header correctly
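In practice, the only setup is making sure the environment variables are loaded before the service is instantiated; a minimal sketch, assuming you use `dotenv`:

```javascript
import 'dotenv/config'; // loads AICORE_SERVICE_KEY and SAP_AI_CORE_RESOURCE_GROUP from .env
import { SapAiService } from '@timangames/vector-grounding-service';

// No explicit credentials are passed; the SAP AI SDK picks them up from the environment
const sapAiService = new SapAiService();
const collections = await sapAiService.getCollections();
```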
## API Documentation
### Collections
#### Create Collection
```http
POST /api/collections
Content-Type: application/json

{
  "title": "My Document Collection",
  "embeddingConfig": {
    "modelName": "text-embedding-3-small"
  },
  "metadata": [
    {
      "key": "purpose",
      "value": ["knowledge-base"]
    }
  ]
}
```
#### Get All Collections
```http
GET /api/collections?$top=10&$skip=0&$count=true
```
#### Get Collection
```http
GET /api/collections/{collectionId}
```
#### Delete Collection
```http
DELETE /api/collections/{collectionId}
```
### Documents
#### Upload Documents
```http
POST /api/collections/{collectionId}/documents
Content-Type: multipart/form-data

files: [file1.pdf, file2.docx, file3.txt]
maxTokenSize: 500
similarityThreshold: 0.5
embeddingModel: Xenova/all-MiniLM-L6-v2
```
#### Get Documents
```http
GET /api/collections/{collectionId}/documents?$top=10&$skip=0
```
#### Get Document
```http
GET /api/collections/{collectionId}/documents/{documentId}
```
#### Update Document
```http
PATCH /api/collections/{collectionId}/documents/{documentId}
Content-Type: application/json

{
  "documents": [
    {
      "id": "document-id",
      "metadata": [
        {
          "key": "updated",
          "value": ["2024-01-01"]
        }
      ],
      "chunks": [
        {
          "content": "Updated document content",
          "metadata": [
            {
              "key": "index",
              "value": ["1"]
            }
          ]
        }
      ]
    }
  ]
}
```
#### Delete Document
```http
DELETE /api/collections/{collectionId}/documents/{documentId}
```
### Search
#### Vector Search
```http
POST /api/search
Content-Type: application/json

{
  "query": "What is machine learning?",
  "filters": [
    {
      "id": "my-filter",
      "collectionIds": ["*"],
      "configuration": {
        "maxChunkCount": 10
      },
      "documentMetadata": [
        {
          "key": "fileType",
          "value": ["pdf"]
        }
      ]
    }
  ]
}
```
## Configuration
### Environment Variables
| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `SAP_AI_CORE_RESOURCE_GROUP` | SAP AI Core resource group | - | ✅ |
| `AICORE_SERVICE_KEY` | Complete SAP AI Core service key JSON | - | ✅ |
| `PORT` | Server port | 3000 | ❌ |
| `NODE_ENV` | Environment | development | ❌ |
| `MAX_FILE_SIZE` | Max upload size (bytes) | 10485760 (10 MB) | ❌ |
| `DEFAULT_MAX_TOKEN_SIZE` | Default chunk size | 500 | ❌ |
| `DEFAULT_SIMILARITY_THRESHOLD` | Default similarity threshold | 0.5 | ❌ |
| `DEFAULT_EMBEDDING_MODEL` | Default embedding model | Xenova/all-MiniLM-L6-v2 | ❌ |
### Getting Your SAP AI Core Service Key
1. Go to SAP BTP Cockpit
2. Navigate to your subaccount
3. Go to Services → Instances and Subscriptions
4. Find your AI Core service instance
5. Click on the service key
6. Copy the entire JSON object and paste it as the value for `AICORE_SERVICE_KEY`
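If your AI Core instance lives in a Cloud Foundry space, the same key can also be read from the CF CLI; the instance and key names below are placeholders:

```bash
cf service-key my-aicore-instance my-service-key
```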
### Supported File Types
- **PDF** (`.pdf`) - Extracted using `pdf-parse` with dynamic imports
- **Word Documents** (`.docx`) - Extracted using `mammoth`
- **Text Files** (`.txt`) - Direct text processing
- **CSV Files** (`.csv`) - Parsed and converted to structured text format
- **Excel Files** (`.xlsx`, `.xls`) - All sheets processed with structured data extraction
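You can check support programmatically before processing; a short sketch using the `DocumentProcessor` helpers listed earlier (the exact return shapes are assumptions):

```javascript
import { DocumentProcessor } from '@timangames/vector-grounding-service';

const processor = new DocumentProcessor();

console.log(processor.getSupportedTypes());           // e.g. the list of supported extensions
console.log(processor.isValidFileType('report.pdf')); // true for supported types
```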
### Semantic Chunking
The service uses semantic chunking to create meaningful text segments; the available options are listed below (a usage sketch follows the list):
- `maxTokenSize`: Maximum tokens per chunk (50-2500)
- `similarityThreshold`: Similarity threshold for grouping (0.1-1.0)
- `embeddingModel`: Model for generating embeddings
- `returnEmbedding`: Include embeddings in response
- `returnTokenLength`: Include token counts
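A hedged sketch of passing these options to `chunkText`; option names follow the list above, but the exact return shape may vary by version:

```javascript
import { DocumentProcessor } from '@timangames/vector-grounding-service';

const processor = new DocumentProcessor();
const textContent = 'Long document text extracted earlier...';

const chunks = await processor.chunkText(textContent, 'report.pdf', {
  maxTokenSize: 500,
  similarityThreshold: 0.5,
  embeddingModel: 'Xenova/all-MiniLM-L6-v2',
  returnEmbedding: false,
  returnTokenLength: true
});
```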
## Development
### Development Mode
```bash
npm run dev
```
### Available Scripts
| Script | Command | Description |
|--------|---------|-------------|
| `npm start` | `node src/index.js` | Start the production server |
| `npm run dev` | `node --watch src/index.js` | Start development server with auto-reload |
| `npm test` | Currently shows placeholder message | Run tests (use individual test files instead) |
| `npm run upload` | `node scripts/upload-files.js` | Upload files via command line |
| `npm run upload:example` | `node scripts/example-upload.js` | Run example upload script |
| `npm run upload:help` | Show upload script help | Display upload command options |
**Note:** The `npm test` command currently shows a placeholder. Use the individual test files directly (e.g., `node test-service.js`) for testing functionality.
### Project Structure
```
src/
├── controllers/              # Request handlers
│   ├── collectionsController.js
│   ├── documentsController.js
│   └── searchController.js
├── services/                 # Business logic
│   ├── sapAiService.js       # SAP AI SDK wrapper
│   └── documentProcessor.js  # File processing & chunking
├── middleware/               # Express middleware
│   ├── errorHandler.js
│   └── requestLogger.js
├── routes/                   # API routes
│   ├── collections.js
│   ├── documents.js
│   └── search.js
└── index.js                  # Application entry point
scripts/                      # Utility scripts
├── upload-files.js           # Command-line file upload utility
└── example-upload.js         # Programmatic upload example
test/                         # Test data and files
└── data/                     # Sample test files
Root-level test files:
├── test-service.js                    # Main service test suite
├── test-batching-approach.js          # Batching functionality tests
├── test-improved-chunking.js          # Chunking algorithm tests
├── test-large-file.js                 # Large file processing tests
├── test-optimized-large-file.js       # Optimized large file tests
└── test-splitting-configurations.js   # Text splitting configuration tests
```
### Key Components
- **SAP AI Service**: Simple wrapper for SAP AI SDK document grounding operations using the official pattern
- **Document Processor**: Handles file parsing and semantic chunking with dynamic imports
- **Controllers**: Handle HTTP requests and responses with proper error handling
- **Middleware**: Error handling, logging, and request validation
### SAP AI SDK Integration
The service uses the official SAP AI SDK pattern:
```javascript
import { VectorApi } from '@sap-ai-sdk/document-grounding';

// Simple API calls with AI-Resource-Group header
const response = await VectorApi.getAllCollections(options, {
  'AI-Resource-Group': resourceGroup
}).execute();
```
Key benefits:
- ✅ Automatic authentication handling
- ✅ Built-in token management
- ✅ Proper header passing
- ✅ No complex destination configuration needed
## Error Handling
The service provides comprehensive error handling with structured responses:
```json
{
  "error": "Bad Request",
  "message": "Collection title is required",
  "timestamp": "2024-01-01T12:00:00.000Z",
  "path": "/api/collections",
  "method": "POST"
}
```
Common error scenarios:
- Invalid service key configuration
- Missing AI-Resource-Group
- File upload size limits
- Unsupported file types
- SAP AI Core API errors
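A client can rely on that structured shape when reporting failures; a minimal sketch using `fetch` against the response format shown above:

```javascript
const response = await fetch('http://localhost:3000/api/collections', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({}) // missing "title" triggers a 400 Bad Request
});

if (!response.ok) {
  const err = await response.json();
  console.error(`${err.error}: ${err.message} (${err.method} ${err.path})`);
}
```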
## Health Check
Check service health:
```http
GET /health
```
Response:
```json
{
  "status": "healthy",
  "timestamp": "2024-01-01T12:00:00.000Z",
  "version": "1.0.0"
}
```
## File Upload Scripts
The service includes convenient scripts for uploading files programmatically:
### Command-Line Upload
Upload files directly from the command line:
```bash
# Using npm scripts
npm run upload <collection-id> <file1> [file2] [file3] ...
npm run upload:help   # Show help

# Direct usage
node scripts/upload-files.js <collection-id> <file1> [file2] [file3] ...
```
#### Examples
```bash
# Upload single file
npm run upload my-collection-id ./documents/report.pdf

# Upload multiple files
npm run upload my-collection-id ./docs/file1.pdf ./docs/file2.docx ./docs/file3.txt

# Upload with custom chunking options
node scripts/upload-files.js my-collection-id ./docs/report.pdf \
  --max-token-size 1024 \
  --similarity-threshold 0.7 \
  --service-url http://localhost:3000

# Upload with strict validation
node scripts/upload-files.js my-collection-id ./docs/*.pdf --strict
```
#### Command Line Options
| Option | Description | Default |
|--------|-------------|---------|
| `--max-token-size` | Maximum tokens per chunk | 512 |
| `--similarity-threshold` | Similarity threshold (0-1) | 0.5 |
| `--embedding-model` | ONNX embedding model | - |
| `--service-url` | Service URL | http://localhost:3000 |
| `--strict` | Fail on any file validation error | false |
| `--help` | Show help message | - |
### Programmatic Upload
For programmatic usage, use the example script:
```bash
# Run the example (customize first)
npm run upload:example

# Show customization instructions
node scripts/example-upload.js --instructions
```
#### Customizing the Example Script
Edit `scripts/example-upload.js`:
```javascript
// Update file paths
const filePaths = [
  './path/to/your/document.pdf',
  './path/to/your/report.docx',
  './path/to/your/notes.txt'
];

// Set your collection ID
const collectionId = 'my-actual-collection-id';

// Configure options
const options = {
  maxTokenSize: 512,
  similarityThreshold: 0.5,
  serviceUrl: 'http://localhost:3000',
  strict: false
};
```
#### Using the `FileUploader` Class
You can also import and use the `FileUploader` class directly:
```javascript
import FileUploader from './scripts/upload-files.js';

const uploader = new FileUploader('http://localhost:3000');

const result = await uploader.uploadFiles(
  'my-collection-id',
  ['./file1.pdf', './file2.docx'],
  {
    maxTokenSize: 512,
    similarityThreshold: 0.5
  }
);

console.log(`Uploaded ${result.processedFiles.length} files`);
```
### Script Features
- ✅ **File Validation**: Checks file existence and supported types
- ✅ **Progress Tracking**: Shows upload progress with emojis
- ✅ **Error Handling**: Graceful error handling with troubleshooting tips
- ✅ **Flexible Options**: Customizable chunking and processing options
- ✅ **Batch Processing**: Upload multiple files in a single request
- ✅ **Detailed Results**: Shows chunk counts and processing results
### Supported File Types
The upload scripts support the same file types as the API:
- PDF (`.pdf`)
- Word Documents (`.docx`)
- Text Files (`.txt`)
- CSV Files (`.csv`)
- Excel Files (`.xlsx`, `.xls`)
## Testing
### Running the Test Suite
The service includes a comprehensive test suite to verify functionality:
```bash
# Run the main test suite
node test-service.js
```
This test suite will:
- ✅ Check service health
- ✅ Test collection creation and retrieval
- ✅ Test document upload with sample content
- ✅ Test vector search functionality
### Specialized Tests
The project includes specialized test files for different aspects:
```bash
# Test chunking algorithms and configurations
node test-improved-chunking.js
node test-splitting-configurations.js

# Test large file processing
node test-large-file.js
node test-optimized-large-file.js

# Test batching approaches
node test-batching-approach.js
```
### Manual Testing
Test the service manually with curl:
```bash
# Get all collections
curl -X GET "http://localhost:3000/api/collections" \
  -H "Content-Type: application/json"

# Get specific collection
curl -X GET "http://localhost:3000/api/collections/{collection-id}" \
  -H "Content-Type: application/json"

# Create collection
curl -X POST "http://localhost:3000/api/collections" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Test Collection",
    "embeddingConfig": {
      "modelName": "text-embedding-3-small"
    }
  }'

# Upload files using curl
curl -X POST "http://localhost:3000/api/collections/{collection-id}/documents" \
  -F "files=@./documents/report.pdf" \
  -F "files=@./documents/manual.docx" \
  -F "maxTokenSize=512" \
  -F "similarityThreshold=0.5"
```
### Test Data
Sample test files are available in the `test/data/` directory for testing document processing functionality.
## TypeScript Support
This package includes comprehensive TypeScript definitions with strict type safety, providing an excellent developer experience when using the Vector Grounding Service in TypeScript projects. All types are fully specified without resorting to `any`, ensuring maximum type safety and IntelliSense support.
### Installation
The TypeScript definitions are included automatically when you install the package:
```bash
npm install @timangames/vector-grounding-service
```
For development, you may also want to install the Express and Multer types:
```bash
npm install --save-dev @types/express @types/multer
```
### Usage Examples
#### Basic Import and Setup
```typescript
import {
  DocumentProcessor,
  SapAiService,
  createVectorGroundingService,
  ChunkingOptions,
  BatchOptions
} from '@timangames/vector-grounding-service';

// Create service instances
const documentProcessor = new DocumentProcessor();
const sapAiService = new SapAiService();
const vectorService = createVectorGroundingService();
```
#### Processing Files
```typescript
import { DocumentProcessor, MulterFile, ChunkingOptions, ProcessingResult } from '@timangames/vector-grounding-service';

async function processDocument(file: MulterFile): Promise<ProcessingResult> {
  const processor = new DocumentProcessor();

  const chunkingOptions: ChunkingOptions = {
    maxTokenSize: 2000,
    similarityThreshold: 0.3,
    logging: true
  };

  return await processor.processFileForGrounding(file, chunkingOptions);
}
```
#### Collections, Uploads, and Search
```typescript
import {
  SapAiService,
  CollectionData,
  BatchOptions,
  ApiResponse,
  Collection,
  GetCollectionsOptions,
  VectorSearchResponse,
  SearchOptions
} from '@timangames/vector-grounding-service';

async function uploadDocuments() {
  const sapAi = new SapAiService();

  // Create a collection with full type safety
  const collectionData: CollectionData = {
    name: "My Document Collection",
    description: "Collection for RAG documents",
    metadata: {
      "project": "my-project",
      "version": "1.0"
    },
    embeddingModel: "text-embedding-ada-002",
    vectorDimension: 1536
  };

  // Properly typed API response
  const collectionResponse: ApiResponse<Collection> = await sapAi.createCollection(collectionData);
  const collection = collectionResponse.data;

  // Upload documents with batch options
  // (`documents` is a SapDocument[] prepared earlier, e.g. by DocumentProcessor)
  const batchOptions: BatchOptions = {
    initialBatchSize: 3,
    maxPayloadSizeMB: 2,
    maxRetries: 3
  };

  const result = await sapAi.createDocumentsInBatches(
    collection.id,
    documents,
    batchOptions
  );

  // Perform vector search with typed options and response
  const searchOptions: SearchOptions = {
    collectionIds: [collection.id],
    limit: 10,
    threshold: 0.7,
    includeMetadata: true
  };

  const searchResponse: ApiResponse<VectorSearchResponse> = await sapAi.vectorSearch(
    "search query",
    searchOptions
  );

  console.log(`Found ${searchResponse.data.results.length} results`);
}
```
#### Convenience Service
```typescript
import { createVectorGroundingService, MulterFile } from '@timangames/vector-grounding-service';

async function processAndUpload(file: MulterFile, collectionId: string) {
  const service = createVectorGroundingService();

  const result = await service.processAndUpload(file, collectionId, {
    chunkingOptions: {
      maxTokenSize: 1500,
      similarityThreshold: 0.4
    },
    batchOptions: {
      initialBatchSize: 5,
      maxPayloadSizeMB: 3
    }
  });

  console.log(`Processed ${result.chunkCount} chunks in ${result.documentCount} documents`);
  console.log(`Uploaded ${result.uploadResult.totalDocuments} documents in ${result.uploadResult.totalBatches} batches`);
}
```
### Available Types
#### Core Types
- `MulterFile` - File object interface compatible with Express.js Multer
- `FileMetadata` - Metadata extracted from processed files with specific PDF/DOCX info types
- `PdfInfo` - Structured PDF metadata (title, author, dates, etc.)
- `DocxMessage` - Typed DOCX processing messages (warning/error/info)
- `ChunkingOptions` - Configuration for text chunking
- `BatchOptions` - Configuration for batch uploading
- `ProcessingResult` - Result of file processing operations
- `BatchUploadResult` - Result of batch upload operations with typed responses
#### SAP AI Core Types
- `SapDocument` - Document structure for SAP AI Core
- `SapChunk` - Individual chunk within a document
- `MetadataItem` - Key-value metadata structure
- `CollectionData` - Collection creation data with metadata and embedding options
- `Collection` - Full collection object with timestamps and counts
- `Document` - Complete document object with metadata and chunks
- `SearchOptions` - Vector search configuration with specific options
- `SearchResult` - Individual search result with score and metadata
- `VectorSearchResponse` - Complete search response with results and execution info
#### API Response Types
- `ApiResponse<T>` - Generic API response wrapper with status and data
- `DocumentCreationResponse` - Response for document creation operations
- `UpdateDocumentData` - Data structure for document updates
#### Configuration Types
- `VectorGroundingServiceConfig` - Configuration for the convenience service
- `GetCollectionsOptions` - Options for fetching collections
- `GetDocumentsOptions` - Options for fetching documents
#### Service Classes
- `DocumentProcessor` - File processing and chunking service
- `SapAiService` - SAP AI Core integration service with fully typed methods
- `VectorGroundingService` - Combined convenience service
### Type Safety Features
- ✅ **No `any` types** - All interfaces use specific, well-defined types
- ✅ **Strict API responses** - All API methods return properly typed `ApiResponse` objects
- ✅ **Comprehensive options** - All configuration objects have specific properties
- ✅ **Union types** - Enums and literal types for better validation (e.g., `'warning' | 'error' | 'info'`)
- ✅ **Generic types** - Flexible yet type-safe generic interfaces where appropriate
- ✅ **Optional properties** - Clear distinction between required and optional fields
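As an illustration of how such a literal union narrows in a switch, here is a sketch with a local type mirroring `DocxMessage`; the `type` and `message` field names are assumptions, only the `'warning' | 'error' | 'info'` union is documented:

```typescript
// Hypothetical shape mirroring DocxMessage, for illustration only
type Message = { type: 'warning' | 'error' | 'info'; message: string };

function logDocxMessage(msg: Message) {
  switch (msg.type) {
    case 'error':
      console.error(msg.message);
      break;
    case 'warning':
      console.warn(msg.message);
      break;
    case 'info':
      console.info(msg.message);
      break;
  }
}
```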
### Type Validation
The package includes scripts to validate type definitions:
```bash
# Validate type definitions
npm run build:types

# Types are automatically validated before publishing
npm run prepublishOnly
```
**Note:** The type definitions are manually crafted (not auto-generated) and validated before publishing to ensure they remain accurate and comprehensive.
### Development
When contributing to this package, ensure that:
1. All public methods have proper type annotations
2. Interfaces are exported for consumer use
3. Type definitions are tested with `npx tsc --noEmit`
4. Documentation includes TypeScript examples
The type definitions are located in `types/index.d.ts` and are automatically included when the package is installed.
## Troubleshooting
### Common Issues
1. "Missing header parameter 'AI-Resource-Group'"
- Ensure SAP_AI_CORE_RESOURCE_GROUP is set in your .env file
- Verify the resource group exists in your SAP AI Core instance
2. Authentication errors
- Check that AICORE_SERVICE_KEY contains valid JSON
- Verify the service key has the necessary permissions
- Ensure the service key is not expired
3. "Service key not found"
- Make sure AICORE_SERVICE_KEY is properly formatted JSON
- Check that all required fields are present in the service key
4. File upload errors
- Check file size limits (MAX_FILE_SIZE`)