Vector search and document processing library
npm install ozymandias_osiris

Osiris is a powerful document ingestion pipeline designed to process content into vector embeddings and store them in a Qdrant vector database. It's built to work seamlessly with the Ibis chat application.
Prerequisites
- Node.js (v18 or higher)
- npm or yarn
- A Qdrant instance (local or cloud)
- OpenAI API key
Installation
1. Clone the repository:
```bash
git clone
cd osiris
```
2. Install dependencies:
```bash
npm install
```
3. Create a .env file in the root directory.

Each document to ingest is a JSON file with the following structure:
```json
{
  "title": "Document Title",
  "content": "Document content goes here...",
  "metadata": {
    "source": "optional source information",
    "author": "optional author information",
    "date": "optional date information"
  }
}
```
Store your JSON files in a directory (e.g., ./data).
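
As a rough illustration of the content validation step, the sketch below checks a parsed JSON file against the structure shown above. The names `OsirisDocument`, `validateDocument`, and `isOsirisDocument` are hypothetical and not taken from the Osiris source, and treating `metadata` itself as optional is an assumption.

```typescript
// Hypothetical type name; the fields mirror the JSON example above.
interface OsirisDocument {
  title: string;
  content: string;
  // Assumption: the whole metadata object may be omitted.
  metadata?: {
    source?: string;
    author?: string;
    date?: string;
  };
}

// Returns a list of problems; an empty list means the document looks valid.
function validateDocument(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return ["document must be a JSON object"];
  }
  const errors: string[] = [];
  const doc = raw as Record<string, unknown>;
  if (typeof doc.title !== "string" || doc.title.trim() === "") {
    errors.push("title must be a non-empty string");
  }
  if (typeof doc.content !== "string" || doc.content.trim() === "") {
    errors.push("content must be a non-empty string");
  }
  if (doc.metadata !== undefined) {
    const meta = doc.metadata;
    if (typeof meta !== "object" || meta === null || Array.isArray(meta)) {
      errors.push("metadata must be an object when present");
    } else {
      const fields = meta as Record<string, unknown>;
      for (const key of ["source", "author", "date"]) {
        if (fields[key] !== undefined && typeof fields[key] !== "string") {
          errors.push(`metadata.${key} must be a string when present`);
        }
      }
    }
  }
  return errors;
}

// Type guard built on top of the validator.
function isOsirisDocument(raw: unknown): raw is OsirisDocument {
  return validateDocument(raw).length === 0;
}
```

A caller would typically pass the result of `JSON.parse` on each file's contents to `validateDocument` and skip or report any file that returns a non-empty list.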
Usage
1. Build the project:
```bash
npm run build
```
2. Run the ingestion pipeline:
```bash
node dist/index.js ingest
```
Options:
- --collection
- --batch-size
- --max-retries
- --max-concurrent
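
The options above are listed without descriptions. As a sketch of what a concurrency cap might look like in practice, assuming --max-concurrent limits how many documents are processed at once, the helper below runs a worker over a list of items with a fixed number of parallel lanes and tallies successes and failures. `processWithLimit` and `RunStats` are hypothetical names, not part of Osiris.

```typescript
// Hypothetical result shape for a run.
interface RunStats {
  succeeded: number;
  failed: number;
}

// Processes items with at most `maxConcurrent` workers in flight.
async function processWithLimit<T>(
  items: T[],
  maxConcurrent: number,
  worker: (item: T) => Promise<void>,
): Promise<RunStats> {
  if (!Number.isInteger(maxConcurrent) || maxConcurrent < 1) {
    throw new RangeError("maxConcurrent must be a positive integer");
  }
  const stats: RunStats = { succeeded: 0, failed: 0 };
  let next = 0; // index of the next unclaimed item, shared by all lanes

  // Each lane repeatedly claims the next item until none are left.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const item = items[next++];
      try {
        await worker(item);
        stats.succeeded++;
      } catch {
        // A failed item is counted but does not stop the other lanes.
        stats.failed++;
      }
    }
  }

  const laneCount = Math.min(maxConcurrent, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return stats;
}
```

Because JavaScript runs each lane on a single thread, claiming an item with `next++` needs no locking; the lanes only interleave at `await` points.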
Example:
```bash
node dist/index.js ingest ./data --collection documents --batch-size 50
```
Alternatively, add this function to your .zshrc:
```bash
# Osiris data ingestion function
osiris() {
  # Show help if no arguments are provided or help is requested
  if [ -z "$1" ] || [ "$1" = "-h" ] || [ "$1" = "--help" ]; then
    echo "Usage: osiris"
    echo ""
    echo "Commands:"
    echo "  ingest"
    echo "  health # Check system health"
    echo "  clean"
    echo "  delete-by-group -c"
    echo ""
    echo "Examples:"
    echo "  osiris ingest ./content -c my-collection -g client1"
    echo "  osiris health"
    echo "  osiris clean my-collection"
    echo "  osiris delete-by-group -c my-collection -g client1"
    return 1
  fi

  # Path to the local Osiris checkout; adjust for your machine
  local OSIRIS_PATH="/users/ivan/sites/ozymandias/osiris"

  # If the first argument is not a known command, treat it as a path and insert 'ingest'
  if [[ "$1" != "ingest" && "$1" != "health" && "$1" != "clean" && "$1" != "delete-by-group" ]]; then
    set -- "ingest" "$@"
  fi

  # Run with tsx instead of node, in a subshell so the caller's working
  # directory is unchanged. Relative path arguments are interpreted
  # relative to OSIRIS_PATH.
  (cd "$OSIRIS_PATH" && npx tsx src/index.ts "$@")
}
```
Then run:
```bash
# Ingest content
osiris ./content -c collection-name -g client1
```
Features
- Content Validation: Validates JSON files and their content structure
- Text Chunking: Intelligently splits documents into appropriate chunks
- Embedding Generation: Generates embeddings using OpenAI's API
- Vector Storage: Stores embeddings in Qdrant vector database
- Progress Tracking: Shows real-time progress and statistics
- Error Handling: Robust error handling with retries
- Concurrent Processing: Efficient parallel processing of documents
Monitoring
The ingestion process provides real-time feedback:
- Progress of file processing
- Number of chunks generated
- Embedding generation progress
- Success/failure statistics
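
The number of chunks reported depends on how each document is split. The source does not say which strategy Osiris uses; as a minimal sketch, assuming fixed-size character windows with overlap, a chunker could look like this. `chunkText` and its parameters are hypothetical.

```typescript
// Splits text into windows of `chunkSize` characters, where each window
// repeats the last `overlap` characters of the previous one.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  if (!Number.isInteger(overlap) || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be an integer in [0, chunkSize)");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window has reached the end of the text,
    // so no trailing chunk consists only of overlap.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const example = chunkText("abcdefghij", 4, 1);
console.log(example); // ["abcd", "defg", "ghij"]
```

A production chunker would more likely split on sentence or token boundaries; character windows are used here only to keep the example short.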
Error Handling
Errors are logged with detailed information. Failed operations are automatically retried based on the --max-retries setting.
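
The retry behavior described above can be pictured with a small wrapper. This is a sketch under assumptions, not the Osiris implementation: the source only says that retries follow --max-retries, so the exponential backoff between attempts and the names `withRetries` and `baseDelayMs` are illustrative.

```typescript
// Runs `operation`, retrying up to `maxRetries` additional times on failure.
// Assumption: waits baseDelayMs, then 2x, then 4x, ... between attempts.
async function withRetries<T>(
  operation: () => Promise<T>,
  maxRetries: number,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt === maxRetries) break; // out of retries
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  // Surface the last failure so the caller can log it.
  throw lastError;
}
```

With `maxRetries` set to 3, an operation is attempted at most four times: the initial call plus three retries.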
Integration with Ibis
Osiris is designed to work with the Ibis chat application. Make sure to:
- Use the same Qdrant instance in both applications
- Set the collection name to match Ibis's configuration (default: 'documents')
Development
Run tests:
```bash
npm run test
```
Watch mode:
```bash
npm run test:watch
```
Generate coverage report:
```bash
npm run coverage
```