San Francisco documentation embeddings - example RAG data package
npm install @san-francisco/sf-docs-embeddings
This package demonstrates how to ship pre-computed embeddings as PGPM migrations. It creates a sample collection with San Francisco city documentation and example embeddings.

When deployed, the package:
1. Creates an embedding model configuration for text-embedding-3-small
2. Creates the sf-docs collection with semantic chunking config
3. Seeds example documents and chunks
4. Would also include the actual vector embeddings in a production version of the package
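Step 4 is left as a placeholder in this example package. Shipping real vectors means serializing each embedding as a text literal such as `[0.1,0.2,0.3]` with exactly as many components as the model produces (1536 for text-embedding-3-small). The sketch below assumes the rag-core schema stores embeddings in a pgvector `vector` column, which this README does not confirm; the helper name `to_vector_literal` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: serialize an embedding as a pgvector-style text
# literal, e.g. [0.1,0.2,0.3]. Assumes embeddings live in a pgvector column.

# Usage: to_vector_literal EXPECTED_DIMS V1 V2 ... VN
to_vector_literal() {
  local expected="$1"
  shift

  # Reject vectors whose length does not match the model's dimensions
  # (1536 for text-embedding-3-small).
  if [ "$#" -ne "$expected" ]; then
    echo "expected $expected dimensions, got $#" >&2
    return 1
  fi

  # With IFS set to ',', "$*" joins the remaining arguments with commas.
  local IFS=','
  printf '[%s]\n' "$*"
}

# Example: a 3-dimensional toy vector; prints [0.1,0.2,0.3]
to_vector_literal 3 0.1 0.2 0.3
```

With a real embedding held in a bash array, `to_vector_literal 1536 "${embedding[@]}"` would print the literal, and it fails loudly if a vector from a model with a different dimension count slips in.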
```bash
# Install the RAG schema first
pgpm deploy @sf-ai/rag-core
```
Data Structure
After installation, you'll have:
- Collection: sf-docs - San Francisco city documentation
- Model: text-embedding-3-small (OpenAI, 1536 dimensions)
- Documents: Example SF city services content
- Chunks: Semantically chunked document segments

Creating Your Own Data Package
To create a similar data package for your own embeddings:
1. Generate embeddings using your preferred model
2. Export them using `rag.export_collection_json()`
3. Convert the JSON to SQL INSERT statements
4. Package the result as a PGPM module

Example workflow:
```sql
-- Export your collection
SELECT rag.export_collection_json('your-collection-id');

-- Or export as CSV for processing
SELECT * FROM rag.export_embeddings_csv('your-collection-id');
```

Dependencies
- `@sf-ai/rag-core`
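Step 3 of "Creating Your Own Data Package" (converting the exported JSON to SQL INSERT statements) mostly comes down to quoting text safely. The shell sketch below assumes a target table named `rag.chunks` with `collection_id`, `content`, and `embedding` columns and a pgvector `::vector` cast; those names are hypothetical, so check them against the schema that `@sf-ai/rag-core` actually creates. `sql_quote` and `chunk_insert` are illustrative helpers, not part of either package.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: emit one INSERT statement per exported chunk.
# The table rag.chunks and its columns are assumptions, not the
# confirmed @sf-ai/rag-core schema.

# Quote a value as a SQL string literal by doubling single quotes.
sql_quote() {
  local q="'"
  printf "'%s'" "${1//$q/$q$q}"
}

# Usage: chunk_insert COLLECTION_ID CONTENT EMBEDDING_CSV
chunk_insert() {
  local collection_id="$1" content="$2" embedding_csv="$3"
  printf 'INSERT INTO rag.chunks (collection_id, content, embedding) VALUES (%s, %s, %s::vector);\n' \
    "$(sql_quote "$collection_id")" \
    "$(sql_quote "$content")" \
    "$(sql_quote "[$embedding_csv]")"
}

# Example: a chunk whose text contains an apostrophe.
chunk_insert sf-docs "Muni's fare policy" "0.1,0.2,0.3"
```

In practice the arguments would come from the `rag.export_collection_json()` output, for example extracted with a JSON tool such as jq. The export's field names are not documented here, so that extraction step is left out of the sketch.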