Capture, store, and analyze coding sessions for AI model training. A production-ready CLI tool with cloud sync.



---
DataHive is a production-ready CLI tool that captures terminal sessions with high fidelity and syncs them to the cloud. Built for developers, researchers, and teams training AI coding models on real-world development workflows.
- High-Fidelity Capture - Uses node-pty for accurate terminal recording
- Cloud-Native - Real-time sync to Supabase with automatic fallback to local storage
- Universal Compatibility - Works with any CLI tool (Claude, Cursor, Gemini, Vim, etc.)
- Zero-Config Installation - Global npm package, ready in seconds
- Structured Data - Outputs clean, queryable session metadata
- Secure - Credentials stored locally with industry-standard encryption
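
The "automatic fallback to local storage" behavior could look roughly like the sketch below. This is not DataHive's actual implementation: `SessionRecord`, `Sink`, and `saveWithFallback` are hypothetical names, and the cloud and local writers are passed in as plain functions so the logic stays independent of Supabase and the file system.

```typescript
// Hypothetical record shape, loosely based on the sessions table columns.
interface SessionRecord {
  tool_name: string;
  start_time: string;
  end_time: string;
  content: string;
}

// A sink is any async function that persists one record (assumed abstraction).
type Sink = (record: SessionRecord) => Promise<void>;

// Try the cloud sink first; if it throws, write to the local sink instead.
// Returns which destination ended up holding the record.
async function saveWithFallback(
  record: SessionRecord,
  cloud: Sink,
  local: Sink,
): Promise<"cloud" | "local"> {
  try {
    await cloud(record);
    return "cloud";
  } catch {
    await local(record);
    return "local";
  }
}
```

Injecting both sinks keeps the fallback decision testable without network access; a real tool would likely also log why the cloud write failed.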
---
Prerequisites:
- Node.js >= 14.0.0
- A Supabase account (free sign-up)

Install globally:
```bash
npm install -g datahive
```

Verify the installation:
```bash
datahive --version
```
---
First, create the required database table:
```sql
-- Run this in your Supabase SQL Editor
CREATE TABLE sessions (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  tool_name TEXT NOT NULL,
  start_time TIMESTAMPTZ,
  end_time TIMESTAMPTZ,
  raw_log_path TEXT,
  content TEXT
);

-- Enable Row Level Security
ALTER TABLE sessions ENABLE ROW LEVEL SECURITY;

-- Allow inserts and reads for the anon role
CREATE POLICY "Enable insert for anon users"
  ON sessions FOR INSERT TO anon WITH CHECK (true);
CREATE POLICY "Enable read for anon users"
  ON sessions FOR SELECT TO anon USING (true);
```
Next, configure your Supabase credentials:
```bash
datahive config \
  --url "https://your-project.supabase.co" \
  --key "your-anon-key"
```
> Find your credentials at: Supabase Dashboard → Settings → API

Then start capturing:
```bash
# Capture a Claude Code session
datahive claude
```
---
Usage

Commands

| Command | Description | Example |
|---------|-------------|---------|
| `datahive config` | Configure Supabase credentials | `datahive config --url ... --key ...` |
| `datahive claude` | Start a captured Claude Code session | `datahive claude` |
| `datahive gemini` | Start a captured Gemini CLI session | `datahive gemini` |
| `datahive cursor` | Start a captured Cursor session | `datahive cursor` |
| `datahive run` | Capture any command | `datahive run python app.py` |

Examples
```bash
# Capture a debugging session
datahive run node --inspect app.js

# Capture a git workflow
datahive run git commit -m "feat: add feature"

# Capture an interactive session
datahive run python
```
---
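Data Schema
Each captured session creates a record with:
```typescript
type UUID = string;       // UUID in string form
type Timestamp = string;  // Timestamp serialized as a string (TIMESTAMPTZ column)

interface Session {
  id: UUID;
  created_at: Timestamp;
  tool_name: string;      // e.g., "claude", "vim", "generic"
  start_time: Timestamp;
  end_time: Timestamp;
  raw_log_path: string;   // Local backup path
  content: string;        // Full terminal output
}
```

Example Queries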
```sql
-- Get all Claude sessions from today
SELECT * FROM sessions
WHERE tool_name = 'claude'
  AND created_at >= CURRENT_DATE;

-- Calculate average session duration
SELECT AVG(EXTRACT(EPOCH FROM (end_time - start_time))) / 60 AS avg_minutes
FROM sessions;
```
---
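
The average-duration query can also be mirrored client-side once rows have been fetched. The following is a minimal sketch, assuming timestamps arrive as ISO 8601 strings; `SessionTimes` and `averageSessionMinutes` are illustrative names and not part of DataHive.

```typescript
// Minimal row shape: only the two columns the calculation needs.
interface SessionTimes {
  start_time: string | null; // ISO 8601 timestamp (assumed format)
  end_time: string | null;
}

// Mean duration in minutes over rows where both timestamps are present.
// Rows with a missing or unparseable timestamp are skipped, similar to how
// SQL AVG ignores NULL values. Returns null when no row qualifies.
function averageSessionMinutes(rows: SessionTimes[]): number | null {
  const durations: number[] = [];
  for (const row of rows) {
    if (row.start_time === null || row.end_time === null) continue;
    const ms = Date.parse(row.end_time) - Date.parse(row.start_time);
    if (!Number.isNaN(ms)) durations.push(ms / 60000);
  }
  if (durations.length === 0) return null;
  return durations.reduce((sum, m) => sum + m, 0) / durations.length;
}
```

For large tables the SQL version is the better choice, since it avoids transferring every row; the client-side form is mainly useful for data that has already been exported.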
Configuration

File Locations
- Config: `~/.config/datahive/config.json` (macOS/Linux)
- Logs: `./raw_data/session_*.log` (local backup)

Environment Variables
You can also configure via environment variables:
```bash
export SUPABASE_URL="https://your-project.supabase.co"
export SUPABASE_KEY="your-key"
```
---
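
The two configuration sources (config file and environment variables) have to be combined somehow. The sketch below shows one plausible way. Both the precedence (environment over file) and the `url`/`key` field names for `config.json` are assumptions, since the README does not specify them; `SupabaseConfig` and `resolveConfig` are hypothetical names.

```typescript
interface SupabaseConfig {
  url: string;
  key: string;
}

// Resolve credentials from environment variables first, then from an
// already-parsed config file. The precedence order is an assumption, not
// documented DataHive behavior. Empty strings are treated as unset.
function resolveConfig(
  env: Record<string, string | undefined>,
  fileConfig: Partial<SupabaseConfig> | null,
): SupabaseConfig | null {
  const url = env.SUPABASE_URL || fileConfig?.url;
  const key = env.SUPABASE_KEY || fileConfig?.key;
  if (!url || !key) return null; // incomplete credentials
  return { url, key };
}
```

In a Node.js program this would typically be called as `resolveConfig(process.env, parsedFile)`, where `parsedFile` is the JSON-parsed config file or `null` if it does not exist.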
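Troubleshooting

If the `datahive` command is not found, ensure the npm global bin directory is in your PATH:
```bash
npm config get prefix  # The bin directory under this prefix should be in your PATH
export PATH="$(npm config get prefix)/bin:$PATH"
```

If credentials are missing or invalid, re-run config:
```bash
datahive config --url "https://your-project.supabase.co" --key "your-anon-key"
```

If sessions are not showing up in Supabase, work through this checklist:
1. Verify credentials: `datahive config --url ... --key ...`
2. Check the table exists: run the schema SQL above
3. Verify RLS policies are set
4. Check local logs exist: `ls raw_data/`
---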
Contributing
We welcome contributions! See docs/DEVELOPER.md for:
- Architecture overview
- Development setup
- Testing guidelines
- Code standards
---
Project Structure
```
DataHive/
├── bin/
│   └── cli.js            # CLI entry point
├── lib/
│   ├── capture.js        # PTY session capture
│   ├── cleaner.js        # Terminal output cleaning
│   ├── config.js         # Configuration management
│   ├── db.js             # Supabase database operations
│   └── exporter.js       # JSONL export functionality
├── docs/
│   ├── schema.sql        # Database schema
│   ├── DEVELOPER.md      # Developer guide
│   ├── CHANGELOG.md      # Version history
│   └── ...               # Other documentation
├── examples/             # Sample export files
├── raw_data/             # Local session logs (gitignored)
└── exports/              # Generated exports (gitignored)
```
---
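
To give a feel for what "terminal output cleaning" and "JSONL export" in the project structure typically involve, here is a small sketch. It is not taken from `cleaner.js` or `exporter.js`: `stripAnsi` and `toJsonl` are hypothetical names, and the pattern only covers CSI escape sequences (colors, cursor movement), not every control sequence a terminal can emit.

```typescript
// Matches CSI sequences: ESC [ + parameter bytes + intermediate bytes + final byte.
// Examples: "\x1b[31m" (red text), "\x1b[0m" (reset), "\x1b[2K" (erase line).
const CSI_PATTERN = /\x1b\[[0-?]*[ -\/]*[@-~]/g;

// Remove CSI escape sequences so only the visible text remains.
function stripAnsi(text: string): string {
  return text.replace(CSI_PATTERN, "");
}

// Serialize records as JSON Lines: one JSON object per line,
// each line terminated by a newline. An empty list yields an empty string.
function toJsonl(records: object[]): string {
  return records.map((r) => JSON.stringify(r) + "\n").join("");
}
```

JSONL is a common choice for training data because each line can be parsed independently, so large exports can be streamed instead of loaded whole.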
MIT © DataHive Contributors
---