Production-ready AI agent library using AI SDK v6 ToolLoopAgent for GAIA benchmarks with swappable providers
`npm install @gaia-agent/sdk`

---

- Pre-configured agent ready for GAIA benchmarks out of the box
- Built-in Reasoning + Acting framework for structured thinking
- Multi-step planning + answer verification for complex tasks
- Tools organized by category with official SDKs (Tavily, Exa, E2B, BrowserUse, Steel)
- Easy provider switching for sandbox, browser, search, and memory
- Integrated Tavily and Exa for intelligent web search
- E2B cloud sandbox with code execution + filesystem operations
- Steel, BrowserUse or AWS AgentCore for web interactions
- Persistent memory with Mem0 or AWS AgentCore
- ESM with granular exports, TypeScript-first

---
Whether you're creating a production-ready AI assistant for your product or competing in GAIA benchmarks, GAIA Agent provides the enterprise-grade foundation you need.

Without GAIA Agent:

- Days/weeks setting up APIs
- Writing tool wrappers manually
- Error handling for each service
- Figuring out which providers to use
- Integration testing headaches

With GAIA Agent:

- 3 lines of code to get started
- 16 tools ready with official SDKs
- GAIA benchmark ready immediately
- Swap providers with one line
- Production-tested implementations

Time savings: From weeks of infrastructure setup → 3 lines of code
Result: A world-class, production-ready Super Agent that rivals top AI systems
---
```bash
npm install @gaia-agent/sdk ai @ai-sdk/openai zod
```

```typescript
import { createGaiaAgent } from '@gaia-agent/sdk';

// Create the agent - reads from environment variables
const agent = createGaiaAgent();

const result = await agent.generate({
  prompt: 'Calculate 15 * 23 and search for the latest AI papers',
});

console.log(result.text);
```

Create a `.env` file:

```bash
# Required
OPENAI_API_KEY=sk-...
```

Complete environment variables guide →
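
Because the agent reads its configuration from environment variables, it can help to fail fast when a key is missing. The following is a minimal sketch, assuming a hypothetical helper `missingEnvVars` that is not part of `@gaia-agent/sdk`. Only `OPENAI_API_KEY` comes from this README; `TAVILY_API_KEY` is an illustrative name.

```typescript
// Hypothetical helper, not part of @gaia-agent/sdk: reports which of the
// given variable names are unset or blank in an environment-like object.
type Env = Record<string, string | undefined>;

function missingEnvVars(env: Env, required: string[]): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  });
}

// Example with a literal object; a real script would pass process.env.
// TAVILY_API_KEY is an assumed name used only for illustration.
const exampleEnv: Env = { OPENAI_API_KEY: 'sk-example', TAVILY_API_KEY: '' };
const missing = missingEnvVars(exampleEnv, ['OPENAI_API_KEY', 'TAVILY_API_KEY']);
console.log(missing); // [ 'TAVILY_API_KEY' ]
```

Taking the environment as a parameter keeps the check easy to unit test without touching the real process environment.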
---
## Built-in Tools

| Category | Tools | Providers |
|----------|-------|-----------|
| Core | calculator, httpRequest | Built-in |
| Planning | planner, verifier | Built-in |
| Search | tavilySearch, exaSearch, exaGetContents | Tavily (default), Exa |
| Sandbox | e2bSandbox, sandockExecute | E2B (default), Sandock |
| Browser | steelBrowser, browserUseTool, awsBrowser | Steel (default), BrowserUse, AWS |
| Memory | mem0Remember, mem0Recall, memoryStore | Mem0 (default), AWS AgentCore |

Full tools documentation →
Provider comparison →
ReAct + Planning guide → ⭐ NEW
---
## Swap Providers

Switch providers with one line:

```typescript
import { createGaiaAgent } from '@gaia-agent/sdk';

const agent = createGaiaAgent({
  providers: {
    search: 'exa',          // Use Exa instead of Tavily
    sandbox: 'sandock',     // Use Sandock instead of E2B
    browser: 'browseruse',  // Use BrowserUse instead of Steel
  },
});
```

Or set via environment variables:

```bash
GAIA_AGENT_SEARCH_PROVIDER=exa
GAIA_AGENT_SANDBOX_PROVIDER=sandock
GAIA_AGENT_BROWSER_PROVIDER=browseruse
```

---
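
For the provider switching described under Swap Providers, one way to picture how the `providers` option, the `GAIA_AGENT_*_PROVIDER` variables, and the defaults from the tools table could combine is sketched here. This is an illustration, not the SDK's actual logic: `resolveProviders` is a hypothetical function, and the precedence (explicit option, then environment variable, then default) is an assumption.

```typescript
// Sketch only: the real resolution logic inside @gaia-agent/sdk may differ.
type ProviderConfig = { search: string; sandbox: string; browser: string; memory: string };
type Env = Record<string, string | undefined>;

// Defaults as listed in the Built-in Tools table.
const DEFAULT_PROVIDERS: ProviderConfig = {
  search: 'tavily',
  sandbox: 'e2b',
  browser: 'steel',
  memory: 'mem0',
};

// Assumed precedence: explicit option, then environment variable, then default.
function resolveProviders(options: Partial<ProviderConfig>, env: Env): ProviderConfig {
  return {
    search: options.search ?? env.GAIA_AGENT_SEARCH_PROVIDER ?? DEFAULT_PROVIDERS.search,
    sandbox: options.sandbox ?? env.GAIA_AGENT_SANDBOX_PROVIDER ?? DEFAULT_PROVIDERS.sandbox,
    browser: options.browser ?? env.GAIA_AGENT_BROWSER_PROVIDER ?? DEFAULT_PROVIDERS.browser,
    // This README names no environment variable for memory, so none is read here.
    memory: options.memory ?? DEFAULT_PROVIDERS.memory,
  };
}

const resolved = resolveProviders(
  { browser: 'browseruse' },
  { GAIA_AGENT_SEARCH_PROVIDER: 'exa' },
);
console.log(resolved);
// { search: 'exa', sandbox: 'e2b', browser: 'browseruse', memory: 'mem0' }
```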
## GAIA Benchmark

Run official GAIA benchmarks with enhanced results tracking:

```bash
# Basic benchmark
pnpm benchmark                   # Run validation set
pnpm benchmark --limit 10        # Test with 10 tasks

# Resume interrupted runs
pnpm benchmark --resume          # Continue from checkpoint

# Filter by capability
pnpm benchmark:files             # Tasks with file attachments
pnpm benchmark:code              # Code execution tasks
pnpm benchmark:search            # Web search tasks
pnpm benchmark:browser           # Browser automation tasks

# Stream mode (real-time thinking)
pnpm benchmark:random --stream   # Watch agent think in real-time

# Wrong answers collection
pnpm benchmark:wrong             # Retry only failed tasks
```

Automatically track and retry failed tasks:

```bash
# 1. Run benchmark (auto-creates wrong-answers.json)
pnpm benchmark --limit 20

# 2. View wrong answers
cat benchmark-results/wrong-answers.json

# 3. Retry only failed tasks
pnpm benchmark:wrong --verbose

# 4. Keep retrying until all pass
pnpm benchmark:wrong
# → "No wrong answers! All previous tasks passed."
```

Wrong answers guide →
Resume feature guide →
Benchmark module docs →
GAIA setup guide →
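
The retry loop above amounts to shrinking a set of failed task IDs until it is empty. A minimal sketch of that bookkeeping follows. The `TaskOutcome` shape and the `remainingWrong` function are hypothetical; the actual format of `wrong-answers.json` is not shown in this README.

```typescript
// Hypothetical shape: the real wrong-answers.json format may differ.
interface TaskOutcome {
  taskId: string;
  correct: boolean;
}

// Given the task IDs currently recorded as wrong and the outcomes of a retry
// run, return the IDs that should stay in the collection.
function remainingWrong(previousWrong: string[], retry: TaskOutcome[]): string[] {
  const nowCorrect = new Set(retry.filter((o) => o.correct).map((o) => o.taskId));
  return previousWrong.filter((id) => !nowCorrect.has(id));
}

const stillWrong = remainingWrong(
  ['a', 'b', 'c'],
  [
    { taskId: 'a', correct: true },
    { taskId: 'b', correct: false },
    { taskId: 'c', correct: true },
  ],
);
console.log(stillWrong); // [ 'b' ]
```

Tasks that were not retried stay in the collection, so an interrupted retry run does not drop them.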
---
## Enhanced Benchmark Results

Benchmark results now include full task details:

```json
{
  "taskId": "abc123",
  "question": "What year was X founded?",
  "level": 2,
  "files": ["image.png"],
  "answer": "1927",
  "expectedAnswer": "1927",
  "correct": true,
  "durationMs": 5234,
  "steps": 3,
  "toolsUsed": ["search", "browser"],
  "summary": {
    "totalToolCalls": 5,
    "uniqueTools": ["search", "browser", "calculator"],
    "hadError": false
  },
  "stepDetails": [ /* ... */ ]
}
```

Easier to analyze and debug!
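
As one example of that kind of analysis, the sketch below counts which tools appear most often in failed tasks. The field names follow the JSON example above; `stepDetails` is left out because its shape is not shown, and `toolUsageInFailures` is a hypothetical helper rather than an SDK export.

```typescript
// Mirrors the result fields shown in the JSON example (stepDetails omitted).
interface BenchmarkResult {
  taskId: string;
  question: string;
  level: number;
  files: string[];
  answer: string;
  expectedAnswer: string;
  correct: boolean;
  durationMs: number;
  steps: number;
  toolsUsed: string[];
  summary: { totalToolCalls: number; uniqueTools: string[]; hadError: boolean };
}

// Only the two fields the tally needs, so partial records can be passed in.
type FailureInput = Pick<BenchmarkResult, 'correct' | 'summary'>;

// Count, per tool, how many incorrect tasks used it at least once.
function toolUsageInFailures(results: FailureInput[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const result of results) {
    if (result.correct) continue;
    for (const toolName of result.summary.uniqueTools) {
      counts.set(toolName, (counts.get(toolName) ?? 0) + 1);
    }
  }
  return counts;
}

const usage = toolUsageInFailures([
  { correct: false, summary: { totalToolCalls: 2, uniqueTools: ['search', 'browser'], hadError: false } },
  { correct: false, summary: { totalToolCalls: 1, uniqueTools: ['search'], hadError: true } },
  { correct: true, summary: { totalToolCalls: 1, uniqueTools: ['calculator'], hadError: false } },
]);
console.log(usage.get('search'), usage.get('browser'), usage.get('calculator')); // 2 1 undefined
```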
---
## Benchmark Results

Latest benchmark performance across different task categories:

| Benchmark Command | Timestamp | Results | Accuracy | Model | Providers | Details |
|-------------------|-----------|---------|----------|-------|-----------|---------|
| `pnpm benchmark` | 2025-11-26 08:33 | 22/53 | 41.51% | gpt-4o | Search: tavily, Sandbox: e2b, Browser: steel | View Details |
| `pnpm benchmark:level1` | 2025-11-27 10:38 | 16/53 | 30.19% | gpt-4o | Search: openai, Sandbox: e2b, Browser: steel, Memory: mem0 | - |
| `pnpm benchmark:level1` | 2025-12-03 04:12 | 21/53 | 39.62% | Claude Sonnet 4.5 | Search: openai, Sandbox: e2b, Browser: steel, Memory: mem0 | View Details |

See detailed task-by-task results →

Note: Benchmark results are automatically updated after each benchmark run.
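
The Accuracy column is the number of correct answers divided by the number of tasks run, shown to two decimal places. A small sketch of that arithmetic, using a hypothetical `accuracyPercent` helper:

```typescript
// Hypothetical helper: formats correct/total as a percentage with two decimals.
function accuracyPercent(correct: number, total: number): string {
  if (total <= 0) throw new RangeError('total must be positive');
  return `${((correct / total) * 100).toFixed(2)}%`;
}

console.log(accuracyPercent(22, 53)); // 41.51%
console.log(accuracyPercent(16, 53)); // 30.19%
console.log(accuracyPercent(21, 53)); // 39.62%
```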
---
## Testing

Run unit tests with Vitest:

```bash
pnpm test            # Run all tests
pnpm test:watch      # Watch mode
pnpm test:coverage   # Coverage report
```

Testing guide →
---
## Advanced Usage

```typescript
import { createGaiaAgent, getDefaultTools } from '@gaia-agent/sdk';
import { tool } from 'ai';
import { z } from 'zod';

const agent = createGaiaAgent({
  tools: {
    ...getDefaultTools(),
    // Custom tool registered alongside the defaults
    weatherTool: tool({
      description: 'Get weather',
      inputSchema: z.object({ city: z.string() }),
      // Placeholder implementation that returns fixed data
      execute: async ({ city }) => ({ temp: 72, condition: 'sunny' }),
    }),
  },
});
```

Integrate thousands of tools from the ToolSDK.ai ecosystem:

```typescript
import { createGaiaAgent, getDefaultTools } from '@gaia-agent/sdk';
import { ToolSDKApiClient } from 'toolsdk/api'; // npm install toolsdk

// Initialize ToolSDK client
const toolSDK = new ToolSDKApiClient({ apiKey: process.env.TOOLSDK_AI_API_KEY });

// Load tools from ToolSDK packages
const emailTool = await toolSDK.package('@toolsdk.ai/mcp-send-email', {
  RESEND_API_KEY: process.env.RESEND_API_KEY,
}).getAISDKTool("send-email");

const agent = createGaiaAgent({
  tools: {
    ...getDefaultTools(),
    emailTool
  },
});

const result = await agent.generate({
  prompt: 'Help me search for the latest AI news and send it to john@example.com',
});
```

ToolSDK Packages →

```typescript
import { GAIAAgent } from '@gaia-agent/sdk';

class ResearchAgent extends GAIAAgent {
  constructor() {
    super({
      instructions: 'Research assistant specialized in AI papers',
      additionalTools: { /* custom tools */ },
    });
  }
}
```

Advanced usage guide →
API reference →
---
## Documentation

- Quick Start Guide - Get started in 5 minutes
- ReAct + Planning Guide ⭐ NEW - Enhanced reasoning & planning
- Reflection Guide ⭐ NEW - Step-by-step reflection (optional)
- Environment Variables - Complete configuration guide
- GAIA Benchmark - Requirements, setup, tips
- Improving GAIA Scores - Strategies for better performance & self-evolution
- Wrong Answers Collection - Error tracking and retry
- Provider Comparison - Detailed provider comparison

- API Reference - Complete API documentation
- Tools Reference - All available tools
- Advanced Usage - Extension examples, patterns
- Benchmark Module - Modular architecture
- Testing Guide - Unit tests with Vitest
---
## Contributing

This project uses automated NPM publishing. When changes are merged to `main`:

1. Tests run automatically
2. Version bumps to next patch (e.g., 0.1.0 → 0.1.1)
3. Changelog created in `changelog/`

For manual version bumps (minor/major), see docs/NPM_PUBLISH_SETUP.md.
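
To make the patch bump concrete, here is a small sketch of the version arithmetic. `bumpPatch` is a hypothetical function written for illustration; the project's publishing workflow may well use a different mechanism to compute the next version.

```typescript
// Hypothetical illustration of a patch bump on a plain MAJOR.MINOR.PATCH string.
function bumpPatch(version: string): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  const [, major, minor, patch] = match;
  return `${major}.${minor}.${Number(patch) + 1}`;
}

console.log(bumpPatch('0.1.0')); // 0.1.1
```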
---
Apache License 2.0
---
Made with ❤️ for the AI community