# GAIA Super Agent SDK

### 🤖 Build GAIA-Benchmark-ready Super AI Agents in seconds, not weeks

A production-ready Super AI agent with 18+ tools and swappable providers, built on AI SDK v6 `ToolLoopAgent` and ToolSDK.ai with ReAct reasoning.

[npm version](https://www.npmjs.com/package/@gaia-agent/sdk) · [License](LICENSE) · [TypeScript](https://www.typescriptlang.org/) · [AI SDK](https://sdk.vercel.ai/)

Quick Start · Features · GAIA Benchmark · Documentation

---

## ✨ Features

- Pre-configured agent ready for GAIA benchmarks out of the box
- Built-in ReAct (Reasoning + Acting) framework for structured thinking
- Multi-step planning and answer verification for complex tasks
- Tools organized by category, backed by official SDKs (Tavily, Exa, E2B, BrowserUse, Steel)
- Easy provider switching for sandbox, browser, search, and memory
- Integrated Tavily and Exa for intelligent web search
- E2B cloud sandbox with code execution and filesystem operations
- Steel, BrowserUse, or AWS AgentCore for web interactions
- Persistent memory with Mem0 or AWS AgentCore
- ESM with granular exports, TypeScript-first
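The ReAct pattern named in the feature list alternates between a reasoning step, a tool call, and an observation until an answer is ready. The toy loop below is only a sketch of that idea, not the SDK's implementation: `runReAct`, `Policy`, and the stub `calculator` are hypothetical names, and the policy is a hard-coded stand-in for a language model call.

```typescript
// Toy illustration of the ReAct pattern (reason, act, observe).
// All names here are hypothetical; this is not the SDK's implementation.
type Tool = (input: string) => string;

interface Step {
  thought: string;
  action?: { tool: string; input: string }; // absent when the policy is ready to answer
  answer?: string;
}

// A policy picks the next step from the question and the observations so far.
// In a real agent this would be a language model call; here it is a stub.
type Policy = (question: string, observations: string[]) => Step;

function runReAct(
  question: string,
  policy: Policy,
  tools: Record<string, Tool>,
  maxSteps = 5,
): string {
  const observations: string[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = policy(question, observations);
    if (step.answer !== undefined) return step.answer;
    if (!step.action) break;
    const tool = tools[step.action.tool];
    observations.push(tool ? tool(step.action.input) : `Unknown tool: ${step.action.tool}`);
  }
  return 'No answer produced';
}

// Stub tool: multiplies two numbers written as "a * b".
const tools: Record<string, Tool> = {
  calculator: (input) => {
    const [a, b] = input.split('*').map(Number);
    return String(a * b);
  },
};

// Stub policy: call the calculator once, then answer with its result.
const policy: Policy = (_question, observations) =>
  observations.length === 0
    ? { thought: 'I need to multiply.', action: { tool: 'calculator', input: '15 * 23' } }
    : { thought: 'I have the product.', answer: observations[0] };

console.log(runReAct('What is 15 * 23?', policy, tools)); // 345
```

The `maxSteps` cap is a common safeguard in loops like this, so that a policy that never produces an answer cannot run forever.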

---

## 🎯 Why GAIA Agent?

Empower developers to build world-class Super AI Agents in minutes, not months.

Whether you're creating a production-ready AI assistant for your product or competing in GAIA benchmarks, GAIA Agent provides the enterprise-grade foundation you need.

### Without GAIA Agent

- Days/weeks setting up APIs
- Writing tool wrappers manually
- Error handling for each service
- Figuring out which providers to use
- Integration testing headaches

### With GAIA Agent

- 3 lines of code to get started
- 16 tools ready with official SDKs
- GAIA benchmark ready immediately
- Swap providers with one line
- Production-tested implementations

Time savings: From weeks of infrastructure setup → 3 lines of code

Result: A world-class, production-ready Super Agent that rivals top AI systems

### What is the GAIA Benchmark?

The GAIA Benchmark is a comprehensive evaluation suite designed to test the capabilities of AI agents across a wide range of tasks, including reasoning, search, code execution, and browser automation.

📖 Read more about GAIA →

---

## 🚀 Quick Start

### Installation

```bash
npm install @gaia-agent/sdk ai @ai-sdk/openai zod
```

### Basic Usage

```typescript
import { createGaiaAgent } from '@gaia-agent/sdk';

// Create the agent - reads from environment variables
const agent = createGaiaAgent();

const result = await agent.generate({
  prompt: 'Calculate 15 * 23 and search for the latest AI papers',
});

console.log(result.text);
```

### Environment Variables

Create a `.env` file:

```bash
# Required
OPENAI_API_KEY=sk-...

# Default providers (at least one required)
TAVILY_API_KEY=tvly-...      # Search
E2B_API_KEY=...              # Sandbox
STEEL_API_KEY=steel_live_... # Browser
```

📖 Complete environment variables guide →

---

## 🛠️ Built-in Tools

| Category | Tools | Providers |
|----------|-------|-----------|
| 🧮 Core | calculator, httpRequest | Built-in |
| Planning | planner, verifier | Built-in |
| 🔍 Search | tavilySearch, exaSearch, exaGetContents | Tavily (default), Exa |
| 🛡️ Sandbox | e2bSandbox, sandockExecute | E2B (default), Sandock |
| 🖥️ Browser | steelBrowser, browserUseTool, awsBrowser | Steel (default), BrowserUse, AWS |
| 🧠 Memory | mem0Remember, mem0Recall, memoryStore | Mem0 (default), AWS AgentCore |

📖 Full tools documentation →  
📖 Provider comparison →  
📖 ReAct + Planning guide → ⭐ NEW

---

## 🔄 Swap Providers

Switch providers with one line:

```typescript
import { createGaiaAgent } from '@gaia-agent/sdk';

const agent = createGaiaAgent({
  providers: {
    search: 'exa',         // Use Exa instead of Tavily
    sandbox: 'sandock',    // Use Sandock instead of E2B
    browser: 'browseruse', // Use BrowserUse instead of Steel
  },
});
```

Or set via environment variables:

```bash
GAIA_AGENT_SEARCH_PROVIDER=exa
GAIA_AGENT_SANDBOX_PROVIDER=sandock
GAIA_AGENT_BROWSER_PROVIDER=browseruse
```
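With two ways to pick a provider, it helps to picture how they might combine. The sketch below is a hypothetical resolver, not the SDK's actual logic: `resolveProviders` is an invented name, and the precedence (explicit option, then environment variable, then default) is an assumption that this README does not confirm.

```typescript
// Illustrative only: a hypothetical resolver, not the SDK's actual implementation.
type Env = Record<string, string | undefined>;
type ProviderKind = 'search' | 'sandbox' | 'browser';
type ProviderChoice = Record<ProviderKind, string>;

// Defaults as listed in the Built-in Tools table.
const DEFAULTS: ProviderChoice = { search: 'tavily', sandbox: 'e2b', browser: 'steel' };

// Environment variable names as shown above.
const ENV_NAMES: Record<ProviderKind, string> = {
  search: 'GAIA_AGENT_SEARCH_PROVIDER',
  sandbox: 'GAIA_AGENT_SANDBOX_PROVIDER',
  browser: 'GAIA_AGENT_BROWSER_PROVIDER',
};

// Assumed precedence: explicit option, then environment variable, then default.
function resolveProviders(options: Partial<ProviderChoice>, env: Env): ProviderChoice {
  const kinds: ProviderKind[] = ['search', 'sandbox', 'browser'];
  const resolved: ProviderChoice = { ...DEFAULTS };
  for (const kind of kinds) {
    resolved[kind] = options[kind] ?? env[ENV_NAMES[kind]] ?? DEFAULTS[kind];
  }
  return resolved;
}

const choice = resolveProviders({ search: 'exa' }, { GAIA_AGENT_SANDBOX_PROVIDER: 'sandock' });
console.log(choice); // { search: 'exa', sandbox: 'sandock', browser: 'steel' }
```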

---

## 🎯 GAIA Benchmark

Run official GAIA benchmarks with enhanced results tracking:

```bash
# Basic benchmark
pnpm benchmark                  # Run validation set
pnpm benchmark --limit 10       # Test with 10 tasks

# Resume interrupted runs
pnpm benchmark --resume         # Continue from checkpoint

# Filter by capability
pnpm benchmark:files            # Tasks with file attachments
pnpm benchmark:code             # Code execution tasks
pnpm benchmark:search           # Web search tasks
pnpm benchmark:browser          # Browser automation tasks

# Stream mode (real-time thinking)
pnpm benchmark:random --stream  # Watch agent think in real-time

# Wrong answers collection
pnpm benchmark:wrong            # Retry only failed tasks
```

### Wrong Answers Collection

Automatically track and retry failed tasks:

```bash
# 1. Run benchmark (auto-creates wrong-answers.json)
pnpm benchmark --limit 20

# 2. View wrong answers
cat benchmark-results/wrong-answers.json

# 3. Retry only failed tasks
pnpm benchmark:wrong --verbose

# 4. Keep retrying until all pass
pnpm benchmark:wrong
# → "🎉 No wrong answers! All previous tasks passed."
```

📖 Wrong answers guide →  
📖 Resume feature guide →  
📖 Benchmark module docs →  
📖 GAIA setup guide →

---

## 📊 Enhanced Benchmark Results

Benchmark results now include full task details:

```json
{
  "taskId": "abc123",
  "question": "What year was X founded?",
  "level": 2,
  "files": ["image.png"],
  "answer": "1927",
  "expectedAnswer": "1927",
  "correct": true,
  "durationMs": 5234,
  "steps": 3,
  "toolsUsed": ["search", "browser"],
  "summary": {
    "totalToolCalls": 5,
    "uniqueTools": ["search", "browser", "calculator"],
    "hadError": false
  },
  "stepDetails": [ /* ... */ ]
}
```

Easier to analyze and debug! 🎉
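Because every record carries `correct`, `durationMs`, and `toolsUsed`, a run can be summarized with a few lines of code. The sketch below assumes the results are available as an array of records shaped like the sample above; how they are stored on disk is not specified here, and `summarize` is a hypothetical helper rather than an SDK export.

```typescript
// Field names follow the sample record above; only the fields used here are typed.
interface TaskResult {
  taskId: string;
  level: number;
  correct: boolean;
  durationMs: number;
  toolsUsed: string[];
}

interface Summary {
  accuracyPct: string;
  avgDurationMs: number;
  failuresByTool: Record<string, number>;
}

// Hypothetical helper: accuracy, mean duration, and how often each tool
// appears in tasks that were answered incorrectly.
function summarize(results: TaskResult[]): Summary {
  const passed = results.filter((r) => r.correct).length;
  const totalMs = results.reduce((sum, r) => sum + r.durationMs, 0);
  const failuresByTool: Record<string, number> = {};
  for (const r of results) {
    if (r.correct) continue;
    for (const tool of r.toolsUsed) {
      failuresByTool[tool] = (failuresByTool[tool] ?? 0) + 1;
    }
  }
  return {
    accuracyPct: results.length ? ((passed / results.length) * 100).toFixed(2) : '0.00',
    avgDurationMs: results.length ? Math.round(totalMs / results.length) : 0,
    failuresByTool,
  };
}

// Made-up sample data for illustration.
const sample: TaskResult[] = [
  { taskId: 't1', level: 1, correct: true, durationMs: 4000, toolsUsed: ['search'] },
  { taskId: 't2', level: 2, correct: false, durationMs: 6000, toolsUsed: ['search', 'browser'] },
];
console.log(summarize(sample));
```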

---

## 📈 Benchmark Results

Latest benchmark performance across different task categories:

| Benchmark Command | Timestamp | Results | Accuracy | Model | Providers | Details |
|-------------------|-----------|---------|----------|-------|-----------|---------|
| `pnpm benchmark` | 2025-11-26 08:33 | 22/53 | 41.51% | gpt-4o | Search: tavily, Sandbox: e2b, Browser: steel | View Details |
| `pnpm benchmark:level1` | 2025-11-27 10:38 | 16/53 | 30.19% | gpt-4o | Search: openai, Sandbox: e2b, Browser: steel, Memory: mem0 | - |
| `pnpm benchmark:level1` | 2025-12-03 04:12 | 21/53 | 39.62% | Claude Sonnet 4.5 | Search: openai, Sandbox: e2b, Browser: steel, Memory: mem0 | View Details |

📖 See detailed task-by-task results →

Note: Benchmark results are automatically updated after each benchmark run.

---

## 🧪 Testing

Run unit tests with Vitest:

```bash
pnpm test           # Run all tests
pnpm test:watch     # Watch mode
pnpm test:coverage  # Coverage report
```

📖 Testing guide →

---

## 🎯 Advanced Usage

### Custom Tools

```typescript
import { createGaiaAgent, getDefaultTools } from '@gaia-agent/sdk';
import { tool } from 'ai';
import { z } from 'zod';

const agent = createGaiaAgent({
  tools: {
    ...getDefaultTools(),
    weatherTool: tool({
      description: 'Get weather',
      inputSchema: z.object({ city: z.string() }),
      // Placeholder implementation that returns fixed data
      execute: async ({ city }) => ({ temp: 72, condition: 'sunny' }),
    }),
  },
});
```

### ToolSDK Integration

Integrate thousands of tools from the ToolSDK.ai ecosystem:

```typescript
import { createGaiaAgent, getDefaultTools } from '@gaia-agent/sdk';
import { ToolSDKApiClient } from 'toolsdk/api'; // npm install toolsdk

// Initialize ToolSDK client
const toolSDK = new ToolSDKApiClient({ apiKey: process.env.TOOLSDK_AI_API_KEY });

// Load tools from ToolSDK packages
const emailTool = await toolSDK
  .package('@toolsdk.ai/mcp-send-email', {
    RESEND_API_KEY: process.env.RESEND_API_KEY,
  })
  .getAISDKTool("send-email");

const agent = createGaiaAgent({
  tools: { ...getDefaultTools(), emailTool },
});

const result = await agent.generate({
  prompt: 'Help me search for the latest AI news and send it to john@example.com',
});
```

📖 ToolSDK Packages →

### Extend the Agent Class

```typescript
import { GAIAAgent } from '@gaia-agent/sdk';

class ResearchAgent extends GAIAAgent {
  constructor() {
    super({
      instructions: 'Research assistant specialized in AI papers',
      additionalTools: { /* custom tools */ },
    });
  }
}
```

📖 Advanced usage guide →  
📖 API reference →

---

## 📚 Documentation

### Guides

- Quick Start Guide - Get started in 5 minutes
- ReAct + Planning Guide ⭐ NEW - Enhanced reasoning & planning
- Reflection Guide ⭐ NEW - Step-by-step reflection (optional)
- Environment Variables - Complete configuration guide
- GAIA Benchmark - Requirements, setup, tips
- Improving GAIA Scores - Strategies for better performance & self-evolution
- Wrong Answers Collection - Error tracking and retry
- Provider Comparison - Detailed provider comparison

### Reference

- API Reference - Complete API documentation
- Tools Reference - All available tools
- Advanced Usage - Extension examples, patterns
- Benchmark Module - Modular architecture
- Testing Guide - Unit tests with Vitest

---

## 🤝 Contributing

This project uses automated NPM publishing. When changes are merged to `main`:

1. ✅ Tests run automatically
2. 📦 Version bumps to next patch (e.g., 0.1.0 → 0.1.1)
3. 📝 Changelog created in `changelog/`
4. 🚀 Published to NPM
5. 🏷️ Git tag created

For manual version bumps (minor/major), see `docs/NPM_PUBLISH_SETUP.md`.
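To make the patch bump in step 2 concrete, the sketch below shows the arithmetic on a plain `MAJOR.MINOR.PATCH` string. `bumpPatch` is a hypothetical helper for illustration; the project's actual release automation is not shown in this README and may work differently.

```typescript
// Hypothetical helper illustrating a patch bump; not the project's release script.
function bumpPatch(version: string): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  const [, major, minor, patch] = match;
  // Only the last component changes; major and minor are left untouched.
  return `${major}.${minor}.${Number(patch) + 1}`;
}

console.log(bumpPatch('0.1.0')); // 0.1.1
```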

---

## 📄 License

Apache License 2.0

---

Made with ❤️ for the AI community
