🌉 Claude-Gemini Multimodal Bridge

An MCP bridge that seamlessly integrates Claude Code, Gemini CLI, and Google AI Studio

🇯🇵 Japanese (日本語版) • 📦 NPM • 🐛 Issues

---

[npm version](https://www.npmjs.com/package/claude-gemini-multimodal-bridge)
[License: MIT](https://opensource.org/licenses/MIT)
[Node.js](https://nodejs.org/)
[TypeScript](https://www.typescriptlang.org/)

[MCP Compatible](https://modelcontextprotocol.io/)
[Gemini](https://ai.google.dev/)
[Claude](https://www.anthropic.com/)
[GitHub Sponsors](https://github.com/sponsors/goodaymmm)

[Windows](#-windows-environment)
[macOS](#-quick-start)
[Linux](#-quick-start)

---

🤔 Why CGMB?

- Specialized collaboration: combines Claude's reasoning, Gemini CLI's web search, and AI Studio's image, audio, and document capabilities, routing each task to the layer best suited to it. This gets ahead of the 2026 trend toward "Specialized AI Collaboration".

- One-step setup: a single `npm install` is all you need; the postinstall script automates the tedious parts.

- Built on MCP: follows Anthropic's Model Context Protocol, with automatic error recovery that reaches a 95% self-healing rate.

---

✨ What's New in v1.1.0

| Feature | Description |
|---------|-------------|
| 🪟 Full Windows Support | Native support for both CLI and MCP |
| 📝 Enhanced OCR Processing | Automatic text extraction from scanned PDFs |
| 🚀 Latest Gemini Models | Support for gemini-2.5-flash, gemini-3-flash |
| 🔐 OAuth Authentication | File-based authentication compatible with Claude Code |
| 🌐 Auto Translation | Japanese to English translation for image generation |
| 📊 Smart Routing | PDF URLs to AI Studio, web pages to Gemini CLI |
| ⚡ Performance Optimization | Reduced timeouts, lazy loading, caching |
| 🛡️ Error Recovery | 95% self-healing with exponential backoff |
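
The error-recovery row refers to retrying with exponential backoff. As a rough illustration of that technique (not CGMB's actual code), here is a TypeScript sketch; `backoffDelays`, `retryWithBackoff`, and the defaults of 3 retries, a 1 s base delay, and a 30 s cap are all hypothetical.

```typescript
// Hypothetical sketch of retry with exponential backoff; not CGMB's implementation.

/** Delay before each retry: baseMs * 2^attempt, capped at maxMs. */
function backoffDelays(retries: number, baseMs = 1000, maxMs = 30_000): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(Math.min(baseMs * 2 ** attempt, maxMs));
  }
  return delays;
}

/** Runs `task`, retrying after each failure with exponentially growing waits. */
async function retryWithBackoff<T>(
  task: () => Promise<T>,
  retries = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  // One initial attempt (no wait) plus one attempt after each backoff delay.
  for (const delay of [0, ...backoffDelays(retries)]) {
    if (delay > 0) await sleep(delay);
    try {
      return await task();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError; // every attempt failed
}
```

Passing a no-op `sleep` keeps the retry loop easy to test; production code usually also adds random jitter so that many clients do not retry in lockstep.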

---

🏗️ Architecture

```mermaid
flowchart TD
    A[Claude Code] --> B[CGMB]
    B --> C[Gemini CLI]
    B --> D[Claude Code]
    B --> E[AI Studio]
```

| Layer | Specialization | Timeout |
|:-----:|:---------------|:-------:|
| 🔍 Gemini CLI | Web search, real-time information | 30s |
| 🧠 Claude Code | Complex reasoning, code analysis | 300s |
| 🎨 AI Studio | Image generation, audio synthesis, OCR | 120s |
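
The timeouts in this table can be enforced by racing each layer call against a timer. The TypeScript below is a hypothetical sketch: `Layer`, `LAYER_TIMEOUTS_MS`, and `withLayerTimeout` are illustrative names rather than CGMB's API, and only the 30 s / 300 s / 120 s values come from the table.

```typescript
// Hypothetical sketch: enforce the per-layer timeouts listed in the table.
type Layer = "gemini-cli" | "claude-code" | "ai-studio";

const LAYER_TIMEOUTS_MS: Record<Layer, number> = {
  "gemini-cli": 30_000, // web search, real-time information
  "claude-code": 300_000, // complex reasoning, code analysis
  "ai-studio": 120_000, // image generation, audio synthesis, OCR
};

/** Rejects if `work` does not settle within the layer's timeout. */
async function withLayerTimeout<T>(layer: Layer, work: Promise<T>): Promise<T> {
  const ms = LAYER_TIMEOUTS_MS[layer];
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${layer} timed out after ${ms / 1000}s`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // don't keep the event loop alive once work settles
  }
}
```

`Promise.race` only stops waiting; the underlying request keeps running unless it is also cancelled, for example with an `AbortController`.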



---



🚀 Quick Start



Prerequisites

- Node.js ≥ 22.0.0
- Claude Code CLI installed
- Gemini CLI (installed automatically)



Installation

```bash
npm install -g claude-gemini-multimodal-bridge
```

> 💡 The postinstall script automatically:
> - Installs Gemini CLI
> - Sets up Claude Code MCP integration
> - Creates a `.env` template
> - Verifies system requirements



Set Your API Key

Create a `.env` file in your working directory:

```bash
AI_STUDIO_API_KEY=your_api_key_here
```

🔗 Get an API key: https://aistudio.google.com/app/apikey



Launch Gemini CLI

```bash
gemini
```

Try Your First Prompt

> I installed CGMB via NPM. Please check my current environment for the cgmb command and help me use it.





---



💡 Usage Examples



CGMB integrates seamlessly with Claude Code. Just include the keyword "CGMB" in your request:

```bash
# 🎨 Image generation
"CGMB generate an image of a futuristic city"

# 📄 Document analysis (use absolute paths)
"CGMB analyze the document at /full/path/to/report.pdf"

# 🌐 URL analysis
"CGMB analyze https://example.com/document.pdf"

# 🔍 Web search
"CGMB search for the latest AI news"

# 🎵 Audio generation
"CGMB create audio saying 'Welcome to our podcast'"

# 📝 OCR-enabled PDF analysis
"CGMB analyze this scanned PDF document with OCR"
```





How It Works

1. Include "CGMB" in your Claude Code request.
2. CGMB automatically routes the request to the best-suited AI layer:
   - 🔍 Gemini CLI: web search, latest information
   - 🎨 AI Studio: images, audio, file processing
   - 🧠 Claude Code: complex reasoning, code analysis
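
The routing step can be pictured as a small classifier. This TypeScript is a hypothetical sketch, not CGMB's router: the `Task` shape, the `routeTask` name, and the URL checks are assumptions. Only the layer assignments above and the "Smart Routing" rule from What's New (PDF URLs to AI Studio, web pages to Gemini CLI) come from this README.

```typescript
// Hypothetical routing sketch; the real CGMB router may use different signals.
type Layer = "gemini-cli" | "ai-studio" | "claude-code";

interface Task {
  kind: "search" | "image" | "audio" | "analyze" | "reason";
  target?: string; // file path or URL, when the request names one
}

function routeTask(task: Task): Layer {
  switch (task.kind) {
    case "search":
      return "gemini-cli"; // web search, latest information
    case "image":
    case "audio":
      return "ai-studio"; // image and audio generation
    case "analyze": {
      const target = task.target ?? "";
      const isUrl = /^https?:\/\//i.test(target);
      const isPdf = /\.pdf(\?|#|$)/i.test(target);
      // Web pages go to Gemini CLI; PDF URLs and local files go to AI Studio.
      return isUrl && !isPdf ? "gemini-cli" : "ai-studio";
    }
    case "reason":
      return "claude-code"; // complex reasoning, code analysis
  }
}
```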



---



🤖 Models Used



| Purpose | Model ID | Layer |
|:-------:|:---------|:-----:|
| 🔍 Web Search | `gemini-3-flash` | Gemini CLI |
| 🎨 Image Generation | `gemini-2.5-flash-image` | AI Studio |
| 🎵 Audio Generation | `gemini-2.5-flash-preview-tts` | AI Studio |
| 📄 Document Processing | `gemini-2.5-flash` | AI Studio |
| 📝 OCR/Text Extraction | `gemini-2.5-flash` | AI Studio |
| 🔮 General Multimodal | `gemini-2.0-flash-exp` | AI Studio |



---



📈 Performance

Key metrics:

- Authentication Overhead Reduction
- Search Cache Hit Rate
- Automatic Error Recovery Rate
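
The search cache behind the hit-rate metric is controlled by `ENABLE_CACHING` and `CACHE_TTL` (see 🔧 Configuration). As a hypothetical illustration, here is a minimal TTL cache in TypeScript. It assumes `CACHE_TTL` is given in seconds; `TtlCache` is not CGMB's class, and the injectable clock exists only to make expiry testable.

```typescript
// Hypothetical TTL cache sketch; assumes CACHE_TTL is given in seconds.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;
  hits = 0;
  misses = 0;

  constructor(ttlSeconds: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now; // clock in milliseconds
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry && entry.expiresAt > this.now()) {
      this.hits++;
      return entry.value;
    }
    this.entries.delete(key); // drop expired entries lazily
    this.misses++;
    return undefined;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  /** Fraction of lookups served from the cache. */
  hitRate(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}
```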









---



📄 PDF Processing & OCR



Features

- ✅ Supports both text-based and scanned PDFs
- ✅ Automatic OCR detection
- ✅ Native OCR processing via Gemini File API
- ✅ Multi-language support
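
Automatic OCR detection needs some rule for spotting scanned pages, and CGMB's actual rule is not described here. One common heuristic, sketched in TypeScript as an assumption, flags a PDF for OCR when enough pages have little or no extractable text. `needsOcr` and its thresholds (50 non-whitespace characters per page, half of all pages) are hypothetical.

```typescript
// Hypothetical OCR-detection heuristic; thresholds are assumptions, not CGMB's values.
/**
 * Given the text extracted from each PDF page (empty for image-only pages),
 * decide whether the document should go through OCR.
 */
function needsOcr(pageTexts: string[], minCharsPerPage = 50, scannedRatio = 0.5): boolean {
  if (pageTexts.length === 0) return true; // no text layer at all
  const sparsePages = pageTexts.filter(
    (text) => text.replace(/\s+/g, "").length < minCharsPerPage,
  ).length;
  // Mixed documents count as scanned once enough pages lack a text layer.
  return sparsePages / pageTexts.length >= scannedRatio;
}
```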



Processing Flow

PDF Input → Upload → OCR Processing → Content Analysis → Output Results





Supported Document Types

- Text-based PDFs
- Scanned PDFs (OCR processing)
- Image-based PDFs (OCR conversion)
- Mixed content
- Complex layouts (tables, charts, formatted content)



---



📂 File Organization



Generated content is automatically organized:

```
output/
├── images/     # 🎨 Generated images
├── audio/      # 🎵 Generated audio files
└── documents/  # 📄 Processed documents
```

Access generated files from Claude Code with these tools:

- `get_generated_file`: Retrieve a specific file
- `list_generated_files`: List all generated files
- `get_file_info`: Get file metadata
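
The three subdirectories suggest a simple type-based mapping. Below is a hypothetical TypeScript sketch of it; `outputPathFor`, the extension list, and the fallback to `documents/` are illustrative assumptions, not what CGMB actually uses.

```typescript
// Hypothetical mapping from a generated file name to its output/ subdirectory.
const SUBDIR_BY_EXTENSION = new Map<string, string>([
  ["png", "images"],
  ["jpg", "images"],
  ["jpeg", "images"],
  ["webp", "images"],
  ["wav", "audio"],
  ["mp3", "audio"],
  ["pdf", "documents"],
  ["md", "documents"],
  ["txt", "documents"],
]);

function outputPathFor(fileName: string, root = "output"): string {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  // Unknown types fall back to documents/ (an assumption).
  const subdir = SUBDIR_BY_EXTENSION.get(ext) ?? "documents";
  return `${root}/${subdir}/${fileName}`;
}
```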



---



🔧 Configuration



Environment Variables

```bash
# Required
AI_STUDIO_API_KEY=your_api_key_here

# Optional
GEMINI_API_KEY=your_api_key_here
ENABLE_CACHING=true
CACHE_TTL=3600
LOG_LEVEL=info
```
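
These variables could be read and validated along the following lines. This is a hypothetical TypeScript sketch: `loadConfig` and `CgmbConfig` are illustrative names, and it assumes the example values shown (`ENABLE_CACHING=true`, `CACHE_TTL=3600`, `LOG_LEVEL=info`) also act as defaults. It accepts a plain object, so it works with `process.env` or with the result of parsing a `.env` file.

```typescript
// Hypothetical config loader; defaults mirror the example values (an assumption).
interface CgmbConfig {
  aiStudioApiKey: string;
  geminiApiKey?: string;
  enableCaching: boolean;
  cacheTtlSeconds: number;
  logLevel: string;
}

function loadConfig(env: Record<string, string | undefined>): CgmbConfig {
  const aiStudioApiKey = env.AI_STUDIO_API_KEY?.trim();
  if (!aiStudioApiKey) {
    throw new Error("AI_STUDIO_API_KEY is required (https://aistudio.google.com/app/apikey)");
  }
  const ttl = Number(env.CACHE_TTL ?? "3600");
  if (!Number.isFinite(ttl) || ttl < 0) {
    throw new Error(`CACHE_TTL must be a non-negative number, got "${env.CACHE_TTL}"`);
  }
  return {
    aiStudioApiKey,
    geminiApiKey: env.GEMINI_API_KEY || undefined, // optional
    enableCaching: (env.ENABLE_CACHING ?? "true").toLowerCase() !== "false",
    cacheTtlSeconds: ttl,
    logLevel: env.LOG_LEVEL ?? "info",
  };
}
```

Failing fast on a missing key surfaces setup mistakes at startup rather than on the first AI Studio request.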





Claude Code MCP Integration

CGMB configures Claude Code's MCP integration automatically:

- 📍 Config path: `~/.claude-code/mcp_servers.json`
- ⚡ Direct Node.js execution
- 🔒 Safe merge without overwriting existing servers
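
A safe merge amounts to adding CGMB's entry only when no server with that name exists, while leaving everything else in the file alone. The TypeScript below is a hypothetical sketch that assumes the file holds an `mcpServers` object keyed by server name (the shape MCP clients such as Claude Code commonly use); `mergeServer` and the entry fields are illustrative.

```typescript
// Hypothetical sketch of a non-destructive MCP config merge.
interface McpServerEntry {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

interface McpConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [other: string]: unknown; // unrelated top-level keys are preserved
}

/** Returns a new config with `name` added only if it is not already present. */
function mergeServer(config: McpConfig, name: string, entry: McpServerEntry): McpConfig {
  const servers: Record<string, McpServerEntry> = { ...(config.mcpServers ?? {}) };
  if (!Object.prototype.hasOwnProperty.call(servers, name)) {
    servers[name] = entry; // only add when the name is free
  }
  // Spreading keeps every other top-level key and every existing server untouched.
  return { ...config, mcpServers: servers };
}
```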



---



🪟 Windows Environment



CGMB fully supports Windows as of v1.1.0:

| Feature | Status |
|---------|:------:|
| CLI | ✅ All commands work |
| MCP Integration | ✅ MCP tool calls work correctly |
| Path Resolution | ✅ Automatically handles `C:\path\to\file` format |
| Gemini CLI | ✅ Full compatibility with Windows version |

```powershell
# Absolute paths recommended
cgmb analyze "C:\Users\name\Documents\report.pdf"

# Set environment variable (PowerShell)
$env:AI_STUDIO_API_KEY = "your_api_key_here"

# Set environment variable (Command Prompt)
set AI_STUDIO_API_KEY=your_api_key_here
```





---



🐧 Linux / WSL Environment



CGMB fully supports Linux and WSL:

| Feature | Status |
|---------|:------:|
| CLI | ✅ All commands work |
| MCP Integration | ✅ MCP tool calls work correctly |
| Path Resolution | ✅ Supports Unix paths and WSL `/mnt/` paths |
| Gemini CLI | ✅ Full compatibility with Linux version |

```bash
# Use Unix path format
cgmb analyze /home/user/documents/report.pdf

# WSL environment example
cgmb analyze /mnt/c/Users/name/Documents/report.pdf

# Set environment variables
export AI_STUDIO_API_KEY="your_api_key_here"
export CGMB_CHAT_MODEL="gemini-2.5-flash"
```
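
Windows drive paths and WSL `/mnt/` paths map onto each other mechanically, which is what lets both path styles point at the same file. The helpers below are a hypothetical TypeScript sketch of that conversion (`toWslPath` and `toWindowsPath` are illustrative names, not CGMB functions); inside WSL, the built-in `wslpath` utility does the same job.

```typescript
// Hypothetical helpers converting between Windows drive paths and WSL /mnt/ paths.

/** C:\Users\name\report.pdf -> /mnt/c/Users/name/report.pdf */
function toWslPath(windowsPath: string): string {
  const match = /^([A-Za-z]):[\\/](.*)$/.exec(windowsPath);
  if (!match) return windowsPath; // not a drive-letter path; leave unchanged
  const [, drive, rest] = match;
  return `/mnt/${drive.toLowerCase()}/${rest.replace(/\\/g, "/")}`;
}

/** /mnt/c/Users/name/report.pdf -> C:\Users\name\report.pdf */
function toWindowsPath(wslPath: string): string {
  const match = /^\/mnt\/([a-z])(?:\/(.*))?$/i.exec(wslPath);
  if (!match) return wslPath; // ordinary Unix path; leave unchanged
  const [, drive, rest = ""] = match;
  return `${drive.toUpperCase()}:\\${rest.replace(/\//g, "\\")}`;
}
```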





---



🔍 Troubleshooting



Debug Mode

```bash
export CGMB_DEBUG=true
export LOG_LEVEL=debug
cgmb serve --debug
```

PDF and OCR Issues

If OCR results are inaccurate:

- Use high-resolution scans (300+ DPI)
- Ensure clear, high-contrast text
- Avoid skewed or rotated pages

If large documents time out:

- Split large PDFs before processing (limits: 50MB and 1,000 pages)
- Extend the timeout: `export AI_STUDIO_TIMEOUT=180000`
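
Before processing a large PDF, a pre-flight check against the 50MB / 1,000-page limits can tell you whether to split it, and into which page ranges. The TypeScript below is a hypothetical sketch: `checkPdf` and `pageRanges` are illustrative names, "50MB" is taken as 50 × 1024 × 1024 bytes (an assumption), and the 200-page chunk size is arbitrary.

```typescript
// Hypothetical pre-flight check against the documented PDF limits.
const MAX_PDF_BYTES = 50 * 1024 * 1024; // "50MB", assuming binary megabytes
const MAX_PDF_PAGES = 1000;

/** Returns the reasons a PDF exceeds the limits (empty if it fits). */
function checkPdf(sizeBytes: number, pageCount: number): string[] {
  const problems: string[] = [];
  if (sizeBytes > MAX_PDF_BYTES) problems.push(`size ${sizeBytes} B exceeds ${MAX_PDF_BYTES} B`);
  if (pageCount > MAX_PDF_PAGES) problems.push(`${pageCount} pages exceeds ${MAX_PDF_PAGES}`);
  return problems;
}

/** Splits pages 1..pageCount into inclusive [start, end] ranges of at most chunkPages. */
function pageRanges(pageCount: number, chunkPages = 200): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  for (let start = 1; start <= pageCount; start += chunkPages) {
    ranges.push([start, Math.min(start + chunkPages - 1, pageCount)]);
  }
  return ranges;
}
```

Each range can then be extracted with a PDF tool of your choice and analyzed as a separate document.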





---



💰 API Costs



CGMB uses pay-per-use APIs:

- 📊 Google AI Studio API Pricing Details



---



📁 Project Structure



```
src/
├── core/           # 🎯 Main MCP server and layer management
├── layers/         # 🔌 AI layer implementations
├── auth/           # 🔐 Authentication system
├── tools/          # 🛠️ Processing tools
├── workflows/      # 📋 Workflow implementations
├── utils/          # 🔧 Utilities and helpers
└── mcp-servers/    # 🌐 Custom MCP servers
```

---

🔗 Links

Project

- GitHub
- NPM
- Issues

Related Tools

- Claude Code
- Gemini CLI
- Google AI Studio
- MCP

Services

- Google AI Studio
- Claude
- Gemini API

---

📜 Version History

v1.1.0

- 🪟 Full Windows Support: Native Windows support for both CLI and MCP
- 📝 Enhanced OCR: Automatic OCR processing for image-based PDFs
- 🚀 Latest Gemini Models: Support for gemini-2.5-flash, gemini-3-flash
- ⚡ Improved MCP Integration: Optimized async layer initialization
- 📈 Performance Improvements: Reduced timeouts, lazy loading, enhanced caching
- 🛡️ Error Recovery: 95% self-healing rate with exponential backoff

Previous Release

- 🎉 Initial release
- 🏗️ 3-layer architecture implementation
- 🎨 Basic multimodal processing

---

📄 License

MIT - See LICENSE

---

Made with ❤️ by goodaymmm

⭐ If this project helped you, please give it a star!

[Sponsor](https://github.com/sponsors/goodaymmm)
