# miyami-websearch-mcp

MCP server providing web search and content extraction for LLMs.

```bash
npm install @iflow-mcp/ankushthakur2007-miyami-websearch-mcp
```

> Connect your LLM to the internet! Search the web and extract content from any webpage using the Model Context Protocol.
- 🔍 Web Search - Search across Google, DuckDuckGo, Bing, Brave, Wikipedia
- 🧠 Deep Research - Multi-query parallel research with compiled reports
- 🌐 Site Crawl - Depth-limited crawling with Trafilatura extraction
- 🎬 YouTube Transcripts - Fetch captions/subtitles from any YouTube video with captions - NEW!
- 🛡️ FREE Stealth Mode - Anti-bot bypass (Cloudflare, DataDome, etc.)
- ⏰ Time-Range Filters - Filter results by recency (day, week, month, year)
- 📄 Enhanced Content Extraction - Trafilatura-powered (Firecrawl-quality) extraction
- 📝 Markdown Output - Get structured markdown from webpages
- 🎯 Rich Metadata - Automatically extract authors, dates, site names
- ⚡ Fast & Easy - One-line installation, zero configuration
- 🤖 LLM Optimized - Formatted responses perfect for AI consumption
- 🆓 100% Free - No API keys, no signup, no configuration needed
- 🔒 Privacy-First - No tracking, no data collection
## 📦 Installation

### Option 1: npx (no install needed)

Add to your Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "miyami-websearch": {
      "command": "npx",
      "args": ["-y", "miyami-websearch-mcp"]
    }
  }
}
```

### Option 2: Global install

```bash
npm install -g miyami-websearch-mcp
```

Then configure Claude Desktop:

```json
{
  "mcpServers": {
    "miyami-websearch": {
      "command": "miyami-websearch-mcp"
    }
  }
}
```
That's it! Restart Claude Desktop and you're ready to search the web! 🎉
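Under the hood, Claude Desktop launches the configured command and talks to it over stdio using JSON-RPC (the MCP wire format). As an illustrative sketch only (the message shapes follow the MCP spec; the client name, version, and protocol version string here are placeholders), a session starts roughly like this:

```json
{"jsonrpc": "2.0", "id": 1, "method": "initialize",
 "params": {"protocolVersion": "2024-11-05", "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "1.0.0"}}}
{"jsonrpc": "2.0", "id": 2, "method": "tools/list"}
```

You never send these by hand when using Claude Desktop; the client handles the handshake for you.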
After adding to Claude Desktop config and restarting, try these prompts:

```
Search for the latest news about AI
```

```
Search for Python tutorials and summarize the top result
```

```
Fetch the content from https://example.com and summarize it
```
## 🛠️ Tools

### 🔍 Web Search

Parameters:
- `query` (required) - Your search query
- `categories` (optional) - general, news, images, videos, science
- `language` (optional) - Language code (default: en)
- `page` (optional) - Page number (default: 1)
- `time_range` (optional) - NEW! Filter by recency: day, week, month, year
Examples:

```
Search for "quantum computing breakthroughs" in news category
```

```
Search for AI news from the past 24 hours with time_range: day
```

```
Find recent Python tutorials from the past week with time_range: week
```
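If you call the server from your own MCP client, the arguments mirror the parameter list above. A sketch of a `tools/call` request (the tool name `search` is an assumption here; check the server's `tools/list` response for the exact name):

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "search",
    "arguments": {
      "query": "quantum computing breakthroughs",
      "categories": "news",
      "time_range": "day"
    }
  }
}
```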
### 🌐 Fetch Webpage

Parameters:
- `url` (required) - The webpage URL
- `include_links` (optional) - Include links (default: true)
- `include_images` (optional) - Include images (default: true)
- `max_content_length` (optional) - Max length in characters (default: 50000)
- `format` (optional) - Output format: text, markdown (default), html
- `extraction_mode` (optional) - Engine: trafilatura (default, best quality), readability (faster)
- `stealth_mode` (optional) - NEW! Anti-bot bypass: off, low, medium, high (default: off)
- `auto_bypass` (optional) - NEW! Auto-escalate stealth if bot protection detected (default: false)
Enhanced Features:
- 📝 Markdown output - Get structured markdown like Firecrawl
- 🎯 Rich metadata - Authors, dates, site names automatically extracted
- 📊 Extraction stats - Word count, content length, format info
- 🛡️ Stealth mode - Bypass Cloudflare, DataDome, Akamai, etc.
Example:

```
Fetch and summarize https://en.wikipedia.org/wiki/Artificial_intelligence in markdown format
```
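For programmatic use, a fetch call carries the URL plus any of the optional parameters above. A sketch of the request (the tool name `fetch` is an assumption; confirm it via `tools/list`):

```json
{
  "jsonrpc": "2.0",
  "id": 4,
  "method": "tools/call",
  "params": {
    "name": "fetch",
    "arguments": {
      "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
      "format": "markdown",
      "extraction_mode": "trafilatura"
    }
  }
}
```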
### 🔎 Search & Fetch

Parameters:
- `query` (required) - Your search query
- `num_results` (optional) - How many results to fetch (1-5, default: 3)
- `categories` (optional) - Search categories
- `time_range` (optional) - Filter by recency: day, week, month, year
- `format` (optional) - Output format: text, markdown (default), html
- `stealth_mode` (optional) - NEW! Anti-bot bypass: off, low, medium, high (default: off)
- `auto_bypass` (optional) - NEW! Auto-escalate stealth if bot protection detected (default: false)
What it does:
- ✅ Searches for your query (with optional time filter)
- ✅ Gets top N results
- ✅ Automatically fetches full content (parallel)
- ✅ Uses Trafilatura for Firecrawl-quality extraction
- ✅ Returns both search snippets AND full webpage content
- ✅ FREE stealth mode for protected sites
Examples:

```
Research "climate change solutions" and give me detailed info from top 3 sources
```

```
Get recent AI breakthroughs from past 24 hours with full articles (time_range: day, num_results: 5)
```

```
Research recent web development tutorials from past week (time_range: week, format: markdown)
```
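The `search_and_fetch` name is confirmed by the example prompts, so a direct `tools/call` request (argument values here are illustrative) looks roughly like:

```json
{
  "jsonrpc": "2.0",
  "id": 5,
  "method": "tools/call",
  "params": {
    "name": "search_and_fetch",
    "arguments": {
      "query": "climate change solutions",
      "num_results": 3,
      "format": "markdown"
    }
  }
}
```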
### 🧠 Deep Research

Parameters:
- `queries` (required) - Comma-separated list of research queries (max 10)
- `breadth` (optional) - Results to fetch per query (1-5, default: 3)
- `time_range` (optional) - Filter by recency: day, week, month, year
- `max_content_length` (optional) - Max content per result (default: 30000)
- `stealth_mode` (optional) - Anti-bot bypass: off, low, medium, high (default: off)
- `auto_bypass` (optional) - Auto-escalate stealth if bot protection detected (default: false)
What it does:
- ✅ Processes up to 10 queries in parallel for speed
- ✅ AI reranking for better relevance (always enabled)
- ✅ Auto-generates compiled markdown report
- ✅ Rich metadata extraction (author, date, source)
- ✅ Server-side caching (30 minutes)
- ✅ Aggregated statistics across all queries
- ✅ FREE stealth mode for protected sites
Examples:

```
Research "AI trends 2024,machine learning basics,ChatGPT use cases" with deep_research
```

```
Deep research on "React vs Vue,Next.js features,frontend trends" from past month
```

```
Comprehensive research: "climate solutions,renewable energy,carbon capture" with breadth: 5
```
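The `deep_research` name is confirmed by the example prompts; note that `queries` is a single comma-separated string rather than a JSON array. A sketch of a direct call:

```json
{
  "jsonrpc": "2.0",
  "id": 6,
  "method": "tools/call",
  "params": {
    "name": "deep_research",
    "arguments": {
      "queries": "AI trends 2024,machine learning basics,ChatGPT use cases",
      "breadth": 3,
      "time_range": "month"
    }
  }
}
```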
### 🕷️ Site Crawl

Parameters:
- `start_url` (required) - Starting URL to crawl
- `max_pages` (optional) - Max pages to crawl (1-200, default: 50)
- `max_depth` (optional) - Link depth (0-5, default: 2)
- `format` (optional) - Output format: text, markdown (default), html
- `include_links` (optional) - Include extracted links (default: true)
- `include_images` (optional) - Include image URLs (default: true)
- `url_patterns` (optional) - Comma-separated regex to include (e.g. /blog/,/docs/)
- `exclude_patterns` (optional) - Comma-separated regex to exclude
- `stealth_mode` (optional) - Anti-bot bypass: off, low, medium, high (default: off)
- `obey_robots` (optional) - Respect robots.txt (default: true; set false to bypass)
What it does:
- ✅ Depth-limited recursive crawling (Scrapy subprocess)
- ✅ Trafilatura extraction with metadata + word counts
- ✅ Include/exclude URL filtering
- ✅ FREE stealth mode with optional auto-escalation on server
- ✅ 15-minute crawl timeout and 30-minute cache
Examples:

```
Crawl docs site: start_url=https://docs.example.com max_depth=3 url_patterns=/api/,/guides/
```

```
Bypass robots on a small crawl: start_url=https://site.com max_pages=5 obey_robots=false stealth_mode=high
```

```
Filter sections: start_url=https://blog.example.com url_patterns=/2024/,/tech/ exclude_patterns=/archive/
```
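A direct crawl request follows the same `tools/call` shape (the tool name `site_crawl` is an assumption; confirm it via `tools/list`). Note that `url_patterns` and `exclude_patterns` are comma-separated strings:

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "site_crawl",
    "arguments": {
      "start_url": "https://docs.example.com",
      "max_depth": 3,
      "url_patterns": "/api/,/guides/",
      "format": "markdown"
    }
  }
}
```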
### 🎬 YouTube Transcripts

Parameters:
- `video` (required) - YouTube video URL or 11-character video ID (supports all formats: full URL, youtu.be, embed, shorts)
- `format` (optional) - Output format: text (default), json (with timestamps), srt (subtitles)
- `lang` (optional) - Preferred language code (e.g., en, es, hi, fr). Default: auto
- `translate` (optional) - Translate transcript to target language code
- `start` (optional) - Start time in seconds for trimming
- `end` (optional) - End time in seconds for trimming
- `list_langs` (optional) - List available transcript languages instead of fetching (default: false)
What it does:
- ✅ Extract transcripts from any YouTube video with captions
- ✅ Multiple output formats (plain text, JSON with timestamps, SRT subtitles)
- ✅ Language selection for multilingual videos
- ✅ Translation to any supported language (via YouTube)
- ✅ Time-range slicing for specific segments
- ✅ List available transcript languages
- ✅ Stats: word count, segment count, duration
- ✅ 1-hour server-side caching
Examples:

```
Get transcript from YouTube video: video=dQw4w9WgXcQ format=text
```

```
Get transcript with timestamps: video=https://www.youtube.com/watch?v=dQw4w9WgXcQ format=json
```

```
Get Spanish transcript: video=dQw4w9WgXcQ lang=es
```

```
Translate to French: video=dQw4w9WgXcQ translate=fr
```

```
Get specific time range (60-120 seconds): video=dQw4w9WgXcQ start=60 end=120
```

```
List available languages: video=dQw4w9WgXcQ list_langs=true
```
## 💬 Example Prompts

```
Use search_and_fetch to research "artificial general intelligence latest developments"
from the top 3 results and give me a comprehensive summary
```

```
Search for AI breakthroughs from the past 24 hours using time_range: day
```

```
Find Python tutorials from the past week using search with time_range: week
```

```
Fetch this article in markdown format: https://example.com/article
```

```
Use search_and_fetch to research "quantum computing" from the past week
with time_range: week and get full article content in markdown
```

```
Search for "best restaurants in Tokyo" and show me the top 5 results
```

```
1. Search for "Python web scraping libraries"
2. Fetch the documentation page in markdown format
3. Explain how to use it with examples
```

## 🔧 Configuration
No configuration needed! 🎉
This MCP server connects to a free public API automatically. Just add it to your Claude Desktop config and it works immediately.
If you're looking for advanced configuration options, there aren't any - we've kept it simple on purpose!
## 🐛 Troubleshooting
### Server not showing up in Claude Desktop

1. Check your `claude_desktop_config.json` is valid JSON
2. Restart Claude Desktop completely (quit and reopen)
3. Check Console.app (macOS) for error messages

### First request is slow

This is normal! The free tier API sleeps after inactivity. Subsequent requests are fast.

### Requests timing out

The backend API is on Render free tier and may be waking up. Wait 60 seconds and retry.

### Command not found

1. Ensure you have Node.js 18+ installed:

   ```bash
   node --version
   ```

## 🌐 Backend API

This MCP server connects to a free public API:
- URL: https://websearch.miyami.tech (hardcoded, no config needed)
- Cost: 100% Free - no API keys or signup required
- Privacy: No logging, no tracking, no data collection
- Engines: Google, DuckDuckGo, Bing, Brave, Wikipedia, Startpage
- Stealth Mode: FREE anti-bot bypass (Cloudflare, DataDome, Akamai, etc.)
## 🤝 Contributing

Contributions welcome! Please:
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request
Found a bug? Open an issue
## 📄 License

MIT License - see LICENSE file for details
If this tool helps you, please star the repo! ⭐
---
Made with ❤️ for the LLM community
Connect your AI to the internet in seconds, not hours.