An MCP (Model Context Protocol) server for scraping Salesforce developer documentation and converting it to Markdown. Integrates with Cursor, Claude Desktop, and other MCP-compatible AI assistants. Deploy locally or to Heroku with one click.
---
- Smart page analysis - Automatically detects the optimal extraction strategy for any Salesforce doc page
- Shadow DOM traversal - Handles React components and deeply nested shadow DOMs
- Multiple page types - Supports guide, reference, API reference, type definition, and landing pages
- Dynamic selectors - Falls back to custom selectors when automatic extraction fails
- Clean Markdown output - Converts HTML to GFM-compatible Markdown with tables
- Heroku ready - One-click deploy for remote/hosted access
---
- Prerequisites
- Install
- Run via npx
- Using with Cursor
- Using with Claude Desktop
- Running Remotely (Heroku)
- Available Tools
- Things You Can Ask
- How It Works
- Agent Usage Guide
- Batch Scraping
- Troubleshooting
- Dependencies
- Disclaimer
- License
---
- Node.js >= 18.0.0
- Chrome/Chromium (installed automatically by Puppeteer)
---

```bash
npm install -g @salesforcebob/sf-docs-mcp-server
```

Or use directly with npx (no installation required):

```bash
npx @salesforcebob/sf-docs-mcp-server
```

---

```bash
npx @salesforcebob/sf-docs-mcp-server
```

This starts an MCP stdio server. Use it with MCP-compatible clients like Cursor or Claude Desktop.
---
1. Open Cursor settings → MCP/Servers
2. Add a new stdio server:
```json
{
  "mcpServers": {
    "sf-docs": {
      "command": "npx",
      "args": ["-y", "@salesforcebob/sf-docs-mcp-server"]
    }
  }
}
```
Or add to your Cursor MCP configuration file (~/.cursor/mcp.json).
3. Save and reload tools. You should see:
   - `scrape_sf_docs`
   - `analyze_page_structure`
Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
```json
{
  "mcpServers": {
    "sf-docs": {
      "command": "npx",
      "args": ["-y", "@salesforcebob/sf-docs-mcp-server"]
    }
  }
}
```
---
This server includes an Express HTTP transport for remote deployment.

After clicking Deploy:
1. Choose an app name
2. Deploy the app
3. Verify endpoints:
   - `GET /health` → `{ ok: true }`
   - `GET /docs` → Documentation JSON
   - `POST /mcp` → MCP HTTP endpoint

To run the HTTP server locally:

```bash
npm run serve
```

or

```bash
npx @salesforcebob/sf-docs-mcp-server serve
```

Endpoints:
- `GET http://localhost:3000/health` → Health check
- `GET http://localhost:3000/docs` → Documentation
- `POST http://localhost:3000/mcp` → MCP HTTP endpoint

Point your client at `http://localhost:3000/mcp` (or your deployed app's `/mcp` URL) as the MCP HTTP endpoint.
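
For a quick smoke test of a running instance, here is a minimal Node.js sketch using the built-in `fetch` (Node 18+). The base URL is an assumption; substitute your Heroku app URL for a remote deployment:

```javascript
// Smoke test the documented HTTP endpoints. Run as an ES module (Node 18+).
// Swap the base URL for your Heroku app URL when testing a remote deployment.
const base = 'http://localhost:3000';

const health = await fetch(`${base}/health`);
console.log('health:', await health.json()); // expected: { ok: true }

const docs = await fetch(`${base}/docs`);
console.log('docs:', await docs.json()); // documentation JSON

// POST ${base}/mcp is the MCP endpoint itself: point your MCP client at it
// rather than calling it by hand.
```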
---
`scrape_sf_docs`: Scrape a Salesforce documentation page and return the content as Markdown.
Input:
- `url` (string, required): The Salesforce documentation URL to scrape
- `selector` (string, optional): CSS selector for content container (light DOM only)
- `shadowPath` (string[], optional): Array of selectors to traverse shadow DOM boundaries
Examples:
```json
// Basic usage (automatic detection)
{
  "url": "https://developer.salesforce.com/docs/einstein/genai/guide/get-started.html"
}

// With shadow path for nested shadow DOM
{
  "url": "https://developer.salesforce.com/docs/commerce/einstein-api/references/einstein-profile-connector?meta=type:ClientIdParam",
  "shadowPath": ["doc-amf-reference", "doc-amf-topic", "api-type-documentation"]
}
```
`analyze_page_structure`: Analyze the DOM structure of a Salesforce documentation page to determine the best extraction approach. Use this first when the default scraper fails or returns empty content.
Input:
- url (string, required): The Salesforce documentation URL to analyze
Output:
- Detected page type
- List of custom elements found
- Elements with shadow DOM
- Content containers with suggested selectors/shadow paths
- Suggested extraction approach
- DOM tree snapshot for debugging
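
For programmatic use outside an AI assistant, here is a hedged sketch of the analyze-then-scrape workflow using the MCP TypeScript SDK's stdio client. The `Client` and `StdioClientTransport` imports come from `@modelcontextprotocol/sdk`, and the result handling assumes standard MCP text content blocks:

```javascript
// Sketch: drive the server directly over stdio with the MCP SDK client.
// Run as an ES module with @modelcontextprotocol/sdk installed.
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

const client = new Client({ name: 'sf-docs-example', version: '1.0.0' }, { capabilities: {} });
await client.connect(new StdioClientTransport({
  command: 'npx',
  args: ['-y', '@salesforcebob/sf-docs-mcp-server'],
}));

const url = 'https://developer.salesforce.com/docs/commerce/einstein-api/references/einstein-profile-connector?meta=type:ClientIdParam';

// 1. Ask the server how the page is structured.
const analysis = await client.callTool({ name: 'analyze_page_structure', arguments: { url } });
console.log(analysis.content[0].text);

// 2. Scrape, supplying a shadowPath if the analysis suggested one (hard-coded
//    here with the example path from the tool documentation above).
const result = await client.callTool({
  name: 'scrape_sf_docs',
  arguments: { url, shadowPath: ['doc-amf-reference', 'doc-amf-topic', 'api-type-documentation'] },
});
console.log(result.content[0].text);

await client.close();
```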
---
Here are examples of what you can ask your AI assistant:
- "Get the Agentforce getting started documentation"
- "Scrape the Models API reference page"
- "Extract the GraphQL Send Query endpoint documentation"
- "Analyze the page structure of this Commerce Cloud API page"
- "Get all the type definitions from the Einstein Profile Connector API"
- "Show me the Agent Script language reference"
Quick JSON examples:
Scrape a guide page:
```json
{
  "tool": "scrape_sf_docs",
  "input": {
    "url": "https://developer.salesforce.com/docs/einstein/genai/guide/agent-script.html"
  }
}
```
Analyze a failing page:
```json
{
  "tool": "analyze_page_structure",
  "input": {
    "url": "https://developer.salesforce.com/docs/commerce/einstein-api/references/einstein-profile-connector?meta=type:CookieIdParam"
  }
}
```
---
The Salesforce developer docs use a React-based architecture with nested shadow DOM components. This server handles multiple page structures:
| Type | URL Pattern | Description |
|------|-------------|-------------|
| guide | `/guide/*` | Guide/tutorial pages |
| reference | `/references/*` with markdown | Reference pages with markdown content |
| api-reference | `/references/*?meta=Summary` | API summary pages |
| api-type | `/references/*?meta=type:*` | Type definition pages |
| api-method | `/references/*?meta=*` | Method/endpoint pages |
| overview | Landing pages | Overview/landing pages |

Custom elements handled:
- `doc-heading` - Headings with nested shadow DOM
- `doc-content-callout` - Tips, notes, warnings
- `dx-code-block` - Code snippets with syntax highlighting
- `api-summary` - API overview pages
- `api-type-documentation` - Type definition pages
- `api-method-documentation` - Method/endpoint pages
- `dx-group-text` - Landing page content
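
To illustrate how a `shadowPath` is meant to work, here is a simplified Puppeteer sketch of that traversal. It is an illustration of the idea, not the server's actual code; the URL and selectors are the example from the tool documentation above:

```javascript
// Simplified illustration of shadowPath traversal: descend through each
// selector, stepping into shadow roots along the way, and return the HTML
// of the innermost element.
import puppeteer from 'puppeteer';

const shadowPath = ['doc-amf-reference', 'doc-amf-topic', 'api-type-documentation'];
const url = 'https://developer.salesforce.com/docs/commerce/einstein-api/references/einstein-profile-connector?meta=type:ClientIdParam';

const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url, { waitUntil: 'networkidle2' });

const html = await page.evaluate((path) => {
  let node = document;
  for (const selector of path) {
    const next = (node.shadowRoot ?? node).querySelector(selector);
    if (!next) throw new Error(`Could not find element: ${selector}`);
    node = next;
  }
  return (node.shadowRoot ?? node).innerHTML;
}, shadowPath);

await browser.close();
console.log(html.length, 'characters of HTML extracted');
```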
---
For detailed instructions on how AI agents should use these tools, see AGENT_GUIDE.md.
---
For batch scraping multiple pages at once, you can use the included scraper script:
```javascript
// Edit the urls array in scraper.js
const urls = [
  'https://developer.salesforce.com/docs/einstein/genai/guide/get-started.html',
  // Add more URLs here
];
```

Then run:

```bash
npm run scrape
```
---
| Problem | Solution |
|---------|----------|
| Empty content with `pageType: "fallback"` | Use `analyze_page_structure` to find the right extraction method |
| Shadow path not working | Check the DOM snapshot for the correct element names |
| Content looks incomplete | Try a different `shadowPath` or `selector` |
| "Could not find element" error | The shadow path is incorrect - re-analyze the page |
| Puppeteer/Chrome issues | Ensure Chrome is installed or set `PUPPETEER_EXECUTABLE_PATH` |
---
- Puppeteer - Headless browser automation
- Turndown - HTML to Markdown conversion
- turndown-plugin-gfm - GFM table support
- Express - HTTP server for remote deployment
- @modelcontextprotocol/sdk - MCP server implementation
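
These pieces fit together roughly as follows. This is a hedged sketch of the render-then-convert pipeline implied by the dependencies above, not the server's actual code:

```javascript
// Render the page with Puppeteer, then convert the extracted HTML to
// GFM-flavored Markdown with Turndown and the GFM plugin.
import puppeteer from 'puppeteer';
import TurndownService from 'turndown';
import { gfm } from 'turndown-plugin-gfm';

const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://developer.salesforce.com/docs/einstein/genai/guide/get-started.html', {
  waitUntil: 'networkidle2',
});

// In the real server the HTML comes from the detected container or shadowPath;
// document.body is used here only to keep the sketch short.
const html = await page.evaluate(() => document.body.innerHTML);
await browser.close();

const turndown = new TurndownService({ headingStyle: 'atx', codeBlockStyle: 'fenced' });
turndown.use(gfm);
console.log(turndown.turndown(html));
```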
---
- This repository and MCP server are provided "as is" without warranties or guarantees of any kind, express or implied, including but not limited to functionality, security, merchantability, or fitness for a particular purpose.
- Use at your own risk. Review the source, perform a security assessment, and harden before any production deployment.
- Do not expose the HTTP endpoints publicly without proper authentication/authorization, rate limiting, logging, and monitoring.
- This tool scrapes publicly available Salesforce documentation. Ensure your usage complies with Salesforce's terms of service.
- You are solely responsible for the protection of your data and compliance with your organization's security policies.
---
MIT