# @gulibs/safe-coder-cli

Standalone CLI tool for documentation crawling with SPA support, error detection, and code validation.

```bash
npm install @gulibs/safe-coder-cli
```
@gulibs/safe-coder-cli is an independent command-line tool that crawls documentation websites and generates structured output. It supports both static sites and Single Page Applications (SPAs) using browser automation.
This CLI is designed to work standalone or as part of the Safe Coder ecosystem, where it is invoked by the `@gulibs/safe-coder` MCP Server.
## Features

- HTTP & Browser Crawling: Supports both static HTTP crawling and browser-based rendering for SPAs
- Intelligent Content Extraction: Cleans and structures documentation content
- Parallel Processing: Multi-worker support for faster crawling
- Progress Reporting: Real-time progress updates via stderr
- JSON Output: Machine-readable JSON output for programmatic use
- Skill Generation: Generates AI-ready SKILL files from documentation
- Checkpoint Support: Resume interrupted crawls
- Proxy Support: Configure HTTP/HTTPS proxies
## Installation

```bash
npm install -g @gulibs/safe-coder-cli
```
Or using yarn:
```bash
yarn global add @gulibs/safe-coder-cli
```
Or using pnpm:
```bash
pnpm add -g @gulibs/safe-coder-cli
```
Verify the installation:

```bash
safe-coder-cli --version
safe-coder-cli --help
```
## Usage

```bash
safe-coder-cli crawl https://react.dev
```
```bash
# Limit pages and depth
safe-coder-cli crawl https://react.dev --max-pages 50 --max-depth 3
```
### JSON Output

```bash
# Output machine-readable JSON
safe-coder-cli crawl https://react.dev --output-format json

# Capture output to file
safe-coder-cli crawl https://react.dev --output-format json > output.json
```

## Command Reference
### crawl
Crawl documentation website and optionally generate skill file.
#### Options
- -c, --config - Path to configuration file
- -b, --browser - Browser type: puppeteer | playwright
- -d, --max-depth - Maximum crawl depth (default: 3)
- -p, --max-pages - Maximum number of pages to crawl (default: 50)
- -w, --workers - Number of parallel workers (default: 1)
- --spa-strategy - SPA strategy: smart | auto | manual (default: smart)
- -o, --output-dir - Output directory for skill files
- -f, --filename - Skill name for directory and file names
- --checkpoint - Enable checkpoint/resume functionality
- --resume - Resume from last checkpoint if available
- --rate-limit - Delay in milliseconds between requests (default: 500)
- --output-format - Output format: json | pretty (default: pretty)
- --include-paths - Additional path patterns to include (comma-separated)
- --exclude-paths - Path patterns to exclude (comma-separated)

### detect-errors
Detect errors and warnings in code files.
```bash
safe-coder-cli detect-errors ./src/app.ts
safe-coder-cli detect-errors ./src/app.ts --format json
```

### validate-code
Validate and optionally fix code errors.
```bash
safe-coder-cli validate-code ./src/app.ts
safe-coder-cli validate-code ./src/app.ts --output ./src/app.fixed.ts
```

## Configuration File
Create a `.doc-crawler.json` file in your project root:

```json
{
"browser": "puppeteer",
"spaStrategy": "smart",
"crawl": {
"maxDepth": 3,
"maxPages": 200,
"workers": 5,
"rateLimit": 300,
"checkpoint": {
"enabled": true,
"interval": 50
}
},
"proxy": "http://127.0.0.1:7890"
}
```

## Output Format
### Result Output

When using `--output-format json`, the CLI outputs:

```json
{
"success": true,
"data": {
"source": {
"url": "https://react.dev",
"crawledAt": "2024-01-15T10:30:00.000Z",
"pageCount": 50,
"depth": 3
},
"pages": [
{
"url": "https://react.dev/learn",
"title": "Learn React",
"content": "...",
"wordCount": 1500,
"codeBlocks": 5,
"headings": ["Getting Started", "Components"]
}
],
"metadata": {
"technology": "react.dev",
"categories": ["tutorial", "api", "guide"]
},
"statistics": {
"totalPages": 50,
"maxDepthReached": 3,
"errors": 0
},
"skill": {
"skillMd": "...",
"quality": 85
}
}
}
```
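For programmatic use, the result envelope above can be unpacked with a few lines of Node. This is a sketch against the sample schema only; the inline object stands in for a real `output.json` file, and field names are taken from the example:

```javascript
// Sketch: unpack a crawl result that follows the schema shown above.
// In real use you would load it from a file, e.g.
//   const result = JSON.parse(require("fs").readFileSync("output.json", "utf8"));
// Here a trimmed inline sample stands in for the file.
const result = {
  success: true,
  data: {
    source: { url: "https://react.dev", pageCount: 50, depth: 3 },
    statistics: { totalPages: 50, maxDepthReached: 3, errors: 0 },
    skill: { skillMd: "...", quality: 85 },
  },
};

// Fail fast on unsuccessful crawls before touching `data`.
if (!result.success) {
  throw new Error("crawl failed");
}

const { statistics, skill } = result.data;
const summary = `Crawled ${statistics.totalPages} pages with ${statistics.errors} errors; skill quality ${skill.quality}`;
console.log(summary);
// → Crawled 50 pages with 0 errors; skill quality 85
```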
### Progress Output

Progress information is output to stderr in JSON format:
```json
{"type":"progress","message":"Crawled 10/50 pages","timestamp":"...","current":10,"total":50,"percentage":20}
```

## Browser Setup
For SPA crawling, you need Chrome/Chromium installed:
### macOS

```bash
brew install --cask google-chrome
```

### Windows

```bash
winget install Google.Chrome
```

### Linux

```bash
sudo apt install google-chrome-stable
```

### Custom Chrome Path

```bash
export CHROME_PATH=/path/to/chrome
```

## Environment Variables
- CHROME_PATH - Path to Chrome executable
- HTTP_PROXY - HTTP proxy URL
- HTTPS_PROXY - HTTPS proxy URL
- LOG_LEVEL - Log level (INFO, DEBUG, ERROR)

## Integration with MCP Server
The CLI is designed to be called by the `@gulibs/safe-coder` MCP Server. The MCP Server:

1. Checks if the CLI is installed
2. Spawns CLI with appropriate parameters
3. Monitors progress via stderr
4. Parses JSON output from stdout
5. Post-processes results and generates SKILL guidance
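The stderr monitoring step above can be sketched as a line-by-line JSON parse in Node. This is a hedged illustration, not the MCP Server's actual implementation; `parseProgressLine` is a hypothetical helper, and the spawn call in the comment is only indicative:

```javascript
// Parse one newline-delimited progress record from the CLI's stderr.
// Returns the event object for progress lines, or null for anything else.
function parseProgressLine(line) {
  try {
    const event = JSON.parse(line);
    return event && event.type === "progress" ? event : null;
  } catch {
    return null; // non-JSON stderr noise is ignored
  }
}

// Demo with the documented sample progress line:
const sample =
  '{"type":"progress","message":"Crawled 10/50 pages","timestamp":"...","current":10,"total":50,"percentage":20}';
const event = parseProgressLine(sample);
console.log(`${event.message} (${event.percentage}%)`);
// → Crawled 10/50 pages (20%)

// In real use you would attach the parser to a spawned CLI process, e.g.:
//   const child = require("child_process").spawn("safe-coder-cli",
//     ["crawl", url, "--output-format", "json"]);
//   child.stderr.on("data", chunk =>
//     chunk.toString().split("\n").forEach(line => parseProgressLine(line)));
```

Parsing stdout separately from stderr keeps the machine-readable result clean while still surfacing live progress.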
## Examples

### Basic Crawl

```bash
safe-coder-cli crawl https://docs.example.com --max-pages 30
```

### Parallel Crawling

```bash
safe-coder-cli crawl https://docs.example.com --workers 8 --max-pages 200
```

### SPA Crawling

```bash
safe-coder-cli crawl https://spa-site.com --spa-strategy auto --browser playwright
```

### Skill Generation

```bash
safe-coder-cli crawl https://react.dev \
--output-dir ~/.cursor/skills \
--filename react-docs \
--max-pages 100
```

### JSON Output with jq

```bash
safe-coder-cli crawl https://docs.example.com \
--output-format json \
  --max-pages 20 > output.json

# Process with jq
cat output.json | jq '.data.statistics'
```

## Troubleshooting
### Command Not Found

After installation, if `safe-coder-cli` is not found:

```bash
# Check npm global bin path
npm config get prefix

# Add to PATH if needed (macOS/Linux)
export PATH="$(npm config get prefix)/bin:$PATH"
```

### Chrome Not Found
If you see "Chrome/Chromium not found":
1. Install Chrome (see Browser Setup above)
2. Set the CHROME_PATH environment variable
3. Or install the full puppeteer package: npm install -g puppeteer

### Permission Errors
On Linux/macOS, you may need sudo for global installation:
```bash
sudo npm install -g @gulibs/safe-coder-cli
```

Or use a version manager like nvm to avoid sudo.

## Development
```bash
# Clone repository
git clone
cd safe-coder-cli

# Install dependencies
npm install

# Build
npm run build

# Link for local testing
npm link

# Test
safe-coder-cli --version
```

## License
MIT
## Related Projects

- `@gulibs/safe-coder` - MCP Server that orchestrates this CLI