@lukaszraczylo/cloudflare-crawl-mcp
MCP server for crawling websites using the Cloudflare Browser Rendering API. Supports three output formats: Markdown, HTML, and JSON.
Features
- Multiple Output Formats: Choose between Markdown, HTML, or JSON output
- Configurable Crawling: Control depth, page limits, and link following
- Pattern Filtering: Include/exclude URLs using wildcard patterns
- JavaScript Rendering: Execute JavaScript for dynamic content (or disable for static content)
- Environment-Based Secrets: Securely manage credentials via environment variables
Prerequisites
- Node.js 18+
- Cloudflare account with Browser Rendering API access
- Cloudflare API Token with Browser Rendering permissions
- Cloudflare Account ID
Quick Start
# Install dependencies and build (run inside the cloned repository)
npm install
npm run build
# Run with environment variables
CF_API_TOKEN=your_token CF_ACCOUNT_ID=your_account_id npm start
Installation
1. Clone the Repository
git clone https://github.com/lukaszraczylo/cloudflare-crawl-mcp.git
cd cloudflare-crawl-mcp
2. Install Dependencies
npm install
3. Build the Server
npm run build
4. Configure Environment Variables
Copy the example environment file and add your credentials:
cp .env.example .env
Edit .env with your Cloudflare credentials:
CF_API_TOKEN=your_cloudflare_api_token
CF_ACCOUNT_ID=your_cloudflare_account_id
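A server that depends on these two variables would typically check them at startup and fail with a clear message if either is missing. The sketch below shows one way to do that. The `loadCredentials` function and the `CloudflareCredentials` type are hypothetical names used for illustration and are not taken from this project's source.

```typescript
// Hypothetical helper (an assumption, not the project's actual code):
// reads CF_API_TOKEN and CF_ACCOUNT_ID from an env-like object.
interface CloudflareCredentials {
  apiToken: string;
  accountId: string;
}

function loadCredentials(
  env: Record<string, string | undefined>,
): CloudflareCredentials {
  // Trim so that stray whitespace in a .env file does not end up in the token.
  const apiToken = env["CF_API_TOKEN"]?.trim();
  const accountId = env["CF_ACCOUNT_ID"]?.trim();

  // Collect every missing name so the error reports all of them at once.
  const missing: string[] = [];
  if (!apiToken) missing.push("CF_API_TOKEN");
  if (!accountId) missing.push("CF_ACCOUNT_ID");
  if (missing.length > 0) {
    throw new Error(
      `Missing required environment variables: ${missing.join(", ")}`,
    );
  }

  return { apiToken: apiToken as string, accountId: accountId as string };
}
```

In a Node.js process this would be called as `loadCredentials(process.env)`. Taking the environment as a parameter keeps the function easy to test without touching the real process environment.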
Getting Cloudflare Credentials
- Account ID: Find it at https://dash.cloudflare.com/_/account
- API Token: Create one at https://dash.cloudflare.com/profile/api-tokens with these permissions:
Account>Browser Rendering>Edit
MCP Configuration
Claude Desktop (macOS)
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"cloudflare-crawl": {
"command": "npm",
"args": ["start"],
"env": {
"CF_API_TOKEN": "your_api_token",
"CF_ACCOUNT_ID": "your_account_id"
},
"path": "/path/to/cloudflare-crawl-mcp"
}
}
}
Claude Code (CLI)
{
"mcpServers": {
"cloudflare-crawl": {
"command": "npm",
"args": ["start"],
"env": {
"CF_API_TOKEN": "your_api_token",
"CF_ACCOUNT_ID": "your_account_id"
}
}
}
}
Cursor
Add to ~/.cursor/mcp.json (Cursor's global MCP configuration):
{
"mcpServers": {
"cloudflare-crawl": {
"command": "npm",
"args": ["start"],
"env": {
"CF_API_TOKEN": "your_api_token",
"CF_ACCOUNT_ID": "your_account_id"
},
"path": "/path/to/cloudflare-crawl-mcp"
}
}
}
Available Tools
crawl_url_markdown
Crawl a website and return content in Markdown format.
{
"name": "crawl_url_markdown",
"arguments": {
"url": "https://example.com/docs",
"limit": 50,
"depth": 2,
"includePatterns": ["https://example.com/docs/**"],
"excludePatterns": ["https://example.com/docs/archive/**"],
"render": true
}
}
crawl_url_html
Crawl a website and return content in HTML format.
{
"name": "crawl_url_html",
"arguments": {
"url": "https://example.com",
"limit": 10
}
}
crawl_url_json
Crawl a website and return content in JSON format (uses Workers AI for data extraction).
{
"name": "crawl_url_json",
"arguments": {
"url": "https://example.com/products",
"limit": 20
}
}
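The three examples above show the `name` and `arguments` of a tool call. On the wire, MCP clients wrap these in a JSON-RPC 2.0 request with the method `tools/call`. MCP clients such as Claude Desktop build this request themselves, so the sketch below is only meant to show the shape. The `buildToolCall` helper and its types are hypothetical.

```typescript
// Hypothetical helper for illustration: wraps a tool name and its arguments
// in the JSON-RPC 2.0 envelope that MCP uses for tool invocation.
type ToolName = "crawl_url_markdown" | "crawl_url_html" | "crawl_url_json";

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: ToolName; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: ToolName,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Same arguments as the crawl_url_html example above.
const request = buildToolCall(1, "crawl_url_html", {
  url: "https://example.com",
  limit: 10,
});
const wireMessage = JSON.stringify(request);
```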
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | string | required | Starting URL to crawl |
| `limit` | number | 10 | Maximum pages to crawl (max: 100,000) |
| `depth` | number | 1 | Maximum link depth from starting URL |
| `includeSubdomains` | boolean | false | Follow links to subdomains |
| `includeExternalLinks` | boolean | false | Follow links to external domains |
| `includePatterns` | string[] | [] | Wildcard patterns to include |
| `excludePatterns` | string[] | [] | Wildcard patterns to exclude |
| `render` | boolean | true | Execute JavaScript (false = faster static fetch) |
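The defaults in the table can be written as a small options builder. This is a minimal sketch and not the server's actual implementation: the `CrawlOptions`, `CrawlInput`, and `withDefaults` names are hypothetical, and rejecting an out-of-range `limit` (rather than clamping it) is an assumption.

```typescript
// Hypothetical types mirroring the parameter table above.
interface CrawlOptions {
  url: string;
  limit: number;
  depth: number;
  includeSubdomains: boolean;
  includeExternalLinks: boolean;
  includePatterns: string[];
  excludePatterns: string[];
  render: boolean;
}

// Only `url` is required; everything else falls back to a default.
type CrawlInput = Pick<CrawlOptions, "url"> & Partial<Omit<CrawlOptions, "url">>;

const MAX_LIMIT = 100000; // "max: 100,000" from the table

function withDefaults(input: CrawlInput): CrawlOptions {
  if (!input.url) {
    throw new Error("url is required");
  }
  const limit = input.limit ?? 10;
  // Assumption: out-of-range limits are rejected instead of silently clamped.
  if (!Number.isInteger(limit) || limit < 1 || limit > MAX_LIMIT) {
    throw new RangeError(`limit must be an integer between 1 and ${MAX_LIMIT}`);
  }
  return {
    url: input.url,
    limit,
    depth: input.depth ?? 1,
    includeSubdomains: input.includeSubdomains ?? false,
    includeExternalLinks: input.includeExternalLinks ?? false,
    includePatterns: input.includePatterns ?? [],
    excludePatterns: input.excludePatterns ?? [],
    render: input.render ?? true,
  };
}
```

Calling `withDefaults({ url: "https://example.com" })` yields the documented defaults, and any field passed explicitly overrides its default.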
Pattern Syntax
- `*` - Matches any characters except `/`
- `**` - Matches any characters including `/`

Examples:
- `https://example.com/docs/**` - All URLs under /docs
- `https://example.com/*.html` - All HTML files directly in root
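To make the two wildcard rules concrete, here is a minimal sketch that translates a pattern into a regular expression. It only illustrates the semantics described above. The matching is performed by Cloudflare's API, whose exact behavior may differ, and `globToRegExp` is a hypothetical helper, not part of this project.

```typescript
// Hypothetical illustration of the wildcard rules:
//   `*`  -> any run of characters that does not contain "/"
//   `**` -> any run of characters, including "/"
function globToRegExp(pattern: string): RegExp {
  let source = "";
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern.charAt(i);
    if (ch === "*") {
      if (pattern.charAt(i + 1) === "*") {
        source += ".*"; // `**` crosses path separators
        i++; // consume the second asterisk
      } else {
        source += "[^/]*"; // `*` stops at the next "/"
      }
    } else {
      // Escape regex metacharacters so "." in a hostname matches literally.
      source += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  // Anchor both ends so the pattern must match the whole URL.
  return new RegExp(`^${source}$`);
}

const underDocs = globToRegExp("https://example.com/docs/**");
const rootHtml = globToRegExp("https://example.com/*.html");

underDocs.test("https://example.com/docs/guide/intro"); // true
rootHtml.test("https://example.com/index.html"); // true
rootHtml.test("https://example.com/docs/index.html"); // false: `*` does not cross "/"
```

In this sketch, `https://example.com/docs/**` does not match `https://example.com/docs` without a trailing slash, because the pattern requires the `/` before `**`.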
Development
Commands
npm install # Install dependencies
npm run typecheck # Type check with tsc
npm run lint # Lint with ESLint
npm run build # Build TypeScript
npm start # Run server
npm test # Run tests
npm run test:watch # Run tests in watch mode
CI runs typecheck, lint, build and test.
Testing
The project includes comprehensive tests covering:
- Environment variable handling
- Crawl options building
- Result formatting (Markdown, HTML, JSON)
- Error handling
- API integration
Run tests:
npm test
Architecture
src/
├── index.ts # Main MCP server implementation
│
├── API Layer
│ ├── initiateCrawl() # POST to /crawl endpoint
│ ├── waitForCrawl() # Poll for job completion
│ └── getCrawlResults() # Fetch final results
│
├── Formatters
│ ├── formatMarkdownResult()
│ ├── formatHtmlResult()
│ └── formatJsonResult()
│
└── MCP Handlers
├── ListToolsRequestSchema # Tool registration
└── CallToolRequestSchema # Tool execution
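The API layer follows a start, poll, fetch sequence: `initiateCrawl()` submits the job, `waitForCrawl()` polls until it finishes, and `getCrawlResults()` retrieves the output. The polling step can be sketched generically as below. This is not the project's `waitForCrawl()` implementation: `pollUntilDone`, `PollResult`, and `PollOptions` are hypothetical names, and the status check is injected as a callback so that no assumptions are made about Cloudflare's response format.

```typescript
// Hypothetical polling helper; the real waitForCrawl() may differ.
type PollResult<T> = { done: false } | { done: true; value: T };

interface PollOptions {
  intervalMs: number; // delay between status checks
  timeoutMs: number; // overall time budget before giving up
}

async function pollUntilDone<T>(
  check: () => Promise<PollResult<T>>,
  { intervalMs, timeoutMs }: PollOptions,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const result = await check();
    if (result.done) {
      return result.value;
    }
    // Stop if the next sleep would run past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Crawl job did not finish within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A caller would pass a `check` function that queries the job status and returns `{ done: true, value }` once the crawl has completed. Given the limits listed below (crawls can run for days), a generous interval and an explicit timeout are both worth setting.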
Cloudflare Limits
- Max crawl duration: 7 days
- Results available: 14 days after completion
- Max pages per job: 100,000
- Free plan: 10 minutes of browser time per day
See Cloudflare Browser Rendering Limits for details.
Troubleshooting
Crawl returns no results
- Check `robots.txt` blocking (use `render: false` to bypass)
- Verify `includePatterns` match actual URLs
- Try increasing `depth` or disabling pattern filters
Job cancelled due to limits
- Upgrade to Workers Paid plan
- Use `render: false` for static content
- Reduce `limit` parameter
Authentication errors
- Verify API Token has Browser Rendering permissions
- Confirm Account ID is correct
License
MIT License - see LICENSE file.
Contributing
Contributions are welcome! Please read our contributing guidelines before submitting PRs at https://github.com/lukaszraczylo/cloudflare-crawl-mcp.
Support
- Open an issue at https://github.com/lukaszraczylo/cloudflare-crawl-mcp/issues
- Check Cloudflare's Browser Rendering Docs for API details