@lukaszraczylo/cloudflare-crawl-mcp

MCP server for crawling websites using the Cloudflare Browser Rendering API. Supports three output formats: Markdown, HTML, and JSON.

Features

  • Multiple Output Formats: Choose between Markdown, HTML, or JSON output
  • Configurable Crawling: Control depth, page limits, and link following
  • Pattern Filtering: Include/exclude URLs using wildcard patterns
  • JavaScript Rendering: Execute JavaScript for dynamic content (or disable for static content)
  • Environment-Based Secrets: Securely manage credentials via environment variables

Prerequisites

  • Node.js 18+
  • Cloudflare account with Browser Rendering API access
  • Cloudflare API Token with Browser Rendering permissions
  • Cloudflare Account ID

Quick Start

# Install dependencies and build (run inside the cloned repository)
npm install
npm run build

# Run with environment variables
CF_API_TOKEN=your_token CF_ACCOUNT_ID=your_account_id npm start

Installation

1. Clone the Repository

git clone https://github.com/lukaszraczylo/cloudflare-crawl-mcp.git
cd cloudflare-crawl-mcp

2. Install Dependencies

npm install

3. Build the Server

npm run build

4. Configure Environment Variables

Copy the example environment file and add your credentials:

cp .env.example .env

Edit .env with your Cloudflare credentials:

CF_API_TOKEN=your_cloudflare_api_token
CF_ACCOUNT_ID=your_cloudflare_account_id

Getting Cloudflare Credentials

  1. Account ID: Find it at https://dash.cloudflare.com/_/account
  2. API Token: Create one at https://dash.cloudflare.com/profile/api-tokens with these permissions:
    • Account > Browser Rendering > Edit

MCP Configuration

Claude Desktop (macOS)

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "cloudflare-crawl": {
      "command": "npm",
      "args": ["start"],
      "env": {
        "CF_API_TOKEN": "your_api_token",
        "CF_ACCOUNT_ID": "your_account_id"
      },
      "path": "/path/to/cloudflare-crawl-mcp"
    }
  }
}

Claude Code (CLI)

{
  "mcpServers": {
    "cloudflare-crawl": {
      "command": "npm",
      "args": ["start"],
      "env": {
        "CF_API_TOKEN": "your_api_token",
        "CF_ACCOUNT_ID": "your_account_id"
      }
    }
  }
}

Cursor

Add to ~/.cursor/mcp.json (Cursor's global MCP configuration file):

{
  "mcpServers": {
    "cloudflare-crawl": {
      "command": "npm",
      "args": ["start"],
      "env": {
        "CF_API_TOKEN": "your_api_token",
        "CF_ACCOUNT_ID": "your_account_id"
      },
      "path": "/path/to/cloudflare-crawl-mcp"
    }
  }
}

Available Tools

crawl_url_markdown

Crawl a website and return content in Markdown format.

{
  "name": "crawl_url_markdown",
  "arguments": {
    "url": "https://example.com/docs",
    "limit": 50,
    "depth": 2,
    "includePatterns": ["https://example.com/docs/**"],
    "excludePatterns": ["https://example.com/docs/archive/**"],
    "render": true
  }
}

crawl_url_html

Crawl a website and return content in HTML format.

{
  "name": "crawl_url_html",
  "arguments": {
    "url": "https://example.com",
    "limit": 10
  }
}

crawl_url_json

Crawl a website and return content in JSON format (uses Workers AI for data extraction).

{
  "name": "crawl_url_json",
  "arguments": {
    "url": "https://example.com/products",
    "limit": 20
  }
}

Parameters

Parameter             Type      Default   Description
url                   string    required  Starting URL to crawl
limit                 number    10        Maximum pages to crawl (max: 100,000)
depth                 number    1         Maximum link depth from starting URL
includeSubdomains     boolean   false     Follow links to subdomains
includeExternalLinks  boolean   false     Follow links to external domains
includePatterns       string[]  []        Wildcard patterns to include
excludePatterns       string[]  []        Wildcard patterns to exclude
render                boolean   true      Execute JavaScript (false = faster static fetch)
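
The parameter list can also be read as a TypeScript type with defaults applied. The following is a minimal sketch under that reading. `CrawlArguments`, `withDefaults` and `MAX_LIMIT` are hypothetical names, and rejecting an out-of-range `limit` is an assumption about sensible behaviour rather than a description of what the server does.

```typescript
// Illustrative argument shape mirroring the documented parameters.
interface CrawlArguments {
  url: string;
  limit?: number;
  depth?: number;
  includeSubdomains?: boolean;
  includeExternalLinks?: boolean;
  includePatterns?: string[];
  excludePatterns?: string[];
  render?: boolean;
}

const MAX_LIMIT = 100_000; // documented maximum pages per job

function withDefaults(args: CrawlArguments): Required<CrawlArguments> {
  // new URL() throws a TypeError if the starting URL is malformed.
  new URL(args.url);

  const limit = args.limit ?? 10;
  if (!Number.isInteger(limit) || limit < 1 || limit > MAX_LIMIT) {
    throw new RangeError(`limit must be an integer between 1 and ${MAX_LIMIT}`);
  }

  return {
    url: args.url,
    limit,
    depth: args.depth ?? 1,
    includeSubdomains: args.includeSubdomains ?? false,
    includeExternalLinks: args.includeExternalLinks ?? false,
    includePatterns: args.includePatterns ?? [],
    excludePatterns: args.excludePatterns ?? [],
    render: args.render ?? true,
  };
}
```

For example, `withDefaults({ url: "https://example.com/docs" })` fills in a `limit` of 10, a `depth` of 1 and `render: true`.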

Pattern Syntax

  • * - Matches any characters except /
  • ** - Matches any characters including /

Examples:

  • https://example.com/docs/** - All URLs under /docs
  • https://example.com/*.html - All HTML files directly in root
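
To check a pattern locally before starting a crawl, the two wildcards can be approximated with a regular expression. This is a sketch of the semantics as described above, not the matcher Cloudflare runs. `patternToRegExp` and `matchesPattern` are hypothetical helpers, and edge cases such as query strings or trailing slashes may be handled differently by the service.

```typescript
// Illustrative matcher: "*" stops at "/", "**" does not.
function patternToRegExp(pattern: string): RegExp {
  let source = "";
  let i = 0;
  while (i < pattern.length) {
    if (pattern.startsWith("**", i)) {
      source += ".*"; // ** matches any characters, including "/"
      i += 2;
    } else if (pattern.charAt(i) === "*") {
      source += "[^/]*"; // * matches any characters except "/"
      i += 1;
    } else {
      // Escape regex metacharacters so ".", "?" and friends match literally.
      source += pattern.charAt(i).replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  // Anchor both ends so the whole URL must match the pattern.
  return new RegExp(`^${source}$`);
}

function matchesPattern(url: string, pattern: string): boolean {
  return patternToRegExp(pattern).test(url);
}
```

Under these rules `https://example.com/docs/**` matches `https://example.com/docs/a/b`, while `https://example.com/*.html` matches `https://example.com/index.html` but not `https://example.com/a/index.html`.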

Development

Commands

npm install             # Install dependencies
npm run typecheck       # Type check with tsc
npm run lint            # Lint with ESLint
npm run build           # Build TypeScript
npm start               # Run server
npm test                # Run tests
npm run test:watch      # Run tests in watch mode

CI runs typecheck, lint, build and test.

Testing

The test suite covers:

  • Environment variable handling
  • Crawl options building
  • Result formatting (Markdown, HTML, JSON)
  • Error handling
  • API integration

Run tests:

npm test

Architecture

src/
├── index.ts          # Main MCP server implementation
│
├── API Layer
│   ├── initiateCrawl()    # POST to /crawl endpoint
│   ├── waitForCrawl()     # Poll for job completion
│   └── getCrawlResults()  # Fetch final results
│
├── Formatters
│   ├── formatMarkdownResult()
│   ├── formatHtmlResult()
│   └── formatJsonResult()
│
└── MCP Handlers
    ├── ListToolsRequestSchema    # Tool registration
    └── CallToolRequestSchema     # Tool execution
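
The API layer follows a start, poll, fetch sequence. The sketch below shows that control flow with the three HTTP calls passed in as functions, so it makes no assumptions about endpoint URLs or response bodies. `runCrawl`, `waitForJob`, `PollOptions` and the `JobStatus` values are hypothetical and may not match the names or status strings used in `src/index.ts`.

```typescript
// Hypothetical status values: the real API may report different strings.
type JobStatus = "running" | "completed" | "errored";

interface PollOptions {
  intervalMs: number; // delay between status checks
  maxAttempts: number; // give up after this many checks
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Poll until the job completes, fails, or the attempt budget runs out.
async function waitForJob(
  getStatus: () => Promise<JobStatus>,
  { intervalMs, maxAttempts, sleep = defaultSleep }: PollOptions,
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await getStatus();
    if (status === "completed") return;
    if (status === "errored") throw new Error("Crawl job failed");
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Crawl job did not finish after ${maxAttempts} status checks`);
}

// Start a job, wait for it, then fetch its results.
async function runCrawl<T>(
  initiate: () => Promise<string>,
  getStatus: (jobId: string) => Promise<JobStatus>,
  getResults: (jobId: string) => Promise<T>,
  options: PollOptions,
): Promise<T> {
  const jobId = await initiate();
  await waitForJob(() => getStatus(jobId), options);
  return getResults(jobId);
}
```

Bounding the number of status checks keeps a stalled job from blocking a tool call indefinitely. Appropriate values for `intervalMs` and `maxAttempts` depend on crawl size and are left to the caller.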

Cloudflare Limits

  • Max crawl duration: 7 days
  • Results available: 14 days after completion
  • Max pages per job: 100,000
  • Free plan: 10 minutes of browser time per day

See Cloudflare Browser Rendering Limits for details.

Troubleshooting

Crawl returns no results

  • Check robots.txt blocking (use render: false to bypass)
  • Verify includePatterns match actual URLs
  • Try increasing depth or disabling pattern filters

Job cancelled due to limits

  • Upgrade to Workers Paid plan
  • Use render: false for static content
  • Reduce limit parameter

Authentication errors

  • Verify API Token has Browser Rendering permissions
  • Confirm Account ID is correct

License

MIT License - see LICENSE file.

Contributing

Contributions are welcome! Please read our contributing guidelines before submitting PRs at https://github.com/lukaszraczylo/cloudflare-crawl-mcp.
