What is this and what can it do?
mcp-server-webcrawl is an MCP server that runs on your computer. It creates a gateway to web crawler archives so that language models (OpenAI, Claude) can filter and process the data, either autonomously or under your direction, augmenting the base LLM knowledge base with your web content.
Use mcp-server-webcrawl for technical inference, content management, marketing, SEO, and more. The sky is the limit!
The server supports a host of web crawlers, including the two mainstream crawl formats, wget and WARC. The InterroBot, Katana, and SiteOne crawlers are also supported.
mcp-server-webcrawl is free and open source.

Requirements
Claude Desktop (macOS/Windows) currently provides everything necessary to run mcp-server-webcrawl as an MCP host. In addition to Claude Desktop, you'll need Python (>= 3.10) installed.
With Python installed, you should have pip available from Terminal (macOS) or PowerShell (Windows). You can install mcp-server-webcrawl with the following command.
pip install mcp-server-webcrawl
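If you'd like a quick sanity check before wiring up Claude, pip itself can confirm the package landed; the commands below assume a standard Python/pip setup and nothing specific to mcp-server-webcrawl.
$ python --version
$ pip show mcp-server-webcrawl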
At the time of writing (April 2025), OpenAI support for MCP has been announced, but nothing tangible has shipped yet. Hang tight!
MCP Configuration
{ "mcpServers": { "webcrawl": { "command": "mcp-server-webcrawl", "args": ["--crawler", "wget", "--datasrc", "/path/to/wget/archives/"] } } } # tested configurations (macOS Terminal/Windows WSL) # --adjust-extension for file extensions, e.g. *.html $ wget --mirror https://example.com $ wget --mirror https://example.com --adjust-extension
{ "mcpServers": { "webcrawl": { "command": "mcp-server-webcrawl", "args": ["--crawler", "warc", "--datasrc", "/path/to/warc/archives/"] } } } # tested configurations (macOS Terminal/Windows WSL) $ wget --warc-file=example --recursive https://example.com $ wget --warc-file=example --recursive --page-requisites https://example.com
{ "mcpServers": { "webcrawl": { "command": "mcp-server-webcrawl", "args": ["--crawler", "interrobot", "--datasrc", "[homedir]/Documents/InterroBot/interrobot.v2.db"] } } } # crawls executed in InterroBot (windowed) # Windows: replace [homedir] with /Users/... # macOS: path provided on InterroBot settings page
{ "mcpServers": { "webcrawl": { "command": "mcp-server-webcrawl", "args": ["--crawler", "katana", "--datasrc", "/path/to/katana/crawls/"] } } } # tested configurations (macOS Terminal/Powershell/WSL) # -store-response to save crawl contents # -store-response-dir allows for many site crawls in one dir $ katana -u https://example.com -store-response -store-response-dir crawls/
{ "mcpServers": { "webcrawl": { "command": "mcp-server-webcrawl", "args": ["--crawler", "siteone", "--datasrc", "/path/to/siteone/archives/"] } } } # crawls executed in SiteOne (windowed) # *Generate offline website* must be checked
To connect mcp-server-webcrawl to Claude Desktop: from the developer settings, find and edit the MCP configuration to include the entry for your crawler, as shown in the examples above.
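If you can't locate the configuration from within the app, Claude Desktop typically stores its MCP settings in a claude_desktop_config.json file; the default locations below are the usual ones, but treat them as a starting point rather than a guarantee.
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json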
Pay close attention to the --datasrc path: for file-based crawlers it should be the directory containing the webroot directories, and for database-based crawlers it should be the database file itself.
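As a concrete sketch (the site names are placeholders), a wget --datasrc is the parent directory holding one webroot per crawled site, while the InterroBot --datasrc points directly at the database file.

# wget: --datasrc is the parent directory of per-site webroots
/path/to/wget/archives/
    example.com/
        index.html
    example.org/
        index.html

# InterroBot: --datasrc is the database file itself
[homedir]/Documents/InterroBot/interrobot.v2.db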