AI Search and Retrieval for Web Crawlers
With mcp-server-webcrawl, your AI client filters and analyzes web content under your direction or autonomously.
Support for multiple crawlers (ArchiveBox, HTTrack, InterroBot, Katana, SiteOne, WARC, and wget) is baked in.
The server includes a full-text search interface with boolean support, filtering by type, HTTP status, and more.
Main Features
- Claude Desktop ready
- Multi-crawler compatible
- Filter by type, status, and more
- Boolean search support
- Support for Markdown and snippets
- Roll your own website knowledge base
Getting Started
Select a crawler for setup and configuration information.
Installation
mcp-server-webcrawl requires Claude Desktop (or a compatible MCP host) and Python (>=3.10). Install it from the command line via pip:
pip install mcp-server-webcrawl
Watch the setup video or, if you prefer text, follow the ArchiveBox MCP step-action guide.
MCP Configuration (ArchiveBox)
From Claude's Developer Settings, locate the local MCP configuration and add your crawl data. Open the configuration file in a text editor and modify the example to reflect your command and datasrc path.
For additional technical information, including crawler feature support, check out the help documentation.
# Windows: command set to "mcp-server-webcrawl"
# macOS: command set to absolute path, i.e.
# the value of $ which mcp-server-webcrawl
{
  "mcpServers": {
    "webcrawl": {
      "command": "/path/to/mcp-server-webcrawl",
      "args": ["--crawler", "archivebox", "--datasrc", "/path/to/archivebox-data/"]
    }
  }
}

# tested configurations (macOS/Linux)
# each collection appears as a separate "site" in MCP
$ mkdir ~/archivebox-data/example && cd ~/archivebox-data/example
$ archivebox init && archivebox add https://example.com
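If you manage several crawlers or data sources, the JSON entry above can be generated rather than hand-edited. A minimal sketch, assuming nothing beyond the config shape shown above (the `mcp_entry` helper is illustrative, not part of mcp-server-webcrawl):

```python
import json

# Illustrative helper (not part of mcp-server-webcrawl): builds a
# "mcpServers" entry for a given crawler type and datasrc path.
def mcp_entry(command, crawler, datasrc):
    return {
        "mcpServers": {
            "webcrawl": {
                "command": command,
                "args": ["--crawler", crawler, "--datasrc", datasrc],
            }
        }
    }

config = mcp_entry(
    "/path/to/mcp-server-webcrawl", "archivebox", "/path/to/archivebox-data/"
)
print(json.dumps(config, indent=2))
```

The printed JSON can be pasted directly into the MCP configuration file; swap the crawler name and datasrc path for the other crawlers covered below.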
Installation
mcp-server-webcrawl requires Claude Desktop (or a compatible MCP host) and Python (>=3.10). Install it from the command line via pip:
pip install mcp-server-webcrawl
Watch the setup video or, if you prefer text, follow the HTTrack MCP step-action guide.
MCP Configuration (HTTrack)
From Claude's Developer Settings, locate the local MCP configuration and add your crawl data. Open the configuration file in a text editor and modify the example to reflect your command and datasrc path.
For additional technical information, including crawler feature support, check out the help documentation.
# Windows: command set to "mcp-server-webcrawl"
# macOS: command set to absolute path, i.e.
# the value of $ which mcp-server-webcrawl
{
  "mcpServers": {
    "webcrawl": {
      "command": "/path/to/mcp-server-webcrawl",
      "args": ["--crawler", "httrack", "--datasrc", "/path/to/httrack/projects/"]
    }
  }
}

# crawls executed in HTTrack (windowed)
# creates organized project directories under the specified
# location (typically "My Web Sites" on Windows
# or "websites" on macOS/Linux)
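HTTrack also ships a command-line binary, which can be pointed directly at the datasrc path instead of using the windowed app. A sketch, assuming `httrack` is on your PATH (the project name `example` is illustrative):

```shell
# Mirror a site into a named project directory under the datasrc path;
# -O sets the output (project) directory that HTTrack writes into.
httrack "https://example.com" -O "/path/to/httrack/projects/example"
```

Each project directory created this way appears as a separate site to the MCP server.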
Installation
mcp-server-webcrawl requires Claude Desktop (or a compatible MCP host) and Python (>=3.10). Install it from the command line via pip:
pip install mcp-server-webcrawl
Watch the setup video or, if you prefer text, follow the InterroBot MCP step-action guide.
MCP Configuration (InterroBot)
From Claude's Developer Settings, locate the local MCP configuration and add your crawl data. Open the configuration file in a text editor and modify the example to reflect your command and datasrc path.
For additional technical information, including crawler feature support, check out the help documentation.
# Windows: command set to "mcp-server-webcrawl"
# macOS: command set to absolute path, i.e.
# the value of $ which mcp-server-webcrawl
{
  "mcpServers": {
    "webcrawl": {
      "command": "/path/to/mcp-server-webcrawl",
      "args": ["--crawler", "interrobot", "--datasrc", "[homedir]/Documents/InterroBot/interrobot.v2.db"]
    }
  }
}

# crawls executed in InterroBot (windowed)
# Windows: replace [homedir] with /Users/...
# macOS: path provided on InterroBot settings page
Installation
mcp-server-webcrawl requires Claude Desktop (or a compatible MCP host) and Python (>=3.10). Install it from the command line via pip:
pip install mcp-server-webcrawl
Watch the setup video or, if you prefer text, follow the Katana MCP step-action guide.
MCP Configuration (Katana)
From Claude's Developer Settings, locate the local MCP configuration and add your crawl data. Open the configuration file in a text editor and modify the example to reflect your command and datasrc path.
For additional technical information, including crawler feature support, check out the help documentation.
# Windows: command set to "mcp-server-webcrawl"
# macOS: command set to absolute path, i.e.
# the value of $ which mcp-server-webcrawl
{
  "mcpServers": {
    "webcrawl": {
      "command": "/path/to/mcp-server-webcrawl",
      "args": ["--crawler", "katana", "--datasrc", "/path/to/katana/crawls/"]
    }
  }
}

# tested configurations (macOS Terminal/Powershell/WSL)
# -store-response to save crawl contents
# -store-response-dir allows for expansion of hosts
# consistent with default Katana behavior to
# spread assets across host directories
$ katana -u https://example.com -store-response -store-response-dir /path/to/katana/crawls/example.com/
Installation
mcp-server-webcrawl requires Claude Desktop (or a compatible MCP host) and Python (>=3.10). Install it from the command line via pip:
pip install mcp-server-webcrawl
Watch the setup video or, if you prefer text, follow the SiteOne MCP step-action guide.
MCP Configuration (SiteOne)
From Claude's Developer Settings, locate the local MCP configuration and add your crawl data. Open the configuration file in a text editor and modify the example to reflect your command and datasrc path.
For additional technical information, including crawler feature support, check out the help documentation.
# Windows: command set to "mcp-server-webcrawl"
# macOS: command set to absolute path, i.e.
# the value of $ which mcp-server-webcrawl
{
  "mcpServers": {
    "webcrawl": {
      "command": "/path/to/mcp-server-webcrawl",
      "args": ["--crawler", "siteone", "--datasrc", "/path/to/siteone/archives/"]
    }
  }
}

# crawls executed in SiteOne (windowed)
# *Generate offline website* must be checked
Installation
mcp-server-webcrawl requires Claude Desktop (or a compatible MCP host) and Python (>=3.10). Install it from the command line via pip:
pip install mcp-server-webcrawl
Watch the setup video or, if you prefer text, follow the WARC MCP step-action guide.
MCP Configuration (WARC)
From Claude's Developer Settings, locate the local MCP configuration and add your crawl data. Open the configuration file in a text editor and modify the example to reflect your command and datasrc path.
For additional technical information, including crawler feature support, check out the help documentation.
# Windows: command set to "mcp-server-webcrawl"
# macOS: command set to absolute path, i.e.
# the value of $ which mcp-server-webcrawl
{
  "mcpServers": {
    "webcrawl": {
      "command": "/path/to/mcp-server-webcrawl",
      "args": ["--crawler", "warc", "--datasrc", "/path/to/warc/archives/"]
    }
  }
}

# tested configurations (macOS Terminal/Windows WSL)
# from /path/to/warc/archives/ as current working directory
$ wget --warc-file=example --recursive https://example.com
$ wget --warc-file=example --recursive --page-requisites https://example.com
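After a crawl finishes, it is worth confirming the WARC files landed where --datasrc points before launching the MCP host. A small stdlib sketch (the directory path is the same placeholder used in the config above):

```python
from pathlib import Path

# List WARC files under the datasrc directory so you can confirm
# wget wrote them where --datasrc points (path is a placeholder).
# wget's --warc-file writes gzip-compressed output (*.warc.gz) by
# default, so the glob matches both compressed and plain files.
datasrc = Path("/path/to/warc/archives")
warc_files = sorted(p.name for p in datasrc.glob("*.warc*"))
print(warc_files or "no WARC files found")
```

An empty result usually means the crawl was run from a different working directory than the one the config's datasrc points at.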
Installation
mcp-server-webcrawl requires Claude Desktop (or a compatible MCP host) and Python (>=3.10). Install it from the command line via pip:
pip install mcp-server-webcrawl
Watch the setup video or, if you prefer text, follow the wget MCP step-action guide.
MCP Configuration (wget)
From Claude's Developer Settings, locate the local MCP configuration and add your crawl data. Open the configuration file in a text editor and modify the example to reflect your command and datasrc path.
For additional technical information, including crawler feature support, check out the help documentation.
# Windows: command set to "mcp-server-webcrawl"
# macOS: command set to absolute path, i.e.
# the value of $ which mcp-server-webcrawl
{
  "mcpServers": {
    "webcrawl": {
      "command": "/path/to/mcp-server-webcrawl",
      "args": ["--crawler", "wget", "--datasrc", "/path/to/wget/archives/"]
    }
  }
}

# tested configurations (macOS Terminal/Windows WSL)
# from /path/to/wget/archives/ as current working directory
# --adjust-extension for file extensions, e.g. *.html
$ wget --mirror https://example.com
$ wget --mirror https://example.com --adjust-extension