This project implements an MCP-compliant client and server for communication between AI assistants and external tools.
fetch-mcp/
├── src/ # Source code directory
│ ├── lib/ # Library files
│ │ ├── fetchers/ # Web fetching implementation
│ │ │ ├── browser/ # Browser-based fetching
│ │ │ │ ├── BrowserFetcher.ts # Browser fetcher implementation
│ │ │ │ ├── BrowserInstance.ts # Browser instance management
│ │ │ │ └── PageOperations.ts # Page interaction operations
│ │ │ ├── node/ # Node.js-based fetching
│ │ │ └── common/ # Shared fetching utilities
│ │ ├── utils/ # Utility modules
│ │ │ ├── ChunkManager.ts # Content chunking
│ │ │ ├── ContentProcessor.ts # HTML to text conversion
│ │ │ ├── ContentExtractor.ts # Intelligent content extraction
│ │ │ ├── ContentSizeManager.ts # Content size limiting
│ │ │ └── ErrorHandler.ts # Error handling
│ │ ├── server/ # Server-related modules
│ │ │ ├── index.ts # Server entry
│ │ │ ├── browser.ts # Browser management
│ │ │ ├── fetcher.ts # Web fetching logic
│ │ │ ├── tools.ts # Tool registration and handling
│ │ │ ├── resources.ts # Resource handling
│ │ │ ├── prompts.ts # Prompt templates
│ │ │ └── types.ts # Server type definitions
│ │ ├── i18n/ # Internationalization support
│ │ └── types.ts # Common type definitions
│ ├── client.ts # MCP client implementation
│ └── mcp-server.ts # MCP server main entry
├── index.ts # Server entry point
├── tests/ # Test files
└── dist/ # Compiled files
The Model Context Protocol (MCP) defines two main transport methods:
- Standard Input/Output (Stdio): The client starts the MCP server as a child process, and they communicate through standard input (stdin) and standard output (stdout).
- Server-Sent Events (SSE): The server pushes messages to the client over a long-lived HTTP connection.
This project implements the Standard Input/Output (Stdio) transport method.
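As an illustration of the Stdio transport, each message is a JSON-RPC 2.0 object written as a single line: the client writes requests to the server's stdin and reads responses from its stdout. A hypothetical `tools/call` exchange might look like this (the exact wire format is defined by the MCP specification):

```json
{"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "fetch_html", "arguments": {"url": "https://example.com"}}}
{"jsonrpc": "2.0", "id": 1, "result": {"content": [{"type": "text", "text": "<!doctype html>..."}]}}
```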
- Implementation based on the official MCP SDK
- Support for Standard Input/Output (Stdio) transport
- Multiple web scraping methods (HTML, JSON, text, Markdown, plain text conversion)
- Intelligent mode switching: automatic switching between standard requests and browser mode
- Content size management: automatically splits large content into manageable chunks to solve AI model context size limitations
- Chunked content retrieval: ability to request specific chunks of large content while maintaining context continuity
- Detailed debug logging to stderr
- Bilingual internationalization (English and Chinese)
- Modular design for easy maintenance and extension
- Intelligent Content Extraction: Based on Mozilla's Readability library, capable of extracting meaningful content from web pages while filtering out advertisements and navigation elements
- Metadata Support: Ability to extract webpage metadata such as title, author, publication date, and site information
- Smart Content Detection: Automatically detects if a page contains meaningful content, filtering out login pages, error pages, and other pages without substantial content
- Browser Automation Enhancements: Support for page scrolling, cookie management, selector waiting, and other advanced browser interactions
To install Mult Fetch MCP Server for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @lmcc-dev/mult-fetch-mcp-server --client claude
pnpm install
pnpm add -g @lmcc-dev/mult-fetch-mcp-server
Or run directly with npx (no installation required):
npx @lmcc-dev/mult-fetch-mcp-server
To integrate this tool with Claude desktop, you need to add server configuration:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%/Claude/claude_desktop_config.json`
This method is the simplest, doesn't require specifying the full path, and is suitable for global installation or direct use with npx:
{
"mcpServers": {
"mult-fetch-mcp-server": {
"command": "npx",
"args": ["@lmcc-dev/mult-fetch-mcp-server"],
"env": {
"MCP_LANG": "en" // Set language to English, options: "zh" or "en"
}
}
}
}
If you need to use a specific installation location, you can specify the full path:
{
"mcpServers": {
"mult-fetch-mcp-server": {
"command": "path-to/bin/node",
"args": ["path-to/@lmcc-dev/mult-fetch-mcp-server/dist/index.js"],
"env": {
"MCP_LANG": "en" // Set language to English, options: "zh" or "en"
}
}
}
}
Please replace `path-to/bin/node` with the path to the Node.js executable on your system, and replace `path-to/@lmcc-dev/mult-fetch-mcp-server` with the actual path to this project.
In the Claude desktop client, Claude can call the fetch tools to retrieve web content and process it according to your instructions.
After configuration, restart Claude desktop, and you can use the following tools in your conversation:
- `fetch_html`: Get the HTML content of a webpage
- `fetch_json`: Get JSON data
- `fetch_txt`: Get plain text content
- `fetch_markdown`: Get Markdown-formatted content
- `fetch_plaintext`: Get plain text converted from HTML (strips HTML tags)
pnpm run build
pnpm run server
# or
node dist/index.js
# if globally installed, you can run directly
@lmcc-dev/mult-fetch-mcp-server
# or use npx
npx @lmcc-dev/mult-fetch-mcp-server
Note: The following client.js functionality is provided for demonstration and testing purposes only. When used with Claude or other AI assistants, the MCP server is driven by the AI, which manages the chunking process automatically.
The project includes a command-line client for testing and development purposes:
pnpm run client <method> <params_json>
# example
pnpm run client fetch_html '{"url": "https://example.com", "debug": true}'
When testing with the command-line client, you can use these parameters to demonstrate content chunking capabilities:
- `--all-chunks`: Automatically fetch all chunks in sequence (for demonstration purposes only)
- `--max-chunks`: Limit the maximum number of chunks to fetch (optional, default 10)
The client.js demo tool provides real-time output capabilities:
node dist/src/client.js fetch_html '{"url":"https://example.com", "startCursor": 0, "contentSizeLimit": 500}' --all-chunks --debug
The demo client will automatically fetch all chunks in sequence and display them immediately, showcasing how large content can be processed in real-time.
# Run MCP functionality tests
npm run test:mcp
# Run mini4k.com website tests
npm run test:mini4k
# Run direct client call tests
npm run test:direct
This project supports Chinese and English bilingual internationalization. You can set the language using environment variables:
Set the `MCP_LANG` environment variable to control the language:
# Set to English
export MCP_LANG=en
npm run server
# Set to Chinese
export MCP_LANG=zh
npm run server
# Windows system
set MCP_LANG=zh
npm run server
Using environment variables ensures that all related processes (including the MCP server) use the same language settings.
By default, the system will choose a language according to the following priority:
- The `MCP_LANG` environment variable
- The operating system language (if it starts with "zh", use Chinese)
- English (as the final fallback)
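The priority above can be sketched as a small function (the function name is illustrative, not the project's actual API):

```typescript
// Resolve the UI language following the documented priority:
// 1. explicit MCP_LANG setting, 2. OS locale starting with "zh", 3. English fallback.
function resolveLanguage(env: { MCP_LANG?: string }, osLocale: string): 'zh' | 'en' {
  if (env.MCP_LANG === 'zh' || env.MCP_LANG === 'en') return env.MCP_LANG; // explicit setting wins
  if (osLocale.toLowerCase().startsWith('zh')) return 'zh';               // OS language
  return 'en';                                                            // final fallback
}
```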
This project follows the MCP protocol specification and does not output any logs by default to avoid interfering with JSON-RPC communication. Debug information is controlled through call parameters:
Set the `debug: true` parameter when calling a tool:
{
"url": "https://example.com",
"debug": true
}
Debug messages are sent to the standard error stream (stderr) using the following format:
[MCP-SERVER] MCP server starting...
[CLIENT] Fetching URL: https://example.com
When debug mode is enabled, all debug messages are also written to a log file located at:
~/.mult-fetch-mcp-server/debug.log
This log file can be accessed through the MCP resources API:
// Access the debug log file
const result = await client.readResource({ uri: "file:///logs/debug" });
console.log(result.contents[0].text);
// Clear the debug log file
const clearResult = await client.readResource({ uri: "file:///logs/clear" });
console.log(clearResult.contents[0].text);
This tool supports various methods to configure proxy settings:
The most direct way is to specify the proxy in the request parameters:
{
"url": "https://example.com",
"proxy": "http://your-proxy-server:port",
"debug": true
}
The tool will automatically detect and use proxy settings from standard environment variables:
# Set proxy environment variables
export HTTP_PROXY=http://your-proxy-server:port
export HTTPS_PROXY=http://your-proxy-server:port
# Run the server
npm run server
The tool attempts to detect system proxy settings based on your operating system:
- Windows: Reads proxy settings from environment variables using the `set` command
- macOS/Linux: Reads proxy settings from environment variables using the `env` command
If you're having issues with proxy detection:
- Use the `debug: true` parameter to see detailed logs about proxy detection
- Explicitly specify the proxy using the `proxy` parameter
- Ensure your proxy URL is in the correct format: `http://host:port` or `https://host:port`
- For websites that require browser capabilities, set `useBrowser: true` to use browser mode
When using browser mode (`useBrowser: true`), the tool will:
- First try to use the explicitly specified proxy (if provided)
- Then try to use system proxy settings
- Finally, proceed without a proxy if none is found
Browser mode is particularly useful for websites that implement anti-scraping measures or require JavaScript execution.
This project handles parameters in the following ways:
- debug: Passed through call parameters, each request can individually control whether to enable debug output
- MCP_LANG: Retrieved from environment variables, controls the language settings of the entire server
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';
import path from 'path';
import { fileURLToPath } from 'url';
// Get the directory path of the current file
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
// Create client transport layer
const transport = new StdioClientTransport({
command: 'node',
args: [path.resolve(__dirname, 'dist/index.js')],
stderr: 'inherit',
env: {
...process.env // Pass all environment variables, including MCP_LANG
}
});
// Create client
const client = new Client({
name: "example-client",
version: "1.0.0"
});
// Connect to transport layer
await client.connect(transport);
// Use client
const result = await client.callTool({
name: 'fetch_html',
arguments: {
url: 'https://example.com',
debug: true // Control debug output through parameters
}
});
if (result.isError) {
console.error('Fetch failed:', result.content[0].text);
} else {
console.log('Fetch successful!');
console.log('Content preview:', result.content[0].text.substring(0, 500));
}
- `fetch_html`: Get the HTML content of a webpage
- `fetch_json`: Get JSON data
- `fetch_txt`: Get plain text content
- `fetch_markdown`: Get Markdown-formatted content
- `fetch_plaintext`: Get plain text converted from HTML (strips HTML tags)
The server includes support for the resources/list and resources/read methods, but currently no resources are defined in the implementation. The resource system is designed to provide access to project files and documentation, but this feature is not fully implemented yet.
// Example: List available resources
const resourcesResult = await client.listResources({});
console.log('Available resources:', resourcesResult);
// Note: Currently this will return empty lists for resources and resourceTemplates
The server provides the following prompt templates:
- `fetch-website`: Get website content, supporting different formats and browser mode
- `extract-content`: Extract specific content from a website, supporting CSS selectors and data type specification
- `debug-fetch`: Debug website fetching issues, analyze possible causes, and provide solutions
- Use `prompts/list` to get a list of available prompt templates
- Use `prompts/get` to get specific prompt template content
// Example: List available prompt templates
const promptsResult = await client.listPrompts({});
console.log('Available prompts:', promptsResult);
// Example: Get website content prompt
const fetchPrompt = await client.getPrompt({
name: "fetch-website",
arguments: {
url: "https://example.com",
format: "html",
useBrowser: "false"
}
});
console.log('Fetch website prompt:', fetchPrompt);
// Example: Debug website fetching issues
const debugPrompt = await client.getPrompt({
name: "debug-fetch",
arguments: {
url: "https://example.com",
error: "Connection timeout"
}
});
console.log('Debug fetch prompt:', debugPrompt);
Each tool supports the following parameters:
- `url`: URL to fetch (required)
- `headers`: Custom request headers (optional, default `{}`)
- `proxy`: Proxy server URL in the format `http://host:port` or `https://host:port` (optional)
- `timeout`: Timeout in milliseconds (optional, default 30000)
- `maxRedirects`: Maximum number of redirects to follow (optional, default 10)
- `noDelay`: Whether to disable the random delay between requests (optional, default false)
- `useSystemProxy`: Whether to use the system proxy (optional, default true)
- `enableContentSplitting`: Whether to split large content into chunks (optional, default true)
- `contentSizeLimit`: Maximum content size in bytes before splitting (optional, default 50000)
- `startCursor`: Starting cursor position in bytes for retrieving content from a specific position (optional, default 0)
These parameters help manage large content that would exceed AI model context size limits, allowing you to retrieve web content in manageable chunks while maintaining the ability to process the complete information.
- `chunkId`: Unique identifier for a chunk set created when content is split (used for requesting subsequent chunks)

When content is split into chunks, the response includes metadata that allows the AI to request subsequent chunks using the `chunkId` and `startCursor` parameters. The system uses byte-level chunk management to provide precise control over content retrieval, enabling seamless processing of content from any position.
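The byte-cursor mechanics can be sketched as follows (a minimal illustration with hypothetical names; the project's actual logic lives in `ChunkManager.ts` and additionally tracks chunk identifiers and respects character boundaries):

```typescript
// Slice a content string at byte granularity: return the chunk starting at
// startCursor and the cursor for the next request (null when done).
// Note: naive byte slicing can split a multi-byte UTF-8 character; a real
// implementation must adjust the boundary to the nearest character edge.
function chunkByBytes(
  content: string,
  sizeLimit: number,
  startCursor = 0,
): { chunk: string; nextCursor: number | null } {
  const bytes = Buffer.from(content, 'utf-8');
  const end = Math.min(startCursor + sizeLimit, bytes.length);
  return {
    chunk: bytes.subarray(startCursor, end).toString('utf-8'),
    nextCursor: end < bytes.length ? end : null, // null => last chunk
  };
}
```

A client would keep calling with the returned `nextCursor` until it is null, which is what the demo's `--all-chunks` flag automates.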
- `useBrowser`: Whether to use browser mode (optional, default false)
- `useNodeFetch`: Whether to force Node.js mode (optional, default false, mutually exclusive with `useBrowser`)
- `autoDetectMode`: Whether to automatically switch to browser mode when standard mode fails with 403/Forbidden errors (optional, default true). Set to false to strictly use the specified mode without automatic switching.
- `waitForSelector`: Selector to wait for in browser mode (optional, default 'body')
- `waitForTimeout`: Timeout to wait in browser mode, in milliseconds (optional, default 5000)
- `scrollToBottom`: Whether to scroll to the bottom of the page in browser mode (optional, default false)
- `saveCookies`: Whether to save cookies in browser mode (optional, default true)
- `closeBrowser`: Whether to close the browser instance (optional, default false)
- `extractContent`: Whether to use the Readability algorithm to extract main content (optional, default false)
- `includeMetadata`: Whether to include metadata in the extracted content (optional, default false; only applies when `extractContent` is true)
- `fallbackToOriginal`: Whether to fall back to the original content when extraction fails (optional, default true; only applies when `extractContent` is true)
- `debug`: Whether to enable debug output (optional, default false)
Use the content extraction feature to get the core content of a webpage, filtering out navigation bars, advertisements, sidebars, and other distracting elements:
{
"url": "https://example.com/article",
"extractContent": true,
"includeMetadata": true
}
The extracted content will include the following metadata (if available):
- Title
- Byline (author)
- Site name
- Excerpt
- Content length
- Readability flag (isReaderable)
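For illustration, a metadata object covering the fields above might look like this (a hypothetical shape; field names follow Readability conventions, and the actual response format may differ):

```json
{
  "title": "Example Article",
  "byline": "Jane Doe",
  "siteName": "Example News",
  "excerpt": "A short summary of the article...",
  "length": 12345,
  "isReaderable": true
}
```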
To extract only the meaningful content from an article webpage:
{
"url": "https://example.com/news/article",
"extractContent": true,
"includeMetadata": true
}
For websites where content extraction might fail, you can use `fallbackToOriginal` to ensure you still get some content:
{
"url": "https://example.com/complex-layout",
"extractContent": true,
"fallbackToOriginal": true
}
To close the browser instance without performing any fetch operation:
{
"url": "about:blank",
"closeBrowser": true
}
The proxy is determined in the following order:
- Command-line specified proxy
- The `proxy` parameter in the request
- Environment variables (if `useSystemProxy` is true)
- Git configuration (if `useSystemProxy` is true)
If `proxy` is set, `useSystemProxy` will be automatically set to false.
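The resolution order can be sketched as a pure function (names are illustrative, not the project's actual API; the command-line proxy of the demo client would simply be passed in as the explicit `proxy` option):

```typescript
interface ProxyOptions {
  proxy?: string;          // explicit proxy from the request
  useSystemProxy?: boolean; // default true
}

// Resolve the proxy following the documented priority:
// explicit parameter > environment variables > git configuration,
// with system sources skipped when useSystemProxy is false.
function resolveProxy(
  opts: ProxyOptions,
  env: Record<string, string | undefined>,
  gitProxy?: string,
): string | undefined {
  if (opts.proxy) return opts.proxy;                    // explicit proxy disables system lookup
  if (opts.useSystemProxy === false) return undefined;  // system sources disabled
  return env.HTTPS_PROXY ?? env.HTTP_PROXY ?? gitProxy; // env vars, then git config
}
```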
When `debug: true` is set, logs are output to stderr with the following prefixes:
- `[MCP-SERVER]`: Logs from the MCP server
- `[NODE-FETCH]`: Logs from the Node.js fetcher
- `[BROWSER-FETCH]`: Logs from the browser fetcher
- `[CLIENT]`: Logs from the client
- `[TOOLS]`: Logs from the tool implementation
- `[FETCHER]`: Logs from the main fetcher interface
- `[CONTENT]`: Logs related to content handling
- `[CONTENT-PROCESSOR]`: Logs from the HTML content processor
- `[CONTENT-SIZE]`: Logs related to content size management
- `[CHUNK-MANAGER]`: Logs related to content chunking operations
- `[ERROR-HANDLER]`: Logs related to error handling
- `[BROWSER-MANAGER]`: Logs from the browser instance manager
- `[CONTENT-EXTRACTOR]`: Logs from the content extractor
MIT
Updated by lmcc-dev