English | 简体中文
A tiny stealth-mode web search and content extraction library built on top of Puppeteer, inspired by EGOIST's local-web-search.
- 🔍 Multi-Engine Search - Support for Google, Bing, and Baidu search engines
- 📑 Content Extraction - Extract readable content from web pages using Readability
- 🚀 Concurrent Processing - Built-in queue system for efficient parallel processing
- 🛡️ Stealth Mode - Advanced browser fingerprint spoofing
- 📝 Markdown Output - Automatically converts HTML content to clean Markdown
- ⚡ Performance Optimized - Smart request interception for faster page loads
npm install @agent-infra/browser-search
# or
yarn add @agent-infra/browser-search
# or
pnpm add @agent-infra/browser-search
import { BrowserSearch } from '@agent-infra/browser-search';
import { ConsoleLogger } from '@agent-infra/logger';
// Create a logger (optional)
const logger = new ConsoleLogger('[BrowserSearch]');
// Initialize the search client
const browserSearch = new BrowserSearch({
logger,
browserOptions: {
headless: true,
},
});
// Perform a search
const results = await browserSearch.perform({
query: 'climate change solutions',
count: 5,
});
console.log(`Found ${results.length} results`);
results.forEach((result) => {
console.log(`Title: ${result.title}`);
console.log(`URL: ${result.url}`);
console.log(`Content preview: ${result.content.substring(0, 150)}...`);
});
constructor(config?: BrowserSearchConfig)
Configuration options:
interface BrowserSearchConfig {
logger?: Logger; // Custom logger
browser?: BrowserInterface; // Custom browser instance
}
Performs a search and extracts content from result pages.
async perform(options: BrowserSearchOptions): Promise<SearchResult[]>
Search options:
interface BrowserSearchOptions {
query: string | string[]; // Search query or array of queries
count?: number; // Maximum results to fetch
concurrency?: number; // Concurrent requests (default: 15)
excludeDomains?: string[]; // Domains to exclude
truncate?: number; // Truncate content length
browserOptions?: {
headless?: boolean; // Run in headless mode
proxy?: string; // Proxy server
executablePath?: string; // Custom browser path
profilePath?: string; // Browser profile path
};
}
interface SearchResult {
title: string; // Page title
url: string; // Page URL
content: string; // Extracted content in Markdown format
}
const results = await browserSearch.perform({
query: ['renewable energy', 'solar power technology'],
count: 10, // Will fetch approximately 5 results per query
concurrency: 5,
});
const results = await browserSearch.perform({
query: 'artificial intelligence',
excludeDomains: ['reddit.com', 'twitter.com', 'youtube.com'],
count: 10,
});
const results = await browserSearch.perform({
query: 'machine learning',
count: 5,
truncate: 1000, // Limit content to 1000 characters
});
import { ChromeBrowser } from '@agent-infra/browser';
import { BrowserSearch } from '@agent-infra/browser-search';
const browser = new ChromeBrowser({
// Custom browser configuration
});
const browserSearch = new BrowserSearch({
browser,
});
const results = await browserSearch.perform({
query: 'typescript best practices',
});
See examples.
Thanks to:
- EGOIST for creating a great AI chatbot product ChatWise from which we draw a lot of inspiration for local-browser based search.
- The puppeteer project which helps us operate the browser better.
Copyright (c) 2025 ByteDance, Inc. and its affiliates.
Licensed under the Apache License, Version 2.0.