Fiend
Fiend is still a work in progress. When finished, it should be able to:
- Be used via different interfaces:
  - API
  - CLI
- Choose a processor for different types of websites:
  - Static HTML
  - Dynamic web apps
- Retrieve data in various formats:
  - Raw HTML
  - Cheerio
  - A list of links and assets from the page
- Use a queue broker to persist tasks and distribute load:
  - RabbitMQ
  - Redis
- Set resource restrictions:
  - Concurrency limit
  - Random or static delay
  - Timeouts
  - Retries
- Spoof the User-Agent:
  - Custom static value
  - Random from a custom array
  - Random from a predefined array of the most common ones
  - Random from an automatically updated array of the most common ones
- Restrict requests to certain domains
- Detect and bypass Cloudflare or similar protection
- Force requests to respect robots.txt
- Schedule tasks and requests
- Measure performance and memory consumption
- Log every important event
- Use proxies
- Cache responses
- Authenticate with a site and keep the session state between requests
- Search a website for specific information
- Monitor a page for changes
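
Since none of this exists yet, the following is only a rough sketch of how three of the planned behaviors (random or static delay, User-Agent selection from an array, and domain restriction) might look as plain functions. Every name here (`Delay`, `nextDelay`, `pickUserAgent`, `isAllowed`) is hypothetical and is not part of any Fiend API; the subdomain matching rule is also an assumption.

```typescript
// Hypothetical shapes: Fiend does not define any of these yet.
type Delay =
  | { kind: "static"; ms: number }
  | { kind: "random"; minMs: number; maxMs: number };

// Returns the pause before the next request, in milliseconds.
// `rand` is injectable so the random branch can be tested deterministically.
function nextDelay(delay: Delay, rand: () => number = Math.random): number {
  if (delay.kind === "static") return delay.ms;
  // Inclusive range [minMs, maxMs].
  return delay.minMs + Math.floor(rand() * (delay.maxMs - delay.minMs + 1));
}

// Picks a User-Agent from a custom array.
// An array with a single entry behaves like a static User-Agent.
function pickUserAgent(
  userAgents: string[],
  rand: () => number = Math.random
): string {
  if (userAgents.length === 0) {
    throw new Error("userAgents must not be empty");
  }
  return userAgents[Math.floor(rand() * userAgents.length)];
}

// Checks a URL against a domain allow-list.
// Assumption: an empty list means "no restriction", and subdomains of an
// allowed domain are allowed too.
function isAllowed(url: string, allowedDomains: string[]): boolean {
  if (allowedDomains.length === 0) return true;
  const host = new URL(url).hostname;
  return allowedDomains.some((d) => host === d || host.endsWith("." + d));
}
```

A caller could check `isAllowed(url, domains)` before queueing a request, wait `nextDelay(delay)` milliseconds, and then send the request with the header value from `pickUserAgent(agents)`. Keeping these as pure functions with an injectable random source is one possible design; it makes the behavior easy to test without touching the network.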