simple web crawler with node. experimenting with streams.
This example starts at wikipedia.org, follows every link it finds, and prints the contents of every p element on each leaf page to stdout.
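A sketch of what that might look like — the constructor options and the tree()/leaf() route methods are assumed names, not confirmed API, while selectAll() and createReadStream() are real trumpet API:

```js
// Sketch of the wikipedia example. start/backend option names and the
// tree()/leaf() methods are assumptions about the wubwub API.
var wubwub = require('wubwub')

var crawler = wubwub({
  start: 'http://wikipedia.org',        // assumed option name
  backend: wubwub.Backends.Simple()
})

// tree route: keep following links on wikipedia pages
crawler.tree('http://*.wikipedia.org/*', function (tr, url) {})

// leaf route: print the contents of every p element to stdout
crawler.leaf('*', function (tr, url) {
  tr.selectAll('p', function (p) {
    p.createReadStream().pipe(process.stdout)
  })
})
```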
There are three kinds of routes: leaf routes, tree routes, and ignored routes. Tree routes are followed for more links to crawl, leaf routes are where page content gets consumed, and ignored routes are skipped entirely.
Callbacks on leaf and tree routes are passed a trumpet instance for the page that was just fetched, as well as the url that was matched.
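Continuing the sketch above, a tree route callback receives the same two arguments. tree() is still an assumed method name; selectAll() and getAttribute() are real trumpet API:

```js
// tr is a trumpet instance for the fetched page, url is what the route matched
crawler.tree('http://*.wikipedia.org/*', function (tr, url) {
  tr.selectAll('a', function (a) {
    a.getAttribute('href', function (href) {
      if (href) console.error(url + ' -> ' + href)  // log each link as it is seen
    })
  })
})
```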
Specifies how many requests can be running at the same time.
Specifies which URL to start crawling from.
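Put together, these options might be passed like this — concurrency, start, and backend are assumed names for the options described here, only their meanings come from the text:

```js
var crawler = wubwub({
  concurrency: 10,                    // assumed name: cap on simultaneous requests
  start: 'http://wikipedia.org',      // assumed name: where crawling begins
  backend: wubwub.Backends.Simple()   // assumed name: link queue, described below
})
```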
Controls how links are enqueued for crawling. Two backends are included, wubwub.Backends.Simple() and wubwub.Backends.Redis(); both implement put() and get(onLink) methods for links and keep track of which URLs have already been seen. You can implement your own backend as long as it implements those two methods. get() is async and takes a callback to be executed once the link is fetched.
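For illustration, here is a minimal in-memory backend sketch. The put()/get(onLink) contract and the seen-URL tracking come from the description above; the queue internals are made up:

```js
// Minimal custom backend sketch. put() and get(onLink) are the two methods
// a backend must implement; everything else here is illustrative.
function MemoryBackend () {
  var queue = []      // links waiting to be crawled
  var waiting = []    // get() callbacks waiting for a link
  var seen = {}       // URLs already enqueued, so each is crawled once

  return {
    put: function (url) {
      if (seen[url]) return                       // skip duplicates
      seen[url] = true
      if (waiting.length) waiting.shift()(url)    // hand it to a waiting get()
      else queue.push(url)
    },
    get: function (onLink) {
      if (queue.length) onLink(queue.shift())
      else waiting.push(onLink)                   // call back once a link arrives
    }
  }
}
```

An instance of this could then be passed wherever the crawler expects its backend, in place of wubwub.Backends.Simple().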