sitequery
A reactive framework for asynchronous web crawling.
sitequery is a reactive web-crawling framework that crawls websites by executing jQuery selectors on the server side.
It uses rx.js to model a crawl as an asynchronous sequence of pages, which is mapped to an asynchronous sequence of jQuery-selected page elements.
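To make the "sequence of pages mapped to a sequence of elements" model concrete, here is a minimal sketch that substitutes native async generators for rx.js observables. It assumes a hypothetical in-memory site (`SAMPLE_SITE`) whose pages carry already-parsed elements, plus hypothetical helpers `pagesOf`, `selectEach` and `collect`; none of these are part of sitequery's API.

```javascript
// Sketch only: native async generators stand in for rx.js observables,
// and SAMPLE_SITE is a hypothetical, pre-parsed site (not real sitequery data).
const SAMPLE_SITE = [
  { url: '/', elements: [{ tag: 'img', src: 'logo.png' }, { tag: 'a', href: '/about' }] },
  { url: '/about', elements: [{ tag: 'img', src: 'team.jpg' }, { tag: 'p' }] },
];

// Hypothetical helper: an asynchronous sequence of pages.
async function* pagesOf(site) {
  for (const page of site) {
    // Simulate each page arriving asynchronously, as an HTTP fetch would.
    await new Promise((resolve) => setTimeout(resolve, 0));
    yield page;
  }
}

// Hypothetical helper: map the page sequence to a flattened sequence of
// matching elements, analogous to a flatMap over an observable.
async function* selectEach(pages, tag) {
  for await (const page of pages) {
    for (const el of page.elements) {
      if (el.tag === tag) yield { url: page.url, el };
    }
  }
}

// Drain an async sequence into an array.
async function collect(seq) {
  const out = [];
  for await (const item of seq) out.push(item);
  return out;
}

collect(selectEach(pagesOf(SAMPLE_SITE), 'img')).then((imgs) => {
  imgs.forEach(({ url, el }) => console.log(url, el.src));
});
```

The element sequence is produced page by page, so a consumer sees the first page's elements before later pages have been fetched; rx.js gives the same shape with push-based subscriptions instead of pull-based iteration.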
sitequery requires a Redis installation; see http://redis.io/download
[sudo] npm install sitequery
sitequery has two main abstractions, SiteCrawl and SiteQuery, which provide the following features:
SiteCrawl: crawl a website to a depth of n.
```javascript
var SiteCrawl = require('sitequery').SiteCrawl;

// create a new SiteCrawl of depth 2 with a 1s delay between page requests
// that will run for at most 10s
// Note: web crawling is deferred and will not be executed
// until subscription
var siteCrawl = new SiteCrawl({url:'', maxDepth:2, delay:1000, maxCrawlTime:10000});

// ask for the observable sequence and subscribe for the CrawlResult(s)
siteCrawl;
```
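The SiteCrawl options above (`maxDepth`, `delay`, `maxCrawlTime`) can be read as the bounds of a breadth-first traversal. The sketch below illustrates that reading only; it is not sitequery's implementation. It assumes the start page is depth 0, walks a hypothetical in-memory link graph (`LINKS`) instead of fetching over HTTP, and uses a hypothetical `crawl` function.

```javascript
// Sketch only: LINKS is a hypothetical link graph standing in for real HTTP fetches.
const LINKS = {
  '/': ['/a', '/b'],
  '/a': ['/a/1', '/'],
  '/b': ['/b/1'],
  '/a/1': ['/a/1/x'],
  '/b/1': [],
  '/a/1/x': [],
};

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Breadth-first crawl bounded by depth, with a pause between pages and a
// wall-clock budget (assumed semantics of maxDepth, delay and maxCrawlTime).
async function crawl({ url, maxDepth, delay, maxCrawlTime }) {
  const started = Date.now();
  const visited = new Set([url]); // never enqueue the same page twice
  const queue = [{ url, depth: 0 }];
  const results = [];
  while (queue.length > 0) {
    if (Date.now() - started >= maxCrawlTime) break; // time budget exhausted
    const { url: current, depth } = queue.shift();
    results.push({ url: current, depth });
    if (depth < maxDepth) {
      // Only follow links while we are still above the depth limit.
      for (const next of LINKS[current] || []) {
        if (!visited.has(next)) {
          visited.add(next);
          queue.push({ url: next, depth: depth + 1 });
        }
      }
    }
    if (queue.length > 0) await sleep(delay); // pause before the next page
  }
  return results;
}

crawl({ url: '/', maxDepth: 2, delay: 10, maxCrawlTime: 10000 }).then((pages) => {
  pages.forEach((p) => console.log(p.depth, p.url));
});
```

With `maxDepth: 2` this visits `/`, `/a`, `/b`, `/a/1` and `/b/1`; `/a/1/x` sits at depth 3 and is never fetched, and the link from `/a` back to `/` is skipped because it was already visited.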
SiteQuery: execute a jQuery selector against each page of a website, to a depth of n.
```javascript
var SiteQuery = require('sitequery').SiteQuery;

// create a new SiteQuery of depth 2 with a 1s delay between page requests,
// selecting `img` elements on each page
// Note: web crawling is deferred and will not be executed
// until subscription
var siteQuery = new SiteQuery({url:'', maxDepth:2, delay:1000}, 'img');

// ask for the observable sequence and subscribe for the selected jQuery element(s)
siteQuery;
```
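Both examples note that crawling is deferred until subscription. That is the behaviour of a "cold" observable: creating it only records what to do, and the work runs when someone subscribes. The sketch below shows the idea with a tiny hand-rolled observable; `lazySequence`, `imgQuery` and the page data are hypothetical and are neither rx.js nor sitequery API.

```javascript
// Sketch only: a hand-rolled "cold" observable, not rx.js or sitequery API.
// The producer function runs only when subscribe() is called.
function lazySequence(produce) {
  return {
    subscribe(onNext, onCompleted) {
      produce(onNext, onCompleted);
    },
  };
}

let fetches = 0; // counts simulated page fetches

// Hypothetical crawl data: the img sources found on each of two pages.
const pages = [['logo.png'], ['team.jpg', 'banner.gif']];

const imgQuery = lazySequence((onNext, onCompleted) => {
  for (const imgs of pages) {
    fetches += 1; // a real crawler would fetch and parse the page here
    imgs.forEach((src) => onNext(src));
  }
  onCompleted();
});

console.log(fetches); // prints 0: creating the query did no work

const seen = [];
imgQuery.subscribe((src) => seen.push(src), () => console.log('done', seen));
console.log(fetches); // prints 2: subscribing ran the crawl
```

Deferring the work this way lets a crawl be configured, passed around and composed before any HTTP request is made.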
Copyright (c) Loku. All rights reserved. The use and distribution terms for this software are covered by the Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php) which can be found in the file epl-v10.html at the root of this distribution. By using this software in any fashion, you are agreeing to be bound by the terms of this license. You must not remove this notice, or any other, from this software.