xplanspider

X Plan Spider

A spider framework written specifically for the xplan project of www.cst.zju.edu.cn.

Attention: RabbitMQ must be installed before you use this package.

Usage

var XPlanSpider = require("xplanspider");

Pioneer

Create a new pioneer

var pioneer = new XPlanSpider.SpiderPioneer();

Set the page count function

pioneer.setGetPageCountFunc(function(urlWithPage, spider, cb) {
    // Use `spider` (nodegrassex; see https://github.com/XadillaX/nodegrass)
    // to fetch the page count, then pass it to `cb`.
    var foo = [ 1, 2, 3 ];
    cb(foo);
});

Attention: urlWithPage is a string such as "http://foo/bar?page=:page". Write the literal placeholder ":page" in place of the actual page number.
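
To illustrate the ":page" convention, here is a minimal sketch of how such a placeholder could be expanded into a concrete URL. The helper `buildPageUrl` is hypothetical and is not part of xplanspider; how the framework substitutes the placeholder internally is not documented here.

```javascript
// Hypothetical helper (not part of xplanspider): expands the ":page"
// placeholder into a concrete URL for the given page number.
function buildPageUrl(urlWithPage, page) {
    if (urlWithPage.indexOf(":page") === -1) {
        throw new Error('URL must contain the ":page" placeholder: ' + urlWithPage);
    }
    // Only the first occurrence of ":page" is replaced.
    return urlWithPage.replace(":page", String(page));
}

var template = "http://foo/bar?page=:page";
var firstPage = buildPageUrl(template, 1);
var thirdPage = buildPageUrl(template, 3);
```

A URL without the placeholder makes the helper throw, which is one way to catch a hard-coded page number early.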

Set the list parsing function

pioneer.setParseListFunc(function(status, html, respHeader) {
    // Parse `html` and collect the content URLs into an array.
    // Return the array, or return `false`.
    var list = [ "http://foo/bar", "http://foo/and/bar" ];
    return list;
});
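
As a rough sketch of what a list parser might look like, the function below pulls absolute links out of anchor tags with a regular expression. It assumes that `status` is a numeric HTTP status code and that the list page uses double-quoted `href` attributes; both are assumptions, and a real page may need an HTML parser instead. The name `parseList` and the sample markup are illustrative only.

```javascript
// Hypothetical list parser: collects absolute http(s) links from <a> tags.
// Assumes `status` is a numeric HTTP status code (an assumption).
function parseList(status, html, respHeader) {
    if (status !== 200 || typeof html !== "string") {
        return false;
    }
    var list = [];
    var linkPattern = /<a\s[^>]*href="(https?:\/\/[^"]+)"/gi;
    var match;
    while ((match = linkPattern.exec(html)) !== null) {
        list.push(match[1]);
    }
    // Follow the documented contract: an array of URLs, or `false`.
    return list.length > 0 ? list : false;
}

var sampleHtml = '<ul><li><a href="http://foo/bar">A</a></li>' +
                 '<li><a class="x" href="http://foo/and/bar">B</a></li></ul>';
var parsed = parseList(200, sampleHtml, {});
```

A function of this shape could then be handed to the pioneer with `pioneer.setParseListFunc(parseList);`.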

Set the URLs

Every list URL you pass must contain the ":page" placeholder.

For example, if the system has two kinds of list, pass both:

pioneer.addListPage("http://www.cst.zju.edu.cn/index.php?c=Index&a=tlist&catid=31&p=:page");
pioneer.addListPage("http://www.cst.zju.edu.cn/index.php?c=Index&a=tlist&catid=28&p=:page");

Start the service

pioneer.start("rabbitMQ connection string", "message queue routing key", timeout);

Attention: timeout is given in milliseconds. The spider starts a new run every timeout milliseconds.
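
The exact format of the connection string is not specified here. Assuming it follows the conventional AMQP URI form (`amqp://user:password@host:port/vhost`), the sketch below shows one way to sanity-check such a string before passing it to `start`. The helper `describeAmqpUri` and the sample credentials are hypothetical.

```javascript
// Hypothetical helper: splits an AMQP-style URI into its parts using the
// WHATWG URL class that ships with Node.js. The URI form is an assumption.
function describeAmqpUri(uri) {
    var parsed = new URL(uri);
    if (parsed.protocol !== "amqp:" && parsed.protocol !== "amqps:") {
        throw new Error("Not an AMQP URI: " + uri);
    }
    return {
        user: parsed.username,
        host: parsed.hostname,
        port: parsed.port,
        vhost: parsed.pathname
    };
}

var info = describeAmqpUri("amqp://guest:guest@localhost:5672/");
var timeout = 60 * 1000; // one minute, expressed in milliseconds
```

Writing the timeout as a product such as `60 * 1000` keeps the unit visible at the call site.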

Impl

TODO.

Contributors

  • XadillaX
  • Waiting for you

Install

npm i xplanspider

Version

0.0.5

License

BSD-2-Clause

Collaborators

  • xadillax