xplanspider

0.0.5 • Public • Published

X Plan Spider

A spider (crawler) framework built specifically for xplan of www.cst.zju.edu.cn.

Attention: Install RabbitMQ before using this package.

Usage

var XPlanSpider = require("xplanspider");

Pioneer

Create a new pioneer

var pioneer = new XPlanSpider.SpiderPioneer();

Set the page count function

pioneer.setGetPageCountFunc(function(urlWithPage, spider, cb) {
    // `spider` is a nodegrassex instance (see https://github.com/XadillaX/nodegrass).
    // Use it to fetch the list page, work out the page count, and pass the result to `cb`.
    var foo = [ 1, 2, 3 ]; // placeholder result
    cb(foo);
});

Attention: urlWithPage is a string such as "http://foo/bar?page=:page". Write the literal placeholder ":page" instead of an actual page number.
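
To make the placeholder concrete, here is a minimal sketch of how a ":page" template could be expanded into a real URL. The helper name `fillPage` is hypothetical and not part of xplanspider; it only illustrates the kind of substitution the framework presumably performs on the template.

```javascript
// Hypothetical helper, not part of xplanspider: replaces every ":page"
// placeholder in a URL template with a concrete page number.
function fillPage(urlWithPage, page) {
    if (urlWithPage.indexOf(":page") === -1) {
        throw new Error('URL template must contain ":page"');
    }
    return urlWithPage.split(":page").join(String(page));
}

console.log(fillPage("http://foo/bar?page=:page", 3));
// prints "http://foo/bar?page=3"
```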

Set the list parsing function

pioneer.setParseListFunc(function(status, html, respHeader) {
    // Parse `html` and collect the content URLs into an array.
    // Return the array, or return `false`.
    var list = [ "http://foo/bar", "http://foo/and/bar" ];
    return list;
});
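
As a rough illustration of what such a parser might look like, the sketch below pulls every `href` out of the page with a regular expression. It assumes that `status` is the numeric HTTP status code and that returning `false` is appropriate for a failed fetch; neither is confirmed by this README. The function name `parseList` and the sample HTML are made up for the example, and real pages may need a proper HTML parser.

```javascript
// Sketch of a list parser with the (status, html, respHeader) signature.
// Assumption: status is the numeric HTTP status code.
function parseList(status, html, respHeader) {
    // Treat anything other than HTTP 200 as a failed fetch.
    if (status !== 200 || typeof html !== "string") {
        return false;
    }
    const links = [];
    // Capture the href value of each <a ...> tag (double-quoted only).
    const pattern = /<a\s[^>]*href="([^"]+)"/gi;
    let match;
    while ((match = pattern.exec(html)) !== null) {
        links.push(match[1]);
    }
    return links;
}

// Made-up sample HTML for demonstration.
const sample = '<ul><li><a href="http://foo/bar">A</a></li>' +
               '<li><a class="x" href="http://foo/and/bar">B</a></li></ul>';
console.log(parseList(200, sample, {}));
console.log(parseList(404, "", {}));
```

A function like this could be passed directly, as in `pioneer.setParseListFunc(parseList);`.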

Set the URLs

Every list URL you pass must contain the ":page" placeholder.

For example, if the system has two types of list, add one URL for each:

pioneer.addListPage("http://www.cst.zju.edu.cn/index.php?c=Index&a=tlist&catid=31&p=:page");
pioneer.addListPage("http://www.cst.zju.edu.cn/index.php?c=Index&a=tlist&catid=28&p=:page");
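
When there are many categories, the templates can be generated instead of typed out. The following is only a sketch: `buildListTemplates` is a hypothetical helper, and it assumes that the list pages differ only in their `catid` parameter, as in the two URLs above.

```javascript
// Hypothetical helper: builds one ":page" URL template per category ID.
// Assumes list pages differ only in the catid query parameter.
function buildListTemplates(catIds) {
    const base = "http://www.cst.zju.edu.cn/index.php?c=Index&a=tlist";
    return catIds.map(function (catId) {
        return base + "&catid=" + encodeURIComponent(catId) + "&p=:page";
    });
}

const templates = buildListTemplates([31, 28]);
templates.forEach(function (template) {
    console.log(template);
});
// Each template could then be registered with pioneer.addListPage(template).
```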

Start the service

pioneer.start("rabbitMQ connection string", "message queue router key", timeout);

Attention: timeout is given in milliseconds. The spider starts a new run every timeout milliseconds.
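
To illustrate the "one run per timeout milliseconds" idea, here is a small sketch built on `setInterval`. It is not xplanspider's actual scheduler, and the name `runEvery` and the 30-minute value are assumptions chosen for the example.

```javascript
// Illustration only: run a task once per timeoutMs milliseconds.
// This is not xplanspider's scheduler, just a sketch of the idea.
function runEvery(task, timeoutMs) {
    if (typeof timeoutMs !== "number" || timeoutMs <= 0) {
        throw new Error("timeout must be a positive number of milliseconds");
    }
    const timer = setInterval(task, timeoutMs);
    // Return a function that cancels all further runs.
    return function stop() {
        clearInterval(timer);
    };
}

// Example: a 30-minute interval expressed in milliseconds.
const THIRTY_MINUTES = 30 * 60 * 1000;
const stop = runEvery(function () {
    console.log("crawl round started");
}, THIRTY_MINUTES);
stop(); // cancel immediately so this example exits
```

Expressing the interval as a named constant makes it harder to pass seconds by mistake where milliseconds are expected.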

Impl

TODO.

Contributors

  • XadillaX
  • Waiting for you


Install

npm i xplanspider

License

BSD-2-Clause