rssSpider

Design and coding with all the love in the world by ShaneLau.

The simplest way to use rssspide to fetch rss list and site info.
Fetch post'content ,give clean view to you.
rss 爬虫，快速抓取站点信息和文章列表，文章的正文抓取

This project is base on feedparser and node-readability

Usage

npm install rssspider

Then:

var spide = require('rssspider');
var url = 'http://www.bigertech.com/rss';
spide.fetchRss(url).then(function(data){
        console.log(data); // rss  post list
});

API Documentation

1. `fetchRss(url,[options])`

get rss site'post list ,like this www.bigertech.com/rss

url : webiste'rss url
options :what data you need ? default value:

    ['title','description','summary','date','link','guid','author','comments','origlink','image','source','categories','enclosures']

response data Array

[{ title: '一个营销人员的自我修养',
  description: '<p></p>',
  summary: '</p>',
  date: Wed Oct 08 2014 17:14:26 GMT+0800 (CST),
  link: 'http://www.bigertech.com/learn-social-media-marketing/',
  guid: 'a623d78a-dae9-4915-9caa-0fd34fb3757c',
  author: '巴依老爷',
  comments: null,
  origlink: null,
  image: {},
  source: {},
  categories: [],
  enclosures: [] },
  ....  // more
    ]

2. `siteInfo(url,[options])`

get website info

url webiste'rss url
options what data you need ? default value:

['title','description','date','link','xmlurl','author','favicon','copyright','generator','image']

```

response data Array

{ title: '笔戈科技',
description: '简单、有趣、有价值',
date: Thu Oct 09 2014 18:15:14 GMT+0800 (CST),
link: 'http://www.bigertech.com/',
xmlurl: 'http://www.bigertech.com/rss/',
author: null,
favicon: null,
copyright: null,
generator: 'Ghost 0.5',
image: {},
feedurl: 'http://www.bigertech.com/rss' }

** 以下功能在 1.2.0 才能使用， readability 的库支持不是很好 **

3. `getCleanBody(url)`

Turn any web page into a clean view. This module is based on arc90's readability project.

html url or html code.
options is an optional options object
callback is the callback to run - callback(error, article, meta)

var url = 'http://www.bigertech.com/learn-social-media-marketing/';
spide.getCleanBody(url).then(function(article){
      console.log(article.content);   //clean code view
  });

More info node-readability

article.content is clean view

The article content of the web page. Return false if failed.

4. `getAllByUrl(url,[options])`

This method is similar to fetchRss

What'more ,it fetch the clean page content.

Turn any web page into a clean view. This module is based on arc90's readability project.

url website'rss url
Array respose data

get clean view code , Clean view content


[{ title: '一个营销人员的自我修养',
   content:'clean code view',     // clean code view
  description: '<p></p>',
  summary: '</p>',
  date: Wed Oct 08 2014 17:14:26 GMT+0800 (CST),
  link: 'http://www.bigertech.com/learn-social-media-marketing/',
  guid: 'a623d78a-dae9-4915-9caa-0fd34fb3757c',
  author: '巴依老爷',
  comments: null,
  origlink: null,
  image: {},
  source: {},
  categories: [],
  enclosures: [] },
    ....... // more
    ]

test 100%

nodeunit test/index.js

upgrade

Add node 4.x support

Any question shanelau

or
shanelau1021@gmail.com

rssspider

rssSpider

Usage

API Documentation

1. `fetchRss(url,[options])`

2. `siteInfo(url,[options])`

3. `getCleanBody(url)`

More info node-readability

article.content is clean view

4. `getAllByUrl(url,[options])`

What'more ,it fetch the clean page content.

test 100%

upgrade

Any question shanelau

Readme

Keywords

Package Sidebar

Install

Repository

Homepage

Weekly Downloads

Version

License

Last publish

Collaborators

rssspider

rssSpider

Usage

API Documentation

1. fetchRss(url,[options])

2. siteInfo(url,[options])

3. getCleanBody(url)

More info node-readability

article.content is clean view

4. getAllByUrl(url,[options])

What'more ,it fetch the clean page content.

test 100%

upgrade

Any question shanelau

Readme

Keywords

Package Sidebar

Install

Repository

Homepage

DownloadsWeekly Downloads

Version

License

Last publish

Collaborators

1. `fetchRss(url,[options])`

2. `siteInfo(url,[options])`

3. `getCleanBody(url)`

4. `getAllByUrl(url,[options])`

Weekly Downloads