Routers News
Routers is a collection of web crawlers for various popular technology news sources.
It exposes a command-line interface to these crawlers, so the discerning tech-news enthusiast never has to leave the comfort of their terminal.
It Currently Supports:
Technology News Sources
- Ars Technica
- Wired.com
Major Technology Blogs
- TechCrunch
- Mashable
- Gizmodo
- Fast Company
- FastCo.Labs
Personal Technology Blogs
- Codes From The Underground, my blog
Mainstream News Sources
- New York Times
- USA Today
- L.A. Times
Other Random Stuff
- GitHub
- The Oatmeal
- xkcd
(This categorization is loose; please feel free to shuffle things around.)
It's Also An Experiment
It is my hope that, by open-sourcing a collection of news scrapers, a community can form around building a powerful set of real-time news aggregation tools.
Installation
npm install routers-news -g
Usage
Listing News Sources
routers-news --sources
Outputs:
Routers News Sources:
news:
  major:
    NewYorkTimes: The New York Times Bits blog.
    LATimes: The business and culture of our digital lives, from the L.A. Times.
    USAToday: Power up with breaking news on personal technology, electronics, gaming and computers.
  tech:
    Wired.com: Wired magazine is a monthly US technology publication.
    ArsTechnica: Ars Technica is a technology news site catering to PC enthusiasts.
    TechCrunch: A network of technology-oriented blogs and other web properties.
  other:
    Github: Trending and featured repos on Github.com
Displaying Headlines
routers-news --source=github
Outputs:
[1] MacLemon / CongressChecklist https://github.com/MacLemon/CongressChecklist
[2] dejan / rails_panel https://github.com/dejan/rails_panel
[3] feross / md5-password-cracker.js https://github.com/feross/md5-password-cracker.js
[4] shadowsocks / shadowsocks-go https://github.com/shadowsocks/shadowsocks-go
[5] bcoe / routers-news https://github.com/bcoe/routers-news
[6] andrew / 24pullrequests https://github.com/andrew/24pullrequests
[7] nkohari / jwalk https://github.com/nkohari/jwalk
[8] lockitron / selfstarter https://github.com/lockitron/selfstarter
[9] twitter / bower https://github.com/twitter/bower
[10] Spaceman-Labs / SMPageControl https://github.com/Spaceman-Labs/SMPageControl
Loading Articles
routers-news --source=github --article=5
Outputs:
bcoe / routers-news: A crawler
The Crawlers
The news crawlers used by Routers come in two varieties:
- Page scrapers, which use CSS selectors to extract content from news sources.
- RSS/Atom feed parsers, which crawl articles using an RSS or Atom news feed.
Examples of both can be found in the lib/sources directory.
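To give a feel for the feed-parsing variety, here is a minimal sketch that pulls headlines out of an RSS-style document. It is not the code used in lib/sources: the sample feed and the function names (firstTag, parseHeadlines) are made up for illustration, and a regular expression stands in for a real XML parser so the example runs without any dependencies. A page scraper would follow the same shape, but would need an HTML parsing library to evaluate CSS selectors.

```javascript
// Minimal sketch of an RSS-style headline extractor (illustrative only).
// A regex is used so the example needs no dependencies; a real crawler
// would more likely rely on a proper XML/feed parsing library.

// Hypothetical sample feed; not taken from any real source.
const sampleFeed = `
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title><![CDATA[Second headline]]></title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>`;

// Return the text content of the first <tag>...</tag> inside `xml`, or null.
function firstTag(xml, tag) {
  const match = xml.match(new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`));
  if (!match) return null;
  // Unwrap a CDATA section if present, then trim whitespace.
  return match[1].replace(/^\s*<!\[CDATA\[([\s\S]*?)\]\]>\s*$/, '$1').trim();
}

// Extract { title, link } pairs from every <item> element.
function parseHeadlines(xml) {
  const items = xml.match(/<item[\s>][\s\S]*?<\/item>/g) || [];
  return items.map((item) => ({
    title: firstTag(item, 'title'),
    link: firstTag(item, 'link'),
  }));
}

// Print headlines in a numbered format similar to the CLI output.
parseHeadlines(sampleFeed).forEach((headline, index) => {
  console.log(`[${index + 1}] ${headline.title} ${headline.link}`);
});
```

The numbered output mirrors what --source prints, which is why the headline index can double as the value passed to --article.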
Contributing
It's easy to add a new news source:
- fork the routers-news repo.
- clone it locally.
- run npm install to install the libraries locally.
- create a new crawler in the lib/sources directory (everything in this hierarchy is automatically loaded).
- to test your crawler, run: node ./bin/routers-news.js.
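For the curious, automatic loading of a directory hierarchy can be done with a short recursive walk. The sketch below is only a guess at the general idea, not the project's actual loader: the loadSources function, the shape of the exported module, and the temporary demo directory are all hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Recursively require() every .js file beneath `dir`.
// Returns an object keyed by each file's path relative to `root`
// (extension removed, forward slashes), mapped to that module's exports.
function loadSources(dir, root = dir, found = {}) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      loadSources(fullPath, root, found);
    } else if (entry.name.endsWith('.js')) {
      const key = path
        .relative(root, fullPath)
        .slice(0, -'.js'.length)
        .split(path.sep)
        .join('/');
      found[key] = require(fullPath);
    }
  }
  return found;
}

// Demo: build a throwaway hierarchy containing one hypothetical crawler module.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'sources-'));
fs.mkdirSync(path.join(demoRoot, 'other'));
fs.writeFileSync(
  path.join(demoRoot, 'other', 'example.js'),
  "module.exports = { description: 'A hypothetical example source.' };\n"
);

const sources = loadSources(demoRoot);
console.log(Object.keys(sources)); // → [ 'other/example' ]
```

With a loader along these lines, dropping a new file into the hierarchy is enough for it to be picked up; no central registry needs editing.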
You can also help a ton by:
- reporting when crawlers are broken.
- extending the crawlers. I'd love to have:
  - Dates.
  - Authors.
  - Better image extraction.
- improving the CLI client.
Help make our dreams of a collaborative web-crawler a reality :)
Copyright
Copyright (c) 2012 Benjamin Coe and Joshua Hull and Gabriel Silk. See LICENSE.txt for further details.