Turns web pages into RSS feeds
After Yahoo Pipes and Kimono Labs shut down, I got tired of hunting for yet another RSS scraper online. Instead, I figured I could write a server of my own in a weekend and host it on my own instance.
The server is pretty simple: it grabs some elements of a website, compares them to what it got before, and then serves that content as an RSS feed. Features:
- Crawl multiple websites in parallel
- Configure using a JSON file
- Use CSS selectors from cheerio/jQuery
- Persist feed data either on the file system or in memory for as long as the server runs
npm install rssify
If you want the server to run as a daemon, I recommend a third-party tool such as initd-forever.
With Node.js installed, just go to the project folder and run
node .
If you installed the library globally, you should also have a new executable available called
The script accepts only one (optional) argument: the location of the config file. Wherever you decide to put your config, make sure its file name ends with .json; otherwise the program will not know how to parse it.
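The config lookup described above can be sketched as follows. This is a hypothetical helper, not rssify's actual code: resolveConfigPath is an invented name, and it assumes the optional argument arrives as the first command-line argument and that config.json in the project directory is the fallback. One plausible (unconfirmed) reason for the .json rule is that Node's require() only parses a file as JSON when it carries that extension.

```javascript
const path = require("node:path");

// Hypothetical helper: picks the config file from the command line.
// argv follows the shape of process.argv: [node, script, optionalConfigPath]
function resolveConfigPath(argv, projectDir) {
  // Fall back to config.json in the project directory (assumed default).
  const candidate = argv[2] ?? path.join(projectDir, "config.json");
  // The README requires a .json extension, so reject anything else early.
  if (path.extname(candidate) !== ".json") {
    throw new Error(`Config file must end with .json: ${candidate}`);
  }
  // Relative paths are resolved against the current working directory.
  return path.resolve(candidate);
}

// Usage: const configFile = resolveConfigPath(process.argv, __dirname);
```

In this sketch, running node . ./my-feeds.json would resolve my-feeds.json against the current working directory, while a name like feeds.yaml would be rejected before any parsing is attempted.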
By default, the configuration is stored in a file called config.json in the project directory. It consists of two basic elements: the global element and the feed configs. The global element holds properties used by the server environment:
debug Whether you want the server to print a few messages or keep quiet
storage Currently supports "file" or "memory"
path Used with file storage; configures the location on disk (default = /feeds)
port The port the server listens on for incoming requests
host The hostname used when generating the RSS feed, so that a reader can link back to the server (defaults to http://localhost:10001)
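The two storage options can be pictured as one small interface with two backends. The sketch below is hypothetical: createStorage, createMemoryStorage, createFileStorage and the one-JSON-file-per-feed layout are assumptions made for illustration, not rssify's documented internals.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Memory storage: items live only as long as the process runs.
function createMemoryStorage() {
  const feeds = new Map();
  return {
    load: (name) => feeds.get(name) ?? [],
    save: (name, items) => {
      feeds.set(name, items);
    },
  };
}

// File storage: one JSON file per feed under the configured path
// (the file layout is an assumption).
function createFileStorage(dir) {
  fs.mkdirSync(dir, { recursive: true });
  const fileFor = (name) => path.join(dir, `${name}.json`);
  return {
    load(name) {
      try {
        return JSON.parse(fs.readFileSync(fileFor(name), "utf8"));
      } catch (err) {
        if (err.code === "ENOENT") return []; // feed not crawled yet
        throw err;
      }
    },
    save(name, items) {
      fs.writeFileSync(fileFor(name), JSON.stringify(items, null, 2));
    },
  };
}

// Picks a backend from the global config; any value other than "file"
// falls back to memory in this sketch.
function createStorage(globalConfig) {
  return globalConfig.storage === "file"
    ? createFileStorage(globalConfig.path ?? "/feeds") // default documented above
    : createMemoryStorage();
}
```

Keeping load and save identical across both backends means the crawler does not need to know where feed data ends up.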
Any other properties of the global config are applied to each of the feed configs.
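A minimal sketch of how those leftover global properties could be applied to every feed. Several details are assumptions rather than documented behavior: that feeds sit at the top level of the config next to a "global" key, that a value set on a feed overrides the global one, and the helper name applyGlobals itself.

```javascript
// Keys the README lists as environment settings; everything else in
// "global" is treated as a per-feed default.
const ENVIRONMENT_KEYS = new Set(["debug", "storage", "path", "port", "host"]);

function applyGlobals(config) {
  // Assumed layout: { global: {...}, feedName: {...}, otherFeed: {...} }
  const { global: globalConfig = {}, ...feeds } = config;
  const defaults = Object.fromEntries(
    Object.entries(globalConfig).filter(([key]) => !ENVIRONMENT_KEYS.has(key))
  );
  const result = {};
  for (const [name, feed] of Object.entries(feeds)) {
    // Assumption: a value set on the feed itself wins over the global default.
    result[name] = { ...defaults, ...feed };
  }
  return result;
}

// Hypothetical config with one feed called "example-news".
const example = {
  global: { port: 10001, storage: "memory", interval: 60, size: 20 },
  "example-news": { url: "https://example.com/news", interval: 15 },
};
console.log(applyGlobals(example));
```

Here "example-news" would inherit size from the global element but keep its own interval, while environment settings such as port stay out of the feed config.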
Each feed config is defined by a property named after the feed, whose value is a configuration object with these properties:
url The address where the server should check for updates
interval The interval in minutes between crawling a web page again
cooldown How long to wait (in minutes) after a new entry has been found before looking again
size The maximum number of items reported in this feed
validate An array of field names that will be checked to determine if content has changed/updated
fields An array of field configurations. See below for more info.
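To make the roles of validate, size, interval and cooldown concrete, here is a sketch of the "compare to what you got before" step. The function names and the newest-first ordering are hypothetical; the sketch only assumes that two items count as the same entry when every field listed in validate matches.

```javascript
// Identity of an item: the values of the fields named in "validate".
function itemKey(item, validate) {
  return JSON.stringify(validate.map((field) => item[field] ?? null));
}

// Merge freshly crawled items into the stored feed, newest first,
// keeping at most `size` entries. Also reports how many were new.
function mergeItems(stored, crawled, { validate, size }) {
  const seen = new Set(stored.map((item) => itemKey(item, validate)));
  const fresh = [];
  for (const item of crawled) {
    const key = itemKey(item, validate);
    if (!seen.has(key)) {
      seen.add(key); // also skips duplicates within one crawl
      fresh.push(item);
    }
  }
  return { items: [...fresh, ...stored].slice(0, size), added: fresh.length };
}

// Minutes until the next crawl: wait `cooldown` after finding something new,
// otherwise the regular `interval`.
function nextCrawlDelayMinutes(added, { interval, cooldown }) {
  return added > 0 ? cooldown ?? interval : interval;
}
```

For instance, with validate set to ["title", "link"], a re-crawl that only returns already-known entries adds nothing and schedules the next check after the regular interval.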
Fields map directly to the RSS item properties. They define where to grab content from and, optionally, how to transform it:
field Name of the field as it will appear in the rss feed
selector A cheerio/jQuery selector. If multiple elements are selected, they will be concatenated.
attr The attribute of the selected element to use ("text" and "html" are special values that return the element's text or HTML content). This field is only evaluated if a selector has been set.
format A standard util.format string that receives the content from selector + attribute as a string argument. It is only evaluated if a selector has been set, and it is applied to each individual element if a selector returns more than one element.
content Used to set a static string as content. If any of the other methods produces a string, this value will be
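The attr and format rules can be illustrated without a real page. In this hypothetical sketch the selected elements are plain stand-in objects of the form { text, html, attribs } rather than cheerio nodes; the helper name extractField, the "text" default for attr, and joining with an empty string are all assumptions. The static content option is left out because its precedence is not spelled out here.

```javascript
const util = require("node:util");

// Each selected element is modelled as { text, html, attribs } so this
// runs without cheerio (an assumption for illustration).
function extractField(elements, { attr = "text", format }) {
  const values = elements.map((el) => {
    let value;
    if (attr === "text") value = el.text; // special value: text content
    else if (attr === "html") value = el.html; // special value: HTML content
    else value = el.attribs[attr] ?? ""; // any other attribute, e.g. href
    // "format" is applied to each element individually.
    return format ? util.format(format, value) : value;
  });
  // Multiple selected elements are concatenated (separator assumed empty).
  return values.join("");
}

// Usage with a hypothetical field config { field: "link", selector: "a.title",
// attr: "href", format: "https://example.com%s" }:
const selected = [{ text: "First", html: "<b>First</b>", attribs: { href: "/posts/1" } }];
console.log(extractField(selected, { attr: "href", format: "https://example.com%s" }));
```

Here util.format substitutes the element's href into %s, turning a relative link into an absolute one before it lands in the RSS item.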
To see an example, just take a look at config.json in the project directory.
Getting Feedly to work
Feedly needs some extra love to understand this feed. To help it along, you can use FeedBurner to host a compatible version.