noodlejs

noodle is a proxy server which serves for cross domain data extraction from web documents for any client.

noodle

noodle is a Node.js server and module for querying and scraping data from web documents. It features:

{
  "url": "https://github.com/explore",
  "selector": "ol.ranked-repositories h3 a",
  "extract": "href"
}
  • Cross domain document querying (html, json, xml, atom, rss feeds)
  • Server supports querying via JSONP and JSON POST
  • Multiple queries per request
  • Access to queried server headers
  • Allows for POSTing to web documents
  • In memory caching for query results and web documents

Setup

$ npm install noodlejs

or

$ git clone git@github.com:dharmafly/noodle.git
$ cd noodle
$ npm install

Start the server by running the binary

$ bin/noodle-server
Noodle node server started
├ process title  node-noodle
├ process pid    4739
└ server port    8888

You may specify a port number as an argument

$ bin/noodle-server 9090
Noodle node server started
├ process title  node-noodle
├ process pid    4739
└ server port    9090

If you are interested in the node module just run npm install noodlejs, require it and check out the noodle api

var noodle = require('noodlejs');
 
noodle.query({
  url:      'https://github.com/explore',
  selector: 'ol.ranked-repositories h3 a',
  extract:  'href'
})
.then(function (results) {
  console.log(results);
});

The noodle tests create a temporary server on port 8889 which the automated tests tell noodle to query against.

To run tests you can use the provided binary from the noodle package root directory:

$ cd noodle
$ bin/tests

Contributors and suggestions welcomed.