hapi-goldwasher
A plugin for hapi that runs goldwasher as a scraping API on the web. It acts as a scraper proxy that returns the scraped information in the selected format, defaulting to JSON.
Installation
npm install hapi-goldwasher
If you aren't already running a hapi server, you also need to install hapi to run the example:
npm install hapi
Options
When registering the plugin with hapi, you have several options, none of them required:
path - the endpoint you mount the plugin on. Defaults to /goldwasher.
maxRedirects - the maximum number of redirects the scraper will accept before giving up. Defaults to 5.
cors - a CORS object. Defaults to false. See hapi docs for more information.
raw - enable raw output mode. This will enable output=raw, which returns the raw, scraped result, usually HTML.
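To make the option shapes concrete, here is a minimal sketch of an options object built from the documented defaults. The helper name buildOptions is hypothetical and not part of hapi-goldwasher, and raw: false is an assumption, since the default for raw is not stated above. The plugin applies its own defaults, so this only illustrates what an explicit configuration might look like.

```javascript
// Documented defaults; `raw: false` is an assumption (its default is not stated).
var DEFAULTS = {
  path: '/goldwasher',
  maxRedirects: 5,
  cors: false,
  raw: false
};

// Hypothetical helper: merge user overrides onto the defaults above.
function buildOptions(overrides) {
  return Object.assign({}, DEFAULTS, overrides || {});
}

// Enable raw output and lower the redirect limit; other values keep their defaults.
var options = buildOptions({ raw: true, maxRedirects: 3 });
console.log(options);
```

An object like this could then be passed as the options when registering the plugin.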
Parameters
url - URL to scrape. Required.
selector - cheerio (jQuery) selector, a selection of target tags. Defaults to the default of goldwasher, usually 'h1, h2, h3, h4, h5, h6, p'.
search - only pick results containing these terms. Matching ignores case and special characters.
limit - limit the number of results.
output - output format (json, xml, atom, rss or, if enabled, raw).
filterTexts - stop texts that should be excluded.
filterKeywords - stop words that should be excluded as keywords.
filterLocale - stop words from an external JSON file (see the goldwasher documentation).
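As a rough illustration of how these parameters might be combined, the sketch below builds a request URL for the endpoint. It assumes the parameters are sent as query-string parameters, that a server is running locally on port 3000, and that the plugin is mounted on the default /goldwasher path. The helper buildScrapeUrl is hypothetical and not part of the plugin.

```javascript
var querystring = require('querystring');

// Assumed base URL: a local server with the plugin mounted on the default path.
var BASE_URL = 'http://localhost:3000/goldwasher';

// Hypothetical helper: build a request URL from the documented parameters.
function buildScrapeUrl(params) {
  // url is the only required parameter.
  if (!params || !params.url) {
    throw new Error('The url parameter is required.');
  }
  // querystring.stringify percent-encodes each value.
  return BASE_URL + '?' + querystring.stringify(params);
}

var requestUrl = buildScrapeUrl({
  url: 'http://example.com',
  selector: 'h1, h2',
  search: 'news',
  limit: 5,
  output: 'xml'
});
console.log(requestUrl);
```

Encoding the values matters here because both the target URL and the selector contain characters (such as :, / and ,) that would otherwise be misread as part of the request URL itself.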
Example
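// Reconstructed sketch: the original call arguments were lost, so the port and
// the registration details below are assumptions. Uses the callback-style hapi API (pre-v17).
var Hapi = require('hapi');
var HapiGoldwasher = require('hapi-goldwasher');
var server = new Hapi.Server();
// Assumed port; use whichever port suits your setup.
server.connection({ port: 3000 });
// Register the plugin (all options are optional), then start the server.
server.register({ register: HapiGoldwasher, options: { path: '/goldwasher' } }, function (err) {
  if (err) { throw err; }
  server.start(function () {
    console.log('Server running at ' + server.info.uri);
  });
});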
Open the server URI in a browser and you will be presented with a JSON response containing documentation. For readability, I recommend using something like the Chrome JSON Formatter.