npm

Ready to take your JavaScript development to the next level? Meet npm Enterprise - the ultimate in enterprise JavaScript.Learn more »

hapi-goldwasher

1.0.4 • Public • Published

hapi-goldwasher

npm version Build Status Coverage Status Code Climate

Dependency Status devDependency Status

A plugin for hapi to run goldwasher as a scraping API on the web. Basically a scraper proxy that will return information in the selected format, defaulting to JSON.

Installation

npm install hapi-goldwasher

If you aren't already running a hapi server, you need to install this too, to run the example:

npm install hapi

Options

When registering the plugin with hapi, you have several options, non of them required:

  • path - the endpoint you mount the plugin on. Defaults to /goldwasher.
  • maxRedirects - the maximum number of redirects the scraper will accept before giving up. Defaults to 5.
  • cors - a CORS object. Defaults to false. See hapi docs for more information.
  • raw - enable raw output mode. This will enable output=raw that will return the raw, scraped result, usually HTML.

Parameters

  • url - url to scrape. Required.
  • selector - cheerio (jQuery) selector, a selection of target tags. Defaults to the default of goldwasher, usually 'h1, h2, h3, h4, h5, h6, p'.
  • search - only pick results containing these terms. Not case or special character sensitive.
  • limit - limit number of results.
  • output - output format (json, xml, atom, rss or - if enabled - raw).
  • filterTexts - stop texts that should be excluded.
  • filterKeywords - stop words that should be excluded as keywords.
  • filterLocale - stop words from external JSON file (see documentation on goldwasher)).

Example

var Hapi = require('hapi');
var HapiGoldwasher = require('./index');
 
var server = new Hapi.Server();
server.connection({ port: 7979 });
 
server.register({
  register: HapiGoldwasher,
  options: {
    path: '/goldwasher',
    cors: {
      origin: ['*']
    }
  }
}, function(err) {
  if (err) {
    throw err;
  }
 
  server.start(function() {
    console.log('Server running at: ' + server.info.uri);
  });
});

Go to the server uri and you will be presented with a JSON response containing documentation. I recommend using something like the Chrome JSON Formatter for readability.

install

npm i hapi-goldwasher

Downloadsweekly downloads

15

version

1.0.4

license

MIT

homepage

github.com

repository

Gitgithub

last publish

collaborators

  • avatar
Report a vulnerability