Data extraction tools.
A set of tools for extracting structured data from the web.
npm i xstruct --save
An example of how easily you can extract, say, comments from the dou.ua forum:
var $ = ;return $;
Returns a promise for the downloaded, cheerio-wrapped HTML. If encoding is specified, the document is converted before it is passed to cheerio. If qs (a query string object) is specified, the query string is appended to the url.
Returns a promise for the downloaded and parsed JSON. If qs (a query string object) is specified, the query string is appended to the url.
Returns a promise for the result of posting a form. Activates cookie persistence.
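To illustrate what "query string will be appended to url" means in practice, here is a minimal sketch using Node's built-in URL class. The function name appendQuery is hypothetical and is not part of xstruct; the library itself presumably delegates this to request.js, whose exact encoding rules may differ from the ones shown here.

```javascript
// Hypothetical helper, not part of xstruct: shows how a qs object
// could be turned into a query string and appended to a url.
function appendQuery(url, qs) {
  const result = new URL(url);
  for (const [key, value] of Object.entries(qs || {})) {
    // Existing query parameters are kept; new ones are added after them.
    result.searchParams.append(key, String(value));
  }
  return result.toString();
}

console.log(appendQuery('https://example.com/search', { q: 'node js', page: 2 }));
```

Note that URLSearchParams encodes a space as "+", which is one of several valid encodings.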
Promised version of the request.js root function.
cheerio(cheerioElement) and returns result synchronously.
Takes text from an object using path and cleans it: trims leading and trailing spaces, collapses repeated spaces and periods, converts the text to a single line if options.singleline is specified, and removes any of the characters listed in options.remove (if specified). Returns null if the result is an empty string or nothing.
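The cleaning rules described above can be sketched roughly as follows. This is an illustration only, not xstruct's actual implementation: the name cleanTextSketch is hypothetical, the step that reads the text from an object via path is left out, and the order of the steps is an assumption.

```javascript
// Hypothetical sketch of the described cleaning rules; not xstruct's code.
function escapeForCharClass(chars) {
  // Escape characters that are special inside a RegExp character class.
  return chars.replace(/[\\\]\[^-]/g, '\\$&');
}

function cleanTextSketch(text, options = {}) {
  if (text === null || text === undefined) return null;
  let result = String(text);
  if (options.remove) {
    // Drop every character listed in options.remove.
    const pattern = new RegExp('[' + escapeForCharClass(options.remove) + ']', 'g');
    result = result.replace(pattern, '');
  }
  if (options.singleline) {
    // Join lines, replacing each line break (and surrounding whitespace) with one space.
    result = result.replace(/\s*[\r\n]+\s*/g, ' ');
  }
  result = result
    .replace(/[ \t]{2,}/g, ' ') // collapse repeated spaces and tabs
    .replace(/\.{2,}/g, '.')    // collapse repeated periods
    .trim();                    // strip leading and trailing whitespace
  return result === '' ? null : result;
}

console.log(cleanTextSketch('  Hello   world...  '));
console.log(cleanTextSketch('$1,200', { remove: '$,' }));
```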
Like cleanText, but casts the result to a number at the end. If the result is not a number, returns null.
Like cleanText, but casts the result to a date at the end (using moment.js). If the result is not a valid date, returns null. You can optionally specify the date-time format via
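As a rough illustration of the "cast to number or return null" rule, here is a minimal sketch. The name toNumberOrNull is hypothetical, and the use of Number() for the cast is an assumption; xstruct may parse numbers differently.

```javascript
// Hypothetical sketch: cast already-cleaned text to a number, or return null.
function toNumberOrNull(text) {
  if (text === null || text === undefined) return null;
  const trimmed = String(text).trim();
  if (trimmed === '') return null; // Number('') would be 0, which is misleading here
  const value = Number(trimmed);
  return Number.isNaN(value) ? null : value;
}

console.log(toNumberOrNull(' 42 '));
console.log(toNumberOrNull('abc'));
```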
Returns the object as is, or null if none of its properties has a value.
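A minimal sketch of this behaviour, under the assumption that "has no value" means null or undefined (the library may also treat other values, such as empty strings, as missing). The name objectOrNull is hypothetical.

```javascript
// Hypothetical sketch: return the object unchanged unless every property is empty.
function objectOrNull(obj) {
  // Assumption: a property "has a value" when it is neither null nor undefined.
  const hasValue = Object.values(obj).some(
    (value) => value !== null && value !== undefined
  );
  return hasValue ? obj : null;
}

console.log(objectOrNull({ title: 'Post', author: null }));
console.log(objectOrNull({ title: null, author: undefined }));
```

This kind of check is useful when scraping lists: rows where nothing could be extracted collapse to null and can be filtered out in one pass.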
Exposes all functions from
Limits the library to at most requests HTTP requests per period, where period is given in milliseconds.
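The idea of "at most requests calls per period" can be sketched with a small sliding-window limiter. This is not how xstruct implements its limit; createLimiter and schedule are hypothetical names, and the sketch assumes requests is at least 1.

```javascript
// Hypothetical sliding-window limiter: starts at most `requests` tasks
// within any window of `period` milliseconds.
function createLimiter(requests, period) {
  const startTimes = []; // start timestamps of tasks inside the current window
  const queue = [];      // tasks waiting for a free slot
  let timer = null;

  function drain() {
    timer = null;
    const now = Date.now();
    // Forget task starts that have fallen out of the window.
    while (startTimes.length && now - startTimes[0] >= period) startTimes.shift();
    // Start as many queued tasks as the window still allows.
    while (queue.length && startTimes.length < requests) {
      const { task, resolve, reject } = queue.shift();
      startTimes.push(now);
      try { resolve(task()); } catch (err) { reject(err); }
    }
    // If tasks are still waiting, wake up when the oldest start expires.
    if (queue.length) {
      timer = setTimeout(drain, period - (now - startTimes[0]));
    }
  }

  // Returns a promise for the task's result; the task may itself return a promise.
  return function schedule(task) {
    return new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      if (timer === null) drain();
    });
  };
}

const schedule = createLimiter(2, 50);
const started = [];
for (let i = 0; i < 3; i++) schedule(() => started.push(i));
console.log(started.length); // only the first two tasks have started so far
```

In real use the task passed to schedule would be a function that issues an HTTP request and returns its promise.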
This library relies heavily on bluebird. It also uses util for additional utilities.