brows
An easy-to-use application for consuming text content from any website on the command line. It uses CSS selectors to retrieve content.
[Demo: basic usage, importing, and groups]
Features
- Sensible defaults
- Saves targets and groups of targets for easy access
- Automatically uses a headless browser if necessary
- Retrieves content from any number of saved targets at a time
- Simple import/export for quickly saving and transferring many targets
- Doesn't make more requests (or open more browser pages) than it needs to
- Conventional environment variables take care of proxy if needed
Installation
npm install -g brows
Usage
brows can either be used with one URL followed by one selector, or any number of saved target names.
brows [options] <url> <selector>
brows [options] <name> [<name> ...]
Options
Option | Alias | Description |
---|---|---|
--save <name> | -s | Save target or group for future use with given name |
--save-only <name> | | Save target or group and exit without retrieving content |
--html | -h | Retrieve outer HTML instead of text content |
--all-matches | -a | Target all matching elements instead of just the first one |
--delim | -d | Set delimiter between results for -a, defaults to newline |
--force-browser | -f | Prevent request attempt and force browser launch |
--list-saved | -l | Print a list of all saved targets and groups |
--import <source> | -i | Import targets and groups from source file |
--export <target> | -e | Export all saved targets and groups to target file |
--ordered-print | -o | Print results in the order their targets were passed |
--verbose | -v | Print information about what is being done |
--yes | -y | Accept confirmation prompts without displaying them |
--help | | Print a detailed explanation of usage and options |
Examples
Basic usage
By default, brows will retrieve the first matching HTML element's text content.
$ brows info.cern.ch/hypertext/WWW/TheProject.html h1
World Wide Web
The --html option can be used to retrieve its outer HTML instead.
$ brows -h info.cern.ch/hypertext/WWW/TheProject.html h1
<h1>World Wide Web</h1>
--all-matches will target all elements matching the selector. By default, results are separated by a newline.
$ brows -a todomvc.com/examples/react 'ul:first-of-type li'
Tutorial
Philosophy
Support
Flux architecture example
Options can be placed anywhere.
$ brows info.cern.ch/hypertext/WWW/TheProject.html h1 -v
# ...
Found h1 in response data
World Wide Web
Saving targets
Targets can be saved with a given name using --save or --save-only. Content type preferences are saved as well.
$ brows --save-only listItems todomvc.com/examples/react 'ul:first-of-type li' -a -d ', '
$ brows -s titleHtml info.cern.ch/hypertext/WWW/TheProject.html h1 -h
<h1>World Wide Web</h1>
This name can then be used in future executions.
$ brows listItems
Tutorial, Philosophy, Support, Flux architecture example
Multiple saved names can be used at a time.
$ brows titleHtml listItems
titleHtml: <h1>World Wide Web</h1>
listItems: Tutorial, Philosophy, Support, Flux architecture example
Saving groups
Multiple saved targets can also be grouped under a different name.
$ brows 'google.com/search?q=weather' '#wob_ttm' --save-only temperature
$ brows 'google.com/search?q=weather' '#wob_pp' --save-only precipitation
$ brows temperature precipitation --save-only weather
$ brows weather
temperature: 28
precipitation: 64%
It's generally much faster to retrieve all desired content together rather than performing a separate run for each target.
Further grouping saved targets (and groups of targets) makes this easy to do for content you expect to retrieve frequently.
$ brows --save-only latestKurzgesagt 'youtube.com/user/Kurzgesagt/videos?sort=dd' '#video-title'
$ brows --save-only availability https://amazon.com/How-Absurd-Scientific-Real-World-Problems/dp/0525537090 '#availability span'
$ brows --save-only examples weather availability latestKurzgesagt titleHtml listItems
Results are printed as they are retrieved by default.
$ brows examples
titleHtml: <h1>World Wide Web</h1>
listItems: Tutorial, Philosophy, Support, Flux architecture example
temperature: 28
precipitation: 64%
latestKurzgesagt: Why Are You Alive – Life, Energy & ATP
availability: Temporarily out of stock.
Importing and exporting
--import and --export use a relative or absolute path.
$ brows -i /absolute/path/to/example.yaml
$ brows -e readme_examples.yml
A default file name will be used if the provided path is a directory.
$ brows -e .
$ ls
brows_exports.yml
brows will prompt for confirmation before overwriting anything by default.
$ brows -i .
8 names match existing ones and would be overwritten: availability, precipitation, temperature, titleHtml, listItems, latestKurzgesagt, examples, weather
Import anyway? Y/N:
Overriding defaults
--yes will accept any confirmation prompts which would have otherwise been displayed.
$ brows -i . -y
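The overwrite check and the effect of --yes can be pictured as a name intersection followed by an optional prompt. The Python sketch below is only an illustration of that idea: `find_conflicts`, `should_import`, and the `ask` callback are hypothetical names, and nothing here is taken from how brows implements it.

```python
def find_conflicts(existing_names, imported_names):
    """Return the imported names that already exist, in import order."""
    existing = set(existing_names)
    return [name for name in imported_names if name in existing]


def should_import(conflicts, assume_yes, ask):
    """Decide whether to proceed with an import.

    `ask` is a callback that shows a prompt and returns True or False.
    It is only called when there are conflicts and `assume_yes` is False,
    which is the role --yes plays in the examples above.
    """
    if not conflicts or assume_yes:
        return True
    return ask(conflicts)


saved = ["weather", "temperature", "precipitation"]
incoming = ["temperature", "humidity", "weather"]
conflicts = find_conflicts(saved, incoming)

# With assume_yes=True the prompt callback is never consulted.
print(conflicts, should_import(conflicts, assume_yes=True, ask=lambda c: False))
```

Keeping the prompt behind a callback makes the decision easy to exercise without a terminal.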
--delim can be used to specify a different delimiter than the default newline for --all-matches.
$ brows -a -d ', ' todomvc.com/examples/react 'ul:first-of-type li'
Tutorial, Philosophy, Support, Flux architecture example
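As a rough mental model, the interaction between --all-matches and --delim amounts to "first match, or every match joined by the delimiter". The Python sketch below illustrates that model; `format_matches` is a hypothetical helper, and returning an empty string when nothing matches is an assumption rather than documented brows behavior.

```python
def format_matches(matches, all_matches=False, delim="\n"):
    """Return the first match, or every match joined by `delim`."""
    if not matches:
        return ""  # assumption: what brows prints for no match is not specified here
    if all_matches:
        return delim.join(matches)
    return matches[0]


items = ["Tutorial", "Philosophy", "Support", "Flux architecture example"]
print(format_matches(items))                                # first match only
print(format_matches(items, all_matches=True, delim=", "))  # like -a -d ', '
```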
The --ordered-print option can be used to wait for all results to be ready and print them in the order their targets were passed instead of printing each result as it's retrieved.
$ brows examples -o
temperature: 28
precipitation: 64%
availability: Temporarily out of stock.
latestKurzgesagt: Why Are You Alive – Life, Energy & ATP
titleHtml: <h1>World Wide Web</h1>
listItems: Tutorial, Philosophy, Support, Flux architecture example
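The difference between the default output and --ordered-print is the difference between consuming results in completion order and in argument order. The Python sketch below shows that distinction with a thread pool; `fake_retrieve` and `run` are hypothetical, the sleeps stand in for network latency, and the use of threads is an assumption made for illustration, not a description of brows internals.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_retrieve(name, delay):
    """Stand-in for retrieving one target; sleeps to simulate latency."""
    time.sleep(delay)
    return f"{name}: done"


def run(targets, ordered):
    """Collect result lines in argument order or in completion order."""
    lines = []
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = [pool.submit(fake_retrieve, name, delay) for name, delay in targets]
        # Iterating the list waits on each future in the order it was passed;
        # as_completed yields each future as soon as it finishes.
        pending = futures if ordered else as_completed(futures)
        for future in pending:
            lines.append(future.result())
    return lines


targets = [("slow", 0.3), ("fast", 0.01)]
print(run(targets, ordered=False))  # whichever target finishes first comes first
print(run(targets, ordered=True))   # always the order the targets were passed
```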
Browser requirements are handled automatically for the vast majority of use cases. The --force-browser option will override this.
$ brows my-single-page-app.com html -h --force-browser > spa.html
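The request-first, browser-second behavior (described further under Additional Details) can be sketched as a small fallback function. The Python below is illustrative only: `retrieve` and its three callbacks are hypothetical, the toy matcher handles nothing beyond a bare tag name, and the stubbed "browser" simply returns a placeholder string.

```python
import re


def retrieve(url, selector, fetch_html, find_in_html, render_in_browser,
             force_browser=False):
    """Return (content, used_browser).

    Tries a plain request first unless `force_browser` is set, and falls
    back to the browser when the selector is not found in the response.
    """
    if not force_browser:
        content = find_in_html(fetch_html(url), selector)
        if content is not None:
            return content, False
    return render_in_browser(url, selector), True


# Stubs so the sketch runs without network access or a real browser.
pages = {"http://static.example": "<h1>Hello</h1>"}


def fetch_html(url):
    return pages.get(url, "")


def find_in_html(html, selector):
    # Toy matcher: only understands a bare tag name such as "h1".
    tag = re.escape(selector)
    match = re.search(rf"<{tag}>(.*?)</{tag}>", html)
    return match.group(1) if match else None


def render_in_browser(url, selector):
    return f"rendered {selector}"


print(retrieve("http://static.example", "h1", fetch_html, find_in_html, render_in_browser))
print(retrieve("http://spa.example", "h1", fetch_html, find_in_html, render_in_browser))
```

The second return value indicates whether the browser was needed, which is the kind of information a saved target could record so that later runs skip the request.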
Import/Export Format
The import/export format is based around creating, editing, and transferring any number of targets and groups as easily as possible:
- Uses the easy-to-read, quick-to-type YAML format by default.
- Targets are listed under their URLs.
- Defaults don't need to be entered.
- If no other options are being entered, each target name can be directly mapped to its corresponding selector.
- As in the command line, http:// is automatically prepended to the URL if it doesn't begin with http:// or https://.
- Groups can be entered as arrays of target names in any valid YAML format.
- You don't need to specify whether a browser is needed except for niche use cases.
Targets:
  example.com:
    myHeader: h1
    mySpan: div span.my-span
  example2.com:
    myAnchors:
      selector: a
      contentType: outerHTML
      allMatches: true
Groups:
  myGroup: [myHeader, mySpan]
  anotherGroup: [mySpan, myAnchors]
is effectively the same as:
Targets:
  http://example.com:
    myHeader:
      selector: h1
      contentType: textContent
      forceBrowser: false
      allMatches: false
    mySpan:
      selector: div span.my-span
      contentType: textContent
      forceBrowser: false
      allMatches: false
  http://example2.com:
    myAnchors:
      selector: a
      contentType: outerHTML
      forceBrowser: false
      allMatches: true
      delim: "\n"
Groups:
  myGroup:
    - myHeader
    - mySpan
  anotherGroup:
    - mySpan
    - myAnchors
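The equivalence between the short and the fully spelled-out form can be expressed as a normalization step. The Python sketch below works on already-parsed data (plain dicts, as a YAML or JSON parser would produce) and uses the default values visible in the two examples above. The function names are hypothetical, and adding delim only when allMatches is true is an assumption read off the example, not a documented rule.

```python
DEFAULTS = {"contentType": "textContent", "forceBrowser": False, "allMatches": False}


def normalize_url(url):
    """Prepend http:// unless the URL already starts with http:// or https://."""
    if url.startswith(("http://", "https://")):
        return url
    return "http://" + url


def normalize_target(value):
    """Expand `name: selector` shorthand and fill in defaults."""
    options = {"selector": value} if isinstance(value, str) else dict(value)
    full = {**DEFAULTS, **options}
    if full["allMatches"]:
        full.setdefault("delim", "\n")  # assumption: delim only matters with allMatches
    return full


def normalize_import(data):
    """Return the fully spelled-out form of parsed import data."""
    targets = {
        normalize_url(url): {name: normalize_target(v) for name, v in named.items()}
        for url, named in data.get("Targets", {}).items()
    }
    groups = {name: list(members) for name, members in data.get("Groups", {}).items()}
    return {"Targets": targets, "Groups": groups}


short = {
    "Targets": {
        "example.com": {"myHeader": "h1", "mySpan": "div span.my-span"},
        "example2.com": {
            "myAnchors": {"selector": "a", "contentType": "outerHTML", "allMatches": True}
        },
    },
    "Groups": {"myGroup": ["myHeader", "mySpan"], "anotherGroup": ["mySpan", "myAnchors"]},
}
print(normalize_import(short)["Targets"]["http://example.com"]["myHeader"])
```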
Targets and groups saved in the above examples are exported as:
Targets:
  google.com/search?q=weather:
    precipitation:
      forceBrowser: true
      selector: '#wob_pp'
    temperature:
      forceBrowser: true
      selector: '#wob_ttm'
  https://amazon.com/How-Absurd-Scientific-Real-World-Problems/dp/0525537090:
    availability: '#availability span'
  info.cern.ch/hypertext/WWW/TheProject.html:
    titleHtml:
      contentType: outerHTML
      selector: h1
  todomvc.com/examples/react:
    listItems:
      allMatches: true
      delim: ', '
      forceBrowser: true
      selector: ul:first-of-type li
  youtube.com/user/Kurzgesagt/videos?sort=dd:
    latestKurzgesagt:
      forceBrowser: true
      selector: '#video-title'
Groups:
  examples:
    - temperature
    - precipitation
    - availability
    - latestKurzgesagt
    - titleHtml
    - listItems
  weather:
    - temperature
    - precipitation
Additional Details
- By default, brows will initially make a GET request to the URL and attempt to find the selector in the response HTML. If this fails, a headless browser will be used instead.
- If a saved target isn't found in the response data on the first attempt, it will be automatically updated to skip the unnecessary request in the future and directly launch the browser.
- When multiple saved names are passed, brows will only make a request (and/or navigate a browser page) to each URL once. All targets in the same URL will be retrieved from the same response data and/or browser page.
- Saving multiple targets with a new name will create a group. Groups are essentially just aliases which expand to their member targets in the order they were passed when saving.
- When saving or retrieving content from multiple overlapping groups, each individual target is only used once. No duplicates will be retrieved or saved under the new combined group.
- Conventional HTTP_PROXY/HTTPS_PROXY/NO_PROXY environment variables will be used if they exist.
- Importing JSON files with the same structure as the YAML examples above is also supported without any additional configuration. Just pass a JSON file instead.