Mercury Parser - Extracting content from chaos
The Mercury Parser extracts the bits that humans care about from any URL you give it. That includes article content, titles, authors, published dates, excerpts, lead images, and more.
Mercury Parser powers the Mercury AMP Converter and Mercury Reader, a Chrome extension that removes ads and distractions, leaving only text and images for a beautiful reading view on any site.
Mercury Parser allows you to easily create custom parsers using simple JavaScript and CSS selectors. This allows you to proactively manage parsing and migration edge cases. There are many examples available along with documentation.
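As a rough illustration of what such a custom parser can look like, here is a minimal sketch of an extractor object. The domain `www.example.com` and every CSS selector in it are hypothetical placeholders, and the exact set of supported keys should be checked against the project's custom-extractor documentation.

```javascript
// Hypothetical custom extractor for an imaginary site.
// The domain and all selectors are placeholders, not taken from a real parser.
const exampleExtractor = {
  // Hostname this extractor applies to (assumed).
  domain: 'www.example.com',

  // Each field lists CSS selectors to try in order.
  title: {
    selectors: ['h1.article-title'],
  },
  author: {
    selectors: ['.byline .author-name'],
  },
  content: {
    selectors: ['div.article-body'],
    // Elements matching these selectors are removed from the content.
    clean: ['.advertisement', '.related-links'],
  },
};
```

Keeping the extractor as plain data (a domain plus selector lists) is what makes site-specific edge cases manageable: fixing a broken site usually means adjusting a selector rather than writing new parsing logic.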
How? Like this.
Installation
# If you're using yarn
yarn add @postlight/mercury-parser

# If you're using npm
npm install @postlight/mercury-parser
Usage
import Mercury from '@postlight/mercury-parser';

Mercury.parse(url).then(result => console.log(result));

// NOTE: When used in the browser, you can omit the URL argument
// and simply run `Mercury.parse()` to parse the current page.
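As a slightly fuller illustration of the call described here, the following is a minimal sketch for Node.js. It assumes `@postlight/mercury-parser` is installed and that `Mercury.parse(url)` returns a promise for the result object. The `parseArticle` wrapper and its optional `parser` parameter are hypothetical conveniences for illustration and testing, not part of the library.

```javascript
// Hypothetical wrapper around Mercury.parse(url).
// `parser` is injectable so the function can be exercised without network
// access; when omitted, the real package is loaded lazily.
async function parseArticle(url, parser) {
  if (typeof url !== 'string' || url.length === 0) {
    throw new TypeError('parseArticle expects a non-empty URL string');
  }
  const mercury = parser || require('@postlight/mercury-parser');
  // Mercury.parse returns a promise for the extracted fields.
  return mercury.parse(url);
}

// Example (requires the package and network access):
// parseArticle('https://postlight.com/trackchanges/mercury-goes-open-source')
//   .then(result => console.log(result))
//   .catch(err => console.error('Parsing failed:', err));
```

Attaching a `.catch` handler, as in the commented example, keeps network or parsing failures from surfacing as unhandled promise rejections.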
The result is an object containing the extracted fields. If Mercury is unable to find a field, that field will return `null`.
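Because missing fields come back as null, callers usually want a fallback before displaying them. The sketch below shows one way to do that. The field names `title`, `author`, and `date_published` are assumptions based on the kinds of fields this README lists, the sample object is made up for illustration, and `withDefaults` is a hypothetical helper rather than part of Mercury.

```javascript
// Hypothetical helper: return a copy of `result` in which null (or undefined)
// fields are replaced by the supplied fallbacks.
function withDefaults(result, defaults) {
  const filled = { ...result };
  for (const [key, fallback] of Object.entries(defaults)) {
    // `== null` matches both null and undefined.
    if (filled[key] == null) {
      filled[key] = fallback;
    }
  }
  return filled;
}

// A made-up result with two fields that could not be found.
const sample = { title: 'An article', author: null, date_published: null };

const display = withDefaults(sample, {
  author: 'Unknown author',
  date_published: 'Undated',
});

console.log(display.author); // prints "Unknown author"
```

The helper copies the result instead of mutating it, so the original null values remain available to code that needs to distinguish "not found" from a real value.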
Mercury Parser also ships with a CLI, meaning you can use the Mercury Parser from your command line like so:
# Install Mercury globally
yarn global add @postlight/mercury-parser
# or
npm -g install @postlight/mercury-parser

# Then
mercury-parser https://postlight.com/trackchanges/mercury-goes-open-source
License
Licensed under either of the below, at your preference:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
Contributing
For details on how to contribute to Mercury, including how to write a custom content extractor for any site, see CONTRIBUTING.md.
Unless it is explicitly stated otherwise, any contribution intentionally submitted for inclusion in the work, as defined in the Apache-2.0 license, shall be dual licensed as above without any additional terms or conditions.