HTML/XML parser and web scraper for Node.js.


  • Uses native libxml C bindings

  • Clean promise-like interface

  • Supports CSS 3.0 and XPath 1.0 selector hybrids

  • Sizzle selectors, Slick selectors, and more

  • No large dependencies like jQuery, cheerio, or jsdom

  • Compose deep and complex data structures

  • HTML parser features

    • Fast parsing
    • Very fast searching
    • Small memory footprint
  • HTML DOM features

    • Load and search ajax content
    • DOM interaction and events
    • Execute embedded and remote scripts
    • Execute code in the DOM
  • HTTP request features

    • Logs URLs, redirects, and errors
    • Cookie jar and custom cookies/headers/user agent
    • Login/form submission, session cookies, and basic auth
    • Single or multiple proxies, with handling of proxy failure
    • Retries and redirect limits


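var osmosis = require('osmosis');

osmosis
.get('www.craigslist.org/about/sites') // assumed start URL; missing from the source
.find('h1 + div a')
.follow('@href') // follow steps are assumed; the source omits them
.find('header + div + div li > a')
.follow('@href')
.paginate('.totallink +') // selector is truncated in the source
.find('p > a')
.follow('@href')
.set({
    'title':        'section > h2',
    'description':  '#postingbody',
    'subcategory':  'div.breadbox > span[4]',
    'date':         'time@datetime',
    'latitude':     '#map@data-latitude',
    'longitude':    '#map@data-longitude',
    'images':       ['img@src']
})
.data(function(listing) {
    // do something with listing data
});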
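
Several selector strings in the example above combine a CSS selector with an `@attribute` suffix, such as `'time@datetime'` or `'img@src'`. As a rough illustration of that notation only, the sketch below splits such a string into its element part and its attribute part. `splitSelector` is a hypothetical helper written for this explanation; it is not part of osmosis and makes no claim about how the library parses selectors internally.

```javascript
// Hypothetical helper, not part of osmosis.
// Splits "selector@attribute" at the last "@" into its two parts.
// Assumption: the selector itself contains no "@" after the attribute
// marker (for example inside a quoted attribute value).
function splitSelector(input) {
    var at = input.lastIndexOf('@');
    if (at === -1) {
        // No attribute suffix: the whole string is an element selector.
        return { selector: input, attribute: null };
    }
    return {
        // An empty selector part (as in "@href") is reported as null,
        // meaning "the current element".
        selector: input.slice(0, at).trim() || null,
        attribute: input.slice(at + 1)
    };
}

console.log(splitSelector('time@datetime'));
console.log(splitSelector('#postingbody'));
console.log(splitSelector('@href'));
```

Read this way, `'time@datetime'` means "the `datetime` attribute of the `time` element", while a bare `'@href'` refers to an attribute of whatever element is currently selected.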


