sax-super-stream

2.0.0 • Public • Published

A transform stream that converts XML into objects by applying a hierarchy of element parsers. It is built on the sax parser, which lets it process large XML files in a memory-efficient manner. It is also flexible: by configuring element parsers only for the elements you need to extract data from, you avoid creating an intermediate representation of the entire XML structure.

Install

$ npm install --save sax-super-stream

Usage

The example below prints the titles of the articles in an RSS feed.

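// Assumption: the package exposes its stream factory as the default export
import stream from 'sax-super-stream';

const PARSERS = {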
  'rss': {
    'channel': {
      'item': {
        $: stream.object,
        'title': {
          $text(text, o) { o.title = text; }
        }
      }
    }
  }
};

const res = await fetch('http://blog.npmjs.org/rss');
const rssStream = res.body
  .pipeThrough(new TextDecoderStream())
  .pipeThrough(stream(PARSERS));

for await (const item of rssStream) {
  console.log('title: %s', item.title);
}

More examples can be found in Furkot GPX and KML importers.

API

stream(parserConfig[, options])

Creates a transform stream that reads XML and writes objects.

  • parserConfig - hierarchical configuration of element parsers; the nesting of entries corresponds to the XML element tree, and each value describes the action performed when that element is encountered during XML parsing

  • options - optional set of options passed to the sax parser; the defaults are as follows

    • trim - true
    • normalize - true
    • lowercase - false
    • xmlns - true
    • position - false
    • strictEntities - true
    • noscript - true
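
As a minimal sketch of how these defaults might combine with caller-supplied options, the snippet below overlays a partial options object on the defaults listed above. The names SAX_DEFAULTS and resolveOptions are hypothetical, and the merge strategy is an assumption, not a description of the library's internals.

```javascript
// Defaults as listed in this README
const SAX_DEFAULTS = {
  trim: true,
  normalize: true,
  lowercase: false,
  xmlns: true,
  position: false,
  strictEntities: true,
  noscript: true
};

// Hypothetical helper: caller options win, everything else keeps its default
function resolveOptions(options = {}) {
  return { ...SAX_DEFAULTS, ...options };
}

const effective = resolveOptions({ position: true });
console.log(effective.position, effective.trim);
```

Under that assumption, overriding one option leaves the remaining defaults untouched.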

parserConfig

parserConfig is a hierarchical object whose values are either parse functions or other parserConfig objects.

parse function - function(xmlnode, object, context)

  • xmlnode - sax node with attributes
  • object - contains reference to the currently constructed object if any
  • context - shared with all parser functions; it can be used to store intermediate data

this is bound to the current parsed object stack.
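
A parse function with this signature could look like the sketch below. The element name trkpt, its lat and lon attributes, and the helper attr are illustrative assumptions. The helper accepts both forms in which sax reports attributes: plain strings, or objects with a value field when xmlns is enabled.

```javascript
// Reads an attribute whether sax reports it as a plain string
// or (with xmlns enabled) as an object carrying a `value` field.
function attr(xmlnode, name) {
  const a = xmlnode.attributes[name];
  return typeof a === 'object' && a !== null ? a.value : a;
}

// Hypothetical parse function for an element such as <trkpt lat="50.1" lon="19.9">
function parsePoint(xmlnode, object, context) {
  object.lat = Number(attr(xmlnode, 'lat'));
  object.lon = Number(attr(xmlnode, 'lon'));
  // context is free-form scratch space; here it counts the points seen so far
  context.pointCount = (context.pointCount || 0) + 1;
}

// Exercise the function with a hand-built node instead of a real sax event
const point = {};
const context = {};
const node = {
  name: 'trkpt',
  attributes: { lat: { value: '50.1' }, lon: { value: '19.9' } }
};
parsePoint(node, point, context);
console.log(point, context);
```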

parse config reference - object

Each property of the object represents a direct child element of the parsed node in the XML hierarchy; the special $ property is a self reference.

'item': parseItemFunction

is the same as:

'item': {
  '$': parseItemFunction
}
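
One way to picture this equivalence is a helper that rewrites the shorthand into the long form. This is only an illustration with a hypothetical name (normalizeEntry), not the library's code.

```javascript
// Hypothetical helper: wrap a bare parse function so both forms look alike
function normalizeEntry(entry) {
  return typeof entry === 'function' ? { $: entry } : entry;
}

// Placeholder parse function used only for the comparison
function parseItemFunction(xmlnode, object, context) {}

const shortForm = normalizeEntry(parseItemFunction);
const longForm = normalizeEntry({ $: parseItemFunction });

// Both forms end up pointing at the same function
console.log(shortForm.$ === longForm.$); // true
```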

special values

  • $after - function(object, context) - called when the element's closing tag is reached, after the element content has been parsed
  • $text - function(text, object, context) - called when element text content is encountered
  • $uri - string - if specified, the element is handled only when its namespace matches; otherwise the element is ignored. If $uri is not specified, namespaces are not checked
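
The special values can be combined in a single config. The sketch below assumes a GPX-like document; the element names and the tracks counter are illustrative, and the handlers are called by hand at the end only to show what each one receives.

```javascript
// GPX 1.1 namespace; elements under gpx in another namespace would be ignored
const GPX_NS = 'http://www.topografix.com/GPX/1/1';

const GPX_PARSERS = {
  gpx: {
    $uri: GPX_NS,
    trk: {
      name: {
        // receives the text content of <name>
        $text(text, object) { object.name = text; }
      },
      // runs once </trk> is reached and its content has been parsed
      $after(object, context) {
        context.tracks = (context.tracks || 0) + 1;
      }
    }
  }
};

// Manual invocation, standing in for what the stream would do while parsing
const track = {};
const ctx = {};
GPX_PARSERS.gpx.trk.name.$text('Morning ride', track, ctx);
GPX_PARSERS.gpx.trk.$after(track, ctx);
console.log(track, ctx);
```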

predefined parsers

There are several predefined parser functions that can be used in parser config:

  • object(name) - creates a new object and optionally assigns it to parent's name property
  • collection(name) - creates a new Array and optionally assigns it to parent's name property
  • appendToCollection(name) - create a new object and append to Array stored in parent's name property, create a new Array if it does not exist yet
  • assignTo(name) - assign value to the parent's property name

License

MIT © Damian Krzeminski
