tagstream wraps node-expat in a through-stream so you can perform XML parsing as part of a pipeline using event-stream's various fun operators, e.g. map and join, and your own through-streams.

My use case was accepting terabytes of XML over fast networks, extracting objects, filtering them, putting them back into a text based format, then feeding them to clients over slow networks. Proper backpressure handling was compulsory.

  • cheerio can traverse your XML and hand you objects to deal with if your XML fits in memory.

  • sax's createStream lets you stream XML text in and streams the same XML text back out. Along the way it emits, as events, what tagstream would emit as data. If you adapt sax to do what tagstream does and publish the adaptation code as a library, the library will be pretty much exactly tagstream, but with sax instead of node-expat.

  • JSONStream is somewhat along these lines, but its input is JSON and its output is, sensibly, objects parsed out of the JSON stream. tagstream's input is XML and its output is XML parser events.

Drop a tagstream between your XML text and whatever needs to handle the events the XML parser emits.

var request = require('request'),
    es = require('event-stream'),
    tagstream = require('tagstream');

  // through-streams transforming events into whatever...

See test/demo.js for one example, which extracts the text from between XML tags. If you replace input with a fast stream and output with a slow stream, you can feed in 1TB of XML and watch the output stream control how fast the input stream is read.
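The mechanism behind that behaviour can be sketched with nothing but Node's built-in stream module. This is an illustration of the general backpressure signal, not tagstream's own code; slowSink and writeUntilFull are hypothetical names made up for this example. A writable's write() returns false once its internal buffer reaches highWaterMark, and a well-behaved producer (or pipe()) stops reading until 'drain' fires.

```javascript
const { Writable } = require('stream');

// Hypothetical slow consumer: object mode, buffers at most `limit` items,
// and takes `delayMs` milliseconds to finish each write.
function slowSink(limit, delayMs) {
  return new Writable({
    objectMode: true,
    highWaterMark: limit,
    write(chunk, _encoding, callback) {
      setTimeout(callback, delayMs);
    }
  });
}

// Hypothetical producer step: hand items to the sink until it reports
// backpressure (write() returns false) or the items run out.
// Returns how many items were handed over.
function writeUntilFull(sink, items) {
  let written = 0;
  for (const item of items) {
    written += 1;
    if (!sink.write(item)) break; // the sink asks us to stop here
  }
  return written;
}
```

With a limit of 3, the third write() is the one that returns false, so a fast input is throttled after three buffered items no matter how much data is waiting upstream.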

Each datum carried by the data event out of the tag stream is an object. Depending on the XML event, it'll be one of these:

  • { what: "start", tag: tagName, attrs: { ... } }
  • { what: "end", tag: tagName }
  • { what: "text", text: text }
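As a rough sketch of consuming these events, the following through-stream collects the text found inside a given tag. It assumes only the three event shapes listed above and uses Node's built-in stream.Transform rather than event-stream; createTextCollector and textInside are hypothetical helpers invented for this example, not part of tagstream.

```javascript
const { Transform } = require('stream');

// Hypothetical helper: a small state machine over tagstream-style events.
// The returned function takes one event and returns the collected text when
// the outermost <tagName> element closes, or null otherwise.
function createTextCollector(tagName) {
  let depth = 0;   // number of currently open <tagName> elements
  let buffer = ''; // text gathered since the outermost one opened
  return function feed(event) {
    if (event.what === 'start' && event.tag === tagName) {
      depth += 1;
    } else if (event.what === 'text' && depth > 0) {
      buffer += event.text;
    } else if (event.what === 'end' && event.tag === tagName && depth > 0) {
      depth -= 1;
      if (depth === 0) {
        const text = buffer;
        buffer = '';
        return text;
      }
    }
    return null;
  };
}

// Wrap the collector in an object-mode Transform so it can sit in a pipeline
// downstream of a stream that emits the events described above.
function textInside(tagName) {
  const feed = createTextCollector(tagName);
  return new Transform({
    objectMode: true,
    transform(event, _encoding, callback) {
      const text = feed(event);
      if (text !== null) this.push(text);
      callback();
    }
  });
}
```

Keeping the state machine separate from the stream wrapper makes it easy to exercise with a plain array of events before wiring it into a pipeline.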

Run npm install to ensure tap is present, then run npm test.