taglex

0.1.10 • Public • Published

TagLex

TagLex is a library containing streaming lexers and parsers for processing custom mark-up languages. It makes writing a parser for a Markdown-like language very easy, and it also helps with writing parsers for strict data formats.

Installation

npm install taglex

Usage

Check out the Examples for a full Markdown-esque language, or see below for mini-language examples.

Simple example

var taglex = require('taglex');
 
// Case-insensitive matching lets "I:" in the input match the 'i:' alias.
var ruleset = new taglex.TagRuleSet({ ignore_case: true });
ruleset.add_tag({
    name: 'italic',
    open: '*',
    close: '*',
    parents: ['root'],
    aliases: [['_', '_'], ['i:', ':i']],
    payload: {start: '<i>', finish: '</i>'}
});
 
var parser = ruleset.new_parser();
// The payload passed to add_tag is handed back on open and close events.
// process.stdout.write replaces sys.print, which current Node.js no longer provides.
parser.on('tag_open', function (payload, token) {
    process.stdout.write(payload.start);
});
 
parser.on('text_node', function (text) {
    process.stdout.write(text.replace(/</g, "&lt;")); // escape
});
 
parser.on('tag_close', function (payload, token) {
    process.stdout.write(payload.finish);
});
 
parser.write("This is an *example* of a small I:regular");
parser.write(" language:I");
 
// Would output:
// This is an <i>example</i> of a small <i>regular language</i>
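
If the rendered output is needed as a string rather than printed, the same three handlers can push into a buffer. The following is a minimal sketch, not part of TagLex: collectHtml is a hypothetical helper, and it assumes only that the parser exposes .on() with the tag_open, text_node and tag_close events shown above. A plain Node.js EventEmitter stands in for the parser so that the snippet runs on its own.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper: attach the three handlers to any parser-like object
// exposing .on(), and return a function that yields the HTML collected so far.
function collectHtml(parser) {
    var chunks = [];
    parser.on('tag_open', function (payload) {
        chunks.push(payload.start);
    });
    parser.on('text_node', function (text) {
        // Escape & first so the entities produced for < are not re-escaped.
        chunks.push(text.replace(/&/g, '&amp;').replace(/</g, '&lt;'));
    });
    parser.on('tag_close', function (payload) {
        chunks.push(payload.finish);
    });
    return function () { return chunks.join(''); };
}

// Stand-in for a TagLex parser (an assumption, for demonstration only).
var fake = new EventEmitter();
var html = collectHtml(fake);
var italic = {start: '<i>', finish: '</i>'};
fake.emit('text_node', 'a < b, ');
fake.emit('tag_open', italic);
fake.emit('text_node', 'c');
fake.emit('tag_close', italic);
console.log(html()); // a &lt; b, <i>c</i>
```

With a real parser, the call would presumably be collectHtml(ruleset.new_parser()), followed by the parser.write calls from the example.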

Tag hierarchy example

TagLex can also parse languages that need a context-free grammar, such as tags that nest inside themselves:

var ruleset = new taglex.TagRuleSet();
ruleset.add_tag({
    name: 'table',
    open: '{{{', close: '}}}',
    ignore_text: true,
    parents: ['root'],
    payload: {start: '<table>', finish: '</table>'}
});
 
ruleset.add_tag({
    name: 'row',
    ignore_text: true,
    open: '[', close: ']',
    parents: ['table'],
    payload: {start: '<tr>', finish: '</tr>'}
});
 
ruleset.add_tag({
    name: 'cell',
    open: '[', close: ']',
    parents: ['row'],
    payload: {start: '<td>', finish: '</td>'}
});
 
// to make it context-free, a tag that can contain itself:
ruleset.add_tag({
    name: 'i',
    open: '[', close: ']',
    parents: ['i', 'cell'],
    payload: {start: '<i>', finish: '</i>'}
});
 
/* [... parser set up as before ...] */
 
parser.write("Outside the table I can freely use [] characters.\n");
parser.write("Here is a table example:\n{{{ (ignored text)");
parser.write("[ [ cell 1 ] [[[ cell 2 ]]] [ 3 ] ]\n");
parser.write("[ [ cell 4 ] [ cell 5 ] [ 6 ] ]");
parser.write("}}}");
 
// Would output (wrapped):
// Outside the table I can freely use [] characters.
// Here is a table example:
// <table><tr><td> cell 1 </td><td><i><i> cell 2 </i></i></td><td> 3 </td></tr>
// <tr><td> cell 4 </td><td> cell 5 </td><td> 6 </td></tr></table>

Speed

I haven't benchmarked it or analyzed its complexity carefully, but here is a broad idea of what to expect:

  • Compile step is at least O(n^2) both for memory and CPU, with n = number of tags.

  • Render step should be very fast, as it relies on searching the input string with a single regular expression (per context). The slowest feature is the "stack collapse" feature.
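
To illustrate the "single regular expression per context" idea, here is a rough sketch of how the tokens valid in one context could be compiled into one alternation and used to scan input in a single pass. This is not TagLex's actual implementation; escapeRegExp, compileContext and scan are hypothetical names chosen for this example.

```javascript
// Escape characters that are special inside a regular expression.
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one alternation regex from all tokens valid in a context.
// Longer tokens go first so '{{{' would win over a shorter '{'.
function compileContext(tokens) {
    if (tokens.length === 0) {
        throw new Error('a context needs at least one token');
    }
    var sorted = tokens.slice().sort(function (a, b) {
        return b.length - a.length;
    });
    return new RegExp(sorted.map(escapeRegExp).join('|'), 'g');
}

// Split the input into text pieces and token pieces with a single scan.
function scan(input, regex) {
    var out = [], last = 0, m;
    regex.lastIndex = 0;
    while ((m = regex.exec(input)) !== null) {
        if (m.index > last) out.push({text: input.slice(last, m.index)});
        out.push({token: m[0]});
        last = regex.lastIndex;
    }
    if (last < input.length) out.push({text: input.slice(last)});
    return out;
}

var rootContext = compileContext(['*', '_', '{{{']);
var pieces = scan('a *b* {{{', rootContext);
console.log(pieces);
```

A design along these lines would need one such regex for every context, which is one plausible reason the compile step grows quickly with the number of tags while the scan itself stays cheap.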

Anti-features

  • TagRuleSet aliases are counter-intuitive. Presently, they can be mixed and matched: a tag opened with one alias can be closed with another. Assume that this will change in the future, so that a tag opened with one alias can only be closed with that same alias.

  • The "stack collapse" feature (enabled with the add_tag option "force_close") sometimes splits TEXT_NODE emissions. This is typically a harmless bug. The feature in general is needlessly complex and could use a rewrite.

  • Poor documentation: TagLex documentation could use a lot of work. In the meantime, check out examples and tests.js to see many more examples of what you can do.

  • A very large number of heavily interacting tags (e.g. where tag nesting is a complete graph and sloppy tag closes apply everywhere, as in a fault-tolerant HTML parser) can mean a slow compile step and an unnecessarily large memory footprint (lots of n^2 operations).
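
The difference between today's mix-and-match alias behaviour and the stricter behaviour anticipated in the first item can be sketched as follows. This is an illustration only, not TagLex code: pairs, pairIndex and closes are hypothetical names, and the delimiter pairs are copied from the italic tag in the simple example.

```javascript
// Delimiter pairs for one tag: the main open/close plus its aliases.
var pairs = [['*', '*'], ['_', '_'], ['i:', ':i']];

// Index of the pair whose opener (side 0) or closer (side 1) equals the
// token, or -1 if the token belongs to no pair.
function pairIndex(token, side) {
    for (var i = 0; i < pairs.length; i++) {
        if (pairs[i][side] === token) return i;
    }
    return -1;
}

// Lenient mode: any closer of the tag closes it (mix and match).
// Strict mode: only the closer from the same pair as the opener is accepted.
function closes(openToken, closeToken, strict) {
    var opened = pairIndex(openToken, 0);
    var closed = pairIndex(closeToken, 1);
    if (opened === -1 || closed === -1) return false;
    return strict ? opened === closed : true;
}

console.log(closes('*', '_', false)); // true
console.log(closes('*', '_', true));  // false
console.log(closes('i:', ':i', true)); // true
```

Input that relies on mixed aliases, such as "*text_", would stop closing under the strict rule, so it is safest to write matching pairs now.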

License

LGPL-3.0

Collaborators

  • michaelb