atok-parser

    0.4.4 • Public • Published

    Parser builder

    Synopsis

    Writing parsers is quite a common but sometimes lengthy task. To ease this process atok-parser leverages the atok tokenizer and performs the basic steps to set up a streaming parser, such as:

    • Automatically instantiate a tokenizer with provided options
    • Provide a mechanism to locate an error in the input data
      • track([Boolean]): keep track of the line and column positions to be used when building errors. Note that when set, tracking incurs a performance penalty.
    • Proxy basic node.js streaming methods: write(), end(), pause() and resume()
    • Proxy basic node.js streaming events (note that [data] and [end] are not automatically proxied) and some atok events
      • [drain]
      • [debug]
    • Provide preset variables within the parser constructor
      • atok {Object}: atok tokenizer instance
      • self {Object}: this
    • Provide helpers that simplify parsing rules (see below for description)
      • whitespace()
      • number()
      • float()
      • word()
      • string()
      • utf8()
      • chunk()
      • stringList()
      • match()
      • noop()
      • wait()
      • nvp()

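The proxying described above can be pictured with plain Node.js event emitters. The sketch below is not atok-parser's implementation: `FakeTokenizer` and `ProxyParser` are hypothetical names, and the stand-in tokenizer merely splits its input on whitespace. It only shows a wrapper that forwards write() and end() to an inner object and re-emits its [drain] event.

```javascript
var EventEmitter = require('events').EventEmitter
var util = require('util')

// Hypothetical stand-in for a tokenizer: emits one 'token' per word
function FakeTokenizer () {
    EventEmitter.call(this)
}
util.inherits(FakeTokenizer, EventEmitter)

FakeTokenizer.prototype.write = function (data) {
    var self = this
    String(data).split(/\s+/).filter(Boolean).forEach(function (word) {
        self.emit('token', word)
    })
    self.emit('drain')
    return true
}
FakeTokenizer.prototype.end = function (data) {
    if (data !== undefined) this.write(data)
    this.emit('end')
}

// Hypothetical wrapper: owns a tokenizer, proxies methods and selected events
function ProxyParser () {
    EventEmitter.call(this)
    var self = this
    var tok = this.tok = new FakeTokenizer()

    // [drain] is forwarded as-is
    tok.on('drain', function () { self.emit('drain') })
    // Stands in for a rule handler written by the parser author
    tok.on('token', function (word) { self.emit('data', word) })
}
util.inherits(ProxyParser, EventEmitter)

ProxyParser.prototype.write = function (data) { return this.tok.write(data) }
ProxyParser.prototype.end = function (data) { return this.tok.end(data) }

var seen = []
var drained = 0
var proxy = new ProxyParser()
proxy.on('data', function (word) { seen.push(word) })
proxy.on('drain', function () { drained++ })
proxy.end('a bb  ccc')
console.log(seen, drained)
```

Keeping the tokenizer as a property of the wrapper mirrors the `this.atok` convention mentioned in this document, but everything else here is an assumption made for illustration.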
    Download

    atok-parser is published on npm, the Node.js package manager. To install it, run:

    npm install atok-parser
    

    Usage

    A silly example to illustrate the various predefined variables and the parser definition. It parses a float number and returns the value via its #parse method.

    function myParser (options) {
        function handler (num) {
            // The options are set from the myParser function parameters
            // self is already set to the Parser instance
            if ( options.check && !isFinite(num) )
                return self.emit('error', new Error('Invalid float: ' + num))
     
            self.emit('data', num)
        }
        // the float() and whitespace() helpers are provided by atok-parser
        atok.float(handler)
        atok.whitespace()
    }
     
    // Use require('..') instead when running from within the package repository
    var Parser = require('atok-parser').createParser(myParser)
     
    // Add the #parse() method to the Parser
    Parser.prototype.parse = function (data) {
        var res
     
        // One (silly) way to make parse() look synchronous...
        this.once('data', function (data) {
            res = data
        })
        this.write(data)
     
        // ...write() is synchronous
        return res
    }
     
    // Instantiate a parser
    var p = new Parser({ check: true })
     
    // Parse a valid float
    var validfloat = p.parse('123.456 ')
    console.log('parsed data is of type', typeof validfloat, 'value', validfloat)
     
    // The following data will produce an invalid float and an error
    p.on('error', console.error)
    var invalidfloat = p.parse('123.456e1234 ')
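The parse() method above relies on event listeners running synchronously inside write(). A minimal sketch of that pattern with a plain Node.js EventEmitter, independent of atok-parser: `SyncParser` is a hypothetical name, and Number() stands in for the float() helper.

```javascript
var EventEmitter = require('events').EventEmitter
var util = require('util')

function SyncParser () {
    EventEmitter.call(this)
}
util.inherits(SyncParser, EventEmitter)

// Stand-in for the tokenizer: converts the input and emits right away
SyncParser.prototype.write = function (data) {
    var num = Number(String(data).trim())
    if (!isFinite(num))
        return this.emit('error', new Error('Invalid float: ' + num))
    this.emit('data', num)
}

SyncParser.prototype.parse = function (data) {
    var res
    this.once('data', function (value) { res = value })
    this.write(data)   // listeners have already run when write() returns
    return res
}

var sp = new SyncParser()
var errors = []
sp.on('error', function (err) { errors.push(err.message) })

var ok = sp.parse('123.456 ')        // a finite number
var bad = sp.parse('123.456e1234 ')  // overflows to Infinity, so an error is emitted
console.log(ok, bad, errors)
```

When no [data] event is emitted, as in the second call, the once() listener stays registered and parse() returns undefined. Code built on this pattern may want to remove the listener explicitly.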

    Methods

    • createParserFromFile(file[, parserOptions, parserEvents, atokOptions]): return a parser class (Function) based on the input file.

      • file {String}: file to read the parser from (.js extension is optional)
      • parserOptions {String}: comma-separated list of parser options
      • parserEvents {Object}: events emitted by the parser with their arguments count
      • atokOptions {Object}: tokenizer options

      The following variables are made available to the parser javascript code:

      • atok {Object}: atok tokenizer instantiated with the provided options. Also set as this.atok (do not delete it)
      • self {Object}: reference to this

      Predefined methods:

      • write(data)
      • end([data])
      • pause()
      • resume()
      • debug([logger (Function)])
      • track(flag (Boolean))

      Events automatically forwarded from tokenizer to parser:

      • drain
      • debug
    • createParser(data[, parserOptions, parserEvents, atokOptions]): same as createParserFromFile() but with supplied content instead of a file name

      • data {String | Array | Function}: the content to be used, can also be an array of strings or a function. If a function, its parameters are used as parser options unless parserOptions is set
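One way a function's parameters could double as option names is to read them from the function's source text. The helper below, `paramNames`, is hypothetical and only illustrates the idea; how createParser() actually extracts them is not described in this document.

```javascript
// Hypothetical helper: list the declared parameter names of a plain function
function paramNames (fn) {
    var src = Function.prototype.toString.call(fn)
    // Capture whatever sits between the first pair of parentheses
    var m = /^[^(]*\(([^)]*)\)/.exec(src)
    if (!m) return []
    return m[1].split(',')
        .map(function (s) { return s.trim() })
        .filter(Boolean)
}

function myParser (options, strict) {}

var names = paramNames(myParser)
console.log(names)
```

This simple pattern does not handle default parameter values, destructuring or arrow functions without parentheses; it is meant for classic function declarations like the one in the usage example.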

    Helpers

    Helpers are a set of standard Atok rules organized to match a specific type of data. If the data is encountered, the handler is fired with the results. If not, the rule is ignored. The behaviour of a single helper is the same as a single Atok rule:

    • go to the next rule if no match, unless continue(jump, jumpOnFail) was applied to the helper
    • go back to the first rule of the rule set upon match, unless continue(jump) was applied to the helper
    • next rule set can be set using next(ruleSetId)
    • rules can be jumped around by using continue(jump, jumpOnFail). A helper has exactly the size of a single rule, which greatly helps when defining complex rules.

    // Parse a whitespace separated list of floats
    var myParser = [
        'atok.float(function (n) { self.emit("data", n) })'
    , 'atok.continue(-1, -2)'
    , 'atok.whitespace()'
    ]
     
    var Parser = require('atok-parser').createParser(myParser)
    var p = new Parser
     
    p.on('data', function (num) {
        console.log(typeof num, num)
    })
    p.end('0.133  0.255')

    Arguments are not required. If no handler is specified, the [data] event will be emitted with the corresponding data.

    • whitespace(handler): ignore consecutive spaces, tabs, line breaks.
      • handler(whitespace)
    • number(handler): process positive integers
      • handler(num)
    • float(handler): process float numbers. NB. the result can be an invalid float (NaN or Infinity).
      • handler(floatNumber)
    • word(handler): process a word containing letters, digits and underscores
      • handler(word)
    • string([start, end, esc,] handler): process a delimited string. If end is not supplied, it is set to start.
      • start {String}: starting pattern (default=")
      • end {String}: ending pattern (default=")
      • esc {String}: escape character (default=\)
      • handler(string)
    • utf8([start, end,] handler): process a delimited string containing UTF-8 encoded characters. If end is not supplied, it is set to start.
      • start {String}: starting pattern (default=")
      • end {String}: ending pattern (default=")
      • handler(UTF-8String)
    • chunk(charSet, handler): process a chunk of consecutive characters belonging to the given charset
      • charSet {Object}: object defining the charsets to be used as matching characters e.g. { start: 'aA', end: 'zZ' } matches all letters
      • handler(chunk)
    • stringList([start, end, separator,] handler): process a delimited list of strings
      • start {String}: starting pattern (default=()
      • end {String}: ending pattern (default=))
      • separator {String}: separator character (default=,)
      • handler(listOfStrings)
    • match(start, end, stringQuotes, handler): find a matching pattern (e.g. bracket matching), skipping string content if required
      • start {String}: starting pattern to look for
      • end {String}: ending pattern to look for
      • stringQuotes {Array}: array of string delimiters (default=['"', "'"]). Use an empty array to disable string content processing
      • handler(token)
    • noop(next): passthrough - does not do anything except applying given properties (useful to branch rules without having to use atok#saveRuleSet() and atok#loadRuleSet())
      • next {String}: next ruleset to load
    • wait(atokPattern[...atokPattern], handler): wait for the given pattern. Nothing happens until the data received triggers the pattern. Must be preceded by continue() to work properly. A typical use is a string whose starting quote has been received but not its ending quote: wait until it arrives, then resume the rules workflow.
    • nvp([nameCharSet, separator, endPattern,] handler): parse a name-value pair. Defaults: nameCharSet={ start: 'aA0_', end: 'zZ9_' }, separator='=', endPattern={ firstOf: ' \t\n\r' }. Disable endPattern by setting it to '' or [].
      • handler(name, value)
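The charSet notation used by chunk() and nvp() appears to pair each character of start with the character at the same position in end, so { start: 'aA', end: 'zZ' } reads as the ranges a-z and A-Z. The sketch below works under that assumption; `inCharSet` and `chunkLength` are hypothetical helpers written in plain JavaScript, not part of atok-parser.

```javascript
// Hypothetical helper: test a character against paired start/end ranges
function inCharSet (ch, charSet) {
    var code = ch.charCodeAt(0)
    for (var i = 0; i < charSet.start.length; i++) {
        if (code >= charSet.start.charCodeAt(i) && code <= charSet.end.charCodeAt(i))
            return true
    }
    return false
}

// Length of the leading run of characters that belong to the charset
function chunkLength (str, charSet) {
    var n = 0
    while (n < str.length && inCharSet(str[n], charSet)) n++
    return n
}

var letters = { start: 'aA', end: 'zZ' }
var nameChars = { start: 'aA0_', end: 'zZ9_' }

console.log(chunkLength('Hello world', letters))    // 5: stops at the space
console.log(chunkLength('key_1=value', nameChars))  // 5: stops at '='
```

A single-character range such as '_' to '_' falls out of the same rule, which is why the underscore appears in both start and end of the nvp() default.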

    Examples

    A set of examples is located under the examples/ directory.

    License

    none

    Collaborators

    • pierrec