
    universal-lexer

    2.0.6 • Public • Published

    Universal Lexer


    A lexer that splits any text input into tokens according to the regular expressions you provide.

    In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of tokens (strings with an assigned and thus identified meaning). A program that performs lexical analysis may be termed a lexer, tokenizer, or scanner, though scanner is also a term for the first stage of a lexer. A lexer is generally combined with a parser, which together analyze the syntax of programming languages, web pages, and so forth.

    Features

    • Supports named groups in regular expressions, so captured values are attached to tokens without extra work
    • Supports post-processing of tokens, so you can attach any additional information you require

    How to install

    The package is available on NPM as universal-lexer, so you can add it to your project with npm install universal-lexer or yarn add universal-lexer.

    What are the requirements?

    The code itself is written in ES6 and should work in a Node.js 6+ environment. If you would like to use it in a browser or in an older environment, a transpiled and bundled (UMD) version is also included. You can require universal-lexer/browser, or use the UniversalLexer global (in the browser):

    // Load library
    const UniversalLexer = require('universal-lexer/browser')
     
    // Create lexer
    const lexer = UniversalLexer.compile(definitions)
     
    // ...

    How it works

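    There are two sets of functions: the build functions return the code of the lexer, and the compile functions return a function which you can call directly: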

    // Load library
    const UniversalLexer = require('universal-lexer')
     
    // Build code for this lexer
    const code1 = UniversalLexer.build([ { type: 'Colon', value: ':' } ])
    const code2 = UniversalLexer.buildFromFile('json.yaml')
     
    // Dynamically compile a function which can be used directly
    const func1 = UniversalLexer.compile([ { type: 'Colon', value: ':' } ])
    const func2 = UniversalLexer.compileFromFile('json.yaml')

    There are two ways of passing rules to this lexer: from file or array of definitions.

    Pass as array of definitions

    Simply pass the definitions to the lexer:

    // Load library
    const UniversalLexer = require('universal-lexer')
     
    // Create token definition
    const Colon = {
      type: 'Colon',
      value: ':'
    }
     
    // Build array of definitions
    const definitions = [ Colon ]
     
    // Create lexer
    const lexer = UniversalLexer.compile(definitions)

    A definition can be a more complex object:

    // Required fields: `type` and either `regex` or `value`
    {
      // Token name
      type: 'String',

      // Literal string to match at the beginning of the remaining input,
      // e.g. 'abc' or '(' (use it instead of `regex`)
      value: '(',

      // Regular expression which validates
      // whether the current input should be parsed as this token.
      // Useful e.g. when you require a separator after a sentence,
      // but you don't want to include it in the token.
      valid: '"',

      // Regular expression flags for the `valid` field
      validFlags: 'i',

      // Regular expression to find the current token.
      // You can use named groups as well, (?<name>expression):
      // their matches will be attached to the token.
      regex: '"(?<value>([^"]|\\.)+)"',

      // Regular expression flags for the `regex` field
      regexFlags: 'i'
    }
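
    To make the roles of value, regex and valid more concrete, here is a minimal sketch of how a single definition could be tried against the input. It is an illustration only, not the library's implementation: matchDefinition is a hypothetical helper, and it assumes that valid is tested against the remaining input at the current position and that named groups end up in the token's data.

```javascript
// Illustrative only: NOT how universal-lexer is implemented internally.
// Tries a single definition against `input`, starting at `position`.
function matchDefinition (definition, input, position) {
  const rest = input.slice(position)

  // Optional guard: `valid` has to match at the start of the remaining input
  if (definition.valid) {
    const guard = new RegExp('^(?:' + definition.valid + ')', definition.validFlags || '')
    if (!guard.test(rest)) return null
  }

  let text = null
  let data = {}

  if (definition.value !== undefined) {
    // Literal match at the current position
    if (rest.startsWith(definition.value)) text = definition.value
  } else if (definition.regex) {
    // Anchored regular expression; named groups become token data
    const pattern = new RegExp('^(?:' + definition.regex + ')', definition.regexFlags || '')
    const match = pattern.exec(rest)
    if (match) {
      text = match[0]
      data = Object.assign({}, match.groups)
    }
  }

  // No match (or an empty match) means this definition does not apply
  if (text === null || text.length === 0) return null

  return { type: definition.type, data, start: position, end: position + text.length }
}

const Color = { type: 'Color', regex: '(?<value>black|white)', valid: '(black|white)[^\\w]' }

console.log(matchDefinition(Color, 'black;', 0))
// { type: 'Color', data: { value: 'black' }, start: 0, end: 5 }
console.log(matchDefinition(Color, 'blacker', 0))
// null
```

    The sketch needs a JavaScript engine with native named groups (Node.js 10+), which is a requirement of the sketch, not of the library.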

    Pass YAML file

    // Load library
    const UniversalLexer = require('universal-lexer')
     
    const lexer = UniversalLexer.compileFromFile('scss.yaml')

    For now, the YAML file should contain only a Tokens property with the definitions. Later it may support more advanced features such as macros (for simpler syntax).

    Example:

    Tokens:
      # Whitespace

      - type: NewLine
        value: "\n"

      - type: Space
        regex: '[ \t]+'

      # Math

      - type: Operator
        regex: '[-+*/]'

      # Color
      # It has a 'valid' field, to be sure that it's not e.g. "blacker"
      # It checks that no word character follows

      - type: Color
        regex: '(?<value>black|white)'
        valid: '(black|white)[^\w]'
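
    For comparison, the same rules can be written as a plain array of definitions for compile or build. This is a sketch under two assumptions: the Space rule is meant to match spaces and tabs, and the hyphen in the Operator rule is placed first in the character class so that it is read literally instead of as a range.

```javascript
// The rules from the YAML example as an array of definitions.
// Backslashes are doubled because these are JavaScript string literals.
const definitions = [
  { type: 'NewLine', value: '\n' },
  { type: 'Space', regex: '[ \\t]+' },
  { type: 'Operator', regex: '[-+*/]' },
  { type: 'Color', regex: '(?<value>black|white)', valid: '(black|white)[^\\w]' }
]

// Sanity check: every pattern has to be a valid regular expression,
// otherwise `new RegExp` throws a SyntaxError
for (const definition of definitions) {
  if (definition.regex) new RegExp(definition.regex)
  if (definition.valid) new RegExp(definition.valid)
}
```

    Such an array can then be passed to UniversalLexer.compile(definitions) as described earlier.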

    Processing data

    Once you have created a lexer, processing input data is straightforward: call the compiled function with the input:

    // Load library
    const UniversalLexer = require('universal-lexer')
     
    // Create lexer
    const tokenize = UniversalLexer.compileFromFile('scss.yaml')
     
    // Tokenize the input
    const tokens = tokenize('some { background: code }').tokens

    Post-processing tokens

    If you would like to do more advanced processing of the parsed tokens, pass a processor function as the second argument of the lexer function:

    // Load library
    const UniversalLexer = require('universal-lexer')
     
    // Create lexer
    const tokenize = UniversalLexer.compileFromFile('scss.yaml')
     
    // That's 'Literal' definition:
    const Literal = {
      type: 'Literal',
      // String.raw keeps the backslashes and quotes exactly as written
      regex: String.raw`(?<value>([^\t \n;"',{}()\[\]#=:~&\\]|(\\.))+)`
    }
     
    // Create processor which will replace all '\X' to 'X' in value
    function process (token) {
      if (token.type === 'Literal') {
        token.data.value = token.data.value.replace(/\\(.)/g, '$1')
      }
     
      return token
    }
     
    // Also, you can return a new token
    function process2 (token) {
      if (token.type !== 'Literal') {
        return token
      }
     
      return {
        type: 'Literal',
        data: {
          value: token.data.value.replace(/\\(.)/g, '$1')
        },
        start: token.start,
        end: token.end
      }
    }
     
    // Get all tokens...
    const tokens = tokenize('some { background: code }', process).tokens
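
    The effect of such a processor can be checked without a lexer by running it over hand-written tokens. The sketch below assumes that the lexer calls the processor once per token; sampleTokens and unescapeLiteral are names made up for this example, and the token objects only imitate the documented shape.

```javascript
// Hand-written sample tokens in the shape shown in this README
const sampleTokens = [
  { type: 'Literal', data: { value: 'a\\:b' }, start: 0, end: 4 },
  { type: 'Colon', data: {}, start: 4, end: 5 }
]

// Unescape '\X' to 'X' in Literal tokens without mutating the input token
function unescapeLiteral (token) {
  if (token.type !== 'Literal') return token

  return Object.assign({}, token, {
    data: Object.assign({}, token.data, {
      value: token.data.value.replace(/\\(.)/g, '$1')
    })
  })
}

const processed = sampleTokens.map(unescapeLiteral)

console.log(processed[0].data.value) // a:b
console.log(processed[1] === sampleTokens[1]) // true, other tokens pass through
```

    Returning a copy, as process2 does, keeps the original token intact, while mutating in place, as process does, avoids an allocation per token.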

    Beautified code

    If you would like to get beautified code of the lexer, pass true as the second argument of the compile functions:

    UniversalLexer.compile(definitions, true)
    UniversalLexer.compileFromFile('scss.yaml', true)

    Possible results

    On success you will receive a simple object with an array of tokens:

    {
      tokens: [
        { type: 'Whitespace', data: { value: '     ' }, start: 0, end: 5 },
        { type: 'Word', data: { value: 'some' }, start: 5, end: 9 }
      ]
    }

    When something goes wrong, you will receive an object with error information instead:

    {
      error: 'Unrecognized token',
      index: 1,
      line: 1,
      column: 2
    }
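
    Because both outcomes are plain objects, the caller has to check which one it received. A minimal sketch, where unwrapResult is a hypothetical helper and the two result objects are copied from the shapes documented above:

```javascript
// Hypothetical helper: returns the tokens or throws a readable error
function unwrapResult (result) {
  if (result.error) {
    throw new Error(result.error + ' at line ' + result.line + ', column ' + result.column)
  }

  return result.tokens
}

// Result objects in the two documented shapes
const success = {
  tokens: [ { type: 'Word', data: { value: 'some' }, start: 5, end: 9 } ]
}
const failure = { error: 'Unrecognized token', index: 1, line: 1, column: 2 }

console.log(unwrapResult(success).length) // 1

try {
  unwrapResult(failure)
} catch (e) {
  console.log(e.message) // Unrecognized token at line 1, column 2
}
```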

    Examples

    For now, you can see an example of JSON semantics in the examples/json.yaml file.

    CLI

    After installing the package globally (or from within NPM scripts), the universal-lexer command is available:

    Usage: universal-lexer [options] output.js
    
    Options:
      --version       Show version number                                  [boolean]
      -s, --source    Semantics file                                      [required]
      -b, --beautify  Should beautify code?                [boolean] [default: true]
      -h, --help      Show help                                            [boolean]
    
    Examples:
      universal-lexer -s json.yaml lexer.js  build lexer from semantics file
    

    Changelog

    Version 2

    • 2.0.6 - bugfix for single characters
    • 2.0.5 - fix mistake in README file (post-processing code)
    • 2.0.4 - remove unneeded benchmark dependency
    • 2.0.3 - add unit and E2E tests, fix small bugs
    • 2.0.2 - added CLI command
    • 2.0.1 - fix typo in README file
    • 2.0.0 - optimize it (even 10x faster) by expression analysis and some other things

    Version 1

    • 1.0.8 - current position in syntax errors now always starts from 1
    • 1.0.7 - optimize definitions with "value", make syntax errors developer-friendly
    • 1.0.6 - optimize lexer performance (20% faster on average)
    • 1.0.5 - fix browser version to be put into NPM package properly
    • 1.0.4 - bugfix for debugging
    • 1.0.3 - add proper sanitization for debug HTML
    • 1.0.2 - small fixes for README file
    • 1.0.1 - added Rollup.js support to build version for browser


    Version

    2.0.6

    License

    MIT

    Unpacked Size

    108 kB

    Total Files

    24

    Collaborators

    • rangoo