js-tokeniser

1.1.2 • Public • Published

tokeniser

Why another tokenizer?

I needed a tokenizer and tried some of the existing, but they missed some features I needed:

  • node-tokenizer doesn't return information about the token types it finds, only the text.
  • tokenizer-array returns this information but has a strange way of finding tokens: it searches by bisecting the text, but this doesn't find all tokens, at least in my case.

Since the task of creating a tokenizer doesn't seems too hard, I decided to create an own one - here it is.

Requirements

The only requirement is to use ES6. It makes code easier to read (in my opinion), you might need a translator (like Babel if you want to use it in some browsers or with older Node versions (time to update?), but it is simply the future.

Installation

npm i -S js-tokeniser

Usage

const tokeniser = require('js-tokeniser')
 
let result = tokeniser('// Test file\nName = Test\nAuthor = "Joachim Schirrmacher"',
  [
    {type: "comment", regex: /^\s*\/\/[^\n]*/},
    {type: "definition", regex: /^\s*(\w+)\s*=\s*((?:[^\n]+\\\n)*[^\n]+)/},
  ]
)
console.log(result)

prints the following output:

[ { type: 'comment', matches: [] },
  { type: 'definition', matches: [ 'Name', 'Test' ] },
  { type: 'definition', matches: [ 'Author', '"Joachim Schirrmacher"' ] }
]

As you see, you don't only get the three recognised tokens, but also the matches that are found. This makes it easy to handle a lot without defining separate tokens but instead use patterns (like in 'definition').

Updating from version 1.0.x

In previous versions, tokeniser returned the unfiltered result from String.match(), which contains an input and an index attribute as well as the whole match. These elements are stripped in version beginning at 1.1.0, so if you need this side-effect, be sure to modify your code.

Package Sidebar

Install

npm i js-tokeniser

Weekly Downloads

2

Version

1.1.2

License

MIT

Last publish

Collaborators

  • jschirrmacher