# tokeniser

## Why another tokenizer?
I needed a tokenizer and tried some existing ones, but they were missing features I needed:

- node-tokenizer doesn't return information about the token types it finds, only the text.
- tokenizer-array returns this information, but has a strange way of finding tokens: it searches by bisecting the text, which doesn't find all tokens, at least in my case.
Since the task of creating a tokenizer doesn't seem too hard, I decided to create my own; here it is.
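To give an idea of why the task is manageable, here is a minimal sketch of a regex-based tokenizer. This is my own illustration, not the actual implementation of js-tokeniser: each token type pairs a name with a `^`-anchored pattern, the first pattern that matches at the current position wins, its capture groups become the matches, and scanning continues after the matched text.

```javascript
// Minimal regex tokenizer sketch (illustrative; not js-tokeniser's code).
// Each token type is { type, regex } with a ^-anchored pattern.
function tokenise(text, tokenTypes) {
  const tokens = []
  let pos = 0
  while (pos < text.length) {
    const rest = text.slice(pos)
    // Find the first token type whose pattern matches at the current position
    const hit = tokenTypes
      .map(t => ({ type: t.type, m: rest.match(t.regex) }))
      .find(r => r.m !== null)
    if (!hit) throw new Error('Unexpected input at position ' + pos)
    // The capture groups (if any) become the token's matches
    tokens.push({ type: hit.type, matches: hit.m.slice(1) })
    pos += hit.m[0].length
  }
  return tokens
}

// Example: token types are tried in order; 'definition' uses capture groups
const types = [
  { type: 'space', regex: /^\s+/ },
  { type: 'definition', regex: /^(\w+)\s*=\s*(\w+)/ },
  { type: 'word', regex: /^\w+/ }
]
const result = tokenise('Name = Test', types)
// result: [ { type: 'definition', matches: [ 'Name', 'Test' ] } ]
```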
## Requirements
The only requirement is ES6. It makes the code easier to read (in my opinion). You might need a transpiler like Babel if you want to use the package in some browsers or with older Node versions (time to update?), but ES6 is simply the future.
## Installation

```shell
npm i -S js-tokeniser
```
## Usage
```js
const tokeniser = require('js-tokeniser')

let result = tokeniser(input, tokenTypes)
console.log(result)
```
This prints the following output:

```js
[
  { type: 'comment', matches: [ … ] },
  { type: 'definition', matches: [ 'Name', 'Test' ] },
  { type: 'definition', matches: [ 'Author', '"Joachim Schirrmacher"' ] }
]
```
As you can see, you don't only get the three recognised tokens, but also the matches found in each. This makes it easy to handle many variants without defining separate token types; instead, use patterns with capture groups (as in 'definition').
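The trick behind the `matches` lists above is ordinary regex capture groups: one pattern both recognises a token and splits it into parts. A small standalone illustration (the regex here is my own guess, not necessarily the pattern used in the example above):

```javascript
// A capture-group pattern recognises a token and extracts its parts at once.
// (Illustrative pattern; the example's actual token definitions are not shown.)
const definition = /^(\w+)\s*:\s*(.*)$/
const m = 'Author: "Joachim Schirrmacher"'.match(definition)
console.log(m.slice(1))  // [ 'Author', '"Joachim Schirrmacher"' ]
```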
## Updating from version 1.0.x
In previous versions, tokeniser returned the unfiltered result of `String.match()`, which contains an `input` and an `index` attribute as well as the whole match. These elements are stripped beginning with version 1.1.0, so if you relied on this behaviour, be sure to modify your code.
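For reference, this is what a raw `String.match()` result looks like; the extra `index` and `input` properties, along with the whole match, are what version 1.1.0 now strips:

```javascript
// A raw String.match() result is an array with extra properties attached.
const m = 'Name: Test'.match(/(\w+):\s*(\w+)/)
console.log(m[0])        // 'Name: Test' (the whole match)
console.log(m.slice(1))  // [ 'Name', 'Test' ] (the capture groups)
console.log(m.index)     // 0
console.log(m.input)     // 'Name: Test'
```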