Generic source code tokenizer. WIP.


Via npm on Node:

npm install kodetokenizer


Reference in your program:

var kt = require('kodetokenizer');

Given a text, get its content as tokens:

var tokens = kt.getTokens("var myvar = 13;");

The result is an array of tokens. Each token is a plain JavaScript object with:

  • value: the token's text
  • type: a number, one of the values in kt.Types

The types are:

  • kt.Types.Word: a sequence of letters
  • kt.Types.Digits: a sequence of digits
  • kt.Types.WhiteSpace: a sequence of whitespace
  • kt.Types.NewLine: a new line: \n, \r\n or \r
  • kt.Types.Symbol: a sequence of symbol characters (anything that is not a letter, digit, whitespace, new line or separator)
  • kt.Types.Unknown
  • kt.Types.Separator: a separator character
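
To illustrate how the type field might be used, here is a minimal sketch that drops whitespace and new line tokens before further processing. It runs without the package installed, so both Types and the token array are hand-written stand-ins: the numeric values are made up, and the token list is what the type descriptions above suggest for "var myvar = 13;", not output captured from kt.getTokens. The helper name significant is hypothetical.

```javascript
// Stand-in for kt.Types: these numeric values are invented for the sketch.
var Types = { Word: 1, Digits: 2, WhiteSpace: 3, NewLine: 4, Symbol: 5, Unknown: 6, Separator: 7 };

// Hand-written stand-in for kt.getTokens("var myvar = 13;"), following the
// type descriptions above. Assumption: not taken from a real run.
var tokens = [
    { value: 'var', type: Types.Word },
    { value: ' ', type: Types.WhiteSpace },
    { value: 'myvar', type: Types.Word },
    { value: ' ', type: Types.WhiteSpace },
    { value: '=', type: Types.Symbol },
    { value: ' ', type: Types.WhiteSpace },
    { value: '13', type: Types.Digits },
    { value: ';', type: Types.Symbol }
];

// Keep only the tokens a parser would normally care about
function significant(tokens) {
    return tokens.filter(function (token) {
        return token.type !== Types.WhiteSpace && token.type !== Types.NewLine;
    });
}

var values = significant(tokens).map(function (token) {
    return token.value;
});
```

With the real package, Types would come from kt.Types and tokens from kt.getTokens, and the filter itself would stay the same.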

Separators are language-dependent, so you must supply them in an options object, e.g.:

var tokens = kt.getTokens("myfun(1,2,3);", { separators: ['(', ')', '{', '}', ',', ';'] });

You can add processors: functions that, given an initial character, return a token:

function stringProcessor(ch, text, position) {
    // ...
}

var tokens = kt.getTokens("myfun('foo', 'bar');", { processors: { '#': stringProcessor } });

The parameter ch is the detected character, text is the input being tokenized, and position is the index in text of the next unprocessed character.

The processor can return:

  • null: no token detected, so the tokenizer takes control again.
  • { position: anumber, token: atoken }: where position is the new unprocessed char position in text, and token is the token to be used

See test/string.js for an example processor. Note that you can use:

var Types = kt.Types;
Types.String = ++Types.MaxValue;

to add your own token types.
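
As a rough illustration of the processor contract, here is a minimal sketch of a string processor. It is not the code in test/string.js. Several things are assumptions: the token value excludes the surrounding quotes, an unterminated string makes the processor return null, and Types is a local stand-in with a made-up MaxValue so the snippet runs without the package installed.

```javascript
// Stand-in for kt.Types. Assumption: the starting MaxValue is invented;
// in real use, take Types from require('kodetokenizer').Types.
var Types = { MaxValue: 7 };
Types.String = ++Types.MaxValue;

// ch: the quote character that triggered the processor
// text: the full input being tokenized
// position: index of the first character after ch
function stringProcessor(ch, text, position) {
    var end = text.indexOf(ch, position);

    // No closing quote found: decline, so the tokenizer takes control again
    if (end < 0)
        return null;

    return {
        // first unprocessed character, just past the closing quote
        position: end + 1,
        // assumption: the value holds the string content without its quotes
        token: { value: text.substring(position, end), type: Types.String }
    };
}

// Direct call, simulating the tokenizer reaching the first quote (index 6)
var result = stringProcessor("'", "myfun('foo', 'bar');", 7);
```

Going by the description of processors above, such a function would presumably be registered under the quote character it should react to in the processors option.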


git clone git://
cd KodeTokenizer
npm install
npm test




  • 0.0.1: Published




Feel free to file issues and submit pull requests; contributions are welcome.

If you submit a pull request, please be sure to add or update corresponding test cases, and ensure that npm test continues to pass.