TokenizeThis
Quickstart
It turns a string into tokens.
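var tokenizer = new TokenizeThis();
var str = 'Tokenize this!';
var tokens = [];
// Collect each token as the tokenizer emits it.
tokenizer.tokenize(str, function(token) {
    tokens.push(token);
});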
By default, it can tokenize math-based strings.
var tokenizer = new TokenizeThis();
var str = '5 + 6 -(4/2) + gcd(10, 5)';
var tokens = [];
tokenizer.tokenize(str, function(token) {
    tokens.push(token);
});
...Or SQL.
var tokenizer = new TokenizeThis();
var str = 'SELECT COUNT(id), 5+6 FROM `users` WHERE name = "shaun persad" AND hobby IS NULL';
var tokens = [];
tokenizer.tokenize(str, function(token) {
    tokens.push(token);
});
Installation
npm install tokenize-this
// or if in the browser: <script src="tokenize-this/tokenize-this.min.js"></script>
Usage
require it, create a new instance, then call tokenize.
// var TokenizeThis = require('tokenize-this');
// OR
// var TokenizeThis = require('tokenize-this/tokenize-this.min.js'); // for node.js < 4.0
// OR
// <script src="tokenize-this/tokenize-this.min.js"></script> <!-- if in browser -->

var tokenizer = new TokenizeThis();

var str = 'Hi!, I want to add 5+6';
var tokens = [];
tokenizer.tokenize(str, function(token) {
    tokens.push(token);
});
Advanced Usage
Supplying a config object to the constructor customizes how strings are tokenized. See .defaultConfig in the API section for all options.
This can be used to tokenize many forms of data, like JSON into key-value pairs.
var jsonConfig = {
    shouldTokenize: ['{', '}', '[', ']'],
    shouldMatch: ['"'],
    shouldDelimitBy: [' ', "\n", "\r", "\t", ':', ','],
    convertLiterals: true
};
var tokenizer = new TokenizeThis(jsonConfig);
var str = '[{name:"Shaun Persad", id: 5}, { gender : null}]';
var tokens = [];
tokenizer.tokenize(str, function(token) {
    tokens.push(token);
});
Here it is tokenizing XML like a boss.
var xmlConfig = {
    shouldTokenize: ['<?', '?>', '<!', '<', '</', '>', '/>', '='],
    shouldMatch: ['"'],
    shouldDelimitBy: [' ', "\n", "\r", "\t"],
    convertLiterals: true
};
var tokenizer = new TokenizeThis(xmlConfig);
var str = `
<?xml-stylesheet href="catalog.xsl" type="text/xsl"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<catalog>
    <product description="Cardigan Sweater" product_image="cardigan.jpg">
        <size description="Large" />
        <color_swatch image="red_cardigan.jpg">
            Red
        </color_swatch>
    </product>
</catalog>
`;
var tokens = [];
tokenizer.tokenize(str, function(token) {
    tokens.push(token);
});
The above examples are the first steps in writing parsers for those formats. The next step would be parsing the stream of tokens based on the format-specific rules, e.g. SQL.
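To make that next step concrete, here is a minimal sketch of parsing a token stream into nested values. It does not call the library: the tokens array is an assumed output for the JSON example above, and parseValue is a hypothetical helper written only for this illustration.

```javascript
// Assumed token stream for the JSON example above (not produced by running the library here).
var tokens = ['[', '{', 'name', 'Shaun Persad', 'id', 5, '}', '{', 'gender', null, '}', ']'];

// parseValue is a hypothetical helper, not part of TokenizeThis.
// It consumes tokens starting at state.pos and returns one parsed value.
function parseValue(tokens, state) {
    var token = tokens[state.pos++];
    if (token === '[') {
        var list = [];
        while (state.pos < tokens.length && tokens[state.pos] !== ']') {
            list.push(parseValue(tokens, state));
        }
        state.pos++; // skip the closing ']'
        return list;
    }
    if (token === '{') {
        var obj = {};
        while (state.pos < tokens.length && tokens[state.pos] !== '}') {
            var key = tokens[state.pos++]; // keys and values alternate
            obj[key] = parseValue(tokens, state);
        }
        state.pos++; // skip the closing '}'
        return obj;
    }
    return token; // plain value: string, number, boolean or null
}

var parsed = parseValue(tokens, { pos: 0 });
console.log(JSON.stringify(parsed));
```

This sketch ignores the surroundedBy argument, so a quoted string such as "[" would be mistaken for a structural token. A real parser would probably carry that information alongside each token.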
API
Methods
#tokenize(str:String, forEachToken:Function)
Sends each token to the forEachToken(token:String, surroundedBy:String, index:Integer) callback.
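var tokenizer = new TokenizeThis();
var str = 'Tokenize "this"!';
var tokens = [];
var indices = [];
// surroundedBy holds the quote character around a matched token, if any.
var forEachToken = function(token, surroundedBy, index) {
    tokens.push(surroundedBy + token + surroundedBy);
    indices.push(index);
};
tokenizer.tokenize(str, forEachToken);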
By default, it converts true, false, null, and numbers into their literal versions.
var tokenizer = new TokenizeThis();
var str = 'true false null TRUE FALSE NULL 1 2 3.4 5.6789';
var tokens = [];
tokenizer.tokenize(str, function(token) {
    tokens.push(token);
});
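As a rough illustration of what converting to literals means, the sketch below maps string tokens to JavaScript values. toLiteral is a hypothetical helper, not the library's implementation. It assumes case-insensitive matching of true, false and null, which the uppercase words in the example string suggest but do not confirm, and it only handles plain decimal numbers.

```javascript
// toLiteral is a hypothetical helper; the library's actual conversion rules may differ.
function toLiteral(token) {
    var lower = token.toLowerCase();
    if (lower === 'true') return true;
    if (lower === 'false') return false;
    if (lower === 'null') return null;
    // Only convert strings that look like plain decimal numbers.
    if (/^-?\d+(\.\d+)?$/.test(token)) return Number(token);
    return token; // anything else stays a string
}

var raw = ['true', 'FALSE', 'null', '1', '3.4', 'hello'];
var converted = raw.map(toLiteral);
console.log(converted);
```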
.defaultConfig:Object
The default config object used when no config is supplied.
var config = {
    shouldTokenize: ['(', ')', ',', '*', '/', '%', '+', '-', '=', '!=', '!', '<', '>', '<=', '>=', '^'],
    shouldMatch: ['"', "'", '`'],
    shouldDelimitBy: [' ', "\n", "\r", "\t"],
    convertLiterals: true,
    escapeCharacter: "\\"
};
You can turn off conversion to literals with the convertLiterals config option.
var config = {
    convertLiterals: false
};
var tokenizer = new TokenizeThis(config);
var str = 'true false null TRUE FALSE NULL 1 2 3.4 5.6789';
var tokens = [];
tokenizer.tokenize(str, function(token) {
    tokens.push(token);
});
Any strings surrounded by the quotes specified in the shouldMatch option are treated as whole tokens.
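var config = {
    shouldMatch: ['"', '`', '#']
};
var tokenizer = new TokenizeThis(config);
var str = '"hi there" `this is a test` #of quotes#';
var tokens = [];
var tokensQuoted = [];
tokenizer.tokenize(str, function(token, surroundedBy) {
    tokens.push(token);
    // Re-attach the matched quote character to show which quotes wrapped the token.
    tokensQuoted.push(surroundedBy + token + surroundedBy);
});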
Quotes can be escaped via a backslash.
var tokenizer = new TokenizeThis();
var str = 'These are "\\"quotes\\""';
var tokens = [];
tokenizer.tokenize(str, function(token) {
    tokens.push(token);
});
The escape character can be specified with the escapeCharacter option.
var config = {
    escapeCharacter: '#'
};
var tokenizer = new TokenizeThis(config);
var str = 'These are "#"quotes#""';
var tokens = [];
tokenizer.tokenize(str, function(token) {
    tokens.push(token);
});
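The interplay between quote matching and the escape character can be sketched as a small scanner. This is not how the library is implemented. scanQuoted is a hypothetical helper that reads a single quoted token and assumes the escape character only has an effect directly before the active quote character.

```javascript
// scanQuoted is a hypothetical helper, not the library's implementation.
// It reads one quoted token starting at str[start] (which must be a quote character)
// and returns the unescaped token plus the index just past the closing quote.
function scanQuoted(str, start, escapeCharacter) {
    var quote = str[start];
    var token = '';
    var i = start + 1;
    while (i < str.length) {
        var ch = str[i];
        if (ch === escapeCharacter && str[i + 1] === quote) {
            token += quote; // escaped quote: keep the quote, drop the escape
            i += 2;
        } else if (ch === quote) {
            return { token: token, surroundedBy: quote, end: i + 1 };
        } else {
            token += ch;
            i++;
        }
    }
    throw new Error('Unterminated quote starting at index ' + start);
}

var str = 'These are "#"quotes#""';
var result = scanQuoted(str, str.indexOf('"'), '#');
console.log(result.token);
```

Returning the surrounding quote separately from the token mirrors the surroundedBy argument of the forEachToken callback, so a caller can tell a quoted string from a bare word.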