This project has one purpose: tokenizing JavaScript by traversing a given file (as a string) and generating an abstract syntax tree (AST) from the resulting tokens.
JCT does not concern itself with loading files, or with what happens after they have been pushed into an abstract syntax tree; it simply concerns itself with generating an AST.
Other projects use this one, taking the resulting abstract syntax tree and transpiling the JavaScript accordingly.
For example, require to ecma takes the abstract syntax tree generated by this code and uses it to transpile code that uses the old require import system into the new ES6 import system.
Usage
Install it
npm i javascript-compiling-tokenizer
Import it
import { LexicalAnalyzer, Generator } from 'javascript-compiling-tokenizer';
Initialize it
const tokenizer = new LexicalAnalyzer(options)
// options has only one property: 'verbose' [boolean]
Supplying your own lexical checks is as simple as providing an array of functions. Each function is given the following arguments:
char: string, current: number, input: string
char is the character at the current position, current is the index of the current position in the input string, and input is the whole file as a string.
The function must return an object (IThirdPartyParsingResult) containing:
{
  payload: { type: 'coolnew type', value: 'the value' }, // the token
  current: number // the new cursor position after this function has run
}
Third party lexical checks are always performed first.
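As a sketch of that contract (the function name, the token type, and the convention of returning null on a non-match are all assumptions; only the argument list and the IThirdPartyParsingResult shape come from above):

```javascript
// Hypothetical custom lexer: recognizes an '@' character and emits an
// 'atSign' token. The (char, current, input) arguments and the
// { payload, current } return shape follow the contract described above.
// Returning null on a non-match is an assumption; check the package source.
function atSignLexer(char, current, input) {
  if (char !== '@') {
    return null; // not our token -- let the built-in checks run
  }
  return {
    payload: { type: 'atSign', value: '@' }, // the token
    current: current + 1, // advance the cursor past the consumed character
  };
}
```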
Transforming tokens into JavaScript
new Generator().start(tokens)
This will return a string representation of the tokens provided. All you have to do is pipe it into a file.
Dev notes
The tokenizer will recurse in the following conditions:
if it finds an opening parenthesis (
if it finds an opening code block {
if it finds an opening array [
if it finds a declaration (const, let or var)
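For instance, each line of this plain snippet (illustrative only, not package code) exercises one of those recursion triggers:

```javascript
const list = [1, 2, 3];            // declaration + opening array -> recursion
function sum(a, b) {               // opening parenthesis (params) -> recursion
  return a + b;                    // opening code block -> recursion
}
let total = sum(list[0], list[1]); // declaration (let) -> recursion
```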
Examples
There are more examples of this being used on a React file, a normal JS file, and a file containing the 'old' 'define' method of importing things in ./tests/beforeandafter
Gotchas
This tokenizer won't parse files correctly if variables are not declared properly. If you declare variables without a declaration keyword such as const, var or let, then this tokenizer is not for you.
The parser doesn't support comma-separated declarations: let var1, var2, var3, var4; will not parse correctly.
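The workaround is simply to split such declarations into one statement per variable:

```javascript
// Will NOT tokenize correctly:
// let var1, var2, var3, var4;

// One declaration per statement parses fine:
let var1 = 1;
let var2 = 2;
let var3 = 3;
let var4 = 4;
```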
Some regex
import * as _ from 'underscore';
import * as colors from 'colors';
const CARRIAGE_RETURN = /\n/;
const EOL = /\r/;
const WHITESPACE = /\s/;
const NUMBERS = /[0-9]/;
const DECLARABLE_CHARACTERS = /[A-Za-z_.$]/i;
The AST
{
"tokens":[{
"type":"name",
"value":"import"
},{
"type":"operator",
"value":"*"
},{
"type":"name",
"value":"as"
},{
"type":"name",
"value":"_"
},{
"type":"name",
"value":"from"
},{
"type":"string",
"value":"underscore"
},{
"type":"statementSeperator",
"value":";"
},{
"type":"carriagereturn",
"value":1
},{
"type":"name",
"value":"import"
},{
"type":"operator",
"value":"*"
},{
"type":"name",
"value":"as"
},{
"type":"name",
"value":"colors"
},{
"type":"name",
"value":"from"
},{
"type":"string",
"value":"colors"
},{
"type":"statementSeperator",
"value":";"
},{
"type":"carriagereturn",
"value":2
},{
"type":"carriagereturn",
"value":3
},{
"type":"const",
"value":[{
"type":"name",
"value":"CARRIAGE_RETURN"
},{
"type":"assigner",
"value":"="
},{
"type":"assignee",
"value":"/\\n/"
}]
},{
"type":"carriagereturn",
"value":4
},{
"type":"const",
"value":[{
"type":"name",
"value":"EOL"
},{
"type":"assigner",
"value":"="
},{
"type":"assignee",
"value":"/\\r/"
}]
},{
"type":"carriagereturn",
"value":5
},{
"type":"const",
"value":[{
"type":"name",
"value":"WHITESPACE"
},{
"type":"assigner",
"value":"="
},{
"type":"assignee",
"value":"/\\s/"
}]
},{
"type":"carriagereturn",
"value":6
},{
"type":"const",
"value":[{
"type":"name",
"value":"NUMBERS"
},{
"type":"assigner",
"value":"="
},{
"type":"assignee",
"value":"/[0-9]/"
}]
},{
"type":"carriagereturn",
"value":7
},{
"type":"const",
"value":[{
"type":"name",
"value":"DECLARABLE_CHARACTERS"
},{
"type":"assigner",
"value":"="
},{
"type":"assignee",
"value":"/[A-Za-z_.$]/i"
}]
}],
"current":211
}
The generated code
(it's the same...)
import * as _ from 'underscore';
import * as colors from 'colors';
const CARRIAGE_RETURN = /\n/;
const EOL = /\r/;
const WHITESPACE = /\s/;
const NUMBERS = /[0-9]/;
const DECLARABLE_CHARACTERS = /[A-Za-z_.$]/i;
Trixie code
if(true===false&&!!(false>=true)||false!=true)
{
let result =(((5*5)%5)/5);
console.log('test',()=>{return/** random inline comment */!!true})
//inline multichar ternary/comparator/operator test
}
Some string literals
const string = `something ${1+2+3}`;
const string2 = `something ${(true)?'x':'y'}`;
const string3 = `something
another ${'x'}
new line ${1+2+3}
test`;
The AST
{
"tokens":[{
"type":"carriagereturn",
"value":1
},{
"type":"const",
"value":[{
"type":"name",
"value":"string"
},{
"type":"assigner",
"value":"="
},{
"type":"stringLiteral",
"value":"something ${1 + 2 + 3}"
}]
},{
"type":"carriagereturn",
"value":2
},{
"type":"const",
"value":[{
"type":"name",
"value":"string"
},{
"type":"number",
"value":"2"
},{
"type":"assigner",
"value":"="
},{
"type":"stringLiteral",
"value":"something ${(true) ? 'x' : 'y'}"
}]
},{
"type":"carriagereturn",
"value":3
},{
"type":"const",
"value":[{
"type":"name",
"value":"string"
},{
"type":"number",
"value":"3"
},{
"type":"assigner",
"value":"="
},{
"type":"stringLiteral",
"value":"something\nanother ${'x'}\nnew line ${1 + 2 + 3}\ntest"
}]
}],
"current":163
}
A class
export default class {
  constructor() {
  }
  randomFn() {
    return 'this is awesome'
  }
}
The AST for that class
{
"tokens":[{
"type":"name",
"value":"export"
},{
"type":"name",
"value":"default"
},{
"type":"name",
"value":"class"
},{
"type":"codeblock",
"value":[{
"type":"carriagereturn",
"value":1
},{
"type":"name",
"value":"constructor"
},{
"type":"params",
"value":[]
},{
"type":"codeblock",
"value":[{
"type":"carriagereturn",
"value":2
},{
"type":"carriagereturn",
"value":3
}]
},{
"type":"carriagereturn",
"value":4
},{
"type":"name",
"value":"randomFn"
},{
"type":"params",
"value":[]
},{
"type":"codeblock",
"value":[{
"type":"carriagereturn",
"value":5
},{
"type":"name",
"value":"return"
},{
"type":"string",
"value":"this is awesome"
},{
"type":"carriagereturn",
"value":6
}]
},{
"type":"carriagereturn",
"value":7
}]
}],
"current":107
}
React app file
Note: you can obviously make it a bit smarter for complex frameworks by injecting your own lexer functions. They will be called first and can tell the rest of the app where to continue from once your magical function has done its magic.
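For instance (everything below is hypothetical; the tokenizer has no built-in JSX support, and the null-on-non-match convention is an assumption), an injected lexer could swallow a simple JSX tag as one opaque token and report where the tokenizer should resume:

```javascript
// Hypothetical injected lexer: consumes a '<div ...>' style tag as a
// single opaque token and tells the tokenizer where to continue from.
function jsxTagLexer(char, current, input) {
  if (char !== '<') {
    return null; // not a tag -- fall through to the built-in checks
  }
  const close = input.indexOf('>', current);
  if (close === -1) {
    return null; // no closing bracket found; let something else handle it
  }
  return {
    payload: { type: 'jsxTag', value: input.slice(current, close + 1) },
    current: close + 1, // resume scanning just past the tag
  };
}
```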