EPEG.js - Expressive Parsing Expression Grammar
A top down parser that can handle left recursion by using a stack and backtracking.
Typical PEG parser cannot handle left recursion. This project is an attempt to solve this problem by using a stack to detect recursion and backtrack when necessary. I used this paper as inspiration:
http://www.vpri.org/pdf/tr2007002_packrat.pdf
Indirect recursion is not implemented yet.
CokeScript is an example of an CoffeScript-like language implement with EPEG https://github.com/batiste/CokeScript/
EPEG.js is able to provide quite accurate grammar parsing error messages that are directly formatted.
Example of a valid grammar
var tokensDef = key:"number" reg:/^-?[0-9]+\.?[0-9]*/ key:"operator" reg:/^[-|\+|\*|/|%]/ key:"w" reg:/^[ ]/; var grammarDef = // You need to start you grammar with the START rule. "START": rules: "MATH EOF" // The End Of File token is added automatically by the token parser. "MATH": rules: "MATH w operator w number" // This rule is left recursive "number" ; var parser = EPEG; { var AST = parser; if!ASTcomplete throw "Incomplete parsing" } ;;
Public API
The EPEG module expose a public function that return a parser object:
var parser = EPEG; parser;
This parse object only has the a parse method that return an Abstract Syntax Tree.
Other features
Tokenizer options
If a regexp is not enough to find a token you can use a function. The contract is that you need to return the matched string. This string has to be at the start of the input.
tokens = key:"isHello" { ifinput == 'hello' return input; } key:"w" reg:/^[ ]/ key:"n" reg:/^[a-z]+/ key:"n" str:"a string is acceptable too";
A simple static string is also accepted.
Modifiers
Every item in a rule/token in the grammar can use the modifiers *, + and ?. They behave similarly as in regular expressions. E.g using the tokensDef above:
var grammarDef = "REPEAT": rules: "number w" "START": rules: "REPEAT* EOF"; parser = EPEG; ;;; // Should throw an error as the white space is missing
Named tokens and functions hooks
Tokens parsed in the rules can be named. Each rules can have a hook function defined. This function is called at parse time with a single parameter being the map of each named parameter. This map also contains all the matched tokens in order with $0, $1, etc.
{ // We reject the white space params.ws here // it will not apear in the AST return paramsnum1 paramsnum2; // Could also have been written return params$0 params$2;} var grammarDef = "NAMED": rules: "num1:number ws:w num2:number" hooks: numberHook "START": rules: "NAMED EOF"; parser = EPEG; ;
Running the test
$ mocha
✓ Assert input complete bbb
✓ Assert input complete a
✓ Test createParams
100 passing (18ms)