epegjs

0.0.55 • Public • Published

EPEG.js - Expressive Parsing Expression Grammar

A top down parser that can handle left recursion by using a stack and backtracking.

Typical PEG parser cannot handle left recursion. This project is an attempt to solve this problem by using a stack to detect recursion and backtrack when necessary. I used this paper as inspiration:

http://www.vpri.org/pdf/tr2007002_packrat.pdf

Indirect recursion is not implemented yet.

CokeScript is an example of an CoffeScript-like language implement with EPEG https://github.com/batiste/CokeScript/

EPEG.js is able to provide quite accurate grammar parsing error messages that are directly formatted.

Example of a valid grammar

var tokensDef = [
  {key:"number", reg:/^-?[0-9]+\.?[0-9]*/},
  {key:"operator", reg:/^[-|\+|\*|/|%]/},
  {key:"w", reg:/^[ ]/}
];
 
var grammarDef = {
  // You need to start you grammar with the START rule.
  "START": {rules: ["MATH EOF"]},
  // The End Of File token is added automatically by the token parser.
  "MATH": {rules: [
    "MATH w operator w number", // This rule is left recursive
    "number"
  ]}
};
 
var parser = EPEG.compileGrammar(grammarDef, tokensDef);
 
function valid(input) {
  var AST = parser.parse(stream);
  if(!AST.complete) {
    throw "Incomplete parsing"
  }
}
 
valid("1 + 1");
valid("1 + 1 - 4");

Public API

The EPEG module expose a public function that return a parser object:

var parser = EPEG.compileGrammar(grammar definition, tokens definition);
 
parser.parse(input);

This parse object only has the a parse method that return an Abstract Syntax Tree.

Other features

Tokenizer options

If a regexp is not enough to find a token you can use a function. The contract is that you need to return the matched string. This string has to be at the start of the input.

tokens = [
  {key:"isHello", func:function(input) { if(input == 'hello'){ return input; }} },
  {key:"w", reg:/^[ ]/},
  {key:"n", reg:/^[a-z]+/}
  {key:"n", str:"a string is acceptable too"}
];

A simple static string is also accepted.

Modifiers

Every item in a rule/token in the grammar can use the modifiers *, + and ?. They behave similarly as in regular expressions. E.g using the tokensDef above:

var grammarDef = {
  "REPEAT": {rules: ["number w"]}
  "START": {rules: ["REPEAT* EOF"]}
};
 
parser = EPEG.compileGrammar(grammarDef, tokensDef);
 
valid("1 2 3 ");
valid("");
valid("1"); // Should throw an error as the white space is missing

Named tokens and functions hooks

Tokens parsed in the rules can be named. Each rules can have a hook function defined. This function is called at parse time with a single parameter being the map of each named parameter. This map also contains all the matched tokens in order with $0, $1, etc.

 
function numberHook(params) {
  // We reject the white space params.ws here
  // it will not apear in the AST
  return [params.num1, params.num2];
  // Could also have been written
  return [params.$0, params.$2];
}
 
var grammarDef = {
  "NAMED": {rules: ["num1:number ws:w num2:number"], hooks: [numberHook]},
  "START": {rules: ["NAMED EOF"]}
};
 
parser = EPEG.compileGrammar(grammarDef, tokensDef);
 
valid("1 2");

Running the test

$ mocha

✓ Assert input complete bbb
✓ Assert input complete a
✓ Test createParams

100 passing (18ms)

Dependents (1)

Package Sidebar

Install

npm i epegjs

Weekly Downloads

6

Version

0.0.55

License

BSD

Last publish

Collaborators

  • batiste