Top-down parser generator for CoffeeScript
MetaCoffee is a top-down compiler-compiler language written in CoffeeScript and itself. It is a rewrite and redesign of Alex Warth's original OMetaJS.
MetaCoffee parsers work over a stream of anything. Thanks to this, you can parse not only text but also data structures like trees and graphs.
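To make "parsing a data structure" concrete, here is a minimal sketch in plain Python, used purely for illustration. It is not MetaCoffee code, and the helper name `evaluate` and the tree layout are assumptions of this example. It matches nodes of a nested list against two patterns, much as a text parser matches characters.

```python
def evaluate(node):
    """Match a tree node against two patterns and return its value.

    number pattern:    a bare int or float
    operation pattern: a list [operator, left, right]
    """
    if isinstance(node, (int, float)):
        return node
    if isinstance(node, list) and len(node) == 3 and node[0] in ("+", "*"):
        op, left, right = node
        lhs, rhs = evaluate(left), evaluate(right)
        return lhs + rhs if op == "+" else lhs * rhs
    raise ValueError(f"no pattern matches {node!r}")


tree = ["+", 1, ["*", 2, 3]]
print(evaluate(tree))  # 7
```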
To start writing code in MetaCoffee, the easiest approach is to install it with npm:
npm install -g metacoffee
(Leave off the -g if you don't wish to install globally.)
You can also clone this repository from GitHub.
Once installed, you can use the metacoffee command (or node_modules/metacoffee/bin/metacoffee) to compile a grammar file:
metacoffee dest/ src/my-parser.mc
For the pattern matching part, MetaCoffee combines OMetaJS syntax with PEG.js syntax, taking the best from both. The semantic actions are written in normal CoffeeScript, which is awesome on its own. Together with a pinch of white-space significance, this results in a beautiful and simple syntax:
ometa MultiplicativeInterpreter
  expr     = mulExpr
  mulExpr  = mulExpr:x "*" primExpr:y -> x * y
           | mulExpr:x "/" primExpr:y -> x / y
           | primExpr
  primExpr = "(" expr:x ")" -> x
           | number
  number   = "" digit:d -> valueOfDigit d=+digit
console.log MultiplicativeInterpreter.matchAll '((7 * 8) / (8 / 6))', 'expr'
# 42
Yes, as in OMetaJS, MetaCoffee allows for parsers to be included anywhere in CoffeeScript. The second great thing about MetaCoffee is that it's object-oriented! The syntax is familiar:
ometa ArithmeticInterpreter extends MultiplicativeInterpreter
  expr    = addExpr
  addExpr = addExpr:x "+" mulExpr:y -> x + y
          | addExpr:x "-" mulExpr:y -> x - y
          | mulExpr
  mulExpr = mulExpr:x "%" primExpr:y -> x % y
          | ^mulExpr
console.log ArithmeticInterpreter.matchAll '((9 + 8) / (7 % 6))', 'expr'
# 17
You can spot implicit overriding ( expr ), implicit inheritance ( primExpr ) and an explicit super call ( ^mulExpr ). But inheritance is the evil sister to composition:
ometa FruitEvaluator extends ArithmeticInterpreter
  number = FruitParser.fruit
         | ^number
ometa FruitParser
  fruit = "apple"   -> 14
        | "pear"    -> 93
        | "coconut" -> 1
console.log FruitEvaluator.matchAll '2 * apple + 3 * pear', 'expr'
# Yes we can add apples and pears!
# 307
A grammar consists of a list of rules, all indented by the same number of spaces relative to the keyword ometa. Each rule has a unique name followed by optional parameters, an equal sign and the body of the rule. These must come either after the rule name on the same line, or on the following lines, provided they are indented more than the rule name. The parameters behave the same way tokens do, and for now they are really just syntactic sugar: it doesn't matter whether they precede the equal sign or not.
Anything following the equal sign, or indented at least as much as the equal sign, is part of the rule's body. The rule's body consists of a single parsing expression.
There are several types of expressions, very similar to those in PEG.js. The basic ones are listed here from the loosest binding to the highest precedence:
expression1 | expression2 | ... | expressionN
Tries to match the first expression; if it fails, tries the second one, and so on. Returns the result of the first expression that matches. This ordered behavior distinguishes parsing expression grammars, like MetaCoffee, from other types of grammars!
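The ordered choice described above can be sketched in a few lines of plain Python. This is an illustration of the semantics only, not MetaCoffee's implementation; the helpers `literal` and `choice` and the `(result, position)` convention are assumptions of this sketch.

```python
def literal(text):
    """Return a parser that matches `text` at position `pos`."""
    def parse(source, pos):
        if source.startswith(text, pos):
            return text, pos + len(text)  # (result, new position)
        return None  # failure
    return parse


def choice(*alternatives):
    """Ordered choice: return the outcome of the first alternative that succeeds."""
    def parse(source, pos):
        for alternative in alternatives:
            outcome = alternative(source, pos)
            if outcome is not None:
                return outcome
        return None
    return parse


keyword = choice(literal("in"), literal("instanceof"))
print(keyword("instanceof", 0))  # ('in', 2): the first alternative wins
```

Because the first success wins, the order of alternatives matters: here the longer alternative should be listed first if it is ever supposed to match.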
expression -> semanticAction
When the expression matches, the semantic action is executed. Since it is the rightmost part of the expression, its return value becomes the match result of the whole expression. Semantic actions are written in CoffeeScript and behave like function bodies. If they consist of more than one line, the following lines must be indented more than the "tip" of the arrow (->).
expression1 expression2 ... expressionN
Matches expression after expression, returning the last expression's match result.
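As a rough illustration of sequencing, the following plain Python sketch (not MetaCoffee code; `literal` and `sequence` are hypothetical helpers) matches its parts one after another and keeps only the last result.

```python
def literal(text):
    """Return a parser that matches `text` at position `pos`."""
    def parse(source, pos):
        if source.startswith(text, pos):
            return text, pos + len(text)
        return None
    return parse


def sequence(*parts):
    """Match every part in order; the result is the last part's result."""
    def parse(source, pos):
        result = None
        for part in parts:
            outcome = part(source, pos)
            if outcome is None:
                return None  # one failing part fails the whole sequence
            result, pos = outcome
        return result, pos
    return parse


greeting = sequence(literal("hello"), literal(" "), literal("world"))
print(greeting("hello world", 0))  # ('world', 11)
print(greeting("hello there", 0))  # None
```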
Returns the result of recursively applying the given rule.
To be able to manipulate the match results of the subexpressions in a given expression, we label them with identifiers (these must be valid CoffeeScript variable identifiers).
The match results are assigned to these identifiers and can be used anywhere inside the rule body (in semantic actions and as arguments to other rules).
We can express the number of occurrences of an expression more succinctly, similar to regular expressions:
Matches as many occurrences of expression as possible and returns an array of match results.
Matches one or more occurrences of expression and returns an array of match results.
Always succeeds: if expression matches, returns its result; otherwise returns undefined.
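The three repetition forms can be sketched as follows in plain Python. This is an illustration under stated assumptions, not MetaCoffee's implementation: `many` and `optional` are hypothetical helpers, and Python's `None` stands in for `undefined`.

```python
def digit(source, pos):
    """Match one ASCII digit."""
    if pos < len(source) and "0" <= source[pos] <= "9":
        return source[pos], pos + 1
    return None


def many(parser, minimum=0):
    """Greedy repetition; fails only if fewer than `minimum` matches were found."""
    def parse(source, pos):
        results = []
        while True:
            outcome = parser(source, pos)
            if outcome is None:
                break
            value, pos = outcome
            results.append(value)
        return (results, pos) if len(results) >= minimum else None
    return parse


def optional(parser):
    """Always succeeds; the result is None when the parser does not match."""
    def parse(source, pos):
        outcome = parser(source, pos)
        return outcome if outcome is not None else (None, pos)
    return parse


print(many(digit)("42x", 0))             # (['4', '2'], 2)
print(many(digit)("x", 0))               # ([], 0)
print(many(digit, minimum=1)("x", 0))    # None
print(optional(digit)("x", 0))           # (None, 0)
```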
Sometimes we want the behavior of our parser to depend on the input that follows. MetaCoffee provides unlimited positive and negative lookahead.
Succeeds when the expression would match next, but doesn't consume the matched input.
The opposite of positive lookahead, succeeds when the expression would fail next.
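A minimal sketch of both lookahead forms in plain Python, for illustration only. The helper names are hypothetical, and returning the inner result from the positive lookahead is an assumption of this sketch rather than documented MetaCoffee behavior.

```python
def anything(source, pos):
    """Match any single element of the input."""
    if pos < len(source):
        return source[pos], pos + 1
    return None


def lookahead(parser):
    """Positive lookahead: succeed if `parser` matches, but consume nothing."""
    def parse(source, pos):
        outcome = parser(source, pos)
        if outcome is None:
            return None
        value, _ = outcome
        return value, pos  # position is left unchanged
    return parse


def negative_lookahead(parser):
    """Negative lookahead: succeed only if `parser` fails; consume nothing."""
    def parse(source, pos):
        if parser(source, pos) is None:
            return (None, pos)  # success with no result
        return None  # failure
    return parse


# An end-of-input check built as "not anything".
end = negative_lookahead(anything)

print(lookahead(anything)("ab", 0))  # ('a', 0)
print(end("ab", 0))                  # None
print(end("ab", 2))                  # (None, 2)
```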
Runs the semantic action and succeeds if the semantic action returns a true value.
Runs the semantic action and succeeds if the semantic action returns a false value.
Semantic actions can be included anywhere as a subexpression without the need for matching or a return value like this:
This is extremely handy: for example, we can log our progress when running our grammar. If the action does return a value, we can label it and use it like any other expression. Unlike the semantic actions following an arrow, these are not indented to a certain column, so if they take up more than one line, it is advisable to start on the second line.
# With arrow
expression:e -> result = scramble e
                console.log result
# Inside curly braces
expression:e {
  console.log e
  scramble e
}:scr expression2 -> scr
The indentation rules in MetaCoffee might seem convoluted, but in reality they simply match the desired behavior.
Applies the rule, passing in the results of semantic actions (this can be a simple identifier). Care must be taken when using function calls inside the actions, as CoffeeScript by default takes as arguments anything until the end of the line (in this case, until the right parenthesis). At the moment, the arguments are prepended to the input stream, where the rule matches them.
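The idea of passing arguments by placing them at the front of the input can be sketched in plain Python. This is only an illustration of the mechanism described above, not MetaCoffee's code; `apply_with_args` and the rule `in_range` are hypothetical, and the sketch assumes the rule consumes all of its arguments.

```python
def anything(stream, pos):
    """Match any single element of the stream."""
    if pos < len(stream):
        return stream[pos], pos + 1
    return None


def in_range(stream, pos):
    """Hypothetical rule with parameters (low, high): match one item between them.

    The parameters are read from the stream exactly like ordinary input.
    """
    values = []
    for _ in range(3):  # low, high, then the item to test
        outcome = anything(stream, pos)
        if outcome is None:
            return None
        value, pos = outcome
        values.append(value)
    low, high, item = values
    return (item, pos) if low <= item <= high else None


def apply_with_args(rule, stream, pos, *args):
    """Apply `rule` to the remaining input with `args` placed in front of it."""
    extended = list(args) + list(stream[pos:])
    outcome = rule(extended, 0)
    if outcome is None:
        return None
    value, consumed = outcome
    # Translate the position back to the original stream.
    return value, pos + consumed - len(args)


print(apply_with_args(in_range, "a5", 1, "0", "9"))  # ('5', 2)
print(apply_with_args(in_range, "ax", 1, "0", "9"))  # None
```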
Invokes the rule on the Class, which must be a MetaCoffee grammar.
MetaCoffee comes with a large variety of useful built-in rules.
The most basic matching rule: it succeeds if there is at least one element to be matched in the input, and returns it. Anything is also the default when we don't specify the rule to be applied to get the result for a label.
end # defined as !anything
Matches the end of the input. Can be applied an arbitrary number of times.
Useful when parsing a linear structure (for example a string), returns the current position in the input from the beginning.
Matches if the next thing in input is a literal number/string (typeof "number"/"string") and returns it.
Matches if the next thing in input is a string of length 1 (a character) and returns it.
space
spaces # defined as space*
Match a character (space) or a run of characters (spaces) with char code <= 32. This rule can be overridden to exclude or include other things as whitespace.
digit
lower
upper
letter # defined as lower | upper
letterOrDigit # defined as letter | digit
Match one char and check its char code. Lower, upper and letter are only defined on the basic ASCII Latin alphabet.
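To show what checking a char code against the ASCII ranges means in practice, here is a small plain Python sketch. It is illustrative only: `char_range` is a hypothetical helper, and the rule names merely mirror the built-ins listed above.

```python
def char_range(low, high):
    """Return a rule matching one character whose code lies in [low, high]."""
    def rule(source, pos):
        if pos < len(source) and low <= source[pos] <= high:
            return source[pos], pos + 1
        return None
    return rule


digit = char_range("0", "9")
lower = char_range("a", "z")
upper = char_range("A", "Z")


def letter(source, pos):
    # Ordered choice: lower | upper
    return lower(source, pos) or upper(source, pos)


def letter_or_digit(source, pos):
    # Ordered choice: letter | digit
    return letter(source, pos) or digit(source, pos)


print(letter("Q", 0))            # ('Q', 1)
print(letter("é", 0))            # None: outside the basic ASCII Latin alphabet
print(letter_or_digit("7x", 0))  # ('7', 1)
```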
These methods have actual function arguments. When calling them from our grammar, MetaCoffee takes care of using the arguments and not the input stream to pass them in.
Meta rule for MetaCoffee!! Applies a rule of this grammar with the given name. This means one can apply rules dynamically based on the input!
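Applying a rule by name can be sketched in plain Python with a lookup on the grammar object. This only illustrates the idea of choosing a rule from the input; the class `TinyGrammar`, its rules and the one-letter tags are all hypothetical and not part of MetaCoffee.

```python
class TinyGrammar:
    """Hypothetical grammar whose rules are ordinary methods."""

    def digit(self, source, pos):
        if pos < len(source) and "0" <= source[pos] <= "9":
            return source[pos], pos + 1
        return None

    def letter(self, source, pos):
        if pos < len(source) and ("a" <= source[pos] <= "z" or "A" <= source[pos] <= "Z"):
            return source[pos], pos + 1
        return None

    def apply(self, rule_name, source, pos):
        """Look the rule up by name at run time and apply it."""
        return getattr(self, rule_name)(source, pos)

    def tagged(self, source, pos):
        """The first character selects the rule for the second: 'd' or 'l'."""
        if pos >= len(source):
            return None
        rule_name = {"d": "digit", "l": "letter"}.get(source[pos])
        if rule_name is None:
            return None
        return self.apply(rule_name, source, pos + 1)


grammar = TinyGrammar()
print(grammar.tagged("d7", 0))  # ('7', 2)
print(grammar.tagged("lx", 0))  # ('x', 2)
print(grammar.tagged("dx", 0))  # None
```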
Combination of apply and foreign rule invocation.
Compares the next thing in the input with the passed-in argument.