Marklit: modern markdown parser in TypeScript
WARNING: Ready for use, with exceptions: HTML parsing rules are still missing.
This project started as a deeply re-engineered fork of marked, with conceptual differences.
Design goals
- Deep customizability
- Compile-time configuration
- Compact code size
Key features
- The parsing result is an abstract document tree, which allows advanced intermediate processing and non-string rendering
- Extensible architecture which allows adding new parsing rules and document tree element types
- Strictly typed design which uses the full power of TypeScript to avoid runtime errors
- Progressive parser core implementation aimed at maximum parsing speed
HTML support
HTML is not supported at the moment, but support will be added in the future.
Usage tips
Basic setup
Basically, you need to do several things to get a type-safe markdown parser and renderer.
Define types
First, you need to define some types:
- Meta-data type
- Block token types map
- Inline token types map
- Context mapping type
See example below:
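A minimal sketch of such definitions, assuming tokens follow the $ tag and _ sub-tokens convention described under Abstract document tree. Every name in it (Meta, TextToken, InlineTokenMap, BlockTokenMap, ContextMap and so on) is hypothetical and purely illustrative; marklit's own exported types may be shaped differently.

```typescript
// Hypothetical type definitions for a type-safe setup; NOT marklit's own types.

// Meta-data type: information collected while parsing (assumed shape).
interface Meta {
  links: Record<string, { href: string; title?: string }>;
}

// Inline token types: `$` is the tag, `_` holds sub-tokens,
// any other field is a token-specific property.
interface TextToken { $: 'Text'; text: string }
interface StrongToken { $: 'Strong'; _: InlineToken[] }
interface LinkToken { $: 'Link'; link: string; _: InlineToken[] }
type InlineToken = TextToken | StrongToken | LinkToken;

// Inline token types map: tag -> token type.
interface InlineTokenMap {
  Text: TextToken;
  Strong: StrongToken;
  Link: LinkToken;
}

// Block token types and their map.
interface HeadingToken { $: 'Heading'; level: number; _: InlineToken[] }
interface ParagraphToken { $: 'Paragraph'; _: InlineToken[] }
type BlockToken = HeadingToken | ParagraphToken;

interface BlockTokenMap {
  Heading: HeadingToken;
  Paragraph: ParagraphToken;
}

// Context mapping type: which token union each parsing context yields.
interface ContextMap {
  Block: BlockToken;
  Inline: InlineToken;
}

// The unions above are written out by hand: this is the "handwork" with types.
const meta: Meta = { links: { home: { href: 'https://example.com' } } };
const bold: InlineTokenMap['Strong'] = { $: 'Strong', _: [{ $: 'Text', text: 'bold' }] };
const title: BlockTokenMap['Heading'] = { $: 'Heading', level: 1, _: [{ $: 'Text', text: 'Hello' }] };

const doc: ContextMap['Block'][] = [
  title,
  { $: 'Paragraph', _: [bold, { $: 'Link', link: meta.links.home.href, _: [{ $: 'Text', text: 'home' }] }] },
];

// Narrowing on the `$` tag gives access to type-specific properties.
function headingLevels(blocks: BlockToken[]): number[] {
  const levels: number[] = [];
  for (const b of blocks) {
    if (b.$ === 'Heading') levels.push(b.level);
  }
  return levels;
}

console.log(headingLevels(doc)); // [ 1 ]
```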
Init parser
Next, initialize the parser using the normal parsing rules, then parse markdown with it to get the abstract document tree.
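To show what the parse step produces, here is a deliberately tiny, hypothetical block parser. The function parseBlocks and the token shapes are assumptions made for illustration: it understands only ATX headings and paragraphs and skips inline parsing, whereas marklit's parser is driven by rulesets.

```typescript
// Hypothetical toy block parser producing ADT tokens ($ tag, _ children).
interface TextToken { $: 'Text'; text: string }
interface HeadingToken { $: 'Heading'; level: number; _: TextToken[] }
interface ParagraphToken { $: 'Paragraph'; _: TextToken[] }
type BlockToken = HeadingToken | ParagraphToken;

function parseBlocks(src: string): BlockToken[] {
  const tokens: BlockToken[] = [];
  // Blocks are separated by one or more blank lines.
  for (const block of src.split(/\n\s*\n/)) {
    const trimmed = block.trim();
    if (trimmed === '') continue;
    // ATX heading: 1-6 '#' characters followed by a space.
    const heading = /^(#{1,6}) +(.*)$/.exec(trimmed);
    if (heading) {
      tokens.push({ $: 'Heading', level: heading[1].length, _: [{ $: 'Text', text: heading[2] }] });
    } else {
      // Everything else becomes a paragraph; inline parsing is omitted here.
      tokens.push({ $: 'Paragraph', _: [{ $: 'Text', text: trimmed }] });
    }
  }
  return tokens;
}

const adt = parseBlocks('# Title\n\nSome *text*\non two lines.');
console.log(JSON.stringify(adt));
```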
...and the renderer: initialize it using the basic HTML render rules, then render the abstract document tree to get an HTML string.
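The render step is a walk over the tree that dispatches on each token's $ tag. The sketch below is a hypothetical HTML renderer for a toy token set; renderHtml, renderToken, escapeHtml and the token shapes are assumptions, not marklit's render API.

```typescript
// Toy token set (hypothetical shapes).
interface TextToken { $: 'Text'; text: string }
interface StrongToken { $: 'Strong'; _: InlineToken[] }
type InlineToken = TextToken | StrongToken;
interface ParagraphToken { $: 'Paragraph'; _: InlineToken[] }
interface HeadingToken { $: 'Heading'; level: number; _: InlineToken[] }
type Token = ParagraphToken | HeadingToken | InlineToken;

// Escape characters that are special in HTML text content.
function escapeHtml(s: string): string {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

// Render a list of tokens by rendering each one and concatenating.
function renderHtml(tokens: Token[]): string {
  return tokens.map(renderToken).join('');
}

// One case per tag; children in `_` are rendered recursively.
function renderToken(t: Token): string {
  switch (t.$) {
    case 'Text': return escapeHtml(t.text);
    case 'Strong': return `<strong>${renderHtml(t._)}</strong>`;
    case 'Heading': return `<h${t.level}>${renderHtml(t._)}</h${t.level}>\n`;
    case 'Paragraph': return `<p>${renderHtml(t._)}</p>\n`;
    default: {
      const unknown: never = t;
      throw new Error(`unknown token ${JSON.stringify(unknown)}`);
    }
  }
}

const html = renderHtml([
  { $: 'Heading', level: 2, _: [{ $: 'Text', text: 'A < B' }] },
  { $: 'Paragraph', _: [{ $: 'Text', text: 'Hello, ' }, { $: 'Strong', _: [{ $: 'Text', text: 'world' }] }] },
]);
console.log(html);
```

Because rendering is just one function per tag, the same walk could create DOM nodes instead of strings.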
All together
A complete configuration combines these steps: initialize the parser using the normal parsing rules and the renderer using the basic HTML render rules, parse markdown to get the abstract document tree, then render that tree to get an HTML string.
GitHub-flavored markdown
A configuration which uses GFM rules instead of the normal ones follows the same steps: initialize the parser with the GFM rules, initialize the renderer using the basic HTML render rules, parse markdown to get the abstract document tree, then render it to get an HTML string.
Using extensions
The design of marklit allows you to modify the behavior of any rule in order to extend the parser and renderer.
An extension includes rules and rule modifiers, which allow deep customization.
Writing rulesets
You can override existing rules in rulesets like BlockNormal, InlineNormal and BlockGfm by appending modified rules.
Or you can create your own rulesets using existing or new rules.
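The override-by-appending idea can be modelled as a merge keyed by the token tag a rule produces. This is a hypothetical sketch: the Rule shape, the InlineBase stand-in and the ruleset helper are invented, and keying overrides by token tag is an assumption, not a description of marklit's internals.

```typescript
// Hypothetical rule shape (not marklit's): the token tag it produces plus a regexp.
interface Rule {
  tag: string;
  re: RegExp;
}

// Stand-in for a built-in ruleset such as InlineNormal; contents invented.
const InlineBase: Rule[] = [
  { tag: 'Code', re: /`([^`]+)`/ },
  { tag: 'Text', re: /[^`]+/ },
];

// Build a ruleset from rule lists: a later rule for the same tag replaces the
// earlier one in place (Map keeps the first insertion position); rules for
// new tags are added at the end.
function ruleset(...rules: Rule[]): Rule[] {
  const byTag = new Map<string, Rule>();
  for (const rule of rules) byTag.set(rule.tag, rule);
  return Array.from(byTag.values());
}

// A modified Text rule that also stops at '*', plus a brand-new Strong rule.
const MyText: Rule = { tag: 'Text', re: /[^`*]+/ };
const MyStrong: Rule = { tag: 'Strong', re: /\*\*([^*]+)\*\*/ };

const MyInline = ruleset(...InlineBase, MyText, MyStrong);
console.log(MyInline.map((r) => r.tag).join(', ')); // Code, Text, Strong
```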
The topics below show how to customize behavior using extensions:
GFM breaks
You can extend the normal text rule to GFM, or to GFM with breaks.
Or simply use existing GFM text rules:
import { GfmTextSpan, GfmBreaksTextSpan } from 'marklit';
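GFM with breaks turns every newline inside a paragraph into a hard line break (marked's breaks option behaves the same way). A rule modifier can add that behaviour to any text rule. In this hypothetical sketch a text rule is simply a function from source text to inline tokens, and withBreaks, plainText and the Break token are invented names.

```typescript
interface TextToken { $: 'Text'; text: string }
interface BreakToken { $: 'Break' } // hypothetical hard line break token
type InlineToken = TextToken | BreakToken;

// A text rule modelled as a function from source text to inline tokens (assumption).
type TextRule = (src: string) => InlineToken[];

// Plain text rule: the whole input becomes a single Text token.
const plainText: TextRule = (src) => [{ $: 'Text', text: src }];

// Hypothetical modifier: each newline becomes a Break token, and the text
// between newlines is handed to the wrapped rule.
function withBreaks(rule: TextRule): TextRule {
  return (src) => {
    const out: InlineToken[] = [];
    src.split('\n').forEach((line, i) => {
      if (i > 0) out.push({ $: 'Break' });
      if (line !== '') out.push(...rule(line));
    });
    return out;
  };
}

const breaksText = withBreaks(plainText);
console.log(JSON.stringify(breaksText('first line\nsecond line')));
```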
SmartyPants
You can add SmartyPants support to any text rule. For example, a custom SmartypantsTextSpan rule overrides the default TextSpan rule which comes from the InlineNormal ruleset, and a custom SmartypantsGfmTextSpan rule overrides the default GfmTextSpan rule which comes from the InlineGfm ruleset.
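A minimal sketch of the idea, assuming a text rule can be modelled as a function that returns Text tokens. The substitutions are modelled on those marked applies with its smartypants option (dashes, curly quotes, ellipses); smartypants and withSmartypants are hypothetical helpers, not marklit's SmartypantsTextSpan.

```typescript
interface TextToken { $: 'Text'; text: string }
// A text rule modelled as a function from source text to Text tokens (assumption).
type TextRule = (src: string) => TextToken[];

// SmartyPants-style substitutions, applied in this order.
function smartypants(text: string): string {
  return text
    .replace(/---/g, '\u2014') // em-dashes
    .replace(/--/g, '\u2013') // en-dashes
    .replace(/(^|[-\u2014\/(\[{"\s])'/g, '$1\u2018') // opening single quotes
    .replace(/'/g, '\u2019') // closing single quotes and apostrophes
    .replace(/(^|[-\u2014\/(\[{\u2018\s])"/g, '$1\u201c') // opening double quotes
    .replace(/"/g, '\u201d') // closing double quotes
    .replace(/\.{3}/g, '\u2026'); // ellipses
}

// Hypothetical modifier: wraps any text rule and post-processes its Text tokens.
function withSmartypants(rule: TextRule): TextRule {
  return (src) => rule(src).map((t) => ({ ...t, text: smartypants(t.text) }));
}

const plainText: TextRule = (src) => [{ $: 'Text', text: src }];
const smartText = withSmartypants(plainText);
console.log(smartText(`"Hi" -- it's...`)[0].text); // “Hi” – it’s…
```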
Math extension
The math extension includes two rules:
- Inline math enclosed by $ signs, like inline code (MathSpan)
- Block math enclosed by a series of $ signs, like fenced code blocks (MathBlock)
You can use either of these rules or both together.
To enable math, define an inline token type with math, a block token type with math and an inline context with math, then append the math rules to the normal block and inline rules.
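The two syntaxes can be sketched with regexps adapted from markdown's code span and fenced code block patterns. The token shapes, the matcher functions and the minimum fence length of two $ signs are assumptions for illustration; the real MathSpan and MathBlock rules may differ.

```typescript
// Hypothetical token shapes for the two math rules.
interface MathSpanToken { $: 'MathSpan'; math: string }
interface MathBlockToken { $: 'MathBlock'; math: string }

// Inline math: a run of '$' opens, a run of the same length closes (like a code span).
const mathSpanRe = /^(\$+)([^$]|[^$][\s\S]*?[^$])\1(?!\$)/;

// Block math: a fence of two or more '$' on its own line, closed by an equal
// or longer fence (like a fenced code block). The minimum of two is assumed.
const mathBlockRe = /^ {0,3}(\$\$+)[ \t]*\n([\s\S]*?)\n {0,3}\1\$*[ \t]*(?:\n|$)/;

// Each matcher returns the token plus the number of consumed characters.
function matchMathSpan(src: string): { token: MathSpanToken; length: number } | undefined {
  const m = mathSpanRe.exec(src);
  return m ? { token: { $: 'MathSpan', math: m[2] }, length: m[0].length } : undefined;
}

function matchMathBlock(src: string): { token: MathBlockToken; length: number } | undefined {
  const m = mathBlockRe.exec(src);
  return m ? { token: { $: 'MathBlock', math: m[2] }, length: m[0].length } : undefined;
}

console.log(matchMathSpan('$x^2$ and more'));
console.log(matchMathBlock('$$\na^2 + b^2\n$$\n'));
```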
Abbreviations
The abbrevs extension consists of three parts:
- Block rule (AbbrevBlock)
- Text rule modifier (abbrevText)
- Inline rule (Abbrev)
Usually you need the first two parts to get automatic abbreviations. The third rule adds extra forced abbreviations to the inline context.
To enable abbreviations, define an inline token type with abbreviations, keep the normal block token type, define an inline context with abbreviations, then append the abbreviation rules to the normal block and inline rules.
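The split between a block rule and a text rule modifier can be sketched as two functions: one collects definitions from the document, the other splits plain text into Text and Abbrev tokens. This assumes the PHP Markdown Extra definition syntax *[HTML]: Hyper Text Markup Language; collectAbbrevs, splitAbbrevs and the token shapes are hypothetical.

```typescript
interface TextToken { $: 'Text'; text: string }
interface AbbrevToken { $: 'Abbrev'; text: string; title: string } // hypothetical shape
type InlineToken = TextToken | AbbrevToken;

// Block part: collect `*[ABBR]: Full title` definitions (assumed syntax).
function collectAbbrevs(src: string): Record<string, string> {
  const abbrevs: Record<string, string> = Object.create(null);
  const re = /^\*\[([^\]]+)\]:[ \t]*(.*)$/gm;
  let m: RegExpExecArray | null;
  while ((m = re.exec(src)) !== null) abbrevs[m[1]] = m[2].trim();
  return abbrevs;
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Text part: split plain text into Text and Abbrev tokens. The \b boundaries
// assume abbreviations start and end with word characters.
function splitAbbrevs(text: string, abbrevs: Record<string, string>): InlineToken[] {
  // Longest names first, so a longer abbreviation wins over its prefix.
  const names = Object.keys(abbrevs).sort((a, b) => b.length - a.length);
  if (names.length === 0) return [{ $: 'Text', text }];
  const re = new RegExp(`\\b(?:${names.map(escapeRegExp).join('|')})\\b`, 'g');
  const out: InlineToken[] = [];
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    if (m.index > last) out.push({ $: 'Text', text: text.slice(last, m.index) });
    out.push({ $: 'Abbrev', text: m[0], title: abbrevs[m[0]] });
    last = m.index + m[0].length;
  }
  if (last < text.length) out.push({ $: 'Text', text: text.slice(last) });
  return out;
}

const abbrevs = collectAbbrevs('*[HTML]: Hyper Text Markup Language\n*[W3C]: World Wide Web Consortium');
const tokens = splitAbbrevs('The HTML spec by W3C.', abbrevs);
console.log(tokens.map((t) => (t.$ === 'Abbrev' ? `[${t.text}: ${t.title}]` : t.text)).join(''));
// The [HTML: Hyper Text Markup Language] spec by [W3C: World Wide Web Consortium].
```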
Footnotes
The footnotes extension includes two rules:
- Inline footnote reference rule (Footnote)
- Footnotes block rule (FootnotesBlock)
You need both rules to get working footnotes.
To enable footnotes, define an inline token type with footnote references, a block token type with the footnotes list and an inline context with footnotes, then append the footnote rules to the normal block and inline rules.
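A sketch of how the two rules cooperate: references are numbered in order of first use, and the footnotes block lists the defined notes in that order. The [^label] reference and [^label]: text definition syntax, as well as collectFootnotes and the token shapes, are assumptions for illustration.

```typescript
// Hypothetical token shapes for the two footnote rules.
interface FootnoteRef { $: 'Footnote'; label: string; num: number }
interface FootnoteItem { label: string; num: number; text: string }
interface FootnotesBlock { $: 'FootnotesBlock'; _: FootnoteItem[] }

function collectFootnotes(src: string): { refs: FootnoteRef[]; block: FootnotesBlock } {
  // Block part: definitions `[^label]: text` at the start of a line (assumed syntax).
  const defs: Record<string, string> = Object.create(null);
  const defRe = /^\[\^([^\]\s]+)\]:[ \t]*(.*)$/gm;
  let m: RegExpExecArray | null;
  while ((m = defRe.exec(src)) !== null) defs[m[1]] = m[2];

  // Inline part: references `[^label]` not followed by ':', numbered by first use.
  const nums: Record<string, number> = Object.create(null);
  const order: string[] = [];
  const refs: FootnoteRef[] = [];
  const refRe = /\[\^([^\]\s]+)\](?!:)/g;
  while ((m = refRe.exec(src)) !== null) {
    const label = m[1];
    if (!(label in nums)) {
      order.push(label);
      nums[label] = order.length;
    }
    refs.push({ $: 'Footnote', label, num: nums[label] });
  }

  // The footnotes block lists referenced, defined notes in reference order.
  const items = order
    .filter((label) => label in defs)
    .map((label) => ({ label, num: nums[label], text: defs[label] }));
  return { refs, block: { $: 'FootnotesBlock', _: items } };
}

const src = 'Text[^a] and more[^b], again[^a].\n\n[^b]: Second note.\n[^a]: First note.';
const { refs, block } = collectFootnotes(src);
console.log(refs.map((r) => r.num).join(',')); // 1,2,1
console.log(block._.map((i) => `${i.num}. ${i.text}`).join(' | ')); // 1. First note. | 2. Second note.
```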
Inline footnotes
TODO:
Table of contents
TODO:
Basic ideas
Abstract document tree
Traditionally, markdown parsers generate HTML as their result. This is simple, but not very useful in more advanced use cases. For example, when you need intermediate processing or direct rendering to a DOM tree, an abstract document tree (ADT) is much more convenient.
The marklit ADT is a JSON tree of block and inline elements called tokens.
Each token is a simple object with a $ field as its tag, an optional _ field with a list of sub-tokens, and optionally several other type-specific fields called properties.
TODO: Document tree examples.
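As an illustration of intermediate processing, the sketch below walks a hypothetical document tree and builds a heading outline without rendering anything. The tags (Heading, Paragraph, Text, Strong) and properties (level, text) are assumed for the example, not taken from marklit's token definitions.

```typescript
// Generic view of a token: `$` tag, optional `_` sub-tokens, other fields are properties.
interface Token {
  $: string;
  _?: Token[];
  [property: string]: unknown;
}

// Hypothetical document tree; tag and property names are assumed for the example.
const doc: Token[] = [
  { $: 'Heading', level: 1, _: [{ $: 'Text', text: 'Intro' }] },
  { $: 'Paragraph', _: [{ $: 'Text', text: 'Hello ' }, { $: 'Strong', _: [{ $: 'Text', text: 'world' }] }] },
  { $: 'Heading', level: 2, _: [{ $: 'Text', text: 'Details' }] },
];

// Concatenate the text of all Text tokens below a token.
function plainText(token: Token): string {
  if (token.$ === 'Text') return String(token.text);
  return (token._ ?? []).map(plainText).join('');
}

// Intermediate processing: build an outline from headings without rendering.
function outline(tokens: Token[]): { level: number; title: string }[] {
  return tokens
    .filter((t) => t.$ === 'Heading')
    .map((t) => ({ level: Number(t.level), title: plainText(t) }));
}

console.log(JSON.stringify(outline(doc))); // [{"level":1,"title":"Intro"},{"level":2,"title":"Details"}]
```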
Extensibility
The architecture of marked does not allow you to add new rules. You may only modify the regexps of existing rules and write your own string-based renderer.
The marked-ast library partially solves the renderer problem, but still doesn't allow adding new rules.
The simple-markdown library from Khan Academy has good extensibility, but it is not as fast a parser as marked.
Because parsing speed is one of the important goals of this project, it needed a solution which gives extensibility without sacrificing speed.
Type-safety
By design, the abstract document tree itself should be strictly typed. But because TypeScript doesn't yet support circular type references, token type inference cannot be implemented for now, so you need to do a little handwork with types here.
Speedup parsing
The marked parser tries the regexp of each rule in turn until the first match occurs. This is slower than it could be, because the JS engine has to run a separate match for every regexp.
Marklit instead constructs a single regexp from all rules, so matching is done for all rules at once. This technique moves the workload from the JS side to the built-in RegExp engine.
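A minimal sketch of the single-regexp technique: each rule's regexp source is wrapped in a capture group, the groups are joined into one alternation, and the first participating wrapper group identifies the rule that matched. Rule, compile and tokenize are hypothetical names, not marklit's implementation. The sketch assumes rule sources contain no numbered backreferences, since wrapping shifts group numbers.

```typescript
// Hypothetical rule: a regexp source (no anchors, no numbered backreferences)
// and a builder that turns the rule's match groups into a token.
interface Rule<T> {
  name: string;
  re: string;
  build: (groups: string[]) => T; // groups[0] is the whole match of this rule
}

interface Entry<T> {
  rule: Rule<T>;
  group: number; // index of the wrapping group in the combined match
  size: number;  // number of capturing groups inside the rule itself
}

// Count capturing groups by matching the empty string against `src|`.
function groupCount(src: string): number {
  const m = new RegExp(`${src}|`).exec('');
  return m === null ? 0 : m.length - 1;
}

function compile<T>(rules: Rule<T>[]): { re: RegExp; entries: Entry<T>[] } {
  let group = 1;
  const entries = rules.map((rule) => {
    const entry: Entry<T> = { rule, group, size: groupCount(rule.re) };
    group += entry.size + 1; // the wrapping group plus the rule's own groups
    return entry;
  });
  // One alternation for all rules; the sticky flag anchors matches at lastIndex.
  const re = new RegExp(rules.map((r) => `(${r.re})`).join('|'), 'y');
  return { re, entries };
}

function tokenize<T>(c: { re: RegExp; entries: Entry<T>[] }, src: string): T[] {
  const out: T[] = [];
  c.re.lastIndex = 0;
  while (c.re.lastIndex < src.length) {
    const pos = c.re.lastIndex;
    const m = c.re.exec(src);
    if (m === null || m[0] === '') throw new Error(`no rule matches at offset ${pos}`);
    // The first wrapping group that took part in the match identifies the rule.
    for (const e of c.entries) {
      if (m[e.group] !== undefined) {
        out.push(e.rule.build(m.slice(e.group, e.group + e.size + 1)));
        break;
      }
    }
  }
  return out;
}

type Tok =
  | { $: 'Strong'; text: string }
  | { $: 'Code'; code: string }
  | { $: 'Text'; text: string };

const inline = compile<Tok>([
  { name: 'Strong', re: '\\*\\*([^*]+)\\*\\*', build: (g) => ({ $: 'Strong', text: g[1] }) },
  { name: 'Code', re: '`([^`]+)`', build: (g) => ({ $: 'Code', code: g[1] }) },
  { name: 'Text', re: '[^*`]+|[*`]', build: (g) => ({ $: 'Text', text: g[0] }) },
]);

const tokens = tokenize(inline, 'Use **bold** and `code`.');
console.log(JSON.stringify(tokens));
```

The sticky y flag anchors each match at lastIndex, so the engine tries all alternatives at the current position in one native call instead of one exec call per rule.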
Benchmarking
Because the marklit pipeline includes an ADT stage, it differs too much from other markdown-to-HTML parsers, so benchmarks would not give comparable results.