@mazard/scanner
1.2.0 • Public • Published

Mazard Scanner

This scanner converts a Markdown document into an array of tokens, which a parser can then interpret into an expression tree. It draws much of its inspiration from Robert Nystrom's Crafting Interpreters and from The Theory of Parsing, Translation, and Compiling by Alfred Aho and Jeffrey Ullman.
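
To make the idea of "document in, tokens out" concrete, here is a minimal sketch of a rule-based scanner loop in the spirit of Crafting Interpreters. It is not this package's implementation: `MiniToken`, `RULES`, and `miniScan` are hypothetical names, and the handful of token types is a small subset chosen for illustration.

```typescript
// Hypothetical token shape for this sketch; not the package's actual type.
type MiniTokenType =
  | "ASTERISK_ASTERISK" | "ASTERISK" | "TILDE_TILDE"
  | "SPACE" | "BR" | "TEXT" | "EOF";

interface MiniToken {
  type: MiniTokenType;
  lexeme: string;
  line: number;   // zero-based
  column: number; // zero-based
}

// Ordered rules: the first pattern that matches at the cursor wins,
// so two-character tokens are listed before their one-character prefixes.
const RULES: Array<[MiniTokenType, RegExp]> = [
  ["BR", /^\n+/],
  ["SPACE", /^ +/],
  ["ASTERISK_ASTERISK", /^\*\*/],
  ["ASTERISK", /^\*/],
  ["TILDE_TILDE", /^~~/],
  ["TEXT", /^[^\s*~]+/],
  ["TEXT", /^[\s\S]/], // fallback: consume any single character
];

function miniScan(source: string): MiniToken[] {
  const tokens: MiniToken[] = [];
  let offset = 0;
  let line = 0;
  let column = 0;

  while (offset < source.length) {
    const rest = source.slice(offset);
    for (const [type, pattern] of RULES) {
      const match = pattern.exec(rest);
      if (!match) continue;
      const lexeme = match[0];
      // Record the token at the position where it starts.
      tokens.push({ type, lexeme, line, column });
      // Advance the cursor, resetting the column after each line break.
      for (let i = 0; i < lexeme.length; i++) {
        if (lexeme[i] === "\n") {
          line++;
          column = 0;
        } else {
          column++;
        }
      }
      offset += lexeme.length;
      break;
    }
  }

  tokens.push({ type: "EOF", lexeme: "", line, column });
  return tokens;
}

const demo = miniScan("a *b*\n~~c~~");
console.log(demo.map((t) => `${t.type}@${t.line}:${t.column}`).join(" "));
```

Listing the rules in priority order keeps the loop itself trivial; the real scanner recognizes far more token types and may be structured quite differently.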

Token types

| Type | Description | Example |
| --- | --- | --- |
| `SYMBOL` | An alphanumeric string that closely resembles a variable name in other languages | `Foo`, `foo`, `foo-bar`, `foo_bar` |
| `RUNE` | Similar to a symbol, but the string contains non-alphanumeric content | `Foo#`, `-foo`, `_foo`, `1foo`, `fo>o` |
| `NUMBER` | An integer, a decimal, or a number in exponential notation | `1`, `1.0`, `+1`, `-1`, `1.0e1` |
| `SPACE` | One or more space characters. The literal value is the number of spaces encountered. | |
| `TAB` | A `"\t"` or `" "` at the start of a line | |
| `BR` | One or more line break characters | |
| `COLON` | A, well, colon | `:` |
| `COLON_COLON` | Two colons in sequence, likely indicating an Obsidian metadata value | `::` |
| `FRONTMATTER_START` | The triple dash at the start of a frontmatter section | `---` |
| `FRONTMATTER_END` | The triple dash at the end of a frontmatter section | `---` |
| `FRONTMATTER_KEY` | A frontmatter key | The `foo` in `foo: bar` |
| `FRONTMATTER_VALUE` | A frontmatter value | The `bar` in `foo: bar` |
| `FRONTMATTER_BULLET` | A dash at the beginning of a line | The `-` in `- bar` |
| `CODE_START` | The triple backtick at the start of a code section | `` ``` `` |
| `CODE_LANGUAGE` | The language specified after the triple backtick of a `CODE_START` | The `typescript` in `` ```typescript `` |
| `CODE_KEY` | Similar to frontmatter, code blocks can have keys and values after the `CODE_START` | The `foo` in `foo: bar` |
| `CODE_VALUE` | A code metadata value | The `bar` in `foo: bar` |
| `CODE_SOURCE` | The source code inside a code block | |
| `CODE_END` | The triple backtick at the end of a code section | `` ``` `` |
| `HHASH` | A run of one to six hash characters at the beginning of a line | The `###` in `### Foo` |
| `HGTHAN` | A `>` at the beginning of a line | The `>` in `> Foo` |
| `L_BRACKET` | A single left bracket | `[` |
| `LL_BRACKET` | Two left brackets | `[[` |
| `R_BRACKET` | A single right bracket | `]` |
| `RR_BRACKET` | Two right brackets | `]]` |
| `LL_BRACE` | Two left braces | `{{` |
| `RR_BRACE` | Two right braces | `}}` |
| `ASTERISK` | A single asterisk | `*` |
| `ASTERISK_ASTERISK` | Two asterisks | `**` |
| `EQUALS_EQUALS` | Two equals signs | `==` |
| `ORDINAL` | A number with an ordinal suffix | `1st`, `2nd`, `3rd`, `4th` |
| `PIPE` | A pipe (vertical bar) | `\|` |
| `TAG` | A symbol prefixed with a hash character | `#tag`, `#tag-foo`, `#tag1` |
| `TILDE_TILDE` | Two tildes | `~~` |
| `ESCAPE` | A backslash followed by any character | `\\|` |
| `L_PAREN` | A left parenthesis | `(` |
| `R_PAREN` | A right parenthesis | `)` |
| `BACKTICK` | A single backtick | `` ` `` |
| `DOLLAR` | A dollar sign | `$` |
| `DOLLAR_DOLLAR` | Two dollar signs | `$$` |
| `PERCENT_PERCENT` | Two percent signs | `%%` |
| `COMMENT` | The content of a comment | The `A comment` in `%% A comment` |
| `HTML_TAG` | An HTML tag | `<div>`, `</div>`, `<p />` |
| `HR` | A horizontal rule | `---`, `***`, `___` |
| `BULLET` | A dash or asterisk at the beginning of a line | The `-` in `- foo` |
| `N_BULLET` | A numbered bullet at the beginning of a line | The `1.` in `1. foo` |
| `CHECKBOX` | A checkbox at the beginning of a line | The `- [ ]` in `- [ ] foo` |
| `URL` | A URL | `https://www.google.com` |
| `EOF` | The very end of the string or file | |
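
The word-like token types (`SYMBOL`, `RUNE`, `NUMBER`, `ORDINAL`, `TAG`) are easy to confuse, so the sketch below classifies a single word using patterns inferred purely from the examples in the table. The regular expressions and the `classifyWord` helper are assumptions made for illustration; the scanner's actual rules may differ, particularly at the edges.

```typescript
type WordKind = "NUMBER" | "ORDINAL" | "TAG" | "SYMBOL" | "RUNE";

// Patterns inferred from the table's examples; assumptions, not the package's own rules.
const NUMBER_PATTERN = /^[+-]?\d+(\.\d+)?(e[+-]?\d+)?$/i; // 1, 1.0, +1, -1, 1.0e1
const ORDINAL_PATTERN = /^\d+(st|nd|rd|th)$/;             // 1st, 2nd, 3rd, 4th
const TAG_PATTERN = /^#[A-Za-z][A-Za-z0-9_-]*$/;          // #tag, #tag-foo, #tag1
const SYMBOL_PATTERN = /^[A-Za-z][A-Za-z0-9_-]*$/;        // Foo, foo-bar, foo_bar

// Hypothetical helper: returns the first kind whose pattern matches,
// falling back to RUNE for anything else (Foo#, -foo, _foo, 1foo, fo>o).
function classifyWord(word: string): WordKind {
  if (NUMBER_PATTERN.test(word)) return "NUMBER";
  if (ORDINAL_PATTERN.test(word)) return "ORDINAL";
  if (TAG_PATTERN.test(word)) return "TAG";
  if (SYMBOL_PATTERN.test(word)) return "SYMBOL";
  return "RUNE";
}

for (const word of ["foo-bar", "1foo", "1.0e1", "2nd", "#tag1", "tokens."]) {
  console.log(`${word} -> ${classifyWord(word)}`);
}
```

Treating `RUNE` as the fallback mirrors how it appears in the examples, where words with trailing punctuation such as `tokens.` are reported as runes.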

Some examples

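```typescript
const tokens = scanTokens([
	"# Mazard Scanner",
	"",
	"This scanner converts a Markdown document into an array of tokens.",
]);

printTokens(tokens);
```

| No | Type | Lexeme | Literal | Line | Column |
| --- | --- | --- | --- | --- | --- |
| 0 | HHASH | `"#"` | `1` | 0 | 0 |
| 1 | SPACE | `" "` | `1` | 0 | 1 |
| 2 | SYMBOL | `"Mazard"` | `"Mazard"` | 0 | 2 |
| 3 | SPACE | `" "` | `1` | 0 | 8 |
| 4 | SYMBOL | `"Scanner"` | `"Scanner"` | 0 | 9 |
| 5 | BR | `"\n\n"` | `2` | 0 | 16 |
| 6 | SYMBOL | `"This"` | `"This"` | 2 | 0 |
| 7 | SPACE | `" "` | `1` | 2 | 4 |
| 8 | SYMBOL | `"scanner"` | `"scanner"` | 2 | 5 |
| 9 | SPACE | `" "` | `1` | 2 | 12 |
| 10 | SYMBOL | `"converts"` | `"converts"` | 2 | 13 |
| 11 | SPACE | `" "` | `1` | 2 | 21 |
| 12 | SYMBOL | `"a"` | `"a"` | 2 | 22 |
| 13 | SPACE | `" "` | `1` | 2 | 23 |
| 14 | SYMBOL | `"Markdown"` | `"Markdown"` | 2 | 24 |
| 15 | SPACE | `" "` | `1` | 2 | 32 |
| 16 | SYMBOL | `"document"` | `"document"` | 2 | 33 |
| 17 | SPACE | `" "` | `1` | 2 | 41 |
| 18 | SYMBOL | `"into"` | `"into"` | 2 | 42 |
| 19 | SPACE | `" "` | `1` | 2 | 46 |
| 20 | SYMBOL | `"an"` | `"an"` | 2 | 47 |
| 21 | SPACE | `" "` | `1` | 2 | 49 |
| 22 | SYMBOL | `"array"` | `"array"` | 2 | 50 |
| 23 | SPACE | `" "` | `1` | 2 | 55 |
| 24 | SYMBOL | `"of"` | `"of"` | 2 | 56 |
| 25 | SPACE | `" "` | `1` | 2 | 58 |
| 26 | RUNE | `"tokens."` | `"tokens."` | 2 | 59 |
| 27 | EOF | `""` | `""` | 2 | 66 |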
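
```typescript
const tokens = scanTokens("here's a *line* with some ~~formatting~~.");

printTokens(tokens);
```

| No | Type | Lexeme | Literal | Line | Column |
| --- | --- | --- | --- | --- | --- |
| 0 | RUNE | `"here's"` | `"here's"` | 0 | 0 |
| 1 | SPACE | `" "` | `1` | 0 | 6 |
| 2 | SYMBOL | `"a"` | `"a"` | 0 | 7 |
| 3 | SPACE | `" "` | `1` | 0 | 8 |
| 4 | ASTERISK | `"*"` | `"*"` | 0 | 9 |
| 5 | SYMBOL | `"line"` | `"line"` | 0 | 10 |
| 6 | ASTERISK | `"*"` | `"*"` | 0 | 14 |
| 7 | SPACE | `" "` | `1` | 0 | 15 |
| 8 | SYMBOL | `"with"` | `"with"` | 0 | 16 |
| 9 | SPACE | `" "` | `1` | 0 | 20 |
| 10 | SYMBOL | `"some"` | `"some"` | 0 | 21 |
| 11 | SPACE | `" "` | `1` | 0 | 25 |
| 12 | TILDE_TILDE | `"~~"` | `"~~"` | 0 | 26 |
| 13 | SYMBOL | `"formatting"` | `"formatting"` | 0 | 28 |
| 14 | TILDE_TILDE | `"~~"` | `"~~"` | 0 | 38 |
| 15 | RUNE | `"."` | `"."` | 0 | 40 |
| 16 | EOF | `""` | `""` | 0 | 41 |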
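
```typescript
const tokens = scanTokens([
	"- [x] Finish the scanner.",
	"- [ ] Write some reasonable documentation",
]);

printTokens(tokens);
```

| No | Type | Lexeme | Literal | Line | Column |
| --- | --- | --- | --- | --- | --- |
| 0 | CHECKBOX | `"- [x]"` | `true` | 0 | 0 |
| 1 | SPACE | `" "` | `1` | 0 | 5 |
| 2 | SYMBOL | `"Finish"` | `"Finish"` | 0 | 6 |
| 3 | SPACE | `" "` | `1` | 0 | 12 |
| 4 | SYMBOL | `"the"` | `"the"` | 0 | 13 |
| 5 | SPACE | `" "` | `1` | 0 | 16 |
| 6 | RUNE | `"scanner."` | `"scanner."` | 0 | 17 |
| 7 | BR | `"\n"` | `1` | 0 | 25 |
| 8 | CHECKBOX | `"- [ ]"` | `false` | 1 | 0 |
| 9 | SPACE | `" "` | `1` | 1 | 5 |
| 10 | SYMBOL | `"Write"` | `"Write"` | 1 | 6 |
| 11 | SPACE | `" "` | `1` | 1 | 11 |
| 12 | SYMBOL | `"some"` | `"some"` | 1 | 12 |
| 13 | SPACE | `" "` | `1` | 1 | 16 |
| 14 | SYMBOL | `"reasonable"` | `"reasonable"` | 1 | 17 |
| 15 | SPACE | `" "` | `1` | 1 | 27 |
| 16 | SYMBOL | `"documentation"` | `"documentation"` | 1 | 28 |
| 17 | EOF | `""` | `""` | 1 | 41 |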
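
Because each `CHECKBOX` token carries its checked state as a boolean literal, a consumer can summarize a task list without looking at any other token. The sketch below assumes a token shape that mirrors the columns printed by `printTokens`; the `PrintedToken` interface and `checkboxProgress` helper are hypothetical, and the sample array is a hand-copied subset of the checkbox output above rather than real scanner output.

```typescript
// Assumed token shape, mirroring the printTokens columns; the package's real type may differ.
interface PrintedToken {
  type: string;
  lexeme: string;
  literal: string | number | boolean;
  line: number;
  column: number;
}

// Hypothetical helper: counts checked and total CHECKBOX tokens.
function checkboxProgress(tokens: PrintedToken[]): { done: number; total: number } {
  let done = 0;
  let total = 0;
  for (const token of tokens) {
    if (token.type !== "CHECKBOX") continue;
    total++;
    if (token.literal === true) done++;
  }
  return { done, total };
}

// Hand-copied subset of the checkbox example's tokens.
const sample: PrintedToken[] = [
  { type: "CHECKBOX", lexeme: "- [x]", literal: true, line: 0, column: 0 },
  { type: "SPACE", lexeme: " ", literal: 1, line: 0, column: 5 },
  { type: "SYMBOL", lexeme: "Finish", literal: "Finish", line: 0, column: 6 },
  { type: "BR", lexeme: "\n", literal: 1, line: 0, column: 25 },
  { type: "CHECKBOX", lexeme: "- [ ]", literal: false, line: 1, column: 0 },
  { type: "EOF", lexeme: "", literal: "", line: 1, column: 41 },
];

const progress = checkboxProgress(sample);
console.log(`${progress.done}/${progress.total} tasks done`); // prints "1/2 tasks done"
```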

Install

npm i @mazard/scanner

License

MIT

Collaborators

  • evan.nagle