A simple streamable lexer. It transforms text into token objects.
Slexer will take text and break it up into individual token objects based upon a
given lexicon. There will be one token for each portion of text that matches an
item in the lexicon. There will also be one token for each portion of text that
does not match an item in the lexicon. This way the entire length of text will
be represented by tokens. The tokens will contain the matched or unmatched
portion of text, called the lexeme, along with positional information where the
lexeme was found in the text. For example, given the text
'abcdefghijklmnopqrstuvwxyz' and the lexicon
['a', 'e', 'i', 'o', 'u'],
Slexer will produce the following tokens:
    { column: 0, lexeme: 'a', line: 0, offset: 0 }
    { column: 1, lexeme: 'bcd', line: 0, offset: 1 }
    { column: 4, lexeme: 'e', line: 0, offset: 4 }
    { column: 5, lexeme: 'fgh', line: 0, offset: 5 }
    { column: 8, lexeme: 'i', line: 0, offset: 8 }
    { column: 9, lexeme: 'jklmn', line: 0, offset: 9 }
    { column: 14, lexeme: 'o', line: 0, offset: 14 }
    { column: 15, lexeme: 'pqrst', line: 0, offset: 15 }
    { column: 20, lexeme: 'u', line: 0, offset: 20 }
    { column: 21, lexeme: 'vwxyz', line: 0, offset: 21 }
    { end: true }
The final token is unique: it contains only end: true and marks the end of the text.
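The tokenization described above can be sketched in a few lines of plain JavaScript. This is an illustration only, not Slexer's actual implementation: the function name `tokenize` is hypothetical, the sketch is not streamable, it assumes single-line input (so `line` is always 0 and `column` equals `offset`), and it assumes that when several lexicon items match at the same position the longest one wins.

```javascript
// Illustrative sketch only; not Slexer's implementation.
// Assumptions: single-line text, longest lexicon item wins on a tie.
function tokenize(text, lexicon) {
  const tokens = [];
  let offset = 0;         // position currently being examined
  let unmatchedStart = 0; // start of the pending unmatched run

  // Emit a token for text[start, end) unless the range is empty.
  const pushToken = (start, end) => {
    if (end > start) {
      tokens.push({
        column: start,
        lexeme: text.slice(start, end),
        line: 0,
        offset: start
      });
    }
  };

  while (offset < text.length) {
    // Find the longest lexicon item that matches at the current offset.
    let match = null;
    for (const item of lexicon) {
      const longer = match === null || item.length > match.length;
      if (item.length > 0 && longer && text.startsWith(item, offset)) {
        match = item;
      }
    }

    if (match === null) {
      offset += 1; // extend the unmatched run by one character
      continue;
    }

    pushToken(unmatchedStart, offset);        // unmatched text before the match
    pushToken(offset, offset + match.length); // the matched lexeme itself
    offset += match.length;
    unmatchedStart = offset;
  }

  pushToken(unmatchedStart, text.length); // trailing unmatched text, if any
  tokens.push({ end: true });             // end-of-text marker
  return tokens;
}

const tokens = tokenize('abcdefghijklmnopqrstuvwxyz', ['a', 'e', 'i', 'o', 'u']);
console.log(tokens);
```

Every character of the input ends up in exactly one lexeme, so concatenating the lexemes in order reproduces the original text.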
Begin by importing the Slexer module.
The Slexer constructor requires a config object with a lexicon property. The
lexicon is defined by an array of strings. For optimal performance, the lexicon
should not contain duplicates.
const slexer = new Slexer({ lexicon: ['a', 'e', 'i', 'o', 'u'] });
By default, Slexer uses '\n' to identify line endings. This can be overridden
by specifying a lineEnding property on the config object.
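To show why the line ending matters for the positional information in each token, here is a minimal sketch that computes a line and column for a given offset. The helper `positionAt` is hypothetical and is not part of Slexer. It assumes 0-based lines and columns (as in the example tokens) and assumes the column restarts at 0 immediately after each line ending; how Slexer itself tokenizes the line ending characters is not covered here.

```javascript
// Hypothetical helper; not part of Slexer's API.
// Assumes 0-based line and column, with the column restarting at 0
// right after each occurrence of lineEnding.
function positionAt(text, offset, lineEnding = '\n') {
  if (lineEnding.length === 0) {
    throw new Error('lineEnding must not be empty');
  }

  let line = 0;
  let lineStart = 0; // offset of the first character of the current line
  let next = text.indexOf(lineEnding, lineStart);

  // Advance one line for every line ending that finishes at or before offset.
  while (next !== -1 && next + lineEnding.length <= offset) {
    line += 1;
    lineStart = next + lineEnding.length;
    next = text.indexOf(lineEnding, lineStart);
  }

  return { column: offset - lineStart, line, offset };
}

// The same logical position ('d') under two line ending conventions.
const unix = positionAt('ab\ncd', 4);
const windows = positionAt('ab\r\ncd', 5, '\r\n');
console.log(unix, windows);
```

With the default '\n', a file that uses '\r\n' would leave a stray '\r' at the end of each line's text, which is one reason to set the line ending explicitly for such input.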
Slexer is a Readable stream. When the stream becomes readable, tokens can be
obtained through the read method.
Slexer is also a Writable stream. The input text can be written to the stream
through the write method. Calling the end method will signal the end of the
input text. A more common use case is to read text from an input file. In this
case, it is recommended to create a Readable stream to read the file and pipe
its output into Slexer.
If you already have a string containing the entire input text, you can pass it
to the end method to simultaneously write and close the stream.
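The stream wiring described above can be tried out without installing anything by using a stand-in. In the sketch below, `ToyLexer` and `toyTokens` are hypothetical names, not Slexer's API: the stand-in only supports single-character lexicon items, omits line and column, and buffers all input until the stream ends instead of emitting tokens incrementally. Only the Node.js `stream` calls (`Readable.from`, `pipe`, the `readable` event, and `read`) are real; with Slexer you would pipe into a Slexer instance instead.

```javascript
const { Readable, Transform } = require('stream');

// Hypothetical stand-in tokenizer: single-character lexicon items only.
function toyTokens(text, lexicon) {
  const tokens = [];
  let start = 0; // start of the pending unmatched run
  for (let i = 0; i <= text.length; i += 1) {
    const hit = i < text.length && lexicon.includes(text[i]);
    if (hit || i === text.length) {
      if (i > start) tokens.push({ lexeme: text.slice(start, i), offset: start });
      if (hit) tokens.push({ lexeme: text[i], offset: i });
      start = i + 1;
    }
  }
  tokens.push({ end: true });
  return tokens;
}

// Hypothetical stand-in stream: writable side takes text,
// readable side produces token objects.
class ToyLexer extends Transform {
  constructor({ lexicon }) {
    super({ readableObjectMode: true });
    this.lexicon = lexicon;
    this.text = '';
  }

  _transform(chunk, encoding, callback) {
    this.text += chunk.toString(); // buffer input; a real lexer would stream
    callback();
  }

  _flush(callback) {
    for (const token of toyTokens(this.text, this.lexicon)) this.push(token);
    callback();
  }
}

const lexer = new ToyLexer({ lexicon: ['a', 'e'] });

// Drain tokens with read() each time the stream becomes readable.
lexer.on('readable', () => {
  let token;
  while ((token = lexer.read()) !== null) console.log(token);
});

// Stand-in for a file stream such as fs.createReadStream(path).
Readable.from(['ab', 'cde']).pipe(lexer);
```

When the whole input is already in memory, the source stream is unnecessary: calling `lexer.end('abcde')` on a fresh instance writes the text and closes the stream in one step.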
S in Slexer might stand for any or all of the following:
Copyright (c) 2013 Steven Olmsted firstname.lastname@example.org
This software is provided "as is", without any express or implied warranties, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. In no event will the authors or contributors be held liable for any direct, indirect, incidental, special, exemplary, or consequential damages however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise), arising in any way out of the use of this software, even if advised of the possibility of such damage.
Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter and distribute it freely in any form, provided that the following conditions are met:
The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required.
Altered source versions may not be misrepresented as being the original software, and neither the name of Steven Olmsted nor the names of authors or contributors may be used to endorse or promote products derived from this software without specific prior written permission.
This notice must be included, unaltered, with any source distribution.