Intro
Calculate Levenshtein distance and much more!
Install
Using bower:
$ bower install levenshtein-toolbox --save
Use your favorite flavor in the dist/
directory. Under AMD it registers the module name: levenshtein-toolbox
. As a global, use: LevenshteinToolbox
.
Using npm:
$ npm install levenshtein-toolbox --save
Use var x = require('levenshtein-toolbox')
in your code.
Usage
Note: I'll be using
chai
's.should
test construction in the examples to show what results you can expect. The library itself sets no.should
property.
Not only can you calculate the edit distance between any two strings with this library, you can also:
View and act on detailed information in each step
You can view each step taken in the process of converting your pattern string (first argument to .process()
) against a substring of the given text (second argument). Use this information to make your own assessments like the maximum insertions/deletions/substitutions you'll allow, or how closely you want the pattern string to match the text in order to perform some action.
Examples:
var l = ;var matches = 0;var substitutions = 0;var inserts = 0;var deletes = 0;var patternString = 'wheel';var len = patternStringlength;var text = 'hello'; { } // perform the Levenshtein distance algorithm,// converting `patternString` to `text`.l;// view each step in the path of converting `patternString` to `text`l; matchesshould;substitutionsshould;insertsshould;deletesshould; if matches/len > 08 console;
var l = // case insensitivity is the default, // but you can still be more explicit caseSensitive: false; l; // reconstruct and return the path taken in// converting the pattern string to the textlshoulddeep;
var l = caseSensitive: true; l;lshoulddeep;
Choose edit distance type
Typically edit distance is done against both whole strings, but by putting type: substring
into the options object passed into the constructor function, you can match your pattern string (first argument to .process()
) against a substring of the given text (second argument). This means the least-cost match will be found anywhere in the text, and insertions up to that point cost nothing.
By default, type: whole
is set, where the edit distance is calculated starting at the beginning of the text (all insertions will cost). You can change the type at any time with .set('type', 'whole')
or .set('type', 'substring')
.
By default, matches have a cost of 0
and insertions, deletions, and subtitutions all have a default cost of 1
.
Examples:
var l = ;var patternString = 'science';var text = 'do it for science!!'; l;// insert eveything up to 'science',// then insert the exclamation pointslshould;
var l = type: 'substring';var patternString = 'science';var text = 'do it for science!!'; l;// match the least-cost *substring* of `text`// insert everything up to 'science' at no cost,// then match `science` and stop there.// if `patternString` were `scence`, for example,// one insertion would take place, for a total cost of 1.lshould;
Directly influence which path is taken
Introduce your own cost functions for the three string operations: matching/substitution, char insertion, and char deletion, and you'll have total control over how the algorithm finds the cheapest path.
Make sure you use .set()
to set your cost functions, so that the library can do some internal setup.
Examples:
// the `matchCost` function dictates the cost when both// letters in both strings match, and if they don't match,// the cost of doing a substitution from the first to the second.// you can even vary the cost depending on the letters being compared. { if c1 === c2 return 0; // don't substitute a space char; keeps words separate if c1 === ' ' || c2 === ' ' return Infinity; return 1;} // the `insertCost` is the cost of insertion of a letter in the text// into the pattern string to make them match at that point. `cBefore`// and `cAfter` are the chars in the pattern string before and after// the insertion point, respectively, or `null` if at the beginning// or end of the pattern string. { // insertion at a word boundary is free if cBefore === null || cAfter === null || cBefore === ' ' || cAfter === ' ' return 0; // prefer insertions inside of a word to a substitution return 09;} // you can set the cost as a function or a numeric value.// the `deleteCost` is the cost of deleting a char from// `patternString` in order to better match the text.//// specifying a large value will make the algorithm prefer// a path that doesn't include deletionsvar deleteCost = Infinity; var l = type: 'substring'; l;l;l; l;lshould;lshoulddeep; // notice here how the insertions at the ends of a word have no cost.// the only costs here are the insertions inside the "word" `insrto`.l;lshould;lshoulddeep;
API
To de JSDoc'd!
License
MIT