levenshtein-lte1
TypeScript icon, indicating that this package has built-in type declarations

1.0.1 • Public • Published

Build Status

levenshtein-lte1

The Levenshtein distance is infamoulsy known to run in O(n * m) time, n & m being the respective lengths of the considered strings.

But, most of the time, people just want to know whether the Levenshtein distance between two strings is below a given threshold. What's more, especially when targeting the English language, this threshold is often 1 or 2 (1 usually for the very similar Damerau-Levenshtein distance, for the reason explained in this seminal paper).

But people often miss that you can develop a custom implementation of the Levenshtein & Damerau-Levenshtein distance limited to 1 that can be way more performant.

This library exposes such an implementation for the JavaScript language. It runs in O(m) time, m being the length of the shortest string. It also does not need to rely on auxiliary memory.

Installation

The library comes along with its own TypeScript definition files.

npm install levenshtein-lte1

Usage

Note that if the distance between strings exceeds 1, the functions will return Infinity (typical implementation rather return -1 but Infinity is kinda useful with threshold conditions).

Levenshtein distance

var levenshteinLte1 = require('levenshtein-lte1');
 
levenshteinLte1('book', 'book');
>>> 0
 
levenshteinLte1('book', 'booc');
>>> 1
 
levenshteinLte1('abc', 'def');
>>> Infinity

Damerau-Levenshtein

var damerauLevenshteinLte1 = require('levenshtein-lte1/damerau');
 
damerauLevenshteinLte1('book', 'book');
>>> 0
 
damerauLevenshteinLte1('book', 'booc');
>>> 1
 
damerauLevenshteinLte1('abc', 'def');
>>> Infinity
 
damerauLevenshteinLte1('BONJOUR', 'BNOJOUR');
>>> 1

Both functions also accepts arrays instead of strings if you need to apply them to arbitrary sequences.

levenshteinLte1(
  ['the', 'mouse', 'eats'],
  ['a', 'mouse', 'eats']
);
>>> 1

Benchmark

Benchmark methodology adapted from js-levenshtein by gustf.

I only benchmarked words because it usually does not make sense to test large strings with a threshold of 1.

Disclaimer: this benchmark is deliberately unfair. It only means to demonstrate that this lib is a valid choice ONLY if your use case is to test whether strings have a Levenshtein distance less than or equal to 1.

2000 words, length max=20 min=3 avr=9.5

  145,110 op/s » levenshtein-lte1
   19,183 op/s » talisman-limited
    3,126 op/s » node-levenshtein
    2,557 op/s » js-levenshtein
    2,115 op/s » talisman
    1,801 op/s » levenshtein-edit-distance
    1,840 op/s » leven
    1,619 op/s » fast-levenshtein

Package Sidebar

Install

npm i levenshtein-lte1

Weekly Downloads

5

Version

1.0.1

License

MIT

Unpacked Size

8.29 kB

Total Files

7

Last publish

Collaborators

  • yomguithereal