fuzzy-neon

0.1.5 • Public • Published

fuzzy-neon

fuzzy-neon: Collection of fuzzy string matching algorithms written in rust

Created using neon.

Installation

Requires Node and Cargo to install.

Node In order to install fuzzy-neon you may need to first install cargo-cp-artifact:

$ npm i cargo-cp-artifcat

Installing the Module

$ npm i fuzzy-neon

Usage

const fuzzy = require('fuzzy-neon');
let score = fuzzy.hamming('nick', 'nice');
console.log(score);
// 1

Available String Matching Metrics

  • Hamming distance hamming(str1, str2)
  • Levenshtein (recursive impl) levenshtein(str1, str2)
  • Wagner-Fischer (dynamic programming impl of levenshtein) wagnerFischer(str1, str2)
  • Damerau-Levenshtein damerauLevenshtein(str1, str2)
  • Jaro-Winkler Distance jaroWinkler(str1, str2)
  • Longest Common Subsequence lcs(str1, str2)
  • n-gram based distance metric ngram(str1, str2, ngramSize)
  • Jensen-Shannon Vector Distance jensonshannonVector(str1, str2)

Algorithm Explanation/Useful Links

Hamming

Computes number of positions between two strings where characters differ. Expanded to allow strings of different lengths Example

const fuzzy = require('fuzzy-neon');
let score = fuzzy.hamming('nick', 'nice');
console.log(score);
// 1

Levenshtein

The Levenshtein distance between two strings is the minimum number of single-character edits (insertions, deletions or substitutions) to convert one word to the other. For efficient implementation use Wagner-Fischer Example

const fuzzy = require('fuzzy-neon');
let score = fuzzy.levenshtein('nick', 'nit');
console.log(score);
// 2

Wagner-Fischer

Implementation of Levenshtein using a faster, dynamic programming implementation - interesting to compare performance between the two. Example

const fuzzy = require('fuzzy-neon');
let score = fuzzy.wagnerFischer('nick', 'nit');
console.log(score);
// 2

Damerau-Levenshtein

Extension of the Levenshtein distance metric which allows for adjacent character transpositions Example

const fuzzy = require('fuzzy-neon');
let score = fuzzy.damerauLevenshtein('taco', 'atco');
console.log(score);
// 1

Jaro Winkler Distance

Extension of the Jaro distance between two strings; the Jaro distance uses the relative probability of each character in a string to calculate an edit score between two strings (see here for formula details). Winkler extended this to boost the scores of words which shared prefixes of some length. Returns a normalised score between 0 and 1. Example

const fuzzy = require('fuzzy-neon');
let score = fuzzy.jaroWinkler('nice', 'nick');
console.log(score);
// 0.11549999999999994

N-gram Based distance metric

n-gram based string distance metric implemented based on the work from this paper. Extremely good at integrating context into producing the metric score, O(n^2) complexity. Example

const fuzzy = require('fuzzy-neon');
let score = fuzzy.ngram('rupert', 'robert', 2); // last arg is size of ngram
console.log(score);
// 0.16666666666666666

Jensen Shannon Distance

Computes the relatively probability of events in the string (events being either single characters or bi-grams) porducing a probabilty vector. Then computes the Jensen Shannon distance metric over the two probability vectors. Produced from the work by Richard Connor et al in this paper. O(n) complexity. Example

const fuzzy = require('fuzzy-neon');
let score = fuzzy.jensonshannonVector('hi their', 'hi there');
console.log(score);
// 0.36638698518165513

Package Sidebar

Install

npm i fuzzy-neon

Weekly Downloads

10

Version

0.1.5

License

ISC

Unpacked Size

528 kB

Total Files

15

Last publish

Collaborators

  • rupert648