ngram-fingerprint

JavaScript implementation of the ngram-fingerprint algorithm from the OpenRefine project.

Algorithm

The algorithm differs slightly from the one in Google Refine (now OpenRefine): extended western characters are replaced in the third step rather than in the last step. This is done mainly so that the sorting works properly.

  1. change all characters to their lowercase representation
  2. remove all punctuation, whitespace, and control characters
  3. normalize extended western characters to their ASCII representation
  4. obtain all the string n-grams
  5. sort the n-grams and remove duplicates
  6. join the sorted n-grams back together
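
The six steps above can be sketched roughly as follows. This is an illustration only, not the package's actual source: the function name `ngramFingerprintSketch` is hypothetical, and step 3 is approximated with Unicode NFD decomposition plus removal of combining marks, which does not cover every mapping a full replacement table would (for example `ß` or `ø`). It assumes a runtime with Unicode property escapes in regular expressions (Node.js 10 or later).

```javascript
// Illustrative sketch of the six steps; NOT the package's real implementation.
function ngramFingerprintSketch(n, input) {
  var normalized = input
    // 1. lowercase
    .toLowerCase()
    // 2. drop whitespace, punctuation, and control characters
    .replace(/[\s\p{P}\p{Cc}]/gu, '')
    // 3. approximate ASCII folding (assumption): decompose accented
    //    characters, then strip the combining marks
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '');

  // 4. collect every substring of length n
  var grams = [];
  for (var i = 0; i + n <= normalized.length; i++) {
    grams.push(normalized.slice(i, i + n));
  }

  // 5. remove duplicates and sort (default code-unit order)
  var unique = Array.from(new Set(grams)).sort();

  // 6. join the sorted n-grams back together
  return unique.join('');
}

console.log(ngramFingerprintSketch(2, 'paris')); // prints arispari
```

With this sketch, inputs that differ only in case, punctuation, or repetition of the same n-grams end up with closely related keys, which is what makes the fingerprint useful for grouping near-duplicate strings.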

Usage

var fingerprint = require('ngram-fingerprint')
 
fingerprint(2, 'paris') // returns arispari
 

Install

npm i ngram-fingerprint

Version

1.0.0

License

MIT

Collaborators

  • finnpauls