@nano-sql/plugin-fuzzy-search

2.0.2 • Public • Published

nanoSQL 2 Fuzzy Search Plugin

Allows you to build and use dynamic fuzzy search with nanoSQL 2

What's This For?

Add fuzzy search capability to your nanoSQL apps. The special indexing in this plugin is well suited for fuzzy name matching, document search, or anywhere else you need to match words that sound or look similar.

Features

  • Single word and phrase searches.
  • Functionality similar to Elasticsearch.
  • Filter results by relevance to search.
  • Use custom tokenizer function.
  • Dynamic index can be updated on the fly.
  • Works in NodeJS or any modern Browser.
  • Only 10KB gzipped.

Installation

For NodeJS or with a bundler (webpack, parcel, etc)

npm i @nano-sql/plugin-fuzzy-search --save

To use in the browser, add this to your <head> AFTER the nanoSQL core script:

<script src="https://cdn.jsdelivr.net/npm/@nano-sql/plugin-fuzzy-search@2.0.2/dist/plugin-fuzzy-search.min.js" integrity="sha256-czgrUq1EccktG3O5AGuW/LoeMWeaQrkQzvj1S45ZiXw=" crossorigin="anonymous"></script>

Usage

import { FuzzySearch } from "@nano-sql/plugin-fuzzy-search";
import { nSQL } from "@nano-sql/core";
// or, with <script> usage, use these instead of the imports above:
// const { FuzzySearch } = window["@nano-sql/plugin-fuzzy-search"];
// const { nSQL } = window["@nano-sql/core"];
 
nSQL().createDatabase({
    id: "my_db",
    mode: "PERM", // or any adapter
    plugins: [
      FuzzySearch()
    ]
}).then(() => {
    return nSQL().query("create table", {
      name: "my_table",
      model: {
        "id:int": {pk: true, ai: true},
        "document:string": {},
      },
      indexes: {
        "document:string": {
          search: true // <== required to index this column with fuzzy search engine.
        }
      }
    }).exec(); // the query only runs once exec() is called
}).then(() => {
  return nSQL("my_table").query("upsert", {document: "I put some crazy text here."}).exec();
}).then(() => {
  // the SEARCH function returns 0 for exact phrase matches and higher numbers for less strict matches
  // the first argument is the column, every following argument is a search term
  return nSQL("my_table").query("select").where(["SEARCH(document, 'crzy txt')", "=", 0]).exec();
}).then((results) => {
  console.log(results); // [{id: 1, document: "I put some crazy text here."}]
})

API

Setting Up Fuzzy Indexes

In order to use the Fuzzy Search plugin it must be included in your initial createDatabase call in the plugins property.

import { FuzzySearch } from "@nano-sql/plugin-fuzzy-search";
import { nSQL } from "@nano-sql/core";
 
nSQL().createDatabase({
    plugins: [ // must include this
      FuzzySearch()
    ]
}).then..

Fuzzy search is enabled on secondary indexes by adding the search property to the index. Fuzzy search indexes will only work on string type indexes.

nSQL().query("create table", {
  name: "my_table",
  model: {
    "id:int": {pk: true, ai: true},
    "document:string": {},
  },
  indexes: {
    "document:string": {
      search: true // enable fuzzy search
    }
  }
}).exec()...

You can optionally pass an object to the search property instead of true. The object can be used to provide a customized tokenizer to be used by the fuzzy index.

By default the tokenizer removes all special characters except numbers and letters, lowercases everything, removes English stop words, and applies an English-friendly stemmer and metaphone. To adjust how tokenization happens you can either provide your own tokenizer or pass the built-in tokenizer configuration options that differ from the defaults.

The provided tokenizer function is used on search terms passed into the fuzzy search plugin as well as on terms being indexed.
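To see why the same tokenizer has to run on both sides, here is a toy sketch in plain JavaScript. It is not the plugin's implementation: toyTokenize, buildToyIndex, and toySearch are hypothetical names, and the vowel-dropping step is only a crude stand-in for the stemming and metaphone steps described above.

```javascript
// Toy tokenizer for this sketch only: lowercase, split on non-alphanumerics,
// and drop vowels after the first letter of each word.
function toyTokenize(text) {
  return String(text)
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 0)
    .map((word) => word[0] + word.slice(1).replace(/[aeiou]/g, ""));
}

// Build a map of token -> Set of row ids from the rows being indexed.
function buildToyIndex(rows) {
  const index = new Map();
  for (const row of rows) {
    for (const token of toyTokenize(row.document)) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(row.id);
    }
  }
  return index;
}

// Tokenize the search phrase with the SAME function,
// then keep only the rows that contain every search token.
function toySearch(index, phrase) {
  const tokens = toyTokenize(phrase);
  if (tokens.length === 0) return [];
  const sets = tokens.map((token) => index.get(token) || new Set());
  return [...sets[0]].filter((id) => sets.every((set) => set.has(id)));
}

const index = buildToyIndex([
  { id: 1, document: "I put some crazy text here." },
  { id: 2, document: "Nothing to see." }
]);
console.log(toySearch(index, "crzy txt")); // [1]
```

Because "crazy" and "crzy" reduce to the same token, the misspelled phrase still finds row 1. If the search side used a different tokenizer than the index side, the tokens would no longer line up.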

The defaultTokenizer export is a function that accepts these arguments:

type:string (required)

Should be one of "english", "english-meta", or "english-stem". All options remove special characters, leaving only numbers and letters. "english-stem" also stems searches/indexes with the Porter Stemmer algorithm, "english-meta" also applies metaphone to searches/indexes, and "english" runs both the metaphone and stemmer algorithms on searches/indexes.

stopWords:string[]

An array of stop words to be excluded from indexes/searches. The stopWords export contains the default list of English stop words.

decimalPoints:number (optional, default is 4)

Since the fuzzy search can't use native number types, all numbers are stringified and formatted to a very specific format with a fixed number of decimal places. For example, by default if 50 is found in a string it will be converted to "50.0000". If 0.00001 is found it will be converted to "0.0000". You can increase the decimalPoints argument to get more precise decimal searches at the cost of space in the database.
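The fixed-width format described above can be reproduced in plain JavaScript. This is only an illustration of the described output, assuming ordinary rounding via Number.prototype.toFixed. formatNumberToken is a hypothetical helper, not part of the plugin, and the plugin's internal formatting may differ in details.

```javascript
// Hypothetical helper: formats a number with a fixed number of decimal places,
// mirroring the behavior the decimalPoints argument is described as having.
function formatNumberToken(value, decimalPoints = 4) {
  return Number(value).toFixed(decimalPoints);
}

console.log(formatNumberToken(50));         // "50.0000"
console.log(formatNumberToken(0.00001));    // "0.0000"
console.log(formatNumberToken(0.00001, 6)); // "0.000010"
```

With the default of 4, 0.00001 collapses to the same token as 0, which is why a larger decimalPoints value is needed to search for very small numbers.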

A quick example:

import { nSQL } from "@nano-sql/core";
import { FuzzySearch, defaultTokenizer, stopWords} from "@nano-sql/plugin-fuzzy-search";
 
nSQL().query("create table", {
  name: "my_table",
  model: {
    "id:int": {pk: true, ai: true},
    "document:string": {},
  },
  indexes: {
    "document:string": {
      search: { // customize the default tokenizer
        tokenizer: defaultTokenizer("english-meta", stopWords, 1)
      }
    }
  }
}).exec()...

You don't have to use the defaultTokenizer at all; you can build your own tokenizing function from scratch.

The tokenizer property accepts a function with these arguments:

  • tableName:string The name of the table currently being tokenized.
  • tableId:string The id of the table in the first argument.
  • path:string[] The parsed path of the column being tokenized.
  • value:string The value being tokenized.

The function should return an array of objects, each object should have these properties:

  • w: The token of the word at a given position in the provided string.
  • i: The position of this tokenized word in the original string.

The ordering of the result array is irrelevant.

A quick example of using a custom tokenizer:

import { nSQL } from "@nano-sql/core";
import { FuzzySearch} from "@nano-sql/plugin-fuzzy-search";
 
nSQL().query("create table", {
  name: "my_table",
  model: {
    "id:int": {pk: true, ai: true},
    "document:string": {},
  },
  indexes: {
    "document:string": {
      search: {
        tokenizer: (tableName, tableId, path, value) => {
          return String(value).split(" ").map((s, i) => {
            return {w: s.trim().toLowerCase(), i: i}
          })
        }
      }
    }
  }
}).exec()...
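A slightly fuller tokenizer can be written and tested on its own before it is handed to the search property. The sketch below assumes only the signature and return shape described above. simpleTokenizer and myStopWords are hypothetical names, and the stop word list is made up for the example (it is not the plugin's stopWords export).

```javascript
// Hypothetical stop word list for this sketch only.
const myStopWords = ["a", "an", "the", "i", "some"];

// Follows the tokenizer signature described above: (tableName, tableId, path, value).
// Returns an array of {w, i}: w is the token, i is the word's position in the original string.
function simpleTokenizer(tableName, tableId, path, value) {
  return String(value)
    .trim()
    .split(/\s+/)
    // Map first so that i reflects the position in the ORIGINAL string,
    // even after stop words are filtered out below.
    .map((word, i) => ({
      w: word.toLowerCase().replace(/[^a-z0-9]/g, ""),
      i: i
    }))
    .filter((token) => token.w.length > 0 && !myStopWords.includes(token.w));
}

console.log(simpleTokenizer("my_table", "id", ["document"], "I put some crazy text here."));
// tokens: put (1), crazy (3), text (4), here (5)
```

Filtering after mapping keeps the original word positions intact, so removing stop words leaves gaps in i rather than renumbering the remaining tokens.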

On a final note, if you change the tokenizer at all for an index, you must rebuild the index using the steps below.

Rebuilding Indexes

Fuzzy indexes can be rebuilt with the new "rebuild search" query.

// rebuild the fuzzy search index for all rows in "myTable"
nSQL("myTable").query("rebuild search").exec()..
 
// rebuild the fuzzy search index for specific rows in "myTable"
nSQL("myTable").query("rebuild search").where(["some condition", "=", true]).exec()..

If you add fuzzy search to an existing secondary index or change the tokenizer for a fuzzy search index you must rebuild the fuzzy indexes on that table to get consistent results.

Using Fuzzy Indexes

The new SEARCH function is used to take advantage of fuzzy indexes. It accepts two or more arguments: the first must always be the column/path the fuzzy search is performed on, and every following argument is a search phrase.

The SEARCH function will return a number from 0 and up where 0 means an exact match was found and every number above zero represents an increasingly less relevant match.

You can also pass a * into the first argument of the SEARCH function to search all fuzzy indexes on a given table.

Since search terms must be surrounded by quotes and separated by commas, it's important that quotes and commas in any user supplied data are escaped. This can be done automatically with the exported FuzzyUserSanitize function.

import { FuzzyUserSanitize } from "@nano-sql/plugin-fuzzy-search";
 
// find matches for "crzy txt" on the document column.
nSQL("my_table").query("select").where(["SEARCH(document, 'crzy txt')", "=", 0]).exec();
 
// find users whose name closely or exactly matches "bill" OR "jeb"
nSQL("users").query("select").where(["SEARCH(firstName, 'bill', 'jeb')", "=", 0]).exec();
 
// search the body of posts for text provided by a user
nSQL("posts").query("select").where([`SEARCH(body, "${FuzzyUserSanitize(userProvidedSearch)}")`, "<=", 4]).exec();
 
// search all indexed columns for "something crazy" in the posts table
nSQL("posts").query("select").where([`SEARCH(*, "something crazy")`, "<=", 4]).exec();
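When several user supplied terms are involved, it can help to build the where() argument in one place. The following is a minimal sketch, not part of the plugin: buildSearchWhere and demoSanitize are hypothetical names. The sanitizer is passed in as a parameter so the sketch runs without the plugin installed. In a real app you would pass FuzzyUserSanitize, and the stand-in below makes no claim about how that function works internally.

```javascript
// Hypothetical helper: builds the array passed to .where() for a SEARCH query.
// `sanitize` is expected to be FuzzyUserSanitize in a real application.
function buildSearchWhere(column, terms, maxScore, sanitize) {
  // Each term is sanitized, wrapped in quotes, and separated by commas,
  // matching the SEARCH(column, "term1", "term2") form used above.
  const quoted = terms.map((term) => `"${sanitize(term)}"`).join(", ");
  return [`SEARCH(${column}, ${quoted})`, "<=", maxScore];
}

// Stand-in sanitizer for demonstration only: strips quotes and commas.
// This is NOT the plugin's implementation.
const demoSanitize = (s) => String(s).replace(/["',]/g, "");

const where = buildSearchWhere("body", ['crazy "text", here'], 4, demoSanitize);
console.log(where[0]); // SEARCH(body, "crazy text here")
```

The resulting array could then be passed to where() as in the queries above, for example nSQL("posts").query("select").where(where).exec().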

One final note: in addition to the fuzzy search, the SEARCH function also performs an exact match search of the entire untokenized search phrase against the secondary index the fuzzy search is based on. Any matches in the original secondary index will therefore also be in the results.

MIT License

Copyright (c) 2019 Scott Lott

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Changelog

[2.0.2]

  • Added script install option.
  • Readme updates.

[2.0.1]

  • Fixed a bug with non string indexes.
  • Added more documentation.
  • Added new * feature to SEARCH function.
  • Added conditional rebuilding to the search rebuild query.

[2.0.0]

  • First release

homepage

nanosql.io
