nanoSQL 2 Fuzzy Search Plugin
Allows you to build and use dynamic fuzzy search with nanoSQL 2
What's This For?
Add fuzzy search capability to your nanoSQL apps. The special indexing in this plugin is well suited for fuzzy name matching, document search, or anywhere else you need to match words that sound or look similar.
- Single word and phrase searches.
- Similar functionality to Elasticsearch.
- Filter results by relevance to search.
- Use a custom tokenizer function.
- Dynamic indexes can be updated on the fly.
- Works in NodeJS or any modern browser.
- Only 10KB gzipped.
For NodeJS or with a bundler (webpack, Parcel, etc.):

```sh
npm i @nano-sql/plugin-fuzzy-search --save
```

To use in the browser, drop this into your `<head>` after the nanoSQL core script:
```js
// or with <script> usage

nSQL().createDatabase(/* … */)
  .then(/* … */)
  .then(/* … */)
  .then(/* … */)
  .then(/* … */);
```
Setting Up Fuzzy Indexes
In order to use the Fuzzy Search plugin, it must be included in your initial `createDatabase` call.
Fuzzy search is enabled on secondary indexes by adding the `search` property to the index. Fuzzy search indexes only work on `string` type indexes.
You can optionally pass an object to the `search` property instead of `true`. The object can be used to provide a customized tokenizer for the fuzzy index.
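A minimal sketch of a table definition with two fuzzy indexes, one using `search: true` and one passing an object with a `tokenizer`. Only the `search` and `tokenizer` properties come from this README; the table name, columns, and the exact shape of the `model` and `indexes` objects are assumptions modeled on nanoSQL 2 table configs, and the whitespace tokenizer is a hypothetical placeholder.

```javascript
// Hypothetical nanoSQL 2 table definition. Only `search` and `tokenizer`
// are taken from this README; names and columns are illustrative.
const usersTable = {
  name: "users",
  model: {
    "id:uuid": { pk: true },
    "name:string": {},
    "bio:string": {}
  },
  indexes: {
    // Fuzzy index using the default tokenizer.
    "name:string": { search: true },
    // Fuzzy index with a custom tokenizer: lowercase, split on whitespace,
    // and record each word's position among the words of the string.
    "bio:string": {
      search: {
        tokenizer: (tableName, tableId, path, value) =>
          value
            .toLowerCase()
            .split(/\s+/)
            .filter(Boolean)
            .map((w, i) => ({ w, i }))
      }
    }
  }
};
```

This placeholder tokenizer ignores `tableName`, `tableId`, and `path`; a real one could use them to tokenize different columns differently.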
By default, the tokenizer removes all characters except letters and numbers, lowercases everything, removes English stop words, and applies an English-friendly stemmer and metaphone. To adjust tokenization, you can either provide your own tokenizer or configure the built-in tokenizer with options other than the defaults.
The provided tokenizer function is used on search terms passed into the fuzzy search plugin as well as on terms being indexed.
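The sketch below illustrates why one tokenizer has to handle both sides: the indexed text and the search phrase are normalized the same way (special characters stripped, lowercased, stop words removed), so differently formatted input still lines up. It is a simplified, hypothetical pipeline; stemming and metaphone are deliberately omitted, and the tiny stop-word list is illustrative, not the plugin's `stopWords` export.

```javascript
// Hypothetical, simplified normalizer: no stemming or metaphone.
const STOP_WORDS = new Set(["the", "a", "an", "and", "of"]);

function normalize(text) {
  return text
    .replace(/[^a-zA-Z0-9\s]/g, " ") // keep only letters, numbers, whitespace
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word.length > 0 && !STOP_WORDS.has(word));
}

// A document matches when every query token appears among its tokens.
function matchesAll(indexedText, query) {
  const indexed = new Set(normalize(indexedText));
  return normalize(query).every((token) => indexed.has(token));
}

console.log(normalize("The Quick, Brown Fox!")); // [ 'quick', 'brown', 'fox' ]
console.log(matchesAll("The Quick, Brown Fox!", "quick fox")); // true
```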
The `defaultTokenizer` export is a function that accepts these arguments:
Should be one of "english", "english-meta", or "english-stem". All options remove special characters and leave only numbers and letters. "english-stem" also stems searches/indexes with the Porter Stemmer algorithm, "english-meta" also runs searches/indexes through metaphone, and "english" runs both the metaphone and stemmer algorithms on searches/indexes.
An array of stop words to be excluded from indexes/searches. The `stopWords` export contains the default list of English stop words.
decimalPoints:number (optional, default is 4)
Since the fuzzy search can't use native number types, all numbers are stringified and formatted to a very specific format with a fixed number of decimal places. For example, by default if 50 is found in a string it will be converted to "50.0000". If 0.00001 is found it will be converted to "0.0000". You can increase the decimalPoints argument to get more precise decimal searches at the cost of space in the database.
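To make the number formatting concrete, here is a hypothetical helper (`formatNumbers` is not a plugin export) that rewrites every number in a string to a fixed number of decimal places with `Number.prototype.toFixed`, reproducing the `50` to `"50.0000"` and `0.00001` to `"0.0000"` examples above. The plugin's actual implementation may differ, for instance in how it handles negative numbers or rounding.

```javascript
// Hypothetical helper: rewrite each number in `text` with a fixed number of decimals.
function formatNumbers(text, decimalPoints = 4) {
  return text.replace(/\d+(?:\.\d+)?/g, (match) =>
    parseFloat(match).toFixed(decimalPoints)
  );
}

console.log(formatNumbers("50 apples"));        // 50.0000 apples
console.log(formatNumbers("0.00001 grams"));    // 0.0000 grams
console.log(formatNumbers("0.00001 grams", 6)); // 0.000010 grams
```

The trade-off described above shows up directly: a larger `decimalPoints` preserves small values like `0.00001`, but every stored number token gets longer.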
A quick example:
You don't have to use the `defaultTokenizer` at all; you can build your own tokenizing functions from scratch.
The `tokenizer` property accepts a function with these arguments:
- `tableName:string` - The name of the table currently being tokenized.
- `tableId:string` - The ID of the table named in the first argument.
- `path:string` - The parsed path of the column being tokenized.
- `value:string` - The value being tokenized.
The function should return an array of objects, each with these properties:
- `w`: The token for the word at a given position in the provided string.
- `i`: The position of this tokenized word in the original string.

The ordering of the result array is irrelevant.
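A hypothetical tokenizer that follows the contract above: it takes the four documented arguments and returns `{ w, i }` objects. Here `i` is assumed to be the word's index in the original string, counting stop words that were dropped, which keeps positions stable for phrase matching; the README doesn't say whether `i` counts words or characters, so treat that as an assumption. The stop-word list is illustrative.

```javascript
// Hypothetical custom tokenizer matching the documented signature.
// tableName, tableId and path are unused here but available for per-column logic.
const MY_STOP_WORDS = new Set(["the", "and", "a"]);

function myTokenizer(tableName, tableId, path, value) {
  const tokens = [];
  value.trim().split(/\s+/).forEach((raw, position) => {
    // Lowercase and strip everything except letters and numbers.
    const word = raw.toLowerCase().replace(/[^a-z0-9]/g, "");
    if (word.length > 0 && !MY_STOP_WORDS.has(word)) {
      // `i` keeps the word's original position, even after stop words are dropped.
      tokens.push({ w: word, i: position });
    }
  });
  return tokens;
}

console.log(myTokenizer("posts", "posts-id", "body", "The cat and the Hat"));
// [ { w: 'cat', i: 1 }, { w: 'hat', i: 4 } ]
```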
A quick example of using a custom tokenizer:
Finally, if you change the tokenizer for an index in any way, you must rebuild the index using the steps below.
Fuzzy indexes can be rebuilt with the new `rebuild search` query:
```js
// rebuild the fuzzy search index for all rows in "myTable"
nSQL("myTable").query("rebuild search").exec();

// rebuild the fuzzy search index for specific rows in "myTable"
nSQL("myTable").query("rebuild search").where(/* … */).exec();
```
If you add fuzzy search to an existing secondary index or change the tokenizer for a fuzzy search index, you must rebuild the fuzzy indexes on that table to get consistent results.
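To see why a rebuild is needed, consider a toy model: an in-memory index built with one tokenizer, then queried after the tokenizer changes. Tokens already stored were produced by the old tokenizer, so searches tokenized by the new one can miss them until every row is re-tokenized. The `Map`-based index and the naive plural-stripping "stemmer" are hypothetical stand-ins, not the plugin's internals.

```javascript
// Toy in-memory index: token -> Set of row ids. Not the plugin's storage format.
function buildIndex(rows, tokenize) {
  const index = new Map();
  for (const [id, text] of Object.entries(rows)) {
    for (const token of tokenize(text)) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(id);
    }
  }
  return index;
}

// Return ids of rows containing any token of the (tokenized) query.
function search(index, query, tokenize) {
  const hits = new Set();
  for (const token of tokenize(query)) {
    for (const id of index.get(token) ?? []) hits.add(id);
  }
  return [...hits];
}

const plainTokenizer = (text) => text.toLowerCase().split(/\s+/).filter(Boolean);
// Naive "stemmer" for illustration only: strips a trailing "s".
const stemmingTokenizer = (text) => plainTokenizer(text).map((w) => w.replace(/s$/, ""));

const rows = { row1: "Cats sleep", row2: "Dogs bark" };
const oldIndex = buildIndex(rows, plainTokenizer);

// Tokenizer changed, index not rebuilt: "cats" becomes "cat", which was never indexed.
console.log(search(oldIndex, "cats", stemmingTokenizer)); // []

// After rebuilding with the new tokenizer, the row is found again.
const rebuilt = buildIndex(rows, stemmingTokenizer);
console.log(search(rebuilt, "cats", stemmingTokenizer)); // [ 'row1' ]
```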
Using Fuzzy Indexes
The `SEARCH` function is used to take advantage of fuzzy indexes. It accepts two or more arguments: the first must always be the column/path the fuzzy search is performed on, and every following argument is a search phrase.
`SEARCH` returns a number from 0 upward, where 0 means an exact match was found and each higher number represents an increasingly less relevant match.
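The plugin's scoring algorithm isn't described here, so the following sketch uses Levenshtein edit distance as a stand-in to show how a "0 is exact, higher is worse" score is typically used: keep candidates under a threshold and sort the closest first. `levenshtein` and `filterByRelevance` are hypothetical helpers, not plugin exports.

```javascript
// Classic dynamic-programming edit distance, using a single rolling row.
function levenshtein(a, b) {
  const prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = prev[0]; // distance for (i-1, j-1)
    prev[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = prev[j]; // distance for (i-1, j)
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      prev[j] = Math.min(above + 1, prev[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return prev[b.length];
}

// Keep names whose score is within `maxScore`, best (lowest) score first.
function filterByRelevance(names, query, maxScore) {
  return names
    .map((name) => ({ name, score: levenshtein(name.toLowerCase(), query.toLowerCase()) }))
    .filter((r) => r.score <= maxScore)
    .sort((x, y) => x.score - y.score);
}

console.log(filterByRelevance(["Bill", "Bil", "Will", "Jebediah"], "bill", 1));
```

Edit distance only captures spelling similarity; the plugin's default tokenizer additionally applies metaphone so that words which sound alike can match.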
You can also pass `*` as the first argument of the `SEARCH` function to search all fuzzy indexes on a given table.
Since search terms must be surrounded by quotes and separated by commas, any user-supplied data must have these characters escaped. This can be done automatically with the exported `FuzzyUserSanitize` function.
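The exact output of `FuzzyUserSanitize` isn't documented here, so the following is a hypothetical stand-in showing the idea: strip the characters that would break a quoted, comma-separated argument list before user text is placed inside a `SEARCH` call. Whether the real function removes or escapes these characters is not specified, so this sketch simply removes them, and the comma-splitting parser is a naive illustration, not the plugin's parser.

```javascript
// Hypothetical sanitizer: remove quote characters and commas from user input.
function sanitizeSearchTerm(input) {
  return input.replace(/["'`,]/g, "").trim();
}

const userInput = 'he said "hi", then left';

// A naive comma-splitting parser shows the problem:
// the unsanitized phrase breaks into extra arguments.
const rawArgs = `body, "${userInput}"`;
const safeArgs = `body, "${sanitizeSearchTerm(userInput)}"`;

console.log(rawArgs.split(",").length);     // 3
console.log(safeArgs.split(",").length);    // 2
console.log(sanitizeSearchTerm(userInput)); // he said hi then left
```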
```js
// find matches for "crzy txt" on the document column
nSQL("my_table").query("select").where(/* … */).exec();

// find users whose name closely or exactly matches "bill" OR "jeb"
nSQL("users").query("select").where(/* … */).exec();

// search the body of posts for text provided by a user
nSQL("posts").query("select").where(/* … */).exec();

// search all indexed columns for "something crazy" in the posts table
nSQL("posts").query("select").where(/* … */).exec();
```
One final note: in addition to the fuzzy search, the `SEARCH` function also performs an exact match of the entire untokenized search phrase against the secondary index the fuzzy index is built on. Any matches in the original secondary index will therefore also appear in the results.
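As a rough model of that behavior (hypothetical, not the plugin's code), the results can be thought of as the union of exact secondary-index hits, scored 0 like any exact match, and fuzzy hits with their relevance scores, keeping the best score when a row appears in both. `mergeResults` and the `{ id, score }` shape are assumptions for illustration.

```javascript
// Hypothetical merge: exact index hits score 0; fuzzy hits keep their best score.
function mergeResults(exactIds, fuzzyHits) {
  const best = new Map();
  for (const { id, score } of fuzzyHits) {
    best.set(id, Math.min(score, best.get(id) ?? Infinity));
  }
  // An exact match always wins, so it overrides any fuzzy score.
  for (const id of exactIds) best.set(id, 0);
  return [...best.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => a.score - b.score);
}

const merged = mergeResults(["u3"], [{ id: "u1", score: 2 }, { id: "u3", score: 1 }]);
console.log(merged); // u3 (score 0) first, then u1 (score 2)
```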
Copyright (c) 2019 Scott Lott
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- Added script install option.
- Readme updates.
- Fixed a bug with non string indexes.
- Added more documentation.
- Added new
- Added conditional rebuilding to the
- First release