@quantleaf/probly-search

1.2.4 • Public • Published

probly-search

A full-text search library, optimized for insertion speed, that provides full control over the scoring calculations.

This started as a port of the Node library NDX.

Demo

Recipe (title) search with 50k documents.

https://quantleaf.github.io/probly-search-demo/

Features

  • Three ways to do scoring

    • BM25 ranking function to rank matching documents, the same ranking function that Lucene >= 6.0.0 uses by default.
    • zero-to-one, a scoring function unique to this library that produces a normalized score between 0 and 1. Well suited to matching titles/labels against queries.
    • Ability to fully customize your own scoring function by implementing the ScoreCalculator trait.
  • Trie based dynamic Inverted Index.

  • Multiple fields full-text indexing and searching.

  • Per-field score boosting.

  • Configurable tokenizer and term filter.

  • Free text queries with query expansion.

  • Fast insertion, but deferred (latent) deletion: removed documents stay in the index until it is vacuumed.
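To make the BM25 option more concrete, here is a minimal sketch of the textbook per-term BM25 score with a Lucene-style IDF. The function name `bm25_term_score` and its parameters are hypothetical and are not part of this library's API. The formula variant and constants the library actually uses may differ.

```rust
/// Textbook BM25 score of one query term against one document field.
/// Hypothetical helper for illustration only; not this library's API.
fn bm25_term_score(
    tf: f64,        // occurrences of the term in the field
    field_len: f64, // number of tokens in the field
    avg_len: f64,   // average field length across the index
    doc_count: f64, // number of documents in the index (N)
    doc_freq: f64,  // number of documents containing the term (n)
    k1: f64,        // term-frequency saturation, commonly 1.2
    b: f64,         // length normalization strength, commonly 0.75
) -> f64 {
    // Lucene-style inverse document frequency; never negative.
    let idf = (1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5)).ln();
    // Saturating term frequency, normalized by relative field length.
    let norm = tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * field_len / avg_len));
    idf * norm
}
```

With these assumptions, a term that occurs once in a field of average length, in 1 of 2 documents, scores ln(2). Longer fields and more common terms score lower. Per-field boosting would multiply each field's score by its weight before summing.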

Documentation

Documentation is under development. For now, read the source tests.

Example

Create an index for documents with 2 fields, query it, and then remove a document.
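The example uses a `Doc` struct, a `tokenizer`, a `filter` and two field extractors without defining them. The sketch below shows one plausible shape for these helpers. All signatures here are assumptions made for illustration; check the source tests for the exact types the library expects.

```rust
// Hypothetical helper definitions; the signatures are assumptions,
// not taken from the library.
#[derive(Clone)]
struct Doc {
    id: usize,
    title: String,
    description: String,
}

/// Split the text on whitespace, returning one owned token per word.
fn tokenizer(s: &str) -> Vec<String> {
    s.split_whitespace().map(str::to_owned).collect()
}

/// Term filter applied to every token; here it only lowercases.
fn filter(s: &str) -> String {
    s.to_lowercase()
}

/// Field accessor for the first indexed field.
fn title_extract(d: &Doc) -> Option<&str> {
    Some(d.title.as_str())
}

/// Field accessor for the second indexed field.
fn description_extract(d: &Doc) -> Option<&str> {
    Some(d.description.as_str())
}
```

`Doc` derives `Clone` because the example passes `doc_1.clone()` to the index and reads `doc_1.id` again later.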

use std::collections::HashSet;
use probly_search::{
    index::{add_document_to_index, create_index, remove_document_from_index, vacuum_index, Index},
    query::{
        query,
        score::default::{bm25, zero_to_one},
        QueryResult,
    },
};


// Create index with 2 fields
let mut index = create_index::<usize>(2);

// Create docs from a custom Doc struct
let doc_1 = Doc {
    id: 0,
    title: "abc".to_string(),
    description: "dfg".to_string(),
};

let doc_2 = Doc {
    id: 1,
    title: "dfgh".to_string(),
    description: "abcd".to_string(),
};

// Add documents to index
add_document_to_index(
    &mut index,
    &[title_extract, description_extract],
    tokenizer,
    filter,
    doc_1.id,
    doc_1.clone(),
);

add_document_to_index(
    &mut index,
    &[title_extract, description_extract],
    tokenizer,
    filter,
    doc_2.id,
    doc_2,
);

// Search, expect 2 results
let mut result = query(
    &mut index,
    &"abc",
    &mut bm25::new(),
    tokenizer,
    filter,
    &[1., 1.],
    None,
);
assert_eq!(result.len(), 2);
assert_eq!(
    result[0],
    QueryResult {
        key: 0,
        score: 0.6931471805599453
    }
);
assert_eq!(
    result[1],
    QueryResult {
        key: 1,
        score: 0.28104699650060755
    }
);

// Remove a document from the index
let mut removed_docs = HashSet::new();
remove_document_from_index(&mut index, &mut removed_docs, doc_1.id);

// Vacuum to remove completely
vacuum_index(&mut index, &mut removed_docs);

// Search, expect 1 result
result = query(
    &mut index,
    &"abc",
    &mut bm25::new(),
    tokenizer,
    filter,
    &[1., 1.],
    Some(&removed_docs),
);
assert_eq!(result.len(), 1);
assert_eq!(
    result[0],
    QueryResult {
        key: 1,
        score: 0.1166450426074421
    }
);
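The remove-then-vacuum pattern above is a form of lazy deletion: removal only records the key, queries skip recorded keys, and a later vacuum pass does the actual cleanup. Here is a minimal std-only sketch of that idea. `ToyIndex` and its methods are hypothetical and use a plain hash map instead of the library's trie. Clearing the removed set on vacuum is an assumption of this sketch.

```rust
use std::collections::{HashMap, HashSet};

/// Toy inverted index: term -> keys of the documents containing it.
/// Hypothetical structure for illustration; not the library's index.
struct ToyIndex {
    postings: HashMap<String, Vec<usize>>,
}

impl ToyIndex {
    /// Cheap removal: only record the key as removed.
    fn remove(&self, removed: &mut HashSet<usize>, key: usize) {
        removed.insert(key);
    }

    /// Look up a term, skipping keys that are marked as removed.
    fn search(&self, term: &str, removed: Option<&HashSet<usize>>) -> Vec<usize> {
        self.postings
            .get(term)
            .map(|keys| {
                keys.iter()
                    .copied()
                    .filter(|k| removed.map_or(true, |r| !r.contains(k)))
                    .collect()
            })
            .unwrap_or_default()
    }

    /// Physically drop removed keys, then clear the removed set
    /// (clearing is an assumption of this sketch).
    fn vacuum(&mut self, removed: &mut HashSet<usize>) {
        for keys in self.postings.values_mut() {
            keys.retain(|k| !removed.contains(k));
        }
        self.postings.retain(|_, keys| !keys.is_empty());
        removed.clear();
    }
}
```

The trade-off is that removals stay cheap while queries pay a small filtering cost until the next vacuum.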

For more query examples, go through the source tests for the BM25 and zero-to-one implementations.

License

MIT

Install

npm i @quantleaf/probly-search

Unpacked Size

6.45 kB

Total Files

7

Collaborators

  • marcus-quantleaf