@jimtkelly/aho-corasick

1.0.1 • Public • Published

Contributors Forks Stargazers Issues MIT License LinkedIn


Logo

aho-corasick

An implementation of the Aho-Corasick algorithm in TypeScript.
Explore the docs »

View Demo · Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgments
  9. References

About The Project

Logo

Figure 1: Aho-Corasick Algorithm Diagram [1]

The Aho—Corasick algorithm is a string-searching algorithm invented by Alfred V. Aho and Margaret J. Corasick in 1975. [1]. This repository contains an implementation of this algorithm in TypeScript for use in efficient string matching as an aid to bibliographic searches.

This package is referenced and inspired from tanishiking's aho-corasick-js package, providing some minor improvements to optimisation and additional functionality.

This package is hosted on npm and is accessible here.

(back to top)

Built With

  • npm
  • TypeScript

(back to top)

Getting Started

Installation

To use this project, please install the required packages as available from npm.

  • npm
    npm i @jimtkelly/aho-corasick

Usage

Create Trie and Add Keywords

It's important to note that if the param, keywords is an array of empty strings, e.g., [' ', ''], then an error will be thrown.

const trie = new Trie();
trie.addKeyword('apple');
trie.addKeyword('banana');

Get Matches

const trie = new Trie(['apple', 'banana']);
const matches = trie.getMatches('apple banana');
console.log(matches);
// Output: [{ start: 0, end: 4, keyword: 'apple' }, { start: 6, end: 11, keyword: 'banana' }]

Get Inverse Matches

const trie = new Trie(['cat', 'dog']);
const nonMatches = trie.getNonMatches('The cat and the dog');
console.log(nonMatches);
// Output: ['The ', ' and the ']

Get String Occurrences

const trie = new Trie(['apple', 'banana']);
const occurrences = trie.getStringOccurrences('apple banana apple');
console.log(occurrences);
// Output: [{ keyword: 'apple', occurrences: 2 }, { keyword: 'banana', occurrences: 1 }]

For more examples, please refer to the Documentation

(back to top)

Roadmap

  • [x] Base implementation of the Aho-corasick algorithm.
  • [x] Add GitHub workflow to test & build the package.
  • [x] Implement feature to return non-matches, i.e., all words that did not match.
    • [x] Additional nice to have is to only return the words that matched and their counts, i.e., without position in the string.

See the open issues for a full list of proposed features (and known issues).

(back to top)

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Contact

Jim Kelly - jimkelly.t@outlook.com

Project Link: https://github.com/jamestkelly/aho-corasick

(back to top)

Acknowledgments

(back to top)

References

[1] "Aho–Corasick algorithm," in Wikipedia: The Free Encyclopedia; (Wikimedia Foundation Inc., updated 11 December 2023, 05:39 UTC) [encyclopedia on-line]; available from https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm; Internet; retrieved 14 December 2023.

Package Sidebar

Install

npm i @jimtkelly/aho-corasick

Weekly Downloads

3

Version

1.0.1

License

MIT

Unpacked Size

292 kB

Total Files

31

Last publish

Collaborators

  • jamestkelly