@clipperhouse/jargon

0.2.0 • Public

Jargon is a TypeScript/JavaScript library for tokenization and lemmatization. It finds variations on canonical terms and converts them to a single form.

For example, in tech, you might see 'node js' or 'NodeJS' or 'node.js' and want them understood as the same term. That’s lemmatization.

Quick start

npm install "@clipperhouse/jargon@latest"

Then create a file, preferably TypeScript.

// demo.ts

import jargon from '@clipperhouse/jargon';
import stackexchange from '@clipperhouse/jargon/stackexchange'; // a dictionary

const text = 'I ❤️ Ruby on Rails and vue';

const lemmas = jargon.Lemmatize(text, stackexchange);

console.log(lemmas.toString());

// I ❤️ ruby-on-rails and vue.js

// demo.js

const jargon = require('@clipperhouse/jargon');
const stackexchange = require('@clipperhouse/jargon/stackexchange');

const text = 'I ❤️ Ruby on Rails and vue';

const lemmas = jargon.Lemmatize(text, stackexchange);
console.log(lemmas.toString());

// I ❤️ ruby-on-rails and vue.js

What’s it doing?

jargon tokenizes the incoming text, identifying punctuation and spaces. It treats tech-ish terms such as asp.net and TCP/IP, as well as #hashtags and @handles, as single words, where other tokenizers would see two words.
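To make the tokenizing idea concrete, here is a minimal sketch using a single regular expression. It is not jargon's actual tokenizer: the tokenize function and its pattern are assumptions made for illustration, and they only handle ASCII letters and digits.

```typescript
// Illustrative only: a hypothetical tokenizer, not the one jargon ships.
// A "word" may start with # or @ and may contain dots or slashes,
// as long as each dot or slash sits between letters or digits.
const tokenPattern = /[#@]?[A-Za-z0-9]+(?:[.\/][A-Za-z0-9]+)*|\s+|[^\sA-Za-z0-9]/gu;

function tokenize(text: string): string[] {
  // Each match is one token: a word, a run of whitespace, or one other character.
  return text.match(tokenPattern) ?? [];
}

const sample = 'Try asp.net, TCP/IP, #hashtags and @handles.';
const tokens = tokenize(sample);

// Keep only the word-like tokens, dropping spaces and punctuation.
const words = tokens.filter((t) => /[A-Za-z0-9]/.test(t));
console.log(words);
// [ 'Try', 'asp.net', 'TCP/IP', '#hashtags', 'and', '@handles' ]
```

Because every character lands in exactly one token, joining the tokens back together reproduces the input text, which is a handy property for a tokenizer that feeds a rewriting step.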

Those tokens go to the lemmatizer, along with a dictionary. The lemmatizer passes over the tokens and asks the dictionary whether it recognizes them. It handles multi-token phrases like 'Ruby on Rails', converting the phrase to a single ruby-on-rails token.
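The pass over tokens can be pictured as a greedy longest-match lookup. The sketch below is an assumption about the general technique, not jargon's implementation: the toyDictionary map, the maxPhraseWords limit and the lemmatizeWords function are all hypothetical, and it works on plain words rather than jargon's token objects.

```typescript
// Illustrative only: a toy dictionary mapping lowercased phrases to canonical forms.
// The entries are made up for this sketch, not taken from the stackexchange dictionary.
const toyDictionary = new Map<string, string>([
  ['ruby on rails', 'ruby-on-rails'],
  ['vue', 'vue.js'],
]);

// Longest phrase, in words, that the toy dictionary contains.
const maxPhraseWords = 3;

function lemmatizeWords(words: string[]): string[] {
  const out: string[] = [];
  let i = 0;
  while (i < words.length) {
    let matched = false;
    // Try the longest candidate phrase first, then progressively shorter ones.
    for (let n = Math.min(maxPhraseWords, words.length - i); n >= 1; n--) {
      const key = words.slice(i, i + n).join(' ').toLowerCase();
      const canonical = toyDictionary.get(key);
      if (canonical !== undefined) {
        out.push(canonical); // n input words collapse into one output token
        i += n;
        matched = true;
        break;
      }
    }
    if (!matched) {
      out.push(words[i]); // unknown word passes through unchanged
      i++;
    }
  }
  return out;
}

console.log(lemmatizeWords('I love Ruby on Rails and vue'.split(' ')).join(' '));
// I love ruby-on-rails and vue.js
```

Trying the longest phrase first is what lets 'Ruby on Rails' win over any shorter entry that might match 'Ruby' alone.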

It is insensitive to spaces, hyphens, dots, slashes and case, so it handles a lot of variation that would be difficult to get right with simple search-and-replace or regex.
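One way to picture that insensitivity is to reduce every variant to the same comparison key before looking it up. This is a sketch of the idea only: normalizeKey is a hypothetical helper, not a function exported by jargon, and the library may well achieve the same effect differently.

```typescript
// Illustrative only: a hypothetical helper, not part of the jargon API.
// Drops spaces, hyphens, dots and slashes, then lowercases what is left.
function normalizeKey(term: string): string {
  return term.replace(/[\s\-.\/]/g, '').toLowerCase();
}

const variants = ['node js', 'NodeJS', 'node.js', 'Node-JS'];
const keys = variants.map(normalizeKey);

console.log(keys);
// [ 'nodejs', 'nodejs', 'nodejs', 'nodejs' ]
```

With a key like this, a dictionary only needs one entry per term rather than one per spelling.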

These rules are defined in a Dictionary. In the above examples, stackexchange is the dictionary, and it knows about react vs react.js. It also understands synonyms, such as ecmascript and javascript.

Another example is the contractions dictionary. It splits a token like "it'll" into two tokens, "it" and "will".
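The one-token-to-many case can be sketched the same way. Everything here is assumed for illustration: the toyContractions entries and the expandContractions function are hypothetical and do not reflect the contents or API of jargon's contractions dictionary.

```typescript
// Illustrative only: a toy contractions table, not jargon's actual dictionary.
const toyContractions = new Map<string, string[]>([
  ["it'll", ['it', 'will']],
  ["can't", ['can', 'not']],
]);

function expandContractions(words: string[]): string[] {
  const out: string[] = [];
  for (const word of words) {
    // Treat the typographic apostrophe (U+2019) like a plain one, and ignore case.
    const key = word.replace(/\u2019/g, "'").toLowerCase();
    const parts = toyContractions.get(key);
    if (parts !== undefined) {
      out.push(...parts); // one input token becomes several output tokens
    } else {
      out.push(word);
    }
  }
  return out;
}

console.log(expandContractions(["It'll", 'work']));
// [ 'it', 'will', 'work' ]
```

Note that this toy version emits the lowercased dictionary forms, so the original capitalization of "It'll" is not preserved.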

Package details

Version: 0.2.0
License: MIT
Unpacked size: 371 kB
Total files: 48
Collaborators: clipperhouse