# group-similar

1.0.4 • Public • Published

# Group Similar

Group similar items together.

Runtime complexity is `O(N^2 * (M + α(N)))`, where `N` is the number of elements in `items`, `M` is the runtime complexity of the `similarityFunction`, and `α(N)` is the inverse Ackermann function (amortized constant time for all practical purposes).

Space complexity is `O(N)`.

## Getting started

``````npm i group-similar
``````

## Examples

Group similar strings
```import { groupSimilar } from "group-similar";
import { distance } from "fastest-levenshtein";

function levenshteinSimilarityFunction(a: string, b: string): number {
return a.length === 0 && b.length === 0
? 1
: 1 - distance(a, b) / Math.max(a.length, b.length);
}

groupSimilar({
items: ["cat", "bat", "kitten", "dog", "sitting"],
mapper: (i) => i,
similarityFunction: levenshteinSimilarityFunction,
similarityThreshold: 0.5,
});

// [ [ 'cat', 'bat' ], [ 'kitten', 'sitting' ], [ 'dog' ] ]```
Group similar numbers
```import { groupSimilar } from "group-similar";

function evenOddSimilarityFunction(a: number, b: number): number {
return Number(a % 2 === b % 2);
}

groupSimilar({
items: [1, 5, 10, 0, 2, 123],
mapper: (i) => i,
similarityFunction: evenOddSimilarityFunction,
similarityThreshold: 1,
});

// [ [ 1, 5, 123 ], [ 10, 0, 2 ] ]```
Group similar objects
```import { groupSimilar } from "group-similar";
import { distance } from "fastest-levenshtein";

function nestedMapper(object: { a: { b: { value: string } } }): string {
return object.a.b.value;
}

function levenshteinSimilarityFunction(a: string, b: string): number {
return a.length === 0 && b.length === 0
? 1
: 1 - distance(a, b) / Math.max(a.length, b.length);
}

groupSimilar({
items: [
{ a: { b: { value: "sitting" } } },
{ a: { b: { value: "dog" } } },
{ a: { b: { value: "kitten" } } },
{ a: { b: { value: "bat" } } },
{ a: { b: { value: "cat" } } },
],
mapper: nestedMapper,
similarityFunction: levenshteinSimilarityFunction,
similarityThreshold: 0.5,
});

// [
//   [{ a: { b: { value: "sitting" } } }, { a: { b: { value: "kitten" } } }],
//   [{ a: { b: { value: "dog" } } }],
//   [{ a: { b: { value: "bat" } } }, { a: { b: { value: "cat" } } }],
// ]```

## Syntax

```groupSimilar(options);
groupSimilar({ items, mapper, similarityFunction, similarityThreshold });```

### Parameters

Parameter Type Required Default Description
options `Object` Yes none Arguments to pass to the function

#### Options

Property Type Required Default Description
items `T[]` Yes none Array of items to group
mapper `(t: T) => K` Yes none Function to apply to each element in items prior to measuring similarity
similarityFunction `(a: K, b: K) => number` Yes none Function to measure similarity between mapped items
similarityThreshold `number` Yes none Threshold at which items whose similarity value is greater than or equal to it are grouped together

### Return value

The return value is a new nested array of type `T[][]` containing elements of `items` grouped by similarity. If there are no elements in `items`, an empty array will be returned.

## Benchmark

Benchmark test results where `N` is the number of items being grouped, higher `ops/sec` is better.

Library N=16 N=32 N=64 N=128 N=256 N=512 N=1024 N=2048
group-similar 86867 17538 6067 1594 444 171 75 27
set-clustering 28506 6258 1831 455 121 30 6 1

Benchmark configuration details can be found here.

## Package Sidebar

### Install

`npm i group-similar`

### Repository

github.com/peterthehan/group-similar

### Homepage

github.com/peterthehan/group-similar

4

1.0.4

MIT

10.1 kB

5