A TypeScript library for fuzzy name matching that combines Levenshtein distance with phonetic similarity algorithms (SoundEx, Metaphone, and Double Metaphone).
Using npm:
npm install fuzzy-names
Using yarn:
yarn add fuzzy-names
- Fuzzy Name Matching: Combines Levenshtein distance and phonetic similarity for accurate name matching
- Multiple Phonetic Algorithms: Uses SoundEx, Metaphone, and Double Metaphone for comprehensive phonetic matching
- Customizable Thresholds: Configurable distance and phonetic similarity thresholds
- Flexible Input Handling: Works with both string arrays and object arrays with custom paths
- Name Normalization: Handles special characters, diacritics, and various name formats
- TypeScript Support: Full TypeScript support with comprehensive type definitions
The main function for finding the best matching name in a list.
function search<T = MatchItem>(
input: string,
matchList: Array<T>,
options?: Partial<Options>
): T | null
- input: string
  - The name string to search for within the matchList
  - Properties:
    - Can be a full name or partial name
    - Is case-insensitive
    - Can include special characters, diacritics, or extra spaces
    - Can handle various name formats (Western, Eastern, with prefixes/suffixes)
  - Examples (names stands for any match list):
    search("John Doe", names) // Basic full name
    search("Dr. John A. Doe Jr.", names) // Name with prefix and suffix
    search("josé garcía", names) // Name with diacritics
    search(" Mary Jane ", names) // Name with extra spaces
- matchList: Array<T>
  - An array of items to search through
  - Properties:
    - Can be an array of strings: string[]
    - Can be an array of objects: Array<T>
      - For objects, use matchPath in options to specify the path to the name property
      - Generic type T allows for any object structure
  - Examples:
    // Simple string array
    const stringList = [
      "John Doe",
      "Jane Smith",
      "Bob Johnson"
    ];
    // Object array with direct name property
    const objectList = [
      { name: "John Doe", id: 1 },
      { name: "Jane Smith", id: 2 }
    ];
    // Object array with nested name property
    const nestedList = [
      { user: { profile: { name: "John Doe" }, id: 1 } },
      { user: { profile: { name: "Jane Smith" }, id: 2 } }
    ];
- options?: Partial<Options>
  - Optional configuration object that customizes the search behavior
  - Properties:
    - matchPath: ReadonlyArray<number | string>
      - Specifies the path to the name property in objects
      - Required when searching through object arrays
      - Can handle nested paths
      - Empty array for direct string matching
      - Examples:
        // Direct name property
        search("John", objects, { matchPath: ["name"] })
        // Nested name property
        search("John", nested, { matchPath: ["user", "profile", "name"] })
        // Array index access
        search("John", arrays, { matchPath: ["names", 0] })
    - threshold: { distance?: number, phonetics?: number }
      - Fine-tunes the matching sensitivity
      - distance property:
        - Default value: 10
        - Maximum allowed Levenshtein distance
        - Higher values allow more character differences
        - Lower values require closer string matches
        - Recommended ranges:
          - 1-2 for strict matching
          - 3-5 for moderate matching
          - 6-10 for loose matching
      - phonetics property:
        - Default value: 1
        - Minimum required phonetic similarity score
        - Range: 0-9 (3 points per matching algorithm)
        - Higher values require stronger phonetic matches
        - Recommended ranges:
          - 1-2 for basic matching
          - 3-5 for moderate matching
          - 6-9 for strict matching
      - Examples:
        // Strict matching
        search("John", names, {
          threshold: {
            distance: 2, // Allow only minor typos
            phonetics: 6 // Require strong phonetic match
          }
        })
        // Loose matching
        search("John", names, {
          threshold: {
            distance: 8, // Allow more character differences
            phonetics: 2 // Accept weaker phonetic matches
          }
        })
        // Balanced matching
        search("John", names, {
          threshold: {
            distance: 5, // Moderate character differences
            phonetics: 4 // Moderate phonetic similarity
          }
        })
- The best matching item from the list, or null if no match is found
  - For string arrays: returns the matching string
  - For object arrays: returns the entire matching object
- Matches are ranked by:
  - Phonetic similarity (higher is better)
  - Levenshtein distance (lower is better)
const people = [
{ name: "John Doe", id: 1 },
{ name: "Jane Smith", id: 2 }
];
const result = search("Jon Doe", people, {
matchPath: ["name"],
threshold: { distance: 2, phonetics: 1 }
});
// Returns: { name: "John Doe", id: 1 }
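To make the matchPath option more concrete, here is a minimal sketch of how a path such as ["user", "profile", "name"] could be walked through an item. The helper resolvePath is hypothetical and not part of the fuzzy-names API; the library may resolve paths differently.

```typescript
// Hypothetical helper for illustration only; not exported by fuzzy-names.
type Path = ReadonlyArray<number | string>;

// Follows each key in `path` in turn. Returns undefined as soon as the
// walk reaches something that is not an object (or array).
function resolvePath(item: unknown, path: Path): unknown {
  let current: unknown = item;
  for (const key of path) {
    if (current === null || typeof current !== "object") {
      return undefined;
    }
    current = (current as Record<string | number, unknown>)[key];
  }
  return current;
}

const nestedItem = { user: { profile: { name: "John Doe" }, id: 1 } };
const itemWithArray = { names: ["Jane Smith", "J. Smith"] };

resolvePath(nestedItem, ["user", "profile", "name"]); // "John Doe"
resolvePath(itemWithArray, ["names", 0]);             // "Jane Smith"
resolvePath("Bob Johnson", []);                       // "Bob Johnson" (empty path returns the item itself)
resolvePath(nestedItem, ["user", "missing", "name"]); // undefined
```

An empty path returning the item unchanged mirrors the "empty array for direct string matching" behavior described for matchPath.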
Calculates both Levenshtein distance and phonetic similarity between two names.
function calculateMatchMetric(
queryName: string,
corpusName: string
): MatchMetric
- queryName: The input name to compare
- corpusName: The name to compare against
- Object containing Levenshtein distance and phonetic metric scores
const metric = calculateMatchMetric("John Doe", "Jon Doe");
// Returns: {
// levDistance: { firstName: 1, lastName: 0, middleName: 0, total: 1 },
// phoneticsMetric: 6
// }
Calculates the Levenshtein distance between two names, broken down by name parts.
function calculateLevenshteinDistance(
queryName: string,
corpusName: string
): LevDistance
- queryName: The input name to compare
- corpusName: The name to compare against
- Object containing distance scores for first name, middle name, last name, and total
const distance = calculateLevenshteinDistance("John A Doe", "Jon B Doe");
// Returns: {
// firstName: 1,
// middleName: 1,
// lastName: 0,
// total: 2
// }
Calculates phonetic similarity between two names using multiple algorithms.
function calculatePhoneticMetric(
inputName: string,
corpusName: string,
options?: CalculatePhoneticsMetricOptions
): number
- inputName: The input name to compare
- corpusName: The name to compare against
- options: Optional configuration
  - returnAsPercentage: Return score as percentage instead of raw number
- Numeric score indicating phonetic similarity (higher is more similar)
const score = calculatePhoneticMetric("John Doe", "Jon Doe");
// Returns: 6 (or 100 if returnAsPercentage is true)
Normalizes a name string by removing special characters and standardizing format.
function normalizeName(name: string): string
- name: The name string to normalize
- Normalized name string
const normalized = normalizeName(" John Döe-Smith ");
// Returns: "john doe smith"
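As a rough illustration of this kind of normalization, the sketch below strips diacritics, lowercases, and collapses punctuation and whitespace. It is a hypothetical re-implementation (normalizeNameSketch is not a library export), and it assumes Latin-script input: characters outside a-z and 0-9 are replaced by spaces, which the library itself may handle differently.

```typescript
// Hypothetical sketch; the library's normalizeName may apply other rules.
function normalizeNameSketch(raw: string): string {
  return raw
    .normalize("NFD")                 // split "ö" into "o" plus a combining mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")      // punctuation and whitespace runs become one space
    .trim();
}

normalizeNameSketch(" John Döe-Smith "); // "john doe smith"
normalizeNameSketch("José García");      // "jose garcia"
```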
Splits a full name into its constituent parts.
function splitNameIntoParts(name: string): NameParts
- name: The full name to split
- Object containing firstName, lastName, and middleNames array
const parts = splitNameIntoParts("John Alan Doe");
// Returns: {
// firstName: "john",
// lastName: "doe",
// middleNames: ["alan"]
// }
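One simple way to produce this shape is to treat the first token as the first name, the last token as the last name, and everything in between as middle names. The sketch below does exactly that; splitNameSketch and NamePartsSketch are hypothetical names, and the handling of single-word or empty input is an assumption rather than documented library behavior.

```typescript
// Hypothetical sketch; not the library's splitNameIntoParts.
interface NamePartsSketch {
  firstName: string;
  lastName: string;
  middleNames: string[];
}

function splitNameSketch(fullName: string): NamePartsSketch {
  const tokens = fullName
    .trim()
    .toLowerCase()
    .split(/\s+/)
    .filter((token) => token.length > 0);

  if (tokens.length === 0) {
    return { firstName: "", lastName: "", middleNames: [] };
  }
  if (tokens.length === 1) {
    // Assumption: a lone token is treated as a first name.
    return { firstName: tokens[0], lastName: "", middleNames: [] };
  }
  return {
    firstName: tokens[0],
    lastName: tokens[tokens.length - 1],
    middleNames: tokens.slice(1, -1),
  };
}

splitNameSketch("John Alan Doe");
// { firstName: "john", lastName: "doe", middleNames: ["alan"] }
```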
interface NameParts {
firstName: string;
lastName: string;
middleNames: string[];
}
interface LevDistance {
firstName: number;
middleName: number;
lastName: number;
total: number;
}
interface MatchMetric {
levDistance: LevDistance;
phoneticsMetric: number;
}
type Options = {
readonly matchPath: ReadonlyArray<number | string>;
readonly threshold: {
phonetics?: number;
distance?: number;
};
}
import { search } from 'fuzzy-names';
const names = ["John Doe", "Jane Smith", "Bob Johnson"];
const result = search("Jon Doe", names);
// Returns: "John Doe"
import { search } from 'fuzzy-names';
const users = [
{ user: { name: "John Doe", id: 1 } },
{ user: { name: "Jane Smith", id: 2 } }
];
const result = search("Jon Doe", users, {
matchPath: ["user", "name"]
});
// Returns: { user: { name: "John Doe", id: 1 } }
import { search } from 'fuzzy-names';
const names = [
{ name: "José García" },
{ name: "François Dubois" }
];
const result = search("Jose Garcia", names, {
matchPath: ["name"]
});
// Returns: { name: "José García" }
import { search } from 'fuzzy-names';
const names = ["John Doe", "Jonathan Doe", "Jon Doe"];
const result = search("Johnny Doe", names, {
threshold: {
distance: 5, // Allow more character differences
phonetics: 2 // Require stronger phonetic similarity
}
});
// Returns: "Jonathan Doe"
The core of this library is the search function, which combines edit distance with phonetic similarity. Using both measures lets it match names despite variations in spelling, pronunciation, and formatting.
The search process follows these steps:
- Name Normalization: Input names are normalized by removing special characters, converting to lowercase, and standardizing spacing.
- Name Part Splitting: Names are split into first, middle, and last name components for granular comparison.
- Distance Calculation: Levenshtein distance is computed for each name part.
- Phonetic Matching: Multiple phonetic algorithms are applied for pronunciation-based matching.
- Score Combination: Results are combined and weighted to determine the best match.
The Levenshtein distance algorithm measures the minimum number of single-character edits required to change one string into another. For example:
- "John" to "Jon" has a distance of 1 (one deletion)
- "Smith" to "Smyth" has a distance of 1 (one substitution)
- "Catherine" to "Katherine" has a distance of 1 (one substitution)
The library calculates this distance separately for each name part (first, middle, last) to provide more accurate matching for full names.
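The following sketch shows a standard dynamic-programming Levenshtein implementation and one possible way to apply it per name part. The function names levenshtein, partsOf, and distanceByPart are hypothetical, and joining all middle names into a single string before comparing is an assumption; the library may compare middle names differently.

```typescript
// Classic two-row dynamic-programming Levenshtein distance.
function levenshtein(a: string, b: string): number {
  // previous[j] = distance between the first i-1 chars of a and the first j chars of b
  let previous: number[] = [];
  for (let j = 0; j <= b.length; j++) previous.push(j);

  for (let i = 1; i <= a.length; i++) {
    const current: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const substitutionCost = a[i - 1] === b[j - 1] ? 0 : 1;
      current.push(
        Math.min(
          previous[j] + 1,                   // deletion
          current[j - 1] + 1,                // insertion
          previous[j - 1] + substitutionCost // substitution (or match)
        )
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Hypothetical split: [first, middle (joined), last].
function partsOf(fullName: string): [string, string, string] {
  const tokens = fullName.trim().toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  const first = tokens.length > 0 ? tokens[0] : "";
  const last = tokens.length > 1 ? tokens[tokens.length - 1] : "";
  const middle = tokens.slice(1, -1).join(" ");
  return [first, middle, last];
}

function distanceByPart(queryName: string, corpusName: string) {
  const [qFirst, qMiddle, qLast] = partsOf(queryName);
  const [cFirst, cMiddle, cLast] = partsOf(corpusName);
  const firstName = levenshtein(qFirst, cFirst);
  const middleName = levenshtein(qMiddle, cMiddle);
  const lastName = levenshtein(qLast, cLast);
  return { firstName, middleName, lastName, total: firstName + middleName + lastName };
}

levenshtein("john", "jon");                  // 1
levenshtein("catherine", "katherine");       // 1
distanceByPart("John A Doe", "Jon B Doe");   // { firstName: 1, middleName: 1, lastName: 0, total: 2 }
```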
SoundEx is a phonetic algorithm that indexes names by sound as pronounced in English. It generates a code that remains the same for similar-sounding names:
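- Keeps the first letter
- Converts remaining consonants to digits:
  - 1 = B, F, P, V
  - 2 = C, G, J, K, Q, S, X, Z
  - 3 = D, T
  - 4 = L
  - 5 = M, N
  - 6 = R
- Drops vowels and H, W, Y, and codes adjacent letters with the same digit only once
- Pads with zeros or truncates so the code is one letter followed by three digits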
For example:
- "Robert" and "Rupert" both encode to "R163"
- "Smith" and "Smythe" both encode to "S530"
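The encoding rules above can be sketched in a few lines. This is a standalone implementation of standard American SoundEx for illustration (the soundex function and SOUNDEX_DIGITS table are not library exports); fuzzy-names may rely on a different implementation internally.

```typescript
// Letter-to-digit table from the SoundEx rules.
const SOUNDEX_DIGITS: Record<string, string> = {
  b: "1", f: "1", p: "1", v: "1",
  c: "2", g: "2", j: "2", k: "2", q: "2", s: "2", x: "2", z: "2",
  d: "3", t: "3",
  l: "4",
  m: "5", n: "5",
  r: "6",
};

function soundex(word: string): string {
  const letters = word.toLowerCase().replace(/[^a-z]/g, "");
  if (letters.length === 0) return "";

  let code = letters[0].toUpperCase();
  // Remember the first letter's digit so an immediate repeat is skipped.
  let lastDigit = SOUNDEX_DIGITS[letters[0]] || "";

  for (let i = 1; i < letters.length && code.length < 4; i++) {
    const ch = letters[i];
    const digit = SOUNDEX_DIGITS[ch] || "";
    if (digit !== "") {
      if (digit !== lastDigit) code += digit;
      lastDigit = digit;
    } else if (ch !== "h" && ch !== "w") {
      // A vowel (or y) separates letters, so the same digit may repeat after it.
      lastDigit = "";
    }
    // h and w are skipped without resetting lastDigit.
  }
  return (code + "000").slice(0, 4); // pad to one letter plus three digits
}

soundex("Robert"); // "R163"
soundex("Rupert"); // "R163"
soundex("Smith");  // "S530"
soundex("Smythe"); // "S530"
```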
Metaphone improves upon SoundEx by using more sophisticated rules that better handle English pronunciation patterns. It considers letter combinations and their positions:
- Handles combinations like "PH" (sounds like "F")
- Accounts for silent letters
- Considers word beginnings and endings differently
For example:
- "Philip" and "Filip" both encode to "FLP"
- "Catherine" and "Kathryn" both encode to "K0RN" (Metaphone writes the "th" sound as "0")
Double Metaphone further enhances the Metaphone algorithm by:
- Providing primary and alternative encodings for names
- Better handling international name variations
- Supporting multiple cultural pronunciation patterns
This is particularly useful for names that might have different pronunciations or cultural origins. For example:
- "Smith" encodes to both "SM0" and "XMT"
- "Michael" might encode to both "MKL" and "MXL"
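A common way to use the two encodings is to treat two names as phonetically similar when any of their codes coincide. The sketch below shows that comparison only; it does not implement Double Metaphone. The codes are hard-coded: "Michael" comes from the example above, while the "Smith" and "Schmidt" codes are the values the reference algorithm is generally reported to produce. Whether fuzzy-names compares codes this way is an assumption, and sharesPhoneticCode is a hypothetical helper.

```typescript
// [primary, alternate] codes, as a Double Metaphone encoder would return them.
type CodePair = readonly [string, string];

// True when at least one non-empty code appears in both pairs.
function sharesPhoneticCode(left: CodePair, right: CodePair): boolean {
  return left.some((code) => code !== "" && right.indexOf(code) !== -1);
}

// Hard-coded example codes (not computed here).
const smithCodes: CodePair = ["SM0", "XMT"];
const schmidtCodes: CodePair = ["XMT", "SMT"];
const michaelCodes: CodePair = ["MKL", "MXL"];

sharesPhoneticCode(smithCodes, schmidtCodes); // true  (both contain "XMT")
sharesPhoneticCode(smithCodes, michaelCodes); // false
```

With only a single SoundEx or Metaphone code per name, "Smith" and "Schmidt" would need an exact code match; the alternate encoding is what lets them meet.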
The search function combines these algorithms in the following way:
- Initial Filtering:
  - Normalizes all names in the search corpus
  - Applies basic string matching optimizations
- Distance Scoring:
  - Calculates Levenshtein distance for each name part
  - Applies weightings based on name part importance
  - Filters out matches exceeding the distance threshold
- Phonetic Matching:
  - Applies all three phonetic algorithms
  - Counts matching algorithms for each name part
  - Generates a phonetic similarity score (0-9, 3 points per algorithm)
- Final Ranking:
  - Combines distance and phonetic scores
  - Prioritizes matches with high phonetic similarity
  - Uses Levenshtein distance as a tiebreaker
  - Returns the best match above the threshold
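The filtering and ranking steps above can be sketched as follows. This is not the library's implementation: pickBest, Scored, and ThresholdSketch are hypothetical names, the scores in the example are illustrative values rather than library output, and requiring both thresholds to pass at once is an assumption.

```typescript
// A candidate with scores that were computed elsewhere.
interface Scored<T> {
  item: T;
  distance: number;  // total Levenshtein distance (lower is better)
  phonetics: number; // phonetic similarity score (higher is better)
}

interface ThresholdSketch {
  distance: number;  // maximum allowed distance
  phonetics: number; // minimum required phonetic score
}

function pickBest<T>(
  candidates: ReadonlyArray<Scored<T>>,
  threshold: ThresholdSketch
): T | null {
  // Assumption: a candidate must satisfy both thresholds.
  const eligible = candidates.filter(
    (c) => c.distance <= threshold.distance && c.phonetics >= threshold.phonetics
  );
  if (eligible.length === 0) return null;

  // Higher phonetic score first; lower distance breaks ties.
  const ranked = eligible
    .slice()
    .sort((x, y) => y.phonetics - x.phonetics || x.distance - y.distance);
  return ranked[0].item;
}

// Illustrative scores only.
const scoredCandidates: Scored<string>[] = [
  { item: "Jonathan Doe", distance: 5, phonetics: 3 },
  { item: "John Doe", distance: 1, phonetics: 6 },
  { item: "Jane Smith", distance: 9, phonetics: 0 },
];

pickBest(scoredCandidates, { distance: 10, phonetics: 1 }); // "John Doe"
pickBest(scoredCandidates, { distance: 0, phonetics: 9 });  // null
```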
This multi-algorithm approach provides robust matching that can handle:
- Common spelling variations
- Phonetic similarities
- Typographical errors
- Cultural name variations
- Different name formats