Alternative to Soundex


Q-Gram based algorithm

A Q-gram in this context refers to a sequence of letters, q letters long, from a given word.
For example, for q = 2, the word Nelson has the following q-grams:
NE EL LS SO ON
By comparison, Neilsen breaks down into these q-grams (q = 2):
NE EI IL LS SE EN
Clearly, Nelson and Neilsen share the NE and LS q-grams in common.
Various techniques have been developed which compare two words based on their q-grams. A simple example would be counting the number of q-grams two words have in common, with a higher count yielding a stronger match.
Technically, q-gram algorithms aren’t strictly phonetic matching, in that they do not operate based on comparison of the phonetic characteristics of words. Instead, q-grams can be thought to compute the "distance", or amount of difference, between two words. Since phonetically similar words often have similar spellings, this technique can provide favorable results, yet it also successfully matches misspelled or otherwise mutated words, even if they are rendered phonetically disparate.

 

Metaphone

Double Metaphone

Caverphone

Q-Gram

NYSIIS