Alternative to Soundex
Q-Gram based algorithm
A q-gram, in this context, is a sequence of q consecutive letters
taken from a given word.
For example, for q = 2, the word Nelson has the following q-grams:
NE EL LS SO ON
By comparison, Neilsen breaks down into these q-grams (q = 2):
NE EI IL LS SE EN
Nelson and Neilsen share two q-grams: NE and LS.
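The breakdown above can be reproduced with a short sketch. The function name `qgrams` and the choice to upper-case the input are assumptions made for illustration and are not part of any particular library.

```python
def qgrams(word, q=2):
    """Return the overlapping q-grams of word, in order of appearance."""
    word = word.upper()  # assumption: matching is case-insensitive
    # A word shorter than q yields an empty list.
    return [word[i:i + q] for i in range(len(word) - q + 1)]


print(qgrams("Nelson"))   # ['NE', 'EL', 'LS', 'SO', 'ON']
print(qgrams("Neilsen"))  # ['NE', 'EI', 'IL', 'LS', 'SE', 'EN']
```

A word of n letters produces n - q + 1 q-grams, so Nelson (6 letters) gives 5 and Neilsen (7 letters) gives 6 when q = 2.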
Various techniques have been developed that compare two words based on
their q-grams. A simple example is to count the q-grams that two words
have in common, with a higher count indicating a stronger match.
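The simple counting approach might look like the following sketch. The names `qgrams` and `common_qgrams` are hypothetical, and counting repeated q-grams as a multiset (via `collections.Counter`) is one possible design choice rather than a prescribed one.

```python
from collections import Counter


def qgrams(word, q=2):
    """Return the overlapping q-grams of word (upper-cased)."""
    word = word.upper()
    return [word[i:i + q] for i in range(len(word) - q + 1)]


def common_qgrams(a, b, q=2):
    """Count the q-grams shared by a and b, respecting repeats."""
    # Counter & Counter keeps the minimum count of each shared q-gram.
    shared = Counter(qgrams(a, q)) & Counter(qgrams(b, q))
    return sum(shared.values())


print(common_qgrams("Nelson", "Neilsen"))  # 2 (NE and LS)
print(common_qgrams("Nelson", "Nelsen"))   # 3 (NE, EL and LS)
```

A raw count favors longer words, which simply have more q-grams to share, so practical schemes often normalize the count by word length.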
Technically, q-gram algorithms are not strictly phonetic matching
algorithms, in that they do not compare the phonetic characteristics of
words. Instead, q-grams can be thought of as computing the "distance",
or amount of difference, between two words.
Since phonetically similar words often have similar spellings, this
technique can provide favorable results. It also successfully
matches misspelled or otherwise mutated words, even when the change
renders them phonetically disparate.
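One common way to turn q-grams into a distance is to count the q-grams that appear in one word but not the other. The sketch below assumes that formulation; the text above does not specify a particular distance measure, and the name `qgram_distance` is hypothetical.

```python
from collections import Counter


def qgrams(word, q=2):
    """Return the overlapping q-grams of word (upper-cased)."""
    word = word.upper()
    return [word[i:i + q] for i in range(len(word) - q + 1)]


def qgram_distance(a, b, q=2):
    """Number of q-grams not shared between a and b (0 means identical profiles)."""
    ca, cb = Counter(qgrams(a, q)), Counter(qgrams(b, q))
    # Counter subtraction drops non-positive counts, so each term
    # holds only the q-grams that one word has in excess of the other.
    return sum((ca - cb).values()) + sum((cb - ca).values())


print(qgram_distance("Nelson", "Nelson"))   # 0
print(qgram_distance("Nelson", "Nelsno"))   # 4 (a transposition typo)
print(qgram_distance("Nelson", "Neilsen"))  # 7
```

A smaller value means a closer match. The transposed spelling Nelsno stays close to Nelson even though it would be pronounced differently, which illustrates the point about misspelled words.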

Metaphone
Double Metaphone
Caverphone
Q-Gram
NYSIIS