Soundex vs. Levenshtein
In fact, these are not real alternatives. Soundex
is a scheme for matching words that are phonetically similar,
i.e. that sound alike, whereas Levenshtein
is a distance measure between two strings.
Applying Soundex consists of two steps:
encoding each string, and then matching these codes exactly.
This method partitions all strings into equivalence classes, i.e. into
disjoint sets of words. A search therefore finds only words in the same
class as the query, and never a word in a "close" class. To be precise:
Soundex has no concept of "proximity".
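
The two steps can be illustrated with a minimal sketch. It assumes the
common American Soundex rules (first letter plus three digits, with "h" and
"w" not separating equal codes); the text above does not say which variant
is meant, and the names `soundex` and `build_index` are only illustrative.

```python
# Letter groups of American Soundex; vowels, "h", "w" and "y" get no digit.
CODES = {}
for digit, group in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for letter in group:
        CODES[letter] = str(digit)


def soundex(word):
    """Step 1: encode a word as its first letter plus up to three digits."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        # Emit a digit only when it differs from the previous code.
        if code and code != prev:
            result += code
        # "h" and "w" do not separate two letters with the same code.
        if ch not in "hw":
            prev = code
    # Pad with zeros and cut to the fixed length of four characters.
    return (result + "000")[:4]


def build_index(words):
    """Step 2: group words by code, so a lookup is an exact match on the code."""
    index = {}
    for w in words:
        index.setdefault(soundex(w), []).append(w)
    return index


index = build_index(["Robert", "Rupert", "Bobert"])
print(index[soundex("Robert")])  # words in the same class as "Robert"
```

In this sketch "Robert" and "Rupert" fall into the same class, while
"Bobert" gets a different code because of its first letter and is not
found, although it differs from "Robert" by a single character.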
In contrast, the
Levenshtein algorithm makes it possible to estimate how similar
two words are.
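
As a rough illustration, the distance can be computed with the standard
dynamic-programming scheme below. The `similarity` helper is an assumption
added here to turn the distance into a score between 0 and 1; it is just one
common normalization and is not taken from the text.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Hypothetical helper: 1.0 for identical strings, 0.0 for maximally different ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("Robert", "Bobert"))  # 1
print(levenshtein("Robert", "Rupert"))  # 2
```

Unlike a code comparison, the result is graded: a word that differs by one
character is closer than one that differs by two.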
Of course, the computational cost is higher than with the Soundex method,
but a hash table is generated to speed up the calculation.
Today only the company Exorbyte has managed a
high-performance implementation of the Levenshtein algorithm. Exorbyte's
software can calculate the
Levenshtein distance against millions of words within
milliseconds. This even allows for interactive
search applications based on Ajax.