The Soundex algorithm is designed to work on single words. (To simplify, it encodes the first letter and the first three following consonants.)
SQLite soundex string length
Question
Does the soundex function in SQLite have a limit on string length? I found that the results of
SELECT soundex('Schneider Thomson'), soundex('Schneider Rene'), soundex('Schneider')
are all the same value, S536. However, the results of
SELECT soundex('Schn Thomson'), soundex('Schn Rene'), soundex('Schn');
are different for each string:
soundex('Schn Thomson') = S535
soundex('Schn Rene') = S565
soundex('Schn') = S500
Can anyone explain why?
Solution
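The results in the question can be reproduced with a rough Python sketch of the encoding loop. It is an illustration only, not SQLite's source: `soundex_sketch` is a hypothetical helper, and it assumes the usual Soundex letter groups and that any character without a code (vowels, H, W, Y, spaces) simply resets the duplicate check. The point it shows is that the loop stops once three digits exist, so nothing after "Schneider" is ever examined.

```python
# Usual Soundex letter groups: B,F,P,V -> 1; C,G,J,K,Q,S,X,Z -> 2; D,T -> 3; L -> 4; M,N -> 5; R -> 6.
# Vowels, H, W, Y and non-letters have no code.
CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex_sketch(text: str) -> str:
    """Return a four-character Soundex-style code (illustrative sketch)."""
    chars = text.upper()
    # Find the first ASCII letter; it is kept as-is.
    start = next((i for i, ch in enumerate(chars) if "A" <= ch <= "Z"), None)
    if start is None:
        return "?000"  # SQLite documents "?000" for input without ASCII letters
    result = chars[start]
    prev = CODES.get(chars[start])
    for ch in chars[start + 1:]:
        if len(result) == 4:
            break  # three digits collected: the rest of the string is never read
        code = CODES.get(ch)
        if code is None:
            prev = None  # uncoded character: the next consonant counts again
        elif code != prev:
            result += code
            prev = code
    return result.ljust(4, "0")  # pad short codes with zeros


for s in ["Schneider Thomson", "Schneider Rene", "Schneider",
          "Schn Thomson", "Schn Rene", "Schn"]:
    print(s, "->", soundex_sketch(s))
```

With "Schn", only one digit (5) is produced before the space, so the second word supplies the remaining digits and the three results differ.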
Other tips
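To add a little clarification to CL's answer: the encoding keeps the first letter and then encodes the following consonants (skipping H, W, and Y, as well as vowels) until 3 digits have been generated. Mississippi illustrates this well. Under the variant that drops the unencoded letters first and then collapses repeats, MISSISSIPPI has a SOUNDEX of M210. (Implementations that let a vowel separate two consonants with the same code, as the common American Soundex rule does, would give M221 instead.)
- M is the first letter, followed by the first consonant, S (code 2). Every further S repeats that code and is ignored.
- The next consonant is P (code 1), and no other codable letter follows it (just a repeated P and an I).
- Only two digits were generated, so a zero is added as the final digit.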
Hopefully that clarifies how SOUNDEX encodes words. The three-digit cutoff also explains why supercell and supercalifragilisticexpialidocious have the same SOUNDEX, S162: only the leading S and the consonants P, R, and C are ever encoded. For more information, this article from Genealogy.com explains how to use SOUNDEX when researching names.
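The walk-through above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not any database's implementation: `soundex_drop_first` is a hypothetical name, and it assumes the ordering the bullets describe (drop uncoded letters first, then collapse repeated digits). An implementation that treats vowels as separators would give a different code for words like Mississippi.

```python
# Usual Soundex letter groups; vowels, H, W and Y are intentionally absent.
CODES = {
    ch: str(digit)
    for digit, group in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1)
    for ch in group
}


def soundex_drop_first(word: str) -> str:
    """Soundex sketch: drop uncoded letters, collapse repeats, keep 3 digits."""
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""  # assumption: an empty code for input without ASCII letters
    first = letters[0]
    # Step 1: keep only the letters that have a code (vowels, H, W, Y vanish).
    digits = [CODES[ch] for ch in letters[1:] if ch in CODES]
    # Step 2: collapse runs of the same digit, including a run that
    # starts with the first letter's own code.
    collapsed = []
    prev = CODES.get(first)
    for d in digits:
        if d != prev:
            collapsed.append(d)
        prev = d
    # Step 3: keep at most three digits and pad with zeros.
    return (first + "".join(collapsed[:3])).ljust(4, "0")


print(soundex_drop_first("Mississippi"))                         # M210
print(soundex_drop_first("supercell"))                           # S162
print(soundex_drop_first("supercalifragilisticexpialidocious"))  # S162
```

Everything after the third digit is discarded in step 3, which is why the two "super" words collide.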