Question

I want to see how phonetically similar two non-English strings are. As far as I know, Soundex and Metaphone implementations only work correctly for English-based strings. For instance, coração and corassão sound exactly the same in Portuguese, but metaphone() returns KR and KRS. The same thing happens with other phonemes: chita and xita return XT and ST, even though they sound the same.
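For illustration, here is a minimal sketch of a rule-based phonetic key in the spirit of Metaphone, using a handful of Portuguese spelling rules instead of English ones. The rule table, the `phonetic_key` name and the internal `S` placeholder are all hypothetical, and the rules are deliberately incomplete: for example, `x` is always treated like `ch` even though it has several sounds in Portuguese, and `nh`/`lh` simply collapse to `n`/`l`.

```python
import re
import unicodedata

# Hypothetical, deliberately incomplete Portuguese rules (regex, replacement),
# applied in order. "S" is an internal placeholder for an unconditional /s/,
# so the intervocalic s -> z rule below cannot touch it.
RULES = [
    (r"ch", "x"),                        # chita / xita: treated as the same sound (x simplified)
    (r"ss", "S"),                        # corassão: ss is always /s/
    (r"(?<=[aeiou])s(?=[aeiou])", "z"),  # casa / caza: a single s between vowels sounds like z
    (r"sc(?=[ei])", "S"),                # nascer
    (r"c(?=[ei])", "S"),                 # cedo / sedo
    (r"c", "k"),                         # any remaining c is /k/
    (r"qu(?=[ei])", "k"),                # queijo: u is silent
    (r"q", "k"),
    (r"g(?=[ei])", "j"),                 # gente / jente
    (r"gu(?=[ei])", "g"),                # guerra: u is silent
    (r"h", ""),                          # drop h (silent initial h; nh/lh collapse to n/l)
]


def phonetic_key(word: str) -> str:
    """Return a rough Metaphone-style key for a Portuguese word (sketch only)."""
    w = unicodedata.normalize("NFC", word.lower())
    w = w.replace("ç", "S")  # ç is always /s/; mark it before stripping accents
    # Strip the remaining diacritics (ã -> a, é -> e) so the rules see plain vowels.
    w = "".join(ch for ch in unicodedata.normalize("NFD", w) if not unicodedata.combining(ch))
    for pattern, replacement in RULES:
        w = re.sub(pattern, replacement, w)
    w = w.replace("S", "s")
    w = re.sub(r"(.)\1+", r"\1", w)  # collapse doubled letters (rr -> r)
    if not w:
        return ""
    # Keep the first letter and drop the remaining vowels, as Metaphone-style keys do.
    return (w[0] + re.sub(r"[aeiou]", "", w[1:])).upper()
```

With these rules, `phonetic_key("coração")` and `phonetic_key("corassão")` both give `KRS`, and `phonetic_key("chita")` and `phonetic_key("xita")` both give `XT`. Rule order matters here: the intervocalic s → z rule runs while ç and ss are still marked as `S`, so they keep their /s/ sound.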

I've also tried this Double Metaphone implementation (demo), but the results are exactly the same.

So, is there any alternative algorithm that works with Portuguese words? I've read about Lucene in this other question, but I've never used it before and I'm not sure how it works or how to use it.

If not, does anyone know what kind of data I need to gather to develop a metaphone-like algorithm?
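Whatever rules end up being used, one useful kind of data to gather is a list of Portuguese word pairs labeled as sounding alike or not, so that any candidate key function can be checked against it. Below is a minimal sketch of such a check; `naive_key` is a hypothetical baseline (strip accents, drop vowels) and the three labeled pairs are only examples, not a real dataset.

```python
import re
import unicodedata
from typing import Callable, Iterable, List, Tuple

# (word_a, word_b, sounds_the_same)
Pair = Tuple[str, str, bool]


def naive_key(word: str) -> str:
    """Hypothetical baseline: strip diacritics, keep the first letter, drop other vowels."""
    w = "".join(ch for ch in unicodedata.normalize("NFD", word.lower())
                if not unicodedata.combining(ch))
    if not w:
        return ""
    return (w[0] + re.sub(r"[aeiou]", "", w[1:])).upper()


def evaluate(key: Callable[[str], str], pairs: Iterable[Pair]) -> List[tuple]:
    """Return every pair where 'same key' disagrees with the 'sounds the same' label."""
    errors = []
    for a, b, same in pairs:
        if (key(a) == key(b)) != same:
            errors.append((a, b, same, key(a), key(b)))
    return errors


PAIRS: List[Pair] = [
    ("coração", "corassão", True),
    ("chita", "xita", True),
    ("casa", "caça", False),  # s between vowels sounds like z; ç sounds like s
]
```

Here `evaluate(naive_key, PAIRS)` reports the first two pairs as errors, because spelling alone does not capture that ç/ss and ch/x can share a sound in Portuguese. The more labeled pairs you collect, the better this kind of check reflects how a Metaphone-like algorithm will behave on real data.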


Solution

In case anyone is interested, I found a promising work-in-progress here and some other cool projects.

Licensed under: CC-BY-SA with attribution