Question

I'm trying to unify records in a database. I'm using the Levenshtein algorithm, and it works for some cases.

Working examples (distance <= 2):

+---------------+-----------+----------+
| Searching for | Found     | Distance |
+---------------+-----------+----------+
| No existe     | No Existe |        1 |
| desempleo     | Desempleo |        1 |
+---------------+-----------+----------+

That's great, but it misses cases with larger distances, such as:

  • Femenino and FEMENINO, with a distance of 7

Note: I'm looking for a PHP solution.
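To make the behavior above concrete, here is a minimal sketch of the threshold check the question describes. The function name `isMatch()` and its default threshold of 2 are assumptions taken from the table, not code from the original post. It shows why the two table rows match while Femenino/FEMENINO does not: `levenshtein()` is case-sensitive, so every letter whose case differs counts as one substitution.

```php
<?php
// Hypothetical helper reproducing the question's "distance <= 2" rule.
function isMatch(string $a, string $b, int $maxDistance = 2): bool
{
    // levenshtein() compares case-sensitively: 'e' vs 'E' is one substitution.
    return levenshtein($a, $b) <= $maxDistance;
}

var_dump(isMatch("No existe", "No Existe")); // bool(true), distance 1
var_dump(isMatch("desempleo", "Desempleo")); // bool(true), distance 1
var_dump(isMatch("Femenino", "FEMENINO"));   // bool(false), distance 7
```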

Solution

Compare

    echo levenshtein("Femenino", "FEMENINO"); // 7

VS

    echo levenshtein(strtolower("Femenino"), strtolower("FEMENINO")); // 0

If letter case doesn't matter for your application, convert both strings to the same case before comparing them. Differences that are only in case then add nothing to the distance, so pairs like Femenino/FEMENINO fall within your threshold.
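Applied to the original goal of unifying records, the idea might look like the sketch below. It is an illustration under stated assumptions, not the answerer's code: `normalize()` and `unify()` are hypothetical helper names, the threshold of 2 is taken from the question, and each value is mapped to the first earlier value whose lowercased form is within that distance.

```php
<?php
// Sketch: group free-text values case-insensitively, following the answer's
// strtolower() idea. normalize() and unify() are hypothetical helper names,
// and the threshold of 2 mirrors the question's "distance <= 2" rule.

function normalize(string $value): string
{
    // strtolower() only reliably lowercases ASCII letters; for accented
    // UTF-8 text, mb_strtolower($value, 'UTF-8') (mbstring) may be preferable.
    return strtolower(trim($value));
}

/**
 * Returns an array mapping each raw value to the first previously seen
 * value it is close to, or to itself if no earlier value is close enough.
 */
function unify(array $values, int $maxDistance = 2): array
{
    $canonicals = []; // representative value of each group, in input order
    $mapping = [];    // raw value => representative value

    foreach ($values as $value) {
        $key = normalize($value);
        foreach ($canonicals as $canonical) {
            // levenshtein() works on bytes, so a multibyte UTF-8 character
            // such as "ñ" can count as more than one edit. Before PHP 8.0 it
            // returned -1 for strings over 255 characters, hence the >= 0 check.
            $distance = levenshtein($key, normalize($canonical));
            if ($distance >= 0 && $distance <= $maxDistance) {
                $mapping[$value] = $canonical;
                continue 2; // matched: go to the next input value
            }
        }
        $canonicals[] = $value; // no close match: start a new group
        $mapping[$value] = $value;
    }

    return $mapping;
}

print_r(unify(["Femenino", "FEMENINO", "Masculino", "No existe", "No Existe"]));
```

With this input, FEMENINO is mapped to Femenino and No Existe to No existe, while Masculino starts its own group. Keeping the first-seen raw value as the representative preserves an existing spelling; you could instead pick the most frequent variant of each group.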

Licensed under: CC-BY-SA with attribution