Physical distance between two places
Question
I need to measure the physical distance between two places whose names are provided as strings. Since the names are sometimes written slightly differently, I am looking for a library that can measure the difference between the two names, so that I can combine it with the latitude and longitude to select the correct matches. Preferred languages: Java or PHP.
Any suggestions?
Solution
Have a look at the Levenshtein distance. This is a way of measuring how different two strings are from one another.
Hopefully I understood your question correctly; using "distance" in the same sentence as "latitude and longitude" could be confusing!
OTHER TIPS
Although it is written in C (with Python and Tcl bindings), libdistance is a tool for applying several distance metrics to strings and other data.
Metrics included:
- bloom
- damerau
- euclid
- hamming
- jaccard
- levenshtein
- manhattan
- minkowski
- needleman_wunsch
You might get some decent results using a phonetic algorithm to find slightly misspelled names.
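As an illustration of the phonetic idea, here is a minimal sketch of Soundex, one well-known phonetic algorithm (the answer above does not say which one was meant, so this choice is an assumption). The class name SoundexSketch is made up for this example. Names that sound alike map to the same four-character code, so codes can be compared instead of raw spellings.

```java
import java.util.Locale;

/** Simplified American Soundex: one letter followed by three digits. */
final class SoundexSketch {
    // Digit for each letter A..Z; '0' marks letters that get no code (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    static String encode(String name) {
        String s = name.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        char last = '0';
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch < 'A' || ch > 'Z') continue;      // ignore anything that is not A..Z
            char code = CODES.charAt(ch - 'A');
            if (out.length() == 0) {
                out.append(ch);                       // keep the first letter as-is
                last = code;
            } else {
                // Append a digit only if it differs from the previous code
                if (code != '0' && code != last) out.append(code);
                // H and W do not separate letters that share a code; vowels do
                if (ch != 'H' && ch != 'W') last = code;
            }
            if (out.length() == 4) break;
        }
        // Pad short codes with zeros (an empty input stays empty)
        while (out.length() > 0 && out.length() < 4) out.append('0');
        return out.toString();
    }
}
```

With this sketch, SoundexSketch.encode("Pittsburgh") and SoundexSketch.encode("Pittsburg") produce the same code, so the two spellings would be treated as a phonetic match. Soundex was designed for English surnames, so it may behave poorly on place names from other languages.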
Also, if you use a more mechanical edit distance, you'll probably see better results using a weighted function that accounts for keyboard geometry (i.e. physically close keys are "cheaper" to replace than far off ones). That's a patented method btw, so be careful not to write something that becomes too popular ;)
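To make the keyboard-geometry idea concrete, here is a rough sketch of a weighted edit distance in which replacing a key with one of its neighbours costs less than replacing it with a distant key. Everything specific here is an assumption for illustration: the class name KeyboardEditDistance, the QWERTY layout, the row offsets, the 0.5 neighbour cost and the 1.5 key-width threshold. It is not a reproduction of any particular published or patented method.

```java
/** Levenshtein variant whose substitution cost depends on QWERTY key distance. */
final class KeyboardEditDistance {
    private static final String[] ROWS = {"qwertyuiop", "asdfghjkl", "zxcvbnm"};
    // Assumed horizontal stagger of each row, measured in key widths
    private static final double[] OFFSETS = {0.0, 0.25, 0.75};

    /** 0 for the same key, 0.5 for neighbouring keys, 1 otherwise (weights are assumptions). */
    static double substitutionCost(char a, char b) {
        a = Character.toLowerCase(a);
        b = Character.toLowerCase(b);
        if (a == b) return 0.0;
        double[] pa = position(a);
        double[] pb = position(b);
        if (pa == null || pb == null) return 1.0;    // key not in the layout table
        double dist = Math.hypot(pa[0] - pb[0], pa[1] - pb[1]);
        return dist < 1.5 ? 0.5 : 1.0;
    }

    /** Returns {x, y} of a key in key widths, or null if the key is unknown. */
    private static double[] position(char c) {
        for (int r = 0; r < ROWS.length; r++) {
            int col = ROWS[r].indexOf(c);
            if (col >= 0) return new double[] {col + OFFSETS[r], r};
        }
        return null;
    }

    static double distance(String a, String b) {
        double[] prev = new double[b.length() + 1];
        double[] curr = new double[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                double sub = prev[j - 1] + substitutionCost(a.charAt(i - 1), b.charAt(j - 1));
                // Insertions and deletions keep the usual cost of 1
                curr[j] = Math.min(sub, Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            double[] t = prev;
            prev = curr;
            curr = t;
        }
        return prev[b.length()];
    }
}
```

Under these assumed weights, a slip onto a neighbouring key ("oaris" for "paris") scores lower than a substitution from the far side of the keyboard ("zaris"), which is the effect the tip describes.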
I took the liberty of translating a piece of C# code I wrote for calculating the Levenshtein distance into Java. It uses only two one-dimensional arrays that alternate, instead of a full two-dimensional matrix:
public static int getDifference(String a, String b)
{
    // Minimize the amount of storage needed:
    if (a.length() > b.length())
    {
        // Swap:
        String x = a;
        a = b;
        b = x;
    }

    // Store only two rows of the matrix, instead of a big one
    int[] mat1 = new int[a.length() + 1];
    int[] mat2 = new int[a.length() + 1];

    int i;
    int j;

    for (i = 1; i <= a.length(); i++)
        mat1[i] = i;

    mat2[0] = 1;

    for (j = 1; j <= b.length(); j++)
    {
        for (i = 1; i <= a.length(); i++)
        {
            int c = (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);

            mat2[i] =
                Math.min(mat1[i - 1] + c,
                Math.min(mat1[i] + 1, mat2[i - 1] + 1));
        }

        // Swap:
        int[] x = mat1;
        mat1 = mat2;
        mat2 = x;

        mat2[0] = mat1[0] + 1;
    }

    // It's row #1 because we swap rows at the end of each outer loop,
    // as we are to return the last number on the lowest row
    return mat1[a.length()];
}
It is not rigorously tested, but it seems to be working okay. It was based on a Python implementation I made for a university exercise. Hope this helps!
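The question also asks about combining the name difference with latitude and longitude, so here is one possible sketch of that step. It pairs a normalized edit distance with the haversine great-circle formula. The class name PlaceMatcher, the samePlace rule and both thresholds are hypothetical choices for illustration, and the block carries its own small Levenshtein routine so that it compiles on its own.

```java
import java.util.Locale;

final class PlaceMatcher {
    private static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    /** Great-circle (haversine) distance in km; inputs are in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double sinLat = Math.sin(Math.toRadians(lat2 - lat1) / 2);
        double sinLon = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double h = sinLat * sinLat
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2)) * sinLon * sinLon;
        // Clamp to 1.0 to guard against rounding just above the asin domain
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(h)));
    }

    /** Plain Levenshtein distance using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(prev[j - 1] + cost, Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] t = prev;
            prev = curr;
            curr = t;
        }
        return prev[b.length()];
    }

    /** Edit distance scaled to 0..1 by the longer name, ignoring case. */
    static double nameDifference(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        int longest = Math.max(x.length(), y.length());
        return longest == 0 ? 0.0 : (double) levenshtein(x, y) / longest;
    }

    /** Hypothetical rule: names similar enough AND coordinates close enough. */
    static boolean samePlace(String nameA, double latA, double lonA,
                             String nameB, double latB, double lonB,
                             double maxNameDifference, double maxKm) {
        return nameDifference(nameA, nameB) <= maxNameDifference
                && haversineKm(latA, lonA, latB, lonB) <= maxKm;
    }
}
```

For example, PlaceMatcher.samePlace("Saint Louis", 38.627, -90.199, "Saint Luois", 38.630, -90.200, 0.25, 5.0) accepts the misspelled name because the coordinates nearly coincide, while two places that are both called "Paris" but lie on different continents are rejected by the distance check. Suitable thresholds depend on your data and would need tuning.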
I would recommend either Levenshtein Distance or the Jaccard Distance for comparing text.
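For comparison with the Levenshtein code earlier in the thread, here is a small sketch of the Jaccard distance. Jaccard compares sets, so the strings have to be split into elements first. Using lower-cased character bigrams is an assumption made for this example (word tokens are another common choice), and the class name JaccardSketch is made up.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class JaccardSketch {
    /** Splits a string into its set of lower-cased two-character substrings. */
    static Set<String> bigrams(String s) {
        String t = s.toLowerCase(Locale.ROOT);
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 2 <= t.length(); i++) {
            grams.add(t.substring(i, i + 2));
        }
        return grams;
    }

    /** Jaccard distance: 1 - |intersection| / |union|, from 0 (same sets) to 1 (disjoint). */
    static double distance(String a, String b) {
        Set<String> x = bigrams(a);
        Set<String> y = bigrams(b);
        if (x.isEmpty() && y.isEmpty()) return 0.0;  // avoid dividing by zero
        Set<String> intersection = new HashSet<>(x);
        intersection.retainAll(y);
        int union = x.size() + y.size() - intersection.size();
        return 1.0 - (double) intersection.size() / union;
    }
}
```

Unlike Levenshtein, this measure ignores the order in which the bigrams appear, so it tends to be more forgiving of reordered words ("Lake Geneva" vs "Geneva Lake") and less precise about where a typo sits.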
I also found SimMetrics, a Java library, but I haven't used it.