A pretty simple way to do it would be to just remove the common prefix and suffix before doing the distance calculation. The absolute distance between the resulting strings will be the same as with the full strings, but when the shorter length is taken into account the distance looks much greater.
Also keep in mind that in general even grevious misspellings get the first letter right. It's highly likely, then, that Cowville
and Bowville
are different cities, even though their L. distance is only 1.
You can make your job a lot easier by, at least at first, not doing the distance calculation if two words start with the different letters. They're likely to be different. Concentrate first on removing duplicates of words that start with the same letters. If, after that, you still have a large number of potential duplicates, you can refine your distance threshold to more closely examine words that start with different letters.