Question

I'm looking for a way to do programmatically detect the delta ratio between two strings. I can use string length, but this doesn't give much useful information for like-sized but different inputs. There is a java diff tool on google code Java Diff Utils, but it hasn't been updated since 2011 and I don't need to actually modify the Strings themselves.

I'm attempting to do change detection with threshold values, for instance: Updated string is 42% different than existing string, are you sure you want to proceed?

Does anyone know of a library that could be used for this, or is java-diff-utils my only option? I couldn't find much in apache commons, and googling is returning irrelevant information.

Was it helpful?

Solution

You could use the Levenshtein Distance to calculate how much different two strings are amongst themselves. There's some quite complex math there but the actual code is rather short. You can easily rewrite the code in that wiki in Java.

The difference will be measured in integers, saying how many steps you'd take to turn one string into the other. A step may be a character addition, removal, or replacement with another character. It will tell you the amount of steps it takes, but not which steps, nor in which order. But then again, since you only want to measure the total difference, I'm sure that's enough information for your needs.

edit: one of the commenters (kaos) provided a link to an implementation of Levenshtein Distance in the Apache Commons.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top