Question

I would like to implement a type of version history on my website and I need a way of comparing strings or object keys. For example:

Original string / object key: The quicker brown fox

Revised string / object key: The quick brown fox jumped over the lazy rabbit

Revision: added jumped over the lazy rabbit removed er

I would like to save only the revision in my history table. I don't really know where to start, so any ideas how to get me going or advice on the approach would be really appreciated.

I am aware of the find() function and I suspect it's a prime candidate for use, but I don't know how to visualize it as a solution since it compares strings "wholesale" so to speak.

Was it helpful?

Solution

You want a diffing algorithm (I've tagged the question as such), which I highly recommend you not try to write yourself. I've tried - and failed - as it's a NP complete problem and not easy to wrap your mind around. Instead, check out diff-match-patch, which has a JavaScript and Java implementation for client (demo) or server side processing. If you need to do HTML differencing look at daisydiff instead, albeit be forewarned HTML/XML diffing is truly a painful experience (see this page for some reasons why).

Probably the grand-daddy of diffing is GNU diff, which also has a Java implementation (find "GNU Diff for Java"). This algorithm is more optimized than diff-match-patch (dmp), albeit dmp seems to be improving all the time, so if you need to compare very large strings (e.g. megabytes) the GNU algorithm is probably a better bet.

OTHER TIPS

OK, what about this then? Not sure if it does plain old strings as you'd like, but it seems to address your concerns about not knowing how to tackle the Java integration bits (since it's already written). Should at least point you in the write direction.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top