Question

I am currently implementing some sort of text-version (revision) comparison visualizations and am trying to find some information about how wikipedia achieves their "View History"-feature in which they allow to compare the current revision with an older one.

You can find one example (About stackoverflow!) here:

http://en.wikipedia.org/w/index.php?title=Stack_Overflow&diff=512241244&oldid=458578615

I have implemented several ideas so far and also tried to reproduce the way wikipedia is doing it. For this I've implemented the Levenshtein-distance algorithm ( http://en.wikipedia.org/wiki/Levenshtein_distance ).

Lets assume I have two lists. I am iterating over the first list and check for the index-position of the first list on the second if the string found there is more than 50% equal. If it is, I'll just print both Strings side by side in my comparison view and continue with the next item of the first list. If it is not, I check the next item in the second list until I find it or leave the field for the second list blank if it cannot be found. (Although I would basically prefer that a sentence from the second list also always appears on the comparison view instead of leaving it out, just e.g. with a blank field for the first list field)

This method has some weaknesses. At first, if some sentence got deleted I would need to check the positions around the index for not simply "forgetting" it. But still I need to take care that text positions don't get inverted if I do so.

Has anyone of you tried to achieve something similar with java? If there are some code examples how others or you achieved it, I would gladly take a look to learn from it.

And of course, if you know anything about the algorithm wikipedia (and general wikis I assume?) uses for their revision comparison I'd be glad to hear it.

Thanks a lot

Was it helpful?

Solution

Wikipedia explains how the wiki difference engine works - http://en.wikipedia.org/wiki/Help:Diff

You can follow the links at the bottom of the page to learn more, but this page lists the template used.

OTHER TIPS

Another implementation besides Wikipedia's version control is diff on Unix flavor systems. GNU actually makes the source code available for diff which may enable you to look at their algorithms here:

http://ftp.gnu.org/gnu/diffutils/

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top