What is the general format of Ruby “diff-lcs” diff output?
Question
The Ruby diff-lcs library does a great job of generating the changeset you need to get from one sequence to another, but the format of the output is somewhat confusing to me. I would expect a list of changes, but instead the output is always a list containing one or two lists of changes. What is the meaning/intent of having multiple lists of changes?
Consider the following simple example:
> Diff::LCS.diff('abc', 'a-c')
# => [[#<Diff::LCS::Change:0x01 @action="-", @position=1, @element="b">,
# #<Diff::LCS::Change:0x02 @action="+", @position=1, @element="-">],
# [#<Diff::LCS::Change:0x03 @action="-", @position=3, @element="">]]
Ignoring the fact that the last change is blank, why are there two lists of changes instead of just one?
Solution
A more illustrative example makes the structure clearer. If you do this:
Diff::LCS.diff('ab cd', 'a- c_')
Then the output looks like this (with the noise removed):
[
[
<@action="-", @position=1, @element="b">,
<@action="+", @position=1, @element="-">
], [
<@action="-", @position=4, @element="d">,
<@action="+", @position=4, @element="_">
]
]
If we look at Diff::LCS.diff('ab cd ef', 'a- c_ e+'), then we'd get three inner arrays instead of two.
What possible reason could there be for this? There are three operations in a diff:
- Add a string.
- Remove a string.
- Change a string.
A change is really just a removal combined with an addition, so we're left with remove and add as the fundamental operations; these line up quite nicely with the @action values. However, when humans look at diffs, we want to see a change as a distinct operation: we want to see that 'b' has become '-', and the "remove 'b', add '-'" version is an implementation detail.
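To make the "remove plus add" point concrete, here is a rough Ruby sketch that rebuilds the new string from the old one using nothing but '-' and '+' steps. It is not diff-lcs's own patching code; apply_hunks is a hypothetical helper, and it assumes each change is a plain [action, position, element] triple, that '-' positions index the old sequence while '+' positions index the new one, and that removals come before additions inside each inner array (as in the outputs shown here).

```ruby
# Hypothetical helper: replay remove/add steps in the nested hunk shape.
# Assumptions: each change is [action, position, element]; '-' positions
# index the OLD string, '+' positions index the NEW string; within a hunk,
# removals are listed before additions.
def apply_hunks(old_str, hunks)
  src = old_str.chars
  out = []
  ai = 0 # next unread index in the old sequence
  hunks.flatten(1).each do |action, pos, element|
    case action
    when '-'
      # Copy unchanged old elements up to the removed one, then skip it.
      while ai < pos
        out << src[ai]
        ai += 1
      end
      raise ArgumentError, "expected #{element.inspect} at #{pos}" unless src[ai] == element
      ai += 1
    when '+'
      # Copy unchanged old elements until the output reaches the new position.
      while out.size < pos
        out << src[ai]
        ai += 1
      end
      out << element
    else
      raise ArgumentError, "unknown action #{action.inspect}"
    end
  end
  # Whatever is left of the old sequence is unchanged.
  (out + src.drop(ai)).join
end

hunks = [
  [['-', 1, 'b'], ['+', 1, '-']],
  [['-', 4, 'd'], ['+', 4, '_']]
]
puts apply_hunks('ab cd', hunks) # prints a- c_
```

Each "change" here is carried out as a removal followed by an addition; nothing in the data says "change", which is exactly why the grouping has to come from somewhere else.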
If all we had was this:
[
<@action="-", @position=1, @element="b">,
<@action="+", @position=1, @element="-">,
<@action="-", @position=4, @element="d">,
<@action="+", @position=4, @element="_">
]
then you'd have to figure out which +/- pairs were really changes and which were separate additions and removals.
So the inner arrays map the two fundamental operations (add, remove) to the three operations (add, remove, change) that humans want to see.
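A small sketch of that mapping: each inner array is collapsed into one human-level edit (add, remove, or change), while flattening the outer array gives back the four primitive operations with the grouping lost. The Edit struct and hunk_to_edit are hypothetical names, and the changes are written as plain [action, position, element] arrays rather than Diff::LCS::Change objects (an assumption made to keep the example free of the gem).

```ruby
# Hypothetical human-level edit built from one hunk.
Edit = Struct.new(:kind, :old_text, :new_text)

# Assumption: a hunk is an Array of [action, position, element] triples.
def hunk_to_edit(hunk)
  has_remove = hunk.any? { |action, _, _| action == '-' }
  has_add    = hunk.any? { |action, _, _| action == '+' }
  removed = hunk.select { |action, _, _| action == '-' }.map(&:last).join
  added   = hunk.select { |action, _, _| action == '+' }.map(&:last).join
  kind =
    if has_remove && has_add then :change # both halves present: a replacement
    elsif has_remove then :remove
    else :add
    end
  Edit.new(kind, removed, added)
end

# The nested output for Diff::LCS.diff('ab cd', 'a- c_'), as plain arrays.
nested = [
  [['-', 1, 'b'], ['+', 1, '-']],
  [['-', 4, 'd'], ['+', 4, '_']]
]

nested.map { |hunk| hunk_to_edit(hunk) }.each do |e|
  puts "#{e.kind}: #{e.old_text.inspect} -> #{e.new_text.inspect}"
end
# prints:
# change: "b" -> "-"
# change: "d" -> "_"

# Flattening one level yields the flat list of four primitive operations.
puts nested.flatten(1).size # prints 4
```

With the nested form, classifying an edit only needs to look inside one inner array; with the flat form you would first have to rediscover where one edit ends and the next begins.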
You might want to examine the structure of the outputs from these as well:
Diff::LCS.diff('ab cd', 'a- x c_')
Diff::LCS.diff('ab', 'abx')
Diff::LCS.diff('ab', 'xbx')
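If you want to experiment with how the inner arrays arise without the gem, here is a rough sketch built on a textbook dynamic-programming LCS. The grouping rule it assumes (every maximal run of removals and additions between two common elements becomes one inner array, removals listed first) is inferred from the outputs above; it is not diff-lcs's actual algorithm, and lcs_hunks and suffix_lcs are hypothetical names. On inputs with several equally long common subsequences, such as the 'ab cd' / 'a- x c_' pair, its tie-breaking may group changes differently from the real library.

```ruby
# suffix_lcs[i][j] = length of the LCS of a[i..] and b[j..]
def suffix_lcs(a, b)
  t = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (a.size - 1).downto(0) do |i|
    (b.size - 1).downto(0) do |j|
      t[i][j] = a[i] == b[j] ? t[i + 1][j + 1] + 1 : [t[i + 1][j], t[i][j + 1]].max
    end
  end
  t
end

# Returns hunks of [action, position, element] triples, where '-' positions
# index old_str and '+' positions index new_str (assumed convention).
def lcs_hunks(old_str, new_str)
  a = old_str.chars
  b = new_str.chars
  t = suffix_lcs(a, b)
  hunks = []
  removes = []
  adds = []
  i = 0
  j = 0
  while i < a.size || j < b.size
    if i < a.size && j < b.size && a[i] == b[j]
      # A common element closes the current run of differences.
      hunks << removes + adds unless removes.empty? && adds.empty?
      removes = []
      adds = []
      i += 1
      j += 1
    elsif j < b.size && (i == a.size || t[i][j + 1] >= t[i + 1][j])
      adds << ['+', j, b[j]] # skipping b[j] keeps the LCS length
      j += 1
    else
      removes << ['-', i, a[i]] # skipping a[i] keeps the LCS length
      i += 1
    end
  end
  hunks << removes + adds unless removes.empty? && adds.empty?
  hunks
end

p lcs_hunks('ab cd', 'a- c_')
# [[["-", 1, "b"], ["+", 1, "-"]], [["-", 4, "d"], ["+", 4, "_"]]]
p lcs_hunks('ab cd ef', 'a- c_ e+').size # 3
p lcs_hunks('ab', 'abx')                 # [[["+", 2, "x"]]]
p lcs_hunks('ab', 'xbx')
# [[["-", 0, "a"], ["+", 0, "x"]], [["+", 2, "x"]]]
```

Note how 'ab' vs 'abx' produces a hunk containing only an addition, while 'ab' vs 'xbx' produces one change hunk and one pure addition hunk: the inner arrays are what tell those cases apart.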
I think an explicit change @action for Diff::LCS::Change would be better, but at least the inner arrays let you group the individual additions and removals into higher-level edits.