Remove unneeded context lines from diff output (using sed)

https://stackoverflow.com/questions/21609603

08-10-2022

Question

I have output from diffing several files. The files contain four-line blocks of information, separated by empty lines, where sometimes 1-3 lines differ.

I call diff with the parameter -c3 because I need the context around the differing lines to get the complete info block, since a changed line on its own is worthless.

Because of this my output gets really cluttered, and hard to read. Hence I'm looking for a way to cut away the context-lines that don't belong to the differing block.

Samples of the input files:

Port-configuration of Switch "HP_e5412zl_secondary"
Timestamp: 20140206-161001

Interface:      A1
Description:    Uplink to primary switch
VLAN Untagged:  2
VLANs Tagged:   1 23 42 103 169

Interface:      A2
Description:    -- Not set --
VLAN Untagged:  30
VLANs Tagged:   

Interface:      A3
Description:    WS-198
VLAN Untagged:  1
VLANs Tagged:   

Interface:      A4
Description:    -- Not set --
VLAN Untagged:  30
VLANs Tagged:   

Interface:      A5
Description:    Printer finances
VLAN Untagged:  30
VLANs Tagged:

To reproduce my scenario, use this sample and change a few random lines.

When I run diff -c3 on two differing files I get something like this:

*** 2014-02-06/HP_e5412zl_secondary.txt   2014-02-06 16:14:38.024112434 +0100
--- 2014-02-05/HP_e5412zl_secondary.txt   2014-02-05 16:14:27.415741855 +0100
***************
*** 246,255 ****
  VLAN Untagged:        1
  VLANs Tagged:

  Interface:      A4
  Description:    -- Not set --
  VLAN Untagged:  30
  VLANs Tagged:   

  Interface:      A5
  Description:    Printer finances
--- 245,254 ----
  VLAN Untagged:        1
  VLANs Tagged:

  Interface:      A4
  Description:    WS-211
  VLAN Untagged:  1
  VLANs Tagged:   

  Interface:      A5
  Description:    Printer finances
***************
...

I've tried my best sed tricks on it, but failed to isolate the info I need from the contextual clutter. The desired output would look like this:

*** 2014-02-06/HP_e5412zl_secondary.txt   2014-02-06 16:14:38.024112434 +0100
--- 2014-02-05/HP_e5412zl_secondary.txt   2014-02-05 16:14:27.415741855 +0100
***************
*** 246,255 ****

  Interface:      A4
  Description:    -- Not set --
  VLAN Untagged:  30
  VLANs Tagged:   

--- 245,254 ----

  Interface:      A4
  Description:    WS-211
  VLAN Untagged:  1
  VLANs Tagged:   

***************
...

I don't even need the lines containing the line numbers; a simple separator would suffice.

I tried this:

diff -c3 file1 file2 | sed -n '/^[ ]*Inter.*/,/^[ ]*VLANs.*/p'

And this:

diff -c3 file1 file2 | sed -e '/^[*-]{3,}.*/,/^$/d'

I also experimented with the * and - characters, escaping them (\*) or using just one of them. Escaped or unescaped, with or without the enclosing brackets, nothing worked.

Help? Please?

Bonus question: I'd like to do this with colordiff instead of diff. Would that make it harder (because of embedded color codes, for example)?

Solution

If awk is acceptable, you could use this:

diff -c3 file1 file2 | awk '/^[^! ]/ {p = 1; print;}  /^ *$/ {if (p++ % 2 == 0) print;}  (p % 2 == 0) { print; }'

Explanation:

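On any diff meta-output line (one that starts with neither ! nor a space), set p=1 and print the line.
On any blank line, print the line if p is even, then add 1 to p.
For every line, print it if p is (now) even.

The effect is that p stays odd, and nothing but meta lines is printed, until the first blank line of a hunk. That blank line and the block after it are printed, and the next blank line is printed once and switches printing off again.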
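
The same one-liner can be spelled out with one rule per line, which may make the steps above easier to follow. This is only a sketch of the identical logic; the wrapper name filter_blocks is made up for illustration and is not part of any tool.

```shell
# filter_blocks: hypothetical wrapper name; reads `diff -c3` output on stdin.
filter_blocks() {
  awk '
    # Meta line (first character is neither "!" nor a space): reset p, print.
    /^[^! ]/   { p = 1; print }
    # Blank line: print it if p is even, then increment p.
    /^ *$/     { if (p++ % 2 == 0) print }
    # Every line: print it while p is even.
    p % 2 == 0 { print }
  '
}
```

It would be used as a filter on the diff output, for example:
diff -c3 2014-02-06/HP_e5412zl_secondary.txt 2014-02-05/HP_e5412zl_secondary.txt | filter_blocks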

This produces the desired output you provided. Note that this isn't suitable for feeding back into diff (because line numbers will need to change), and still contains all the diff meta-stuff, since you said you wanted it.

Note that my diff puts a ! as the first character of changed lines, so I also look for that as non-meta output.
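
The parity counter prints every second block of a hunk, which fits the sample in the question. If the changed line happens to be the last line of a block, though, the three context lines before it belong to the same block, and that block would be the first one in the hunk rather than the second. As an alternative (not the one-liner above, and only a sketch), the change markers can be used directly: buffer each blank-line-delimited block and keep it only if it contains a line marked with "! ", "+ " or "- ". The function name changed_blocks and the helper emit are hypothetical, and the sketch assumes context-format output where meta lines start with *** or ---.

```shell
# changed_blocks: hypothetical name; reads `diff -c` output on stdin and
# keeps only the blank-line-delimited blocks that contain a changed line.
changed_blocks() {
  awk '
    # Print the buffered block (plus a blank separator) if it had a change.
    function emit() {
      if (changed) printf "%s\n", buf
      buf = ""; changed = 0
    }
    /^(\*\*\*|---)/ { emit(); print; next }  # file and hunk headers
    /^ *$/          { emit(); next }         # blank context line ends a block
    /^[!+-] /       { changed = 1 }          # changed, added or removed line
                    { buf = buf $0 "\n" }    # collect the current block
    END             { emit() }
  '
}
```

Each kept block is followed by one empty line, so the output differs slightly from the desired output in the question, which also has a blank line before the block.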

This may work with a colorizing diff, if you can find a way to trick it into thinking your pipe can display color escapes.
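
If the colors do make it into the pipe (recent GNU diff versions accept --color=always for this), the escape sequences sit in front of the first character of each line, so the anchored patterns would no longer match. A possible workaround, sketched here under the assumption that the colors are plain SGR sequences of the form ESC [ ... m, is to match against a stripped copy of each line and print the original. The name filter_colored_blocks is made up for illustration.

```shell
# filter_colored_blocks: hypothetical name; same parity logic as the
# one-liner, but the patterns are tested against a color-free copy.
filter_colored_blocks() {
  awk '
    {
      plain = $0
      gsub(/\033\[[0-9;]*m/, "", plain)  # drop SGR color codes from the copy
    }
    plain ~ /^[^! ]/ { p = 1; print }                # meta line: reset, print
    plain ~ /^ *$/   { if (p++ % 2 == 0) print }     # blank line
    p % 2 == 0       { print }                       # print while p is even
  '
}
```

Another option that avoids the problem entirely is to filter the plain diff output first and pipe the result into colordiff afterwards.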

Licensed under: CC-BY-SA with attribution
