It will be a little easier with awk:
awk '{
    # collect all lines sharing the same first two columns
    lines[$1,$2] = (lines[$1,$2] ? lines[$1,$2] RS $0 : $0)
    # count how many times this (column 1, column 2) pair appears
    dups[$1,$2]++
}
END {
    for (key in lines)
        if (dups[key] > 1) print lines[key]
}' file
This prints the duplicated records:

v1=2 v2=10630231 v3=60528947 v4=17
v1=2 v2=10630231 v3=60529119 v4=18
- We create two arrays, lines and dups, both keyed on the first two columns.
- In the dups array, we increment a counter each time a (column 1, column 2) pair is seen.
- In the lines array, we check if we have already stored a line with the same first and second column. If we have, we append the current line to it.
- In the END block, we iterate over the lines array. If the first and second columns were seen more than once according to the dups array, we print the stored lines.
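To see the steps above end to end, here is a self-contained sketch. The sample data and the /tmp filename are made up for illustration; the awk program is the one from the answer, with only whitespace added.

```shell
# Create a small sample file (hypothetical data, two lines share columns 1 and 2).
cat > /tmp/demo_dups.txt <<'EOF'
v1=1 v2=100 v3=1 v4=1
v1=2 v2=10630231 v3=60528947 v4=17
v1=2 v2=10630231 v3=60529119 v4=18
v1=3 v2=200 v3=2 v4=2
EOF

# Group lines by (column 1, column 2); in END, print only groups seen more than once.
awk '{
    lines[$1,$2] = (lines[$1,$2] ? lines[$1,$2] RS $0 : $0)
    dups[$1,$2]++
}
END {
    for (key in lines)
        if (dups[key] > 1) print lines[key]
}' /tmp/demo_dups.txt
```

With this input, only the two v1=2 lines are printed. Note that because arrays are stored in memory, this approach reads the whole file before printing anything.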
Alternatively, if you don't want to keep the entire file in memory, you can do the following (since you stated your data is already sorted):
awk '($1==c1 && $2==c2){print line RS $0} {line=$0; c1=$1; c2=$2}' file
- We keep three variables: line holds the entire previous line, c1 its first column, and c2 its second column. They are updated after the comparison, so during the comparison they still refer to the previous line.
- If columns 1 and 2 of the current line are the same as those of the previous line, we print the previous line followed by the current line.