awk find difference between second field in two files

https://stackoverflow.com/questions/23443083

unix
awk

14-07-2023

Question

I am using awk to find the difference in field 2 between two files, matching records on field 1.

My files look like this:

file1:

2014-04-28|2667066
2014-04-29|5484549
2014-04-23|5484572
2014-04-24|2822096

file2:

2014-04-27|2667066
2014-04-28|7746836
2014-04-29|5484549
2014-04-30|2822060

For each date (field 1), if the count (field 2) does not match, I would like to print the difference into a separate file.

I currently have the script below to find the difference; however, it does not display the records which are in file1 but not in file2:

awk -F\| 'NR==FNR{a[$1]=$2;next}a[$1]!=$NF{printf "%s, %s Cnt:%d %s Cnt:%d\n",$1,ARGV[1],a[$1],ARGV[2],$NF}' file1 file2
2014-04-27, file1 Cnt:0       file2 Cnt:2667066
2014-04-28, file1 Cnt:2667066 file2 Cnt:7746836
2014-04-30, file1 Cnt:0       file2 Cnt:2822060

Required result:

2014-04-23, file1 Cnt:5484572 file2 Cnt:0
2014-04-24, file1 Cnt:2822096 file2 Cnt:0
2014-04-27, file1 Cnt:0       file2 Cnt:2667066
2014-04-28, file1 Cnt:2667066 file2 Cnt:7746836
2014-04-30, file1 Cnt:0       file2 Cnt:2822060

Any help is greatly appreciated.

Solution

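awk -F\| '
# First file: remember the count for each date.
NR==FNR { a[$1] = $2; next }
# Second file: report dates whose count differs (a date missing from file1 prints as 0).
a[$1]!=$NF {
    printf "%s, %s Cnt:%d %s Cnt:%d\n", $1, ARGV[1], a[$1], ARGV[2], $NF
}
# This date has now been handled, so drop it from the array.
{ delete a[$1] }
# Whatever is left in the array exists only in file1.
END {
    for (i in a) {
        printf "%s, %s Cnt:%d %s Cnt:%d\n", i, ARGV[1], a[i], ARGV[2], 0
    }
}' file1 file2 | sort -k1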

Output:

2014-04-23, file1 Cnt:5484572 file2 Cnt:0
2014-04-24, file1 Cnt:2822096 file2 Cnt:0
2014-04-27, file1 Cnt:0 file2 Cnt:2667066
2014-04-28, file1 Cnt:2667066 file2 Cnt:7746836
2014-04-30, file1 Cnt:0 file2 Cnt:2822060

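The extra rule deletes each date from the array once it has been seen in file2, so the END block prints only the dates that exist in file1 but not in file2. A for (i in a) loop visits the elements in no guaranteed order, so I have piped the output to sort -k1 to put the lines back in date order.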
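The delete-then-iterate step can be looked at in isolation. The following is only a minimal sketch, not part of the original answer: it keeps just the dates from file1, forgets every date that file2 also contains, and lists what is left. The names tmp, leftover and seen are made up for illustration, and the sample data is copied from the question into a scratch directory.

```shell
# Sample data from the question, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '2014-04-28|2667066' '2014-04-29|5484549' \
              '2014-04-23|5484572' '2014-04-24|2822096' > "$tmp/file1"
printf '%s\n' '2014-04-27|2667066' '2014-04-28|7746836' \
              '2014-04-29|5484549' '2014-04-30|2822060' > "$tmp/file2"

# "seen" is a hypothetical array name. The END loop visits keys in no
# guaranteed order, which is why the result goes through sort.
leftover=$(awk -F'|' '
    NR==FNR { seen[$1]; next }   # first file: remember each date
    { delete seen[$1] }          # second file: forget dates it contains
    END { for (k in seen) print k }
' "$tmp/file1" "$tmp/file2" | sort)

echo "$leftover"
rm -r "$tmp"
```

With the sample data this leaves 2014-04-23 and 2014-04-24, the two dates the original one-liner never reported.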

OTHER TIPS

I think the first condition of your awk script is excluding dates that don't exist in one of the files. Remove NR==FNR and keep the assignment a[$1]=$2;
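The tip does not spell out the rest of the script. One possible reading, offered here as an assumption rather than as what the tip's author had in mind, is to record the counts from both files and compare them only after all input has been read. The sketch below does that; the array names c1, c2 and dates and the variables tmp and result are invented for illustration.

```shell
# Sample data from the question, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '2014-04-28|2667066' '2014-04-29|5484549' \
              '2014-04-23|5484572' '2014-04-24|2822096' > "$tmp/file1"
printf '%s\n' '2014-04-27|2667066' '2014-04-28|7746836' \
              '2014-04-29|5484549' '2014-04-30|2822060' > "$tmp/file2"

# Run from inside the scratch directory so ARGV[1] and ARGV[2] print
# as plain "file1" and "file2".
result=$(cd "$tmp" && awk -F'|' '
    FILENAME == ARGV[1] { c1[$1] = $2 }   # counts from the first file
    FILENAME == ARGV[2] { c2[$1] = $2 }   # counts from the second file
    { dates[$1] }                         # every date seen in either file
    END {
        for (d in dates)
            # Adding 0 forces a numeric comparison; a missing date counts as 0.
            if ((c1[d] + 0) != (c2[d] + 0))
                printf "%s, %s Cnt:%d %s Cnt:%d\n", d, ARGV[1], c1[d], ARGV[2], c2[d]
    }
' file1 file2 | sort)

echo "$result"
rm -r "$tmp"
```

Because nothing is printed until END, dates that appear in only one file are handled the same way in both directions. Redirecting the output, for example with > differences.txt (a hypothetical file name), would send the report to a separate file as the question asks.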

Licensed under: CC-BY-SA with attribution
