Question

I have two data files 1.txt and 2.txt

1.txt contains valid lines.

For example:

1 2 1 2 
1 3 1 3

2.txt has an extra column; if you ignore it, the file contains some valid lines and some invalid lines. The same line can occur multiple times in 2.txt.

For example:

1 2 1 2 1.9
1 3 1 3 3.4
1 3 1 3 3.4
2 3 2 3 5.6
2 3 2 3 5.6

The second and third lines are the same and valid.

The fourth and fifth lines are also the same but invalid.

I want to write a shell script that compares these two files and writes two output files, valid.txt and invalid.txt, which should look like this:

valid.txt :

1 2 1 2 1
1 3 1 3 2

and invalid.txt :

2 3 2 3 2

The extra last column in valid.txt and invalid.txt is the number of times the line occurs in 2.txt.


Solution

This awk script works for the example data (the output file is named invalid.txt here, matching the question, rather than the inv.txt used in the original run):

    awk 'NR==FNR{sub(/ *$/,"");a[$0]++;next}  # 1st file: strip trailing blanks, remember valid lines
         {sub(/ [^ ]*$/,"")                   # 2nd file: drop the extra last column
          if($0 in a)
                  v[$0]++                      # line exists in 1.txt: count as valid
          else
                  n[$0]++                      # otherwise: count as invalid
         }
         END{
             for(x in v) print x, v[x] > "valid.txt"
             for(x in n) print x, n[x] > "invalid.txt"
         }' 1.txt 2.txt

Output:

    kent$  head invalid.txt valid.txt
    ==> invalid.txt <==
    2 3 2 3 2

    ==> valid.txt <==
    1 3 1 3 2
    1 2 1 2 1

Note that awk's for (x in ...) iterates in an unspecified order, so the lines in each output file may come out in any order; pipe the results through sort if you need them ordered.
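If awk feels opaque, the same classification can be sketched with sort, uniq -c, and grep -Fx. This is a portable-shell sketch, not the original answer's method; it assumes the lines in 1.txt carry no trailing whitespace (the awk version strips it explicitly). The sample input files from the question are created inline so the snippet is self-contained:

```shell
# Sample data from the question
printf '1 2 1 2\n1 3 1 3\n' > 1.txt
printf '1 2 1 2 1.9\n1 3 1 3 3.4\n1 3 1 3 3.4\n2 3 2 3 5.6\n2 3 2 3 5.6\n' > 2.txt

# Start with empty output files
: > valid.txt
: > invalid.txt

# Drop the last column, count duplicates, then split each distinct
# line by whether it appears verbatim in 1.txt.
sed 's/ *[^ ]*$//' 2.txt | sort | uniq -c | while read -r count line; do
    if grep -qFx "$line" 1.txt; then      # -F literal, -x whole-line match
        printf '%s %s\n' "$line" "$count" >> valid.txt
    else
        printf '%s %s\n' "$line" "$count" >> invalid.txt
    fi
done
```

Unlike the awk version, this produces sorted output for free, since uniq -c preserves the order established by sort.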
Licensed under: CC-BY-SA with attribution