Comparing two lists with a shell script
05-07-2019
Question
Suppose I have two lists of numbers in files f1 and f2, one number per line. I want to see how many numbers in the first list are not in the second, and vice versa. Currently I am using grep -f f2 -v f1 and then repeating this using a shell script. This is pretty slow (quadratic time hurts). Is there a nicer way of doing this?
Solution
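I like 'comm' for this sort of thing. (The files need to be sorted.) It prints three columns: lines only in the first file, lines only in the second file, and lines in both. The options -1, -2 and -3 suppress the corresponding column.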
$ cat f1
1
2
3
$ cat f2
1
4
5
$ comm f1 f2
		1
2
3
	4
	5
$ comm -12 f1 f2
1
$ comm -23 f1 f2
2
3
$ comm -13 f1 f2
4
5
$
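Since the question asks for counts rather than the lines themselves, the comm calls above can be piped to wc -l. The following is a minimal sketch, not a definitive script: the scratch directory and the f1.sorted / f2.sorted file names are assumptions made for illustration, and it recreates the sample files so it can be run as-is. Note that comm wants the input in plain collation order (sort, not sort -n), so LC_ALL=C is set to keep sort and comm in agreement.

```shell
# Sketch: count the lines unique to each file with comm (POSIX sh).
set -eu
export LC_ALL=C                      # same collation order for sort and comm

workdir=$(mktemp -d)                 # scratch directory (hypothetical location)
printf '%s\n' 1 2 3 > "$workdir/f1"  # sample data from the answer above
printf '%s\n' 1 4 5 > "$workdir/f2"

# comm needs sorted input; -u also drops repeated lines within a file.
sort -u "$workdir/f1" > "$workdir/f1.sorted"
sort -u "$workdir/f2" > "$workdir/f2.sorted"

# -23 keeps only column 1 (lines only in f1); -13 keeps only column 2.
# tr strips the padding that some wc implementations add.
only_in_f1=$(comm -23 "$workdir/f1.sorted" "$workdir/f2.sorted" | wc -l | tr -d ' ')
only_in_f2=$(comm -13 "$workdir/f1.sorted" "$workdir/f2.sorted" | wc -l | tr -d ' ')

echo "only in f1: $only_in_f1"
echo "only in f2: $only_in_f2"
rm -rf "$workdir"
```

Sorting costs O(n log n) and comm makes a single pass over both files, so this avoids the quadratic behaviour of the grep approach.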
OTHER TIPS
Couldn't you just put each number on its own line and then diff(1) them? You might need to sort the lists beforehand for that to work properly, though.
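A rough sketch of the diff idea, assuming the default diff output format, where lines prefixed with "<" come only from the first file and lines prefixed with ">" only from the second. The scratch directory and the *.sorted names are made up for illustration. The counts are only meaningful if each list is sorted and free of repeated lines, which is why sort -u is used here.

```shell
# Sketch: count differing lines from diff's default output (POSIX sh).
set -eu
export LC_ALL=C

workdir=$(mktemp -d)                 # scratch directory (hypothetical location)
printf '%s\n' 3 1 2 > "$workdir/f1"  # deliberately unsorted sample input
printf '%s\n' 5 1 4 > "$workdir/f2"

sort -u "$workdir/f1" > "$workdir/f1.sorted"
sort -u "$workdir/f2" > "$workdir/f2.sorted"

# diff exits with status 1 when the files differ, so guard it under `set -e`.
diff_out=$(diff "$workdir/f1.sorted" "$workdir/f2.sorted" || true)

# grep -c also exits non-zero when the count is 0, hence the `|| true`.
only_in_f1=$(printf '%s\n' "$diff_out" | grep -c '^<' || true)
only_in_f2=$(printf '%s\n' "$diff_out" | grep -c '^>' || true)

echo "only in f1: $only_in_f1"
echo "only in f2: $only_in_f2"
rm -rf "$workdir"
```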
In the special case where one file is a subset of the other, the following:
cat f1 f2 | sort | uniq -u
would list the lines that appear only in the larger file (assuming neither file repeats a line internally). And of course piping to wc -l will show the count.
However, that isn't exactly what you described.
This one-liner serves my particular needs often, but I'd love to see a more general solution.
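One way the sort | uniq -u one-liner might be generalised to files that are not subsets of each other is to feed one file to sort twice. This is a sketch of a common trick rather than anything from the original answer, and the scratch directory and *.uniq names are assumptions for illustration. Each file is de-duplicated first, since a line repeated inside a single file would otherwise look like a shared line.

```shell
# Sketch: directional set difference with sort | uniq -u (POSIX sh).
set -eu
export LC_ALL=C

workdir=$(mktemp -d)                 # scratch directory (hypothetical location)
printf '%s\n' 1 2 3 > "$workdir/f1"
printf '%s\n' 1 4 5 > "$workdir/f2"

# Remove repeats within each file so every line occurs exactly once per file.
sort -u "$workdir/f1" > "$workdir/f1.uniq"
sort -u "$workdir/f2" > "$workdir/f2.uniq"

# Passing f2 twice means every f2 line occurs at least twice in the merged
# stream, so uniq -u keeps only the lines found in f1 alone (and vice versa).
only_in_f1=$(sort "$workdir/f1.uniq" "$workdir/f2.uniq" "$workdir/f2.uniq" | uniq -u)
only_in_f2=$(sort "$workdir/f2.uniq" "$workdir/f1.uniq" "$workdir/f1.uniq" | uniq -u)

printf 'only in f1:\n%s\n' "$only_in_f1"
printf 'only in f2:\n%s\n' "$only_in_f2"
rm -rf "$workdir"
```

As before, appending | wc -l to either pipeline gives the count instead of the lines.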