This works for any number of files:
$ cat a.cnt b.cnt | awk '{a[$2]+=$1} END{for (i in a) print a[i],i}'
1 a
2 b
3 c
So if you have, say, 10 files, you just do cat f1 f2 ... and pipe the result into this awk command.
If the file names happen to share a pattern, you can also do (thanks Adrian Frühwirth!):
awk '{a[$2]+=$1} END{for (i in a) print a[i],i}' *cnt
So, for example, this takes into account all the files whose extension is cnt.
Some possible side-aspects one might want to consider additionally:
- what if a, b, c are arbitrary strings containing arbitrary whitespace?
- what if the files are too big to fit in memory? Is there some sort | uniq -c-style approach for this case that only looks at two lines at a time?
For the first question (keys that are arbitrary strings), you can use the rest of the columns as the index for the counter:
awk '{count=$1; $1=""; a[$0]+=count} END{for (i in a) print a[i],i}' *cnt
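For the second question (files too big for awk's in-memory array), one option is a sort-based pipeline in the spirit of sort | uniq -c: sort by the key first, then stream through awk keeping only the current key and its running total in memory. A sketch, with a printf standing in for the real cat *cnt input:

```shell
# Sketch: sum pre-counted lines without holding every distinct key in memory.
# sort groups identical keys together; awk then only tracks one running total.
printf '1 and some\n4 and some\n2 text here\n' |
  sort -k2 |
  awk '{
    count = $1
    sub(/^[^ ]+ /, "")            # key = everything after the first column
    if ($0 != prev) {             # key changed: flush the previous total
      if (NR > 1) print sum, prev
      prev = $0; sum = 0
    };
    sum += count
  }
  END { if (NR) print sum, prev }'
```

Replace the printf with cat *cnt for real input; sort spills to temporary files on disk for large inputs, so memory use stays bounded.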
Note that you don't actually need to sort | uniq -c into a cnt file first and then re-count: you can do it all in one step with something like this:
awk '{a[$0]++} END{for (i in a) print a[i], i}' file
Example
$ cat a.cnt
1 and some
2 text here
$ cat b.cnt
4 and some
4 and other things
2 text here
9 blabla
$ cat *cnt | awk '{count=$1; $1=""; a[$0]+=count} END{for (i in a) print a[i],i}'
4 text here
9 blabla
4 and some
4 and other things
Regarding the second comment:
$ cat b
and some
text here
and some
and other things
text here
blabla
$ awk '{a[$0]++} END{for (i in a) print a[i], i}' b
2 and some
2 text here
1 and other things
1 blabla