awk '{sum+=$1} END{ print sum}'
sum occurrence output of uniq -c
Question
I want to sum the occurrence counts in the output of the "uniq -c" command. How can I do that on the command line?
For example, if I get the following output, I need 250:
45 a4
55 a3
1 a1
149 a5
Solution
OTHER TIPS
This should do the trick:
awk '{s+=$1} END {print s}' file
Or just pipe the output into awk:
uniq -c whatever | awk '{s+=$1} END {print s}'
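As a quick check against the numbers in the question, here is a minimal sketch that feeds the four sample lines in with printf rather than a real file. The variable names `counts` and `total` are only illustrative.

```shell
# Sample "uniq -c" output from the question, fed in with printf instead of a real file.
counts=$(printf '%s\n' '45 a4' '55 a3' '1 a1' '149 a5')

# Sum the first column; "s + 0" forces a numeric 0 if the input is empty.
total=$(printf '%s\n' "$counts" | awk '{ s += $1 } END { print s + 0 }')

echo "$total"   # prints 250
```

Real uniq -c output pads the counts with leading spaces; awk's default field splitting ignores that padding, so $1 is still the count.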
For each line, add the value of the first column to SUM, then print the value of SUM.
awk is a better choice:
uniq -c somefile | awk '{SUM+=$1}END{print SUM}'
But you can also implement the logic in bash:
uniq -c somefile | {
    SUM=0
    while read -r num other
    do
        SUM=$((SUM + num))
    done
    echo "$SUM"   # must stay inside the braces: the piped loop runs in a subshell
}
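One thing to watch with the shell version: in bash, each part of a pipeline normally runs in a subshell, so a variable updated inside a piped while loop is gone once the loop ends. Below is a minimal sketch that keeps the loop in the current shell by redirecting input into it. The here-document is made-up sample data standing in for the uniq -c output; in bash, `done < <(uniq -c somefile)` would read from the real command instead.

```shell
# Hypothetical sample data standing in for "uniq -c somefile".
SUM=0
while read -r num other
do
    SUM=$((SUM + num))
done <<'EOF'
     45 a4
     55 a3
      1 a1
    149 a5
EOF

# The loop ran in the current shell, so SUM is still set here.
echo "$SUM"   # prints 250
```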
uniq -c is slow compared to counting in awk. Like, REALLY slow.
{mawk/mawk2/gawk} 'BEGIN { OFS = "\t" }
{ freqL[$1]++ }   # modify FS for the column you want to uniq -c upon
END { for (x in freqL) { printf("%8s %s\n", freqL[x], x) } }'
If your input isn't large (under roughly 100 MB), gawk suffices after adding the following line:
PROCINFO["sorted_in"] = "@ind_num_asc"; # gawk specific, just use gawk -b mode
If it's really large, it's far faster to use mawk2 and then pipe to sort:
{ mawk/mawk2 stuff... } | gnusort -t$'\t' -k 2,2
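To illustrate the idea of counting in an awk array instead of sorting first, here is a small sketch that runs both approaches on made-up input and compares the results. The names `input`, `awk_counts`, and `uniq_counts` are hypothetical, and plain awk and sort are used instead of mawk2 or a GNU-specific sort.

```shell
# Hypothetical unsorted input; the awk version does not need it pre-sorted.
input=$(printf '%s\n' b a c a b a)
tab=$(printf '\t')

# Count occurrences of the first field in an awk array, then sort by that field.
awk_counts=$(printf '%s\n' "$input" |
    awk '{ freq[$1]++ } END { for (x in freq) printf("%d\t%s\n", freq[x], x) }' |
    sort -t "$tab" -k 2,2)

# Same counts via sort | uniq -c, with the padding normalized to a tab.
uniq_counts=$(printf '%s\n' "$input" | sort | uniq -c |
    awk '{ printf("%d\t%s\n", $1, $2) }')

echo "$awk_counts"
```

Here only the small set of distinct values is sorted after counting, rather than the whole input before counting.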
To count the unique lines, just use wc -l on the output.
For example, given the file /tmp/a:
a
b
a
b
c
a
c
b
c
a
b
b
expected: 4 a, 5 b, 3 c, 1 , 1 expected
Example output of cat /tmp/a | sort -h | uniq -c (the file also contains one empty line and the "expected:" line above, each counted once):
1
4 a
5 b
3 c
1 expected: 4 a, 5 b, 3 c, 1 , 1 expected
As we can see, there are 5 unique lines, so we run wc -l on the output to get this number:
cat /tmp/a | sort -h | uniq -c | wc -l
5
Caveat: This method has not been tested with special/invisible characters, so your mileage may vary there.
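Note that wc -l on the uniq -c output gives the number of distinct lines, while summing the first column gives the total number of input lines. A minimal sketch of the difference, using made-up input and illustrative variable names (`data`, `distinct`, `total`):

```shell
# Hypothetical input: 4 lines of "a", 5 of "b", 3 of "c".
data=$(printf '%s\n' a b a b c a c b c a b b)

# Number of distinct lines: uniq -c prints one output line per distinct value.
# tr strips the padding that some wc implementations add.
distinct=$(printf '%s\n' "$data" | sort | uniq -c | wc -l | tr -d ' ')

# Total number of lines: sum of the counts in the first column.
total=$(printf '%s\n' "$data" | sort | uniq -c | awk '{ s += $1 } END { print s + 0 }')

echo "$distinct distinct lines, $total lines in total"   # prints "3 distinct lines, 12 lines in total"
```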