Question

I want to sum the occurrence counts in the output of the "uniq -c" command. How can I do that on the command line?

For example, if I get the following output, I would need 250.

 45 a4
 55 a3
  1 a1
149 a5

Solution

awk '{sum+=$1} END{ print sum}'
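To check this against the numbers in the question, here is a minimal sketch that feeds the four sample lines to the same awk program. The variable name `total` is only for illustration.

```shell
# Sample `uniq -c` output copied from the question.
total=$(printf '%s\n' ' 45 a4' ' 55 a3' '  1 a1' '149 a5' |
    awk '{ sum += $1 } END { print sum }')   # add up column 1, print once at the end
echo "$total"   # prints 250
```

awk splits on runs of whitespace by default, so the leading padding that uniq -c adds does not matter.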

OTHER TIPS

This should do the trick:

awk '{s+=$1} END {print s}' file

Or just pipe it into awk with

uniq -c whatever | awk '{s+=$1} END {print s}'

For each line, add the value of the first column to SUM, then print out the value of SUM.

awk is the better choice:

uniq -c somefile | awk '{SUM+=$1}END{print SUM}'

But you can also implement the logic using bash:

uniq -c somefile | {
   SUM=0
   while read -r num other
   do
      SUM=$((SUM + num))
   done
   echo "$SUM"   # inside the braces, so it runs in the same subshell that holds SUM
}
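One thing to watch with a loop like this: in bash, a while loop on the receiving end of a pipe runs in a subshell, so a variable it updates is not visible once the pipeline has ended. If the total is needed later in the script, one option is to avoid the pipe. A minimal sketch, using a here-document and sample counts copied from the question (the variable `counts` is hypothetical; in practice it would hold the output of uniq -c):

```shell
# Hypothetical sample standing in for real `uniq -c` output.
counts=' 45 a4
 55 a3
  1 a1
149 a5'

SUM=0
# The here-document feeds the loop without a pipe, so the loop runs in the
# current shell and SUM keeps its value after the loop.
while read -r num other; do
    SUM=$((SUM + num))
done <<EOF
$counts
EOF
echo "$SUM"   # prints 250
```

This form should also work in a plain POSIX sh, not only in bash.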

uniq -c is slow compared to awk. Like, REALLY slow.

{mawk/mawk2/gawk} 'BEGIN { OFS = "\t" } { freqL[$1]++; } END {  # modify FS for that
                                                                # column you want
   for (x in freqL) { printf("%8s %s\n", freqL[x], x) } }'      # to uniq -c upon

If your input isn't large (100MB+), then gawk suffices after adding in the following:

PROCINFO["sorted_in"] = "@ind_num_asc";  # gawk specific, just use gawk -b mode

If it's really large, it's far faster to use mawk2 and then pipe to

   { mawk/mawk2 stuff... } | gnusort -t$'\t' -k 2,2
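The awk-based counting in this answer can be tried end to end on a small input. A minimal sketch, assuming a plain `awk` is installed (mawk or gawk should behave the same here) and using made-up sample lines. Plain `sort` stands in for the sorting step, and the variable name `freq` is only for illustration.

```shell
# Made-up sample input: 4 a, 5 b, 3 c, deliberately unsorted.
freq=$(printf '%s\n' a b a b c a c b c a b b |
    awk '{ seen[$1]++ }                       # count each distinct first field
         END { for (x in seen) printf("%8s %s\n", seen[x], x) }' |
    sort -k 2,2)                              # for-in order is unspecified, so sort by value
echo "$freq"
```

Unlike uniq -c, the awk array does not need sorted input, so the sort only has to handle the much smaller list of distinct values.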

To get the number of unique lines, just use wc -l on the output.

for example:

/tmp/a

a
b
a
b
c
a
c
b
c
a
b
b

expected: 4 a, 5 b, 3 c, 1 , 1 expected

example output: cat /tmp/a | sort -h | uniq -c

      1
      4 a
      5 b
      3 c
      1 expected: 4 a, 5 b, 3 c, 1 , 1 expected

As we can see, there are 5 unique lines, so we run wc -l on the output to get this number:

cat /tmp/a | sort -h | uniq -c | wc -l

5
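The number of unique lines and the sum of the counts can also be read off in one pass, since awk's NR holds the number of lines it has seen. A minimal sketch with made-up input (4 a, 5 b, 3 c, without the blank and "expected" lines from the file above). The variable name `result` is only for illustration.

```shell
# Made-up sample input: 4 a, 5 b, 3 c.
result=$(printf '%s\n' a b a b c a c b c a b b |
    sort | uniq -c |
    awk '{ total += $1 } END { print NR, total }')   # NR = distinct lines, total = all lines
echo "$result"   # prints: 3 12
```

The first number matches what wc -l would report, and the second is the sum asked for in the question.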

Caveat: This method has not been tested with special/invisible characters, so your mileage may vary there.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow