sum occurrence output of uniq -c

Question 1

awk '{sum+=$1} END{ print sum}'

Question 2

This should do the trick:

awk '{s+=$1} END {print s}' file

Or just pipe it into awk with

uniq -c whatever | awk '{s+=$1} END {print s}'
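
A quick way to sanity-check this is to feed it a few known lines. This is only a sketch: sum_counts is a made-up helper name and the printf input is invented sample data. The total should equal the number of input lines, since uniq -c just groups them.

```shell
# sum_counts: hypothetical helper that adds up the first column of its stdin.
# "s + 0" makes awk print 0 instead of an empty line when there is no input.
sum_counts() {
  awk '{ s += $1 } END { print s + 0 }'
}

# Invented sample input: 3 x a, 2 x b, 1 x c (already grouped, as uniq expects).
printf '%s\n' a a a b b c | uniq -c | sum_counts
```

The leading spaces that uniq -c puts in front of each count don't matter, because awk splits fields on runs of whitespace by default.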

Question 3

For each line, add the value of the first column to SUM, then print out the value of SUM.

awk is a better choice

uniq -c somefile | awk '{SUM+=$1}END{print SUM}'

but you can also implement the logic using bash

uniq -c somefile | {
   SUM=0
   while read -r num other
   do
      SUM=$((SUM + num))
   done
   echo "$SUM"    # print inside the braces; the piped loop runs in a subshell
}
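
One thing to watch with the shell loop: in bash, each part of a pipeline normally runs in a subshell, so a variable updated inside a piped while loop is gone once the loop ends. A minimal sketch that sidesteps this by doing the summing and the printing inside the same function (sum_first_column is a made-up helper name, and the printf input is invented sample data):

```shell
# sum_first_column: hypothetical helper; reads "count text" lines on stdin
# and prints the total of the counts.
sum_first_column() {
  total=0
  # read strips the leading blanks that uniq -c emits; "rest" soaks up the text
  while read -r num rest; do
    total=$((total + num))
  done
  # echo runs in the same shell as the loop, so total is still set here
  echo "$total"
}

printf '%s\n' a a b | uniq -c | sum_first_column
```

This sticks to POSIX shell syntax, so it should behave the same under bash and sh.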

Question 4

uniq -c is slow compared to awk, like REALLY slow.

{mawk/mawk2/gawk} 'BEGIN { OFS = "\t" } { freqL[$1]++ }   # change $1 (or FS) to pick
                                                          # the column to count
END { for (x in freqL) printf("%8s%s%s\n", freqL[x], OFS, x) }'

if your input isn't large (say, under 100MB), then gawk suffices after adding this before the for loop:

PROCINFO["sorted_in"] = "@ind_num_asc";  # gawk specific, just use gawk -b mode

if it's really large, it's far faster to use mawk2 and then pipe to

   { mawk/mawk2 stuff... } | gnusort -t$'\t' -k 2,2    # $'\t' (bash) is a real tab; GNU sort rejects '\t'
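
Put together, the count-then-sort idea might look like the sketch below. It assumes plain awk and sort rather than any particular mawk/gawk build, count_first_field is a made-up helper name, and the output uses a tab between count and key so that sort can split on it.

```shell
# count_first_field: hypothetical helper that counts occurrences of the first
# field of each input line, then sorts the result by that field.
count_first_field() {
  awk '{ freq[$1]++ }
       END { for (k in freq) printf "%d\t%s\n", freq[k], k }' |
    # "$(printf "\t")" yields a literal tab in any POSIX shell
    sort -t "$(printf '\t')" -k 2,2
}

# Invented, unsorted sample input: awk does not need it pre-sorted.
printf '%s\n' b a b c a b | count_first_field
```

Unlike uniq -c, the awk array counts duplicates wherever they appear, so the input needs no sorting beforehand; only the much smaller result is sorted.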

Question 5

To get the number of unique lines, just use wc -l on the output.

for example:

/tmp/a

a
b
a
b
c
a
c
b
c
a
b
b

expected: 4 a, 5 b, 3 c, 1 , 1 expected

example output: cat /tmp/a | sort -h | uniq -c

      1
      4 a
      5 b
      3 c
      1 expected: 4 a, 5 b, 3 c, 1 , 1 expected

as we can see there are 5 unique lines, so we run wc -l on the output to get this number

cat /tmp/a | sort -h | uniq -c | wc -l

Caveat: This method has not been tested with special/invisible characters, so your mileage may vary there.
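
A small self-contained version of the same count, as a sketch: count_distinct_lines is a made-up helper name and the input is invented sample data. The tr call is only there because some wc implementations pad the number with spaces.

```shell
# count_distinct_lines: hypothetical helper that prints how many different
# lines appear on stdin.
count_distinct_lines() {
  sort | uniq -c | wc -l | tr -d ' '
}

# Invented sample: 4 x a, 5 x b, 3 x c, so three distinct lines.
printf '%s\n' a b a b c a c b c a b b | count_distinct_lines
```

If the per-line counts aren't needed, sort -u | wc -l should give the same number with one less step.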