How to count occurrences of specific characters in a file using awk with an associative array

StackOverflow https://stackoverflow.com/questions/22800531

  •  25-06-2023

Question

I have a file like this:

manish@yahoo.com
Rajesh.patel@hotmail.in
jkl@gmail.uk
New123@utu.ac.in
qwe@gmail.co.in

I want to count the occurrences of each top-level domain, like this:

Domain Name No of Email
-----------------------
com         1
in          3
uk          1

Solution 2

You can use sed, sort, uniq:

sed 's/.*[.]//' input | sort | uniq -c

Gives:

  1 com
  3 in
  1 uk

And with some formatting added by awk:

sed 's/.*[.]//' input | sort | uniq -c |
     awk 'BEGIN{print "Domain Name No of Email\n-----------------------"}
          {print $2"\t\t"$1}'

To get:

Domain Name No of Email
-----------------------
com     1
in      3
uk      1
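
If the columns should line up exactly as in the question, tabs can be replaced by a fixed-width printf format. The following is only a sketch under that assumption: the function name domain_report and the width of 11 characters are illustrative choices, not part of the original answer.

```shell
# Hypothetical helper: print a column-aligned domain report for the file in $1.
domain_report() {
  # Strip everything up to the last dot, then count each remaining suffix.
  sed 's/.*[.]//' "$1" | sort | uniq -c |
    awk 'BEGIN { print "Domain Name No of Email"; print "-----------------------" }
         # uniq -c emits "count domain"; left-justify the domain in 11 columns
         # (an assumed width that matches the header) and append the count.
         { printf "%-11s %d\n", $2, $1 }'
}
```

Calling domain_report input on the sample file should put every count under the "No of Email" heading, provided no domain is longer than the assumed width.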

OTHER TIPS

Here's a pure POSIX awk solution (with sort invoked from inside the awk program):

awk -F. -v OFS='\t' '
  # Build an associative array that maps each unique top-level domain
  # (taken from the last `.`-separated field, `$NF`) to how often it
  # occurs in the input.
  { a[$NF]++ }

  END {
    # Print the header.
    print "Domain Name", "No of Email"
    print "----------------------------"
    # Output the associative array, sorted by top-level domain.
    for (k in a) print k, a[k] | "sort"
  }
' file
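
The same pipe-to-sort approach can also order the rows by count instead of by domain, by passing key options to sort. This is a minimal sketch, assuming the domains contain no whitespace (so sort's default field splitting is sufficient); the function name tld_counts_by_count is hypothetical.

```shell
# Hypothetical helper: report TLD counts for the file in $1, highest count first.
tld_counts_by_count() {
  awk -F. -v OFS='\t' '
    # Count how often each last dot-separated field (the TLD) occurs.
    { a[$NF]++ }

    END {
      print "Domain Name", "No of Email"
      print "----------------------------"
      # Assumption: sort numerically on the count (field 2), descending,
      # and break ties alphabetically by domain (field 1).
      for (k in a) print k, a[k] | "sort -k2,2nr -k1,1"
    }
  ' "$1"
}
```

For the sample file, tld_counts_by_count file should list in first, followed by com and uk.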

If you have GNU awk 4.0 or higher, you can make do without the external sort and even easily control the sort field from inside the gawk program:

gawk -F. -v OFS='\t' '
  # Build an associative array that maps each unique top-level domain
  # (taken from the last `.`-separated field, `$NF`) to how often it
  # occurs in the input.
  { a[$NF]++ }

  END {
    # Print the header.
    print "Domain Name", "No of Email"
    print "----------------------------"
    # Control the order in which the `for (k in a)` loop visits the
    # array elements by setting the special PROCINFO["sorted_in"]
    # variable before the loop; e.g.:
    #  - Sort by top-level domain, ascending:  "@ind_str_asc"
    #  - Sort by occurrence count, descending: "@val_num_desc"
    PROCINFO["sorted_in"]="@ind_str_asc"
    # Output the associative array in that order.
    for (k in a) print k, a[k]
  }
' file
Licensed under: CC-BY-SA with attribution