Here's a pure POSIX `awk` solution (with `sort` invoked from inside the `awk` program):

    awk -F. -v OFS='\t' '
      # Build an associative array that maps each unique top-level domain
      # (taken from the last `.`-separated field, `$NF`) to how often it
      # occurs in the input.
      { a[$NF]++ }
      END {
        # Print the header.
        print "Domain Name", "No of Email"
        print "----------------------------"
        # Output the associative array and sort it (by top-level domain).
        for (k in a) print k, a[k] | "sort"
      }
    ' file

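With the POSIX version, the sort order can also be changed by passing options to the `sort` command that `awk` pipes to. The following is a minimal sketch rather than part of the solution above: the function name `count_tlds_by_count` and the sample addresses are hypothetical, and the header lines are left out for brevity.

```shell
# Hypothetical helper: count top-level domains, most frequent first.
# Reads from the files given as arguments, or from stdin if there are none.
count_tlds_by_count() {
  awk -F. -v OFS='\t' '
    { a[$NF]++ }
    END {
      # Sort on the 2nd tab-separated field (the count), numerically and
      # descending; break ties on the 1st field (the domain), ascending.
      cmd = "sort -t \"\t\" -k2,2nr -k1,1"
      for (k in a) print k, a[k] | cmd
      close(cmd)
    }
  ' "$@"
}

# Usage with made-up sample input (one address per line):
printf '%s\n' \
  'alice@example.com' \
  'bob@example.org' \
  'carol@mail.example.com' \
  'dave@example.net' \
  'erin@example.com' |
  count_tlds_by_count
```

For this sample, `com` is listed first with a count of 3, followed by `net` and `org` with a count of 1 each.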
If you have GNU `awk` 4.0 or higher, you can make do without the external `sort` and even easily control the sort field from inside the `gawk` program:

    gawk -F. -v OFS='\t' '
      # Build an associative array that maps each unique top-level domain
      # (taken from the last `.`-separated field, `$NF`) to how often it
      # occurs in the input.
      { a[$NF]++ }
      END {
        # Print the header.
        print "Domain Name", "No of Email"
        print "----------------------------"
        # Output the associative array, sorted by top-level domain.
        # The special PROCINFO["sorted_in"] variable controls the order in
        # which `for (k in a)` visits the array elements; e.g.:
        # - Sort by top-level domain, ascending: "@ind_str_asc"
        # - Sort by occurrence count, descending: "@val_num_desc"
        PROCINFO["sorted_in"] = "@ind_str_asc"
        for (k in a) print k, a[k]
      }
    ' file

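To sort by occurrence count instead, only the `PROCINFO["sorted_in"]` value needs to change. Here is a minimal sketch, assuming `gawk` 4.0 or higher is installed; the function name `count_tlds_by_frequency` and the sample addresses are hypothetical, and the header lines are left out for brevity.

```shell
# Hypothetical helper: count top-level domains, most frequent first,
# without an external sort (requires GNU awk 4.0 or higher).
count_tlds_by_frequency() {
  gawk -F. -v OFS='\t' '
    { a[$NF]++ }
    END {
      # Visit the array elements by value, compared numerically, descending.
      PROCINFO["sorted_in"] = "@val_num_desc"
      for (k in a) print k, a[k]
    }
  ' "$@"
}

# Usage with made-up sample input (one address per line):
printf '%s\n' \
  'a@example.com' \
  'b@example.org' \
  'c@example.com' \
  'd@example.net' \
  'e@example.org' \
  'f@example.com' |
  count_tlds_by_frequency
```

For this sample, the output lists `com` (3), then `org` (2), then `net` (1). The sample deliberately avoids equal counts, because the relative order of ties is not specified here.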