Question

I find AWK really useful. Here is a one-liner I put together to manipulate data.

ls | awk '{ print "awk " "'"'"'"  " {print $1,$2,$3} " "'"'"'"  " " $1 ".old_ext > " $1    ".new_ext"  }' > file.csh

I used this AWK one-liner to make a script file that renames some files and prints out only selected columns. Does anyone know a better way to do this? What are your best AWK one-liners or clever manipulations?
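One alternative (not necessarily better, just a sketch) that sidesteps most of the nested quoting is to pass the single-quote character in with -v, so the generated script is the same but the command is easier to read:

```shell
# q holds a literal single quote, so the program text never has to
# escape one. The $1,$2,$3 inside the awk string are literal output
# text (awk does not interpolate inside string constants).
ls | awk -v q="'" '{printf "awk %s{print $1,$2,$3}%s %s.old_ext > %s.new_ext\n", q, q, $1, $1}' > file.csh
```

For an input name `foo` this emits `awk '{print $1,$2,$3}' foo.old_ext > foo.new_ext`, the same line the original produces.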


Solution

The AWK book is full of great examples. They used to be collected for download on Kernighan's webpage, but the link 404s now.

OTHER TIPS

You can find several nice one-liners here.

I use this:

df -m | awk '{p+=$3}; END {print p}'

To total all disk space used on a system across filesystems.
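The same summing pattern totals any numeric column. Here is a self-contained check with made-up sample data; note that it skips the header row with NR > 1 (the original works without this only because awk treats the non-numeric header field as 0):

```shell
# Sum the "Used" column ($2 in this hypothetical sample), skipping the header.
printf 'FS Used Avail\n/a 100 50\n/b 200 75\n' |
  awk 'NR > 1 { p += $2 } END { print p }'
# prints 300
```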

Many years ago I wrote a tail script in awk:

#!/usr/bin/awk -f
BEGIN {
  # Number of trailing lines to keep
  lines = 10
}

{
  # Ring buffer: each record overwrites the oldest slot,
  # so only the last `lines` records are held at any time
  high = NR % lines + 1
  a[high] = $0
}

END {
  # Walk the buffer starting from the oldest surviving entry
  for (i = 0; i < lines; i++) {
    n = (i + high) % lines + 1
    if (n in a) {
      print a[n]
    }
  }
}

It's silly, I know, but that's what awk does to you. It's just very fun to play with.
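The ring-buffer logic can be checked inline; feeding 15 numbered lines should print exactly the last 10:

```shell
# Same logic as the script above, squeezed into a one-liner.
seq 1 15 | awk 'BEGIN { lines = 10 }
  { high = NR % lines + 1; a[high] = $0 }
  END { for (i = 0; i < lines; i++) { n = (i + high) % lines + 1; if (n in a) print a[n] } }'
# prints 6 through 15, one per line
```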

Henry Spencer wrote a fairly good implementation of nroff in awk. He called it "awf". He also claimed that if Larry Wall had known how powerful awk was, he wouldn't have needed to invent Perl.

Here are a couple of awk commands that I used to use regularly ... note that you can use $1, $2, etc. to pull out the column you want. So, for manipulating a bunch of files, here's a stupid command you could use instead of mv ...

ls -1 *.mp3 | awk '{printf("mv %s newDir/%s\n",$1,$1)}' | /bin/sh

Or if you're looking at a set of processes maybe ...

ps -ef | grep -v username | awk '{printf("kill -9 %s\n",$2)}' | /bin/sh

Pretty trivial, but you can see how that would get you quite a ways. =) Most of the stuff I used to do can be done with xargs, but hey, who needs them newfangled commands?
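For the record, the mv example above in xargs form (assuming, as the original does, filenames without whitespace):

```shell
# Move every .mp3 into newDir/ without generating shell text first.
ls -1 *.mp3 | xargs -I{} mv {} newDir/
```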

I use this script a lot for editing PATH and path-like environment variables. Usage:

export PATH=$(clnpath /new/bin:/other/bin:$PATH /old/bin:/other/old/bin)

This command adds /new/bin and /other/bin in front of PATH, removes both /old/bin and /other/old/bin from PATH (if present; no error if absent), and removes duplicate directory entries on the path.

:   "@(#)$Id: clnpath.sh,v 1.6 1999/06/08 23:34:07 jleffler Exp $"
#
#   Print minimal version of $PATH, possibly removing some items

case $# in
0)  chop=""; path=${PATH:?};;
1)  chop=""; path=$1;;
2)  chop=$2; path=$1;;
*)  echo "Usage: `basename $0 .sh` [$PATH [remove:list]]" >&2
    exit 1;;
esac

# Beware of the quotes in the assignment to chop!
echo "$path" |
${AWK:-awk} -F: '#
BEGIN       {       # Sort out which path components to omit
                    chop="'"$chop"'";
                    if (chop != "") nr = split(chop, remove); else nr = 0;
                    for (i = 1; i <= nr; i++)
                            omit[remove[i]] = 1;
            }
{
    for (i = 1; i <= NF; i++)
    {
            x=$i;
            if (x == "") x = ".";
            if (omit[x] == 0 && path[x]++ == 0)
            {
                    output = output pad x;
                    pad = ":";
            }
    }
    print output;
}'

Count memory used by httpd

ps -ylC httpd | awk '/[0-9]/ {SUM += $8} END {print SUM/1024}'

Or any other process, by replacing httpd. The division by 1024 converts the RSS total (reported in KB) to MB.

I managed to build a DOS tree command emulator for UNIX (find + awk):

find . -type d -print 2>/dev/null|awk '{for (i=1;i< NF;i++)printf("%"length($i)"s","|");gsub(/[^\/]*\//,"--",$0);print $NF}'  FS='/'

Print lines between two patterns:

awk '/END/{flag=0}flag;/START/{flag=1}' inputFile

Detailed explanation: http://nixtip.wordpress.com/2010/10/12/print-lines-between-two-patterns-the-awk-way/
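A quick self-contained demonstration of the flag technique. Only lines strictly between the markers are printed, not the markers themselves: /END/ clears the flag before the bare `flag` pattern prints, and /START/ sets it afterwards.

```shell
# Hypothetical input with START/END markers around two lines.
printf 'a\nSTART\nb\nc\nEND\nd\n' | awk '/END/{flag=0} flag; /START/{flag=1}'
# prints:
# b
# c
```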

A couple of favorites, essentially unrelated to each other. Read as 2 different, unconnected suggestions.

Identifying Column Numbers Easily


For those who use awk frequently, as I do for log analysis at work, I often find myself needing to know what the column numbers are for a file. So, if I am analyzing, say, Apache access files (some samples can be found here), I run the script below against the file:

NR == 1 {
        for (i = 1 ; i <= NF ; i++)
                {
                print i "\t" $i
                }
        }
NR > 1  {
        exit
        }

I typically call it "cn.awk", for 'c'olumn 'n'umbers. Creative, eh? Anyway, the output looks like:

1   64.242.88.10
2   -
3   -
4   [07/Mar/2004:16:05:49
5   -0800]
6   "GET
7   /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables
8   HTTP/1.1"
9   401
10  12846

Very easy to tell what's what. I usually alias this on my servers and have it everywhere.
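The same script also runs inline for a quick check, with some made-up input:

```shell
# Print a numbered list of the first record's fields, then stop reading.
printf 'one two three\nfour five six\n' | awk '
  NR == 1 { for (i = 1; i <= NF; i++) print i "\t" $i }
  NR > 1  { exit }'
```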


Referencing Fields by Name

Now, suppose your file has a header row and you'd rather use those names instead of field numbers. This allows you to do so:

NR == 1 {
    for (i = 1 ; i <= NF ; i++)
        {
        field[$i] = i
        }
    }

Now, suppose I have this header row...

metric,time,val,location,http_status,http_request

...and I'd like to sum the val column. Instead of referring to $3, I can refer to it by name:

NR > 1  {
    SUM += $field["val"]
    }

The main benefit is making the script much more readable.
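Put together, with some made-up rows, it looks like this. Note the -F, which is needed here because this header row is comma-separated, while the snippets above assume whitespace-separated fields:

```shell
# Build the name -> column map from the header, then sum "val" by name.
printf 'metric,time,val\ncpu,1,10\ncpu,2,32\n' | awk -F, '
  NR == 1 { for (i = 1; i <= NF; i++) field[$i] = i }
  NR > 1  { SUM += $field["val"] }
  END     { print SUM }'
# prints 42
```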

Printing fields is one of the first things mentioned in most AWK tutorials.

awk '{print $1,$3}' file

Lesser known, but equally useful, is excluding fields:

awk '{$1=$3=""}1' file
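One caveat worth knowing: emptying a field leaves its OFS separators behind, so the output keeps gaps where the fields were.

```shell
# Assigning to any field rebuilds $0 with OFS between every field,
# including the now-empty ones.
echo 'a b c d' | awk '{$1=$3=""}1'
# prints " b  d" (leading space, double space where c was)
```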
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow