awk: create list of destination ports seen for each source IP from a bro log (conn.log)

StackOverflow https://stackoverflow.com/questions/16742955

  •  30-05-2022

Question

I'm trying to solve a problem in awk as an exercise but I'm having trouble. I want awk (or gawk) to be able to print all unique destination ports for a particular source IP address.

The source IP address is field 1 ($1) and the destination port is field 4 ($4).

Cut for brevity:
SourceIP          SrcPort   DstIP           DstPort
192.168.1.195       59508   98.129.121.199  80
192.168.1.87        64802   192.168.1.2     53
10.1.1.1            41170   199.253.249.63  53
10.1.1.1            62281   204.14.233.9    443

I imagine you would store each source IP as an index to an array, but I'm not quite sure how you would store the destination ports as values. Maybe you can keep appending to a string that is the value at that index, e.g. "80,"..."80,443,"... for each match. But maybe that's not the best solution.

I'm not too concerned about output, I really just want to see how one can approach this in awk. Though, for output I was thinking something like,

Source IP:dstport, dstport, dstport
192.168.1.195:80,443,8088,5900

I'm tinkering with something like this,

awk '{ if ( NR == 1) next; arr[$1,$4] = $4 } END { for (i in arr) print arr[i] }' infile

but cannot figure out how to print out the elements and their values for a two-dimensional array. It seems something along this line would take care of the unique destination port task because each port is overwriting the value of the element.
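
For reference, a subscript such as arr[$1,$4] is stored under a single string key, with the two parts joined by awk's built-in SUBSEP variable. A minimal sketch of printing such an array, assuming the sample rows above are fed on standard input; the names key and parts are arbitrary, and the output is piped through sort only because the order of for (key in arr) is unspecified:

```shell
# The four sample rows from the question, with the header line.
input='SourceIP SrcPort DstIP DstPort
192.168.1.195 59508 98.129.121.199 80
192.168.1.87 64802 192.168.1.2 53
10.1.1.1 41170 199.253.249.63 53
10.1.1.1 62281 204.14.233.9 443'

pairs=$(printf '%s\n' "$input" | awk '
NR == 1 { next }                    # skip the column header line
{ arr[$1, $4] = $4 }                # one entry per unique IP/port pair
END {
    for (key in arr) {
        split(key, parts, SUBSEP)   # parts[1] = source IP, parts[2] = destination port
        print parts[1] ":" parts[2]
    }
}' | LC_ALL=C sort)
printf '%s\n' "$pairs"
```

This prints one IP:port pair per line; repeated pairs collapse because they map to the same key.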

Note: awk/gawk solution will get the answer!

Solution EDIT: slightly modified Kent's solution to print unique destination ports as mentioned in my question and to skip the column header line.

awk '{ if ( NR == 1 ) next ; if ( a[$1] && a[$1] !~ $4 ) a[$1] = a[$1]","$4; else a[$1] = $4 } END {for(x in a)print x":"a[x]}'
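
One caveat with the one-liner above: a[$1] !~ $4 is a regular-expression match, so port 80 counts as already present once 8080 is in the list, and when the port is already present the else branch resets the list to just $4. A minimal sketch of one alternative, assuming a helper array called seen (a name chosen here, not from the original) that records each IP/port pair. The last two input rows are made up to exercise the de-duplication:

```shell
# Header plus rows based on the question; the 8080 row and the repeated
# port 53 row are hypothetical additions for illustration.
input='SourceIP SrcPort DstIP DstPort
192.168.1.195 59508 98.129.121.199 8080
192.168.1.195 59509 98.129.121.199 80
10.1.1.1 41170 199.253.249.63 53
10.1.1.1 62281 204.14.233.9 443
10.1.1.1 41171 199.253.249.63 53'

out=$(printf '%s\n' "$input" | awk '
NR == 1 { next }                        # skip the column header line
!seen[$1, $4]++ {                       # runs only the first time a pair appears
    # append with a comma unless the list for this IP is still empty
    a[$1] = (a[$1] == "" ? "" : a[$1] ",") $4
}
END { for (x in a) print x ":" a[x] }   # iteration order is unspecified
' | LC_ALL=C sort)
printf '%s\n' "$out"
```

Comparing against "" rather than testing a[$1] directly also keeps a port value of 0 from being treated as an empty list.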

Solution

Here is one way with awk:

 awk '{k=$1;a[k]=a[k]?a[k]","$4:$4}END{for(x in a)print x":"a[x]}' file

with your example, the output is:

kent$  awk '{k=$1;a[k]=a[k]?a[k]","$4:$4}END{for(x in a)print x":"a[x]}' file
192.168.1.195:80
192.168.1.87:53
10.1.1.1:53,443

(I omitted the title line)

EDIT

k=$1;a[k]=a[k]?a[k]","$4:$4

is exactly the same as:

if (a[$1])                   # if a[$1] is not empty
    a[$1] = a[$1]","$4       # concatenate $4 to it separated by ","
else                         # else if a[$1] is empty
    a[$1] = $4               # let a[$1]=$4

I used k=$1 just to save some typing, and the x = boolean ? a : b (ternary) expression for the same reason.

I hope this explanation helps you understand the code.
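
As a quick check of that equivalence, here is a small sketch that runs both forms on the question's sample rows (header omitted) and compares the results. The shell variable names are arbitrary, and sort is there only because the order of for (x in a) is unspecified:

```shell
input='192.168.1.195 59508 98.129.121.199 80
192.168.1.87 64802 192.168.1.2 53
10.1.1.1 41170 199.253.249.63 53
10.1.1.1 62281 204.14.233.9 443'

# Ternary form, as in the one-liner.
ternary=$(printf '%s\n' "$input" |
  awk '{k=$1; a[k] = a[k] ? a[k] "," $4 : $4} END {for (x in a) print x ":" a[x]}' |
  LC_ALL=C sort)

# Expanded if/else form.
ifelse=$(printf '%s\n' "$input" |
  awk '{ if (a[$1]) a[$1] = a[$1] "," $4; else a[$1] = $4 }
       END { for (x in a) print x ":" a[x] }' |
  LC_ALL=C sort)

printf '%s\n' "$ternary"
[ "$ternary" = "$ifelse" ] && echo "both forms agree"
```

Both forms test a[k] for truthiness rather than strictly for emptiness, so a numeric-looking value of 0 would also count as empty; that does not matter for the ports in this sample.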

OTHER TIPS

I prefer a solution using Perl because I like the possibilities it offers for creating data structures such as a hash of arrays:

perl -ane '
    ## Same BEGIN block as in AWK. It prints the header before processing any input.
    BEGIN { printf qq|%s:%s\n|, q|Source IP|, q|dstport| }

    ## Skip first input line (header).
    next if $. == 1;

    ## This is what you were trying to achieve. Store the source IP as the key
    ## of a hash and, instead of saving a string, save an array with all
    ## the ports.
    push @{ $ip{ $F[0] } }, $F[ 3 ]; 

    ## Same END block as in AWK. For each IP, get all ports saved in the array
    ## and join them using a comma.
    END { printf qq|%s:%s\n|, $_, join q|,|, @{ $ip{ $_ } } for keys %ip }

' infile

It yields:

Source IP:dstport
192.168.1.195:80
10.1.1.1:53,443
192.168.1.87:53
Licensed under: CC-BY-SA with attribution