Question

I have two files and I need to sort and merge the rows based on the time column:

File A:

"2014-02-26 16:03:04"   "Login Success|isNoSession=false"   id=csr,ou=user,dc=openam,dc=forgerock,dc=org    7efb2f0e035a0e3d01  10.17.174.30    INFO    dc=openam,dc=forgerock,dc=org   "cn=dsameuser,ou=DSAME Users,dc=openam,dc=forgerock,dc=org" AUTHENTICATION-100  DataStore   "Not Available" 10.17.174.30

File B:

"2014-02-26 16:02:27"   "Login Failed"  dennis  "Not Available" 10.17.174.30    INFO    dc=openam,dc=forgerock,dc=org   "cn=dsameuser,ou=DSAME Users,dc=openam,dc=forgerock,dc=org" AUTHENTICATION-200  DataStore   "Not Available" 10.17.174.30    
"2014-02-26 16:02:37"   "Login Failed"  purva   "Not Available" 10.17.174.30    INFO    dc=openam,dc=forgerock,dc=org   "cn=dsameuser,ou=DSAME Users,dc=openam,dc=forgerock,dc=org" AUTHENTICATION-200  DataStore   "Not Available" 10.17.174.30

I need to merge the files (pretty standard), but I have to insert the rows into the final file based on the time found in column 1. I have several other items to modify for each line, but I'm pretty sure I can figure those out. The sorting based on the time column has me stumped.

So in this case I would have a file with the line from File A at the end.

Other details.

Just to refresh myself on gawk I was working on parsing the first file. Here is what I have so far:

#!/bin/awk -f
BEGIN {
    FS="\t";
}
{
    # if we have more than 12 fields for the current row, proceed
    if ( NF > 12 )
    {
        # start looking for the user name
        n = split( $3, var1, ",");
        if (n > 4)
        {
            n2 = split (var1[1], var2, "=");
            if (n2 >= 2)
            {
                # Ignore any line where we do not have "id=xxxxx,..."
                if (var2[1] == "id")
                {
                    print $1, "N/A", "N/A", $12, $5, $5, var2[2]
                }
            }
        }
    }
}
END {
    print "Total Number of records=" NR
}

I probably need to move that into a function to make it easier since I'm going to be processing two files at the same time.
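For reference, the refactor described above might look something like this (a sketch only; the file names `fileA`/`fileB` and the function name `extract` are illustrative):

```shell
# The per-line extraction moves into an awk function so the same logic
# can be reused while reading both files in one pass.
awk -F'\t' '
function extract(    n, n2, var1, var2) {
    # start looking for the user name
    n = split($3, var1, ",")
    if (n > 4) {
        n2 = split(var1[1], var2, "=")
        # Ignore any line where we do not have "id=xxxxx,..."
        if (n2 >= 2 && var2[1] == "id")
            print $1, "N/A", "N/A", $12, $5, $5, var2[2]
    }
}
# if we have more than 12 fields for the current row, proceed
NF > 12 { extract() }
END { print "Total Number of records=" NR }
' fileA fileB
```

Local variables after the extra spaces in the parameter list keep `n`, `var1`, etc. from leaking into the global scope, which matters once the function is called for every line of two files.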


Solution

Based on the linux and bash tags, you can concatenate both files, sort them by the first field, and then apply your awk command to the result:

cat fileA fileB | sort -t$'\t' -s -k1,1 | awk -f script.awk
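For example, with reduced versions of the sample rows from the question (file names and trimmed content are illustrative):

```shell
# Demonstration of the concatenate-then-sort step. The quoted timestamps
# sort correctly as plain strings because the format is fixed-width
# year-month-day hour:minute:second.
printf '"2014-02-26 16:03:04"\tLogin Success\n' > fileA
printf '"2014-02-26 16:02:27"\tLogin Failed\n"2014-02-26 16:02:37"\tLogin Failed\n' > fileB
cat fileA fileB | sort -t$'\t' -s -k1,1
```

The File A row comes out last, as the question requires. The `-s` (stable) flag keeps rows with identical timestamps in their original relative order, and `-k1,1` restricts the comparison to the first tab-separated field.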

OTHER TIPS

It takes a little extra work, but if you'd like to do it completely in awk (GNU awk), you can use the mktime and strftime functions.

Here is a hint:

awk '{
    # Split the time field so that you have a pattern of YYYY MM DD HH MM SS
    split($0, t, /[-: ]/);
    patt = t[1] FS t[2] FS t[3] FS t[4] FS t[5] FS t[6];
    # Store the timestamp in an array keyed by its epoch value
    time[mktime(patt)]++
}
END {
    # Sort the array indices so that you get sorted time
    # (epoch values here are all ten digits, so string order matches numeric order)
    x = asorti(time, s_time)
    # Iterate over your new sorted array and print it in the desired format
    for(i=1; i<=x; i++) {
        print strftime("%Y-%m-%d %T", s_time[i])
    }
}' file

$ cat file
2014-02-26 16:03:04
2017-02-26 16:02:27
2012-02-26 16:02:37

$ awk '{
    split($0, t, /[-: ]/);
    patt = t[1] FS t[2] FS t[3] FS t[4] FS t[5] FS t[6];
    time[mktime(patt)]++
}
END {
    x = asorti(time, s_time)
    for(i=1; i<=x; i++) {
        print strftime("%Y-%m-%d %T",s_time[i])
    }
}' file
2012-02-26 16:02:37
2014-02-26 16:03:04
2017-02-26 16:02:27
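Taking the hint one step further, here is a sketch that merges whole lines (not just timestamps) entirely in GNU awk; `fileA`/`fileB` stand in for the question's files, and `@ind_num_asc` requires gawk 4.0 or later:

```shell
gawk -F'\t' '
{
    # Strip the quotes around the timestamp, then break it into
    # YYYY MM DD HH MM SS components for mktime
    ts = $1
    gsub(/"/, "", ts)
    split(ts, t, /[-: ]/)
    key = mktime(t[1] " " t[2] " " t[3] " " t[4] " " t[5] " " t[6])
    # Append rather than assign, so several records that share the
    # same second are all kept
    lines[key] = lines[key] $0 ORS
}
END {
    # Sort the epoch keys numerically and emit the lines in time order
    n = asorti(lines, order, "@ind_num_asc")
    for (i = 1; i <= n; i++)
        printf "%s", lines[order[i]]
}
' fileA fileB
```

This keeps the sort inside one gawk process, at the cost of holding both files in memory; for large logs the `cat | sort` pipeline above is the simpler choice.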
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow