Question

I have several txt files that record a timestamp and an IP address for every malware packet that arrives at a server. I want to create another txt file that shows, for every IP, the first time a malware packet arrived.

In general, I want to do something like this:

for every line in file.txt
    if (ip is not present in list.txt)
        copy timestamp and ip to list.txt

I'm using awk to do it. The main problem is the "if ip is not present in list.txt" check. I'm doing this:

 {    a=$( grep -w "$3" list.txt | wc -c );
    if ( a == 0 )
   {
     #copy timestamp and ip in list.txt
   }

(I'm using $3 because the IP address is in the third column of the source file.)

I don't know how to make awk evaluate the grep command. I've also tried backticks, but that didn't work. Could someone give me a hint?

I'm testing my script on a test file like this:

10  192.168.1.1
11  192.168.1.2
12  192.165.2.4
13  122.11.22.11    
13  192.168.1.1
13  192.168.1.2
13  122.11.22.11
14  122.11.22.11
15  122.11.22.11
15  122.11.22.144
15  122.11.2.11
15  122.11.22.111

What I should obtain is:

10  192.168.1.1
11  192.168.1.2
12  192.165.2.4
13  122.11.22.11    
15  122.11.22.144
15  122.11.2.11
15  122.11.22.111

Thanks to your help, I've succeeded in creating a script that fits my needs:

awk '
FILENAME == ARGV[1] {
    ip[$2] = 1
    next
}
! ($2 in ip) {
    print $1, $2 >> ARGV[1]
    ip[$2] = 1
}
' list.txt file.txt 

Solution

What you really want is to have awk read the list.txt file first, then process the other file with the list.txt data in memory. This avoids spawning an external command with system() for each line.

I assume the IP is in the 1st column of list.txt.

When you say "copy timestamp and ip in list.txt", I assume you want to append some info from the current line of file.txt to the list.txt file.

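awk '
    # First file (list.txt): remember every IP already listed.
    FILENAME == ARGV[1] {
        ip[$1] = 1
        next
    }
    # Second file: append IPs that are not in the list yet.
    ! ($3 in ip) {
        print $3, $(whatever_column_holds_timestamp) >> ARGV[1]
        ip[$3] = 1    # so a repeated IP in file.txt is appended only once
    }
' list.txt file.txt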
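Since the question mentions several input files, here is a minimal sketch of running the same two-file pattern once per input file, so that list.txt accumulates first sightings across all of them. It assumes the two-column layout of the sample data (timestamp in $1, IP in $2, the same order in list.txt); the names day1.txt, day2.txt and list_contents are made up for the demonstration. The list file is created empty up front because awk needs to be able to open it.

```shell
tmpdir=$(mktemp -d)
list="$tmpdir/list.txt"
: > "$list"    # start with an empty list so awk can open it

# Two hypothetical input files in the question's "timestamp  ip" layout.
printf '%s\n' '10  192.168.1.1' '11  192.168.1.2' '13  192.168.1.1' > "$tmpdir/day1.txt"
printf '%s\n' '20  192.168.1.2' '21  122.11.22.11' > "$tmpdir/day2.txt"

for f in "$tmpdir/day1.txt" "$tmpdir/day2.txt"; do
    awk '
        # First argument (the list): load the IPs that are already known.
        FILENAME == ARGV[1] { ip[$2] = 1; next }
        # Input file: append unseen IPs to the list and remember them.
        !($2 in ip)         { print $1, $2 >> ARGV[1]; ip[$2] = 1 }
    ' "$list" "$f"
done

list_contents=$(cat "$list")
printf '%s\n' "$list_contents"
rm -rf "$tmpdir"
```

Each awk run reads the list in full before it appends anything, so the entries written by one run are seen by the next.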

Given the sample file and simplified requirements of your question update:

awk '! seen[$2]++' filename

will produce the results you've shown. That awk program prints a line only if its IP has not been seen before.
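To unpack the one-liner: seen[$2]++ yields the old count (0 the first time an IP appears) and then increments it, so ! seen[$2]++ is true only on the first occurrence, and a true pattern with no action prints the line. Below is a sketch of the same logic written out in long form, run on a few lines of the sample data. The file name sample.txt and the variable first_seen are placeholders. Note one small difference: print $1, $2 joins the fields with a single space, whereas the one-liner prints each line unchanged.

```shell
tmpdir=$(mktemp -d)
# A few lines of the question's sample data (hypothetical file name).
printf '%s\n' \
    '10  192.168.1.1' \
    '11  192.168.1.2' \
    '13  192.168.1.1' \
    '13  122.11.22.11' \
    '14  122.11.22.11' \
    '15  122.11.22.111' > "$tmpdir/sample.txt"

# Long form of:  awk '! seen[$2]++'
first_seen=$(awk '
    !($2 in seen) {      # this IP has not been recorded yet
        seen[$2] = 1     # remember it
        print $1, $2     # timestamp and IP of its first appearance
    }
' "$tmpdir/sample.txt")

printf '%s\n' "$first_seen"
rm -rf "$tmpdir"
```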

OTHER TIPS

Interpreting the question as "How can I evaluate the status of a command from within awk?", just use system.

{
  if( system( "cmd" ) == 0 ) {
    # the command succeeded
  }
}

So, in your case, just do:

{
  if( system( "grep -w \"" $3 "\" list.txt > /dev/null " ) == 0 ) {
    ...
  }
}

You might want to reconsider your approach to the problem, though. Grepping each time is computationally expensive, and there are better ways to approach the problem. (Read list.txt once into an array, for example.)

Also, note that you do not need to use wc. grep fails if it doesn't match the string. Use the return value rather than parsing the output.
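As a rough sketch of using grep's exit status directly, the example below tests each IP against the list with system(). It assumes the two-column layout of the sample data (IP in $2) rather than $3, and the file contents and the variable new_ips are invented for the demonstration. The -q flag suppresses grep's output, and -F makes the dots in the address match literally instead of as regex wildcards; -w is the same flag the question already uses.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' '10 192.168.1.1' > "$tmpdir/list.txt"
printf '%s\n' '13 192.168.1.1' '12 192.165.2.4' > "$tmpdir/file.txt"

new_ips=$(awk -v list="$tmpdir/list.txt" '
    {
        # Build the shell command by concatenation; $2 is the awk field here.
        cmd = "grep -qwF -- \"" $2 "\" \"" list "\""
        if (system(cmd) != 0)    # non-zero exit status: IP not found in the list
            print $1, $2
    }
' "$tmpdir/file.txt")

printf '%s\n' "$new_ips"
rm -rf "$tmpdir"
```

Because the field value is pasted into a shell command line, this is only reasonable for input you trust, and it still starts one grep process per line.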

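This saves the command's output in the variable a. The command string has to be built by concatenation, because awk does not expand $3 inside a quoted string:

{
    cmd = "grep -w \"" $3 "\" list.txt | wc -c"
    cmd | getline a
    close(cmd)    # close the pipe so the command runs again for the next line
    print a
}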

You want to use getline:

BEGIN {
    "date" | getline current_time
     close("date")
     print "Report printed on " current_time
}

That takes the output of date and puts it into the current_time variable. You should be able to do the same with your grep | wc -c pipeline.
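Applied to the question, a minimal sketch of the getline approach might look like the following. It uses grep -c to count matching lines instead of piping to wc, assumes the two-column sample layout (IP in $2), and uses invented file contents and a made-up variable name, counts.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' '10 192.168.1.1' '11 192.168.1.2' > "$tmpdir/list.txt"
printf '%s\n' '13 192.168.1.1' '15 122.11.2.11' > "$tmpdir/file.txt"

counts=$(awk -v list="$tmpdir/list.txt" '
    {
        # grep -c prints the number of matching lines (0 when there are none).
        cmd = "grep -cwF -- \"" $2 "\" \"" list "\""
        n = 0
        cmd | getline n    # read the first line of the command output into n
        close(cmd)         # release the pipe before the next input line
        print $2, n
    }
' "$tmpdir/file.txt")

printf '%s\n' "$counts"
rm -rf "$tmpdir"
```

A count of 0 then plays the role of the a == 0 test in the question.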

Licensed under: CC-BY-SA with attribution