Question

I have three files and I would like to use awk to compare the first two and then update the last one with the names that have no match in the first file.

File 1: ignore.txt

bob
diana

File 2: list.txt

alice
bob
chris
diana
elvis

File 3: names.txt

alice
chris
elvis

File 2 will have new names added every little while, so I must be able to compare it with both other files and add any new names on that list to names.txt.

This is my script so far. Comparing list.txt and ignore.txt works, but it doesn't do the updating, because I still don't understand how to use getline correctly and compare the new array against the files in memory.

I'm calling the script like this: awk -f compare ignore.txt list.txt and it works. If I call it with the extra variable, it doesn't work: awk -f compare -v newnames=1 ignore.txt list.txt.

 BEGIN {
  file="list.txt"
  tmpfile="new_list.txt"
  }
# working
FNR == NR { names[$0]++; next }
!names[$0] {
   print > names.txt
}
{ #not working
if (newnames == 1) {
  mvcmd="mv " tmpfile file;
    while ((getline newnames < file) > 0)
      newnames[$0]++; next
     !newnames[$0];
        print > tmpfile 
     system(mvcmd)
     close(dbfile)
  }

How can I make it work? The code is not very good because I find it confusing. Sorry.


Solution 2

This is a very common use for awk:

$ awk 'FNR==NR{a[$0];next}!($0 in a)' file1 file2
alice
chris
elvis

It's easier to rewrite the whole of file3 than to update it in place:

$ awk 'FNR==NR{a[$0];next}!($0 in a)' file1 file2 > file3

Explanation:

NR is an awk variable incremented after every record is read; FNR is similar but gets reset to 1 every time a new file is read. NR==FNR can therefore only be true while reading the first file. While reading the first file we create an array a whose keys are the lines of that file. Besides storing every line from file1, this also removes any duplicates. next is a command that makes sure no further blocks get executed on the current record. Once file1 has been read, we just check whether the current line in file2 is found in the array (i.e. was in file1). The condition !($0 in a) has no block to execute, so by default awk executes {print $0}.
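The same one-liner, spread over several lines with comments, may make the steps easier to follow. This is only a sketch: the scratch directory, the `result` variable and the array name `ignore` are made up for the demonstration, and the file names and sample data are taken from the question.

```shell
#!/bin/sh
# Hypothetical demo data in a scratch directory, mirroring the question.
dir=$(mktemp -d)
printf '%s\n' bob diana > "$dir/ignore.txt"
printf '%s\n' alice bob chris diana elvis > "$dir/list.txt"

result=$(awk '
    # First file only (FNR == NR): remember each line as an array key,
    # then skip the remaining rules for this record.
    FNR == NR { ignore[$0]; next }
    # Later files: a pattern with no action prints the record,
    # so this prints every line that is not a key in the array.
    !($0 in ignore)
' "$dir/ignore.txt" "$dir/list.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```

One caveat worth keeping in mind: if the first file is empty, FNR == NR also holds while the second file is being read, so the second file would be loaded into the array instead of being filtered.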

There is plenty wrong with your script; if you want to learn awk, the best thing to do would be to read Effective Awk Programming.

OTHER TIPS

This is what you need, given how you describe the problem:

awk 'FILENAME!=ARGV[3]{seen[$0]++;next} !seen[$0]++' file1 file3 file2

To update file3 with the output:

awk 'FILENAME!=ARGV[3]{seen[$0]++;next} !seen[$0]++' file1 file3 file2 >> file3

It will even remove duplicate new names from file2:

$ cat file1
bob
diana

$ cat file2
alice
bill
bob
chris
ted
diana
elvis
ted

$ cat file3
alice
chris
elvis

$ awk 'FILENAME!=ARGV[3]{seen[$0]++;next} !seen[$0]++' file1 file3 file2
bill
ted
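As a rough illustration of how the appending variant behaves when it is run repeatedly, the sketch below wraps it in a shell function and calls it twice. The function name `update`, the scratch directory and the `final` variable are assumptions made for this example; the data is the sample shown above.

```shell
#!/bin/sh
# Hypothetical scratch copies of the three sample files.
dir=$(mktemp -d)
printf '%s\n' bob diana > "$dir/file1"
printf '%s\n' alice bill bob chris ted diana elvis ted > "$dir/file2"
printf '%s\n' alice chris elvis > "$dir/file3"

update() {
    awk '
        # Every file except the third argument (file1 and file3):
        # mark each line as already seen and print nothing.
        FILENAME != ARGV[3] { seen[$0]++; next }
        # Third argument (file2): print a line only the first time
        # it shows up anywhere, which also drops repeats within file2.
        !seen[$0]++
    ' "$dir/file1" "$dir/file3" "$dir/file2" >> "$dir/file3"
}

update   # appends bill and ted
update   # appends nothing, because file3 now contains them

final=$(cat "$dir/file3")
printf '%s\n' "$final"
rm -rf "$dir"
```

Because file3 is read to the end before any line of file2 is processed, appending to it in the same command should not feed the new names back into the input.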

If all of the values in file3 also exist in file2 but duplicates in file2 are possible then this is all you need:

awk 'NR==FNR{seen[$0]++;next} !seen[$0]++' file1 file2 > file3

If all of the values in file3 also exist in file2 and duplicates in file2 are not possible, @sudo_O's solution will work just fine.

Here is a way to do this using grep:

grep -v -f names.txt <(grep -v -f ignore.txt list.txt) >>  names.txt

This would work even if names.txt does not exist to begin with. (Of course, it'd update names.txt if new additions are made to list.txt and the command is executed again.)
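One thing to watch with the grep approach: by default the lines read with -f are treated as regular expressions and may match anywhere in a line, so an entry such as bob would also filter out bobby. A possible variant, sketched here with the standard -F (fixed strings) and -x (whole-line match) options and a plain pipe in place of the process substitution, is shown below. The name bobby, the scratch directory and the `updated` variable are invented for the demonstration.

```shell
#!/bin/sh
# Hypothetical demo data; "bobby" is added to show the whole-line match.
dir=$(mktemp -d)
printf '%s\n' bob > "$dir/ignore.txt"
printf '%s\n' alice bob bobby > "$dir/list.txt"
printf '%s\n' alice > "$dir/names.txt"

# -F: patterns are literal strings, -x: a pattern must match the whole line,
# -v: keep the lines that match no pattern.
grep -Fxv -f "$dir/ignore.txt" "$dir/list.txt" |
    grep -Fxv -f "$dir/names.txt" >> "$dir/names.txt"

updated=$(cat "$dir/names.txt")
printf '%s\n' "$updated"
rm -rf "$dir"
```

Without -F and -x, bobby would be dropped here because it contains bob.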

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow