Join multiple files in gawk

Question 1

For given sample files:

$ head f*
==> f1 <==
a 123
b 221
c 904

==> f2 <==
a 298
b 230
c 102

==> f3 <==
a 500
b 600
c 700

Method 1:

$ awk '{a[FNR]=((a[FNR])?a[FNR]FS$2:$0)}END{for(i=1;i<=FNR;i++) print a[i]}' f*
a 123 298 500
b 221 230 600
c 904 102 700
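
The one-liner above is dense, so here is a commented sketch of the same idea that can be run on its own. It recreates the three sample files in a scratch directory (the directory and the variable names dir, row, max and merged are my own choices, not part of the original answer). It tests array membership with "in" instead of relying on the truth value of the stored string, which is a small change from the original.

```shell
# Recreate the sample files in a scratch directory (illustrative setup).
dir=$(mktemp -d)
printf 'a 123\nb 221\nc 904\n' > "$dir/f1"
printf 'a 298\nb 230\nc 102\n' > "$dir/f2"
printf 'a 500\nb 600\nc 700\n' > "$dir/f3"

merged=$(awk '
  FNR > max    { max = FNR }                        # remember the longest file seen
  (FNR in row) { row[FNR] = row[FNR] FS $2; next }  # later files: append the value
               { row[FNR] = $0 }                    # first file: keep key and value
  END          { for (i = 1; i <= max; i++) print row[i] }
' "$dir/f1" "$dir/f2" "$dir/f3")

printf '%s\n' "$merged"
```

Rows are matched by line number (FNR), so this only works when every file lists the keys in the same order.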

Method 2 (will probably be faster, as you are not holding the contents of all 500 files in memory):

Use paste and awk together (assuming the first column is the same and present in all files). Running paste f* gives the following result:

$ paste f*
a 123   a 298   a 500
b 221   b 230   b 600
c 904   c 102   c 700

Pipe that to awk to drop the repeated key columns.

$ paste f* | awk '{printf "%s ",$1;for(i=2;i<=NF;i+=2) printf "%s%s",$i,(i==NF?RS:FS)}'
a 123 298 500
b 221 230 600
c 904 102 700

You can redirect the output to another file.
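
Method 2 depends on the keys lining up row by row. As a rough sketch, the awk stage can check that assumption before printing. The two small files here are made up for illustration (the second one has a deliberately wrong key), and the wording of the message is arbitrary.

```shell
# Illustrative input: the second key in f2 deliberately differs.
dir=$(mktemp -d)
printf 'a 123\nb 221\n' > "$dir/f1"
printf 'a 298\nx 230\n' > "$dir/f2"

report=$(paste "$dir/f1" "$dir/f2" | awk '
  {
    # Keys sit in the odd-numbered fields: $1, $3, $5, ...
    for (i = 3; i <= NF; i += 2)
      if ($i != $1) {
        print "line " NR ": key mismatch (" $1 " vs " $i ")"
        next
      }
    # Keys agree: keep the first key and every even-numbered value field.
    out = $1
    for (i = 2; i <= NF; i += 2) out = out FS $i
    print out
  }')

printf '%s\n' "$report"
```

A mismatch is reported in place of the joined row, so a bad input shows up in the output instead of silently producing wrong columns.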

Question 2

I have encountered this problem very frequently.

I strongly encourage you to look into the getline command in gawk. The syntax

getline var < filename

reads the next record from filename into var, and it can be used to solve your problem.

That said, another language may solve this problem much more easily. In gawk, I typically need about five lines of code for this standard problem:

# getline returns 1 on success, 0 at end of file, and -1 on error
while ((getline x < "filename") > 0) {
      ... (Commands involving x such as split and print).
}
close("filename");
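
To make the getline pattern concrete, here is a minimal sketch that reads a second file in lockstep with the main input and appends its second column. The file names, the awk variable other, and the names line, parts and joined are assumptions for illustration only.

```shell
# Illustrative input files in a scratch directory.
dir=$(mktemp -d)
printf 'a 123\nb 221\nc 904\n' > "$dir/f1"
printf 'a 298\nb 230\nc 102\n' > "$dir/f2"

joined=$(awk -v other="$dir/f2" '
  {
    # getline returns 1 on success, 0 at end of file, -1 on error.
    if ((getline line < other) > 0) {
      split(line, parts, " ")   # parts[1] is the key, parts[2] the value
      print $0, parts[2]
    } else {
      print $0                  # other file ran out or could not be read
    }
  }
  END { close(other) }
' "$dir/f1")

printf '%s\n' "$joined"
```

Comparing the return value with "> 0" matters: a bare "while (getline x < file)" loops forever if the file cannot be opened, because -1 counts as true.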

Question 3

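You could try something like this (note that for (i in a) visits keys in an unspecified order, so the lines may not come out sorted):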

$ ls
f1.txt  f2.txt  f3.txt
$ awk '($0 !~ /#/){a[$1]=a[$1]" "$2} END {for(i in a){print i""a[i]}}' *.txt
a 123 298 299
b 221 230 231
c 904 102 103
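
If the output order matters, one possible variation is to record each key the first time it appears and print in that order. This is only a sketch: the sample files and the names vals, order, n and ordered are invented for illustration.

```shell
# Illustrative input: keys arrive as b, a; the first file has a comment line.
dir=$(mktemp -d)
printf '# header\nb 221\na 123\n' > "$dir/f1.txt"
printf 'b 230\na 298\n' > "$dir/f2.txt"

ordered=$(awk '
  /#/           { next }                          # skip lines containing #, like the original
  !($1 in vals) { order[++n] = $1 }               # remember each key once, in arrival order
                { vals[$1] = vals[$1] " " $2 }    # append this value to the key
  END           { for (i = 1; i <= n; i++) print order[i] vals[order[i]] }
' "$dir/f1.txt" "$dir/f2.txt")

printf '%s\n' "$ordered"
```

Because this matches on the key and not on the line number, the files do not need to list the keys in the same order.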

Question 4

awk 'FNR==NR{arr[$1]=$2; next} {print $1, arr[$1], $2}' file1 file2

based on this
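
For reference, here is a small runnable sketch of the same two-file idiom that also handles keys missing from the first file. The sample data and the "-" placeholder are my own assumptions, not part of the original answer.

```shell
# Illustrative input: key d exists only in file2.
dir=$(mktemp -d)
printf 'a 123\nb 221\n' > "$dir/file1"
printf 'a 298\nb 230\nd 999\n' > "$dir/file2"

result=$(awk '
  # FNR == NR holds only while the first file is being read.
  FNR == NR { arr[$1] = $2; next }                  # build the lookup table
  { print $1, (($1 in arr) ? arr[$1] : "-"), $2 }   # "-" is an arbitrary placeholder
' "$dir/file1" "$dir/file2")

printf '%s\n' "$result"
```

One caveat: if file1 is empty, FNR == NR also holds for every line of file2, so nothing is printed.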