Question

I have 2 files File1 and File2. File1 has some values separated by "|". For example,

A|a
C|c
F|f

File2 also has some values separated by "|". For example,

a|1
b|2
c|3
d|4
e|5

Means 2nd column in File1 is resembled with 1st column of File2.

I have to create 3rd file File3 with expected output

A|a|1
C|c|3

I tried to take each record in loop and searched for that in File2 using "awk".

It worked, but the problem is both File1 and File2 are having more than 5 million records.

I need an optimized solution.

Was it helpful?

Solution

You can use this awk,

awk -F'|' 'NR==FNR{a[$2]=$1;next} $1 in a { print a[$1],$1,$2 }' OFS="|" file1 file2 > file3

More clearer way:

awk  'BEGIN{ OFS=FS="|";} NR==FNR{a[$2]=$1;next} $1 in a { print a[$1],$1,$2 }' file1 file2 > file3

As per @Kent suggestion:

If your file2 have more than two columns that you want it in file3 then,

awk  'BEGIN{ OFS=FS="|";} NR==FNR{a[$2]=$1;next} $1 in a { print a[$1],$0 }' file1 file2 > file3

Here,

FS - Field Separator

OFS - Output Field Separator

OTHER TIPS

This is what join was created to do:

$ join -t '|' -o '1.1,1.2,2.2' -1 2 -2 1 file1 file2
A|a|1
C|c|3

man join for more details and pay particular attention to the files needing to be sorted on the join fields (i.e. 2nd field for file1 and 1st field for file2), as your posted sample input is.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top