Вопрос

I have 2 files File1 and File2. File1 has some values separated by "|". For example,

A|a
C|c
F|f

File2 also has some values separated by "|". For example,

a|1
b|2
c|3
d|4
e|5

Means 2nd column in File1 is resembled with 1st column of File2.

I have to create 3rd file File3 with expected output

A|a|1
C|c|3

I tried to take each record in loop and searched for that in File2 using "awk".

It worked, but the problem is both File1 and File2 are having more than 5 million records.

I need an optimized solution.

Это было полезно?

Решение

You can use this awk,

awk -F'|' 'NR==FNR{a[$2]=$1;next} $1 in a { print a[$1],$1,$2 }' OFS="|" file1 file2 > file3

More clearer way:

awk  'BEGIN{ OFS=FS="|";} NR==FNR{a[$2]=$1;next} $1 in a { print a[$1],$1,$2 }' file1 file2 > file3

As per @Kent suggestion:

If your file2 have more than two columns that you want it in file3 then,

awk  'BEGIN{ OFS=FS="|";} NR==FNR{a[$2]=$1;next} $1 in a { print a[$1],$0 }' file1 file2 > file3

Here,

FS - Field Separator

OFS - Output Field Separator

Другие советы

This is what join was created to do:

$ join -t '|' -o '1.1,1.2,2.2' -1 2 -2 1 file1 file2
A|a|1
C|c|3

man join for more details and pay particular attention to the files needing to be sorted on the join fields (i.e. 2nd field for file1 and 1st field for file2), as your posted sample input is.

Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top