As @fedorqui commented, your example inputs/output are not consistent. I think this should do the trick though:
awk 'NR==FNR{a[$2]=$0; next} a[$2]>0{print a[$2],$1}' file1 file2
file1:
A alice
B bob
C carol
D dan
file2:
1 dan
2 alice
3 carol
4 bob
Output:
$ awk 'NR==FNR{a[$2]=$0; next} a[$2]>0{print a[$2],$1}' file1 file2
D dan 1
A alice 2
C carol 3
B bob 4
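A self-contained sketch for trying this out. It assumes a POSIX shell with awk, sort and mktemp available. The script recreates the two sample files in a temporary directory, runs the same join, and then sorts the joined lines on their first column with sort -k1,1. The variable names joined and sorted are only illustrative.

```shell
#!/bin/sh
set -eu

# Scratch directory for the sample files, removed when the script exits
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' 'A alice' 'B bob' 'C carol' 'D dan' > "$tmp/file1"
printf '%s\n' '1 dan' '2 alice' '3 carol' '4 bob' > "$tmp/file2"

# The join from the answer: lines come out in file2 order
joined=$(awk 'NR==FNR{a[$2]=$0; next} a[$2]>0{print a[$2],$1}' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$joined"

# Re-order on the first column; LC_ALL=C keeps the ordering locale-independent
sorted=$(printf '%s\n' "$joined" | LC_ALL=C sort -k1,1)
printf '%s\n' "$sorted"
```

For a numeric key such as the appended number, sort -k3,3n would be the equivalent.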
Output can be sorted by whatever column you choose using sort -k.
Breaking down the awk code:
- NR==FNR{a[$2]=$0; next} - NR is an awk variable which contains the total number of lines processed so far. FNR is similar, but only contains the number of lines processed in the current file, so this condition effectively means "only do this for the first input file" (as long as that file is not empty). The associated action stores the entire line ($0) in the associative array a, with the index being the value of the second field of the row. next just means that awk should move on to the next line without doing further processing.
- a[$2]>0{print a[$2],$1} - this condition will only be tested for the second and subsequent input files, because next skips it for every line of the first file. The second field is used as an index to look up a value in the array a. If that key was stored from the first file, the value is the whole line, a non-numeric string, so awk compares it as a string against "0"; that test is true for lines like these, which start with a letter. A key that was never stored gives an empty value, which compares as 0 and fails. When the test is true, the stored line is printed, followed by the first field of the current line.
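To see how NR and FNR diverge, here is a small sketch that prints both counters for every line of two hypothetical files. It also shows the empty-first-file caveat: when the first file has no lines, NR==FNR is also true for every line of the second file, so next sends them all down the "first file" branch.

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' 'A alice' 'B bob' > "$tmp/file1"
printf '%s\n' '1 dan' '2 alice' '3 carol' > "$tmp/file2"
: > "$tmp/empty"   # a zero-length file

# NR keeps counting across files; FNR restarts at 1 for each new file
counters=$(awk '{print NR, FNR, (NR==FNR ? "first" : "other")}' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$counters"

# With an empty first file, NR equals FNR throughout file2 as well
pitfall=$(awk 'NR==FNR{print "first:", $0; next} {print "other:", $0}' "$tmp/empty" "$tmp/file2")
printf '%s\n' "$pitfall"
```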
Basically, every line of the first file is stored in an array, indexed by the second field of the line. If the second field of a line in the second file matches one of those keys, the whole line from the first file is printed, with the first field from the second file appended.
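A possible variant, not the command used above: replace a[$2]>0 with the membership test ($2 in a). Under the string-comparison rule described above, a stored line that begins with a character sorting below "0" (a "-" or a leading space, for example) should make a[$2]>0 false even though the key exists. Merely referencing a[$2] also creates an empty entry for every unmatched key. The in operator only checks whether the key exists, so neither issue applies. The rows "-x eve" and "3 zoe" below are hypothetical, added to exercise those two cases.

```shell
#!/bin/sh
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical data: one file1 line starts with "-", one file2 key (zoe) has no match
printf '%s\n' 'A alice' 'B bob' '-x eve' > "$tmp/file1"
printf '%s\n' '1 eve' '2 alice' '3 zoe' '4 bob' > "$tmp/file2"

# ($2 in a) tests key membership only; it does not inspect the stored text
joined=$(awk 'NR==FNR{a[$2]=$0; next} ($2 in a){print a[$2], $1}' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$joined"
```

Unmatched keys such as zoe are still skipped, so on the sample files from the question this prints the same four lines as the original command.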