Compare two columns in different files and append data for the shared items - UNIX

Question 1

As @fedorqui commented, your example inputs/output are not consistent. I think this should do the trick though:

awk 'NR==FNR{a[$2]=$0; next} a[$2]>0{print a[$2],$1}' file1 file2

file1:

A alice
B bob
C carol
D dan

file2:

1 dan
2 alice
3 carol
4 bob

Output:

$ awk 'NR==FNR{a[$2]=$0; next} a[$2]>0{print a[$2],$1}' file1 file2
D dan 1
A alice 2
C carol 3
B bob 4

Output can be sorted by whatever column you choose using sort -k. Breaking down the awk code:

NR==FNR{a[$2]=$0; next} - NR is an awk variable which contains the total number of lines processed so far. FNR is similar, but only contains the number of processed lines of the current file, so this condition effectively means "only do this for the first input file". The associated action stores the entire line ($0) in the associative array a, with the index being the value of the second field of the line. next just means that awk should move to the next line without doing further processing.
a[$2]>0{print a[$2],$1} - because of the next above, this condition is only tested for the second and subsequent input files. The second field is used as an index to look up a value in the array a. A stored line is a non-empty string, which compares as greater than 0 here, while a key that was never stored yields an empty value, which does not. On a match, the stored line is printed, followed by the first field of the current line.

Basically, every line of the first file is stored in an array, indexed by the second field of the line. If that second field is matched in the second file, then the whole line from the first file is printed, with the first field from the second file appended.
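One caveat with the a[$2]>0 test: it is a string comparison against "0", so a stored line that begins with a character sorting before "0" (a "#" or a leading space, for instance) may be skipped even though its key matches. A minimal sketch of a variant that tests membership with awk's in operator instead, which also avoids creating empty array entries on lookup. The scratch directory and the result/sorted variables are only there for illustration, and the last step shows the sort -k idea on the first column:

```shell
# Recreate the sample inputs in a scratch directory (illustrative only).
tmp=$(mktemp -d)
printf '%s\n' 'A alice' 'B bob' 'C carol' 'D dan' > "$tmp/file1"
printf '%s\n' '1 dan' '2 alice' '3 carol' '4 bob' > "$tmp/file2"

# First file: remember each line under its second field.
# Later files: if the key was stored, print the stored line plus field 1.
result=$(awk 'NR==FNR { a[$2] = $0; next }
              ($2 in a) { print a[$2], $1 }' "$tmp/file1" "$tmp/file2")

# The output follows file2's order; sort on column 1 to get file1's order.
sorted=$(printf '%s\n' "$result" | sort -k1,1)

printf '%s\n' "$result"
printf '%s\n' "$sorted"
rm -rf "$tmp"
```

For the sample files this prints the same four lines as the original command, first in file2 order and then sorted by the first column.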

Question 2

You can use the join command to combine lines of two files on a common field. The simplest form, joining on the second field of both files, is:

 join -j 2

To get output with the first and second fields of the first file followed by the first field of the second file, use the -o option as follows:

 join -j 2 -o 1.1 1.2 2.1

Both files must be sorted on the second field. If they are not, you can sort them on the fly with bash process substitution:

 join -j 2 -o 1.1 1.2 2.1 <(sort -k2 1) <(sort -k2 2)

Supposing this is the content of two files:

 $ cat 1
 FIRST   first
 SECOND  second
 THIRD   third
 FOURTH  fourth
 $ cat 2
 2       second
 1       first
 3       third
 4       fourth

Note: the field separator on each line of both files is the TAB character.

The final result:

 $ join -j 2 -o 1.1 1.2 2.1 <(sort -k2 1) <(sort -k2 2)
 FIRST first 1
 FOURTH fourth 4
 SECOND second 2
 THIRD third 3
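Process substitution is a bash feature. In a plain POSIX shell, roughly the same result can be had by sorting into temporary files first. This is a sketch under a few assumptions: the two sample files are recreated in a scratch directory, the names 1.sorted and 2.sorted and the joined variable are made up for illustration, -1 2 -2 2 is used as the equivalent of -j 2, and the -o list is written in its comma-separated form:

```shell
# Recreate the two TAB-separated sample files (illustrative only).
tmp=$(mktemp -d)
printf 'FIRST\tfirst\nSECOND\tsecond\nTHIRD\tthird\nFOURTH\tfourth\n' > "$tmp/1"
printf '2\tsecond\n1\tfirst\n3\tthird\n4\tfourth\n' > "$tmp/2"

# join needs both inputs sorted on the join field, here field 2 only.
sort -k2,2 "$tmp/1" > "$tmp/1.sorted"
sort -k2,2 "$tmp/2" > "$tmp/2.sorted"

# Join on field 2 of each file; print file1 fields 1-2, then file2 field 1.
joined=$(join -1 2 -2 2 -o 1.1,1.2,2.1 "$tmp/1.sorted" "$tmp/2.sorted")

printf '%s\n' "$joined"
rm -rf "$tmp"
```

Without a -t option, join writes its output fields separated by single spaces, which is why the result is space-separated even though the inputs use TABs.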