Question

I have two files, each one containing a single column:

x_a
x_b
x_d
x_e

and

y_a
y_c
y_e

What is the best way to paste the two files together to obtain a result like

x_a   y_a
x_b
      y_c
x_d
x_e   y_e

or even

x_a   y_a
x_b   ???
???   y_c
x_d   ???
x_e   y_e

Is there a bash / awk / sed solution for this? The files have thousands of lines, and the parts I want to match on (a, b, c, d, e) are actually long strings of numbers (they are coordinates).

Thank you in advance for your help. I tried inserting the data into a MySQL table and doing a join, but there is no column in common.

EDIT: All the answers were quick, complete and working. Thank you, everybody, for your time.

Solution

Here is one solution:

awk -F_ 'NR==FNR {x[$2]=$0;a[$2]=1;next} {y[$2]=$0;a[$2]=1} END {for (i in a) print x[i] "\t" y[i]}' file1 file2
x_a     y_a
x_b
        y_c
x_d
x_e     y_e

And with ??? as a placeholder for the missing entries:

awk -F_ 'NR==FNR {x[$2]=$0;a[$2]=1;next} {y[$2]=$0;a[$2]=1} END {for (i in a) print (x[i]?x[i]:"???"),(y[i]?y[i]:"???")}' file1 file2
x_a y_a
x_b ???
??? y_c
x_d ???
x_e y_e

How does it work?

awk -F_ '                           # Split fields on _
    NR==FNR {                       # True only while reading the first file
        x[$2]=$0                    # Store the line in array x, keyed by field 2
        a[$2]=1                     # Record the key in array a
        next                        # Skip to the next record
    }
    {                               # Reached only for lines of the second file
        y[$2]=$0                    # Store the line in array y, keyed by field 2
        a[$2]=1                     # Record the key in array a
    }
    END {                           # After all input has been read
        for (i in a)                # Loop over every unique key (order is unspecified)
            print x[i] "\t" y[i]    # Print the lines stored in x and y for that key
    }' file1 file2                  # The two input files
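
One caveat with the approach above: awk does not define the order in which for (i in a) visits the keys, so the rows can come out in a different order than shown. Here is a minimal sketch of one way around that, assuming plain lexical order on the key is what you want: print the key as an extra leading column, sort on it, then cut it off again. The function name merge_by_key and its optional third (placeholder) argument are made up for illustration.

```shell
# merge_by_key FILE1 FILE2 [PLACEHOLDER]   (hypothetical helper name)
# Prints "line-from-FILE1 <TAB> line-from-FILE2", one row per key, sorted by key.
merge_by_key() (
    tab=$(printf '\t')
    awk -F_ -v ph="${3-}" '
        NR == FNR { x[$2] = $0; seen[$2] = 1; next }   # lines of the first file
                  { y[$2] = $0; seen[$2] = 1 }         # lines of the second file
        END {
            for (k in seen)   # visited in unspecified order; sort fixes that below
                print k "\t" ((k in x) ? x[k] : ph) "\t" ((k in y) ? y[k] : ph)
        }' "$1" "$2" |
    sort -t "$tab" -k1,1 |   # order the rows by the key column
    cut -f2-                 # drop the key column again
)
```

Called as merge_by_key file1 file2 '???', this should produce the second layout from the question; leaving out the third argument leaves the missing cells empty. For numeric keys you would probably want a numeric sort (for example sort -k1,1n) instead of the lexical one.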

Other answers

Try this awk:

awk -F '_' 'NR==FNR {a[$2]=$0; next} $2 in a{print a[$2], $0; delete a[$2]; next}
           !($2 in a){print "???", $0} END{for (i in a) print a[i], "???"}' f1 f2
x_a y_a
??? y_c
x_e y_e
x_d ???
x_b ???

If you still want to use MySQL: yes, there is a column in common, but you need to create it artificially. Run sed -e 's/_\(.*\)$/& \1/' on each file and you will get output like:

x_a a
x_b b
x_d d
x_e e

You can then use the second column as the join key.

This gets us most of the way there:

$ join -t_ -j 2 -o 1.1,1.2,2.1,2.2 -e " " -a1 -a2 file1 file2
x_a_y_a
x_b_ _ 
 _ _y_c
x_d_ _ 
x_e_y_e

Pipe it through sed to remove the stray underscores (note that these patterns assume single-character fields, as in the sample data):

$ join -t_ -j 2 -o 1.1,1.2,2.1,2.2 -e " " -a1 -a2 file1 file2 |
  sed 's/^\(._.\)_/\1 /;s/^ _ /   /; s/ _ $//'
x_a y_a
x_b 
    y_c
x_d 
x_e y_e

For question marks, use sed 's/^\(._.\)_/\1 /;s/^ _ /???/; s/ _ $/???/'

join requires the files to be sorted by the join field, and in this case they are. If they are not:

$ join -t_ -j 2 -o 1.1,1.2,2.1,2.2 -e " " -a1 -a2 <(sort -t_ -k2,2 file1) <(sort -t_ -k2,2 file2) |
  sed 's/^\(._.\)_/\1 /;s/^ _ /   /; s/ _ $//'
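
The sed clean-up above is tied to the one-character fields of the sample. Here is a rough sketch of an alternative, assuming the key is still the second _-separated field: put the key into its own tab-delimited column first, so that join can fill in the placeholder itself and no underscores need to be patched afterwards. join_by_key is a made-up name, and temporary files are used instead of <(...) only so that the sketch also runs under plain sh.

```shell
# join_by_key FILE1 FILE2 [PLACEHOLDER]   (hypothetical helper name)
# Prints "line-from-FILE1 <TAB> line-from-FILE2"; missing sides get PLACEHOLDER.
join_by_key() (
    ph="${3-???}"                       # placeholder defaults to ???
    tab=$(printf '\t')
    tmp=$(mktemp -d) || exit 1
    # Prefix each line with its key (field 2 when split on _), then sort by key.
    awk -F_ -v OFS="$tab" '{ print $2, $0 }' "$1" | sort -t "$tab" -k1,1 > "$tmp/1"
    awk -F_ -v OFS="$tab" '{ print $2, $0 }' "$2" | sort -t "$tab" -k1,1 > "$tmp/2"
    # Full outer join on the key column; output only the two original lines.
    join -t "$tab" -a1 -a2 -e "$ph" -o 1.2,2.2 "$tmp/1" "$tmp/2"
    rm -rf "$tmp"
)
```

For example, join_by_key file1 file2 prints the ??? layout from the question, and join_by_key file1 file2 ' ' leaves a blank in the missing cells.
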
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow