merge similar files in awk

https://stackoverflow.com/questions/23365136

merge
awk

11-07-2023

Question

I have the following files

File A

Kmax Event File - Text Format
1 6 1000 
1 4143 9256 13645 16426 20490 
49 4144 8820 14751 16529 20505 
45 4146 8308 12303 16912 22715 
75 4139 9049 14408 16447 20480 
23 4137 8449 13223 16511 20498 
22 4142 8795 14955 16615 20493

File B

Kmax Event File - Text Format
1 6 1000 
42 4143 9203 13401 16475 20480 
44 4140 8251 12302 16932 21872 
849 6283 8455 12301 16415 20673 
18 4148 8238 12757 16597 20484 
19 4144 8268 12306 17110 21868 
50 4134 8331 12663 16501 20606 
988 5682 8296 12306 16577 20592 
61 4147 8330 12307 16945 22497 
0 4138 8333 12310 16871 22749

File C, File D, ... and all those files have exactly the same format. The files are named run, run%1, run%2, run%3, run%4, and so on; the numbering can go as high as 30 (run%30).

What I'd like to do is to merge the files in the following way

Kmax Event File - Text Format
1 6 1000 
1 4143 9256 13645 16426 20490 
49 4144 8820 14751 16529 20505 
45 4146 8308 12303 16912 22715 
75 4139 9049 14408 16447 20480 
23 4137 8449 13223 16511 20498 
22 4142 8795 14955 16615 20493
42 4143 9203 13401 16475 20480 
44 4140 8251 12302 16932 21872 
849 6283 8455 12301 16415 20673 
18 4148 8238 12757 16597 20484 
19 4144 8268 12306 17110 21868 
50 4134 8331 12663 16501 20606 
988 5682 8296 12306 16577 20592 
61 4147 8330 12307 16945 22497 
0 4138 8333 12310 16871 22749

I believe I can do it using

awk '{i=$1;sub(i,x);A[i]=A[i]$0} FILENAME==ARGV[ARGC-1]{print i A[i]}'

but this way the first two lines of the second file will still appear in the output. I'm also not sure the command above works at all. The real problem is that I need to merge many files at once. Any idea how to merge these almost identical files?

Solution

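Using grouping braces in the shell. sed numbers lines across all of its input files as one stream, so it is run once per file to strip each file's two header lines:

{ cat run; for f in run%*; do sed '1,2d' "$f"; done; } > c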

OTHER TIPS

You can use cat and tail:

cat A > C && tail -n +3 B >> C

This merges files A and B into a new file named C. Using awk:

awk 'FNR==NR{print; next} FNR>2' A B > C

To merge more than two files with the awk version, list them after A B, e.g. A B D. C is the output file that receives the merged data.

With the cat and tail version, repeat the tail part for each additional file, e.g.

cat A > C && tail -n +3 B >> C && tail -n +3 D >> C

or write a loop that iterates over the files.
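Such a loop could look roughly like the sketch below. It assumes the naming from the question (run, run%1, run%2, ...); the scratch directory, the shortened sample rows, and the output name merged are made up for the demo.

```shell
#!/bin/sh
# Sketch: keep the two header lines once, append only the data rows
# of every run%N file. The sample data below is hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'Kmax Event File - Text Format\n1 6 1000\n1 4143\n49 4144\n' > run
printf 'Kmax Event File - Text Format\n1 6 1000\n42 4143\n' > 'run%1'
printf 'Kmax Event File - Text Format\n1 6 1000\n18 4148\n' > 'run%2'

# The first file is copied whole, header included.
cat run > merged

# Every other file contributes line 3 onward.
for f in run%*; do
    tail -n +3 "$f" >> merged
done

cat merged
```

The output name must not match the run%* pattern, otherwise the loop would try to append the output file to itself.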

Print all lines from the first file (NR==FNR) and only line 3 and on from the rest of the files (FNR>2):

awk 'NR==FNR||FNR>2' run*
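One caveat with the glob: the shell expands run* in lexical order, so run%10 sorts before run%2. If the rows have to appear in numeric file order (an assumption, the question does not say), the argument list can be built explicitly. A minimal sketch, assuming the numbering is contiguous and starts at 1; the sample data and the output name merged are hypothetical:

```shell
#!/bin/sh
# Sketch: pass the files to awk in numeric order instead of glob order.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical demo data: run plus run%1 .. run%12, one data row each.
printf 'Kmax Event File - Text Format\n1 6 1000\n0 4143\n' > run
i=1
while [ "$i" -le 12 ]; do
    printf 'Kmax Event File - Text Format\n1 6 1000\n%d 4143\n' "$i" > "run%$i"
    i=$((i + 1))
done

# Collect run, run%1, run%2, ... as positional parameters, stopping
# at the first missing number.
set -- run
i=1
while [ -e "run%$i" ]; do
    set -- "$@" "run%$i"
    i=$((i + 1))
done

# All lines of the first file, then line 3 onward of the others.
awk 'NR==FNR || FNR>2' "$@" > merged

cat merged
```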

Licensed under: CC-BY-SA with attribution
