compare two files, print header if

https://stackoverflow.com/questions/23417741

13-07-2023

Question

I have two files. The first line of each file is a header line.

File1

File2

start end abc efg hij klm nmo
234 789 NA NA 01 02 NA
678 780 01 NA NA NA NA
125 457 NA 01 01 NA 02
534 988 NA NA NA NA 02

Now I want to compare these two files: column 1 and column 2 of File1 against column 1 and column 2 of File2. If they match, I want to write a third file containing column 1 and column 2 of File2, followed by the headers of the columns whose field is not equal to 'NA', like the following output file:

start end
234 789 hij, klm
678 780 abc
125 457 efg, hij, nmo
534 988 nmo

I only know how to compare lines; I don't know whether it is possible to print the headers of the fields that don't match the pattern 'NA'.

Solution

You could try awk -f a.awk file2 file1, where a.awk is:

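# First file on the command line (file2): build the lookup table.
NR==FNR {
    if (NR==1) {
        split($0,b)        # b[i] = header name of column i
        next
    }
    s=""
    for (i=3; i<=NF; i++) {
        if ($i!="NA") {    # collect the header of every non-NA field
            if (s)
                s=s", " b[i]
            else
                s=b[i]
        }
    }
    a[$1,$2]=s             # key on columns 1 and 2
    next
}
# Second file (file1): skip its header, print rows whose key was seen.
FNR==1 {next}
($1,$2) in a {
    print $1,$2,a[$1,$2]
}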

Output:

234 789 hij, klm
678 780 abc
125 457 efg, hij, nmo
534 988 nmo
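
The two-file pattern used above can be tried in isolation. The following is a minimal sketch, not the answer's script: it only shows that `NR==FNR` is true while awk reads the first file argument, and that `($1,$2) in seen` tests a two-column key. The sample rows, the `seen` array name, and the temporary file names are assumptions made for illustration, because the File1 listing is missing from the question as shown here.

```shell
#!/bin/sh
# Hypothetical sample data (the question's File1 listing is not available).
tmp=$(mktemp -d)
printf '%s\n' 'start end' '234 789' '999 111' > "$tmp/file1"
printf '%s\n' 'start end abc efg' '234 789 NA 01' '678 780 01 NA' > "$tmp/file2"

result=$(awk '
# NR==FNR holds only while the first file argument (file2) is being read.
NR==FNR { if (FNR > 1) seen[$1,$2]; next }    # remember each two-column key
# From here on we are in the second file (file1); FNR > 1 skips its header.
FNR > 1 && ($1,$2) in seen    { print "match:", $1, $2 }
FNR > 1 && !(($1,$2) in seen) { print "no match:", $1, $2 }
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Because the lookup table is built from whichever file is listed first, the order of the file arguments matters: the output rows follow the order of the second file.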

Other answers

Here's one way using awk. Run like:

awk -f ./script.awk File1 File2 > File3

Contents of script.awk:

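# Remember the header of the first file (File1).
NR==1 {
    h=$0
    next
}

# Remaining lines of File1: record each (column 1, column 2) key.
FNR==NR {
    a[$1,$2]
    next
}

# Header of File2: keep the column names and print File1's header.
FNR==1 {
    split($0, b)
    print h
    next
}

# File2 rows whose key appears in File1: join the headers of non-NA fields.
($1,$2) in a {
    for (i=3;i<=NF;i++) {
        c = ($i != "NA" ? b[i] : "")
        if (c) {
            r = (r ? r ", " : "") c
        }
    }
    print $1, $2, r
    r = c = ""    # reset for the next row
}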

Results and contents of File3:

start end
234 789 hij, klm
678 780 abc
125 457 efg, hij, nmo
534 988 nmo
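
Both answers build the header list with a truthiness test (`if (s)`, `if (c)`), which would skip a column whose header happens to be `0`. As a hedged variation, the sketch below moves the joining step into a helper that compares against the empty string instead. The function name `non_na_headers` and the `hdr` array are hypothetical, and the sketch reads only the File2 data from the question, so it does not perform the File1 comparison.

```shell
#!/bin/sh
result=$(awk '
# Hypothetical helper: join the header names of all non-NA fields,
# starting at column 3, for the current input line.
function non_na_headers(hdr,    i, out) {
    out = ""
    for (i = 3; i <= NF; i++)
        if ($i != "NA")
            out = (out == "" ? "" : out ", ") hdr[i]
    return out
}
NR == 1 { split($0, hdr); next }          # hdr[i] = name of column i
{ print $1, $2, non_na_headers(hdr) }
' <<'EOF'
start end abc efg hij klm nmo
234 789 NA NA 01 02 NA
678 780 01 NA NA NA NA
125 457 NA 01 01 NA 02
534 988 NA NA NA NA 02
EOF
)
printf '%s\n' "$result"
```

The extra names in the function's parameter list (`i`, `out`) are the usual awk way of declaring local variables, so no explicit reset is needed between rows.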

License: CC-BY-SA with attribution

Not affiliated with StackOverflow