Question

I have a file in Linux. For each unique title in the first column, I need to sum the fourth-column values of all rows that have that title, and output only those two columns. For instance:

AF116642    792 t   1   ^!. ]
AF116642    793 g   2   .   ]
AF116643    794 t   5   .   ]
AF116643    795 g   6   .   ]
AF116643    796 t   3   .   ]
AF116644    797 g   1   .   ]
AF116644    798 t   8   .   ]
AF116644    799 g   1   .   ]

would ideally produce:

AF116642   3
AF116643   14
AF116644   10

Identical names in the first column of the input file are consecutive, and their order should be preserved in the output file. I only have bits of sed and awk knowledge, and cannot piece them together to do this...

Solution

With the data in a file called books, the numbers for each book can be added and printed as follows:

$ awk '{n[$1]+=$4} END{for (b in n) {print b,n[b]}}' books
AF116642 3
AF116643 14
AF116644 10

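In the above code, n is an associative array whose keys are the book names. For every line, the number in the fourth column, $4, is added to the running total for its book, n[$1], where $1 is the book name in the first column. After the last line has been read, the END block loops over the keys and prints each name with its total. Note that awk does not guarantee the order in which for (b in n) visits the keys.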
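If the output has to follow the input order while still using an associative array, one option is to record each key the first time it appears and replay that list in the END block. The following is only a sketch under that assumption: the function name sum_keep_order and the helper variables order and count are made up for illustration, and the sample rows are the ones from the question.

```shell
# sum_keep_order: hypothetical helper; reads whitespace-separated rows on stdin
sum_keep_order() {
    awk '
    !($1 in n) { order[++count] = $1 }   # remember each key the first time it is seen
    { n[$1] += $4 }                      # add column 4 to the total for that key
    END {
        for (i = 1; i <= count; i++)     # replay the keys in first-seen order
            print order[i], n[order[i]]
    }'
}

sum_keep_order <<'EOF'
AF116642    792 t   1   ^!. ]
AF116642    793 g   2   .   ]
AF116643    794 t   5   .   ]
AF116643    795 g   6   .   ]
AF116643    796 t   3   .   ]
AF116644    797 g   1   .   ]
AF116644    798 t   8   .   ]
AF116644    799 g   1   .   ]
EOF
# prints:
# AF116642 3
# AF116643 14
# AF116644 10
```

Unlike a plain for (b in n) loop, this does not rely on the awk implementation's internal key order, and it also gives one total per name when identical names are not adjacent in the input.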

Other suggestions

To preserve the order you can do:

awk '
$1!=prev && NR>1 {
    print prev, sum;
    sum = 0;
}
{
    prev = $1;
    sum += $4
}
END {
    print prev,sum
}' file
AF116642 3
AF116643 14
AF116644 10
  • For every line, we store column 1 in the variable prev and add column 4 to the variable sum.
  • Before that, if column 1 differs from prev and the line number is greater than 1, a new group has started: we print prev and sum, then reset sum to 0.
  • In the END block we print the last values of prev and sum, which belong to the final group.

Note: Associative arrays work well for the summing, but looping over one with for (b in n) does not guarantee the order of the output. Since you have stated that identical values in the first column of your file are consecutive, the approach above will work.
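To see why the consecutive-rows condition matters for the prev/sum approach, here is a small sketch that runs the same logic on input where a name reappears after another one. The function name sum_runs and the three-row input are invented for this illustration, and the NR > 0 guard in the END block is an addition that is not in the code above; it only keeps an empty input from printing a blank line.

```shell
# sum_runs: hypothetical helper; sums column 4 over each run of equal column-1 values
sum_runs() {
    awk '
    $1 != prev && NR > 1 { print prev, sum; sum = 0 }   # a new run starts: flush the old one
    { prev = $1; sum += $4 }                            # track the current key and its total
    END { if (NR > 0) print prev, sum }                 # flush the last run, skip empty input
    '
}

# "x" appears in two separate runs, so it is reported twice
sum_runs <<'EOF'
x 0 0 1
y 0 0 2
x 0 0 3
EOF
# prints:
# x 1
# y 2
# x 3
```

With the sample file from the question every name forms a single run, so each name is printed exactly once.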

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow