Question

I need help converting a multi-row entry into columns on a single line, and doing the same for every entry in the file.

File example (showing 2 entries only, there are many like these):

>ABC
*
AGA-AUUCUC-CGGUUCAAUCU
|||
UCUAUAACCGCGCCGAGUUAGU

>ABC
*
AGAUAU-GCUGCAGGCUCAAUUG
||||||
UCUAUAACCGCG-CCGAGUUAGU

File format required:

>ABC AGA-AUUCUC-CGGUUCAAUCU UCUAUAACCGCGCCGAGUUAGU
>ABC AGAUAU-GCUGCAGGCUCAAUUG UCUAUAACCGCG-CCGAGUUAGU

I am able to convert a single entry into the required format with:

tr '\n' '\t' <test3 | awk '{print $1,$3,$5}'

But how do I do this for all entries when reading the whole file?

Solution

I think you were on the right track with your original awk solution. Try this; it's a good combination of readable and effective:

awk 'BEGIN { RS="\n\n" } ; { print $1, $3, $5 }' < myfile

The idea is to tell awk to treat the blank lines (2 consecutive newlines) as record separators. Then each stanza is treated as a single record, and the whitespace (in this case, single newlines) separates the fields. This is pretty similar to what you were doing with tr, except now awk will run through the whole file processing a stanza at a time.
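
One portability caveat: a multi-character RS works in gawk and mawk, but POSIX leaves the result unspecified, and some awk implementations may use only the first character. A minimal sketch of a portable variant, assuming the same five-line stanzas as in the question, sets RS to the empty string (paragraph mode), where runs of blank lines separate records and a newline always separates fields. The sample_input function is a hypothetical stand-in for the real file:

```shell
# Hypothetical stand-in for the real file: the two entries from the question.
sample_input() {
cat <<'EOF'
>ABC
*
AGA-AUUCUC-CGGUUCAAUCU
|||
UCUAUAACCGCGCCGAGUUAGU

>ABC
*
AGAUAU-GCUGCAGGCUCAAUUG
||||||
UCUAUAACCGCG-CCGAGUUAGU
EOF
}

# RS="" selects paragraph mode: each blank-line-separated stanza is one record,
# and every line of the stanza becomes a field.
result=$(sample_input | awk 'BEGIN { RS = "" } { print $1, $3, $5 }')
printf '%s\n' "$result"
```

Paragraph mode also tolerates more than one blank line between stanzas, since any run of blank lines counts as a single separator.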

Other tips

You can use awk like this:

awk 'NR%2 { printf "%s%s", $0, (NR+1)%6 ? " " : "\n" }' < test


Explanation:

You need to know these two things about awk:

  • the syntax is condition { commands }, where commands are executed if condition is true (non-zero).

  • NR is the number of the current record (that is, the line number), starting with 1.


Here, the condition is NR%2, which is non-zero for odd-numbered lines. So the command is executed only for odd-numbered lines, which are the ones you want to print. The even-numbered lines are discarded silently.

The printf will print every odd line, followed by either a space or a newline. Your input repeats every 6 lines, and you want a newline after lines 5, 11, 17, etc. You can add 1 to each of these numbers to make it divisible by 6, so the formula (NR+1)%6 is 0 for these numbers.

So (NR+1)%6 ? " " : "\n" evaluates to a space for lines 1 and 3, and a newline for line 5. Then it repeats for 7, 9, and 11; and so on.
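
To check the arithmetic, this small sketch tabulates both expressions for the first six line numbers. It uses a loop variable n in place of NR, so it needs no input file:

```shell
# Columns: line number, n % 2 (non-zero means the line is printed),
# (n + 1) % 6 (zero means a newline follows instead of a space).
table=$(awk 'BEGIN { for (n = 1; n <= 6; n++) print n, n % 2, (n + 1) % 6 }')
printf '%s\n' "$table"
```

Only line 5 has a non-zero second column and a zero third column, so it is the one printed with a trailing newline.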

Here's one way using Perl:

perl -ne 'chomp; if($. % 2 == 1) { print $_, ($. % 6 == 5) ? "\n" : "\t" }'

That will print the 1st, 3rd, 5th, 7th, etc., lines of the file. After the 5th, 11th, 17th, etc., lines, it will print a newline; after other lines, it will print only a tab.

(Note: this assumes that there's exactly one blank line between successive groups of five lines. If that's not the case, then please clarify.)
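
If the blank lines are not that regular, one option is to ignore them and count only the non-blank lines. A minimal awk sketch, assuming every entry still has exactly five non-blank lines; sample_input is a hypothetical stand-in for the real file, here with two blank lines between the entries:

```shell
# Hypothetical input: same entries as the question, but separated by two blank lines.
sample_input() {
cat <<'EOF'
>ABC
*
AGA-AUUCUC-CGGUUCAAUCU
|||
UCUAUAACCGCGCCGAGUUAGU


>ABC
*
AGAUAU-GCUGCAGGCUCAAUUG
||||||
UCUAUAACCGCG-CCGAGUUAGU
EOF
}

# NF is zero on blank lines, so they are skipped and never counted.
# i cycles 1,2,3,4,0 over the five non-blank lines of each entry:
# positions 1 and 3 are printed with a trailing space, position 0 ends the row.
result=$(sample_input | awk 'NF { i = ++n % 5
                                  if (i == 1 || i == 3) printf "%s ", $0
                                  else if (i == 0) print }')
printf '%s\n' "$result"
```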

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow