Convert row-based entries to column-based in shell
30-04-2021
Question
I need help converting a multi-row entry into separate columns on a single line, and doing the same for every entry in the file.
File example (showing 2 entries only, there are many like these):
>ABC
*
AGA-AUUCUC-CGGUUCAAUCU
|||
UCUAUAACCGCGCCGAGUUAGU
>ABC
*
AGAUAU-GCUGCAGGCUCAAUUG
||||||
UCUAUAACCGCG-CCGAGUUAGU
File format required:
>ABC AGA-AUUCUC-CGGUUCAAUCU UCUAUAACCGCGCCGAGUUAGU
>ABC AGAUAU-GCUGCAGGCUCAAUUG UCUAUAACCGCG-CCGAGUUAGU
I am able to convert a single entry into the required format with:
tr '\n' '\t' <test3 | awk '{print $1,$3,$5}'
But how do I do this for all entries when reading the whole file?
Solution
I think you were on the right track with your original awk solution. Try this; I think it's a good combination of readable and effective:
awk 'BEGIN { RS="\n\n" } ; { print $1, $3, $5 }' < myfile
The idea is to tell awk to treat the blank lines (2 consecutive newlines) as record separators. Then each stanza is treated as a single record, and the whitespace (in this case, single newlines) separates the fields. This is pretty similar to what you were doing with tr, except now awk will run through the whole file processing a stanza at a time.
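A multi-character RS is an extension (gawk and mawk accept it; POSIX leaves the behaviour unspecified). A more portable variant of the same idea is awk's paragraph mode, where RS is set to the empty string. The following is a minimal sketch, assuming the entries in the real file are separated by blank lines; the temporary sample file and its contents are made up for illustration from the question's example.

```shell
# Build a small sample file (hypothetical, created with mktemp) with a blank
# line between the two entries.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>ABC
*
AGA-AUUCUC-CGGUUCAAUCU
|||
UCUAUAACCGCGCCGAGUUAGU

>ABC
*
AGAUAU-GCUGCAGGCUCAAUUG
||||||
UCUAUAACCGCG-CCGAGUUAGU
EOF

# RS="" turns on paragraph mode: each blank-line-separated stanza is one
# record, and the lines inside it become fields $1..$5.
out=$(awk 'BEGIN { RS = "" } { print $1, $3, $5 }' "$sample")
printf '%s\n' "$out"

rm -f "$sample"
```

In paragraph mode a run of several blank lines still counts as a single separator, so this form should also tolerate uneven spacing between entries.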
Other tips
You can use awk like this:
awk 'NR%2 { printf "%s%s", $0, (NR+1)%6 ? " " : "\n" }' < test
Explanation:
You need to know these two things about awk:
The syntax is condition { commands }, where commands are executed if condition is true (non-zero).
NR is the number of the current record (that is, the line number), starting with 1.
Here, the condition is NR%2, which is non-zero for odd-numbered lines. So the command is executed only for odd-numbered lines, which are the ones you want to print. The even-numbered lines are discarded silently.
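As a quick illustration of the condition { commands } form, this small sketch feeds four made-up lines to awk and prints only those whose line number is odd, prefixed with NR. The input words are placeholders, not data from the question.

```shell
# Four made-up input lines; only lines whose number NR is odd reach the action.
odd=$(printf 'one\ntwo\nthree\nfour\n' | awk 'NR % 2 { print NR ": " $0 }')
printf '%s\n' "$odd"
```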
The printf will print every odd line, followed by either a space or a newline. Your input repeats every 6 lines, and you want a newline after lines 5, 11, 17, etc. You can add 1 to each of these numbers to make it divisible by 6, so the formula (NR+1)%6 is 0 for these numbers.
So (NR+1)%6 ? " " : "\n" evaluates to a space for lines 1 and 3, and a newline for line 5. Then it repeats for 7, 9, and 11; and so on.
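The arithmetic can be checked with a short sketch that tabulates both expressions for line numbers 1 through 12 (two six-line groups). The loop variable nr is a stand-in for NR, introduced here only so the table can be produced without an input file.

```shell
# For each line number, show NR%2, (NR+1)%6, and which separator would follow.
# nr plays the role of NR; "-" marks even lines, which print nothing at all.
table=$(awk 'BEGIN {
    for (nr = 1; nr <= 12; nr++) {
        sep = "-"
        if (nr % 2) sep = ((nr + 1) % 6) ? "space" : "newline"
        printf "%d %d %d %s\n", nr, nr % 2, (nr + 1) % 6, sep
    }
}')
printf 'NR NR%%2 (NR+1)%%6 separator\n%s\n' "$table"
```

Rows 5 and 11 are the only odd rows where (NR+1)%6 is 0, which is where each output line ends.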
Here's one way using Perl:
perl -ne 'chomp; if($. % 2 == 1) { print $_, ($. % 6 == 5) ? "\n" : "\t" }'
That will print the 1st, 3rd, 5th, 7th, etc., lines of the file. After the 5th, 11th, 17th, etc., lines, it will print a newline; after other lines, it will print only a tab.
(Note: this assumes that there's exactly one blank line between successive groups of five lines. If that's not the case, then please clarify.)