awk: for every record extract specific information

https://stackoverflow.com/questions/23531404

17-07-2023

Question

A simplified example of my file looks like this:

@ FamilyName_A
Information 1 2 3
Information 4 5 6 
@ FamilyName_B
Information 7 8 9
@ FamilyName_C
Information 10 11 12
Information 13 14 15
Information 16 17 18

The record separator is @. For every record I want to print the record ID (the family name, i.e. the first word after the record separator) together with the first two columns of each following line, so that the output looks like this:

FamilyName_A Information 1
FamilyName_A Information 4
FamilyName_B Information 7
FamilyName_C Information 10
FamilyName_C Information 13
FamilyName_C Information 16

I tried this myself:

awk 'BEGIN {RS="@"} {print $1}' input.txt

This prints the record IDs, but I don't know how to do the rest (looping over the lines of each record to print the fields I need).
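
For comparison, the RS="@" attempt above can be carried through by also setting FS to a newline. Each line of a record then becomes one field, and a loop can walk over them. This is a minimal sketch that is not part of the original answers. It assumes the file starts with an "@ FamilyName" header line, and the function name extract_families is hypothetical.

```shell
# Hypothetical helper: completes the RS="@" idea from the question.
# Assumes the input starts with an "@ FamilyName" header line.
extract_families() {
  awk '
    BEGIN { RS = "@"; FS = "\n" }       # one record per "@" block, one field per line
    NF > 0 {                            # skip the empty record before the first "@"
      split($1, head, " ")              # first line of the record holds the family name
      for (i = 2; i <= NF; i++) {       # every following line is a data line
        if ($i ~ /^[ \t]*$/) continue   # skip blank fields (e.g. from the trailing newline)
        split($i, col, " ")             # split the data line on whitespace
        print head[1], col[1], col[2]
      }
    }
  ' "$@"
}
```

Calling extract_families input.txt should print the same lines as the answers below. One caveat of this record-based approach: an @ appearing inside a data line would also start a new record, which the line-by-line scripts do not suffer from.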

Solution 2

On one line:

awk 'BEGIN { family = ""} { if ($1 == "@") family = $2; else print family, $1, $2 }' input.txt

Explanation

BEGIN {
  family = "";
}
{
  if ($1 == "@")
    family = $2
  else
    print family, $1, $2
}

Set family to the empty string.
Check each line: if its first field is @, remember the family name (the second field).
Otherwise, print the last remembered family name followed by the first two fields.
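
A minimal sketch wrapping the same logic in a reusable shell function. The name family_columns is hypothetical. It uses next instead of if/else. The family != "" and NF > 0 guards are additions that are not in the answer: they skip data lines that appear before the first header and blank lines. The original would print such lines with an empty family name or empty columns.

```shell
# Hypothetical helper with the same logic as Solution 2.
# The family != "" and NF > 0 guards are additions, not part of the answer.
family_columns() {
  awk '
    $1 == "@"               { family = $2; next }     # header: remember the family name
    family != "" && NF > 0  { print family, $1, $2 }  # non-blank data line after a header
  ' "$@"
}
```

Usage: family_columns input.txt. An uninitialized awk variable is already the empty string, so no BEGIN block is needed.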

OTHER TIPS

Use the following script:

$1 == "@" { current = $2; next }
{ print current, $1, $2 }

Depending on your input data, the expression that matches the record header may need to change slightly. For the data you provided, $1 == "@", /^@/ and /^@ FamilyName/ all work, but if your real input differs (for example, there is no space after the @), you may need to adjust the condition.
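
As one example of adjusting the header condition, the sketch below matches /^@/ and strips the @ plus any following blanks with sub(). It therefore accepts both "@ FamilyName_A" and "@FamilyName_A". Because sub() without a target modifies $0, awk re-splits the fields and the family name ends up in $1. The function name family_columns_loose is hypothetical.

```shell
# Hypothetical helper: accepts "@ FamilyName_A" as well as "@FamilyName_A".
family_columns_loose() {
  awk '
    /^@/ {
      sub(/^@[ \t]*/, "")   # strip "@" and any blanks after it; awk re-splits $0
      current = $1          # the family name is now the first field
      next
    }
    { print current, $1, $2 }
  ' "$@"
}
```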

Licensed under: CC-BY-SA with attribution
