Question

I have a file, file1, whose content follows the pattern below: groups of lines separated by a blank line.

First line

line1 
Line2

abc
def
jkl

123456
opertxt

this is line
this is sentence
this is last line

I have another flat file, file2, in the format below. Each line in file2 corresponds to a group of lines in file1.

group1
group2
group3
group4
group5

I want to read both files simultaneously and repeat each line of file2 once for every line in the corresponding group of file1. I will redirect the output to file3, which should look like this:

group1
group2
group2
group3
group3
group3
group4
group4
group5
group5
group5

So in the end, the number of lines in file1 (excluding the blank lines) and in file3 will be the same. Can someone suggest a solution using ksh, awk, or sed?

Solution

You can use this awk approach, which first reads file2, storing its lines in an array, and then reads file1, printing the corresponding group line for each non-blank line (f2 and f1 below stand for file2 and file1):

$ awk 'BEGIN {count=1} FNR==NR {a[++i]=$0; next} /^$/ {count++; next} {print a[count]}' f2 f1
group1
group2
group2
group3
group3
group3
group4
group4
group5
group5
group5

Explanation

  • BEGIN {count=1} sets the variable count to 1 before any input is read.
  • FNR==NR {a[++i]=$0; next} is true only while reading the first file on the command line (file2); each of its lines is stored in the array a[], then awk moves on to the next line.
  • /^$/ {count++; next} applies while reading file1: the counter is incremented every time an empty line is found, and that line is skipped.
  • {print a[count]} runs for every remaining line of file1 and prints the entry of a[] at the index given by the counter.
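
The question also asks that file3 end up with as many lines as file1 has non-blank lines. A minimal sketch of how that could be checked is below. The function names make_file3 and counts_match are hypothetical wrappers introduced for illustration; the awk program is the one from the answer above.

```shell
#!/bin/sh
# make_file3 (hypothetical name) wraps the awk command from the answer.
# $1 = grouped file (file1), $2 = labels file (file2), $3 = output (file3)
make_file3() {
    awk 'BEGIN {count=1} FNR==NR {a[++i]=$0; next} /^$/ {count++; next} {print a[count]}' "$2" "$1" > "$3"
}

# counts_match (hypothetical name) succeeds when the output file has as many
# lines as the grouped file has non-empty lines.
# $1 = grouped file (file1), $2 = output (file3)
counts_match() {
    nonblank=$(grep -c . "$1")    # lines containing at least one character
    produced=$(grep -c '' "$2")   # all lines of the output
    [ "$nonblank" -eq "$produced" ]
}
```

It could be used as make_file3 file1 file2 file3 && counts_match file1 file3 && echo "line counts match". Note that a line containing only spaces is treated as non-blank by both the awk pattern /^$/ and grep -c ., so the two counts stay consistent.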

OTHER TIPS

Perl solution, reading the two files in parallel:

#!/usr/bin/perl
use warnings;
use strict;

open my $LINES,  '<', 'file1' or die $!;
open my $GROUPS, '<', 'file2' or die $!;

my $group = <$GROUPS>;

while (<$LINES>) {
    $group = <$GROUPS>, next if /^$/;
    print $group;
}
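
Since the question asks for ksh, the same parallel-read idea can be sketched in plain POSIX shell (which ksh also accepts). This is only an illustration, not part of the original answer: the function name repeat_groups and the use of file descriptor 3 are choices made here.

```shell
#!/bin/sh
# repeat_groups (hypothetical name): walk the grouped file line by line and
# advance through the labels file each time a blank line is seen.
# $1 = grouped file (file1), $2 = labels file (file2)
repeat_groups() {
    exec 3<"$2"                      # open the labels file on descriptor 3
    IFS= read -r group <&3           # label for the first group
    while IFS= read -r line || [ -n "$line" ]; do
        if [ -z "$line" ]; then
            IFS= read -r group <&3   # blank line: move to the next label
        else
            printf '%s\n' "$group"   # non-blank line: repeat the current label
        fi
    done <"$1"
    exec 3<&-                        # close descriptor 3
}
```

A call such as repeat_groups file1 file2 > file3 would then write the repeated labels to file3. Like the Perl version, it assumes file2 has at least as many lines as file1 has groups.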

Awk and sed:

sed 's/.*..*/0/;s/^$/1/' input | awk '$1{s += 1;next} {print "group"s+1;}'

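Here sed outputs 1 for each blank line (the start of a new group) and 0 for every other line. Awk then keeps a running sum of the 1s and, for each 0, prints "group" followed by the sum plus one. Note that input is file1, and that this pipeline generates the names group1, group2, and so on by itself rather than reading them from file2.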

Output:

group1
group2
group2
group3
group3
group3
group4
group4
group5
group5
group5

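Here's a clunky way that counts the number of lines per paragraph (an empty RS puts awk in paragraph mode, so each blank-line-separated group is read as one record):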

awk -v RS= '
    FNR==NR {split($0,groups,/\n/); next} 
    {n = split($0,a,/\n/); for (i=1; i<=n; i++) print groups[FNR]}
' file2 file1

This might work for you (GNU sed):

sed ':a;$!{N;ba};s/\n/\\n/g;s|.*|1{x;s/^/&/;x};/./!{x;s/[^\\n]*\\n//;x;d};G;s/[^\\n]*\\n//;P;d|' file2 | sed -f - file1

From the group file, a sed script is built, which is then run against the text file.

The script appends the list of groups to each line of the text file. The text line is then deleted and the first group in the list is printed. On encountering an empty line, the first group in the list is removed.
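
To see what the generated script looks like, the first half of the pipeline can be run on its own. The sketch below assumes GNU sed, and build_script is a hypothetical name introduced here; the sed program is copied unchanged from the command above.

```shell
#!/bin/sh
# build_script (hypothetical name): turn the labels file into a sed script.
# Requires GNU sed. $1 = labels file (file2)
build_script() {
    sed ':a;$!{N;ba};s/\n/\\n/g;s|.*|1{x;s/^/&/;x};/./!{x;s/[^\\n]*\\n//;x;d};G;s/[^\\n]*\\n//;P;d|' "$1"
}

# For a labels file containing the two lines group1 and group2, the
# generated one-line script is:
#   1{x;s/^/group1\ngroup2/;x};/./!{x;s/[^\n]*\n//;x;d};G;s/[^\n]*\n//;P;d
# On line 1 of the text file it loads the label list into the hold space;
# for an empty line it drops the first label from the hold space;
# for any other line it prints the first label currently in the hold space.
```

Running build_script file2 | sed -f - file1 is then equivalent to the one-liner in the answer.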

Licensed under: CC-BY-SA with attribution