Split file into multiple files based upon differing start and end delimiters

https://stackoverflow.com/questions/21387330

03-10-2022

Question

I have a file that I need to split into multiple files, using separate start and end delimiters.

For example, if I have the following file:

abcdef
START
ghijklm
nopqrst
END
uvwxyz
START
abcdef
ghijklm
nopqrs
END
START
tuvwxyz
END

I need 3 separate files:

file1

START
ghijklm
nopqrst
END

file2

START
abcdef
ghijklm
nopqrs
END

file3

START
tuvwxyz
END

I found this link, which shows how to do it with a starting delimiter, but I also need an ending delimiter. I have tried using a regex in the awk command, but I am not getting the result I want. I don't quite understand how to make awk 'lazy' or 'non-greedy' so that it pulls the file apart correctly.

I really like the awk solution, and something similar would be fantastic. I am reposting that solution here so you don't have to click through:

awk '/DELIMITER_HERE/{n++}{print >"out" n ".txt" }' input_file.txt

Any help is appreciated.

Solution

You can use this awk command:

awk '/^START/{n++;w=1} n&&w{print >("out" n ".txt")} /^END/{w=0}' input_file.txt
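As a rough sketch of how the flag approach behaves, here is the same idea run against the sample from the question. The `out` variable and the `close(out)` call are additions of mine, not part of the answer above; closing each file once its block ends is an assumption that helps when the input has many blocks, since some awk implementations limit the number of files open at once.

```shell
# Recreate the sample input from the question (file name is illustrative).
cat > input_file.txt <<'EOF'
abcdef
START
ghijklm
nopqrst
END
uvwxyz
START
abcdef
ghijklm
nopqrs
END
START
tuvwxyz
END
EOF

awk '
    /^START/ { n++; w = 1; out = "out" n ".txt" }  # new block: bump counter, start writing
    w        { print > out }                        # copy lines while inside a block
    /^END/   { w = 0; close(out) }                  # block done: stop writing, release the file
' input_file.txt
```

The order of the three rules is what makes both delimiters land in the output: START turns the flag on before the print rule runs, and END turns it off only after the END line has been printed. Lines outside any block (abcdef, uvwxyz) are skipped because the flag is off.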

OTHER TIPS

awk '
    /START/ {p = 1; n++; file = "file" n}
    p { print > file }
    /END/ {p = 0}
' filename

Here's another example using range notation:

awk '/START/,/END/ {if(/START/) n++; print > ("out" n ".txt")}' data

Or an equivalent using the ternary operator instead of if:

awk '/START/,/END/ {print > ("out" (/START/ ? ++n : n) ".txt")}' data

Here's a version that doesn't repeat the /START/ regex, written after Ed Morton's comments, because I just wanted to see if it would work:

awk '/START/ && ++n,/END/ {print > ("out" n ".txt")}' data

The other answers are definitely better if your range is, or will ever be, non-inclusive of the ends.
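To illustrate that last point, here is a minimal sketch of a non-inclusive variant built on the flag approach rather than range notation. The output name `inner` and the shortened sample input are made up for this example. The only real change from the inclusive version is the order of the rules.

```shell
# Shortened sample input (illustrative): two blocks plus stray lines.
printf '%s\n' abcdef START ghijklm nopqrst END uvwxyz START tuvwxyz END > data

awk '
    /^END/   { w = 0; close(out) }                    # checked first so END is not printed
    w        { print > out }                          # only lines strictly between delimiters
    /^START/ { n++; w = 1; out = "inner" n ".txt" }   # checked last so START is not printed
' data
```

With this ordering, inner1.txt holds ghijklm and nopqrst, and inner2.txt holds tuvwxyz. Note that a START followed immediately by END produces no file at all, because nothing is ever printed for that block.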

Licensed under: CC-BY-SA with attribution
