Let's see if I understand your requirements:
You have two strings, which I'll call KEY and LIMIT. And you want to print:
1. At most 20 lines before a line containing KEY, but stopping if there is a blank line.
2. All the lines between a line containing KEY and the following line containing LIMIT. (This ignores your requirement that there be no more than 100 such lines; if that's important, it's relatively straightforward to add.)
The easiest way to accomplish (1) is to keep a circular buffer of 20 lines, and print it out when you hit KEY. (2) is trivial in either sed or awk, because you can use the two-address form to print the range.
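As a warm-up, here is a rough sketch of those two building blocks in isolation. The function names context_before and print_range and the buffer size N are made up for illustration, and the buffer is shrunk to 3 lines so the wraparound is easy to follow.

```shell
# context_before KEY N: print up to N lines preceding a line containing KEY
# (hypothetical helper; N plays the role of the 20 in the description)
context_before() {
    awk -v KEY="$1" -v N="$2" '
        BEGIN { count = 0 }
        index($0, KEY) {
            # Replay the buffer oldest-first, then empty it
            for (i = count < N ? 0 : count - N; i < count; ++i)
                print buf[i % N]
            count = 0
            next
        }
        # Store the line, overwriting the oldest slot once the buffer is full
        { buf[count++ % N] = $0 }
    '
}

# print_range KEY LIMIT: two-address form, with the default action (print)
print_range() {
    awk -v KEY="$1" -v LIMIT="$2" 'index($0, KEY), index($0, LIMIT)'
}

printf '1\n2\n3\n4\n5\nKEY\n' | context_before KEY 3    # prints 3, 4, 5
printf 'a\nstart\nb\nend\nc\n' | print_range start end   # prints start, b, end
```

The slot for a line is its running count modulo N, so the buffer never holds more than N lines and nothing has to be shifted.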
So let's do it in awk:
# file: extract.awk
# Initialize the circular buffer
BEGIN { count = 0 }
# When we hit an empty line, clear the circular buffer
length() == 0 { count = 0; next }
# When we hit KEY, print and clear the circular buffer
index($0, KEY) {
    for (i = count < 20 ? 0 : count - 20; i < count; ++i)
        print buf[i % 20]
    count = 0
}
# While we're between KEY and LIMIT, print the line.
# The action must start on the same line as the pattern.
index($0, KEY), index($0, LIMIT) { print; next }
# Otherwise, save the line
{ buf[count++ % 20] = $0 }
In order to get that to work, we need to set the values of KEY and LIMIT. We can do that on the command line:
awk -v "KEY=4320101" -v "LIMIT=</eventUpdate>" -f extract.awk "$FILENAME"
Notes:
I used index($0, foo) instead of the more usual /foo/, because it avoids having to escape regex special characters, and there is nowhere in the requirements that regexen are even desired. index(haystack, needle) returns the index of needle in haystack, with indices starting at 1, or 0 if needle is not found. Used as a true/false value, it is true if needle is found.
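To make the difference concrete, here is a small comparison. The sample input and the KEY value 1.50 are invented for illustration; the point is only that the dot is taken literally by index but acts as a wildcard when KEY is used as a regex.

```shell
# Literal substring match: only the line that really contains "1.50"
printf 'price: 1x50\nprice: 1.50\n' | awk -v KEY='1.50' 'index($0, KEY)'
# prints "price: 1.50"

# Regex match: "." matches any character, so "1x50" matches too
printf 'price: 1x50\nprice: 1.50\n' | awk -v KEY='1.50' '$0 ~ KEY'
# prints both lines
```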
next causes processing of the current line to end. It can be quite handy, as this little program shows.
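As for the 100-line cap set aside at the top, one possible way to add it is to replace the range pattern with an explicit flag and a counter. This is only a sketch: the names print_capped, MAX, inrange and n are made up here, and it assumes that "no more than 100 lines" means printing the first MAX lines of each range and silently dropping the rest up to LIMIT.

```shell
# print_capped KEY LIMIT MAX: hypothetical helper, not part of extract.awk
print_capped() {
    awk -v KEY="$1" -v LIMIT="$2" -v MAX="$3" '
        # A line containing KEY opens a range and resets the counter
        !inrange && index($0, KEY) { inrange = 1; n = 0 }
        inrange {
            if (++n <= MAX) print                # print only the first MAX lines
            if (index($0, LIMIT)) inrange = 0    # LIMIT closes the range
            next
        }
    '
}

printf 'x\nKEY\na\nb\nEND\ny\n' | print_capped KEY END 2
# prints "KEY" and "a"
```

In extract.awk, the two rules inside the awk program would stand in for the single range-pattern rule, with MAX passed as another -v variable.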