If you're looking for the nearest prior HEADER and SUBHEADER before the SOMETHING, then I think you just want non-greedy matching in your regex, assuming you have a regex processor that will match multiple lines at once, which generally rules out grep, sed, and similar.
For example, something like this:
(^HEADER.*?$).*?(^SUBHEADER.*?$).*?(^SOMETHING.*?$)
I'm also assuming that '.' does match newlines (as in PCRE_DOTALL mode), and that '^'/'$' will match beginning/end-of-line in the middle of the string (as in PCRE_MULTILINE mode). These are configurable options in many regex implementations.
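As a rough illustration of those two options, here is a minimal sketch using Python's `re` module, where `re.DOTALL` and `re.MULTILINE` play the roles of PCRE_DOTALL and PCRE_MULTILINE. The sample text is made up for the example (the data from the question isn't reproduced here), so treat it as an assumption rather than the real input.

```python
import re

# Hypothetical sample input; the real data from the question is not shown.
text = (
    "HEADER one\n"
    "SUBHEADER a\n"
    "filler\n"
    "SOMETHING x\n"
)

regex = r"(^HEADER.*?$).*?(^SUBHEADER.*?$).*?(^SOMETHING.*?$)"

# Without the flags, '.' stops at newlines and '^'/'$' only anchor at the
# ends of the whole string, so the pattern cannot span several lines.
no_flags = re.search(regex, text)

# re.DOTALL lets '.' cross newlines; re.MULTILINE lets '^'/'$' match at
# every line boundary inside the string.
with_flags = re.search(regex, text, re.DOTALL | re.MULTILINE)

print(no_flags)             # None
print(with_flags.groups())  # ('HEADER one', 'SUBHEADER a', 'SOMETHING x')
```

Other engines spell these options differently (for example the `s` and `m` modifiers in Perl), but the behaviour being switched on is the same.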
edit: I've modified the command you laid out in your comment and gotten it to work.
perl -0777 -ne '/.*(^HEADER.*?\n).*(^SUBHEADER.*?\n).*?(^SOMETHING.*?\n)/ms and print "$1$2$3*\n"'
(I added the 'm' flag and re-added beginning-of-line anchors for paranoia's sake; you can take them back out if you want.)
The key idea turned out to be placing a greedy match-all pattern at the beginning, giving the regular expression matcher permission to match HEADER as late as possible. I'd have expected an un-anchored match like this to act as if it had an implicit greedy match at the beginning, but it doesn't work that way: an un-anchored search takes the leftmost position where the whole pattern can match, and the non-greedy operators only limit how far each piece extends to the right from there.
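To see the effect of the leading greedy match-all, here is a small sketch in Python's `re` module (used as a stand-in for the Perl one-liner; `re.DOTALL | re.MULTILINE` corresponds to the `/ms` flags). The two-block input is invented for illustration and is an assumption, not the data from the question.

```python
import re

# Hypothetical input: two HEADER blocks, with SOMETHING only under the second.
text = (
    "HEADER one\n"
    "SUBHEADER a\n"
    "HEADER two\n"
    "SUBHEADER b\n"
    "SOMETHING x\n"
)
flags = re.DOTALL | re.MULTILINE

# No leading '.*': the search starts at the leftmost HEADER, and the
# non-greedy gaps simply stretch until SOMETHING is reached.
without_prefix = re.search(
    r"(^HEADER.*?\n).*?(^SUBHEADER.*?\n).*?(^SOMETHING.*?\n)", text, flags
)

# Leading greedy '.*' (same pattern as the perl command): it swallows as
# much as it can, so HEADER and SUBHEADER are matched as late as possible.
with_prefix = re.search(
    r".*(^HEADER.*?\n).*(^SUBHEADER.*?\n).*?(^SOMETHING.*?\n)", text, flags
)

print(without_prefix.groups())  # ('HEADER one\n', 'SUBHEADER a\n', 'SOMETHING x\n')
print(with_prefix.groups())     # ('HEADER two\n', 'SUBHEADER b\n', 'SOMETHING x\n')
```

Only the second search picks up the HEADER and SUBHEADER closest to SOMETHING, which is what the leading `.*` buys you.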