Matching multiline string at command line: return certain line if pattern matches, otherwise return empty string

StackOverflow https://stackoverflow.com/questions/23543016

Question

The output of a command I have takes on the following form when it is a "success":

/ >  -------
ABC123
/ > 

It's possible for this command to emit something like this, though (a "failure"):

/ >  -------
ABC123
 -------
DEF456
 -------
Hello (world!)
 -------
(any old string, really)
/ > 

Or, this (another "failure"):

/ > / >

For the first example, I would like to emit:

ABC123

For the other two examples, I would like to emit the empty string.
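You can try the commands in this thread without mycmd by reproducing the three sample outputs with printf. This is a minimal sketch. The function name sample_output is hypothetical, and it assumes each output ends with a newline after the final / > prompt, which the question does not state.

```shell
# Hypothetical stand-in for mycmd: prints one of the three sample outputs
# from the question. $1 selects the case (1 = success, 2 or 3 = failure).
# Assumption: each output ends with a newline after the final "/ > ".
sample_output() {
  case "$1" in
    1) printf '%s\n' '/ >  -------' 'ABC123' '/ > ' ;;
    2) printf '%s\n' '/ >  -------' 'ABC123' ' -------' 'DEF456' \
         ' -------' 'Hello (world!)' ' -------' '(any old string, really)' '/ > ' ;;
    3) printf '%s\n' '/ > / >' ;;
    *) echo "usage: sample_output 1|2|3" >&2; return 1 ;;
  esac
}
```

Any pipeline below can then be tried as, for example, sample_output 2 | awk -f test.awk in place of mycmd | awk -f test.awk.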

I tried this, which worked great for the third example:

mycmd | pcregrep -M '(?:/\s>\s{2}-{7}\n)[^\n]*(?!\n.*\n)'

But for the first two examples it emitted:

/ >  -------
ABC123

I'm at a loss for what to do. My regex above was an attempt to match the leading / > ------- but not capture it, then match the next line only if it was not followed by another line ending with a newline. I am fine with using something other than pcregrep to solve this problem, but I am not able to express this with awk or sed. I would use Python, but it is too slow for my needs. Any help?


Solution

I thought the following would work, but I could not get a look-behind expression to work if it contained a newline.

mycmd | pcregrep -M '(?<=^/ >  -{7}\n).*\n(?=/ > $)'

But the following two stage solution worked for me:

mycmd | pcregrep -M '^/ >  -{7}\n.*\n/ > $' | pcregrep -v '^/ >'

Update in response to OP's answer

I like the \K escape :-)

I assume you do not want to match the following situation:

/ > -------
/ > perhaps text here
/ > 

I was able to get a negative lookahead to work when it contains \n, even when it is embedded within a positive lookahead.

Here is a simpler regex with \K that is closer to what you want. It disallows any content after the closing / > line, but it still allows lines before the / > -------.

mycmd | pcregrep -Mo '^/ >  -{7}\n\K(?!/ >).+(?=\n/ > $(?!\n[\s\S]))'

If the captured line should be allowed to start with / >, then it is simpler:

mycmd | pcregrep -Mo '^/ >  -{7}\n\K.+(?=\n/ > $(?!\n[\s\S]))'

Final update

Here is a sed one-liner that I believe gives the exact result, disallowing any extra lines before or after. However, it does allow capturing a line that begins with / >.

mycmd | sed -n '1{/^\/ >  -\{7\}$/{n;/./{h;n;/^\/ > $/{${x;p}}}}}'
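The same sed program reads more easily when spread over several lines with comments. Every check happens inside the 1{...} block, and the saved second line is printed only if the third line is the closing prompt and is also the last line of input. This sketch wraps the program in a shell function, extract_id, so that output can be piped into it. The function name is hypothetical. The sketch relies on sed's # comment lines, which POSIX sed supports.

```shell
# Hypothetical wrapper around dbenham's first sed one-liner, expanded with comments.
# Usage: mycmd | extract_id
extract_id() {
  sed -n '
    # Only the first line starts the checks; later cycles do nothing.
    1{
      # Line 1 must be exactly "/ >  -------".
      /^\/ >  -\{7\}$/{
        # Replace the pattern space with line 2 (sed quits if there is none).
        n
        # Line 2 must be non-empty.
        /./{
          # Save line 2 in the hold space.
          h
          # Replace the pattern space with line 3.
          n
          # Line 3 must be exactly "/ > " ...
          /^\/ > $/{
            # ... and must be the last line of input.
            ${
              # Swap the saved line 2 back in and print it.
              x
              p
            }
          }
        }
      }
    }
  '
}
```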

And here is another sed solution:

mycmd | sed -n '1{h;n;H;x;N;${/^\/ >  -\{7\}\n..*\n\/ > $/{x;p}}}'
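This second version works differently. Rather than testing one line at a time, it gathers the first three lines into the pattern space and tests them with a single regex. It also keeps a copy of line 2 in the hold space so that line can be printed afterwards. Below is an annotated sketch, again wrapped in a shell function whose name, extract_id_joined, is hypothetical.

```shell
# Hypothetical wrapper around dbenham's second sed solution, expanded with comments.
# Usage: mycmd | extract_id_joined
extract_id_joined() {
  sed -n '
    1{
      # Hold space = line 1.
      h
      # Pattern space = line 2 (sed quits if there is none).
      n
      # Hold space = line 1, newline, line 2.
      H
      # Swap: pattern space = lines 1-2, hold space = line 2.
      x
      # Append a newline and line 3 (sed quits if there is none).
      N
      # Only if line 3 is the last line of input ...
      ${
        # ... and the lines are header, non-empty line, closing prompt:
        /^\/ >  -\{7\}\n..*\n\/ > $/{
          # swap line 2 back in and print it.
          x
          p
        }
      }
    }
  '
}
```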

OTHER TIPS

You could also still have used awk:

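# test.awk: print the second line only if the first line contains the
# "/ >  -------" header and the third line contains the "/ >" prompt.
BEGIN {
   first_line = "";
   second_line = "";
   third_line = "";

   ctr = 0;
}
{
   # Remember the first three lines of input; later lines are ignored.
   if (ctr == 0) {
      first_line = $0;
   } else if (ctr == 1) {
      second_line = $0;
   } else if (ctr == 2) {
      third_line = $0;
   }
   ctr++;
}
END {
   # Both checks are unanchored substring matches.
   if (first_line ~ /\/ >  -------/) {
      if (third_line ~ /\/ >/) {
         print second_line;
      }
   }
}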

Output:

$ printf '/ >  -------\nABC12\n ---\n/ >\n' | awk -f test.awk
$ printf '/ >  -------\nABC12\n/ >\n' | awk -f test.awk
ABC12
$

I'm sure an awk expert would cringe, but it was quick and did the job.

I actually achieved success (after much gnashing of my teeth) with the following pcregrep command:

mycmd | pcregrep -Mo '^/ > {2}-{7}[\n\r]\K[^\n\r]+(?=[\n\r]/ > $)'

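Without the -o flag, the output also included the first line (despite using \K). The -o flag makes pcregrep print only the part of the input that matched the pattern, and \K moves the start of that match past the header line. As it turns out, negative lookaheads don't seem to work in multiline mode when trying to match newlines. Also, \s matches newlines. This matters in multiline mode because the text being searched then contains newlines, so I stopped using \s.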

I do want to note that neither this solution nor dbenham's is exactly what I wanted. I was hoping to check that nothing followed the second line except the final prompt line (i.e. no further newline). These solutions assume a little more about how the output ends, but that will have to do.
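The stricter check described here (nothing after the closing prompt) is straightforward to add to the awk approach above: also require that exactly three lines were read. This is a minimal sketch. It assumes the success output is exactly the three lines shown in the question, with a single trailing space after the final / >. The function name extract_id_awk is hypothetical.

```shell
# Hypothetical strict variant of the awk answer: print line 2 only when the
# input is exactly three lines: "/ >  -------", a non-empty line, "/ > ".
# Usage: mycmd | extract_id_awk
extract_id_awk() {
  awk '
    NR == 1 { header = $0 }
    NR == 2 { id = $0 }
    NR == 3 { prompt = $0 }
    END {
      # NR == 3 rejects any extra lines; the anchored regexes reject partial matches.
      if (NR == 3 && header ~ /^\/ >  -------$/ && id != "" && prompt ~ /^\/ > $/)
        print id
    }
  '
}
```

Because NR keeps its final value in the END block, a single comparison rejects both short and over-long output. This avoids the lookahead tricks needed in pcregrep.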

Licensed under: CC-BY-SA with attribution