Grep a Log file for the last occurrence of a string between two strings

Question 1

Use tac to print the file in reverse line order, then grep -m1 to stop after the first matching line. The lookbehind and lookahead restrict the match to the text between <tag> and </tag>. Note that -P (Perl-compatible regexes) requires GNU grep.

tac a | grep -m1 -oP '(?<=tag>).*(?=</tag>)'

Test

Given this file

$ cat a
<tag> and </tag>
aaa <tag> and <b> other things </tag>
adsaad <tag>and  last one</tag>

$ tac a | grep -m1 -oP '(?<=tag>).*(?=</tag>)'
and  last one
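
One caveat with the one-liner above: .* is greedy, so if a single line holds more than one <tag>...</tag> pair, the match runs from the first <tag> to the last </tag> on that line. Below is a minimal sketch of one way around that, assuming GNU grep built with PCRE support. The sample file and the variable names are made up for illustration.

```shell
# Hypothetical sample file: the last line holds two <tag>...</tag> pairs.
sample=$(mktemp)
printf '%s\n' '<tag>first</tag>' 'x <tag>second</tag> y <tag>third</tag>' > "$sample"

# The greedy .* swallows everything up to the last <tag> on the line,
# \K drops that prefix from the reported match, and the lazy .*? stops
# at the next </tag>.
last_on_line=$(tac "$sample" | grep -m1 -oP '.*<tag>\K.*?(?=</tag>)')
printf '%s\n' "$last_on_line"
rm -f "$sample"
```

The trade-off is that this only ever reports the last pair on the matching line; the original pattern is simpler when each line holds at most one pair.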

Update

EDIT: The search parameters <tag> and </tag> are contained on different lines with about 100 lines of content separating them. The content is what I'm after...

Then it is a bit more tricky:

tac file | awk '/<\/tag>/ {p=1; split($0, a, "</tag>"); $0=a[1]};
                /<tag>/   {p=0; split($0, a, "<tag>");  $0=a[2]; print; exit};
                p' | tac

The idea is to reverse the file and use a flag p to track whether we are inside the block. Printing starts when </tag> appears and finishes when <tag> is reached (because we are reading the file backwards). The final tac restores the original line order.

split($0, a, "</tag>"); $0=a[1]; gets the data before </tag>
split($0, a, "<tag>" ); $0=a[2]; gets the data after <tag>

Test

Given a file a like this:

<tag> and </tag>
aaa <tag> and <b> other thing
come here
and here </tag>

some text<tag>tag is starting here
blabla
and ends here</tag>

The output will be:

$ tac a | awk '/<\/tag>/ {p=1; split($0, a, "</tag>"); $0=a[1]}; /<tag>/ {p=0; split($0, a, "<tag>"); $0=a[2]; print; exit}; p' | tac
tag is starting here
blabla
and ends here
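
If this is needed for more than one tag name, the pipeline can be wrapped in a small function. This is only a sketch: the function name last_block and its two parameters are hypothetical, and it uses index/substr instead of split so the tag name does not have to be escaped as a regex.

```shell
# last_block TAGNAME FILE (hypothetical helper, not part of the answer above)
last_block() {
    tac "$2" | awk -v o="<$1>" -v c="</$1>" '
        # Closing tag: start printing, keep only the text before it.
        index($0, c) { p = 1; $0 = substr($0, 1, index($0, c) - 1) }
        # Opening tag: keep only the text after it, print, and stop.
        index($0, o) { $0 = substr($0, index($0, o) + length(o)); print; exit }
        p' | tac
}

# Same sample data as the test file above.
sample=$(mktemp)
printf '%s\n' '<tag> and </tag>' 'aaa <tag> and <b> other thing' 'come here' \
    'and here </tag>' '' 'some text<tag>tag is starting here' 'blabla' \
    'and ends here</tag>' > "$sample"

result=$(last_block tag "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```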

Question 2

If, like me, you don't have access to tac because your sysadmin won't play ball, you can try:

grep pattern file | tail -1
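
To get only the text between the tags rather than the whole line, the same idea can be combined with a sed substitution. A minimal sketch, assuming both tags sit on the same line; it uses only POSIX sed and tail, so neither tac nor grep -P is needed. The sample file and variable name are for illustration only.

```shell
# Same sample data as the single-line test file from the first answer.
sample=$(mktemp)
printf '%s\n' '<tag> and </tag>' 'aaa <tag> and <b> other things </tag>' \
    'adsaad <tag>and  last one</tag>' > "$sample"

# Print the tagged text of every matching line, then keep the last one.
last_match=$(sed -n 's/.*<tag>\(.*\)<\/tag>.*/\1/p' "$sample" | tail -n 1)
printf '%s\n' "$last_match"
rm -f "$sample"
```

Unlike the tac version, this scans the whole file before producing any output, which may matter for very large logs.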

Question 3

An alternative to grep is sed:

tac file | sed -n '0,/<tag>\(.*\)<\/tag>/s//\1/p'

tac file prints the file in reverse order (cat backwards). sed then processes the range from line 0 to the first occurrence of <tag>.*</tag> and replaces the whole match with only the part that was inside the tags. The p flag prints the result, since -n suppresses the default output. Note that the 0,/regex/ address form is a GNU sed extension.

Edit: This does not work if <tag> and </tag> are on different lines. We can still use sed for that:

tac file | sed -n '/<\/tag>/,$p; /<tag>/q' | sed 's/.*<tag>//; s/<\/tag>.*//' | tac

Again we use tac to read the file backwards. The first sed command starts printing at the first occurrence of </tag> and quits when it finds <tag>, so only the lines in between (including the two tag lines) are printed. We then pass the result to a second sed process to strip the tags and whatever surrounds them, and finally reverse the lines again with tac.

Question 4

A little untested awk that handles multiple lines:

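awk '
    # Opening tag: discard any earlier block and start collecting afresh.
    /<tag>/   {keep = 1; retain = ""}
    # Inside a block, append the line (awk concatenates by juxtaposition, not +).
    keep      {retain = retain $0 ORS}
    # Closing tag: stop collecting; the line itself was already appended above.
    /<\/tag>/ {keep = 0}
    # Only the most recent block is still in retain at this point.
    END       {printf "%s", retain}
' filename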

We start by just reading the file; when we hit <tag>, we start keeping lines. When we hit </tag>, we stop. If we hit another <tag>, we clear the retained string and start again. If you want all the strings, print at each </tag>.
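
The "print at each closing tag" variation might look like the sketch below. The --- separator line is an arbitrary choice for illustration, and the sample file reuses the multi-line test data from the first answer. As with the script above, whole lines are kept, so the tags and any text around them on the same line appear in the output.

```shell
sample=$(mktemp)
printf '%s\n' '<tag> and </tag>' 'aaa <tag> and <b> other thing' 'come here' \
    'and here </tag>' '' 'some text<tag>tag is starting here' 'blabla' \
    'and ends here</tag>' > "$sample"

all_blocks=$(awk '
    # Opening tag: start a fresh block.
    /<tag>/   {keep = 1; retain = ""}
    # Inside a block: append the current line.
    keep      {retain = retain $0 ORS}
    # Closing tag: emit the finished block followed by a separator line.
    /<\/tag>/ {if (keep) printf "%s---\n", retain; keep = 0}
' "$sample")
printf '%s\n' "$all_blocks"
rm -f "$sample"
```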

Question 5

perl -e '$/=undef; $f=<>; push @a,$1 while($f=~m#<tag>(.*?)</tag>#msg); print $a[-1]' ex.txt

Extra Credit: Any way I can return the content contained within the two strings only if the content contains "testString"?

perl -e '$/=undef; $f=<>; push @a,$1 while($f=~m#<tag>(.*?)</tag>#msg); print $a[-1] if ($a[-1]=~/testString/);' ex.txt