I have a log file trace.log. In it I need to grep for the content contained within the strings <tag> and </tag>. There are multiple sets of this pair of strings, and I just need to return the content between the last set (in other words, from the tail of the log file).

Extra Credit: Any way I can return the content contained within the two strings only if the content contains "testString"?

Thanks for looking.

EDIT: The search parameters <tag> and </tag> are contained on different lines with about 100 lines of content separating them. The content is what I'm after...


Solution

Use tac to print the file in reverse line order, then grep -m1 to stop after the first matching line. The lookbehind and lookahead make grep -o print only the text between <tag> and </tag>.

tac a | grep -m1 -oP '(?<=tag>).*(?=</tag>)'

Test

Given this file

$ cat a
<tag> and </tag>
aaa <tag> and <b> other things </tag>
adsaad <tag>and  last one</tag>

$ tac a | grep -m1 -oP '(?<=tag>).*(?=</tag>)'
and  last one

Update

EDIT: The search parameters <tag> and </tag> are contained on different lines with about 100 lines of content separating them. The content is what I'm after...

Then it is a bit more tricky:

tac file | awk '/<\/tag>/ {p=1; split($0, a, "</tag>"); $0=a[1]};
                /<tag>/   {p=0; split($0, a, "<tag>");  $0=a[2]; print; exit};
                p' | tac

The idea is to reverse the file and use a flag p to track whether we are inside the block. Because we are reading the file backwards, printing starts when </tag> appears and finishes when <tag> is reached.

  • split($0, a, "</tag>"); $0=a[1]; gets the data before </tag>
  • split($0, a, "<tag>" ); $0=a[2]; gets the data after <tag>

Test

Given a file a like this:

<tag> and </tag>
aaa <tag> and <b> other thing
come here
and here </tag>

some text<tag>tag is starting here
blabla
and ends here</tag>

The output will be:

$ tac a | awk '/<\/tag>/ {p=1; split($0, a, "</tag>"); $0=a[1]}; /<tag>/ {p=0; split($0, a, "<tag>"); $0=a[2]; print; exit}; p' | tac
tag is starting here
blabla
and ends here
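
For the extra credit, one option is to capture the block first and only print it if it contains the string. This is a minimal sketch, not the only way to do it: the function names last_block and last_block_if are made up for illustration, and the pipeline inside is the same tac | awk | tac as above.

```shell
# last_block FILE: print the content of the last <tag>...</tag> block in FILE.
# (last_block is a hypothetical helper name.)
last_block() {
    tac "$1" | awk '/<\/tag>/ {p=1; split($0, a, "</tag>"); $0=a[1]}
                    /<tag>/   {p=0; split($0, a, "<tag>");  $0=a[2]; print; exit}
                    p' | tac
}

# last_block_if FILE STRING: print the last block only if it contains STRING.
# (last_block_if is a hypothetical helper name.)
last_block_if() {
    block=$(last_block "$1")
    case $block in
        *"$2"*) printf '%s\n' "$block" ;;   # STRING found: print the block
    esac                                    # otherwise print nothing
}
```

Called as last_block_if trace.log testString, it prints the last block when "testString" occurs somewhere in it and prints nothing otherwise.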

Other tips

If, like me, you don't have access to tac because your sysadmin won't play ball, you can try this, which returns the last matching line:

grep pattern file | tail -1
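
To make that concrete for the single-line case, and to have a stand-in for tac when the tags are on different lines, something like the following might do. It is only a sketch: last_inline and rev_lines are made-up helper names, and the sed one-liner is a commonly used way to reverse lines that holds the whole file in memory, so it is not ideal for very large logs.

```shell
# last_inline FILE: content of the last <tag>...</tag> pair that sits on one line.
# (last_inline is a hypothetical helper name.)
last_inline() {
    grep '<tag>.*</tag>' "$1" |              # lines holding a complete pair
        tail -n 1 |                          # keep only the last such line
        sed 's/.*<tag>//; s/<\/tag>.*//'     # drop the tags and the text around them
}

# rev_lines FILE: print FILE with its lines in reverse order, like tac.
# (rev_lines is a hypothetical helper name.)
rev_lines() {
    # 1!G : on every line but the first, append the hold space to the pattern space
    # h   : copy the pattern space to the hold space
    # $!d : delete the pattern space unless this is the last line, which gets printed
    sed '1!G;h;$!d' "$1"
}
```

rev_lines file can then replace tac file at the front of the multi-line pipelines.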

An alternative to grep is sed:

tac file | sed -n '0,/<tag>\(.*\)<\/tag>/s//\1/p'

tac file prints the file in reverse line order (cat backwards). sed then works on the range from line 0 to the first occurrence of <tag>.*</tag> (the 0,/regex/ address is a GNU sed extension) and replaces <tag>.*</tag> with just the part that was inside the tags. The p flag prints the result, since -n suppresses the default output. Any text on that line outside the tags is left in place.

Edit: This does not work if <tag> and </tag> are on different lines. We can still use sed for that:

tac file | sed -n '/<\/tag>/,$p; /<tag>/q' | sed 's/.*<tag>//; s/<\/tag>.*//' | tac

Again we use tac to read the file backwards. The first sed command prints from the first occurrence of </tag> and quits when it finds <tag>, so only the lines of the last block are printed. We then pass them to a second sed process to strip the tags and anything outside them, and finally reverse the lines again with tac.
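
A quick way to try this out is to wrap the pipeline in a function and feed it a small sample file. The sketch below does that; the function name last_block_sed and the sample content are just for illustration.

```shell
# last_block_sed FILE: the tac | sed | sed | tac pipeline from above in a function.
# (last_block_sed is a hypothetical helper name.)
last_block_sed() {
    # 1st sed: print from the last </tag> back to its <tag>, then quit
    # 2nd sed: strip the tags and anything outside them
    # final tac: restore the original line order
    tac "$1" |
        sed -n '/<\/tag>/,$p; /<tag>/q' |
        sed 's/.*<tag>//; s/<\/tag>.*//' |
        tac
}

# Build a small sample file and run the function on it.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<tag> and </tag>
aaa <tag> and <b> other thing
come here
and here </tag>

some text<tag>tag is starting here
blabla
and ends here</tag>
EOF

last_block_sed "$sample"
rm -f "$sample"
```

On this sample it should print the three lines "tag is starting here", "blabla" and "and ends here".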

A little untested awk that handles multiple lines:

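awk '
    /<tag>/   {keep = 1; retain = ""}      # opening tag: drop any earlier block and start over
    keep      {retain = retain $0 "\n"}    # inside a block: append the line (awk concatenates by juxtaposition)
    /<\/tag>/ {keep = 0}                   # closing tag: stop collecting
    END       {printf "%s", retain}        # the last block collected is still in retain
' filename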

We start just reading the file; when we hit the <tag>, we start keeping lines. When we hit the </tag>, we stop. If we hit another <tag>, we clear the retained string and start again. If you want all the strings, print at each </tag>.
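
If every block is wanted rather than only the last one, the same idea can print at each closing tag instead of in END. This is a rough sketch: the function name all_blocks and the "---" separator line are my own choices, and like the awk above it keeps whole lines, tags included.

```shell
# all_blocks FILE: print every <tag>...</tag> block in FILE, each followed by "---".
# (all_blocks is a hypothetical helper name; "---" is an arbitrary separator.)
all_blocks() {
    awk '
        /<tag>/   {keep = 1; retain = ""}                        # a block starts: reset the buffer
        keep      {retain = retain $0 "\n"}                      # collect lines while inside a block
        /<\/tag>/ {if (keep) printf "%s---\n", retain; keep = 0} # block ends: print it
    ' "$1"
}
```

Piping the result through a tag-stripping sed, such as the one in the earlier sed answer, would leave just the content.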

perl -e '$/=undef; $f=<>; push @a,$1 while($f=~m#<tag>(.*?)</tag>#msg); print $a[-1]' ex.txt

Extra Credit: Any way I can return the content contained within the two strings only if the content contains "testString"?

perl -e '$/=undef; $f=<>; push @a,$1 while($f=~m#<tag>(.*?)</tag>#msg); print $a[-1] if ($a[-1]=~/testString/);' ex.txt
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow