Question
I have some SGML that I'm trying to clean up by adding closing tags to the opening ones. Right now, the document has a structure like this:
<CAT>
<NAME>Daniel
<COLOR>White
<DESC>Daniel is a white cat <p>He was born in July</p><br />He's super cute.<p><br />He does not have any siblings.
<COUNTRY>USA
</CAT>
So far I can match an open tag and capture the content as a group using this regexp:
<NAME>([^\\<]+)[^<]
This works as long as the content doesn't contain any <p>, </p>, or <br /> elements.
But if I use
<DESC>([^\\<]+)[^<]
the match stops right before the first <p>.
I'm using < as the end of the pattern because none of the other opening nodes contain HTML elements that would stop the match early.
How can I make a regexp that matches the <DESC> node, including any <p>, </p>, and <br /> inside it, and ends before the <COUNTRY> node?
Solution
How about this:
<DESC>((?:</?p>|<br />|[^\\<])+)
The alternation lets <p>, </p>, and <br /> match as whole units, so the match only stops at the next < that doesn't start one of those three tags.
By the way, why aren't you allowing the backslash as a valid character?
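In case it helps to see it run, here is a minimal sketch in Python (the question doesn't say which regex engine is in use, so Python's re module is an assumption, as are the variable names). It applies the pattern to the sample from the question and appends a closing </DESC> tag. Note that [^\\<] also matches newlines, so the captured group includes the line break before <COUNTRY>; the sketch strips it.

```python
import re

# Sample document from the question.
sgml = """<CAT>
<NAME>Daniel
<COLOR>White
<DESC>Daniel is a white cat <p>He was born in July</p><br />He's super cute.<p><br />He does not have any siblings.
<COUNTRY>USA
</CAT>"""

# <p>, </p> and <br /> are consumed as whole tags; any other "<" ends the match.
DESC_RE = re.compile(r"<DESC>((?:</?p>|<br />|[^\\<])+)")

match = DESC_RE.search(sgml)
# The group also swallows the newline before <COUNTRY>, so trim it.
desc = match.group(1).rstrip("\n")


def close_desc(m):
    # Rebuild the node with a closing tag and restore the trimmed newline.
    return "<DESC>" + m.group(1).rstrip("\n") + "</DESC>\n"


closed = DESC_RE.sub(close_desc, sgml)
print(closed)
```

The same approach should carry over to the other nodes by swapping the tag name, assuming their content never contains a < other than these three tags.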