Removing everything between a tag (including the tag itself) using Regex / Eclipse

StackOverflow https://stackoverflow.com/questions/2541676

  •  23-09-2019

Question

I'm fairly new to regex, but this one is just frustrating.

I have a massive XML document with a lot of <description>blahblahblah</description> elements. I want to remove every <description>...</description> element, including the tags themselves and everything between them.

I'm using Eclipse and have tried a few regexes I found online, but none of them work.

<description>(.*?)</description>

Shouldn't that work?
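For reference, here is a minimal Java sketch of what that pattern does, assuming Eclipse's regex search follows java.util.regex (the class name and sample XML are hypothetical). On single-line input the pattern is sound, and the lazy `*?` is what keeps each match from running past the first closing tag:

```java
import java.util.regex.Pattern;

public class LazyVsGreedy {
    // The question's pattern: "*?" is a lazy (reluctant) quantifier, so each
    // match ends at the first </description> after its opening tag.
    static final Pattern LAZY = Pattern.compile("<description>(.*?)</description>");
    // Greedy variant for comparison: ".*" extends to the LAST </description>
    // on the line, swallowing everything in between.
    static final Pattern GREEDY = Pattern.compile("<description>(.*)</description>");

    // Remove every match of the pattern from the input.
    static String strip(Pattern p, String xml) {
        return p.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String xml = "<a><description>one</description><keep/><description>two</description></a>";
        System.out.println(strip(LAZY, xml));   // <a><keep/></a>
        System.out.println(strip(GREEDY, xml)); // <a></a>
    }
}
```

The greedy version also deletes `<keep/>`, which is why the lazy form in the question is the right starting point.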

EDIT:

Here is an actual element from the file:

<description><![CDATA[<center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"><th>ID</th><td>308</td></tr></table></center>]]></description>
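As a quick check, the sketch below runs the question's pattern against that exact element in plain Java (again assuming Eclipse uses java.util.regex; the class name and the wrapping <item>/<name> tags are hypothetical). The markup inside the CDATA section contains no </description>, so the lazy match ends at the real closing tag and the whole element is removed:

```java
import java.util.regex.Pattern;

public class StripCdataDescription {
    // Same pattern as in the question; DOTALL is not needed here because the
    // sample element sits on a single line.
    static final Pattern DESCRIPTION = Pattern.compile("<description>(.*?)</description>");

    // The element from the question, with the double quotes escaped for Java.
    static final String ELEMENT =
            "<description><![CDATA[<center><table><tr><th colspan='2' align='center'>"
            + "<em>Attributes</em></th></tr><tr bgcolor=\"#E3E3F3\"><th>ID</th><td>308</td></tr>"
            + "</table></center>]]></description>";

    // Remove every <description>...</description> element from the input.
    static String strip(String xml) {
        return DESCRIPTION.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical surrounding markup, to show that neighbours survive.
        String xml = "<item>" + ELEMENT + "<name>x</name></item>";
        System.out.println(strip(xml)); // <item><name>x</name></item>
    }
}
```

If this works in Java but not in Eclipse, the difference is likely in how the search is being run (newlines in the real file, or regex mode not enabled), not in the pattern itself.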

No correct solution

OTHER TIPS

I'm not familiar with Eclipse, but I would expect its regex search facility to use Java's built-in regex flavor. You probably just need to check a box labeled "DOTALL" or "single-line" or something similar, or you can add the corresponding inline modifier to the regex:

(?s)<description>(.*?)</description>

That allows the dot (.) to match newlines, which it does not do by default.
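A small Java sketch of the difference, assuming the real file wraps a description across lines (the sample text and class name are hypothetical). The inline (?s) modifier and the Pattern.DOTALL compile flag turn on the same mode:

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // Default mode: "." does not match line terminators such as '\n'.
    static final Pattern DEFAULT = Pattern.compile("<description>(.*?)</description>");
    // The inline (?s) flag enables DOTALL for the whole pattern...
    static final Pattern INLINE_S = Pattern.compile("(?s)<description>(.*?)</description>");
    // ...which is equivalent to passing Pattern.DOTALL when compiling.
    static final Pattern FLAG = Pattern.compile("<description>(.*?)</description>", Pattern.DOTALL);

    public static void main(String[] args) {
        String xml = "<item>\n<description>line one\nline two</description>\n</item>";
        // Nothing is removed: the match cannot cross the '\n' inside the element.
        System.out.println(DEFAULT.matcher(xml).replaceAll("").equals(xml)); // true
        // Both DOTALL variants remove the element, leaving "<item>\n\n</item>".
        System.out.println(INLINE_S.matcher(xml).replaceAll("").equals("<item>\n\n</item>")); // true
        System.out.println(FLAG.matcher(xml).replaceAll("").equals("<item>\n\n</item>"));     // true
    }
}
```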

EDIT: This assumes there are newlines within the <description> element, which is the only reason I can think of that your regex wouldn't work. I'm also assuming you really are doing a regex search; is that automatic in Eclipse, or do you have to choose between regex and literal searching?
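The regex-versus-literal question matters because the same query text means different things in the two modes. A small Java illustration (class and method names are hypothetical; it does not model Eclipse's search dialog itself):

```java
import java.util.regex.Pattern;

public class LiteralVsRegexSearch {
    static final String QUERY = "<description>(.*?)</description>";

    // Literal search: the characters "(.*?)" must appear verbatim in the text.
    static boolean literalFind(String text) {
        return text.contains(QUERY);
    }

    // Regex search: "(.*?)" means "any characters, as few as possible".
    static boolean regexFind(String text) {
        return Pattern.compile(QUERY).matcher(text).find();
    }

    public static void main(String[] args) {
        String xml = "<description>blahblahblah</description>";
        System.out.println(literalFind(xml)); // false
        System.out.println(regexFind(xml));   // true
    }
}
```

A literal search for the pattern silently finds nothing, which looks exactly like "the regex doesn't work".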

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow