Question

I've found a working solution to remove HTML tags (here, opening h2 tags) as follows:

<cfset test = rereplace(blah, "<h2[^>]*>", "", "ALL") />

I need to generate an XML file and rename some tags after they have been escaped with XMLFormat(), so I tried the following:

<!--- example string --->
<cfset blah = '&lt;h1&gt;title 1&lt;/h1&gt;
               &lt;h2 style="color: black;"&gt;title 2&lt;/h2&gt;
               &lt;h3&gt;test&lt;/h3&gt;' />

<cfset test = rereplace(blah, "&lt;h2[^>]*&gt;", "<title_2>", "ALL") />

This renames the tag as I want, but the match doesn't stop at the &gt; that closes the tag. I also tried escaping the ampersand, as in \&lt;h2[^>]*\&gt;, but that doesn't help either.


Solution

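You cannot use > (as in [^>]) after you have formatted the XML, because XMLFormat() has turned every > into &gt;, so that character no longer exists anywhere in the text. With no > to stop at, [^>]* runs to the end of the string and then backtracks only as far as the last &gt;, so the match extends to the last &gt; in the string instead of the one that closes the h2 tag.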
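A minimal Java sketch of that behaviour. It uses java.util.regex as a stand-in for ColdFusion's REReplace() engine, on the assumption that both treat a plain [^>]* the same way (any backtracking engine should). The class and constant names are made up for illustration, and the sample is the question's example joined onto one line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyOvershootDemo {
    // Hypothetical sample: the question's markup before escaping...
    static final String RAW =
        "<h1>title 1</h1><h2 style=\"color: black;\">title 2</h2><h3>test</h3>";
    // ...and the same markup after escaping, with every > replaced by &gt;.
    static final String ESCAPED =
        "&lt;h1&gt;title 1&lt;/h1&gt;"
        + "&lt;h2 style=\"color: black;\"&gt;title 2&lt;/h2&gt;"
        + "&lt;h3&gt;test&lt;/h3&gt;";

    // Returns the first substring matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // On raw HTML, [^>]* stops at the first '>', so only the opening tag matches.
        System.out.println(firstMatch("<h2[^>]*>", RAW));
        // After escaping there is no '>' at all: [^>]* runs to the end of the
        // string and backtracks only as far as the LAST "&gt;".
        System.out.println(firstMatch("&lt;h2[^>]*&gt;", ESCAPED));
        // '\&' is just a literal '&', so escaping the ampersand changes nothing.
        System.out.println(firstMatch("\\&lt;h2[^>]*\\&gt;", ESCAPED));
    }
}
```

The last two lines print the same text, running from &lt;h2 through the final &lt;/h3&gt;, which is why the replacement eats the rest of the string.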

This matches and replaces the opening h2 tag, but not the closing one:

<cfset test = blah.replaceAll('&lt;h2((?:[^&]+|&(?!gt))*)&gt;','<title_2$1>') />

The key part of that is: (?:[^&]+|&(?!gt))*

This repeatedly matches either a run of non-ampersand characters or an ampersand that is not followed by gt, so it consumes everything up to the first &gt;, which ends the tag.
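ColdFusion strings are java.lang.String objects, so blah.replaceAll(...) calls Java's String.replaceAll, and the same pattern can be tried in plain Java. In this sketch the class name and the sample strings are hypothetical; the regex and replacement are the ones from the answer.

```java
class RenameOpeningTag {
    // Hypothetical sample: the question's example after escaping, on one line.
    static final String ESCAPED =
        "&lt;h1&gt;title 1&lt;/h1&gt;"
        + "&lt;h2 style=\"color: black;\"&gt;title 2&lt;/h2&gt;"
        + "&lt;h3&gt;test&lt;/h3&gt;";

    // Renames only the opening h2 tag, keeping its attributes via group 1.
    // (?:[^&]+|&(?!gt))* consumes runs of non-'&' characters, or an '&' that
    // does not begin "gt", so it stops right before the first "&gt;".
    static String renameOpening(String escaped) {
        return escaped.replaceAll("&lt;h2((?:[^&]+|&(?!gt))*)&gt;", "<title_2$1>");
    }

    public static void main(String[] args) {
        System.out.println(renameOpening(ESCAPED));
        // Other entities inside the tag, such as &quot;, are stepped over too.
        System.out.println(renameOpening("&lt;h2 class=&quot;x&quot;&gt;t&lt;/h2&gt;"));
    }
}
```

The closing &lt;/h2&gt; is left as it was, which is the limitation the next pattern addresses.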

To replace the whole element (opening tag, content, and closing tag) you need:

<cfset test = blah.replaceAll('&lt;h2((?:[^&]+|&(?!gt))*)&gt;((?:[^&]+|&(?!lt;/h2))*)&lt;/h2&gt;','<title_2$1>$2</title_2>') />

This applies the same idea a second time: group 1 captures the attributes of the opening tag, and group 2 captures the content up to the closing &lt;/h2&gt;. Both are then reinserted as $1 and $2.
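The whole-element pattern can be checked the same way, since CFML's replaceAll is Java's String.replaceAll. The class name and sample strings are hypothetical; the pattern and replacement are copied from the answer.

```java
class RenameWholeElement {
    // Hypothetical sample: the question's example after escaping, on one line.
    static final String ESCAPED =
        "&lt;h1&gt;title 1&lt;/h1&gt;"
        + "&lt;h2 style=\"color: black;\"&gt;title 2&lt;/h2&gt;"
        + "&lt;h3&gt;test&lt;/h3&gt;";

    // Group 1: attributes of the opening tag (everything up to the first "&gt;").
    // Group 2: content, i.e. runs of non-'&' characters or an '&' that does not
    // begin "lt;/h2", so it stops right before the closing tag.
    static final String H2_ELEMENT =
        "&lt;h2((?:[^&]+|&(?!gt))*)&gt;((?:[^&]+|&(?!lt;/h2))*)&lt;/h2&gt;";

    static String renameElement(String escaped) {
        return escaped.replaceAll(H2_ELEMENT, "<title_2$1>$2</title_2>");
    }

    public static void main(String[] args) {
        System.out.println(renameElement(ESCAPED));
        // Escaped content such as &amp; inside the element is kept as-is.
        System.out.println(renameElement("&lt;h2&gt;a &amp; b&lt;/h2&gt;"));
    }
}
```

Keeping &amp; untouched matters here, because the output is meant to be XML and an unescaped & would make it malformed.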

At this point you are entering territory where regex is probably not the best tool for the job. Can you make these changes with an XML parser before formatting the text?
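A minimal sketch of the parser route, written with the JDK's built-in DOM API rather than ColdFusion's own XML functions so that all examples stay in one language. It assumes the markup is still unescaped and well-formed; the fragment is wrapped in a hypothetical root element because XML needs a single root, and renameElements is a made-up helper name. Any escaping would then be applied afterwards, to text that already has the new tag names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RenameWithDom {
    // Renames every <from> element to <to> and returns the serialized document.
    static String renameElements(String xml, String from, String to) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            // getElementsByTagName returns a live list, so copy it before renaming.
            NodeList found = doc.getElementsByTagName(from);
            List<Node> targets = new ArrayList<>();
            for (int i = 0; i < found.getLength(); i++) {
                targets.add(found.item(i));
            }
            for (Node n : targets) {
                doc.renameNode(n, null, to); // attributes and children are kept
            }

            // Serialize without the <?xml ...?> declaration.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) { // parser, I/O, or transformer failure
            throw new IllegalStateException("could not rename elements", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><h1>title 1</h1>"
            + "<h2 style=\"color: black;\">title 2</h2><h3>test</h3></root>";
        System.out.println(renameElements(xml, "h2", "title_2"));
    }
}
```

Because the rename works on parsed nodes, attributes and entities are handled by the parser and serializer instead of by pattern matching.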

OTHER TIPS

<cfset test = rereplace(blah, "&lt;h2[^>]*&gt;", "<title_2>", "ALL") /> 
<!--- there is no [^>] for you to match --->

should be

<cfset test = rereplace(blah, "&lt;h2[^&]*&gt;", "<title_2>", "ALL") />

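The negated class does not make the match lazy: [^&]* is still greedy. It works because [^&]* cannot consume an ampersand, so the match can never run past the first &gt;. The same restriction means it fails if the tag itself contains an entity, such as &quot; in an escaped attribute value.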
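A short sketch of both sides of that tip. It uses Java's String.replaceAll as a stand-in for REReplace(), assuming both engines treat a plain [^&]* the same way; the class name and sample strings are hypothetical.

```java
class NegatedClassDemo {
    // Hypothetical sample: the question's example after escaping, on one line.
    static final String ESCAPED =
        "&lt;h1&gt;title 1&lt;/h1&gt;"
        + "&lt;h2 style=\"color: black;\"&gt;title 2&lt;/h2&gt;"
        + "&lt;h3&gt;test&lt;/h3&gt;";

    // The tip's pattern: [^&]* is greedy, but it cannot cross an '&',
    // so it can only reach the first "&gt;" after "&lt;h2".
    static String renameOpening(String escaped) {
        return escaped.replaceAll("&lt;h2[^&]*&gt;", "<title_2>");
    }

    public static void main(String[] args) {
        System.out.println(renameOpening(ESCAPED));
        // An entity inside the tag blocks the match, so nothing is replaced.
        System.out.println(renameOpening("&lt;h2 class=&quot;x&quot;&gt;t&lt;/h2&gt;"));
    }
}
```

Note that this replacement also drops the tag's attributes, since nothing is captured.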

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow