Question

So I have this regex:

&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)

It matches every & in a block of text that is not already the start of an entity such as &amp; or &#38;.

However, if I have this string:

& & & & & <a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &</a>
---------------------------------------------------------^

... the marked & also gets targeted, and as I'm using the regex to replace the &'s with &amp;, the URL then becomes invalid:

http://localhost/MyFile.aspx?mything=2&amp;this=4

D'oh! Does anyone know of a better way of encoding &'s that are not in a URL?

Solution

No, the URL does not become invalid. The HTML code becomes:

<a href="http://localhost/MyFile.aspx?mything=2&amp;this=4">

This means that the markup, which was not correctly encoded before, is now correctly encoded. The browser decodes &amp; back to & when it reads the attribute value, so the actual URL that the link contains is:

http://localhost/MyFile.aspx?mything=2&this=4

So it's not a problem that the & character in the code gets encoded; on the contrary, the code is now correct.
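
To make the round trip concrete, here is a minimal sketch in Python (chosen only for illustration; the question does not say which language is in use). It applies the regex from the question and then decodes the href value the way an HTML parser would. The function name encode_bare_ampersands is hypothetical, and html.unescape stands in for the browser's attribute decoding.

```python
import re
from html import unescape

# The pattern from the question: an & that does not already start an entity.
BARE_AMP = re.compile(r"&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)")


def encode_bare_ampersands(text: str) -> str:
    """Replace every bare & with &amp; (hypothetical helper name)."""
    return BARE_AMP.sub("&amp;", text)


source = '<a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &</a>'
encoded = encode_bare_ampersands(source)

# Pull the raw attribute value out of the encoded markup.
href = re.search(r'href="([^"]*)"', encoded).group(1)

print(href)            # http://localhost/MyFile.aspx?mything=2&amp;this=4
print(unescape(href))  # http://localhost/MyFile.aspx?mything=2&this=4
```

Because the lookahead skips anything that already looks like an entity, running encode_bare_ampersands a second time on the encoded text leaves it unchanged.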

OTHER TIPS

In PowerShell this could be done as:

$String ='& & & & & <a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &</a>'
$String -replace '(?<!<[^<>]*)&', "&amp;"

yields

&amp; &amp; &amp; &amp; &amp; <a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &amp;</a>

Dissecting the regex:

  1. The negative lookbehind (?<! .... ) first checks that the & is not inside a tag, i.e. that it is not preceded by a < with no further < or > in between.
  2. Every & that passes that check is then replaced with &amp;.
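
The lookbehind above is variable-length, which the .NET regex engine used by PowerShell accepts but Python's re module does not. As a rough sketch of the same idea in Python, the version below swaps in a different technique: it splits the input on tags and encodes only the text between them. The name encode_outside_tags is hypothetical, and the sketch assumes every tag is properly closed with >.

```python
import re

# Splitting on a capturing group keeps the tags in the result list:
# even indexes hold text between tags, odd indexes hold the tags themselves.
TAG = re.compile(r"(<[^<>]*>)")


def encode_outside_tags(html: str) -> str:
    """Encode & only in the text between tags (hypothetical helper name)."""
    parts = TAG.split(html)
    for i in range(0, len(parts), 2):
        parts[i] = parts[i].replace("&", "&amp;")
    return "".join(parts)


sample = '& & & & & <a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &</a>'
print(encode_outside_tags(sample))
```

Like the PowerShell one-liner, this replaces every & outside a tag, including one that already begins an entity, so it is only safe on text that has not been encoded yet.
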
Licensed under: CC-BY-SA with attribution