Question

I was able to use this question as a starting point for parsing an "mht" file, but the "3D" in the anchor tags (e.g. <a href=3D"[my anchor]">[anchor text]</a>) breaks all the internal links and embedded images. I can have the parser replace "=3D" with just "=" (e.g. <a href="[my anchor]">[anchor text]</a>), and it appears to work fine, but I want to understand the purpose of that "meta markup".

Why does exporting from ".docx" to ".mht" add "3D" to the right-hand sides of most (if not all) of the HTML attributes? Is there a better way to handle them, or a better regex to use when replacing them?


Solution

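The =3D is a result of quoted-printable encoding, one of the MIME content transfer encodings (an ".mht" file is a MIME archive). In quoted-printable, "=" is the escape character and is followed by two hex digits giving a byte value; 3D is the hex code for "=" itself, so every literal "=" in the HTML has to be written as "=3D". Replacing only "=3D" will miss other escaped bytes and the soft line breaks (a "=" at the very end of a line), so decoding the part properly is safer than a regex. It shouldn't be too hard to find a Java library for decoding quoted-printable data; Apache Commons Codec (QuotedPrintableCodec) and JavaMail (MimeUtility.decode) both handle it.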
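To illustrate what such a decoder does, here is a minimal sketch using only the Java standard library. The class name QuotedPrintableSketch and its decode method are made up for this example and are not part of any library. It assumes the input is the already-extracted body of a single MIME part, and that you know that part's charset from its Content-Type header. For real use, a maintained library is probably the better choice.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; not a library class.
final class QuotedPrintableSketch {

    // Decodes a quoted-printable body into text using the given charset.
    // Assumes the input contains only 7-bit ASCII characters, as
    // quoted-printable data normally does.
    static String decode(String input, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n = input.length();
        int i = 0;
        while (i < n) {
            char c = input.charAt(i);
            if (c != '=') {
                out.write(c); // ordinary character, copied through as one byte
                i++;
                continue;
            }
            // Soft line break: "=" followed by CRLF (or a bare LF) is dropped.
            if (i + 2 < n && input.charAt(i + 1) == '\r' && input.charAt(i + 2) == '\n') {
                i += 3;
                continue;
            }
            if (i + 1 < n && input.charAt(i + 1) == '\n') {
                i += 2;
                continue;
            }
            // Escaped byte: "=" followed by two hex digits, e.g. "=3D" is '='.
            if (i + 2 < n) {
                int hi = Character.digit(input.charAt(i + 1), 16);
                int lo = Character.digit(input.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 3;
                    continue;
                }
            }
            // Malformed escape: keep the "=" as-is rather than failing.
            out.write('=');
            i++;
        }
        return new String(out.toByteArray(), charset);
    }
}
```

For example, QuotedPrintableSketch.decode("<a href=3D\"#top\">", StandardCharsets.UTF_8) gives back <a href="#top">. Decoding to bytes first and only then applying the charset matters because a multi-byte character such as "é" arrives as two escapes (=C3=A9 in UTF-8).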

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow