From escaped html -> to regular html? - Python
21-09-2019
Question
I used BeautifulSoup to handle XML files that I have collected through a REST API.
The responses contain HTML code, but BeautifulSoup escapes all the HTML tags so it can be displayed nicely.
Unfortunately I need the HTML code.
How would I go about transforming the escaped HTML back into proper markup?
Help would be very much appreciated!
Solution
I think you want xml.sax.saxutils.unescape from the Python standard library.
E.g.:
>>> from xml.sax import saxutils as su
>>> s = '&lt;foo&gt;bar&lt;/foo&gt;'
>>> su.unescape(s)
'<foo>bar</foo>'
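One caveat worth sketching: by default saxutils.unescape only converts &amp;lt;, &amp;gt; and &amp;amp;. If the responses may also contain other entities such as &amp;quot;, either pass them in as a dict or, on Python 3.4+, use html.unescape, which handles named and numeric character references. The sample string below is made up for illustration and is not taken from the question.

```python
import html
from xml.sax import saxutils

# Made-up sample input containing four different entities.
escaped = "&lt;p class=&quot;intro&quot;&gt;Fish &amp; chips&lt;/p&gt;"

# Default behaviour: only &lt;, &gt; and &amp; are converted, &quot; is left alone.
partly = saxutils.unescape(escaped)

# Extra entities can be supplied as a mapping of entity -> replacement.
fully_sax = saxutils.unescape(escaped, {"&quot;": '"'})

# html.unescape resolves all named and numeric character references.
fully_html = html.unescape(escaped)

print(partly)
print(fully_sax)
print(fully_html)
```

If the escaped text only ever contains the three basic XML entities, the two approaches should give the same result.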
OTHER TIPS
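You could try the urllib module. It has an unquote() function (urllib.parse.unquote in Python 3) that might suit your needs, although it decodes URL percent-encoding such as %3C rather than HTML entities such as &lt;.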
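To see whether unquote() fits, here is a small sketch comparing a percent-encoded string with an entity-escaped one. Both sample strings are invented for illustration; the import path assumes Python 3.

```python
from urllib.parse import unquote

# Invented samples: the same markup, once percent-encoded, once entity-escaped.
percent_encoded = "%3Cfoo%3Ebar%3C%2Ffoo%3E"
entity_escaped = "&lt;foo&gt;bar&lt;/foo&gt;"

decoded = unquote(percent_encoded)    # percent escapes are decoded
untouched = unquote(entity_escaped)   # contains no percent escapes, so nothing changes

print(decoded)
print(untouched)
```

So unquote() only helps if the API returns URL-encoded text; for entity-escaped HTML it leaves the input as it is.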
Edit: on second thought (and after rereading your question), you might just want to use string.replace(), like so:
# str.replace returns a new string, so reassign the result
string = string.replace('&lt;', '<')
string = string.replace('&gt;', '>')
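If you go the manual route, the order of the replacements matters once &amp;amp; is involved. A minimal sketch, using a hypothetical helper name unescape_basic that is not part of any library:

```python
def unescape_basic(text):
    """Undo the three basic XML escapes (hypothetical helper, for illustration)."""
    text = text.replace("&lt;", "<")
    text = text.replace("&gt;", ">")
    # &amp; must be handled last, otherwise "&amp;lt;" would be turned into "<"
    # instead of the literal text "&lt;".
    return text.replace("&amp;", "&")


print(unescape_basic("&lt;b&gt;x&lt;/b&gt;"))
print(unescape_basic("&amp;lt;"))
```

This covers only those three entities; anything else (for example &amp;quot; or numeric references) is left untouched.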
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow