Question

import requests
from lxml import etree

r = requests.get('...', allow_redirects=True)
pagetext = r.text
tree = etree.HTML(pagetext)
node = tree.xpath('...')[0]
out = str(etree.tostring(node, method='text', encoding='UTF8'))
print(out)  # prints something like "\x00(\x00A\x04>\x042\x04<\x045\x04A"

I've tried calling .encode('UTF-8') on different parts of the strings, but still no luck :(


Solution

That's not UTF-8.

>>> b"\x00(\x00A\x04>\x042\x04<\x045\x04A".decode('utf-16be')
'(Aовмес'

Note that "utf-16be" was chosen because it fits your sample data as pasted; the real data is more likely UTF-16LE, with the sample probably starting one byte into the stream.
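To see why UTF-16LE is the more plausible guess, here is a minimal sketch that decodes the sample both ways. It assumes the pasted sample starts one byte late, so the first byte (and the unpaired last byte) are dropped before decoding as little-endian; the variable names are only for illustration.

```python
# Sample bytes exactly as printed in the question.
sample = b"\x00(\x00A\x04>\x042\x04<\x045\x04A"

# Read as big-endian, the sample decodes without errors,
# but the Latin "A" in the middle of Cyrillic text looks suspicious.
as_be = sample.decode("utf-16be")

# Assumption: the sample is misaligned by one byte. Dropping the first
# byte and the now-unpaired last byte leaves whole little-endian code units.
as_le = sample[1:-1].decode("utf-16le")

print(as_be)
print(as_le)
```

The little-endian reading yields an all-Cyrillic run after the parenthesis, which looks more like real text. If that holds for the whole page, one option is to decode r.content with the right codec yourself (or set r.encoding before reading r.text) rather than calling .encode('UTF-8') on the result.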

Licensed under: CC-BY-SA with attribution