Question

In a web page's source I can see a word like abac%c3%a0, which the browser (Chrome) displays as abacà.
Now, I have downloaded the page using urllib2 and I am parsing the page source with Python (2.7 on Mac OS X) to extract some keywords. I would like to get the accented character instead of %c3%a0, but using str.decode("utf8") did not work (I tried it because those looked like the UTF-8 bytes \xc3\xa0).

What should I do to add the accented word to a dictionary?

By the way, the HTML page has no indication of its encoding anywhere in the source.

Thanks.


Solution

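The characters have been URL-encoded (percent-encoded; are they part of a URL?), which you can undo using urllib.unquote. Calling str.decode("utf8") directly does nothing here because "abac%c3%a0" is still plain ASCII text. unquote first turns each %XX escape back into the byte it stands for (giving 'abac\xc3\xa0'), and only then does .decode("utf8") produce u'abac\xe0', i.e. abacà.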
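A minimal sketch of the two steps, assuming the keyword has already been pulled out of the page as a percent-encoded string. The helper name `decode_keyword` and the `keywords` counting dictionary are hypothetical, for illustration only. On Python 2.7 `urllib.unquote` returns a byte string that still needs `.decode("utf-8")`; on Python 3 the function lives in `urllib.parse` and decodes UTF-8 itself.

```python
import sys

if sys.version_info[0] >= 3:
    from urllib.parse import unquote

    def decode_keyword(raw):
        # Hypothetical helper. Python 3: unquote() turns the %XX escapes
        # into bytes and decodes them as UTF-8 in one step
        return unquote(raw, encoding="utf-8")
else:
    from urllib import unquote

    def decode_keyword(raw):
        # Hypothetical helper. Python 2.7: unquote() returns a byte str such
        # as 'abac\xc3\xa0', which must then be decoded from UTF-8 to unicode
        return unquote(raw).decode("utf-8")

# Hypothetical keyword dictionary: count occurrences of each decoded word
keywords = {}
word = decode_keyword("abac%c3%a0")
keywords[word] = keywords.get(word, 0) + 1
```

The decoded word (a `unicode` object on 2.7, a `str` on 3) can be used as a dictionary key like any other string.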

Licensed under: CC-BY-SA with attribution