There are multiple issues:
- non-ASCII characters in a string literal: Python 2 requires an encoding declaration at the top of the module in this case
- you should urlencode the url path (u"Stanislav_Šesták" -> "Stanislav_%C5%A0est%C3%A1k")
- you are printing bytes received from the web to your terminal. Unless both use the same character encoding, you might see garbage instead of some characters
- to interpret HTML, you should probably use an HTML parser
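For the last point in the list, here is a minimal sketch of what using an HTML parser could look like. It is written for Python 3 and uses the standard library's html.parser; the names TitleParser and extract_title and the sample markup are made up for illustration, and a real page may call for a more robust parser.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside <title> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def extract_title(html_text):
    """Return the page title from already-decoded HTML text."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip()


# Made-up sample markup, not a real Wikipedia response.
sample = "<html><head><title>Stanislav Šesták - Wikipedia</title></head><body></body></html>"
page_title = extract_title(sample)
```

Note that the parser is fed Unicode text, not raw bytes, so the response has to be decoded first.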
Here's a code example that takes into account the above remarks:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2 code (urllib2, print statement)
import cgi
import urllib
import urllib2

wiki_title = u"Stanislav_Šesták"
# percent-encode the UTF-8 bytes of the title
url_path = urllib.quote(wiki_title.encode('utf-8'))
r = urllib2.urlopen("https://en.wikipedia.org/wiki/" + url_path)
# get the character encoding from the Content-Type header
_, params = cgi.parse_header(r.headers.get('Content-Type', ''))
encoding = params.get('charset')
content = r.read()  # bytes
unicode_text = content.decode(encoding or 'utf-8')
print unicode_text  # if it fails, set PYTHONIOENCODING
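The example above targets Python 2. As a rough sketch of the same two steps (percent-encoding the path and decoding the body by its declared charset) in Python 3, something like the following should work. The helper names build_url and decode_body are hypothetical, and the network call is left out so the snippet can be run offline.

```python
from email.message import Message
from urllib.parse import quote


def build_url(wiki_title, base="https://en.wikipedia.org/wiki/"):
    """Percent-encode the title; quote() encodes str input as UTF-8 first."""
    return base + quote(wiki_title)


def decode_body(raw_bytes, content_type, fallback="utf-8"):
    """Decode bytes using the charset from a Content-Type header value.

    Falls back to `fallback` (an assumption, as in the Python 2 code)
    when the header declares no charset.
    """
    header = Message()
    header["Content-Type"] = content_type
    encoding = header.get_content_charset() or fallback
    return raw_bytes.decode(encoding)


url = build_url("Stanislav_Šesták")
# url == "https://en.wikipedia.org/wiki/Stanislav_%C5%A0est%C3%A1k"
```

With a live response r from urllib.request.urlopen(url), r.headers.get_content_charset() performs the same charset lookup, so decode_body is mainly useful when only the raw header string is available.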