How to percent-encode URL parameters in Python?
Question
If I do
url = "http://example.com?p=" + urllib.quote(query)
- It doesn't encode
/
to%2F
(breaks OAuth normalization) - It doesn't handle Unicode (it throws an exception)
Is there a better library?
Solution
From the docs:
urllib.quote(string[, safe])
Replace special characters in string using the %xx escape. Letters, digits, and the characters '_.-' are never quoted. By default, this function is intended for quoting the path section of the URL.The optional safe parameter specifies additional characters that should not be quoted — its default value is '/'
That means passing '' for safe will solve your first issue:
>>> urllib.quote('/test')
'/test'
>>> urllib.quote('/test', safe='')
'%2Ftest'
About the second issue, there is a bug report about it here. Apparently it was fixed in python 3. You can workaround it by encoding as utf8 like this:
>>> query = urllib.quote(u"Müller".encode('utf8'))
>>> print urllib.unquote(query).decode('utf8')
Müller
By the way have a look at urlencode
Note that urllib.quote
moved to urllib.parse.quote
in Python3
OTHER TIPS
In Python 3, urllib.quote
has been moved to urllib.parse.quote
and it does handle unicode by default.
>>> from urllib.parse import quote
>>> quote('/test')
'/test'
>>> quote('/test', safe='')
'%2Ftest'
>>> quote('/El Niño/')
'/El%20Ni%C3%B1o/'
My answer is similar to Paolo's answer.
I think module requests
is much better. It's based on urllib3
.
You can try this:
>>> from requests.utils import quote
>>> quote('/test')
'/test'
>>> quote('/test', safe='')
'%2Ftest'
If you're using django, you can use urlquote:
>>> from django.utils.http import urlquote
>>> urlquote(u"Müller")
u'M%C3%BCller'
Note that changes to Python since this answer was published mean that this is now a legacy wrapper. From the Django 2.1 source code for django.utils.http:
A legacy compatibility wrapper to Python's urllib.parse.quote() function.
(was used for unicode handling on Python 2)
It is better to use urlencode
here. Not much difference for single parameter but IMHO makes the code clearer. (It looks confusing to see a function quote_plus
! especially those coming from other languates)
In [21]: query='lskdfj/sdfkjdf/ksdfj skfj'
In [22]: val=34
In [23]: from urllib.parse import urlencode
In [24]: encoded = urlencode(dict(p=query,val=val))
In [25]: print(f"http://example.com?{encoded}")
http://example.com?p=lskdfj%2Fsdfkjdf%2Fksdfj+skfj&val=34
Docs
urlencode: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlencode
quote_plus: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote_plus