Using python's urllib.quote_plus on utf-8 strings with 'safe' arguments

https://stackoverflow.com/questions/22415345

15-06-2023

Question

I have a unicode string in Python code:

name = u'Mayte_Martín'

I would like to use it in a SPARQL query, which means I have to encode the string as 'utf-8' and pass it through urllib.quote_plus or requests.quote. However, both quote functions behave differently depending on whether the 'safe' argument is passed, as shown below.

from urllib import quote_plus

Without 'safe' argument:

quote_plus(name.encode('utf-8'))
Output: 'Mayte_Mart%C3%ADn'

With 'safe' argument:

quote_plus(name.encode('utf-8'), safe=':/')
Output: 
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-164-556248391ee1> in <module>()
----> 1 quote_plus(v, safe=':/')

/usr/lib/python2.7/urllib.pyc in quote_plus(s, safe)
   1273         s = quote(s, safe + ' ')
   1274         return s.replace(' ', '+')
-> 1275     return quote(s, safe)
   1276 
   1277 def urlencode(query, doseq=0):

/usr/lib/python2.7/urllib.pyc in quote(s, safe)
   1264         safe = always_safe + safe
   1265         _safe_quoters[cachekey] = (quoter, safe)
-> 1266     if not s.rstrip(safe):
   1267         return s
   1268     return ''.join(map(quoter, s))

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

The problem seems to be in the rstrip call. I tried encoding the 'safe' argument as well and calling it as...

quote_plus(name.encode('utf-8'), safe=u':/'.encode('utf-8'))

But that did not solve the issue. What could be the issue here?

Solution

I'm answering my own question, so that it may help others who face the same issue.

This particular issue arises when you make the following import in the current workspace before executing anything else.

from __future__ import unicode_literals

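With that import, the literal ':/' becomes a unicode string. Inside quote, s.rstrip(safe) then combines the UTF-8 byte string with a unicode argument, so Python 2 tries to decode the bytes as ASCII and fails on byte 0xc3. That makes the import incompatible with the following sequence of code.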

from urllib import quote_plus

name = u'Mayte_Martín'
quote_plus(name.encode('utf-8'), safe=':/')

The same code without importing unicode_literals works fine.
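
The byte and position reported in the traceback can be checked by doing by hand the ASCII decode that Python 2 attempts implicitly when a byte string meets a unicode 'safe' argument. This is only a minimal sketch; the helper name ascii_decode_error is hypothetical and not part of urllib.

```python
# -*- coding: utf-8 -*-
# Sketch: perform explicitly the ASCII decode that Python 2 attempts
# implicitly when a byte string is combined with a unicode string.

# UTF-8 bytes, the same value that is passed to quote_plus in the question.
encoded_name = u'Mayte_Martín'.encode('utf-8')


def ascii_decode_error(raw_bytes):
    """Hypothetical helper: return the ASCII decode error message, or None."""
    try:
        raw_bytes.decode('ascii')
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


message = ascii_decode_error(encoded_name)
print(message)
```

The message should name byte 0xc3 at position 10, which is the first byte of the encoded 'í'.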

OTHER TIPS

According to this bug, here is the workaround:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from urllib import quote_plus
name = u'Mayte_Martín'
quote_plus(name.encode('utf-8'), safe=':/'.encode('utf-8'))

You must encode both arguments passed to quote or quote_plus as utf-8.
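
One way to apply this consistently is a small wrapper that coerces both arguments to UTF-8 bytes before quoting. The sketch below is an illustration under assumptions, not part of the original answer: the name quote_plus_utf8 is hypothetical, the explicit u'' prefixes stand in for what unicode_literals does to plain literals, and the try/except import is only there so that the same file also runs on Python 3, where quote_plus lives in urllib.parse.

```python
# -*- coding: utf-8 -*-
try:
    from urllib import quote_plus        # Python 2
except ImportError:
    from urllib.parse import quote_plus  # Python 3


def quote_plus_utf8(text, safe=u':/'):
    """Hypothetical wrapper: encode both arguments to UTF-8 bytes, then quote."""
    # On Python 2, bytes is str, so unicode values are the ones that get encoded.
    if not isinstance(text, bytes):
        text = text.encode('utf-8')
    if not isinstance(safe, bytes):
        safe = safe.encode('utf-8')
    return quote_plus(text, safe=safe)


name = u'Mayte_Martín'
print(quote_plus_utf8(name))  # Mayte_Mart%C3%ADn
```

Because the wrapper never passes a unicode 'safe' value to quote_plus, it does not matter whether unicode_literals is in effect at the call site.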

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import urllib
name = u'Mayte_Martín'
print urllib.quote_plus(name.encode('utf-8'), safe=':/')

works without problem for me (Py 2.7.9, Debian)

(I don't know the answer, but I don't have enough reputation to post this as a comment.)

Licensed under: CC-BY-SA with attribution
