M2Crypto RSA Unicode Strings Encoded Differently Than Byte Strings

https://stackoverflow.com/questions/12269412

30-06-2021

Question

I was encrypting email addresses for an external website's API using Python M2Crypto's RSA with PKCS1 padding. When I passed a unicode string, the API returned no results for the encrypted email. When I passed str(unicode_email) instead, I received the correct information.

I was under the impression that both the unicode and the byte-string representations should work here. Does anyone know why the unicode version fails?

Code for reference:

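from M2Crypto import RSA

email = u'email@example.com'  # unicode object: the API rejects the encrypted result
email = str(email)            # byte string: the API accepts the encrypted result
rsa = RSA.load_pub_key('rsa_pubkey.pem')
# public_encrypt works on byte strings; the ciphertext is then base64-encoded (Python 2 str codec)
result = rsa.public_encrypt(email, RSA.pkcs1_padding).encode('base64')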

Solution

The M2Crypto module deals exclusively with opaque bytes (values between 0 and 255), represented by Python 2's str type.
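Because M2Crypto only accepts bytes, one defensive pattern is to normalize every input before it reaches the cipher. The sketch below is written in Python 3, where text is `str` and bytes are `bytes` (Python 2 calls these `unicode` and `str`). The helper name `ensure_bytes` and its UTF-8 default are illustrative assumptions and not part of M2Crypto.

```python
def ensure_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding text explicitly (hypothetical helper)."""
    if isinstance(value, bytes):
        # Already opaque bytes: pass through unchanged.
        return value
    if isinstance(value, str):
        # Text must be encoded with an explicit, agreed-upon encoding.
        return value.encode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


# Both inputs produce the same bytes, so the cipher would see identical data.
print(ensure_bytes("email@example.com"))   # b'email@example.com'
print(ensure_bytes(b"email@example.com"))  # b'email@example.com'
```

You would then pass `ensure_bytes(email)` to `rsa.public_encrypt(...)`. Failing loudly on other types is a deliberate choice. It avoids any implicit conversion that could silently send the wrong bytes.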

The Python 2.x str type consists of such bytes, but the unicode type is a different beast altogether. You can convert between the two explicitly: str.decode() produces a unicode object, and its mirror method unicode.encode() produces a str.
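A short round trip shows both conversion directions. This sketch uses Python 3 syntax: `.encode()` turns text (`str`) into `bytes`, and `.decode()` turns `bytes` back into text. In Python 2 the same methods convert between `unicode` and `str`. UTF-8 is an assumed choice of encoding, and the address is made up.

```python
text = "josé@example.com"           # text (Python 2: unicode)
data = text.encode("utf-8")         # bytes (Python 2: str)
print(data)                         # b'jos\xc3\xa9@example.com'
print(len(text), len(data))         # 16 17: 'é' takes two bytes in UTF-8
round_trip = data.decode("utf-8")   # decoding reverses the encoding
print(round_trip == text)           # True
```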

When you call str() on a unicode object, Python converts it using the default encoding; in essence it calls email.encode(sys.getdefaultencoding()), and in Python 2 that default is normally ASCII. That's fine for your all-ASCII email address, but you're bound to run into UnicodeEncodeError exceptions with anything else. Better to stick to the explicit methods, e.g. email.encode('utf-8').
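Python 2's default encoding is ASCII, so `str(email)` behaves roughly like `email.encode('ascii')`. The Python 3 sketch below imitates that with an explicit `'ascii'` encode. It shows why the conversion works for an all-ASCII address, why it breaks on anything else, and why an explicit UTF-8 encode is safer. The function name `encode_both` and the addresses are hypothetical, for illustration only.

```python
def encode_both(address):
    """Encode address the implicit-ASCII way and the explicit UTF-8 way."""
    try:
        as_ascii = address.encode("ascii")  # roughly what Python 2's str() does
    except UnicodeEncodeError:
        as_ascii = None                     # non-ASCII characters cannot be represented
    return as_ascii, address.encode("utf-8")


# All-ASCII: both paths yield identical bytes, which is why str() "worked".
print(encode_both("email@example.com"))  # (b'email@example.com', b'email@example.com')
# Non-ASCII: the ASCII path fails, but the explicit UTF-8 path still succeeds.
print(encode_both("josé@example.com"))   # (None, b'jos\xc3\xa9@example.com')
```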

Note that you will probably also have to declare the encoding you used in the MIME headers of the email you send.
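On the MIME side, Python's standard `email` package can record the charset for you. The following is a minimal Python 3 sketch that assumes a plain-text UTF-8 body. The body text is invented for illustration.

```python
from email.mime.text import MIMEText

# Passing a charset makes MIMEText add it to the Content-Type header
# and choose a suitable transfer encoding for the UTF-8 body.
msg = MIMEText("Hello José", "plain", "utf-8")
print(msg["Content-Type"])         # text/plain; charset="utf-8"
print(msg.get_content_charset())   # utf-8
```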

I strongly recommend reading up on all this in the Python Unicode HOWTO.

Licensed under: CC-BY-SA with attribution
