Dealing with ctypes and ASCII strings when porting Python 2 code to Python 3

Question

ctypes, like most things in Python 3, intentionally doesn't automatically convert between unicode and bytes. That's because in most use cases, that would just be asking for the same kind of mojibake or UnicodeEncodeError disasters that people switched to Python 3 to avoid.

However, when you know you're only dealing with pure ASCII, that's another story. You have to be explicit—but you can factor out that explicitness into a wrapper.

As explained in Specifying the required argument types (function prototypes), in addition to a standard ctypes type, you can pass any class that has a from_param classmethod—which normally returns an instance of some type (usually the same type) with an _as_parameter_ attribute, but can also just return a native ctypes-type value instead.

class Asciifier(object):
    @classmethod
    def from_param(cls, value):
        if isinstance(value, bytes):
            return value
        else:
            return value.encode('ascii')

This may not be the exact rule you want—for example, it'll fail on bytearray (just as c_char_p will) even though that could be converted quietly to bytes… but then you wouldn't want to implicitly convert an int to bytes. Anything, whatever rule you decide on should be easy to code.

Here's an example (on OS X; you'll obviously have to change how libc is loaded for linux, Windows, etc., but you presumably know how to do that):

>>> libc = CDLL('libSystem.dylib')
>>> libc.atoi.argtypes = [Asciifier]
>>> libc.atoi.restype = c_int
>>> libc.atoi(b'123')
123
>>> libc.atoi('123')
123
>>> libc.atoi('１２３') # Unicode fullwidth digits
ArgumentError: argument 1: <class 'UnicodeEncodeError'>: 'ascii' codec can't encode character '\uff10' in position 0: ordinal not in range(128)
>>> libc.atoi(123)
ArgumentError: argument 1: <class 'AttributeError'>: 'int' object has no attribute 'encode'

Obviously you can catch the exception and raise a different one if those aren't clear enough for your use case.

You can similarly write a Utf8ifier, or an Encodifier(encoding, errors=None) class factory, or whatever else you need for some particular library and stick it in the argtypes the same way.

If you also want to auto-decode return types, see Return types and errcheck.

One last thing: When you're sure the data are supposed to be UTF-8, but you want to deal with the case where they aren't in the same way Python 2.x would (by preserving them as-is), you can even do that in 3.x. Use the aforementioned Utf8ifier as your argtype, and a decoder errcheck, and use errors=surrogateescape. See here for a complete example.