Python extension - construct and inspect large integers efficiently

Question 1

The underscore prefix largely means the same thing in the C API as in normal Python: "this function is an implementation detail subject to change, so watch yourself if you use it". You're not forbidden to use such functions, and if it's the only way to achieve a particular goal (e.g. significant efficiency gains in your case), then it's fine to use the API as long as you are aware of the hazard.

If the _PyLong_FromByteArray API were truly private, it would be a static function and wouldn't be fully documented and exported in longobject.h. In fact, Tim Peters (a well-known Python core developer) explicitly blesses its use:

[Dan Christensen]

My student and I are writing a C extension that produces a large integer in binary which we'd like to convert to a python long. The number of bits can be a lot more than 32 or even 64. My student found the function _PyLong_FromByteArray in longobject.h which is exactly what we need, but the leading underscore makes me wary. Is it safe to use this function?

Python uses it internally, so it better be ;-)

Will it continue to exist in future versions of python?

No guarantees, and that's why it has a leading underscore: it's not an officially supported, externally documented, part of the advertised Python/C API. It so happens that I added that function, because Python needed some form of its functionality internally across different C modules. Making it an official part of the Python/C API would have been a lot more work (which I didn't have time for), and created an eternal new maintenance burden (which I'm not keen on regardless ;-)).

In practice, few people touch this part of Python's implementation, so I don't /expect/ it will go away, or even change, for years to come. The biggest insecurity I can think of offhand is that someone may launch a crusade to make some other byte-array <-> long interface "official" based on a different way of representing negative integers. But even then I expect the current unofficial functions to remain, since the 256's-complement representation remains necessary for the struct module's "q" format, and for the pickle module's protocol=2 long serialization format.

Or is there some other method we should use?

No. That's why these functions were invented to begin with ;-)
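Tim's "256's-complement" is simply two's complement applied to base-256 digits. To see what such bytes look like, here's a small sketch that encodes an int64_t as the shortest little-endian 256's-complement string, which as far as I know is the layout pickle's protocol-2 LONG1 opcode stores (zero encodes as no bytes at all). i64_to_le_bytes is a hypothetical helper for illustration, not a CPython or pickle function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a CPython or pickle API): writes v as the
   shortest little-endian 256's-complement byte string, at most 8 bytes.
   Zero encodes as an empty string. Returns the number of bytes written. */
size_t i64_to_le_bytes(int64_t v, unsigned char out[8])
{
    uint64_t u = (uint64_t)v;  /* conversion to unsigned is defined: v mod 2**64 */
    size_t n, i;

    if (v == 0)
        return 0;

    /* Find the smallest n whose signed range [-2**(8n-1), 2**(8n-1) - 1]
       contains v; if none below 8 does, n ends up as 8. */
    for (n = 1; n < 8; n++) {
        int64_t limit = (int64_t)1 << (8 * n - 1);
        if (v >= -limit && v < limit)
            break;
    }

    /* Emit the low n bytes of the two's-complement pattern, LSB first. */
    for (i = 0; i < n; i++)
        out[i] = (unsigned char)(u >> (8 * i));
    return n;
}
```

For example, 255 encodes as ff 00 and -256 as 00 ff. Handing such bytes to _PyLong_FromByteArray with little_endian=1 and is_signed=1 should give the original value back.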

Here's the documentation comment from longobject.h (Python 3.2.1):

/* _PyLong_FromByteArray:  View the n unsigned bytes as a binary integer in
   base 256, and return a Python long with the same numeric value.
   If n is 0, the integer is 0.  Else:
   If little_endian is 1/true, bytes[n-1] is the MSB and bytes[0] the LSB;
   else (little_endian is 0/false) bytes[0] is the MSB and bytes[n-1] the
   LSB.
   If is_signed is 0/false, view the bytes as a non-negative integer.
   If is_signed is 1/true, view the bytes as a 2's-complement integer,
   non-negative if bit 0x80 of the MSB is clear, negative if set.
   Error returns:
   + Return NULL with the appropriate exception set if there's not
     enough memory to create the Python long.
*/
PyAPI_FUNC(PyObject *) _PyLong_FromByteArray(
    const unsigned char* bytes, size_t n,
    int little_endian, int is_signed);
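To make the little_endian and is_signed rules in that comment concrete, here's a pure-C sketch that applies the same rules to buffers of at most 8 bytes, so the result fits in an int64_t. bytes_to_i64 is a hypothetical helper written for illustration; it doesn't call CPython, and for genuinely large integers you'd call _PyLong_FromByteArray itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of CPython): decodes n <= 8 bytes using the
   rules from the _PyLong_FromByteArray comment. For unsigned input with
   n == 8, the top bit must be clear so the value fits in int64_t. */
int64_t bytes_to_i64(const unsigned char *bytes, size_t n,
                     int little_endian, int is_signed)
{
    uint64_t u = 0, mask;
    unsigned char msb;
    size_t i;

    if (n == 0)
        return 0;                          /* "If n is 0, the integer is 0." */

    /* Accumulate base-256 digits from the MSB down to the LSB. */
    for (i = 0; i < n; i++) {
        size_t idx = little_endian ? n - 1 - i : i;
        u = (u << 8) | bytes[idx];
    }

    msb = little_endian ? bytes[n - 1] : bytes[0];
    if (is_signed && (msb & 0x80)) {
        /* 2's complement: value = -(~u restricted to n bytes) - 1,
           computed this way so nothing overflows int64_t. */
        mask = (n == 8) ? UINT64_MAX : (((uint64_t)1 << (8 * n)) - 1);
        return -(int64_t)(~u & mask) - 1;
    }
    return (int64_t)u;
}
```

With bytes {0x01, 0x00}, big-endian gives 256 and little-endian gives 1; the single byte 0xff is -1 when signed and 255 when unsigned.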

The main reason it's an "underscore-prefixed" API is that it depends on the implementation of the Python long as an array of words in a power-of-two base. This isn't likely to change, but since you're implementing an API on top of it, you can insulate your own callers from any later changes in the Python API.
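To picture what "an array of words in a power-of-two base" means, here's a rough sketch that splits a 64-bit magnitude into 30-bit digits, least significant first. The 30-bit digit size is an assumption (it's the usual PyLong_SHIFT on 64-bit CPython builds; some builds use 15), and to_digits is a made-up name. It only mimics the layout and doesn't touch CPython internals.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 30-bit digits, the usual PyLong_SHIFT on 64-bit CPython
   builds. This mimics the layout only; it is not CPython code. */
#define DIGIT_SHIFT 30
#define DIGIT_MASK (((uint32_t)1 << DIGIT_SHIFT) - 1)

/* Hypothetical helper: splits a magnitude into base-2**30 digits, least
   significant first. Returns the digit count (0 for zero); a 64-bit
   value needs at most 3 digits. */
size_t to_digits(uint64_t magnitude, uint32_t out[3])
{
    size_t count = 0;
    while (magnitude != 0) {
        out[count++] = (uint32_t)(magnitude & DIGIT_MASK);
        magnitude >>= DIGIT_SHIFT;
    }
    return count;
}
```

Because each digit is just a fixed-width slice of the bits, turning a base-256 byte array into this form amounts to regrouping bits, which is cheap but tied to the internal digit layout.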

Question 2

Sounds like you need PyNumber_Long, which is documented in the Number Protocol section of the Python/C API reference.