Question

I'm working with numbers with tens of thousands of digits in python. The long type works beautifully in performing math on these numbers, however I'm unable to access the highest digits of these numbers in a sufficiently fast way. Note that I don't know exactly how many digits the number contains. The "highest digits" refers to the digits in the most significant place, the lowest digits can be accessed quickly using modulus.

I can think of two ways to access these digits in python but they're both too slow for my purposes. I have tried converting to a string and accessing digits through array methods, however type conversions are slow when you have 10,000+ digits. Alternatively I could simply mask out bits and truncate, but this requires that I know how many digits are in the long. Finding the number of digits in the long would require a loop over a counter and a mask test, this will surely be slower than string conversion.

From the description here it seems that the long type does in fact contain a bignum array. Is there some way I can access the underlying data structure that stores the long, or possibly check how many digits the long has from the base type?

If people are interested I can provide an example with benchmarks.

Was it helpful?

Solution

A simple approach without digging on low level implementation of the long type:

>>> n = 17**987273 # 1.2 million digits number

>>> digits = int(math.log10(n))

>>> k = digits - 24 # i.e. first 24 digits

>>> n / (10 ** k)
9953043281569299242668853L

Runs quite fast on my machine. I tried to get the string representation of this number and it takes a huge time.

For Python 3.x, use n // (10 ** k)

Some timings with this big number (It is 140 times faster):

%timeit s = str(n)[:24]
1 loops, best of 3: 57.7 s per loop

%timeit n/10**(int(math.log10(n))-24)
1 loops, best of 3: 412 ms per loop


# With a 200K digits number (51x faster)

%timeit s = str(n)[:24]
1 loops, best of 3: 532 ms per loop

%timeit n/10**(int(math.log10(n))-24)
100 loops, best of 3: 10.4 ms per loop


# With a 20K digits number (19x faster)

%timeit s = str(n)[:24]
100 loops, best of 3: 5.4 ms per loop

%timeit n/10**(int(math.log10(n))-24)
1000 loops, best of 3: 272 us per loop

OTHER TIPS

Python 2.7 has the bit_length() method on integers.

Here is a very ugly one liner that will extract the first few decimal digits:

(x >> (x.bit_length()-50))*(10**(math.fmod((x.bit_length()-50)*math.log(2)/math.log(10), 1)))

If your value for x is around 10,000 decimal digits long, you should get an answer accurate to around 12 digits or so. As x gets larger, your accuracy will decrease.

If you are willing to use external modules, I would look at gmpy2. The gmpy2 library provides access to the GMP (or MPIR) library for multiple-precision integer and fractional arithmetic, the MPFR library for multiple-precision floating point arithmetic, and the MPC library for multiple-precision complex arithmetic. gmpy2 integers are faster than Python's native longs and you can convert a long integer into a floating point number to extract just the leading digits. The above one liner just becomes:

gmpy2.mpfr(x).digits()[0]

The gmpy2 approach will retain accuracy even as the numbers become larger.

Disclaimer: I maintain gmpy2.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top