Is using `str` the correct idiom for working with digits in Python
-
21-12-2019 - |
Question
I understand that one way to work with the digits of a number in Python is to convert the number to a string, and then use string methods to slice the resulting "number" into groups of "digits". For example, assuming I have a function prime
that tests primality, I can confirm that an integer n is both a left and right truncatable prime with
all(prime(int(str(n)[:-i])) and prime(int(str(n)[i:])) for i in range(1, len(str(n))))
This method involves first converting n to a string so that it can be sliced, and converting that slice back to an integer so that its primality can be checked. Perhaps it's my history with statically typed languages, or some vague idea that strings are "expensive", or experience with languages that include builtin functionality for similar manipulations (like Mathematica's IntegerDigits
and FromDigits
); but I'm left wondering whether this is the right way to go about such tasks.
Is conversion back and forth between stings and numbers the correct — or even only — approach for accessing digits in Python. Are there more efficient approaches?
Solution
This has always been my approach and it's worked fine, though I've never done much testing for speed. It works particularly well when needing to iterate over permutations/combinations of digits, since you can build up such strings with the functions in the itertools package.
There are certainly other methods that involve less straightforward mathematical operations, but unless speed is absolutely crucial I feel the string method is the most Pythonic.
Here, for example, is a more mathematical approach, where a and b are indexed from the right (ie ones place is 0, tens place is 1, etc):
def getSubdigits(n, a, b):
n %= 10 ** a
n //= 10 ** b
return n
For this to work with the same indexing as string slicing, you'd need to find the total number of digits first, and the function becomes:
def getSubdigits2(n, a, b):
l = int(math.ceil(math.log10(n)))
n %= 10 ** (l - a)
n //= 10 ** (l - b)
return n
And the string slicing equivalent:
def subDigits3(n, a, b):
return int(str(n)[a:n])
Here's timing results:
subDigits
: 0.293327726114subDigits2
: 0.850861833337subDigits3
: 0.990543234267
My takeaway from that result is that the slicing method is fine unless you really care about speed, in which case you need to use the first method and think about the indices in the other direction.
OTHER TIPS
In your example code, you could get away using divmod
rather than string slicing the digits. divmod(x, y)
returns the tuple x//y, x%y
, which for y
values that are 10**i
is exactly what you want for the left and right pieces of your number. This isn't necessarily more Pythonic, though it might be a bit faster.
sn = str(n)
all(prime(int(sn[:i])) and prime(int(sn[i:])) for i in range(1, len(sn))) # original
all(all(map(prime, divmod(n, 10**i))) for i in range(1, len(sn))) # variant using divmod
I think for more general digit operations, using str
is probably pretty sensible, as doing lots of math on powers of your numerical base is likely to be harder to understand than doing stuff directly on the digits in a string.
Write code to be read, unless it's really performance sensitive.
Python's native integers are stored in a power-of-2 base, so it takes real work to convert them to or from a decimal notation. In many kinds of "puzzle" ;-) problems, which require frequent access to the decimal digits of very large integers, it can make a world of difference to use the decimal
module instead. That stores values in a power-of-10 base, so that "converting" to/from decimal digits is a trivial expense.
>>> import decimal
>>> x = decimal.Decimal('1e20') - 1
>>> x
Decimal('99999999999999999999')
>>> x.as_tuple().digits
(9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9)
That takes time linear in the number of digits. Converting a native integer to/from decimal takes time quadratic in the number of digits.
Best I can guess about your specific application here, though, using divmod()
is really the best approach.
How about this?
def digits(n):
if n == 0:
yield 0
else:
while n:
yield n % 10
n //= 10
for digit in digits(123):
# do something with digit
This should be both more convenient and more efficient than the example you showed.
EDIT: I want to add two more things.
0) You can extend the technique as needed. Suppose you want to test for right-truncatable primes. Assuming you have a function is_prime()
:
def right_trunc_int_values(n):
if n == 0:
yield 0
else:
while n:
yield n
n //= 10
assert(all(is_prime(n) for n in right_trunc_int_values(317))
1) To solve the general problem of conveniently working with digits, you might be able to use the decimal
module. I'll look into this more. But meanwhile, you can use my digits()
function to make an indexable list of digits:
d = list(digits(123))
print(d[2]) # prints 2
EDIT: It's pretty easy to convert a series of digits to an integer value.
def int_from_digits(digits):
result = 0
found_a_digit = False
for digit in reversed(digits):
result = result * 10 + digit
return result
def is_right_trunc_prime(n):
d = list(digits(n))
return all(is_prime(int_from_digits(d[:i]) for i in range(len(d), -1, -1)))
# example from question of left and right truncatable check
d = list(digits(n))
all(prime(int_from_digits(d[:-i])) and prime(int_from_digits(d[i:])) for i in range(1, len(d)))
Testing left truncatable primes without str()
and slicing:
def is_prime(n):
if n < 2:
return False
elif n == 2:
return True
elif n % 2 == 0:
return False
return all(n % x for x in xrange(3,int(pow(n,0.5))+1,2))
def is_left_truncatable_prime(n):
largest_power_of_ten = 1
while largest_power_of_ten < n:
largest_power_of_ten *= 10
while True:
largest_power_of_ten /= 10 # Use // in Python 3
if is_prime(n):
n %= largest_power_of_ten
if n == 0:
return True
else:
return False
print is_left_truncatable_prime(167) # True
print is_left_truncatable_prime(173) # True
print is_left_truncatable_prime(171) # False
I haven't extensively tested this so sorry if there are any errors. Let me know if there are and I will fix them.
EDIT: Fixed up the code a bit.