python prime factorization performance

https://stackoverflow.com/questions/19310061

30-06-2022
|

سؤال

I'm relatively new to python and I'm confused about the performance of two relatively simple blocks of code. The first function generates a prime factorization of a number n given a list of primes. The second generates a list of all factors of n. I would have though prime_factor would be faster than factors (for the same n), but this is not the case. I'm not looking for better algorithms, but rather I would like to understand why prime_factor is so much slower than factors.

def prime_factor(n, primes):
  prime_factors = []
  i = 0
  while n != 1:
      if n % primes[i] == 0:
          factor = primes[i]
          prime_factors.append(factor)
          n = n // factor
      else: i += 1
  return prime_factors

import math
def factors(n):
  if n == 0: return []
  factors = {1, n}
  for i in range(2, math.floor(n ** (1/2)) + 1):
      if n % i == 0:
          factors.add(i)
          factors.add(n // i)
  return list(factors)

Using the timeit module,

{ i:factors(i) for i in range(1, 10000) } takes 2.5 seconds

{ i:prime_factor(i, primes) for i in range(1, 10000) } takes 17 seconds

This is surprising to me. factors checks every number from 1 to sqrt(n), while prime_factor only checks primes. I would appreciate any help in understanding the performance characteristics of these two functions.

Thanks

Edit: (response to roliu) Here is my code to generate a list of primes from 2 to up_to:

def primes_up_to(up_to):
  marked = [0] * up_to
  value = 3
  s = 2
  primes = [2]
  while value < up_to:
      if marked[value] == 0:
          primes.append(value)
          i = value
          while i < up_to:
              marked[i] = 1
              i += value
      value += 2
  return primes

المحلول

Without seeing what you used for primes, we have to guess (we can't run your code).

But a big part of this is simply mathematics: there are (very roughly speaking) about n/log(n) primes less than n, and that's a lot bigger than sqrt(n). So when you pass a prime, prime_factor(n) does a lot more work: it goes through O(n/log(n)) operations before finding the first prime factor (n itself!), while factors(n) gives up after O(sqrt(n)) operations.

This can be very significant. For example, sqrt(10000) is just 100, but there are 1229 primes less than 10000. So prime_factor(n) can need to do over 10 times as much work to deal with the large primes in your range.

مرخصة بموجب: CC-BY-SA مع الإسناد

لا تنتمي إلى StackOverflow