Question

I wrote this simple code below to compare the speed of Python and Fortran.

import time

start = time.time()

x = 0

for i in range(1,1000000000):
    x = x + i
    x = 0

end = time.time()

print end - start

I set x = 0 inside the loop because in Fortran the sum overflows. When I run the code in Fortran, it returns the answer in 9 seconds, but when I run it in Python, Python consumes all 24GB of my system's RAM and the program crashes.

What's the problem?

Solution

You are using range(), which in Python 2 produces a list object holding every integer in the range. You produced a list object with 10**9 integers in it; on a 64-bit machine that takes about 30GB to hold all those objects, provided your OS will let you allocate that much.
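
A quick back-of-the-envelope check, assuming a 64-bit CPython 2 build, where sys.getsizeof() reports 24 bytes per int object:

>>> import sys
>>> sys.getsizeof(12345)  # bytes for one int object on a 64-bit build
24

10**9 such objects come to roughly 24GB, and the list's internal array of 8-byte pointers adds another 8GB, which is where the ~30GB figure comes from.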

Use the xrange() function instead; this produces an object that only yields numbers as you iterate:

>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> xrange(10)
xrange(10)
>>> list(xrange(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Summing 1 billion integers will still take time in Python; you can use sum(xrange(10**9)) to delegate most of the work to C code, but the process will still produce 1 billion integer objects along the way.

My rather aging 2008-vintage MacBook Pro does this:

>>> import timeit
>>> timeit.timeit('sum(xrange(10**9))', number=3)
38.89629793167114

so it takes, on average, about 13 seconds per run to produce and destroy 1 billion int objects. The actual summing is dwarfed by those allocations and deallocations.

Use NumPy for calculations involving large datasets instead; a NumPy array holds C datatypes, not Python objects, so it takes vastly less memory, and the library offers optimized routines that will beat anything pure Python can do hands-down.
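
As a minimal sketch of the NumPy equivalent (assuming NumPy is installed; the full 10**9-element array still needs about 8GB, so shrink the exponent on a smaller machine):

>>> import numpy as np
>>> # each element is a C int64 (8 bytes), not a ~24-byte Python int object
>>> a = np.arange(10**9, dtype=np.int64)
>>> a.sum()  # a single vectorized C loop, no per-element object churn
499999999500000000

The array allocation and the summation both run over machine integers, so there is no per-iteration object creation and destruction to pay for.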

Licensed under: CC-BY-SA with attribution