Question

Dear Stack Exchangers,

I encountered a strange result while attempting to calculate the sum of values in a python dictionary. If my dictionary gets past a certain size, the following function: sum(dict.values()) appears to give incorrect results. It appears that the result becomes suddenly negative for no apparent reason. I am using python 2.7 on Windows 7. (I apologize in advance for my coding style).

Please note that I encountered this behaviour while working on https://projecteuler.net/problem=72, but I am not asking for help to get the answer, I am just perplexed by the behaviour of the built-in function. (I also apologise for posting a partial solution on a public forum, please look away now if you don't want any hints).

The goal of the program is explained on the project Euler link (above), but I will attempt to briefly explain my code:

The first function uses a Sieve of Erasthenes to produce a list of prime numbers and a modified sieve to produce a dictionary of {composite_number:[prime_factor_list]} within a specified range.

The second function attempts to count the number of fractions of the form n/d that can be produced where n < d and d <= 1000000. The problem states that I should only count reduced proper fractions, so the bulk of this function is concerned with weeding out reducible fractions. My strategy is to loop through each numerator between 1 and d-1, and discard unsuitable denominators. For primes this is simple, and for non-primes I am close to a working solution, but still discard some values more than once. For the purposes of this post, the important detail is the way that I tally up the count:

Initially I used a simple counter (initialise count to 0 then increment as needed), but decided to try a dictionary instead. What surprised me was that the two methods gave different results, but only when the upper limit (d) exceeded a certain size. I probed deeper and managed to isolate the exact moment that the counts diverge. The line if 88000 < i < 88055: near the bottom identifies the point at which the sum of dict values begins to differ from the simple count. For values up to i = 88032, the value are the same, but when i = 88033, the values diverge dramatically:

from collections import defaultdict

def primeset(limit):
    pr = [0]*(limit+1)
    for i in range(2,limit+1):
        j = i
        i += j
        while i <= limit:
            pr[i] = 1
            i += j
    primes = [k for k in range(2,limit+1) if pr[k] == 0]
    composites = defaultdict(list)
    for p in primes:
        q = p
        p += q
        while p <= limit:
            composites[p].append(q)
            p += q
    return primes, composites


def method2(limit):
    primes, composites = primeset(limit)
    prf = {}
    count = 0
    count += limit-1
    count += (limit-2)/2
    prf[1] = limit-1
    prf[2] = (limit-2)/2    
    for i in primes:
        if i != 2:
            tally = limit-i-(limit/i)+1
            count += tally
            prf[i] = tally
    for i in composites:
        tally = limit-i
        for item in composites[i]:
            tally -= (limit/item-i/item)
        count += tally
        prf[i] = tally
        if 88000 < i < 88055:
            print i, count, tally, sum(prf.values())
    return count, prf

result, index = method2(88547)

print result,sum(index.values())

I expect I have done something really stupid, but I felt compelled to put it out there in case something really is amiss.

Regards,

Was it helpful?

Solution

You are having a problem with integer overflow, which in Python is not supposed to happen. You have a 32-bit machine, so the largest normal integer is (2^31 - 1). Once your calculation exceeds that Python should automatically switch to doing calculations using a long which isn't limited in the size of the number that it can support. I only have 64-bit machines, but the same thing applies except the max integer is (2^63 - 1). You can tell from the shell when you have a long because of the L that is printed after the number. Here is an example from my shell:

>>> 2**62 - 1 + 2**62  # This is max int
9223372036854775807
>>> 2**63  # This is a long
9223372036854775808L
>>> num = 2**62 - 1 + 2**62
>>> num
9223372036854775807
>>> num+1
9223372036854775808L
>>> d = {1:2**62,2:-1,3:2**62}
>>> sum(d.values())
9223372036854775807
>>> d = {1:2**62,2:-1,3:2**62,4:1}
>>> sum(d.values())
9223372036854775808L

In my case with Python 2.7 on Linux on a 64-bit machine this all works as expected.

Now I run the same thing using Spyder and I get the wrong answer:

>>> d = {1:2**62,2:-1,3:2**62,4:1}
>>> sum(d.values())
-9223372036854775808

It promotes correctly when I just do normal addition, but this sum from a dictionary gives the wrong answer. This isn't specific to dictionaries, just the sum function. Same thing with an list:

>>> list = [2**62, -1, 2**62, 1]
>>> sum(list)
-9223372036854775808

So the problem is isolated to the sum() function in Spyder and happens for both 32 and 64-bit machines.

The real answer turns out that Spyder automatically imports numpy. Numpy has its own version of the sum function. It is described as follows "Arithmetic is modular when using integer types, and no error is raised on overflow." You are using that version of sum and it is causing the problem. If you don't want to use that sum you can put the following at the top of your file:

from __builtin__ import sum

That will cause the built-in version of sum to be used and you will get the correct answer.

To figure out that sum was not coming from where I thought it was coming from I could have used the following:

>>> import inspect
>>> inspect.getmodule(sum)
<module 'numpy.core.fromnumeric' from '/usr/lib/python2.7/dist-packages/nump/core/fromnumeric.pyc'>
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top