Question

(This is Project Euler Problem 14)

I'm trying to diversify my Python skills by using Itertools.

I want to get the index of a maximum returned value in a set number of iterations (1,000,000) from a generator in Python's itertools.

The function works, butt I can't figure out how to get to effectively get the maximum value without storing it in a massive one-million long list which I think could hardly be efficient to this. Is there an intelligent smart way? (edit: Maybe this is the easiest way?)

The current code is that at the moment it stops when the chain reaches 100 which is wrong.

#Project Euler Problem 14
#Which starting number, under one million, produces the longest chain?

import itertools

def length():
    x=1

    while True:
        l=0

        c=x #Create the running integer.
        while c>1:
            if c%2==0:
                #Number is even, divide it by two.
                c=c/2
                l+=1
            else:
                #Number is odd, multiply by three and add one.
                c=3*c+1
                l+=1

        yield l+1   #Add the final one as well and show the chain length.
        x+=1        #Increment


print list(itertools.takewhile(lambda x:x<100,length()))
Was it helpful?

Solution

I approached the problem as follows:

import functools

def euler_014(max_=1000000):
    longest = max_len = None
    for n in range(1, max_):
        length = len_collatz(n)
        if max_len is None or length > max_len:
            max_len, longest = length, n
    return longest

def memoize(f):
    cache = {}
    @functools.wraps(f)
    def func(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]
    return func

@memoize
def len_collatz(n):
    if n == 1:
        return 1
    if n % 2:
        n = (n * 3) + 1
    else:
        n //= 2
    return 1 + len_collatz(n)

Here memoize makes things more efficient by storing previous results for len_collatz, len_collatz tells you the length of the sequence from n to 1 (without actually generating a list of the whole sequence) and euler_014 simply has to keep track of the longest length and the associated value of n. Note that it doesn't keep all of the results - just the length of the longest series so far and the value of n that produced that sequence.

OTHER TIPS

To add a second answer with a different approach; this one will minimise the memory footprint using a generator to produce the collatz sequence.

However, it is much slower. Running this takes ~60s, compared to ~5s for the first run of the memoized version (and ~1s for subsequent runs, once the cache has all the values in),

def collatz(n):
    while n > 1:
        yield n
        n = (n * 3) + 1 if n % 2 else n // 2
    yield 1

def euler_014_iter(max_=1000000):
    longest = max_len = None
    for n in range(1, max_):
        length = sum(1 for _ in collatz(n))
        if max_len is None or length > max_len:
            longest, max_len = n, length 
    return longest

I can't see a reason to use itertools.takewhile here. Takewhile allows for early termination, but you can't terminate early since you must check collatz(n) for each n under one million.

The standard max function will find the highest value in an iterable. A range object is an iterable that works in constant memory (ie., it doesn't use an internal list of all the values in the range.). I normally wouldn't post a solution publicly, but since this is only one of the first twenty problems, and there are already three distinct solutions posted here...

def collatz(n):
    return n // 2 if n%2 == 0 else 3*n + 1

def distance(n, cache={1:1}):
    if n not in cache: cache[n] = distance(collatz(n)) + 1
    return cache[n]

ans = max(range(1,1000000), key=distance)

print(ans)

You really should just use the standard max function. You don't know how long the longest chain will be, so taking the first chain longer than 100 is not guaranteed to work. It's just guess-and-check. Max finds the longest without storing everything in memory.

I should leave a disclaimer that I memoize 'distance'. That means that my code stores at least 1,000,000 values so that collatz(n) doesn't need to be re-calculated repeatedly. This costs a few MB of ram, but cuts execution from over a full minute to just a few seconds. It's very much worth it.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top