Question

I was curious how one could do this given a set of 5-6 parameters. The outcome is evaluated on finding the largest value increase.

The number of combinations seems enormous due to the number of parameters I have. But is my choice just to use a for loop?

I have been building on the grid search strategy in this assignment (just using for loops), but have many more variables.

http://nbviewer.ipython.org/github/cs109/content/blob/master/HW3.ipynb

Was it helpful?

Solution

If you have a complex black box function, and you need to tweak the inputs in a non-linear way, then this sounds like it might be a good application for using using a genetic algorithm (GA).

What is a genetic algorithm?

In general, you would encode one possible combination of your parameters as a series of bits. This would be the 'DNA' of your solution. For example, you might assign 8 bits to each value, param 1 as bits 0-7, param 2 as bits 8-15 etc.

After this you create a large population of totally random solutions (say 10,000) and evaluate each with your function. After this each possible solution gets a relative fitness to it's peers. The function that decides this is called the fitness function, and is the target function we are attempting to optimise.

Based on this relative frequency you pick 5,000 pairs of solutions based on this frequency, so that the best are picked more often, but even the worst get through sometimes. You 'breed' each of these pairs, by picking two random snip points on the bit DNA string, and swapping the portion from one to the other to produce two new offspring in the next population.

After this, you repeat the whole process again and again until you get bored, or the top performing solution is good enough.

Why does it work?

In general, the best solutions get combined more often. The process of combining them allows different successful solutions to be blended together to create something potentially better than either parent.

Is a GA the right tool?

The upside of using a GA is it will attempt to optimise whatever you throw at it without really understanding it. It also isn't concerned with how many independent parameters you have. If you have a 100 parameters, then nested loops are really out of a question. The thing that matters is how fast your fitness function is to evaluate. The faster it is, the more iterations you can calculate per second.

The downside is there is a lot of machinery to create to get a GA working in the first place, and the solution is not guaranteed to be optimal, or arrive in any particular time frame.

In practice however, they tend to do surprisingly well on complex problems were you can't afford to search the entire possibility space. It's a nice tool to have in your tool-box, as you can throw it at almost anything.

An example:

Here is a small, simplish GA example. The code above the line is generic, but below we solve the problem of finding a loyal animal with lots of legs, that isn't smelly or aggressive. This is encoded as 4 bytes.

import random


class DNA(object):
    score = 0

    def __init__(self, bits):
        self.bits = bits
        self.length = len(bits)

    # https://en.wikipedia.org/wiki/Crossover_%28genetic_algorithm%29#Two-point_crossover
    def two_point_crossover(self, other):
        start = random.randint(0, self.length - 1)
        end = random.randint(start, self.length - 1)

        child_1 = DNA(self.bits[:start] + other.bits[start:end] + self.bits[end:])
        child_2 = DNA(other.bits[:start] + self.bits[start:end] + other.bits[end:])
        return child_1, child_2

    # https://en.wikipedia.org/wiki/Mutation_%28genetic_algorithm%29
    def mutate(self, probability):
        self.bits = [bit if random.random() > probability else not bit for bit in self.bits]

    # Allow us to 'breed' two strings by doing dna_1 * dna_2
    def __mul__(self, other):
        return self.two_point_crossover(other)

    @staticmethod
    def decode_byte(bits):
        out = 0
        for bit in reversed(bits):
            out = out << 1
            out += bit

        return out

    def as_bytes(self):
        return [DNA.decode_byte(self.bits[start:start+8]) for start in xrange(0, self.length, 8)]


class Population(object):
    cumulative_scores = None
    total_score = 0
    best_score = 0
    best_member = None

    def __init__(self, members):
        self.members = members
        self.length = len(members)

    def rate(self, fitness_function):
        self.cumulative_scores = []
        self.total_scores = 0

        for member in self.members:
            score = fitness_function(member)
            member.score = score
            self.total_score += score
            self.cumulative_scores.append((self.total_score, member))

            if score > self.best_score:
                self.best_score = score
                self.best_member = member

    # https://en.wikipedia.org/wiki/Fitness_proportionate_selection
    def roulette_wheel_selection(self):
        pick = random.uniform(0, self.total_score)

        current = 0
        pos = 0

        while current < pick:
            (score, member) = self.cumulative_scores[pos]
            pos += 1
            current = score

        return member

    def mutate(self, mutation_rate):
        for member in self.members:
            member.mutate(mutation_rate)


class GeneticAlgorithm(object):
    def __init__(self, fitness_function, population, mutation_rate=0.01):
        self.population = population
        self.fitness_function = fitness_function
        self.mutation_rate=0.01

    def mutate(self):
        self.population.mutate(self.mutation_rate)

    def rate(self):
        self.population.rate(self.fitness_function)

    def breed(self):
        new_members = []
        while len(new_members) < self.population.length:
            parent_1 = self.population.roulette_wheel_selection()
            parent_2 = self.population.roulette_wheel_selection()

            new_members += parent_1 * parent_2

        self.population = Population(new_members[:self.population.length])


#----------------------------------------#
# My problem here 

def my_fitness(dna):
    angry, legs, smelly, loyalty = dna.as_bytes()

    score = float((510 - angry - smelly) * loyalty * legs) / float(510*255*255)
    return score


pop = Population([DNA([0] * 8 * 4) for _ in range(1000)])
pop.mutate(0.5)     # Totally randomise our starting set
ga = GeneticAlgorithm(my_fitness, pop)

for _ in range(100):
    ga.mutate()
    ga.rate()
    print "Best so far:", ga.population.best_score
    ga.breed()

# Finished!

ga.rate()
angry, legs, smelly, loyalty = ga.population.best_member.as_bytes()

print "Best score: ", ga.population.best_score
print "Angry", angry, "Legs", legs, "Smelly", smelly, "Loyalty", loyalty
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top