Question

I'm working with a genetic algorithm with 4 fitness functions. I have a file where each line is lists the fitness values for a single solution. The purpose of the genetic algorithm is to maximize these values.

I need to identify the non dominated solutions from those listed in the file. I have a function called "identify_pareto_observations" that is meant to identify the non dominated solutions for the genetic algorithm output. What I have is giving incorrect results, however. I have posted some code below that represents what I have in my program. I was hoping someone could help me identify where I am going wrong. If anyone could simply explain the process of identifying non dominated solutions in a 4 dimensional solution space that would be great!

Note that I struggled to understand the pseudo code that I had found to explain how to identify non dominated solutions so I simply used code that I had found here.

Thanks for the help/time!

import os
import sys
import numpy as np

fitness_observations_matrix = []
fitness_observations_matrix.append([-447.117,928641,-0,4131.33])
fitness_observations_matrix.append([-838.066,977440,-0,3859.25])
fitness_observations_matrix.append([-803.34,385226,-0,4799.27])
fitness_observations_matrix.append([-790.052,4919.54,-0,6672])
fitness_observations_matrix.append([-403.629,1.9091e+06,-0.775081,3011.89])
fitness_observations_matrix.append([-99.4555,413201,-0,7615.26])
fitness_observations_matrix.append([-78.8706,9472.03,-0,7071.58])
fitness_observations_matrix.append([-1096.82,127109,-0,7227.28])
fitness_observations_matrix.append([-1058.26,1.97533e+06,-0,4959.18])
fitness_observations_matrix.append([-1324.52,2.56381e+06,-2.66017,2669.81])
fitness_observations_matrix.append([-815.861,247594,-0,4674.39])
fitness_observations_matrix.append([-370.815,1.41723e+06,-0,6036.94])
fitness_observations_matrix.append([-202.34,1.21506e+06,-0,3587.2])
fitness_observations_matrix.append([-718.299,3.0879e+06,-0,3903.51])
fitness_observations_matrix.append([-1514.39,2.62657e+06,-0,6960.03])
fitness_observations_matrix.append([-815.018,414420,-0,-0])
fitness_observations_matrix.append([-747.015,2.5201e+06,-0.913635,7153])
fitness_observations_matrix.append([-246.017,683330,-0,10110.5])
fitness_observations_matrix.append([-5884.85,1.40051e+07,-8.74525,3037.08])
fitness_observations_matrix.append([-4045.09,5.83051e+06,-1.99262,4397.97])

def identify_pareto_observations(fitness_observations):

#Convert to np array
np_array_fitness_observations = np.array(fitness_observations)

 #Sort on first dimension
np_array_fitness_observations = np_array_fitness_observations[np_array_fitness_observations[:,0].argsort()]

# Add first row to pareto_frontier
    np_pareto_frontier = np_array_fitness_observations[0:1,:]

# Test next row against the last row in pareto_frontier
    for row in np_array_fitness_observations[1:,:]:
        if sum([row[x] >= np_pareto_frontier[-1][x] for x in range(len(row))]) == len(row):
            # If it is better on all features add the row to pareto_frontier
            np_pareto_frontier = np.concatenate((np_pareto_frontier, [row]))

#Return
print(np_pareto_frontier)

identify_pareto_observations(fitness_observations_matrix)
Was it helpful?

Solution

A solution x is pareto dominated if there exists a solution y where y is no worse than x in any dimension and is strictly better in some dimension. I'll assume you're trying to minimize in all dimensions for your problem. Then you can take the following approach (the code you posted has a similar flavor, but has a number of issues):

  1. Sort your data (in increasing order) in all dimensions (e.g. first sort by dimension 1, for solutions with the same dimension 1 value sort by dimension 2, for solutions with the same dimension 1 and 2 values sort by dimension 3, ...). This can be efficiently performed with the numpy.lexsort function. Note that in this sorted list no element can be pareto dominated by an element later in the list.
  2. Initialize your set of pareto non-dominated solutions, np_pareto_frontier, to the first sorted element. Due to the sorting in step 1, we know it's pareto non-dominated.
  3. Looping through the sorted list in order, add a row to np_pareto_frontier if it's not dominated by any solution in np_pareto_frontier -- since such elements are not dominated by anything in np_pareto_frontier and not dominated by anything later in the sorted list, they must be pareto non-dominated.

Here's an implementation of this approach:

import numpy as np
def identify_pareto_observations(fitness_observations):
    np_array_fitness_observations = np.array(fitness_observations)
    np_array_fitness_observations = np_array_fitness_observations[np.lexnp_array_fitness_observations[:,0].argsort()]
    np_pareto_frontier = np_array_fitness_observations[[0],:]
    for row in np_array_fitness_observations[1:,:]:
        dominated = False
        for nondom in np_pareto_frontier:
            if all([row[x] > nondom[x] for x in range(len(row))]):
                dominated = True
                break
        if not dominated:
            np_pareto_frontier = np.concatenate((np_pareto_frontier, [row]))
    return np_pareto_frontier

We can test it on your dataset:

print identify_pareto_observations(fitness_observations_matrix)
# [[ -8.15018000e+02   4.14420000e+05   0.00000000e+00   0.00000000e+00]
#  [ -1.32452000e+03   2.56381000e+06  -2.66017000e+00   2.66981000e+03]
#  [ -4.03629000e+02   1.90910000e+06  -7.75081000e-01   3.01189000e+03]
#  [ -5.88485000e+03   1.40051000e+07  -8.74525000e+00   3.03708000e+03]
#  [ -2.02340000e+02   1.21506000e+06   0.00000000e+00   3.58720000e+03]
#  [ -8.38066000e+02   9.77440000e+05   0.00000000e+00   3.85925000e+03]
#  [ -4.47117000e+02   9.28641000e+05   0.00000000e+00   4.13133000e+03]
#  [ -4.04509000e+03   5.83051000e+06  -1.99262000e+00   4.39797000e+03]
#  [ -8.15861000e+02   2.47594000e+05   0.00000000e+00   4.67439000e+03]
#  [ -8.03340000e+02   3.85226000e+05   0.00000000e+00   4.79927000e+03]
#  [ -1.05826000e+03   1.97533000e+06   0.00000000e+00   4.95918000e+03]
#  [ -3.70815000e+02   1.41723000e+06   0.00000000e+00   6.03694000e+03]
#  [ -7.90052000e+02   4.91954000e+03   0.00000000e+00   6.67200000e+03]
#  [ -1.51439000e+03   2.62657000e+06   0.00000000e+00   6.96003000e+03]
#  [ -7.88706000e+01   9.47203000e+03   0.00000000e+00   7.07158000e+03]
#  [ -7.47015000e+02   2.52010000e+06  -9.13635000e-01   7.15300000e+03]
#  [ -1.09682000e+03   1.27109000e+05   0.00000000e+00   7.22728000e+03]
#  [ -9.94555000e+01   4.13201000e+05   0.00000000e+00   7.61526000e+03]
#  [ -2.46017000e+02   6.83330000e+05   0.00000000e+00   1.01105000e+04]]

As we can see, you only had one pareto dominated solution, the row [ -7.18299000e+02 3.08790000e+06 0.00000000e+00 3.90351000e+03]. It is dominated by row [ -1.32452000e+03 2.56381000e+06 -2.66017000e+00 2.66981000e+03].

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top