A solution x
is pareto dominated if there exists a solution y
where y
is no worse than x
in any dimension and is strictly better in some dimension. I'll assume you're trying to minimize in all dimensions for your problem. Then you can take the following approach (the code you posted has a similar flavor, but has a number of issues):
- Sort your data (in increasing order) in all dimensions (e.g. first sort by dimension 1, for solutions with the same dimension 1 value sort by dimension 2, for solutions with the same dimension 1 and 2 values sort by dimension 3, ...). This can be efficiently performed with the
numpy.lexsort
function. Note that in this sorted list no element can be pareto dominated by an element later in the list. - Initialize your set of pareto non-dominated solutions,
np_pareto_frontier
, to the first sorted element. Due to the sorting in step 1, we know it's pareto non-dominated. - Looping through the sorted list in order, add a row to
np_pareto_frontier
if it's not dominated by any solution innp_pareto_frontier
-- since such elements are not dominated by anything innp_pareto_frontier
and not dominated by anything later in the sorted list, they must be pareto non-dominated.
Here's an implementation of this approach:
import numpy as np
def identify_pareto_observations(fitness_observations):
np_array_fitness_observations = np.array(fitness_observations)
np_array_fitness_observations = np_array_fitness_observations[np.lexnp_array_fitness_observations[:,0].argsort()]
np_pareto_frontier = np_array_fitness_observations[[0],:]
for row in np_array_fitness_observations[1:,:]:
dominated = False
for nondom in np_pareto_frontier:
if all([row[x] > nondom[x] for x in range(len(row))]):
dominated = True
break
if not dominated:
np_pareto_frontier = np.concatenate((np_pareto_frontier, [row]))
return np_pareto_frontier
We can test it on your dataset:
print identify_pareto_observations(fitness_observations_matrix)
# [[ -8.15018000e+02 4.14420000e+05 0.00000000e+00 0.00000000e+00]
# [ -1.32452000e+03 2.56381000e+06 -2.66017000e+00 2.66981000e+03]
# [ -4.03629000e+02 1.90910000e+06 -7.75081000e-01 3.01189000e+03]
# [ -5.88485000e+03 1.40051000e+07 -8.74525000e+00 3.03708000e+03]
# [ -2.02340000e+02 1.21506000e+06 0.00000000e+00 3.58720000e+03]
# [ -8.38066000e+02 9.77440000e+05 0.00000000e+00 3.85925000e+03]
# [ -4.47117000e+02 9.28641000e+05 0.00000000e+00 4.13133000e+03]
# [ -4.04509000e+03 5.83051000e+06 -1.99262000e+00 4.39797000e+03]
# [ -8.15861000e+02 2.47594000e+05 0.00000000e+00 4.67439000e+03]
# [ -8.03340000e+02 3.85226000e+05 0.00000000e+00 4.79927000e+03]
# [ -1.05826000e+03 1.97533000e+06 0.00000000e+00 4.95918000e+03]
# [ -3.70815000e+02 1.41723000e+06 0.00000000e+00 6.03694000e+03]
# [ -7.90052000e+02 4.91954000e+03 0.00000000e+00 6.67200000e+03]
# [ -1.51439000e+03 2.62657000e+06 0.00000000e+00 6.96003000e+03]
# [ -7.88706000e+01 9.47203000e+03 0.00000000e+00 7.07158000e+03]
# [ -7.47015000e+02 2.52010000e+06 -9.13635000e-01 7.15300000e+03]
# [ -1.09682000e+03 1.27109000e+05 0.00000000e+00 7.22728000e+03]
# [ -9.94555000e+01 4.13201000e+05 0.00000000e+00 7.61526000e+03]
# [ -2.46017000e+02 6.83330000e+05 0.00000000e+00 1.01105000e+04]]
As we can see, you only had one pareto dominated solution, the row [ -7.18299000e+02 3.08790000e+06 0.00000000e+00 3.90351000e+03]
. It is dominated by row [ -1.32452000e+03 2.56381000e+06 -2.66017000e+00 2.66981000e+03]
.