I am trying to tune the parameters of a global optimizer for my dataset. I will have many similar datasets to process, so I need to fine-tune the optimizer once so that it can find the minimum of a function in a reasonable time.

The parameter optimization takes a very long time, but it only has to be done once; after that I will reuse the best combination on the other, similar datasets and hopefully find the best value there too.

The global optimizer I use is differential evolution, in the SciPy implementation (`scipy.optimize.differential_evolution`). The Wikipedia article gives the definition and lists the relevant papers in its references.

The problem is that sampling enough parameter combinations to see any kind of trend is extremely slow, so I cannot spot a pattern to follow. My problem is especially with the tol parameter, whose stopping criterion is defined as:

np.std(pop) <= atol + tol * np.abs(np.mean(population_energies))

This exits the optimizer once the standard deviation of the population energies falls below atol plus tol times the absolute mean population energy.
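To see how tol and atol feed into that criterion in practice, here is a minimal sketch. The 2-D Rastrigin-style objective is just a toy stand-in I made up for the real 15-dimensional problem; the tol and atol keyword arguments are the real SciPy parameters.

```python
import numpy as np
from scipy.optimize import differential_evolution

# Toy multimodal objective (a stand-in for the real 15-D problem).
def objective(x):
    return np.sum(x**2 - 10 * np.cos(2 * np.pi * x) + 10)

bounds = [(-5.12, 5.12)] * 2

# tol and atol enter the solver exactly through the criterion above:
# it stops once std(energies) <= atol + tol * |mean(energies)|.
result = differential_evolution(objective, bounds, tol=0.01, atol=0, seed=1)
print(result.fun, result.nit)  # best value found, generations used
```

With a larger tol the criterion triggers after fewer generations; with a tiny tol the run keeps going until maxiter instead.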

The tricky part: with the default tol=0.01 the optimizer exits quickly and returns a low-accuracy value, so I run it multiple times. If I set tol to a very small number, it never triggers the criterion above and instead exits at maxiter, the maximum number of iterations the function is allowed to perform. So I either run it many times with a large tolerance or a few times with a small one; the question is which exit clause is better.

What I am measuring is which tol value gives me the best result within a fixed time budget, currently 2 hours per set. I group the values into batches {1e-01, 1e-02, ..., 1e-10} and test which batch gives the best value in the allotted time.
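The batch comparison above can be sketched as a small benchmarking loop. The objective is again a toy stand-in, and the choice of three seeds per tol setting is an arbitrary illustration, not the 200-sample protocol from the question.

```python
import time
import numpy as np
from scipy.optimize import differential_evolution

def objective(x):  # toy stand-in for the real 15-D objective
    return np.sum(x**2 - 10 * np.cos(2 * np.pi * x) + 10)

bounds = [(-5.12, 5.12)] * 2

# For each tol "batch", time a few independent runs and record the
# best value any of them achieved.
results = {}
for tol in (1e-1, 1e-2, 1e-3):
    t0 = time.perf_counter()
    best = min(
        differential_evolution(objective, bounds, tol=tol, seed=s).fun
        for s in range(3)
    )
    results[tol] = (best, time.perf_counter() - t0)
```

Comparing the (best value, wall time) pairs across batches is then a direct measurement of which exit clause buys more accuracy per unit of time.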

I am testing this right now, but the low tolerance values are very hard to sample because a single run takes ~30 minutes, and I would need a sample size of 200 from each batch, which would take far too long to compute. My data is 15-dimensional and I only allow 90,000 evaluations in total per run (15 * popsize * maxiter). A low tolerance like 1e-10 hits that cap very often, which makes drawing samples very slow.
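The 90,000-evaluation cap translates into a maxiter setting roughly as follows. This is a back-of-the-envelope sketch: SciPy's population has popsize * ndim members, and each generation evaluates every member about once.

```python
# Rough budget arithmetic: total evaluations ≈ (maxiter + 1) * popsize * ndim,
# counting the initial population as one extra generation.
ndim, popsize, budget = 15, 15, 90_000
maxiter = budget // (popsize * ndim) - 1
assert (maxiter + 1) * popsize * ndim <= budget
print(maxiter)  # 399 generations fit in the 90,000-evaluation budget
```

So a tiny tol that never converges will burn through all ~400 generations every time, which is exactly why those batches are so slow to sample.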

Is there any other way to find the best tol value? I would guess a higher one is better, because exiting on a convergence condition is more efficient than brute-forcing through an arbitrary number of cycles, but only if the shape of my data is favorable enough, and I have no way of knowing that.

Is there any literature on rule-of-thumb parameters for the differential evolution global optimizer that are, on average, optimal across functions?

I suspect the optimal range is around [1e-4, 1e-0], but I have no way of proving it. If it is true, though, then optimizing the other parameters would be a piece of cake, since calculations within this tolerance range are very quick.


Solution

After 4 days of continuous computing, the results suggest that the default tol=0.01 is indeed the best. This of course depends on the topology of the data and other factors, but since the literature suggests it and my own findings agree, it may be close to a universal value applicable to many shapes of data.

With a sample size of >1000 there is very low correlation between the tolerance parameter and the output result. The average value barely changes as tol increases from 1e-10 to 1e-0 (0.9 being the largest value I tried), but the computing time drops sharply, from around 7 minutes to 5 minutes per run. Since the output value does not change and only the duration does, it is rational to use 0.01 as the tolerance. Above 0.01, however, the output values actually start to increase, which means 0.01 is as good as it gets.

As for premature termination of the optimizer, this should not be a problem, since it can still be calibrated with the recombination parameter (CR, the crossover probability): set it low for delayed convergence and high for earlier convergence. This in turn also has to be optimized.
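A hedged sketch of that calibration, again on a made-up toy objective: the recombination keyword is the real SciPy parameter, and lower values exchange less information between population members per generation, which tends to delay convergence.

```python
import numpy as np
from scipy.optimize import differential_evolution

def objective(x):  # toy multimodal stand-in
    return np.sum(x**2 - 10 * np.cos(2 * np.pi * x) + 10)

bounds = [(-5.12, 5.12)] * 2

# Low CR: slower information exchange, delayed convergence.
slow = differential_evolution(objective, bounds, recombination=0.2, seed=0)
# High CR: faster mixing, earlier convergence.
fast = differential_evolution(objective, bounds, recombination=0.9, seed=0)
```

On any single seed the generation counts can go either way, so a real calibration would average over many runs, as in the batch experiment above.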

One thing worth noting: using a Latin hypercube for the init values was not good in my case. I found much better results with random sampling, which gave better coverage, and if clustering of the initial values is a problem, just increase the number of runs. There is no way the differential evolution optimizer will find the best value in one run, so it has to be run multiple times either way.
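The multi-run strategy with random initialisation can be sketched like this; init='random' and init='latinhypercube' are the real SciPy options, while the objective and the choice of five seeds are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import differential_evolution

def objective(x):  # toy stand-in for the real objective
    return np.sum(x**2 - 10 * np.cos(2 * np.pi * x) + 10)

bounds = [(-5.12, 5.12)] * 2

# Several independent runs with random initialisation ('latinhypercube'
# is the default); keep the best result across all of them.
runs = [
    differential_evolution(objective, bounds, init='random', tol=0.01, seed=s)
    for s in range(5)
]
best = min(runs, key=lambda r: r.fun)
```

Taking the minimum over independent restarts is what makes a loose tol tolerable: each run is cheap, and the ensemble covers the search space.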

Unfortunately I can't provide the data, but if anyone else has the time to verify these findings on their own data, that would be good too.

Other tips

The docs say:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.differential_evolution.html

"To improve your chances of finding a global minimum use higher popsize values, with higher mutation and dithering, but lower recombination values. This has the effect of widening the search radius, but slowing convergence."

So there's no globally optimal setting for all data shapes; if you need to find the optimum, you have to take longer with bumpy, multi-dimensional data shapes than with more funnel-shaped ones.
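A sketch of the "wider search radius" settings the docs describe: larger popsize, a dithered mutation range, and lower recombination. The specific numbers and the 4-D toy objective are my own illustrative choices, not recommendations from the docs.

```python
import numpy as np
from scipy.optimize import differential_evolution

def objective(x):  # bumpier, higher-dimensional toy case
    return np.sum(x**2 - 10 * np.cos(2 * np.pi * x) + 10)

bounds = [(-5.12, 5.12)] * 4

# Wider search per the docs: higher popsize, mutation dithering via a
# (min, max) tuple, lower recombination. Slower to converge, but less
# likely to miss the global basin on a multimodal landscape.
result = differential_evolution(
    objective, bounds,
    popsize=30, mutation=(0.5, 1.5), recombination=0.3, seed=2,
)
```

The trade-off is exactly the one quoted above: these settings widen the search but cost more evaluations per run, so the 90,000-evaluation budget is spent faster.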
