Question

I recently read the LIPO blog post on the dlib blog: http://blog.dlib.net/2017/12/a-global-optimization-algorithm-worth.html

It mentions that it can be used for optimizing the hyperparameters of, e.g., metaheuristic algorithms like simulated annealing or genetic algorithms.
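If I understood the post correctly, the intended usage is something like the following sketch, using dlib's `find_min_global` from the post. Here `run_sa` is just a cheap stand-in for one complete run of the inner algorithm, and the bounds and budget are made up:

```python
import dlib  # pip install dlib; find_min_global is the LIPO-based optimizer from the post

def run_sa(start_temp, cooling_rate):
    # Stand-in for one complete simulated-annealing run: in practice this would
    # run the whole algorithm and return its final objective value.
    # A cheap synthetic score keeps the sketch self-contained.
    return (start_temp - 10.0) ** 2 + 100.0 * (cooling_rate - 0.95) ** 2

# find_min_global searches the hyperparameter box for us; each evaluation
# corresponds to paying for one full run of the inner algorithm.
best_params, best_score = dlib.find_min_global(
    run_sa,
    [1.0, 0.80],     # lower bounds: start_temp, cooling_rate
    [100.0, 0.999],  # upper bounds
    30,              # budget: total number of complete runs
)
```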

I looked for information on how hyperparameter optimization works in general, and the Wikipedia page is the most informative thing I found, but it doesn't answer my basic question: https://en.m.wikipedia.org/wiki/Hyperparameter_optimization

My question is just: what is the basic idea for optimizing hyperparameters?

If I have some problem I'm trying to solve with simulated annealing, I know that the starting temperature and the cooling rate are important in determining how well the algorithm does at finding a solution.
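To make that concrete, here is a toy sketch of the kind of run I have in mind (the objective is just x², not my real problem, and the schedule is geometric):

```python
import math
import random

def simulated_annealing(start_temp, cooling_rate, n_steps=1000):
    """One complete run on a toy objective f(x) = x^2.
    start_temp and cooling_rate are the hyperparameters in question."""
    x = random.uniform(-10, 10)
    best = x * x
    temp = start_temp
    for _ in range(n_steps):
        candidate = x + random.gauss(0, 1)      # propose a neighbour
        delta = candidate * candidate - x * x   # change in the objective
        # always accept improvements; accept worse moves with a temperature-dependent probability
        if delta < 0 or random.random() < math.exp(-delta / temp):
            x = candidate
            best = min(best, x * x)
        temp *= cooling_rate                    # geometric cooldown
    return best  # final score: lower is better
```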

I know that I could run the algorithm to completion with one set of hyperparameters, perturb one of them and run it again, then reset it, perturb the other one, and run a third time. Those runs would give me a numerical (finite-difference) gradient that I could use to update the hyperparameters via gradient descent.
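In code, the loop I'm imagining looks roughly like this (reusing the `simulated_annealing` sketch above; the step sizes `eps` and `lr` are arbitrary):

```python
def sa_score(params):
    start_temp, cooling_rate = params
    return simulated_annealing(start_temp, cooling_rate)  # one full run

def finite_difference_update(params, eps=1e-2, lr=0.1):
    """One gradient-descent step on the hyperparameters.
    Costs 1 + len(params) complete runs of the inner algorithm,
    and the estimate is noisy because each run is stochastic."""
    base = sa_score(params)
    grad = []
    for i in range(len(params)):
        bumped = list(params)
        bumped[i] += eps                              # perturb one hyperparameter
        grad.append((sa_score(bumped) - base) / eps)  # forward difference
    return [p - lr * g for p, g in zip(params, grad)]

# e.g. params = finite_difference_update([10.0, 0.95])
```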

However, at this point I have already run the whole algorithm three times just to get a single update of the hyperparameters.

I feel like I must be missing something obvious, because optimizing the hyperparameters this way would cost hundreds or thousands of times (or more) as much as running the whole thing once, which doesn't seem useful at all. Can someone clue me in?
