Question

When using asynchronous hyperparameter optimization packages such as scikit-optimize or hyperopt with cross-validation (e.g., cv = 2 or 4) and setting the number of iterations to N (e.g., N = 100), should I expect:

  a) Dependence between sequential iterations, where the loss value improves from one iteration to the next (e.g., the hyperparameters found in iteration 10 are better than those found in iteration 9, and so on). In this case I should always select the hyperparameters generated in the last iteration.

or

  b) Independence between iterations, where after all 100 iterations are completed I should select the iteration with the smallest loss value.

If option a) is the right answer, then what does it mean if the best hyperparameters are associated with iteration 50? Does it mean that the data is not stable, or that the loss function is ill-specified, and that therefore the outcome of the hyperparameter optimization process should not be trusted?
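For concreteness, here is a minimal sketch of the kind of setup I mean, assuming hyperopt's fmin with a TPE search over a small random-forest space and a cv = 4 objective (the estimator, search space, and dataset are illustrative only):

```python
from hyperopt import fmin, tpe, hp, Trials, STATUS_OK
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

def objective(params):
    # cv = 4 cross-validated accuracy; hyperopt minimizes, so negate the score
    model = RandomForestClassifier(
        n_estimators=int(params["n_estimators"]),
        max_depth=int(params["max_depth"]),
        random_state=0,
    )
    score = cross_val_score(model, X, y, cv=4).mean()
    return {"loss": -score, "status": STATUS_OK}

space = {
    "n_estimators": hp.quniform("n_estimators", 50, 300, 25),
    "max_depth": hp.quniform("max_depth", 2, 10, 1),
}

trials = Trials()  # records every evaluation
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=100,  # N = 100 iterations
            trials=trials)
```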


Solution

hyperopt proceeds sequentially, unless you let it use a parallel backend. Is N the max_evals parameter? Yes, you always want to select the hyperparameters with the best validation loss; that is what it returns in the end. It may have found the best hyperparameters well before the final trial, and that does not mean anything is wrong. As it goes, it learns the distribution of the loss conditional on the hyperparameters, and deliberately explores less-certain parts of the space that are most likely to yield an improvement but may not, especially towards the end of the search. This happens even with entirely well-defined loss functions.
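Continuing the sketch from the question, one way to see this is to inspect the per-trial losses recorded in the Trials object; trials.losses() and trials.best_trial are part of hyperopt's API, and fmin's return value already corresponds to the lowest-loss trial, whenever it occurred:

```python
losses = trials.losses()               # one loss per trial, in the order they ran
best_iter = losses.index(min(losses))  # trial with the smallest CV loss

print(f"best loss {min(losses):.4f} found at trial {best_iter} of {len(losses)}")
print(f"loss of the final trial: {losses[-1]:.4f}")

# `best` (the return value of fmin) and trials.best_trial both point at the
# lowest-loss trial, so there is no need to prefer the last iteration.
print("best hyperparameters:", best)
print("best trial index:", trials.best_trial["tid"])
```

Plotting losses against the trial index typically shows a noisy but overall downward trend rather than a monotone improvement, which is exactly the behaviour described above.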
