Optuna vs Hyperopt: Which Hyperparameter Optimization Library Should You Choose?

To train a model on a set of parameters you need to run something like this:
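As a minimal sketch of that call, assume a hypothetical `train_evaluate(params)` function that takes a dictionary of hyperparameters and returns a validation score to maximize. The parameter names and the synthetic score below are placeholders for illustration, not a real model:

```python
import math


def train_evaluate(params):
    """Stand-in for 'train a model with these params, return its validation score'.

    Hypothetical: the score is synthetic, lies in (0, 1], and peaks at
    learning_rate=0.1 and max_depth=6. A real version would fit a model here.
    """
    lr_term = (math.log10(params["learning_rate"]) + 1.0) ** 2
    depth_term = ((params["max_depth"] - 6) / 10.0) ** 2
    return math.exp(-(lr_term + depth_term))


# One "run": a single parameter set goes in, a single score comes out.
params = {"learning_rate": 0.1, "max_depth": 6}
score = train_evaluate(params)
print(f"validation score: {score:.3f}")
```

Every search strategy compared here only ever sees this interface: it proposes `params` and gets a score back.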

For this study, I tried to find the best parameters within a budget of 100 runs.

I ran 6 experiments:

  • Random search (from Hyperopt) as a reference
  • Tree-structured Parzen Estimator (TPE) search strategies for both Optuna and Hyperopt
  • Adaptive TPE from Hyperopt
  • TPE from Optuna with a pruning callback, for more runs within the same time frame. It turns out that 400 runs with pruning take as much time as 100 runs without it.
  • Optuna with a Random Forest surrogate model from skopt.Sampler
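To make the random search reference concrete, here is a minimal pure-Python sketch of random search under a fixed run budget. It does not use Hyperopt's implementation; the `objective` function, the single parameter `x`, and its range are assumptions made up for illustration:

```python
import random


def objective(params):
    # Hypothetical stand-in for training a model; higher is better.
    return -(params["x"] - 0.3) ** 2


def random_search(objective, budget=100, seed=0):
    """Sample `budget` parameter sets uniformly and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(budget):
        params = {"x": rng.uniform(0.0, 1.0)}  # no memory of earlier runs
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(objective, budget=100)
print(best_params, best_score)
```

Random search ignores everything it has already tried, which is why it makes a useful floor: a model-based strategy such as TPE should beat it with the same number of runs.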

Both Optuna and Hyperopt improved over random search, which is good.

The TPE implementation from Optuna was slightly better than Hyperopt’s Adaptive TPE, but not by much. On the other hand, when running hyperparameter optimization, those small improvements are exactly what you are going for.

What is interesting is that the TPE implementations from Hyperopt and Optuna give vastly different results on this problem. Maybe the cutoff point λ between good and bad parameter configurations is chosen differently, or the sampling methods have defaults that work better for this particular problem.
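The cutoff idea can be sketched in a few lines. TPE ranks the finished trials, treats the top fraction as "good" and the rest as "bad", then fits a separate density to each group and favors candidates that are likely under the good one. The sketch below shows only the split; the function name, the fraction `gamma=0.25`, and the toy scores are assumptions for illustration, not the defaults of either library:

```python
import math


def split_trials(trials, gamma=0.25):
    """Split (params, score) pairs into good and bad groups; higher is better.

    `gamma` is the fraction of trials treated as good (illustrative value).
    """
    ranked = sorted(trials, key=lambda t: t[1], reverse=True)
    n_good = max(1, math.ceil(gamma * len(ranked)))
    return ranked[:n_good], ranked[n_good:]


# Toy trial history: (params, score) pairs with made-up values.
trials = [
    ({"lr": 0.001}, 0.1),
    ({"lr": 0.003}, 0.4),
    ({"lr": 0.010}, 0.7),
    ({"lr": 0.030}, 0.9),
    ({"lr": 0.100}, 0.8),
    ({"lr": 0.300}, 0.5),
    ({"lr": 1.000}, 0.2),
    ({"lr": 3.000}, 0.0),
]

good, bad = split_trials(trials, gamma=0.25)
print("good:", [p["lr"] for p, _ in good])
print("bad: ", [p["lr"] for p, _ in bad])
```

Moving the cutoff changes which configurations shape the "good" density, so two TPE implementations with different rules here can propose quite different parameters from the same history.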

Moreover, pruning cut training time by 4x: I could run 400 searches in the time it takes to run 100 without pruning. On the flip side, pruning produced a lower score. It may be different for your problem, but it is important to consider this when deciding whether to use pruning.
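To show where the time savings come from, here is a minimal sketch of one common pruning rule: stop a run early if its intermediate score at a given step is below the median of what earlier runs had at that step. The post does not say which pruner was used, so the median rule is an assumption (it is the idea behind Optuna's MedianPruner), and the function name, `n_warmup_trials`, and the toy scores are hypothetical:

```python
from statistics import median


def should_prune(history, step, value, n_warmup_trials=2):
    """Return True if a run should stop at `step`; higher scores are better.

    `history` holds the intermediate scores of earlier runs, one list per run.
    """
    peers = [run[step] for run in history if len(run) > step]
    if len(peers) < n_warmup_trials:
        return False  # too little evidence to judge this run yet
    return value < median(peers)


# Toy intermediate validation scores from three earlier runs.
history = [
    [0.70, 0.80, 0.85],
    [0.60, 0.75, 0.82],
    [0.65, 0.78, 0.84],
]

print(should_prune(history, step=1, value=0.70))  # below the median at step 1
print(should_prune(history, step=1, value=0.79))  # above the median at step 1
```

Runs that are stopped early cost only a fraction of a full training, which is how more searches fit in the same wall-clock time. The flip side is that a slow starter can be discarded before it catches up, which may explain a lower final score.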

For this section, I assigned points based on the improvements over the random search strategy.

  • Hyperopt got (0.850 - 0.844) * 1000 = 6
  • Optuna got (0.854 - 0.844) * 1000 = 10
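The points are simply the score gain over random search scaled to whole numbers. A minimal sketch of that arithmetic, assuming a scale factor of 1000 (which is what maps a gain of 0.006 to 6 points) and a hypothetical helper named `points`:

```python
def points(score, baseline, scale=1000):
    """Convert the gain over the baseline score into whole points."""
    # round() absorbs floating-point noise such as 6.000000000000005.
    return round((score - baseline) * scale)


random_search_score = 0.844  # reference score from random search

print("Hyperopt:", points(0.850, random_search_score))
print("Optuna:  ", points(0.854, random_search_score))
```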

Experimental results:

Optuna = Hyperopt

