HyperparameterHunter

Automatically save and learn from Experiment results, leading to long-term, persistent optimization that remembers all your tests.

HyperparameterHunter provides a wrapper for machine learning algorithms that saves all the important data. Simplify the experimentation and hyperparameter tuning process by letting HyperparameterHunter do the hard work
of recording, organizing, and learning from your tests — all while using the same libraries you already do. Don't let any of your experiments go to waste, and start doing hyperparameter optimization the way it was meant to be.

Features

Stop worrying about keeping track of hyperparameters, scores, or re-running the same Experiments

Use the libraries and utilities you already love

How to Use HyperparameterHunter

Don’t think of HyperparameterHunter as another optimization library that you bring out only when it’s time to do hyperparameter optimization. Of course, it does optimization, but it’s better to view HyperparameterHunter as your own personal machine learning toolbox/assistant.

The idea is to start using HyperparameterHunter immediately. Run all of your benchmark/one-off experiments through it.

The more you use HyperparameterHunter, the better your results will be. If you just use it for optimization, sure, it’ll do what you want, but that’s missing the point of HyperparameterHunter.

If you’ve been using it for experimentation and optimization along the entire course of your project, then when you decide to do hyperparameter optimization, HyperparameterHunter is already aware of all that you’ve done, and that’s when HyperparameterHunter does something remarkable. It doesn’t start optimization from scratch like other libraries. It starts from all of the Experiments and previous optimization rounds you’ve already run through it.

Getting Started

1) Environment:

Set up an Environment to organize Experiments and Optimization results.
Any Experiments or Optimization rounds we perform will use our active Environment.

```python
experiment = CVExperiment(
    model_initializer=LinearSVC,  # (Or any of the dozens of other SK-Learn algorithms)
    model_init_params=dict(penalty='l1', C=0.9)  # Default values used and recorded for kwargs not given
)
```

Output File Structure

This is a simple illustration of the file structure you can expect your Experiments to generate. For an in-depth description of the directory structure and the contents of the various files, see the File Structure Overview section in the documentation. However, the essentials are as follows:

An Experiment adds a file to each HyperparameterHunterAssets/Experiments subdirectory, named by experiment_id

Each Experiment also adds an entry to HyperparameterHunterAssets/Leaderboards/GlobalLeaderboard.csv

Customize which files are created via Environment's file_blacklist and do_full_save kwargs (see the Environment documentation)
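Putting those essentials together, the tree looks roughly like this (a sketch only — exact subdirectories and file names vary by configuration):

```
HyperparameterHunterAssets/
├── Experiments/
│   ├── Descriptions/
│   │   └── <experiment_id>.*     <- one file per Experiment, named by experiment_id
│   └── ...                       <- other per-Experiment subdirectories
├── Leaderboards/
│   └── GlobalLeaderboard.csv     <- one entry per Experiment
└── TestedKeys/
```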

I Still Don't Get It

That's ok. Don't feel bad. It's a bit weird to wrap your head around. Here's an example that illustrates how everything is related:

```python
from hyperparameter_hunter import Environment, CVExperiment, BayesianOptimization, Integer
from hyperparameter_hunter.utils.learning_utils import get_breast_cancer_data
from xgboost import XGBClassifier

# Start by creating an `Environment` - This is where you define how Experiments (and optimization) will be conducted
env = Environment(
    train_dataset=get_breast_cancer_data(target='target'),
    results_path='HyperparameterHunterAssets',
    metrics=['roc_auc_score'],
    cv_type='StratifiedKFold',
    cv_params=dict(n_splits=10, shuffle=True, random_state=32),
)

# Now, conduct an `Experiment`
# This tells HyperparameterHunter to use the settings in the active `Environment` to train a model with these hyperparameters
experiment = CVExperiment(
    model_initializer=XGBClassifier,
    model_init_params=dict(
        objective='reg:linear',
        max_depth=3
    )
)
# That's it. No annoying boilerplate code to fit models and record results in the `results_path`

# Time for the fun part. We'll set up some hyperparameter optimization by first defining the `OptimizationProtocol` we want
optimizer = BayesianOptimization(verbose=1)

# Now we're going to say which hyperparameters we want to optimize.
# Notice how this looks just like our `experiment` above
optimizer.set_experiment_guidelines(
    model_initializer=XGBClassifier,
    model_init_params=dict(
        objective='reg:linear',   # We're setting this as a constant guideline - Not one to optimize
        max_depth=Integer(2, 10)  # Instead of using an int like the `experiment` above, we provide a space to search
    )
)
# Notice that our range for `max_depth` includes the `max_depth=3` value we used in our `experiment` earlier

optimizer.go()  # Now, we go

assert experiment.experiment_id in [_[2] for _ in optimizer.similar_experiments]
# Here we're verifying that the `experiment` we conducted first was found by `optimizer` and used as learning material
# You can also see via the console that we found `experiment`'s saved files, and used it to start optimization

last_experiment_id = optimizer.current_experiment.experiment_id
# Let's save the id of the experiment that was just conducted by `optimizer`

optimizer.go()  # Now, we'll start up `optimizer` again...

# And we can see that this second optimization round learned from both our first `experiment` and our first optimization round
assert experiment.experiment_id in [_[2] for _ in optimizer.similar_experiments]
assert last_experiment_id in [_[2] for _ in optimizer.similar_experiments]
# It even did all this without us having to tell it what experiments to learn from

# Now think about how much better your hyperparameter optimization will be when it learns from:
# - All your past experiments, and
# - All your past optimization rounds
# And the best part: HyperparameterHunter figures out which experiments are compatible all on its own
# You don't have to worry about telling it that KFold=5 is different from KFold=10,
# Or that max_depth=12 is outside of max_depth=Integer(2, 10)
```

Tested Libraries

The examples and FAQs in this README exercise scikit-learn, XGBoost, Keras, and CatBoost, among others

Gotchas/FAQs

These are some things that might "getcha"

General:

Can't provide initial search points to OptimizationProtocol?

This is intentional. If you want your optimization rounds to start with specific search points (that you haven't recorded yet), simply perform a CVExperiment before initializing your OptimizationProtocol

Assuming the two have the same guideline hyperparameters and the Experiment fits within the search space defined by your OptimizationProtocol, the optimizer will locate and read in the results of the Experiment

Keep in mind, you'll probably want to remove the Experiment after you've done it once, as the results have been saved. Leaving it there will just execute the same Experiment over and over again
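To make the matching rule above concrete, here is a hedged, pure-Python sketch. It is NOT HyperparameterHunter's actual internals — `fits_space` is a made-up helper — it only illustrates when a saved Experiment can seed optimization:

```python
# Hypothetical sketch of the matching rule - `fits_space` is not a real
# HyperparameterHunter function; it only illustrates the idea
saved_experiment = {"max_depth": 3, "objective": "reg:linear"}
search_space = {"max_depth": range(2, 11), "objective": ("reg:linear",)}

def fits_space(experiment_params, space):
    # An Experiment is a usable initial point when every guideline
    # hyperparameter falls inside the corresponding search dimension
    return all(experiment_params[k] in space[k] for k in space)

assert fits_space(saved_experiment, search_space)  # matched: seeds the optimizer
assert not fits_space({"max_depth": 12, "objective": "reg:linear"}, search_space)  # outside the space: ignored
```

This mirrors the note in the example above: `max_depth=12` falls outside `Integer(2, 10)`, so such an Experiment would not be treated as a compatible initial point.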

After changing things in my "HyperparameterHunterAssets" directory, everything stopped working

Yeah, don't do that. Especially not with "Descriptions", "Leaderboards", or "TestedKeys"

HyperparameterHunter figures out what's going on by reading these files directly.

Removing them, or changing their contents can break a lot of HyperparameterHunter's functionality

Keras:

Can't find similar Experiments for Keras models you know are equivalent?

This is likely caused by switching between using a separate Activation layer, and providing a Dense layer with the activation kwarg

Each layer is treated as its own little set of hyperparameters (as well as being a hyperparameter, itself), which means that as far as HyperparameterHunter is concerned, the following two examples are NOT equivalent:

Dense(10, activation='sigmoid')

Dense(10); Activation('sigmoid')

We’re working on this, but for now, the workaround is just to be consistent with how you add activations to your models

Either use separate Activation layers, or provide activation kwargs to other layers, and stick with it!
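A tiny, hypothetical sketch of why the two spellings diverge — each layer is recorded as its own bundle of hyperparameters, so the resulting descriptions simply aren't equal (the dicts below are illustrative, not HyperparameterHunter's real record format):

```python
# Illustrative only - not HyperparameterHunter's actual record format.
# Each layer is its own hyperparameter bundle, so the two spellings
# produce different model descriptions even though Keras trains the
# same network for both
inline = [
    {"layer": "Dense", "units": 10, "activation": "sigmoid"},
]
separate = [
    {"layer": "Dense", "units": 10, "activation": None},
    {"layer": "Activation", "activation": "sigmoid"},
]
assert inline != separate  # treated as two different architectures
```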

Can't optimize the model.compile arguments: optimizer and optimizer_params at the same time?

This happens because Keras’ optimizers expect different arguments

For example, when optimizer=Categorical(['adam', 'rmsprop']), there are two different possible dicts of optimizer_params

For now, you can only optimize optimizer, and optimizer_params separately

A good way to do this might be to select a few optimizers you want to test, and don’t provide an optimizer_params value. That way, each optimizer will use its default parameters

Then you can select which optimizer was the best, and set optimizer=<best optimizer>, then move on to tuning optimizer_params, with arguments specific to the optimizer you selected
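As a rough illustration of the underlying problem (the parameter names below approximate Keras optimizer defaults; treat the exact keys and values as assumptions), a single optimizer_params search space can't serve two optimizers because their keyword sets don't line up:

```python
# Approximate default kwargs for two Keras optimizers (illustrative values)
adam_params = {"lr": 0.001, "beta_1": 0.9, "beta_2": 0.999}
rmsprop_params = {"lr": 0.001, "rho": 0.9}

# A single `optimizer_params` space would have to include `beta_1`
# (meaningless to RMSprop) and `rho` (meaningless to Adam) at once
assert set(adam_params) != set(rmsprop_params)
assert "rho" not in adam_params and "beta_1" not in rmsprop_params
```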

CatBoost:

Can't find similar Experiments for CatBoost?

This may be happening because the default values for the kwargs expected in CatBoost’s model __init__ methods are defined somewhere else, and given placeholder values of None in their signatures

Because of this, HyperparameterHunter assumes that the default value for an argument really is None if you don’t explicitly provide a value for that argument

This is obviously not the case, but I just can’t seem to figure out where the actual default values used by CatBoost are located, so if anyone knows how to remedy this situation, I would love your help!
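The signature behavior described above can be reproduced with a stand-in function (hypothetical — `catboost_like_init` is not real CatBoost code, it just mimics the pattern of None placeholders in the signature):

```python
import inspect

# Stand-in mimicking a CatBoost-style __init__: the signature shows only
# `None` placeholders, while the real defaults are resolved elsewhere
# inside the library (hypothetical function, illustrative kwargs)
def catboost_like_init(iterations=None, learning_rate=None, depth=None):
    """Every kwarg's signature default is None, regardless of the true default."""

sig = inspect.signature(catboost_like_init)
# A tool that inspects only the signature has no choice but to record
# every unspecified default as None
assert all(p.default is None for p in sig.parameters.values())
```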