Cutting-edge hyperparameter tuning with Ray Tune

Behind most of the flashy results in machine learning is a graduate student (me) or an engineer spending hours training a model and tuning algorithm parameters. This tedious work is what makes the headlines possible.

Here in the RISELab, we increasingly need cutting-edge hyperparameter tuning tools to keep up with the state of the art. Advances in deep learning performance depend more and more on newer and better hyperparameter tuning algorithms such as Population Based Training (PBT), HyperBand, and ASHA.

Yet, we see that a vast majority of researchers and teams do not leverage such algorithms.

Why? Most existing hyperparameter search frameworks do not implement these newer optimization algorithms. And once you reach a certain scale, most existing solutions for parallel hyperparameter search are a hassle to use: you need to configure each machine for each run and often manage a separate database.

Practically speaking, implementing and maintaining these algorithms requires a significant amount of time and engineering.

But it doesn’t need to be this way. We believe there’s no reason why hyperparameter tuning at scale needs to be this hard. All AI researchers and engineers should be able to seamlessly run a parallel asynchronous grid search across 8 GPUs and even scale out to leverage Population Based Training or any Bayesian optimization algorithm on the cloud.

With another configuration file and 4 lines of code, launch a massive distributed hyperparameter tuning cluster on the cloud and automatically shut down the machines (we’ll show you how to do this below).

With Tune’s built-in fault tolerance, trial migration, and cluster autoscaling, you can safely leverage spot (preemptible) instances and reduce cloud costs by up to 90%.

Tune is flexible.

Tune integrates seamlessly with experiment management tools such as MLFlow and TensorBoard.

You can use Tune to leverage and scale many cutting-edge optimization algorithms and libraries such as HyperOpt and Ax without modifying any model training code.

Using Tune is Easy!

Let’s now dive into a concrete example that shows you how to leverage a popular early stopping algorithm (ASHA). We will start by running an example hyperparameter tuning script with Tune across all of the cores on your workstation. We’ll then scale out the same hyperparameter tuning experiment on the cloud with about 10 lines of code using Ray.

Running Tune

Let’s run 1 trial, randomly sampling from a uniform distribution for learning rate and momentum.
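As a minimal sketch of what such a single-trial run might look like: `toy_train` below is a hypothetical stand-in for a real training function, the sampling ranges are illustrative assumptions, and the `tune.run` call assumes a Ray version in which a function trainable may return its final metrics as a dict. Newer Ray releases may prefer a different entry point.

```python
def toy_train(config):
    """Hypothetical stand-in for a training loop; returns the final metric as a dict."""
    # Made-up objective that peaks at lr=0.01, momentum=0.9 (not a real model).
    score = 1.0 - abs(config["lr"] - 0.01) - 0.1 * abs(config["momentum"] - 0.9)
    return {"mean_accuracy": score}


def run_one_trial():
    """Launch a single Tune trial with uniformly sampled hyperparameters."""
    # Imported lazily so toy_train can be inspected without Ray installed.
    from ray import tune

    return tune.run(
        toy_train,
        num_samples=1,  # one trial only
        config={
            # Assumed ranges, chosen for illustration.
            "lr": tune.uniform(0.001, 0.1),
            "momentum": tune.uniform(0.1, 0.9),
        },
    )
```

Calling `run_one_trial()` should return an analysis object whose `dataframe()` method gives the trial results as a pandas DataFrame.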

You’ve now completed your first Tune run! You can easily enable GPU usage by specifying GPU resources; see the documentation for more details. We can then plot the performance of this trial (requires matplotlib).

Parallel execution and early stopping

Early stopping with ASHA.

Let’s integrate ASHA, a scalable algorithm for early stopping (blog post and paper). ASHA terminates trials that are less promising and allocates more time and resources to more promising trials.
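To make the idea concrete, here is a simplified, pure-Python sketch of an asynchronous successive-halving rule. It is not Tune's implementation: the class name `SimpleAsha`, its parameters, and the exact cutoff rule are assumptions chosen for illustration. Each time a trial reaches a rung, it is compared against the results recorded so far at that rung, and only roughly the top 1/reduction_factor continue.

```python
class SimpleAsha:
    """Toy asynchronous successive halving: decide per result whether a trial continues."""

    def __init__(self, grace_period=1, max_t=81, reduction_factor=3):
        self.reduction_factor = reduction_factor
        # Rungs are the iterations at which trials are compared,
        # e.g. 1, 3, 9, 27 for the default arguments.
        self.rungs = {}
        t = grace_period
        while t < max_t:
            self.rungs[t] = []
            t *= reduction_factor

    def should_continue(self, iteration, score):
        """Record `score` (higher is better); return False if the trial should stop."""
        if iteration not in self.rungs:
            return True  # trials are only compared at rung boundaries
        seen = self.rungs[iteration]
        seen.append(score)
        # Keep roughly the top 1/reduction_factor of results seen so far at this rung.
        keep = max(1, len(seen) // self.reduction_factor)
        cutoff = sorted(seen, reverse=True)[keep - 1]
        return score >= cutoff


asha = SimpleAsha(grace_period=1, max_t=9, reduction_factor=2)
# Five trials report their score at iteration 1, one after another.
decisions = [asha.should_continue(1, s) for s in (0.5, 0.3, 0.9, 0.6, 0.55)]
print(decisions)  # [True, False, True, True, False]
```

Because the decision is made as each result arrives, no trial waits for the others to finish, which is what makes the scheme asynchronous. In Tune, the corresponding scheduler is exposed as `ASHAScheduler` in `ray.tune.schedulers`.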

Parallelize your search across all available cores on your machine with num_samples (extra trials will be queued).
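A hedged sketch of how this might be wired up: `run_asha_search` assumes a `trainable` that reports a `mean_accuracy` metric at every training iteration (the metric name and sampling ranges are assumptions), and `running_and_queued` is a hypothetical helper that only illustrates the queueing arithmetic, assuming each trial requests one CPU.

```python
def running_and_queued(num_samples, num_cpus, cpus_per_trial=1):
    """Split trials into those that can start immediately and those that must wait."""
    slots = num_cpus // cpus_per_trial
    running = min(num_samples, slots)
    return running, num_samples - running


def run_asha_search(trainable, num_samples=30):
    """Run `num_samples` trials in parallel, stopping weak ones early with ASHA."""
    # Imported lazily so the helper above can be used without Ray installed.
    from ray import tune
    from ray.tune.schedulers import ASHAScheduler

    return tune.run(
        trainable,
        num_samples=num_samples,
        # Assumes the trainable reports "mean_accuracy" each iteration.
        scheduler=ASHAScheduler(metric="mean_accuracy", mode="max"),
        config={
            "lr": tune.uniform(0.001, 0.1),
            "momentum": tune.uniform(0.1, 0.9),
        },
    )


# On an 8-core workstation, 30 one-CPU trials start 8 at a time.
print(running_and_queued(num_samples=30, num_cpus=8))  # (8, 22)
```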

You can use the same DataFrame plotting as in the previous example. After the run, if TensorBoard is installed, you can also use it to visualize the results:

$ tensorboard --logdir ~/ray_results

Going distributed

Setting up a distributed hyperparameter search is often too much work. Tune and Ray make this seamless.

Launching a cluster on the cloud with a simple configuration file

First, we’ll create a YAML file which configures a Ray cluster. As part of Ray, Tune interoperates very cleanly with the Ray cluster launcher. The same commands shown below will work on GCP, AWS, and local private clusters. We’ll use 3 worker nodes in addition to a head node, so we should have a total of 32 vCPUs on the cluster — allowing us to evaluate 32 hyperparameter configurations in parallel.

tune-default.yaml

Putting things together

To distribute your hyperparameter search across the Ray cluster, add the following to the top of your script:
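A minimal sketch of such a preamble, assuming the cluster address is passed as a single optional positional argument (for example `localhost:6379`). The helper names `parse_args` and `connect` are hypothetical, and the sketch relies on `ray.init(address=...)` connecting to an existing cluster when an address is given.

```python
import argparse


def parse_args(argv=None):
    """Read an optional Ray cluster address, e.g. 'localhost:6379'."""
    parser = argparse.ArgumentParser(description="Tune experiment script")
    parser.add_argument(
        "ray_address",
        nargs="?",
        default=None,
        help="address of an existing Ray cluster; omit to run locally",
    )
    return parser.parse_args(argv)


def connect(argv=None):
    """Connect to the cluster if an address was given, else start Ray locally."""
    # Imported lazily so argument parsing can be checked without Ray installed.
    import ray

    args = parse_args(argv)
    ray.init(address=args.ray_address)
    return args
```

With this in place, the rest of the script can stay unchanged: calling `connect()` before the Tune run decides whether trials execute on the workstation or across the cluster.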

Given the large increase in compute, we can widen the search space and increase the number of samples:
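One common way to widen a learning-rate search is to sample it log-uniformly, so that every order of magnitude is equally likely. The pure-Python sketch below illustrates the idea; the ranges and the helper names `sample_log_uniform` and `sample_wide_config` are assumptions for illustration, not the values used in the experiment.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Sample so that each order of magnitude between low and high is equally likely."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_wide_config(rng):
    """Draw one configuration from a wider, illustrative search space."""
    return {
        "lr": sample_log_uniform(rng, 1e-6, 1e-1),  # five orders of magnitude
        "momentum": rng.uniform(0.1, 0.9),
    }


rng = random.Random(0)
configs = [sample_wide_config(rng) for _ in range(1000)]
# About three fifths of the draws should fall in the lower three decades.
small = sum(c["lr"] < 1e-3 for c in configs)
print(f"{small} of {len(configs)} sampled learning rates are below 1e-3")
```

In a Tune config, a comparable distribution can be expressed with `tune.loguniform`, and the number of configurations drawn is controlled by `num_samples`.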

Launching your experiment

Submitting the script through the Ray cluster launcher will launch your cluster on AWS, upload tune_script.py onto the head node, and run python tune_script.py localhost:6379, where 6379 is the port Ray opens to enable distributed execution.

All of the output of your script will show up on your console. Note that the cluster sets up the head node before any of the worker nodes, so at first you may see only 4 CPUs available. After some time, you can see 24 trials being executed in parallel, and the remaining trials will be queued and executed as soon as resources free up.

To shut down your cluster, you can run:

$ ray down tune-default.yaml

And you’re done 🎉!

Learn more:

Tune has numerous other features that help researchers and practitioners accelerate their development, many of which are not covered in this blog post.

For users that have access to the cloud, Tune and Ray provide a number of utilities that enable a seamless transition between development on your laptop and execution on the cloud. The documentation includes:

running the experiment in a background session

submitting trials to an existing experiment

visualizing all results of a distributed experiment in TensorBoard.

Tune is designed to scale experiment execution and hyperparameter search with ease. If you have any comments or suggestions or are interested in contributing to Tune, you can reach out to me or the ray-dev mailing list.