Using Hyperparameter Tuning

This page shows you how to use Cloud Machine Learning Engine hyperparameter
tuning when training your model. Hyperparameter tuning optimizes a target
variable that you specify. The target variable is called the
hyperparameter metric. When you start a job with hyperparameter tuning, you
establish the name of your hyperparameter metric. This is the name you assign to
the scalar summary that you add to your trainer.

The steps involved in hyperparameter tuning

To use hyperparameter tuning in your training job, you must perform the
following steps:

Specify the hyperparameter tuning configuration for your training job by
including a HyperparameterSpec in your TrainingInput object.

Parse the command-line arguments representing the hyperparameters you
want to tune, and use the values to set the hyperparameters for your
training trial.

Add your hyperparameter metric to the summary for your graph.

Below are more details of each step.

Specify the hyperparameter tuning configuration for your training job

Create a HyperparameterSpec object to hold the hyperparameter tuning
configuration for your training job, and add the HyperparameterSpec as the
hyperparameters object in your TrainingInput object.

In your HyperparameterSpec, set the hyperparameterMetricTag to a value
representing your chosen metric. For example: metric1. If you don't specify
a hyperparameterMetricTag, Cloud ML Engine looks for a metric with the
name training/hptuning/metric.

gcloud

Add your hyperparameter configuration information to your configuration YAML
file. Below is an example. For a working config file, see
hptuning_config.yaml
in the census estimator sample.
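As a sketch of the shape such a file takes, a hyperparameters section in the
configuration YAML might look like the following (the parameter names, ranges,
and scale types here are illustrative assumptions, not taken from the census
sample):

```yaml
trainingInput:
  scaleTier: STANDARD_1
  hyperparameters:
    goal: MAXIMIZE
    hyperparameterMetricTag: metric1
    maxTrials: 30
    maxParallelTrials: 1
    params:
      - parameterName: learning-rate
        type: DOUBLE
        minValue: 0.0001
        maxValue: 0.1
        scaleType: UNIT_LOG_SCALE
      - parameterName: hidden-units
        type: INTEGER
        minValue: 8
        maxValue: 32
        scaleType: UNIT_LINEAR_SCALE
```

Each entry under params names one command-line argument that your trainer must
accept, as described later on this page.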

Python

Make a dictionary representing your HyperparameterSpec and add it to your
training input. The following example assumes that you have already created
a TrainingInput dictionary (in this case named training_inputs) as shown
in the training job configuration
guide.
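As a sketch, such a dictionary might look like the following. The parameter
names, ranges, and scale types are illustrative assumptions, and
training_inputs is re-created here so the example is self-contained:

```python
# Hyperparameter tuning configuration expressed as a plain dictionary,
# mirroring the HyperparameterSpec fields.
hyperparams = {
    'goal': 'MAXIMIZE',
    'hyperparameterMetricTag': 'metric1',
    'maxTrials': 30,
    'maxParallelTrials': 1,
    'params': [
        {'parameterName': 'learning-rate',
         'type': 'DOUBLE',
         'minValue': 0.0001,
         'maxValue': 0.1,
         'scaleType': 'UNIT_LOG_SCALE'},
        {'parameterName': 'hidden-units',
         'type': 'INTEGER',
         'minValue': 8,
         'maxValue': 32,
         'scaleType': 'UNIT_LINEAR_SCALE'},
    ],
}

# Assumed to exist already, per the training job configuration guide.
training_inputs = {'scaleTier': 'STANDARD_1'}

# Attach the spec as the hyperparameters object of the training input.
training_inputs['hyperparameters'] = hyperparams

job_spec = {'jobId': 'my_job_name', 'trainingInput': training_inputs}
```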

Check the code in your training application

In your application, handle the command-line arguments for the hyperparameters
and add your metric to the graph summary.

Handle the command-line arguments for the hyperparameters you want to tune

Cloud ML Engine passes the values of the hyperparameters it is tuning as
command-line arguments when it calls your training application. Make use of
these arguments in your code:

Define a name for each hyperparameter argument and parse it using whatever
argument parser you prefer (typically argparse).
The argument names must match the parameter names that you specified in the
job configuration, as described above.

Assign the values from the command-line arguments to the hyperparameters in your
graph.
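A minimal sketch of both steps using argparse (the argument names and example
values are illustrative; in a real trainer you would call parse_args() on the
actual command line rather than pass a literal list):

```python
import argparse

# Define one argument per hyperparameter. The names must match the
# parameterName values in the job configuration (illustrative here).
parser = argparse.ArgumentParser()
parser.add_argument('--learning-rate', type=float, default=0.01)
parser.add_argument('--hidden-units', type=int, default=16)

# Example values stand in for the arguments Cloud ML Engine would pass.
args, _ = parser.parse_known_args(['--learning-rate', '0.001',
                                   '--hidden-units', '8'])

# Assign the parsed values to the hyperparameters used to build the graph.
learning_rate = args.learning_rate
hidden_units = args.hidden_units
```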

Add your hyperparameter metric to the graph summary

Cloud ML Engine looks for your hyperparameter metric when the graph's
summary writer is called. Note: The canned TensorFlow estimators use the same
metric name for training and evaluation, so you need a separate metric for
hyperparameter tuning to ensure that Cloud ML Engine can determine the
source of the metric.

Your code depends on whether you're using the
TensorFlow Estimator API
or the core TensorFlow APIs. Below are examples for both situations:

Estimator

Use the following code to add your hyperparameter metric to the
summary for your graph. The example assumes that the name of your metric is
metric1:
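One way this can look with the Estimator API in TensorFlow 1.x is sketched
below. The feature column, the choice of RMSE, and the 'predictions' key are
assumptions for illustration; tf.contrib.estimator.add_metrics attaches the
metric so that it is written to the evaluation summary under the name you
chose:

```python
import tensorflow as tf

# Hypothetical feature columns; substitute your own.
feature_columns = [tf.feature_column.numeric_column('x')]

def my_metric(labels, predictions):
    # Define an eval metric named 'metric1' for hyperparameter tuning.
    pred_values = predictions['predictions']
    return {'metric1': tf.metrics.root_mean_squared_error(labels,
                                                          pred_values)}

estimator = tf.estimator.DNNRegressor(
    hidden_units=[16, 8], feature_columns=feature_columns)

# Attach the tuning metric so it appears in the evaluation summary
# under the name 'metric1'.
estimator = tf.contrib.estimator.add_metrics(estimator, my_metric)
```

With the core TensorFlow APIs, the equivalent is writing the value yourself
with tf.summary.scalar('metric1', value) and a summary writer.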

Getting hyperparameter tuning results

When the training runs are complete, you can call
projects.jobs.get to get the
results. The
TrainingOutput
object in the job resource contains the metrics for all runs, with the metrics
for the best-tuned run identified.

You can see the results from each trial in the job description. Find the trial
that yielded the most desirable value for your hyperparameter metric. If the
trial meets your standard for success of the model, you can use the
hyperparameter values shown for that trial in subsequent runs of your model.
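As an illustration of reading the results programmatically, the following
sketch selects the best trial from a TrainingOutput-style dictionary. The
trial values are made up, and the structure is a simplified assumption about
the trials list returned by the API:

```python
# Hypothetical TrainingOutput 'trials' list from projects.jobs.get.
training_output = {
    'trials': [
        {'trialId': '1',
         'hyperparameters': {'hidden-units': '20'},
         'finalMetric': {'trainingStep': '1000', 'objectiveValue': 0.85}},
        {'trialId': '2',
         'hyperparameters': {'hidden-units': '8'},
         'finalMetric': {'trainingStep': '1000', 'objectiveValue': 0.92}},
    ],
}

# For a MAXIMIZE goal, the best trial has the highest objectiveValue.
best = max(training_output['trials'],
           key=lambda t: t['finalMetric']['objectiveValue'])
```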

Sometimes multiple trials give identical results for your
tuning metric. In such a case, you should determine which of the hyperparameter
values are most advantageous by other measures. For example, if you are tuning
the number of nodes in a hidden layer and you get identical results when the
value is set to 8 as when it's set to 20, you should use 8, because
more nodes means more processing and cost for no improvement in your model.

Setting a limit to the number of trials

You should decide how many trials you want to allow the service to run and set
the maxTrials value in the HyperparameterSpec object.

There are two competing interests to consider when deciding how many trials to
allow:

time (and consequently cost)

accuracy

Increasing the number of trials generally yields better results, but this is
not always the case. In most cases there is a point of diminishing returns
after which additional trials have little or no effect on accuracy. It is
often best to start with a small number of trials to gauge the effect your
chosen hyperparameters have on your model's accuracy before starting a job
with a large number of trials.

To get the most out of hyperparameter tuning, you shouldn't set maxTrials
to a value lower than ten times the number of hyperparameters you use.

Running parallel trials

You can specify a number of trials to run in parallel by setting
maxParallelTrials in the HyperparameterSpec object.

Running parallel trials has the benefit of reducing the elapsed (wall-clock)
time the training job takes; the total processing time required is typically
unchanged. However, running in parallel can reduce the
effectiveness of the tuning job overall. That is because hyperparameter tuning
uses the results of previous trials to inform the values to assign to the
hyperparameters of subsequent trials. When running in parallel, some trials
start without having the benefit of the results of any trials still running.

If you use parallel trials, the training service provisions multiple training
processing clusters (or multiple individual machines in the case of a
single-process trainer). The scale tier that you set for your job is used for
each individual training cluster.

Stopping trials early

You can specify that Cloud ML Engine should automatically stop a trial that
has become clearly unpromising. This saves you the cost of continuing a trial
that is unlikely to be useful.

To permit stopping a trial early, set the enableTrialEarlyStopping value
in the HyperparameterSpec to TRUE.
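Taken together, the maxTrials, maxParallelTrials, and enableTrialEarlyStopping
settings described in the preceding sections might appear in the configuration
YAML file like this (values are illustrative):

```yaml
trainingInput:
  hyperparameters:
    goal: MAXIMIZE
    hyperparameterMetricTag: metric1
    maxTrials: 30
    maxParallelTrials: 2
    enableTrialEarlyStopping: True
    # params: define the hyperparameters to tune here
```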

Resuming a completed hyperparameter tuning job

You can continue a completed hyperparameter tuning job, starting from its
partially optimized state. This makes it possible to reuse the knowledge
gained in the previous hyperparameter tuning job.

To resume a hyperparameter tuning job, submit a new hyperparameter tuning job
with the following configuration:

Set the resumePreviousJobId value in the HyperparameterSpec
to the job ID of the previous hyperparameter tuning job.

Specify values for maxTrials and maxParallelTrials.

Cloud ML Engine uses the previous job ID to find and reuse the
same goal, params, and hyperparameterMetricTag values to continue
the hyperparameter tuning job.

Use a consistent hyperparameterMetricTag name and consistent params for
similar jobs, even when the jobs differ in their other configuration values.
This practice makes it possible for Cloud ML Engine to improve its
optimization over time.

The following examples show the use of the resumePreviousJobId configuration:
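As a sketch (the job ID is a placeholder), resuming via the gcloud
configuration YAML file might look like this:

```yaml
trainingInput:
  hyperparameters:
    resumePreviousJobId: [PREVIOUS_HYPERPARAMETER_JOB_ID]
    maxTrials: 30
    maxParallelTrials: 1
```

With the Python dictionary approach, the equivalent is setting the
resumePreviousJobId key of your hyperparameters dictionary to the previous
job's ID before submitting the new job.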