_sparse

_use_all_factor_levels

_missing_values_handling

_standardize

public boolean _standardize

If enabled, automatically standardize the data. If disabled, the user must provide properly scaled input data.

_epochs

public double _epochs

The number of passes over the training dataset to be carried out.
It is recommended to start with lower values for initial experiments.
This value can be modified during checkpoint restarts, allowing training
of a previously built model to be continued.
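
For a quick illustration, a first experiment might set a small epoch count and raise it on a checkpoint restart. A minimal sketch in Java, assuming the fields in this section live on a parameters class named DeepLearningParameters (the class itself is not named in this section):

```java
// Sketch: start small, then continue training from a checkpoint.
DeepLearningParameters p = new DeepLearningParameters();
p._epochs = 1.0;   // single pass for an initial experiment
// ... train, inspect results, then restart from a checkpoint:
p._epochs = 10.0;  // continue the same model with more passes
```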

_activation

The activation function (non-linearity) to be used by the neurons in the hidden layers.
Tanh: Hyperbolic tangent function (equivalent to a scaled and shifted sigmoid).
Rectifier: Outputs the maximum of (0, x), where x is the input value.
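
In code, the two non-linearities reduce to simple scalar functions; a minimal sketch (illustrative only, not H2O's internal implementation):

```java
// The two activation functions described above, as plain scalar functions.
final class Activations {
    static double tanh(double x) {
        return Math.tanh(x);      // hyperbolic tangent, output in (-1, 1)
    }
    static double rectifier(double x) {
        return Math.max(0.0, x);  // max(0, x): passes positive inputs, zeroes negatives
    }
}
```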

_hidden

public int[] _hidden

The number and size of each hidden layer in the model.
For example, if a user specifies "100,200,100", a model with 3 hidden
layers will be produced, and the middle hidden layer will have 200
neurons.
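
Expressed against the field declared above, the "100,200,100" example becomes (parameters object as in the earlier sketch):

```java
// Sketch: 3 hidden layers; the middle layer has 200 neurons.
DeepLearningParameters p = new DeepLearningParameters();
p._hidden = new int[]{100, 200, 100};
```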

_input_dropout_ratio

public double _input_dropout_ratio

A fraction of the features for each training row to be omitted from training in order
to improve generalization (dimension sampling).

_hidden_dropout_ratios

public double[] _hidden_dropout_ratios

A fraction of the inputs for each hidden layer to be omitted from training in order
to improve generalization. Defaults to 0.5 for each hidden layer if omitted.
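
A sketch combining both dropout parameters (values here are illustrative; note that _hidden_dropout_ratios needs one entry per hidden layer):

```java
// Sketch: 10% input dropout, and the 0.5 default written out explicitly
// for each of three hidden layers.
DeepLearningParameters p = new DeepLearningParameters();
p._hidden = new int[]{100, 200, 100};
p._input_dropout_ratio = 0.1;                            // drop 10% of input features per row
p._hidden_dropout_ratios = new double[]{0.5, 0.5, 0.5};  // one ratio per hidden layer
```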

_train_samples_per_iteration

public long _train_samples_per_iteration

The number of training data rows to be processed per iteration. Note that
independent of this parameter, each row is used immediately to update the model
with (online) stochastic gradient descent. This parameter controls the
synchronization period between nodes in a distributed environment and the
frequency at which scoring and model cancellation can happen. For example, if
it is set to 10,000 on H2O running on 4 nodes, then each node will
process 2,500 rows per iteration, sampling randomly from its local data.
Then, model averaging between the nodes takes place, and scoring can happen
(dependent on scoring interval and duty factor). Special values are 0 for
one epoch per iteration and -1 for processing the maximum amount of data
per iteration (if **replicate training data** is enabled, N epochs
will be trained per iteration on N nodes; otherwise, one epoch). The special
value -2 turns on automatic mode (auto-tuning).
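
The 4-node example and the special values, restated as a sketch:

```java
// Sketch: 10,000 samples per iteration on 4 nodes means each node
// processes 10,000 / 4 = 2,500 locally sampled rows before the
// per-node models are averaged.
DeepLearningParameters p = new DeepLearningParameters();
p._train_samples_per_iteration = 10_000L;
// Special values, per the description above:
//   0 -> one epoch per iteration
//  -1 -> maximum amount of data per iteration
//  -2 -> automatic mode (auto-tuning)
```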

_target_ratio_comm_to_comp

public double _target_ratio_comm_to_comp

The target ratio of communication overhead to computation time. Only applies
to multi-node operation with the number of training samples per iteration set
to -2 (auto-tuning).

_learning_rate

public double _learning_rate

When adaptive learning rate is disabled, the magnitude of the weight
updates is determined by the user-specified learning rate
(potentially annealed) and is a function of the difference
between the predicted value and the target value. That difference,
generally called delta, is only available at the output layer. To
correct the output at each hidden layer, backpropagation is
used. Momentum modifies backpropagation by allowing prior
iterations to influence the current update. Using the momentum
parameter can aid in avoiding local minima and the associated
instability. However, too much momentum can itself lead to instability,
which is why momentum is best ramped up slowly.
This parameter is only active if adaptive learning rate is disabled.
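
The update rule being described is classical momentum SGD. A textbook sketch of a single weight update, consistent with the description but not taken from H2O's internals:

```java
// One momentum SGD weight update: the velocity term carries the
// influence of prior iterations, decayed by the momentum coefficient.
final class MomentumSgd {
    private double velocity = 0.0;

    double update(double weight, double gradient, double rate, double momentum) {
        velocity = momentum * velocity - rate * gradient;
        return weight + velocity;
    }
}
```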

_learning_rate_annealing

public double _learning_rate_annealing

Learning rate annealing reduces the learning rate to "freeze" into
local minima in the optimization landscape. The annealing rate is the
inverse of the number of training samples it takes to cut the learning rate in half
(e.g., 1e-6 means that it takes 1e6 training samples to halve the learning rate).
This parameter is only active if adaptive learning rate is disabled.
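
A schedule consistent with this description divides the initial rate by (1 + annealing * samples), so the rate is halved when annealing * samples = 1. The exact schedule form is an assumption based on the text; a sketch:

```java
// Sketch: annealed learning rate after `samples` training samples.
// With annealing = 1e-6, the rate halves after 1e6 samples.
final class Annealing {
    static double annealedRate(double rate0, double annealing, long samples) {
        return rate0 / (1.0 + annealing * samples);
    }
}
```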

_momentum_start

public double _momentum_start

The momentum_start parameter controls the amount of momentum at the beginning of training.
This parameter is only active if adaptive learning rate is disabled.

_momentum_ramp

public double _momentum_ramp

The momentum_ramp parameter controls the amount of training for which momentum increases
(assuming momentum_stable is larger than momentum_start). The ramp is measured in the number
of training samples.
This parameter is only active if adaptive learning rate is disabled.

_momentum_stable

public double _momentum_stable

The momentum_stable parameter controls the final momentum value reached after momentum_ramp training samples.
Momentum stays at this value for the remainder of training.
This parameter is only active if adaptive learning rate is disabled.
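
Taken together, the three momentum parameters describe a ramp from momentum_start to momentum_stable over momentum_ramp training samples. A sketch of the resulting schedule (the linear interpolation during the ramp is an assumption consistent with the descriptions):

```java
// Sketch: momentum as a function of training samples seen so far.
final class MomentumSchedule {
    static double momentum(long samples, double start, double stable, double ramp) {
        if (samples >= ramp) return stable;                   // ramp finished: hold the stable value
        return start + (stable - start) * (samples / ramp);   // linear ramp-up
    }
}
```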

_score_interval

public double _score_interval

The minimum time (in seconds) to elapse between model scoring. The actual
interval is determined by the number of training samples per iteration and the scoring duty cycle.

_score_training_samples

public long _score_training_samples

The number of training dataset points to be used for scoring. Will be
randomly sampled. Use 0 to select the entire training dataset.

_score_validation_samples

public long _score_validation_samples

The number of validation dataset points to be used for scoring. Can be
randomly sampled or stratified (if "balance classes" is set and "score
validation sampling" is set to stratify). Use 0 to select the entire
validation dataset.

_score_duty_cycle

public double _score_duty_cycle

Maximum fraction of wall clock time spent on model scoring on training and validation samples,
and on diagnostics such as computation of feature importances (i.e., not on training).

_quiet_mode

public boolean _quiet_mode

Enable quiet mode for less output to standard output.

_replicate_training_data

public boolean _replicate_training_data

Replicate the entire training dataset onto every node for faster training on small datasets.

_single_node_mode

public boolean _single_node_mode

Run on a single node for fine-tuning of model parameters. Can be useful for
checkpoint resumes after training on multiple nodes for fast initial
convergence.

_shuffle_training_data

public boolean _shuffle_training_data

Enable shuffling of training data (on each node). This option is
recommended if training data is replicated on N nodes, and the number of training samples per iteration
is close to N times the dataset size, in which case all nodes train with (almost) all
the data. It is automatically enabled if the number of training samples per iteration is set to -1 (or to N
times the dataset size or larger).
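
A configuration matching the recommendation above might look like this sketch (field names as declared in this section; the parameters class name is assumed):

```java
// Sketch: small dataset on a multi-node cluster.
DeepLearningParameters p = new DeepLearningParameters();
p._replicate_training_data = true;    // full copy of the training data on every node
p._train_samples_per_iteration = -1;  // process the maximum amount of data per iteration
p._shuffle_training_data = true;      // would also be auto-enabled by the -1 above
```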