Nested Class Summary

Visibility for this algo: always visible; beta (always visible, but
with a note in the UI); or experimental (hidden by default, and visible
in the UI only if the user gives an "experimental" flag at startup).
Test-only builders are "experimental".
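
As a rough illustration of the three visibility levels, the sketch below gates what a UI would list. The names `VisibilitySketch`, `Visibility`, `isListed`, and `showsNote` are hypothetical stand-ins chosen for this example and are not taken from the H2O API.

```java
// Hypothetical stand-in for the visibility levels described above; not the H2O enum.
final class VisibilitySketch {
    enum Visibility { STABLE, BETA, EXPERIMENTAL }

    // Stable and beta algos are always listed; experimental ones are listed
    // only when the (assumed) "experimental" startup flag was given.
    static boolean isListed(Visibility v, boolean experimentalFlag) {
        return v != Visibility.EXPERIMENTAL || experimentalFlag;
    }

    // Only beta algos carry a note in the UI.
    static boolean showsNote(Visibility v) {
        return v == Visibility.BETA;
    }
}
```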

Can be an ERROR, meaning the parameters can't be used as-is,
a TRACE, which means the specified field should be hidden given
the values of other fields, or a WARN or INFO for informative
messages to the user.

How many models should be trained in parallel during N-fold cross-validation?
Train all CV models in parallel when parallelism is enabled; otherwise train one at a time.
Each model can override this logic, based on parameters, dataset size, etc.

Method Detail

getToEigenVec

shouldReorder

get

Block till completion, and return the built model from the DKV. Note the
funny assert: the Job does NOT have to be controlling this model build,
but might, e.g. be controlling a Grid search for which this is just one
of many results. Calling 'get' means that we are blocking on the Job
which is controlling ONLY this ModelBuilder, and when the Job completes
we can return the built Model.

setTrain

valid

Validation frame: derived from the parameter's validation frame, excluding
all ignored columns, all constant and bad columns, perhaps flipping the
response column to a Categorical, etc. Is null if no validation key is set.

response

vresponse

trainModelOnH2ONode

Start model training using this ModelBuilder as a template. The MB is used directly
if the method was invoked on a regular H2O node. If the method was called on a client node, the model builder
is used as a template only, and the actual instance used for training is re-created on a remote H2O node.
Warning: the nature of this method prohibits further use of this instance of the model builder after the method
is called.
This is intended to reduce training time in client-mode setups: it pushes all computation to a regular H2O node
and avoids exchanging data between the client and the H2O cluster. This also lowers the requirements on the H2O client node.

Returns:

model job

trainModel

trainModelNested

fr - Input frame override, ignored if null.
In some cases, algos do not work directly with the original frame in the K/V store.
Instead they run on a private anonymous copy (e.g., a rebalanced dataset).
Use this argument if you want the nested job to work on the actual working copy rather than the original Frame in the K/V store.
Example: The outer job rebalances the dataset and then calls the nested job. To avoid a needless second rebalance, pass in the (already rebalanced) working copy.
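
The override rule can be sketched as follows, with frames modelled as plain strings and a counter showing how often a rebalance happens. Everything here (`NestedFrameSketch`, `rebalance`, `trainNested`, the string labels) is a hypothetical illustration under those assumptions, not the H2O implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: frames are plain strings, and "rebalancing" just tags the name.
final class NestedFrameSketch {
    static final AtomicInteger rebalanceCount = new AtomicInteger();

    // Stand-in for an expensive rebalance that produces a private working copy.
    static String rebalance(String frame) {
        rebalanceCount.incrementAndGet();
        return frame + "#rebalanced";
    }

    // Nested job: use the override when it is non-null; otherwise fall back to
    // the original frame and build a working copy from it.
    static String trainNested(String originalFrame, String frOverride) {
        String working = (frOverride != null) ? frOverride : rebalance(originalFrame);
        return "model trained on " + working;
    }
}
```

Passing the outer job's working copy as the override means the nested call performs no rebalance of its own; passing null makes the nested call pay for one.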

trainModelImpl

nModelsInParallel

@Deprecated
protected int nModelsInParallel()

Deprecated.

nModelsInParallel

protected int nModelsInParallel(int folds)

How many models should be trained in parallel during N-fold cross-validation?
Train all CV models in parallel when parallelism is enabled; otherwise train one at a time.
Each model can override this logic, based on parameters, dataset size, etc.
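
A minimal sketch of this default and of a model-specific override follows. The classes `BaseBuilder` and `LargeDataBuilder`, the `parallelCv` flag, and the one-million-row threshold are assumptions made for the example; only the method name and its `int folds` parameter come from the signature above.

```java
// Hypothetical sketch of the default rule and a subclass override; not H2O code.
final class CvParallelismSketch {
    static class BaseBuilder {
        protected final boolean parallelCv; // assumed switch for CV parallelism

        BaseBuilder(boolean parallelCv) {
            this.parallelCv = parallelCv;
        }

        // Default: all folds at once when parallelism is enabled, else one at a time.
        protected int nModelsInParallel(int folds) {
            return parallelCv ? folds : 1;
        }
    }

    static class LargeDataBuilder extends BaseBuilder {
        private final long rows;

        LargeDataBuilder(boolean parallelCv, long rows) {
            super(parallelCv);
            this.rows = rows;
        }

        // Override based on dataset size: the threshold is an arbitrary example value.
        @Override
        protected int nModelsInParallel(int folds) {
            if (rows > 1_000_000L) return 1;
            return super.nModelsInParallel(folds);
        }
    }
}
```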

cv_computeAndSetOptimalParameters

Override for model-specific checks / modifications to _parms for the main model during N-fold cross-validation.
Also allow the cv models to be modified after all of them have been built.
For example, the model might need to be told to not do early stopping. CV models might have their lambda value modified, etc.

nFoldCV

public boolean nFoldCV()

Returns:

Whether n-fold cross-validation is done

can_build

public abstract hex.ModelCategory[] can_build()

List containing the categories of models that this builder can
build. Each ModelBuilder must have one of these.

ignoreInvalidColumns

Ignore invalid columns (columns that have a very high max value, which can cause issues in DHistogram)

Parameters:

npredictors -

expensive -

checkMemoryFootPrint

protected void checkMemoryFootPrint()

Makes sure the final model will fit in memory.
Note: This method should not be overridden (override checkMemoryFootPrint_impl instead). It is
not declared 'final' so as not to break 3rd party implementations. It might be declared final in the future
if necessary.

checkMemoryFootPrint_impl

protected void checkMemoryFootPrint_impl()

Override this method to call error() if the model is not expected to fit in memory, and say why.
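
The split between checkMemoryFootPrint and checkMemoryFootPrint_impl resembles a template-method arrangement, which might look roughly like the sketch below. The `Builder` and `BigModelBuilder` classes, the two-argument `error(field, message)` signature, the byte estimates, and the field name are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the wrapper/impl split; not the H2O classes.
final class MemoryCheckSketch {
    static class Builder {
        final List<String> errors = new ArrayList<>();

        // Assumed signature: record an ERROR against a parameter field.
        void error(String field, String message) {
            errors.add(field + ": " + message);
        }

        // Wrapper that callers invoke; subclasses are expected to leave it alone.
        protected void checkMemoryFootPrint() {
            checkMemoryFootPrint_impl();
        }

        // Extension point; the default performs no check.
        protected void checkMemoryFootPrint_impl() { }
    }

    static class BigModelBuilder extends Builder {
        final long estimatedBytes;
        final long availableBytes;

        BigModelBuilder(long estimatedBytes, long availableBytes) {
            this.estimatedBytes = estimatedBytes;
            this.availableBytes = availableBytes;
        }

        // Report an error, with the reason, when the estimate exceeds what is available.
        @Override
        protected void checkMemoryFootPrint_impl() {
            if (estimatedBytes > availableBytes) {
                error("_hypothetical_param", "model needs about " + estimatedBytes
                        + " bytes but only " + availableBytes + " are available");
            }
        }
    }
}
```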

clearValidationErrors

message

validationErrors

public java.lang.String validationErrors()

Get a string representation of only the ERROR ValidationMessages (e.g., to use in an exception throw).
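
A small sketch of filtering messages down to the ERROR level is shown below. The `Level` enum mirrors the four levels named earlier on this page, while the `Message` class, its fields, and the output format are hypothetical and may differ from the real ValidationMessage.

```java
import java.util.List;

// Hypothetical sketch: keep only ERROR-level messages when building the string.
final class ValidationSketch {
    enum Level { TRACE, INFO, WARN, ERROR }

    static final class Message {
        final Level level;
        final String field;
        final String text;

        Message(Level level, String field, String text) {
            this.level = level;
            this.field = field;
            this.text = text;
        }

        @Override
        public String toString() {
            return level + " on field " + field + ": " + text;
        }
    }

    // Concatenate the ERROR messages, one per line; other levels are skipped.
    static String validationErrors(List<Message> messages) {
        StringBuilder sb = new StringBuilder();
        for (Message m : messages) {
            if (m.level == Level.ERROR) sb.append(m).append('\n');
        }
        return sb.toString();
    }
}
```

The resulting string is suitable for an exception message, for instance `throw new IllegalArgumentException(validationErrors(msgs))` when it is non-empty.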

init

public void init(boolean expensive)

Initialize the ModelBuilder, validating all arguments and preparing the
training frame. This call is expected to be overridden in the subclasses,
and each subclass will start with "super.init(expensive);". This call is made by
the front-end whenever the GUI is clicked, and needs to be fast whenever
expensive is false; it will be called once again at the start of
model building (trainModel()) with expensive set to true.
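
The calling pattern can be sketched as below: cheap validation runs on every call, costly preparation only when expensive is true. The classes `BaseBuilder` and `MyBuilder` and their counters are hypothetical, and the sketch assumes the superclass call passes the flag through.

```java
// Hypothetical sketch of the init(expensive) override pattern; not the H2O classes.
final class InitSketch {
    static class BaseBuilder {
        int cheapChecks;     // how often argument validation ran
        int expensiveSteps;  // how often the costly preparation ran

        public void init(boolean expensive) {
            cheapChecks++;                   // always validate arguments
            if (expensive) expensiveSteps++; // e.g. prepare the training frame
        }
    }

    static class MyBuilder extends BaseBuilder {
        int subclassChecks;

        @Override
        public void init(boolean expensive) {
            super.init(expensive); // superclass work comes first
            subclassChecks++;      // then algo-specific validation
        }

        // Model building re-runs init with expensive set to true.
        void trainModel() {
            init(true);
        }
    }
}
```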

init_adaptFrameToTrain

Adapts a given frame to the same schema as the training frame.
This includes encoding of categorical variables (if expensive is enabled).
Note: This method should only be used during ModelBuilder initialization; it should be called in the init(..) method.

rebalance

local - Whether to only create enough chunks to max out all cores on one node only
WARNING: This behavior is not actually implemented by the methods defined in this class; the default logic
does not take this parameter into consideration.