
IP Insights Hyperparameters

In the CreateTrainingJob
request, you specify the training algorithm. You can also specify algorithm-specific
hyperparameters as string-to-string maps. The following table lists the hyperparameters
for the Amazon SageMaker IP Insights algorithm.
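As a minimal sketch of what a string-to-string hyperparameter map looks like (the specific values below are illustrative assumptions, not tuned recommendations), note that every value, including numeric ones, must be serialized as a string:

```python
# Hypothetical hyperparameter map for an IP Insights training job.
# The API requires string-to-string pairs, so numbers are quoted.
hyperparameters = {
    "num_entity_vectors": "20000",   # illustrative value
    "vector_dim": "128",             # the documented recommended size
    "epochs": "10",                  # default
    "learning_rate": "0.001",        # default
    "mini_batch_size": "10000",      # default
}

# Confirm every key and value is a string, as the API expects.
assert all(isinstance(k, str) and isinstance(v, str)
           for k, v in hyperparameters.items())
```

The same map can be passed to a SageMaker estimator or included directly in a low-level CreateTrainingJob request.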

Parameter Name

Description

num_entity_vectors

The number of entity vector representations (entity embedding
vectors) to train. Each entity in the training set is randomly
assigned to one of these vectors using a hash function. Because of
hash collisions, it might be possible to have multiple entities
assigned to the same vector. This would cause the same vector to
represent multiple entities. This generally has a negligible effect
on model performance, as long as the collision rate is not too
severe. To keep the collision rate low, set this value as high as
possible. However, the model size, and, therefore, the memory
requirement, for both training and inference, scales linearly with
this hyperparameter. We recommend that you set this value to twice
the number of unique entity identifiers.

Required

Valid values: 1 ≤ positive integer ≤ 250,000,000
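The sizing rule above can be sketched as a small helper (the function name is hypothetical; the doubling rule and the upper bound come from this description):

```python
def recommended_num_entity_vectors(unique_entities: int) -> int:
    """Twice the number of unique entity identifiers, clamped to the
    documented valid range [1, 250_000_000]."""
    return max(1, min(2 * unique_entities, 250_000_000))

# Example: a training set with three distinct account identifiers.
accounts = ["user_a", "user_b", "user_a", "user_c"]
print(recommended_num_entity_vectors(len(set(accounts))))  # 6
```

Clamping at the top of the range matters for very large datasets, where doubling the entity count would exceed the maximum the algorithm accepts.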

vector_dim

The size of embedding vectors to represent entities and IP
addresses. The larger the value, the more information that can be
encoded using these representations. In practice, model size scales
linearly with this parameter and limits how large the dimension can
be. In addition, using vector representations that are too large can
cause the model to overfit, especially for small training datasets.
Overfitting occurs when, rather than learning underlying patterns
in the data, a model effectively memorizes the training data and,
therefore, cannot generalize well and performs poorly during
inference. The recommended value is 128.

Required

Valid values: 4 ≤ positive integer ≤ 4096

batch_metrics_publish_interval

The interval, in batches, at which the Apache MXNet Speedometer
function prints the training speed of the network
(samples/second).

Optional

Valid values: positive integer ≥ 1

Default value: 1,000

epochs

The number of passes over the training data. The optimal value
depends on your data size and learning rate. Typical values range
from 5 to 100.

Optional

Valid values: positive integer ≥ 1

Default value: 10

learning_rate

The learning rate for the optimizer. IP Insights uses a
gradient-descent-based Adam optimizer. The learning rate effectively
controls the step size used to update model parameters at each iteration.
Too large a learning rate can cause the model to diverge because the
training is likely to overshoot a minimum. On the other hand, too
small a learning rate slows down convergence. Typical values range
from 1e-4 to 1e-1.

Optional

Valid values: 1e-6 ≤ float ≤ 10.0

Default value: 0.001

mini_batch_size

The number of examples in each mini batch. The training
procedure processes data in mini batches. The optimal value depends
on the number of unique account identifiers in the dataset. In
general, the larger the mini_batch_size, the faster the
training and the greater the number of possible
shuffled-negative-sample combinations. However, with a large
mini_batch_size, the training is more likely to
converge to a poor local minimum and perform worse at
inference.

Optional

Valid values: 1 ≤ positive integer ≤ 500,000

Default value: 10,000

num_ip_encoder_layers

The number of fully connected layers used to encode the IP
address embedding. The larger the number of layers, the greater the
model's capacity to capture patterns among IP addresses. However,
using a large number of layers increases the chance of
overfitting.

Optional

Valid values: 0 ≤ positive integer ≤ 100

Default value: 1

random_negative_sampling_rate

The number of random negative samples, R, to generate per input
example. The training procedure relies on negative samples to
prevent the model's vector representations from collapsing to a
single point. Random negative sampling generates R random IP
addresses for each input account in the mini batch. The sum of the
random_negative_sampling_rate (R) and
shuffled_negative_sampling_rate (S) must be in the
interval: 1 ≤ R + S ≤ 500.

Optional

Valid values: 0 ≤ positive integer ≤ 500

Default value: 1

shuffled_negative_sampling_rate

The number of shuffled negative samples, S, to generate per input
example. In some cases, it helps to use more
realistic negative samples that are randomly picked from the
training data itself. This kind of negative sampling is achieved by
shuffling the data within a mini batch. Shuffled negative sampling
generates S negative IP addresses by shuffling the IP address and
account pairings within a mini batch. The sum of the
random_negative_sampling_rate (R) and
shuffled_negative_sampling_rate (S) must be in the
interval: 1 ≤ R + S ≤ 500.

Optional

Valid values: 0 ≤ positive integer ≤ 500

Default value: 1
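The joint constraint on the two sampling rates can be checked before submitting a training job; a minimal sketch, with a hypothetical helper name, based on the bounds documented above:

```python
def validate_negative_sampling(r: int, s: int) -> None:
    """Check the documented bounds: each rate in [0, 500]
    and 1 <= R + S <= 500."""
    if not (0 <= r <= 500 and 0 <= s <= 500):
        raise ValueError("each sampling rate must be in [0, 500]")
    if not (1 <= r + s <= 500):
        raise ValueError("R + S must be in [1, 500]")

# The defaults (R=1, S=1) satisfy the constraint.
validate_negative_sampling(1, 1)
```

Note that setting both rates to 0 is invalid even though 0 is within each rate's individual range, because the sum must be at least 1.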

weight_decay

The weight decay coefficient. This parameter adds an L2
regularization factor that is required to prevent the model from
overfitting the training data.

Optional

Valid values: 0.0 ≤ float ≤ 10.0

Default value: 0.00001
