Formulate the RL problem—First, formulate the business problem as an
RL problem. For example, auto scaling enables services to dynamically
increase or decrease capacity depending on conditions that you define.
Currently, this requires setting up alarms, scaling policies, thresholds,
and other manual steps. To solve this with RL, we define the components
of the Markov Decision Process:

Objective—Scale instance capacity so that it matches the
desired load profile.

Environment—A custom environment that includes the load
profile. It generates a simulated load with daily and weekly variations
and occasional spikes. The simulated system has a delay between when new
resources are requested and when they become available for serving
requests.

State—The current load, the number of failed jobs, and the number of
active machines.

Action—Remove, add, or keep the same number of
instances.

Reward—A positive reward for successful transactions, and a high
penalty for failed transactions beyond a specified threshold.
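As a concrete sketch, the reward described above might look like the following. The function name, the +1 per successful transaction, the penalty of -100, and the default threshold are illustrative assumptions, not values from any real preset.

```python
def reward(successful, failed, failure_threshold=10):
    """Illustrative reward: +1 per successful transaction, plus a large
    penalty once failed transactions exceed the allowed threshold."""
    r = float(successful)
    if failed > failure_threshold:
        r -= 100.0  # high penalty for failing beyond the threshold
    return r
```

Keeping the penalty much larger than the per-transaction reward pushes the agent to provision ahead of load spikes rather than maximize raw throughput.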

Define the RL environment—The RL environment can be the real world in
which the RL agent interacts, or a simulation of the real world. You can
connect open-source and custom environments developed using Gym
interfaces, as well as commercial simulation environments such as MATLAB
and Simulink.
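A custom environment that follows the Gym interface needs little more than reset() and step() methods. Below is a minimal, self-contained sketch of the simulated load environment described above; the load model (daily variation only, with random spikes), the provisioning delay, the capacity per machine, and the reward shaping are all illustrative assumptions, and weekly variation is omitted for brevity.

```python
import math
import random

class AutoscaleEnv:
    """Gym-style environment: simulated load with daily variation, random
    spikes, and a delay before newly requested machines come online."""

    CAPACITY_PER_MACHINE = 100  # illustrative: requests one machine serves per step
    PROVISION_DELAY = 3         # steps between requesting a machine and it serving

    def reset(self):
        self.t = 0
        self.active = 5          # machines currently serving
        self.failed = 0          # failed requests in the last step
        self.pending = []        # steps at which requested machines become ready
        self.load = self._load()
        return self._state()

    def _load(self):
        base = 500 + 300 * math.sin(2 * math.pi * self.t / 24)  # daily cycle
        spike = 400 if random.random() < 0.05 else 0            # occasional spike
        return base + spike

    def step(self, action):
        # Bring online any machines whose provisioning delay has elapsed.
        self.active += sum(1 for ready_at in self.pending if ready_at <= self.t)
        self.pending = [r for r in self.pending if r > self.t]
        # action: 0 = remove a machine, 1 = keep, 2 = add (subject to the delay)
        if action == 0 and self.active > 1:
            self.active -= 1
        elif action == 2:
            self.pending.append(self.t + self.PROVISION_DELAY)

        served = min(self.load, self.active * self.CAPACITY_PER_MACHINE)
        self.failed = self.load - served
        # Positive reward for served requests, a penalty when requests fail.
        reward = served / 100.0 - (10.0 if self.failed > 0 else 0.0)
        self.t += 1
        self.load = self._load()
        done = self.t >= 240     # one episode = 240 simulated hours
        return self._state(), reward, done, {}

    def _state(self):
        # Current load, number of failed requests, number of active machines.
        return (self.load, self.failed, self.active)
```

A class shaped like this can be registered with Gym or passed directly to any toolkit that accepts Gym-style environments.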

Define the presets—The presets configure the RL training jobs and
define the hyperparameters for the RL algorithms.
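What a preset contains depends on the RL toolkit you choose; as a rough sketch, it is a collection of algorithm hyperparameters such as the following. The names and values below are illustrative, not defaults from any particular toolkit.

```python
# Illustrative preset: hyperparameters the training job reads at startup.
PRESET = {
    "algorithm": "clipped_ppo",   # RL algorithm to run
    "learning_rate": 3e-4,
    "discount_factor": 0.99,      # gamma
    "batch_size": 64,
    "training_steps": 100_000,
    "evaluation_episodes": 5,
}
```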

Write the training code—Write the training code as a Python script, and
pass the script to an Amazon SageMaker training job. In your training
code, import the environment files and the preset files, and then define
the main() function.
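The training script's structure might look like the following sketch. The module names in the comments stand in for your environment and preset files, ToyEnv is a stub so the sketch runs on its own, and the random-action loop is only a placeholder for the RL toolkit's actual training launcher.

```python
import random

# In a real job these imports come from files in the uploaded source
# directory, for example: `from autoscale_env import AutoscaleEnv` and
# `from preset import PRESET` (both names are illustrative).

class ToyEnv:
    """Tiny stand-in for the autoscaling environment; ends after 10 steps."""
    def reset(self):
        self.t = 0
        return (0.0, 0, 1)

    def step(self, action):
        self.t += 1
        return (0.0, 0, 1), 1.0, self.t >= 10, {}

def main(env, preset, episodes=2):
    """Placeholder training loop. A real script hands env + preset to the
    RL toolkit's launcher instead of choosing actions at random."""
    returns = []
    for _ in range(episodes):
        env.reset()
        done, total = False, 0.0
        while not done:
            action = random.choice([0, 1, 2])  # remove / keep / add
            _, reward, done, _ = env.step(action)
            total += reward
        returns.append(total)
        # Metric definitions can scrape lines like this from the job's logs:
        print(f"episode_reward_mean: {sum(returns) / len(returns):.2f}")
    return returns

if __name__ == "__main__":
    main(ToyEnv(), preset={})
```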

Train the RL model—Use the Amazon SageMaker RLEstimator in the
Amazon SageMaker Python SDK to start an RL training job. If you are
using local mode, the training job runs on the notebook
instance. When you use Amazon SageMaker for training, you can select GPU or CPU instances.
Store the output from the training job in a local directory if you train in
local mode, or on Amazon S3 if you use Amazon SageMaker
training.

The RLEstimator requires the following information as parameters.

The source directory where the environment, presets, and training code
are
uploaded.

The path to the training script.

The RL toolkit and deep learning framework you want to use. This
automatically resolves to the Amazon ECR path for the RL container.

The training parameters, such as the instance count, job name, and S3
path for output.

Metric definitions that you want to capture in your logs. These can
also be visualized in CloudWatch and in Amazon SageMaker notebooks.
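Putting those parameters together, an RLEstimator call looks roughly like the following. The entry point, source directory, bucket, regex, toolkit version, and instance type are illustrative; parameter names follow version 2 of the SageMaker Python SDK, and constructing the estimator requires the sagemaker package and an AWS role.

```python
import re

# Metric definitions: regular expressions that SageMaker applies to the
# job's log output (the log line format here is illustrative).
metric_definitions = [
    {"Name": "episode_reward_mean",
     "Regex": r"episode_reward_mean: ([-0-9.]+)"},
]

def make_estimator(role, bucket):
    """Construct the RLEstimator; requires the sagemaker SDK and an AWS role.

    Parameter names follow SageMaker Python SDK v2 (older versions used
    train_instance_type / train_instance_count instead)."""
    from sagemaker.rl import RLEstimator, RLFramework, RLToolkit
    return RLEstimator(
        entry_point="train-autoscale.py",   # path to the training script
        source_dir="src",                   # environment, presets, training code
        toolkit=RLToolkit.COACH,            # toolkit + framework resolve the
        toolkit_version="0.11.0",           # Amazon ECR container path
        framework=RLFramework.TENSORFLOW,
        role=role,
        instance_type="ml.m5.xlarge",       # CPU; choose a GPU type such as
        instance_count=1,                   # ml.p3.2xlarge if needed
        base_job_name="autoscale-rl",
        output_path=f"s3://{bucket}/autoscale-output",
        metric_definitions=metric_definitions,
    )

# make_estimator(role, bucket).fit() then starts the training job.
```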

Visualize training metrics and output—After a training job that uses an
RL model completes, you can view the metrics that you defined for the
training job in CloudWatch. You can also plot the metrics in a notebook by
using the Amazon SageMaker Python SDK analytics library. Visualizing
metrics helps you understand how the performance of the model, as
measured by the reward, improves over time.

Note

If you train in local mode, you can't visualize metrics in CloudWatch.
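For example, with the SDK's analytics library the metric time series can be pulled into a DataFrame for plotting. The job name and metric name below are placeholders; TrainingJobAnalytics needs a completed SageMaker training job, while the smoothing helper is purely local.

```python
def training_curve(job_name, metric="episode_reward_mean"):
    """Fetch a metric time series for a completed training job as a
    DataFrame with timestamp, metric_name, and value columns."""
    from sagemaker.analytics import TrainingJobAnalytics
    return TrainingJobAnalytics(training_job_name=job_name,
                                metric_names=[metric]).dataframe()

def smooth(values, k=5):
    """Rolling mean, handy for plotting a noisy episode-reward curve."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - k + 1): i + 1]
        out.append(sum(window) / len(window))
    return out
```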

Evaluate the model—Checkpointed data from the previously trained models
can be passed for evaluation and inference through the checkpoint
channel. In local mode, use a local directory. In Amazon SageMaker
training mode, you need to upload the data to Amazon S3 first.
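One way to express that rule in code is sketched below. The channel name matches the checkpoint channel described above, but the helper itself and its default directory are illustrative.

```python
def checkpoint_channel(mode, local_dir="checkpoints", s3_uri=None):
    """Build the inputs dict passed to estimator.fit() for evaluation.

    In local mode the checkpoint channel can point at a local directory;
    in SageMaker training mode the checkpointed data must already be in S3.
    """
    if mode == "local":
        return {"checkpoint": f"file://{local_dir}"}
    if not s3_uri:
        raise ValueError("upload checkpoint data to Amazon S3 first")
    return {"checkpoint": s3_uri}

# e.g. estimator.fit(inputs=checkpoint_channel("local"))
```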

Deploy RL models—Finally, deploy the trained model on an endpoint hosted
on Amazon SageMaker, or on an edge device by using AWS IoT Greengrass.
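Deploying to a SageMaker endpoint is sketched below; the instance type and the action encoding are illustrative assumptions, and deploy() requires a trained estimator and AWS credentials.

```python
ACTIONS = {0: "remove", 1: "keep", 2: "add"}  # illustrative action encoding

def deploy_endpoint(estimator):
    """Host the trained model on a SageMaker endpoint."""
    return estimator.deploy(initial_instance_count=1,
                            instance_type="ml.m5.large")

def to_scaling_decision(action_index):
    """Map the endpoint's predicted action index to a scaling decision."""
    return ACTIONS[action_index]
```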