
Use Your Own Inference Code with Batch Transform

This section explains how Amazon SageMaker interacts with a Docker container that
runs your own
inference code for batch transform. Use this information to write inference code and
create a Docker image.

How Amazon SageMaker Runs Your Inference Image

To configure a container to run as an executable, use an ENTRYPOINT
instruction in a Dockerfile. Note the following:

For batch transforms, Amazon SageMaker runs the container as:

docker run image serve

Amazon SageMaker overrides default CMD statements in a container by
specifying the serve argument after the image name. The
serve argument overrides arguments that you provide with
the CMD command in the Dockerfile.

We recommend that you use the exec form of the
ENTRYPOINT instruction:

ENTRYPOINT ["executable", "param1", "param2"]

For example:

ENTRYPOINT ["python", "k_means_inference.py"]
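For context, a minimal Dockerfile built around this exec-form ENTRYPOINT might look like the following sketch; the base image, installed dependency, and script path are illustrative, not requirements:

```dockerfile
FROM python:3.10-slim

# Install inference dependencies (illustrative).
RUN pip install --no-cache-dir flask

# Copy the inference script into the image (hypothetical file name).
COPY k_means_inference.py /opt/program/k_means_inference.py
WORKDIR /opt/program

# Exec-form ENTRYPOINT; SageMaker appends the "serve" argument at run time.
ENTRYPOINT ["python", "k_means_inference.py"]
```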

Amazon SageMaker sets the environment variables specified in your CreateModel and CreateTransformJob requests on your container. Additionally, the following environment variables are populated:

SAGEMAKER_BATCH is always set to true
when the container runs in Batch Transform.

SAGEMAKER_MAX_PAYLOAD_IN_MB is set to the largest payload size that is sent to the container via HTTP.

SAGEMAKER_BATCH_STRATEGY is set to SINGLE_RECORD when the container is sent a single record per call to invocations, and to MULTI_RECORD when the container gets as many records as fit in the payload.

SAGEMAKER_MAX_CONCURRENT_TRANSFORMS is set to the
maximum number of /invocations requests that can be
opened simultaneously.

Note

The last three environment variables come from the API call made by the user. If the user doesn't set values for them, they aren't passed. In that case, either the default values or the values requested by the algorithm (in response to the /execution-parameters request) are used.
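As a sketch, inference code can read these variables at startup; the fallback values below are only illustrative defaults for when a variable isn't passed, not values guaranteed by SageMaker:

```python
import os

def read_batch_transform_config():
    """Read the batch transform environment variables that SageMaker sets.

    The variable names are from the documentation above; the fallbacks are
    illustrative defaults used when a variable isn't passed.
    """
    return {
        "is_batch": os.environ.get("SAGEMAKER_BATCH", "false") == "true",
        "max_payload_in_mb": int(os.environ.get("SAGEMAKER_MAX_PAYLOAD_IN_MB", "6")),
        "batch_strategy": os.environ.get("SAGEMAKER_BATCH_STRATEGY", "MULTI_RECORD"),
        "max_concurrent_transforms": int(
            os.environ.get("SAGEMAKER_MAX_CONCURRENT_TRANSFORMS", "1")
        ),
    }

# Usage: call once at container startup.
config = read_batch_transform_config()
```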

If you plan to use GPU devices for model inferences (by specifying
GPU-based ML compute instances in your CreateTransformJob
request), make sure that your containers are nvidia-docker compatible. Don't
bundle NVIDIA drivers with the image. For more information about
nvidia-docker, see NVIDIA/nvidia-docker.

You can't use the init initializer as your entry point in
Amazon SageMaker containers because it gets confused by the train and serve
arguments.

How Amazon SageMaker Loads Your Model Artifacts

In a CreateModel request, the container definition includes the ModelDataUrl parameter, which
identifies the location in Amazon S3 where model artifacts are stored. When you use
Amazon SageMaker
to run inferences, it uses this information to determine where to copy the
model
artifacts from. It copies the artifacts to the
/opt/ml/model
directory in the Docker container for use by your inference
code.

The ModelDataUrl parameter must point to a tar.gz file. Otherwise,
Amazon SageMaker can't download the file. If you train a model in
Amazon SageMaker,
it saves the artifacts as a single compressed tar file in Amazon S3. If you train
a model
in another framework, you need to store the model artifacts in Amazon S3 as a compressed
tar file. Amazon SageMaker decompresses this tar file and saves it in the
/opt/ml/model directory in the
container
before the batch transform job starts.
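As a sketch of both sides of this contract, inference code can enumerate the extracted artifacts, and a model trained in another framework can be packaged into the single tar.gz that SageMaker expects; the helper names here are hypothetical:

```python
import os
import tarfile

MODEL_DIR = "/opt/ml/model"  # SageMaker extracts model.tar.gz here

def list_model_artifacts(model_dir=MODEL_DIR):
    """Return the file paths SageMaker extracted into model_dir, if any."""
    if not os.path.isdir(model_dir):
        return []
    paths = []
    for root, _dirs, files in os.walk(model_dir):
        for name in files:
            paths.append(os.path.join(root, name))
    return sorted(paths)

def package_model(output_path, artifact_dir):
    """Package artifacts from another framework as a tar.gz for upload to S3."""
    with tarfile.open(output_path, "w:gz") as tar:
        # Place files at the archive root so they land directly in
        # /opt/ml/model after extraction, not under a subdirectory.
        for name in os.listdir(artifact_dir):
            tar.add(os.path.join(artifact_dir, name), arcname=name)
```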

How Containers Serve Requests

Containers must implement a web server that responds to invocations and ping requests on port 8080. For batch transforms, you can optionally have your algorithm implement execution-parameters requests to provide a dynamic runtime configuration to Amazon SageMaker. Amazon SageMaker uses the following endpoints:

ping—Used to periodically check the health of the container. Amazon SageMaker waits for an HTTP 200 status code and an empty body for a successful ping request before sending an invocations request. You might use ping requests as an opportunity to load your model into memory so that it is ready to generate inferences when invocations requests arrive.

(Optional) execution-parameters—Allows the algorithm
to provide the optimal tuning parameters for a job during runtime. Based on
the memory and CPUs available for a container, the algorithm chooses the
appropriate MaxConcurrentTransforms,
BatchStrategy, and MaxPayloadInMB values for the
job.
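To make this endpoint contract concrete, here is a sketch of a container web server using only the Python standard library; the echo "model" in /invocations and the tuning values returned from /execution-parameters are placeholders, not values SageMaker prescribes:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative tuning values; a real container would derive these from the
# memory and CPUs available to it.
EXECUTION_PARAMETERS = {
    "MaxConcurrentTransforms": 1,
    "BatchStrategy": "MULTI_RECORD",
    "MaxPayloadInMB": 6,
}

class InferenceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/ping":
            # Health check: HTTP 200 with an empty body signals readiness.
            self.send_response(200)
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == "/execution-parameters":
            body = json.dumps(EXECUTION_PARAMETERS).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def do_POST(self):
        if self.path == "/invocations":
            length = int(self.headers.get("Content-Length", 0))
            payload = self.rfile.read(length)
            # Placeholder "model": echo the request payload back as the
            # prediction; real inference code would run the model here.
            self.send_response(200)
            self.send_header("Content-Type", "application/octet-stream")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)
        else:
            self.send_response(404)
            self.end_headers()

def main():
    # Batch transform containers must listen on port 8080.
    HTTPServer(("0.0.0.0", 8080), InferenceHandler).serve_forever()
```

Running main() as the container's serve entry point would block and handle requests until the job ends.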

Before calling the invocations request, Amazon SageMaker attempts to invoke the
execution-parameters request. When you create a batch transform job, you can provide
values for the MaxConcurrentTransforms, BatchStrategy, and
MaxPayloadInMB parameters. Amazon SageMaker determines the values for these
parameters using this order of precedence:

The parameter values that you provide in the CreateTransformJob request

The values that the model container returns when Amazon SageMaker invokes the
execution-parameters endpoint

The parameter default values, listed in the following table.

Parameter                    Default Value

MaxConcurrentTransforms      1

BatchStrategy                MULTI_RECORD

MaxPayloadInMB               6

The response to a GET execution-parameters request is a JSON object with keys for the MaxConcurrentTransforms, BatchStrategy, and MaxPayloadInMB parameters.
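For example, a valid response might look like the following; the values shown are illustrative:

```json
{
  "MaxConcurrentTransforms": 8,
  "BatchStrategy": "MULTI_RECORD",
  "MaxPayloadInMB": 6
}
```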

How Your Container Should Respond to Health Check (Ping) Requests

The simplest requirement on the container is to respond with an HTTP 200 status
code and an empty body. This indicates to Amazon SageMaker that the container is ready
to accept
inference requests at the /invocations endpoint.

While the minimum bar is for the container to return a static 200, a container
developer can use this functionality to perform deeper checks. The request timeout
on /ping attempts is 2 seconds.
