In a previous blog post, I wrote about classifying images with the ResNet50v2 model from the ONNX Model Zoo. In that post, the container ran on a Kubernetes cluster with GPU nodes, each with an NVIDIA V100 GPU. The actual classification was done with a simple Python script with help from Keras and NumPy. Each inference took around 25 milliseconds.

Note that in the previous post, Azure Machine Learning deployed two containers: the scoring container (the one described above) and a front-end container. In that scenario, the front-end container handles the HTTP POST requests (optionally with SSL) and routes each request to the actual scoring container.

The scoring container accepts the same payload as the front-end container. That means it can be used on its own, as we are doing now.

Note that you can also use IoT Edge, as explained in an earlier post. That actually shows how easy it is to push AI models to the edge and use them locally, befitting your business case.

Scoring with Go

To classify images, I wrote a small Go program. Although there are some scientific libraries for Go, they are not really needed in this case. That means we do have to create the 4D tensor payload and interpret the softmax result manually. If you check the code, you will see that it is not awfully difficult.

Remember that this model expects the input as a 4D tensor with the following dimensions:

dimension 0: batch (we only send one image here)

dimension 1: channels (one each for red, green and blue; RGB)

dimension 2: height

dimension 3: width

The 4D tensor needs to be serialized to JSON in a field called data. We send that data with HTTP POST to the scoring URI at http://localhost:5001/score.

The response from the container will be JSON with two fields: a result field with the 1000 softmax values and a time field with the inference time. We can use the following two structs for marshaling and unmarshaling:

Input and output of the model
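The structs themselves are not reproduced in this text. As a rough stand-in, here is a minimal Python sketch (not the Go program from this post) of the same idea: build the nested (1,3,224,224) array by hand, serialize it under the data field, and decode the result and time fields of the response. The function names build_payload and parse_response are made up for illustration.

```python
import json

HEIGHT, WIDTH, CHANNELS = 224, 224, 3


def build_payload(pixels):
    """pixels: HEIGHT rows of WIDTH (r, g, b) tuples with values 0-255."""
    # Reorder height x width x channel into channel x height x width.
    chw = [[[pixels[y][x][c] for x in range(WIDTH)]
            for y in range(HEIGHT)]
           for c in range(CHANNELS)]
    # Wrap in a batch dimension of size 1 and serialize under "data".
    return json.dumps({"data": [chw]})


def parse_response(body):
    """Return (softmax values, inference time) from the JSON response."""
    doc = json.loads(body)
    return doc["result"], doc["time"]


if __name__ == "__main__":
    # A uniform grey image stands in for a real, resized picture.
    grey = [[(128, 128, 128)] * WIDTH for _ in range(HEIGHT)]
    data = json.loads(build_payload(grey))["data"]
    # Print the size of each of the four dimensions.
    print(len(data), len(data[0]), len(data[0][0]), len(data[0][0][0]))
```

The nested list comprehension is the manual equivalent of moving the channel axis to the front, which is all the "tensor" work the payload needs.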

Note that this model expects pictures to be scaled to 224 by 224 as reflected by the height and width dimensions of the uint8 array. The rest of the code is summarized below:

read the image; the path of the image is passed to the code via the -image command line parameter

In a previous post, I discussed the creation of a container image that uses the ResNet50v2 model for image classification. If you want to perform tasks such as localization or segmentation, there are other models that serve that purpose. The image was built with GPU support. Adding GPU support was pretty easy:

Use the enable_gpu flag in the Azure Machine Learning SDK or check the GPU box in the Azure Portal; the service will build an image that supports NVIDIA CUDA

In this post, we will deploy the image to a Kubernetes cluster with GPU nodes. We will use Azure Kubernetes Service (AKS) for this purpose. Check my previous post if you want to use NVIDIA V100 GPUs. In this post, I use hosts with one V100 GPU.

To get started, make sure you have the Kubernetes cluster deployed and that you followed the steps in my previous post to create the GPU container image. Make sure you attached the cluster to the workspace’s compute.

Deploy image to Kubernetes

Click the container image you created from the previous post and deploy it to the Kubernetes cluster you attached to the workspace by clicking + Create Deployment:

Starting the deployment from the image in the workspace

The Create Deployment screen is shown. Select AKS as deployment target and select the Kubernetes cluster you attached. Then press Create.

Azure Machine Learning now deploys the containers to Kubernetes. Note that I said containers in plural. In addition to the scoring container, another front-end container is added as well. You send your requests to the front-end container using HTTP POST. The front-end container talks to the scoring container over TCP port 5001 and passes the result back. The front-end container can be configured with certificates to support SSL.

Check the deployment and wait until it is healthy. We did not specify advanced settings during deployment so the default settings were chosen. Click the deployment to see the settings:

Deployment settings including authentication keys and scoring URI

As you can see, the deployment has authentication enabled. When you send your HTTP POST request to the scoring URI, make sure you pass an Authorization header with a value like so: Bearer primary-or-secondary-key. The primary and secondary keys are in the settings above. You can regenerate those keys at any time.
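As an illustration, the following Python sketch prepares such an authenticated request with the standard library only. It builds the request without sending it; the URI and key are placeholders, and build_scoring_request is a hypothetical helper name.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, key, payload):
    """Prepare (but do not send) an authenticated POST to the scoring URI."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + key,
        },
    )


if __name__ == "__main__":
    # Both values are placeholders; copy the real ones from the deployment settings.
    req = build_scoring_request(
        "http://example.invalid/score", "primary-or-secondary-key", {"data": []}
    )
    print(req.get_method(), req.get_header("Authorization"))
    # Sending the request would be: urllib.request.urlopen(req).read()
```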

Checking the deployment

From the Azure Cloud Shell, issue the following commands in order to list the pods deployed to your Kubernetes cluster:

az aks list -o table

az aks get-credentials -g RESOURCEGROUP -n CLUSTERNAME

kubectl get pods

Listing the deployed pods

Azure Machine Learning has deployed three front-ends (default; can be changed via Advanced Settings during deployment) and one scoring container. Let’s check the scoring pod with: kubectl get pod onnxgpu-5d6c65789b-rnc56 -o yaml. Replace the pod name with yours. In the output, you should find the following:

The data field is a multi-dimensional array, serialized to JSON. The shape of the array is (1,3,224,224). The dimensions correspond to the batch size, channels (RGB), height and width.

You only have to read an image and put the pixel values in the array! Easy, right? Well, as usual the answer is: “it depends”! The easiest way to do it, in my opinion, is with Python and a collection of helper packages. The code is in the following GitHub gist: https://gist.github.com/gbaeke/b25849f3813e9eb984ee691659d1d05a. You need to run the code on a machine with Python 3 installed. Make sure you also install Keras and NumPy (pip3 install keras / pip3 install numpy). The code uses two images, cat.jpg and car.jpg, but you can use your own. When I run the code, I get the following result:

It takes about 25 milliseconds to classify an image, or 40 images/second. By increasing the number of GPUs and scoring containers (we only deployed one), we can easily scale out the solution.

With a bit of help from Keras and NumPy, the code does the following:

check the image format reported by the Keras back-end: it reports channels_last which means that, by default, the RGB channels are the last dimension of the image array

load the image; the resulting array has a (224,224,3) shape

our container expects the channels_first format; we use moveaxis to move the last axis to the front; the array now has a (3,224,224) shape

our container expects a first dimension with a batch size; we use expand_dims to end up with a (1,3,224,224) shape

we convert the 4D array to a list and construct the JSON payload

we send the payload to the scoring URI and pass an authorization header

we get a JSON response with two fields: result and time; we print the inference time as reported by the container

from keras.applications.resnet50, we use the decode_predictions function to process the result field; result contains the 1000 values computed by the softmax function in the container; decode_predictions knows the categories and returns the top five

we print the name and probability of the category with the highest probability (item 0)
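The array manipulation in the steps above can be sketched with NumPy alone. This is not the code from the gist: loading the image with Keras, sending the request and decode_predictions are left out, random pixels stand in for a real picture, and to_payload is a made-up name.

```python
import json

import numpy as np


def to_payload(image_hwc):
    """Turn a (224, 224, 3) channels_last array into the JSON payload."""
    chw = np.moveaxis(image_hwc, -1, 0)    # channels first: (3, 224, 224)
    batch = np.expand_dims(chw, axis=0)    # add batch axis: (1, 3, 224, 224)
    return json.dumps({"data": batch.tolist()})


if __name__ == "__main__":
    # Random pixels stand in for an image loaded and resized to 224 by 224.
    image = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
    payload = to_payload(image)
    print(np.array(json.loads(payload)["data"]).shape)
```

Calling tolist() before json.dumps matters because NumPy arrays are not JSON serializable on their own.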

What happens when you use a scoring container that uses the CPU? In that case, you could run the container in Azure Container Instances (ACI). Using ACI is much less costly! In ACI with the default setting of 0.1 CPU, it will take around 2 seconds to score an image. Ouch! With a full CPU (in ACI), the scoring time goes down to around 180-220ms per image. To achieve better results, simply increase the number of CPUs. On the Standard_NC6s_v3 Kubernetes node with 6 cores, scoring time with CPU hovers around 60ms.
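To put the numbers from this post side by side, the small sketch below converts each latency to images per second. It assumes one image is scored at a time per container and that throughput scales linearly with the number of scoring containers, which is a simplification.

```python
def images_per_second(latency_ms, replicas=1):
    """Throughput when each replica scores one image at a time."""
    return replicas * 1000.0 / latency_ms


if __name__ == "__main__":
    # Latencies quoted in this post (midpoint used for the 180-220 ms range).
    timings = [
        ("V100 GPU", 25),
        ("ACI, 0.1 CPU", 2000),
        ("ACI, 1 CPU", 200),
        ("NC6s_v3 node, CPU only", 60),
    ]
    for name, ms in timings:
        print(f"{name}: {images_per_second(ms):.1f} images/second")
```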

Conclusion

In this post, you have seen how Azure Machine Learning makes it straightforward to deploy GPU scoring images to a Kubernetes cluster with GPU nodes. The service automatically configures the resource requests for the GPU and maps the NVIDIA drivers to the scoring container. The only thing left to do is to start scoring images with the service. We have seen how easy that is with a bit of help from Keras and NumPy. In practice, always start with CPU scoring and scale out that solution to match your requirements. But if you do need GPUs for scoring, Azure Machine Learning makes it pretty easy to do so!

In a previous post, I discussed how you can add an existing Kubernetes cluster to an Azure Machine Learning workspace. Adding an existing cluster is necessary when the workspace does not support auto creation of a cluster. That is the case when you want to use the Standard_NC6s_v3 virtual machine size. I also used a container for scoring pictures with the ResNet50v2 model from the ONNX Model Zoo. Now we will take a look at actually creating that container image with GPU support. Note that in many cases, inference with CPUs is more than sufficient but the GPU case is more interesting to look at!

To get started, you need an Azure subscription with an Azure Machine Learning workspace. Take a look here for instructions.

Once you have a workspace, there are a few steps to take. If you look at the diagram at the top of this post, we will perform the steps starting from Register and manage your model:

Register model: we will add the ResNet50v2 model from the ONNX Model Zoo; we are using this existing model instead of our own; ResNet50v2 can recognize pictures in 1000 categories

Create container image: from the model in the workspace, we create a container image with GPU support

Deploy container image: from the image in the workspace, we deploy the image to compute that supports GPUs

Machine Learning SDK

The Azure Machine Learning service has a Machine Learning SDK for Python. All the steps discussed above can be performed with code. You can find an example of the Python code to use in the following Jupyter notebook hosted on Azure Notebooks: https://gebaml-geba.notebooks.azure.com/j/notebooks/ONNXResnet.ipynb. Note that the Azure Notebooks service is still in preview and a bit rough around the edges. The Machine Learning SDK is available by default in Azure Notebooks.

At the beginning of the notebook, we import azureml.core which allows you to check the version of the SDK (among other things):

Registering the model

First, we download the model to the notebook project. In the notebook, the urllib module is used to download the compressed version of the ResNet50v2 model. The tarball is extracted to resnet50v2/resnet50v2.onnx. You can think of the model as a complex function with, in this case, millions of parameters (weights). The inputs to the function are the pixels of your picture (their red, green and blue values). The output of the function is a category: cat, guitar, …

Now that we have the model, we need to add it to the workspace, which means we also have to authenticate. Create a file called config.json with the following contents:

With the Workspace class from azureml.core we authenticate to Azure and grab a reference to the workspace with the ws variable. The Workspace.from_config() function searches for the config.json file.
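The file contents are not reproduced in this text. To my knowledge, Workspace.from_config() looks for three keys: subscription_id, resource_group and workspace_name. The sketch below generates such a file with placeholder values; render_workspace_config is a hypothetical helper and the SDK call is only shown as a comment.

```python
import json


def render_workspace_config(subscription_id, resource_group, workspace_name):
    """Return the JSON text for a workspace config.json file."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Placeholder values; replace them with your own before authenticating.
    text = render_workspace_config(
        "<subscription-id>", "<resource-group>", "<workspace-name>"
    )
    with open("config.json", "w") as f:
        f.write(text)
    # With the SDK installed, the next step would be:
    #   from azureml.core import Workspace
    #   ws = Workspace.from_config()
```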

Now we can finally register the model in the workspace using Model.register:

The above is the same as adding a model using the Azure Portal. You might hit file upload limits in the portal so adding the model via code is the better approach. Your model is now registered in the workspace:

Creating a GPU container image from the model

Now that we have the model, we can create the container image. The model will be included in the image which will add about 100MB to its size. The container image in Azure Machine Learning is created from four settings/artifacts:

model: registered in the workspace

score file: a file score.py with an init() and run() function; helper functions can also be included

dependency file: used to indicate the Python modules that need to be installed in the image (see https://conda.io/docs/)

GPU support: set to True or False
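To give an idea of the shape of a score file, here is a minimal skeleton with the two required functions. It is not the file from the notebook: a placeholder function stands in for the real model, and the returned result and time fields simply mirror the response described in these posts.

```python
import json
import time

model = None


def _placeholder_model(data):
    """Stand-in for the real model: returns 1000 equal probabilities."""
    return [1.0 / 1000] * 1000


def init():
    """Called once when the container starts; load the model here."""
    global model
    # The real file would load the registered model with an inference runtime.
    model = _placeholder_model


def run(raw_data):
    """Called for every request with the raw JSON body as a string."""
    try:
        data = json.loads(raw_data)["data"]
        start = time.time()
        result = model(data)
        return {"result": result, "time": time.time() - start}
    except Exception as exc:
        # Return the error instead of crashing the scoring container.
        return {"error": str(exc)}


if __name__ == "__main__":
    init()
    print(sorted(run(json.dumps({"data": [[0]]})).keys()))
```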

You will find the score file in the notebook. It was copied from a Microsoft supplied sample. Without some experience with machine learning and, in this case, neural networks, it will be difficult to create this file from scratch. The ResNet50v2 model expects a 4-dimensional tensor with the following dimensions:

0: batch (1 when you send 1 image)

1: channels (3 channels for red, green and blue; RGB)

2: height (224 pixels)

3: width (224 pixels)

For inference, you will actually send the above data in a JSON payload as the data field. The preprocess() function in score.py grabs the data field and converts it to a NumPy array. The data is then normalized by dividing each pixel by 255, subtracting the mean value of each channel and dividing by the standard deviation of each channel. The normalized data is then sent to the model, which outputs an array with 1000 probabilities that sum to 1 (via a softmax function).
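The two calculations in this paragraph can be written out in plain Python. This is a sketch, not the preprocess() function from score.py: the mean and standard deviation values are the commonly used ImageNet statistics, which I assume (but have not verified) match the ones in the score file.

```python
import math

# Assumed ImageNet per-channel statistics (R, G, B); check score.py for the real values.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize(chw):
    """Scale a channels-first image (3 x H x W, values 0-255) per channel."""
    return [[[(v / 255.0 - MEAN[c]) / STD[c] for v in row]
             for row in chw[c]]
            for c in range(3)]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    tiny = [[[255, 0]], [[255, 0]], [[255, 0]]]  # a 1x2 "image" with 3 channels
    print(normalize(tiny)[0][0])
    print(sum(softmax([2.0, 1.0, 0.1])))
```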

Why are there a thousand probabilities? The model was trained on a thousand different categories of images and for each of these categories, a probability is output. After inference we will need a list of these categories so we can find the one that matches with our uploaded image and that has the highest probability!
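Matching probabilities to categories comes down to sorting indices by probability and looking up the labels. A minimal sketch, with three made-up categories instead of the real list of 1000 and a hypothetical top_k helper:

```python
def top_k(probabilities, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    return [(labels[i], probabilities[i]) for i in ranked[:k]]


if __name__ == "__main__":
    # Made-up labels and probabilities, for illustration only.
    labels = ["tabby cat", "Egyptian cat", "sports car"]
    probs = [0.05, 0.94, 0.01]
    for label, p in top_k(probs, labels, k=2):
        print(f"{label}: {p:.2f}")
```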

This particular score.py file uses ONNX Runtime for inference. To enable GPU support, make sure you include the onnxruntime-gpu package in your conda dependencies as shown below:

With score.py and myenv.yml, the container image with GPU support can be created. Note that we are specifying the score.py file, the conda file and the model. GPU support is enabled as well via enable_gpu=True.

The code above should result in the following image in your workspace (after several minutes of building):

In the background, this image is stored in the container registry that got created when you deployed the Azure Machine Learning workspace. You are now ready for the third step, deploying the image to compute that supports GPUs (for instance Kubernetes). That step, together with some code to actually recognize images, will be for another post. In that post, we will also compare CPU to GPU speed.

Conclusion

In this post, we looked at creating a scoring (inference) container image with GPU support. Instead of creating and using our own model, we used the ResNet50v2 model from the ONNX Model Zoo. The model file, together with a score.py file and a conda dependency file, was used to build a container image. Azure Machine Learning builds the container image for you and stores it in a container registry. Although Azure Machine Learning takes care of most of the infrastructure work, you still need to know how to write the scoring file. In this post, the scoring file uses ONNX Runtime but you can use other runtimes or frameworks such as TensorFlow or MXNet.

Azure Machine Learning Service allows you to easily deploy compute for training and inference via a machine learning workspace. Although one of the compute types is Kubernetes, the workspace is a bit picky about the node VM sizes. I wanted to use two Standard_NC6s_v3 instances with NVIDIA Tesla V100 GPUs but that was not allowed. Other GPU instances, such as the Standard_NC6 type (K80 GPU) can be deployed from the workspace.

Luckily, you can deploy clusters on your own and then attach the cluster to your Azure Machine Learning workspace. You can create the cluster with the below command. Make sure you ask for a quota increase that allows 12 cores of Standard_NC6s_v3.

Before I ran the above command, I created an Azure Machine Learning workspace in a resource group called ml-rg. The above command was run with RESOURCE_GROUP set to ml-rg and NAME set to mlkub. After a few minutes, you should have your cluster up and running. Be mindful of the price of this cluster. GPU instances are not cheap!

Now we can Add Compute to the workspace. In your workspace, navigate to Compute and use the + Add Compute button. Complete the form as below. The compute name does not need to match the cluster name.

After a while, the Kubernetes cluster should be attached:

Manually deployed cluster attached

Note that detaching a cluster does not remove it. Be sure to remove the cluster manually!

You can now deploy container images to the cluster that take advantage of the GPU of each node. When you deploy an image marked as a GPU image, Azure Machine Learning takes care of all the parameters that allow your container to use the GPU on the Kubernetes node.

The screenshot below shows a deployment of an image that can be used for inference. It uses an ONNX ResNet50v2 model.

Deployment of container for scoring (inference; ResNet50v2)

With the below picture of a cat, the model used by the container guesses it is an Egyptian Cat (it’s not but it is close) with close to 94% certainty.

Egyptian Cat (not)

Using your own compute with the Azure Machine Learning service is very easy to do. The more interesting and somewhat more complicated parts, such as the creation of the inference container that supports GPUs, are something I will discuss in a later post. In a follow-up post, I will also discuss how you send image data to the scoring container.

To use one of the APIs you need to provision it in an Azure subscription. After provisioning, you will get an endpoint and API key. Every time you want to classify an image or detect sentiment in a piece of text, you will need to post an appropriate payload to the cloud endpoint and pass along the API key as well.

What if you want to use these services but you do not want to pass your payload to a cloud endpoint for compliance or latency reasons? In that case, the Cognitive Services containers can be used. In this post, we will take a look at the Text Analytics containers, specifically the one for Sentiment Analysis. Instead of deploying the container manually, we will deploy the container with IoT Edge.

IoT Edge Configuration

To get started, create an IoT Hub. The free tier will do just fine. When the IoT Hub is created, create an IoT Edge device. Next, configure your actual edge device to connect to IoT Hub with the connection string of the device you created in IoT Hub. Microsoft has a great tutorial to do all of the above, using a virtual machine in Azure as the edge device. The tutorial I linked to is the one for an edge device running Linux. When finished, the device should report its status to IoT Hub:

Once you have your edge device up and running, you can use the following command to obtain the status of your edge device: sudo systemctl status iotedge. The result:

Deploy Sentiment Analysis container

With the IoT Edge daemon up and running, we can deploy the Sentiment Analysis container. In IoT Hub, select your IoT Edge device and select Set modules:

In Set Modules you have the ability to configure the modules for this specific device. Modules are always deployed as containers and they do not have to be specifically designed or developed for use with IoT Edge. In the three-step wizard, add the Sentiment Analysis container in the first step. Click Add and then select IoT Edge Module. Provide the following settings:

Although the container can freely be pulled from the Image URI, the container needs to be configured with billing info and an API key. In the Billing environment variable, specify the endpoint URL for the API you configured in the cloud. In ApiKey set your API key. Note that the container always needs to be connected to the cloud to verify that you are allowed to use the service. Remember that although your payload is not sent to the cloud, your container usage is. The full container create options are listed below:

In HostConfig we ask the container runtime (Docker) to map port 5000 of the container to port 5000 of the host. You can specify other create options as well.
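The create options from this post are not reproduced here. As a sketch, the port mapping part can be generated as follows, using the HostConfig and PortBindings layout of the Docker Engine API. The Billing and ApiKey settings are left out on purpose, and create_options is a made-up helper name.

```python
import json


def create_options(container_port=5000, host_port=5000):
    """Docker create options that publish a container port on the host."""
    return {
        "HostConfig": {
            "PortBindings": {
                f"{container_port}/tcp": [{"HostPort": str(host_port)}]
            }
        }
    }


if __name__ == "__main__":
    # Paste the printed JSON into the container create options field.
    print(json.dumps(create_options(), indent=2))
```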

On the next page, you can configure routing between IoT Edge modules. Because we do not use actual IoT Edge modules, leave the configuration as shown below:

Now move to the last page in the Set Modules wizard to review the configuration and click Submit.

Give the deployment some time to finish. After a while, check your edge device with the following command: sudo iotedge list. Your TextAnalytics container should be listed. Alternatively, use sudo docker ps to list the Docker containers on your edge device.

Testing the Sentiment Analysis container

If everything went well, you should be able to go to http://localhost:5000/swagger to see the available endpoints. Open Sentiment Analysis to try out a sample:
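Outside of the Swagger page, the same test can be prepared from code. The sketch below only builds the request, so it can be inspected without a running container. The route /text/analytics/v2.0/sentiment and the documents payload are my assumptions based on the Text Analytics API; confirm both on the Swagger page of your container.

```python
import json
import urllib.request

# Assumed route; confirm it on the Swagger page of your container.
SENTIMENT_URL = "http://localhost:5000/text/analytics/v2.0/sentiment"


def build_sentiment_request(texts, language="en", url=SENTIMENT_URL):
    """Prepare (but do not send) a sentiment request for a list of texts."""
    documents = [{"id": str(i + 1), "language": language, "text": t}
                 for i, t in enumerate(texts)]
    body = json.dumps({"documents": documents}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_sentiment_request(["IoT Edge makes this really easy!"])
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # Sending the request would be: urllib.request.urlopen(req).read()
```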

Summary

IoT Edge is a great way to deploy containers to edge devices running Linux or Windows. Besides deploying actual IoT Edge modules, you can deploy any container you want. In this post, we deployed a Cognitive Services container that does Sentiment Analysis at the edge.

In the previous blog post, I discussed adding SSL to webhookd. In this post, I will briefly show how to use this solution to deploy Azure resources.

To run webhookd, I deployed a small Standard_B1s machine (1GB RAM, 1 vCPU) with a system assigned managed identity. After deployment, information about the managed identity is available via the Identity link.

Code running on a machine with a managed identity needs to do something specific to obtain information about the identity like a token. With curl, you would issue the following command:

The response would be JSON that contains a field called access_token. You could parse out the access_token and then use the token in a call to the Azure Resource Manager APIs. You would use the token in the Authorization header. Full details about acquiring these tokens can be found here. On that page, you will find details about acquiring the token with Go, JavaScript and several other languages.
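For comparison with curl, here is a Python sketch of the same flow using only the standard library. It targets the Azure Instance Metadata Service endpoint for managed identities, which only answers from inside an Azure virtual machine, so the example builds the request and parses a sample response instead of calling the endpoint. The helper names are made up.

```python
import json
import urllib.parse
import urllib.request

# Instance Metadata Service endpoint; only reachable from inside an Azure VM.
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_token_request(resource="https://management.azure.com/"):
    """Prepare the token request for the given resource (audience)."""
    query = urllib.parse.urlencode(
        {"api-version": "2018-02-01", "resource": resource}
    )
    req = urllib.request.Request(IMDS_TOKEN_URL + "?" + query)
    req.add_header("Metadata", "true")  # required by the metadata service
    return req


def authorization_header(token_response):
    """Extract access_token from the JSON response and build the header."""
    token = json.loads(token_response)["access_token"]
    return {"Authorization": "Bearer " + token}


if __name__ == "__main__":
    print(build_token_request().full_url)
    # A fake response body, for illustration only.
    print(authorization_header('{"access_token": "fake-token"}'))
```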

Because we are using webhookd and shell scripts, the Azure CLI is the ideal way to create Azure resources. The Azure CLI can easily authenticate with the managed identity using a simple command: az login --identity. Here’s a shell script that uses it to create a virtual machine:

The script expects three parameters: rg, vmname and pw. We can pass these parameters as HTTP query parameters. If the above script were in the ./scripts/vm folder as create.sh, I could do the following call to webhookd:
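The original call is not reproduced in this text. As a sketch, the URL for such a call can be assembled like this, assuming webhookd exposes ./scripts/vm/create.sh as /vm/create. The host name and parameter values are made up, and webhook_url is a hypothetical helper.

```python
import urllib.parse


def webhook_url(base_url, script_path, **params):
    """Build the URL for a script exposed by webhookd, with query parameters."""
    query = urllib.parse.urlencode(params)  # escapes special characters
    return f"{base_url.rstrip('/')}/{script_path.strip('/')}?{query}"


if __name__ == "__main__":
    # Host name and parameter values are made up for illustration.
    print(webhook_url("https://www.example.com", "vm/create",
                      rg="testrg", vmname="testvm", pw="S3cret!pass"))
```

Keep in mind that a password in a query string can end up in logs; that is fine for a test but worth reconsidering for anything else.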

Notice the user object, which clearly indicates we are using a system-assigned managed identity. In my case, the managed identity has the contributor role on an Azure subscription used for testing. With that role, the shell script has the required access rights to deploy the virtual machine.

As you can see, it is very easy to use webhookd to deploy Azure resources if the Azure virtual machine that runs webhookd has a managed identity with the required access rights.

A while ago, I stumbled upon https://github.com/ncarlier/webhookd. It is a simple webhook server, written in Go, that can execute shell scripts. To use it, simply install it on a Linux box and execute it. By default, the executable looks at the ./scripts folder and maps each shell script to a URL you can call. It is well documented so do take a look at the GitHub page for full details on its configuration.

Out of the box, webhookd supports basic authentication if you supply a .htpasswd file. It does not, however, support SSL. That can be fixed in several ways though. In my case, I wanted one executable that supports SSL with Let’s Encrypt certificates. As it turns out, there is a great solution for that: https://github.com/mholt/certmagic.

To simplify using webhookd together with certmagic, I forked the webhookd repo and added certmagic support. The fork is here: https://github.com/gbaeke/webhookd. To use it, use go get github.com/gbaeke/webhookd and work from there. The fork could be improved by adding extra parameters for e-mail address and DNS name. For now, change the code by following the steps below:

In main.go, search for mail@mail.com and replace it with a valid e-mail address; although not required it is a good practice to supply an e-mail address to the folks at Let’s Encrypt

In main.go, search for www.example.com and replace it with a valid DNS name

The DNS name you use needs to resolve to the IP address of the machine that runs webhookd; it should be a public IP address because the code uses the HTTP challenger

The machine that runs webhookd should expose port 80 and port 443

If you want to use the Let’s Encrypt staging servers during testing (recommended) change certmagic.LetsEncryptProductionCA to certmagic.LetsEncryptStagingCA

In my case, the machine that runs webhookd is a small Linux machine running on Microsoft Azure. The DNS name is actually a CNAME record that is an alias for the DNS name of the virtual machine (e.g. vmname.westeurope.cloudapp.azure.com). You are now ready to build webhookd with go build. When it’s ready, just execute webhookd. When you run this the first time, certmagic will notice there is no certificate and will start to talk to Let’s Encrypt using the ACME protocol. By default, HTTP verification is used, which just means Let’s Encrypt will tell certmagic to host a file over plain HTTP. When Let’s Encrypt can retrieve that file, it concludes you must be the owner of the DNS name used in the certificate. The certificate will be issued and stored on the file system under $HOME/.local/share/certmagic/acme.

You will get some messages regarding the certificate request process as shown below. When the cached certificate is found and it is valid, you will just get the Serving HTTP->HTTPS message:

Note that you will not be able to bind to low ports like 80 and 443 as a non-root user. So either run webhookd as root or use setcap, for instance: sudo setcap cap_net_bind_service=+ep /path/to/webhookd. After running the setcap command, you can run webhookd as a non-root user and it will be able to bind to ports 80 and 443.

I also have basic authentication enabled for a user called api. To test the configuration, I can use curl like so:
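The curl command is not reproduced in this text. For reference, this is roughly what basic authentication looks like when the request is prepared from Python instead: the user name and password are joined with a colon, Base64 encoded and sent in the Authorization header. The URL and password below are placeholders, and basic_auth_request is a made-up name.

```python
import base64
import urllib.request


def basic_auth_request(url, user, password):
    """Prepare (but do not send) a GET request with HTTP basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url)
    req.add_header("Authorization", "Basic " + token)
    return req


if __name__ == "__main__":
    # Placeholder host and password; the user name api comes from this post.
    req = basic_auth_request("https://www.example.com/echo", "api", "secret")
    print(req.get_header("Authorization"))
    # Sending the request would be: urllib.request.urlopen(req).read()
```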

Due to the use of the Let’s Encrypt production CA, there is no need to use the --insecure flag with curl. The certificate is fully trusted on my (Windows) machine. If you pulled down the complete repository, the scripts folder contains a shell script called echo.sh. That script is automatically made available as /echo. Everything the script echoes to stdout is used as output of the HTTP call. Simple but effective!

In a follow-up post, we will take a look at using webhookd to deploy Azure resources using a managed identity and the Azure CLI. Stay tuned!