Introduction

The top cloud vendors, such as Amazon, Google, and Microsoft, provide powerful, simple-to-use REST APIs for image processing, video analysis, speech recognition, and other advanced algorithms. These APIs allow developers to make their applications more intelligent without deep knowledge of data science.

However, those APIs are intended for general-purpose use and can't be customized to your own needs, so I wondered how to build my own simple computer vision service that is still publicly available but gives me more flexibility and control over how it works.

This article doesn't attempt to explain how computer vision algorithms work; instead, it covers how a developer can build and deploy their own intelligent services using modern technologies.

What the Computer Vision Service Does

The idea of the computer vision service is quite simple: it's a publicly available REST service that receives a picture and can localize and identify one or more objects in it.

Later, we will look deeply into how the service works and how to train the underlying machine learning model. You can also publish the computer vision service to your own Azure account by clicking the following button:

How Does it Work?

Let's briefly discuss the stack of technologies we need and how they will be used.

The heart of the computer vision service is an object detection TensorFlow model represented by the frozen_inference_graph.pb file. That file includes the graph definition and the metadata of the model. The graph definition represents your computation in terms of the dependencies between individual operations, while the metadata contains the information required to continue training, perform evaluation, or run inference on a previously trained graph. The object detection graph has one input tensor, image_tensor - the image on which objects will be detected - and four output tensors: detection_boxes - borders of detected objects, detection_scores - confidence scores of the detections, detection_classes - classes of detected objects (cars, people, animals, etc.), and num_detections - the number of valid boxes per image in the batch. The object detection graph can then be loaded into the TensorFlow framework (this is covered in detail in the "Choosing a TensorFlow Model" section). Once TensorFlow is set up and the model is trained, we can run the execution graph and process input images.
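To make these four output tensors concrete, here is a small, self-contained C# sketch that turns the raw output arrays into a list of detections. The types, names, and the 0.5 score threshold are my own illustration, not part of the service code:

```csharp
using System;
using System.Collections.Generic;

// Illustrative container for one detected object, built from the
// four output tensors described above.
public class Detection
{
    public float[] Box { get; set; }   // [yMin, xMin, yMax, xMax], normalized to 0..1
    public float Score { get; set; }   // confidence score in the 0..1 range
    public int ClassId { get; set; }   // index into the label map (car, person, ...)
}

public static class DetectionParser
{
    // Converts the raw tensor output for a single image into detections,
    // keeping only boxes whose score exceeds the threshold.
    public static List<Detection> Parse(
        float[][] boxes, float[] scores, float[] classes,
        int numDetections, float minScore = 0.5f)
    {
        var result = new List<Detection>();
        for (int i = 0; i < numDetections; i++)
        {
            if (scores[i] < minScore) continue;
            result.Add(new Detection
            {
                Box = boxes[i],
                Score = scores[i],
                ClassId = (int)classes[i]
            });
        }
        return result;
    }
}
```

With detections in this shape, drawing the boxes on the original image is a simple loop over the returned list.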

I want to expose computer vision as a REST service and build it with my favorite programming language, C#, so I've chosen ASP.NET Core 2.0 as the web framework: it lets me run both TensorFlow and the ASP.NET Core service on Linux. That matters because many TensorFlow models are better documented for Linux.

Now, I need a bridge between managed .NET code and the unmanaged TensorFlow API to get the service working. For this purpose, I will use the TensorFlowSharp library, a managed TensorFlow API for C# (see the "Using TensorFlow Model from C#" section). After compiling TensorFlowSharp for .NET Standard 2.0, I will be able to use it in ASP.NET Core 2.0 on Linux.

Finally, I will publish the computer vision service on an Azure Linux VM, making the REST API publicly available.

Computer Vision REST Service

Choosing a TensorFlow Model

As mentioned before, the core of the service is a TensorFlow model trained specifically for detecting objects in images. I will use the object detection model from TensorFlow research. I will train the model on a custom dataset later in the article, but you can use one of the pre-trained models from the TensorFlow detection model zoo as well.

Using TensorFlow Model from C#

The following code uses the TensorFlowSharp binding to import the model into TensorFlow and detect objects in the image:
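The listing is not reproduced here, but a minimal TensorFlowSharp sketch of the approach looks roughly like the following. The file path and the way the image tensor is built are placeholders; the tensor names match the model described earlier:

```csharp
using System.IO;
using TensorFlow; // TensorFlowSharp binding

public static class ObjectDetector
{
    // Loads the frozen graph and runs detection on a prepared image
    // tensor ([1, height, width, 3] uint8). Error handling omitted.
    public static TFTensor[] Detect(string modelFile, TFTensor imageTensor)
    {
        using (var graph = new TFGraph())
        {
            // Import the frozen object detection model.
            graph.Import(File.ReadAllBytes(modelFile));

            using (var session = new TFSession(graph))
            {
                // Feed the image into "image_tensor" and fetch the
                // four output tensors described earlier.
                return session.GetRunner()
                    .AddInput(graph["image_tensor"][0], imageTensor)
                    .Fetch(graph["detection_boxes"][0],
                           graph["detection_scores"][0],
                           graph["detection_classes"][0],
                           graph["num_detections"][0])
                    .Run();
            }
        }
    }
}
```

This sketch depends on the native TensorFlow library and a trained frozen_inference_graph.pb file being present, so treat it as an outline rather than a drop-in implementation.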

Once the VM is created, it becomes available via either RDP or SSH (see How to use SSH on Linux VM). Connect and check that the service is deployed successfully and that the 'objectdetection' directory contains the following files:

Let's Try It Out

The computer vision service is published, and I want to try it out by sending some pictures to process. First, I encode the picture to a base64 string. It can be done with a couple of lines of C# code, or even simply by using an online service, for instance, https://www.base64-image.de.
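Those couple of lines of C# might look like this (the file name is just a placeholder):

```csharp
using System;
using System.IO;

public static class ImageEncoder
{
    // Reads a local picture and encodes it as a base64 string
    // suitable for the request body.
    public static string ToBase64(string path)
    {
        byte[] imageBytes = File.ReadAllBytes(path);
        return Convert.ToBase64String(imageBytes);
    }
}
```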

Next, I send an HTTP request to the computer vision service using Postman (or any other utility). The HTTP request should include the following headers:

Accept:text/plain
Content-Type:text/plain

and base64-encoded image in the body.
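The same request can also be sent from C# instead of Postman. Here is a minimal sketch; the service URL and route are placeholders, not the article's actual endpoint:

```csharp
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

public static class VisionClient
{
    // Builds the HTTP request described above: plain-text Accept and
    // Content-Type headers and a base64-encoded image in the body.
    public static HttpRequestMessage BuildRequest(string base64Image)
    {
        var request = new HttpRequestMessage(
            HttpMethod.Post, "http://your-service-host/api/objectdetection");
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("text/plain"));
        request.Content = new StringContent(base64Image, Encoding.UTF8, "text/plain");
        return request;
    }

    // Sends the request and returns the response body as a string.
    public static async Task<string> SendAsync(string base64Image)
    {
        using (var client = new HttpClient())
        {
            var response = await client.SendAsync(BuildRequest(base64Image));
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```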

In response, I receive a URL to the processed image along with the base64-encoded processed image. By clicking on the URL, I can load it:

Troubleshooting

Check the logs for troubleshooting. They are located in the /var/log/dotnettest.err.log and /var/log/dotnettest.out.log files.

Training Your Own Model

In this section, we will talk about how to train the object detection model.

Deploy VM for Training

In order to train an object detection model, we need to deploy a special virtual machine. Training requires intensive computation which can take hours. To speed up the training process, we will use GPU-based VMs in Azure (see the list of available sizes here) and parallelize computation across a few GPU units. I've prepared an ARM template for deploying the object detection training VM.

STEP 1 - Deploy GPU VM on Azure

Click "Deploy" and fill in the form to deploy the training Linux VM on your Azure account:

STEP 2 - Download CUDA and cuDNN

CUDA is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs).

STEP 3 - Upload CUDA and cuDNN to the Training VM

Execute the following command from the Windows command line to upload the CUDA and cuDNN installation files from your local machine (I assume they are located in the "Downloads" folder) to the training VM:

STEP 4 - Install NVIDIA Driver

Now, we can install the NVIDIA driver. I've prepared an "install_driver.sh" shell script which has been uploaded to the training VM. Connect to your training VM via SSH or RDP and launch the installation using the following command:

sh install_driver.sh

Be patient; it will take some time. Your VM will be rebooted once the driver is installed, so you will need to connect via SSH or RDP again.

STEP 5 - Install CUDA and cuDNN

After the reboot, we need to install CUDA and cuDNN, which were uploaded earlier. For that purpose, execute the following script:

This prints the installed CUDA and cuDNN versions. Ensure you have CUDA v8.0 and cuDNN v6 installed on your training machine, as in this picture:

STEP 7 - Start Training Object Detection

In this step, we will train our object detection neural network. I suggest connecting to the training Ubuntu VM via RDP because you'll need to launch three terminals: one for TensorBoard, one for the training process, and one for the evaluation process. Below, we will talk about each of them.

First of all, let's launch TensorBoard, a powerful tool which comes with TensorFlow and visualizes the TensorFlow graph and the learning process. The following command starts TensorBoard on your training VM:

sh start_tensorboard.sh

Start training process:

. export_variables.sh
. train.sh

and finally, evaluation:

. export_variables.sh
. eval.sh

You should be able to see a picture like this (left: TensorBoard; middle: the training process; right: the evaluation process):

TensorBoard

- visualizes the neural network graph. I used it to find which parts can be parallelized and to understand the underlying neural network structure in general;
- visualizes how the distribution of a tensor changes over time by showing many histogram visualizations of the tensor at different points in time;
- visualizes the learning process. It's helpful when you want to understand when to stop training (we will see this in the next section).

Training

The training process requires training data: a set of images to pass through the machine learning algorithm. The training data must contain the correct answers so that the training process can compare the actual output of the algorithm against what you want it to predict. Every few steps, the training process creates a checkpoint; training can take a long time, and checkpoints allow it to resume from the last saved state if the process fails.

It's useful to watch the TotalLoss chart on TensorBoard to keep track of how the training process is going. The closer the TotalLoss value is to zero, the better your model's predictions.

I ran 16k steps of training and got the following TotalLoss chart. As you can see, I had pretty good prediction results after ~2k steps of training, and subsequent iterations didn't change the predictions dramatically:

Evaluation

In parallel with the training process, you can run the evaluation process. Evaluation can be performed periodically, for instance every 500 steps, for testing and validation purposes.

Here is the result of evaluating a bus picture. You can see how the object detection accuracy changed during training:

Conclusion

Let's summarize what we've learned from this article.

This article demonstrates how to deploy and customize your own computer vision service, but the described steps can be used to build any other intelligent service based on TensorFlow models. The full list of available TensorFlow models is here.

.NET developers can use TensorFlow on Linux via TensorFlowSharp bindings, which wrap the native TensorFlow API.

If you're exposing intensive TensorFlow operations as an ASP.NET Core REST API, consider increasing the default timeouts on your reverse proxy. I used Nginx as a reverse proxy, and you can find the configuration I used here.
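For reference, the relevant Nginx directives look roughly like this; the values and the upstream address are examples, not the exact configuration from the article:

```
location / {
    proxy_pass http://localhost:5000;

    # Give long-running TensorFlow inference requests time to complete.
    proxy_connect_timeout 300s;
    proxy_send_timeout    300s;
    proxy_read_timeout    300s;
}
```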

Consider using different hardware configurations for training and running TensorFlow models. Machines with powerful GPUs cope very well with graphics tasks and image processing, so they allow you to train a model very fast, but they are costly. Once you've trained your model, you can run it on a cheaper CPU-based machine or a cluster of machines.

You can skip model training and download pre-trained models when building a PoC project or a demo. You can also use pre-trained model checkpoints as a starting point for training on your own datasets; it will save you time.

Training TensorFlow models is a compute-intensive and time-consuming operation, so consider running it in the cloud and parallelizing it across different machines or GPU units. I ran training on an NC12 VM in Azure powered by an NVIDIA Tesla K80 card with 2 GPUs and parallelized the training and evaluation processes across the GPUs.

Using Azure Resource Manager, I was able to automate most of the VM deployment operations for training the TensorFlow model and hosting the REST service. However, upgrading the video card driver and installing CUDA still have to be done manually via SSH or RDP.

TensorBoard is a powerful tool for visualizing, optimizing, and debugging the TensorFlow training process.

About the Author

Software engineer and OSS contributor interested in distributed systems and cloud computing. I've been using the Microsoft stack of technologies with success over the years and love to share engineering approaches, practices, solutions, and tools.

Comments and Discussions

I liked the in-depth, step-by-step instructions on how to build such a service yourself. I found it well explained and encouraging.

I think you could make it even easier for others to try it out if you built a base Docker container with all the dependencies already on board. Then the only thing necessary would be to write your custom code and provide custom training data. This way it may be possible to run it on any other cloud provider, like AWS or Google.

Regarding data preparation: I used the COCO dataset in the article, which is available here. That dataset contains photos for training and XML files with the correct object locations in each photo. The shell script linked in my article prepares the folder structure and the training data.

If you want to train the model on your own set of photos, I would suggest reading the following article explaining how to prepare your own dataset for training: Raccoon Detector

Once you have prepared the dataset and trained your own model, you can export it to a frozen_inference_graph.pb file. How to use a TensorFlow model (frozen_inference_graph.pb) from C# is covered in the article. Hope this helps.