Quick and easy guide to prepare a Detectron sandbox

Object detection is an exciting topic in deep learning. It takes the basic idea of classifying an object in an image to the next level: finding where the object(s) are inside the image, and even what their bounding contours are. There are many deep learning models for it, and some of them are even available as ONNX models. One of those models is YOLO, and I have a post on hosting and consuming it using Azure Machine Learning Service.

Facebook AI Research released a new system called Detectron in 2018. Detectron is based on the Caffe2 deep learning library and can be used with several underlying architectures, such as Mask R-CNN and RetinaNet. Judging from its home page it looked cool, especially since it can detect the contours of the objects it identifies, so I wanted to give it a go.

Using Detectron is supposed to be straightforward according to its getting started page on GitHub. Unfortunately, this is not the case, especially with the latest changes to Caffe2. Caffe2 was merged into PyTorch sometime in 2018, and most of the guides for building a Caffe2 environment to try Detectron did not work. I tried different combinations but kept hitting walls, though that could be a problem of mine only.

Anyway, I decided to try again, and instead of preparing a Caffe2 environment I switched to creating a PyTorch environment, assuming it would have a valid Caffe2 setup. That approach worked fine as of January 2019, so I thought I would share it. So let’s get cooking!

Create a PyTorch GPU-enabled docker container

We will not start from scratch; the easy way is to create a GPU-enabled docker container of PyTorch. The official PyTorch images are listed on the PyTorch page on Docker Hub. There are many different images, but I picked pytorch/pytorch:nightly-devel-cuda9.2-cudnn7 for two reasons. First, it has a recent build of PyTorch, which means it will probably have Caffe2 bundled. Second, it uses CUDA 9.2; as I recall, I tried the one with CUDA 10 and failed somewhere along the way. The Detectron documentation says it has been tested on certain versions of CUDA and Python, so pulling the latest and greatest might not always work.

Another requirement for this setup is nvidia-docker. Since we will run a GPU-enabled build of PyTorch, the GPU has to be passed through from the host to the container. On a Windows box that might be hard unless you have a recent version of Windows Server (2016 or maybe 2019). I don’t know about the macOS options, but unless you run a Linux distro, we need the help of the cloud. Azure came to the rescue here: I can simply create a deep learning virtual machine. There are a bunch of settings to provision it, but they are mainly about OS flavour and hardware specs. A Linux VM with a single GPU will be good enough, and it also comes with docker, nvidia-docker and the GPU drivers pre-installed.

You can create a free trial Azure account if you don’t have one, but the same idea applies to any other cloud: we just need a Linux VM with an NVIDIA GPU, nvidia-docker and the drivers installed.

Once the VM is created, its public IP address is shown in the Azure portal; grab it and use it to SSH into the VM. The first step is to pull the needed image and spawn a container from it.
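The commands for this step are not shown here, so below is a minimal sketch that only assembles the argument lists for `docker pull` and `nvidia-docker run`. It assumes the nvidia-docker wrapper is on the VM's PATH, and `detectron` is a hypothetical container name. The lists can be printed as shell commands or passed to `subprocess.run`.

```python
import shlex

IMAGE = "pytorch/pytorch:nightly-devel-cuda9.2-cudnn7"  # the image picked above


def pull_command(image=IMAGE):
    """Argument list for pulling the PyTorch image from Docker Hub."""
    return ["docker", "pull", image]


def run_command(image=IMAGE, name="detectron"):
    """Argument list for an interactive container with GPU pass-through.

    "detectron" is a hypothetical container name; the nvidia-docker wrapper
    accepts the usual `docker run` flags such as -it and --name.
    """
    return ["nvidia-docker", "run", "-it", "--name", name, image, "/bin/bash"]


if __name__ == "__main__":
    # Print the shell commands; on the VM they could also be executed with
    # subprocess.run(cmd, check=True).
    for cmd in (pull_command(), run_command()):
        print(shlex.join(cmd))
```

Naming the container makes later steps (copying files out, committing it as an image) easier to refer to.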

To check that the installation is fine, we will run the tests mentioned on Detectron’s installation page, such as:

python ./detectron/tests/test_spatial_narrow_as_op.py
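Since the whole approach assumes the PyTorch image bundles a working Caffe2, a quick sanity check before running the Detectron tests can save time. Below is a minimal sketch modelled on the kind of one-liner Caffe2's install docs use (`workspace.NumCudaDevices()`). It returns `None` when Caffe2 cannot be imported. The `getattr` fallback to 0 is a defensive assumption for builds that might not expose that function.

```python
import importlib


def caffe2_gpu_count():
    """Return how many CUDA devices Caffe2 can see, or None if Caffe2 is missing."""
    try:
        workspace = importlib.import_module("caffe2.python.workspace")
    except ImportError:
        return None  # this Python environment has no usable Caffe2
    # Fall back to 0 in case a CPU-only build does not expose NumCudaDevices.
    return getattr(workspace, "NumCudaDevices", lambda: 0)()


if __name__ == "__main__":
    count = caffe2_gpu_count()
    if count is None:
        print("Caffe2 not found; the PyTorch image may not bundle it")
    else:
        print(f"Caffe2 found, CUDA devices visible: {count}")
```

A count of 0 inside the container usually points at the GPU pass-through rather than at Detectron itself.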

Fine, but we need to fix one thing before proceeding: OpenCV. The version installed during this process is 4.0.0 (at the time of writing), and it comes with two issues. The first is a missing dependency, which we will install now; the second is a breaking change in the findContours function, which we will come to later.

So first let’s install the missing dependency and verify OpenCV is fine.
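Detectron's getting-started guide exercises the whole stack, OpenCV included, through its demo script tools/infer_simple.py. Below is a minimal sketch that assembles that call. It assumes it is run from the Detectron repo root with one of the configs in configs/12_2017_baselines. The weights URL is a hypothetical placeholder to be replaced with the model_final.pkl link from Detectron's model zoo or getting-started guide.

```python
import shlex

# Assumed config path, relative to the Detectron repo root.
CFG = "configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml"
# Hypothetical placeholder: substitute the model_final.pkl URL from the model zoo.
WEIGHTS_URL = "<model_final.pkl URL from the Detectron model zoo>"


def infer_command(cfg=CFG, weights=WEIGHTS_URL,
                  output_dir="/tmp/detectron-visualizations",
                  image_ext="jpg", inputs="demo"):
    """Argument list for running Detectron's tools/infer_simple.py demo."""
    return [
        "python", "tools/infer_simple.py",
        "--cfg", cfg,                # model architecture and settings
        "--output-dir", output_dir,  # one PDF per input image is written here
        "--image-ext", image_ext,    # which files in `inputs` are picked up
        "--wts", weights,            # the script downloads these weights
        inputs,                      # folder (or single image) to run on
    ]


if __name__ == "__main__":
    print(shlex.join(infer_command()))
```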

The above script first downloads the weights of a specific model, then uses this model to run inference on the images in the demo folder. It is supposed to generate one PDF per input image, containing the image itself with the identified objects highlighted. Running it, you will probably get the following error:

This error occurs because the test scripts that come with Detectron assume OpenCV 3.x, while we now have 4.0.0. The findContours function returned three values in 3.x, but in 4.0.0 it returns only two. It’s a small breaking change that we need to work around.
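One version-agnostic way to absorb this change is to keep only the last two items of whatever findContours returns. This is offered as an alternative sketch, not the fix used in this post, and the helper name is hypothetical.

```python
def split_find_contours(result):
    """Return (contours, hierarchy) from a cv2.findContours() result.

    OpenCV 3.x returns (image, contours, hierarchy) while OpenCV 4.x returns
    (contours, hierarchy). In both cases the last two items are the ones the
    visualization code needs, so slicing them off avoids a version check.
    """
    contours, hierarchy = result[-2:]
    return contours, hierarchy
```

A call site could then look like `contours, hierarchy = split_find_contours(cv2.findContours(mask.copy(), cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE))`, where `mask` stands for whatever binary mask is being outlined.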

We could use one of those lovely Linux text editors to edit /detectron/utils/vis.py and fix the two occurrences that expect the extra return value. Instead, I decided to fork the Detectron repo and add a small version check like the following.

That will not work for 4.1, but that’s another story. All that’s needed now is to drop the current Detectron folder, pull the forked version and build it.
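The exact commands for this step are not shown here. Below is a minimal sketch that assumes the checkout lives at /detectron and uses a hypothetical placeholder for the fork URL (substitute your own fork). By default it is a dry run that only returns the steps. With `dry_run=False` it deletes the folder, clones the fork and runs `make` in the repo root, which is how Detectron's install guide builds it.

```python
import shutil
import subprocess

# Hypothetical placeholder: replace with the URL of your own Detectron fork.
FORK_URL = "https://github.com/<your-user>/Detectron.git"


def rebuild_detectron(repo_dir="/detectron", fork_url=FORK_URL, dry_run=True):
    """Drop the existing checkout, clone the fork and build it with make.

    Returns the command lists; nothing is executed unless dry_run is False.
    """
    steps = [
        ["git", "clone", fork_url, repo_dir],
        ["make", "-C", repo_dir],  # same as `cd repo_dir && make`
    ]
    if dry_run:
        return steps
    shutil.rmtree(repo_dir, ignore_errors=True)  # drop the current Detectron folder
    for cmd in steps:
        subprocess.run(cmd, check=True)  # stop at the first failing step
    return steps
```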

Congrats, nothing failed! If we list the files in the output folder /tmp/detectron-visualizations, we should see the generated PDF files.
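Listing the output folder with `ls` is enough. If you would rather check it from a script, here is a small sketch that returns the sorted PDF names and an empty list when the folder does not exist yet. The function name is hypothetical.

```python
from pathlib import Path


def list_visualizations(output_dir="/tmp/detectron-visualizations"):
    """Return the sorted names of the PDF files found in output_dir."""
    folder = Path(output_dir)
    if not folder.is_dir():
        return []  # the demo has not written anything yet
    return sorted(p.name for p in folder.glob("*.pdf"))


if __name__ == "__main__":
    for name in list_visualizations():
        print(name)
```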

How you pull those PDFs out for inspection is up to you. I used docker to copy them from the container to the host VM. Then, on the host VM, I put them in a folder under /notebooks, as the Azure deep learning VM comes with Jupyter installed and published externally on port 8000 (over HTTPS). That way I could open them in the browser on my Windows laptop and download the PDFs locally.
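The copy step can be done with `docker cp CONTAINER:SRC_PATH DEST_PATH`, run on the host VM rather than inside the container. Below is a minimal sketch that assumes the hypothetical container name `detectron` and a hypothetical destination folder under /notebooks.

```python
import shlex


def copy_out_command(container="detectron",
                     src="/tmp/detectron-visualizations",
                     dest="/notebooks/detectron-output"):
    """Argument list for copying the PDF folder from the container to the host.

    Both the container name and the destination folder are hypothetical.
    """
    return ["docker", "cp", f"{container}:{src}", dest]


if __name__ == "__main__":
    print(shlex.join(copy_out_command()))
```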

Trying Detectron on some of my own images yielded results like the following:

If you have high-resolution images, Detectron can find small objects that you may not even notice when viewing the image without zooming 😃

Now what

That was nice and cool, but before we forget, it’s a good idea to open another SSH window and commit that container as a local image. We can also push it to Docker Hub so other people can use it directly.
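`docker commit` snapshots the container's current filesystem as a new image, and `docker push` uploads it to Docker Hub once you are logged in with `docker login`. Below is a sketch with a hypothetical container name and a placeholder repository name.

```python
import shlex


def snapshot_commands(container="detectron", repo="<docker-hub-user>/detectron",
                      tag="latest", push=False):
    """Argument lists for committing the container and optionally pushing it.

    The container name and repository are hypothetical placeholders.
    """
    image = f"{repo}:{tag}"
    cmds = [["docker", "commit", container, image]]
    if push:
        cmds.append(["docker", "push", image])  # requires `docker login` first
    return cmds


if __name__ == "__main__":
    for cmd in snapshot_commands(push=True):
        print(shlex.join(cmd))
```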

Exporting Detectron models to ONNX format so that they can be consumed more easily.

Publishing a web service from this image using Flask or .NET Core. The PDFs with object masks are a cool feature of the Detectron source code, meant for testing and verification. To consume Detectron in a real application, though, we need the plain inference output so the calling application can interpret and use it.
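Here is a hedged sketch of what that plain output could look like. It assumes the per-class box layout Detectron's demo code works with: for each class index, rows of [x1, y1, x2, y2, score], with index 0 as background. The hypothetical helper below flattens detections into JSON-friendly records that a Flask or .NET Core service could return. Masks and keypoints are left out, and the 0.7 score threshold is an assumed default.

```python
import json


def detections_to_records(cls_boxes, class_names, score_thresh=0.7):
    """Flatten per-class boxes into a list of JSON-serializable dicts.

    cls_boxes[k] is assumed to be an iterable of [x1, y1, x2, y2, score]
    rows for class index k, with index 0 being background.
    """
    records = []
    for class_id, boxes in enumerate(cls_boxes):
        if class_id == 0 or boxes is None:
            continue  # skip the background slot and empty entries
        for x1, y1, x2, y2, score in boxes:
            if score < score_thresh:
                continue  # drop low-confidence detections
            records.append({
                "label": class_names[class_id],
                "score": round(float(score), 3),
                "box": [float(x1), float(y1), float(x2), float(y2)],
            })
    records.sort(key=lambda r: r["score"], reverse=True)  # most confident first
    return records


if __name__ == "__main__":
    sample = [[], [[10, 20, 110, 220, 0.95]], [[5, 5, 50, 40, 0.4]]]
    print(json.dumps(detections_to_records(sample, ["__background__", "person", "dog"])))
```

Keeping the service's response to plain labels, scores and boxes lets the caller decide how to draw or use them, instead of receiving rendered PDFs.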

P.S. The image below the post title is a Flickr image fed to the same test script above.