TensorFlow Image Segmentation: Two Quick Tutorials

TensorFlow lets you use deep learning techniques to perform image segmentation, a crucial part of computer vision. There are many ways to perform image segmentation, including Convolutional Neural Networks (CNN), Fully Convolutional Networks (FCN), and frameworks like DeepLab and SegNet.

In this article, we’ll explain the basics of image segmentation, provide two quick tutorials for building and training your models in TensorFlow, and explain how to automatically manage multiple TensorFlow projects through MissingLink’s deep learning platform.

Image Segmentation in Deep Learning: Concepts and Techniques

Image segmentation involves dividing a visual input into segments to simplify image analysis. Segments represent objects or parts of objects, and comprise sets of pixels, or “super-pixels”. Image segmentation sorts pixels into larger components. There are three levels of image analysis:

Classification – categorizing the image into a class such as “people” or “animals”

Object detection – detecting objects within an image and drawing a rectangle around them

Segmentation – identifying parts of the image and understanding what object they belong to

There are two types of segmentation: semantic segmentation, which classifies each pixel of an image into a meaningful class, and instance segmentation, which additionally distinguishes the individual objects that belong to the same class.
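To make the distinction concrete, here is a minimal sketch using NumPy on a toy mask. The mask values, the `instance_to_class` lookup and the function name are all made up for illustration; real masks come from an annotation tool or a model.

```python
import numpy as np

# Hypothetical instance mask: 0 is background, 1 and 2 are two separate objects.
instance_mask = np.array([
    [1, 1, 0, 2],
    [1, 0, 0, 2],
])

# Hypothetical lookup from instance id to class id (0 = background, 1 = "cat").
instance_to_class = {0: 0, 1: 1, 2: 1}

def instance_to_semantic(mask, id_to_class):
    """Collapse an instance mask into a semantic mask of class ids."""
    semantic = np.zeros_like(mask)
    for instance_id, class_id in id_to_class.items():
        semantic[mask == instance_id] = class_id
    return semantic

semantic_mask = instance_to_semantic(instance_mask, instance_to_class)
print(semantic_mask)
```

In the semantic mask both objects carry the same class id, so the information about which pixels belong to which individual object is lost. The instance mask keeps it.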

The following deep learning techniques are commonly used to power image segmentation tasks:

Convolutional Neural Networks (CNNs) – segments of an image can be fed as input to a CNN, which labels the pixels. The CNN cannot process the whole image at once. It scans the image, looking at a small “filter” of several pixels each time.

Fully Convolutional Networks (FCNs) – FCNs use convolutional layers to process varying input sizes. The final output layer has a large receptive field and corresponds to the height and width of the image, while the number of channels corresponds to the number of classes. FCNs classify every pixel to determine image context and the location of objects.
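The output shape described above (height × width × number of classes) can be turned into a label map by picking the highest-scoring channel for each pixel. The sketch below assumes NumPy and uses made-up scores; the function name `logits_to_label_map` is illustrative and not part of any library.

```python
import numpy as np

def logits_to_label_map(logits):
    """Collapse an (H, W, num_classes) score volume into an (H, W) label map."""
    if logits.ndim != 3:
        raise ValueError("expected an array of shape (H, W, num_classes)")
    # For every pixel, pick the channel (class) with the highest score.
    return np.argmax(logits, axis=-1).astype(np.uint8)

# Toy 2x2 "image" with 3 classes; the scores are invented for illustration.
toy_logits = np.array([
    [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1]],
    [[0.0, 0.4, 3.0], [1.1, 1.0, 0.9]],
])
label_map = logits_to_label_map(toy_logits)
print(label_map)
```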

DeepLab – an image segmentation framework that helps control signal decimation (reducing the number of samples and data the network must process), and aggregate features from images at different scales. DeepLab uses a ResNet architecture pre-trained on ImageNet for feature extraction. It uses a technique called Atrous Spatial Pyramid Pooling (ASPP) to process multi-scale information.

SegNet neural network – an architecture based on deep encoders and decoders, designed for semantic pixel-wise segmentation. It encodes the input image into a low-dimensional representation and then recovers it in the decoder, leveraging orientation invariance. This generates a segmented image at the decoder output.

Scaling Up Image Segmentation Tasks on TensorFlow with MissingLink

If you’re working on image segmentation, you probably have a large dataset and need to run experiments on several machines. This can become challenging, and you might find yourself working hard on setting up machines, copying data and troubleshooting.

MissingLink is a deep learning platform that lets you effortlessly scale TensorFlow image segmentation across many machines, either on-premise or in the cloud. It also helps manage large data sets, view hyperparameters and metrics across your entire team on a convenient dashboard, and manage thousands of experiments easily.

Quick Tutorial #1: FCN for Semantic Segmentation with a Pre-Trained VGG16 Model

The images below show the implementation of a fully convolutional network (FCN). The input to the net is the RGB image on the right. The net produces a pixel-wise annotation as a matrix the same size as the image, in which the value of each pixel corresponds to its class (see the image on the left).

Begin by downloading a pre-trained VGG16 model here or here, and place it in the /Model_Zoo subfolder of the primary code folder.

The steps below are a summary; see the full instructions by Sagieppel.

1. Training

In: TRAIN.py

Set folder of the training images in Train_Image_Dir

Set folder for the ground truth labels in Train_Label_DIR

Download a pretrained VGG16 model and put it in model_path

Set number of classes/labels in NUM_CLASSES

Run training script
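Before running the training script, it can help to confirm that every training image has a label map. The sketch below is not part of the tutorial's code. It assumes the training labels follow the same convention the tutorial states for evaluation labels (a PNG with the same base name as the image), and the function name and extension list are hypothetical.

```python
import tempfile
from pathlib import Path

# Image extensions to look for; an assumption, adjust to your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def find_missing_labels(image_dir, label_dir):
    """Return names of images that have no same-named .png label map."""
    label_stems = {p.stem for p in Path(label_dir).glob("*.png")}
    missing = []
    for image in sorted(Path(image_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # skip non-image files
        if image.stem not in label_stems:
            missing.append(image.name)
    return missing

# Demo on throwaway folders so the sketch runs without real data.
with tempfile.TemporaryDirectory() as tmp:
    images, labels = Path(tmp, "images"), Path(tmp, "labels")
    images.mkdir()
    labels.mkdir()
    for name in ("a.jpg", "b.jpg"):
        (images / name).touch()
    (labels / "a.png").touch()
    demo_missing = find_missing_labels(images, labels)
    print(demo_missing)
```

With real data, you would pass the folders you set in Train_Image_Dir and Train_Label_DIR and fix any reported files before training.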

2. Predicting pixel-wise annotation using the trained VGG network

In: Inference.py

Set the Image_Dir to the folder where the input images for prediction are located.

Set the number of classes in NUM_CLASSES

Set Pred_Dir to the folder where you want the output annotated images to be saved

Run script

3. Evaluating network performance using Intersection over Union (IOU)

In: Evaluate_Net_IOU.py

Set the Image_Dir to the folder where the input images for prediction are located

Set folder for ground truth labels in Label_DIR. The label maps should be saved as PNG images with the same name as the corresponding image and a .png extension

Set the number of classes in NUM_CLASSES

Run script
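For reference, Intersection over Union for one class is the number of pixels labeled with that class in both the prediction and the ground truth, divided by the number of pixels labeled with it in either. The following is a minimal NumPy sketch of that definition, not the code in Evaluate_Net_IOU.py; the function name and the toy arrays are illustrative.

```python
import numpy as np

def per_class_iou(pred, truth, num_classes):
    """Return a list with the IoU of each class (NaN if the class is absent)."""
    pred = np.asarray(pred)
    truth = np.asarray(truth)
    if pred.shape != truth.shape:
        raise ValueError("prediction and ground truth must have the same shape")
    ious = []
    for class_id in range(num_classes):
        pred_c = pred == class_id
        truth_c = truth == class_id
        union = np.logical_or(pred_c, truth_c).sum()
        if union == 0:
            # Class appears in neither map, so IoU is undefined.
            ious.append(float("nan"))
            continue
        intersection = np.logical_and(pred_c, truth_c).sum()
        ious.append(float(intersection / union))
    return ious

# Toy 2x2 label maps with two classes.
pred = [[0, 0], [1, 1]]
truth = [[0, 1], [1, 1]]
ious = per_class_iou(pred, truth, num_classes=2)
print(ious)
```

Averaging the per-class values while ignoring NaN entries (for example with `np.nanmean`) gives the mean IoU that is commonly reported.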

Quick Tutorial #2: Modifying the DeepLab Code to Train on Your Own Dataset

DeepLab is a semantic image segmentation technique based on deep learning, which uses an ImageNet pre-trained ResNet as its primary feature extractor network. It modifies the ResNet with a block that uses atrous (dilated) convolutions rather than regular convolutions.
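An atrous convolution inserts gaps between the kernel taps, so the same number of weights covers a wider stretch of the input. The one-dimensional NumPy sketch below illustrates the idea only; it is not DeepLab code, and the function name and the `rate` parameter name are illustrative. As in most deep learning frameworks, the kernel is applied without flipping.

```python
import numpy as np

def atrous_conv1d(signal, kernel, rate=1):
    """Apply a 1-D kernel whose taps are spaced `rate` samples apart."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    # Stretch of input covered by the kernel once gaps are inserted.
    span = (len(kernel) - 1) * rate + 1
    out_len = len(signal) - span + 1
    if out_len <= 0:
        raise ValueError("signal is shorter than the dilated kernel")
    out = np.empty(out_len)
    for i in range(out_len):
        # Take every `rate`-th sample inside the window, then weight and sum.
        window = signal[i:i + span:rate]
        out[i] = np.dot(window, kernel)
    return out

signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
regular = atrous_conv1d(signal, kernel, rate=1)  # kernel spans 3 samples
dilated = atrous_conv1d(signal, kernel, rate=2)  # same kernel spans 5 samples
print(regular, dilated)
```

With `rate=1` this is a regular convolution. With `rate=2` the three weights see a five-sample window, which is how atrous convolutions enlarge the receptive field without adding parameters.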

Prerequisites: Before you begin, install one of the DeepLab implementations in TensorFlow. See TensorFlow documentation for more details.

The following is a summary of the tutorial steps; for the full instructions and code, see Beeren Sahu.

1. Preparing Dataset

Define what your dataset will be used for. Name your new dataset “PQR”. Create a folder “PQR” as: tensorflow/models/research/deeplab/datasets/PQR

Begin by collecting input images and their pre-segmented counterparts, which serve as ground truth for training. Segmented images should be color-indexed images, and input images should be color images. See the PASCAL dataset.

Create a folder named dataset inside PQR, with the following directory structure:

+ dataset
    - JPEGImages
    - SegmentationClass
    - ImageSets
+ tfrecord
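The layout above can also be created programmatically. This is a minimal sketch using only the Python standard library. It assumes that JPEGImages, SegmentationClass and ImageSets sit inside dataset and that tfrecord sits next to dataset under PQR; the function name is hypothetical.

```python
import tempfile
from pathlib import Path

# Folder layout relative to the PQR folder, as read from the tree above.
SUBFOLDERS = (
    "dataset/JPEGImages",
    "dataset/SegmentationClass",
    "dataset/ImageSets",
    "tfrecord",
)

def create_pqr_layout(pqr_root):
    """Create the dataset folders under pqr_root and return their paths."""
    root = Path(pqr_root)
    created = []
    for sub in SUBFOLDERS:
        folder = root / sub
        # exist_ok makes the call safe to repeat on an existing layout.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created

# Demo in a temporary folder so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    demo_created = [p.relative_to(tmp).as_posix() for p in create_pqr_layout(tmp)]
    print(demo_created)
```

For the real dataset you would call the function with the PQR folder path, for example tensorflow/models/research/deeplab/datasets/PQR.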

2. Annotate input images

Use the SegmentationClass folder for the semantic segmentation annotation images that correspond to the color input images. These are the ground truth for the semantic segmentation. Color-index these images: every color index should correspond to one class, with a unique color per class. This mapping is called a color map.

3. Define lists of images for training and validation

In the ImageSets folder, define:

train.txt – list of image names for the training set

val.txt – list of image names for the validation set

trainval.txt – list of image names for training + validation set
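The three list files can be generated from the image names rather than written by hand. The sketch below uses only the standard library and rests on several assumptions: image names are listed one per line without a file extension (the PASCAL VOC convention), the file names are lowercase, and 20% of the images go to validation. The function name and the `val_fraction` and `seed` parameters are hypothetical.

```python
import random
import tempfile
from pathlib import Path

def write_image_sets(image_names, image_sets_dir, val_fraction=0.2, seed=0):
    """Split image names into train/val and write the three list files."""
    if not 0.0 <= val_fraction <= 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    names = sorted(image_names)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(names)
    n_val = int(len(names) * val_fraction)
    val, train = sorted(names[:n_val]), sorted(names[n_val:])
    splits = {"train": train, "val": val, "trainval": sorted(train + val)}
    out_dir = Path(image_sets_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for split, members in splits.items():
        (out_dir / f"{split}.txt").write_text("\n".join(members) + "\n")
    return splits

# Demo with ten made-up image names in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    sets_dir = Path(tmp) / "ImageSets"
    demo_splits = write_image_sets([f"img{i}" for i in range(10)], sets_dir)
    demo_train_lines = (sets_dir / "train.txt").read_text().splitlines()
    print(len(demo_splits["train"]), len(demo_splits["val"]))
```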

4. Remove the color map in the ground truth annotations

If your segmentation annotation images are RGB images, you can use a Python script to do this:

The palette specifies the “RGB:LABEL” pairs. In this sample code, (0,0,0):0 is background and (255,0,0):1 is the foreground class. Note that new_label_dir is where the raw segmentation data is kept.
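The conversion step described above can be sketched as follows. This is not the tutorial's original script: it covers only the array conversion with NumPy and leaves out reading and writing image files. The palette values come from the text; the function name is illustrative, and mapping unlisted colors to label 0 is an assumption.

```python
import numpy as np

# (R, G, B): label pairs from the text: black is background, red is foreground.
PALETTE = {(0, 0, 0): 0, (255, 0, 0): 1}

def remove_color_map(rgb, palette=PALETTE):
    """Convert an (H, W, 3) RGB annotation into an (H, W) array of labels."""
    rgb = np.asarray(rgb)
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("expected an array of shape (H, W, 3)")
    # Pixels whose color is not in the palette keep label 0 (an assumption).
    labels = np.zeros(rgb.shape[:2], dtype=np.uint8)
    for color, label in palette.items():
        # True where all three channels match this palette color.
        matches = np.all(rgb == np.array(color), axis=-1)
        labels[matches] = label
    return labels

# Toy 2x2 annotation: red pixels on the anti-diagonal, black elsewhere.
toy_rgb = np.array([
    [[0, 0, 0], [255, 0, 0]],
    [[255, 0, 0], [0, 0, 0]],
], dtype=np.uint8)
toy_labels = remove_color_map(toy_rgb)
print(toy_labels)
```

To apply this to files, one option is Pillow: load each annotation with `np.array(Image.open(path).convert("RGB"))`, convert it, and save the result with `Image.fromarray(labels).save(...)` into the folder for raw labels.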

Next, convert the image dataset to a TensorFlow record (TFRecord) with a conversion script. Create a new copy of the script file ./dataset/download_and_convert_voc2012.sh as ./dataset/convert_pqr.sh. The converted dataset will be saved at ./deeplab/datasets/PQR/tfrecord

To train the model on your dataset, run the train.py file in the research/deeplab/ folder. The script train-pqr.sh will do this automatically.

You can specify the number of training iterations in the variable NUM_ITERATIONS, and set --tf_initial_checkpoint to the location where you have downloaded or pre-trained the model and saved the *.ckpt files. The final trained model is saved in the TRAIN_LOGDIR directory.

Lastly, run this script from the …/research/deeplab directory:

# sh ./train-pqr.sh

TensorFlow Image Segmentation in the Real World

In this article, we explained the basics of image segmentation with TensorFlow and provided two tutorials, which show how to perform segmentation using advanced models and frameworks like VGG16 and DeepLab. When you start working on real-life image segmentation projects, you’ll run into some practical challenges:

Tracking experiments

Tracking experiment source code, configuration, and hyperparameters. Image segmentation requires complex computer vision architectures and will often involve a lot of trial and error to discover the model that suits your needs. Organizing, tracking and sharing experiment data will become difficult over time.

Scaling up your experiments

Image segmentation requires heavy CNN architectures like VGG and ResNet which might require days or weeks to run. The only way to run multiple experiments will be to scale up and out across multiple GPUs and machines. Setting up these machines and distributing the work between them is a serious challenge.

Managing training data

Image segmentation involves large datasets. Copying these datasets to each training machine, then re-copying when you change project or fine tune the training examples, is time-consuming and error-prone. You need an automatic process that will prepare the required datasets on each training machine.

MissingLink is a deep learning platform that does all of this for you and lets you concentrate on building the most accurate model. Learn more to see how easy it is.