Prepare Datastore for Image-to-Image Regression

This example shows how to prepare a datastore for training an image-to-image regression network using the transform and combine functions of ImageDatastore.

The example preprocesses data using a pipeline suitable for training a denoising network, and then uses the preprocessed noisy data to train a simple convolutional autoencoder network that removes image noise.

Prepare Data Using Preprocessing Pipeline

This example uses a salt and pepper noise model in which a fraction of input image pixels are set to either 0 or 1 (black and white, respectively). Noisy images act as the network input. Pristine images act as the expected network response. The network learns to detect and remove the salt and pepper noise.

Load the pristine images in the digit data set as an imageDatastore. The datastore contains 10,000 synthetic images of digits from 0 to 9. The images are generated by applying random transformations to digit images created with different fonts. Each digit image is 28-by-28 pixels. The datastore contains an equal number of images per category.
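The loading step might look like the following minimal sketch. It assumes the digit data set is installed with Deep Learning Toolbox under the path shown. The ReadSize value and the up-front shuffle are illustrative assumptions that are not stated in the text above.

```matlab
% Path to the digit data set bundled with Deep Learning Toolbox (assumed location).
digitDatasetPath = fullfile(matlabroot,'toolbox','nnet', ...
    'nndemos','nndatasets','DigitDataset');

% Label each image with the name of the folder that contains it.
imds = imageDatastore(digitDatasetPath, ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');

% Assumption: read 500 images per call to reduce the cost of file I/O.
imds.ReadSize = 500;

% Assumption: shuffle once before splitting, because training later
% runs with the Shuffle option set to 'never'.
rng(0)
imds = shuffle(imds);
```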

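Use the splitEachLabel function to divide imds into three image datastores containing pristine images for training, validation, and testing. The proportions 0.95 and 0.025 assign 95% of the images in each category to training and 2.5% to validation; the remaining 2.5% go to testing.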

[imdsTrain,imdsVal,imdsTest] = splitEachLabel(imds,0.95,0.025);

Use the transform function to create noisy versions of each input image, which will serve as the network input. The transform function reads data from an underlying datastore and processes the data using the operations defined in the helper function addNoise (defined at the end of this example). The output of the transform function is a TransformedDatastore.

Use the combine function to combine the noisy images and pristine images into a single datastore that feeds data to trainNetwork. This combined datastore reads batches of data into a two-column cell array as expected by trainNetwork. The output of the combine function is a CombinedDatastore.
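The two steps above can be sketched as follows. This is a sketch under stated assumptions: it relies on imdsTrain, imdsVal, and imdsTest from the splitEachLabel call and on the addNoise helper described under Supporting Functions. The variable names dsTrainNoisy, dsValNoisy, dsTestNoisy, dsVal, and dsTest are chosen here for illustration.

```matlab
% Noisy versions of each pristine datastore (network input).
% Assumes imdsTrain, imdsVal, imdsTest and the addNoise helper exist.
dsTrainNoisy = transform(imdsTrain,@addNoise);
dsValNoisy   = transform(imdsVal,@addNoise);
dsTestNoisy  = transform(imdsTest,@addNoise);

% Pair each noisy input (column 1) with its pristine response (column 2).
dsTrain = combine(dsTrainNoisy,imdsTrain);
dsVal   = combine(dsValNoisy,imdsVal);
dsTest  = combine(dsTestNoisy,imdsTest);
```

Putting the noisy datastore first matters: trainNetwork treats the first column of the combined output as the input and the second column as the response.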

Use the transform function to perform additional preprocessing operations that are common to both the input and response datastores. The commonPreprocessing helper function (defined at the end of this example) resizes input and response images to 32-by-32 pixels to match the input size of the network, and normalizes the data in each image to the range [0, 1].
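A possible version of this step is sketched below, assuming dsTrain, dsVal, and dsTest are the combined datastores from the previous step. The body of commonPreprocessing is an illustrative guess at a helper that matches the description above (resize to 32-by-32, rescale to [0, 1]); it is not necessarily the original implementation.

```matlab
% Apply the shared preprocessing to input and response columns alike.
% Assumes dsTrain, dsVal, dsTest are combined datastores from the previous step.
dsTrain = transform(dsTrain,@commonPreprocessing);
dsVal   = transform(dsVal,@commonPreprocessing);
dsTest  = transform(dsTest,@commonPreprocessing);

function dataOut = commonPreprocessing(data)
% data is an N-by-2 cell array: {noisy input, pristine response} per row.
dataOut = cell(size(data));
for col = 1:size(data,2)
    for idx = 1:size(data,1)
        temp = single(data{idx,col});     % work in floating point
        temp = imresize(temp,[32 32]);    % match the network input size
        temp = rescale(temp);             % normalize each image to [0, 1]
        dataOut{idx,col} = temp;
    end
end
end
```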

Finally, use the transform function to add randomized augmentation to the training set. The augmentImages helper function (defined at the end of this example) applies randomized 90 degree rotations to the data. Identical rotations are applied to the network input and corresponding expected responses.

dsTrain = transform(dsTrain,@augmentImages);

Augmentation reduces overfitting and adds robustness to the presence of rotations in the trained network. Randomized augmentation is not needed for the validation or test data sets.

Preview Preprocessed Data

Because preparing the training data involves several preprocessing operations, use the preview function to preview the preprocessed data and confirm that it looks correct before training.

Visualize examples of paired noisy and pristine images using the montage function. The training data looks correct. Salt and pepper noise appears in the input images in the left column. Other than the addition of noise, the input image and response image are the same. Randomized 90 degree rotation is applied to both input and response images in the same way.
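The preview and visualization can be sketched as follows, assuming dsTrain is the fully preprocessed and augmented training datastore. The variable names are illustrative, and the montage size is derived from the number of previewed rows rather than hard-coded.

```matlab
% Read a small sample of paired data without advancing the datastore.
exampleData = preview(dsTrain);      % N-by-2 cell array
inputs      = exampleData(:,1);      % noisy inputs
responses   = exampleData(:,2);      % pristine responses

% Interleave the pairs so that each montage row shows input, then response.
minibatch = cat(2,inputs,responses);
numPairs  = size(minibatch,1);
montage(minibatch','Size',[numPairs 2])
title('Inputs (Left) and Responses (Right)')
```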

Define Convolutional Autoencoder Network

Convolutional autoencoders are a common architecture for denoising images. Convolutional autoencoders consist of two stages: an encoder and a decoder. The encoder compresses the original input image into a latent representation that is smaller in width and height, but deeper in the sense that it has many more feature maps per spatial location than the original input image. The compressed latent representation loses some spatial resolution, which limits its ability to recover high-frequency features of the original image, but it also learns not to include noisy artifacts in the encoding. The decoder repeatedly upsamples the encoded signal to return it to its original width, height, and number of channels. Because the encoder removes noise, the decoded final image has fewer noise artifacts.
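As an illustration of the encoder stage, the sketch below defines an image input layer and a stack of encoding layers. The filter counts (16, 8, 8), the 3-by-3 filter size, and the use of max pooling with stride 2 for downsampling are assumptions chosen for this sketch. A 32-by-32 input is convenient because 32 divides cleanly by 2, 4, and 8.

```matlab
% Grayscale 32-by-32 input, matching the size produced by commonPreprocessing.
imageLayer = imageInputLayer([32 32 1]);

% Three conv/ReLU/pool stages; each pooling layer halves width and height
% (32 -> 16 -> 8 -> 4). Filter counts are illustrative assumptions.
encodingLayers = [ ...
    convolution2dLayer(3,16,'Padding','same'), ...
    reluLayer, ...
    maxPooling2dLayer(2,'Padding','same','Stride',2), ...
    convolution2dLayer(3,8,'Padding','same'), ...
    reluLayer, ...
    maxPooling2dLayer(2,'Padding','same','Stride',2), ...
    convolution2dLayer(3,8,'Padding','same'), ...
    reluLayer, ...
    maxPooling2dLayer(2,'Padding','same','Stride',2)];
```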

Create the decoding layers. The decoder upsamples the encoded signal using a transposed convolution layer. Create the transposed convolution layer with the correct upsampling factor by using the createUpsampleTransponseConvLayer helper function. This function is defined at the end of this example.

The network uses a clippedReluLayer as the final activation layer to force outputs to be in the range [0, 1].
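The decoder might be assembled as in the sketch below. The layer counts and filter numbers are assumptions chosen to mirror a three-stage encoder, and the body of createUpsampleTransponseConvLayer is an illustrative guess at how such a helper could be written. A trailing regressionLayer is included because trainNetwork needs an output layer for regression.

```matlab
% Three upsampling stages (4 -> 8 -> 16 -> 32), then a 1-channel output.
decodingLayers = [ ...
    createUpsampleTransponseConvLayer(2,8), ...
    reluLayer, ...
    createUpsampleTransponseConvLayer(2,8), ...
    reluLayer, ...
    createUpsampleTransponseConvLayer(2,16), ...
    reluLayer, ...
    convolution2dLayer(3,1,'Padding','same'), ...
    clippedReluLayer(1.0), ...   % clip outputs to [0, 1]
    regressionLayer];            % mean-squared-error loss for trainNetwork

function out = createUpsampleTransponseConvLayer(factor,numFilters)
% Transposed convolution that upsamples width and height by "factor".
% For factor = 2: filter size 4, stride 2, cropping 1, so the output
% size is (in-1)*2 + 4 - 2*1 = 2*in.
filterSize = 2*factor - mod(factor,2);
cropping   = (factor - mod(factor,2))/2;
out = transposedConv2dLayer(filterSize,numFilters, ...
    'Stride',factor,'Cropping',cropping);
end
```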

Concatenate the image input layer, the encoding layers, and the decoding layers to form the convolutional autoencoder network architecture.

layers = [imageLayer,encodingLayers,decodingLayers];

Define Training Options

Train the network using the Adam optimizer. Specify the hyperparameter settings by using the trainingOptions function. Train for 100 epochs. Combined datastores (created when you use the combine function) do not support shuffling, so specify the Shuffle parameter as 'never'.
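These settings could be expressed as in the sketch below. The optimizer, epoch count, and Shuffle value come from the text above; the mini-batch size, progress plot, and Verbose setting are assumptions added for illustration.

```matlab
% Adam, 100 epochs, no shuffling (combined datastores cannot be shuffled).
options = trainingOptions('adam', ...
    'MaxEpochs',100, ...
    'MiniBatchSize',500, ...           % assumption: match the datastore ReadSize
    'Shuffle','never', ...
    'Plots','training-progress', ...   % assumption: show the training plot
    'Verbose',false);                  % assumption: suppress command-line output

% To monitor validation loss, a preprocessed validation datastore (named
% dsVal here for illustration) could be passed with 'ValidationData',dsVal.
```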

Train the Network

Now that the data source and training options are configured, train the convolutional autoencoder network using the trainNetwork function. A CUDA-capable NVIDIA™ GPU with compute capability 3.0 or higher is highly recommended for training.
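The training call itself is short. The sketch below assumes that dsTrain, layers, and options have been created as described in the preceding sections; net is an illustrative variable name.

```matlab
% Train the autoencoder; dsTrain yields {noisy input, pristine response} pairs.
% Assumes dsTrain, layers, and options already exist in the workspace.
net = trainNetwork(dsTrain,layers,options);
```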

Evaluate the Performance of the Denoising Network

Visualize a sample input image and the associated prediction output from the network to get a sense of how well denoising is working. As expected, the network output has most of the noise artifacts removed. The denoised image is slightly blurry as a result of the encoding and decoding process.

Assess the performance of the network by analyzing the peak signal-to-noise ratio (PSNR). The PSNR of the output image is higher than that of the noisy input image, as expected.
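The evaluation can be sketched as follows, assuming net is the trained network and dsTest is a preprocessed test datastore that returns {noisy input, pristine response} pairs. The variable names are illustrative.

```matlab
% Denoise the test set; ypred is 32-by-32-by-1-by-numImages.
ypred = predict(net,dsTest);

% Take the first test pair for a side-by-side comparison.
examples   = preview(dsTest);
noisyImage = examples{1,1};   % network input
reference  = examples{1,2};   % pristine response
denoised   = ypred(:,:,:,1);

montage({noisyImage,denoised})
title('Noisy Input (Left) and Network Output (Right)')

% PSNR of each image measured against the pristine reference.
psnrNoisy    = psnr(noisyImage,reference)
psnrDenoised = psnr(denoised,reference)
```

This comparison is only meaningful if preview returns the same first observation that predict processed first, which holds when the datastore is not shuffled between the two calls.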

Summary

This example showed how to use the transform and combine functions of ImageDatastore to set up the data preprocessing required for training and evaluating a convolutional autoencoder on the digit data set.

Supporting Functions

The addNoise helper function adds salt and pepper noise to images by using the imnoise function. The addNoise function requires the format of the input data to be a cell array of image data, which matches the format of data returned by the read function of ImageDatastore.
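A minimal sketch of such a helper is shown below. It uses the default noise density of imnoise, which is an assumption, because the text above does not state the fraction of corrupted pixels.

```matlab
function dataOut = addNoise(data)
% data is a cell array of images, as returned by read on an ImageDatastore
% whose ReadSize is greater than 1.
dataOut = data;
for idx = 1:size(data,1)
    % Default density (assumption): about 5% of pixels are corrupted.
    dataOut{idx} = imnoise(data{idx},'salt & pepper');
end
end
```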

The augmentImages helper function adds randomized 90 degree rotations to the data by using the rot90 function. Identical rotations are applied to the network input and corresponding expected responses. The function requires the format of the input data to be a two-column cell array of image data, which matches the format of data returned by the read function of CombinedDatastore.
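One way to write this helper is sketched below. The key point is that a single random rotation count is drawn per row and applied to both columns, so each input stays aligned with its response. The variable names are illustrative.

```matlab
function dataOut = augmentImages(data)
% data is an N-by-2 cell array: {noisy input, pristine response} per row.
dataOut = cell(size(data));
for idx = 1:size(data,1)
    k = randi(4) - 1;   % 0, 1, 2, or 3 quarter turns
    % Apply the same rotation to the input and to its response.
    dataOut(idx,:) = {rot90(data{idx,1},k), rot90(data{idx,2},k)};
end
end
```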
