Images with directories as labels for TensorFlow data

A common format for storing images and labels is a directory tree: the data directory contains a set of directories, each named after a label and each containing the samples for that label. Datasets used for transfer learning in image classification are often provided in this structure.

Update May 2018: If you would like an approach that doesn’t convert to TFRecords, and instead uses tf.data to read directly from disk, I have done this when making the input function for my Dogs vs Cats transfer learning classifier.

Data layout

As an example, the directory may be laid out like so:

data
    train
        dog
            1.jpg, 2.jpg, …, n.jpg
        cat
            1.jpg, 2.jpg, …, n.jpg
    validation
        dog
            1.jpg, 2.jpg, …, n.jpg
        cat
            1.jpg, 2.jpg, …, n.jpg
    test
        unknown
            1.jpg, 2.jpg, …, n.jpg

If we want to use the TensorFlow Dataset API, one option is to use tf.contrib.data.Dataset.list_files with a glob pattern. This gives us a dataset of file path strings, and we could then use tf.read_file and tf.image.decode_jpeg to map in the actual image. The downside of this approach is reading in the label. Each path is a string tensor, so I found it cumbersome to do path manipulation, extract the folder name and map that to an integer label. Following on from my last post on Convert and using the MNIST dataset as TFRecords, we will instead convert this dataset to TFRecords so we can use a very similar input function.

Label preparation

We make the assumption that directory names are the labels. To keep this generic and easy to reuse, we can list the directories and create a name-to-integer mapping.
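As a rough illustration of that mapping, here is a minimal sketch in plain Python. The function name build_label_map and the choice to sort the names alphabetically (so the integers are stable between runs) are my own assumptions for this example, not necessarily what the final conversion function does.

```python
import os


def build_label_map(split_dir):
    """Map each sub-directory name in split_dir to an integer label.

    Hypothetical helper for illustration: names are sorted so that the
    same directory layout always yields the same integers.
    """
    names = sorted(
        entry
        for entry in os.listdir(split_dir)
        # Only directories count as labels; stray files are ignored.
        if os.path.isdir(os.path.join(split_dir, entry))
    )
    return {name: index for index, name in enumerate(names)}
```

With the layout above, build_label_map("data/train") would map cat to 0 and dog to 1, since the names are sorted before being numbered.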

Images to TFRecords

Prior to encoding the images and labels as TFRecords, there are a few other choices we can make to simplify things. One of these is the image dimensions. The images may not all be the same size, or it may be desirable to downscale rather than use the full-resolution image. It is also easier to use constants for these when training than to read the width, height and depth from Tensors or to resize in the training pipeline. In our cats vs dogs example, we will ensure our images are 224 x 224 x 3. I’ll dump the function here and then go over each step afterwards: