Auto-tagging Interior Design Styles

March 9, 2017 / Business, Developers, Tutorials

Sorting content into categories is a key task for recommendation systems, as well as for general data management. We’ve talked a lot about text data lately — for example, using topic tags to improve article suggestions for your readership, and how you can build a custom text classifier for your specific industry or task. We didn’t even need large datasets to build those tools — and the same applies to images. Using our customizable machine learning API, Custom Collections, you can easily build a model to auto-tag images, streamlining and improving visual content recommendations and management.

The Task

Build a simple interior design style classifier using Custom Collections. More specifically, we’ll be testing to see how well it can distinguish three fairly similar styles: contemporary, minimalist, and industrial.

The Data

Our dataset consists of 21 images of rooms — 7 per style — that I grabbed from Google Images (all labeled for reuse). That’s right, just 21. How is this possible? Custom Collections uses a machine learning technique called transfer learning, which allows us to build models with very small datasets. In fact, depending on the difficulty of the task you’re trying to achieve, you’ll start seeing diminishing returns after the first few thousand data points.
Take note, though: it’s generally better to have 10 or more samples for each category, depending on the difficulty of the problem you’re trying to solve. Good free-for-use images are hard to find, however, so we’ll have to make do with 7.

Training the Model

If you want to follow along, clone the dataset and skeleton code from the GitHub repo. We’ll be working in Python.
Before we go any further, have you set up your free indico account yet? In case you haven’t, follow our Quickstart Guide. It will walk you through the process of getting your API key and installing the indicoio Python library. If you run into any problems, check the Installation section of the docs, or reach out to us through that little chat bubble.
Assuming your account is all set up and you’ve installed everything, let’s get started.

Step 1: Labeling the Data

If you’re working with the dataset I provided (located in the images folder), you’ll see that each image is named after the style it represents. Open up main.py. In the generate_training_data function, you’ll see that we grabbed those filenames and used them as the labels for each image. If you decide to use your own unlabeled dataset, you can use our CrowdLabel tool. I’m no design expert, so I may have inaccurately labeled a few of these images. CrowdLabel allows multiple people on your team to separately label datasets, increasing labeling accuracy by only using examples to which multiple people have assigned the same label. Using CrowdLabel also lets you skip all of the code in this tutorial 😛

Step 2: Training Your Collection

The generate_training_data function processed all our data and labels and prepared them to be passed into the Custom Collections API, which takes a list of examples, each paired with a single label.
Now we can train our model! It’s actually incredibly easy.
Go to the top of your file and import indicoio. Don’t forget to set your API key — there are a number of ways you can do it; I like to put mine in a configuration file.
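The training step might look like the following sketch, based on the indicoio custom Collection client (the collection name "room_styles" is a placeholder, and generate_training_data is the function from main.py described above):

```python
import indicoio
from indicoio.custom import Collection
from tqdm import tqdm

indicoio.config.api_key = "YOUR_API_KEY"  # or load it from a configuration file

collection = Collection("room_styles")  # placeholder collection name

# Upload the [image, label] pairs one at a time so tqdm can show progress.
for example in tqdm(generate_training_data()):
    collection.add_data([example])

collection.train()
collection.wait()  # blocks until the model has finished training
```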

Just like that. tqdm is a progress bar that will inform you about how much data has been uploaded, and .wait() will block until the training is complete. Since the dataset is so small, it should only take about a minute to train, depending on how fast your Internet connection is.
Calling collection.info() will check your Collection’s status, and return metrics that are useful indicators of the model’s performance. However, larger training set sizes are recommended for more reliable precision and recall metrics, so we’ll use a more hands-on way to test our model instead.
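For reference, checking the status is just one more call on the collection object (a sketch, using the same placeholder collection name as above):

```python
import indicoio
from indicoio.custom import Collection

indicoio.config.api_key = "YOUR_API_KEY"

collection = Collection("room_styles")  # placeholder collection name
print(collection.info())  # status, plus summary metrics for the trained model
```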

Testing the Model

First, let’s run some test examples through our model to see how it performs for all the categories. I set aside some images in the test_images folder that weren’t in the training dataset. Comment out the code for training the model under if __name__ == "__main__", and then run the following code.
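The test loop might look like this sketch (again assuming the indicoio custom Collection client; use whatever collection name you trained with above):

```python
import os

import indicoio
from indicoio.custom import Collection

indicoio.config.api_key = "YOUR_API_KEY"

collection = Collection("room_styles")  # placeholder collection name

for filename in sorted(os.listdir("test_images")):
    path = os.path.join("test_images", filename)
    result = collection.predict(path)  # dict mapping each style to a probability
    print(filename, result)
```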

Generally, we can assume that the highest probability result is the category that the model thinks the image most likely belongs to. Your results should appear as below (note that slight variations in the numbers are normal — they should just be roughly the same). The test images for each category appear above the results here.
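Turning a prediction into a single tag is then a one-liner; here’s a hypothetical helper (the probabilities below are made-up numbers for illustration):

```python
def top_category(result):
    """Return the label with the highest predicted probability."""
    return max(result, key=result.get)

# Illustrative (made-up) prediction dict:
prediction = {"contemporary": 0.72, "minimalist": 0.19, "industrial": 0.09}
print(top_category(prediction))  # contemporary
```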

Looks like the model did alright! If, however, the model had not performed satisfactorily, we could try adding more examples of the underperforming category to the Collection’s training dataset, and retrain the model.

Next Steps

Where to from here? Try expanding the system by adding more styles, or applying this tutorial to other categories, like clothes, food, or art. Or, go a step further — can you adapt our fashion matching tutorial, which also uses the structure of a classification problem, to build a model that matches pieces of furniture to the style of already existing rooms? Share your projects with us at contact@indico.io!