How a Berlin startup beat the online giants at image recognition

Can a machine learn aesthetics the way a human would? Could it then look at one set of photos and apply those same aesthetics to a different set? It’s a big question, because the answer has long-term implications for how AI develops.

What are aesthetics anyway? Are they just what you “like”? How do they work? If you, as a human, find it hard to express what you like, is a machine going to find it easy?

Above is an image by Pantea Naghavi Anaraki, winner of the Photojournalist category at the 2016 EyeEm Awards. It depicts a mother in Tehran, Iran, who had to sell her possessions to pay for cancer treatment.

Now, does this photograph draw your attention? If so, why? Is it the mother’s expression? The mood created by her pose? The child’s pose? Or is it the light?

Clearly, it is very hard to translate thoughts about photographs and aesthetics into human language, let alone into a language a machine can use. Most photographers and visual experts say things like “I know it when I see it”.

For the past few years, EyeEm, a Berlin startup that serves photographers through its app and marketplace, has been working on identifying the core aesthetics of photographs. The resulting “EyeEm Vision” platform makes its marketplace run a great deal more efficiently, and its dataset is continually curated by expert photographers and photo editors.

EyeEm says its image recognition model generalizes well from just a few samples and can be trained in near real time on GPUs.

That means it can help curators interactively pick out representative photographs for a particular aesthetic. As a curator makes choices, the Vision platform suggests candidate photographs that are visually similar to the ones chosen so far. The video below shows one of EyeEm’s curators using this tool (played at 4x actual speed).
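The suggestion loop described above can be pictured as nearest-neighbour ranking. The sketch below is a minimal illustration, not EyeEm’s method: it assumes every photo has already been turned into a fixed-length feature vector (an embedding) by some image model, averages the curator’s picks into a single prototype, and ranks the remaining photos by cosine similarity to it. The function `suggest_candidates` and the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_candidates(embeddings, chosen, k=3):
    """Rank photos the curator has not chosen by similarity to those chosen.

    embeddings: dict mapping photo id -> feature vector (hypothetical;
        assumed to come from some pretrained image model).
    chosen: list of photo ids the curator has selected so far.
    Returns up to k unchosen photo ids, most similar first.
    """
    dim = len(next(iter(embeddings.values())))
    # Average the chosen vectors into one "aesthetic" prototype.
    prototype = [
        sum(embeddings[pid][i] for pid in chosen) / len(chosen)
        for i in range(dim)
    ]
    pool = [pid for pid in embeddings if pid not in chosen]
    pool.sort(key=lambda pid: cosine(embeddings[pid], prototype), reverse=True)
    return pool[:k]


# Toy 3-dimensional embeddings, invented for illustration.
photos = {
    "beach_sunset": [0.9, 0.1, 0.0],
    "desert_dusk": [0.8, 0.2, 0.1],
    "office_desk": [0.0, 0.1, 0.9],
    "city_night": [0.2, 0.9, 0.1],
}
print(suggest_candidates(photos, ["beach_sunset"], k=2))
# → ['desert_dusk', 'city_night']
```

Each new pick shifts the prototype, so the suggestions adapt as the curator works; a production system would more likely use learned embeddings and an approximate nearest-neighbour index than a linear scan.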

EyeEm may not have as many users as Instagram, but that doesn’t matter, CTO Ramzi Rizk tells me.

Recently, he and his team ran a neutral benchmark of EyeEm’s image recognition technology against the main players in the field, and the results were impressive: it came out ahead of Google, IBM, Clarifai, Amazon and Microsoft. In effect, EyeEm is adding “AI company” to the list of things it does.

To ensure neutrality, the team ran each system on the latest 200 images posted to five highly visual Instagram accounts (@instagram, @vsco, @foodnetwork, @redbull and @natgeotravel), photos that none of the systems had been trained on or seen before. The resulting keywords were evaluated on Amazon Mechanical Turk (MTurk): each keyword was shown anonymously, and workers marked it as correct or false.

EyeEm Vision came out on top. On average, 80% of the keywords it generated per photo were accurate, compared to 78% for Google Cloud Vision and 73% for Clarifai.
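One reading of the headline figure is an average of per-photo precision: for each photo, take the share of a system’s keywords that raters marked correct, then average that share over all photos, optionally grouped by category as in the breakdown below. The sketch assumes each keyword ends up with a single correct/false judgment (the article does not say how multiple MTurk votes, if any, were combined); the data structure and sample values are invented for illustration and are not benchmark results.

```python
from collections import defaultdict


def photo_accuracy(judgments):
    """Share of one photo's keywords marked correct (True)."""
    return sum(judgments) / len(judgments)


def mean_accuracy(photos):
    """Average per-photo accuracy across all photos.

    photos: list of dicts, each with a "judgments" list of booleans,
        one per generated keyword (hypothetical structure).
    """
    scores = [photo_accuracy(p["judgments"]) for p in photos]
    return sum(scores) / len(scores)


def accuracy_by_category(photos):
    """The same average, computed separately for each "category" value."""
    groups = defaultdict(list)
    for p in photos:
        groups[p["category"]].append(p)
    return {cat: mean_accuracy(ps) for cat, ps in groups.items()}


# Invented toy data, not results from the benchmark.
sample = [
    {"category": "nature", "judgments": [True, True, True, False]},  # 0.75
    {"category": "nature", "judgments": [True, True]},               # 1.0
    {"category": "non-photo", "judgments": [True, False]},           # 0.5
]
print(mean_accuracy(sample))         # → 0.75
print(accuracy_by_category(sample))  # → {'nature': 0.875, 'non-photo': 0.5}
```

Averaging per photo gives every image equal weight, however many keywords a system returned for it; pooling all keywords into one ratio instead would let photos with long keyword lists dominate the score.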

This Google Sheet shows the benchmark results overall, broken down by the Instagram account that posted the photos and by the category each photo falls into.

Looking at the individual categories, EyeEm Vision was the most accurate for cityscapes, people/sports, nature and animals (all at 83%). The only category where it performed less well was “non-photos” (text, drawings, illustrations, screenshots and collages), images its system is simply not trained on.

But EyeEm can do this precisely because it also has a community that helps train its platform. That is how it is beating a ‘pure tech’ company like Google.

Yes, it has a skilled team of researchers and engineers, but it also has a community and a marketplace. By connecting the software, the marketplace and the community, it can constantly improve its technology.

EyeEm now has 20 million photographers on its platform, over a million of whom are contributing to its marketplace, where revenues grew by 300% last year, says the company.

That contributor base is 10 times larger than that of publicly traded Shutterstock, and the marketplace has now passed 100 million photos. EyeEm counts stock sites including Getty and Adobe among its partners.

And this is photography, crowdsourced and on demand.

When some of the world’s biggest brands need high-quality images, EyeEm sets the brief as a ‘photo mission’ for its community. These missions have generated over 6.5 million images, all shot on demand. Its customers include Google, Facebook, Coca-Cola, Huawei, Mercedes-Benz, BCG, eBay, Audi, Converse and Land Rover.

So what we have here is a hyper-engaged community of 20 million photographers. In some ways that beats 600 million Instagrammers taking selfies, although admittedly that’s a different kind of proposition and business model.

EyeEm even does things that might seem crazy for a tech company, like holding photography exhibitions and meetups. These, in turn, tie its community more closely to the platform, further supercharging its image recognition and its drive for the perfect aesthetic algorithm.