Graphic designer and tech enthusiast

Will AI ever be able to recognize images?

Today we have (seemingly) sophisticated mechanisms for face recognition and for fingerprint and retinal scanning. However, if we compare human and machine abilities to discern speech or recognize images, even the most advanced machines appear quite dumb. They fail tests that an average two-year-old passes with flying colors. How come?

The fundamental difference between the human brain and machines is that machine learning runs in reverse. People are born with a huge number of connections between neurons, most of which we do not need. The brain adapts by pruning the connections we do not use. Sometimes not all irrelevant connections are lost, and we get funny glitches called “synesthetic experiences” [1].

Although it is sometimes called a “hidden sensory power” or a “disorder”, synesthesia is only a perceptual glitch: neurons that should have lost their random connections preserved them into adulthood. Nor is it fantasy, since these neural associations are automatic and consistent.

Machines, by contrast, establish connections between their “neurons” as they learn. When a human sees something new, it immediately leaves an imprint on a highly pliable brain. A machine instead tries to “recognize” what it sees based on what it has already learned, fitting it into existing patterns.

A machine “trained” to recognize faces (such as Face ID, or a simple camera algorithm that detects faces and locks focus on them) will see faces where there are none. At the same time, it may fail to pick up an actual face in poor lighting or low contrast. The problem is that it does not recognize anything else, so it has no additional data hinting where a face is likely or unlikely to be found. People, in contrast, can “guess” a face from additional cues such as silhouette, posture, and hair.

Artificial neural networks work out what is in an image by passing it through successive layers until they settle on what it shows. However, the process can turn into a feedback loop: if the network sees something shaped like an eye, each iteration reinforces that impression until the network convinces itself that an eye is there. Instead of actually recognizing anything, it generates whatever it was trained to see, even in randomly generated visual noise. Meanwhile, even identifying the same picture seen several times (a duplicate) seems to be an advanced task that even specialized software fails at [2].
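The self-reinforcing loop described above amounts to gradient ascent on the network's own response. The sketch below is a toy illustration, not DeepDream's actual code: the "detector" is a single hypothetical linear template (`EYE_TEMPLATE`) rather than a trained deep network, and the step count and step size are arbitrary assumptions. Starting from random noise, each step nudges the image toward whatever the detector already responds to, so the image's similarity to the "eye" pattern keeps growing.

```python
import math
import random

# Hypothetical 3x3 "eye" template, flattened to 9 values. It stands in for a
# feature that a trained network has learned to respond to.
EYE_TEMPLATE = [0, 1, 0,
                1, -1, 1,
                0, 1, 0]


def score(image, template):
    """Detector response: dot product between the image patch and the template."""
    return sum(p * t for p, t in zip(image, template))


def cosine(a, b):
    """Cosine similarity: 1.0 means the image looks exactly like the template."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def amplify(image, template, steps=50, step_size=0.1):
    """Gradient ascent on the detector score.

    For a linear score, the gradient with respect to the image is simply the
    template, so every step pushes the image further toward what the
    detector already "sees".
    """
    img = list(image)
    for _ in range(steps):
        img = [p + step_size * t for p, t in zip(img, template)]
    return img


random.seed(0)
noise = [random.uniform(-1, 1) for _ in range(9)]  # pure visual noise
dreamed = amplify(noise, EYE_TEMPLATE)

print(f"similarity before: {cosine(noise, EYE_TEMPLATE):.2f}")
print(f"similarity after:  {cosine(dreamed, EYE_TEMPLATE):.2f}")
```

In a real network the gradient is computed by backpropagation through many nonlinear layers, but the dynamic is similar: the image drifts toward the patterns the network was trained on, whether or not they were there to begin with.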

The human ability to recognize patterns can be excessive as well. It is an alarm system perfected by evolution, and it can be a bit paranoid: it is always better to mistake a bush for a tiger than to miss an actual tiger.
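The bush-versus-tiger trade-off can be written as a simple expected-cost rule. This is a sketch under stated assumptions: reacting to a real tiger is taken to cost nothing, and the two cost values are hypothetical numbers. Under those assumptions, reacting is the better bet whenever the probability of a tiger exceeds `cost_false_alarm / (cost_false_alarm + cost_miss)`. When a miss is far more costly than a false alarm, that threshold becomes tiny, which is exactly what a "paranoid" detector looks like.

```python
def alarm_threshold(cost_false_alarm: float, cost_miss: float) -> float:
    """Probability of danger above which reacting has the lower expected cost.

    Assumes reacting to a real threat costs nothing, so:
      expected cost of ignoring = p * cost_miss
      expected cost of reacting = (1 - p) * cost_false_alarm
    Reacting wins when p > cost_false_alarm / (cost_false_alarm + cost_miss).
    """
    return cost_false_alarm / (cost_false_alarm + cost_miss)


def should_react(p_tiger: float, cost_false_alarm: float, cost_miss: float) -> bool:
    """Return True if fleeing is the cheaper choice on average."""
    return p_tiger > alarm_threshold(cost_false_alarm, cost_miss)


# Hypothetical costs: fleeing from a bush wastes a little energy,
# while ignoring a real tiger is catastrophic.
FALSE_ALARM_COST = 1.0
MISS_COST = 1000.0

threshold = alarm_threshold(FALSE_ALARM_COST, MISS_COST)
print(f"react if P(tiger) > {threshold:.4f}")  # prints 0.0010 (that is, 1/1001)
print(should_react(0.01, FALSE_ALARM_COST, MISS_COST))  # prints True: a 1% chance is worth fleeing
```

With equal costs the threshold is 0.5, a neutral observer; the more lopsided the costs, the more bushes get mistaken for tigers.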

A famous passage by Leonardo da Vinci describes this phenomenon: “Look at walls splashed with a number of stains, or stones of various mixed colors. If you have to invent some scene, you can see there resemblances to a number of landscapes, adorned with mountains, rivers, rocks, trees, great plains, valleys and hills, in various ways. Also you can see various battles, and lively postures of strange figures, expressions on faces, costumes and an infinite number of things, which you can reduce to good integrated form.”

Machine learning tries to recreate this process. Google’s DeepDream [3] made a splash a couple of years ago, but this trippy app is actually a byproduct of a not-so-successful pursuit of image recognition, one that, instead of recognizing human faces and animals, was reduced to wild, uncontrolled pareidolia [4], in which every surface of our world is covered in eyes.

Another reason these images look like LSD-induced hallucinations is that neural network algorithms loosely mirror the way neurons in our visual cortex actually work. The networks over-process images to find patterns (the ones they have been trained to identify), which is why they see pagodas, dogs, or eyes everywhere. To produce meaningful results, a network needs millions of human-labeled images, which are hard to obtain, so developers try to invent shortcuts.

Humans evolved to identify patterns, living things, and faces, because that was vital to our survival. Computers were designed to do math. The human brain is vastly more powerful than any processor out there, yet it lacks the “software” for doing math, which is why computers appear to be more powerful.

The human brain is very complex. It has several systems for spotting faces and patterns, and then logical reasoning to “unsee” them and filter out irrelevant information. Our machine algorithms might become this sophisticated in the future, but not any time soon.