In 2017, when Tesla announced an ambitious production target of 5,000 Model 3s per week and the start of what Elon Musk called “production hell,” analysts were wary. But Musk insisted he could pull it off, citing hyper-automation, in the form of a robotic assembly line, as his secret weapon to increase manufacturing speed and drive down costs. Fast-forward a year and a half, and Tesla delivered roughly 91,000 vehicles in Q4 2018. But the ramp-up came with massive issues and a move away from Musk’s original vision of a highly automated assembly line.

What happened?

Asked why the push toward automation didn’t pan out, Musk gave an answer that revolved around one major issue: robotic vision, the software that controls what the assembly line robots can “see” and what they then do based on that input. The robots just couldn’t deal with unexpected orientations of objects like nuts and bolts, or with complicated maneuvering around the car frame. Every such issue would cause the assembly line to stop. In the end, it was far easier to substitute humans for robots in many assembly situations.

Today, computer vision (the umbrella term that includes robotic vision) is everywhere and represents the next frontier of AI, with groundbreaking applications across a variety of industries. The advances being made right now by researchers and companies in the space are impressive and represent the missing pieces needed to make Elon Musk’s vision of an automated car assembly line a reality. At their core, these advances will give computers and robots the ability to reliably deal with the vast array of unexpected corner cases (those errant nuts and bolts) that occur in the real world.

A watershed moment in computer vision

Computer vision experienced a watershed moment in 2012 with the application of convolutional neural networks, and progress has accelerated since. Before 2012, computer vision was largely about hand-crafted solutions: algorithms relied on manually defined rule sets that could mathematically describe features of an image relatively effectively. A computer vision researcher hand-selected and combined these features in order to identify a specific object in an image, like a bicycle, a storefront or a face.
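To make “hand-crafted” concrete, here is a minimal sketch of the kind of manually defined feature that paragraph describes. It uses the classic Sobel kernel to respond to vertical edges, plus a hand-picked threshold. The toy images, the function names and the threshold value are assumptions made for illustration, and plain Python lists stand in for a real image library.

```python
# Hand-picked 3x3 Sobel kernel: responds to left-to-right brightness changes.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]


def filter_response(image, kernel):
    """Slide the kernel over the image (no padding) and return the responses."""
    kh, kw = len(kernel), len(kernel[0])
    rows = len(image) - kh + 1
    cols = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def has_vertical_edge(image, threshold=2.0):
    """Hand-written rule: an edge exists if any response is strong enough."""
    response = filter_response(image, SOBEL_X)
    return max(abs(v) for row in response for v in row) >= threshold


# Toy 6x6 images (hypothetical data): values are pixel brightness in [0, 1].
step_image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]   # dark left, bright right
flat_image = [[0.5] * 6 for _ in range(6)]                        # no edge at all
rotated_image = [[0.0] * 6] * 3 + [[1.0] * 6] * 3                 # same edge, turned 90 degrees

print(has_vertical_edge(step_image))     # True
print(has_vertical_edge(flat_image))     # False
print(has_vertical_edge(rotated_image))  # False: the rule misses the rotated edge
```

The last line shows the brittleness in miniature: the same edge, turned 90 degrees, slips past a rule that was written with one orientation in mind.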

The rise of machine learning and advances in artificial neural nets changed all of that, allowing us to develop algorithms that use massive amounts of training data to automatically decipher and learn image features. The net effect was twofold: (1) solutions became much more robust (e.g., a face could still be identified as a face even if it was oriented slightly differently or in shadow), and (2) the creation of good solutions became reliant upon large amounts of high-quality training data (models learn features based on the training data, so it is critical that the training data is accurate, sufficient in quantity and representative of the full diversity of situations the algorithm may later see).

Now in the lab: GANs, unsupervised learning and synthetic data

Next, new approaches like GANs (generative adversarial networks), unsupervised learning and synthetic ground truth offer the potential to substantially reduce both the amount of training data required to develop high-quality computer vision models and the time and effort required to collect that data. With these approaches, networks can actually bootstrap their own learning and identify corner cases and outliers with higher fidelity, far faster. Humans can then evaluate the corner cases to refine solutions and get to a high-quality model much more quickly.
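Of the three approaches, synthetic ground truth is the simplest to illustrate. The sketch below covers only that idea, not GANs or unsupervised learning: when a program draws the image itself, the label is exact and costs nothing to collect. The rectangle standing in for a part, the image size and every function name here are hypothetical choices made for illustration.

```python
import random


def make_synthetic_sample(rng, size=16, part_h=3, part_w=5):
    """Draw a bright rectangle (a stand-in for a part) at a random position.

    Returns the image and its bounding box as (top, left, bottom, right),
    with bottom and right exclusive. The label is known by construction,
    so no human annotation is needed.
    """
    top = rng.randrange(0, size - part_h + 1)
    left = rng.randrange(0, size - part_w + 1)
    image = [[0.0] * size for _ in range(size)]
    for r in range(top, top + part_h):
        for c in range(left, left + part_w):
            image[r][c] = 1.0
    return image, (top, left, top + part_h, left + part_w)


def make_dataset(n, seed=0):
    """Generate n labeled samples; a fixed seed keeps the set reproducible."""
    rng = random.Random(seed)
    return [make_synthetic_sample(rng) for _ in range(n)]


dataset = make_dataset(100)
image, box = dataset[0]
print(len(dataset), box)
```

A real pipeline would render far richer scenes (lighting, texture, occlusion, fog), but the principle is the same: rare situations can be generated on demand instead of waiting for them to show up in recorded data.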

These new approaches are rapidly expanding the envelope of computer vision in terms of applications, robustness and reliability. Not only do they hold the promise to solve Mr. Musk’s manufacturing challenges, but they will also dramatically extend the boundaries in myriad critical applications, some of which are highlighted below:

Manufacturing Automation: Robots will increasingly have the capability to deal with objects at randomized orientations, like a car seat that is 20 degrees off-center or a screw that is an inch too far to the left. Beyond that, robots will be able to reliably identify soft, flexible or transparent objects (think of the plastic bag of socks you ordered on Amazon last week). New robotics providers like Berkshire Grey are at the cutting edge of this.

Facial Detection: Previously, facial detection was not robust in corner cases like side angles, partial shading or occlusion, or babies’ faces. Now, researchers are finding that computer vision can work to identify rare genetic disorders from a photo of a face, with 90 percent accuracy. Certain applications are being put in the hands of consumers, which is only possible because algorithms have become increasingly robust to diverse lighting conditions and other situations that arise as a result of less control over image capture.

Driver Assistance and Automation: Self-driving systems used to fail in fog because they were unable to differentiate between heavy fog and a rock. Now, unsupervised learning and the ability to create synthetic data (led by the likes of Nvidia) are starting to be used to train these systems on corner cases that even billions of recorded driving miles cannot uncover.

Agriculture: Companies like Blue River Technology (acquired by John Deere) are now reliably able to differentiate between weeds and crops, and selectively spray herbicide automatically, enabling a dramatic reduction in the quantity of toxic chemicals in use by commercial agriculture.

Real Estate and Property Information: Using computer vision on top of geospatial imagery could allow companies to automatically identify when floods, wildfires or hurricane-force winds may pose a danger to specific properties — allowing homeowners to take action faster, before disaster strikes.

When looking at these advances, one thing quickly becomes clear: Elon Musk wasn’t wrong. It’s just that his vision (robotic and otherwise) was a year or two away from reality. AI, computer vision and robotics are all nearing a tipping point of accuracy, reliability and efficacy. For Tesla, it means that the next ramp-up to “production hell” (likely for the Model Y) will see a vastly different assembly line at its Fremont and Shanghai factories, one that will more successfully implement robotics paired with computer vision.