“Using easily obtainable visual data, we can learn so much about our communities, on par with some information that takes billions of dollars to obtain via census surveys. More importantly, this research opens up more possibilities of virtually continuous study of our society using sometimes cheaply available visual data,” says Fei-Fei Li, an associate professor of computer science at Stanford University and director of the Stanford Artificial Intelligence Lab and the Stanford Vision Lab, where the work took place.

Li is an expert in computer vision and deep learning, a type of artificial intelligence in which computers teach themselves to recognize three-dimensional objects in two-dimensional images—computers that see, as she describes them.

The researchers trained the algorithms (or, more accurately, the algorithms trained themselves) to recognize the make, model, and year of every car produced since 1990 that appears in more than 50 million Google Street View images from 200 American cities.

The data on car types and location were then compared against the most comprehensive demographic database in use today, the American Community Survey, and against presidential election voting data to estimate demographic factors such as race, education, income, and voter preferences.

Li and her team found that a simple linear relationship exists between cars, demographics, and political persuasion. The societal associations were “simple and powerful,” as the authors describe in their paper.
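To make the idea of a "simple linear relationship" concrete, here is a minimal sketch of fitting a straight line between one car-derived feature and one demographic outcome. It is not the authors' model: the function name `fit_line`, the variable names, and every number in the toy data are invented for illustration and do not come from the study.

```python
# Ordinary least squares for a single feature, written out by hand.
# All names and data here are hypothetical, not taken from the paper.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x with y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented toy values: share of sedans among detected cars in four areas,
# paired with a made-up demographic score for each area.
sedan_share = [0.2, 0.4, 0.6, 0.8]
outcome = [30.0, 40.0, 50.0, 60.0]

slope, intercept = fit_line(sedan_share, outcome)

# With a fitted line, a new area's outcome can be estimated from its cars alone.
estimate = slope * 0.5 + intercept
print(slope, intercept, estimate)
```

In this toy case the points lie exactly on a line, so the fit recovers it almost perfectly; real survey data would be noisier and would involve many car features at once.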

For instance, if the number of sedans in a neighborhood is greater than the number of pickups, there is an 88 percent chance that the precinct will vote Democratic. Reverse the balance, with more pickups than sedans, and there is an 82 percent chance that the precinct will vote Republican.
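The sedan-versus-pickup rule of thumb is simple enough to write down directly. The sketch below encodes only the two percentages quoted above; the function name `predict_lean`, its arguments, and the handling of a tie are assumptions made for illustration, not part of the published work.

```python
# Percentages as quoted in the article; everything else is hypothetical.
CHANCE_DEMOCRATIC_IF_MORE_SEDANS = 0.88
CHANCE_REPUBLICAN_IF_MORE_PICKUPS = 0.82

def predict_lean(sedans, pickups):
    """Return (predicted lean, quoted chance) from two vehicle counts."""
    if sedans > pickups:
        return ("Democratic", CHANCE_DEMOCRATIC_IF_MORE_SEDANS)
    if pickups > sedans:
        return ("Republican", CHANCE_REPUBLICAN_IF_MORE_PICKUPS)
    # The article says nothing about equal counts, so make no call here.
    return ("undetermined", None)

print(predict_lean(120, 80))
print(predict_lean(40, 95))
```

The counts passed in at the end are arbitrary examples, not observations from any precinct.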

Speedy algorithms

Beyond the obvious political repercussions, the researchers believe their algorithms could help provide more timely and continuous supplements to current demographic surveys. The American Community Survey is now conducted through expensive and labor-intensive door-to-door canvassing costing the US more than $250 million annually.

Worse than the cost is the lag between data collection and publication, which can stretch to two years or more, especially for small cities and rural areas.

By comparison, Li’s work piggybacks on Google Street View, a publicly available, regularly updated image database built and paid for by Google, and it generates analyses in near real time.

“I don’t see something like this replacing the American Community Survey, but as a supplement to keep the data up to date,” says Timnit Gebru, first author of the paper and formerly a member of Li’s lab. Gebru is now a postdoctoral researcher in the Fairness, Accountability, Transparency, and Ethics (FATE) in Artificial Intelligence group at Microsoft Research.

Getting to this point was not easy, Gebru says. The team first had to build by hand an image database of all cars since 1990—year, make, model, trim packages—and then teach a computer to recognize the subtle differences between cars in partially obscured and odd-angle images.

They began with a database of 15,000 cars from the car-sale website Edmunds.com, but that was only the start. Human experts next had to categorize the cars down to the subtlest detail, one by one. The difference between the 2007 and 2008 Honda Accord, for instance, is an almost imperceptible change to the taillights.

The algorithm worked fast, taking just two weeks to sort the cars in all 50 million images into 2,657 categories by make, model, and year. A human working nonstop at a relatively high rate of six images a minute would need more than 15 years to complete the same task.
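The comparison between machine and human follows from simple arithmetic, which can be checked in a few lines. The figures for image count, human rate, and the two-week run come from the paragraph above; the assumptions that the human works around the clock with no breaks and that a year has 365 days are added here for the calculation.

```python
IMAGES = 50_000_000          # Street View images processed
HUMAN_RATE_PER_MINUTE = 6    # images a person could label each minute
MACHINE_DAYS = 14            # the algorithm's reported running time

MINUTES_PER_YEAR = 60 * 24 * 365   # assumes nonstop work, 365-day years
SECONDS_PER_DAY = 60 * 60 * 24

# Time a person would need at six images a minute without ever stopping.
human_minutes = IMAGES / HUMAN_RATE_PER_MINUTE
human_years = human_minutes / MINUTES_PER_YEAR

# Average throughput implied by finishing the same job in two weeks.
machine_rate_per_second = IMAGES / (MACHINE_DAYS * SECONDS_PER_DAY)

print(round(human_years, 1))           # about 15.9 years
print(round(machine_rate_per_second))  # about 41 images per second
```

At a more realistic eight-hour working day, the human figure would be roughly three times longer.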

Early skepticism

Some outside the research group were skeptical, Gebru says. They pointed to inconsistencies in the times of day when images were taken, which might affect traffic patterns and vehicle makeup. In fact, Gebru adds, many Street View images are taken in the early morning hours specifically to avoid traffic, which makes the time of day fairly consistent across images.

Regardless of the level of traffic, Gebru says, the images provided valuable data.

“If you walk around a neighborhood looking at cars, the density of traffic sometimes tells you things as valuable as the types of cars you see on the streets,” Gebru says. “We can use all this information in our algorithms.”

Gebru has high hopes for her new application. She is excited to move beyond demographics and use visual imagery analysis to improve surveys of hard-to-reach areas or for other beneficial uses like monitoring carbon dioxide levels and easing traffic congestion.

Li concurs. “It can help us understand how our society works, the things people need and how we can improve lives,” Li says. “There is great potential to use computer vision technology in a constructive and benevolent way.”

Additional researchers contributing to the work are from Stanford University, the University of Michigan, and Baylor University. Funding from the National Science Foundation and the Stanford DARE fellowship supported the research in part. NVIDIA donated GPUs for the research.