May 10, 2018 / 8:30 AM GMT / Updated May 10, 2018 / 8:30 AM GMT

By Ben Popken

Google CEO Sundar Pichai stood on stage at the company’s yearly developer conference on Tuesday and rolled out some of its most advanced technology: an assistant that can schedule appointments for you over the phone, customized suggestions in Google Maps, and even a new feature that can help finish your sentences as you type an email.

It’s all underpinned by the same thing: the massive trove of data that Google is collecting on billions of people every day.

That has helped make Google one of the world’s most well-regarded brands, according to a Morning Consult poll. But in a post-Cambridge Analytica world that is growing increasingly leery of how major tech companies track people, the data collection practices by the world’s leading digital advertising company have come under renewed scrutiny.

“Google is walking a very fine line,” David Yoffie, a professor at the Harvard Business School, said in an email. “Search, plus Android gives Google amazing insight into individual behavior. Google’s stated privacy policies seem adequate, but the question that I cannot answer is whether Google’s stated policy and actual behavior are one and the same. Facebook had a stated policy for the last three years which most of us found acceptable, until Cambridge Analytica came to light.”

Where does the data come from?

The more Google products you use, the more Google can gather about you. Whether it’s Gmail, the Android smartphone operating system, YouTube, Google Drive, Google Maps or, of course, Google Search, the company is collecting gigabytes of data about you.

Google offers free access to these tools and in return shows you super-targeted advertising, which is how it made $31.2 billion in revenue in just the first three months of 2018.

The company’s data collection practices also include scanning your email to extract keyword data for use in other Google products and services and to improve its machine learning capabilities, Google spokesman Aaron Stein confirmed in an email to NBC News.

“We may analyze [email] content to customize search results, better detect spam and malware,” he added, later noting Google has customized search in this way since 2012.

How Google collects data from Gmail users and what it uses that data for has been a particularly sensitive topic. In June 2017, Google said it would stop scanning Gmail messages in order to sell targeted ads. After this article was published, Google’s confirmation that it does still collect data from the email of Gmail users drew attention from some journalists who cover technology and digital privacy.

Google reached out to NBC to clarify that the company’s spokesperson was referring to “narrow use cases” in Gmail.

“First, since 2012, we’ve enabled people to use Google Search to find information from their Gmail accounts by answering questions like ‘When is my restaurant reservation?’” Stein, the Google spokesperson, wrote in an email. “We present customized search results containing this information if someone is signed-in and asks us for it. Second, like other email providers, our systems may also automatically process email messages to detect spam, malware and phishing patterns, to help us stop this abuse and protect people’s inboxes. We have the most secure email service because of these systems - and they are powered by machine learning technology.”

It doesn’t stop there, though. Google says it also leverages some of its datasets to “help build the next generation of ground-breaking artificial intelligence solutions.” On Tuesday, Google rolled out “Smart Replies,” in which artificial intelligence helps users finish sentences.

The extent of the information Google has can be eyebrow-raising even for technology professionals. Dylan Curran, an information technology consultant, recently downloaded everything Facebook had on him and got a 600-megabyte file. When he downloaded the same kind of file from Google, it was 5.5 gigabytes, about nine times as large. His tweets highlighting each kind of information Google had on him, and therefore other users, got nearly 170,000 retweets.

“This is one of the craziest things about the modern age, we would never let the government or a corporation put cameras/microphones in our homes or location trackers on us, but we just went ahead and did it ourselves because … I want to watch cute dog videos,” Curran wrote.

Want to freak yourself out? I'm gonna show just how much of your information the likes of Facebook and Google store about you without you even realising it

What does Google guarantee?

The company has installed various guardrails against this data being misused. It says it doesn’t sell your personal information, makes user data anonymous after 18 months, and offers tools for users to delete their recorded data piece by piece or almost entirely, and to limit how they’re being tracked and targeted for advertising. And it doesn't allow marketers to target users based on sensitive categories like beliefs, sexual interests or personal hardships.

However, that doesn't prevent the company from selling advertising slots that can be narrowed to a user’s ZIP code. Combined with enough other categories of interest and behavior, Google advertisers can create a fairly tight Venn diagram of potential viewers of a marketing message, with a minimum of 100 people.

"They collect everything they can, as a culture," Scott Cleland, chairman of NetCompetition, an advocacy group that counts Comcast and other cable companies among its members, told NBC News. "They know they'll find some use for it."

“We give users controls to delete individual items, services or their entire account,” said Google’s Stein. “When a user decides to delete data, we go through a process over time to safely and completely remove it from our systems, including backups. We keep some data with a user’s Google Account, like when and how they use certain features, until the account is deleted.”

New European data privacy rules known as GDPR are set to go into effect on May 25. Those new regulations are supposed to limit what data can be collected on users and give them the ability to completely delete their data from systems, as well as bring their data from one service to another. Companies like Google will be forced to spell out more clearly to customers what kind of data is being collected and will no longer be able to bury those details in fine print, with fines for violations of up to 4 percent of annual global revenue.

What might Google do in the future?

All that data is already valuable to Google, but it could yield an even greater return once paired with advanced artificial intelligence systems that offer highly personalized services, like a souped-up version of Google Assistant.

“On your way to a friend’s house and say ‘find wine’ and you’ll get recommendations for a store that is still open and also not out of the route,” said Oren Etzioni, CEO of the Allen Institute for Artificial Intelligence, a research group founded by Microsoft co-founder Paul Allen.

But Etzioni recommended caution before we unleash swarms of digital agents.

Already we’ve seen some unpleasant effects. Palantir, a security and data-mining firm, sells software that hoovers up data and allows law enforcement to engage in “predictive policing,” guesstimating who might commit crimes. Uber’s self-driving car experiment resulted in a pedestrian being killed after the software was tuned too far in the direction of ignoring stray objects, like plastic bags.