Just how big is Big Data?

Big Data started with algorithms helpfully scouring vast amounts of data to find patterns. These days it feels a bit like Big Brother. Using machine learning and AI to tweak algorithms, companies are now able to deliver profound insights from datasets once considered impossible to compile.

Data collection and analysis have expanded so rapidly that they're pushing data holders beyond any existing ethical framework or map. Facing very little scrutiny, companies have been left on their own to establish right and wrong in this space. And we may not like where they draw the line.

Beyond the lack of any real official scrutiny, companies face a paradox: even when they try to help, they come off as creepy.

The scale at which Big Data operates is hard to imagine. Retail behemoth Walmart handles one million customer transactions every hour across its 6,360 or so stores. But that's a floppy disk compared to a server rack when you consider the data stored by Amazon, Apple, Facebook, or Google.

In June 2017, Facebook announced it had two billion users, roughly a quarter of humanity. Google handled at least 2.3 million searches per minute in mid-2016. Apple's AI assistant Siri reportedly handled two billion queries a week in mid-2017, double what it handled the previous year. Amazon collects enough data that it can figure out actual purchasing intent, rather than simply curating better recommendations.

These companies aren’t only developing in-house expertise with Big Data and research. They’re buying up anything that shows promise in this much-hyped field.

Amazon, Apple, Facebook and Google have all spent hundreds of millions, if not billions, of dollars in this space over the last few years, through internal research and a string of big-money acquisitions of promising start-ups.

Clearly, the data that’s being collated from our usage habits and lives matters, though it’s not always clear why.

How Big Data is collected and analysed

Interpreting Big Data involves identifying trends across millions of data points and turning every possible interaction into a data point, even if its purpose isn't understood straight away. Collect the data first, process it second.

IBM uses large datasets in unexpected ways and from unexpected sources. Its data scientists ran the entire recipe archive of Bon Appétit through the enormous computational power of Watson to give us Chef Watson, a browser-based app that generates somewhat unusual recipes from the ingredients you have on hand and your preferred cuisine style.

New York City turned to DataKind, a non-profit organization that works with Big Data, to determine how best to manage and maintain 2.5 million trees in the greater city area using GPS data. Other DataKind projects have determined where to install fire alarms to reduce home fires, and saved water in California by better predicting future demand. This type of project is where Big Data is hyped the most. Companies everywhere want to use data to their advantage.



Ali Rebaie, a data scientist, industry analyst, and consultant at Rebaie Analytics Group, confirmed that data is being used to help companies as well as consumers.

“Data spread is now a treasure trove for companies,” said Rebaie in a statement sent to Android Authority. “For example, insurance companies are now using sentiment analysis to analyze tweets, which helps them predict heart diseases and thus improve claim targeting.”

Personalization generated from studying large data sets is already happening and will only get more sophisticated, if we’re willing, said the analyst.

“We are heading towards an era with anthropologically data-driven machines that understand our patterns and interactions, and can remove mundane tasks and personalize everything,” said Rebaie. “Personalization techniques can already recognize the walk style and movement of the user to open a car for him without keys, or automatically adjust room temperature and lighting preferences before they open their hotel room door.”

Your data

Generally, what you're doing online as you talk to Google Assistant or search to buy on Amazon is being recorded somewhere in a giant database. That isn't necessarily the case in the European Union, which offers privacy protections the U.S. doesn't. Browse any respectable website while in the EU, and you'll be warned prominently about cookie collection, thanks to the so-called Cookie Law. It's just one example of where EU directives have pushed for more privacy.

Some companies are public about investing in privacy and ethics. Siri's own machine learning development has been hampered by Apple's insistence on removing old Siri searches after six months, which limits how much data can be used to train the tool. Google Executive Chairman Eric Schmidt mused publicly in 2010 that Google had looked at predicting stock prices by examining trends in incoming search requests. The company abandoned the idea after concluding that it was most likely illegal. But was it feasible?

When no law strictly covers your data trove, it's open season, and doing what's right can fall by the wayside. Assurances of privacy and anonymity in Big Data techniques offer little comfort when the algorithms get personal.

When Big Data creeps on you

Google's auto-suggestions, drawn from its own Big Data analysis of the most-searched similar terms, give an idea of what people are thinking about or worried about.

Type “Google knows” into a Google search, and look at the suggestions:

The first suggestion says it all. Similarly, try entering "Big Data knows" – from one of the biggest databases of all time come suggestions like "Big Data knows what your future holds" and "Big Data knows when you are pregnant."

The first search captivates people who want to gaze into a future they don't know, but that Big Data apparently does. Hundreds of articles discuss this popular idea.

The second suggested search stems from a fascinating New York Times article, published in 2012, on Target's Big Data strategies, including a now-famous sub-plot: Target knows when you're pregnant.

The feature recounted how a father walked into a Target store, clutching mailed-out coupons, to berate a local manager for sending his daughter coupons for pregnancy-related goods:

“My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?”

The manager didn’t have any idea what the man was talking about.

After apologies from the manager, including a phone call to the house, the abashed father admitted that “some activities” had happened without his knowledge. His daughter was due later in the year. Those coupons? Useful, but unsettling.

Target pumped the brakes and decided to hide more skillfully what Big Data was telling it. Target also decided to stop talking to the Times reporter for that story, but it still gave this quote:

“We found out that as long as a pregnant woman thinks she hasn’t been spied on, she’ll use the coupons. She just assumes that everyone else on her block got the same mailer for diapers and cribs. As long as we don’t spook her, it works.”

When Big Data’s predicted insights are carefully acted upon, that’s when it works. So what about when Amazon, a company currently fifteen times the size of Target, weighs in?


According to digital intelligence firm L2 Inc, approximately 58 percent of American households have an Amazon Prime subscription. That's more than the number of households that voted in the 2016 election. The Jeff Bezos-led company has a richer purchase history than Target, and it also has the searches you made on your account before buying. Amazon knows what shows you've watched and what books you've read. It's now ever-present in your home via Amazon Echo, and it will soon know your offline grocery purchases in Whole Foods stores.

John Kenny, the Chief Strategy Officer of FCB Chicago, told Forbes that the real limit for advertisers isn't what companies know about their customers, but how they can reach them.

“Right now, I know so much about my customers, their needs, their point in the customer journey, but I’m limited by how much I can engage them,” said Kenny.

“You end up in a situation where consumers are over-targeted but under-engaged, being stalked by the same generic messaging again and again, creating customer frustration, the exact opposite of what we want.”

Arguably, Amazon and the rest of the big four have far more opportunities to engage customers across their various platforms.

Pumping the brakes

Studies and polls have shown we are concerned about our data. We want control. The issue is that we don’t understand the magnitude of what we are giving away when we use apps, sites, or buy something from a store. Information transactions aren’t clear. Opt-outs are hidden.

Smartphones capture more and more sensor data that can be interpreted through Big Data techniques to better understand you and your environment. The internet of things will contribute even further. Fitness trackers know your heart rate. Combine that with related data such as location, and they know what gets you excited. They know when you're asleep. Or getting intimate.

The problem is that these companies claim transparency about these practices. The Wall Street Journal published insight into how Facebook has been able to track Snapchat usage using Big Data.


In 2013, Facebook purchased Onavo, a Tel Aviv-based VPN company that developed an app for Android and iOS called Protect. Facebook examined the slew of data it received from the Protect app to see how people were using Snapchat. After the introduction of the very Snapchat-like Instagram Stories, Snapchat use fell.

Users sought out a VPN app to mask their mobile data, but ended up handing that data to Facebook. How did Facebook defend this ominous data mining? The social network pointed to the Onavo Privacy Policy, where all of this is stated.

“Privacy policies”

What's actually in these privacy policies and privacy notices? This is from Amazon's Privacy Notice:

Information You Give Us: We receive and store any information you enter on our website or give us in any other way.

So, everything? For all-time?

According to Electronic Frontier Foundation Senior Staff Attorney Lee Tien, this does nothing to help you understand your rights or what's happening.

“So in that example, we have a disclosure, but its meaning is opaque at many levels,” said Tien over email.

“When you visit Amazon via your desktop or mobile device, you’re probably conscious of information you type in, like your name/password/shipping address/payment info. But you may be much less conscious of clickstream data, you may not know that a “like” button is a form of tracking code, you may not know that browser headers are being collected, etc. So the [Privacy Notice] ‘any information you […] give us in any other way’ doesn’t convey all the information it could, and does not bridge any knowledge gap between Amazon and you.”

The problem isn’t just that data is being taken without a user’s full knowledge, it’s that how it’s used is also unclear.

“Maybe you know that Amazon has this data, but you might not understand what that data tells Amazon. A doctor sees certain things in a person that could begin to ground a medical diagnosis. A home inspector sees signs of termites where I don’t. A fancy term for this is ‘the decoding capacity of the audience’. The point is we are often comfortable ‘trusting’ others with personal information partly because we have no idea what they can figure out from it,” said Tien.

Tien pointed to a 2008 study by Hoofnagle and King, which found that more than 50 percent of Californians believed that if a website had a privacy policy, it didn't share their information with others. "Obviously, if that's what you believe, you look at the world (and those words) very differently," said Tien.

There's really no way to avoid these policies if you want to use these sites and their impossibly good offerings. You can usually opt out of third-party marketing, but with the big four companies dominating advertising, there are fewer third parties every day.


As for legality, Tien explained that only companies covered by specific laws are bound by strict rules, such as HIPAA for doctors and health insurers.

“You usually only have a generic duty to not be unfair, deceptive, or misleading in your market/customer-facing statements. Basically, you’re not supposed to lie,” said Tien.

Will this data collection be reined in or are we relying on self-management, company ethics, and encryption? What about government intervention?

“It’s a hard fight,” said Tien. “It’s not obvious that companies have great incentives to cure all of these informational market failures, to be more transparent about what they have and what they do with it. And it’s not obvious that the government is on our side, because one of its ways to learn about us is to get data from the companies we do business with.”

As Big Data sprints forward, it's clear that there's a great deal of work to be done in translating basic principles of freedom and privacy into laws and ethical rules.