A Deep Dive into Facebook and Datalogix

We’ve been seeing a range of reports about Facebook partnering up with marketing company Datalogix to assess whether users go to stores in the physical world and buy the products they saw in Facebook advertisements. A lot of the reports aren’t getting into the nitty gritty of what data is actually shared between Facebook and Datalogix, so the goal of this blog post is to dive into the details. We’re glad to see that Facebook is taking a number of steps to avoid sharing sensitive data with Datalogix, but users who are uncomfortable with the program should opt out (directions below). Hopefully, reporting on this issue will make more people aware of how our shopping data is being used for a lot more than offering us discounts on tomato soup.

Datalogix is an advertising metrics company that describes its data set as including “almost every U.S. household and more than $1 trillion in consumer transactions.” It specifically relies on loyalty card data – cards anyone can get by filling out a form at a participating grocery store.

These loyalty card programs have long been criticized by consumer advocates, who point out that they create a long data trail of our everyday purchases. Concern over these cards spurred the creation of advocacy group Consumers Against Supermarket Privacy Invasion and Numbering (C.A.S.P.I.A.N.), which argues that grocery stores artificially inflate prices for those not participating in the programs and that the programs themselves are expensive to run. Concern over these programs also prompted the state of California to enact a law preventing supermarkets from (1) requiring drivers’ licenses or social security numbers as a condition of issuing loyalty cards, and (2) sharing or selling cardholders’ personal information, with a few limited exceptions. (This blog post doesn’t attempt to compare Datalogix’s practices with the California law.)

Data from such loyalty programs is the backbone of Datalogix’s advertising metrics business.

What data is actually exchanged?

In order to assess the impact of Facebook advertisements on shopping in the physical world, Datalogix begins by providing Facebook with a (presumably enormous) dataset that includes hashed email addresses, hashed phone numbers, and Datalogix ID numbers for everyone they’re tracking. Using the information Facebook already has about its own users, Facebook then tests various email addresses and phone numbers against this dataset until it has a long list of the Datalogix ID numbers associated with different Facebook users.
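To make the matching step concrete, here is a minimal sketch in Python. It is not Facebook's or Datalogix's actual code: the hash function (SHA-256), the normalization step, the record layout, and every name and ID below are assumptions made for illustration, since neither company has published those details.

```python
import hashlib


def hash_identifier(value: str) -> str:
    """Normalize and hash an email address or phone number.

    SHA-256 and the lowercase/strip normalization are assumptions;
    the actual hash used in the program is not public.
    """
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()


# Hypothetical dataset sent by Datalogix: hashed identifier -> Datalogix ID.
datalogix_rows = {
    hash_identifier("alice@example.com"): "DLX-001",
    hash_identifier("5551230001"): "DLX-002",
    hash_identifier("carol@example.com"): "DLX-003",
}

# Hypothetical Facebook-side records: user -> identifiers already on file.
facebook_users = {
    "fb_alice": ["alice@example.com", "5559990000"],
    "fb_bob": ["Bob@Example.com", "5551230001"],
    "fb_dave": ["dave@example.com"],
}


def match_users(users, hashed_rows):
    """Hash each known identifier and look it up in the hashed dataset."""
    matches = {}
    for user, identifiers in users.items():
        for identifier in identifiers:
            dlx_id = hashed_rows.get(hash_identifier(identifier))
            if dlx_id is not None:
                matches[user] = dlx_id
                break  # one match per user is enough for this sketch
    return matches


matches = match_users(facebook_users, datalogix_rows)
```

In this sketch, the row for carol@example.com never matches anyone on the Facebook side. That is the kind of leftover hashed data Facebook says it discards.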

Facebook then creates groups of users based on their online activity. For example, all users who saw a particular advertisement might be Group A, and all users who didn’t see that ad might be Group B. Then Facebook will give Datalogix a list of the Datalogix ID numbers associated with everyone in Groups A and B and ask Datalogix specific questions – for example, how many people in each group bought Ocean Spray cranberry juice? Datalogix then generates a report about how many people in Group A bought cranberry juice and how many people in Group B bought cranberry juice. This will provide Facebook with data about how well an ad is performing, but because the results are aggregated by groups, Facebook shouldn’t have details on whether a specific user bought a specific product. And Datalogix won’t know anything new about the users other than the fact that Facebook was interested in knowing whether they bought cranberry juice.
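The group comparison described above might look roughly like the following Python sketch. The purchase records, group names, and function name are hypothetical; the point is only that the report handed back contains one count per group and nothing about any individual ID.

```python
# Hypothetical Datalogix-side purchase history: Datalogix ID -> products bought.
purchases = {
    "DLX-001": {"cranberry juice", "tomato soup"},
    "DLX-002": {"cranberry juice"},
    "DLX-003": {"tomato soup"},
    "DLX-004": {"cranberry juice"},
    "DLX-005": set(),
}

# Hypothetical groups supplied by Facebook, expressed only as Datalogix IDs.
groups = {
    "A (saw the ad)": ["DLX-001", "DLX-002", "DLX-003"],
    "B (did not see the ad)": ["DLX-004", "DLX-005", "DLX-006"],
}


def aggregate_report(groups, purchases, product):
    """Return only a per-group count of buyers, never per-person rows."""
    return {
        name: sum(1 for dlx_id in ids if product in purchases.get(dlx_id, set()))
        for name, ids in groups.items()
    }


report = aggregate_report(groups, purchases, "cranberry juice")
```

Here the report says two people in Group A and one in Group B bought cranberry juice, without saying which ones.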

In addition to technical privacy protections, Facebook has a contractual relationship with Datalogix to try to make sure that user privacy isn’t violated. Through this relationship, Datalogix promises to keep all the data processing they do for Facebook separate from the rest of their data. (This means you couldn’t approach Datalogix and ask them to, say, give you a list of all the profiles queried by Facebook.) And Facebook promises to discard any hashed data it receives that isn’t about Facebook users.[1]

We were also initially concerned that Facebook could test a number of small, overlapping data sets to home in on individual user behaviors. We raised this concern with Facebook, and Facebook responded that, due to the large sample sizes that were being tested, it would be impossible to figure out whether a specific individual bought a specific item. Apparently Facebook also sent in a privacy and security auditor to assess this issue, and was satisfied with the results. We’ve also reached out to Datalogix to talk to them about what formal rules they have regarding small, overlapping data sets. Given the large amount of sensitive data Datalogix maintains, we’re hoping they’ve got appropriate rules in place to prevent people from testing small, similar groups to figure out a particular individual’s actions.
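The overlapping-groups concern can be sketched in a few lines of Python. Everything here is hypothetical, including the IDs, the `QueryGuard` class, and its threshold. We don't know what rules, if any, Datalogix actually applies, so treat the guard as one possible safeguard and not a description of their system.

```python
def count_buyers(group, buyers):
    """Aggregate answer: how many members of the group bought the product."""
    return len(set(group) & buyers)


# Hypothetical data: ten tracked people, two of whom bought the product.
everyone = {f"DLX-{i:03d}" for i in range(1, 11)}
buyers = {"DLX-002", "DLX-007"}

# Two queries whose groups differ by exactly one person (the target).
group_with_target = everyone
group_without_target = everyone - {"DLX-007"}

# Subtracting the two aggregate counts reveals the target's purchase.
leak = count_buyers(group_with_target, buyers) - count_buyers(
    group_without_target, buyers
)


class QueryGuard:
    """Hypothetical rule: refuse a group that differs from an earlier
    queried group by fewer than `min_difference` members."""

    def __init__(self, min_difference):
        self.min_difference = min_difference
        self.seen = []

    def allow(self, group):
        group = frozenset(group)
        for earlier in self.seen:
            # Symmetric difference = members in one group but not the other.
            if 0 < len(group ^ earlier) < self.min_difference:
                return False
        self.seen.append(group)
        return True


guard = QueryGuard(min_difference=3)
first_allowed = guard.allow(group_with_target)
second_allowed = guard.allow(group_without_target)
```

With the guard in place, the first query is answered and the near-identical second query is refused. Note that a large sample size alone would not stop the subtraction trick, because the two groups can both be large and still differ by a single person.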

But even with these technical and legal safeguards, many people may be concerned because the shopping data compiled by loyalty programs can be quite sensitive. A New York Times article earlier this year showed how Target was able to identify and target an expectant mother long before she started showing visible signs of pregnancy (and, in at least one case, before her father realized she was expecting). Loyalty card programs have been used by the CDC to track down cases of salmonella, and data collected through these programs has even been sought by law enforcement. In one unfortunate incident, a man was wrongfully charged with arson in part because he had used his loyalty club card to buy fire starters (thankfully, the charges were eventually dropped).

Many people who sign up for loyalty programs may not realize the data amassed on them will be shared with entities outside of the store. And if they do realize it, they might not be comfortable with it. A 2009 academic study found that 86% of those surveyed did not want websites to show them advertisements tailored to them based on their offline activities; perhaps more studies are necessary to see whether users are similarly uncomfortable with data shared from offline retailers to online entities, regardless of whether the advertisements are individually targeted.

All Facebook users are automatically opted in to this program. So if you’re uncomfortable with it, you need to opt out.

How to Opt Out

To opt out of this program, visit the Datalogix.com privacy page. Scroll down to the word “Choice”; the last sentence of the first paragraph says:

If you wish to opt out of all Datalogix-enabled advertising & analytic products, click here.

Click there and a little form will pop up that asks for your name, address, and email address. Datalogix promises that the opt-out will take effect within 30 days. Once you’ve been opted out, Datalogix will no longer include your information in the hashed data they provide to Facebook. (NB: There are a few different options under the “Choice” subheading. You want the one that says “opt out of all Datalogix-enabled advertising & analytic products” and then gives you a form to fill out.)

In addition to opting out via the Datalogix page, many people may want to consider how comfortable they are with loyalty card programs at all. Before you hand these programs your real name, phone number, and email address, consider whether you want every bag of Doritos, over-the-counter medication, and box of tampons you buy associated with your identity in a marketing database for years to come.

[1] This is important because hashing data values that come from a relatively small data set, like phone numbers, isn't an effective way of hiding the original values. For example, a computer program could check every possible phone number's hash in just a few seconds to see which phone number matches a particular hash. E-mail addresses may be hidden better, but it would still be possible for Facebook to guess the original values of a substantial fraction of e-mail address hashes in a short time (for example, trying all 1-8 letter addresses at gmail.com). That’s why additional protections, like the contractual relationship Facebook has with Datalogix, are important.
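As a rough illustration of the footnote's point, the Python sketch below recovers a phone number from its hash by trying every candidate. SHA-256, the unsalted ten-digit format, and the sample number are assumptions. To keep the demo quick, it searches only the 10,000 numbers under one hypothetical area code and exchange; covering every possible number is the same loop over a larger range.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Unsalted SHA-256 of a string (an assumed hashing scheme)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def crack_phone_hash(target_hash, prefix):
    """Try every 4-digit suffix under a 6-digit prefix until a hash matches."""
    for suffix in range(10_000):
        candidate = f"{prefix}{suffix:04d}"
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None  # the number is not under this prefix


# Hypothetical hashed phone number, as it might appear in a shared dataset.
target = sha256_hex("4155550123")

recovered = crack_phone_hash(target, "415555")
not_found = crack_phone_hash(target, "212555")
```

Because the set of possible inputs is so small, the hash gives little real protection on its own.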

The views expressed in this post are the opinions of the Infosec Island member that posted this content. Infosec Island is not responsible for the content or messaging of this post.
