Subtitles and Transcript

Jennifer Golbeck

0:11
If you remember that first decade of the web, it was really a static place. You could go online, you could look at pages, and they were put up either by organizations who had teams to do it or by individuals who were really tech-savvy for the time. And with the rise of social media and social networks in the early 2000s, the web was completely changed to a place where now the vast majority of content we interact with is put up by average users, either in YouTube videos or blog posts or product reviews or social media postings. And it's also become a much more interactive place, where people are interacting with others, they're commenting, they're sharing, they're not just reading.

0:53
So Facebook is not the only place you can do this, but it's the biggest, and it serves to illustrate the numbers. Facebook has 1.2 billion users per month. So half the Earth's Internet population is using Facebook. They are a site, along with others, that has allowed people to create an online persona with very little technical skill, and people responded by putting huge amounts of personal data online. So the result is that we have behavioral, preference, demographic data for hundreds of millions of people, which is unprecedented in history. And as a computer scientist, what this means is that I've been able to build models that can predict all sorts of hidden attributes for all of you that you don't even know you're sharing information about. As scientists, we use that to help the way people interact online, but there's less altruistic applications, and there's a problem in that users don't really understand these techniques and how they work, and even if they did, they don't have a lot of control over it. So what I want to talk to you about today is some of these things that we're able to do, and then give us some ideas of how we might go forward to move some control back into the hands of users.

2:01
So this is Target, the company. I didn't just put that logo on this poor, pregnant woman's belly. You may have seen this anecdote that was printed in Forbes magazine where Target sent a flyer to this 15-year-old girl with advertisements and coupons for baby bottles and diapers and cribs two weeks before she told her parents that she was pregnant. Yeah, the dad was really upset. He said, "How did Target figure out that this high school girl was pregnant before she told her parents?" It turns out that they have the purchase history for hundreds of thousands of customers and they compute what they call a pregnancy score, which is not just whether or not a woman's pregnant, but what her due date is. And they compute that not by looking at the obvious things, like, she's buying a crib or baby clothes, but things like, she bought more vitamins than she normally had, or she bought a handbag that's big enough to hold diapers. And by themselves, those purchases don't seem like they might reveal a lot, but it's a pattern of behavior that, when you take it in the context of thousands of other people, starts to actually reveal some insights. So that's the kind of thing that we do when we're predicting stuff about you on social media. We're looking for little patterns of behavior that, when you detect them among millions of people, lets us find out all kinds of things.
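
The idea of several weak purchase signals adding up to a score can be sketched in a few lines. This is only an illustration, not Target's actual method: the weights, the baseline, and the names `SIGNAL_WEIGHTS`, `BASELINE_LOG_ODDS`, and `trait_score` are all invented here, and a real system would learn its weights from the purchase histories of many customers.

```python
import math

# Hypothetical log-odds weights: how much each purchase shifts the estimate.
# These numbers are invented for illustration, not taken from any retailer.
SIGNAL_WEIGHTS = {
    "extra_vitamins": 0.8,
    "large_handbag": 0.5,
}
BASELINE_LOG_ODDS = -3.0  # assumed starting point: the trait is uncommon


def trait_score(purchases):
    """Combine weak purchase signals into a score between 0 and 1."""
    # Each distinct known item adds its weight; unknown items add nothing.
    log_odds = BASELINE_LOG_ODDS + sum(
        SIGNAL_WEIGHTS.get(item, 0.0) for item in set(purchases)
    )
    # The logistic function maps log-odds onto the 0..1 range.
    return 1.0 / (1.0 + math.exp(-log_odds))


no_signal = trait_score([])
one_signal = trait_score(["extra_vitamins"])
both_signals = trait_score(["extra_vitamins", "large_handbag"])
```

No single purchase moves the score far, but each additional signal pushes it higher, which is the "pattern of behavior" effect described above.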

3:18
So in my lab and with colleagues, we've developed mechanisms where we can quite accurately predict things like your political preference, your personality score, gender, sexual orientation, religion, age, intelligence, along with things like how much you trust the people you know and how strong those relationships are. We can do all of this really well. And again, it doesn't come from what you might think of as obvious information.

3:43
So my favorite example is from this study that was published this year in the Proceedings of the National Academies. If you Google this, you'll find it. It's four pages, easy to read. And they looked at just people's Facebook likes, so just the things you like on Facebook, and used that to predict all these attributes, along with some other ones. And in their paper they listed the five likes that were most indicative of high intelligence. And among those was liking a page for curly fries. (Laughter) Curly fries are delicious, but liking them does not necessarily mean that you're smarter than the average person. So how is it that one of the strongest indicators of your intelligence is liking this page when the content is totally irrelevant to the attribute that's being predicted? And it turns out that we have to look at a whole bunch of underlying theories to see why we're able to do this. One of them is a sociological theory called homophily, which basically says people are friends with people like them. So if you're smart, you tend to be friends with smart people, and if you're young, you tend to be friends with young people, and this is well established for hundreds of years. We also know a lot about how information spreads through networks. It turns out things like viral videos or Facebook likes or other information spreads in exactly the same way that diseases spread through social networks. So this is something we've studied for a long time. We have good models of it. And so you can put those things together and start seeing why things like this happen. So if I were to give you a hypothesis, it would be that a smart guy started this page, or maybe one of the first people who liked it would have scored high on that test. And they liked it, and their friends saw it, and by homophily, we know that he probably had smart friends, and so it spread to them, and some of them liked it, and they had smart friends, and so it spread to them, and so it propagated through the network to a host of smart people, so that by the end, the action of liking the curly fries page is indicative of high intelligence, not because of the content, but because the actual action of liking reflects back the common attributes of other people who have done it.
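
The hypothesis above can be tried out on a toy network. The sketch below is not the study's model: it assumes two made-up groups of 20 people, friendships that mostly stay inside a group (the homophily assumption), two cross-group friendships, and a like that reaches everyone within four friendship steps of the first liker. All names (`build_network`, `spread_like`, `has_trait`) are hypothetical.

```python
from collections import deque

GROUP_SIZE = 20  # assumed size of each friend group


def build_network():
    """Two groups of friends; most ties stay inside a group (homophily)."""
    friends = {p: set() for p in range(2 * GROUP_SIZE)}
    for base in (0, GROUP_SIZE):
        for i in range(GROUP_SIZE):
            for step in (1, 2):  # each person befriends nearby group members
                a = base + i
                b = base + (i + step) % GROUP_SIZE
                friends[a].add(b)
                friends[b].add(a)
    for a, b in [(0, 20), (10, 30)]:  # a few cross-group friendships
        friends[a].add(b)
        friends[b].add(a)
    return friends


def spread_like(friends, first_liker, hops):
    """Everyone within `hops` friendship steps of the first liker likes the page."""
    likers = {first_liker}
    frontier = deque([(first_liker, 0)])
    while frontier:
        person, depth = frontier.popleft()
        if depth == hops:
            continue  # the like does not travel further than `hops` steps
        for friend in friends[person]:
            if friend not in likers:
                likers.add(friend)
                frontier.append((friend, depth + 1))
    return likers


def has_trait(person):
    # Assumption: members of the first group share the hidden trait.
    return person < GROUP_SIZE


friends = build_network()
likers = spread_like(friends, first_liker=5, hops=4)
share_among_likers = sum(has_trait(p) for p in likers) / len(likers)
share_overall = GROUP_SIZE / (2 * GROUP_SIZE)
```

The page's content never enters the code, yet the share of trait-holders among likers comes out well above the share in the whole population, because the like started inside one group and travelled along its friendships.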

5:47
So this is pretty complicated stuff, right? It's a hard thing to sit down and explain to an average user, and even if you do, what can the average user do about it? How do you know that you've liked something that indicates a trait for you that's totally irrelevant to the content of what you've liked? There's a lot of power that users don't have to control how this data is used. And I see that as a real problem going forward.

6:12
So I think there's a couple paths that we want to look at if we want to give users some control over how this data is used, because it's not always going to be used for their benefit. An example I often give is that, if I ever get bored being a professor, I'm going to go start a company that predicts all of these attributes and things like how well you work in teams and if you're a drug user, if you're an alcoholic. We know how to predict all that. And I'm going to sell reports to H.R. companies and big businesses that want to hire you. We totally can do that now. I could start that business tomorrow, and you would have absolutely no control over me using your data like that. That seems to me to be a problem.

6:49
So one of the paths we can go down is the policy and law path. And in some respects, I think that that would be most effective, but the problem is we'd actually have to do it. Observing our political process in action makes me think it's highly unlikely that we're going to get a bunch of representatives to sit down, learn about this, and then enact sweeping changes to intellectual property law in the U.S. so users control their data.

7:15
We could go the policy route, where social media companies say, you know what? You own your data. You have total control over how it's used. The problem is that the revenue models for most social media companies rely on sharing or exploiting users' data in some way. It's sometimes said of Facebook that the users aren't the customer, they're the product. And so how do you get a company to cede control of their main asset back to the users? It's possible, but I don't think it's something that we're going to see change quickly.

7:44
So I think the other path that we can go down that's going to be more effective is one of more science. It's doing science that allowed us to develop all these mechanisms for computing this personal data in the first place. And it's actually very similar research that we'd have to do if we want to develop mechanisms that can say to a user, "Here's the risk of that action you just took." By liking that Facebook page, or by sharing this piece of personal information, you've now improved my ability to predict whether or not you're using drugs or whether or not you get along well in the workplace. And that, I think, can affect whether or not people want to share something, keep it private, or just keep it offline altogether. We can also look at things like allowing people to encrypt data that they upload, so it's kind of invisible and worthless to sites like Facebook or third party services that access it, but that select users who the person who posted it want to see it have access to see it. This is all super exciting research from an intellectual perspective, and so scientists are going to be willing to do it. So that gives us an advantage over the law side.

8:48
One of the problems that people bring up when I talk about this is, they say, you know, if people start keeping all this data private, all those methods that you've been developing to predict their traits are going to fail. And I say, absolutely, and for me, that's success, because as a scientist, my goal is not to infer information about users, it's to improve the way people interact online. And sometimes that involves inferring things about them, but if users don't want me to use that data, I think they should have the right to do that. I want users to be informed and consenting users of the tools that we develop.

9:23
And so I think encouraging this kind of science and supporting researchers who want to cede some of that control back to users and away from the social media companies means that going forward, as these tools evolve and advance, means that we're going to have an educated and empowered user base, and I think all of us can agree that that's a pretty ideal way to go forward.