We all leave digital footprints, she says. Every time we search, data is recorded. The sequence of our searches gives the engine especially useful information for figuring out what we’re trying to find. Now the engines can refer to social graphs.

“But what do we do with data?”

Bing Predicts looks at all the data it can in order to make predictions. It began by predicting the winners and losers in American Idol, and got it 100% right. For this election year, it tried to predict who would win each state primary or caucus in the US. Then it took in sentiment data to figure out which issues matter in each state, broken down by demographic groups.

Now, for example, it can track a new diabetes drug through the places people visit when logged into their browser. This might show that there are problems with the drug; consider, for example, people searching for unexpected side effects of it. Bing shares the result of this analysis with the CDC. [The acoustics where I was sitting were poor. I’m not sure I got this right.]

They’re doing the same for retail products, and are able to tell which will be the big sellers.

Frances talks about Cortana, “the only digital system that works across all platforms.” Microsoft is working on many more digital assistants (Bots) that live within other services. She shows a temporary tattoo made from gold leaf that you can use as a track pad, and in other ways; this came out of MIT.

She says that the Microsoft version of a Fitbit can tell if you’re dehydrated or tired, and then can point you to the nearest place with water and a place to sit. Those shops could send you a coupon.

She goes quickly over the Hololens since Robert Scoble covered it so well this morning.

She closes with a story about using sensor data to know when a cow is in heat, which, it turns out, correlates with the cow walking faster. Then the data showed at what point in the period of fertility a male or female calf is likely to be conceived. Then they started using genetic data to predict genetically disabled calves.

Barbara Bucknell is the director of policy and research at the Office of the Privacy Commissioner, where she worries about how to protect privacy while still being able to take advantage of all the good stuff data can do.

A recent large survey found that more than half of Canadians are more concerned about privacy than they were last year. Only 34% think the govt is doing enough to keep their privacy safe. Globally, 8 out of 10 are worried about their info being bought, sold, or monitored. “Control is the key concern here.” “They’re worried about surprises: ‘Oh, I didn’t know you were using my information that way!'”

Adam Kardash [this link?] says that all the traditional approaches to privacy have to be carefully reconsidered. E.g., data minimization says you only collect what you need. “It’s a basic principle that’s been around forever.” But data scientists, when asked how much data they need for innovation, will say “We need it all.” Also, it’s incredibly difficult to explain how your data is going to be used, especially at the grade 6-7 literacy level that is required. And for data retention, we should keep medical info forever. Marketers will tell you the same thing so they can give you information about what you really need.

Adam raises the difficulties with getting consent, which the OPC opened a discussion about. Often asking for consent is a negligible part of the privacy process. “The notion of consent is having an increasingly smaller role” while the question of control is growing.

He asks Barbara “How does PIPEDA facilitate trust?”

Barbara: It puts guardrails into the process. They may be hard to implement but they’re there for a reason. The original guidelines from the OECD were prescient. “It’s good to remember there were reasons these guardrails were put in place.”

Consent remains important, she says, but there are also other components, including accountability. The organization has to protect data and be accountable for how it’s used. Privacy needs to be built into services and into how your company is organized. Are the people creating the cool tech talking to the privacy folks and to the legal folks? “Is this conversation happening at the front end?” You’d be surprised how many organizations don’t have those kinds of programs in place.

Barbara: Can you talk to the ethical side of this?

Adam: Companies want to know how to be respectful as part of their trust framework, not just meeting the letter of the law. “We believe that the vast majority of Big Data processing can be done within the legal framework. And then we’re creating a set of questions” in order for organizations to feel comfortable that what they’re doing is ethical. This is very practical, because it forestalls lawsuits. PIPEDA says that organizations can only process data for purposes a reasonable person would consider appropriate. We think that includes the ethical concerns.

Adam: How can companies facilitate trust?

Barbara: It’s vital to get these privacy management programs into place that will help facilitate discussions of what’s not just legal but respectful. And companies have to do a better job of explaining to individuals how they’re using their data.

Suppose a laptop were found at the apartment of one of the perpetrators of last year’s Paris attacks. It’s searched by the authorities pursuant to a warrant, and they find a file on the laptop that’s a set of instructions for carrying out the attacks.

Thus begins Jonathan Zittrain’s consideration of an all-too-plausible hypothetical. Should Google respond to a request to search everyone’s Gmail inboxes to find everyone to whom the to-do list was sent? As JZ says, you can’t get a warrant to search an entire city, much less hundreds of millions of inboxes.

But, while this is a search that sweeps a good portion of the globe, it doesn’t “listen in” on any mail except for that which contains a precise string of words in a precise order. What happens next would depend upon the discretion of the investigators.

JZ points out that Google already does something akin to this when it searches for inboxes that contain known child pornography images.

JZ’s treatment is even-handed and clear. (He’s a renowned law professor. He knows how to do these things.) He discusses the reasons pro and con. He comes to his own personal conclusion. It’s a model of clarity of exposition and reasoning.

I like this article a lot on its own, but I find it especially fascinating because of its implications for the confused feeling of violation many of us have when it’s a computer doing the looking. If a computer scans your emails looking for a terrorist to-do list, has it violated your sense of privacy? If a robot looks at you naked, should you be embarrassed? Our sense of violation is separable from the question of our legal and moral right to privacy, but the two meanings often get mixed up in such discussions. Not in JZ’s, but often enough.

Bruce is one of the most visible, articulate, and smartest voices on behalf of preserving our privacy. (His new book, Data and Goliath, is both very readable and very well documented.) At an event at West Point, he met Admiral Mike Rogers, Director of the NSA. Bruce did an extensive liveblog of Rogers’ keynote.

There was no visible explosion, forcing physicists to rethink their understanding of matter and anti-matter.

Tim Hwang started a little memefest by suggesting that that photo was announcing a new movie. Contributions by the likes of Tim, Nathan Mathias, Sam Klein, and Ryan Budish include:

All I’ll say here is how struck I am again (as always) about the need to leave out most of everything when writing goes from web-shaped to rectangular.

Just as a quick example, I’m not convinced that the Facebook experiment was as egregious as the headlines would have us believe. But I made a conscious decision not to address that point in my column because I wanted to make a more general point. The rectangle for an op-ed is only so big.

Before I wrote the column, I’d observed, and lightly participated in, some amazing discussion threads among people who bring many different sorts of expertise to the party. Disagreements that were not just civil but highly constructive. Evidence based on research and experience. Civic concern. Emotional connections. Just amazing.

I learned so much from those discussions. What I produced in my op-ed is so impoverished compared to the richness in that tangle of linked differences. That’s where the real knowledge lives.

Ancilla Tilia [twitter: ncilla] is introduced as a former model. She begins by pointing out that last year, when this audience was asked if they were worried about the privacy implications of Google Glass, only two people were. One was her. We have not heard enough from people like Bruce Schneier, she says. She will speak to us as a concerned citizen.

Knowledge is power, she says. Do we want to give away info about ourselves that will be available in perpetuity, that can be used by future governments and corporations? The theme of this conf is “Power to the people,” so let’s use our power.

She says she had a dream. She was an old lady talking with her grand-daughter. “What’s this ‘freedom’ thing I’ve been hearing about? The kids at school say the old people used to have it.” She answered, “It’s hard to define. You don’t realize what it is until you stop having it. And you stop having it when you stop caring about privacy.” We lost it step by step, she says. By paying with our bank cards, every transaction was recorded. She didn’t realize the CCD’s were doing face recognition. She didn’t realize when they put RFID chips in everything. And license plate scanners were installed. Fingerprint scanners. Mandatory ID cards. DNA data banks. Banning burqas meant that you couldn’t keep your face covered during protests. “I began to think that ‘anonymous’ was a dirty word.” Eye scanners for pre-flight check. Biometrics. Wearables monitoring brainwaves. Smart TVs watching us. 2013’s mandatory pet chipping. “And little did I know that our every interaction would be forever stored.” “When journalists started dying young, I didn’t feel like being labeled a conspiracy nut.” “I didn’t know what a free society was until I realized it was gone, or that we have to fight for it.”

Her granddaughter looks at her doe-eyed, and Ancilla can’t explain any further.

I’ve been meaning to try Medium.com, a magazine-bloggy place that encourages carefully constructed posts by providing an elegant writing environment. It’s hard to believe, but it’s even better looking than Joho the Blog. And, unlike HuffPo, there are precious few stories about side boobs. So, I posted a piece there, and might do so again.

The piece is about why we seem to keep insisting that the Internet is a panopticon when it clearly is not. So, if you care about panopticons, you might find it interesting. Here’s a bit from the beginning:

A panopticon was Jeremy Bentham’s (1748-1832) idea about how to design a prison or other institution where people need to be watched. It was to be a circular building with a watchers’ station in the middle containing a guard who could see everyone, but who could not himself/herself be seen. Even though everyone couldn’t be seen at the same time, prisoners would never know when they were being watched. That’d keep ’em in line.

There is indeed a point of comparison between a panopticon and the Internet: you generally can’t tell when your public stuff is being seen (although your server logs could tell you). But that’s not even close to what a panopticon is.

William McGeveran [twitter:BillMcGev] has written an article for University of Minnesota Law School that suggests how to make “frictionless sharing” well-behaved. He defines frictionless sharing as “disclosing individuals’ activities automatically, rather than waiting for them to authorize a particular disclosure.” For example:

Social media confers considerable advantages on individuals, their friends, and, of course, intermediaries like Spotify and Facebook. But many implementations of frictionless architecture have gone too far, potentially invading privacy and drowning useful information in a tide of meaningless spam.

Bill is not trying to build walls. “The key to online disclosures … turns out to be the correct amount of friction, not its elimination.” To assess what constitutes “the correct amount” he offers an heuristic, which I am happy to call McGeveran’s Law of Friction: “It should not be easier to ‘share’ an action online than to do it.” (Bill does not suggest naming the law after him! He is a modest fellow.)

One of the problems with the unintentional sharing of information is “misclosures,” a term he attributes to Kelly Caine.

Frictionless sharing makes misclosures more likely because it removes practical obscurity on which people have implicitly relied when assessing the likely audience that would find out about their activities. In other words, frictionless sharing can wrench individuals’ actions from one context to another, undermining their privacy expectations in the process.

Not only does this reveal, say, that you’ve been watching Yoga for Health: Depression and Gastrointestinal Problems (to use an example from Sen. Franken that Bill cites), it reveals that fact to your most intimate friends and family. (In my case, the relevant example would be The Amazing Race, by far the worst TV I watch, but I only do it when I’m looking for background noise while doing something else. I swear!) Worse, says Bill, “preference falsification” — our desire to have our known preferences support our social image — can alter our tastes, leading to more conformity and less diversity in our media diets.

Bill points to other problems with making social sharing frictionless, including reducing the quality of information that scrolls past us, turning what could be a useful set of recommendations from friends into little more than spam: “…friends who choose to look at an article because I glanced at it for 15 seconds probably do not discover hidden gems as a result.”

Bill’s aim is to protect the value of intentionally shared information; he is not a hoarder. McGeveran’s Law thus tries to add in enough friction that sharing is intentional, but not so much that it gets in the way of that intention. For example, he asks us to imagine Netflix presenting the user with two buttons: “Play” and “Play and Share.” Sharing would then require exactly as much work as playing, satisfying McGeveran’s Law. But having only a “Play” button that then automatically shares the fact that you just watched Dumb and Dumberer distinctly fails the Law because it does not “secure genuine consent.” As Bill points out, his Law of Friction is tied to the technology in use, and thus is flexible enough to be useful even as the technology and its user interfaces change.

Marshall Breeding gave a talk today to the Harvard Library system as part of its Discoverability Day. Marshall is an expert in discovery systems, i.e., technology that enables library users to find what they need and what they didn’t know they needed, across every medium and metadata boundary.

It’s a stupendously difficult problem, not least because the various providers of the metadata about non-catalog items — journal articles, etc. — don’t cooperate. On top of that, there’s a demand for “single searchbox solutions,” so that you can not only search everything the Googley way, but the results that come back will magically sort themselves in the order of what’s most useful to you. To bring us closer to that result, Marshall said that systems are beginning to use personal profiles and usage data. The personal profile lets the search engine know that you’re an astronomer, so that when you search for “mercury” you’re probably not looking for information about the chemical, the outboard motor company, or Queen. The usage data will let the engine sort based on what your community has voted on with its checkouts, recommendations, etc.

Marshall was careful to stipulate that using profiles or usage data will require user consent. I’m very interested in this because the Library Innovation Lab where I work has created an online library browser — StackLife — that sorts results based on a variety of measures of Harvard community usage. StackLife computes a “stackscore” based on a simple calculation of the number of checkouts by faculty, grad students or undergrads, how many copies are in Harvard’s 73 libraries, and potentially other metrics such as how often it’s put on reserve or called back early. The stackscores are based on 10-year aggregates without any personal identifiers, and with no knowledge of which books were checked out together. And our Awesome Box project, now in more than 40 libraries, provides a returns box into which users can deposit books that they thought were “awesome,” generating particularly delicious user-based (but completely anonymized) data.
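To make the idea of a usage-based score a little more concrete, here is a minimal sketch in Python of how a stackscore-like number might be computed from aggregate counts. It is not StackLife's actual formula: the field names, the weights, and the functions `stackscore` and `rank_by_stackscore` are all invented for illustration. The one thing it does take from the description above is that the inputs are aggregate counts per title, with no patron identifiers anywhere in the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """Aggregate usage for one title. Note: no patron identifiers."""
    title: str
    faculty_checkouts: int
    grad_checkouts: int
    undergrad_checkouts: int
    copies_held: int


# Hypothetical weights, chosen only for illustration.
# The real StackLife weighting is not given in this post.
WEIGHTS = {
    "faculty": 3,
    "grad": 2,
    "undergrad": 1,
    "copies": 1,
}


def stackscore(record: UsageRecord, weights: dict = WEIGHTS) -> int:
    """Weighted sum of aggregate checkout counts and copies held."""
    return (
        weights["faculty"] * record.faculty_checkouts
        + weights["grad"] * record.grad_checkouts
        + weights["undergrad"] * record.undergrad_checkouts
        + weights["copies"] * record.copies_held
    )


def rank_by_stackscore(records: list) -> list:
    """Return the records sorted from highest to lowest score."""
    return sorted(records, key=stackscore, reverse=True)


if __name__ == "__main__":
    shelf = [
        UsageRecord("Title A", 2, 1, 4, 3),
        UsageRecord("Title B", 0, 0, 5, 1),
    ]
    for rec in rank_by_stackscore(shelf):
        print(rec.title, stackscore(rec))
```

Because the score only ever sees totals per title, nothing in it can say who checked out what, or which books went out together. That is the property the re-identification worry is about: it holds for aggregates like these, and gets harder to guarantee as the data gets finer-grained.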

Marshall is right: usage data is insanely useful for a community, and I’d love for us to be able to get our hands on more of it. But, I got into a Twitter discussion about the danger of re-identification with Mark Ockerbloom [twitter:jmarkockerbloom] and John Wilbanks [twitter:wilbanks], two people I greatly respect, and I agree that a simple opt-in isn’t enough, because people may not fully recognize the possibility that their info may be made public. So, I had an idea.

Suppose you are not allowed to do a “soft” opt-in, by which I mean an opt-in that requires you to read some terms and tick a box that permits the sharing of information about what you check out from the library. Instead, you would be clearly told that you are opting in to publishing your check-outs. Not to letting your checkouts be made public if someone figures out how to get them, or even to making your checkouts public to anyone who asks for them. No, you’d be agreeing to having a public page with your name on it that lists your checkouts. This is a service a lot of people want anyway, but the point would be to make it completely clear to you that ticking the checkbox means that, yes, your checkouts are so visible that they get their own page. And if you want to agree to the “soft” opt-in, but don’t want that public page posted, you can’t.

Presumably the library checkout system would allow you to exempt particular checkouts, but by default they all get posted. That would, I think, drive home what the legal language expressed in the “soft” version really entails.