Foo for Thought

Last weekend I had the extraordinary privilege to attend Foo Camp, an annual gathering of about 250 Friends Of O’Reilly (aka Foo). Tim O’Reilly, Sara Winge, and their colleagues have amazing friends, as you can see if you scan this unofficial list of attendees working on big data, open government, computer security, and more generally on the cutting edge of technology and culture (especially where the two overlap).

Foo Camp is an unconference, which merits some elaboration. No fees, no conference hotel (many attendees literally set up camp in the space O’Reilly provided), and no advance program aside from some preselected 5-minute Ignite presentations. Attendees proposed and organized sessions, merging and re-arranging them to optimize for participation. It was a bit chaotic (especially the mad rush after dinner to secure session slots), but very effective.

The minimalist format brought out the best in participants.

For example, I am passionate about (i.e., against) software patents, so I organized a session about them. I did a double-take when I realized that one of the participants was Pamela Samuelson, perhaps the world’s top expert on intellectual property law. I braced myself to be schooled, and I was, but she did it gently and constructively. Specifically, she pointed me to work that her colleagues Jason Schultz and Jennifer Urban were doing on a defensive patent strategy for open-source software (including a proposed license), and reminded me of the Berkeley Patent Survey, which supports the argument that software entrepreneurs only file for patents because of real or perceived pressure from their investors. I also heard war stories from lawyers who have done pro bono work against patent trolls, reinforcing my own resolve and also reassuring me that the examples I’ve seen at close range are not isolated.

Another session asked whether we are too data driven in our work. What was notable is that this session included participants from some of the largest internet companies debating some of the most fundamental ways in which we work, e.g., do we actually learn from data or do we engage in assault by data to defend preconceived positions (cf. argumentative theory). Like all of the conference, the discussion was under “frieNDA”, so I’m being intentionally vague on the specifics. But it was refreshing to see candid admission that all of us know and have experienced the dangers of manipulating an audience with data, and that there are no algorithms to enforce common sense and good faith.

I won’t even try to enumerate the sessions and side conversations that excited me; topics included privacy, the future of publishing, a critical analysis of geek culture, and irrational user behavior. I missed the session on data-driven parenting, though others have pointed out to me that you can only learn so much if you don’t have twins and perform A/B tests. The best summary is intellectual diversity and overstimulation. If you’d like to get a general sense of the discussion, check out the #foocamp tweet stream. I also recommend Scott Berkun’s post on “What I learned at FOO Camp”.

As someone who organizes the occasional event, I’m intrigued by the unconference approach, especially now that I’ve experienced it first-hand. Moreover, I feel strongly that the academic conference model needs an upgrade. But I also know that open-ended, free-form discussion sessions are not a viable alternative; indeed, a big part of Foo Camp’s success was how it inspired participants to organize sessions, and to vote with their feet to attend the worthwhile ones. And of course part of that success came from inviting active, engaged participants rather than passive spectators.

Many of you also organize events, and I’m sure that all of you attend them. I’m curious to hear your thoughts about how to make them better, and happy to share more of what I learned at Foo Camp. After all, Foo is for (inspiring) thought.

23 responses so far ↓

Another session asked whether we are too data driven in our work. What was notable is that this session included participants from some of the largest internet companies debating some of the most fundamental ways in which we work, e.g., do we actually learn from data or do we engage in assault by data to defend preconceived positions (cf. argumentative theory). Like all of the conference, the discussion was under “frieNDA”, so I’m being intentionally vague on the specifics. But it was refreshing to see candid admission that all of us know and have experienced the dangers of manipulating an audience with data, and that there are no algorithms to enforce common sense and good faith.

This is a session that I would have enjoyed attending.

Though I’m not so interested in the question of whether data gets used to manipulate. Sometimes it does. Fine.

But what about when it doesn’t? What about when everyone is acting in good faith, being as careful as possible, explaining all their assumptions, agreeing on measurements, etc.? Could it be that even in such scenarios, being driven by data can be dangerous, or even simply misleading, in that it often fails to recognize that not everything of interest or of salience can be captured in the data?

Now, I don’t want to come out as a full-blown anti-positivist, but I find that in today’s “big data” worshipping culture there is a tendency to reify the data itself, to believe that big data is both everything and the only thing.

And even if you’re not trying to consciously manipulate using the data, even if you’ve managed to take all your biases out of your analysis, there is still that danger, is there not?

Without breaking your frieNDA, can you say whether there was any sort of discussion along those lines? Or was everyone pretty much full on board, full steam ahead with big data as the end-all and be-all?

A point was raised not that different from the one you raise in your own recent post: namely, that data-driven approaches work well for incremental progress in well-parameterized spaces (e.g., most web search improvement), but not so well for non-incremental changes, let alone new products.

But we agreed that all innovations should have associated hypotheses that they test quantitatively; it’s the bare minimum to keep the innovators honest.

Adam Mosseri of Facebook puts it well: design must be data-informed, not data-driven.

For anyone not on the O’Reilly list, there is an anyone-can-come series of unconferences called Barcamp. See http://en.wikipedia.org/wiki/BarCamp. The audience often includes some very sharp and knowledgeable people.

It’s not at all uncommon for someone to create and lead a session, only to find that there are attendees who are much bigger experts on the subject. The great thing is that the format is so unstructured that there’s nothing problematic about this at all.

The first Barcamp Boston that I went to had a very low budget and met in an elementary school. Nowadays there are corporate sponsors so they can afford better digs, e.g. the Stata Center at MIT.

namely, that data-driven approaches work well for incremental progress in well-parameterized spaces (e.g., most web search improvement), but not so well for non-incremental changes, let alone new products.

Yeah, I’ve been wondering about this for a few years now. Back at CIKM 2008, Ronny Kohavi gave a keynote talk on A/B testing, data driving iteration and improvement. And he was absolutely adamant that this sort of data driven methodology could not only handle the small increments (which I also believe), but that it could also handle large leaps. Most of what I’ve read from data evangelists mirrors this sentiment.

So I am frankly surprised to hear that there are other corners of the industry that are also expressing similar doubts.

Though yes, I do agree that innovations should have testable hypotheses. That was never in question. What was in question is whether you can have the same hypothesis for your old system A as for your new, huge leap, revolutionary changed system B. My feeling has always been that the more you change the system, the more you’re actually testing a different hypothesis. Which is fine — pick a different hypothesis. But don’t claim that you can use the same hypothesis to test a large leap.

Oh, interesting. I wonder if that current Foo 2011 consensus is a recent thing, arrived at gradually over the past few years, or if it has always been that way. Again, I’m only curious because the public statements I’ve seen from these web engine data evangelist types have not included this recent viewpoint. So I wonder if things are changing, or if they’ve been that way for a while but I just didn’t see them before.

It’s not the “be data driven” aka “use scientific method” sentiment that I’m reacting against. It’s the idea of “these two interfaces/tasks/user information needs are so different that they really require two different hypotheses in order to test them, because they’re really two different things.” That awareness is what seems to be new, at least according to your Foo report. Or at least the public expression of it is new. Again, like I said, most of what I’ve read from the major web thought leaders has typically echoed Kohavi, in that they *do* believe that you can take big leaps in user need, and test both the pre- and post-leap using the *same exact* hypothesis.

Heck, I wonder if that’s why exploratory search is (has been?) taking so long to get going, in general — that most folks are still using known item, [email protected] hypotheses to test exploratory HCIR interfaces. It might not be that users don’t have exploratory needs. It might be that too many people are still trying to scientifically, big-dataedly examine those needs in terms of [email protected] hypotheses.

Our discussion wasn’t entirely search-centric. But our consensus was pretty much the motherhood and apple-pie sentiment that a test should always include a testable — and measurable — hypothesis. But that for non-incremental changes to a system, you couldn’t just chase the gradient in the current parameter space.

And we also agreed that our objective functions are only models for what we hope to optimize, e.g., [email protected] as a surrogate for user happiness.

Oh, I’ll keep an eye out for such quotes. I’ve seen ’em every few months for the past few years in tech press articles. Couldn’t even begin to imagine how I would search for all of ’em right now. That would make an excellent exploratory search topic.

Nonetheless, people at Google feel that retooling to integrate the social element isn’t a luxury. It’s a necessity. As early as last August, I asked Gundotra whether he felt Emerald Sea was a bet-the-company project.

“I think so,” he replied. “I don’t know how you can look at it any other way.”

I’ve seen that keynote. And if you look at the “issue with controlled experiments”, it offers some moderation — in particular, concerns about newness effects. I’m not doubting the quote that is cut off, but at least I give Kohavi credit for being able to make balanced statements.

As for Google+, I’m reserving judgment until I receive an invite. I agree that it doesn’t look like an A/B test. 🙂

I like this quote toward the very end of the article on page 7: “We’re in this for the long run,” says Ben-Yair. “This isn’t like an experiment. We’re betting on this, so if obstacles arise, we’ll adapt.”

That could be read as “We are convinced that this will work, and we won’t let data undermine our faith. This isn’t like science where you test hypotheses. We know the answer, so if obstacles arise we’ll just find a different path to yes.”

OK, that’s a bit over the top, but there’s something very dangerous about saying “This isn’t like an experiment” — at least in the scientific sense of the word. I’m not a fan of faith-based engineering. They should be saying, “We think that if we do X, people will do Y. Let’s test this hypothesis. If we’re wrong, we’ll go back to the drawing board.”

I rewatched the video from about 17:00 to 21:15. I saw the bit on “newness”, but I don’t quite follow what it is that you’re seeing, what it is that is placating you.

I understand that change blindness, unfamiliarity, etc. means that you have to let something run longer in order to really see the effect.

But that doesn’t really speak to the “large leap requires large change in OEC, if not a completely different OEC altogether” issue. Rather, it’s related to something that he was saying on the previous slide, about an experiment in which load times increasing by 150-200ms caused a 1% drop in revenue. The issue is whether that really was true as a permanent result, or if it was just a short-term, two-week experimental truth. I.e. maybe 1% of people put off their purchase in that moment, during that two-week experimental period. But a month later all those same people went back and bought what they were already going to buy anyway. In which case there might have been zero total revenue drop; it just might have been shifted past the boundary of the experimental time frame. But throughout that whole experiment, the OEC didn’t change, nor did Kohavi think that the OEC needed to change. Total revenue was the OEC.

That could be read as “We are convinced that this will work, and we won’t let data undermine our faith. This isn’t like science where you test hypotheses. We know the answer, so if obstacles arise we’ll just find a different path to yes.”

Actually, the way I read it was:

“We accept on faith the general domain of social-based search and information propagation. We’re not going to test that. We’re going to accept that as an axiom. However, as far as HOW people best interact with social search results, or with social data streams.. that we do not know. That we are going to develop hypotheses for and test and run controlled experiments on. But whatever social experiment we run, we’re still going to accept on faith that being social is better than not being social — or at least that being social needs to coexist with not being social.”

I actually approve of that mindset. Why? Because the alternative mindset, the one that I think that they’ve been operating under for the past decade, is that they won’t work on any domain or features until they see evidence of it in their logs. And that’s an eternal chicken-egg problem, if you know what I mean. Take query by humming, for example. Do people want to hum to find a song? Google goes to its logs, and looks. Nope, no humming happening here. So people must not want to query-hum.. Of course, the input box hasn’t actually allowed humming for most of the past ten years. See? Chicken-egg.

All I see them doing now is saying “Hey, we’re going to take it on faith that the egg exists. The only question now is how best to raise the chicken, from that egg. And for that, we’re still going to develop hypotheses and do experiments. But only because we’ve accepted the existence of the egg, without evidence.”

I know I haven’t had my second cup of coffee yet, but is it really the case that I’m attacking Google and you’re defending them? 🙂

Seriously, I understand where you’re coming from, and I mostly agree with your approach. I think you may be erring on the side of giving Ben-Yair too much benefit of the doubt, even as I could be accused of being overly literal.

Regardless, we all agree that you can’t predict how users might apply a new tool to satisfy future information needs simply by looking at how often they tried to meet similar past needs using an old tool. Cf. Henry Ford, faster horses.

I hope I’m never seen to be attacking Google.. only to be attacking Google’s ideas or what Google does. Similarly, I hope I’m never seen to be defending Google.. only to be defending Google’s ideas or what Google does.

It’s about the ideas, not about Google. Which is difficult to tease out, sometimes, because so many of the ideas are embodied in, and talked about in reference to, Google. Sigh.

But yes, we agree about Henry Ford / faster horse.

However, I am honestly surprised to hear such general agreement about that at Foo, at least under frieNDA. Maybe I’m talking to the wrong people, but the “taken on faith” belief that I’ve been hearing for the past decade from modern Web companies is that you *can* iteratively gradient-ascent your way into ever-better, constantly improving products. “We follow the user.. follow the user.. follow the user..” It’s a mantra.

Have you really not been hearing the dominance of that same mantra for the past decade?

And to invent the car is to not follow the user, because what the user wants (what the user leads, for you to follow) is a better saddle for their horse.