
Earlier this year, I discovered there wasn’t really a name for the thing I wanted to talk about. I wanted a word or phrase that includes journalism, social media, search engines, libraries, Wikipedia, and parts of academia, the idea of all these things as a system for knowledge and communication. But there is no such word. Nonetheless, this is an essay asking what all this stuff should do together.

What I see here is an ecosystem. There are narrow real-time feeds such as expertly curated Twitter accounts, and big general reference works like Wikipedia. There are armies of reporters working in their niches, but also colonies of computer scientists. There are curators both human and algorithmic. And I have no problem imagining that this ecosystem includes certain kinds of artists and artworks. Let’s say it includes all public acts and systems which come down to one person trying to tell another, “I didn’t just make this up. There’s something here of the world we share.”

I asked people what to call it. Some said “media.” That captures a lot of it, but I’m not really talking about the art or entertainment aspects of media. Also I wanted to include something of where ideas come from, something about discussions, collaborative investigation, and the generation of new knowledge. Other people said “information” but there is much more here than being informed. Information alone doesn’t make us care or act. It is part of, but only part of, what it means to connect to another human being at a distance. Someone else said “the fourth estate” and this is much closer, because it pulls in all the ideas around civic participation and public discourse and speaking truth to power, loads of stuff we generally file under “democracy.” But the fourth estate today means “the press” and what I want to talk about is broader than journalism.

I’m just going to call this the “digital public sphere”, building on Jürgen Habermas’ idea of a place for the discussion of shared concerns, public yet apart from the state. Maybe that’s not a great name — it’s a bit dry for my taste — but perhaps it’s the best that can be done in three words, and it’s already in use as a phrase to refer to many of the sorts of things I want to talk about. “Public sphere” captures something important, something about the societal goals of the system, and “digital” is a modifier that means we have to account for interactivity, networks, and computation. Taking inspiration from Michael Schudson’s essay “Six or seven things that news can do for democracy,” I want to ask what the digital public sphere can do for us. I think I see three broad categories, which are also three goals to keep in mind as we build our institutions and systems.

1. Information. It should be possible for people to find things out, whatever they want to know. Our institutions should help people organize to produce valuable new knowledge. And important information should automatically reach each person at just the right moment.

2. Empathy. The vast majority of people in the world, we will only know through media. We must strive to represent the “other” to each other with compassion and reality. We can’t forget that there are people on the other end of the wire.

3. Collective action. What good is public deliberation if we can’t eventually come to a decision and act? But truly enabling the formation of broad agreement also requires that our information systems support conflict resolution. In this age of complex overlapping communities, this role spans everything from the local to the global.

Each of these is its own rich area, and each of these roles already cuts across many different forms and institutions of media.

Information
I’d like to live in a world where it’s cheap and easy for anyone to satisfy the following desires:

“I want to learn about X.”

“How do we know that about X?”

“What are the most interesting things we don’t know about X?”

“Please keep me informed about X.”

“I think we should know more about X.”

“I know something about X and want to tell others.”

These desires span everything from mundane queries (“what time does the store close?”) to complex questions of fact (“what will be the effects of global climate change?”). And they apply at all scales; I might have a burning desire to know how the city government is going to deal with bike lanes, or I might be curious about the sum total of humanity’s knowledge of breast cancer: everything we know today, plus all the good questions we can’t yet answer. Different institutions exist to address each of these needs in various ways. Libraries have historically served the need to answer specific questions, desires #1 and #2, but search engines also do this. Journalism strives to keep people abreast of current events, the essence of #4. Academia has focused on how we know and what we don’t yet know, which is #2 and #3.

This list includes two functions related to the production of new knowledge, because it seems to me that the public information ecosystem should support people working together to become collectively smarter. That’s why I’ve included #5, which is something like casting a vote for an unanswered question, and #6, the peer-to-peer ability to provide an answer. These seem like key elements in the democratic production of knowledge, because the resources which can be devoted to investigating answers are limited. There will always be a finite number of people well placed to answer any particular question, whether those people are researchers, reporters, subject matter experts, or simply well-informed. I like to imagine that their collective output is dwarfed by human curiosity. So efficiency matters, and we need to find ways to aggregate the questions of a community, and route each question to the person or people best positioned to find out the answer.

In the context of professional journalism, this amounts to asking what unanswered questions are most pressing to the community served by a newsroom. One could devise systems of asking the audience (like Quora and StackExchange) or analyze search logs (à la Demand Media). That newsrooms don’t frequently do these things is, I think, an artifact of industrial history, and an unfilled niche in the current ecosystem. Search engines know where the gaps between supply and demand lie, but they’re not in the business of researching new answers. Newsrooms can produce the supply, but they don’t have an understanding of the demand. Today, these two sides of the industry do not work together to close this loop. Some symbiotic hybrid of Google and The Associated Press might be an uncannily good system for answering civic questions.

When new information does become available, there’s the issue of timing and routing. This is #4 again, “please keep me informed.” Traditionally, journalism has answered the question “who should know when?” with “everyone everything as fast as possible” but this is ridiculous today. I really don’t want my phone to vibrate for every news article ever written, which is why only “important” stories generate alerts. But taste and specialization dictate different definitions of “important” for each person, and old answers delivered when I need them might be just as valuable as new information delivered hot and fresh. Google is far down this track with its thinking on knowing what I want before I search for it.

Empathy
There is no better way to show one person to another, across a distance, than the human story. These stories about other people may be informative, sure, but maybe their real purpose is to help us feel what it is like to be someone else. This is an old art; one journalist friend credits Homer with the last major innovation in the form.

But we also have to show whole groups to each other, a very “mass media” goal. If I’ve never met a Cambodian or hung out with a union organizer, I only know what I see in the media. How can and should entire communities, groups, cultures, races, interests or nations be represented?

A good journalist, anthropologist, or writer can live with a community for a while, observing and learning, then articulate generalizations. This is important and useful. It’s also wildly subjective. But then, so is empathy. Curation and amplification can also be empathetic processes: someone can direct attention to the genuine voices of a community. This “don’t speak, point” role has been articulated by Ethan Zuckerman and practiced by Andy Carvin.

But these are still at the level of individual stories. Who is representative? If I can only talk to five people, which five people should I know? Maybe a human story, no matter how effective, is just a single sample in the sense of a tiny part standing for the whole. Turning this notion around, making it personal, I come to an ideal: If I am to be seen as part of some group, then I want representations of that group to include me in some way. This is an argument that mass media coverage of a community should try to account for every person in that community. This is absurd in practical terms, but it can serve as a signpost, a core idea, something to aim for.

Fortunately, more inclusive representations are getting easier. Most profoundly, the widespread availability of peer-to-peer communication networks makes it easier than ever for a single member of a community to speak and be heard widely.

We also have data. We can compile the demographics of social movements, or conduct polls to find “public opinion.” We can learn a lot from the numbers that describe a particular population, which is why surveys and censuses persist. But data are terrible at producing the emotional response at the core of empathy. For most people, learning that 23% of the children in some state live in poverty lacks the gut-punch of a story about a child who goes hungry at the end of every month. In fact there is evidence that making someone think analytically about an issue actually makes them less compassionate.

The best reporting might combine human stories with broader data. I am impressed by CNN’s interactive exploration of American casualties in Iraq, which links mass visualization with photographs and stories about each individual. But that piece covers a comparatively small population, only a few thousand people. There are emerging techniques to understand much larger groups, such as by visualizing the data trails of online life, all of the personal information that we leave behind. We can visualize communities, using aggregate information to see the patterns of human association at all scales. I suspect that mass data visualization represents a fundamentally new way of understanding large groups, a way that is perhaps more inclusive than anecdotes yet richer than demographics. Also, visualization forces us into conversations about who exactly is a member of the community in question, because each person is either included in a particular visualization or not. Drawing such a hard boundary is often difficult, but it’s good to talk about the meanings of our labels.

And yet, for all this new technology, empathy remains a deeply human pursuit. Do we really want statistically unbiased samples of a community? My friend Quinn Norton says that journalism should “strive to show us our better selves.” Sometimes, what we need is brutal honesty. At other times, what we need is kindness and inspiration.

Collective action

What a difficult challenge advances in communication have become in recent decades. On the one hand they are definitely bringing us closer to each other, but are they really bringing us together?

I am sensitive to the idea of filter bubbles and concerns about the fragmentation of media, the worry that the personalization of information will create a series of insular and homogenous communities, but I cannot abide the implied nostalgia for the broadcast era. I do not see how one-size-fits-all media can ever serve a diverse and specialized society, and so: let a million micro-cultures bloom! But I do see a need for powerful unifying forces within the public sphere, because everything from keeping a park clean to tackling global climate change requires the agreement and cooperation of a community.

We have long had decision making systems at all scales — from the neighborhood to the United Nations — and these mechanisms span a range from very lightweight and informal to global and ritualized. In many cases decision-making is built upon voting, with some majority required to pass, such as 51% or 66%. But is a vicious, hard-fought 51% in a polarized society really the best we can do? And what about all the issues that we will not be voting on — that is to say, most of them?

Unfortunately, getting agreement among even very moderate numbers of people seems phenomenally difficult. People disagree about methods, but in a pluralistic society they often disagree even more strongly about goals. Sometimes presenting all sides with credible information is enough, but strongly held disagreements usually cannot be resolved by shared facts; experimental work shows that, in many circumstances, polarization deepens with more information. This is the painful truth that blows a hole in ideas like “informed public” and “deliberative democracy.”

Something else is needed here. I want to bring the field of conflict resolution into the digital public sphere. As a named pursuit with its own literature and community, this is a young subject, really only begun after World War II. I love the field, but it’s in its infancy; I think it’s safe to say that we really don’t know very much about how to help groups with incompatible values find acceptable common solutions. We know even less about how to do this in an online setting.

But we can say for sure that “moderator” is an important role in the digital public sphere. This is old-school internet culture, dating back to the pre-web Usenet days, and we have evolved very many tools for keeping online discussions well-ordered, from classic comment moderation to collaborative filtering, reputation systems, online polls, and various other tricks. At the edges, moderation turns into conflict resolution, and there are tools for this too. I’m particularly intrigued by visualizations that show where a community agrees or disagrees along multiple axes, because the conceptually similar process of “peace polls” has had some success in real-world conflict situations such as Northern Ireland. I bet we could also learn from the arduously evolved dispute resolution processes of Wikipedia.

It seems to me that the ideal of legitimate community decision making is consensus, 100% agreement. This is very difficult, another unreachable goal, but we could define a scale from 51% agreement to 100%, and say that the goal is “as consensus as possible” decision making, which would also be “as legitimate as possible.” With this sort of metric, and always remembering that the goal is to reach a decision on a collective action, not to make people agree for the sake of it, we could undertake a systematic study of online consensus formation. For any given community, for any given issue, how fragmented is the discourse? Do people with different opinions hang out in different places online? Can we document examples of successful and unsuccessful online consensus formation, as has been done in the offline case? What role do human moderators play, and how can well-designed social software contribute? How do the processes of online agreement and disagreement play out at different scales and under different circumstances? How do we know when the process has converged to a “good” answer, and when it has degraded into hegemony or groupthink? These are mostly unexplored questions. Fortunately, there’s a huge amount of related work to draw on: voting systems and public choice theory, social network analysis, cognitive psychology, information flow and media ecosystems, social software design, issues of identity and culture, language and semiotics, epistemology…
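One way to make the scale from bare majority to full consensus concrete is to rescale the share held by the leading position. The sketch below is only an illustration: the function name, the sample votes, and the choice of 50% as the floor (rather than exactly 51%) are all assumptions, not something taken from an existing system.

```python
from collections import Counter

def consensus_score(positions):
    """Rescale the leading position's share from [0.5, 1.0] onto [0.0, 1.0].

    Hypothetical metric: 0.0 means no more than a bare majority,
    1.0 means everyone holds the same position.
    """
    if not positions:
        raise ValueError("need at least one stated position")
    counts = Counter(positions)
    top_share = counts.most_common(1)[0][1] / len(positions)
    # Below a bare majority there is no decision to score, so clamp at zero.
    return max(0.0, (top_share - 0.5) / 0.5)

# Hypothetical example: ten residents, eight of whom back the same option.
votes = ["bike lanes"] * 8 + ["parking"] * 2
print(consensus_score(votes))
```

A single number like this hides who disagrees and why, so it would be at most a starting point for the questions above.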

I would like conflict resolution to be an explicit goal of our media platforms and processes, because we cannot afford to be polarized and grid-locked while there are important collective problems to be solved. We may have lost the unifying narrative of the front page, but that narrative was neither comprehensive nor inclusive: it didn’t always address the problems of concern to me, nor did it ask me what I thought. Effective collective action, at all relevant scales, seems a better and more concrete goal than “shared narrative.” It is also an exceptionally hard problem — in some ways it is the problem of democracy itself — but there’s lots to try, and our public sphere must be designed to support this.

Why now?
I began writing this essay because I wanted to say something very simple: all of these things (journalism, search engines, Wikipedia, social media and the lot) have to work together to common ends. There is today no one profession which encompasses the entirety of the public sphere. Journalism used to be the primary bearer of these responsibilities, or perhaps that was a well-meaning illusion sprung from near monopolies on mass information distribution channels. Either way, that era is now approaching two decades gone. Now what we have is an ecosystem, and in true networked fashion there may not ever again be a central authority. From algorithm designers to dedicated curators to, yes, traditional on-the-scene pro journalists, a great many people in different fields now have a part in shaping the digital public sphere. I wanted to try to understand what all of us are working toward. I hope that I have at least articulated goals that we can agree are important.

There are in fact no masses; there are only ways of seeing people as masses. – Raymond Williams

Who are the masses that the “mass media” speaks to? What can it mean to ask what “teachers” or “blacks” or “the people” of a country think? These words are all fiction, a shorthand which covers over our inability to understand large groups of unique individuals. Real people don’t move in homogeneous herds, nor can any one person be neatly assigned to a single category. Someone might view themselves simultaneously as the inhabitant of a town, a new parent, and an active amateur astronomer. Now multiply this by a million, and imagine trying to describe the overlapping patchwork of beliefs and allegiances.

But patterns of association leave digital traces. Blogs link to each other, we have “friends” and “followers” and “circles,” we share interesting tidbits on social networks, we write emails, and we read or buy things. We can visualize this data, and each type of visualization gives us a different answer to the question “what is a community?” This is different from the other ways we know how to describe groups. Anecdotes are tiny slices of life that may or may not be representative of the whole, while statistics are often so general as to obscure important distinctions. Visualizations are unique in being both universal and granular: they have detail at all levels, from the broadest patterns right down to individuals. Large scale visualizations of the commonalities between people are, potentially, a new way to represent and understand the public — that is, ourselves.
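As a rough illustration of how link traces can answer “what is a community?”, the sketch below treats each link as an undirected edge and groups names into connected components. This is the crudest possible definition of a community, and the blog names and the `link_communities` helper are hypothetical; real visualizations use far richer layouts and clustering.

```python
from collections import defaultdict, deque

def link_communities(links):
    """Group names into connected components of an undirected link graph."""
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            node = queue.popleft()
            group.append(node)
            for nxt in sorted(neighbors[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        groups.append(sorted(group))
    return groups

# Hypothetical link data: who links to whom.
links = [("blog_a", "blog_b"), ("blog_b", "blog_c"), ("blog_x", "blog_y")]
groups = link_communities(links)
print(groups)
```

Choosing a different trace (followers, email, purchases) or a different grouping rule would draw different boundaries around the same people.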

I’m going to go through the major types of community visualizations that I’ve seen, and then talk about what I’d like to do with them. Like most powerful technologies, large scale visualization is a capability that can also be used to oppress and to sell. But I imagine social ends, worthwhile ways of using visualization to understand the “public” not as we imagine it, but as something closer to how we really exist.

There is something extraordinarily rich in the intersection of computer science and journalism. It feels like there’s a nascent field in the making, tied to the rise of the internet. The last few years have seen calls for a new class of “programmer journalist” and the birth of a community of hacks and hackers. Meanwhile, several schools are now offering joint degrees. But we’ll need more than competent programmers in newsrooms. What are the key problems of computational journalism? What other fields can we draw upon for ideas and theory? For that matter, what is it?

I’d like to propose a working definition of computational journalism as the application of computer science to the problems of public information, knowledge, and belief, by practitioners who see their mission as outside of both commerce and government. This includes the journalistic mainstay of “reporting” — because information not published is information not known — but my definition is intentionally much broader than that. To succeed, this young discipline will need to draw heavily from social science, computer science, public communications, cognitive psychology and other fields, as well as the traditional values and practices of the journalism profession.

“Computational journalism” has no textbooks yet. In fact the term is barely recognized. The phrase seems to have emerged at Georgia Tech in 2006 or 2007. Nonetheless I feel like there are already important topics and key references.

Data journalism
Data journalism is obtaining, reporting on, curating and publishing data in the public interest. The practice is often more about spreadsheets than algorithms, so I’ll suggest that not all data journalism is “computational,” in the same way that a novel written on a word processor isn’t “computational.” But data journalism is interesting and important and dovetails with computational journalism in many ways.

The web is a linked system of human-readable documents. Now Tim Berners-Lee wants to create a web of machine-readable linked data. The full potential is unclear, but it’s a big idea that may come to be the backbone of semantic web visions. The New York Times, The Guardian, and others are experimenting with open data APIs.

Visualization
Big data requires powerful exploration and storytelling tools, and increasingly that means visualization. But there’s good visualization and bad visualization, and the field has advanced tremendously since Tufte wrote The Visual Display of Quantitative Information. There is lots of good science that is too little known, and many open problems here.

Tamara Munzner’s chapter on visualization is the essential primer. She puts visualization on rigorous perceptual footing, and discusses all the major categories of practice. Absolutely required reading for anyone who works with pictures of data.

Computational linguistics
Data is more than numbers. Given that the web is designed to be read by humans, it makes heavy use of human language. And then there are all the world’s books, and the archival recordings of millions of speeches and interviews. Computers are slowly getting better at dealing with language.

Reuters maintains the OpenCalais entity extraction service, which parses text to contextually determine who and what is referenced.

IBM’s Watson project built a question-answering system that reads reference books and wins at Jeopardy. Imagine how useful to journalists and curious readers this could be! This paper on the DeepQA system describes how they did it.

Communications technology and free speech
Code is law. Because our communications systems use software, the underlying mathematics of communication lead to staggering political consequences, including whether or not it is possible for governments to verify online identity or remove things from the internet. The key topics here are networks, cryptography, and information theory.

Anonymity is deeply important to online free speech, and very hard. The Tor project is the outstanding leader in anonymity-related research.

Information theory is stunningly useful across almost every technical discipline. Pierce’s short textbook is the classic introduction, while Tom Schneider’s Information Theory Primer seems to be the best free online reference.
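For a taste of what information theory measures, Shannon entropy gives the average number of bits needed per symbol: H = sum over symbols of p * log2(1/p). The sketch below estimates it from the symbol frequencies in a string; the function name and sample strings are only illustrative.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Entropy in bits per symbol of the empirical distribution of `symbols`."""
    counts = Counter(symbols)
    total = len(symbols)
    # Each symbol with probability p contributes p * log2(1/p) bits.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

print(shannon_entropy("aaaa"))  # one symbol only: nothing to learn
print(shannon_entropy("abab"))  # two equally likely symbols
print(shannon_entropy("abcd"))  # four equally likely symbols
```

The more evenly spread the symbols, the higher the entropy, which is why a predictable message compresses well and a surprising one does not.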

Tracking the spread of information (and misinformation)
What do we know about how information spreads through society? Very little. But one nice side effect of our increasingly digital public sphere is the ability to track such things, at least in principle.

Memetracker was (AFAIK) the first credible demonstration of whole-web information tracking, following quoted soundbites through blogs and mainstream news sites and everything in between. Zach Seward has cogent reflections on their findings.
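To make the idea of following a soundbite concrete, here is a minimal sketch that pulls quoted phrases out of documents, normalizes them, and lists where each one appeared, oldest first. It is not Memetracker’s method: it only matches quotes that are identical after normalization, and the timestamps, source names, and 20-character minimum are assumptions chosen for the example.

```python
import re
from collections import defaultdict

# Quoted spans of at least 20 characters, in straight or curly double quotes.
QUOTE = re.compile(r'["“]([^"”]{20,})["”]')

def normalize(phrase):
    """Lowercase and drop punctuation so trivially different copies match."""
    return " ".join(re.findall(r"[a-z0-9']+", phrase.lower()))

def trace_quotes(documents):
    """Map each normalized quote to its (timestamp, source) sightings, oldest first."""
    sightings = defaultdict(list)
    for timestamp, source, text in documents:
        for phrase in QUOTE.findall(text):
            sightings[normalize(phrase)].append((timestamp, source))
    return {quote: sorted(seen) for quote, seen in sightings.items()}

# Hypothetical documents: (ISO timestamp, source, text).
docs = [
    ("2011-03-02T09:00", "blog.example",
     'She said "we will not raise taxes this year" on air.'),
    ("2011-03-01T18:30", "news.example",
     'The mayor: "We will not raise taxes this year."'),
]
trail = trace_quotes(docs)
for quote, seen in trail.items():
    print(quote, "first seen at", seen[0][1])
```

Real soundbites get shortened and paraphrased as they travel, so a serious tracker would also need to group near-matching variants.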

The Truthy Project aims for automated detection of astro-turfing on Twitter. They specialize in covert political messaging, or as I like to call it, computational propaganda.

We badly need tools to help us determine the source of any given online “fact.” There are many existing techniques that could be applied to the problem, as I discussed in a previous post.

If we had information provenance tools that worked across a spectrum of media outlets and feed types (web, social media, etc.) it would be much cheaper to do the sort of information ecosystem studies that Pew and others occasionally undertake. This would lead to a much better understanding of who does original reporting.

Filtering and recommendation
With vastly more information than ever before available to us, attention becomes the scarcest resource. Algorithms are an essential tool in filtering the flood of information that reaches each person. (Social media networks also act as filters.)

The paper on preference networks by Turyen et al. is probably as good an introduction as anything to the state of the art in recommendation engines, those algorithms that tell you what articles you might like to read or what movies you might like to watch.
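For readers who have never seen one, the sketch below shows about the simplest recommendation engine possible: suggest what similar readers read, weighting each reader by how much their history overlaps with yours (Jaccard similarity). This is a generic neighborhood approach, not the technique from the paper above, and the reader names and reading histories are made up.

```python
from collections import Counter

def recommend(history, user, k=3):
    """Suggest up to k unread items, scored by overlap with similar readers."""
    mine = history[user]
    scores = Counter()
    for other, theirs in history.items():
        if other == user:
            continue
        union = mine | theirs
        # Jaccard similarity: shared items divided by all items either has read.
        similarity = len(mine & theirs) / len(union) if union else 0.0
        for item in theirs - mine:
            scores[item] += similarity
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, score in ranked[:k] if score > 0]

# Hypothetical reading histories.
history = {
    "ana": {"budget", "bike-lanes"},
    "ben": {"budget", "bike-lanes", "transit"},
    "cy": {"sports", "weather"},
}
print(recommend(history, "ana"))
```

Note what this design implies: a reader with no overlap with anyone gets no suggestions at all, which is one small version of the filter bubble worry.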

Before Google News there was Columbia News Blaster, which incorporated a number of interesting algorithms such as multi-lingual article clustering, automatic summarization, and more as described in this paper by McKeown et al.

Any digital journalism product which involves the audience to any degree — that should be all digital journalism products — is a piece of social software, well defined by Clay Shirky in his classic essay, “A Group Is Its Own Worst Enemy.” It’s also a “collective knowledge system” as articulated by Chris Dixon.

Measuring public knowledge
If journalism is about “informing the public” then we must consider what happens to stories after publication; this is the “last mile” problem in journalism. Almost no such follow-up measurement happens in professional journalism today, aside from basic traffic analytics. The key question here is, how does journalism change ideas and action? Can we apply computers to help answer this question empirically?

UN Global Pulse is a serious attempt to create a real-time global monitoring system to detect humanitarian threats in crisis situations. They plan to do this by mining the “data exhaust” of entire societies — social media postings, online records, news reports, and whatever else they can get their hands on. Sounds like key technology for journalism.

Vox Civitas is an ambitious social media mining tool designed for journalists. Computational linguistics, visualization, and more.

Research agenda
I know of only one work which proposes a research agenda for computational journalism.

This paper presents a broad vision and is really a must-read. However, it deals almost exclusively with reporting, that is, finding new knowledge and making it public. I’d like to suggest that the following unsolved problems are also important:

Tracing the source of any particular “fact” found online, and generally tracking the spread and mutation of information.

Cheap metrics for the state of the public information ecosystem. How accurate is the web? How accurate is a particular source?

Techniques for mapping public knowledge. What is it that people actually know and believe? How polarized is a population? What is under-reported? What is well reported but poorly appreciated?

Information routing and timing: how can we route each story to the set of people who might be most concerned about it, or best in a position to act, at the moment when it will be most relevant to them?

This sort of attention to the health of the public information ecosystem as a whole, beyond just the traditional surfacing of new stories, seems essential to the project of making journalism work.

A recent study by WorldPublicOpinion.org shows that the majority of the American population believed false things about basic national issues, right before the 2010 mid-term elections. I don’t know how to interpret this as anything other than a catastrophic failure of American journalism, in its most fundamental, clichéd, “inform the public” role.

The most damning section of the report (PDF) is titled “Evidence of Misinformation Among Voters.”

The poll found strong evidence that voters were substantially misinformed on many of the issues prominent in the election campaign, including the stimulus legislation, the healthcare reform law, TARP, the state of the economy, climate change, campaign contributions by the US Chamber of Commerce and President Obama’s birthplace. In particular, voters had perceptions about the expert opinion of economists and other scientists that were quite different from actual expert opinion.

This study also found that Fox viewers were significantly more misinformed than average on many issues, which is mostly how this survey was covered in the blogosphere and mainstream news outlets. I think this Fox thing is a terrible diversion from the core problem: the American press did not succeed in informing the public. Not even right before an election, not even on the narrow set of issues that, by survey, voters cared to base their votes on.

The travesty here is that the relevant facts were instantly available from primary sources, such as the Congressional Budget Office and the Intergovernmental Panel on Climate Change. I interpret this failure in the following way: for many kinds of issues, the web makes it easy to find true information. But it doesn’t solve the problem of making people go look. That, perhaps, is a key role for modern journalism. Unfortunately, modern American journalism seems to be very bad at it. I imagine the same problem exists in the journalism of many other countries.

What the study actually says
The study compares what voters think experts believe with what those experts actually believe. This is a bit tricky, and the study isn’t saying that the experts are necessarily right, but we’ll get to that. First, some example findings:

53% of voters thought that economists believe that Obama’s health care reform plan will increase the deficit, while 29% said that economists were evenly divided on this issue. Only 13% said correctly that a majority of economists think that health care reform will not increase the deficit. (The Congressional Budget Office estimates a net reduction in deficits of $143 billion over 2010-2019, and the Boards of Trustees of the Medicare Fund also believe that the Affordable Care Act will “postpone the exhaustion of … trust fund assets.”)

12% of voters thought that “most scientists believe” that climate change is not occurring, while 33% thought scientists were evenly divided on the issue. That’s 45% with an incorrect perception, as opposed to the 54% who said, correctly, that most scientists think climate change is occurring. (Aside from the IPCC reports and virtually every governmental study of the issue worldwide, an April 2010 survey of climate scientists showed that 97% believe that human-caused climate change is occurring.)

A fussy but necessary digression: all of this rests on the reliability of the WorldPublicOpinion.org survey results. The survey was conducted by Knowledge Networks, Inc. using an online response panel randomly selected from the US population. Those without internet access were apparently provided it for free. I have been unable to find any serious independent evaluation of Knowledge Networks’ methodology, but their many research papers on sample design certainly talk the talk. All of the basic sampling errors, such as self-selection and language bias (what about Hispanics?) are at least addressed on paper. The margin of error is reported as 3.9%.

So let’s take these survey results as accurate, for the moment. This means that the majority of the American public had an incorrect conception of expert opinion on the issues that they voted on. That’s a mouthful. It’s not the same as “believed false things,” and in fact asking “what do you think experts believe” deliberately dodges the tricky question of what is true. If there is some misperception of expert belief, then in the strictest terms the public is misinformed. The study addresses this point as follows:

In most cases we inquired about respondents’ views of expert opinion, as well as the respondents’ own views. While one may argue that a respondent who had a belief that is at odds with expert opinion is misinformed, in designing this study we took the position that some respondents may have had correct information about prevailing expert opinion but nonetheless came to a contrary conclusion, and thus should not be regarded as ‘misinformed.’

So this study does not say “the American public are wrong about the economy and climate change.” It says that they haven’t really looked into it. I’m all for questioning authority’s claim to truth — anyone who follows my work knows that I’m generally a fan of Wikipedia, for example — but I believe we must take lifelong study and rigorous methodology seriously. To put it another way: voting contrary to the opinions of economists may be a fine thing, but voting without any awareness of their work is just silly. Yet that seems to be exactly what happened in the last election.

The role of the press, then and now
Of course, voting is hard and stuff is complex, which is why we rely on the media to break it all down for us. The sad part is that economics and climate change are familiar ground for journalists. It’s not like the facts of these issues were not published in mainstream news outlets. For that matter, journalists were not even necessary here. Any citizen with a web browser could have found out exactly what the Affordable Care Act was predicted to do to the deficit. The Congressional Budget Office published their report and then blogged about it in plain language.

Maybe publishing the truth was never enough. Maybe journalism never actually “informed the public,” but merely created conditions where the curious could get themselves informed by diligently reading the news. But on big issues like whether a piece of national legislation will affect the deficit, we no longer need professionals to enable this kind of self-motivated discovery. The sources go direct in such cases, as the Congressional Budget Office did. And do we really expect that the social media sphere — that’s all of us — will remain silent about the next big global warming study? We’re all going to use Facebook etc. to share links to the next IPCC report when it comes out.

If the problem of having access to true information about these sorts of “votable issues” is solved by the web, what isn’t solved by the web is getting every voter to go look at least once. That might be a job for informed professionals at the helm of big media channels. This is a big responsibility for a news organization to take on, but I don’t see how it’s anything but the corollary to the responsibility to only publish true information. Presumably some of that information is important enough to know, so consumers would probably appreciate the idea that your mission is to ensure they are informed.

I suspect that paper-based habits are holding journalism back here. There is a deeply ingrained newsroom emphasis on reporting only what’s “new.” A budget report only gets to be news once, even if what it says is relevant for years. But there are no “editions” online; the same headline can float on the hot topics list for as long as it’s relevant. There is even more reason to keep directing attention to an issue if people are actively discussing it, if it is greatly polarized, or if there’s a lot of spin around it (see: the rise of fact-check journalism). In any case, journalists have long been good at keeping an issue in the news, by advancing the story daily in one way or another. But first they have to know what the public doesn’t know.

So the burning question that the World Public Opinion study leaves me with is just this: why wasn’t it a news organization that commissioned this survey?

UPDATE: Debrouwere continues the conversation with a response to the key points here, in the comments to his original post.

Dutch journalist/coder Stijn Debrouwere has written a very thorough post describing the ways in which standard tags, like the ones on this blog or on Flickr, fall short when applied to news articles. There are lots of things we might like to know about a story, such as where and when it happened and who was involved. This additional information, sort of like the index to a book, is known as “metadata”, and there is within the online journalism community a great call for its use, including by Debrouwere:

Each story could function as part of a web of knowledge around a certain topic, but it doesn’t.

So here’s a well-intentioned idea you’ve heard before: journalists should start tagging. Jay Rosen insists that “getting disciplined and strategic about tagging may be one way professional journalism separates itself from the flood of cheap content online.” Tags can show how a news article relates to broader themes and topics. Just the ticket.

News metadata is a major topic, and many people have speculated deeply about the value of creating news metadata at the time of reporting, such as the ever-sarcastic Xark and the thoughtful Martin Belam who writes about why “linked data” is good for journalism. But I’m going to respond to Debrouwere because I read him today, because he has lovely diagrams that explain his good ideas, and because, in criticizing “tags” as a form of metadata, I think he misses some very important points.

And he’s not alone. My sense is that many of the coder-journalists of today have not learned from the mistakes of generations of technically-minded people who wished to talk about the world in more precise ways.

Moving forward from simple tagging, Debrouwere imagines more sophisticated annotation schemes that start to pick up on what the tags actually mean. For starters, the tags could be drawn from separate “vocabularies.” Does a tag refer to a person, or a place, or perhaps an event? Debrouwere uses the following picture, which I’m going to borrow here because it explains the idea so nicely:
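The vocabulary idea can also be sketched in a few lines of Python. This is only an illustration, not Debrouwere's diagram or scheme: the vocabulary names, the `Tag` class, the `tags_in` helper and the example story are all hypothetical. The point is that a tag carrying its vocabulary can tell "Washington" the person apart from "Washington" the place, which a bare string cannot.

```python
from dataclasses import dataclass

# Hypothetical vocabularies; a real newsroom would choose its own.
VOCABULARIES = {"person", "place", "event"}


@dataclass(frozen=True)
class Tag:
    """A tag that records which vocabulary its label is drawn from."""
    vocabulary: str
    label: str

    def __post_init__(self):
        # Reject tags from vocabularies we have not defined.
        if self.vocabulary not in VOCABULARIES:
            raise ValueError(f"unknown vocabulary: {self.vocabulary!r}")


def tags_in(tags, vocabulary):
    """Return the labels of all tags drawn from one vocabulary."""
    return [tag.label for tag in tags if tag.vocabulary == vocabulary]


# Made-up example: the same label appears in two vocabularies.
story_tags = [
    Tag("person", "Washington"),
    Tag("place", "Washington"),
    Tag("event", "Budget vote"),
]

if __name__ == "__main__":
    print(tags_in(story_tags, "place"))
    print(Tag("person", "Washington") == Tag("place", "Washington"))
```

With plain tags both Washingtons would collapse into a single string; here they remain distinct, and a story can be queried by kind ("which places does this story mention?").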

I found this awesome little video exploring the idea of plugging in someone else’s brain for a while to see how they see the world.

I sort of feel like this is what I’m doing when I hang out with certain people, or when I read certain authors or watch certain films. It’s always exhilarating to step inside someone else’s exquisitely constructed universe. Communication excites me.

This is from TV Ontario’s YouTube channel — that would be in Canada, folks, and the purveyor of my childhood television. My mom used to direct shows for them. Glad to see they’re still doing the occasional interesting thing.

I was told in grade school that the giraffe’s neck evolved to be long because taller giraffes could reach more tasty tree leaves in times of drought. It’s a lovely example of natural selection, and also completely wrong, as I discovered when researching an edit to the Wikipedia article. Eventually, someone just went and checked: it turns out that during times of drought or food scarcity, giraffes eat from low bushes.

There is an important lesson here about what it means to “explain something.”

Rudyard Kipling wrote a children’s book of myths about the origins of animals titled Just-So Stories. In it he explains the origin of the elephant’s trunk, how the camel got his hump, and where the leopard’s spots came from (they were drawn by an Ethiopian from the leftover black of his own dark skin, so that the leopard would better blend into the background when they hunted zebra together). Clearly, making sense is not the criterion for truth. It’s very easy to forget this, when someone gives you a complex explanation and you get that “aha! I understand” feeling. Human beings constantly confuse congruence with truth.

Sensible and false explanations are such a problem in science that the term “just-so story” has come to refer to any sort of explanation that fits the facts, but cannot be verified. Scientific theories are supposed to differ from literary criticism and other forms of creative writing by demanding explanations that are true. This means testing them against reality.

A crucial point here: you can’t test a theory against the same facts that you used to come up with the theory to begin with. Of course a theory is going to fit the facts that inspired it! Instead, a theory, meaning an explanation of something, needs to predict things that haven’t been observed yet. Prediction is the essence of science; it is the ability to say what will happen before it happens that makes it possible to “design” a bicycle rather than just gluing random objects together until they roll. If our aim is to come up with a true theory about evolution, we need to use the length of the giraffe’s neck to make predictions about something else, something we can go check (repeatedly, if we are serious about testing the theory).

This seemingly philosophical notion is incredibly useful for spotting subtle bullshit that sounds like science.

Consider, for example, the trial of a vitamin for preventing the common cold. Let’s say it’s even a controlled trial. One hundred volunteers are given Vitamin Z daily, while another hundred are (unknowingly) given a placebo. At the end of the study, the Vitamin Z group has had the same number of colds as the placebo group. But, the researchers discover as they analyze the data, the Vitamin Z group had fewer headaches. Does this mean Vitamin Z prevents headaches? Not necessarily, because the theory “Vitamin Z prevents headaches” was formulated by noticing a pattern, any pattern, then making up a story about how that pattern came to be. That doesn’t make the story true. And there will always be patterns. If the volunteers can suffer from hundreds of different ailments, then by sheer dumb chance the Vitamin Z group will almost certainly be found to suffer less from at least one of them. (Applied to controlled experiments, this notion can be made mathematically precise, by the way. See post-hoc analysis.)
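To put rough numbers on "sheer dumb chance," here is a minimal Python sketch. It rests on assumptions that are not part of the Vitamin Z story: each ailment is treated as an independent comparison, a result counts as a "finding" at the conventional 5% level, and every ailment strikes 20% of volunteers in both groups. The function names are made up for illustration. Under those assumptions, the chance of at least one false alarm among m comparisons is 1 - (1 - 0.05)^m, which is already above 99% for m = 100.

```python
import random


def chance_of_false_alarm(num_ailments: int, alpha: float = 0.05) -> float:
    """Probability that at least one of num_ailments independent comparisons
    looks 'significant' at level alpha when the vitamin does nothing."""
    return 1.0 - (1.0 - alpha) ** num_ailments


def biggest_lucky_gap(num_ailments: int, group_size: int = 100,
                      base_rate: float = 0.2, seed: int = 0) -> int:
    """Simulate a useless vitamin: both groups get each ailment at the same
    base_rate (an assumed figure). Return the largest gap of
    (placebo cases - vitamin cases) across all ailments, i.e. the
    best-looking 'benefit' that luck alone produced."""
    rng = random.Random(seed)
    best_gap = -group_size
    for _ in range(num_ailments):
        vitamin = sum(rng.random() < base_rate for _ in range(group_size))
        placebo = sum(rng.random() < base_rate for _ in range(group_size))
        best_gap = max(best_gap, placebo - vitamin)
    return best_gap


if __name__ == "__main__":
    for m in (1, 10, 100):
        print(m, "ailments:", round(chance_of_false_alarm(m), 3))
    print("largest chance gap over 200 ailments:", biggest_lucky_gap(200))
```

The simulation never gives the vitamin any effect at all, so whatever gap it reports is a pattern with no cause behind it, which is exactly the raw material of a just-so story.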

Put another way, if you keep turning over rocks you will eventually find something. The whole point of a theory — an explanation, a model, a statement of the causal relationships of reality — is to say what you will find before the rock is turned over. Otherwise you only have a story that fits the facts, a just-so story.

I have found just-so stories to be most common in alternative medicine, economics, and evolutionary explanations of human behavior. If nothing testable has been predicted, then nothing has been “explained.”

We dream of the internet as a great public meeting place where all the world’s cultures interact and learn from one another, but it is far less than that. We are separated from one another by language, culture, and the normal tendency to seek out only what we already know. In reality the net is cliquish and insular. We each live in our own little corner, only dimly aware of the world of information just outside. In this the internet is no different from normal human life, where most people still die within a few kilometers of their birthplace. Nonetheless, we all know that there is something else out there: we have maps of the world. We do not have maps of the web.

I have met people who have never seen a world map. I once had a conversation with herders in the south Sahara who asked me if Canada was in Europe. As we talked I realized that the patriarch of the settlement couldn’t name more than half a dozen countries, and had no idea how long it might take to get to any of the ones he did know. He simply had no notion of how big the planet was. And to him, the world really is small: he lives in the desert, occasionally catches a ride to town for supplies, and will never leave the country in which he was born.

Online, we are all that man. Even the most global and sophisticated among us does not know the true scope of our informational world. Statistics on the “size” of the web are surprisingly hard to come by and even harder to grasp; learning that there are a trillion unique URLs is like being told that the land area of the Earth is 148 million square kilometers. We really have no idea what we’re missing, no visceral experience that teaches our ignorance.

Despite spending the last several days reading up on Treasury Secretary Geithner’s plan to buy bad bank assets, I now feel only marginally better prepared to judge whether this is a good idea or not. Of course, no one is asking me, but I still think it’s a big problem that I can’t evaluate this plan, because the fact that we live in a democracy means that citizens need to be able to understand what their government is doing.

Now, I am no economist and I have no idea how to run a bank — much less all the banks. However, I am smart, interested, and I’ve done my homework, including previously reading a first year economics textbook (covering both micro- and macro-economics) and several other interesting books (1,2,3) on how markets work or don’t. In short I have been the model of a concerned citizen, and I still have no idea what is going on. This is partially because the situation is very complex, but it is also because there is no way a private citizen can get access to the data that would clarify matters — large banks will barely share their balance sheets with the government, much less me.

This is a problem. It means that the government, financial, and academic communities have not paid nearly enough attention either to basic economics education or to transparency in real-world business. It is therefore impossible for anyone else to check their assumptions and restrain their huge power. Lest this sound like unhelpful complaining, I promise to make a concrete suggestion for improvement by the end of this post.

Wikileaks is often in the news, but for the wrong reasons. The web site provides a highly public outlet for “classified, censored, or otherwise restricted material of political, diplomatic, or ethical significance.” It is designed to be a journalistic tool for whistle-blowers and citizens of oppressive government and corporate regimes, a place of first and last resort for sensitive information from sources who need protection. It is a great irony, then, that an organization which specializes in censored information only makes the news when somebody violently objects.

I first stumbled upon Wikileaks about a year ago and have been watching it closely ever since. Despite its mission of openness, the site has a certain mystery about it: nowhere on the site are the principals publicly named. I was delighted, then, to attend a talk by two of the Wikileaks founders at the 25th Annual Chaos Communication Congress in Berlin. The 50-minute presentation was titled Wikileaks vs. The World, or “a talk about some conclusions observing Wikileaks.”

You may have heard about some of the things we’ve done in the media, but what you hear about tends to be what is frequently of greatest salacious interest to the Western media and to people in general. That doesn’t tend to be our everyday work.