Archives

Writing style needs to change to take advantage of the hyperlink. That’s the message I want to inject into the discussion about whether deep, long-form writing can survive online, especially long-form journalism. Many people assert that articles need to be shorter online than they are in print, and Nic Carr even famously argues that the internet is making us stupid by destroying our attention span.

But I don’t think the web is shallow at all. I think it’s the deepest medium ever invented, with incredible potential for telling complex, irreducible stories — or at least it can be, if you don’t treat it like print.

Here’s how a long story works in print:

The yellow bits are the unique information that can only be found on that page. The other bits of writing are there to provide the context needed to make sense of the whole, or to get the reader up to speed if they don’t know the backstory. In print, this is absolutely necessary, because a print story is a self-contained object. There is nowhere else the reader can go to look up an unfamiliar term, to check out a sub-plot, or to investigate the history. That is why we have context paragraphs. That is why we have little definitions of terms that might be unfamiliar to some, and explanations of who a source is and why we should believe them.

Online is different. We can move the context, verification, and background out of the main story, paring the piece down to a thing of streamlined beauty — but all the depth is still there, via links, for anyone who wants it.

When I try to understand how the internet changes communication, one of the points I keep coming back to is the personal nature of online media: it’s now possible to present a different experience to every single reader (user? viewer?). Choosing whether or not to follow a link is a simple way for a reader to tailor the presentation to what they already know, and to indulge their own curiosity. And links let you skip the boring bits.

We’re used to thinking of “an article” as a self-contained unit of story. It’s not. The component parts of a story might even be written at different times, published on different sites, or created by different authors.

And now our diagram looks like the web actually looks: lots of pages from different people about different aspects of a very complex world. That is the medium we write in, not some simulation of a stack of paper, no matter what your word processor shows you. Of course, the web also allows fully interactive stories, but it’s often forgotten that hypertext itself is an interactive medium, or it can be, when we put the right links in the right places. People have been experimenting with non-linear stories for decades, but given that a generation or two has now grown up with hypertext, it’s probably time to let our storytelling style grow up too.

Online, short doesn’t necessarily mean shallow. We’re just measuring “depth” wrong when we only look at a single article. That’s not how people actually consume the web, and we shouldn’t force them to.

There was a dream that the internet would mean the end of the media gatekeeper; that anyone could get their message out without having to get the attention and approval of the media powers that be. This turns out to be not quite the case.

I took data from the Project for Excellence in Journalism’s State of the News Media 2010 report to create this chart showing the market share of the top 20 news web sites. In theory, the internet busts media monopolies by allowing anyone to publish for free. And there’s no doubt it’s been disruptive. But according to data from Nielsen, the top 7% of 4600 news and information sites get 80% of traffic (from American viewers). We see a big concentration of power, as the rapid falloff in the chart above shows, and much of it still belongs to “old media.”
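One way to make the concentration figure concrete is to compute the share of total traffic captured by the top fraction of sites. The sketch below is illustrative only: the `top_share` helper and the five-site traffic list are hypothetical, not the Nielsen or PEJ data.

```python
def top_share(traffic, fraction):
    """Return the share of total traffic captured by the top `fraction` of sites."""
    ranked = sorted(traffic, reverse=True)
    # Always count at least one site, even for very small fractions.
    top_n = max(1, round(fraction * len(ranked)))
    return sum(ranked[:top_n]) / sum(ranked)


# The Nielsen figure quoted above: 7% of 4600 sites.
top_site_count = round(0.07 * 4600)  # 322 sites

# Hypothetical traffic counts for five made-up sites (not real data).
example_traffic = [50, 30, 10, 5, 5]
example_share = top_share(example_traffic, 0.4)  # top 2 of 5 sites
```

In this toy list the top 40% of sites hold 80% of the traffic. The real distribution is far more lopsided: about 322 sites out of 4600 hold that same share.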

Organizations such as CNN, Fox, the New York Times and USA Today rank in the top 20. But so do new media giants AOL, Google News, The Huffington Post and Yahoo.com, which is the biggest news site of all.

(It’s also interesting to note that many of the top 20 new media news sites produce little or none of their own news; in the extreme case Google News produces no stories at all of its own. While some see aggregation as parasitic, I think it’s obvious that it delivers a tremendously valuable service to readers.)

For better or worse, the ability to publish anything nearly for free hasn’t meant the end of big media monopolies. It’s simply shifted the landscape and the power balance.

The limiting factor to getting your message out is no longer having access to an expensive printing press or a TV station. It’s attention: how many minutes of time can you get from how many people? In this game, brand still matters hugely. There are only so many URLs a person can remember, only so many sites they can check in a day.

You have an audience, or you don’t. Mindshare is now the barrier to entry in the media world. Perhaps it always was, though I daresay it was easier to get viewers to check out your new television network when there were only 13 channels. Online, the number of channels is infinite for all intents and purposes; a single person will never exhaust them all.

Which is not to say that the internet has changed nothing. We have seen over and over that bottom-up effects can propel something to mass attention, with no big company behind them. This is often called “going viral,” but that’s not quite a broad enough description of the effect. In many cases, what happens is that something becomes just popular enough to get picked up by mainstream media, who then propel it into the spotlight.

And what this PEJ top 20 list doesn’t take into account is that people now get online news from lots and lots of sources other than news websites.

Facebook is now the most widely used news reading program. It’s also now the #1 site on the internet. Should it top this chart of news sources? Meanwhile, Twitter has become a primary news source for very many people. And then there are mobile news apps, some of which belong to old media news organizations and some of which don’t. The richness of news distribution systems today is well captured in another PEJ report on the “participatory news consumer.”

So has the internet made it easier to get non-mainstream messages out? I think the answer can only be yes. But don’t expect that anyone will be reading your alternative narratives just because you’ve put them online. Your best bet to be heard still lies with a small number of very large companies. And although the internet per se is relatively uncensored in many countries, commercial gatekeepers like Apple and Facebook own important dedicated channels, and both of them engage in censorship (1, 2).

Leo Laporte of This Week in Tech gave a truly marvelous talk on Friday about how his online journalism model works. The first half of the talk is all about how TWIT moved from TV to podcasting and became profitable, and includes such gems as

Advertisers have been smoking the Google and Facebook crack. And they no longer want that shakeweed that the [TV] networks are offering.

The second half is in many ways even better, when Leo takes questions from the audience and discusses topics such as the future of printing news on dead trees

Maybe there will always be [paper] news, but it will be brought to you by your butler who has ironed it out carefully for you. It will be the realm of the rich person.

and the “holy calling” of being a journalist:

You reporters are really the monks of the information world. You labour in obscurity. You have to be driven by passion because you’re paid nothing. And you sleep on rocks.

He goes on to discuss the necessity of bidirectional communication, Twitter as the “emerging nervous system” of the net, etc. — all the standard new media stuff, but put very succinctly by someone who has deep experience in both old and new media. Very information-dense and enlightening!

Anything that’s hard to put into words is hard to put into Google. What are the right keywords if I want to learn about 18th century British aristocratic slang? What if I have a picture of someone and I want to know who it is? How do I tell Google to count the number of web pages that are written in Chinese?

We’ve all lived with Google for so long that most of us can’t even conceive of other methods of information retrieval. But as computer scientists and librarians will tell you, boolean keyword search is not the end-all. There are other classic search techniques, such as latent semantic analysis, which tries to return results that are “conceptually similar” to the user’s query, even if the relevant documents don’t contain any of the search terms. I also believe that full-scale maps of the online world are important: I would like to know which web sites act as bridges between languages, and I want tools to track the source of statements made online. These sorts of applications might be a huge advance over keyword search, but large-scale search experiments are, at the moment, prohibitively expensive.

The problem is that the web is really big, and only a few companies have invested in the hardware and software required to index all of it. A full crawl of the web is expensive and valuable, and all of the companies who have one (Google, Yahoo, Bing, Ask, SEOmoz) have so far chosen to keep their databases private. Essentially, there is a natural monopoly here. We would like a thousand garage-scale search ventures to bloom in the best Silicon Valley tradition, but it’s just too expensive to get into the business.

DotBot is the only open web index project I am aware of. They are crawling the entire web and making the results available for download via BitTorrent, because

We believe the internet should be open to everyone. Currently, only a select few corporations have access to an index of the world wide web. Our intention is to change that.

Bravo! However, a web crawl is a truly enormous file. The first part of the DotBot index, with just 600,000 pages, clocks in at 3.2 gigabytes. Extrapolating to the more than 44 billion pages so far crawled, I estimate that they currently have about 234 terabytes of data. At today’s storage technology prices of about $100 per terabyte, it would cost roughly $24,000 just to store the file. Real-world use also requires backups, redundancy, and maintenance, all of which push data center costs to something closer to $1,000 per terabyte. And this says nothing of trying to download a web crawl over the network: it turns out that sending hard drives in the mail is still the fastest and cheapest way to move big data.
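The back-of-the-envelope estimate above can be reproduced in a few lines. This is only a sketch: the function and constant names are made up for illustration, the inputs are the figures quoted in the text, and it assumes decimal units (1 terabyte = 1000 gigabytes) and that the average page size in the first sample holds for the whole crawl.

```python
# Figures quoted in the text.
SAMPLE_PAGES = 600_000          # pages in the first part of the index
SAMPLE_GB = 3.2                 # size of that first part, in gigabytes
TOTAL_PAGES = 44_000_000_000    # pages crawled so far
RAW_COST_PER_TB = 100           # dollars per terabyte of bare storage
DATACENTER_COST_PER_TB = 1000   # dollars per terabyte with backups and upkeep


def estimate_index_size_tb(sample_pages, sample_gb, total_pages, gb_per_tb=1000):
    """Scale the sample's average page size up to the full crawl, in terabytes."""
    gb_per_page = sample_gb / sample_pages
    return gb_per_page * total_pages / gb_per_tb


size_tb = estimate_index_size_tb(SAMPLE_PAGES, SAMPLE_GB, TOTAL_PAGES)
raw_cost = size_tb * RAW_COST_PER_TB
datacenter_cost = size_tb * DATACENTER_COST_PER_TB

print(f"index size: {size_tb:.0f} TB")
print(f"bare storage: ${raw_cost:,.0f}")
print(f"data center: ${datacenter_cost:,.0f}")
```

The result lands a little under 235 terabytes, which puts bare storage in the low $20,000s and realistic data center storage at roughly ten times that.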

Full web indices are just too big to play with casually; there will always be a very small number of them.

I think the solution to this is to turn web indices and other large quasi-public datasets into infrastructure: a few large companies collect the data and run the servers, and other companies buy fine-grained access at market rates. We’ve had this model for years in the telecommunications industry, where big companies own the lines and lease access to anyone who is willing to pay.

The key to the whole proposition is a precise definition of access. Google’s keyword “access” is very narrow. Something like SQL queries would expand the space of expressible questions, but you still couldn’t run image comparison algorithms or do the computational linguistics processing necessary for true semantic search. The right way to extract the full potential of a database is to run arbitrary programs on it, and that means the data has to be local.

The only model for open search that works both technologically and financially is to store the web index on a cloud, let your users run their own software against it, and sell the compute cycles.

It is my hope that this is what DotBot is up to. The pieces are all in place already: Amazon and others sell cheap cloud-computing services, and the basic computer science of large-scale parallel data processing is now well understood. To be precise, I want an open search company that sells map-reduce access to their index. Map-reduce is a standard framework for breaking down large computational tasks into small pieces that can be distributed across hundreds or thousands of processors, and Google already uses it internally for all their own applications — but they don’t currently let anyone else run it on their data.
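As a rough illustration of the programming model, here is a single-process sketch of a map-reduce job that counts pages per language, one of the questions raised earlier. The record format (a dict with a `lang` field), the sample pages, and the function names are all hypothetical. A real framework would run the map and reduce steps in parallel across many machines rather than in one loop.

```python
from collections import defaultdict


def map_page(page):
    """Map step: emit one (language, 1) pair for a single crawled page."""
    yield page["lang"], 1


def reduce_counts(language, ones):
    """Reduce step: add up all the values emitted for one language."""
    return language, sum(ones)


def run_map_reduce(records, mapper, reducer):
    """Run mapper over every record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


# Hypothetical index records; a real crawl would hold billions of these.
pages = [
    {"url": "http://example.com/a", "lang": "en"},
    {"url": "http://example.com/b", "lang": "zh"},
    {"url": "http://example.com/c", "lang": "zh"},
    {"url": "http://example.com/d", "lang": "tr"},
    {"url": "http://example.com/e", "lang": "en"},
]

counts = run_map_reduce(pages, map_page, reduce_counts)
print(counts)
```

Because each call to `map_page` looks at one page in isolation, and each call to `reduce_counts` looks at one key in isolation, the work can be split across as many processors as are available. That independence is what makes the model a plausible unit of sale for an index provider.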

I really think there’s money to be made in providing open search infrastructure, because I really think there’s money to be made in better search. In fact I see an entire category of applications that hasn’t yet been explored outside of a few very well-funded labs (Google, Bellcore, the NSA): “information engineering,” the question of what you can do with all of the world’s data available for processing at high speed. Got an idea for better search? Want to ask new questions of the entire internet? Working on an investigative journalism story that requires specialized data-mining? Code the algorithm in map-reduce, and buy the compute time in tenth-of-a-second chunks on the web index cloud. Suddenly, experimentation is cheap — and anyone who can figure out something valuable to do with a web index can build a business out of it without massive prior investment.

The business landscape will change if web indices do become infrastructure. Most significantly, Google will lose its search monopoly. Competition will probably force them to open up access to their web indices, and this is good. As Google knows, the world’s data is exceedingly valuable, too valuable to leave in the hands of a few large companies. There is an issue of public interest here. Fortunately, there is money to be made in selling open access. Just as energy drives change in physical systems, money drives changes in economic systems. I don’t know who is going to do it or when, but open search infrastructure is probably inevitable. If Google has any sense, they’ll enter the search infrastructure market long before they’re forced (say, before Yahoo and Bing do it first).

Let me know when it happens. There are some things I want to do with the internet.

Oh Front Page, your days are clearly numbered. For generations all eyes were upon you; you set the public agenda, and advertisers loved you best. In the tumult of the world, your voice carried above all others, and we needed you. You told us when the war ended, and when The Beatles came to town.

But you are in your autumn now.

We know that your children killed you, though they did not mean it. In the age of the scribe, it seemed that anyone could own a printing press. But now, Front Page, we talk online about the monopoly you once claimed. Some will pine for newsprint, but paper is just too expensive, too heavy and static.

But this is not about paper. This is about the way you lived your life, your insistence on a space that you and you alone controlled. You tried to move online, Front Page, but your model would not yield and your children ate your lunch. Google News chooses from the best, while Digg lets us choose for ourselves. There will always be reporters — those who assemble the narratives — but there may not always be editors. Your stubborn insistence on one for all made us question your purpose.

We loved you and you ignored us! Advertisers deserted you first; they were very quick to understand that reader information could be leveraged into relevance. Google itself was built on this model. Meanwhile Amazon and iTunes grasped that efficiencies of delivery had moved the money to the infinite niche. But you admitted none of this, Front Page, and also you did not see that people live in networks, that our friends know what is important to us.

Why would you not give us what we wanted? No one questions your integrity, the standards of journalism you uphold. No one questions that we, the public, need to be told at least as much as we need to be listened to. But suddenly we could talk back, and you weren’t listening. You insisted that we go to you instead of just coming to us. Why did you not use our input to customize the agenda? You could have spawned Facebook applications and iPhone applications and even innovative social RSS readers that determined our interests and automatically delivered ten million personalized headlines! (And their ads.)

You had everything you needed, and this was your unforgivable sin. A hundred years ago you built the Associated Press to feed you, the prototype of distributed journalism. This could have been the beginning, if you had embraced more than the cream of international stories, if you had realized how cheap local reporting could be. Those long tail stories could be vastly cheaper, Front Page, if you embraced more sources, if you fought for transparency instead of access, if you taught citizens to be journalists instead of insisting that they can’t. You could have set the standards and franchised the platform. But instead of finding innovative ways to gather the news and innovative ways to deliver it to us, even now you fight hard to be seen less!

Instead of owning the aggregators and bringing to them the wisdom of an old hand, you scoffed at Digg, at Google, at Memeorandum. Why are there still so many news sites without a panel of “Share This” links beneath each story? Why are we not allowed to speak to the New York Times with user ratings buttons? Your mannerisms are quaint as hoop-skirts, Front Page.

We know also that your less reputable cousin is only slightly younger, and the world will never listen to Television as their parents did. The internet will devour Broadcast too; in only a few more years bandwidth will be cheap enough for anyone to run their own station. We know that upcoming content analysis algorithms will soon make video search a reality, and we know that the RSS future will soon disaggregate Television News just as it only recently disaggregated you.

Front Page, your children are brash, but they are filled with the energy of youth. They have inherited a world you never foresaw, and they are hopeful in a way you are not. It is their world now. You must guide them, but you must let them have it.

The Turkish government censors internet access from within the country. I discovered this yesterday when attempting to access YouTube from the Turkish town of Selçuk, as this screenshot shows:

The English text on this page reads: “Access to this web site is banned by ‘TELEKOMÜNİKASYON İLETİŞİM BAŞKANLIĞI’ according to the order of: Ankara 1. Sulh Ceza Mahkemesi, 05.05.2008 of 2008/402”

Just to complete the irony, I was looking for a video of the Oscar Grant shooting when I first discovered this “blocked site” page.

When you edit Wikipedia, what do you write about? Did you sit in the front row or the back row as a child? Did you grow up on science fiction, were you an activist in college? Did no one understand you, or have you always been perfectly normal? Tell me, because I want to know who’s in this conversation.

When television is good, nothing — not the theater, not the magazines or newspapers — nothing is better.

But when television is bad, nothing is worse. I invite you to sit down in front of your television set when your station goes on the air and stay there, for a day, without a book, without a magazine, without a newspaper, without a profit and loss sheet or a rating book to distract you. Keep your eyes glued to that set until the station signs off. I can assure you that what you will observe is a vast wasteland.

FCC Chairman Newton Minow gave this speech in 1961, decrying the state of the medium that many had hoped would bring new light to humanity. What is to say that the Internet will not sink into the same mediocrity?

There are differences, of course. The internet is (currently) very much an active, two-way medium; the internet is (currently) a very democratic place, where anyone can espouse their worldview to the whole world for only the effort of typing. And the internet is (currently) far too large and diverse to be effectively controlled by any particular corporate or government interest.

But I have a morbid interest in dystopia; and already I see signs that not everyone realizes what freedoms we could lose. Like bad science fiction, here are a few scenarios where the internet fails to live up to its almost obscene promise, where it becomes just another “vast wasteland.”

Iran, a country rich in history, culture, and education, supports a large online community, including perhaps the fourth largest ‘blogosphere’ in the world (or the second, third or seventh). Because the Iranian press is under the control of religious conservatives who sit above elected officials in Iran’s peculiar hybrid political system, and because that conservative control is used to silence dissent, Iranians who think differently go online to express their views. Here, the inherent freedom of the Internet (anonymity, decentralized control, etc.) allows the true minds of Iran’s youth, journalists, and intellectuals to be known publicly. In their blogs and online chats we see their rejection of the regime, its brutal paternalistic control, its enforcement of archaic sexual mores, its corruption and incompetence, and of the legitimacy of the Islamic Republic itself. The government, worried, has cracked down. Bloggers have been sent to jail, websites are being blocked, and user bandwidth is constricted, but the Internet continues to be one of the best hopes for homegrown democratic change in autocratic Iran. If you read Iranian blogs, it is clear that many Iranians want drastic social and political change.

The authors of the paper then do the homework to ask if this story is true. And it is true, but so is a story about social and religious conservatives using the internet, or a story about the many sites devoted to Persian poetry and literature. Part of the confusion here is that we have, in the West, our own story about what it means to be liberal, freedom-loving, democratic, as contrasted with closed, repressive, backwards. Our ideas about the social and political struggles of Iranians do not map neatly to reality.