The Algorithmic Newsroom

I just came back from News Foo, an un-conference for technologists, academics and journalists in Phoenix on the future of news. The following post details my thoughts, heavily inspired by the conversations and sessions I had the privilege to be a part of.

There are a growing number of algorithms deciding which topics receive people’s attention. Algorithms are taking over the historical raison d’être of news editors: generating top-news lists, hot trends and personalized recommendations. Algorithms are perceived as neutral, yet they encode political choices and have cultural values baked in. At a time when audience attention has become a scarce commodity, an algorithm’s ability to command user attention is real power within our media ecosystem. As curatorial power is handed over to automated systems, we must make sure that the public understands the biases at play and that product engineers are optimizing for the desired outcome – an informed public – not just whatever generates traffic.

Human vs. Algorithm

An algorithm is a finite list of instructions that a machine performs in order to compute a function. From simple counting operations to complex information sorting, a good algorithm is thought through and well defined to produce the desired output in the least computationally complex manner. Algorithms are extremely good at scale. They can efficiently classify text across millions of documents in a fraction of a second, extract images of a certain type, and identify complex correlations among multiple data points. Recommendation systems such as the ones used by Netflix and Amazon employ algorithms that learn about user preferences from their actions and personalize the information presented to every user – a task impossible to complete manually.

Algorithmically curated, personalized recommendations have become popular within digital media spaces. “Most read articles” modules are based on simple math: the top 10 articles by page views. “Hottest articles” lists, on the other hand, are more ambiguous and vary based on what the organization defines as “hot.” Is it new content? Is it popular? Spiking? How far back is the data compared? Are there whitelisted or blacklisted topics? What’s hot is an intuitive and very human assessment of an ecosystem, yet a mathematically complex formula – if it can be reproduced at all.
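To make the contrast concrete, here is a minimal sketch with invented numbers – not any publication’s real formula. “Most read” is a plain sort on page views; even a bare-bones “hottest” score forces editorial choices, such as how fast a story should fade:

```python
from dataclasses import dataclass

@dataclass
class Article:
    title: str
    views: int          # lifetime page views
    recent_views: int   # views in the last hour
    age_hours: float    # hours since publication

articles = [
    Article("Budget vote passes", views=90_000, recent_views=300, age_hours=72),
    Article("Storm warning issued", views=4_000, recent_views=2_500, age_hours=2),
    Article("Team wins final", views=50_000, recent_views=800, age_hours=20),
]

# "Most read": simple math -- sort by lifetime page views.
most_read = sorted(articles, key=lambda a: a.views, reverse=True)

# "Hottest": every term encodes an editorial choice. Here, recent velocity
# damped by age; the 6-hour half-life is itself a value judgment about how
# quickly a story should drop off the list.
def hotness(a: Article, half_life: float = 6.0) -> float:
    decay = 0.5 ** (a.age_hours / half_life)
    return a.recent_views * decay

hottest = sorted(articles, key=hotness, reverse=True)
```

With these numbers the two lists lead with different stories: the old budget article has the most lifetime views, while the fresh storm warning dominates the hotness score. Swap the half-life or the window and the “hot” list changes – which is exactly the point.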

Yet humans are still unbeatable at many types of tasks. Journalists and editors drive agendas, drawing on qualities that are difficult to capture in a formula: trust, excitement, impression and intuition. Humans aren’t always rational, and may trust a source despite a bad reputation. The intuition that an experienced editor or journalist brings to the table could never be replaced by automated formulas.

Algorithmic Bias vs. Perception of Neutrality

As soon as digital information providers add any form of curation and recommendation mechanisms (a common practice within social network spaces), the technology loses its neutrality. In some ways, “Twitter’s trending topics algorithm acts like a lot of human news editors, who are more interested in the latest news rather than ongoing stories”, says Tarleton Gillespie of Cornell University. Values are coded into the way these systems make recommendations.

These are not Google, Apple, Amazon or Twitter conspiracies, but rather the unintended consequences of algorithmic recommendations being misaligned with people’s value systems and expectations of how the technology should work. The larger the gap between people’s expectations and the algorithmic output, the more user trust is violated. Liz Strauss eloquently describes why she quit Klout, feeling cheated by an algorithm that kept shifting under her feet. She wanted to trust the algorithm, even through initial doubts, but broke down and quit after multiple algorithm changes.

As designers and builders of these technologies, we need to strike a fine balance: users should understand enough about the choices we encode into our algorithms, but not so much that they can game the system. People’s perception affects trust. And once trust is violated, it is incredibly difficult to gain back. There’s a misplaced faith in the algorithm – an assumption that it should accurately represent what we think is true.

Ryan Rawson's tweet in response to claims that Twitter is censoring #OWS from trending

While it is clear to technologists that algorithms are biased, the general public’s perception is one of neutrality. Someone at News Foo brought up the famous Rumsfeld quote, adding that it is the unknown unknowns we should be most worried about. When people don’t know that they don’t know how the algorithms governing their interfaces work, they may get burned, grow angry and blame the technology.

Claire Diaz Ortiz leads social innovation at Twitter and constantly manages the gap between people's expectations of its Trending Topics algorithm and how it actually works

The Augmented Journalist

We need to be thinking about hybrid approaches. On the news production side, how do we utilize algorithms for scale while relying on journalists and editors for compelling narratives and thoughtful judgment? Algorithmic investigative journalism may hold a treasure trove of possibilities for new types of stories, where journalists use the output of a complex data query to feed their intuitions and draw conclusions from correlations in the data. Tom Lee at Sunlight Labs is doing an amazing job pushing projects that derive insight from big data, while Kris Hammond uses machines to write stories where automation is possible.

On the flip side, we need to make sure the general public has a better understanding of the algorithms at play, the algorithms that feed their attention, without giving away too much of the special sauce. We must come up with the right vocabulary to define editorial workflows, and work with engineers to code them into the algorithms. As danah boyd stressed during the session, it is important to be constantly thinking through what we’re optimizing for. The editor and journalist’s job is to inform the public. Is it possible to design and implement algorithms that optimize for an informed public? How do we even start to quantify a person’s level of “informed-ness”?

Pete Skomoroch raises a similar question: how do we strike the right balance between automated news personalization and curated, editorialized feeds? Advanced chess (or computer-assisted chess) is a relatively new form of chess in which each human player uses a computer chess program to help explore the possible results of candidate moves. The human players, despite this computer assistance, remain fully in control of the moves their “team” (of one human and one computer) makes. What would the augmented journalist or editor look like? How can technology and algorithms be used effectively in the newsroom to inform both journalists and the general public?

The conversation should not be focused on humans vs. algorithms, but rather how we utilize algorithms to take our media ecosystem to the next level.

5 comments to The Algorithmic Newsroom

Hi Gilad – Great article, and it sounds like News Foo was v. interesting. The company I work for has been working in this space for a while. We launched Trendsmap.com a couple of years ago, which analyses and displays real-time local Twitter trends. Just yesterday we launched thewall.com and thewall.co.uk, which take this a step further by clustering those trends into topics. We categorize and tag each topic and the associated tweets, media and links. It does require some editorial work, but we’re gradually automating as many parts as we can.

I can vouch for what you’re saying about the decisions you make – it can’t simply be a ‘pure’ process, because different types of events and stories have different characteristics. How you identify something as important is based on judgements you need to make and then fine-tune around speed, volume and other factors.

The trends surfaced by these information engines are a reflection of the preferences of the engine’s average user. Are the creators of these engines at fault for not building algorithms that pick up topics discussed with less intensity, but which some might consider more important or significant? And if they design their technology to surface these “more important and significant” topics – ones valued by an intellectually sophisticated group – are they practicing bias?

Independently of the answers to these questions, these engines provide us with tools to filter the topics that are relevant to us.