
Fortune media hound Mathew Ingram noted in May 2015, when Facebook’s Instant Articles format launched, that Big Blue saw it “as a mutual exchange of goods, driven by the company’s desire to help publishers make their articles look as good as possible and reach more readers.” He went on to say:

But whenever you have an entity with the size and power of Facebook, even the simplest of arrangements becomes fraught with peril, and this is no exception. Why? Because a single player holds all of the cards in this particular game.

Around that time, Gawker’s Nick Denton, since brought low by a multimillion-dollar lawsuit loss you may have read about, went so far as to call the Facebook-publisher relationship not a distribution partnership but “abject surrender”:

So many media organizations are just playing to Facebook. They’re just catering to the preferences…expressed in some algorithm that nobody understands. It’s almost like we’re leaving offerings for some unpredictable machine god that may or may not bless us.

Almost a year after its launch, and after a year’s worth of tweaks to the Instant Articles product, we have a more complete picture of the pros and cons.

Pros

Massive distribution open to many publishers
Following its closed launch with a limited number of “partners,” including the New York Times and National Geographic, Facebook has opened the program to publishers big and small, in the U.S. and around the world, “giving every news organization the capability to publish their content on the social network,” according to Poynter.

WordPress plug-ins make it easier
After a rocky launch that required programmers to reformat every article especially for Facebook, the company was able to scale the program to most news organizations through a WordPress plugin it created, “essentially greasing the skids for mass adoption of the program among news organizations.” Per Poynter:

The plugin is being built in partnership with Automattic, the parent company of WordPress.com, and helps translate news stories to Facebook’s Instant Articles format. This removes a significant hurdle for news organizations.

New potential revenue streams
It’s no secret that magazines are continuing to fold and even digital-native sites can’t make the numbers work. We’ve also seen the rise of ad blockers and native/sponsored/branded content. Are content partnerships like these the answer, or at least an answer?

Cons

Only certain companies are seeing real benefits
BuzzFeed and Vox, to name two, are on board with the new format. Vox even hired media heavy hitter Choire Sicha to oversee its distributed partnerships (Facebook, Snapchat, Apple News and others, presumably). Per the WSJ, “Vox Media has long counted its own content platform as a key to its success. But now it says the future lies in platforms run by others, so it’s bringing in a digital media stalwart to help strengthen those ties.”

But others have yet to make hay from Facebook’s sunshine. As Fortune notes:

The media industry is in a “get big or go home” phase.

BuzzFeed and Vox are big, so they can play in Facebook’s Instant Articles world better than the smaller guys can.

It’s difficult (and costly) to track the audience
As AdAge reports, publishers have to pay more to track their audiences on distributed platforms. Yes, they get bigger distribution (theoretically, anyway), but ComScore apparently charges “$15,000, per platform, per year, to add tracking capabilities.” And six months post-launch, Apple News still doesn’t even have ComScore integration. This puts publishers in a tough position: In order to help their bottom lines, they want to reach the audience wherever the audience is, but doing so costs money they don’t have.

It’s not clear that publishers make money
Following on the point above, in the distributed content ad model, if you don’t know how much audience you have, you also don’t know how much revenue you stand to make. At this point, publishers are still crossing their fingers that this translates to revenue.

Jobs continue to be cut but not added back
Publishers are “re-allocating resources to build teams that produce content for specific social platforms,” per AdAge, but they’re cutting far, far more than they’re adding. Journalism is going through the kind of massive…transition, disruption, sea change, slaughter, whatever you want to call it, that is epic in scale. There are too many outlets that have closed up shop or gone through major layoffs to name. It’s especially chilling when digital-only publications like Mashable, IBT and Slant (just in the past couple of weeks) can’t even make the numbers work.

Now that I’ve met Dr. Hammond and heard him speak, I’m more a believer than ever that this is the future of journalism — and not just journalism, but all of media, education, healthcare, pharmaceutical, finance, on and on. Most folks at FoST seemed to be open to his message (it’s hard to disagree that translating big data into understandable stories probably is the future of storytelling, or at least part of it). But Hammond did admit that since the Wired story came out in which he was quoted as saying that in 15 years, 95 percent of news will be written by machines, most journos have approached him with pitchforks in hand.

I went in thinking that the two-year-old Narrative Science went hand-in-hand with Patch and Journatic in the automated-and-hyperlocal space, but I now think that Hammond’s goals, separate from these other companies, are grander and potentially more landscape-altering.

I know I sound like a fangurl, but I was truly that impressed with his vision for what his product can be, and what it will mean to the future of journalism. No, it can’t pick up the phone and call a source. It can’t interview a bystander. It can’t write a mood piece…yet. But they’re working on it.

With that, my top 10 quotes of the day from Dr. Hammond:

The first question we ask is not “What’s the data,” it’s “What’s the story?” Our first conversation with anyone doesn’t involve technology. Our first conversation starts, “What do you need to know, who needs to know it and how do they want it presented to them?”

Our journalists start with a story and drive back into the data, not drive forward into the data.

We have a machine that will look at a lot and bring it down to a little.

The technology affords a genuinely personal story.

It’s hard, as a business, to crack the nut of local. For example, Patch doesn’t have the data, but they’re the distribution channel. There’s what the technology affords and what the business affords…. We don’t want to be in the publication business.

Meta-journalists’ [his staff is one-third journalists and two-thirds programmers] job is to look at a situation, and map a constellation of possibilities. If we don’t understand it, we pull in domain experts.

The world of big data is a world that’s dying for good analysis. We will always have journalists and data analysts. What we’re doing is, we’re taking a skill set that we have tremendous respect for and expanding it into a whole new world.

The overall effort is to try to humanize the machine, but not to the point where it’s super-creepy. We will decide at some point that there’s data we have that we won’t use.

Bias at scale is a danger.

The government commitment to transparency falls short because only well-trained data journalists can make something of the data. I see our role as making it for everybody…. Let’s go beyond data transparency to insight transparency. It can’t be done at the data level, it can’t be done at the visualization level, it has to be done at the story level.

“Journatic’s approach — and the change it represents — is not going away. That means it’s important for journalism to find ethical, responsible and productive ways to integrate these approaches. To set benchmarks and guidelines for producing quality content using the kind of low-cost labor and mass production techniques that were long ago adopted in manufacturing. To find a better way forward.”

“You have to determine which stories can be written from afar, and which must be done by those with local knowledge. … The starting point is to establish policies, procedures, and standards to guide outsourced, mass production content operations [for] quality control.”

The Journatic fallout continues, and apparently the story has legs. On the heels of the controversy over its systematically faking bylines, so that its offshore laborers could appear local to its clients (that is, local newspapers) under names like “Jimmy” and “Ann,” one of its biggest clients, TribLocal, discovered plagiarism (from Patch, no less!) and suspended its use of Journatic indefinitely, saying:

“[Fake bylines and plagiarism] are the most egregious sins in journalism. We do not tolerate these acts at the Chicago Tribune under any circumstances, whether from a staff member or an outside supplier like Journatic.”

But Tribune Co. is actually also a Journatic investor, so that’s a bit of a sticky wicket, innit?

Then one of Journatic’s high-ranking (and quite recently hired) editors, Mike Fourcher, quit, on the grounds that Journatic is attempting to “treat community news reporting the same way as data reporting”:

Inevitably, as you distribute reporting work to an increasingly remote team, you break traditional bonds of trust between writers and editors until they are implicitly discouraged from doing high quality work for the sake of increasing production efficiency and making more money.

Cutting through the noise, it sounds like he tried to argue for paying people more for better quality stuff, and Journatic’s owners balked.

As I have said, hyperlocal, algorithmic journalism at scale is such a tough area, and one that’s evolving all the time (actually, at a very quick rate, if you take the long view). But the Venn diagram of quality, quantity, turnaround time, local expertise, ease of assignment, keeping readers happy, keeping writers happy, keeping staff editors happy, data-mining technology costs, platform costs, actually making money — and, you know, not lying about any of it — it’s not an easy nut to crack, and that’s why no one’s done it yet.

My dabblings in this area at now-defunct Seed certainly didn’t pan out as planned. But nonetheless I agree with Fourcher, the ex-Journatic guy, on this:

Journatic’s core premise is sound: most data and raw information can be managed much more efficiently outside the traditional newsroom; and, in order for major market community news to be commercially viable, it needs be conducted on a broader scale than ever before.

For Journatic’s part, it released a statement saying: “We are in the process of conducting a thorough review of our policies, software, technology and personnel. We are immediately and forcefully addressing the issues we find and making changes where necessary. Until we have completed our review we will decline any further comment.”

So all of this being said, now that TribLocal is back in the hands of “real” journalists, what will happen? Will the quality of coverage be so amazing that readers demand it continue? Will they even notice? Will the cost of paying writers who can write well in the first place be less than Journatic’s current model of paying editors to correct the writing of non-native English speakers, then selling that as a third party to TribLocal and others? Will the other papers who use Journatic’s service (the Chicago Sun-Times, the San Francisco Chronicle, the Houston Chronicle) also balk amid the controversy? Will there be a resurgence in hiring actual journalists to cover local news?

All remains to be seen, of course. But it’s exciting, because at the very least this kerfuffle has people (lots of them!) talking about this, and publicly, instead of in back-room deals and investments about which local readers are unaware. The Fourth Estate is actually weighing in on a controversy and doing its job: reporting on it, ruffling feathers, making waves. And ultimately that is a very good thing for us all.

My argument with hyperlocal is that no one has yet figured out how to do it right. It sounds to me like Journatic is finding some success, but it’s also failing in important ways. My defense of algorithms is mostly to do with the company Narrative Science, which as I said is “not a threat, it’s a tool, and it fills a need.” That need is basically the scut work of news reporting, and although the folks there are working on this very issue, for now, “It’s a tool that does a programmatic task, but not a contextual one, as well as a human.”

Journatic aims to solve the hyperlocal problem with the algorithmic solution. The company scrapes databases of all kinds, then uses that data to “report” on local bowling scores, trash pickup times, where the cheapest gas is, and who has died recently. The company does this by using algorithms to mine and sort public information, and there’s nothing necessarily wrong with that.

When it launched, Journatic-populated site BlockShopper was basically a real-estate listings site based on publicly available data. Using public records, it would “report,” for example, that “123 Main St. is in foreclosure.” But since then, the algorithms and tools have gotten smarter. Soon it was able to say a home was in foreclosure “by the bank” and also add that it “is up for auction on March 31.” The site is now so smart that it actually feels almost invasive. To wit:

The real estate information contained in the article is publicly available, from the names of the people involved in the transaction to the price paid to the location details. The fascinating thing, and what pushes it into a brave new frontier of journalism and privacy invasion, though, is that the information on the professions of those involved is also publicly available (probably via LinkedIn). Arguably, all the article is doing is presenting public data in a new format. The difference is access and availability. In the pre-Internet days, there was no way to know public information except to go to the city records office and look, and there was really no way to know about people’s professions except to know them or ask them. These tasks required interested and motivated parties (such as journalists), because actually going places and talking to people requires on-the-ground reporting (not to mention implicit consent). This is not the sort of work Journatic traffics in. That’s not a criticism, necessarily, just a fact: There used to be barriers to the information; now there aren’t; Journatic uses this lack of barriers plus its algorithms to surface the data.
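The progression described above, from “is in foreclosure” to “by the bank” to “up for auction on March 31,” is at bottom template filling: the more fields the public record carries, the more specific the generated sentence. A minimal sketch of the idea, with field names and data invented here for illustration (this is not Journatic’s actual code or schema):

```python
# Hypothetical template-driven data-to-text generation in the BlockShopper
# vein. Field names ("address", "lender", "auction_date") are invented.

def render_listing(record):
    """Turn a public-records dict into a one-sentence 'story'."""
    sentence = f"{record['address']} is in foreclosure"
    # Smarter templates fill in more detail as the data allows.
    if record.get("lender"):
        sentence += f" by {record['lender']}"
    if record.get("auction_date"):
        sentence += f" and is up for auction on {record['auction_date']}"
    return sentence + "."

record = {
    "address": "123 Main St.",
    "lender": "the bank",
    "auction_date": "March 31",
}
print(render_listing(record))
# -> 123 Main St. is in foreclosure by the bank and is up for auction on March 31.
```

The “smarter” the system gets, in other words, is mostly a matter of how many data sources feed the template and how many optional clauses it knows how to attach.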


At first, the company didn’t do any (or much) writing or analysis. According to This American Life and its whistle-blower, though, the company now pays non-native English speakers in the Philippines between $.35 and $.40 a story to try to add a bit of context to the data. Thirty-five to forty cents! However shady this is, it is not necessarily unethical. It’s capitalistic, and it’s pretty shameful, and it feels wrong somehow, but it’s not unethical journalistically.

Where it does get unethical is when readers are misled, and that has apparently occurred. They force these writers in the Philippines to use fake bylines like “Amy Anderson,” “Jimmy Finkel” and any number of names with the last name “Andrews,” in order to Americanize them and dupe readers, according to the show. This is flat-out wrong, and I think Journatic knew it: the company apparently reversed its stance on this after the story aired.

But ethics aside, and journalism in broader context here, Journatic’s founder, Brian Timpone, claims that the “single reporter model” doesn’t work anymore. The Chicago Tribune, one of Journatic’s customers, says that it’s gotten three times more content for a lot less money. These are serious issues for the future of the profession (along with the opportunity for privacy invasion and privacy mishandling that all this unfiltered data presents). It’s no doubt true that the Trib paid less money for more content versus hiring local reporters. But what is the quality of the work? I think we all know the answer. Shouldn’t that be a bigger factor than it is? If you’re just turning out junk, your brand gets diluted, and your readers soon abandon you altogether.

It’s easy to criticize, but it seems to me that Timpone is trying, as we all are, to devise a way forward. That’s admirable, in its way. It’s a little scary, and the desire for progress sometimes makes us color outside of the lines, and when that happens, places like This American Life need to be there as a regulator, as has just happened. We’re all still muddling our way through the ever-changing new online media landscape, and we will test theories and make mistakes and learn lessons, and with any luck we will end up with a better product, one that serves readers first, last and always. I hope someone is able to someday crack the code of good news done quickly at good quality for a good wage. Until then, we must keep trying.

Wired‘s recent story about Narrative Science seems to have put some journalists into a bit of a tizz. The article is a must-read for journalists and coders — really interesting tidbits about what’s going on in this field now, and what might come to pass in the future.

I’m actually very excited about the possibilities of Narrative Science, an artificial intelligence product that transforms data (currently primarily from the sports and finance worlds) into stories. This is exactly the kind of thing we’re after when we encourage J-Schools to put software engineering into journalism curricula: teaching young journalists valuable new skills so they, in turn, don’t end up helpless on the sidelines, as many of us current journos have been during the technology advances of the last decade.

The method does not determine the value

Narrative Science is not a threat, it’s a tool, and it fills a need. Instead of some capable writer poring over boring financial statements and trying to add sizzle in reporting on them, a machine reads the data and spits out two grafs. Two serviceable but really snoozy grafs, which probably would have been just as snoozy if a human had written them.

Here’s what’s intriguing, though: Narrative Science is working on ways to be not-snoozy, and in so doing they’re calling journalists on our BS, in a way. What I mean is this: Journalists have formulas. We do, and they’re taught in schools and learned on the job. “Inverted pyramid.” “Nut graf.” “Lede.” “Attribution.” These are plug-and-play tactics most of the time. Sure, these elements vary from story to story, and that is the fun part of what we do. We add details and context. We observe and report. But at core, we tell different stories using some slightly different combinations of these tactics and tools.

Arguably, feature stories have slightly more variety, but I’d also point out that (sadly) many features are also just puzzle pieces, if not downright parodies of themselves. For example, every feature on every female celebrity ever starts this way:

“[Lady celeb] walks into [L.A.’s or New York’s] [restaurant or cafe in trendy neighborhood] looking gorgeous in [brand] jeans and no makeup.”

Whether the editors or writers are making the words hacky, hacky they are — and boring, just like the pieces Narrative Science is creating with its algorithmic journalism. Fascinatingly, according to Wired, the company actually has “meta-writers” whose job it is to help the computers add context:

“[Meta-writers are] trained journalists who have built a set of templates. They work with the engineers to coach the computers to identify various ‘angles’ from the data. Who won the game? Was it a come-from-behind victory or a blowout? Did one player have a fantastic day at the plate? The algorithm considers context and information from other databases as well: Did a losing streak end?”
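The “angles” approach Wired describes can be pictured as a rule cascade over the box score: check for the most interesting storyline first, fall back to blander ones. A toy sketch, with field names and thresholds invented here for illustration (Narrative Science’s real templates are of course far more elaborate):

```python
# Hypothetical "angle" picker over game data. All keys ("home_score",
# "winner_trailed_late", "losing_streak_snapped") are invented for this sketch.

def pick_angle(game):
    winner = game["home"] if game["home_score"] > game["away_score"] else game["away"]
    margin = abs(game["home_score"] - game["away_score"])
    # Rules are ordered from most to least newsworthy.
    if game.get("winner_trailed_late"):
        return f"{winner} pulled off a come-from-behind victory"
    if margin >= 10:
        return f"{winner} won in a blowout"
    if game.get("losing_streak_snapped"):
        return f"{winner} snapped a losing streak"
    return f"{winner} edged out a {margin}-run win"

game = {"home": "Cubs", "away": "Mets", "home_score": 12, "away_score": 2}
print(pick_angle(game))  # -> Cubs won in a blowout
```

The meta-writers’ job, on this picture, is deciding what counts as an angle and in what order the rules should fire; the engineers’ job is wiring the rules to the data.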

But to answer the question posed in the headline of the piece, “Can an Algorithm Write a Better News Story Than a Human Reporter?” for now the answer is no. And journalists vs. algorithms is a faulty comparison.

Writers and editors add value using tools

Narrative Science, thanks to algorithms created by human engineers and journalists, is now at the level of being able to programmatically spit out phrases like “whacking home runs.” But it can’t gauge a crowd’s restlessness or excitement. It can’t interview a superfan after the game, sense that he’s fed up with the team and write a mood piece. It can’t connect on a human level to a victim of a crime, or spend days following a subject then put together disparate threads of the subject’s life into a coherent portrait.

Which is why it’s not a real threat just yet. The way I see it:

Narrative Science : journalists : : spell-check : copy editors

It’s a tool that does a programmatic task, but not a contextual one, as well as a human. Does spell-check tell you you have the wrong “hear/here”? No. Does it correct you when you’ve spelled “embarrassing” incorrectly because it is drawing from an enormous database of correctly spelled words? Sure, easy enough. Can it check a fact’s accuracy against a thousand links on the Internet? Probably. But can it call a source and make sure she wasn’t misquoted, then correct the quote before publication? Not likely.
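The analogy is easy to make concrete: dictionary membership is a programmatic check, while a homophone used in the wrong place needs context that membership alone can’t supply. A toy sketch, with a deliberately tiny invented word list:

```python
# Minimal dictionary-based spell check: catches misspellings, but a wrong
# homophone ("hear" for "here") sails right through. The word list is a
# stand-in for the "enormous database of correctly spelled words."

DICTIONARY = {"i", "can", "hear", "here", "you", "that", "is", "embarrassing"}

def misspelled(text):
    return [w for w in text.lower().split() if w not in DICTIONARY]

print(misspelled("that is embarassing"))  # -> ['embarassing']  (typo flagged)
print(misspelled("i can hear you here"))  # -> []  (a wrong homophone would also pass)
```

That gap between a lookup and a judgment is exactly the gap between spell-check and a copy editor, and between Narrative Science and a reporter.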

Context is everything, and it’s ours to use. But we journalists have to use it. Yes, we have formulas. We write ledes, and we edit the story so the most important information is up front. But we have to step up our game. We have to go to the match, or the crime scene, or the meeting, or the fashion show, or the foreign city, or the war, and add context for readers. We shouldn’t hack our way through the really interesting stuff — we shouldn’t be allowed to. Let’s let bottom-scrapers scrape the bottom for us. Let’s not waste human effort on shitty content farms that pay $2 (!) an article. Let’s leave that for robots and invest elsewhere: in hiring more and better writers and editors to make connections, describe the atmosphere, make sense of things, tease out themes and (cue dramatic music) better humanity. Let’s invest in creating data and algorithms that we can program to help us help ourselves.