Law & Disorder

AP: tech coming to stop “wholesale theft” on ‘Net

It looks like the Associated Press is getting pretty close to deploying that ' …

Ever since the Associated Press warned in April that it would take steps against "misappropriation" of its content, Ars has been wondering what exactly those efforts will entail. After all, the press release wasn't exactly chock-full of details; it simply disclosed that the AP will "develop a system to track content distributed online to determine if it is being legally used."

"We can no longer stand by and watch others walk off with our work under misguided legal theories," AP Chairman Dean Singleton declared around the same time. The statement followed last year's Digital Millennium Copyright Act takedown warnings from AP to the Drudge Retort over posts containing AP content; the warnings called some of those posts "'hot news' misappropriation."

The guidelines cometh

So what exactly is AP going to do? While attending a Knight Center for Specialized Journalism conference, we put the question to AP news editor Ted Bridis, who spoke to the gathering of tech-savvy journalists and bloggers on Friday. Bridis explained that the news company is going to update its staff about its mysterious new misappropriation heat-seeking system soon via an internal webcast.

"The guidelines are coming," Bridis promised. "AP's main concern are not the bloggers that excerpt a relevant passage, and then derive some commentary. What happens an awful lot is just wholesale theft. So those are the ones that will find the cease and desist letters arriving."

OK, we said. How will you define "wholesale theft"? If somebody publishes a paragraph of AP copy with a link to the AP story, will that be theft?

"Not at all," Bridis replied. "I don't think AP would have any problem with that." We didn't want to give the impression that we were bargaining, but we pressed on as to exactly how one would disturb AP's comfort zone. Was this about not posting links?

No, Bridis replied. "What I'm talking about, and what has really riled up our internal copyright folks, are the bloggers who take, just paste an entire 800 word story into their blog. They don't even comment on it. And it happens way more than most people realize."

L'affair Cadenhead

Bridis called the reaction to last year's food fight with the Drudge Retort "distorted." That may come as news to supporters of the site, an anti-Drudge Report, and to its publisher Rogers Cadenhead, who fielded AP's DMCA takedown requests in June 2008. Protesting an AP story excerpt in a post about Hillary Clinton, an AP lawyer told him that "the use is not fair use simply because the work copied happened to be a news article and that the use is of the headline and the first few sentences only."

Treating such excerpts as fair use was a misunderstanding of the concept, AP explained. The company "considers taking the headline and lede of a story without a proper license to be an infringement of its copyrights that additionally constitutes 'hot news' misappropriation." The Retort removed the item in question and some others. In his posts on the controversy, Cadenhead pointed out that the offending excerpts took as few as 33 words from AP articles, and no more than 79.

One wonders how closely the blogs AP objected to then resemble the ones Bridis now says would be OK. History may play a role in the decision-making process. Robert Cox of the Media Bloggers Association, which helped Cadenhead, noted that, prior to this notorious case, the Retort had indeed posted various AP articles in their entirety, which is what had first drawn the company's ire.

"AP is not on some wild rampage through the blogosphere, lawyering up to go after every blogger who quotes an AP story in any way," Cox insisted. "Yet that is how this story has been portrayed, including by a lot of people who should know better but are having too much fun bashing AP."

Cadenhead was less sanguine about the future, even after he settled with AP. "If AP's guidelines end up like the ones they shared with me, we're headed for a Napster-style battle on the issue of fair use," he warned.

So flag me

So what's next? Here's Bridis' explanation of the new application AP plans to deploy.

"What we're doing is employing some technology, and the technology is not going to be looking for a paragraph," he disclosed. "The technology is going to be looking for the entire story that gets republished somewhere, and at that point it flags it. It doesn't do anything in an automated way, it's going to flag it for a lawyer or a paralegal to look at, and make a judgment on 'Well, is this OK? Is this a one-time offense?'"

OK. "Entire stories"—that's the problem?

"There are commercial websites, not even bloggers, necessarily," Bridis added, "that take some of our best AP stories, and rewrite them with a word or two here, and say 'the Associated Press has reported, the AP said, the AP said.' That's not fair. We pay our reporters. We set up the bureaus that are very expensive to run, and, you know, if they want to report what the AP is reporting they either need to buy the service or they need to staff their own bureaus."
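Bridis did not describe how AP's technology works, so what follows is only an illustration of the general idea, not AP's system. One common way to tell a republished story from a quoted paragraph is to measure what share of the story's overlapping word sequences ("shingles") reappear on another page. The five-word shingle size, the 80 percent threshold, and the function names below are hypothetical choices; as in Bridis's description, a match only flags the page for a person to review.

```python
import re


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def coverage(story: str, page: str, k: int = 5) -> float:
    """Fraction of the story's shingles that also appear on the page."""
    story_shingles = shingles(story, k)
    if not story_shingles:
        return 0.0
    return len(story_shingles & shingles(page, k)) / len(story_shingles)


def flag_for_review(story: str, page: str, threshold: float = 0.8, k: int = 5) -> bool:
    """Hypothetical rule: flag a page for a human if it reproduces most of the story.

    Nothing is removed or sent automatically; the result only marks a page
    for a lawyer or paralegal to look at.
    """
    return coverage(story, page, k) >= threshold


if __name__ == "__main__":
    # Synthetic 200-word "story" so the example needs no real AP text.
    story = " ".join(f"w{i}" for i in range(200))
    excerpt = " ".join(story.split()[:40]) + " my take on this"
    reworded = story.replace("w50 ", "x50 ")  # one word changed
    print(flag_for_review(story, story))     # full copy: True
    print(flag_for_review(story, excerpt))   # excerpt plus comment: False
    print(flag_for_review(story, reworded))  # lightly reworded copy: True
```

Because each word belongs to at most five shingles, a copy rewritten "with a word or two here" still scores close to 100 percent, while a one-paragraph excerpt with commentary lands far below the threshold.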

We need the dough

Bridis did acknowledge the importance of fair use. "Because we do it too, necessarily," the AP news editor conceded. "If the New York Times has a story, we may take an element of it and attribute it to the Times and build a story around it."

The rest of the discussion covered familiar debate territory. If it weren't for journalists, Bridis noted, bloggers wouldn't have much material. And he graciously placed Ars on the journalist end of the equation. "You guys have original content, obviously," he said. "You should be very protective of it. It is valuable and worthwhile. You should zealously guard it."

Returning the compliment, I should mention that AP provides terrific coverage of the Federal Communications Commission, my usual beat around here. I'm willing to bet, however, that Ars isn't about to launch a search-and-maybe-threaten bot against the many bloggers who amplify the site with their commentaries on our posts (commentaries that may even include a chunk or two of Ars content).

As for AP, though, bloggers may want to prepare themselves for what is coming, whatever exactly that is. "We're going to be learning more ourselves about exactly how the technology is going to work" in about two weeks, Bridis said.

But about this he is sure. "You can't just take an entire AP wire feed or even an entire AP story, or even half of an AP story, necessarily, and republish it or repurpose it," he said. "We need the money. The industry is falling apart."

Matthew Lasar
Matt writes for Ars Technica about media/technology history, intellectual property, the FCC, or the Internet in general. He teaches United States history and politics at the University of California at Santa Cruz. Email: matthew.lasar@arstechnica.com // Twitter: @matthewlasar

30 Reader Comments

I'm willing to bet, however, that Ars isn't about to launch a search-and-maybe-threaten bot against the many bloggers who magnify the site with their commentaries on our posts (and which may even include a chunk or two of Ars content).

AP isn't doing this either, from what I got out of this article. It sounds like as long as they are cited, they're OK (for now).

Originally posted by idontunderstand: It sounds like as long as they are cited, they're OK (for now).

um i believe if you read the article, the guy specifically mentions it's not ok

To continue my thoughts: if commercial sites rip off AP content in a big enough way, that's obviously bad. But if a blogger does the same, it gets far less black and white. How much of the story was copy-pasted? Was there a link to the original? Was there original commentary, and how much? What was the blog about: is this a "My personal blog about random happenings" or "My opinions on goings-on around the world" or "News about Sports/Obama/Religion/etc brought to you conveniently"?

So many questions. Is AP really gonna go after noncommercial blogs when the answers to those questions can be so subjective???

I have never noticed the kind of wholesale copying I think they're talking about. Does anybody have links or screenshots (other than the Drudge Retort) of what they're talking about? I just must not be visiting those sites.

I think it's a classic case of lawyers and tech. There *are* a lot of domain squatters, link farms, and so on who use entirely borrowed content to try to improve their search engine rankings.

The problem is that these folks are largely Latvians and Kazakhstanis who are trying to eke out a couple of hundred dollars a month in ad clicks by writing homebrew software to lift content and repost it. C&D letters aren't going to do much good there.

Basically, anyone the AP is going to go after is either going to have the resources to do their own reporting, or at least factual aggregation, or they're going to be small-time overseas operators.

Obviously the AP can do whatever it wants with its time and money, but it would probably make more sense to fix their business model. The whole competing with their customers thing is one of the more idiotic moves from a large company in recent memory, and now they're going down the road of believing their own distortions about how important content is. It's their foot to shoot, but it's a little sad to see.

I have noticed wholesale copying of articles and information: news articles (from New Scientist, say) on occasion, but technical how-tos are copied very frequently without reference to the original authors.

I assume it is happening often enough, otherwise AP wouldn't be putting money into it. To me, at this point, it seems different from the music industry searching for someone to blame for its failure to understand the new market... but time will tell on that one.

You know, I was initially opposed to this in its entirety. Now I kind of think the AP has a point. Protecting the ability to profit from written works is 100% what copyright is for, and the AP must protect its own work. I have seen articles lifted wholesale on some sites. This article provides some interesting perspective.

The way I see it, the AP needs to be very careful how they implement this, but if they don't go overboard and really do target only wholesale theft of their copy, then I don't see a problem. The thing is, however, if they get it wrong a few times and step on the toes of the blogosphere, the repercussions could be more than they bargained for in the long run.

Originally posted by Grashnak:How exactly does it infringe on your free speech if they prevent you from wholesale copying of their text without attribution?

Because, if you read TFA, that's NOT what they have a history of getting all pissy about:

quote:

The company "considers taking the headline and lede of a story without a proper license to be an infringement of its copyrights that additionally constitutes 'hot news' misappropriation." ... In his posts on the controversy, Cadenhead pointed out that the offending excerpts took as few as 33 words from AP articles, and no more than 79.

In case you are keeping score at home, I just quoted ~50 words. Per the AP's position established in the quote above, I just committed copyright infringement. I don't think anyone here should be under any illusions that detecting things like what I just did will not be the ultimate goal of this "technology solution" the AP is proposing.

My only defense if accused in this case would be to hire an attorney and argue fair use in front of a court. Would I win? Probably, since I'm providing commentary expanding on the quote, but the cost of putting up a fight would be prohibitive for me (and probably for most folks), so it'd be safer for me to just not quote anything at all.

That, IMHO, is a corporate entity abusing its rights in order to mount an attack on free speech.

EDIT: Happy 200th for me!
EDIT 2: Stupid English language with negative modifiers changing the meaning of a whole sentence.

Doesn't matter if they attribute the text to AP, because you're supposed to be an AP partner (i.e. subscriber) to be able to publish their content.

2) headline and lede ...

Right or wrong, headlines are widely recognized as being the "identifier" of a story. Why more cheaters don't try to hide their plagiarism by simply writing a new headline, I don't know. Lead paragraphs are the nut of the story. Steal that and then write 300 of your own words and you've still committed theft.

3) Re-reporting is done every day by news organizations. They watch each other all the time. Newspaper newsrooms have TVs tuned to CNN and local news. Bloggers TiVo TV news, etc. The key is to do it smartly and in a way that nets you some originality. Where you can't, you become a subscriber to AP or some other source and show proper attribution, which many non-professionals don't know how to do.

Bridis is trying to prevent original AP text from getting copied, or regurgitated in a slightly new way, and passed off as original reporting. I'm not saying the AP has necessarily gone about it the right way, as in the Retort case, but I believe their ultimate goal is just and fair.

Are you seriously unaware that there are sites which do wholesale copies of other people's content and post it as if it was new content? It wouldn't surprise me to find that dozens of sites are running virtual AP news feeds without licensing the content. It wouldn't surprise me if there's more than one doing that with Ars articles.

I know I've found ripoffs of content from sites I created where the contents of an entire page had been copied and repurposed as "original" content on another site. If it's happened to the low profile sites I've worked on then I'm sure it happens to the AP. A lot of times it's a simple search engine play. If they can get indexed then they can run ads for a while.

I think we need to wait and see whether the AP only goes after the really egregious sites, as they claim they are going to, or if the lawyers get overzealous and start trying to shut down the entire blogosphere.

Originally posted by Grashnak:How exactly does it infringe on your free speech if they prevent you from wholesale copying of their text without attribution?

Because, if you read TFA, that's NOT what they have a history of getting all pissy about:

quote:

The company "considers taking the headline and lede of a story without a proper license to be an infringement of its copyrights that additionally constitutes 'hot news' misappropriation." ... In his posts on the controversy, Cadenhead pointed out that the offending excerpts took as few as 33 words from AP articles, and no more than 79.

In case you are keeping score at home, I just quoted ~50 words. Per the AP's position established in the quote above, I just committed copyright infringement. I don't think anyone here should be under any illusions that detecting things like what I just did will not be the ultimate goal of this "technology solution" the AP is proposing.

Since you're not a newsie, I suppose you might not understand the point. Historically in journalism, the most important part of a story is the lede. It's called the inverted pyramid. You start with the absolute most important part of the story and get as many details in as fast as you can in the first paragraph: the lede (yes, that's spelled correctly). As the story progresses, you get to less and less important information. The reason is that the wire services did not know how much of a story a given news outlet could run. So the idea is that an editor could lop off the end of the story paragraph by paragraph until it fit, removing the least important parts bit by bit.
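The bottom-up cutting described above can be sketched in a few lines. This is a simplified illustration, assuming a story stored as a list of paragraphs ordered from most to least important and a word budget set by the outlet; the function name and the rule of always keeping the lede are assumptions for illustration, not any wire service's actual tooling.

```python
def trim_to_fit(paragraphs: list[str], max_words: int) -> list[str]:
    """Cut an inverted-pyramid story from the bottom until it fits max_words.

    Paragraphs are assumed to run from most to least important, so removing
    the last one always discards the least important material. The lede
    (first paragraph) is always kept, even if it alone exceeds the budget.
    """
    kept = list(paragraphs)  # copy so the caller's list is not modified

    def word_count(paras: list[str]) -> int:
        return sum(len(p.split()) for p in paras)

    while len(kept) > 1 and word_count(kept) > max_words:
        kept.pop()  # drop the current last (least important) paragraph
    return kept


if __name__ == "__main__":
    story = [
        "Lede with the five Ws.",              # most important
        "Key details and quotes follow here.",
        "Background that an editor can cut.",  # least important
    ]
    print(trim_to_fit(story, 12))
```

No paragraph ever needs rewriting: the structure alone guarantees that whatever survives the cut is the most important part of the story.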

That is unlike much of the writing on many blogs (not all, mind you), where that style doesn't work. In fact, online it doesn't work at all. But The AP is still a cooperative owned by newspapers, and they're its primary clients. Rewriting every story for online would be very complex, so they still go with the inverted pyramid. I bet they would have a lot less of a problem with quoting something from the middle of a story than with taking its most important part, the nut, wholesale.

All that is separate from my objection to the way The AP treats photographers and writers by forcing them to give up the copyright to their work.

Originally posted by ewelch:Now, if The AP would stop appropriating freelance writers' and photographers' work with egregious rights grabs for the copyright to their work, well, that would be nice too.

Well if they don't like the terms they are given, they can always sell their works to other companies...

Good for AP. I've seen bloggers and pseudo-journalists steal entire articles from both AP and Reuters and always wondered how they got away with it. I guess the short answer is that neither news organization did anything about it. Fair use is fair use, but when you simply copy and paste an entire article into your blog or whatever, citation or no citation, you're walking on the wrong side of a very fine line, IMO.

Originally posted by ewelch:Now, if The AP would stop appropriating freelance writers' and photographers' work with egregious rights grabs for the copyright to their work, well, that would be nice too.

Well if they don't like the terms they are given, they can always sell their works to other companies...

??? A lot of times papers & services don't even ask - they just take. I think this happens a lot, going both ways; it's just that with everything on the internet, it's a lot easier to find out.

As for wholesale appropriation of content to fill blogs for link spam, I've seen a lot of it, especially when searching for specific info via blogs.google.com. Blogs that are abandoned or poorly locked down get taken over rather quickly.

Have you seen the character on SNL News who is supposed to be a blogger, and she says "bitch please!" all the time? The point of that character is that some bloggers are worthless. Some have nothing useful to add to the world, but they do it anyway because it takes so little effort, time, and money.

Simply quoting stories written and published by others has low value. If it simply helps spread an AP story to a different customer base, then AP needs to be paid for what they created.

Originally posted by whquaint:Have you seen the character on SNL News who is supposed to be a blogger, and she says "bitch please!" all the time? The point of that character is that some bloggers are worthless. Some have nothing useful to add to the world, but they do it anyway because it takes so little effort, time, and money.

Simply quoting stories written and published by others has low value. If it simply helps spread an AP story to a different customer base, then AP needs to be paid for what they created.

Are you saying expressing an opinion, however shallow, on a news item even if you cite the source will be illegal? Maybe you are saying you have to do this without any references to the story. Wow! That's a good way to kill the internet. Half the value of the blogs is the comments they generate. People won't be allowed to discuss anything.

Any other ideas coming down the pike? Like no one being allowed to wear the outfit combination you wear for the day? Or, how about you can't cook a meal you had at a restaurant and invite some friends over?

Originally posted by aiken_d:Obviously the AP can do whatever it wants with its time and money, but it would probably make more sense to fix their business model. The whole competing with their customers thing is one of the more idiotic moves from a large company in recent memory, and now they're going down the road of believing their own distortions about how important content is. It's their foot to shoot, but it's a little sad to see.

And this:

quote:

From the article: "We need the money. The industry is falling apart."

These are an issue, I think. While I do agree that reporters need to be paid in order to ensure that quality news is put forth to the public, the real problem is that the business models of the past have become obsolete with the widespread, almost instantaneous sharing of information over the Internet. AP has every right to defend its creations and gain a temporary monopoly on the profits from such writing, but that is getting harder and harder to enforce in an ever-growing global society.

I do not think this proposed solution from AP will do anything to help them adapt to the Internet news community and build a business model that will last. I strongly think they need to start looking into other options as well, in case this one fails miserably. What was that saying about putting all your eggs in one basket...?

I don't want to see them go down the same road as other copyright holders.

If it weren't for journalists, Bridis noted, bloggers wouldn't have much material.

I'm not so sure about this. Ok, no doubt a lot of bloggers spend a lot of time commenting on reporting done by large news organizations, sure. But Bloggers vs. Journalists is a false dichotomy, and, I think, more dramatically so with each passing year. There seems to be an a priori assumption by some establishment journalists (tainted, I suspect, with more than a hint of professional arrogance) that bloggers are little more than parasites, dependent on mainstream journalists for their content, as opposed to "real" journalists, who provide all the original reporting (implying along the way that they would never dirty themselves by using stories originally broken by "mere" bloggers). Well, I submit to you the following article by Glenn Greenwald, with some documentation to the contrary. My point is, it's not so clear cut, and the old boys are gonna have to face that fact sooner or later:

The point about AP needing the money is no joke. Problem is, throwing technology and lawyers at the problem of copying content isn't going to help AP's balance sheets in any meaningful way. The internet is a great big copy machine, among other things. Maybe if we shut down the internet, everyone in the news/publishing/media business could go back to making large profits. Oh yes, no more TiVo too. Then we could re-fight the battles over home video recorders and cassette audio tapes.

Since none of that is going to happen, change is coming painfully swiftly. It's going to be most painful to those who don't recognize that their business model just shattered into small pieces, and they need to find another way forward.

What is that way forward? I don't know. Maybe nobody knows. Trying to find an answer to that question is probably much more important than going after a few (or even many) news bootleggers.

Originally posted by jackatwill:The point about AP needing the money is no joke. Problem is, throwing technology and lawyers at the problem of copying content isn't going to help AP's balance sheets in any meaningful way. The internet is a great big copy machine, among other things. Maybe if we shut down the internet, everyone in the news/publishing/media business could go back to making large profits. Oh yes, no more TIVO too. Then we could re-fight the battles over home video recorders and cassette audio tapes.

Since none of that is going to happen, change is coming painfully swiftly. It's going to be most painful to those who don't recognize that their business model just shattered into small pieces, and they need to find another way forward.

What is that way forward? I don't know. Maybe nobody knows. Trying to find an answer to that question is probably much more important than going after a few (or even many) news bootleggers.

Well put. You have identified the root of the problem, i.e. the business model itself. The answer may be a simple one, though. But first you have to ask this question: What can AP do so that people would want to go straight to AP for their news, and not waste time reading their favourite blogs? It may require first-class journalism (not the kind that just prints canned White House statements that end up getting us into a war) presented in a clean, organized and efficient way (without all the Flash nonsense and the advertising), with a way to allow readers to express their thoughts and enter into interesting discussions.

I know, not so simple. But there must be a way to sift, rate and elevate these discussions into a useful hierarchy.

Usually your articles are fine. The content of this one was well written; however, you should go back and reread it for wording errors and a couple of redundant words. There were several moments where a sentence or paragraph was hard to follow.