How a single DMCA notice took down 1.45 million education blogs

Web hosting firm ServerBeach recently received a Digital Millennium Copyright Act (DMCA) violation notice from Pearson, the well-known educational publishing company. The notice pertained to Edublogs, which hosts 1.45 million education-related blogs with ServerBeach, and it focused on a single Edublogs page from 2007 that contained a questionnaire copyrighted by Pearson. ServerBeach informed Edublogs about the alleged violation, and Edublogs says it quickly took down the allegedly infringing content.

Instead of calling the matter settled, though, ServerBeach took Edublogs' servers offline last Wednesday, temporarily shutting off all 1.45 million blogs, according to Edublogs. ServerBeach confirms taking all of the Edublogs offline, telling Ars that the outage lasted for "roughly 60 minutes before we brought them back online and confirmed their compliance with the DMCA takedown request."

As you might expect, ServerBeach and Edublogs have slightly different accounts of how it all happened.

The lights go out

Edublogs pays $6,954.37 to ServerBeach each month for hosting, and it was delighted with the company's service—until last week. Edublogs founder and CEO James Farmer wrote that he was stunned at "how quickly and proactively ServerBeach responded to Pearson's lawyers, as opposed to how they deal with one of their better customers (we've been with them for years and years, ok we're no WordPress.com—another one of their customers—but $75k+ [per year] has to count for something right?)."

Farmer posted his complaint to his blog Wednesday, the day of the outage, and it started receiving wider attention today with an article in Techdirt. Although Edublogs as a whole is back online, the particular blog that kicked off this mess has been marked as spam and is unavailable.

With Edublogs being based in Australia and ServerBeach based in the US, the time difference led to some middle-of-the-night fireworks at Edublogs. "Basically our sysadmin and CTO watched, in horror, live as our Web servers were shut down one-by-one and then we spent the next hour e-mailing, calling, and generally freaking out (it was around 3am for me; they are in the US) and through that we were able to get back up," Farmer told Ars via e-mail today. "If they hadn't been there, and we hadn't done that, it [the shutdown] would have been indefinite!"

In his blog, Farmer explained that the infringing material from 2007 was a reprint of "Beck's Hopelessness Scale," a 20-item self-evaluation questionnaire published in 1974 which Pearson sells for $120. The teacher who wrote that blog post apparently uploaded the questionnaire as a file (still available in Google's cache) to Edublogs' servers, then included a link to the document as part of a blog post containing a lesson plan related to suicide and self-harm.

After Edublogs was informed of the problem by ServerBeach, the company "figured that whether or not we liked it Pearson were probably correct about it," Farmer wrote. Edublogs thus took the appropriate action to make sure "the content was no longer available, and informed ServerBeach."

However, ServerBeach noticed that Edublogs still had the file in its Web server cache, and so it pulled the entire site offline even though the file in question was no longer easily accessible to the public. The October 10 shutdown came, Farmer said, less than 12 hours after ServerBeach provided Edublogs with the DMCA notice.

Farmer later found out that ServerBeach had also contacted his firm 10 days previously through their automated system, but "needless to say it either wasn't sent or we didn't get it, but they figured that they'd just shut down our servers regardless without doing something simple, like calling any of the three numbers for us they have on file."

ServerBeach, in response to our questions, offers a different account.

The view from ServerBeach

ServerBeach told us that Edublogs was in fact aware of the first DMCA notice. "ServerBeach received the first DMCA notice for this alleged infringement on September 26th, 2012 which was resolved by the customer within 24 hours of notification," ServerBeach GM Dax Moreno said in an e-mailed statement. "ServerBeach received the second notice for the same alleged infringing content on October 8th, 2012 which was not resolved/responded to, so a second notification e-mail was sent October 9th, 2012 which was also not responded to by the customer."

ServerBeach said the additional notice on October 8 came "because the same alleged infringing content was once again made available on their system despite the fact that it had already been removed due to the prior notice." Farmer acknowledges that "the blog was taken down when we got the message but the file stayed in varnish cache" until it too was taken down after the second notice. ServerBeach further said that Edublogs uses "a failover system that allowed Web traffic to still reach the allegedly infringing material."
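The cache problem both sides describe can be illustrated with a toy model. This is not Varnish or Edublogs' actual setup; `Origin`, `ReadThroughCache`, and the file path are hypothetical names chosen for illustration. It shows why deleting a file at the origin is not enough on its own: a read-through cache keeps serving its stored copy until that entry is explicitly purged.

```python
from __future__ import annotations


class Origin:
    """Hypothetical stand-in for the blog application and its file storage."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}

    def get(self, path: str) -> bytes | None:
        return self.files.get(path)


class ReadThroughCache:
    """Serves stored copies; only consults the origin on a cache miss."""

    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.store: dict[str, bytes] = {}

    def get(self, path: str) -> bytes | None:
        if path in self.store:
            return self.store[path]  # cache hit: the origin is never asked
        body = self.origin.get(path)
        if body is not None:
            self.store[path] = body  # fill the cache on a miss
        return body

    def purge(self, path: str) -> None:
        # Explicit invalidation; without it the cached copy lingers.
        self.store.pop(path, None)


origin = Origin()
cache = ReadThroughCache(origin)
path = "/files/questionnaire.pdf"  # hypothetical path
origin.files[path] = b"%PDF placeholder"

cache.get(path)            # first request fills the cache
del origin.files[path]     # content removed at the origin
still_served = cache.get(path) is not None
cache.purge(path)          # flush the cached copy as well
gone_after_purge = cache.get(path) is None
```

In this model `still_served` is true even though the origin no longer has the file, which is the state Farmer describes: the blog post was down, but the cached file was still reachable.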

The company also confirmed that all of the blogs hosted by Edublogs were taken offline.

"Unfortunately, we have no control or insight into their website/application so this meant that all of their blogs were subsequently impacted when we disabled this portion of their solution," Moreno wrote.

ServerBeach offered more details on its inability to target certain webpages for deletion, saying that it "is a dedicated, DIY hosting service which means that our customers have/maintain direct control over their own websites, applications and databases. When faced with these types of situations, ServerBeach is limited to only working with the resources that it has control over (network level systems, physical server hardware, networking devices, etc.). This means we cannot access a customer's website, application, or database and actively make changes to it on their behalf."

"Ham-fisted"

Still, taking down entire servers containing a million and a half blogs over an alleged copyright violation on just one page was an overreaction, according to intellectual property attorney Evan Brown. He confirmed that DMCA rules don't require anything close to such a response—particularly when the customer was working to take down the infringing content itself.

"It's pretty hard to believe that a hosting provider would be quite this ham-fisted as to take an entire network offline over one piece of content," Brown told Ars via e-mail. "The DMCA certainly does not require such drastic measures. Quite the contrary, actually. The statute requires copyright owners to identify with some particularity the content alleged to infringe and for intermediaries to remove or disable access to that content. There's nothing in there requiring whole sites to be taken down over one piece of infringement."

Edublogs is not exactly a "rogue site" when it comes to copyright, either. Besides the fact that the Australian company took down the infringing content in this specific case, Edublogs has its own DMCA policy and has developed its own software to kill "splog," or spam blog content. At Edublogs, they "invariably get a bunch of e-mails every day complaining about copyright issues" and take down content when the complaints are legitimate, Farmer explained.

ServerBeach says it wants to make things right. In a comment posted on Farmer's blog, Moreno wrote on Thursday, "I am disappointed that we find ourselves in this situation with you since we’ve enjoyed a great relationship up until this point. I very much want to get us back on the path of customer goodness with you and I think I have some options to share with you that can do just that."

Promoted Comments

This sounds like they blocked part of the blog site but whatever part was blocked was needed for the correct operation of the rest. Thus they didn't exactly take "the entire network offline over one piece of content" but rather, took one part that had a domino effect.

Am I the only one reading it that way?

Yes, that was the impression that ServerBeach gave. I don't know the details of how Edublogs is set up, but the days that a web site consisted of plain html files on a disk, with the URLs simply being the directory path, are long gone. Large sites commonly translate URLs (even those without query string) in some way to a database lookup, and then generate the returned html on the fly. ServerBeach's argument seems to be that they don't know how the database is set up, or how URLs are translated to queries, so they cannot block specific pages, and had no choice but to block the script that performs the lookup for all pages.

That has a kernel of truth (ServerBeach can't muck with the database), but it seems likely that if Pearson gave a specific URL, they could just block that URL, even if they aren't sure whether different URLs resolve to the same content.
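The commenter's suggestion can be sketched as WSGI middleware: the dynamic application still resolves every URL through one script, but a thin wrapper refuses only the paths named in a notice. This is a minimal sketch that assumes someone can put a filter in front of the application (ServerBeach says it could not do this on its DIY servers). `blog_app`, `block_paths`, `fetch`, and the blocked path are all hypothetical. It answers with HTTP 451 (Unavailable For Legal Reasons), though a plain 403 or 404 would serve too.

```python
from __future__ import annotations

from typing import Callable, Iterable
from wsgiref.util import setup_testing_defaults

StartResponse = Callable[..., object]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]


def blog_app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
    """Hypothetical stand-in for the dynamic blog engine.

    Every path goes through this one function; a real site would turn the
    path into a database lookup and render HTML on the fly.
    """
    body = f"rendered page for {environ['PATH_INFO']}".encode()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


def block_paths(app: WSGIApp, blocked: set[str]) -> WSGIApp:
    """Refuse only the listed paths; pass every other request through."""

    def wrapper(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        if environ.get("PATH_INFO", "") in blocked:
            start_response("451 Unavailable For Legal Reasons",
                           [("Content-Type", "text/plain")])
            return [b"Removed in response to a copyright notice."]
        return app(environ, start_response)

    return wrapper


def fetch(app: WSGIApp, path: str) -> tuple[str, bytes]:
    """Call a WSGI app directly (no network) and return (status, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)
    statuses: list[str] = []

    def start_response(status: str, headers: list, exc_info=None) -> None:
        statuses.append(status)

    body = b"".join(app(environ, start_response))
    return statuses[0], body


# Hypothetical path taken from a takedown notice.
app = block_paths(blog_app, {"/2007/lesson-plan/files/scale.pdf"})
```

Calling `fetch(app, ...)` on the blocked path returns the 451 status, while any other path renders normally. Because the match is on the exact path string, a different URL that resolves to the same content would slip through, which is the caveat the commenter raises.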

There has to be some kind of legal recourse that can be taken for this.

Assuming the DMCA request was valid, then not really.

Probably incorrect. Edublogs could argue that there was a breach of contract depending on what their contract states on copyright infringement. Given the overreaction this might have some traction.

The important lesson here is that content hosts are so deathly afraid of losing their safe harbor that they tend to react swiftly and often without much forethought. I think part of the problem is that it is so easy to issue a takedown and the repercussions if you are wrong are minimal. On the other hand, a host has much more to lose if it doesn't react relatively quickly. The system is highly skewed toward the copyright owners. Overall, the DMCA needs to be looked at closely and largely rewritten, but the chances of that happening in the next few years are slim to none. Maybe around 2017-18, when the Copyright Term Extension (Sonny Bono Act) ends, we might see some discussion on the topic.

My question is why, if Edublogs has a DMCA policy, did Pearson contact ServerBeach in the first place? It seems like this situation could have been avoided if only Pearson had used Edublogs' facilities first.

I can answer this. We are a much smaller organization than Pearson, yet our content gets copied incessantly (by automated splogs, by people posting entire articles in forum threads, and by bloggers reposting entire articles, often without a link). Even if we receive a link, Google, especially since its Panda algorithm update, will often rank the reposts higher than us, even though it's our content and it preceded the repost.

So we need to get the content taken down fast. Every other Friday we send out a bunch of DMCA notifications. There are three places you can notify: The website itself, the ISP, and Google (you can also notify an upstream backbone provider if the ISP ignores you, which virtually never happens).

We don't bother with Google. If Google takes it down it's still in Bing, etc., and we want to do as little work as possible. But the main reason is that Google sends our DMCA notifications to an outfit called Chilling Effects. The pages on Chilling Effects actually rank for our site names, pushing other pages of ours down the list.

We don't bother with notifying the site owner. It's harder to find their contact information, but mainly the problem is that they want to be your correspondent, engaging in self-taught discussions on copyright law. We don't want to chit-chat with these people. We don't want to educate them on copyright. We don't want to explain to them that something they read on a website written by an anonymous copyright pundit (even on Ars) about copyright law is not correct. We don't want to discuss "information wants to be free." We just want our content removed.

The ISP has lawyers. They know copyright law, from law school and from litigation experience. There is no wasted time. If our notification is legitimate, they get the content down fast, with no BS.

The individual infringer only sees it from his little world: once a year he gets a complaint, and it leads to a stimulating exchange of e-mails. The publisher's side is that he has to pay people to send out dozens or hundreds of notifications a month, and he doesn't have time to entertain and educate individual site owners.

Also, note that the DMCA only applies to online "service providers," and the sloppily drafted law defines that term only in broad strokes. There is still some doubt as to whether an outfit like Edublogs falls under its terms. Service providers are required to pay the Copyright Office a fee and to supply it with contact information for their copyright agent. According to the Copyright Office's list of designated copyright agents, Edublogs has not done this.

Another point is that Edublogs is not a U.S. corporate person. Although we have had little problem getting infringing content taken down by foreign ISPs (except those in China, unless we call them on the phone several times and get through to someone who speaks English), we would always send to a U.S.-based ISP over an overseas service provider if we had the choice.

If I remember correctly, the DMCA has a section that states that nothing can be done about false removals and/or losses that occur due to uses of the DMCA. This was a law bought and paid for by the RIAA and the MPAA to use as a sledgehammer without care for who gets hurt, and they made sure it had loopholes for situations like this.

I'm having trouble figuring just how the site was taken down.

When I read this it looks like they physically shut down the servers.

"Basically our sysadmin and CTO watched, in horror, live as our Web servers were shut down one-by-one and then we spent the next hour e-mailing, calling, and generally freaking out (it was around 3am for me; they are in the US)..."

"a 20-item self-evaluation questionnaire published in 1974 which Pearson sells for $120."

A nearly 40-year-old questionnaire, and they charge $6 per question. Sheesh, someone should make an open-source version of it.

That's pretty much the educational world. Everything "new" in education is already 5 years old in the real world and 10 years old if you talk to a geek. Pearson tries, but like all factory education corporations it guards its antiquated material like a rabid dog hoping to squeeze every underfunded school or teacher for every penny they have.

In the case of the server company, it would seem that they don't even pretend to advocate for their users; they just roll over for whoever has the most expensive lawyer. Welcome to the federally monitored internet...

If I remember correctly, the DMCA has a section that states that nothing can be done about false removals and/or losses that occur due to uses of the DMCA. This was a law bought and paid for by the RIAA and the MPAA to use as a sledgehammer without care for who gets hurt, and they made sure it had loopholes for situations like this.

It's why so many complain about the DMCA.

No, in fact there is an explicit clause that false claims can be taken to court. But there was no evidence of a false claim here, it was legitimate. The problem was that the claim was not handled how ServerBeach expected it to be, and thus the servers were taken down. So the real debate here is how ServerBeach handled the claim, not whether the claim was valid.

I'm guessing many or most of their customers in the educational field are from the US.

Yes, pretty much this. It's a massive site, with lots of countries well represented amongst its users. Certainly lots of Canadian teachers I work with use it, as do lots of Americans. Combine that with the affordability of US hosting, and it makes perfect sense to me.

It is time that we as a society address the drawbacks of stringent copyright policy. For one, it encourages "milking" of existing content rather than creation of new content (e.g., a survey from nearly 40 years ago).

That and automated DMCA programs ought to be banned: if something is important enough to justify a takedown, each case should be subject to manual review. I wonder how many takedowns were sent by bots that flagged content as infringing by mistake? Under current laws, there's very little recourse for false or unjustified removals.

Finally, it is quite alarming to see a web-hosting company like this just "pull the plug" so quickly without further thought. I wonder if America's technological leadership will someday be hampered by this - perhaps someday people will start a serious debate on the costs and benefits of hosting for example or building a datacentre in the US due to the overly restrictive IP laws.

My question is why, if Edublogs has a DMCA policy, did Pearson contact ServerBeach in the first place? It seems like this situation could have been avoided if only Pearson had used Edublogs' facilities first.

I was wondering this too. Surely the general DMCA rules mean that it's none of ServerBeach's business what is on those servers? Since EduBlogs have an explicit DMCA policy, shouldn't going to the host be what you do when the actual infringer refuses or otherwise doesn't comply? I would guess that the blog owner might not have had contact information, so then you go up to EduBlogs (who clearly have contact information) then you go to ServerBeach. Pearson decided to just go up to the top instead...

I'm guessing that DMCA takedown procedures don't outline that kind of process. They should, since it forces content owners to take reasonable steps, rather than take the easy way out.

Lastly, this is a good example of why copyright terms should be reduced. That questionnaire is almost 40 years old now. Never mind that it is probably really out of date; I think that 40 years is plenty of time to recoup the cost of making it (40 years at $120 is $480,000 if you only sell 100 a year). This is a good example of how current copyright goes against the original idea behind it: copyright is supposed to encourage innovation and production by protecting works temporarily. Maybe if this questionnaire had entered the public domain 15 years ago, there would be a new, better option on the market, a questionnaire that reflects 30 years of progress in the field of psychology...

It is time that we as a society address the drawbacks of stringent copyright policy. For one, it encourages "milking" of existing content rather than creation of new content (e.g., a survey from nearly 40 years ago).

That and automated DMCA programs ought to be banned: if something is important enough to justify a takedown, each case should be subject to manual review. I wonder how many takedowns were sent by bots that flagged content as infringing by mistake? Under current laws, there's very little recourse for false or unjustified removals.

Finally, it is quite alarming to see a web-hosting company like this just "pull the plug" so quickly without further thought. I wonder if America's technological leadership will someday be hampered by this - perhaps someday people will start a serious debate on the costs and benefits of hosting for example or building a datacentre in the US due to the overly restrictive IP laws.

It's actually already happening. The discussion, if not much action anyway.

Expanding it out to IP in general, the hostile environment for companies, especially smaller ones, is pushing people overseas. There are many developers moving to Europe because you aren't going to be sued into oblivion over generic features. New Zealand (where I live) is also a good option because software patents are actually invalid, and we [the developer community] are making sure it stays that way. We recently fought a change that would have opened up a massive loophole in the law. The words "as such" change so much.

Many analysts, both inside and outside of America are commenting that the hostile IP environment is making it harder and harder to do business in the US. Again, contrasted to NZ which is ranked among the easiest countries in the world to do business. The lack of software patents hasn't really affected the vibrant software industry in Wellington.

Ridiculous. They found an error rate of 1:1,450,000 and that was enough to take down the server. I'm going to go out on a limb and say that I'd be OK with this if Pearson textbooks had the same level of perfection. Sadly, they've (speaking in general of the publishing industry) created a world where misspelled words and hideous grammar are not only acceptable, but they can charge $100+ per book for their awesome editors.

As an American, I really hate to say this, but: Were I starting a new web site, I'd seriously consider hosting it outside the US.

In the online world, one misapplied DMCA notice—especially if it shut down the whole site—could spell disaster for a fledgling business. Why take the chance?

Hosting it outside of the United States likely is no longer enough. Not only do you have to avoid a .com or .net domain, you also have to block users who live in the United States (even more so if you accept our money).

You can't eat your cake (paid for with our money) and not follow the laws of the United States. I don't agree with what happened, but the survey should never have been posted in the first place. While it wasn't the company's fault that a user illegally posted the survey, the file and its cached copy should both have been removed, and based on the story it sounds like only one was deleted.

You also can't play games like Megaupload did, where they deleted a single link to a file but apparently didn't delete the actual file (or prevent it from being downloaded). All they had to do was prevent the file hash from being downloaded once told it was an illegal file.
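The hash-based approach the commenter mentions can be sketched in a few lines with Python's `hashlib`. This assumes the host keeps its own blocklist of SHA-256 digests for files named in notices; `HashBlocklist` is a hypothetical name, not any file host's real mechanism. The design trade-off is worth noting: an exact digest catches re-uploads of the identical file under any link, but changing even one byte produces a different digest.

```python
from __future__ import annotations

import hashlib


class HashBlocklist:
    """Hypothetical blocklist keyed on the SHA-256 digest of file contents."""

    def __init__(self) -> None:
        self._digests: set[str] = set()

    @staticmethod
    def digest(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def flag(self, data: bytes) -> None:
        """Record a file named in a takedown notice."""
        self._digests.add(self.digest(data))

    def can_serve(self, data: bytes) -> bool:
        """False if these exact bytes were flagged, whatever link points to them."""
        return self.digest(data) not in self._digests


blocklist = HashBlocklist()
flagged_file = b"contents of the flagged upload"  # placeholder bytes
blocklist.flag(flagged_file)
```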

Well, you can still have a .com address, and just have it 301 over to your real site. Using the gTLDs of .com and .org only matters if the Fed gets involved, which so far has happened mostly with shady websites (not necessarily illegal, but definitely skirting the limits a little close).
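A domain that does nothing but forward visitors can be very small. The sketch below is a WSGI app that answers every request with a 301 pointing at the same path and query on another host. `CANONICAL_HOST` is a hypothetical placeholder, and in practice this is usually a one-line rule in the web server's configuration rather than application code.

```python
from __future__ import annotations

from urllib.parse import quote, urlunsplit

CANONICAL_HOST = "example.org"  # hypothetical: where the real site lives


def redirect_app(environ: dict, start_response) -> list[bytes]:
    """Answer every request with a permanent redirect to the same path elsewhere."""
    # PEP 3333 hands over PATH_INFO as latin-1-decoded bytes; re-encode before quoting.
    raw_path = environ.get("PATH_INFO", "") or "/"
    path = quote(raw_path.encode("latin-1"), safe="/")
    query = environ.get("QUERY_STRING", "")
    location = urlunsplit(("https", CANONICAL_HOST, path, query, ""))
    start_response("301 Moved Permanently",
                   [("Location", location), ("Content-Length", "0")])
    return [b""]
```

Because the redirect is permanent, browsers and search engines treat the target as the site's real address, so the .com name only has to stay registered.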

You don't need to block US users, and can take their money, although I would use both a US based payment processor and a non-US based one so banks don't "voluntarily" cut off access to my site a la Wikileaks.

The whole premise of the Internet is that there is no law, outside of the laws extended by technological implementation. And indeed, this premise can be fully executed by hosting your site on Tor, at the risk of losing respectability and userbase.

As an American, I really hate to say this, but: Were I starting a new web site, I'd seriously consider hosting it outside the US.

In the online world, one misapplied DMCA notice—especially if it shut down the whole site—could spell disaster for a fledgling business. Why take the chance?

I second that one. If I were looking for hosting for my two personal domains today, I would want to do the same thing. I am getting more and more angry and fed up with this damn government and the lame laws we have to deal with. It's unreal that an A-hole corporation that exists by preying on students gets to sell a 1974 questionnaire for $120. Education needs to provide cheaper open-source materials and tell these rip-off people to go where the sun don't shine.

I guess I take the ISP's side here. My company operates websites, and hosts them on an ISP. We get copyright complaints we have to deal with ourselves, and we make many DMCA complaints to U.S.-based ISPs (as well as non-DMCA complaints to overseas ISPs). A few points:

-- Invariably ISPs have their own terms of service which are stricter than what the DMCA specifies. These terms are part of the contract with their customer. You do need to read those. The DMCA lets you prevail in court if you are sued for copyright infringement. The goal of ISPs is not to prevail in court, but rather to avoid being sued at all. That requires more aggressive action than required by the DMCA.

-- ISPs are swamped with DMCA notifications daily. No, they are not going to call you on the phone. This is all automated. The DMCA notification is forwarded to the perp, the notification lists specific URLs, and those URLs either are down the next day or they aren't. If they aren't, you are offline until they are. If the complaint names image or PDF URLs rather than page URLs, taking down the linking page doesn't help: the ISP has no way to know that the page is down, and it really doesn't matter anyway from a legal standpoint, since the ISP could still be sued while the file itself is reachable.
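The "down the next day or they aren't" check is easy to automate. This is a minimal sketch, assuming the ISP simply requests each URL listed in a notice and treats 404, 410, or 451 as removed; `GONE`, `http_status`, and `still_live` are hypothetical names, and real compliance systems will differ. The status lookup is passed in as a function so the logic can be tested without network access. Note that a cached copy still answering 200 counts as live here.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from typing import Callable, Iterable

# Assumption: these status codes mean the content has been taken down.
GONE = {404, 410, 451}


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status of a HEAD request (needs network access).

    Network failures (urllib.error.URLError) propagate to the caller.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses arrive as exceptions


def still_live(urls: Iterable[str],
               status_of: Callable[[str], int] = http_status) -> list[str]:
    """Return the notice URLs that do not yet look removed."""
    return [url for url in urls if status_of(url) not in GONE]
```

For example, with a fake lookup such as `{"https://blogs.example/a": 404, "https://blogs.example/b": 200}`, passing the dict's `__getitem__` as `status_of` reports only the second URL as still live.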

-- In dealing with DMCA notifications, you need to get the material out of your cache, and you need to get it out of Google and the Internet Archive also. This is a skill that any publisher needs to have. You get it out of your cache by flushing the cache. The ISP has no idea how to do this. We use all kinds of caching: custom-coded caching for some webapps, caching built into PHP frameworks, caching from WordPress plug-ins, and so on. That is not the ISP's problem. You need to take care of that. As far as getting it out of Google, you need to ask for removal via the Webmaster Tools interface. For the Internet Archive you need to modify your robots.txt file. You need to keep that rule in place indefinitely, so most savvy webmasters just block the Internet Archive from the beginning. For Google you can specify the noarchive robots directive and prevent Google-side caching from the beginning, which I recommend.
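The two preventive steps can be written down concretely. This sketch assumes the Internet Archive's crawler honors a `User-agent: ia_archiver` entry, which was the documented practice at the time of this discussion; the Archive's robots.txt policy has changed since, so treat it as illustrative. `noarchive` is the robots meta directive Google documents for suppressing its cached copy, though Google has since phased out public cached pages. Python's standard `urllib.robotparser` is used only to confirm that the rules parse as intended; the page URL is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Block the Internet Archive's crawler; leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""

# Goes in each page's <head>; asks search engines not to keep a cached copy.
NOARCHIVE_META = '<meta name="robots" content="noarchive">'

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "https://blog.example/2007/some-post/"  # hypothetical URL
archive_may_fetch = parser.can_fetch("ia_archiver", page)
google_may_fetch = parser.can_fetch("Googlebot", page)
```

An empty `Disallow:` line means "allow everything", so only the archive crawler is shut out.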