The Harry Potter Watermarking Experiment
April 8, 2012

As more users explore the magical world of Pottermore, J.K. Rowling’s site for all things Harry Potter, we are finding out that the EPUB e-book files it sells may be DRM-free, strictly speaking, but are not devoid of rights technology. Instead of encryption-based DRM, Pottermore is using a watermarking scheme that the Dutch vendor Booxtream markets as “social DRM.”

Users can purchase each Harry Potter e-book title once and download it up to eight times, in multiple formats. That’s a real convenience; it’s a “rights locker” scheme reminiscent of UltraViolet for movies. As I mentioned previously, the Kindle and Nook versions have DRM. The EPUB version that I downloaded is not DRM-protected; instead it contains two things: “This book is watermarked and was acquired by user ec107c00b9577436d6354e54cd9da5c9 on 31 March 2012” on the copyright page, and various bits of data inserted invisibly into images and other places inside the book.

This data ought to be easy to remove without trace. The files appeared on torrent sites very shortly after the Pottermore Shop went live. A programmer with middling skills could write code that detects and removes the data; even if the illustrations in the book were a bit damaged, readers wouldn’t care. Such a hack for Booxtream doesn’t exist yet (at least publicly), but the irony is that if this scheme catches on with more authors and publishers, it surely will.

Such a program would be perfectly legal; it would not violate anticircumvention law such as DMCA 1201 in the United States. It would be what I call a “one-click hack,” like the (illegal) DeCSS rippers that break the weak CSS encryption on DVDs: something that the non-tech-savvy can easily use and whose effect is permanent. In other words, it would impose the same level of effort on users as a format conversion tool, such as the free Calibre, which can (among other things) convert EPUB files to MOBI files for Kindles, so that users can get DRM-free Harry Potter titles for their Kindles after all.

Furthermore, even though Section 1202 of the DMCA forbids removing “copyright management information” from files, the watermark does not qualify as copyright management information as defined in the law. This means that under U.S. copyright law, the user is free to apply such a hack.

Some would argue that watermarks are no different from weak DRMs (like CSS) in terms of the “speed bump factor” because both have one-click hacks available. But the fact that watermark removal tools are legal and DRM strippers aren’t makes a difference. DRM strippers must hide in the shadows, but watermark removal tools can exist out in the open. If they are available for free (which seems very likely), then it would be difficult to try to stop them through legal channels. I could even see a watermark removal feature built into a popular application like Calibre, since it’s free and open-source.

Pottermore’s Terms and Conditions forbid altering or removing the watermark data, but this may not mean much. It is possible that copyright law may prevail over such terms; this is a legal gray area.

The legal principle here is First Sale (Section 109 of the U.S. copyright law), known as “exhaustion” outside the U.S. This says that the publisher has no further control over a work once a person has obtained it lawfully. While this law enables libraries, used book/record/video stores, and other such institutions for physical goods, its applicability to digital files is unsettled — although as I said previously in connection with ReDigi, the digital music resale service, both media companies and digital retailers are highly motivated to ensure that Digital First Sale never happens. This Harry Potter case is yet another example of why.

(By the way, an update on ReDigi since I wrote about it last November: EMI sued the company back in January. The following month, the judge in the case denied EMI’s request for preliminary injunction, meaning that ReDigi can keep operating as the case goes to trial.)

This all leads me to question why Pottermore bothered with this watermarking scheme in the first place. It seems rather pointless.

I assume that “user ec107c00b9577436d6354e54cd9da5c9” is an obfuscated version of my user account ID on Pottermore. I also expect that Booxtream lets the retailer use whatever character strings it wants. If Pottermore really wanted to discourage me from infringing the copyright on the e-book, it would put my email address, or even the number of the credit card I used to buy it (which was an option in the now-discontinued Microsoft Reader e-book technology). Even the vehemently anti-DRM publisher O’Reilly & Associates uses a watermarking scheme for its downloaded PDFs that puts the user’s real name on every page of the books.
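
The string on the copyright page is 32 hexadecimal characters, that is, 128 bits, which happens to be the length of an MD5 digest. Nothing in the e-book says how it is actually derived, so the following is only a sketch of one way a retailer could turn an account ID into an opaque token of that shape. The function name, the sample account ID, and the secret key are all hypothetical, and MD5 is used here purely because its output length matches.

```python
import hashlib
import hmac


def opaque_buyer_token(account_id: str, secret_key: bytes) -> str:
    """Derive a 32-character hex token from an account ID (hypothetical scheme).

    A keyed hash means that only someone holding secret_key can recompute the
    token for a given account; the token alone does not reveal the account ID.
    """
    return hmac.new(secret_key, account_id.encode("utf-8"), hashlib.md5).hexdigest()


# Made-up inputs, for illustration only.
token = opaque_buyer_token("reader-12345", b"retailer-secret")
print(len(token), token)
```

With a keyed scheme like this, the retailer can map a token found in a leaked file back to an account by recomputing tokens for its customers, while the token itself means nothing to the reader, which is exactly the trade-off discussed here.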

Instead, Pottermore put a character string that means nothing to nontechnical users, presumably to avoid privacy complaints (which would also encourage hacking), and put it in a single place that most readers ignore. This “social DRM,” at least the way Pottermore has implemented it, is a shy and retiring beast. There is also a standard legalese copyright notice in the e-book, but no one pays any attention to those either.

Given that non-EPUB versions of the Harry Potter e-books have DRM, I suspect that Pottermore would have used DRM if it were possible to have a seamless user experience with EPUB files, as is the case within the Kindle and Nook ecosystems. (Pottermore could have chosen to do without DRM for those formats too, but it didn’t.) The lack of a standard DRM for EPUB integrated with EPUB reader apps makes such an experience unobtainable; hence Pottermore’s use of Booxtream instead of DRM. In other words, Pottermore is not against DRM, but it intentionally traded off the best possible user experience and respect for user privacy against some level of protection.

I fail to understand what behaviors Pottermore is trying to prevent here. Even a plain-language message to purchasers — which involves no technology and costs nothing to implement — would alert them to legal and contractual limitations on use. Instead, the current scheme, with its cryptic message, legalese, and hidden data, doesn’t really alert anyone to anything, let alone prevent anyone from doing anything. At best, it’s a “Gotcha!” for nontechnical users who upload files to places where Pottermore presumably pays Booxtream to look for watermarked files. Those aren’t the users whom Pottermore should be most interested in targeting, and if Booxtream does catch anyone and cause a nastygram to be sent, then backlash will ensue. And isn’t Pottermore trying to prevent backlash in the first place?

Retailers that pay for rights technology ought to get something for their money. Booxtream might be effective if used differently; otherwise I don’t see much benefit to Pottermore for this watermarking scheme.


Wow Bill, this is the first time you seem to completely miss the point. This is a very good move from a social media aspect and a major move on the biggest problem of all with digital e-commerce.

If you take on board that it is basically impossible to stop piracy, as we all know it is, why burden your content with a feature consumers hate? Now I know this statement does not go for all forms of content, but books/music... yes, mostly.

The point here is that a move like this helps inform the people who really do appreciate the content that it has value, as they value their name and own reputation.

It plants the seed with the new generation who don’t pay for anything, that they SHOULD, especially if they value the content.

The biggest blunder the industry has made is to not lead by an informed and mutually beneficial argument with the consumer. Consumers are basically treated like criminals, with the sellers of media expecting them to steal. As such, they have become exactly that, and the conscience of the consumer has no shame in doing so. They are being treated like shit after all.

This, in my mind, is the biggest challenge the industry has, and so far is doing very little about it.

This move is a step in that direction, and explains why the ID is nothing but a code, as anything else would again alienate the consumer.

I personally think it is genius of them to do it this way.

I also personally think it is amazing it has taken so long for the industry to realize this.

Did I not bring this topic up with you a year or so ago, in that online video sellers should also burn in ID info? It would be an amazing technical challenge to burn it into every individual video sold. Probably not possible apart from adding it to the metadata, but Apple already does that. But it’s not the point. It’s when the consumer can see it, and they get reminded every time they watch, “Yo dude, your name is burnt like a tattoo into this...” that it would have any real social effect.

An obfuscated code like “ec107c00b9577436d6354e54cd9da5c9,” appearing once on a page of a book that most readers ignore, fails to convey the message “Yo dude, your name is burnt like a tattoo into this” to the average mass-market, non-techie reader. Thus it also fails to create an “informed consumer.”

Perhaps you are confusing this with the type of session-based watermarking you see in the video world, such as in early-window VOD HD distribution. Those are intended to be intentionally invisible and to support after-the-fact enforcement. They are also virtually impossible to remove without destroying the content.

The scheme used in Harry Potter e-books is none of the above: the inserted data is intentionally visible and not hard to remove. It fails as both a “friendly reminder” and a way of enforcing copyright among knowing, rather than naive and unsuspecting, users.

When I was young, some people who bought books put “Ex Libris [your name here]” stickers on the inside covers. Booxtream can do this for publishers/retailers with EPUB e-books; they understand the value and effect of such things. They make it look like an e-book is something that you bought, that is your property, that should be taken care of in a certain way. Yet Pottermore didn’t do it this way. (And according to the law, and specifically the Pottermore Terms and Conditions, it isn’t really your property and you can’t do whatever you want (within copyright law) with it.)

If Pottermore really wanted to create “an informed and mutually beneficial [agreement] with the consumer” then they could have done better with the user’s actual (not obfuscated) Pottermore user ID or email address, along with a non-legalese explanation of the consumer’s rights and expectations rather than this overly cautious and potentially counterproductive approach. The only plausible rationale I can think of, as I implied above, is that if they elected to put real usernames or email addresses in the e-books, then the hacker community would be up in arms over privacy issues and that much more motivated to produce (entirely legal) watermark removal tools. That’s a reasonable explanation. But I still suspect the hacks to this scheme will not be long in coming if it catches on with more publishers and authors.

They should have started with a stronger scheme, then if they got moans and groans from users over privacy, they could have dialed it back to the obfuscated code and looked like heroes. Instead they will find it very hard indeed to strengthen their current scheme if and when it is deemed a waste of resources.

Bill,
I agree that the deterrence should be the main idea and for that, the traceability needs to be communicated. Apparently not done effectively here, since the first leaked copies seem to still contain that visible, obvious mark, which should be easy to remove.
Having a visible number in there though doesn’t exclude a more robust mark for the next step, when piracy evolves. Technically it’s possible to also mark text documents with robust watermarks that provide a good level of security against analysis, averaging attacks and degradation like re-printing. Though I don’t know what BooXtream implemented.

How can you create a robust watermark with reflowable text content? I don’t see how.

Booxtream puts data (that people have detected by simple observation) in images. Even if the images are damaged by watermark removal, it won’t matter (in this case the images are little illustrations at the start of each chapter). They also put extra spaces in between words or perhaps kerning between characters. An algorithm for removing these would be trivial.
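
To make the extra-spaces point concrete, here is a toy model, not Booxtream’s actual method (which I haven’t seen documented). It assumes a scheme where one space between words means 0 and two spaces mean 1; the function names and the sample sentence are invented for illustration. Reading systems typically collapse runs of whitespace when rendering, so a mark like this is invisible on screen, and a one-line normalization erases it.

```python
import re


def embed_bits(text: str, bits: str) -> str:
    """Hide bits in the gaps between words: one space = 0, two spaces = 1."""
    words = text.split()
    if len(bits) > len(words) - 1:
        raise ValueError("not enough word gaps to carry the payload")
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        gap = "  " if i < len(bits) and bits[i] == "1" else " "
        out.append(gap + word)
    return "".join(out)


def extract_bits(text: str, n_bits: int) -> str:
    """Read the first n_bits gaps back: a run of 2+ spaces is 1, a single space is 0."""
    gaps = re.findall(r" +", text.strip())
    return "".join("1" if len(gap) > 1 else "0" for gap in gaps[:n_bits])


def normalize_spaces(text: str) -> str:
    """Collapse every run of whitespace to a single space."""
    return re.sub(r"\s+", " ", text).strip()


plain = "the quick brown fox jumps over the lazy dog"
marked = embed_bits(plain, "10110010")
print(extract_bits(marked, 8))                    # 10110010
print(extract_bits(normalize_spaces(marked), 8))  # 00000000
```

The payload survives only as long as nobody reflows the text, which is the whole problem with whitespace marks in a reflowable format.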

All people really care about is the text and simple formatting such as paragraphs. What am I missing? How else could you watermark reflowable text?

Sorry Bill, I disagree. The fact the ID number is there should be enough. This, at the end of the day, is not so much about enforcement, tho it may be used in some example cases, as it is about placing a message on every page.
Using a mark that anyone can understand would alienate the consumer. Exactly what they need to avoid.
Saying people will not understand it as an identification mark is like saying kids don’t know about sex until their parents tell them. In general, they know a lot more than you think, and I can tell you, they will know what a unique ID mark on the page is very quickly. And, if they don’t know what the ID mark is, that user is probably not smart enough to pirate either.

I think this is a smart move. The only issue here is it’s not good for those pushing complex DRM systems in that mass market model.

Um, James, the mark does not appear on every page. It only appears once, on the copyright page, several pages before the table of contents. A place where most people never look. This is true in both the DRM-free EPUB and the Amazon versions.

You might want to try actually looking at the thing before commenting on it…

Well Bill, not being on every page is not so good, but the social engineering aspect still holds true.
Here in Australia, pirate behavior is very high relative to most parts of the world. With the NBN (National Broadband Network) giving everyone fiber with 100 megabits to the home, you can have a media collection and be able to access it from anywhere and play the content in HD. This is/will be possible and as such will be used/done by consumers. (Imagine schoolies having a friend that is basically the media guy serving his curated collection to all his mates.)
Streaming and convenience seem to be the only real alternative the big players have.
It is still basically trending towards convincing the consumers to pay. I like to call it the Busking principle... the future of all forms of media. For busking to work well, the consumer has to respect you.

So social engineering towards that end.. in my opinion, is a good long term understanding.

Bill,
Ah, it’s reflowable text. That makes it more challenging to embed a robust mark indeed. I just downloaded a copy to understand the format. I agree that extra spaces and tabs can be reformatted, but there is probably still some redundancy left that could be exploited, like font formatting and similar characters or typos and synonyms that would survive ASCII transformation. Not sure what the best approach would be in this environment, but with 70,000+ words I’d think there would be some space.
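
As a rough illustration of the synonym idea, here is a minimal sketch under loud assumptions: the word pairs, function names, and sample sentence are all made up, and a real scheme would need a much larger, editorially approved substitution list. Each occurrence of a listed word carries one bit, depending on which member of the pair appears. A 128-bit ID (the size of the 32-hex-character string on the copyright page) would need at least 128 such sites in the text.

```python
import re

# Hypothetical interchangeable pairs: first word encodes 0, second encodes 1.
PAIRS = [("big", "large"), ("start", "begin"), ("quick", "fast")]
LOOKUP = {word: (pair, bit) for pair in PAIRS for bit, word in enumerate(pair)}


def embed(text: str, bits: str) -> str:
    """Rewrite listed words so that each one, in reading order, carries one bit."""
    remaining = iter(bits)

    def choose(match):
        word = match.group(0)
        if word not in LOOKUP:
            return word
        bit = next(remaining, None)
        if bit is None:  # payload exhausted; leave the rest of the text alone
            return word
        pair, _ = LOOKUP[word]
        return pair[int(bit)]

    return re.sub(r"[a-z]+", choose, text)


def extract(text: str, n_bits: int) -> str:
    """Read back the first n_bits bits from the listed words found in text."""
    found = [str(LOOKUP[w][1]) for w in re.findall(r"[a-z]+", text) if w in LOOKUP]
    return "".join(found[:n_bits])


plain = "a big dog will start at a quick pace"
marked = embed(plain, "110")
print(marked)              # a large dog will begin at a quick pace
print(extract(marked, 3))  # 110
```

Unlike extra spaces, this mark survives whitespace cleanup and conversion to plain text, but it changes the author’s words, and anyone comparing two differently marked copies can spot the substitution sites.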

I don’t think EPUB or MOBI (the Kindle format) gives you enough control over fonts to be able to manipulate them. I could perhaps imagine schemes involving typos — that would require a known plaintext attack, which would be harder than something that could be de-watermarked regardless of the text — but I also don’t see that technique passing muster with the author.

Reblogged this on Weleys Schadenfreud and commented:
Rosenblatt may have missed the point. Watermarking is a receipting mechanism that works in the digital and analog domains. DRM is a file format that is anti-consumer. Finally, beyond data, function can be embedded into data to enable use and other conditions beyond the inherent serialization and traceability of digital watermarks comprised of only data alone.

No. Not only are you using the terminology differently, but your comment isn’t accurate in any case. First, what you say applies to the kind of “watermarking” that’s used with audio or visual content. There is no such thing as the “analog domain” when it comes to EPUB e-books, though it does apply to PDF because that’s really more like a still image.

Furthermore, if you are going to embed functionality in watermarks, the only way to make sure that functionality actually works is to use some kind of DRM. See the ill-fated SDMI standard from 1999-2000 for one example of this.

I talked specifically to two highly respected copyright lawyers on “both sides” of this issue who are longtime colleagues of mine. I can’t name them because they gave me their views on background, other than to say that one is a high-profile litigator and the other is a government official. Various cases over the years have been testing the boundaries of copyright and contract, such as Vernor v. Autodesk. Capitol Records v. ReDigi will be next if it makes it to trial.

A year later, it’s not easy to find an EPUB watermark stripper, which suggests there’s not much demand for such a thing, and that maybe the legal fine print doesn’t matter to most people. As long as it’s easy to do the things that you want to do with an ebook…

The same Google search tells you that people find PDF watermarking to be obnoxious; there are all sorts of watermark stripping tools for PDF.

Of course there is no demand for EPUB watermark strippers. First of all, the vast majority of EPUBs from commercial publishers are DRM’ed and not watermarked. Secondly, few people know what “EPUB” means, and even fewer know what “watermark” means (in this context), as opposed to the probably much larger number of people who are familiar with “DRM.”

I would also note that O’Reilly has done a great job of keeping quiet about the fact that they watermark their PDFs (at least last time I looked).