Viacom, Google agree to mask 12TB of YouTube user data

As part of a $1 billion copyright infringement lawsuit, YouTube has to turn …

According to our Google/Viacom scoreboard, the Big G beat the Big V 3-2 in court earlier this month, but that still meant Google had to turn over a 12TB database of every YouTube video ever watched, complete with user IDs and IP addresses. The decision immediately raised privacy concerns, but Google and Viacom have now signed an agreement to anonymize the logging database before the handover.

In a document filed with the court yesterday, both sides in the $1 billion copyright infringement case agree that the actual user data isn't so important after all. Viacom apparently wants to see just how popular allegedly infringing content was on YouTube, on the theory that YouTube owes its success largely to big-budget (and, Viacom says, infringing) fare like The Simpsons and The Colbert Report rather than to clips of the often amusing interplay between cats and ferrets.

"When producing data from the Logging Database pursuant to the order," says the new agreement, "Defendants shall substitute values while preserving uniqueness for entries in the following fields: User ID, IP Address and Visitor ID." The protocol for actually making the change will be hashed out over the next week.
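The filing doesn't say how the substitution will be done, and the real protocol was still being negotiated, but the basic requirement is easy to illustrate. Below is a minimal sketch in Python, not the parties' actual method, assuming (hypothetically) that each log row is a dictionary with fields named `user_id`, `ip_address`, and `visitor_id`. Every distinct value gets a random replacement token, and the same value always maps to the same token, so someone could still count how many distinct viewers watched a clip without seeing who they were.

```python
import secrets

# Hypothetical field names; the real Logging Database schema was not made public.
FIELDS = ("user_id", "ip_address", "visitor_id")


class Pseudonymizer:
    """Maps each distinct original value to a random token, one-to-one."""

    def __init__(self):
        self._forward = {}   # original value -> token
        self._used = set()   # tokens already handed out

    def substitute(self, value):
        if value not in self._forward:
            token = secrets.token_hex(8)
            # Redraw on the (very unlikely) random collision so two different
            # originals can never share a token: uniqueness is preserved exactly.
            while token in self._used:
                token = secrets.token_hex(8)
            self._used.add(token)
            self._forward[value] = token
        # A repeated value always gets the token it received the first time.
        return self._forward[value]


def anonymize(rows, fields=FIELDS):
    # A separate mapping per field, so the same string appearing in two
    # columns receives unrelated tokens.
    maps = {f: Pseudonymizer() for f in fields}
    out = []
    for row in rows:
        new = dict(row)  # copy; the input rows are left unchanged
        for f in fields:
            if f in new:
                new[f] = maps[f].substitute(new[f])
        out.append(new)
    return out


if __name__ == "__main__":
    log = [
        {"user_id": "alice", "ip_address": "10.0.0.1", "visitor_id": "v1", "video_id": "abc"},
        {"user_id": "bob", "ip_address": "10.0.0.2", "visitor_id": "v2", "video_id": "abc"},
    ]
    for row in anonymize(log):
        print(row)
```

Because the tokens are consistent, each pseudonymous viewer's entire viewing history stays linked together. The substitution hides the identifiers, not the viewing patterns.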

As part of the deal, both sides also agree not to object later in the case on the grounds that the substituted values are still somehow "personally identifiable information." This might be a preemptive strike against any sort of "AOL argument" claiming that even substitute values could identify individuals. AOL faced the same issue when it released a set of search queries for research purposes. The users' identities had been replaced with anonymous numbers, but news organizations were still able to identify individuals just by looking through their search terms (the debacle resulted in several sackings and, eventually, a play).

On the official YouTube blog, the company said that it was "pleased to report that Viacom, MTV and other litigants have backed off their original demand for all users' viewing histories, and we will not be providing that information.... We remain committed to protecting your privacy and we'll continue to fight for your right to share and broadcast your work on YouTube."

One danger in hosting a corporate blog, though, is that people tend to leave comments; in this case, that meant more complaints about the very existence of such comprehensive logging data.

"Great!" wrote one poster. "So when are you going to let us OPT OUT of info collection? You dodged a bullet today, but what happens the next time? What happens if the government decides to start spying on us thanks to your data collecting?"