Deduplication and client-side encryption (without vendor access to the key) are not mutually exclusive. Wuala is the best example, and it works beautifully. I'm a big fan of it.

For something particular to audio libraries, I'd think that deduplication across accounts would be extremely useful to the service provider. Especially if audio containers are cracked and audio data treated separately from embedded tags/artwork, with bonus points for taking offset matching into account.

Now, if you're saying that dedupe across accounts w/ client-side encryption is doable, then I'm curious how it'd be implemented.

Especially if audio containers are cracked and audio data treated separately from embedded tags/artwork, with bonus points for taking offset matching into account.

Matching offsets that are not multiples of the audio format's frame size is probably not going to work. But most rips are going to have one of a few common offsets, corresponding to the most widely sold drive models. And then there's the fact that AccurateRip and the Drive Offset DB belong to spoon...
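To illustrate the offset idea, here is a rough sketch of checking two uncompressed rips of the same track against a short list of candidate drive offsets. Everything in it is an assumption for illustration: the function names, the 4-byte CD sample frame (16-bit stereo), and the candidate list are made up, and it says nothing about how spoon's software actually does it.

```python
import hashlib

FRAME_BYTES = 4  # assumed: one CD audio sample frame = 2 channels x 16 bit


def overlap(reference: bytes, rip: bytes, offset_samples: int):
    """Return the regions of both rips that should be identical if `rip`
    is `reference` shifted by `offset_samples` sample frames."""
    shift = offset_samples * FRAME_BYTES
    n = len(reference)
    if len(rip) != n or abs(shift) >= n:
        return None
    if shift >= 0:
        # rip starts `shift` bytes late and loses the tail of the reference
        return reference[:n - shift], rip[shift:]
    # rip starts early and loses the head of the reference
    return reference[-shift:], rip[:n + shift]


def find_sample_offset(reference: bytes, rip: bytes, candidates):
    """Try each candidate offset (hypothetical list) and return the first
    one for which the overlapping audio is byte-identical, else None."""
    for offset in candidates:
        regions = overlap(reference, rip, offset)
        if regions is not None and regions[0] == regions[1]:
            return offset
    return None


# Synthetic "audio": pseudo-random bytes, then the same data shifted by 6 samples.
reference = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(64))
shifted = bytes(6 * FRAME_BYTES) + reference[:-6 * FRAME_BYTES]
detected = find_sample_offset(reference, shifted, [0, 6, 30, 48, 102, 667])
```

Once the offset is known, the overlapping region can be hashed and deduplicated like any other blob; only the few samples at the edges differ between the two rips.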

Now, if you're saying that dedupe across accounts w/ client-side encryption is doable, then I'm curious how it'd be implemented.

Wuala implements this by using (symmetric) encryption keys derived from content hashes. If you don't know the exact content, you don't know the key. If you do know the content, you can recreate the key, but only to decrypt what you already know.

It is possible for a server to deduplicate encrypted files of unknown content when identity is preserved across the encrypted entities, as described. The only security feature sacrificed is plausible deniability (no additional plaintext-attack vectors, etc., are introduced). You cannot deny being in possession of a file if a prosecutor has an identical, decrypted copy. For pirates this might be an issue.

In this specific case, an audio-only service, not much would be gained by a decrypt-by-content-hash scheme. Since many audio files are identical across individual collections, the information about exactly which songs you own could be recovered to a fair extent. And in a music collection on an audio-only backup service, there isn't really much more to hide than exactly that information. Wuala is general purpose, and thus a different case. Your diary entries and your family pictures are unique files, and their content hashes cannot be correlated with known content. Thus they are perfectly private. Other files, like Windows installation images, are identical across thousands of users and can be stored as a single copy. To achieve plausible deniability in a case like that, it is sufficient to zip a popular file and include a random text file or directory entry. To achieve plausible deniability for your music collection, it is sufficient to have custom tags or individual codec settings (padding, etc.). For Wuala it is impossible to deduplicate them in this case. A service like spoon's could deduplicate anyway, of course, if it generates an individual track identifier on the client machine, but I wouldn't see any benefit in an additional encryption scheme in that case.

Wuala implements this by using (symmetric) encryption keys derived from content hashes. If you don't know the exact content, you don't know the key. If you do know the content, you can recreate the key, but only to decrypt what you already know.

Let me make sure I understand this process correctly:

Client #1 generates hash key for a file's content

Client #1 encrypts that file with that hash key

Client #1 then generates a hash id of the encrypted blob

Client #1 uploads encrypted blob and hash id to storage

Client #2 comes along, and generates the same hash key, blob, & hash id for the same content file
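The steps above can be sketched roughly as follows. This is a toy model, not Wuala's actual implementation: the `BlobStore` class and all function names are invented for illustration, and the cipher is a simple SHA-256 counter-mode keystream standing in for a real symmetric cipher such as AES, so it should not be used to protect real data.

```python
import hashlib


def content_key(plaintext: bytes) -> bytes:
    """Derive the symmetric key from the content itself."""
    return hashlib.sha256(plaintext).digest()


def _keystream(key: bytes, length: int) -> bytes:
    # Toy keystream (assumption): SHA-256 over key + counter, NOT a vetted cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(plaintext: bytes):
    """Return (key, blob); same plaintext always yields the same blob."""
    key = content_key(plaintext)
    stream = _keystream(key, len(plaintext))
    return key, bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key: bytes, blob: bytes) -> bytes:
    stream = _keystream(key, len(blob))
    return bytes(b ^ s for b, s in zip(blob, stream))


def blob_id(blob: bytes) -> str:
    """Hash id of the encrypted blob; this is all the server needs to see."""
    return hashlib.sha256(blob).hexdigest()


class BlobStore:
    """Hypothetical server: stores each distinct blob once, never sees keys."""

    def __init__(self):
        self.blobs = {}

    def put(self, ident: str, blob: bytes) -> bool:
        if ident in self.blobs:
            return False  # deduplicated, no upload needed
        self.blobs[ident] = blob
        return True


store = BlobStore()
track = b"identical audio data on two machines"
key1, blob1 = encrypt(track)                    # client #1
stored_first = store.put(blob_id(blob1), blob1)
key2, blob2 = encrypt(track)                    # client #2
stored_second = store.put(blob_id(blob2), blob2)
```

Both clients end up with the same key, blob, and hash id, so the second upload is skipped. Each client keeps its key privately; the server holds only the blob and its id, which is also why anyone with an identical plaintext copy can confirm that the blob is stored.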

To achieve plausible deniability for your music collection, it is sufficient to have custom tags or individual codec settings (padding, etc.). For Wuala it is impossible to deduplicate them in this case. A service like spoon's could deduplicate anyway, of course, when it generates an individual track identifier on the client machine. I wouldn't see any benefit of an additional encryption scheme in a case like this.

It's entirely possible to crack file formats and deal with each internal blob of data independently for the purposes of deduplication, making it resistant to tag changes (or at least ensuring that any further clients would only need to upload the modified tags). Wuala could do this if it wanted to, and products like Druva inSync already do. inSync isn't a cloud-based backup product, just an example of a content-aware dedupe implementation.
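As a minimal sketch of what "cracking" a container could look like, the snippet below hashes only the audio payload of an MP3 by skipping a leading ID3v2 tag and a trailing ID3v1 tag. The function names are made up for illustration, other tag types (APEv2, Lyrics3) are ignored, and this is not a description of what inSync or any other product actually does.

```python
import hashlib


def synchsafe_to_int(raw: bytes) -> int:
    """Decode the 4-byte synchsafe size field of an ID3v2 header (7 bits per byte)."""
    return ((raw[0] & 0x7F) << 21) | ((raw[1] & 0x7F) << 14) | \
           ((raw[2] & 0x7F) << 7) | (raw[3] & 0x7F)


def audio_payload(data: bytes) -> bytes:
    """Return the bytes between a leading ID3v2 tag and a trailing ID3v1 tag."""
    start, end = 0, len(data)
    if len(data) >= 10 and data[:3] == b"ID3":
        start = 10 + synchsafe_to_int(data[6:10])  # 10-byte header + tag body
        if data[5] & 0x10:                         # ID3v2.4 footer flag
            start += 10
        start = min(start, end)
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128                                 # fixed-size ID3v1 trailer
    return data[start:end]


def audio_id(data: bytes) -> str:
    """Hypothetical dedupe key: hash of the audio payload only."""
    return hashlib.sha256(audio_payload(data)).hexdigest()


# Two synthetic files: same fake audio, different tags.
audio = b"\xff\xfb" + bytes(200)
file_a = b"ID3\x03\x00\x00" + b"\x00\x00\x00\x14" + bytes(20) + audio
file_b = b"ID3\x03\x00\x00" + b"\x00\x00\x00\x05" + b"hello" + audio + b"TAG" + bytes(125)
same_audio = audio_id(file_a) == audio_id(file_b)
```

With a key like this, a retagged file only requires uploading the changed tag bytes, while a whole-file hash would treat the two files as unrelated.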

I suspect that's not the clause you intended to link, since allowing judges to issue warrants to seize property suspected of being involved in a crime is fairly common in most European countries as well.

It's entirely possible to crack file formats and deal with each internal blob of data independently for the purposes of deduplication, making it resistant to tag changes (or at least ensuring that any further clients would only need to upload the modified tags).

Yes, it is indeed possible. And I also think that spoon has programmed something like that. The point I was trying to make was: there is no greater secret about a music collection than what it lists (compare that to private videos, where the actual content is the secret). A deduplicating audio backup service with intelligent track ID mechanisms that survive encryption (while only scrambling the content) makes no sense. Track IDs and their correlation among users ARE the secret.

I don't think that Wuala would be interested in more fine-grained deduplication, because privacy is one of their biggest unique selling points. Deduplication of anonymous blobs is just a welcome side effect of how their P2P protocol distributes content to storage nodes.

Actually, it wasn't in the thread, unless I missed it. It was, however, in the fine print on the web page. Thanks. ++ audio has no limits other than a fair use policy, non-audio files subject to 500MB total locker space

Edit: Unless you mean this "Very interesting, though 500 MB limit on non-audio data seems wrong", which is not an official statement from Spoon.

A popular recent saying is that if the product is free, then most likely YOU are the product.

So what I want to know is what is in it for the AudioSafe developers/employees/investors? There cannot be much expectation of daily restore revenue. Certainly not enough to offset storage costs in the short to mid term. What is the benefit of having petabytes of digital music? Testing of algorithms, processing, compression?

So what I want to know is what is in it for the AudioSafe developers/employees/investors? There cannot be much expectation of daily restore revenue. Certainly not enough to offset storage costs in the short to mid term. What is the benefit of having petabytes of digital music? Testing of algorithms, processing, compression?

I also want to know this. From what I've read, the business model depends on users' having data failures. Given how uncommon that really seems to be these days, what is the revenue stream?

@saratoga: As far as I understand it, in the EU your data can be investigated if there's reasonable suspicion that you committed some crime. In the US your data can be investigated if you're suspected of being related to somebody for whom there is a reasonable suspicion that this person was planning to commit some crime.