The Cloud and Your Privacy

Privacy concerns for cloud computing range from hackers obtaining personal information to searches by the government. What is rarely considered is the automated scanning routinely conducted by some cloud providers. These autoscans usually examine material when it is uploaded or downloaded. The cloud providers accomplish this by comparing “unique hashes or fingerprints” of files to a database. If matches are found, content could be removed from your cloud storage as a possible copyright violation, or the police could be notified in a case of child pornography. This is great for reducing the exploitation of children, but it raises the question: What else is being scanned?
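To make the "hashes or fingerprints" idea concrete, here is a minimal sketch of how such a comparison might work. It assumes a plain SHA-256 hash and a small in-memory set standing in for the provider's database; the names `file_fingerprint`, `FLAGGED_FINGERPRINTS` and `scan_upload` are hypothetical and do not describe any particular provider's system.

```python
import hashlib


def file_fingerprint(data: bytes) -> str:
    """Return a hex SHA-256 digest that identifies this exact sequence of bytes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical database of fingerprints of known flagged files (an assumption
# for illustration; a real provider would use a much larger, external database).
FLAGGED_FINGERPRINTS = {file_fingerprint(b"known flagged content")}


def scan_upload(data: bytes) -> bool:
    """Return True if the uploaded bytes match a fingerprint in the database."""
    return file_fingerprint(data) in FLAGGED_FINGERPRINTS


print(scan_upload(b"known flagged content"))   # byte-identical copy of a flagged file
print(scan_upload(b"known flagged content."))  # one extra byte changes the hash
```

Note that a plain hash like this only matches byte-for-byte identical files, and the provider never has to "read" the file in a human sense to flag it.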

Dropbox’s privacy policy says “We may disclose to parties outside Dropbox files stored in your Dropbox and information about you that we collect when we have a good faith belief that disclosure is reasonably necessary . . . .” This suggests Dropbox is not actively scanning but instead allows access only after a good faith belief arises that disclosure is necessary for one of several reasons (to comply with the law, protect another’s safety, prevent fraud or abuse, or protect Dropbox’s property rights). Amazon and Google have similar policies. But Apple, Microsoft and Verizon Online have reserved the right to actively search stored files. So Dropbox, Amazon and Google do not appear to scan stored files proactively, while Apple, Microsoft and Verizon Online all appear to have open-ended policies allowing them to scan your files on their clouds.

Beyond automated scanning concerns, there are hacking concerns. Evernote was recently hacked; passwords may have been exposed, although user data was not breached. Through no fault of your own, your private files may end up in the hands of hackers, raising questions about liability. Yet the cloud companies’ terms of use limit their liability in these cases.

If this is a concern, there are other options like Spideroak. On Spideroak, each user is the only person with access to the data stored on the cloud, which Spideroak calls “zero-knowledge privacy.” Not even Spideroak has access, and the files are encrypted on its servers, so even if a hacker were to get your files, they would still be encrypted. This does raise another problem: if you lose your password, the data is lost forever, and Spideroak cannot recover it for you, which makes its offering both riskier and more secure than other cloud services.
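Why a lost password means lost data can be sketched with password-based key derivation. The example below is an assumption for illustration only, not Spideroak's actual scheme: it uses PBKDF2 from Python's standard library, and the function name `derive_key`, the sample passwords and the iteration count are all hypothetical choices.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte encryption key from a password on the user's own machine."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# The salt is random but not secret; it can be stored alongside the encrypted files.
salt = os.urandom(16)

key = derive_key("correct horse battery staple", salt)
same_key = derive_key("correct horse battery staple", salt)
wrong_key = derive_key("correct horse battery stapl", salt)

print(key == same_key)   # the same password and salt always reproduce the key
print(key == wrong_key)  # a password that is off by one character gives a different key
```

If the provider stores only the salt and the encrypted files, and never the password or the key, then nobody but the user can reproduce the key, which is also why the provider cannot help when the password is forgotten.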

But even so, should a cloud provider’s terms of service determine the protection your cloud-stored files receive? It seems like some legislative intervention is required. Well, Senator Patrick Leahy has introduced an amendment to the 1986 Electronic Communications Privacy Act which may protect cloud files from government search without a warrant. While this does not address issues between private parties, it could help clear up the Fourth Amendment questions. There is, however, some dispute about the effectiveness of the amendment, and additional problems arise with cloud providers who host their servers outside of the United States, where US law does not apply.

As time goes on, this problem is going to grow, because more and more devices are coming to market that rely on the cloud to operate (e.g., the Chromebook). Further, most tablets and netbooks rely on the cloud to augment their small storage capacities. As these technologies become more and more prevalent, the cloud privacy problem will intensify and require more solutions.

If the cloud’s siren call is too much to resist, I would recommend considering the level of privacy you desire and the ease of access and use you want, then choosing the cloud product that fits. Further, keep in mind the potential need to make a backup of cloud files at home, just in case a hacker erases them or takes over your account, the cloud provider goes out of business, or some other unforeseen event occurs that deletes everything stored in that cloud.

I think Senator Leahy’s amendment is definitely a step in the right direction and shouldn’t be taken lightly. By its ordinary meaning at least, the amendment attempts to remove the 180-day rule that has been used to soften warrant requirements for email and other online data, and to prohibit online servers from ‘voluntarily’ giving data to the government. My worry is what the word ‘voluntarily’ really means in relation to all of the other requirements that are imposed upon service providers to obtain ‘safe-harbor’ status under the DMCA. I suppose notice-and-takedown can operate without ‘voluntarily’ giving data to the government, but like Nick said, if a hacker gains access or a service is found to be outside of the safe-harbor, this amendment still may not protect your data.

Great post, Nick. There is another reason this kind of scanning is likely to occur: data deduplication gives cloud service providers a performance-related reason to create these hashes in the first place.

Essentially, if a thousand people upload the exact same song file to their individual accounts, the service provider can avoid making a thousand copies of that one file by storing it for the 1st user, hashing it, and then merely creating a link to it for the 2nd-1000th users (the hash allows it to know when users 2-1000 have uploaded the same file). Services that offer users the ability to control the encryption keys for their files, like Spideroak and its zero-knowledge privacy, cannot create these hashes (because they have the encrypted files, not the source files) and cannot perform data deduplication. For some service providers, however, these gains in storage efficiency will be too good to pass up, giving them every incentive to create and maintain a database of file hashes, even apart from the law enforcement use of such a tool.
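The deduplication scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `DedupStore` class, its method names and the in-memory dictionaries are hypothetical, and SHA-256 is assumed as the hash.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: one stored copy per unique file content."""

    def __init__(self):
        self.blobs = {}  # hash -> file bytes, stored once per unique content
        self.links = {}  # (user, filename) -> hash, one small entry per upload

    def upload(self, user: str, filename: str, data: bytes) -> bool:
        """Store the file; return True only if new bytes actually had to be saved."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        # Every user gets a link to the shared copy, whether or not it was new.
        self.links[(user, filename)] = digest
        return is_new

    def download(self, user: str, filename: str) -> bytes:
        """Follow the user's link back to the shared copy."""
        return self.blobs[self.links[(user, filename)]]


store = DedupStore()
song = b"the exact same song file"
for i in range(1000):
    store.upload(f"user{i}", "song.mp3", song)

print(len(store.blobs))  # number of copies actually stored
print(len(store.links))  # number of user links pointing at them
```

The `blobs` table in this sketch is exactly the kind of hash database the comment describes: once it exists for storage efficiency, checking an upload against a list of flagged hashes is a trivial extra lookup.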