One of the biggest concerns still haunting enterprises is giving up control over their data to third-party services. They worry that they have no control over the data once it leaves their hands, and they have yet to find a foolproof way to protect it (other than encryption, which brings considerable inconvenience and additional expense). This is not limited to enterprises. Consumers are slowly starting to worry about such Data Blackholes too, as Ars Technica discovered with Facebook photos. I define Data Blackholes as follows:

Data Blackhole: Any cloud service (for that matter, any web service) where users cannot delete their own data

Data portability, the ability to take your own data out of any service in an open format, is also a critical issue and pertains to ownership of your own data. It is a fundamental right, and some organizations are slowly starting to play by these rules. Google’s Data Liberation project and Facebook’s option to download your data are early signs that vendors are recognizing users’ data portability rights. However, Data Ownership goes beyond just Data Portability. As I highlighted in the whitepaper on the questions one should ask a cloud vendor before signing up, we define Data Ownership as follows:

The data ownership questions

Your data, regardless of who hosts the data, should be your data. The vendor’s terms and conditions should clearly indicate that you own both the data and the metadata you create. The following set of questions is designed to get as much information from the vendor as possible regarding data ownership.

What are your terms when it comes to ownership of data? How about any metadata I generate while using the application?

How easy is it to export data from your service when moving to a new service? Do you offer an option to export the data in one of the open data formats like XML or JSON? Are there any extra charges for exporting the data?

Do you delete my data completely if I delete it from the application?

What happens to my data if I discontinue your service? Do you delete it immediately? Can I retain access to a read-only copy for a fee?
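The questions above can be recorded as a simple checklist when comparing vendors. The following is only a minimal sketch: the `VendorAnswers` fields, the helper names and the `ExampleVendor` entry are all hypothetical, and the answers are made up for illustration rather than taken from any real service. It also assumes that a deletion which leaves copies behind counts as "cannot delete" under the Data Blackhole definition.

```python
from dataclasses import dataclass


@dataclass
class VendorAnswers:
    """Hypothetical record of a vendor's answers to the data ownership questions."""
    name: str
    customer_owns_data: bool
    customer_owns_metadata: bool
    open_format_export: bool            # e.g. XML or JSON
    export_is_free: bool
    user_can_delete: bool               # can users delete their own data at all?
    deletion_is_complete: bool          # removed everywhere, including backups
    data_removed_on_cancellation: bool


def is_data_blackhole(v: VendorAnswers) -> bool:
    # Assumption: data that cannot be deleted, or that survives a delete,
    # both count as "users cannot delete their own data".
    return not (v.user_can_delete and v.deletion_is_complete)


def ownership_gaps(v: VendorAnswers) -> list[str]:
    """Return the ownership conditions this vendor fails to meet."""
    checks = [
        ("customer owns the data", v.customer_owns_data),
        ("customer owns the metadata", v.customer_owns_metadata),
        ("export in an open format", v.open_format_export),
        ("export is free of charge", v.export_is_free),
        ("users can delete their data", v.user_can_delete),
        ("deletion is complete, including backups", v.deletion_is_complete),
        ("data is removed on cancellation", v.data_removed_on_cancellation),
    ]
    return [label for label, ok in checks if not ok]


# Made-up answers for an imaginary vendor, for illustration only.
example = VendorAnswers(
    name="ExampleVendor",
    customer_owns_data=True,
    customer_owns_metadata=True,
    open_format_export=True,
    export_is_free=False,
    user_can_delete=True,
    deletion_is_complete=False,
    data_removed_on_cancellation=True,
)

print(example.name, "is a Data Blackhole:", is_data_blackhole(example))
for gap in ownership_gaps(example):
    print(" - missing:", gap)
```

Keeping the answers as plain booleans makes it easy to compare several vendors side by side, although real terms of service will usually need more nuance than a yes or no.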

Another key aspect of data ownership is the ability to delete data and see it deleted from the service in its entirety (including all backups). This is critical for privacy and, in some cases, for compliance reasons. Recently, I was playing around with Desk.com, a service Salesforce.com built on its Assist.ly acquisition. It is a pretty good helpdesk, priced very aggressively for both small businesses and large organizations. While playing around with it, I realized that I cannot delete the emails, Facebook messages and Twitter messages (including personal DMs) I had passed into the service to set up a seamless helpdesk. When I dug around the forums, I learned that these messages cannot be deleted from the service. Instead, they offer an “Erase” function to remove personal information. When I use the function, it removes the message from the case, but the contents still show up in the preview in the list of cases. For example, here is what I see on two screens after erasing a tweet I imported.

This is the case view

But in the case list view, I see this from the erased tweet

Clearly, the data I deleted is not actually getting deleted. As they mentioned in their post, it is not completely removed from their service. In my case, I am testing with my personal accounts and there are no compliance issues. Imagine an enterprise using a service like this and facing the same situation. It could turn into a compliance headache. Even otherwise, it undermines data ownership. The service takes ownership of the data away from the user, which is a big no-no in my book.

Wait, before we pick on Desk.com and Salesforce, I want to highlight that this is not just the case with them. There are many services that treat user data irresponsibly and, more importantly, there are many users who give out their data irresponsibly. The responsibility lies on both sides. Vendors should be responsible enough to let users keep ownership of their data, and users should check the vendor’s terms of service before they put their data in (a point I have emphasized in the whitepaper and in many blog posts at CloudAve). I think I should work on a report that evaluates some of the popular apps against this condition.

This once again brings into focus the user bill of rights and the importance of education on the part of analysts. Many pundits, including Ray Wang, James Urquhart and myself, have been arguing in favor of a user bill of rights for a long time, but there is not much momentum among users and vendors. I think it is high time the industry (vendors, buyers and pundits) came together and agreed on an enforceable user bill of rights. I am glad that US President Obama has called for data privacy today, and I hope the issue of Data Ownership also gains considerable traction and attracts the attention of government leaders around the world. If we don’t fix the problem ourselves, regulations will eventually descend on us. Do vendors and users really want it to go to that level? Data Blackholes could well end up being a bigger problem in cloud computing. Wake up and solve the problem NOW.

Trivia: BTW, you can’t delete either your Desk.com account or your Salesforce account. How many of you knew that?

Director, OpenShift Strategy at Red Hat. Founder of Rishidot Research, a research community focused on the services world. His focus is on Platform Services, Infrastructure and the role of Open Source in the services era. Krish has been writing at CloudAve since its inception and has also been part of the GigaOm Pro Analyst Group. The opinions expressed here are his own and are neither representative of his employer, Red Hat, nor CloudAve, nor its sponsors.

2 responses to “Data Blackholes – Wake Up Before It Is Too Late”

While not a new problem, it’s one that is growing in concern as information becomes more visible and more valuable or vital to the application’s usage. Welcome to big data = data monetization.

Remember the days of database marketing? You got a catalog in the mail from store X and asked to be taken off their list. They complied, and yes, you were taken off the list, but your record was never really deleted. Also, if store X had sold the data to partners, you were then getting catalogs from them as well. Was X responsible for telling its partners to remove you from their lists?

Many database marketers had SLAs or bills of rights covering what information would be used, what was visible to the marketer when the information was sent back to them, and how the information was deleted or shipped back to the organization when the contract was not renewed. Much of this was a black box, and many organizations lost control of their data and, in some cases, were locked in because of who owned which piece of the customer record. But one thing was certain: a customer record was never deleted just because the consumer opted out or ended their relationship with the vendor. They were simply no longer marketed to.