Data management in the cloud - current issues and research directions

Patrick Valduriez
INRIA and LIRMM, Montpellier

Cloud computing involves hosted services over the internet (the cloud) with easy access to scalable, virtualized resources. Cloud providers such as Amazon, Google or eBay offer vast amounts of computing, storage and networking capacities. Through very simple interfaces and at small incremental cost, users can outsource complex tasks, such as data storage, system administration, or failure detection and recovery. Clouds promise to bring existing distributed computing concepts to a very large scale by relying on clusters, virtualization, highly automated management, and extensive failure detection and recovery.

One of the main challenges of cloud computing is to provide ease of programming, consistency, scalability and elasticity over cloud data, all at the same time. Distributed database systems have been successful over the internet, e.g. using mediators/wrappers or data grids, but with many databases virtualized in a cloud, it becomes even more important to take the basic principles of distributed databases into account. Current cloud storage services have simply sacrificed consistency and ease of programming for the sake of scalability. The result is a pervasive approach that relies on data partitioning and forces applications to access data partitions individually, at the cost of transactional guarantees across partitions: neither isolation nor failure atomicity is provided for operations that span multiple data partitions. Application developers in the cloud therefore face a very difficult problem: providing isolation and atomicity across data partitions themselves, through message-passing programming.
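To make the partitioning problem concrete, here is a minimal Python sketch under simplifying assumptions. The `Partition` class, `naive_transfer` and `two_phase_commit` are hypothetical names, not any cloud provider's API, and the store is assumed to guarantee atomicity only for a write to a single partition. The first part shows how an update spanning two partitions is left half-applied when a crash occurs between the two writes; the second shows one classic remedy, a simplified two-phase commit (2PC) driven by messages, which is the kind of protocol developers end up writing themselves. It is an illustration, not necessarily the approach the talk proposes.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing so partitions can be dict keys
class Partition:
    """Hypothetical data partition: only single-partition writes are atomic."""
    name: str
    data: dict = field(default_factory=dict)
    staged: dict = field(default_factory=dict)  # txn_id -> writes held until decision
    fail_on_prepare: bool = False               # simulates a participant that must abort

    def put(self, key, value):
        # The only guarantee the store gives: one write to one partition.
        self.data[key] = value

    def handle(self, msg):
        """Participant side of a simplified two-phase commit, driven by messages."""
        kind, txn_id, payload = msg
        if kind == "PREPARE":
            if self.fail_on_prepare:
                return ("VOTE_ABORT", txn_id)
            self.staged[txn_id] = payload  # stage the writes without applying them
            return ("VOTE_COMMIT", txn_id)
        if kind == "COMMIT":
            self.data.update(self.staged.pop(txn_id, {}))
            return ("ACK", txn_id)
        if kind == "ABORT":
            self.staged.pop(txn_id, None)  # discard anything staged for this txn
            return ("ACK", txn_id)
        raise ValueError(f"unknown message type: {kind}")


def naive_transfer(src, dst, key_src, key_dst, amount, crash_between=False):
    """Cross-partition update as two independent writes: no failure atomicity."""
    src.put(key_src, src.data[key_src] - amount)
    if crash_between:
        raise RuntimeError("simulated crash after the first partition write")
    dst.put(key_dst, dst.data[key_dst] + amount)


def two_phase_commit(txn_id, writes):
    """Coordinator: writes maps each Partition to the {key: value} it must apply."""
    # Phase 1: ask every participant to prepare and collect the votes.
    votes = [p.handle(("PREPARE", txn_id, w)) for p, w in writes.items()]
    decision = "COMMIT" if all(v[0] == "VOTE_COMMIT" for v in votes) else "ABORT"
    # Phase 2: broadcast the single global decision to every participant.
    for p in writes:
        p.handle((decision, txn_id, None))
    return decision


if __name__ == "__main__":
    a, b = Partition("p1", {"alice": 100}), Partition("p2", {"bob": 0})
    try:
        naive_transfer(a, b, "alice", "bob", 30, crash_between=True)
    except RuntimeError:
        pass
    print(a.data, b.data)  # {'alice': 70} {'bob': 0}: 30 units lost

    a, b = Partition("p1", {"alice": 100}), Partition("p2", {"bob": 0}, fail_on_prepare=True)
    print(two_phase_commit("t1", {a: {"alice": 70}, b: {"bob": 30}}), a.data, b.data)
    # ABORT {'alice': 100} {'bob': 0}: neither partition changed
```

This sketch addresses failure atomicity only: it takes no locks, so it provides no isolation between concurrent transactions, and it ignores coordinator failures and the durable logging that a real 2PC implementation requires.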

In this talk, I will review the current solutions for data management in the cloud, identify current issues and propose research directions, which will shape our future research project.