A Data Placement Strategy in Scientific Cloud Workflows

Executive Summary

In scientific cloud workflows, large amounts of application data need to be stored in distributed data centers. To store these data effectively, a data manager must intelligently select the data centers in which they will reside. This choice is not available, however, for data that must remain at a fixed location. When one task needs several datasets located in different data centers, moving large volumes of data between those centers becomes a challenge. In this paper, the authors propose a matrix-based k-means clustering strategy for data placement in scientific cloud workflows.
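The idea behind a matrix-based clustering strategy can be illustrated with a small sketch. It is not the authors' exact algorithm. It assumes a hypothetical dependency matrix in which entry M[i][j] counts the tasks that need both dataset i and dataset j. It then runs plain Lloyd's k-means over the rows of that matrix so that datasets used together land in the same cluster, which stands for one data center. The function names, the farthest-point seeding, and the example workflow are all illustrative assumptions, and fixed-location datasets are ignored here.

```python
from itertools import combinations


def dependency_matrix(tasks, n):
    """Hypothetical definition: M[i][j] counts tasks needing both dataset i and j;
    the diagonal M[i][i] counts tasks needing dataset i at all."""
    m = [[0] * n for _ in range(n)]
    for needed in tasks:
        for i in needed:
            m[i][i] += 1
        for i, j in combinations(sorted(needed), 2):
            m[i][j] += 1
            m[j][i] += 1
    return m


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point seeding.
    Returns one cluster label per point (here: one data center per dataset)."""
    # Seeding (an assumption): start from the first point, then repeatedly add
    # the point farthest from its nearest existing centroid.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each dataset joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members;
        # keep the old centroid if a cluster becomes empty.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return labels


# Hypothetical workflow: each task lists the dataset indices it reads.
tasks = [[0, 1], [0, 1], [1, 2], [2, 3], [2, 3]]
labels = kmeans(dependency_matrix(tasks, 4), k=2)
print(labels)  # [0, 0, 1, 1]
```

In this toy workflow, datasets 0 and 1 are placed in one data center and datasets 2 and 3 in the other, so only the single task reading datasets 1 and 2 needs data from both centers.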