Details

Provided storage allows data stored outside HDFS to be mapped to and addressed from HDFS. It builds on heterogeneous storage by introducing a new storage type, PROVIDED, to the set of media in a DataNode. Clients accessing data in PROVIDED storages can cache replicas in local media, enforce HDFS invariants (e.g., security, quotas), and address more data than the cluster could persist in the storage attached to DataNodes.

Description

In addition to heterogeneous media, many applications work with heterogeneous storage systems. The guarantees and semantics provided by these systems are often similar, but not identical, to those of HDFS. Any client accessing multiple storage systems is responsible for reasoning about each system independently, and must propagate and renew credentials for each store.

Remote stores could be mounted under HDFS. Block locations could be mapped to immutable file regions, opaque IDs, or other tokens that represent a consistent view of the data. While correctness for arbitrary operations requires careful coordination between stores, in practice we can provide workable semantics with weaker guarantees.
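
To make the mapping from block locations to immutable file regions concrete, here is a minimal Java sketch. It is illustrative only and is not the HDFS implementation: the class names ProvidedBlockMap and FileRegion, the versionToken field (standing in for an opaque ID such as an object-store version tag), and the fixed-size splitting policy are all assumptions made for this example.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: maps block IDs to immutable regions of remote files. */
public class ProvidedBlockMap {

    /** An immutable byte range of a remote file plus an opaque version token (assumed). */
    public static final class FileRegion {
        private final URI remotePath;
        private final long offset;
        private final long length;
        private final String versionToken;

        public FileRegion(URI remotePath, long offset, long length, String versionToken) {
            if (offset < 0 || length < 0) {
                throw new IllegalArgumentException("offset and length must be non-negative");
            }
            this.remotePath = remotePath;
            this.offset = offset;
            this.length = length;
            this.versionToken = versionToken;
        }

        public URI getRemotePath() { return remotePath; }
        public long getOffset() { return offset; }
        public long getLength() { return length; }
        public String getVersionToken() { return versionToken; }
    }

    private final Map<Long, FileRegion> regions = new HashMap<>();

    /** Registers a region for a block ID; an existing mapping is never overwritten. */
    public void put(long blockId, FileRegion region) {
        if (regions.putIfAbsent(blockId, region) != null) {
            throw new IllegalStateException("block " + blockId + " is already mapped");
        }
    }

    /** Looks up the remote region backing a block, if one is registered. */
    public Optional<FileRegion> resolve(long blockId) {
        return Optional.ofNullable(regions.get(blockId));
    }

    /**
     * Splits a remote file into regions of at most blockSize bytes and assigns
     * consecutive block IDs starting at firstBlockId. Returns the next unused ID.
     */
    public long addFile(URI remotePath, long fileLength, long blockSize,
                        String versionToken, long firstBlockId) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        long blockId = firstBlockId;
        for (long off = 0; off < fileLength; off += blockSize) {
            long len = Math.min(blockSize, fileLength - off);
            put(blockId++, new FileRegion(remotePath, off, len, versionToken));
        }
        return blockId;
    }
}
```

In this sketch, a reader that resolves a block would fetch the given byte range from the remote store and could compare the stored token against the store's current one to detect that the remote file has changed. That is one way to obtain the weaker but workable guarantee described above, without coordinating every operation between the two stores.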