Methods to Handle Data Durability Challenges for Big Data

The growth in unstructured data is pushing the limits of data center scalability at the same time that disk drive vendors are pushing the limits of data density at tolerable device-level bit error rates (BER). For organizations delivering cloud-hosted services involving images, videos, MP3 files, social media, and other applications, data reliability is a primary concern. The traditional RAID (Redundant Array of Inexpensive Disks) approach in wide use today will not provide the levels of data durability and performance required by enterprises dealing with this escalating volume of data. New approaches that go beyond traditional RAID promise to improve rebuild times on high-density disk drives and to reduce susceptibility to corruption induced by disk errors. Without them, simply scaling up traditional RAID using current algorithms would result in a crisis.

In this paper, we discuss why RAID doesn’t scale for Big Data, why erasure coding is a better option, and how various erasure coding alternatives compare.

We will use the long-standing mean-time-to-data-loss (MTTDL) model to compute the risk of data loss over time and show how Amplidata’s computationally intensive BitSpread* algorithm, deployed on Intel® Xeon® processor-based platforms, delivers high levels of storage durability with a significant reduction in raw disk capacity overhead. BitSpread is Amplidata’s rateless erasure coding software, delivered commercially in the AmpliStor Optimized Object Storage system, a petabyte-scale storage system purpose-built for storing massive amounts of unstructured data.
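To make the MTTDL idea concrete, the sketch below uses the textbook Markov-model approximation for a group of n independent disks that tolerates f concurrent failures: MTTDL ≈ MTTF^(f+1) / (n(n-1)...(n-f) × MTTR^f), which holds when repair time is much shorter than disk lifetime. This is not necessarily the exact model used in the paper, and it ignores bit errors encountered during rebuild. The function names and all numeric inputs (disk MTTF, rebuild times, group sizes) are illustrative assumptions, not figures from the paper or from any Amplidata product.

```python
import math

HOURS_PER_YEAR = 24 * 365


def mttdl_hours(n_disks, tolerated_failures, mttf_hours, mttr_hours):
    """Approximate mean time to data loss for one redundancy group.

    Textbook approximation, assuming independent disk failures and
    MTTR << MTTF:
        MTTDL ~= MTTF^(f+1) / (n * (n-1) * ... * (n-f) * MTTR^f)
    f = 1 corresponds to single parity (RAID 5 style),
    f = 2 to double parity (RAID 6 style).
    """
    f = tolerated_failures
    if not 0 <= f < n_disks:
        raise ValueError("tolerated_failures must be in [0, n_disks)")
    denominator = mttr_hours ** f
    for i in range(f + 1):
        denominator *= n_disks - i
    return mttf_hours ** (f + 1) / denominator


def annual_loss_probability(mttdl):
    """Probability of at least one data-loss event within a year,
    assuming loss events arrive at a constant rate of 1 / MTTDL."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mttdl)


def raw_capacity_overhead(n_disks, tolerated_failures):
    """Raw capacity consumed per unit of usable capacity."""
    return n_disks / (n_disks - tolerated_failures)


if __name__ == "__main__":
    mttf = 1_000_000  # hypothetical disk MTTF in hours
    # (label, disks in group, failures tolerated, rebuild time in hours)
    # All values below are assumptions chosen only for illustration.
    schemes = [
        ("single parity, 8 disks", 8, 1, 24),
        ("double parity, 8 disks", 8, 2, 24),
        ("erasure code, 20 disks, 4 tolerated", 20, 4, 24),
    ]
    for label, n, f, mttr in schemes:
        t = mttdl_hours(n, f, mttf, mttr)
        print(f"{label}: MTTDL = {t / HOURS_PER_YEAR:.3e} years, "
              f"annual loss probability = {annual_loss_probability(t):.3e}, "
              f"raw overhead = {raw_capacity_overhead(n, f):.2f}x")
```

Under this approximation MTTDL scales with 1 / MTTR^f, so longer rebuilds on higher-density drives shorten it directly, while each additional tolerated failure multiplies it by roughly MTTF / (n × MTTR).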
