Storage bits fail, so storage protection systems need the ability to recover data from other storage. There are currently two main methods of storage protection:

Replication,

RAID.

Erasure coding is the new kid on the block for storage protection. In enterprise storage it first appeared in RAID 6, and it is now poised to be the underpinning of future storage, including cloud storage.

An erasure code provides redundancy by breaking an object up into smaller fragments and storing the fragments in different places. The key is that the data can be recovered from any sufficiently large subset of those fragments.

m = the number of fragments the data is divided into

n = the number of fragments the data is recoded into, where n > m

The key property of erasure codes is that an object stored as n fragments can be reconstructed from any m of those fragments.
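To make the any-m-of-n property concrete, here is a minimal Python sketch of a Reed-Solomon-style erasure code over the prime field GF(257). It is an illustration only, not how any particular product works: production systems use GF(2^8) arithmetic and optimized matrix methods, and the names and parameters below (encode, decode, the choice of evaluation points) are ours. The data length is assumed to be a multiple of m; real implementations pad.

```python
P = 257  # smallest prime above 255, so every byte value is a field element

def _lagrange_eval(points, x):
    # Evaluate, at x, the unique polynomial through `points` (mod P).
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return total

def encode(data: bytes, m: int, n: int):
    # Recode `data` (length a multiple of m) into n fragments,
    # any m of which suffice to reconstruct it.
    fragments = [[] for _ in range(n)]
    for g in range(0, len(data), m):
        group = list(data[g:g + m])       # m data symbols define a
        pts = list(enumerate(group))      # degree-(m-1) polynomial
        for frag in range(n):
            # Fragments 0..m-1 carry the data itself (a systematic code);
            # fragments m..n-1 carry extra evaluations of the polynomial.
            fragments[frag].append(group[frag] if frag < m
                                   else _lagrange_eval(pts, frag))
    return fragments

def decode(available: dict, m: int, data_len: int) -> bytes:
    # Rebuild the original bytes from ANY m fragments;
    # `available` maps fragment index -> list of symbols.
    chosen = sorted(available.items())[:m]
    out = bytearray()
    for s in range(data_len // m):
        pts = [(idx, symbols[s]) for idx, symbols in chosen]
        out.extend(_lagrange_eval(pts, x) for x in range(m))
    return bytes(out)

m, n = 4, 8                                      # any 4 of 8 fragments suffice
data = b"erasure coding demo!"                   # 20 bytes, a multiple of m
frags = encode(data, m, n)
survivors = {i: frags[i] for i in (1, 3, 6, 7)}  # half the fragments lost
assert decode(survivors, m, len(data)) == data
```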

When recovering data, it is important to know whether any fragment is corrupted; m is the number of verified fragments required to reconstruct the original data. It is also important to identify each fragment to ensure immutability. A secure hashing scheme is required to both verify and identify data fragments.
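One common way to meet both requirements is content addressing: each fragment is named by a cryptographic hash of its contents, so the same value both identifies the fragment and verifies its integrity. A minimal sketch, assuming SHA-256 (the post does not mandate a particular hash):

```python
import hashlib

def fragment_id(fragment: bytes) -> str:
    # The digest is the fragment's identity (content address) and its
    # integrity check: any corruption changes the hash.
    return hashlib.sha256(fragment).hexdigest()

def is_verified(fragment: bytes, expected_id: str) -> bool:
    # A fragment counts toward the m required for reconstruction
    # only if its hash matches the one recorded at write time.
    return fragment_id(fragment) == expected_id
```

At write time the system records the n fragment hashes with the object's metadata; at read time, any m fragments that pass verification are enough to reconstruct the data.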

It can be shown mathematically that the greater the number of fragments, the greater the availability of the system for the same storage cost. However, the more fragments a block of data is broken into, the more compute power is required.

For example, if two copies (a rate of r = m/n = 1/2) provide 99% availability, then 32 fragments at the same rate (m = 16, n = 32), and therefore the same amount of storage, provide 99.9999998% availability. The math can be found in a paper by Hakim Weatherspoon and John D. Kubiatowicz.
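The arithmetic behind this example can be sketched directly. Assuming fragments fail independently with the same per-fragment availability p, overall availability is the binomial probability that at least m of the n fragments survive; p = 0.9 is the value at which two full copies give 99%:

```python
from math import comb

def availability(p: float, m: int, n: int) -> float:
    # P(at least m of the n stored fragments are available), assuming
    # independent failures and per-fragment availability p.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

p = 0.9
print(availability(p, 1, 2))    # two copies (m=1, n=2): 0.99
print(availability(p, 16, 32))  # 16 of 32: ~0.9999999987, consistent with
                                # the figure quoted above
```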

Figure 1 below shows the topology of a storage system using erasure coding and RAIN, a Redundant Array of Inexpensive Nodes.

Wikibon believes that storage using erasure coding with a large number of fragments will be particularly important for cloud storage, but that it will also see use within the data center. Archiving will be an early adopter of these techniques.

Action Item: All storage professionals will need to be familiar with erasure coding and the trade-offs for data center and cloud storage.

Footnotes: The ideas in this post are used in two other posts on erasure coding:

The data is available when any m identified and verified fragments of the n are available. If a fragment is unavailable because of hardware issues, it is reconstructed as a background task. The amount of processing required for reading and writing data goes up as n increases, but the elapsed time is small compared with normal IO latency. Because the fragments are small and distributed, all the resources can work in parallel to read, write, and recover; recovery time does not increase with the number of fragments.
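As an illustration of that background repair, reusing the hypothetical encode/decode sketch from earlier in the post: rebuilding one lost fragment means reading any m surviving fragments (one small read per node, in parallel), decoding, and regenerating only the missing piece.

```python
def repair(available: dict, lost_index: int, m: int, n: int,
           data_len: int):
    # Background task: rebuild a single lost fragment from any m
    # survivors, using the encode/decode sketch above.
    data = decode(available, m, data_len)   # m small reads, in parallel
    return encode(data, m, n)[lost_index]   # regenerate only the lost piece
```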

The metadata describes where the data is stored, when it was stored, and so on; it connects the application with the data. It is a much smaller amount of data, but it is critical to managing both availability and performance.

With erasure coding, the original data is recoded using Reed-Solomon polynomial codes into n fragments. The data can survive the known erasure of up to n - m fragments; hence the term erasure coding. Erasure codes are used in RAID 6, in RAM bit error correction, in Blu-ray discs, in DSL transmission, and in many other places.