
Error Correction with Reed-Solomon

By José R.C. Cruz, June 25, 2013

Reed-Solomon might well be the most ubiquitously implemented algorithm: Barcodes use it; every CD, DVD, RAID6, and digital tape device uses it; so do digital TV and DSL. Even in deep space, Reed-Solomon toils away. Here's how it works its magic.

Listing Five shows the entry method RSEncode() of the class ReedSolomon. It starts by calling _rsGenPoly() with the argument errSize and stores the resulting generator polynomial in polyGen (line 10). It then sizes the local outBuffer to the length of argMesg plus errSize and initializes every element to 0 (lines 13-14). Finally, it copies each message byte into outBuffer (lines 17-19).
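The generator polynomial that _rsGenPoly() produces can be sketched in Python as follows. This is a minimal sketch, not the code in Listing Five: the field polynomial 0x11d, the root convention g(x) = (x - a^0)(x - a^1)...(x - a^(errSize-1)), and the names rs_generator_poly(), gf_mul(), gf_poly_mul(), and gf_poly_eval() are all assumptions. Coefficients are stored highest degree first, and because GF(2^8) has characteristic 2, subtraction is the same as exclusive-or.

```python
# GF(2^8) antilog (GF_EXP) and log (GF_LOG) tables.
# Assumption: field polynomial 0x11d with generator 2, a common choice.
GF_EXP = [0] * 512
GF_LOG = [0] * 256
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    GF_EXP[_i] = GF_EXP[_i - 255]  # doubled so log sums need no modulo


def gf_mul(a, b):
    """Multiply two field elements by adding their logarithms."""
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def gf_poly_mul(p, q):
    """Multiply two polynomials (coefficients listed highest degree first)."""
    r = [0] * (len(p) + len(q) - 1)
    for j, qj in enumerate(q):
        for i, pi in enumerate(p):
            r[i + j] ^= gf_mul(pi, qj)  # addition in GF(2^8) is XOR
    return r


def gf_poly_eval(p, x):
    """Evaluate polynomial p at x with Horner's rule."""
    y = p[0]
    for coef in p[1:]:
        y = gf_mul(y, x) ^ coef
    return y


def rs_generator_poly(err_size):
    """Build g(x) = (x - a^0)(x - a^1)...(x - a^(err_size-1))."""
    g = [1]
    for i in range(err_size):
        g = gf_poly_mul(g, [1, GF_EXP[i]])  # (x - a^i) equals (x + a^i) here
    return g


print(rs_generator_poly(2))  # [1, 3, 2], i.e. x^2 + 3x + 2
```

Every root a^i of g(x) is also a root of every valid codeword, which is what the decoder later relies on when it computes syndromes.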

Next, RSEncode() begins encoding. It reads each message byte in outBuffer and checks its value (lines 23-24). For a non-zero byte, RSEncode() multiplies the terms of the generator polynomial by that byte using __gfMult() (line 26). It then folds each product into the corresponding element of outBuffer with an exclusive-or (line 27).

Once all the message bytes are encoded, RSEncode() copies the message bytes back into outBuffer (lines 30-32) and returns outBuffer as the encoded result.
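The encoding loop just described is polynomial long division in GF(2^8): the error symbols are the remainder of the message (shifted left by errSize places) divided by the generator polynomial. The following Python sketch mirrors that outline under stated assumptions (field polynomial 0x11d, generator roots a^0 through a^(errSize-1)); rs_encode() and its helpers are hypothetical names, not the code in Listing Five.

```python
# GF(2^8) tables; assumption: field polynomial 0x11d, generator 2.
GF_EXP = [0] * 512
GF_LOG = [0] * 256
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    GF_EXP[_i] = GF_EXP[_i - 255]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def gf_poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for j, qj in enumerate(q):
        for i, pi in enumerate(p):
            r[i + j] ^= gf_mul(pi, qj)
    return r


def gf_poly_eval(p, x):
    y = p[0]
    for coef in p[1:]:
        y = gf_mul(y, x) ^ coef
    return y


def rs_generator_poly(err_size):
    g = [1]
    for i in range(err_size):
        g = gf_poly_mul(g, [1, GF_EXP[i]])
    return g


def rs_encode(msg, err_size):
    """Append err_size Reed-Solomon error symbols to the bytes in msg."""
    gen = rs_generator_poly(err_size)
    out = list(msg) + [0] * err_size   # message, then room for the symbols
    for i in range(len(msg)):
        coef = out[i]
        if coef != 0:                  # zero bytes leave the buffer unchanged
            for j in range(len(gen)):  # gen[0] == 1, so out[i] becomes 0
                out[i + j] ^= gf_mul(gen[j], coef)
    out[:len(msg)] = list(msg)         # copy the message bytes back in
    return out


codeword = rs_encode(b"Lorem", 5)
print(codeword[:5])  # [76, 111, 114, 101, 109]: the message comes first
print(all(gf_poly_eval(codeword, GF_EXP[i]) == 0 for i in range(5)))  # True
```

The last line is a handy self-check for any encoder of this kind: a valid codeword is a multiple of the generator polynomial, so it evaluates to zero at every root a^i.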

Figure 4 illustrates the encoding process. Here, the sample block is the word "Lorem", the text encoding is 8-bit ASCII, and the number of error symbols is 5.

Figure 4.

The top of the figure shows the sample block; the bottom shows the encoded block. Before encoding, the message bytes populate the first half of the list object outBuffer. Each encoding pass then clears one message byte and updates the rest of outBuffer. Note that the update moves from left to right. When encoding ends, none of the original message bytes remain, and the error symbols occupy the right half of outBuffer.
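To reproduce the passes in Figure 4, the following Python sketch records outBuffer after every pass over "Lorem" with 5 error symbols. It assumes the field polynomial 0x11d and generator roots a^0 through a^4, so its error-symbol values may not match the figure exactly; rs_encode_trace() and the helpers are hypothetical names. The first snapshot is the unencoded layout, [76, 111, 114, 101, 109, 0, 0, 0, 0, 0], and each later snapshot has one more zero on the left.

```python
# GF(2^8) tables; assumption: field polynomial 0x11d, generator 2.
GF_EXP = [0] * 512
GF_LOG = [0] * 256
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    GF_EXP[_i] = GF_EXP[_i - 255]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def gf_poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for j, qj in enumerate(q):
        for i, pi in enumerate(p):
            r[i + j] ^= gf_mul(pi, qj)
    return r


def rs_generator_poly(err_size):
    g = [1]
    for i in range(err_size):
        g = gf_poly_mul(g, [1, GF_EXP[i]])
    return g


def rs_encode_trace(msg, err_size):
    """Return outBuffer before encoding and after each encoding pass."""
    gen = rs_generator_poly(err_size)
    out = list(msg) + [0] * err_size
    snapshots = [list(out)]
    for i in range(len(msg)):
        coef = out[i]
        if coef != 0:
            for j in range(len(gen)):  # j == 0 clears the current byte
                out[i + j] ^= gf_mul(gen[j], coef)
        snapshots.append(list(out))    # updates only touch index i and later
    return snapshots


for step, snapshot in enumerate(rs_encode_trace(b"Lorem", 5)):
    print(step, snapshot)
```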

Decoding with Reed-Solomon

To decode a message block, you must again supply the number of error symbols, and it must be the same number used for encoding. Otherwise, Reed-Solomon will fail to decode the block or to correct any erasures or errors in it.

To Reed-Solomon, erasures and errors are two different things (Figure 5). An erasure is a missing byte: its location is known, and its value is usually marked as -1 (or 0xff for 8-bit ASCII). An error is an incorrect byte: its location is not known, so Reed-Solomon has to take extra steps to locate the affected byte(s).

Figure 5.

There is a limit to the number of erasures and errors that Reed-Solomon can correct, and that limit is tied to the number of error symbols appended to the message. As a rule, Reed-Solomon can correct up to errSize erasures, but only errSize/2 errors. It can also correct a combination of erasures and errors, provided that the number of erasures plus twice the number of errors does not exceed errSize.
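That rule fits in a one-line check. This Python sketch uses the standard bound, erasures + 2 * errors <= errSize; the function name rs_can_correct() is hypothetical. With an odd errSize such as 5, the error-only limit rounds down to 2.

```python
def rs_can_correct(num_erasures, num_errors, err_size):
    """Return True if err_size error symbols suffice for this damage.

    An erasure has a known position, so only its value is unknown and it
    costs one error symbol. An error has an unknown position and value,
    so it costs two.
    """
    return num_erasures + 2 * num_errors <= err_size


# errSize = 5, as in the "Lorem" example
for erasures, errors in [(5, 0), (0, 2), (0, 3), (1, 2), (2, 2)]:
    print(erasures, errors, rs_can_correct(erasures, errors, 5))
# 5 0 True
# 0 2 True
# 0 3 False
# 1 2 True
# 2 2 False
```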

The decoding process consists of five distinct steps (Figure 6). The first step is to locate and count the erasures in the message block. The second is to create the syndrome polynomial, which reveals any errors in the block. If all its coefficients are zero, the block is free of errors and decoding ends.

The third step is to calculate the error locator polynomial, which maps out the location of each incorrect byte. The fourth is to calculate the error evaluator polynomial. The last step, of course, is to correct the encoded block.

Figure 6.

Listing Six shows how the ReedSolomon class prepares the syndrome polynomial. The private method _rsSyndPoly() takes two arguments: the encoded block (argCode) and the number of error symbols (errSize). It resizes the local polyValu to errSize and initializes its elements to zero (line 10). It then calculates each polynomial term using __GFEXP and _gfPolyEval() (lines 14-15), stores each term in polyValu, and returns polyValu once done (line 18).
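In essence, the syndrome computation evaluates the received block, read as a polynomial c(x), at each root of the generator polynomial: S_i = c(a^i). This Python sketch assumes GF(2^8) with field polynomial 0x11d, roots a^0 through a^(errSize-1), and byte k of an n-byte block as the coefficient of x^(n-1-k); rs_syndromes() and rs_block_is_clean() are hypothetical stand-ins for _rsSyndPoly() and the zero check in RSDecode().

```python
# GF(2^8) tables; assumption: field polynomial 0x11d, generator 2.
GF_EXP = [0] * 512
GF_LOG = [0] * 256
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    GF_EXP[_i] = GF_EXP[_i - 255]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def gf_poly_eval(p, x):
    """Horner's rule; p is listed highest degree first."""
    y = p[0]
    for coef in p[1:]:
        y = gf_mul(y, x) ^ coef
    return y


def rs_syndromes(code, err_size):
    """Return S_i = c(a^i) for i = 0 .. err_size-1."""
    return [gf_poly_eval(code, GF_EXP[i]) for i in range(err_size)]


def rs_block_is_clean(synd):
    """All-zero syndromes mean there is nothing to correct."""
    return not any(synd)


clean = [0] * 10    # the all-zero block is always a valid codeword
damaged = list(clean)
damaged[3] = 0x42   # corrupt one byte
print(rs_block_is_clean(rs_syndromes(clean, 5)))    # True
print(rs_block_is_clean(rs_syndromes(damaged, 5)))  # False
```

For a single bad byte with value e at degree d, each syndrome is simply e * a^(i*d), which is the pattern the later decoding steps exploit.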

Listing Seven shows how ReedSolomon calculates its error-locator polynomial. This method, _rsForney(), takes three arguments: the syndrome polynomial (polySynd), the list of erasures (eraseLoci), and the number of error symbols. It is based on the Forney algorithm: it folds each known erasure position into the syndrome terms, so that the erasures are accounted for before the search for unknown errors begins.

The method starts by copying polySynd into its local polyValu (line 11). In its outer loop, _rsForney() parses each erasure and uses its position as an index into __GFEXP (lines 15-16). In its inner loop, it multiplies each term of polyValu by that __GFEXP value using __gfMult() (lines 17-19) and uses the resulting product to update polyValu (line 20). After parsing the erasures, _rsForney() removes the last polynomial term with a call to pop() (line 21) and returns polyValu as its result (line 24).
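The loop structure described for _rsForney() can be sketched in Python as follows. This is a hypothetical rs_forney_syndromes(), written under stated assumptions (field polynomial 0x11d; byte k of an n-byte block treated as the coefficient of x^(n-1-k)), and it pops the last term once per erasure. Each pass replaces S_i with S_i * X + S_(i+1), where X is the erased byte's locator; since e * X^i * X + e * X^(i+1) = 0 in GF(2^8), that erasure's contribution cancels, leaving one fewer usable term.

```python
# GF(2^8) tables; assumption: field polynomial 0x11d, generator 2.
GF_EXP = [0] * 512
GF_LOG = [0] * 256
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    GF_EXP[_i] = GF_EXP[_i - 255]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def rs_forney_syndromes(synd, erase_pos, nmess):
    """Remove the effect of known erasures from the syndromes."""
    fsynd = list(synd)                  # work on a copy, like polyValu
    for pos in erase_pos:
        x = GF_EXP[nmess - 1 - pos]     # locator X = a^d of the erased byte
        for i in range(len(fsynd) - 1):
            fsynd[i] = gf_mul(fsynd[i], x) ^ fsynd[i + 1]
        fsynd.pop()                     # the last term has no partner left
    return fsynd


# One erasure (value 0x42) at byte 3 of a 10-byte block: d = 10 - 1 - 3 = 6
synd = [gf_mul(0x42, GF_EXP[(6 * i) % 255]) for i in range(5)]
print(rs_forney_syndromes(synd, [3], 10))  # [0, 0, 0, 0]
```

If the only damage is at the erasure positions, the adjusted syndromes are all zero, so the error search that follows finds nothing extra to locate.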

Listing Eight shows the method _rsFindErr(), which ReedSolomon uses to identify the affected bytes. Its arguments are the error locator polynomial (errLoci) and the number of error symbols (errSize). It uses the Berlekamp-Massey algorithm to build its own locator polynomial, errPoly (lines 14-30). That algorithm produces the shortest polynomial that accounts for the errors alone. Then, _rsFindErr() counts the number of terms in errPoly (line 33) and compares that count against the number of syndrome terms (line 34). It returns None when the encoded block has too many errors to correct (line 36).
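A compact Berlekamp-Massey loop, plus the error-count check, might look like this in Python. It is a sketch under stated assumptions (field polynomial 0x11d, syndromes S_i = c(a^i) starting at i = 0, polynomials listed highest degree first), and rs_error_locator(), rs_error_count(), and the helpers are hypothetical names rather than the code in Listing Eight. For errors with locators X_k, the result is Lambda(x) = product of (1 + X_k x).

```python
# GF(2^8) tables; assumption: field polynomial 0x11d, generator 2.
GF_EXP = [0] * 512
GF_LOG = [0] * 256
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    GF_EXP[_i] = GF_EXP[_i - 255]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def gf_inverse(a):
    return GF_EXP[255 - GF_LOG[a]]


def gf_poly_scale(p, s):
    return [gf_mul(c, s) for c in p]


def gf_poly_add(p, q):
    """Add two polynomials, aligning their constant terms on the right."""
    r = [0] * max(len(p), len(q))
    for i, c in enumerate(p):
        r[i + len(r) - len(p)] ^= c
    for i, c in enumerate(q):
        r[i + len(r) - len(q)] ^= c
    return r


def rs_error_locator(synd):
    """Berlekamp-Massey: shortest Lambda(x) consistent with the syndromes."""
    err_poly = [1]  # current Lambda(x)
    old_poly = [1]  # Lambda from the last length change, scaled
    for i in range(len(synd)):
        old_poly.append(0)                   # multiply old_poly by x
        delta = synd[i]                      # discrepancy at step i
        for j in range(1, len(err_poly)):
            delta ^= gf_mul(err_poly[len(err_poly) - 1 - j], synd[i - j])
        if delta != 0:
            if len(old_poly) > len(err_poly):  # the locator must grow
                new_poly = gf_poly_scale(old_poly, delta)
                old_poly = gf_poly_scale(err_poly, gf_inverse(delta))
                err_poly = new_poly
            err_poly = gf_poly_add(err_poly, gf_poly_scale(old_poly, delta))
    while len(err_poly) > 1 and err_poly[0] == 0:
        err_poly.pop(0)                      # drop padding zeros
    return err_poly


def rs_error_count(err_poly, synd):
    """Return the number of errors, or None if there are too many to fix."""
    errs = len(err_poly) - 1
    if 2 * errs > len(synd):
        return None
    return errs


# One bad byte (0x42) at degree 6, with 4 syndromes
synd = [gf_mul(0x42, GF_EXP[(6 * i) % 255]) for i in range(4)]
print(rs_error_locator(synd))  # [64, 1], i.e. 1 + a^6 x
```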

Next, _rsFindErr() evaluates errPoly to find its zeroes (lines 41-46), storing the results in errList. Each zero marks the location of one affected byte. Again, _rsFindErr() returns None when no zeroes are found (line 50); otherwise, it returns errList.
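The zero search can be done by brute force: try a^(-i) for every byte degree i and keep the ones where the locator vanishes. This Python sketch assumes the locator is Lambda(x) = product of (1 + X_k x) with X_k = a^d for a bad byte at degree d, byte k of an nmess-byte block sitting at degree nmess-1-k, and field polynomial 0x11d; rs_find_error_positions() is a hypothetical name. As in the text, it returns None if the number of zeroes does not match the locator's degree.

```python
# GF(2^8) tables; assumption: field polynomial 0x11d, generator 2.
GF_EXP = [0] * 512
GF_LOG = [0] * 256
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    GF_EXP[_i] = GF_EXP[_i - 255]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def gf_poly_eval(p, x):
    y = p[0]
    for coef in p[1:]:
        y = gf_mul(y, x) ^ coef
    return y


def rs_find_error_positions(err_poly, nmess):
    """Return byte indices where err_poly has a zero, or None on mismatch."""
    errs = len(err_poly) - 1
    positions = []
    for i in range(nmess):
        # A zero at a^(-i) means the byte at degree i is wrong.
        if gf_poly_eval(err_poly, GF_EXP[255 - i]) == 0:
            positions.append(nmess - 1 - i)
    if len(positions) != errs:
        return None  # the locator does not describe a correctable pattern
    return positions


print(rs_find_error_positions([64, 1], 10))  # [3]
```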

Listing Nine contains the entry method RSDecode(). Its arguments are the encoded message block (argCode) and the number of error symbols. It begins by copying the block into the local codeBuffer (line 10). It parses codeBuffer and counts any erasures it finds (lines 13-17). If the number of erasures exceeds errSize, RSDecode() returns None (lines 18-20).
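The erasure scan at the top of RSDecode() can be sketched in a few lines of Python. It assumes erased bytes are marked with -1, as described for Figure 5, and it replaces each one with a placeholder 0 so the later arithmetic stays within the field; that placeholder choice and the name rs_find_erasures() are assumptions, not details from Listing Nine.

```python
ERASURE = -1  # assumed marker for a missing byte


def rs_find_erasures(code, err_size):
    """Return (buffer, erase_pos), or None if there are too many erasures."""
    buffer = list(code)          # work on a copy, like codeBuffer
    erase_pos = []
    for i, byte in enumerate(buffer):
        if byte == ERASURE:
            erase_pos.append(i)
            buffer[i] = 0        # placeholder; the decoder repairs it later
    if len(erase_pos) > err_size:
        return None              # more erasures than error symbols
    return buffer, erase_pos


print(rs_find_erasures([76, -1, 114, 101, -1, 7, 9], 5))
# ([76, 0, 114, 101, 0, 7, 9], [1, 4])
```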

Next, RSDecode() calls _rsSyndPoly() and checks the resulting syndrome polynomial polySynd (lines 23-26). If polySynd reveals no errors in the encoded block, RSDecode() simply returns codeBuffer.

RSDecode() then calls _rsForney(), storing the error locator polynomial in the local errLoci (line 29). It calls _rsFindErr(), storing the list of error locations in the local errList (line 32). Finally, RSDecode() calls _rsCorrect() to perform the necessary corrections (line 40). It stores the corrected message in outMesg and returns outMesg as its result (line 41).
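The article does not walk through _rsCorrect() itself, but its job (the evaluator and correction steps from Figure 6) is commonly done with Forney's formula once every bad position is known. This Python sketch takes the erasure and error positions together, builds the locator Lambda(x) = product of (1 + X_k x) and the evaluator Omega(x) = S(x) * Lambda(x) mod x^v, and XORs the computed error value into each byte. It assumes field polynomial 0x11d, syndromes S_i = c(a^i) starting at i = 0, and byte k of an n-byte block at degree n-1-k; rs_correct_errata() and its helpers are hypothetical names.

```python
# GF(2^8) tables; assumption: field polynomial 0x11d, generator 2.
GF_EXP = [0] * 512
GF_LOG = [0] * 256
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    GF_EXP[_i] = GF_EXP[_i - 255]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def gf_div(a, b):
    """Divide a by a non-zero b."""
    if a == 0:
        return 0
    return GF_EXP[GF_LOG[a] + 255 - GF_LOG[b]]


def gf_poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for j, qj in enumerate(q):
        for i, pi in enumerate(p):
            r[i + j] ^= gf_mul(pi, qj)
    return r


def gf_poly_eval(p, x):
    y = p[0]
    for coef in p[1:]:
        y = gf_mul(y, x) ^ coef
    return y


def rs_correct_errata(code, synd, err_pos):
    """Repair the bytes at the known positions err_pos."""
    msg = list(code)
    v = len(err_pos)
    if v == 0:
        return msg
    if v > len(synd):
        raise ValueError("more errata than syndrome terms")
    n = len(msg)
    # Locator: Lambda(x) = product of (1 + X_k x), with X_k = a^(n-1-pos)
    loc = [1]
    for pos in err_pos:
        loc = gf_poly_mul(loc, [GF_EXP[n - 1 - pos], 1])
    # Evaluator: Omega(x) = S(x) * Lambda(x) mod x^v
    omega = gf_poly_mul(list(reversed(synd[:v])), loc)[-v:]
    # Formal derivative: only odd-degree terms survive in GF(2^8);
    # this slice is Lambda'(x) written as a polynomial in x^2.
    deriv = loc[len(loc) & 1::2]
    for pos in err_pos:
        x_inv = GF_EXP[255 - (n - 1 - pos)]              # X_k^-1
        y = gf_poly_eval(omega, x_inv)                   # Omega(X_k^-1)
        z = gf_poly_eval(deriv, gf_mul(x_inv, x_inv))    # Lambda'(X_k^-1)
        msg[pos] ^= gf_div(y, gf_mul(x_inv, z))          # error value Y_k
    return msg


block = [0] * 10    # an all-zero codeword...
block[3], block[7] = 0x5A, 0x13    # ...with two damaged bytes
synd = [gf_poly_eval(block, GF_EXP[i]) for i in range(4)]
print(rs_correct_errata(block, synd, [3, 7]) == [0] * 10)  # True
```

With v known positions, only the first v syndromes are needed to build Omega(x), which is why erasures are cheaper to fix than errors whose positions must first be found.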

Conclusion

Reed-Solomon is a fast and effective way to protect data integrity. It is not the only error-correction algorithm: other algorithms exist that are faster and perhaps better than Reed-Solomon under specific circumstances. But Reed-Solomon is an excellent solution that is widely used wherever small data blocks need to be verified.
