
Abstract:

A method of decoding data encoded with a polar code and devices that
encode data with a polar code. A received word of polar encoded data is
decoded following several distinct decoding paths to generate a list of
codeword candidates. The decoding paths are successively duplicated and
selectively pruned to generate a list of potential decoding paths. A
single decoding path among the list of potential decoding paths is
selected as the output and a single candidate codeword is thereby
identified. In another preferred embodiment, the polar encoded data
includes redundancy values in its unfrozen bits. The redundancy values
aid the selection of the single decoding path. A preferred device of the
invention is a cellular network device (e.g., a handset) that conducts
decoding in accordance with the methods of the invention.

Claims:

1. A method of decoding data encoded with a polar code, the method
comprising: receiving a word of polar encoded data from a channel and
conducting decoding of the word by following several distinct decoding
paths to generate codeword candidates; list decoding by successively
duplicating and pruning said decoding paths to generate a list of
potential decoding paths, selecting a single decoding path from the list
of potential decoding paths and thereby identifying a single codeword
output.

2. The method of claim 1, wherein said duplicating and pruning splits a
decoding path into two child paths to be examined for each decision on the
value of an unfrozen bit.

3. The method of claim 2, wherein the list decoding assigns child paths
duplicate data structures each time a decoding path splits; each assigned
duplicate data structure is flagged as belonging to multiple paths
without copying the data structures at the time of assignment; and a copy
of an assigned duplicate data structure is made only when a selected
decoding path requires access to an assigned duplicate data structure
during decoding.

4. The method of claim 1, wherein said duplicating and pruning doubles
the number of decoding paths at each decoding step, and then performs a
pruning procedure to discard all but the L best paths.

5. The method of claim 1, wherein the pruning is performed using an
accumulated likelihood of each path.

6. The method of claim 1, implemented in a cellular network device.

7. The method of claim 1, wherein the word of polar coded data includes
k-r unfrozen bits of k unfrozen bits as data bits and r bits of
redundancy data; and wherein the pruning uses the redundancy data for
decoding decisions to prune the list of potential decoding paths.

9. A method for encoding and decoding data using polar codes, the method
comprising: reserving k-r unfrozen bits of k unfrozen bits available as
data bits; using the remaining r unfrozen bits to add redundancy to the
data bits; and then using said redundancy to aid in the selection of a
decoding path from a list of decoding paths generated during the
decoding.

10. The method of claim 9, wherein the redundancy bits are assigned to
cyclic redundancy check (CRC) values of the data.

11. The method of claim 10, wherein all the decoding paths with incorrect
cyclic redundancy check values are discarded.

12. The method of claim 9, implemented in a cellular network device.

13. A polar decoding device for receiving polar coded data from a
channel, the decoding device comprising: means for list decoding the
polar coded data; and means for successively duplicating and pruning
decoding paths during decoding.

[0003] A field of the invention is information coding and decoding. The
invention is particularly useful for communication over noisy mediums.
Example applications of the invention include communications (such as
wireless communications including, e.g., cellular communications,
satellite communications, and deep space communications) and data storage
(such as used in data storage devices, e.g., computer hard disk devices).

BACKGROUND

[0004] Error-correcting codes are used whenever communication over a noisy
medium (channel) takes place. Cell phones, computer hard disks,
deep-space communication and many other devices communicate over a noisy
medium. Error-correcting codes have been widely used and improved since
1948 as the search for optimal error correcting codes continued for
decades. The basic problem in decoding is attempting to recover an
original transmitted codeword from a received word that is a distorted
version of the original codeword. The distortion is introduced by the
noisy medium.

[0006] Polar codes were the first and presently remain the only family of
codes known to have an explicit construction (no ensemble to pick from)
and efficient encoding and decoding algorithms, while also being capacity
achieving over binary-input symmetric memoryless channels. A drawback
of existing polar codes to date is disappointing performance for short to
moderate block lengths. Polar codes have not been widely implemented
despite recognized inherent advantages over other coding schemes, such as
turbo codes, because channel polarization remains slow in prior methods.

[0007] List decoding was introduced in the 1950s. See, P. Elias, "List
decoding for noisy channels," Technical Report 335, Research Laboratory
of Electronics, MIT (1957). List decoding addresses a worst case scenario
by outputting a small number of codewords that are a small distance from
the code word. List decoding has not been widely used, however. Modern
applications of list decoding have sought to reduce worst-case decoding
penalties. Successive cancellation list decoding has been applied to
Reed-Muller codes. See, I. Dumer and K. Shabunov, "Soft-decision Decoding
of Reed-Muller codes: Recursive Lists," IEEE Trans. Inform. Theory, vol.
52, pp. 1260-1266 (2006). Reed-Muller codes are structured differently
than polar codes and are widely considered in the art to have a different
decoding approach. Indeed, Arikan's original paper that presented polar
codes emphasized differences between polar codes and Reed-Muller codes.
Phrasing of the decoding algorithms in Reed-Muller and polar codes makes
comparison difficult. The present inventors recognized that Arikan's
successive cancellation decoding is similar in nature to the successive
cancellation decoding of Reed-Muller codes as in Dumer-Shabunov. However,
application of successive cancellation list decoding as set forth in
Dumer-Shabunov would increase the complexity of polar decoding to an
extent that would make its application impractical; it can lead to
Ω(Ln²) complexity. As with prior list decoders, it will also fail to
produce a single output, instead producing a small list of candidates
without a single, explicit codeword.

[0008] The observation that one can reduce the space complexity of
successive cancellation decoders for polar codes with hardware
architectures to O(n) was noted, in the context of VLSI design, by the
present inventors and colleagues in C. Leroux, I. Tal, A. Vardy, and W.
J. Gross, "Hardware Architectures for Successive Cancellation Decoding of
Polar Codes," arXiv:1011.2919v1 (2010). This paper does not provide a
different decoding approach for polar codes, but provides architectures
that can reduce the space complexity for the decoding scheme that was
provided by Arikan with the introduction of polar codes.

SUMMARY OF THE INVENTION

[0009] An embodiment of the invention is a method of decoding data encoded
with a polar code. A received word of polar encoded data is decoded
following several distinct decoding paths to generate codeword
candidates. The decoding paths are selectively successively duplicated
and pruned to generate a list of potential decoding paths. A single
decoding path among the list of potential decoding paths is selected as
the output and a single candidate codeword is identified as the output.
In another preferred embodiment, the polar encoded data includes
redundancy data in its unfrozen bits. The redundancy data aids the
selection of the single decoding path. A preferred device of the
invention is a cellular network device, e.g., a handset that conducts
decoding in accordance with the methods of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0010] FIG. 1 shows word error rate at a length n=2048 rate 1/2 polar code
optimized for SNR=2 dB under various list sizes for a code construction
consistent with I. Tal and A. Vardy, "How to construct polar codes,"
submitted to IEEE Trans. Inform. Theory, available online as
arXiv:1105.6164v2 (2011); with two dots representing upper and lower
bounds on the SNR needed to reach a word error rate of 10⁻⁵;

[0011] FIG. 2 shows a comparison of polar coding in accordance with present
embodiments and decoding schemes to an implementation of the WiMax
standard from TurboBest, "IEEE 802.16e LDPC Encoder/Decoder Core."
[Online], where the codes are of rate 1/2, the length of the polar code is
2048, the length of the WiMax code is 2304, the list size used was L=32,
and the CRC used was 16 bits long;

[0012] FIG. 3 shows a comparison of normalized rate for a wide class of
codes with a target word error rate of 10⁻⁴;

[0013] FIG. 4 shows decoding paths of unfrozen bits in accordance with the
invention for L=4, where each level has at most 4 nodes with paths that
continue downward, and discontinued paths are grayed out;

[0014] FIGS. 5A and 5B show word error rates of length n=2048 (FIG. 5A) and
n=8192 (FIG. 5B) rate 1/2 polar codes in accordance with the present
invention, optimized for SNR=2 dB under various list sizes; the code
construction was carried out as in FIG. 1.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0015] The present inventors have recognized that drawbacks in polar codes
at short to medium block lengths arise either from inherent weaknesses of
the code itself at these lengths or from the fact that the successive
cancellation (SC) decoder employed to decode them is significantly
degraded with respect to maximum likelihood (ML) decoding performance.
These two possibilities are not mutually exclusive, and so both may occur.

[0016] Disclosed are methods and their computer implementation that
greatly improve the error-correcting performance of polar codes. Polar
codes are a family of error correcting codes that facilitate the transfer
of information from a transmitter to a receiver over a noisy medium
(e.g., as happens in cell phones, computer hard disks, deep-space
communication, etc). The invention employs a new decoding method for
polar codes as well as a modification of the codes themselves. The method
has been fully implemented and tested. The resulting performance is
better than the current state-of-the-art in error-correction coding.

[0017] Preferred embodiments of the invention also provide a modified
polar code and decoding method. In prior polar coding methods, on the
transmitting end (encoding), a sequence of K information bits are mapped
to a codeword of length n. In the preferred embodiment, k information
bits and r CRC (cyclic redundancy check) bits together constitute the
K=k+r bits mapped to a codeword of length n. These bits are denoted
u_1, u_2, . . . , u_K. On the receiving end (decoding), instead of first
decoding u_1 to either 0 or 1, then decoding u_2 to either 0 or 1, and so
forth, the following occurs. When decoding u_1, both the option of it
being a 0 and the option of it being a 1 are considered. These two options
are termed "paths". For each such path, both options of u_2 lead to 4
paths, and so forth.

[0018] An aspect of the invention provides an improvement to the SC
decoder, namely, a successive cancellation list (SCL) decoder. The list
decoder has a corresponding list size L, and setting L=1 results in the
classic SC decoder. While lists are used in the decoder when the
algorithm executes, the decoder returns a single codeword. L is an
integer parameter that can be freely chosen by a system designer. L=1
indicates the classic successive cancellation decoding of Arikan. Higher
values of L lead to better performance and higher complexity. In the
examples used to test the invention, the highest value of L is 32, but
much higher values are possible, extending into the tens of thousands. In
embodiments of the invention, L is the number of different decoding paths
after pruning. After duplication, that value is 2L. In preferred
embodiments, the pruning reduces the number of paths from 2L to L.
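The duplicate-then-prune loop described above can be sketched in a few lines. This is an illustrative toy, not the patented implementation: `bit_scores` is a hypothetical stand-in for the per-bit probabilities that a real successive cancellation decoder would compute, and path likelihoods are tracked as simple products.

```python
def prune_to_best(paths, L):
    """Keep the L highest-scoring (bits, score) pairs."""
    return sorted(paths, key=lambda p: p[1], reverse=True)[:L]

def scl_skeleton(bit_scores, L):
    """Toy SCL loop: bit_scores[phi][b] stands in for the probability
    that unfrozen bit phi equals b. At each phase every path forks into
    two children (duplication), then all but the L best are discarded
    (pruning). Returns the surviving (bit-vector, score) pairs."""
    paths = [((), 1.0)]
    for scores in bit_scores:
        children = []
        for bits, p in paths:               # duplicate: 2 children per path
            for b in (0, 1):
                children.append((bits + (b,), p * scores[b]))
        paths = prune_to_best(children, L)  # prune: back down to <= L paths
    return paths
```

Setting L=1 recovers classic SC behavior, since only the locally most likely child survives each phase.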

[0019] Embodiments of the invention also include encoders and decoders for
ECC (error correction coded) polar code communications. A preferred
embodiment decoder duplicates the data structures used by a parent path
each time a decoding path splits into two forks, so that each fork
receives a copy. Making actual copies at every split, however, quickly
grows the cost of copying. To avoid this and provide reduced copying
expense, at each given stage the same array may be flagged as belonging
to more than one decoding path. However, when a given decoding path needs
access to an array it is sharing with another path, a copy is made.

[0020] Embodiments of the invention also include polar concatenated codes
executed by computer hardware and/or software that aid the decoding.
Instead of setting all unfrozen bits to information bits to transmit, the
following concatenation is applied. For some small constant r,
embodiments of the invention set the first k-r unfrozen bits to
information bits. The last r unfrozen bits hold the r-bit CRC
(cyclic redundancy check) value of the first k-r unfrozen bits. This
incurs only a modest rate penalty, with the rate becoming (k-r)/n. During
decoding, the concatenation provides a shortcut to refine selection. A
path for which the CRC is invalid cannot correspond to the transmitted
codeword. Thus,
the selection can be refined as follows. If at least one path has a
correct CRC, then remove from the list all paths having incorrect CRC and
then choose the most likely path. Otherwise, select the most likely path
in the hope of reducing the number of bits in error, but with the
knowledge that at least one bit is in error.
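The selection rule above can be sketched as follows. The candidate list and scores are illustrative, and `crc_ok` is an injected predicate standing in for whichever r-bit CRC the system actually uses:

```python
def select_codeword(candidates, crc_ok):
    """candidates: list of (bits, likelihood) pairs from the list decoder.
    If at least one path passes the CRC, choose the most likely among the
    passing paths; otherwise fall back to the most likely path overall,
    knowing at least one bit is then in error."""
    passing = [c for c in candidates if crc_ok(c[0])]
    pool = passing if passing else candidates
    return max(pool, key=lambda c: c[1])
```

A usage example with a toy even-parity "CRC": a less likely path that passes the check is preferred over a more likely path that fails it.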

[0021] Artisans will appreciate that preferred embodiments of the
invention efficiently decode polar codes by generating a list of
candidate codewords with successive cancellation decoding. In preferred
embodiments, a codeword is selected from the list using an outer CRC
coded polar code provided by the invention.

[0022] In a preferred list decoder of the invention, up to L decoding
paths are considered concurrently at each decoding stage, and a single
codeword is then selected from the list as output. The present list
decoder effectively bridges the gap between successive-cancellation and
maximum-likelihood decoding of polar codes. The specific list-decoding
algorithm that achieves this performance doubles the number of decoding
paths at each decoding step, and then uses a pruning procedure to discard
all but the L best paths. The natural pruning criterion can be easily
evaluated. Nevertheless, a straightforward implementation still requires
Ω(Ln²) time, which is in stark contrast with the O(n log n) complexity of
the original successive-cancellation decoder. The structure of polar
codes is used to overcome this problem and provide an efficient,
numerically stable implementation taking only O(Ln log n) time and O(Ln)
space. The polar coding strategies of the invention thus achieve better
performance with lower complexity. If the most likely codeword is
selected from the list, simulation results show that the resulting
performance is very close to that of a maximum-likelihood decoder, even
for moderate values of L. Alternatively, if an intelligent selection
procedure selects the codeword from the list, the results are comparable
to the current state of the art in LDPC codes.

[0023] The preferred list decoder doubles the number of decoding paths at
each decoding step, and then uses a pruning procedure to discard all but
the L best paths. Nevertheless, a straightforward implementation still
requires Ω(Ln²) time, which is in stark contrast with the O(n log n)
complexity of the original successive-cancellation decoder. The structure
of polar codes is exploited by the invention with an efficient,
numerically stable implementation taking only O(Ln log n) time and O(Ln)
space.

[0024] Those knowledgeable in the art will appreciate that embodiments of
the present invention lend themselves well to practice in the form of
computer program products. Accordingly, it will be appreciated that
embodiments of the present invention may comprise computer program
products comprising computer executable instructions stored on a
non-transitory computer readable medium that, when executed, cause a
computer to undertake methods according to the present invention, or a
computer configured to carry out such methods. The executable
instructions may comprise computer program language instructions that
have been compiled into a machine-readable format. The non-transitory
computer-readable medium may comprise, by way of example, a magnetic,
optical, signal-based, and/or circuitry medium useful for storing data.
The instructions may be downloaded entirely or in part from a networked
computer. Also, it will be appreciated that the term "computer" as used
herein is intended to broadly refer to any machine capable of reading and
executing recorded instructions. It will also be understood that results
of methods of the present invention may be displayed on one or more
monitors or displays (e.g., as text, graphics, charts, code, etc.),
printed on suitable media, stored in appropriate memory or storage, etc.

[0025] Preferred embodiments of the invention will now be discussed with
respect to the experiments and results. Artisans will appreciate
additional features of the invention from the discussion of the
experimental results and example embodiments.

[0026] A preferred embodiment Successive Cancellation (SC) decoder is a
modification of the polar decoder of Arikan. The basic decoder must first
be defined to explain the modifications.

[0027] Let the polar code under consideration have length n=2^m and
dimension k. Thus, the number of frozen bits is n-k. The reformulation
denotes by u=(u_i)_{i=0}^{n-1}=u_0^{n-1} the information bit vector
(including the frozen bits), and by c=c_0^{n-1} the corresponding
codeword, which is sent over a binary-input channel W: χ→γ, where
χ={0, 1}. At the other end of the channel, the received word is
y=y_0^{n-1}. A decoding algorithm is then applied to y, resulting in a
decoded codeword c having corresponding information bits u.

[0028] A. An Outline of Successive Cancellation

[0029] A high-level description of the SC decoding algorithm is provided
in Algorithm 1 (see Algorithm section). In Algorithm 1, at each phase
φ of the algorithm, the pair of probabilities
W_m^(φ)(y_0^{n-1}, u_0^{φ-1} | 0) and
W_m^(φ)(y_0^{n-1}, u_0^{φ-1} | 1) is calculated.
Then the value of u_φ is determined according to the pair of
probabilities.
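The decision rule of Algorithm 1 can be sketched as follows. This is a skeleton only: `calc_pair` is a hypothetical placeholder for computing the probability pair W_m^(φ)(y, u_0^{φ-1} | 0) and W_m^(φ)(y, u_0^{φ-1} | 1), and frozen bits are assumed frozen to 0.

```python
def sc_decode(n, frozen, calc_pair):
    """Skeleton of successive cancellation: at each phase, a frozen bit
    is set to its known value (0 here); an unfrozen bit is set to
    whichever value has the larger probability, given the bits decided
    so far. calc_pair(phase, prefix) returns the pair (p0, p1)."""
    u = []
    for phase in range(n):
        if phase in frozen:
            u.append(0)
        else:
            p0, p1 = calc_pair(phase, tuple(u))
            u.append(0 if p0 >= p1 else 1)
    return u
```

The SCL decoder described later replaces the hard 0/1 decision on each unfrozen bit with a fork into two paths.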

[0030] Probabilities are calculated as follows. For layer 0≤λ≤m, let

Λ=2^λ. (1)

For

0≤φ<Λ, (2)

[0031] bit channel W_λ^(φ) is a binary-input channel with
output alphabet γ^Λ × χ^φ, the
conditional probability of which is generically denoted

W_λ^(φ)(y_0^{Λ-1}, u_0^{φ-1} | u_φ). (3)

[0034] For Algorithm 1 to become concrete, it is necessary to specify how
the probability pair associated with W_m^(φ) is calculated, and
how the values of u, namely u_0^{φ-1}, are propagated into
those calculations. For λ>0 and 0≤φ<Λ,
recall the recursive definition of
W_λ^(φ)(y_0^{Λ-1}, u_0^{φ-1} | u_φ) given in either (4) or (5),
depending on the parity of φ. For either φ=2ψ or φ=2ψ+1, the channel
W_{λ-1}^(ψ) is evaluated with output
(y_0^{Λ/2-1}, u_{0,even}^{2ψ-1} ⊕ u_{0,odd}^{2ψ-1})
as well as with output
(y_{Λ/2}^{Λ-1}, u_{0,odd}^{2ψ-1}). Preferred
embodiments utilize these recursions. The output can be defined simply to
aid the analysis. This can be accomplished by specifying, apart from the
layer λ and the phase φ which define the channel, the branch
number

0≤β<2^{m-λ}. (6)

[0035] Since, during the run of the SC algorithm, the channel
W_m^(φ) is only evaluated with a single output
(y_0^{n-1}, u_0^{φ-1}), the corresponding branch number β=0 is
assigned to that output. Next, proceed recursively as follows. For
λ>0, consider a channel W_λ^(φ) with output
(y_0^{Λ-1}, u_0^{φ-1}) and corresponding branch
number β. Denote ψ=⌊φ/2⌋. The output
(y_0^{Λ/2-1}, u_{0,even}^{2ψ-1} ⊕ u_{0,odd}^{2ψ-1}) of
W_{λ-1}^(ψ) will have a branch number of 2β, while the output
(y_{Λ/2}^{Λ-1}, u_{0,odd}^{2ψ-1}) will have a branch number of
2β+1. In this way, an output corresponding to each branch β of a
channel is defined.

[0036] Embodiments of the invention define and use a first data structure.
For each layer 0≤λ≤m, a probabilities array is denoted by
P_λ, indexed by an integer 0≤i<2^m and a bit b
∈ {0, 1}. For a given layer λ, an index i will
correspond to a phase 0≤φ<Λ and a branch
0≤β<2^{m-λ} using the following quotient/remainder
representation:

i = ⟨φ, β⟩_λ = φ + 2^λ·β. (7)
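The quotient/remainder representation above is easy to exercise in code. A minimal sketch (function names are illustrative) packs and unpacks (φ, β) exactly as i = φ + 2^λ·β:

```python
def pack_index(lam, phi, beta):
    """Flatten (phase phi, branch beta) at layer lam into one index i,
    mirroring i = phi + 2**lam * beta. phi occupies the low lam bits."""
    assert 0 <= phi < 2 ** lam
    return phi + (beta << lam)

def unpack_index(lam, i):
    """Inverse mapping: recover (phase, branch) from the flat index."""
    return i % (1 << lam), i >> lam
```

Because φ < 2^λ, the mapping is a bijection: the phase is the remainder and the branch is the quotient of i divided by 2^λ.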

[0037] To avoid repetition, the following shorthand is adopted:

P_λ[⟨φ, β⟩] = P_λ[⟨φ, β⟩_λ]. (8)

[0038] The probabilities array data structure P_λ is used as
follows. Let layer 0≤λ≤m, phase
0≤φ<Λ, and branch 0≤β<2^{m-λ}
be given. Denote the output corresponding to branch β of
W_λ^(φ) as (y_0^{Λ-1}, u_0^{φ-1}).
Then, ultimately, for both values of b,

P_λ[⟨φ, β⟩][b] = W_λ^(φ)(y_0^{Λ-1}, u_0^{φ-1} | b). (9)

[0039] The input corresponding to a branch can be defined via a similar
terminology. Start at layer m and continue recursively. Consider the
channel W_m^(φ), and let u_φ be the corresponding input
which Algorithm 1 assumes. This input is assigned a branch number
β=0. Proceed recursively as follows. For layer λ>0,
consider the channels W_λ^(2ψ) and
W_λ^(2ψ+1) having the same branch β with
corresponding inputs u_{2ψ} and u_{2ψ+1}, respectively. In
view of (5), consider W_{λ-1}^(ψ) and define the input
corresponding to branch 2β as u_{2ψ} ⊕ u_{2ψ+1}, and the input
corresponding to branch 2β+1 as u_{2ψ+1}. Under
this recursive definition, for all 0≤λ≤m,
0≤φ<Λ, and 0≤β<2^{m-λ}, the
input corresponding to branch β of W_λ^(φ) is well
defined.

[0040] The following lemma points at the natural meaning that a branch
number has at layer λ=0. This can be proved using a straightforward
induction.

[0041] Lemma 1: Let y and c be as in Algorithm 1, the received vector and
the decoded codeword. Consider layer λ=0, and thus set φ=0.
Next, fix a branch number 0≤β<2^m. Then, the input and
output corresponding to branch β of W_0^(0) are y_β
and c_β, respectively.

[0042] A second data structure is now introduced. For each layer
0≤λ≤m, a bit array is denoted by B_λ and
indexed by an integer 0≤i<2^m, as in (7). The data
structure is used as follows. Let layer 0≤λ≤m, phase
0≤φ<Λ, and branch 0≤β<2^{m-λ} be given,
and denote the input corresponding to branch β of W_λ^(φ)
as u(λ, φ, β). Then, ultimately,

B_λ[⟨φ, β⟩] = u(λ, φ, β), (10)

[0043] which adopts the same shorthand as (8). The total memory consumed
by this algorithm is O(n log n). A preferred first implementation of the
SC decoder is given as Algorithms 2-4 (see Algorithm Section). The main
loop is given in Algorithm 2, and follows the high-level description
given in Algorithm 1. Note that the elements of the probability arrays
P_λ and bit arrays B_λ start out uninitialized and
become initialized as the algorithm runs its course. The code to
initialize the array values is given in Algorithms 3 and 4.

[0044] Lemma 2: Algorithms 2-4 are a valid implementation of the SC
decoder.

[0045] Proof: In addition to proving the claim explicitly stated in the
lemma, the implicit claim can also be proven. Namely, the actions taken
by the algorithm should be shown to be well defined. This could be shown
by demonstrating that when an array element is read from, it was already
written to (it is initialized).

[0046] Both the implicit and the explicit claims are easily derived from
the following observation. For a given 0≤φ<n, consider
iteration φ of the main loop in Algorithm 2. Fix a layer
0≤λ≤m and a branch 0≤β<2^{m-λ}.
If a run of the algorithm is suspended just after the iteration ends,
then (9) holds with φ' in place of φ, for all

0 ≤ φ' ≤ ⌊φ/2^{m-λ}⌋.

[0047] Similarly, (10) holds with φ' in place of φ, for all

0 ≤ φ' < ⌊(φ+1)/2^{m-λ}⌋.

[0048] The above observation is proved by induction on
φ.

[0049] The running time of the known SC decoder is O(n log n), and the
implementation provided above is no exception. The space complexity of
the present algorithm is O(n log n) as well. However, using the above
observation, the space complexity can be reduced to O(n).

[0050] As a first step towards this end, consider the probability-pair
array P_m. By examining the main loop in Algorithm 2, it is seen that
when it is currently at phase φ, it will never again make use of
P_m[⟨φ', 0⟩], for all φ'<φ.
On the other hand, P_m[⟨φ'', 0⟩] is
uninitialized for all φ''>φ. Thus, instead of reading and
writing to P_m[⟨φ, 0⟩], it is possible to essentially
disregard the phase information, and use only the first element
P_m[0] of the array, discarding all the rest. By the recursive nature
of polar codes, this observation of disregarding the phase
information can be exploited for a general layer λ as well.
Specifically, for all 0≤λ≤m, it is now possible to define
the number of elements in P_λ to be 2^{m-λ}.

[0051] Accordingly,

P_λ[⟨φ, β⟩] is replaced by P_λ[β]. (11)

[0052] Note that the total space needed to hold the P arrays has gone down
from O(n log n) to O(n). The same is desirable for the B arrays.
However, the above implementation does not permit the phase to be
disregarded, as can be seen, for example, in line 3 of Algorithm 4. The
solution is a simple renaming. As a first step, define for each
0≤λ≤m an array C_λ consisting of bit pairs
and having length n/2. Next, let a generic reference of the form
B_λ[⟨φ, β⟩] be replaced by
C_λ[ψ + β·2^{λ-1}][φ mod 2], where
ψ=⌊φ/2⌋. This renames the elements
of B_λ as elements of C_λ. It is now possible to
disregard the value of ψ and take note only of the parity of φ.
So, with one more substitution: replace every instance of
C_λ[ψ + β·2^{λ-1}][φ mod 2] by
C_λ[β][φ mod 2], and resize each array C_λ to have
2^{m-λ} bit pairs. To sum up,

B_λ[⟨φ, β⟩] is replaced by C_λ[β][φ
mod 2]. (12)

[0053] A further reduction in space is possible: for λ=0, φ=0,
and thus the parity of φ is always even. However, this reduction does
not affect the asymptotic space complexity which is now indeed down to
O(n). The revised algorithm is given as Algorithms 5-7.
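The O(n) total claimed for the reduced P arrays can be checked numerically: layer λ keeps 2^{m-λ} entries, so summing over λ=0, . . . , m gives 2^{m+1}-1 < 2n. A one-line sketch (the function name is illustrative):

```python
def reduced_p_space(m):
    """Total number of elements across all reduced P arrays for a code
    of length n = 2**m: layer lam holds 2**(m - lam) entries, and the
    geometric sum over lam = 0..m equals 2**(m + 1) - 1, which is O(n)."""
    return sum(2 ** (m - lam) for lam in range(m + 1))
```

This contrasts with the unreduced layout, where each of the m+1 layers holds on the order of n entries, for O(n log n) total.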

[0054] The above statements are also of use in analyzing the time
complexity of the preferred embodiment list decoder.

[0055] A preferred embodiment is referred to as a successive cancellation
list (SCL) decoder, and example decoding is shown in FIG. 4. The list
decoder has a parameter L, called the list size. Generally speaking,
larger values of L mean lower error rates but longer running times. In
the main loop of an SC decoder, each phase provides a decision on the
value of u.sub.φ. In the present SCL decoder, instead of deciding to
set the value of an unfrozen u.sub.φ to either a 0 or a 1, the
decoding path splits into two paths (see FIG. 4) to be examined. Paths
must be pruned to limit the maximum number of paths allowed to the
specified list size, L. A pruning criterion is provided to keep the most
likely paths at each stage. A simple implementation of the pruning can
proceed as follows. Each time a decoding path is split into two forks,
the data structures used by the "parent" path are duplicated, with one
copy given to the first fork and the other to the second. Since the number
of splits is Ω(Ln), and since the size of the data structures used by
each path is Ω(n), the copying operation alone would consume time
Ω(Ln²). This running time is only practical for short codes.
However, all implementations of successive cancellation list decoding
known to the present inventors have complexity at least Ω(Ln²).
Preferred embodiments of SCL decoding reduce time complexity to O(Ln log
n) instead of Ω(Ln²).

[0056] Consider the P arrays above and recall that the size of P_λ
is proportional to 2^{m-λ}. Thus, the cost of copying
P_λ decreases exponentially with λ. On the other hand,
looking at the main loop of Algorithm 5 and unwinding the recursion,
P_λ is accessed only every 2^{m-λ} increments of φ.
The bigger P_λ is, the less frequently it is accessed. The same
observation applies to the C arrays. This observation of the present
inventors leads to the use of a "lazy-copy" operation in preferred
embodiments. Namely, at each given stage, the same array may be flagged
as belonging to more than one decoding path. However, when a given
decoding path needs access to an array it is sharing with another path, a
copy is made.

[0057] Low-level functions and data structures can provide the "lazy-copy"
methodology. The formulation is kept simple for purposes of explanation,
but artisans will recognize some clear optimizations. The following data
structures are defined and initialized in Algorithm 8.

[0058] Each path will have an index l, where 0≤l<L. At first,
only one path will be active. As the algorithm runs its course, paths
will change states between "active" and "inactive". The
inactivePathIndices stack (see Section 10.1 of T. H. Cormen, C. E.
Leiserson, R. L. Rivest, and C. Stein, "Introduction to Algorithms," 2nd
ed., Cambridge, Mass.: The MIT Press (2001)) will hold the indices of the
inactive paths. This assumes the "array" implementation of a stack, in
which both "push" and "pop" operations take O(1) time and a stack of
capacity L takes O(L) space. The activePath array is a Boolean array such
that activePath[l] is true if path l is active. Note that, essentially,
both inactivePathIndices and activePath store the same information. The
utility of this redundancy will be made clear shortly.
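The path-index bookkeeping above can be sketched minimally as follows (the class and method names are illustrative, not from the patent): a stack of inactive indices gives O(1) allocation and release, while the parallel Boolean array answers "is path l active?" in O(1).

```python
class PathPool:
    """Sketch of inactivePathIndices + activePath bookkeeping."""
    def __init__(self, L):
        # Stack of inactive path indices; the top of the stack is index 0.
        self.inactive_path_indices = list(range(L - 1, -1, -1))
        self.active_path = [False] * L

    def assign_path(self):
        """Pop an unused index off the stack and mark it active: O(1)."""
        l = self.inactive_path_indices.pop()
        self.active_path[l] = True
        return l

    def release_path(self, l):
        """Mark a path inactive and push its index back: O(1)."""
        self.active_path[l] = False
        self.inactive_path_indices.append(l)
```

The redundancy noted in the text is visible here: the stack makes allocation constant-time, and the Boolean array makes membership queries constant-time; neither alone provides both.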

[0059] For every layer λ, there will be a "bank" of L
probability-pair arrays for use by the active paths. At any given moment,
some of these arrays might be used by several paths, while others might
not be used by any path. Each such array is pointed to by an element of
arrayPointer_P. Likewise, there will be a bank of bit-pair arrays,
pointed to by elements of arrayPointer_C.

[0060] The pathIndexToArrayIndex array is used as follows. For a given
layer λ and path index l, the probability-pair array and bit-pair
array corresponding to layer λ of path l are pointed to by

arrayPointer_P[λ][pathIndexToArrayIndex[λ][l]]

and

arrayPointer_C[λ][pathIndexToArrayIndex[λ][l]],

respectively.

[0061] At any given moment, some probability-pair and bit-pair arrays from
the bank might be used by multiple paths, while others may not be used by
any. The value of arrayReferenceCount[λ][s] denotes the number of
paths currently using the array pointed to by arrayPointer_P[λ][s].
This is also the number of paths making use of
arrayPointer_C[λ][s]. The index s is contained in the stack
inactiveArrayIndices[λ] if arrayReferenceCount[λ][s] is
zero.

[0062] With the data structures initialized, the low-level functions by
which paths are made active and inactive can be stated. Start by
reference to Algorithm 9, by which the initial path of the algorithm is
assigned and allocated. This serves to choose a path index l that is not
currently in use (none of them are), and mark it as used. Then, for each
layer λ, mark (through pathIndexToArrayIndex) an index s such that
both arrayPointer_P[λ][s] and arrayPointer_C[λ][s] are
allocated to the current path.

[0063] Algorithm 10 is used to clone a path--the final step before
splitting that path in two. The logic is very similar to that of
Algorithm 9, but now the two paths are made to share bit-arrays and
probability arrays.

[0064] Algorithm 11 is used to terminate a path, which is achieved by
marking it as inactive. After this is done, the arrays marked as
associated with the path are considered. Since the path is inactive, it
is treated as not having any associated arrays, and thus all the arrays
that were previously associated with the path can have their reference
count decreased by one.
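The low-level path management of Algorithms 9-11 can be condensed into the following hypothetical sketch. Only the reference counting is modeled; the contents of the probability and bit arrays are elided, and the class and method names are ours, not the patent's.

```python
# Hypothetical sketch of assignInitialPath / clonePath / killPath
# (the patent's Algorithms 9-11), modeling only reference counts.
class PathManager:
    def __init__(self, L, m):
        self.L, self.m = L, m
        self.inactivePathIndices = list(range(L))
        self.activePath = [False] * L
        self.pathIndexToArrayIndex = [[0] * L for _ in range(m + 1)]
        self.arrayReferenceCount = [[0] * L for _ in range(m + 1)]
        self.inactiveArrayIndices = [list(range(L)) for _ in range(m + 1)]

    def assign_initial_path(self):
        l = self.inactivePathIndices.pop()
        self.activePath[l] = True
        for lam in range(self.m + 1):  # give the path its own array at every layer
            s = self.inactiveArrayIndices[lam].pop()
            self.pathIndexToArrayIndex[lam][l] = s
            self.arrayReferenceCount[lam][s] = 1
        return l

    def clone_path(self, l):
        lp = self.inactivePathIndices.pop()  # index for the new (child) path
        self.activePath[lp] = True
        for lam in range(self.m + 1):  # share arrays instead of copying them
            s = self.pathIndexToArrayIndex[lam][l]
            self.pathIndexToArrayIndex[lam][lp] = s
            self.arrayReferenceCount[lam][s] += 1
        return lp

    def kill_path(self, l):
        self.activePath[l] = False
        self.inactivePathIndices.append(l)
        for lam in range(self.m + 1):  # release this path's claim on its arrays
            s = self.pathIndexToArrayIndex[lam][l]
            self.arrayReferenceCount[lam][s] -= 1
            if self.arrayReferenceCount[lam][s] == 0:
                self.inactiveArrayIndices[lam].append(s)
```

Note that clone_path only increments reference counts; no data is copied at cloning time, which is the point of the lazy scheme.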

[0065] The goal of all the previously discussed low-level functions is
essentially to enable the abstraction implemented by the functions
getArrayPointer_P and getArrayPointer_C. The function getArrayPointer_P
is called each time a higher-level function needs to access (either for
reading or writing) the probability-pair array associated with a certain
path l and layer λ. The implementation of getArrayPointer_P is
provided in Algorithm 12. There are two cases to consider: either the
array is associated with more than one path or it is not. If it is not,
then nothing needs to be done, and a pointer can be returned to the
array. On the other hand, if the array is shared, a private copy is created
for path l, and a pointer is returned to that copy. This ensures that two
paths will never write to the same array. The function getArrayPointer_C
is used in the same manner for bit-pair arrays, and has exactly the same
essential implementation.
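The copy-on-write rule behind getArrayPointer_P and getArrayPointer_C can be sketched as follows. This is a hypothetical rendering of the essential logic of Algorithm 12; the function signature and argument layout are assumptions of this sketch.

```python
import copy

# Hypothetical sketch of the copy-on-write rule: if the array used by
# (layer lam, path l) is shared, make a private copy first, so that two
# paths never write to the same array.
def get_array_pointer(bank, path_to_array, ref_count, inactive, lam, l):
    s = path_to_array[lam][l]
    if ref_count[lam][s] == 1:
        return bank[lam][s]            # sole user: hand back the array itself
    # shared: claim a fresh array index and copy the contents into it
    sp = inactive[lam].pop()
    bank[lam][sp] = copy.deepcopy(bank[lam][s])
    ref_count[lam][s] -= 1
    ref_count[lam][sp] = 1
    path_to_array[lam][l] = sp
    return bank[lam][sp]
```

After the call, writes through the returned reference cannot affect any other path's view of the data, which is exactly the guarantee formalized later in Lemma 4.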

[0066] The above implementation deliberately sacrifices speed for
simplicity. Namely, each such function is called either before reading or
writing to an array. A variation to optimize speed conducts the copy
operation only before writing.

[0067] This completes the definition of almost all low-level functions.
The constraints that must be followed, and what is guaranteed when these
constraints are met, are provided next.

[0068] Definition 1 (Valid calling sequence): Consider a sequence
(ft)t=0T of T+1 calls to the low-level functions
implemented in Algorithms 8-12. The sequence is considered valid if the
following traits hold.

[0069] Initialized: The one and only index t for which ft is equal to
initializeDataStructures is t=0. The one and only index t for which
ft is equal to assignInitialPath is t=1.

[0070] Balanced: For 1≦t≦T, denote the number of times the
function clonePath was called up to and including t as

#clonePath^(t)=|{1≦i≦t: fi is clonePath}|.

[0071] Define #killPath^(t) similarly. Then for every
1≦t≦T, the algorithm requires that

1≦(1+#clonePath^(t)-#killPath^(t))≦L.
(13)

[0072] Active: A path l is active at the end of stage t, 1≦t≦T,
if the following conditions hold. First, there exists an index
1≦i≦t for which fi is either clonePath with
corresponding output l or assignInitialPath with output l. Second, there
is no intermediate index i<j≦t for which fj is killPath
with input l. For each 1≦t<T, it is required that if ft+1 has
input l, then l is active at the end of stage t.

[0073] Lemma 3: Let (ft)t=0T be a valid sequence of calls
to the low-level function implemented in Algorithms 8-12. Then, the run
is well defined: i) A "pop" operation is never carried out on an empty
stack, ii) a "push" operation never results in a stack with more than L
elements, and iii) a "read" operation from any array defined in lines 2-7
of Algorithm 8 is always preceded by a "write" operation to the same
location in the array.

[0074] Proof: The proof reduces to proving the following four statements
concurrently for the end of each step 1≦t≦T, by induction
on t.

[0075] I A path index l is active in the sense of Definition 1 if and only
if activePath[l] is true, which holds if and only if inactivePathIndices
does not contain the index l.

[0076] II The bracketed expression in (13) is the number of active paths
at the end of stage t.

[0077] III The value of arrayReferenceCount[λ][s] is positive if the
stack inactiveArrayIndices[λ] does not contain the index s, and is
zero otherwise.

[0078] IV The value of arrayReferenceCount[λ][s] is equal to the
number of active paths l for which pathIndexToArrayIndex[λ][l]=s.

[0079] Before completing formalization of the utility of the low-level
functions, the concept of a descendant path needs to be specified. Let
(ft)t=0T be a valid sequence of calls. Next let l be an
active path index at the end of stage t, where 1≦t≦T. Henceforth,
abbreviate the phrase "path index l at the end of stage t" by "[l,t]".
[l',t+1] is a child of [l,t] if i) l' is active at the end of stage t+1,
and ii) either l'=l or ft+1 was the clonePath operation with input l
and output l'. Likewise, [l',t'] is a descendant of [l,t] if
1≦t≦t'≦T and there is a (possibly empty) hereditary chain of
child relations leading from [l,t] to [l',t'].

[0080] The definition of a valid function calling sequence can now be
broadened by allowing reads and writes to arrays.

[0081] Fresh pointer: consider the case where t>1 and ft is either
the getArrayPointer_P or getArrayPointer_C function with input
(λ,l) and output p. Then, for valid indices i, allow read and write
operations to p[i] after stage t but only before any stage t'>t for
which ft' is either clonePath or killPath.

[0082] Informally, the following lemma states that each path effectively
sees a private set of arrays.

[0083] Lemma 4: Let (ft)t=0T be a valid sequence of calls
to the low-level functions implemented in Algorithms 8-12. Assume the
read/write operations between stages satisfy the "fresh pointer"
condition.

[0084] Let the function ft be getArrayPointer_P with input
(λ,l) and output p. Similarly, for stage t'>t, let ft' be
getArrayPointer_P with input (λ,l') and output p'. Assume that
[l',t'] is a descendant of [l,t].

[0085] Consider a "fresh pointer" write operation to p[i]. Similarly,
consider a "fresh pointer" read operation from p'[i] carried out after
the "write" operation. Then assuming no intermediate "write" operations
of the above nature, the value written is the value read.

[0086] A similar claim holds for getArrayPointer_C.

[0087] Proof: With the observations made in the proof of Lemma 3 at hand,
a simple induction on t is all that is needed.

[0088] The function pathIndexInactive given in Algorithm 13 is simply a
shorthand, meant to help readability.

[0089] B. Mid-Level Functions

[0090] Algorithms 14 and 15 are modified implementations of Algorithms 6
and 7, respectively, for the list decoding setting.

[0091] These implementations of the preferred embodiment loop over all the
path indices l. Thus, the implementations make use of the functions
getArrayPointer_P and getArrayPointer_C in order to assure that the
consistency of calculations is preserved, despite multiple paths sharing
information. In addition, Algorithm 14 contains code to normalize
probabilities. The normalization is retained for a technical reason (to
avoid floating-point underflow), and will be expanded on shortly.

[0092] Note that the "fresh pointer" condition imposed indeed holds. To
see this, consider first Algorithm 14. The key point to note is that
neither the killPath nor the clonePath function is called from inside the
algorithm. The same observation holds for Algorithm 15. Thus the "fresh
pointer" condition is met, and Lemma 4 holds.

[0093] Consider next the normalization step carried out in lines 21-27 of
Algorithm 14. Recall that a floating-point variable cannot be used to
hold arbitrarily small positive reals, and in a typical implementation,
the result of a calculation that is "too small" will be rounded to 0.
This scenario is called an "underflow".

[0094] Previous implementations of SC decoders were prone to "underflow".
To see this, consider line 1 in the outline implementation given in
Algorithm 2. Denote by Y and U the random vectors corresponding to y and
u, respectively. For b ∈ {0,1},

W_m^(φ)(y_0^(n-1), û_0^(φ-1) | b) = 2·P(U_φ = b, U_0^(φ-1) = û_0^(φ-1), Y_0^(n-1) = y_0^(n-1)) ≦ 2·P(U_φ = b, U_0^(φ-1) = û_0^(φ-1)) = 2^(-φ).

[0095] Recall that φ iterates from 0 to n-1. Thus, both probabilities
compared in line 1 of Algorithm 2 are upper bounded by 2^(-φ), and for
codes having length greater than some small constant, the comparison
ultimately becomes meaningless, since both probabilities are rounded to 0.

[0096] Preferred embodiments provide a fix to this problem. After the
probabilities are calculated in lines 5-20 of Algorithm 14, normalize the
highest probability to be 1 in lines 21-27. The correction does not
guarantee in all circumstances that underflows will not occur. However,
the probability of a meaningless comparison due to underflow will be
extremely low.
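The normalization fix can be sketched as follows; this is a minimal hypothetical rendering, operating on the probability pairs of all active paths at once, whereas the patent's Algorithm 14 performs the equivalent rescaling in-place in lines 21-27.

```python
# Minimal sketch of the underflow fix: after the probability pairs for
# all active paths are computed, rescale so that the largest value
# becomes 1. Rescaling by a common factor does not change which path is
# more likely (cf. Lemma 5).
def normalize(prob_pairs):
    sigma = max(max(pair) for pair in prob_pairs)  # largest probability overall
    if sigma == 0.0:
        return prob_pairs                          # nothing to rescale
    return [[p / sigma for p in pair] for pair in prob_pairs]
```

Since every entry is divided by the same σ, the relative order of path probabilities is preserved, which is the content of Lemma 5.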

[0097] Apart from minimizing the risk of underflows, normalization does
not alter the algorithm. The following lemma formalizes this claim.

[0098] Lemma 5: Assume "perfect" floating-point numbers. That is,
floating-point variables are infinitely accurate and do not suffer from
underflow/overflow. Next, consider a variant of Algorithm 14, termed
Algorithm 14', in which just before line 21 is first executed, the
variable σ is set to 1. That is, effectively there is no
normalization of probabilities in Algorithm 14'.

[0099] Consider two runs, one of Algorithm 14 and one of Algorithm 14'. In
both runs, the input parameters to both algorithms are the same.
Moreover, assume that in both runs, the state of the auxiliary data
structures is the same, apart from the following.

[0100] Recall that the present algorithm is recursive, and let
λ0 be the first value of the variable λ for which line 5
is executed. That is, λ0 is the layer in which (both)
algorithms do not perform preliminary recursive calculations. Assume that
at this base stage λ=λ0, the following holds: the values
read from P_(λ-1) in lines 15 and 20 in the run of Algorithm 14
are a multiple by α_(λ-1) of the corresponding values read
in the run of Algorithm 14'. Then, for every
λ≧λ0, there exists a constant α_λ
such that the values written to P_λ in line 27 in the run of
Algorithm 14 are a multiple by α_λ of the corresponding
values written by Algorithm 14'.

[0101] Proof: For the base case λ=λ0, inspection shows
that the constant α_λ is simply
(α_(λ-1))^2, divided by the value of σ after the main
loop has finished executing in Algorithm 14. The claim for a general
λ follows by induction.

[0102] C. High-Level Functions

[0103] Consider the topmost function, the main loop given in Algorithm 16.
Lines 1 and 2 provide that the condition "initialized" in Definition 1 is
satisfied. Also, for the inductive basis, the condition "balanced" holds
for t=1 at the end of line 2. Next, notice that lines 3-5 are in
accordance with the "fresh pointer" condition.

[0104] The main loop, lines 6-13, is the analog of the main loop in
Algorithm 5. After the main loop has finished, the algorithm selects (in
lines 14-16) the most likely codeword from the list and returns it.

[0105] Algorithms 17 and 18 are now introduced. Algorithm 17 is the analog
of line 6 in Algorithm 5, applied to active paths.

[0106] Algorithm 18 is the analog of lines 8-11 in Algorithm 5. However,
now, instead of choosing the most likely fork out of 2 possible forks, it
is typical to need to choose the L most likely forks out of 2 L possible
forks. The most interesting line is 14, in which the best ρ forks are
marked. Surprisingly, this can be done in O(L) time (See, Section 9.3 of
T. H. Cormen, et al., "Introduction to Algorithms, 2nd ed. Cambridge,
Mass.: The MIT Press (2001)). The O(L) time result is rather theoretical.
Since L is typically a small number, the fastest way to achieve the
selection goal would be through simple sorting. After the forks are
marked, first kill the paths for which both forks are discontinued, and
then continue the paths for which one or both of the forks are marked. In
the latter case, the path is first split. The procedure first kills
paths and only then splits paths in order for the "balanced" constraint
(13) to hold. This provides a limit of L active paths at a time.
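The core of the pruning step can be illustrated with the sorting-based selection the text recommends for small L. This is an illustrative sketch only; the (path index, bit value, probability) triple format is an assumption of this sketch, not the patent's data layout.

```python
# Illustrative sketch of keeping the L most likely of the 2L forks.
# Each of the (up to) L active paths forks into two children; the L
# best forks are marked for continuation and the rest are discarded.
def select_best_forks(fork_probs, L):
    """fork_probs: list of (path_index, bit_value, probability) triples."""
    ranked = sorted(fork_probs, key=lambda f: f[2], reverse=True)
    return ranked[:L]  # the L most likely forks survive
```

Sorting costs O(L log L) rather than the theoretical O(L) selection bound, but, as the text notes, for small L this is the faster choice in practice.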

[0107] A primary function of Algorithm 18 is to prune the list and leave
only the L "best" paths. This pruning is performed using the accumulated
likelihood of each path. The accumulated likelihood is stored in the
"probForks" array of Algorithm 18. The selection of the L "best" paths is
conducted on line 14 of Algorithm 18. Selection of the L "best" paths is
indeed achieved, in the following sense: at stage φ, each path is
ranked according to the probability

W_m^(φ)(y_0^(n-1), û_0^(φ-1) | û_φ).

[0108] By (9) and (11), this would indeed be the case if the floating-point
variables were "perfect" and the normalization step in lines 21-27 of
Algorithm 14 were not carried out. By Lemma 5, this is still the case when
normalization is carried out.

[0109] In Algorithm 19, the most probable path is selected from the final
list. As before, by (9)-(12) and Lemma 5, the value of
Pm[0][Cm[0][1]] is simply

W_m^(n-1)(y_0^(n-1), û_0^(n-2) | û_(n-1)).

[0112] Theorem 6: The space complexity of the SCL decoder is O(Ln).

[0113] Proof: All the data structures of the list decoder are allocated
in Algorithm 8, and it can be checked that the total space used by them
is O(Ln). Apart from these, the space complexity needed in order to
perform the selection operation in line 14 of Algorithm 18 is O(L).
Lastly, the various local variables needed by the algorithm take O(1)
space, and the stack needed in order to implement the recursion takes
O(log n) space.

[0114] Theorem 7: The running time of the SCL decoder is O(Ln log n).

[0115] Proof: Recall that m=log n. The following bottom-to-top table
summarizes the running time of each function. The notation O_Σ
will be explained shortly.

[0116] The first 7 functions in the table, the low-level functions, are
easily checked to have the stated running time. Note that the running
time of getArrayPointer_P and getArrayPointer_C is due to the copy
operation in line 6 of Algorithm 12 applied to an array of size
O(2^(m-λ)). Thus, as we previously mentioned, reducing the size
of the arrays has helped to reduce the running time of the list decoding
algorithm.

[0117] Next, consider the two mid-level functions, namely, recursivelyCalcP
and recursivelyUpdateC. The notation

recursivelyCalcP(m, φ) ∈ O_Σ(Lmn)

[0118] means that the total running time of the n function calls

recursivelyCalcP(m, φ), 0≦φ<2^m

[0119] is O(Lmn). To see this, denote by f(λ) the total running time
of the above with m replaced by λ. Splitting the running time of
Algorithm 14 into a non-recursive part and a recursive part gives, for
λ>0,

f(λ) = 2^λ·O(L·2^(m-λ)) + f(λ-1).

[0120] Thus, it follows that

f(m) ∈ O(Lm·2^m) = O(Lmn).

[0121] In essentially the same way it can be proven that the total running
time of recursivelyUpdateC(m, φ) over all 2^(m-1) valid (odd)
values of φ is O(Lmn). Note that the two mid-level functions are
invoked in lines 7 and 13 of Algorithm 16, on all valid inputs.

[0122] The running time of the high-level functions is easily checked to
agree with the table.

[0123] Modified Polar Codes

[0124] The plots in FIGS. 5A and 5B were obtained by simulation. The
performance of the decoder for various list sizes is given by the solid
lines in the figure. As expected, as the list size L increases, the
performance of the decoder improves.

[0125] A diminishing-returns phenomenon is noticeable in terms of
increasing list size. The reason for this turns out to be simple.

[0126] The dashed line, termed the "ML bound", was obtained as follows.
During simulations for L=32, each time a decoding failure occurred, a
check was conducted to see whether the decoded codeword ĉ was more likely
than the transmitted codeword c, that is, whether W(y|ĉ)>W(y|c). If
so, then the optimal ML decoder would surely misdecode y as well. The
dashed line records the frequency of the above event, and is thus a
lower bound on the error probability of the ML decoder. Thus, for an SNR
value greater than about 1.5 dB, FIG. 1 suggests an essentially optimal
decoder is provided when L=32.

[0127] Better performance seems unlikely, at least for the region in which
the decoder is essentially optimal. However, with a modified polar code of
a preferred embodiment, dramatically improved performance can be achieved.

[0128] During simulations, when a decoding error occurred, the path
corresponding to the transmitted codeword was often a member of the final
list. However, since there was a more likely path in the list, the
codeword corresponding to that path was returned, which resulted in a
decoding error. Thus, if intelligent selection at the final stage can
specify which path to pick from the list, then the performance of the
decoder can be improved.

[0129] Such intelligent selection can be implemented with preferred
embodiments that provide a modified polar code. Recall that there are k
unfrozen bits that are free to be set. Instead of setting all of them to
information bits to be transmitted, a concatenation scheme is employed.
For some small constant r, set the first k-r unfrozen bits to information
bits. The last r bits will hold the r-bit CRC (See Section 8.8 of W. W.
Peterson and E. J. Weldon, Error-Correcting Codes, 2nd ed. Cambridge,
Mass.: The MIT Press, 1972) value of the first k-r unfrozen bits. A
binary linear code having a corresponding r×k parity-check matrix
constructed as follows will perform well. Let the first k-r columns be
chosen at random and the last r columns be equal to the identity matrix.
This concatenated encoding is a variation of the polar coding scheme that
provides an important functionality for intelligent selection, while
being minor from the perspective of the polar code structure. The
concatenation incurs a penalty in rate, since the rate of the code is now
(k-r)/n instead of the previous k/n.
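The concatenation scheme might be sketched as follows. This is a hypothetical illustration: the patent does not fix a particular CRC, so the polynomial below (CRC-8, x^8+x^2+x+1, with zero initialization) is only an example, and the function names are ours.

```python
# Hypothetical sketch of filling the k unfrozen bits: the first k-r
# carry information, the last r carry a CRC of those information bits.
def crc_bits(bits, r=8, poly=0x107):
    """Bitwise MSB-first CRC over a list of 0/1 values (example polynomial)."""
    reg = 0
    for b in bits:
        reg = (reg << 1) | b
        if reg >> r:               # top bit set: reduce modulo the polynomial
            reg ^= poly
    for _ in range(r):             # flush r zero bits through the register
        reg <<= 1
        if reg >> r:
            reg ^= poly
    return [(reg >> i) & 1 for i in range(r - 1, -1, -1)]

def set_unfrozen_bits(info_bits, r=8):
    """Return the k unfrozen-bit values: k-r information bits + r-bit CRC."""
    return info_bits + crc_bits(info_bits, r)
```

The rate penalty is visible directly: k unfrozen bits now carry only k-r bits of information, so the rate drops from k/n to (k-r)/n.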

[0130] What is gained is an approximation to a perfect selector at the
final stage of decoding. Instead of calling the function
findMostProbablePath in Algorithm 19, do the following. A path for which
the CRC is invalid cannot correspond to the transmitted codeword. Thus,
refine the selection as follows: if at least one path has a correct CRC,
then remove from the list all paths having an incorrect CRC, and choose
the most likely remaining path. Otherwise, select the most likely path
overall, in the hope of reducing the number of bits in error, but with
the knowledge that there is at least one bit in error.
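The refined selection rule can be sketched in a few lines. The (codeword, probability) pair format and the caller-supplied crc_ok predicate are assumptions of this sketch, not the patent's exact interface.

```python
# Minimal sketch of CRC-aided final selection: discard paths whose CRC
# check fails, unless every path fails, in which case fall back to the
# most likely path overall.
def select_codeword(paths, crc_ok):
    """paths: list of (codeword, probability); crc_ok: predicate on codewords."""
    passing = [p for p in paths if crc_ok(p[0])]
    candidates = passing if passing else paths
    return max(candidates, key=lambda p: p[1])[0]
```

This is the "approximation to a perfect selector": a valid CRC does not prove a path is the transmitted codeword, but an invalid CRC rules it out.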

[0131] FIGS. 1 and 2 contain a comparison of decoding performance between
the original polar codes and the preferred concatenated polar codes of the
invention. A further improvement in bit-error-rate (but not in
block-error-rate) can be obtained when the decoding is performed
systematically as in E. Arikan, "Systematic polar coding," IEEE Commun.
Lett., vol. 15, pp. 860-862, (2011).

[0132] Advantageously, when the preferred algorithm finishes it outputs a
single codeword. In addition, its performance approaches an ML decoder
even with modest L values.

[0133] The solid lines in FIG. 1 correspond to choosing the most likely
codeword from the list as the decoder output. As can be seen, this choice
of the most likely codeword results in a large range in which the present
algorithm has performance very close to that of the ML decoder, even for
moderate values of L. Thus, the sub-optimality of the SC decoder indeed
does play a role in the disappointing performance of polar codes.

[0134] The invention also shows that the polar codes themselves are weak.
Instead of picking the most likely codeword from the list, an intelligent
selector can select the codeword in the list that was the transmitted
codeword (if the transmitted codeword is indeed present in the list).
Implementing such an intelligent selector turns out to require only a
minor structural modification of the polar code, with a minor penalty, in
preferred embodiments. With this modification, the performance of polar
codes is comparable to state-of-the-art LDPC codes, as can be seen in
FIG. 2.

[0135] FIG. 3 shows that there are LDPC codes of length 2048 and rate 1/2
with better performance than the present polar codes. However, to the
best of our knowledge, for length 1024 and rate 1/2, the present
implementation is slightly better than previously known codes when
considering a target error probability of 10^-4.

[0136] While specific embodiments of the present invention have been shown
and described, it should be understood that other modifications,
substitutions and alternatives are apparent to one of ordinary skill in
the art. Such modifications, substitutions and alternatives can be made
without departing from the spirit and scope of the invention, which
should be determined from the appended claims.

[0137] Various features of the invention are set forth in the appended
claims.