Thread: Enhancing the concept of fountain codes for better noise handling (joint correction)

1. Enhancing the concept of fountain codes for better noise handling (joint correction)

It is not exactly a data compression topic, but there are related problems here, like creating a few separate streams for lossy video compression of a single file, such that any subset of them would allow to decode the video, and the more streams we have, the better quality.

Probably many of you have met some fountain codes, also referred as rateless erasure codes – encoding a message as some number of packets, such that any large enough subset of them would allow to decode the message (a smaller subset is usually useless). Older approaches were usually based on Reed-Solomon method: that coefficients of degree d polynomial can be inferred from values in any d+1 points. In 2002 Luby has started more efficient LT codes: building packets as XOR as subsets of data blocks, chosen in a pseudorandom way with optimized probability distribution for subset sizes.

It allows to omit the costly problem of retransmission: we just wait until we obtain a large enough set of packets. However, there is usually also another issue: a noise damaging our packets (making them rather useless for pure fountain codes). We can protect against it by adding some forward error correction (FEC): attach some redundancy to each packet, which will allow to handle some noise level. However, the exact noise pattern is chosen stochastically, so sometimes we unnecessarily use too much redundancy, and sometimes insufficiently – losing the whole packet.
This suboptimality can be handled by enhancing the concept of rateless erasure codes to allow also for some noise level for received packets. Assuming binary symmetric channel (BSC): that each bit has epsilon independent probability of being flipped (0 <-> 1), the theoretical informational content of such damaged packet is 1 – h(epsilon) bits/position (where h(p) = -p lg(p) – (1-p) lg(1-p) is Shannon entropy). So theoretically we could perform a joint correction/reconstruction of such packets if the sum of "1 – h(epsilon)" exceeds the message size:

I have just implemented such joint reconstruction codes (JRC), reconstructing the message from all available scraps of information, using correction trees approach (like sequential decoding for convolutional codes, but with larger: 64 bit state): https://github.com/JarekDuda/JointReconstructionCodes
It is in an early experimental stage now, but I would be grateful for any comments.
The reconstruction is much more demanding than for pure fountain codes (without FEC), using a larger number of packets for this purpose could be impractical – so it might be beneficial to concatenate it with some fountain codes.

Have you met something like that before?
What real life applications could you think of?

2. Thanks:

Cyan (24th September 2014)

3. These are very interesting issues, and you should continue your journey even if there is no immediate practical application right now. You are getting better at it, and that's what matters.

As far as I know, fountain code, although not new, can nonetheless be considered "highly specialized"; I can't count the number of ridiculous transmission protocols which just repeat their messages in the hope to be listened at least once (especially within renewed proliferation of "narrowband M2M" protocols). At the very least, fountain code would improve the situation quite drastically.
Associating with FEC is basically standard.

Regarding joint correction : your proposal seems to improve the amount of useful data that can be extracted from a fixed amount of transmission. In theory, it's good news, but as I don't know LT code, I can't tell if a "part of a block" is nonetheless useful enough for reconstruction.
The key issue is that it is associated to much higher computation cost, so it's difficult to tell if this is really beneficial. In the world of IoT, probably not, since the objective is to reduce energy costs. For high bandwidth transmission though (TV and satellites come to mind), it could be. I would probably start considering satellites, since that's where bandwidth costs are higher.

Regards

4. Thanks:

Jarek (25th September 2014)

5. Hi Yann,
My journey with "correction tree approach" (connecting redundancy or freedom using a large pseudorandom state (64 bit)) around known or new situations of information theory is nearly over - after correcting BSC, constrained coding (encoding with constraints known only to the sender, can be also used for rate distortion), deletion channel and now this enhancement of fountain codes, the only interesting (?) left is distributed source coding - I will probably eventually write it if I will find some nice new application.
It is difficult to call JRC as "correction" as we don't want to correct the packets - instead, we go straight to reconstructing the message from all available pieces of information.

Regarding protocols - I can believe that there are still used wasteful ones, like the extreme: simple split and and send copies until almost certainly the receiver has obtained the whole set. From the other side, retransmission means waiting for classifying a packet as lost ... pure fountain codes would already greatly improve situation in both cases.
Regarding noise for LT and other fountain codes - they construct packets as XOR of some subset of blocks of the message. So each bit-flip would propagate to multiple blocks - making the situation even worse.

Regarding computational cost, if noise is tiny or none, this cost would be also tiny - just any N+1 slightly damaged blocks would allow to quickly reconstruct the message.
For larger noises, you would already need to attach a (costly!) forward error correction to fountain codes - joint reconstruction is exactly to make this part more efficient: intuitively connect redundancies of packets, allowing to transfer abundant redundancy from less damaged packets, to help those more damaged.
This JRC can be probably also made using some LDPC, however correction trees often turn out more effective (from http://arxiv.org/abs/1204.5317 ):

It still needs more work - from set of tests, through optimizations, to better theoretical understanding - I will probably do a part of it in some future ... but generally this approach needs a larger research group ...

Regards

6. I'm a novice at this stuff, but it sounds like what you're describing is a way to get error correction and reliable transmission with a single code rather than two different codes. That is great.

Assuming an application like streaming video, the ultimate encoding would not only allow reconstructing the signal from any n packets, but also degrade gracefully below n packets the way analog would. I don't think I can help much with that...

7. Assuming an application like streaming video, the ultimate encoding would not only allow reconstructing the signal from any n packets, but also degrade gracefully below n packets the way analog would.
Unfortunately, the entire point of error correction algorithms is to transform an otherwise slow degradation into a guarantee of perfection, up to a certain threshold. The unavoidable side effect is that, beyond the threshold, the error cliff is abrupt.
All approaches to better error correction try to make the threshold farther and less costly, making the error cliff even more abrupt in the process.

8. nburns, it indeed can seen as merging two processes: of error correction and of reconstruction, but there are some essential differences:
1) we don't even try to correct the packets, instead we go directly to reconstructing the message from all available pieces of information,
2) this joint reconstruction makes it more effective. In fountain codes + FEC you would separately protect each packet: add some redundancy level which can protect up to some noise level. But the actual damage pattern is stochastic - we need to set this redundancy lever higher and still there remain some probability that it will turn out insufficient - loosing the whole packet.
In JRC we connect redundancy of all packets, making that the limit for successful repair is set for the set of packets as a whole, instead of individually for each of them. So intuitively we can transfer abundant redundancy between packets when needed.
3) in JRC we use the fact that the set of packets is already redundant, while in fountain codes + FEC we need to attach some additional (redundant ) redundancy. Generally concatenating error correction codes means drastic performance loss - kind of their inefficiencies are multiplied.

Regarding video compression - indeed the perfect situation would be that you can get a poor quality from any single stream, and the quality should improve when you get more of them (no matter which).
However, it is more complicated than e.g. loading succeeding low->high frequencies improving quality e.g. in JPEG2000. Here each stream would also need to carry low frequencies - but somehow "different aspects of them" - there would be rather needed a sophisticated rate distortion scheme.
In rate distortion you choose a set of codewords you will to use to approximate bit sequences. So the space of all bit sequences (hypercube) is covered by balls (usually Hamming distance) with centers in codewords - sequences from a ball are approximated (encoded) as their centers.
Here each stream should use a different rate distortion (different centers) and knowing a few of them (having a few streams), you can get better approximation: as average of all centers ... but it sounds really difficult to do it well.

9. Thanks:

nburns (25th September 2014)

10. I have just finished a paper about JRC: https://dl.dropboxusercontent.com/u/12405967/joint.pdf

So the basic income is that while in standard error correction we need to know what redundancy/protection level will be required – we need a priori knowledge of the damage level, JRC allows the same using only a posteriori information about the damage level. There is no artificial addition of redundancy, instead the packets are treated as simultaneously data and redundancy.

Some examples of application – situations when we have incomplete or no a priori knowledge of the final damage level, hence standard approach requires costly overprotection, which still can be insufficient:

• Storage media – the final damage depends on age, conditions, random accidents. For example imagine 1000 DVD copies of a movie – they will degrade, finally exceeding the applied protection/redundancy level, making all of them useless. In contrast, JRC would allow to reconstruct the original content from some number of damaged ones, where the number depends of the actual damage level,
• Various networks – while sending packets we usually don’t control the route they will choose, and different routes can correspond to a different damage level. So standard approach requires performing error correction for each link of the network – increasing hardware cost and energy consumption of nodes (e.g. battery-operated sensor networks). JRC allows to shift all error correction to a single correction+reconstruction performed by the receiver,
• Watermarking – while embedding we don’t know e.g. which frames of a movie, or what capturing conditions will be used. In JRC approach we should just spread the information in packets,
• Rapidly varying transmission conditions, e.g. radio or acoustic – the receiver has usually a better (a posteriori) information about the actual damage level.

I would be grateful for any comments. What other applications could you think of?

11. Thanks:

Cyan (20th May 2015)

12. New version: http://arxiv.org/pdf/1505.07056
lots of polishing and additions, I plan to submit it in a month - I would be grateful for any comments.

1) broadcasting problem (missing basic application): while every packet (receiver) can have a different damage level, the broadcaster has to prepare universal packets - cannot take into consideration individual damage levels of every packet/receiver.
The question of achievable rate in this case seems highly nontrivial - standard approach requires not only knowledge of individual damage levels, but also individually preparing packets (adding redundancy accordingly to the damage level), what is unavailable in the broadcasting problem.
However, JRC allows to achieve optimal rates here: as if the broadcaster would use all individual damage levels.

2) 3 level decoding for storage media. For example we could write DVD disks (or HDDs ...), such that there would be basically 3 options:
- fast straightforward decoding from any single undamaged disk,
- error correction from any single disk if damage is low,
- for high damage level, error correction from multiple disks.

3) Analyzed idealized unknown noise channel problem: large number of channels, each of them has a random epsilons \in [0,1/2] bit-flip probability chosen with uniform probability distribution, the epsilons are unknown to the sender.
Theoretical rate limit if the sender would know the epsilons would be integral of (1-h(epsilon)) ~ 0.27865 (bits/potentially damaged bit)
- in standard fountain codes + forward error correction, using a constant level of redundancy, packets below this level are corrected, the remaining have to be discarded. The lack of some packets (discarded) is handled by fountain code.
The optimal rate is 0.11712 here for choosing forward error correction for epsilon = 0.15455 damage level.
So it is far from what should be possible for this situation.
- pure JRC allow to asymptotically approach the 0.27865 rate, but in practice it is limited to a relatively small number of packets.
Hence, it is beneficial to concatenate JRC with fountain codes ... Section 5.

4) Interleaving: to relax limitation of the number of packets to reconstruct from.

ps. Correction possibilities for storage media and broadcasting - unavailable in standard approach (encoder needs to know damage level to choose redundancy level), but available for JRC (decoder uses packets accordingly to their damage levels):

13. Thanks:

Cyan (20th August 2015)

Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•