Increasing robustness of runlength decoding for scantable access in mpeg12.c

Description

When decoding Australian MPEG-2 broadcast video with many stream errors, the decoder crashes with a memory access error because the index into the scan table is far outside the array.
A simple, dirty fix is to mask the index into the scan table to a value between 0 and 63, preventing the out-of-bounds access.
The attached diff file documents the required patch.
As a result, the robustness of the decoder is increased.

Yes, but it is somewhere in a 4 GByte recording. Crashing depends on the OS and memory allocation, so I had great difficulty getting a reproducible crash; any change of code or code ordering can make it disappear and come back randomly in other corrupted recordings.
Libmpeg2 has a similar robustness precaution, and there it is better implemented: they explicitly check for values < 0 or >= 64 and have a fallback in case the run-length decoding is corrupt. My proposal is crude but simple.

On posting to ffmpeg-devel: I hesitate to claim the standing to contribute there.

Even if the crash is not reproducible a valgrind error usually is.
And you _should_ be able to extract a small sample from the place it crashes and it should still crash (though it might cost some time to find the right spot).
Concerning the patch: the maintainer will have to say, but it might make more sense to just move the existing i > 63 check up so it is done before the access instead of after.
Though maybe even better: just extend the ScanTable struct so we can always read some more data. If "run" can e.g. never become more than 64, an extra 64 bytes at the end would fix it. This has the advantage of possibly helping other codecs, too, should they have similar issues.
And FFmpeg has a "fallback" for such corruption: it is called error concealment, it runs afterwards, and I don't remember anything that would indicate libmpeg2 is any better at it (which doesn't mean too much, though).

I will put some range-check debug logging before the scan table access to quickly detect where the error is, so I do not need to replicate a crash.
The wrong index was -31023 or something in that range, so adding a bit to the table won't help.

I instrumented the code to log the illegal values.
I use avformat for the demuxing and avcodec for decoding.
I will run the patched software on a week's worth of recordings from 11 channels.
If there is no more crashing (and there used to be a lot), then at least I am happy.
But there is of course the possibility that this index error is a side effect of another memory corruption error; then I am back to square one.

Part of the scantable access is done by typed array indexing, but the last use case is done by pointer arithmetic.

Valgrind runs the executable on an emulated CPU; what the source code looks like has no relevance. However, valgrind can sometimes miss a bad access if there happens to be valid memory at that address, but that is rather unlikely.


The run comes out of a VLC table, and the VLC table should never contain an entry for a run length outside the 0-63 range, or something like that.
Thus I can't see how you could get that value except through corrupt VLC tables.
Haven't yet tested the sample.

Don't you think it would have helped a lot if you had mentioned that you used the "fast" flag?
While its effect on decoding may not be fully documented, the point of "fast" is to decode as fast as possible. It is not intended to be used on corrupted streams: as long as it is not exploitable, crashing is perfectly acceptable if it allows for faster code, and error concealment is unlikely to work.

Your remark clearly proves I should NOT post on the developers mailing list.
Do accept my apologies.
As the users of Comskip want the fastest possible processing, I always have "fast" set, and lowres is either 1 or 2. But if possible I would like to prevent crashing on off-air recordings, so I am perfectly happy with the small check I added, as it stopped all crashing on many terabytes of DVB-T recordings.

Do you have any numbers on how much "fast" helps? MPEG-2 decoding should be really cheap by today's standards, and I wouldn't expect it to help much.
Worse, with errors like in your case it will even process a large amount of data for no purpose.
My recommendation would be:
1) Try without the fast flag. You get error concealment and it might not have a relevant cost. (Note: only for decoding; if you are also encoding, you should keep it for that part.)
2) Should it be relevantly slower, try leaving the fast flag in but using the non-fast functions for this case.
3) Add the > 63 check as in the other cases instead. It should be only slightly slower, if at all, than your change for the normal decoding case, and it should be _much_ faster for the error case (if you reached values like 7000, that would indicate you went around that loop about 100 times more than necessary).

And I don't want you to apologize, and I don't want you to give up because I sometimes get a bit annoyed. But it is a really good idea to be aware of any non-default configuration you have chosen and to mention it in bug reports, and also to make sure you have a solid justification for it, since such options might cause you a significant maintenance cost in the long term. Apart from that, there clearly is a lack of documentation that we are to blame for.