After investigating, I found the problem: the code reads memory outside the bounds of the array pointed to by register R1 in the macros h264_chroma_mc8 and h264_chroma_mc4 in libavcodec/arm/h264dsp_neon.S (version 0.8.10) or libavcodec/arm/h264cmc_neon.S (version 0.11.1). I fixed the bug by modifying those two macros. Here are the updated macros for version 0.8.10:

As shown in the code, register R1 points to the array src (type uint8_t*). The idea of the modification is to test whether register R3 (the argument h in the C caller) is less than or equal to zero before reading the elements pointed to by register R1. If it is, the read is skipped and execution jumps to the end of the function.
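To illustrate the idea in plain C rather than NEON assembly, here is a minimal scalar sketch of an 8-pixel-wide chroma motion compensation routine with the guard on h placed before any read from src. The function name chroma_mc8_guarded and its exact signature are hypothetical, chosen only for this example; this is not the code from the macros. It assumes the standard bilinear weights for H.264 chroma interpolation and that dst and src share the same stride.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of an 8-wide chroma MC block.
 * It is NOT the NEON macro; it only illustrates testing h
 * before touching src. x and y are the fractional offsets (0..7). */
static void chroma_mc8_guarded(uint8_t *dst, const uint8_t *src,
                               ptrdiff_t stride, int h, int x, int y)
{
    /* Guard: if there are no rows to produce, read nothing at all. */
    if (h <= 0)
        return;

    /* Bilinear weights; they always sum to 64. */
    const int A = (8 - x) * (8 - y);
    const int B = x * (8 - y);
    const int C = (8 - x) * y;
    const int D = x * y;

    /* Each output row reads the current and the next source row,
     * so exactly h + 1 rows of 9 bytes are read in total. */
    for (int row = 0; row < h; row++) {
        for (int i = 0; i < 8; i++) {
            dst[i] = (uint8_t)((A * src[i] + B * src[i + 1] +
                                C * src[stride + i] +
                                D * src[stride + i + 1] + 32) >> 6);
        }
        dst += stride;
        src += stride;
    }
}
```

In this model the reads are bounded by the loop condition, so no row beyond src + h * stride is touched. An assembly version that preloads rows ahead of the loop would need the equivalent check before each preload, which is where the possible speed cost comes from.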

I tested the code using several videos, and it works. For version 0.11.1, the modification is the same.

This patch may slow the code down a little, but it works. The original code seems too ambitious to be correct. Anyway, this is just my proposal. If someone has more efficient code to fix the bug, I would appreciate it.

I made a mistake during testing: the patch does not work. It breaks H.264 decoding, as can be seen by running "make fate" with --samples=...
The patch has therefore been reverted. Please submit a working patch, and make sure that any speed loss is kept to a minimum if it cannot be avoided entirely.