HDMI connection causes crash on i.MX6 Solo

On roughly 50% of our custom i.MX6 Solo boards we're seeing a crash when an HDMI monitor is connected. The crash occurs once the system is fully booted and the HDMI cable is inserted. Nothing useful is printed on the console port, and the crash seems to be very low-level, as the user LED (tied to a heartbeat trigger) stops flashing. Additionally, if an HDMI cable/monitor is already connected at boot, the kernel halts during boot with no useful output (see attached boot log).

Furthermore, I've been able to determine that it's not just connecting an HDMI monitor (of which I've tried many different types) that causes the crash. I can cause the crash simply by pulling the HDMI HPD pin to 5 V.

Lastly, and hopefully this is the smoking gun: Normally we use a kernel with a bundled initramfs which mounts/loads a squashfs and an aufs overlay rootfs. On boards that are known to fail, the crash does not occur if I use a kernel without a bundled initramfs, even though the version and defconfig are identical (obviously the boot does not complete, but it fails at init as expected and I get an image on the HDMI).

My bundled uImage is ~9MB in size. The uImage is loaded to 0x12000000, dtb loaded to 0x18000000. Linux is based on the 3.10.17 GA release, as is the rootfs and initramfs (built in Yocto).
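As a quick sanity check on the load addresses above, the arithmetic can be sketched as below. This is only a rough check under assumptions: the 9 MiB image size is the approximate figure quoted, the DTB size is a guess, and the helper name is hypothetical. It covers the load regions only, not where the kernel decompresses itself or unpacks the initramfs.

```python
MIB = 1024 * 1024

def regions_overlap(start_a, size_a, start_b, size_b):
    """Return True if [start_a, start_a + size_a) intersects [start_b, start_b + size_b)."""
    return start_a < start_b + size_b and start_b < start_a + size_a

UIMAGE_ADDR = 0x12000000      # load address from the post
DTB_ADDR = 0x18000000         # load address from the post
UIMAGE_SIZE = 9 * MIB         # approximate size from the post
DTB_SIZE_GUESS = 64 * 1024    # assumption: the real DTB size is not given

# Space available between the start of the uImage and the start of the DTB.
gap = DTB_ADDR - UIMAGE_ADDR

print(f"gap between load addresses: {gap // MIB} MiB")
print("load regions overlap:",
      regions_overlap(UIMAGE_ADDR, UIMAGE_SIZE, DTB_ADDR, DTB_SIZE_GUESS))
```

With these numbers the two load regions are well separated, so a plain overlap at load time does not look like the explanation.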

So, my questions are:

1.) Why would a bundled initramfs cause this failure? When no HDMI is connected it seems to operate just fine.

2.) Why does this only occur on 50% of our boards? Boards from the same batches will work or fail, there does not seem to be a common hardware issue.

Thanks for the suggestion. I tested the offending board with the DDR3 Stress Tester using the calibration values I previously obtained and put in U-Boot, and the board passed 100% over a couple of hours of testing.

Also, mtest in U-Boot passed easily over a range of memory addresses.

Do you think that's sufficient for testing memory, or do I need to run something like stressapptest? Unfortunately, running from an NFS root is tough as we don't have an Ethernet connection, just WiFi.

I ran memtester and was unable to generate any errors after a few iterations of testing. I'm currently working on getting stressapptest into my build and running; hopefully that yields something different.

Regarding the layout, we did have FSL verify it and I know it meets all the design recommendations. In terms of speed, I'm currently running at 400 MHz.

Unfortunately the NFS mount isn't much of an option in the near future, as I'll need to order the adapter. In the meantime, can you think of other ways I might be able to trigger burst mode?

Agreed, I've worked around this issue on some of our boards but it would be nice to know why it doesn't affect all of them.

The root cause appears to be an HDMI PHY Frame Composer Overflow interrupt storm in Linux when HDMI has already been enabled by U-Boot. The rate of the interrupt seems to vary with temperature on some of my boards. If the rate is slow enough, the kernel will recover, fix the interrupt and boot.
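For anyone chasing a similar storm on a board that does manage to boot, one way to spot it is to compare two snapshots of /proc/interrupts and look for a line whose count is climbing abnormally fast. The following is a minimal sketch, not the fix itself: the function names, the threshold, and the sample snapshots (including the IRQ numbers and device names) are all made up for illustration.

```python
def parse_interrupts(text):
    """Sum the per-CPU counts on each line of a /proc/interrupts snapshot."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Lines such as "Err:" may carry fewer count columns than CPUs.
        counts = [int(f) for f in fields[:ncpu] if f.isdigit()]
        totals[label.strip()] = sum(counts)
    return totals

def irq_rates(before, after, interval_s):
    """Interrupts per second for every IRQ present in both snapshots."""
    a, b = parse_interrupts(before), parse_interrupts(after)
    return {irq: (b[irq] - a[irq]) / interval_s for irq in b if irq in a}

def storm_suspects(before, after, interval_s, threshold_hz=10000.0):
    """IRQs whose rate exceeds threshold_hz (an arbitrary, assumed limit)."""
    rates = irq_rates(before, after, interval_s)
    return {irq: hz for irq, hz in rates.items() if hz > threshold_hz}

# Illustrative snapshots only; these are NOT captured from the board.
SAMPLE_BEFORE = """\
           CPU0
 29:       1000       GIC  timer_example
147:        500       GIC  hdmi_example
"""
SAMPLE_AFTER = """\
           CPU0
 29:       1100       GIC  timer_example
147:     250500       GIC  hdmi_example
"""

print(storm_suspects(SAMPLE_BEFORE, SAMPLE_AFTER, interval_s=1.0))
```

On a real target the two strings would come from reading /proc/interrupts twice with a known delay in between. This only helps when the interrupt rate is low enough for userspace to keep running, which matches the boards that recover.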