We have an AT91SAM9M10-G45-EK evaluation board and have built a dozen copies of it ourselves. The main chips (the CPU/SoC, NAND flash and DDR) come from our supplier.

The software stack is bootstrap + U-Boot + Linux 2.6.30. Most of our boards work well, but several fail to boot. The problem is that one or two bits of local variables on the stack (in DDR) get corrupted. It usually shows up during NAND ECC calculation when software ECC is enabled: about 4% of NAND page ECC calculations report an ECC error, i.e. the calculated ECC does not match the ECC read from NAND. When that happens and we call nand_calculate_ecc() again, the second result usually matches the ECC read from NAND.

We dug into the local variables on the stack during the two ECC calculations and found that they differ. Our log shows that bits 23..20 are wrong in some of the variables between the two calculations, e.g. B7D36895 vs. B7936895 and 24830BA6 vs. 24C30BA6.
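To see exactly which bits flip, each pair of values can be XORed: every 1 bit in the result marks a position that differs. The small C helpers below are our own illustration (the function names are hypothetical, not from any library). For both pairs quoted above the XOR is 0x00400000, so in these two examples only bit 22 differs; in the first pair it goes from 1 to 0 and in the second from 0 to 1.

```c
#include <stdint.h>

/* Returns a mask whose 1 bits mark the positions where a and b differ. */
uint32_t flipped_mask(uint32_t a, uint32_t b)
{
    return a ^ b;
}

/* Returns the number of bits that differ (population count of a ^ b). */
int count_flipped_bits(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;
    int count = 0;

    while (diff != 0) {
        diff &= diff - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Returns the index of the lowest differing bit, or -1 if a == b. */
int lowest_flipped_bit(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;

    for (int bit = 0; bit < 32; bit++) {
        if (diff & (UINT32_C(1) << bit))
            return bit;
    }
    return -1;
}
```

For example, `flipped_mask(0xB7D36895u, 0xB7936895u)` and `flipped_mask(0x24830BA6u, 0x24C30BA6u)` both return 0x00400000, with a single flipped bit at index 22. A fault that always hits the same bit position may be worth checking against the DDR data line for that bit.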

We have run some experiments, swapping the SoC, NAND flash and DDR between boards. The issue follows the SoC exactly: it always appears on whichever board a particular SoC is moved to.

My questions are:
1. Are those SoC chips defective, counterfeit, or is something else going on?
2. Are there known variants (e.g. silicon revisions) of the SAM9M10/G45, and is there a way to detect them so that we can work around the problem in software, for example with a patch from the Linux community?
3. If those SoC chips are defective or counterfeit, is there an easy way to detect them? So far we find them by swapping chips between boards, which takes too much effort.
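One cheap screening idea for question 3 (an untested sketch, not an Atmel-documented procedure) is a pattern test run on the target: write known values to a buffer in DDR, read them back, and OR together every bit that came back wrong, so a recurring position such as bit 22 would stand out. The function names and the choice of patterns below are our own assumptions. Under Linux the buffer is accessed through the MMU and data cache, so it should be much larger than the CPU caches. And because the corruption we see appears during normal stack accesses, a clean run does not prove that a chip is good.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NUM_PASSES 5

/* Value expected in word `index` during `pass` (hypothetical pattern set). */
static uint32_t pattern_value(size_t index, unsigned pass)
{
    switch (pass) {
    case 0:  return 0x00000000u;                                /* all zeros */
    case 1:  return 0xFFFFFFFFu;                                /* all ones */
    case 2:  return (index & 1) ? 0xAAAAAAAAu : 0x55555555u;    /* checkerboard */
    case 3:  return UINT32_C(1) << (index % 32);                /* walking ones */
    default: return (uint32_t)((uint32_t)index * 2654435761u);  /* address-derived */
    }
}

/*
 * Runs every pass over a buffer of `words` 32-bit words.
 * Returns the number of mismatching reads, or SIZE_MAX if allocation fails.
 * If flipped_bits is not NULL it receives the OR of all wrong bit positions.
 */
size_t run_pattern_test(size_t words, uint32_t *flipped_bits)
{
    /* volatile so the compiler must really store to and load from memory */
    volatile uint32_t *buf = malloc(words * sizeof *buf);
    size_t mismatches = 0;
    uint32_t flipped = 0;

    if (buf == NULL)
        return SIZE_MAX;

    for (unsigned pass = 0; pass < NUM_PASSES; pass++) {
        for (size_t i = 0; i < words; i++)
            buf[i] = pattern_value(i, pass);
        for (size_t i = 0; i < words; i++) {
            uint32_t diff = buf[i] ^ pattern_value(i, pass);
            if (diff != 0) {
                mismatches++;
                flipped |= diff;
            }
        }
    }

    free((void *)buf);
    if (flipped_bits != NULL)
        *flipped_bits = flipped;
    return mismatches;
}
```

Called as `run_pattern_test(words, &flipped)`, a return value of 0 means every word read back correctly; on a failing chip, a non-zero `flipped` mask would show which bit positions were affected.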