I am planning to revert this change. It works for tests like test/CodeGen/AArch64/copy-zero-reg.ll. However, when there are multiple branches, this patch degrades performance because of the large number of mov instructions in each fall-through. Here's an example of a test case where performance degrades.

Apr 3 2018

I think we should ask the backend whether we want to gang up loads and stores at all and, if so, how many to gang up together. Ganging up too many loads may result in high register pressure.
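
To make the suggestion concrete, here is a rough sketch of what such a backend query could look like. The names `TargetGangingInfo`, `shouldGangMemOps`, `maxGangedMemOps`, `ManyRegsTarget` and `chooseGangSize` are all hypothetical and made up for illustration. They are not existing LLVM APIs, and the limit of 4 is an arbitrary placeholder rather than a tuned value.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical interface (not an existing LLVM API): lets a backend say
// whether loads/stores should be ganged up, and how many at a time.
struct TargetGangingInfo {
  virtual ~TargetGangingInfo() = default;

  // Conservative default: do not gang up memory operations.
  virtual bool shouldGangMemOps() const { return false; }

  // Maximum number of loads to issue back-to-back before their stores.
  virtual unsigned maxGangedMemOps() const { return 1; }
};

// Hypothetical target with enough registers to keep a few loads in flight.
struct ManyRegsTarget : TargetGangingInfo {
  bool shouldGangMemOps() const override { return true; }
  unsigned maxGangedMemOps() const override { return 4; } // placeholder value
};

// Pick the group size for NumOps load/store pairs. Capping it at the
// target's limit bounds how many loaded values are live at once, which is
// what keeps register pressure under control.
unsigned chooseGangSize(const TargetGangingInfo &TI, unsigned NumOps) {
  assert(NumOps > 0 && "expected at least one memory operation");
  if (!TI.shouldGangMemOps())
    return 1; // one load immediately followed by its store
  return std::min(NumOps, std::max(1u, TI.maxGangedMemOps()));
}
```

With something like this, a target that does not opt in keeps the existing one-load-one-store ordering, while a target that opts in gets groups no larger than the limit it reports.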