The problem with this code is that it jumps backward and forward through memory, so the
cache cannot keep up with its access pattern. To resolve this problem, we should rearrange
the arrays using the following code.
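
As a minimal sketch of what such a rearrangement might look like, assume purely for
illustration that the hot loop sums the columns of a row-major matrix. The function names
sum_columns_strided, transpose, and sum_columns_contiguous are hypothetical and are not
taken from the code discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum each column of a row-major rows x cols matrix.
 * The inner loop jumps cols elements at a time, so consecutive reads
 * land on different cache lines once cols is large. */
void sum_columns_strided(const float *m, size_t rows, size_t cols, float *out)
{
    for (size_t c = 0; c < cols; ++c) {
        float acc = 0.0f;
        for (size_t r = 0; r < rows; ++r)
            acc += m[r * cols + c];      /* stride of cols floats */
        out[c] = acc;
    }
}

/* Rearrange once: dst becomes the cols x rows transpose of src. */
void transpose(const float *src, size_t rows, size_t cols, float *dst)
{
    assert(src != dst);                  /* in-place transpose is not supported */
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            dst[c * rows + r] = src[r * cols + c];
}

/* Same result as sum_columns_strided, but reading the transposed copy,
 * so the inner loop walks memory one element after another. */
void sum_columns_contiguous(const float *t, size_t rows, size_t cols, float *out)
{
    for (size_t c = 0; c < cols; ++c) {
        float acc = 0.0f;
        for (size_t r = 0; r < rows; ++r)
            acc += t[c * rows + r];      /* stride of one float */
        out[c] = acc;
    }
}
```

The transpose is itself a strided pass, so a rearrangement like this is likely to pay off
only when the rearranged data is read many times afterwards.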

Performance Tools

You can use gprof if you are using GCC. Find the hot spots before you optimize your code
with SIMD or similar techniques. Otherwise, your effort may go into code that hardly affects
the total run time.
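
A typical gprof session with GCC can be sketched as follows. The file name demo.c and the
two functions are hypothetical, chosen only so that one of them clearly dominates the
profile. The commands in the comment use the standard -pg flag and the gmon.out file that
an instrumented program writes when it exits normally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical workload for illustration; build and profile with, e.g.:
 *
 *   gcc -O2 -pg demo.c -o demo     (-pg adds gprof instrumentation)
 *   ./demo                         (writes gmon.out on normal exit)
 *   gprof ./demo gmon.out          (prints flat profile and call graph)
 *
 * demo.c is an assumed file name, and its main() is assumed to call
 * the two functions below. */

/* Cheap: one pass over the data. */
void fill_ramp(double *x, size_t n)
{
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        x[i] = (double)i;
}

/* Expensive: n * n work, so this should dominate the flat profile. */
double pairwise_abs_diff_sum(const double *x, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            total += (x[i] > x[j]) ? x[i] - x[j] : x[j] - x[i];
    return total;
}
```

At higher optimization levels small functions may be inlined and then no longer appear as
separate entries in the profile, so it can help to compare the output with a build at a
lower optimization level.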

SIMD optimization

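Next, we can optimize this code using SIMD.
SIMD code depends heavily on data layout, in particular on the choice between
AoS (Array of Structures) and SoA (Structure of Arrays). Both layouts are explained in the
optimization material on Intel's and AMD's websites, so please read it if you want to know more.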
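
The difference between the two layouts can be sketched as follows. The types PointAoS and
PointsSoA and both functions are hypothetical names introduced for this illustration. The
SSE path is used only when the compiler defines __SSE__; otherwise the plain scalar loop
does all the work.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics, x86 only */
#endif

/* AoS: one struct per point; x, y, z are interleaved in memory. */
typedef struct { float x, y, z; } PointAoS;

/* SoA: one array per field; all x values are contiguous, and so on. */
typedef struct { float *x, *y, *z; size_t n; } PointsSoA;

/* Copy hypothetical AoS data into preallocated SoA arrays. */
void aos_to_soa(const PointAoS *in, size_t n, PointsSoA *out)
{
    assert(out != NULL && out->x != NULL && out->y != NULL && out->z != NULL);
    for (size_t i = 0; i < n; ++i) {
        out->x[i] = in[i].x;
        out->y[i] = in[i].y;
        out->z[i] = in[i].z;
    }
    out->n = n;
}

/* Squared length of every point: len2[i] = x*x + y*y + z*z. */
void squared_lengths(const PointsSoA *p, float *len2)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Four points per iteration; unaligned loads keep the sketch simple. */
    for (; i + 4 <= p->n; i += 4) {
        __m128 x = _mm_loadu_ps(p->x + i);
        __m128 y = _mm_loadu_ps(p->y + i);
        __m128 z = _mm_loadu_ps(p->z + i);
        __m128 s = _mm_add_ps(_mm_add_ps(_mm_mul_ps(x, x), _mm_mul_ps(y, y)),
                              _mm_mul_ps(z, z));
        _mm_storeu_ps(len2 + i, s);
    }
#endif
    /* Scalar tail, and the whole loop when SSE is unavailable. */
    for (; i < p->n; ++i)
        len2[i] = p->x[i] * p->x[i] + p->y[i] * p->y[i] + p->z[i] * p->z[i];
}
```

With the SoA layout each load fetches four consecutive values of the same field, which is
the shape a SIMD register expects. With the AoS layout the same four x values would be
spread across four structs and would have to be gathered first.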