I'm no expert on the fundamentals of thread memory, but I did find something interesting. If you add the 1M scalars to the array from within the threads, it does gain the memory you are looking for. I had to reduce
the number from 100 to 50, though, because at 100 the process was killed by the kernel.
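
To make the idea concrete, here is a rough sketch in Python. It is not the code from the original test. The names `peak_bytes` and `worker` are made up for illustration, and the sizes are kept far below 1M scalars and 50 threads so it runs quickly. It assumes the point is simply that each thread fills its own private array, and it uses `tracemalloc` to compare peak allocation for one thread against several.

```python
import threading
import tracemalloc


def peak_bytes(n_threads, n_items):
    """Return peak traced memory while n_threads each fill a private list.

    Hypothetical helper: every worker builds its own list of n_items
    distinct float objects and keeps it alive until the measurement is taken.
    """
    ready = threading.Barrier(n_threads + 1)  # all workers plus the main thread
    done = threading.Event()

    def worker():
        data = []
        for i in range(n_items):
            data.append(float(i))  # a new float object per element
        ready.wait()   # signal that this thread's list is fully populated
        done.wait()    # hold the list in memory until main has measured

    tracemalloc.start()
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()

    ready.wait()  # returns once every worker has filled its list
    _, peak = tracemalloc.get_traced_memory()

    done.set()
    for t in threads:
        t.join()
    tracemalloc.stop()
    return peak


if __name__ == "__main__":
    print("1 thread: ", peak_bytes(1, 100_000))
    print("8 threads:", peak_bytes(8, 100_000))
```

With the arrays populated inside the threads, the peak should grow roughly in step with the thread count, which is why a large enough count can exhaust memory. The exact numbers depend on the interpreter and platform.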