The choice as to which levels in a memory hierarchy are exposed within a programming language or API can be critical. Expose too many, and you risk programmability, and performance portability.

Heterogeneous computing and GPGPU aims to repurpose the data-parallel capability of graphics and commodity hardware for general calculations. GPGPU APIs, which now include OpenCL SYCL; Apple's Metal; and Qualcomm's MARE; must all decide on a suitable abstraction for hardware memory levels. Established GPGPU APIs such as CUDA, C++AMP, and OpenCL offer language support for four levels of volatile memory. However, while the presence of GPUs are now essentially ubiquitous, the diminished role of discrete graphics cards invigorates questions regarding memory abstraction.

The multicore revolutionaries have now ceded mobile computing to the CPU-GPU system-on-chips; firmly established in mainstream options such as the Qualcomm Snapdragon; Samsung Exynos; and the AMD APU series. Meanwhile, the HSA Foundation builds upon a bedrock of uniform memory access; the Android GPGPU API, Renderscript, eschews explicit memory address spaces; and CUDA now offers "unified" memory. Can caché once again mean hidden?