Authors

Abstract

As computation becomes increasingly limited by data movement and energy consumption, exploiting locality
throughout the memory hierarchy becomes critical to continued performance scaling. Moving computation closer
to memory presents an opportunity to reduce both energy and data movement overheads. We explore the use of 3D
die stacking to move memory-intensive computations closer to memory. This approach to processing in memory (PIM)
addresses some drawbacks of prior research on in-memory computing and is commercially viable in the foreseeable
future.

Because 3D stacking provides increased bandwidth, we study throughput-oriented computing using programmable
GPU compute units across a broad range of benchmarks, including graph and HPC applications. We also introduce a
methodology for rapid design space exploration by analytically predicting performance and energy of in-memory
processors based on metrics obtained from execution on today's GPU hardware. Our results show that, on average,
viable PIM configurations incur moderate performance losses (27%) in return for significant energy efficiency
improvements (76% reduction in energy-delay product, EDP) relative to a representative mainstream GPU at 22nm
technology. At 16nm
technology, on average, viable PIM configurations are performance competitive with a representative mainstream
GPU (7% speedup) and provide even greater energy efficiency improvements (85% reduction in EDP).
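
For readers unfamiliar with the metric, the following is a minimal sketch of how the quoted percentages can be read. It assumes the usual definition of the energy-delay product and assumes that "performance loss" refers to a throughput ratio. The symbols E (energy), T (execution time), and the PIM/GPU subscripts are introduced here for illustration and are not taken from the paper.

```latex
\mathrm{EDP} = E \cdot T, \qquad
\text{EDP reduction} = 1 - \frac{E_{\mathrm{PIM}}\, T_{\mathrm{PIM}}}{E_{\mathrm{GPU}}\, T_{\mathrm{GPU}}}, \qquad
\text{performance loss} = 1 - \frac{T_{\mathrm{GPU}}}{T_{\mathrm{PIM}}}
```

Under that reading, a 76% reduction in EDP corresponds to an EDP ratio of 0.24, and a 27% performance loss corresponds to a runtime ratio of roughly 1/0.73, or about 1.37. Because the reported figures are averages over benchmarks, they need not combine exactly into a single average energy ratio.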