As transistor scaling comes to a halt, systems are becoming increasingly power limited. Given the trend toward larger networks and the need to process more data (as in Deep Learning and Big Data applications), the energy cost of storing and moving data within a system can far exceed the cost of computation, with energy overheads of over 80%.

Recently, interest has emerged in Approximate Computing, which studies how an algorithm’s performance (accuracy) changes as precision is reduced. Convolutional Neural Networks (ConvNets) are one example of a class of stochastic algorithms that tolerate reduced precision with little degradation in algorithmic performance. Recent work on hardware accelerators for ConvNets, along with simulations in fixed-point representation, indicates that much lower bit precisions can be used than the 64/32-bit floating point conventional on GPUs. Since power and area scale with precision (number of bits), significant energy and area savings can be achieved by exploiting the algorithm’s tolerance to noise.
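To make the reduced-precision idea concrete, the following is a minimal sketch of signed fixed-point quantization with saturation. The function names, the 8-bit word, and the 4 fractional bits are illustrative assumptions rather than values taken from this work.

```python
def quantize_fixed_point(x, total_bits=8, frac_bits=4):
    """Round x to the nearest signed fixed-point value, saturating at the
    representable range. Word length and fractional bits are assumed."""
    scale = 1 << frac_bits                   # number of steps per unit
    q_min = -(1 << (total_bits - 1))         # most negative integer code
    q_max = (1 << (total_bits - 1)) - 1      # most positive integer code
    code = round(x * scale)                  # nearest integer code
    code = max(q_min, min(q_max, code))      # saturate instead of wrapping
    return code / scale


def max_abs_error(values, total_bits, frac_bits):
    """Largest quantization error over a list of values."""
    return max(abs(v - quantize_fixed_point(v, total_bits, frac_bits))
               for v in values)
```

For values inside the representable range, the rounding error is bounded by half of one quantization step, so each additional fractional bit halves the worst-case error while adding to the storage and datapath width.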

We propose to reduce system energy by exploiting the algorithm’s error tolerance through approximate memory and interconnect design. Memory designers rarely consider this a viable option, since most general-purpose systems require robust storage and communication. By designing application-specific memory, however, we can achieve orders-of-magnitude improvements in energy, area, and performance. We further propose to improve the system’s classification performance by embedding known circuit nonidealities (i.e., noise and coupling) into the algorithm’s training phase, so that training better models the translation from software to hardware.
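One way to picture embedding a circuit nonideality into training is sketched below. It is not the method proposed here: it assumes zero-mean Gaussian noise on every weight read, uses a small linear model in place of a ConvNet, and all names (`noisy_read`, `train_with_noise`, `make_samples`) are hypothetical.

```python
import random


def noisy_read(weights, sigma, rng):
    """Model an approximate memory read: each stored weight comes back
    perturbed by zero-mean Gaussian noise (an assumed noise model)."""
    return [w + rng.gauss(0.0, sigma) for w in weights]


def train_with_noise(samples, sigma, lr=0.1, epochs=200, seed=0):
    """SGD on a linear model whose forward pass sees noisy weights."""
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for x, y in samples:
            w_hw = noisy_read(weights, sigma, rng)   # what the hardware would read
            err = sum(wi * xi for wi, xi in zip(w_hw, x)) - y
            # The update is applied to the clean stored weights.
            weights = [wi - lr * err * xi for wi, xi in zip(weights, x)]
    return weights


def make_samples(n=20, seed=1):
    """Toy data drawn from y = 2*x0 - 1*x1 (illustrative only)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        samples.append((x, 2.0 * x[0] - 1.0 * x[1]))
    return samples
```

Calling `train_with_noise(make_samples(), sigma=0.05)` fits the toy data while every forward pass reads perturbed weights, so the learned parameters are selected under the same disturbance they would meet at inference time. Coupling between neighboring lines would need a different, correlated noise model, which this sketch does not attempt.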