Deep neural networks have shown remarkable capability in intelligent tasks such as image and speech recognition, but they rely heavily on power-hungry hardware platforms such as GPUs for training/inference in the cloud. The bottleneck of computational and energy efficiency is the back-and-forth data transfer between the memory units and the computational units. A shift in the computing paradigm towards “compute-in-memory” is therefore promising: it minimizes the data transfer and could enable training/inference on low-power mobile and edge devices.

In this talk, we will present our recent progress in this direction, published in the top-tier conferences [IEDM 2017][ISSCC 2018][DATE 2018][DAC 2018]. The key idea of our design is to use the bitlines of the memory array to sum up the analog currents, which effectively realizes vector-matrix multiplication in a parallel fashion and thereby eliminates the row-by-row multiply-and-accumulate (MAC) operations.

First, we designed “inference” engines. For the CMOS implementation, we proposed an 8T XNOR bit-cell and realized parallel computation in SRAM arrays; we successfully taped out prototype chips in a 65nm TSMC process and achieved >60 TOPS/W energy efficiency for dot-product operations. For the post-CMOS implementation, we proposed a 2T2R bit-cell and realized parallel computation in RRAM arrays; we also taped out prototype chips in a 90nm process with monolithic integration of RRAM on top of the CMOS substrate. A series of design techniques, such as a multilevel sense amplifier and nonlinear quantization of the partial sums, were applied to keep the inference accuracy degradation below 2% on the CIFAR-10 dataset.

Second, we will discuss the desired characteristics of resistive synaptic devices for “online training”. We will cover design considerations in the resistive crossbar array, including the selector and the compact oscillation neuron device at the edge of the array, and we will show our array-level experimental demonstrations of the convolution kernel.

Finally, we will introduce “NeuroSim”, a device-circuit-algorithm co-design framework that evaluates the impact of non-ideal device effects (e.g., weight update asymmetry/nonlinearity and reliability effects) on system-level performance (i.e., learning accuracy) and the trade-offs in circuit-level performance (i.e., area, latency, energy).
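The bitline-summing idea above can be sketched numerically: when all wordlines are asserted at once, each bitline accumulates its column's cell currents in the analog domain, which is mathematically one vector-matrix multiplication, and a multilevel sense amplifier then quantizes each column's partial sum to a few levels. Below is a minimal Python sketch with hypothetical array sizes and a simple uniform level set standing in for the actual multilevel sense amplifier and nonlinear quantization scheme:

```python
import numpy as np

rng = np.random.default_rng(0)

# Binary activations/weights in {-1, +1}, as in an XNOR bit-cell array.
x = rng.choice([-1, 1], size=128)        # wordline activations (one input vector)
W = rng.choice([-1, 1], size=(128, 64))  # memory array (rows x columns)

# Conventional row-by-row MAC: one row accessed per cycle.
mac = np.zeros(64)
for i in range(128):
    mac += x[i] * W[i]

# Compute-in-memory: all rows asserted simultaneously; each bitline sums
# its column's cell currents in one shot -> a full vector-matrix multiply.
bitline_sums = x @ W

assert np.array_equal(mac, bitline_sums)  # same math, no row-by-row loop

# A multilevel sense amplifier digitizes each analog partial sum to a few
# levels (uniform levels here are a placeholder for the actual nonlinear
# quantization used to limit accuracy loss).
levels = np.linspace(-128, 128, 9)       # illustrative 9-level readout
quantized = levels[np.abs(levels[None, :] - bitline_sums[:, None]).argmin(axis=1)]
```

The sketch only illustrates the arithmetic equivalence; in silicon the summation happens as current on a shared bitline, and the choice of quantization levels for the partial sums is what bounds the end-to-end accuracy loss.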
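The weight update asymmetry/nonlinearity that NeuroSim evaluates can be illustrated with a simple behavioral model: each programming pulse moves a resistive device's conductance along a saturating curve, so early pulses change the weight much more than later ones. The function name and parameter values below are illustrative assumptions, not the actual NeuroSim implementation:

```python
import numpy as np

def potentiate(G, G_min, G_max, nl, n_pulses=1, P_max=100):
    """Apply potentiation pulses with nonlinearity factor `nl` (illustrative model).

    Conductance follows a saturating exponential in the pulse count, so
    successive updates shrink as G approaches G_max -- the nonlinear,
    asymmetric update behavior that degrades online-training accuracy.
    """
    A = P_max / nl                                   # curve shape parameter
    B = (G_max - G_min) / (1 - np.exp(-P_max / A))   # normalization
    # Invert the curve to find the equivalent pulse count for the current G,
    # then step forward n_pulses along the same curve.
    P = -A * np.log(1 - (G - G_min) / B)
    P = np.clip(P + n_pulses, 0, P_max)
    return B * (1 - np.exp(-P / A)) + G_min

G_min, G_max = 0.0, 1.0
G = G_min
steps = [G]
for _ in range(10):                                  # 10 identical pulses
    G = potentiate(G, G_min, G_max, nl=5.0)
    steps.append(G)
deltas = np.diff(steps)
# Nonlinearity: each identical pulse produces a smaller conductance change.
assert np.all(deltas[:-1] > deltas[1:])
```

Feeding such a device model into training simulations is how a co-design framework can quantify the learning-accuracy penalty of a given nonlinearity factor before any device is fabricated.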
The talk will conclude with a holistic view of my research vision, spanning materials/device engineering and circuit/architecture co-optimization, for developing hardware accelerators with emerging nanoelectronic devices.

About Shimeng Yu:

Shimeng Yu received the B.S. degree in microelectronics from Peking University, Beijing, China in 2009, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, USA in 2011 and 2013, respectively. He is currently an assistant professor of electrical engineering and computer engineering at Arizona State University, Tempe, AZ, USA.

His research interests are emerging nano-devices and circuits, with a focus on resistive memories for applications including machine/deep learning, neuromorphic computing, monolithic 3D integration, hardware security, and radiation-hard electronics. He has published >70 journal papers and >100 conference papers, with >5500 citations and an H-index of 34. Among his honors, he is a recipient of the DOD-DTRA Young Investigator Award in 2015, the NSF Faculty Early CAREER Award in 2016, the ASU Fulton Outstanding Assistant Professor award in 2017, and the IEEE Electron Devices Society Early Career Award in 2017. He has served on the Technical Program Committees of the IEEE International Symposium on Circuits and Systems (ISCAS) 2015-2017, the ACM/IEEE Design Automation Conference (DAC) 2017-2018, and the IEEE International Electron Devices Meeting (IEDM) 2017-2018.