Running Sparse and Low-Precision Neural Networks: An Interactive Play between Software and Hardware

Following technology advances in high performance computation systems and fast growth of data acquisition, machine learning, especially deep learning, made remarkable success in many research areas and applications. Such a success, to a great extent, is enabled by developing large-scale deep neural networks (DNN) that learn from a huge volume of data. The deployment of such a big model, however, is both computation-intensive and memory-intensive. Though the research on hardware acceleration for neural network has been extensively studied, the progress of hardware development still falls far behind the upscaling of DNN models at soft-ware level. We envision that hardware/software co-design for performance acceleration of deep neural networks is necessary. In this work, I will start with the trends of machine learning study in academia and industry, followed by our study on how to run sparse and low-precision neural networks, demonstrating an interactive play between software and hardware.