The efficacy of deep learning has made it one of the most important workloads run in data centers today. The NVIDIA Tesla V100 GPU introduced a specialized functional unit, the Tensor Core, to meet the growing demand for higher performance on this workload. To exploit the full capability of current NVIDIA GPUs, machine learning researchers have begun using Tensor Cores; for example, five of the six 2018 Gordon Bell Award finalists used Tensor Cores in their work. However, no open-source GPU microarchitectural simulator currently models Tensor Cores. In this paper, we comprehensively investigate NVIDIA's Tensor Core implementation in the Volta and Turing architectures and propose an architectural model for it. Our Tensor Core timing model, implemented in GPGPU-Sim, achieves 99.6% IPC correlation against a physical V100 GPU. Building upon this, we also enable GPGPU-Sim to run NVIDIA's CUTLASS, an open-source CUDA C++ template library that provides customizable GEMM templates, including support for Tensor Cores.

