New Streaming Multiprocessor (SM)

Turing introduces a new processor architecture, the Turing SM, that delivers a dramatic boost in shading efficiency, achieving a 50% improvement in delivered performance per CUDA Core compared to the Pascal generation. These improvements are enabled by two key architectural changes. First, the Turing SM adds a new independent integer datapath that can execute instructions concurrently with the floating-point math datapath. In previous generations, executing these instructions would have blocked floating-point instructions from issuing. Second, the SM memory path has been redesigned to unify shared memory, texture caching, and memory load caching into one unit. This translates to 2x more bandwidth and more than 2x more capacity available for L1 cache for common workloads.
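To illustrate the kind of instruction mix the independent integer datapath helps with, consider a simple grid-stride kernel (a generic sketch, not code from the whitepaper; the kernel name and parameters are illustrative). The index updates, stride additions, and loop compares are integer instructions, while the core work is FP32 fused multiply-adds; on Turing these two streams can issue concurrently instead of the integer work blocking the floating-point pipe.

```cuda
// Illustrative sketch: a grid-stride SAXPY kernel whose inner loop interleaves
// integer address arithmetic with FP32 math. On Turing, the integer
// instructions (index and stride updates, the loop compare) can execute on the
// separate integer datapath concurrently with the FP32 FMAs.
__global__ void saxpy_strided(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // integer datapath
    int stride = gridDim.x * blockDim.x;            // integer datapath
    for (; i < n; i += stride) {                    // integer compare + add
        y[i] = a * x[i] + y[i];                     // FP32 FMA datapath
    }
}
```

No source change is needed to benefit: the compiler schedules the same instructions, and the hardware overlaps the two datapaths.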

...

Turing Tensor Cores

Tensor Cores are specialized execution units designed specifically for performing the tensor/matrix operations that are the core compute function used in Deep Learning. Similar to Volta Tensor Cores, the Turing Tensor Cores provide tremendous speed-ups for matrix computations at the heart of deep learning neural network training and inferencing operations. Turing GPUs include a new version of the Tensor Core design that has been enhanced for inferencing. Turing Tensor Cores add new INT8 and INT4 precision modes for inferencing workloads that can tolerate quantization and don't require FP16 precision. Turing Tensor Cores bring new deep learning-based AI capabilities to GeForce gaming PCs and Quadro-based workstations for the first time. A new technique called Deep Learning Super Sampling (DLSS) is powered by Tensor Cores. DLSS leverages a deep neural network to extract multidimensional features of the rendered scene and intelligently combine details from multiple frames to construct a high-quality final image. DLSS uses fewer input samples than traditional techniques such as TAA, while avoiding the algorithmic difficulties such techniques face with transparency and other complex scene elements.
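The INT8 Tensor Core mode is exposed to CUDA C++ through the WMMA (warp matrix multiply-accumulate) API in `mma.h`. Below is a minimal sketch (not from the whitepaper) of one warp computing a 16x16 tile of C = A*B with signed 8-bit inputs and 32-bit integer accumulation, the precision combination Turing adds for quantized inference; the kernel name and the assumption that A, B, and C are densely packed 16x16 tiles are illustrative.

```cuda
#include <mma.h>
using namespace nvcuda;

// Illustrative sketch: one warp multiplies a 16x16 INT8 tile of A by a 16x16
// INT8 tile of B, accumulating into a 16x16 INT32 tile of C via Tensor Cores.
// Requires a Turing-class GPU (compute capability 7.5, e.g. -arch=sm_75).
__global__ void int8_mma_tile(const signed char *A, const signed char *B, int *C) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, signed char, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, signed char, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, int> c_frag;

    wmma::fill_fragment(c_frag, 0);            // start from a zero accumulator
    wmma::load_matrix_sync(a_frag, A, 16);     // leading dimension 16 (packed tile)
    wmma::load_matrix_sync(b_frag, B, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);   // C += A * B on Tensor Cores
    wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);
}
```

Accumulating in INT32 is what lets the quantized INT8 products sum without overflow; in practice the INT32 results are rescaled back to the model's quantization range after the matrix multiply.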