At SC11, NVIDIA® (booth #2719) will be showcasing advances in applications and research enabled by GPU computing and its role in scientific discovery. We invite you to visit our booth to learn more about how parallel computing is driving the industry trend toward heterogeneous computing.

New computers using parallel processors such as Tesla™ GPUs, companion processors to the CPU, are accelerating HPC applications by 10x. Stop by the NVIDIA booth and find out how.

This tutorial will introduce CUDA to the supercomputing audience and motivate its use with traditional HPC examples. We will first teach the basics of CUDA programming with step-by-step walkthroughs of code samples, then review the main optimization techniques, and describe profiling and tuning best practices to maximize performance. While CUDA C and CUDA Fortran will be used for illustration, the concepts covered will apply equally to programs written with the OpenCL and DirectCompute APIs. Finally, we will close with case studies from academia and industry.

The GPU Technology Theater hosts talks on a wide range of topics on high performance computing. Open to all attendees, the theater is located in the NVIDIA booth and will feature industry luminaries, scientists, and developers.

GTC On-Demand gives you archival access to the world-class education delivered at GTC and is an essential resource for the scientists, engineers, researchers, and developers who rely on GPUs to tackle enormous computational challenges. Visit GTC On-Demand today and explore and learn from the best and brightest minds working in High Performance Computing.

Developer Demos in NVIDIA Booth

NVIDIA is hosting developer demos that will help you accelerate your code on GPUs. Demos feature CUDA C, GPU-accelerated libraries, and directive-based solutions. Can't make it to the NVIDIA booth? Check out the demo videos online at NVIDIA's Developer Zone.

Addison Snell, a leading HPC analyst with Intersect360 Research, will moderate this lively panel discussion, in which he asks visionary leaders from the supercomputing community to comment on forward-looking trends that will shape the industry in 2012 and beyond. An audience Q&A for the panelists will follow the live recording session.

The GPU offers more than an order of magnitude speedup in peak floating-point performance over conventional processors. In this paper we present our experience tuning double-precision matrix-matrix multiplication (DGEMM) on the Fermi GPU architecture. We choose an optimal algorithm with blocking in both shared memory and registers to satisfy the constraints of the Fermi memory hierarchy. Our optimization strategy is further guided by a performance model based on micro-architecture benchmarks. Our optimizations include software pipelining, use of vector memory operations, and instruction scheduling. Our best CUDA algorithm achieves performance comparable to the latest vendor-supplied library, CUBLAS 3.2. We further improve upon this with an implementation in the native machine language, leading to a 20% increase in performance over CUBLAS. That is, the achieved peak performance (efficiency) is improved from 302 Gflop/s (58%) to 362 Gflop/s (70%).
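The core idea behind the blocking described above is to partition the matrices into tiles small enough to live in fast memory (shared memory and registers on Fermi), so each operand is loaded once per tile rather than once per multiply-add. As a minimal, hedged sketch, the same principle can be shown on the CPU in plain C; the block size `NB` and matrix size `N` below are illustrative choices, not the paper's actual tuning parameters:

```c
#include <stdio.h>

#define N  64   /* matrix dimension (illustrative) */
#define NB 16   /* tile size; on Fermi this would be chosen to fit shared memory */

/* Reference triple loop: C = A * B, row-major N x N matrices. */
static void dgemm_naive(const double *A, const double *B, double *C) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i * N + k] * B[k * N + j];
            C[i * N + j] = sum;
        }
}

/* Blocked version: iterate over NB x NB tiles so each tile of A and B is
 * reused NB times while it is resident in fast memory (cache here; shared
 * memory and registers in the GPU algorithm the abstract describes). */
static void dgemm_blocked(const double *A, const double *B, double *C) {
    for (int i = 0; i < N * N; i++)
        C[i] = 0.0;
    for (int ii = 0; ii < N; ii += NB)
        for (int jj = 0; jj < N; jj += NB)
            for (int kk = 0; kk < N; kk += NB)
                /* multiply one pair of tiles, accumulating into the C tile */
                for (int i = ii; i < ii + NB; i++)
                    for (int j = jj; j < jj + NB; j++) {
                        double sum = C[i * N + j];
                        for (int k = kk; k < kk + NB; k++)
                            sum += A[i * N + k] * B[k * N + j];
                        C[i * N + j] = sum;
                    }
}
```

The blocked loop performs the same arithmetic in the same order as the naive loop, so the results match; the payoff is data reuse, which on the GPU translates into far fewer trips to device memory per floating-point operation.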

There is consensus in the community that higher-level GPU programming models based on directives or language extensions have significant value for enabling GPU programming by domain experts. Several efforts are under way to develop such models layered on top of standard C, C++ and Fortran through either standards committees or the introduction of proposed de facto standard solutions by large industry players. This BoF will explore and debate the merits of several current options and approaches for high-level heterogeneous manycore programming.

Since GPU acceleration was introduced just a few years ago, the high performance computing (HPC) industry has widely adopted it to solve extremely challenging problems. Industry's computational need is constantly increasing as large and complex computational problems become commonplace. We will discuss the driving forces behind the rapid adoption of GPU computing and explore its impact across various industries.