Introduction to Parallel Computing on an NVIDIA GPU using PyCUDA

With Andreas Klöckner's PyCUDA, you can harness the massively parallel supercomputing power of your NVIDIA graphics card to crunch numerically intensive scientific computing applications in a fraction of the runtime it would take on a CPU and at a fraction of the development cost of C++. We'll cover hardware architecture, API fundamentals and several examples to get you started.

Abstract

There are two approaches to parallelizing a computationally heavy procedure: use a messaging queue such as AMQP to distribute tasks among a networked cluster or increase the number of processors in a single machine. This talk focuses on techniques for adapting mathematical code to run on specialized multi-core graphic processors.

Modern graphic processors have hard-coded transistors for common vector and matrix operations, making them ideal for general scientific computing. However, the NVIDIA CUDA's unique design requires knowledge of its hardware to adapt algorithms effectively. This talk covers basic CUDA architecture, API functions and several examples to illustrate the different kinds of problems that will benefit from parallelization.