Details

NVIDIA CUDA is a C language development environment for CUDA-enabled GPUs. The CUDA development environment includes:

nvcc C compiler

CUDA FFT and BLAS libraries for the GPU

Profiler

gdb debugger for the GPU (alpha available in March, 2008)

CUDA runtime driver (now also available in the standard NVIDIA GPU driver)

CUDA programming manual

The CUDA Developer SDK provides examples with source code to help you get started with CUDA. Examples include:

Parallel bitonic sort

Matrix multiplication

Matrix transpose

Performance profiling using timers

Parallel prefix sum (scan) of large arrays

Image convolution

1D DWT using Haar wavelet

Many more features

What's New in Version 6.0.37:

Introduced support for the Maxwell architecture (sm_50). More information on Maxwell can be found here: https://developer.nvidia.com/maxwell-compute-architecture. Although the CUDA Toolkit supports developing applications targeted to sm_50, the driver bundled with the CUDA installer does not. Users will need to obtain a driver compatible with the Maxwell architecture from http://www.nvidia.com/drivers.

Unified Memory is a new feature that enables memory to be accessed by both the CPU and GPU without explicit copying between the two. It is exposed as "managed memory" in the software APIs. Managed memory is automatically migrated to the physical memory attached to the processor that is accessing it. This migration provides high-performance access from either processor, unlike "zero-copy" memory, where all accesses are served out of CPU system memory.
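The managed-memory workflow described above can be sketched as follows. This is a minimal illustration (the kernel and variable names are our own, not part of any CUDA sample); note that a single allocation replaces the usual separate host/device buffers and cudaMemcpy calls:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Minimal kernel: increment each element of the array in place.
__global__ void increment(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1;
}

int main(void)
{
    const int n = 256;
    int *data;

    // One allocation, visible to both CPU and GPU; no cudaMemcpy needed.
    cudaMallocManaged(&data, n * sizeof(int));

    for (int i = 0; i < n; ++i)  // the CPU writes the buffer directly
        data[i] = i;

    increment<<<(n + 127) / 128, 128>>>(data, n);
    cudaDeviceSynchronize();     // wait before the CPU touches the data again

    printf("data[0] = %d\n", data[0]);  // the CPU reads the GPU's result directly
    cudaFree(data);
    return 0;
}
```

The cudaDeviceSynchronize() call matters: the migration happens on access, so the CPU must not touch a managed allocation while a kernel that uses it is still in flight.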

Added a standalone header library for calculating occupancy (the library is not dependent on the CUDA Runtime or CUDA Driver APIs). The header library provides a programmatic interface for the occupancy calculations previously contained in the CUDA Occupancy Calculator. This library is currently in beta status. The interface and implementation are subject to change.

The Dynamic Parallelism runtime should no longer generate a cudaErrorLaunchPendingCountExceeded error when the number of

CUDA Inter-Process Communication (IPC) is now supported for applications running under MPS. CUDA IPC event and memory handles can be exported and opened by the MPS clients of a single MPS server.
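The export/import flow can be sketched as follows. This is a minimal outline, with our own function names; the transport used to pass the opaque handle between the two MPS clients (a pipe, a socket, an MPI message) is not shown:

```cuda
#include <cuda_runtime.h>

// Exporting side (one MPS client): allocate device memory and publish
// an IPC handle for it.
int export_buffer(cudaIpcMemHandle_t *handle_out)
{
    float *d_buf;
    cudaMalloc(&d_buf, 1024 * sizeof(float));
    cudaIpcGetMemHandle(handle_out, d_buf);
    // The opaque handle would now be sent to the peer process.
    return 0;
}

// Importing side (another client of the same MPS server): map the
// exporter's allocation into this process's address space.
int import_buffer(cudaIpcMemHandle_t handle)
{
    float *d_peer;
    cudaIpcOpenMemHandle((void **)&d_peer, handle,
                         cudaIpcMemLazyEnablePeerAccess);
    // d_peer now addresses the exporter's device allocation.
    cudaIpcCloseMemHandle(d_peer);
    return 0;
}
```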

Applications running under MPS can now use assert() in their kernels. When an assert is triggered, all work submitted by MPS clients will be stalled until the assert is handled. The MPS client that triggered the assert will exit, but will not interfere with other running MPS clients.
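A device-side assert of this kind can be sketched as below (the kernel and parameter names are hypothetical). If the condition fails for any thread, the kernel traps, subsequent CUDA calls in that client return cudaErrorAssert, and under MPS only that client exits:

```cuda
#include <assert.h>

// Device-side assert: fires for any negative input element. Under MPS,
// only the client that triggered the assert exits; other clients'
// submitted work resumes once the assert has been handled.
__global__ void checked_kernel(const int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        assert(data[i] >= 0);
}
```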

Previously, a wide variety of errors were reported by an "Unspecified Launch Failure (ULF)" message or by the corresponding error codes CUDA_ERROR_LAUNCH_FAILED and cudaErrorLaunchFailed. The CUDA driver now supports enhanced error reporting, providing richer error messages when exceptions occur. This will help developers determine the causes of application faults without the need for additional tools.
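Retrieving those richer messages requires no new API; the existing error-string call now returns the more specific text. A minimal checking helper (the function name is our own) might look like:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// After a kernel launch, query the sticky error state; the driver now
// supplies a more specific message than the old "unspecified launch
// failure" text for many exception classes.
static void check_last_error(const char *where)
{
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        fprintf(stderr, "%s failed: %s\n", where, cudaGetErrorString(err));
}
```

A typical call site would be check_last_error("my_kernel") placed immediately after the launch, followed by a second check after cudaDeviceSynchronize() to catch errors that surface only at execution time.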