* The CUFFT Library now supports double-precision transforms and includes significant performance improvements for single-precision transforms as well. See the CUDA Toolkit release notes for details.
* The CUDA-GDB hardware debugger and CUDA Visual Profiler are now included in the CUDA Toolkit installer, and the CUDA-GDB debugger is now available for all supported Linux distributions (see below).
* Each GPU in an SLI group is now enumerated individually, so compute applications can take advantage of multi-GPU performance even when SLI is enabled for graphics.
* The 64-bit versions of the CUDA Toolkit now support compiling 32-bit applications. Please note that the installation location of the libraries has changed, so developers on 64-bit Linux must update their LD_LIBRARY_PATH to contain either /usr/local/cuda/lib or /usr/local/cuda/lib64.
* New fp16 <-> fp32 conversion intrinsics allow data to be stored in fp16 format while computation is done in fp32. The fp16 format is ideal for applications that require a wider numerical range than 16-bit integers but less precision than fp32, and it reduces memory space and bandwidth consumption.
* The CUDA SDK has been updated to include:
    o A new pitchLinearTexture code sample that shows how to efficiently texture from pitch linear memory.
    o A new PTXJIT code sample illustrating how to use cuModuleLoadDataEx() to load PTX source from memory instead of loading a file.
    o Two new code samples for Windows, showing how to use the NVCUVID library to decode MPEG-2, VC-1, and H.264 content and pass frames to OpenGL or Direct3D for display.
    o Updated code samples showing how to properly align CUDA kernel function parameters so the same code works on both 32-bit and 64-bit systems.

* The Visual Profiler includes several enhancements:
    o All memory transfer API calls are now reported
    o Support for profiling multiple contexts per GPU
    o Synchronized clocks for requested start time on the CPU and start/end times on the GPU for all kernel launches and memory transfers
    o Global memory load and store efficiency metrics for GPUs with compute capability 1.2 and higher