OpenCL

OpenCL™ (Open Computing Language) is a low-level API for heterogeneous computing that runs on CUDA-powered GPUs. Using the OpenCL API, developers can launch compute kernels written using a limited subset of the C programming language on a GPU.

NVIDIA OpenCL SDK Code Samples

OpenCL Multi Threads

This sample implements multi-threaded heterogeneous computing workloads with tight cooperation between the CPU and GPU. It uses three features introduced in OpenCL 1.1: user events, thread-safe API calls, and event callbacks.

This is a simple test program that measures the memcopy bandwidth of the GPU. It can currently measure device-to-device copy bandwidth, as well as host-to-device and device-to-host copy bandwidth for pageable and page-locked memory, using either memory-mapped or direct access.

Simple program which demonstrates Direct3D10 texture interoperability with OpenCL. The program creates a number of D3D10 textures (2D, 3D, and CubeMap) which are written to from OpenCL kernels. Direct3D then renders the results on the screen.

Simple program which demonstrates Direct3D9 texture interoperability with OpenCL. The program creates a number of D3D9 textures (2D, 3D, and CubeMap) which are written to from OpenCL kernels. Direct3D then renders the results on the screen.

This example demonstrates an efficient OpenCL implementation of parallel prefix sum, also known as "scan". Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array.
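The definition above can be illustrated with a minimal sequential sketch in C. This is not the SDK's parallel kernel; it is only a host-side reference of the kind a GPU result could be checked against. The function name `exclusive_scan` and the `unsigned int` element type are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential exclusive prefix sum (hypothetical reference, not the SDK code).
 * out[i] receives the sum of in[0] .. in[i-1]; out[0] is therefore 0. */
void exclusive_scan(const unsigned int *in, unsigned int *out, size_t n)
{
    assert(n == 0 || (in != NULL && out != NULL));

    unsigned int running = 0;          /* sum of all elements seen so far */
    for (size_t i = 0; i < n; ++i) {
        out[i] = running;              /* everything strictly before element i */
        running += in[i];
    }
}
```

For the input 3, 1, 7, 0, 4 this produces 0, 3, 4, 11, 11: each output is the sum of the inputs to its left.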

This sample implements matrix multiplication and is identical to the example in Chapter 6 of the programming guide.
It has been written for clarity of exposition to illustrate various OpenCL programming principles, not with the goal of providing the most performant generic kernel for matrix multiplication.
CUBLAS provides high-performance matrix multiplication.
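As a point of comparison, a plain host-side matrix multiplication can be sketched in a few lines of C. This is an illustrative reference only, assuming row-major storage; the name `matmul_ref` and its parameter names are hypothetical and not taken from the sample. In a GPU version, the two outer loops would typically be replaced by one work-item per output element.

```c
#include <assert.h>
#include <stddef.h>

/* Reference product C = A * B for row-major float matrices (hypothetical helper).
 * A is hA x wA, B is wA x wB, C is hA x wB. */
void matmul_ref(const float *A, const float *B, float *C,
                size_t hA, size_t wA, size_t wB)
{
    assert(A != NULL && B != NULL && C != NULL);

    for (size_t row = 0; row < hA; ++row) {
        for (size_t col = 0; col < wB; ++col) {
            float sum = 0.0f;
            /* Dot product of row `row` of A with column `col` of B. */
            for (size_t k = 0; k < wA; ++k) {
                sum += A[row * wA + k] * B[k * wB + col];
            }
            C[row * wB + col] = sum;
        }
    }
}
```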

High Quality DXT Compression using OpenCL.
This example shows how to implement an existing computationally intensive CPU compression algorithm in parallel on the GPU and obtain an order-of-magnitude performance improvement.

Linear 2-dimensional variable-width Box Filter of an RGBA image. Implemented in OpenCL for CUDA GPUs, with a performance comparison against simple C++ on the host CPU. Each of the R, G, B and A channels is treated independently, with results computed concurrently for each.
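A host-side box filter for a single channel might look like the sketch below. It is a brute-force illustration rather than the sample's implementation: the function name `box_filter_channel`, the `float` pixel type, and the clamp-to-edge border handling are all assumptions. An RGBA image would be handled by applying the same averaging to each of the four channels independently.

```c
#include <assert.h>

/* Clamp a coordinate into [0, n - 1]; edge replication is an assumption here. */
static int clamp_coord(int v, int n)
{
    if (v < 0) return 0;
    if (v >= n) return n - 1;
    return v;
}

/* Average each pixel over a (2*radius+1) x (2*radius+1) window (hypothetical helper).
 * src and dst are separate row-major single-channel images of width x height. */
void box_filter_channel(const float *src, float *dst,
                        int width, int height, int radius)
{
    assert(src && dst && src != dst);
    assert(width > 0 && height > 0 && radius >= 0);

    const float count = (float)((2 * radius + 1) * (2 * radius + 1));
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    int sy = clamp_coord(y + dy, height);
                    int sx = clamp_coord(x + dx, width);
                    sum += src[sy * width + sx];
                }
            }
            dst[y * width + x] = sum / count;   /* variable width via radius */
        }
    }
}
```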

2-dimensional 3x3 Sobel Magnitude Filter of an RGBA image. Implemented in OpenCL for CUDA GPUs, with a performance comparison against simple C++ on the host CPU. The gradient magnitude for each of the R, G and B channels is computed concurrently and independently, then combined into a single gradient intensity with linear weighting factors.
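The per-channel step can be sketched in C as follows. The code applies the standard 3x3 Sobel kernels at one pixel of a single channel and returns the horizontal and vertical responses; the gradient magnitude is then the square root of gx*gx + gy*gy. The function name `sobel_at`, the `float` pixel type, and the clamp-to-edge border handling are assumptions. The weighting factors used to combine the R, G and B magnitudes are not given in the description, so that step is left out.

```c
#include <assert.h>

/* Clamp v into [lo, hi]; replicating edge pixels is an assumption here. */
static int clamp_int(int v, int lo, int hi)
{
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

/* 3x3 Sobel responses at pixel (x, y) of one row-major channel (hypothetical helper).
 * gx responds to left-to-right change, gy to top-to-bottom change. */
void sobel_at(const float *img, int width, int height,
              int x, int y, float *gx, float *gy)
{
    assert(img && gx && gy && width > 0 && height > 0);

    float p[3][3];                      /* 3x3 neighbourhood, p[row][col] */
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int sy = clamp_int(y + dy, 0, height - 1);
            int sx = clamp_int(x + dx, 0, width - 1);
            p[dy + 1][dx + 1] = img[sy * width + sx];
        }
    }

    /* Right column minus left column, centre row weighted by 2. */
    *gx = (p[0][2] + 2.0f * p[1][2] + p[2][2])
        - (p[0][0] + 2.0f * p[1][0] + p[2][0]);
    /* Bottom row minus top row, centre column weighted by 2. */
    *gy = (p[2][0] + 2.0f * p[2][1] + p[2][2])
        - (p[0][0] + 2.0f * p[0][1] + p[0][2]);
}
```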

2-dimensional Gaussian Blur Filter of an RGBA image using the IRF method. Implemented in OpenCL for CUDA GPUs, with a performance comparison against simple C++ on the host CPU. Each of the R, G, B and A channels is treated independently, with results computed concurrently for each.