OpenCL

News/Changelog

NEWS for the OpenCL package

0.2 (under development)

    o   OpenCL context and command queue can be persisted, so that
        data can be kept between calls. The context also remembers
        whether to default to single- or double-precision for
        numeric vectors.

    o   Data can stay on the OpenCL device (GPU) between kernel
        calls. This is extremely valuable when working with discrete
        GPUs connected over a relatively slow PCIe connection.

    o   A single-precision data type is no longer required. The
        conversion takes place when transferring the data to the
        OpenCL device. On the R side, data remains in numeric
        vectors.

    o   Kernels are executed asynchronously and possibly
        out-of-order, if the OpenCL implementation allows it.
        Synchronization does not need to be done manually; it
        happens transparently: OpenCL events corresponding to a
        kernel execution are attached to the output buffer.
        Subsequent kernel executions that take the buffer as input
        wait for the event, and hence for the preceding kernel
        execution to finish. Likewise, reads from buffers wait on
        the attached event.

    o   OpenCL device information now includes the maximum
        frequency. The list of extensions is also split into
        individual entries to make it easier to search.

    o   By default, we choose GPU devices. CPU devices usually don't
        make a lot of sense. Also, if there are multiple GPU devices
        available (think of a notebook with integrated and discrete
        GPU), we try to choose the faster device.

    o   There are now several tests covering most of the
        functionality.

0.1-4 (under development)

    o   devices with very long extension strings could cause an
        error on retrieval. Fixed with a larger static buffer.
        (Thanks to Valerio Aimale again)

    o   improve error reporting by always including the OpenCL error
        code

0.1-3 2012-05-25

    o   fix a bug causing device enumeration to use the default
        device for device count regardless of the specified type.
        This affects only systems with more than one type of device.
        (Thanks to Valerio Aimale for reporting)

    o   added dim argument to oclRun() which allows multidimensional
        indexing (up to 3d) in the kernel. The dimensions can be
        obtained in the kernel via get_global_size() and the index
        values with get_global_id(). Note that using index vectors
        instead of multidimensional indexing may perform better
        depending on the device. The default is to use a single
        dimension (dim=size), which is the same as in previous
        versions of OpenCL.

    o   add precision="best" in oclSimpleKernel which switches
        automatically to double-precision if supported by the device

    o   kernel objects are now less cryptic - they implement
        print(), names() and $ methods for access to their
        attributes.

0.1-2 2012-03-07

    o   add support for asynchronous calls, i.e., execution parallel
        to the CPU or multiple parallel GPU operations. This is done
        by using x <- oclRun(..., wait=FALSE) to dispatch the kernel
        and then oclResult(x) to collect the results later.

    o   minor cleanup

0.1-1 2011-08-09

    o   improve memory management and clean up on error in oclRun()

    o   use CL_MEM_USE_HOST_PTR instead of clEnqueueWriteBuffer()
        for better performance on large input vectors

    o   add support for native single precision representation (see
        ?clFloat and native.result argument in oclRun())

    o   added INSTALL file with links to common OpenCL
        implementations

0.1-0 2011-08-08

    o   first public release includes support for single and double
        precision computations as well as simple kernels (one output
        vector, arbitrary input)
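
Illustration for the dim argument introduced in 0.1-3: the sketch below emulates, in plain C on the host, how a kernel launched with a multidimensional dim might turn get_global_id() and get_global_size() values into a position in a flat vector. It is not code from the package. The work_item struct and the functions linear_index(), scale_kernel() and run_2d() are hypothetical stand-ins invented for this example, and the column-major layout is an assumption chosen to match how R stores matrices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the OpenCL work-item built-ins, so the
   indexing can be followed without an OpenCL device. */
typedef struct {
    size_t id[3];    /* what get_global_id(d) would return   */
    size_t size[3];  /* what get_global_size(d) would return */
} work_item;

/* Column-major linear index (assumed layout, as used by R arrays). */
size_t linear_index(const work_item *w) {
    return w->id[0] + w->size[0] * (w->id[1] + w->size[1] * w->id[2]);
}

/* Body of an imagined kernel: double one element of the input. */
void scale_kernel(const work_item *w, const double *in, double *out) {
    size_t i = linear_index(w);
    out[i] = 2.0 * in[i];
}

/* Emulate a launch over a rows x cols grid, i.e. a two-dimensional dim.
   A real device would run the work items in parallel, in no fixed order. */
void run_2d(size_t rows, size_t cols, const double *in, double *out) {
    assert(in != NULL && out != NULL);
    work_item w = { {0, 0, 0}, {rows, cols, 1} };
    for (w.id[1] = 0; w.id[1] < cols; w.id[1]++)
        for (w.id[0] = 0; w.id[0] < rows; w.id[0]++)
            scale_kernel(&w, in, out);
}
```

Calling run_2d(2, 3, in, out) on a six-element vector visits each element exactly once. With the single-dimension default (dim=size) the same kernel body would reduce to using get_global_id(0) directly as the index.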