Applications

Because GPUs are SIMD machines, to exploit CUDA's potential you must pose the problem in an SIMD manner. Computation that can be partitioned in such a way that each thread can compute one element independently is ideal for the GPU.

Some algorithms either cannot be written in parallel, or cannot be used on CUDA (due to architecture constraints). In those cases, research is ongoing to introduce alternative methods to use the GPU to perform those computations.

In this section some usage of CUDA programming inside Mathematica is showcased. All the following examples use CUDAFunctionLoad, which allows you to load CUDA source, binaries, or libraries into Mathematica.

The above is a prime example of combining Mathematica and CUDA programming to make an algorithm partially parallel that would otherwise be written in only serial code.

Random Number Generation

Many algorithms ranging from ray tracing to PDE solving require random numbers as input. This is, however, a difficult problem on many core systems, where each core has the same state as any other core. To avoid that, parallel random number generators usually use entropy values such as the time of day to seed the random number generators, but those calls are not available in CUDA.

The following section gives three classes of algorithms to generate random numbers. The first is uniform random number generators (where pseudorandom number generators and quasi-random number generators are showcased). The second is random number generators that exploit the uniform distribution of a hashing function to generate random numbers. The final is normal random number generators.

Pseudorandom Number Generators

Pseudorandom number generators are deterministic algorithms that generate numbers that appear to be random. They rely on the seed value to generate further random numbers.

In this section, a simple linear congruential random number generator (the Park-Miller algorithm) is shown, along with a more complex Mersenne Twister.

Park-Miller

The Park-Miller random number generator is defined by the following recurrence equation:

Here, the Mersenne Twister input parameters are defined. The twister requires seed values. Those values can be computed offline and stored in a file or they can be generated by Mathematica. The latter is shown here.

A Mathematica function can be written that takes the number of random numbers to be generated as input, performs the required allocations and setting of parameters, and returns the random output memory.

Considering that random numbers are seeds to other problems, a user may get performance increase in the overall algorithm even if Mathematica's timings are superior to the CUDA implementation.

Quasi-Random Number Generators

This section describes quasi-random number generators. Unlike pseudorandom number generators, these sequences are nonuniform and have underlying structures that are sometimes useful in numerical methods. For instance, these sequences typically provide faster convergence in multidimensional Monte Carlo integration.

Halton Sequence

The Halton sequence generates quasi-random numbers that are uniform on the unit interval. While the code works with arbitrary dimensions, only the van der Corput sequence is discussed, which works on 1D space. This is adequate for comparison.

The resulting numbers of the Halton (or van der Corput) sequence are deterministic but have low discrepancy over the unit interval. Because they fill the space uniformly in some applications, such as Monte Carlo integration, they are preferred to pseudorandom number generators.

For a given with base :

The one-dimensional Halton (or van der Corput) value in base :

The sequence of length is then written as:

Given a number in base representation , the van der Corput sequence mirrors the number across the decimal point, so that its sequence value is .

Applications of Random Number Generators

Random numbers have applications in many areas. Here two main applications are presented: Monte Carlo integration (by approximating and an arbitrary function) and simulating Brownian motion.

Approximating

The value of can be approximated using Monte Carlo integration. First, generate uniformly random numbers in the unit square. Then the number of points inside the first quadrant of the unit circle is counted. The result is then divided by the number of points. This will give .

This implements reduction, counting the number of points in a unit circle.