PCAST - PGI Compiler Assisted Software Testing

Overview

PGI Compiler Assisted Software Testing (PCAST) is a set of capabilities
intended to help test for program correctness and determine points of
divergence. PCAST is useful for detecting when results diverge between
CPU and GPU versions of code, and also between the same code run on
different processor architectures.

There are three ways to invoke PCAST:

Through the pgi_compare run-time call

Through the acc_compare run-time call

With the autocompare compiler flag

We'll use the following simple example to illustrate using each approach.
This OpenACC C program allocates two arrays on the heap, copies the data to the GPU,
and creates gangs of workers to execute the inner loop. The next sections
demonstrate the different ways to use PCAST to test for program correctness.
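
The example listing does not appear above, so the following is only a minimal
sketch that matches its description, not the original program. The function
name run_example, the array names a and b, the int element type, and the
initial values are all assumptions. The sizes quoted later (1000 elements,
five outer iterations) are passed in as arguments.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the body of the example program.
   Returns the sum of b after `steps` passes, or -1 if allocation fails. */
long run_example(int size, int steps)
{
    /* Two arrays allocated on the heap. */
    int *a = malloc(sizeof *a * (size_t)size);
    int *b = malloc(sizeof *b * (size_t)size);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1;
    }

    for (int i = 0; i < size; i++) {
        a[i] = 1;
        b[i] = 2;
    }

    /* Copy both arrays to the GPU on entry and back to the host on exit. */
    #pragma acc data copy(a[0:size], b[0:size])
    {
        for (int t = 0; t < steps; t++) {
            /* The inner loop is the compute region that runs on the device. */
            #pragma acc parallel loop
            for (int i = 0; i < size; i++)
                b[i] += a[i];
        }
    }

    long sum = 0;
    for (int i = 0; i < size; i++)
        sum += b[i];

    free(a);
    free(b);
    return sum;
}
```

Calling run_example(1000, 5) from main gives a program with the sizes quoted
in the sections that follow. When OpenACC is not enabled, the pragmas are
ignored and the loops simply run on the CPU.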

Controlling Comparison Options

PCAST's behavior is set through the PGI_COMPARE
environment variable. This environment variable contains a
comma-separated list of options that control various
parameters of the comparison. Use it, for example, to set
relative or absolute tolerance thresholds, halt at the
first difference found, and more. See
the PGI_COMPARE table at the end
for a full listing of the available options.

Auto-compare

The first, and simplest, way to invoke PCAST is through the
autocompare compiler flag. Setting
-ta=tesla:autocompare in the compiler options is the
only change necessary to invoke the autocompare feature. When compiled with
this option, code in OpenACC compute regions runs redundantly on the CPU as
well as the GPU.
Whenever computed data is copied off the GPU and back into host memory, it
is compared against the values computed on the CPU. Hence, any data in a
copy, copyout, or update host directive will be compared when it is copied off the device.
Note that -ta=tesla:autocompare implies
-ta=tesla:redundant.

To use autocompare, compile the example using these compiler options:

$ pgcc -Minfo=accel -ta=tesla:autocompare -o a.out example.c

Running the compiled executable with the options below produces the following
output:

The 'summary' option to PGI_COMPARE will print out a short summary (the last
three lines) of how much data was compared, whether any differences were found,
and what the tolerances were. Here, we can see that autocompare checked two
blocks of data, which contained a total of 2000 elements, and those elements
took up 8000 bytes of memory. It didn't find any errors, and the relative
tolerance, controlled by the 'rel' option to PGI_COMPARE, was set to 10^(-1).
In general, setting 'rel=n' in PGI_COMPARE sets the relative tolerance to 10^(-n).
The one special case is 'rel=0', which tolerates no differences at all.
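
To make the tolerance arithmetic concrete, here is a small sketch of how
'rel=n' maps to a threshold. The helper names rel_tolerance and within_rel are
hypothetical, and the check itself (absolute difference no larger than the
tolerance times the magnitude of the CPU value) is an assumption about what a
relative comparison means, not PCAST's documented formula.

```c
/* Tolerance implied by rel=n: 10^(-n) for n > 0, and 0 (exact match) for n == 0.
   Assumes n is not negative. */
double rel_tolerance(int n)
{
    if (n == 0)
        return 0.0;

    double p = 1.0;
    for (int i = 0; i < n; i++)
        p *= 10.0;          /* p = 10^n */
    return 1.0 / p;         /* 10^(-n) */
}

/* Assumed relative check: |test - ref| <= tolerance * |ref|.
   Returns 1 if the two values agree within the tolerance, 0 otherwise. */
int within_rel(double ref, double test, int n)
{
    double diff = test - ref;
    if (diff < 0.0)
        diff = -diff;

    double mag = ref < 0.0 ? -ref : ref;
    return diff <= rel_tolerance(n) * mag;
}
```

Under this assumption, a GPU value of 105.0 against a CPU value of 100.0
passes with 'rel=1' (allowed difference 10) but fails with 'rel=2' (allowed
difference 1), and with 'rel=0' only identical values pass.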

If there are some differences in the data, the output should look similar to this:

Function Calls

Use the acc_compare function to explicitly compare data. When called, it copies
the data in GPU memory back to the host and compares it with the corresponding
CPU memory. acc_compare must be called from CPU code, not from a device compute
region. To use acc_compare, compile with the -ta=tesla:redundant compiler option.

For reference, acc_compare's signature is:

acc_compare(x, n)

where x is the data to compare and
n is the number of elements to compare.
(Note that, unlike the size arguments of functions such
as memcpy or malloc, n is a count of elements, not
bytes.) In the example, size elements are compared.
Because n counts elements rather than bytes, the call would stay the same
if the element type changed from int to, say, double.

Note that, in this example, acc_compare is called five times in the outer loop
on an array of size 1000, with each element of size four bytes, totalling 20,000
bytes. With autocompare the data was compared just once at the end of the
data directive, instead of in each iteration of the outer loop.
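
A sketch of where such a call might sit is shown below. It is not the original
listing: the function name run_with_acc_compare, the macro PCAST_AVAILABLE, the
array names, and the int element type are hypothetical, and the declaration of
acc_compare is assumed to come from openacc.h. The call is guarded so that the
file still builds with compilers that do not provide PCAST. With size = 1000,
steps = 5, and four-byte ints, the five calls compare 5 x 1000 x 4 = 20,000
bytes in total.

```c
#include <stdlib.h>

/* acc_compare is assumed to be declared in openacc.h by PGI compilers. */
#if defined(__PGI) || defined(__NVCOMPILER)
#include <openacc.h>
#define PCAST_AVAILABLE 1   /* hypothetical macro name */
#endif

/* Hypothetical helper: returns the sum of b, or -1 if allocation fails. */
long run_with_acc_compare(int size, int steps)
{
    int *a = malloc(sizeof *a * (size_t)size);
    int *b = malloc(sizeof *b * (size_t)size);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1;
    }
    for (int i = 0; i < size; i++) {
        a[i] = 1;
        b[i] = 2;
    }

    #pragma acc data copy(a[0:size], b[0:size])
    {
        for (int t = 0; t < steps; t++) {
            #pragma acc parallel loop
            for (int i = 0; i < size; i++)
                b[i] += a[i];

#ifdef PCAST_AVAILABLE
            /* Host code, outside the compute region: compare `size`
               elements (not bytes) of b on the GPU with the CPU copy. */
            acc_compare(b, size);
#endif
        }
    }

    long sum = 0;
    for (int i = 0; i < size; i++)
        sum += b[i];
    free(a);
    free(b);
    return sum;
}
```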

While acc_compare will keep the contents of the data in memory, pgi_compare
writes the data to be compared to a file. Subsequent calls to pgi_compare will
compare data between the file and data in the host memory. One advantage of this
approach is that successive comparisons can be done more quickly, since a
"golden" copy is already on the disk. The downside to this approach, however,
is that the data file can grow very large depending on the amount of data the
program is using and how often comparisons are done.
In general, it is a good idea to use pgi_compare sparingly, on
programs where the data involved is relatively small, or when it is necessary to
compare results on different machines.

Its signature is as follows, where a is the variable to
be compared, "type" is a string naming variable
a's data type, n is the number
of elements to be compared (again, not in bytes), and the last two arguments specify
the function name and line number, respectively.

pgi_compare(a, "type", n, "str", int)

The first time the program runs, the data file "pgi_compare.dat" is
created. Subsequent runs compare calculated data against this file. Use the
PGI_COMPARE environment variable to set the name of the file, or to force the program
to create a new file on the disk with PGI_COMPARE=create. To use pgi_compare on GPU data,
include an update host directive with the data to compare, because pgi_compare only writes
data that is in host memory.
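
The sketch below shows one way the update host directive and the pgi_compare
call might be combined. It is an illustration under assumptions, not the
original listing: the function name run_with_pgi_compare, the macro
PCAST_AVAILABLE, and the array names are hypothetical; the declaration of
pgi_compare is assumed to come from openacc.h; and the argument order follows
the description above, with "int" as the type string for int data. The call is
guarded so that the file still builds with compilers that do not provide PCAST.

```c
#include <stdlib.h>

/* pgi_compare is assumed to be declared in openacc.h by PGI compilers. */
#if defined(__PGI) || defined(__NVCOMPILER)
#include <openacc.h>
#define PCAST_AVAILABLE 1   /* hypothetical macro name */
#endif

/* Hypothetical helper: returns the sum of b, or -1 if allocation fails. */
long run_with_pgi_compare(int size, int steps)
{
    int *a = malloc(sizeof *a * (size_t)size);
    int *b = malloc(sizeof *b * (size_t)size);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1;
    }
    for (int i = 0; i < size; i++) {
        a[i] = 1;
        b[i] = 2;
    }

    #pragma acc data copy(a[0:size], b[0:size])
    {
        for (int t = 0; t < steps; t++) {
            #pragma acc parallel loop
            for (int i = 0; i < size; i++)
                b[i] += a[i];
        }

        /* Bring the GPU results into host memory first, because
           pgi_compare only sees data that is in host memory. */
        #pragma acc update host(b[0:size])
#ifdef PCAST_AVAILABLE
        /* Arguments: data, type string, element count, function name, line. */
        pgi_compare(b, "int", size, "run_with_pgi_compare", __LINE__);
#endif
    }

    long sum = 0;
    for (int i = 0; i < size; i++)
        sum += b[i];
    free(a);
    free(b);
    return sum;
}
```

Here the comparison is made once, after the outer loop, which keeps the data
file small in line with the advice to use pgi_compare sparingly.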