HINT is a universal benchmark. It has been ported to numerous platforms,
almost all of which are different kinds of computers. In the following
guide, we describe how to port and use HINT for a specific target machine.
The simplest way is to use an executable
if one is available. If not, or for the best performance results, you may need
to port the code to the target platform. In about 90% of cases, setting appropriate
Makefile options is sufficient for porting HINT. In case the code
needs to be changed, we recommend starting with a similar version and then
modifying it. For best results, we recommend experimenting with the available
compiler optimization flags and the adjustable
defines in the code.

Quick Reference

Remember we use C here!

In this document, we will refer to the ANSI C version of HINT. Experiments
with HINT have revealed that the programming language used for the kernel
code (C or Fortran) has no effect on the performance numbers. C has an
added advantage over Fortran 77 in that it allows us to use the malloc()
and free() library calls to dynamically increase the size of the
problem. C code is also often easier to port to many platforms. We do have a
Fortran version of HINT which we will discuss briefly in a later
section.

HINT Source Code and functions

Makefile is the project maintenance
file. It contains the machine types, preprocessor
directives, and the compiler optimization flags for compilation and linking.

hint.h is the header file which contains
the essential #include, adjustable
defines and non-adjustable defines, macros, type declarations, and
function prototypes. We will discuss more about adjustable defines in a
later section.

typedefs.h is the header file which
contains all the type definitions. In particular, it determines the appropriate
type of DSIZE (data type of computation) and ISIZE (data type of
indexing) depending upon the preprocessor directives typically specified
in the Makefile.

hint.c contains the main driver code.
It has three main functions:

main(...) : Depending upon the index and computation data
types, this function is responsible for determining the increasing workload
and invoking Run() for each workload.

double Run(..) : This function is responsible for allocating and
deallocating the workload. It is also responsible for timing and validating
the results. Usually, we comment out the validation code during actual
performance measurement to avoid any overhead.

double When() : This function returns the wall clock value.
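
As a rough illustration of what a wall-clock routine like When() can look like, the sketch below uses the POSIX gettimeofday() call. The name WhenSketch and the choice of gettimeofday() are assumptions made for this example; the timer in the distributed hint.c may differ.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical wall-clock timer in the spirit of When().
 * Returns the current wall-clock time in seconds as a double. */
double WhenSketch(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);   /* POSIX wall-clock time, not CPU time */
    return (double)tv.tv_sec + (double)tv.tv_usec * 1.0e-6;
}
```

The elapsed time of a piece of work is then the difference between two readings taken before and after it.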

hkernel.c contains the kernel code. It
has only one function, DSIZE Hint(...), which calculates the
first-order hierarchical integration of a monotonically decreasing function.
The refinement and accuracy of the result depend upon the
number of subintervals. The larger the number of subintervals, the more
accurate the result will be. However, more subintervals require more
calculations and hence take longer to compute.
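
To make the link between subinterval count and accuracy concrete, here is a simplified sketch that bounds the integral of a decreasing function from above and below. It uses equal-width subintervals rather than the hierarchical refinement done by Hint(...), and both the integrand f and the name bound_integral are assumptions chosen for illustration, not taken from hkernel.c.

```c
/* Illustrative integrand, monotonically decreasing on [0, 1].
 * Its choice is an assumption of this sketch. */
static double f(double x)
{
    return (1.0 - x) / (1.0 + x);
}

/* Bound the integral of f over [0, 1] using n equal subintervals.
 * Because f decreases, the left endpoint of each subinterval gives an
 * upper bound on its area and the right endpoint gives a lower bound. */
void bound_integral(long n, double *lower, double *upper)
{
    double h = 1.0 / (double)n;
    double lo = 0.0, hi = 0.0;
    long i;

    for (i = 0; i < n; i++) {
        hi += f((double)i * h) * h;
        lo += f((double)(i + 1) * h) * h;
    }
    *lower = lo;
    *upper = hi;
}
```

For this uniform split the gap between the two bounds is (f(0) - f(1)) / n, so ten times as many subintervals narrows the gap by a factor of ten at ten times the work.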

Quick Start

In the following section, we will describe what you need to do to port HINT
to any desired machine.

General guidelines:

You must first choose the data type for computation. For the double/int/long/long long/float
data types, the corresponding preprocessor define (see typedefs.h) is DOUBLE/INT/LONG/LONGLONG/FLOAT.
You must also choose the data type for indexing. For the int/long data
types, the corresponding preprocessor define (see typedefs.h) is IINT/ILONG.
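
The selection described above might be organized roughly as follows. This is a hypothetical excerpt written in the style of typedefs.h, not a copy of it; in particular, the fallback branches are assumptions added so the sketch compiles when no define is given.

```c
/* Computation type, selected by one of the defines named above. */
#if defined(DOUBLE)
typedef double DSIZE;
#elif defined(FLOAT)
typedef float DSIZE;
#elif defined(LONGLONG)
typedef long long DSIZE;
#elif defined(LONG)
typedef long DSIZE;
#elif defined(INT)
typedef int DSIZE;
#else
typedef double DSIZE;   /* fallback used by this sketch only */
#endif

/* Indexing type, selected by IINT or ILONG. */
#if defined(ILONG)
typedef long ISIZE;
#else
typedef int ISIZE;      /* IINT, and this sketch's fallback */
#endif
```

With most C compilers the defines can be supplied on the command line from the Makefile, for example with -DDOUBLE -DIINT.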

MacOS
(PowerPC/G3): The CodeWarrior project and source code have been provided.
It is quite easy to perform a compilation using the project file on a Macintosh.
If you wish to use some other compiler, we recommend starting with a
new project file and then adding in the HINT source code. Please check
the following URL:

Windows95/WindowsNT/DOS
: A Visual C++ 5.0 version of HINT has been provided. First,
open the HINT workspace, set the compiler optimization options, and rebuild
HINT. For any other compiler on Windows NT, Windows 95, or DOS, we
recommend starting with a new project file and then adding in the Visual
C++ source code. Please check the following URL:

HINT executable for Pentium/PentiumII/PentiumPro/80486/Blend.
The Blend executable optimizes the code to favor Pentium. It blends the optimizations
for 80386, 80486, Pentium Pro, and Pentium.

Java
Version - The Java version of HINT has been contributed by Mark Millard.
Download the following file: jHINT.
Carefully read the instructions in README.java.
Note that this version currently performs poorly compared
to the C version, owing to the lack of a Java environment that produces
well-optimized code.

Fortran
Version - The HINT kernel has also been written in Fortran. You must
link the given C driver code with the given kernel code following your machine's
conventions (please refer to the compiler user manual on how to link C
and Fortran code together). You need to compile this program in a
manner similar to the serial version described above.

Vector/Parallel-Vector
Version - For porting a vector version, remember the following. You can
find details for most of the following in the vector computer's user manual.

The data types used for computation should be vectorizable.

Remember to turn on the compiler option that reports on vectorization.
Most of the inner loop of hkernel.c should be vectorizable.

A reduction operator (for summation) is used in hkernel.c.
Make sure that you use the right #pragma to indicate this.
In the worst case, the summation can be left to run serially.

It is important to have the vector length set to the optimal size (must
be a multiple of 2). For most machines, it is either 64 or 128.

Most vector computers have an optimized timer. After initial porting, you
may try rewriting When(). Make sure that the timer you use
measures wall-clock time and not CPU or system time. See the timer question
in the Smart Tuning section on how to make sure your timer is right.
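
For the summation reduction mentioned in the list above, the idea of marking the loop for the compiler can be sketched as follows. The OpenMP simd pragma is used purely as an assumption for illustration; vector compilers have their own vendor-specific directives, which the user manual documents. The function name sum_values is also hypothetical.

```c
/* Sum an array with the reduction marked explicitly.
 * "#pragma omp simd reduction(+:sum)" is an illustrative stand-in for a
 * vendor directive. A compiler without OpenMP support ignores it, and the
 * loop then runs as a plain serial sum, which is the worst case. */
double sum_values(const double *v, long n)
{
    double sum = 0.0;
    long i;

#pragma omp simd reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += v[i];

    return sum;
}
```

The vectorization report is the place to confirm whether the directive was honored for this loop.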

Single Vector Processor: Cray
machines (CRAYC90) and Hitachi-SX4.
The difference between Hitachi-SX4 and Cray vector programming is that
the Hitachi C version doesn't treat an ANSI C structure assignment as one
vector operation. So, one needs to do a structure assignment element by
element.
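
The two forms of structure copy can be compared in a short sketch. The record type Rec and its fields are placeholders invented for this example and do not correspond to the structures in the HINT source.

```c
/* Hypothetical record type; the fields are placeholders. */
typedef struct {
    double lo;
    double hi;
    long   idx;
} Rec;

/* ANSI C structure assignment: one statement copies every member. */
void copy_whole(Rec *dst, const Rec *src, long n)
{
    long i;

    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Element-by-element form: each member is assigned separately, which
 * gives the compiler simple scalar assignments to vectorize. */
void copy_by_member(Rec *dst, const Rec *src, long n)
{
    long i;

    for (i = 0; i < n; i++) {
        dst[i].lo  = src[i].lo;
        dst[i].hi  = src[i].hi;
        dst[i].idx = src[i].idx;
    }
}
```

Both functions produce the same result; only the form the compiler sees differs.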

Fujitsu and other vendors: Use either the Cray version or the Hitachi
version, whichever is closer to your machine's programming model. Use the appropriate
pragmas and compiler optimization flags specific to the respective vendor.

Smart Tuning

We will now present some ways of tuning HINT in a question and answer
format. In order to get results that may be fairly compared across
machines, it is necessary to fine tune your HINT run to get the best curve
possible. The tradeoff is the best curve versus running time.
The longer HINT is run, the closer one gets to the "perfect HINT curve".
However, do note that a short run may not produce the finest possible curve,
but it is often sufficient for practical purposes. We recommend getting an
initial version of HINT up and running before you attempt to fine tune
it.

Q: What are adjustable defines?
How and why can I change them?
A: The header file hint.h contains a list of defines
which the user can change to get good performance numbers.

ADVANCE is the step size (the workload is multiplied by this step size at each step).
We use roughly 1 decibel (a factor of about 1.2589) as the step size. A step size closer to
1.0 takes longer to run, but might produce a slightly higher net QUIPS.

NCHUNK is the number of chunks for scatter decomposition. It must
be a multiple of 2. Larger numbers increase the time needed to get
the first result (latency), but they scatter the domain more evenly.

NSAMP defines the size of the array used to store the
QUIPS measurements. Increase it only if required; for instance, if ADVANCE
is small, there will be more sample points (QUIPS measurements).

NTRIAL is the number of times a trial is run. Increase this if the computer
is prone to interruption or if your HINT curve is noisy (i.e., jittery
rather than smooth).

PATIENCE is the number of times a bogus trial is re-run.

RUNTM is the target time in seconds. Each workload is run for
approximately RUNTM seconds. Hence the number of iterations is large for a smaller
workload and small for a larger workload. We recommend reducing
RUNTM for high-resolution timers, since fewer iterations can still yield a fairly
accurate reading. Obviously, RUNTM should be much larger than the timer
resolution.

STOPRT is the ratio of current QUIPS to peak QUIPS at which the run
must terminate. Smaller numbers instruct HINT to keep running even if the
performance drop is huge, so the run might end up in virtual memory.

STOPTM is the longest acceptable running time in seconds.

MXPROC is the maximum number of processors. It is only valid in the
parallel and parallel-vector versions of the HINT code.
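
The interaction between ADVANCE and the number of sample points (and hence NSAMP) can be estimated with a small sketch. It assumes the workload starts at 1 and is multiplied by the step size until it reaches some maximum; the function name count_samples and that simplified growth rule are assumptions for illustration, not the exact logic in hint.c.

```c
/* Count how many times a workload of 1 must be multiplied by `advance`
 * before it reaches `max_work`. Returns -1 if advance is not greater
 * than 1, since the workload would never grow. */
int count_samples(double advance, double max_work)
{
    double work = 1.0;
    int n = 0;

    if (advance <= 1.0)
        return -1;

    while (work < max_work) {
        work *= advance;
        n++;
    }
    return n;
}
```

Comparing count_samples(1.2589, 1.0e6) with count_samples(1.1, 1.0e6) shows how a step size closer to 1.0 raises the number of measurements that have to be stored.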

Q: I don't care about best results! How can I get results
fast?
A: The best way to get faster results is to increase ADVANCE.
You can also reduce the number of trials (NTRIAL) and the number of
retrials for bogus results (PATIENCE). The following configuration
in hint.h will produce faster results, though you may
change just one or all of the following defines.

#define ADVANCE 1.2589
#define PATIENCE 7
#define NTRIAL 5

The following configuration will give better results but will be slower
than the one above.

#define ADVANCE 1.1
#define PATIENCE 13
#define NTRIAL 20

Q: My timer resolution is not good. What should I do?
A: The easiest thing you can do is to increase RUNTM.
This will increase the number of iterations for a workload.

Q: I wrote my own timer. How do I
test it?
A: Here are a few tips:

Check your timer resolution (tdelta). A value of tdelta like
0.0166 means the timer has low resolution. A value of tdelta like 0.0001666
means the resolution is probably high. The sample code to determine tdelta
is as follows:
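
One way to estimate tdelta is to spin until the timer value changes and keep the smallest change seen. The sketch below is a minimal version of that idea under stated assumptions: wall_seconds() stands in for your own timer and is built here on POSIX gettimeofday(), and the name timer_resolution is hypothetical.

```c
#include <stddef.h>
#include <sys/time.h>

/* Stand-in for the timer under test; replace with your own routine. */
static double wall_seconds(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1.0e-6;
}

/* Estimate the timer resolution as the smallest positive difference
 * between consecutive readings, observed over `samples` clock ticks.
 * Returns a very large value if samples is not positive. */
double timer_resolution(int samples)
{
    double tdelta = 1.0e30;
    int i;

    for (i = 0; i < samples; i++) {
        double t0 = wall_seconds();
        double t1;

        do {
            t1 = wall_seconds();
        } while (t1 <= t0);          /* spin until the clock ticks */

        if (t1 - t0 < tdelta)
            tdelta = t1 - t0;
    }
    return tdelta;
}
```

Calling timer_resolution(100) and printing the result gives a number to compare against the example values above.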

Please send any comments or suggestions to hint@scl.ameslab.gov.
If you write a new version of HINT or improve upon any of the
existing versions, please submit your code to us. We
will appropriately acknowledge your work.