Step 3: Run the script tune_setup.py

Specify which GPU you are autotuning for by passing the appropriate parameters_GPU.json file with the -p option.
The script also takes as arguments the block sizes you want to add to libcusmm. For example, if the system you want to autotune for contains blocks of size 5 and 8, run:

For each possible parameter set, a launcher is generated. A launcher is a small snippet of C code that launches the kernel using the CUDA-specific <<< >>> notation. It also instantiates the C++ template that contains the actual kernel code.

To parallelize the benchmarking, the launchers are distributed over multiple executables.
Currently, up to 10,000 launchers are benchmarked by one executable. Each executable is linked together from several tune_*_part???.o files and one tune_*_main.o file. Each part file contains up to 100 launchers, which allows the compilation to be parallelized over multiple CPU cores.
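
The two-level split described above can be sketched in a few lines of Python. This is only an illustration of the grouping, not the code used by tune_setup.py: the names chunk, plan_build, and the launcher_N labels are hypothetical, and only the two limits (10,000 launchers per executable, 100 per part file) come from the text.

```python
# Illustrative sketch only; the real tune_setup.py may organize this differently.
LAUNCHERS_PER_EXE = 10_000   # limit quoted in the text
LAUNCHERS_PER_PART = 100     # limit quoted in the text


def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def plan_build(launchers):
    """Group launchers into executables, each made of several part files."""
    return [
        chunk(exe_launchers, LAUNCHERS_PER_PART)
        for exe_launchers in chunk(launchers, LAUNCHERS_PER_EXE)
    ]


# Hypothetical launcher labels, just to have something to distribute.
plan = plan_build([f"launcher_{i}" for i in range(25_050)])
for exe_index, parts in enumerate(plan):
    n_launchers = sum(len(part) for part in parts)
    print(f"executable {exe_index}: {len(parts)} part files, {n_launchers} launchers")
```

Because each part file is an independent compilation unit, a parallel build can compile many of them at the same time before linking each executable.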

Step 4: Adapt tune_submit.py to your environment

The script tune_submit.py was written for the Slurm batch system, as used for example on Cray supercomputers. If your computer runs a different batch system, you have to adapt tune_submit.py accordingly.

Step 5: Submit Jobs

Each tune-directory contains a job file.
Since there might be many tune-directories, the convenience script tune_submit.py can be used to submit the jobs. It goes through all the tune_* directories and checks whether each directory's job has already been submitted or run. To do so, the script calls squeue and searches for slurm-*.out files.
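
The check described above might look roughly like the following sketch. It is not the actual tune_submit.py: the function classify_tune_dirs is hypothetical, the set of queued job names is passed in as a plain argument instead of being parsed from squeue output, and it assumes that each job is named after its tune-directory.

```python
import tempfile
from pathlib import Path


def classify_tune_dirs(root, queued_job_names):
    """Return a {directory name: status} mapping for all tune_* directories.

    `queued_job_names` stands in for the job names reported by squeue
    (assumption: jobs are named after their directory).
    """
    status = {}
    for tune_dir in sorted(Path(root).glob("tune_*")):
        if not tune_dir.is_dir():
            continue
        if any(tune_dir.glob("slurm-*.out")):
            # A Slurm output file exists, so the job has already run.
            status[tune_dir.name] = "already run"
        elif tune_dir.name in queued_job_names:
            status[tune_dir.name] = "already queued"
        else:
            status[tune_dir.name] = "would submit"
    return status


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("tune_5x5x5", "tune_5x5x8", "tune_8x8x8"):
        (Path(tmp) / name).mkdir()
    (Path(tmp) / "tune_5x5x5" / "slurm-1234.out").touch()
    status = classify_tune_dirs(tmp, queued_job_names={"tune_5x5x8"})

for name, state in status.items():
    print(f"{name}: {state}")
```

Only the directories classified as "would submit" are candidates for a new job; the others are skipped so that no job is submitted twice.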

When tune_submit.py is called without arguments, it will just list the jobs that could be submitted:

$ ./tune_submit.py
tune_5x5x5: Would submit, run with "doit!"
tune_5x5x8: Would submit, run with "doit!"
tune_5x8x5: Would submit, run with "doit!"
tune_5x8x8: Would submit, run with "doit!"
tune_8x5x5: Would submit, run with "doit!"
tune_8x5x8: Would submit, run with "doit!"
tune_8x8x5: Would submit, run with "doit!"
tune_8x8x8: Would submit, run with "doit!"
Number of jobs submitted: 8
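
The eight directory names in this listing are consistent with one tune-directory per (m, n, k) combination of the requested block sizes 5 and 8. A minimal sketch of that enumeration, assuming this naming scheme (the helper tune_dir_names is hypothetical and not part of the scripts):

```python
from itertools import product


def tune_dir_names(block_sizes):
    """List one tune_MxNxK name for every (m, n, k) triple of block sizes."""
    sizes = sorted(set(block_sizes))
    return [f"tune_{m}x{n}x{k}" for m, n, k in product(sizes, repeat=3)]


names = tune_dir_names([5, 8])
for name in names:
    print(name)
```

With b distinct block sizes this gives b**3 directories, so the number of jobs grows quickly as more block sizes are added.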

tune_submit.py only submits the jobs when it is called with doit! as its first argument: