In the past I hardcoded some preconfigured values for kernel-accel (-n) and kernel-loops (-u), depending on hash-type and GPU vendor. They were set to values optimized to run best on the current high-end GPU in Brute-Force attack-mode. This was a suboptimal solution, because these values need to be changed if you use a different attack-mode or a low-end GPU.

Therefore I made the parameters -n and -u available on the commandline so you can adjust them for your specific case. Later I found out that there's a fixed relation between the optimal values for the different attack-modes, and that this relation does not depend on the hash-type. That made it possible to make the process easier for the user, so I added the --workload-profile (or -w) parameter. The definition of this parameter has changed with the new version, which is why I don't want to go too much into the details of how it was used before; that would just confuse you.

Now, with the latest oclHashcat version, which supports all OpenCL-compatible device types such as CPUs or other accelerators, such a fixed value doesn't fit any longer. These devices are so different in how they are designed that each requires its own strategy to find the best values.

I was forced to rethink how to find the optimal settings. The first change was to get rid of the hardcoded values in oclHashcat entirely and move them into a user-configurable text database. In that database you can set the ideal tuning values for each device, attack-mode and hash-type. But it quickly turned out that this database becomes huge, and such databases are typically too hard to maintain and end up stillborn. There was simply no way around an automatic solution, and that's how the idea of an autotuning engine came up.

How to use it?

You don't need to activate it or do anything at all to make use of it. The autotune engine is always active whenever you start oclHashcat. Since any automatism can produce errors, because of some unknown variable or a lack of information, a mechanism is required that can override whatever it calculates. There are two ways to override the autotune engine:

Set --opencl-vector-width, -n and -u by hand

The combination of device-name, attack-mode and hash-type matches an entry in the tuning database

Tuning database?

The tuning database is a simple text file in which the fields of each line are separated by tabs or spaces, similar to a CSV. Here are some rules:

This file is used to override autotune settings

This file is used to preset the Vector-Width, the Kernel-Accel and the Kernel-Loops Value per Device, Attack-Mode and Hash-Type

A valid line consists of the following fields (in that order):

Device-Name

Attack-Mode

Hash-Type

Vector-Width

Kernel-Accel

Kernel-Loops

The first three columns define the filter, the other three are what is assigned when that filter matches

If no filter matches, autotune is used

Columns are separated by one or more spaces or tabs

A line cannot start with a space or a tab

Comment lines are allowed, use a # as first character

Invalid lines are ignored

The Device-Name is the OpenCL Device-Name. It's shown on oclHashcat startup.

If the Device-Name contains spaces, replace all spaces with the _ character.

The Device-Name can be assigned an alias. This is useful if many devices share the same chip

The use of wildcards is allowed, some rules:

Wildcards can only replace an entire Device-Name, not just parts of it. For example, Geforce_* is not allowed

The policy is local > global, meaning the more specifically you configure something, the more likely it is selected

The policy testing order is from left to right

Attack modes can be:

0: Dictionary-Attack

1: Combinator-Attack, also used for attack-modes 6 and 7 since they share the same kernel

3: Mask-Attack

The Kernel-Accel is a multiplier to OpenCL's concept of a workitem, not the workitem count

The meaning of Kernel-Loops depends on the hash-type:

Slow Hash: Number of iterations calculated per workitem

Fast Hash: Number of mutations calculated per workitem

Neither of the two should be confused with the OpenCL concept of a "thread"; that one is maintained automatically

The Vector-Width can only have the values 1, 2, 4, 8 or 'N', where 'N' stands for native, which is a value queried from OpenCL

The Kernel-Accel is limited to 1024

The Kernel-Loops is limited to 1024

The Kernel-Accel can have 'A', where 'A' stands for autotune

The Kernel-Loops can have 'A', where 'A' stands for autotune
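To make the rules above concrete, here is a minimal Python sketch of how a single line of the tuning database could be checked. It is not oclHashcat's actual parser; the names `TuningEntry` and `parse_tuning_line` are made up for illustration, the lower bound of 1 for Kernel-Accel and Kernel-Loops is an assumption, and alias lines are not handled because their syntax is not described here.

```python
from typing import NamedTuple, Optional

VALID_VECTOR_WIDTHS = {"1", "2", "4", "8", "N"}  # 'N' = native
MAX_KERNEL_VALUE = 1024  # stated limit for both Kernel-Accel and Kernel-Loops


class TuningEntry(NamedTuple):
    """Hypothetical container for the six fields of one valid line."""
    device_name: str
    attack_mode: str
    hash_type: str
    vector_width: str
    kernel_accel: str
    kernel_loops: str


def _valid_kernel_value(field: str) -> bool:
    """Accept 'A' (autotune) or an integer up to 1024.

    The lower bound of 1 is an assumption, the rules only state the upper limit.
    """
    if field == "A":
        return True
    return field.isascii() and field.isdigit() and 1 <= int(field) <= MAX_KERNEL_VALUE


def parse_tuning_line(line: str) -> Optional[TuningEntry]:
    """Return a TuningEntry, or None for comment, blank and invalid lines."""
    line = line.rstrip("\r\n")
    # A line must not start with a space or a tab; '#' starts a comment line.
    if not line or line[0] in " \t#":
        return None
    fields = line.split()  # columns are separated by one or more spaces or tabs
    if len(fields) != 6:
        return None  # invalid lines are ignored
    entry = TuningEntry(*fields)
    if entry.vector_width not in VALID_VECTOR_WIDTHS:
        return None
    if not _valid_kernel_value(entry.kernel_accel):
        return None
    if not _valid_kernel_value(entry.kernel_loops):
        return None
    return entry
```

For example, `parse_tuning_line("Example_Device 3 0 4 A 256")` yields an entry whose Kernel-Accel is left to autotune, while a line with a leading space or a Vector-Width of 3 is dropped. `Example_Device` is a placeholder, not a real OpenCL Device-Name.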

Personal tuning settings

The tuning database can also be used to store your own preferred tuning settings. For example, if you want to go full power, you can simply add an entry like this:

* * * N 1024 1024

But you have to live with all the implications this brings: high power consumption, extreme heat development, restore checkpoints that are far apart, slow speed updates, a laggy desktop, etc. Generally this is not what you want.
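A catch-all entry like the one above only applies where nothing more specific matches. The following Python sketch shows one possible reading of the "local > global" and "left to right" rules: an exact value in a column beats a wildcard, and columns are compared in the order Device-Name, Attack-Mode, Hash-Type. That reading is an assumption and not taken from oclHashcat's source; the function names and `Example_Device` are made up for illustration.

```python
from typing import List, Optional, Tuple

# (device_name, attack_mode, hash_type, vector_width, kernel_accel, kernel_loops)
Entry = Tuple[str, str, str, str, str, str]


def normalize_attack_mode(attack_mode: int) -> str:
    """Attack-modes 6 and 7 share the kernel of mode 1, so they use its entries."""
    return "1" if attack_mode in (6, 7) else str(attack_mode)


def lookup(entries: List[Entry], device_name: str, attack_mode: int,
           hash_type: int) -> Optional[Tuple[str, str, str]]:
    """Return (vector_width, kernel_accel, kernel_loops), or None for autotune."""
    wanted = (
        device_name.replace(" ", "_"),  # spaces in the Device-Name become '_'
        normalize_attack_mode(attack_mode),
        str(hash_type),
    )
    best = None
    best_score = None
    for entry in entries:
        filters = entry[:3]
        # A filter column matches if it is a wildcard or equals the wanted value.
        if any(f != "*" and f != w for f, w in zip(filters, wanted)):
            continue
        # ASSUMPTION: exact beats wildcard, leftmost column weighs most.
        # Tuples of booleans compare lexicographically, which gives that order.
        score = tuple(f != "*" for f in filters)
        if best_score is None or score > best_score:
            best, best_score = entry, score
    return None if best is None else best[3:]


db = [
    ("*", "*", "*", "N", "1024", "1024"),
    ("Example_Device", "3", "0", "4", "A", "256"),
]
print(lookup(db, "Example Device", 3, 0))  # the specific entry wins
print(lookup(db, "Other Device", 0, 0))    # falls back to the catch-all entry
```

With an empty database the function returns None, which corresponds to the case where no filter matches and autotune is used.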

It makes much more sense to fine-tune the settings. To give you an idea of how to do it, here's how I do it:

Preparation

set your fanspeed to 100% (if applicable)

set your power limit to 100% (if applicable)

set your core clock to stock settings

set your memory clock to stock settings

for every run, give it time to settle down, that is, until it seems to have reached a speed that doesn't increase anymore

use a single hash for testing; if you need an example hash for the different algorithms, see the example hashes page in the wiki
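If you write down the speed at regular intervals during a run, "settled" can be made a bit less subjective. The following Python sketch is only one way to express it: it compares the best speed in the most recent samples with the best speed in the samples just before them. The function name, the window size and the 1% tolerance are arbitrary choices for illustration and have nothing to do with oclHashcat itself.

```python
from typing import Sequence


def has_settled(speeds: Sequence[float], window: int = 5,
                tolerance: float = 0.01) -> bool:
    """Return True once the speed has stopped increasing.

    speeds    -- speed samples in the order they were observed
    window    -- number of samples per comparison block (arbitrary choice)
    tolerance -- relative gain that still counts as "no increase" (arbitrary)
    """
    if window < 1 or len(speeds) < 2 * window:
        return False  # not enough samples to compare two blocks yet
    previous = max(speeds[-2 * window:-window])
    recent = max(speeds[-window:])
    return recent <= previous * (1.0 + tolerance)
```

While the speed is still ramping up, the most recent block is clearly faster than the one before it and the function returns False; once both blocks show about the same top speed, it returns True and the value can be noted for that run.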