Deterministic behavior of CPLEX: ticks or seconds?

If you are a longtime user of CPLEX, you have probably noticed that recent versions report a number of “ticks” alongside the wall clock time. According to IBM, ticks are a computer-independent measure of how much algorithmic work is required to obtain a provable optimum. I decided to verify this claim and investigate whether ticks are a suitable metric for assessing the amount of work required to solve a model.

Experiment

I took a set of 100 randomly generated capacitated facility location models of relatively large size (between 250 and 400 facilities, and between 400 and 1000 customers). The models take between 2 and 240 seconds to solve. I ran each of these models 5 times with CPLEX 12.5 on six different computers with multi-core CPUs (2 with 32-bit architecture and 4 with 64-bit architecture). The wall clock time and the number of ticks required to solve each model were recorded. The idea is to assess whether:

CPLEX will yield the same tick count when run multiple times on the same computer;

CPLEX will yield the same tick count when run on different computers.

Results

The first hypothesis proved to be absolutely true. While solution times varied slightly between runs (a coefficient of variation of about 2%), running a given model on a given computer with the same parameters always yielded the same tick count. 3000 replications (100 models, 5 runs each, on 6 computers) is enough proof for me.
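The check behind this first result can be sketched in a few lines of Python. This is a minimal illustration rather than the script used for the experiment: the function name `summarize_runs` and the sample numbers are made up, and tick counts are compared exactly as reported.

```python
from statistics import mean, stdev


def summarize_runs(wall_times, tick_counts):
    """Summarize repeated runs of one model on one computer.

    Returns (cv, deterministic): cv is the coefficient of variation of the
    wall clock times, deterministic is True when every run reported the
    same tick count.
    """
    if len(wall_times) < 2 or len(wall_times) != len(tick_counts):
        raise ValueError("need at least two runs, with one tick count per run")
    cv = stdev(wall_times) / mean(wall_times)
    deterministic = len(set(tick_counts)) == 1
    return cv, deterministic


# Hypothetical data: five runs of a single model on a single machine.
wall = [12.1, 12.4, 11.9, 12.3, 12.0]        # seconds
ticks = [101.4, 101.4, 101.4, 101.4, 101.4]  # ticks as reported in the log

cv, same_ticks = summarize_runs(wall, ticks)
print(f"CV of wall clock time: {cv:.1%}, identical ticks: {same_ticks}")
```

Repeating this for every model and computer pair gives one coefficient of variation and one yes/no answer per pair.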

The tick count is also the same for a given processor architecture. This means that the 64-bit installation of CPLEX, when run on a 64-bit CPU, will always yield the same tick count (that’s good!). The same is true for the 32-bit version on 32-bit CPUs. However, tick counts often differ between the 32-bit and 64-bit versions. Most of the time, the four 64-bit computers will deliver the same tick count (say, 101.4) and the two 32-bit computers will deliver another count. On average, the 32-bit version uses 1.43% fewer ticks. The graph below plots the proportion of models whose variation between 32-bit and 64-bit is higher than a certain threshold (in %). For instance, 17% of models had a relative tick count gap of more than 5%.
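For readers who want to reproduce this kind of curve, here is a small Python sketch of the computation. It rests on assumptions: the relative gap is taken as the absolute difference divided by the 64-bit count (the post does not say which count was the reference), and the function names and tick counts are hypothetical.

```python
def relative_gap(ticks_32, ticks_64):
    """Relative tick gap in percent, using the 64-bit count as reference
    (an assumption; the reference count is not specified in the post)."""
    return abs(ticks_32 - ticks_64) / ticks_64 * 100.0


def share_above(gaps, threshold):
    """Proportion of models whose gap is strictly above threshold (in %)."""
    return sum(g > threshold for g in gaps) / len(gaps)


# Hypothetical (32-bit, 64-bit) tick counts for four models.
pairs = [(100.0, 101.4), (250.0, 250.0), (85.0, 100.0), (530.0, 500.0)]
gaps = [relative_gap(t32, t64) for t32, t64 in pairs]

# One point of the curve per threshold.
for threshold in (1, 5, 10):
    print(f"gap > {threshold}%: {share_above(gaps, threshold):.0%} of models")
```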

Discussion

Ticks are indeed a very promising metric for assessing the amount of algorithmic work required to solve a model to provable optimality. It is also interesting to see that while the laptop computers yielded a high number of ticks per second for models that were solved quickly, their performance declined as the solution time increased, whereas computers with server processors (Xeon), both old and new, were more stable.

That being said, while ticks may be a relevant metric, they are a bit too obscure for an end user to interact with. That user’s answer may well look like: “Well, you can tick as much as you like but I want that answer within two minutes”. Clock time and ticks are both relevant metrics for different reasons and, sometimes, to different people.

Comments

Very interesting experiment! What is the definition of a “tick”, and what does it count? By the way, for laptops I have seen variations in performance depending on the battery status, whether the laptop is plugged in or not, and the power consumption plan.

The answers to your two experiments are “yes” for the first, and “no” for the second.

It is very typical for CPLEX workloads that the bottleneck for computations is the memory bus. If you have a cache miss, then the CPU has to wait a very long time to load the required data. Consequently, the deterministic ticks that CPLEX measures are basically the number of memory accesses that CPLEX performs (as a proxy for the number of cache misses). Yes: we have instrumented all of our code to count or estimate memory accesses. It was a lot of work but a big success, as this not only enables the deterministic time limit and deterministic parameter tuning (i.e., you tune parameters for a given set of models, and when you do the same again you will get exactly the same recommendations as in the first tuning session), but also lots of under-the-hood performance improvements.

For the deterministic time this means that two runs using the same CPLEX binary with the same settings and the same data will produce the same deterministic time (note that if the “threads” parameter is still set to 0, the machines must also have the same number of cores). This is independent of the load of the machine. If you have other jobs running in the background, the wall-clock run-time can be significantly higher, but the deterministic time will not change. This is very useful for computational experiments, because it means that it is no longer necessary to have exclusive access to the machine. Moreover, it makes it possible to conduct long-running automated tuning sessions on machines that are used for other tasks as well.

The fact that the wall-clock run-time of CPLEX is mostly determined by the number of cache misses means that the deterministic time is a very good proxy for wall-clock time. The ratio of deterministic time to wall-clock time depends of course on the hardware, but it is not very dependent on the input data. This makes it possible to set reasonable deterministic time limits: just measure the deterministic/wall-clock time ratio on your machine for some reasonable CPLEX workloads, and from then on obtain a deterministic time limit by multiplying your wall-clock time limit by the ratio that you observed. As a consequence, CPLEX users are able to produce algorithms that use time-limited optimization runs as a sub-procedure and that will still show deterministic behavior. And when those algorithms are run on faster hardware (of the same architecture), the algorithm will behave identically, except that it will run faster.
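The calibration described above amounts to simple arithmetic, sketched here in Python. Everything in it is illustrative: the function names and the calibration numbers are hypothetical, the ratio is pooled over all runs (total ticks over total seconds) as one possible choice, and no CPLEX call is made; the resulting value would be supplied to CPLEX's deterministic time limit setting.

```python
def ticks_per_second(samples):
    """Estimate the deterministic/wall-clock ratio from calibration runs.

    samples is a list of (ticks, seconds) pairs measured on one machine.
    The ratio is pooled over all runs, so longer runs weigh more.
    """
    total_ticks = sum(ticks for ticks, _ in samples)
    total_seconds = sum(seconds for _, seconds in samples)
    if total_seconds <= 0:
        raise ValueError("calibration runs must take a positive amount of time")
    return total_ticks / total_seconds


def deterministic_limit(wall_limit_seconds, ratio):
    """Convert a wall-clock time limit into a deterministic limit in ticks."""
    return wall_limit_seconds * ratio


# Hypothetical calibration runs: (ticks, wall-clock seconds).
samples = [(5200.0, 10.0), (24500.0, 50.0), (61000.0, 120.0)]

ratio = ticks_per_second(samples)
limit = deterministic_limit(120.0, ratio)  # "I want that answer within two minutes"
print(f"{ratio:.1f} ticks/s, deterministic limit of {limit:.0f} ticks")
```

Because the ratio is hardware-dependent, it has to be measured again on each machine where the limit is meant to approximate a given wall-clock budget.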

Trackbacks

[…] as well as the nodes explored and the amount of ticks reported by CPLEX. (You can learn a bit about ticks here). Machines 1 and 2 were bought in 2008 and 2009, respectively, while machine 4 has about 3.5 […]

[…] run multiple times. CPLEX now proposes two parallel operation modes, a deterministic mode that guarantees similar tick counts (resulting in similar wall clock times) for runs on a given machine, and an opportunistic mode. […]