Non-reproducible results with MapleCOMSPS

I’ve been reading through the source code of the 2016 SAT Competition Main Track winner, MapleCOMSPS_DRUP, and I found that it has an important piece of code, regulating its behaviour that depends on timing:

C++

1

2

3

4

5

6

7

staticboolswitch_mode=false;

staticvoidSIGALRM_switch(intsignum){switch_mode=true;}

lbool Solver::solve_()

{

signal(SIGALRM,SIGALRM_switch);

alarm(2500);

This makes it difficult or potentially impossible to reproduce results. If you are unfamiliar with SAT solvers, they are difficult to understand and debug because their behaviour is randomised by the heuristic algorithmic choices which are influenced by essentially every solving parameter and every line of the input. However, with careful coding and the same parameters and input file, they run the same way. Unfortunately, that’s not always the case for this solver. Although the authors of the solver have fixed this, and you can download their fixed solvers here, I would like to talk about this a bit longer.

Usability issues

Without reproducibility, it can be hard to check results with the solver. One could have found a solution to an instance and if it doesn’t get solved on another try, it is most likely due to the timing of that alert(), which is something neither the original designers intended, nor the person trying to reproduce the results wishes to deal with.

An industrial system designer would have trouble choosing systems that don’t behave in a deterministic manner, even if they are the fastest. Debugging such systems could become very difficult. For example, if there are multiple solutions, such a solver, when unpatched, may return different solutions on different runs, with possibly different runtimes. Industrial systems often contain millions of lines of code, where a complex bugs may need to be hunted down on a regular basis — non-reproducibility could seriously hinder such an efforts.

Fixes

First of all, I think it’s a healthy sign that the authors recognised the issue and fixed it for the next iteration. It’s a shame, however, that this bug could slip through. I actually believed that there was a requirement that the solving must be reproducible. I believe such checks should be performed before the competition begins so the authors could fix their code before they win the competition with a bug that they didn’t intend to have.

It is important to have these kind of bugs being eliminated. Note that Microsoft spent a non-negligible amount of time and effort to make parallel SAT solving reproducible. I imagine the pressure they had from their industrial use-cases to have been considerable for such an effort to have been undertaken.

A potential fix for the future

The authors have fixed this issue for 2017. However, the winner of the 2017 SAT Competition, Maple_LCM_Dist, has the same bug as above since that team used the original, winning code without checking or consulting with the original team. So the new winner produces an output that may be non-reproducible, like last year. I am hopeful that this bug will eventually get removed, but it would have been nicer to have a pre-run of the competition that checks for such issues and gives some time for the authors to fix the potential non-reproducibility issues.