Symbooglix ICST2016 Appendix

This page contains extra data that could not fit into the ICST2016
page limit that some may find useful and is noted as being available
as "on the project website ".

Run of Symbooglix snapshots on all benchmarks

To optimise Symbooglix we ran it on a randomly selected
subset (10%) of each benchmark suite (SV-COMP and GPU) which we refer
to as the "training set". We then
optimised Symbooglix using only this "training set" taking
periodic snapshots of the tool and running it on those bencharks.

In the paper we provided "quantile plots" for each benchmark
suite where we ran each Symbooglix snapshot on the "training set".
We now show these plots along side quanitle plots where those snapshots
have been run on all the benchmarks. This was done after we had performed
optimisation so we did not have access to this data during our optimisation
process to guide our optimisations.

Result type counts for GPU

Here are the classification counts when running on the training set.

Snapshot

Verified

Bug found

False alarm

Unknown

Baseline

33

3

0

21

GlobalDDE

33

3

0

21

GotoAssumeLA

33

3

0

21

ExprSimpl

33

3

0

21

ConstrIndep

33

3

0

21

RemSomeRecur

37

3

0

17

RemSomeDbg

37

3

0

17

MapConstIdx

37

3

0

17

MapSymIdx

37

3

0

17

EffcntClone

37

3

0

17

SimplSolv

37

3

0

17

Here are the classification counts when running on the whole GPU benchmark suite.

Snapshot

Verified

Bug found

False alarm

Unknown

Baseline

260

29

0

290

GlobalDDE

260

29

0

290

GotoAssumeLA

261

31

0

287

ExprSimpl

268

31

0

280

ConstrIndep

266

31

0

282

RemSomeRecur

301

35

0

243

RemSomeDbg

301

35

0

243

MapConstIdx

303

35

0

241

MapSymIdx

302

35

0

242

EffcntClone

300

35

0

244

SimplSolv

301

35

0

243

Looking at the table above a few interesting things can be observed.

Between ConstrIndep and ExprSimpl a regression occurred where two less
benchmkars were verified
(CUDA50/6_Advanced/mergeSort/mergeRanksAndIndicesKernel.bpl
and polybench/stencils/jacobi-1d-imper/kernel0.bpl). These
benchmarks weren't in the training set. It is unclear whether these
are significant though as the two benchmarks that no longer verified
with ExprSimpl were very close to the timeout in the ConstrIndep
snapshot. Those two benchmarks in the next snapshot (RemSomeRecur)
verified in under 22 seconds.

Between MapConstIdx and MapSymIdx one less benchmark (shoc/s3d/qssab/kernel.bpl)
was verified. In the ``MapConstIdx`` this benchmark verified very close to the timeout so it
is likely that the added overhead pushed this over the timeout.

Between MapSymIdx and EffcntClone two less benchmarks (gpgpu-sim_ispass2009/RAY/renderPixel.bpl
and shoc/devicememory/readGlobalMemoryCoalesced/kernel.bpl) verified. This is a significant
regression the former used to verify in ~14 seconds and in the newer snapshot hits a timeout and for the
later it used to verify in ~610 seconds and in the newer snapshot hits a timeout.

The results of the final snapshot don't precisely match the results used for the tool comparision in
the paper. The exact cause of this is unknown and is being investigated.

Result type counts for SVCOMP

Here are the classification counts when running on the training set.

Snapshot

Verified

Bug found

False alarm

Unknown

Baseline

0

0

0

374

GlobalDDE

9

35

0

330

GotoAssumeLA

9

36

0

329

ExprSimpl

14

37

0

323

ConstrIndep

15

37

0

322

RemSomeRecur

15

39

0

320

RemSomeDbg

15

39

0

320

MapConstIdx

15

38

0

321

MapSymIdx

17

40

0

317

EffcntClone

17

40

0

317

SimplSolv

17

40

0

317

One observation about the above table is that between MapConstIdx and RemSomeDbg one
less bug was found. This was the array-examples/standard_seq_init_true-unreach-call_ground.i_.bpl
benchmark. In RemSombDbg the bug was found very close to the timeout and so the either
result inconsistency (due to running on our compute cluster) or slight overhead added
by MapConstIdx lead to this benchmark timing out for MapConstIdx.

Here are the classification counts when running on the whole SVCOMP benchmark suite.

Snapshot

Verified

Bug found

False alarm

Unknown

Baseline

0

0

0

3749

GlobalDDE

141

339

0

3269

GotoAssumeLA

141

363

0

3245

ExprSimpl

200

390

0

3159

ConstrIndep

204

392

0

3153

RemSomeRecur

204

393

0

3152

RemSomeDbg

204

393

0

3152

MapConstIdx

204

393

0

3152

MapSymIdx

236

395

0

3118

EffcntClone

236

395

0

3118

SimplSolv

236

394

0

3119

A few observations about the above table:

Between EffcntClone and SmplSolv a regression ocurred where one less bug was found. This was the
ldv-consumption/32_7a_cilled_false-unreach-call_linux-3.8-rc1-32_7a-fs--ecryptfs--ecryptfs.ko-ldv_main1_sequence_infinite_withcheck_stateful.cil.out.c_.bpl benchmark. The bug was previously in aprroximately 450 seconds but for a SmplSolv a timeout was hit.

The results of the final snapshot don't precisely match the results used for the tool comparision in
the paper. The exact cause of this is unknown and is being investigated.

Scatter plot of Symbooglix vs Duality

Below is a scatter plot of bug finding times. Each point is a single
benchmark in the SV-COMP suite and its position shows the execution time
of Symbooglix and Duality for that benchmark. For 236 benchmarks
Symbooglix was faster than Duality and for 350 benchmarks Duality was
faster than Symbooglix.