Saturday, July 26, 2008

I have a problem with ranking tests at work - it just takes too long, and the 'obvious algorithm' should be much quicker.

The Task

I should explain that we use Modelsim for simulating a processor described in Verilog RTL, and create PSL assertions and cover statements to help with the verification. Periodically we run coverage simulations and create a coverage database for each test that contains the coverage information for each PSL cover or assert statement.

Our test suite consists of ~2000 'frozen' randomly generated tests with seeds we have used before and want to use again, because they give good coverage; and a few thousand newly randomly generated tests.

We want to continue keeping tests that worked well in the past, and add to the kept set of tests any new tests that contribute to the overall coverage of the design.

The Problem

We have a problem, however, with the Modelsim 'vcover ranktest' command that we use to perform the ranking: it takes around ten hours to rank ~5000 tests when we were hoping for around one hour.

Around 5000 coverage databases to rank, each with around 5000 coverage points (of which fewer than half are covered), shouldn't be that big a deal.

Initial Thoughts

I came up with the following algorithm that I thought would do a good job of ranking tests - it gets around the size problem of trying every possible permutation of tests for coverage by being greedy: choose the test giving the most coverage, then the test giving the most incremental additional coverage, and so on. I put it in an email as:

My algorithm for Ranking tests by coverage.

Read in and convert each test's coverage data into the set of coverage points that the test contributes to.

Sum the contributions of all tests marked as compulsory to form the ranking set, and remove those tests from the pool.

Keep note of the tests added to the ranking set and the order in which they are added. (For compulsory tests, the order within the compulsory tests is unimportant; they all come 'first'.)

Repeat until pool is empty:

    pool_max = (0, "")
    for test in pool:
        contribution = number of cover points in test not already covered in ranking_set
        if contribution == 0:
            drop test from pool into the non-contributory list
        else if contribution > pool_max.contribution:
            pool_max = (contribution, test)
    if pool_max.contribution > 0:
        add the pool_max test into the ranking set
        remove the pool_max test from the pool
    else:
        empty everything from the pool into the non-contributory list (they don't add anything)

You end with:

An ordered list of the ranking tests.

An (ordered) list of the non-contributing tests.

All tests marked as compulsory are in the ranking tests.
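The email pseudocode above translates fairly directly into Python, with each test's coverage held as a set of points. This is only a sketch of the approach; the test names and coverage data in the usage example are made up, whereas the real sets would come from the UCDB report files.

```python
def rank_tests(pool, compulsory=()):
    """Greedily rank tests by incremental coverage.

    pool       -- dict mapping test name -> set of coverage points hit
    compulsory -- names of tests that are kept regardless of contribution
    Returns (ranking, non_contributory): ordered lists of test names.
    """
    pool = dict(pool)                     # don't mutate the caller's data
    ranking, covered, dropped = [], set(), []
    for test in compulsory:               # compulsory tests all come 'first'
        covered |= pool.pop(test)
        ranking.append(test)
    while pool:
        best, best_gain = None, 0
        for test in list(pool):           # list() lets us delete while scanning
            gain = len(pool[test] - covered)
            if gain == 0:
                dropped.append(test)      # drop non-contributors ASAP
                del pool[test]
            elif gain > best_gain:
                best, best_gain = test, gain
        if best is None:                  # nothing left contributes
            break
        covered |= pool.pop(best)
        ranking.append(best)
    return ranking, dropped

# Made-up example: t3's points are already covered by t1, so it drops out.
tests = {'t1': {1, 2, 3}, 't2': {3, 4}, 't3': {1, 2}, 't4': {5}}
print(rank_tests(tests))  # (['t1', 't2', 't4'], ['t3'])
```

Dropping zero-contribution tests inside the scan is what keeps the later passes short: the pool only ever shrinks.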

By only comparing against the sum of the already contributing tests, and removing non-contributing tests ASAP, I hope to get reasonable run times.

I'd write it myself if there were a Python interface to UCDB files. I won't touch it with C though.

Scratching the Itch

I did decide to create my own ranking program last night.

The flow was to dump the PSL assertion coverage from Modelsim's Universal Coverage Database (UCDB) files: a total of 1701 UCDB files generating 1701 separate textual report files. Then I would create a Python script, broadly following the algorithm above, to create a ranked list of tests from the report files.

Dumping the data from the UCDBs to reports took ten minutes to run.

A little experimentation led me to read in the test names and instance names for coverage points and substitute unique integers for them as soon as possible in the program, as I had memory problems trying to create 25 million sets whose members were long instance names of around 130 characters each. I also found ways to remove tests from the ranking just as soon as they contributed nothing.
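The name-to-integer substitution can be done with a small interning helper: a dict that hands out the next small integer for each new name, so the millions of set members are ints rather than ~130-character strings. A minimal sketch, with made-up instance names:

```python
def make_interner():
    """Return a function mapping each distinct name to a small unique int."""
    ids = {}
    def intern_name(name):
        # setdefault inserts len(ids) only the first time a name is seen
        return ids.setdefault(name, len(ids))
    return intern_name

# Hypothetical PSL coverage-point instance names, for illustration only.
point_id = make_interner()
a = point_id('/top/core/alu/psl_assert_overflow')
b = point_id('/top/core/lsu/psl_cover_misalign')
assert (a, b) == (0, 1)
assert point_id('/top/core/alu/psl_assert_overflow') == 0  # same name, same id
```

Sets of small ints are both far smaller and faster to difference than sets of long strings, which is what the inner ranking loop spends its time doing.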

My program is 120 lines of Python and took a minute to run!

So Fast. Must Check.

I did some checks against the 'vcover ranktest' results:

One gives an extra 39 new contributing tests, the other an extra 40.

Of the extra tests, 30 are the same in both programs.

Now I need to get my team-mates to check the Python code.

I would expect some variation in the ranked tests as, for example, if two tests are equally contributing the most new coverage, how do you choose? I chose to select first on maximum contribution to coverage; then on minimum simulation cycles; then on test name. Once the algorithms make a different choice there is no mechanism to make them converge.
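That tie-break rule can be written as a single sort key. A sketch, with invented contribution and cycle figures (the field layout here is an assumption, not the program's actual data structure):

```python
# Candidate tests as (name, contribution, simulation_cycles) tuples.
# Values are illustrative only.
candidates = [
    ('test_b', 120, 50_000),
    ('test_a', 120, 50_000),
    ('test_c', 120, 42_000),
]

# Max contribution first, then fewest cycles, then name:
best = min(candidates, key=lambda t: (-t[1], t[2], t[0]))
print(best)  # ('test_c', 120, 42000) -- fewest cycles wins the tie
```

Any deterministic key like this makes a single program's ranking reproducible, but two programs with different keys will still diverge after their first differing pick.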

Endpoint

My flow gets the data from the database at the beginning, then uses it in an efficient manner. If Modelsim's algorithm constantly goes back to the UCDB for data then that could be the source of the bottleneck, as just getting the data once takes ten minutes.