Pages

Sunday, July 22, 2012

CACE: computer-aided code evaluation of CDK code

Computer-aided code evaluation (CACE) is an important part of scientific code development projects. There are many ways to do peer-review of source code (Maven, Gerrit, ...), and I won't go into details here. Instead, I focus on CDK's Nightly build system.

Nightly reports
Making sure the source code compiles is one of the most basic requirements. Given it compiles, we get a full report with a log of information:

The lead of the report contains useful links to a precompiled binary Java ARchive (jar file), a link to the latest git commit, source code, and the JavaDoc. Also very useful is the keyword list, which acts as an index to CDK functionality, using @cdk.keyword hashtags in the class JavaDoc.

Unit testing

Below the horizontal bar are the code evaluation reports. First, are the results for the unit tests (for which JUnit is used):

In the middle we get the JUnit test results for each module separately, and behind the 'Stable' link there is a summary, giving a quick glance at all modules:

Full reports are again available for individual reports, but we all get statistics per module on the number of unit tests run, the number of fails and errors, and the number of methods not tested.

JavaDoc quality

For JavaDoc we also run evaluations. For this, we use OpenJavaDocCheck for which too alternative solutions are available, I learned later. The front page section of Nightly looks like:

The summery is quite like that of the unit testing, and a single report for a module looks like this:

Many of these tests are general for JavaDoc, but we also have CDK-specific tests, such as shown below (along with the summary down the bottom of the page):

There is a lot of small code fixes for those who like to contribute to the CDK project, and like to learn git skills along the way.

Code evaluation

We use PMD for general code evaluation. which is most useful in computer-aided code evaluation. It often highlights the more interesting bits of code, and importantly, those code bits where errors may occur. Another set of tests involve tests for code readability, which is very important too, allowing your peers to review your code more efficiently. The Nighlty front page looks for PMD very much the same as for the other parts:

For example, we get warnings like these:

We here get reported about various things. For example, about short variable names, like 'st'. Really short variable names often make it harder to read the code, because they are less informative. Is 'ac' refering to the old or the new atom container?

We also get a warning about incorrect use of the StringBuffer.append() method, indicated where we can improve the code (making it faster in this case). We also see a CDK-specific test here (sources are here), warning us about a bad practice: interfaces should take data model interfaces, rather than implementations.

Conclusion

As will be clear, the Nightly reports provide a wealth of information helping code review. I hope this post has popularized this useful resource a bit more, and I invite you to visit it frequently. For example, it is a useful too to validate your own code before you send it for review. For the latter it is useful to know you do not have to install a full Nightly to do this. Mind you, for largest patch writing efforts, we can set up a Nightly crontab on a specific branch, as we have done frequently before.

But you can also run these code evaluations from the command line with:

$ ant clean dist-all test-dist-all jarTestdata

$ ant -Dmodule=io qa-module

This will run the JavaDoc, JUnit, and PMD tests, and store the results in the reports/ subfolder.

Search This Blog

This blog deals with chemblaics in the broader sense. Chemblaics (pronounced chem-bla-ics) is the science that uses computers to solve problems in chemistry, biochemistry and related fields. The big difference between chemblaics and areas such as chem(o)?informatics, chemometrics, computational chemistry, etc, is that chemblaics only uses open source software, open data, and open standards, making experimental results reproducible and validatable. And this is a big difference!

About Me

Assistant professor at the Dept of Bioinformatics - BiGCaT at NUTRIM, Maastricht University, studying biology at an unsupervised and atomic level. Open Science is my main hobby resulting in participation in, among many others, Bioclipse, CDK and WikiPathways. ORCID:0000-0001-7542-0286. Posts on G+ are personal.

Cookies

In the EU there is a directive upcoming requiring websites to warn people about HTTP cookies. This website uses the Blogger.com platform, Google Adsense (not that is it actually paying anything significantly), and a few scripts to count how often a blog post was tweeted, using Topsy and LinkedIn. These services undoubtedly make use of cookies, which you can disallow in your browser.