Some days have passed, and the Debian mirrors have now picked up the CDK package (unstable only so far), allowing you to sudo aptitude install libcdk-java from your favorite local mirror. The details are available from this packages.debian.org/libcdk-java page. The fact that it is listed as contrib is a small mistake; the package is really main material.

Now, also make sure to install BeanShell (sudo aptitude install bsh), which allows you to start scripting the CDK. For example, consider this simple script:

A wrapper script cdkbsh that adds the CLASSPATH seems desirable here :) But you get the point.

Interestingly, BeanShell comes with a graphical user interface as well as a command-line scripting environment. Both make perfect setups for quickly testing some code. The GUI version, xbsh, looks like this (don't forget to set the CLASSPATH):

Wednesday, February 20, 2008

Michael Koch (aka man-di) and Daniel Leidert (as part of the pkg-java team) have worked on packaging the CDK. They ran into some issues, such as the CDK build system not being perfectly compatible with the Debian Java libraries in /usr/share/java. Both detecting the available libraries and putting them on the classpath caused trouble with the CDBS-based build system wrapping around the Ant build.xml (note the many commits this weekend ;).

The result is noteworthy: CDK has entered the Debian NEW queue. This means that the Debian experts will check that CDK is really ready to enter Debian. Licenses will be checked, for example. This has been one of my long-standing wishes, and I am happy that Michael got around to getting things done. Cheers!

To report a problem in CrystalEye, simply bookmark an example of the problem with the tag “crystaleyeproblem”, using the Description field to describe the problem. All the problems will appear on the tag feed.

When we fix the problem we’ll add the tag “crystaleyefixed” to the same bookmark. If you subscribe to this feed, you’ll know to remove the crystaleyeproblem tag.
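The tag convention above boils down to a simple rule: a bookmark is an open problem while it carries "crystaleyeproblem" but not yet "crystaleyefixed". A minimal sketch of that rule follows, assuming the bookmarks have already been fetched into a map from URL to tags; the class name, method name, and example URLs are hypothetical and not part of CrystalEye or Connotea.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OpenProblems {

    // Returns the URLs tagged as a problem that have not been tagged as fixed.
    public static List<String> open(Map<String, Set<String>> tagsByUrl) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : tagsByUrl.entrySet()) {
            Set<String> tags = entry.getValue();
            if (tags.contains("crystaleyeproblem") && !tags.contains("crystaleyefixed")) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical bookmarks; real data would come from the tag feed.
        Map<String, Set<String>> bookmarks = new LinkedHashMap<>();
        bookmarks.put("http://example.org/structure/1", Set.of("crystaleyeproblem"));
        bookmarks.put("http://example.org/structure/2",
                Set.of("crystaleyeproblem", "crystaleyefixed"));
        for (String url : open(bookmarks)) {
            System.out.println(url);
        }
    }
}
```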

In the fullness of time, we’re planning to use connotea tags to annotate structures where full processing hasn’t been possible (uncalculatable bond orders, charges etc).

Now, Connotea is advertised as a "[f]ree online reference management for all researchers, clinicians and scientists", and I have never really been happy with arbitrary HTML pages ending up in the system. I would counter the suggestion by using a social bookmarking website meant for any HTML page (not just publications), such as Del.icio.us (see their list of CrystalEye bookmarks).

Anyway, it does not really matter, and Connotea has an open API to query the database. This will allow Jim to write a simple userscript to enhance each CrystalEye page with a list of bug reports. That will allow every CrystalEye visitor to see what others are commenting on it. In that respect, many other things can be envisioned... Getting comments on the paper behind the crystal structure from Chemical blogspace and Postgenomic, ...

Tuesday, February 05, 2008

Mathieu Fourment et al. just published a paper in BMC Bioinformatics on performance testing of six programming languages: A comparison of common programming languages used in bioinformatics (doi:10.1186/1471-2105-9-82). The figure below is from the paper, for a sequence alignment exercise (copyright with paper authors, OpenAccess license of journal):

Nothing shocking, I'd say; Java is similar in performance to C++.

What I'd love to have seen is the performance of compiled Java too, using the Java compiler (gcj) that comes with GCC 4.1.1. No idea why that was left out. One could also question why they did not use the 1.6 JVM of Sun, which is faster (see these results on running the CDK unit tests). And a major omission is Fortran.

Anyway, the authors provide the source code, so we can easily test the effects of that ourselves.

Saturday, February 02, 2008

Today, Miguel (who made the 10000th CDK commit) and I gave LaunchPad a go, because it offers a nice GUI for planning and monitoring source code development. We have set up a CDK team and a CDK project. LaunchPad has overlap with SourceForge functionality, but the idea is not to duplicate functionality. Moreover, we do not translate the CDK, so that LaunchPad functionality is not useful either. Not for the CDK at least; maybe for Jmol and Bioclipse?

However, we are interested in the task management system of LaunchPad. While the CDK project currently maintains a Project Maintenance Tasks tracker, it does not have the feature richness of the LaunchPad equivalent. The latter allows us to link tasks with series goals. We currently have basically two series: the cdk1.0.x/ branch, and trunk. Miguel and I have been working on getting the ionization potential prediction in trunk working, which involves about all the code Miguel wrote during his PhD thesis with Christoph. And this is one of the goals of the next stable CDK series (replacing the 1.0.x series). This is something we can easily define in LaunchPad:

Getting the IP-prediction code updated for the new CDK atom types and other changes, and making it CDK stable, involved quite a long list of tasks with dependencies between them. For example, I can't continue cleaning up the partial charge prediction code before the resonance structure generator in the reaction module is working properly again. This in turn depends on me adding missing radical and charge atom types, which in turn depends on expected atom types, which Miguel had to implement. And this last is actually what he was committing around the 10000th commit.
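To make the dependency chain concrete, here is a minimal sketch that orders these tasks so that each one comes after everything it depends on (a depth-first topological sort). It only illustrates the idea and is not how LaunchPad works internally; the class name, method names, and task labels are my own shorthand.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TaskOrder {

    // Maps each task to the tasks it depends on (labels are shorthand for the tasks in the text).
    public static Map<String, List<String>> exampleTasks() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("clean up partial charge prediction",
                List.of("fix resonance structure generator"));
        deps.put("fix resonance structure generator",
                List.of("add radical and charge atom types"));
        deps.put("add radical and charge atom types",
                List.of("implement expected atom types"));
        deps.put("implement expected atom types", List.of());
        return deps;
    }

    // Depth-first topological sort: a task is emitted only after all of its dependencies.
    public static List<String> order(Map<String, List<String>> deps) {
        Set<String> done = new LinkedHashSet<>();
        Set<String> inProgress = new LinkedHashSet<>();
        for (String task : deps.keySet()) {
            visit(task, deps, done, inProgress);
        }
        return new ArrayList<>(done);
    }

    private static void visit(String task, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress) {
        if (done.contains(task)) {
            return;
        }
        // Seeing a task again while it is still being visited means a cycle.
        if (!inProgress.add(task)) {
            throw new IllegalStateException("Circular dependency at: " + task);
        }
        for (String dep : deps.getOrDefault(task, List.of())) {
            visit(dep, deps, done, inProgress);
        }
        inProgress.remove(task);
        done.add(task);
    }

    public static void main(String[] args) {
        for (String task : order(exampleTasks())) {
            System.out.println(task);
        }
    }
}
```

Running it lists the expected atom types first and the partial charge clean-up last, which matches the order in which we have to work through them.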

Now, Miguel and I will try to manage this development in trunk using LaunchPad. It allows us to define all these smaller tasks, but, more importantly, the dependencies between them:

As such, LaunchPad gives us the means to manage this complex development. It shows what we're facing, how far we have progressed, and much, much more:

This goes well beyond what SourceForge has to offer; this will be an interesting experiment. I do not anticipate dropping SourceForge at all (just in case you were wondering...); they have served us generally very, very well, and completely free too! (LaunchPad is free too.) As far as I can see, they form a perfect complement. Like a ligand and an enzyme, like opensource and open notebook science, or like a Mammoth and an ice field.

Speaking about ONS... Jean-Claude, not sure if LaunchPad would be open to projects without source code too...

This blog deals with chemblaics in the broader sense. Chemblaics (pronounced chem-bla-ics) is the science that uses computers to solve problems in chemistry, biochemistry and related fields. The big difference between chemblaics and areas such as chem(o)?informatics, chemometrics, computational chemistry, etc, is that chemblaics only uses open source software, open data, and open standards, making experimental results reproducible and validatable. And this is a big difference!

About Me

Assistant professor at the Dept of Bioinformatics - BiGCaT at NUTRIM, Maastricht University, studying biology at an unsupervised and atomic level. Open Science is my main hobby resulting in participation in, among many others, Bioclipse, CDK and WikiPathways. ORCID:0000-0001-7542-0286. Posts on G+ are personal.
