The JavaDoc documentation notes that missing @param and @return values are inherited implicitly since JavaDoc 1.3, but I have never noticed this in Java6. The above explicit markup is confirmed to work.
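For readers who want to make the inheritance explicit, here is a minimal sketch. It assumes the `{@inheritDoc}` inline tag is the kind of markup meant; `Shape` and `Square` are made-up names for illustration only.

```java
/** Hypothetical interface carrying the original documentation. */
interface Shape {
    /**
     * Scales this shape.
     *
     * @param factor the factor to scale each side by
     * @return the area after scaling
     */
    double scale(double factor);
}

/** Hypothetical implementation that pulls in the documentation explicitly. */
class Square implements Shape {
    private double side;

    Square(double side) {
        this.side = side;
    }

    /**
     * {@inheritDoc}
     *
     * @param factor {@inheritDoc}
     * @return {@inheritDoc}
     */
    @Override
    public double scale(double factor) {
        side *= factor;
        return side * side;
    }
}
```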

The workflow paradigm allows scientists to flexibly create generic workflows using different kinds of data sources, filters and algorithms, which can later be adapted to changing needs. To achieve this, library methods are encapsulated in Lego(TM)-like building blocks which can be manipulated with a mouse or any pointing device in a graphical environment, relieving the scientist from the need to learn a programming language. Building blocks, so-called workers, are connected by data pipelines to enable data flow between them, which is why "pipelining" is often used interchangeably with "workflow".

Taverna is not the only open source workflow environment, but it has certainly gotten a lot of visibility in the eScience communities in at least The Netherlands and the UK. There exist other workflow environments with CDK nodes too, including KNIME, which has been licensed under GPL3 since version 2.1.0.

Thomas uploaded some 17 example workflows to MyExperiment.org, to give you a further idea of what the system can do. Development has slowed down considerably since Thomas finished his thesis, and if you would like to work on the CDK-Taverna project, and be the next Dr Who, please contact me, Achim or Christoph. I started experimenting with CDK nodes for Taverna in 2005 (see CDK-Taverna fully recognized), and would love to see it live on. Andreas and I made an attempt last December to port things to Taverna 2.1, and the code we worked on can be found in this GitHub repository.

Saturday, March 27, 2010

I noticed today that blogger.com, the blog service provider I am using, has new templates. I was getting tired of the old one anyway, so I tried the simple template using the usual orange: quite satisfactory! At least it beats buying a book like this Blogger: Beyond the Basics. Don't have time for that.

I tweaked the template a bit. For example, the default labels widget does not allow me to limit the shown labels to those with at least X uses. So, I hacked the HTML of the widget and added an extra if statement:

Blogger.com Pages
I also moved around some elements, and the new Pages concept is nice too. I wish I could hide the side bar on these pages, but I am currently fairly happy with the ability to embed my egonw.github.com homepage:

Internet Explorer 6 EOL
The new template does not work with Internet Explorer 6.0. Honestly, I see no reason why you would like to run that browser anyway, but now you can no longer use it to read my blog. Just upgrade, and complain to your IT department if you cannot do it yourself. There are not so many of you, though. Only 18.95% of visitors use Internet Explorer, of which about 20% still use 6.0:

Actually, of the 73 visits with IE6 in the past 30 days, only 12 were hits of regular visitors. So, could I please ask this one visitor to email me offline if upgrading is not an option?

Friday, March 26, 2010

PMD is a tool to run some tests against your source code. It checks for code style, common problems, and places where code could be improved. The CDK has been using it for years now, such as here for CDK 1.3.x.

Running the PMD tests from the command line
When you are writing patches for the CDK, you can run the PMD tests via an Ant file, for example via the command line:

$ ant -f pmd.xml

However, when working on a single file, you will likely appreciate running the tests against a single module. This can be done with (for the data module):

The pmd.xml does not create HTML pages, like Nightly does. Instead, an XML file is currently created. The xpath utility can be used to filter out the information we are interested in. For example, if we want reports just about DefaultChemObjectBuilder, we issue:

The meeting is also bound to be fun. I have not done much in the area of toxicology other than the more general QSAR/QSPR model building with chemometrics. But I have recently been talking to Nina and others in the OpenTox community, and started to play a bit with the data and computation API they are developing.

More efficient use of the LoggingTool
Quite a long time ago, Jmol developer Miguel introduced me to a nice performance hint with respect to using logging tools. Each debug(), info(), warn(), etc. method should accept more than one parameter, so that the objects are only concatenated when debugging (or the matching debug level) is turned on. It indeed gave a considerable performance boost. The CDK supports this too: you should not concatenate Strings and other objects yourself, but let the LoggingTool do that.
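To make the idea concrete, here is a minimal sketch of the pattern. SketchLogger is a hypothetical stand-in and not the CDK LoggingTool itself; CostlyToString is likewise made up, and only exists to count how often an object is turned into a String.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a logging tool; not the CDK LoggingTool. */
class SketchLogger {
    private final boolean debugEnabled;
    /** Messages that were actually built, kept so the behavior can be inspected. */
    final List<String> written = new ArrayList<>();

    SketchLogger(boolean debugEnabled) {
        this.debugEnabled = debugEnabled;
    }

    /** Concatenates the parts only when debugging is turned on. */
    void debug(Object first, Object... more) {
        if (!debugEnabled) return; // no toString() calls, no concatenation
        StringBuilder message = new StringBuilder(String.valueOf(first));
        for (Object part : more) message.append(part);
        written.add(message.toString());
    }
}

/** Made-up object that counts how often it is converted to a String. */
class CostlyToString {
    int conversions = 0;

    @Override
    public String toString() {
        conversions++;
        return "expensive";
    }
}

class LazyLoggingDemo {
    public static void main(String[] args) {
        SketchLogger logger = new SketchLogger(false); // debugging is off
        CostlyToString atom = new CostlyToString();

        // Eager: the String is built before debug() is even called.
        logger.debug("Atom: " + atom);
        // Lazy: the object is passed as-is and never converted.
        logger.debug("Atom: ", atom);

        System.out.println("toString() calls: " + atom.conversions);
    }
}
```

With debugging turned off, only the eager call pays for the conversion; the multi-parameter call returns before touching the object.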

Monday, March 15, 2010

As you know from my blog, one of the things I am working on is to push RDF functionality in Bioclipse, as I believe it to be the missing link between molecular chemometrics and literature, databases, and other non-numerical information sources.

Well, this really nice New QSAR Project wizard was cool enough to trigger an I-want-more reaction, so I just had to hack it up with some additional SPARQL functionality. So, the next version not only uses RDF and SPARQL to aggregate the QSAR data set, it also uses SPARQL to make the wizard interactive. While the user is typing a target ID, the wizard will check the SPARQL end point in the background and download the target's type, title and organism, as well as update the list of activities the user can select depending on what the chEMBL database has for that target:

The actual code base is pretty small, and that's what happens when you mash up the right technologies :)

Sunday, March 07, 2010

In a desperate attempt to force myself to write on my CDK code snippet book, I'm going to write some code tips for creating clear code. Hopefully, this is useful for people writing patches and reviewers alike.

Use List<IAtom> instead of the untyped List
Quite some time ago, the Java language introduced typed lists. These lists can contain only objects of a particular type, which is a very common use case. Indeed, the CDK has quite a few lists that are strongly typed. Typing the list prevents you from accidentally adding something of the wrong type, and it also reduces the amount of casting, so that your code becomes cleaner.
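As a rough illustration of the difference, here is a small sketch. SketchAtom is a hypothetical stand-in for the CDK IAtom interface, used so that the example compiles without the CDK on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the CDK IAtom interface. */
interface SketchAtom {
    String getSymbol();
}

class TypedListDemo {
    /** Small helper to create a stand-in atom. */
    static SketchAtom atom(final String symbol) {
        return () -> symbol;
    }

    /** Untyped list: every element comes back as Object and needs a cast. */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String symbolsUntyped() {
        List atoms = new ArrayList();
        atoms.add(atom("C"));
        atoms.add(atom("O"));
        StringBuilder symbols = new StringBuilder();
        for (int i = 0; i < atoms.size(); i++) {
            SketchAtom a = (SketchAtom) atoms.get(i); // cast required
            symbols.append(a.getSymbol());
        }
        return symbols.toString();
    }

    /** Typed list: the compiler rejects non-atoms and no cast is needed. */
    static String symbolsTyped() {
        List<SketchAtom> atoms = new ArrayList<>();
        atoms.add(atom("C"));
        atoms.add(atom("O"));
        // atoms.add("N"); // would not compile: a String is not a SketchAtom
        StringBuilder symbols = new StringBuilder();
        for (int i = 0; i < atoms.size(); i++) {
            symbols.append(atoms.get(i).getSymbol());
        }
        return symbols.toString();
    }

    public static void main(String[] args) {
        System.out.println(symbolsUntyped());
        System.out.println(symbolsTyped());
    }
}
```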

If you do not need the index, use a for-each loop
When iterating over atoms in a list, you sometimes need to know the index, for example, to compare the IAtom with that at the same position in another list. However, when this is not needed, you can use the Java for-each loop instead. This will further simplify the above code to:
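A minimal sketch of both loop styles side by side follows. Plain element symbols in a List<String> stand in for atoms here, which is an assumption made to keep the example free of CDK dependencies.

```java
import java.util.Arrays;
import java.util.List;

class ForEachDemo {
    /** Indexed loop: useful when the position itself is needed. */
    static String joinIndexed(List<String> symbols) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < symbols.size(); i++) {
            result.append(symbols.get(i));
        }
        return result.toString();
    }

    /** For-each loop: same result, without the index bookkeeping. */
    static String joinForEach(List<String> symbols) {
        StringBuilder result = new StringBuilder();
        for (String symbol : symbols) {
            result.append(symbol);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        List<String> symbols = Arrays.asList("C", "N", "O");
        System.out.println(joinIndexed(symbols));
        System.out.println(joinForEach(symbols));
    }
}
```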

Saturday, March 06, 2010

I'd like to announce the first alpha version of OOChemistry. It is an extension for OpenOffice.org which provides cross-platform OLE-like integration of OOo with the JChemPaint chemical diagram editor. With OOChemistry you can draw a structure, embed it into a document (text or presentation) and then double click and edit it whenever you want on any platform having OpenOffice.org and a Java Runtime (Windows, Linux, Mac OS X, other Unix flavours). Remember that it's only alpha and is not recommended for production use (e.g., compatibility with further versions is not guaranteed).

OOChemistry needs your help! Experience in Java, in development of projects dealing with JChemPaint/CDK, or in development of OpenOffice.org extensions will be highly appreciated. Of course, you can help not only in coding, but also in translation of interface and writing docs.

--
Regards,
Konstantin

Your feedback as well as coding contributions are very much appreciated! I am excited about seeing chemical editing facilities in OpenOffice.org, and while the integration is not as good as Chem4Word, it is something I can run on my Linux system.

Summary
In a brief summary, this release mostly focuses on applying a number of small bug fixes and patches. But there are some things of interest: Stefan is working on structure generation and rewrote PartialFilledStructureMerger and CrossoverMachine. I introduced some generics magic in the reader API which I learned from Arvid in the CDK-JChemPaint patch. This patch removes the need to cast when reading an IChemObject from a file in the readers which have been updated (MDLV2000Reader only at this time). Instead, you can now just type:

IMolecule mol = reader.read(new Molecule());
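The kind of generics involved can be sketched as follows. This is an illustration of the pattern only: SketchChemObject, SketchMolecule and SketchReader are hypothetical stand-ins, and the actual CDK signatures may differ.

```java
/** Hypothetical stand-in for a chem object base interface. */
interface SketchChemObject {
}

/** Hypothetical stand-in for a molecule class. */
class SketchMolecule implements SketchChemObject {
    String title = "";
}

/** Hypothetical reader showing a cast-requiring and a generic read method. */
class SketchReader {
    /** Old style: the return type is the base interface, so callers must cast. */
    SketchChemObject readUntyped(SketchChemObject object) {
        fill(object);
        return object;
    }

    /** Generic style: the return type follows the type of the argument. */
    <T extends SketchChemObject> T read(T object) {
        fill(object);
        return object;
    }

    /** Pretends to read content from a file into the given object. */
    private void fill(SketchChemObject object) {
        if (object instanceof SketchMolecule) {
            ((SketchMolecule) object).title = "read from file";
        }
    }
}

class GenericReadDemo {
    public static void main(String[] args) {
        SketchReader reader = new SketchReader();
        // Cast needed with the non-generic method.
        SketchMolecule first = (SketchMolecule) reader.readUntyped(new SketchMolecule());
        // No cast needed with the generic method.
        SketchMolecule second = reader.read(new SketchMolecule());
        System.out.println(first.title + " / " + second.title);
    }
}
```

Because the type parameter is inferred from the argument, passing in a molecule gives a molecule back, and the compiler checks this rather than a runtime cast.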

The list of patches furthermore contains an update of the PubChem reader to support reading of additional fields, and the support of the CML @ref attribute in CMLReact (doi:10.1021/ci0502698).

But to me the most interesting bit of this release is that the last few patches are now reviewed and applied to make CDK-JChemPaint compile against an off-the-shelf CDK release (1.3.3 or higher :).

Reviewers
This is a new category too, created using the command git log cdk-1.3.2.. | grep Signed-off | cut -d':' -f2 | cut -d'<' -f1 | sort | uniq -c. Not every reviewer signs off commits, and no one other than the current holders of commit rights actually does this. Everyone is more than invited to check the patch tracker and review patches: give comments if you feel the patch can be improved, or sign it off otherwise (git commit --amend --signoff), which gives the other reviewers some idea of the state of the patch. Rajarshi did most of the reviewing work of this release; his contributions are very much appreciated.

Thursday, March 04, 2010

Jonathan worked this week on new features for the Bioclipse RDF editor (see these two earlier items). This version still does not edit, but only displays using Zest. Jonathan created an extension point for me so that anyone can make the editor aware of domain objects, by simply registering the extension implementation along with the rdf:Class URI of the rdf:type of an object. This fixes the problem of having to hardcode dependencies of the RDF editor on all the domain code, as was the case earlier.

For example, the cheminformatics IMolecule object is now linked to the rdf:type <http://www.bioclipse.net/structuredb/#Molecule>:

This is very much tied into the Jena data model, so not entirely clean, but it will have to do for now. The first method converts RDF content into a Bioclipse IBioObject, such as an IMolecule (see this list of currently supported objects). The second method returns an icon, which makes the editor more visually pleasing, and provides a nice way to see when you can double click the RDF node to have it open in a domain-specific editor:

For example, double clicking the ron:mol2 node, would open up a JChemPaint editor.

This blog deals with chemblaics in the broader sense. Chemblaics (pronounced chem-bla-ics) is the science that uses computers to solve problems in chemistry, biochemistry and related fields. The big difference between chemblaics and areas such as chem(o)?informatics, chemometrics, computational chemistry, etc, is that chemblaics only uses open source software, open data, and open standards, making experimental results reproducible and validatable. And this is a big difference!

About Me

Assistant professor at the Dept of Bioinformatics - BiGCaT at NUTRIM, Maastricht University, studying biology at an unsupervised and atomic level. Open Science is my main hobby resulting in participation in, among many others, Bioclipse, CDK and WikiPathways. ORCID:0000-0001-7542-0286. Posts on G+ are personal.
