Sunday, May 27, 2012

Finding where to put double bonds...

SMILES has a convenient feature: elements from the organic subset can be written in lower case, indicating a particular hybridization state (aromaticity). The locations of the double bonds are then not given explicitly, reflecting the delocalized nature of those systems. Benzene, for example, can be written as c1ccccc1.

However, there are many situations where you would like to know the positions of those double bonds, or at least one solution from the set of possible combinations, such as the Kekulé form C1=CC=CC=C1 for benzene.

Finding the positions of the double bonds is one of the core algorithms in cheminformatics. The CDK has had a few algorithms for this for a long time: one looking at ring systems (DeduceBondSystemTool) and one tackling a more general problem (SaturationChecker). The first was recently found to be slow, caused by its use of the AllRingsFinder (which is slow because of the combinatorial set of ring combinations). The second never really worked that well, because it did not use the CDK atom type perception code.
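To make the problem a bit more concrete, here is a minimal sketch of the idea in plain Java. It is not the algorithm used by DeduceBondSystemTool or SaturationChecker, and it does not use the CDK API at all: the class name `KekuleSketch`, the method `assignDoubleBonds`, and the representation of a molecule as atom indices plus bond pairs are all made up for illustration. It also assumes that every aromatic atom must take part in exactly one double bond (as in benzene), which is not true for atoms such as the nitrogen in pyrrole. Under that assumption, finding one solution amounts to pairing up neighbouring atoms, which a simple backtracking search can do:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the CDK.
class KekuleSketch {

    /**
     * Tries to give every atom exactly one double bond.
     *
     * @param atomCount number of aromatic atoms, indexed 0..atomCount-1
     * @param bonds     pairs of atom indices that are bonded to each other
     * @return partner[i] is the atom double-bonded to atom i,
     *         or null if no such assignment exists
     */
    static int[] assignDoubleBonds(int atomCount, int[][] bonds) {
        List<List<Integer>> neighbours = new ArrayList<>();
        for (int i = 0; i < atomCount; i++) {
            neighbours.add(new ArrayList<>());
        }
        for (int[] bond : bonds) {
            neighbours.get(bond[0]).add(bond[1]);
            neighbours.get(bond[1]).add(bond[0]);
        }
        int[] partner = new int[atomCount];
        Arrays.fill(partner, -1); // -1 means: no double bond assigned yet
        return match(0, partner, neighbours) ? partner : null;
    }

    // Backtracking: take the first atom without a double bond, try each
    // free neighbour as its partner, and undo the choice if it dead-ends.
    private static boolean match(int from, int[] partner,
                                 List<List<Integer>> neighbours) {
        int atom = from;
        while (atom < partner.length && partner[atom] != -1) {
            atom++;
        }
        if (atom == partner.length) {
            return true; // every atom has a double bond
        }
        for (int other : neighbours.get(atom)) {
            if (partner[other] != -1) {
                continue; // neighbour already has its double bond
            }
            partner[atom] = other;
            partner[other] = atom;
            if (match(atom + 1, partner, neighbours)) {
                return true;
            }
            partner[atom] = -1;
            partner[other] = -1;
        }
        return false;
    }

    public static void main(String[] args) {
        // Benzene: six aromatic carbons in a ring.
        int[][] benzene = {{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 0}};
        System.out.println(Arrays.toString(assignDoubleBonds(6, benzene)));
        // prints [1, 0, 3, 2, 5, 4]
    }
}
```

For benzene this puts the double bonds between atoms 0-1, 2-3 and 4-5, which is one of the two Kekulé structures. A plain backtracking search like this can become slow on large fused ring systems, and a real implementation also has to decide which atoms need a double bond in the first place, which is where atom typing comes in.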

Recently, Kevin and Klas set off in parallel to develop new implementations, with Kevin focusing on improving the DeduceBondSystemTool and Klas starting from the more general use case.

Kevin's new code was tested by Nina, and found to behave pretty well, with an error rate of well below 1%. Klas' code is still being developed, but I am very much looking forward to his code, as it is not limited to ring systems.

That said, Kevin's code has been merged into the cdk-1.4.x branch, will be part of the next release, and is ready to be used now. The basic use is pretty simple when starting with SMILES.
