How to solve problems with coordinate bonds in RDKit

While working with chemical databases and importing molecules, I have encountered numerous problems with the way chemical structures are drawn. Most often the problem arises because creative users in the past had trouble registering a compound the way they wanted. Sometimes the software they used did not support a correct representation, so the users found a workaround, which can cause errors in other software packages. A common problem arises with organometallic compounds and coordinate bonds. Read on if you want to learn more about how to handle these kinds of compounds with RDKit.

Chemical Bond Types and File Formats

Chemical bonds come in a variety of types. A regular covalent bond is formed when two atoms combine orbitals that each contain only one electron, giving a molecular orbital with two electrons. However, if the difference in electronegativity is large enough, an ionic bond is formed, where the more electronegative atom attracts both electrons. RDKit handles both of these situations without problems.
The coordinate or dative bond, which this blog post is about, is different: one atom donates both electrons to a covalent bond with another atom that has an empty orbital. This difference can cause problems with software and file formats that only support the other bond types. The widely used .mol and .sdf formats had no dative or coordinate bond type in the V2000 version, but from around 2011 the hydrogen bond and dative bond types were added to the specification for the V3000 version. You can read more about the formats and the specification at https://en.wikipedia.org/wiki/Chemical_table_file and http://download.accelrys.com/freeware/ctfile-formats/ctfile-formats.zip (sorry, the last file is behind a login wall).

I recently needed to work with database registration of chemical complexes, where it would be useful to specify the coordinate bonds. In the existing .sdf files they were specified as regular covalent bonds, but this led to problems with the internal chemistry model and electron accounting in both RDKit and ChemDraw. MarvinSketch supports a coordinate bond type and can read and write correctly formed V3000 molfiles. ChemDraw version 15 had some issues with the V3000 format, which I have reported to PerkinElmer; I don't know whether they have been fixed.

The bond type is defined in the BOND block of the V3000 molfile, in the second field of each bond line after the M  V30 prefix (the first field is the bond index). Bond type 9 is the dative or coordinate bond type.
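To make that layout concrete, here is a minimal Python sketch (standard library only, not RDKit code) that reads the bond lines of a V3000 BOND block and reports each bond's type. The code table follows the bond types listed in the CTfile specification; the function name `bonds_in_block` and the example fragment are hypothetical, and continuation lines are not handled.

```python
from __future__ import annotations

# Bond-type codes of the V3000 BOND block, as listed in the CTfile spec.
V3000_BOND_TYPES = {
    1: "single",
    2: "double",
    3: "triple",
    4: "aromatic",
    5: "single or double",
    6: "single or aromatic",
    7: "double or aromatic",
    8: "any",
    9: "coordination",
    10: "hydrogen",
}

PREFIX = "M  V30 "


def bonds_in_block(molblock: str) -> list[tuple[int, int, int, str]]:
    """Return (atom1, atom2, type_code, type_name) for each bond line.

    Only lines between 'M  V30 BEGIN BOND' and 'M  V30 END BOND' are read.
    Continuation lines (ending in '-') are not handled in this sketch.
    """
    bonds = []
    in_bond_block = False
    for raw in molblock.splitlines():
        line = raw.rstrip()
        if line == PREFIX + "BEGIN BOND":
            in_bond_block = True
        elif line == PREFIX + "END BOND":
            in_bond_block = False
        elif in_bond_block and line.startswith(PREFIX):
            # Fields: index, type, atom1, atom2, then optional KEY=VALUE
            # properties, which are ignored here.
            fields = line[len(PREFIX):].split()
            code, atom1, atom2 = int(fields[1]), int(fields[2]), int(fields[3])
            bonds.append((atom1, atom2, code, V3000_BOND_TYPES.get(code, "unknown")))
    return bonds


# Hypothetical fragment: a single bond between atoms 1 and 2 and a
# coordinate (type 9) bond between atoms 2 and 3.
example = """\
M  V30 BEGIN BOND
M  V30 1 1 1 2
M  V30 2 9 2 3
M  V30 END BOND
"""

for atom1, atom2, code, name in bonds_in_block(example):
    print(f"{atom1}-{atom2}: type {code} ({name})")
# 1-2: type 1 (single)
# 2-3: type 9 (coordination)
```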

RDKit and Dative/Coordination Bonds

Internally, RDKit has support for a dative bond type, but after experimenting a bit it was clear that V3000-formatted mol files with dative bonds could not be loaded into RDKit. Conversely, if the bond types were changed to dative in an RDKit Python session, the molecule could not be saved in .mol format either, even when forcing the V3000 format.

Luckily, RDKit is open source (thanks Greg and others :-), so it was possible to dive into the code and look at the import and export functions. The C++ code responsible for reading bonds from V3000 mol files is the ParseV3000BondBlock function in the Code/GraphMol/FileParsers/MolFileParser.cpp file. There I found a switch statement handling the bond types, and it was easy to add two extra cases for bond types 9 and 10.
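The real fix is a couple of extra cases in C++. As a rough Python analogue (not the actual RDKit patch), a dictionary can stand in for the switch statement: without entries for codes 9 and 10, those bonds fall through to the error branch, which is one way a whole molfile can fail to load. The `BondType` enum below is a hypothetical stand-in whose member names mirror those of RDKit's C++ `Bond::BondType` enum, and `bond_type_from_v3000` is an illustrative name.

```python
from enum import Enum
from typing import Mapping


class BondType(Enum):
    """Hypothetical stand-in for RDKit's C++ Bond::BondType enum (names only)."""
    SINGLE = "SINGLE"
    DOUBLE = "DOUBLE"
    TRIPLE = "TRIPLE"
    AROMATIC = "AROMATIC"
    DATIVE = "DATIVE"
    HYDROGEN = "HYDROGEN"


# Cases before the patch. Codes 5-8 are query bond types, which a real
# parser handles separately; they are left out of this sketch.
ORIGINAL_CASES = {
    1: BondType.SINGLE,
    2: BondType.DOUBLE,
    3: BondType.TRIPLE,
    4: BondType.AROMATIC,
}

# The two extra cases: 9 = coordination (dative), 10 = hydrogen bond.
PATCHED_CASES = {**ORIGINAL_CASES, 9: BondType.DATIVE, 10: BondType.HYDROGEN}


def bond_type_from_v3000(code: int, cases: Mapping[int, BondType] = PATCHED_CASES) -> BondType:
    """Translate a V3000 bond-type code; unknown codes raise, like a default branch."""
    try:
        return cases[code]
    except KeyError:
        raise ValueError(f"unrecognized V3000 bond type: {code}") from None


print(bond_type_from_v3000(9))  # BondType.DATIVE
try:
    bond_type_from_v3000(9, ORIGINAL_CASES)
except ValueError as err:
    print(err)  # unrecognized V3000 bond type: 9
```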

This also gives the right bond type in the Testsave molfile (M  V30 4 9 4 5). Good. This illustrates how easy it sometimes is to work with the RDKit codebase. I have only just started working with the C++ code of RDKit and have not yet tested the patch extensively. However, all the ctest tests included with the RDKit source complete without errors, so it seems not to have broken anything.
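On the export side, the writer has to do the reverse mapping. A minimal sketch in plain Python, assuming the hypothetical `V3000_CODE_FOR` table below rather than RDKit's actual writer code, shows how a dative bond between atoms 4 and 5 with bond index 4 turns into the bond line quoted above, and reads the type code back out.

```python
# Writer side: internal bond-type name -> V3000 bond-type code.
# Hypothetical table; the names mirror RDKit's BondType member names.
V3000_CODE_FOR = {
    "SINGLE": 1,
    "DOUBLE": 2,
    "TRIPLE": 3,
    "AROMATIC": 4,
    "DATIVE": 9,
    "HYDROGEN": 10,
}


def format_v3000_bond_line(index: int, bond_type: str, atom1: int, atom2: int) -> str:
    """Build one bond line: prefix, bond index, type code, then the two atom indices."""
    if bond_type not in V3000_CODE_FOR:
        raise ValueError(f"no V3000 code for bond type {bond_type!r}")
    return f"M  V30 {index} {V3000_CODE_FOR[bond_type]} {atom1} {atom2}"


def bond_type_code(line: str) -> int:
    """Read back the type code: the second field after the 'M  V30' prefix."""
    return int(line.split()[3])


line = format_v3000_bond_line(4, "DATIVE", 4, 5)
print(line)  # M  V30 4 9 4 5
assert bond_type_code(line) == 9  # round trip
```

Keeping the reader and writer tables as exact inverses for codes 9 and 10 is what lets a dative bond survive a load/save cycle.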