Tuesday, March 18, 2014

ChemDraw ChemAxon synergy

In my previous post, I was complaining that there wasn't any free software with a nice command line interface to reliably convert molfiles to InChI strings, and back again. I also mentioned that there was an issue with the way ChemDraw converts structures to InChI strings that made it unacceptable for my purposes. The technical issue with ChemDraw is that it doesn't preserve which tautomer was drawn. Some of the molecules I'm interested in contain amides that are typically found in the amide form, rather than the imidic acid form. When I copy these molecules as InChI from ChemDraw, and then paste them back in, the amide is changed to the imidic acid form, which I don't want.

It turns out that this is due to a feature of the InChI format (a format that is still mostly opaque to me) called "Mobile H Perception", where it simplifies a molecule's encoding by not specifying which tautomer it is (thereby saving 1 bit of information, I guess). Many programs have the option to export InChI with Mobile H perception off, which is what I want, but I can't find that option in ChemDraw.

The ChemAxon program MolConverter does have the option to turn Mobile H perception off. It is also a convenient, easy-to-use, free program for converting molecule formats, and had I known about it earlier, I wouldn't have made my previous post because I would have used MolConverter instead.

What MolConverter doesn't do so well is generate 2D coordinates for atoms from InChI strings. When I convert the InChI string of a big molecule into an image or a molfile, the complicated regions wind up looking pretty bad. It's likely that there are ways to tweak the settings to make nicer renderings, but I haven't figured them out, and I was quite happy with the ChemDraw renderings, so those are the ones I'd like to use.

The problem: Same as the previous post. I still want "normalized" molfiles. Now, just because I can, I also want svg images of all of the molecules after normalization.

The solution: Use ChemAxon MolConverter to generate tautomerically unambiguous InChI strings (and InChI keys and SMILES) from molfiles. Then use pywin32 COM scripts to paste them into ChemDraw and save the resulting molfile. Then use MolConverter to convert the new molfiles into svg images.

Setup: ChemDraw Pro 13 (other versions may work too but I haven't tried any) should be installed. Also install ChemAxon MarvinBeans. It's also helpful to add the MarvinBeans bin directory to your PATH system variable (for me this was "C:\Program Files (x86)\ChemAxon\MarvinBeans\bin").

The FixedH option turns off Mobile H perception, and the SAbs option forces molconvert to use absolute stereochemistry (which is what I want). I use AuxNone because I don't want to save the coordinates, and ChemDraw can't read them anyway. I use Key because I also want the InChI key.
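For scripting this step, here is a minimal Python sketch of how the four options might be passed to molconvert. The option names come from the description above, but the way they are joined (space-separated after "inchi:", mirroring the smiles:"u" form used for SMILES) is my assumption and should be checked against the ChemAxon documentation. The function names and file names are made up for illustration.

```python
import shutil
import subprocess

# Option names are taken from the post. ASSUMPTION: joining them with
# spaces after "inchi:" is a guess modelled on the smiles:"u" form and
# has not been checked against ChemAxon's documentation.
INCHI_OPTIONS = ("FixedH", "SAbs", "AuxNone", "Key")


def build_inchi_args(input_path, output_path, options=INCHI_OPTIONS):
    """Return the molconvert arguments (without the executable name)."""
    export_format = "inchi:" + " ".join(options)
    return [export_format, input_path, "-o", output_path]


def convert_to_inchi(input_path, output_path):
    """Run molconvert; needs the MarvinBeans bin directory on PATH."""
    # shutil.which honours PATHEXT on Windows, so it also finds a .bat launcher.
    exe = shutil.which("molconvert")
    if exe is None:
        raise FileNotFoundError("molconvert was not found on PATH")
    subprocess.run([exe] + build_inchi_args(input_path, output_path), check=True)
```

Passing the arguments as a list avoids any shell quoting problems with the spaces inside the option string.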

Run molconvert: molconvert smiles:"u" input.mol -o output.smiles

The u option generates a unique SMILES string.
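The same command can be driven from Python, which is handy when there are many molfiles. This is only a sketch: the helper names are mine, and I'm assuming the output file holds one SMILES per line, possibly followed by a molecule name, so only the first whitespace-separated field is kept.

```python
import shutil
import subprocess
from pathlib import Path


def first_smiles(text):
    """Return the first field of the first non-empty line, or None."""
    for line in text.splitlines():
        fields = line.split()
        if fields:
            return fields[0]
    return None


def molfile_to_unique_smiles(mol_path, smiles_path="output.smiles"):
    """Run the molconvert command from the post and read the result back."""
    exe = shutil.which("molconvert")  # needs MarvinBeans bin on PATH
    if exe is None:
        raise FileNotFoundError("molconvert was not found on PATH")
    subprocess.run([exe, "smiles:u", str(mol_path), "-o", str(smiles_path)],
                   check=True)
    return first_smiles(Path(smiles_path).read_text())
```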

Copy the SMILES string onto the clipboard (SMILES seems to end up rendering a little nicer than InChI).
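With pywin32, the clipboard step might look something like the sketch below. It only covers getting the text onto the clipboard; the paste into ChemDraw through COM is not shown here. The function name and the optional clipboard argument (there so the function can be tried without Windows) are my own additions.

```python
def copy_text_to_clipboard(text, clipboard=None):
    """Put text on the Windows clipboard as Unicode text."""
    if clipboard is None:
        # win32clipboard ships with pywin32 and only works on Windows.
        import win32clipboard as clipboard
    clipboard.OpenClipboard()
    try:
        # Clear whatever was there, then store the new string.
        clipboard.EmptyClipboard()
        clipboard.SetClipboardText(text, clipboard.CF_UNICODETEXT)
    finally:
        # Always release the clipboard, even if setting the text failed.
        clipboard.CloseClipboard()
```

Calling copy_text_to_clipboard("CC(N)=O") on Windows should leave that SMILES ready to paste.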

Result:
I think it worked pretty well. Below is the rendering of the drug paclitaxel. It's not perfect: for example, you can see where two of the methyl groups in the middle get drawn overlapping part of the ring. But it's still perfectly readable and unambiguous, and it looks pretty nice. So I think this adventure was successful.