mol2chemfig, a tool for rendering chemical structures from molfile or SMILES format to LaTeX code.

Abstract

Displaying chemical structures in LaTeX documents currently requires either hand-coding of the structures using one of several LaTeX packages, or the inclusion of finished graphics files produced with an external drawing program. There is currently no software tool available to render the large number of structures available in molfile or SMILES format to LaTeX source code. We here present mol2chemfig, a Python program that provides this capability. Its output is written in the syntax defined by the chemfig TeX package, which allows for the flexible and concise description of chemical structures and reaction mechanisms. The program is freely available both through a web interface and for local installation on the user's computer. The code and accompanying documentation can be found at http://chimpsky.uwaterloo.ca/mol2chemfig.

mol2chemfig processing flowchart. Rendering molecular structures with mol2chemfig involves two separate executables, namely mol2chemfig itself (or, strictly speaking, the Python interpreter, which runs mol2chemfig) and a TeX engine such as pdftex. Processing inside TeX requires several packages, all of which will be loaded into LaTeX by requiring the mol2chemfig package.
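The TeX side of this workflow can be sketched with a minimal document. This is an illustration under stated assumptions, not the authors' example: it assumes the mol2chemfig LaTeX package is installed, and the benzene structure is hand-written here as a stand-in for code that mol2chemfig would generate.

```latex
\documentclass{article}
% Per the caption above, requiring mol2chemfig is expected to pull in
% chemfig and the other packages needed for rendering.
\usepackage{mol2chemfig}

\begin{document}

% Stand-in for mol2chemfig output: a hand-written benzene ring.
% In practice, the generated \chemfig{...} code would be pasted here.
\chemfig{*6(-=-=-=)}

\end{document}
```

Compiling this file with a TeX engine such as pdflatex corresponds to the second stage of the flowchart.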

Structure of norepinephrine, rendered with hand-written or mol2chemfig-generated chemfig code. The code in A was hand-written and uses chemfig’s dedicated syntax for specifying rings (see lines 7–14). The code in B was generated with the mol2chemfig command shown at the top; it does not use chemfig’s ring syntax but instead treats the ring much like a regular branch. Each line of code specifies one bond; the number in the line-end comment specifies the atom that this bond connects to. While the code examples use line breaks and indentation for clarity, this is not required; whitespace is insignificant to chemfig.
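The difference between the two coding styles can be sketched on benzene, a simpler molecule than the one in the figure. The code below is an illustrative assumption, not actual mol2chemfig output: the first formula uses chemfig's ring syntax, while the second lists one bond per line with absolute angles and closes the ring with a named hook (here called `a`).

```latex
\documentclass{article}
\usepackage{chemfig}
\begin{document}

% Style A: chemfig's dedicated ring syntax.
\chemfig{*6(-=-=-=)}

% Style B: the ring written as a branch, one bond per line.
% The hook ?[a] marks the first atom; repeating ?[a] on the last atom
% draws the closing bond back to it.
\chemfig{
  ?[a]% atom 1, sets hook a
  =[:30]% bond to atom 2
  -[:90]% bond to atom 3
  =[:150]% bond to atom 4
  -[:210]% bond to atom 5
  =[:270]% bond to atom 6
  ?[a]% closing bond from atom 6 to atom 1
}

\end{document}
```

Both formulas describe the same six-membered ring with alternating double bonds; only the notation differs.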

Structure of FMNH. The structure of FMNH (flavin mononucleotide hydride) contains charges and a radical, which are preserved during conversion with mol2chemfig. The chemfig code was generated using the --terse option, which removes whitespace and comments from the output.

Structure of doxorubicin. The structure of doxorubicin, rendered from a PubChem record without (A) or with (B) recalculation of coordinates. The code examples in both A and B are truncated. See text for additional details.

Structure of aspirin, composed from two sub-molecules. This hand-written example illustrates the use of chemfig’s submol mechanism. Two named sub-molecules are defined, which can then be referenced to compose the complete molecule.

Construction of a tripeptide from a mol2chemfig-generated aminoacyl residue. The file containing the coordinates for a phenylalanyl residue was rendered to a \submol definition, and three copies of the latter were concatenated. In A, mol2chemfig was allowed to arbitrarily pick the first and last atoms of the sub-molecule’s main chain, which causes the connecting bonds to be misplaced. In B, the atom numbers of the molfile were displayed using the -n or --atom-numbers option. In C, atoms 6 and 11 were specified as the main chain entry and exit points, respectively; this causes the connecting bonds to be placed as intended. In the generated code, the amino group was manually adjusted.
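The idea of concatenating copies of one residue can be sketched by hand with a simpler building block. The example below is an assumption for illustration only: it uses a hand-written glycyl residue rather than the mol2chemfig-generated phenylalanyl residue from the figure, and it relies on chemfig's `\definesubmol` macro. The sub-molecule name `gly` is hypothetical.

```latex
\documentclass{article}
\usepackage{chemfig}
\begin{document}

% Hypothetical glycyl residue. The main chain enters through the leading
% bond at N and exits at the carbonyl C, so copies can be chained directly.
\definesubmol{gly}{-N(-[6]H)-CH_2-C(=[2]O)}

% Three copies concatenated, capped with H on the amino end
% and OH on the carboxyl end.
\chemfig{H!{gly}!{gly}!{gly}-OH}

\end{document}
```

Because the entry atom (N) and exit atom (carbonyl C) sit at the two ends of the definition, each copy attaches to the previous one along the backbone, which is the behavior panel C achieves by specifying the entry and exit atoms explicitly.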