Revision as of 08:23, 13 October 2008

This page describes how to use the OpenBabel library from Python. First of all, if you haven't already installed the Python bindings, you should do so now:

Windows users should download the OpenBabel Python module from the Install page.

Linux and Mac OS X users should compile OpenBabel and the Python bindings themselves as described on the Install page. Alternatively, you can check your distribution's package manager for something like 'openbabel-python' or 'python-openbabel'.

If you have any problems or want to ask a question, please send an email to the openbabel-scripting mailing list.

The most important thing that you need to understand is that there are two ways to access the Open Babel library using Python:

The openbabel module, a direct binding of the Open Babel C++ library, created using the SWIG package.

Pybel, a set of convenience functions and classes that simplifies access to the openbabel module.

You should probably read the Pybel paper now. Remember to cite the Pybel paper if you use Pybel to obtain results for publication:

The openbabel module

The openbabel module provides direct access to the C++ Open Babel library from Python. This binding is generated using the SWIG package and provides access to almost all of the Open Babel interfaces via Python, including the base classes OBMol, OBAtom, OBBond, and OBResidue, as well as the conversion framework OBConversion. Essentially any call in the C++ API is therefore available to Python scripts with very little difference in syntax, and the principal documentation is the Open Babel C++ API documentation.

Examples

Here we give some examples of common Python syntax for the openbabel module and pointers to the appropriate sections of the API documentation.

The example script below creates atoms and bonds one-by-one using the OBMol, OBAtom, and OBBond classes.
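A minimal sketch of such atom-by-atom construction, assuming the bindings import as `openbabel` (the 2.x-style name this page uses). The helper name `build_two_atom_molecule`, the choice of elements, and the coordinates are illustrative only.

```python
def build_two_atom_molecule():
    """Build a molecule of two atoms joined by a single bond (illustrative helper)."""
    import openbabel  # assumed 2.x-style import; placed here so the sketch stands alone

    mol = openbabel.OBMol()

    carbon = mol.NewAtom()
    carbon.SetAtomicNum(6)             # 6 = carbon
    carbon.SetVector(0.0, 1.0, 2.0)    # x, y, z coordinates

    oxygen = mol.NewAtom()
    oxygen.SetAtomicNum(8)             # 8 = oxygen
    oxygen.SetVector(1.2, 1.0, 2.0)

    # Atoms are indexed from 1: bond atom 1 to atom 2 with bond order 1.
    mol.AddBond(1, 2, 1)
    return mol
```

Calling `mol = build_two_atom_molecule()` and then `mol.NumAtoms()` and `mol.NumBonds()` should report two atoms and one bond.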

More commonly, Open Babel can be used to read in molecules using the OBConversion framework. The following script reads in molecular information (a SMI file) from a string, adds hydrogens, and writes out an MDL file as a string.
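The conversion described above might be sketched as follows, assuming the bindings import as `openbabel`. The helper name `smiles_to_mdl` and its error handling are illustrative and not part of the library.

```python
def smiles_to_mdl(smiles):
    """Return an MDL molfile string for a SMILES string, with hydrogens added."""
    import openbabel  # assumed 2.x-style import; placed here so the sketch stands alone

    conversion = openbabel.OBConversion()
    conversion.SetInAndOutFormats("smi", "mdl")

    mol = openbabel.OBMol()
    # ReadString fills `mol` and reports whether parsing succeeded.
    if not conversion.ReadString(mol, smiles):
        raise ValueError("could not parse SMILES: %r" % smiles)

    mol.AddHydrogens()
    return conversion.WriteString(mol)
```

For example, `print(smiles_to_mdl("C1=CC=CS1"))` would write thiophene, including its hydrogens, as an MDL block.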

OBMolAtomBFSIter - given an OBMol and the index of an atom, OBMolAtomBFSIter iterates over all the neighbouring atoms in a breadth-first manner. It differs from the other iterators in that it returns two values - an OBAtom, and the 'depth' of the OBAtom in the breadth-first search (this is useful, for example, when creating circular fingerprints)

OBMolPairIter - given an OBMol, iterate over all pairs of OBAtoms separated by more than three bonds

OBResidueIter - given an OBMol representing a protein, iterate over all OBResidues

These iterator classes are used with the typical Python syntax for iterators:

for obatom in openbabel.OBMolAtomIter(obmol):
    print(obatom.GetAtomicMass())
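To make the (atom, depth) pairs of a breadth-first iterator concrete, here is a pure-Python sketch that does not use Open Babel at all. The adjacency table, the function name `bfs_with_depth`, and the choice to number the start atom as depth 1 are assumptions of the sketch, not statements about OBMolAtomBFSIter.

```python
from collections import deque


def bfs_with_depth(neighbours, start):
    """Yield (atom, depth) pairs in breadth-first order from `start`.

    `neighbours` maps each atom index to the indices bonded to it.
    The start atom is given depth 1 here (a choice of this sketch).
    """
    depth = {start: 1}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        yield atom, depth[atom]
        for other in neighbours[atom]:
            if other not in depth:          # visit each atom once
                depth[other] = depth[atom] + 1
                queue.append(other)


# Heavy atoms of ethanol (C1-C2-O3) as a hypothetical adjacency table.
ethanol = {1: [2], 2: [1, 3], 3: [2]}
for atom, depth in bfs_with_depth(ethanol, 1):
    print(atom, depth)
```

Grouping atoms by depth in this way is what makes a breadth-first walk convenient for circular fingerprints: each depth corresponds to one shell of neighbours around the start atom.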

Note that OBMolTorsionIter returns atom IDs that are off by one: you need to add one to each ID to get the correct ID. Also, if you add or remove atoms, you will need to delete the existing TorsionData before using OBMolTorsionIter. This is done as follows:

mol.DeleteData(openbabel.TorsionData)
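The off-by-one correction can be wrapped in a small helper. This is a sketch that assumes each torsion arrives as a sequence of four integers and that the bindings import as `openbabel`; the names `to_atom_indices` and `torsion_atoms` are hypothetical.

```python
def to_atom_indices(torsion):
    """Shift the four IDs of one torsion up by one to get atom indices."""
    return tuple(atom_id + 1 for atom_id in torsion)


def torsion_atoms(obmol):
    """Yield the four OBAtoms of each torsion in `obmol` (illustrative helper)."""
    import openbabel  # assumed 2.x-style import; placed here so the sketch stands alone

    for torsion in openbabel.OBMolTorsionIter(obmol):
        # GetAtom() expects the corrected, 1-based indices.
        yield tuple(obmol.GetAtom(index) for index in to_atom_indices(torsion))
```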

Calling a method requiring an array of C doubles

Some Open Babel toolkit methods, for example OBMol::Rotate(), require an array of doubles. It's not possible to directly use a list of floats when calling such a function from Python. Instead, you need to first explicitly create a C array using the double_array() function:
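As a sketch of that pattern, the helper below builds a rotation about the z axis as a plain Python list and converts it with double_array() before calling Rotate(). It assumes the bindings import as `openbabel` and that the nine values are read in row-major order; the helper names are illustrative.

```python
import math


def z_rotation_matrix(angle_degrees):
    """Return a 3x3 rotation about the z axis as nine floats (row-major, assumed)."""
    angle = math.radians(angle_degrees)
    c, s = math.cos(angle), math.sin(angle)
    return [c, -s, 0.0,
            s, c, 0.0,
            0.0, 0.0, 1.0]


def rotate_about_z(obmol, angle_degrees):
    """Rotate `obmol` in place about the z axis (illustrative helper)."""
    import openbabel  # assumed 2.x-style import; placed here so the sketch stands alone

    # A plain list of floats is not accepted, so wrap it in a C array first.
    matrix = openbabel.double_array(z_rotation_matrix(angle_degrees))
    obmol.Rotate(matrix)
```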

Pybel

Pybel provides convenience functions and classes that make it simpler to use the Open Babel libraries from Python, especially for file input/output and for accessing the attributes of atoms and molecules. The Atom and Molecule classes used by Pybel can be converted to and from the OBAtom and OBMol used by the openbabel module. These features are discussed in more detail below.

Information on the Pybel API can be found at the interactive Python prompt using the help() function, and is also available here: Pybel API.

It is always possible to access the OBMol or OBAtom on which a Molecule or Atom is based, by accessing the appropriate attribute, either .OBMol or .OBAtom. In this way, it is easy to combine the convenience of pybel with the many additional capabilities present in openbabel. See Combining Pybel with openbabel.py below.

Molecules have the following attributes: atoms, charge, data, dim, energy, exactmass, flags, formula, mod, molwt, spin, sssr, title and unitcell (if crystal data). The .atoms attribute provides a list of the Atoms in a Molecule. The .data attribute returns a dictionary-like object for accessing and editing the data fields associated with the molecule (technically, it's a MoleculeData object, but you can use it like it's a regular dictionary). The .unitcell attribute gives access to any unit cell data associated with the molecule (see OBUnitCell). The remaining attributes correspond directly to attributes of OBMols: e.g. Molecule.formula is equivalent to OBMol.GetFormula(). For more information on what these attributes are, please see the Open Babel Library documentation for OBMol.

For example, let's suppose we have an SD file containing descriptor values in the data fields:
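A hedged sketch of reading those fields, assuming Pybel imports as `pybel` and that the .data object supports keys() and item lookup like a regular dictionary. The helper name `first_molecule_data` is illustrative.

```python
def first_molecule_data(path):
    """Return the data fields of the first molecule in an SD file as a plain dict."""
    import pybel  # assumed 2.x-style import; placed here so the sketch stands alone

    mol = next(pybel.readfile("sdf", path))
    # Copy the fields out of the dictionary-like .data object.
    return dict((key, mol.data[key]) for key in mol.data.keys())
```

Individual fields can also be read or changed directly on the Molecule, for example `mol.data["Comment"] = "checked"` for a hypothetical field named Comment.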

Molecules have a .write() method that writes a representation of a Molecule to a file or to a string. See Input/Output below. They also have a .calcfp() method that calculates a molecular fingerprint. See Fingerprints below.

The .draw() method of a Molecule generates 2D coordinates and a 2D depiction of a molecule. It uses the OASA library by Beda Kosata to do this (see the section below on Installing OASA). The default options are to show the image on the screen (show=True), not to write to a file (filename=None), to calculate 2D coordinates (usecoords=False) but not to store them (update=False).

If a molecule does not have 3D coordinates, they can be generated using the .make3D() method. By default, this includes 50 steps of a geometry optimisation using the MMFF94 forcefield. The list of available forcefields is stored in the forcefields variable. To further optimise the structure, you can use the .localopt() method, which by default carries out 500 steps of an optimisation using MMFF94. Note that hydrogens need to be added before calling localopt().
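The two steps might be combined as in the sketch below. It assumes Pybel imports as `pybel`, that readstring() is available for reading from a string, and that the keyword names `forcefield` and `steps` match the installed version; the helper name is hypothetical.

```python
def build_and_optimise(smiles, forcefield="MMFF94", steps=500):
    """Generate 3D coordinates for a SMILES string, then optimise further."""
    import pybel  # assumed 2.x-style import; placed here so the sketch stands alone

    mol = pybel.readstring("smi", smiles)
    mol.make3D(forcefield=forcefield)    # includes a short default optimisation
    # localopt() needs hydrogens; this adds nothing if they are already there.
    mol.OBMol.AddHydrogens()
    mol.localopt(forcefield=forcefield, steps=steps)
    return mol
```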

The .calcdesc() method of a Molecule returns a dictionary containing descriptor values for LogP, Polar Surface Area ("TPSA") and Molar Refractivity ("MR"). A list of the available descriptors is contained in the variable descs. If only one or two descriptor values are required, you can specify the names as follows: calcdesc(["LogP", "TPSA"]). Since the .data attribute of a Molecule is also a dictionary, you can easily add the result of calcdesc() to an SD file (for example) as follows:

from pybel import readfile, Outputfile

# Take the first molecule in the file
mol = next(readfile("sdf", "without_desc.sdf"))
descvalues = mol.calcdesc()
# In Python, the update method of a dictionary allows you
# to add the contents of one dictionary to another
mol.data.update(descvalues)
output = Outputfile("sdf", "with_desc.sdf")
output.write(mol)
output.close()

For convenience, a Molecule provides an iterator over its Atoms. This is used as follows:
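One way this might look, counting atoms per element by looping over a Molecule directly. The helper name `count_elements` is illustrative; it reaches the atomic number through the underlying OBAtom rather than assuming any particular Pybel Atom attribute.

```python
def count_elements(mol):
    """Count atoms per atomic number by iterating over a Pybel Molecule."""
    counts = {}
    for atom in mol:                          # yields the Molecule's Atoms in turn
        number = atom.OBAtom.GetAtomicNum()   # call on the underlying OBAtom
        counts[number] = counts.get(number, 0) + 1
    return counts
```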

Input/Output

One of the strengths of Open Babel is the number of chemical file formats that it can handle. Pybel provides a dictionary of the input and output formats in the variables informats and outformats, where the keys are the format codes, usually three letters (e.g. 'pdb'), and the values are the descriptions (e.g. 'Protein Data Bank format').

Pybel greatly simplifies the process of reading and writing molecules to and from strings or files. There are two functions for reading Molecules:

If a single molecule is to be written to a file or string, the .write() method of the Molecule should be used:

mymol.write(format) returns a string

mymol.write(format, filename) writes the Molecule to a file. An optional additional parameter, overwrite, should be set to True if you wish to overwrite an existing file.

For files containing multiple molecules, the Outputfile class should be used instead. This is initialised with a format and filename (and optional overwrite parameter). To write a Molecule to the file, the .write() method of the Outputfile is called with the Molecule as a parameter. When all molecules have been written, the .close() method of the Outputfile should be called.
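The Outputfile workflow just described could be sketched as a small file converter. It assumes Pybel imports as `pybel` and that `overwrite` can be passed by keyword; the helper name `convert_file` and its return value are illustrative.

```python
def convert_file(in_format, in_path, out_format, out_path, overwrite=False):
    """Copy every molecule from one file to another; return how many were written."""
    import pybel  # assumed 2.x-style import; placed here so the sketch stands alone

    output = pybel.Outputfile(out_format, out_path, overwrite=overwrite)
    count = 0
    try:
        for mol in pybel.readfile(in_format, in_path):
            output.write(mol)
            count += 1
    finally:
        output.close()    # always close, even if reading fails part-way
    return count
```

A call such as `convert_file("sdf", "input.sdf", "smi", "output.smi")` would then write each molecule of a hypothetical input.sdf as a line of SMILES.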

Fingerprints

The .calcfp() method takes an optional argument, fptype, which should be one of the fingerprint types supported by OpenBabel (see Tutorial:Fingerprints). The list of supported fingerprints is stored in the variable fps. If unspecified, the default fingerprint ("FP2") is calculated.

Once created, the Fingerprint has two attributes: fp gives the original OpenBabel vector corresponding to the fingerprint, and bits gives a list of the bits that are set.

The Tanimoto coefficient of two Fingerprints can be calculated using the "|" operator.
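For reference, the Tanimoto coefficient is the number of bits set in both fingerprints divided by the number of bits set in either. The pure-Python sketch below computes it from two lists of set bits, such as those given by the bits attribute; the function name and the handling of two empty fingerprints are choices of the sketch, not of the library.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as lists of set bits."""
    a, b = set(bits_a), set(bits_b)
    either = a | b
    if not either:
        return 0.0    # convention of this sketch when no bits are set at all
    return len(a & b) / float(len(either))


# Two hypothetical fingerprints sharing bits 2 and 3 out of four distinct bits.
print(tanimoto([1, 2, 3], [2, 3, 4]))
```

With Pybel itself, the same quantity would be obtained as `mol1.calcfp() | mol2.calcfp()` for two Molecules mol1 and mol2.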

Combining Pybel with openbabel.py

Pybel's ease of use is readily combined with the comprehensive coverage of the Open Babel toolkit that openbabel.py provides. Pybel is really a wrapper around openbabel.py, so the OBAtom and OBMol used by openbabel.py can be interconverted with the Atom and Molecule used by Pybel.

The following example shows how to read a molecule from a PDB file using Pybel, and then how to use openbabel.py to add hydrogens. It also illustrates how to find out what methods and classes are available from the interactive Python prompt.
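A sketch of that workflow, assuming Pybel imports as `pybel`. The helper name `add_hydrogens` and the returned atom counts are illustrative.

```python
def add_hydrogens(in_format, path):
    """Read the first molecule with Pybel, add hydrogens through its OBMol.

    Returns the Molecule and its atom count before and after.
    """
    import pybel  # assumed 2.x-style import; placed here so the sketch stands alone

    mol = next(pybel.readfile(in_format, path))
    before = len(mol.atoms)
    mol.OBMol.AddHydrogens()     # openbabel method on the underlying OBMol
    after = len(mol.atoms)
    return mol, before, after
```

At the interactive prompt, `help(mol)` describes the Pybel Molecule, and `dir(mol.OBMol)` lists the methods available on the underlying OBMol.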