Author & Mentors

Abstract

Analysis of protein-protein complex interfaces at the residue level yields significant information about the overall binding process. Such information can be used, for example, in binding affinity studies, interface design and enzymology. To exploit it, tools are needed that analyze protein structures systematically and automatically, or that provide the means to do so. ProtorP (http://www.bioinformatics.sussex.ac.uk/protorp/) is an example of such a tool, and the large number of citations the server has received since its publication attests to its importance. However, being a web server, ProtorP is not suited for large-scale analysis, and it leaves the community dependent on its maintainers to keep the service available. Biopython's structural biology module, Bio.PDB, on the other hand, provides the ideal parsing machinery and programmatic structures for developing an offline, open-source library for interface analysis. Such a library could easily be used in large-scale analyses of protein-protein interfaces, for example in the evaluation of the CAPRI experiment or in benchmark statistics. If time permits, it would also be reasonable to extend this module to protein-DNA and protein-RNA complexes, since Biopython already supports nucleic acids.

Project Schedule

Week 1 [23rd May - 31st June]

Add the backbone of the new module to the current Bio.PDB code base

Evaluate which existing code can be reused and import it into the new module

Run simple calculations to check that the new module works consistently with the existing modules (parsing, for example) and functions

Define a stable benchmark

Select a few PDB files with different interface sizes and protein sizes

Weeks 9-10 [26th July - 8th August]

Develop functions for Interface comparison

These functions could be called through something like Ia.rmsd_to(Ib), where Ia and Ib are Interface objects

Calculation of iRMSD

Calculation of FCC (Fraction of Common Contacts)

Rough Identity and Similarity percentage

...

Unit tests, and comparison with dedicated tools such as ProFit
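As a rough sketch of the comparison measures listed above, the pure-Python functions below compute FCC, an RMSD over paired coordinates, and a naive identity percentage. They are illustrative assumptions, not the module's API. The names `fcc`, `rmsd` and `identity_percentage` are hypothetical. Contacts are assumed to be hashable residue-pair identifiers, and coordinates are assumed to be already matched and superimposed (in Bio.PDB that step could be done with `Superimposer`). An iRMSD would then amount to selecting the interface atoms of both structures, superimposing them, and applying `rmsd`. Methods such as `Ia.rmsd_to(Ib)` could wrap functions like these.

```python
import math


def fcc(reference_contacts, model_contacts):
    """Fraction of Common Contacts: share of reference contacts also found in the model.

    Contacts are hashable residue-pair identifiers, e.g. (("A", 10), ("B", 42)).
    The measure is asymmetric: fcc(a, b) and fcc(b, a) generally differ.
    """
    ref = set(reference_contacts)
    if not ref:
        raise ValueError("reference has no contacts")
    return len(ref & set(model_contacts)) / len(ref)


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points.

    Assumes the points are already paired one-to-one and superimposed;
    the superposition step itself is not performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def identity_percentage(residues_a, residues_b):
    """Percentage of identical positions between two pre-aligned residue-name lists."""
    if len(residues_a) != len(residues_b) or not residues_a:
        raise ValueError("need two non-empty aligned lists of equal length")
    same = sum(1 for x, y in zip(residues_a, residues_b) if x == y)
    return 100.0 * same / len(residues_a)


reference = [(("A", 10), ("B", 42)), (("A", 11), ("B", 43))]
model = [(("A", 10), ("B", 42))]
print(fcc(reference, model))  # 0.5
```

Note that a similarity percentage (as opposed to identity) would additionally need a substitution matrix, which this sketch leaves out.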

Week 11 [9th July - 8th August]

Code organization and final testing

Unit tests

Unit tests will be written throughout the project, so that at the end only one larger test run is needed, gathering all the tests already written.

The aim will then be to optimize, where possible, parts of the code for speed and efficiency without changes at the algorithmic level. Several days will be reserved for packaging the code and making sure that everything integrates with Biopython.

Project Progress

Implementation of Interface object backbone

Theory

We began by looking for an easy way to add the Interface as a new level of the SMCRA scheme, the idea being to obtain a new SM-I-CRA scheme. Unfortunately, the Interface object cannot be defined simply as a child of Model and a parent of Chain. The interface is essentially made of residues, and even of residue pairs. We want to keep the chain information, but we cannot reuse the chains as they are currently defined, since this would lead to overlaps, duplication and incompatibilities between the chains of our model and the chains of our interface. Similarly, our attempt to link the creation of the interface to existing modules such as StructureBuilder and Model was not successful.
We therefore decided to simplify the concept by adding the Interface-related classes independently. Links will of course exist between the different levels of SMCRA, but Interface is now considered a parallel entity, not fully integrated into the SMCRA scheme.

Coding

Interface.py defines the Interface object, which inherits from Entity and has the following methods: __init__(self, id), add(self, entity) and get_chains(self).

The add method overrides the add method of Entity in order to provide an easy way to sort residues according to their respective chains.
The get_chains method returns the chains involved in the interface defined by the Interface object.
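The following is a minimal sketch of the grouping behaviour described for `add` and `get_chains`. It is written against a hypothetical `Residue` stand-in with a `chain_id` attribute instead of real Bio.PDB objects, so that it runs without Biopython. In Bio.PDB, the chain would come from `residue.get_parent().id`. The real class inherits from `Entity`, which this sketch does not, and `get_residues` is a hypothetical helper added only for illustration.

```python
class Residue:
    """Hypothetical stand-in for Bio.PDB.Residue, keeping only what the sketch needs."""

    def __init__(self, chain_id, resseq, resname):
        self.chain_id = chain_id
        self.id = (" ", resseq, " ")  # same (hetfield, resseq, icode) layout as Bio.PDB
        self.resname = resname


class Interface:
    """Sketch of an Interface container that files residues under their chain.

    It stores references to existing residues rather than copies of
    Chain objects, so the model's own chains are left untouched.
    """

    def __init__(self, id):
        self.id = id
        self._by_chain = {}  # chain id -> list of residues (dicts keep insertion order)

    def add(self, residue):
        # File the residue under its chain, ignoring a residue added twice.
        bucket = self._by_chain.setdefault(residue.chain_id, [])
        if residue not in bucket:
            bucket.append(residue)

    def get_chains(self):
        # Chain identifiers involved in the interface, in order of first appearance.
        return list(self._by_chain)

    def get_residues(self, chain_id):
        # Hypothetical helper: residues of one chain, or [] if the chain is absent.
        return list(self._by_chain.get(chain_id, []))


iface = Interface("demo")
iface.add(Residue("A", 10, "LEU"))
iface.add(Residue("B", 42, "ASP"))
iface.add(Residue("A", 11, "TRP"))
print(iface.get_chains())  # ['A', 'B']
```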

The second class, defined in InterfaceBuilder.py, deals directly with building the interface (hard to guess!).
It contains the following methods: __init__(self, model, id=None, threshold=5.0, include_waters=False, *chains), _unpack_chains(self, list_of_tuples), get_interface(self), _add_residue(self, residue), _build_interface(self, model, id, threshold, include_waters=False, *chains)

__init__: To initialize an interface, you need to provide the model for which you want to calculate the interface; this is the only mandatory argument.

_unpack_chains: Method used by __init__ to create self.chain_list, a variable read in many parts of the class. It transforms a list of tuples (given by the user) into a list of characters representing the chains that will be involved in the definition of the interface.

get_interface: Simply returns the interface.

_add_residue: Allows the user to add specific residues to the interface.

_build_interface: The machinery that builds the interface. It uses NeighborSearch and Selection to define the interface according to the arguments given by the user.
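The chain-unpacking and interface-building steps can be sketched in plain Python as follows. This is an illustration under several assumptions, not the InterfaceBuilder code. Atoms are hypothetical flat `Atom` records. Waters are recognised by the residue name `HOH`. The shape of the user's tuples is guessed. A brute-force pairwise distance loop replaces `NeighborSearch`. With real Bio.PDB objects, `NeighborSearch(atom_list).search_all(threshold, level="R")` returns residue pairs within the cut-off directly.

```python
import math
from itertools import combinations
from typing import NamedTuple


class Atom(NamedTuple):
    """Hypothetical flat atom record standing in for Bio.PDB.Atom."""
    chain_id: str
    resseq: int
    resname: str
    coord: tuple


def unpack_chains(list_of_tuples):
    """Flatten user-given chain groups, e.g. [("A", "B")], into ["A", "B"].

    Assumption: each tuple holds one-character chain IDs; duplicates are
    dropped and the order of first appearance is kept.
    """
    chain_list = []
    for group in list_of_tuples:
        for chain_id in group:
            if len(chain_id) != 1:
                raise ValueError(f"invalid chain identifier: {chain_id!r}")
            if chain_id not in chain_list:
                chain_list.append(chain_id)
    return chain_list


def build_interface(atoms, threshold=5.0, include_waters=False, *chains):
    """Return sorted inter-chain residue pairs having atoms within `threshold` Angstrom.

    Brute-force O(n^2) stand-in for NeighborSearch; with no chains given,
    all chains are considered.
    """
    chain_list = unpack_chains(chains)
    selected = [
        a for a in atoms
        # Waters are recognised by residue name here; Bio.PDB flags them with hetfield "W".
        if (include_waters or a.resname != "HOH")
        and (not chain_list or a.chain_id in chain_list)
    ]
    pairs = set()
    for a, b in combinations(selected, 2):
        if a.chain_id == b.chain_id:
            continue  # contacts within one chain are not interface contacts
        if math.dist(a.coord, b.coord) <= threshold:
            # Sort each pair so (A10, B42) and (B42, A10) are stored once.
            pairs.add(tuple(sorted(((a.chain_id, a.resseq), (b.chain_id, b.resseq)))))
    return sorted(pairs)


atoms = [
    Atom("A", 10, "LEU", (0.0, 0.0, 0.0)),
    Atom("A", 11, "GLY", (20.0, 0.0, 0.0)),
    Atom("B", 42, "ASP", (3.0, 0.0, 0.0)),
    Atom("B", 43, "HOH", (1.0, 0.0, 0.0)),
    Atom("C", 7, "SER", (0.0, 3.0, 0.0)),
]
print(build_interface(atoms, 5.0, False, ("A", "B")))  # [(('A', 10), ('B', 42))]
```

The signature mirrors the one listed above, with `*chains` after the keyword defaults. As a consequence, restricting chains requires passing `threshold` and `include_waters` positionally, as in the usage line.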

Extension of residues

Theory

To gather useful information about residues and use it in further calculations on interfaces, we want to implement a subclass of Residue. This new class would be integrated as an inherited class of Residue (in the same spirit as DisorderedResidue) and would compute by default a few properties of each residue, such as charge, polarity, hydrophobicity or weight. We first thought of creating an extended copy of an existing residue but, to limit memory consumption, we preferred to change the residue's type directly. Thus, we change the class of the residue and perform the calculations during the initialization of the new ExtendedResidue class.
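The in-place class change can be sketched as follows. The `Residue` class here is a hypothetical stand-in for `Bio.PDB.Residue.Residue`, and the `promote` and `_compute_properties` names are illustrative. Only hydrophobicity (Kyte-Doolittle scale) and a formal side-chain charge are computed. Reassigning `__class__` does not run `__init__`, so the properties must be computed explicitly after the switch. Biopython also ships a one-letter-keyed Kyte-Doolittle table as `kd` in `Bio.SeqUtils.ProtParamData`, which could replace the hard-coded dictionary.

```python
# Kyte-Doolittle hydropathy scale, keyed by three-letter residue name.
KD_HYDROPATHY = {
    "ALA": 1.8, "ARG": -4.5, "ASN": -3.5, "ASP": -3.5, "CYS": 2.5,
    "GLN": -3.5, "GLU": -3.5, "GLY": -0.4, "HIS": -3.2, "ILE": 4.5,
    "LEU": 3.8, "LYS": -3.9, "MET": 1.9, "PHE": 2.8, "PRO": -1.6,
    "SER": -0.8, "THR": -0.7, "TRP": -0.9, "TYR": -1.3, "VAL": 4.2,
}

# Approximate side-chain formal charge at neutral pH (histidine treated as neutral).
FORMAL_CHARGE = {"ASP": -1, "GLU": -1, "LYS": 1, "ARG": 1}


class Residue:
    """Hypothetical minimal stand-in for Bio.PDB.Residue.Residue."""

    def __init__(self, resname):
        self.resname = resname


class ExtendedResidue(Residue):
    """Residue subclass carrying precomputed physico-chemical properties."""

    def __init__(self, resname):
        # Direct construction also computes the properties.
        super().__init__(resname)
        self._compute_properties()

    def _compute_properties(self):
        self.hydrophobicity = KD_HYDROPATHY.get(self.resname)  # None if unknown
        self.charge = FORMAL_CHARGE.get(self.resname, 0)

    @classmethod
    def promote(cls, residue):
        """Turn an existing Residue into an ExtendedResidue in place, without copying.

        Reassigning __class__ skips __init__, so properties are computed explicitly.
        """
        residue.__class__ = cls
        residue._compute_properties()
        return residue


r = Residue("LYS")
ext = ExtendedResidue.promote(r)
print(ext is r, ext.charge, ext.hydrophobicity)  # True 1 -3.9
```

Because the same object is reused, any parent chain that already references the residue sees the extended version without further bookkeeping.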