My primary research goal is the understanding of the relationship
between the amino acid sequence of a protein and its consequent
three-dimensional structure and function. To this end I have been
developing computational methods for the prediction of protein
structure and function from sequence.

I am the chief developer of the Phyre2 server for
protein structure prediction. Phyre2 receives approximately 1,000
submissions per day and has been cited over 1,800 times. Phyre2 is the
successor of my previous system known as 3D-PSSM. 3D-PSSM was one of
the first widely-used web-based protein structure prediction servers
and has received over 1,600 citations.

My most recent work has been on improving and developing the Phyre
system by adding new functionality to analyse model quality, predict
function and the effects of mutations, and build complexes.

Videos

A quick introduction to the basics of protein structure prediction by
homology as performed by the Phyre2 server

Philosophy of Science

I have a continuing collaboration with Dr.
Michael Scott at the University of Manchester on the philosophy of
science. We recently published "The evolution of biology. A shift
towards the engineering of prediction-generating tools and away from
traditional research practice", Kelley LA and Scott MA, EMBO reports 9,
12, 1163-1167 (2008). [pdf]


ePlant and the 3D display initiative

A collaboration with Dr. Geoff Fucile and Prof. Nicholas Provart,
University of Toronto, on the ePlant initiative [article] (2011).

ePlant is a suite of open-source, web-based tools for the
visualization of large-scale data sets from the model organism
Arabidopsis thaliana. This includes sequence homology relationships
and single nucleotide polymorphism data, protein structure models
(produced by Phyre), molecular interaction networks, subcellular localization and gene expression data.


Science/art

For the last 3 years I have been in close collaboration with the
computer artist Prof. William Latham at Goldsmiths, University of
London, and the rest of the Mutators team. Our first
piece of work involved combining William's techniques for data
visualisation with the evolutionary history of a protein that diverged
into two forms - one present in the lens of the eye and one present in
the liver. This became the "History of the Species" film which you can
watch on the right.

This work has been exhibited at several events, including Siggraph '07
and the Medical Research Council (MRC) in Mill Hill, and appeared as a
centre-fold poster insert in the Jan. 24 issue of New Scientist.

More recently we have been developing, with the expertise of
Stephen Todd and pilot work by Ben Jefferys, a system to
interactively explore and visualise protein folding and
protein-protein docking in real-time. This system is known as Foldsynth
and I will be putting demos up soon. A little snapshot can be seen below.

2001 - present - Post-doctoral Research Fellow at Imperial College London

Poing

With Dr. Ben Jefferys we developed the Poing model of protein folding
and studied the effect of macromolecular crowding on simulated folding
from a virtual ribosome [pdf].
See videos above for more information.

3DLigandSite

With Dr. Mark Wass we developed the 3DLigandSite web
server for the prediction of potential ligand binding sites given a
protein structure - experimental or modelled. 3DLigandSite was ranked
as one of the top performing methods for binding site prediction for
the past 4 years at CASP. It is also tightly coupled to the new Phyre2
web server.

Toward a map of the global proteome

With Dr. Daniel Chubb we investigated the effect of the exponential
growth in the protein sequence database on our ability to detect
remote homologies over time [pdf].
A surprising result was that, despite such enormous growth in sequence
data, our ability to detect remote homology reached a plateau as
early as 2004 using the most widely cited method, PSI-Blast. This work
has received recognition, including an invited talk and poster prize
at MASAMB 2009 and a special presentation at CASP9.

Phyre

Together with Dr. Riccardo Bennet Lovesey, I developed the Phyre
protein structure prediction server, which is now one of the most
widely used systems of its kind. Phyre is based on the application of
profile-profile matching to detect remote evolutionary relationships
between proteins [pdf].

AprilII

April II - Applications in probabilistic inductive logic. Worked in
the large international European-funded AprilII project to apply the
machine learning technique of Inductive Logic Programming combined
with probability measures to protein fold classification. This work
involved a collaboration with Prof. Stephen Muggleton.

SVILP predicting binding specificity

Applied the technique of Support Vector Inductive Logic Programming to
the problem of predicting protein-ligand binding site
specificity. SVILP involves learning rules using ILP. These rules then
form the attributes of a feature vector representing a training or
testing data example. A support vector machine is then used to learn
the relationship between input examples and their classification based
on these feature vectors [pdf].
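
As a rough illustration of the step in which rules become the attributes of a feature vector, the sketch below turns a few rules into a 0/1 vector per example. It is not the SVILP implementation: the three rules are hand-written stand-ins for clauses that ILP would learn, the binding-site properties are invented, and the plain dot product used as a similarity is an assumption rather than the published kernel. Training the support vector machine itself is left out.

```python
# Hypothetical stand-ins for rules that ILP would learn; each rule is a
# predicate over a binding site described by a dict of made-up properties.
RULES = [
    lambda site: site["has_metal_ion"],
    lambda site: site["aromatic_residues"] >= 2,
    lambda site: site["has_metal_ion"] and site["net_charge"] < 0,
]


def rule_features(site, rules=RULES):
    """Return a 0/1 vector with one attribute per rule: 1 if the rule fires."""
    return [1 if rule(site) else 0 for rule in rules]


def shared_rule_kernel(x, y):
    """Dot product of two rule vectors: the number of rules firing on both."""
    return sum(a * b for a, b in zip(x, y))


# Two invented binding sites.
site_a = {"has_metal_ion": True, "aromatic_residues": 3, "net_charge": -1}
site_b = {"has_metal_ion": True, "aromatic_residues": 0, "net_charge": 1}

vec_a = rule_features(site_a)  # [1, 1, 1]
vec_b = rule_features(site_b)  # [1, 0, 0]
similarity = shared_rule_kernel(vec_a, vec_b)  # 1
```

Vectors like `vec_a` and `vec_b` are what a support vector machine would then be trained on.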

Argumentation

Argumentation is an established technique for reasoning about
situations where absolute truth or precise probability is impossible
to determine. Together with Dr. Ben Jefferys we developed an
argumentation system for 3D-PSSM to automate the application of expert
knowledge in interpreting protein structure prediction results [article].

Worked in the Biomolecular Modelling
Laboratory funded by Glaxo-Wellcome and the Imperial Cancer
Research Fund (ICRF) (now Cancer Research UK) on protein structure prediction
and the use of textual annotation.

While at the ICRF I developed, with Dr. Bob MacCallum, the 3D-PSSM
web server for protein structure prediction.
The Critical Assessment of Structure Prediction (CASP) is a meeting held every 2
years as an international blind trial of protein structure prediction
techniques. At CASP4 3D-PSSM was found to be the best-performing fully
automatic method for structure prediction. In addition, our manual,
human-crafted predictions were ranked 3rd out of the 100+ groups
attending. 3D-PSSM has received over 1400 citations in the literature.

Using textual annotation information - SAWTED

There is a wealth of largely untapped information available on
proteins in the form of human annotation and journal abstracts. Part
of the reason for the superior performance of experts using structure
prediction programs such as 3D-PSSM over the purely automatic use of
such algorithms is the human expert's ability to read textual
annotation and otherwise human-readable information. Dr. Bob MacCallum
and I developed the SAWTED algorithm (Structural Annotation With TExt
Description) (MacCallum et al., 1999). The purpose
of SAWTED is to use the extent of shared human-assigned keywords between potentially
remote homologues to lend confidence to tenuous homology
assignments. The SwissProt homologues of a user's query sequence may
contain keywords such as "cytochrome" or "p450". Similarly, the
SwissProt homologues of a known structure may contain keywords such as
"cytochrome" and "mitochondria". These terms are represented as
abstract "term vectors" which can be compared using the vector cosine model of text
retrieval. Despite a very weak sequence match between these two
proteins using an algorithm such as PSI-Blast, the shared keyword or
SAWTED score can automatically provide an independent source of evidence for
genuine homology.
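
The cosine comparison of term vectors can be sketched as follows, using the keywords from the example above. This is a minimal illustration, not the SAWTED code: it assumes raw, unweighted keyword counts, and the function names are invented for the example. Any term weighting used in the real algorithm is not reproduced here.

```python
import math
from collections import Counter


def term_vector(keywords):
    """Count how often each keyword occurs (a sparse term vector)."""
    return Counter(k.lower() for k in keywords)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty keyword list shares nothing with anything
    return dot / (norm_a * norm_b)


# Keywords from the homologues of the query and of the known structure.
query_terms = term_vector(["cytochrome", "p450"])
template_terms = term_vector(["cytochrome", "mitochondria"])

score = cosine_similarity(query_terms, template_terms)  # about 0.5
```

One of the two keywords on each side is shared, so the score is about 0.5. Identical keyword sets would score 1 and disjoint sets 0.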

Predicting sub-cellular localisation

Dr. Ben Stapley and I used tools from machine learning (specifically
Support Vector Machines (SVMs)) in conjunction with collated Medline
abstracts to represent proteins as very high-dimensional (40,000+)
vectors of English language terms. SVMs are binary classifiers that
permit such large spaces to be quickly and automatically partitioned
given training data. Once trained, such systems can be used to classify
new proteins based on text alone, or combinations of text and sequence
features (Stapley et al., 2001; Stapley et al., 2002).
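
A toy version of this idea is sketched below. It represents each abstract as a vector of word counts and trains a linear SVM by sub-gradient descent on the regularised hinge loss. Everything in it is an assumption made for illustration: the four "abstracts" are invented, the vocabulary has 13 terms rather than 40,000+, the bias term is omitted for brevity, and the hand-rolled trainer stands in for whatever SVM software was used in the published work.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Represent a text as a vector of term counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def train_linear_svm(X, y, epochs=50, lr=0.1, reg=0.01):
    """Sub-gradient descent on the L2-regularised hinge loss (no bias term).

    Labels must be +1 or -1.
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [wj * (1.0 - lr * reg) for wj in w]  # regularisation step
            if margin < 1.0:  # example inside the margin: hinge-loss step
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
    return w


def classify(w, x):
    """Return +1 or -1 according to the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Invented training "abstracts": +1 = nuclear, -1 = membrane.
abstracts = [
    ("nuclear transcription factor binds dna", 1),
    ("histone chromatin nuclear dna", 1),
    ("membrane transporter channel lipid", -1),
    ("membrane receptor lipid signalling", -1),
]
vocabulary = sorted({word for text, _ in abstracts for word in text.split()})
X = [bag_of_words(text, vocabulary) for text, _ in abstracts]
y = [label for _, label in abstracts]
weights = train_linear_svm(X, y)

nuclear_call = classify(weights, bag_of_words("histone dna repair", vocabulary))
membrane_call = classify(weights, bag_of_words("lipid channel", vocabulary))
```

Words outside the vocabulary ("repair" here) are simply ignored. Because an SVM is a binary classifier, distinguishing several compartments would need one such classifier per compartment or a comparable multi-class scheme.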

1994 - 1997 - Ph.D. Biomolecular Computing, Leicester University, UK

My PhD involved the development of various methods of analysing
ensembles of protein structures determined by nuclear magnetic
resonance spectroscopy. I was supervised by
Dr. Mike Sutcliffe.

Designed and programmed NMRCLUST
(Kelley et al., 1996)[pdf], a tool to
cluster ensembles of NMR-derived protein structures into
conformationally-related sub-families. This required the development
of an automated method of clustering without rigid cut-offs or user
intervention.

NMRCLUST required the development of a new clustering technique, known
as the Kelley-Gardner-Sutcliffe (KGS) measure, which is now incorporated
in the R statistics package. It has been applied to a wide range of
fields beyond proteins, such as HIV virtual screening of drugs, measuring
biodiversity, and even coffee bean morphology and taste, and has over
170 citations.
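
The idea of choosing a cut-off automatically can be sketched as below. This is an illustration rather than the NMRCLUST code: it assumes average-linkage hierarchical clustering and a penalty formed from the normalised average within-cluster spread plus the number of clusters, with the minimum penalty selecting the cut-off. The function names and the toy one-dimensional data are invented. A real application would use pairwise RMSDs between superposed structures as the distance matrix.

```python
from itertools import combinations


def spread(cluster, dist):
    """Mean pairwise distance within a cluster (0 for a singleton)."""
    pairs = list(combinations(cluster, 2))
    return sum(dist[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0


def average_linkage(a, b, dist):
    """Mean distance between all members of cluster a and cluster b."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))


def kgs_clusters(dist):
    """Cluster hierarchically, then pick the stage with the lowest penalty."""
    n = len(dist)
    clusters = [[i] for i in range(n)]
    stages = []  # (clusters, average spread) recorded after every merge
    while len(clusters) > 1:
        # Merge the two clusters with the smallest average-linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], dist),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        # Average spread over non-singleton clusters only.
        real = [c for c in clusters if len(c) > 1]
        av_sp = sum(spread(c, dist) for c in real) / len(real)
        stages.append(([sorted(c) for c in clusters], av_sp))

    lo = min(s for _, s in stages)
    hi = max(s for _, s in stages)
    best = None
    for stage_clusters, av_sp in stages:
        # Rescale the average spread to the range 1..n-1 so that it is
        # comparable with the number of clusters.
        norm = 1.0 + (n - 2) * (av_sp - lo) / (hi - lo) if hi > lo else 1.0
        penalty = norm + len(stage_clusters)
        if best is None or penalty < best[0]:
            best = (penalty, stage_clusters)
    return sorted(best[1])


# Toy data: six "structures" placed on a line, forming two obvious groups.
coords = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in coords] for p in coords]
clusters = kgs_clusters(dist)  # [[0, 1, 2], [3, 4, 5]]
```

No distance threshold is supplied anywhere: the penalty trades tight clusters against having few of them, and here the two-cluster stage gives the lowest value.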

Worked for two months in the Bioinformatics group at Oxford
Molecular Ltd. incorporating NMRCLUST into Architect, the IDITIS
protein database generation tool.

Designed and programmed NMRCORE
(Kelley et al., 1997)[pdf], a tool to define automatically the core atoms
and domains in an ensemble of protein structures.

Developed OLDERADO: On Line Database of Ensemble Representatives And
DOmains (Kelley et al., 1997)[pdf]. This is a searchable database
of the results of NMRCORE and NMRCLUST on the current set of
PDB-deposited NMR-derived ensembles.