The study of biology attempts to develop a detailed understanding of the processes of life at all levels, from molecules to organisms. The advent of genomic-scale science now offers the promise of a knowledge of all molecules in living systems and has created the expectation that this level of knowledge can be extended to include molecular structure, function, processes and even the systems biology of complete cells. Computation has become an indispensable tool in biology by providing a controlled environment for developing and testing hypotheses. My research interests have moved from a background in chemistry and computation to structural biology and structural bioinformatics. They range from methods development in bioinformatics (disorder prediction with RONN, genome-wide annotation with structural data) through protein production and data tracking (laboratory information management systems, LIMS) and structure determination to structure analysis and display.

As well as providing and maintaining the IT platform for the whole Division, I am collaborating on the development of an information management system for protein science, the Protein Information Management System (PiMS). I act as the scientific sponsor, directing the work of a team of developers and scientists across the UK. The aim of PiMS is to develop a freely available, commercial-quality LIMS suited to tracking protein production data in structural biology laboratories.

On the crystallographic side, through links with the Oxford Protein Production Facility I have been coordinating a structural proteomics-type study of proteins from Bacillus anthracis.

Understanding virus antigenicity is of fundamental importance for the development of better, more cross-reactive vaccines. However, as far as we are aware, no systematic work has yet been conducted using the 3D structure of a virus to identify novel epitopes. Therefore we have extended several existing structural prediction algorithms to build a method for identifying epitopes on the appropriate outer surface of intact virus capsids (which are structurally different from globular proteins in both shape and arrangement of multiple repeated elements) and applied it here, as a proof of principle, to the capsid of foot-and-mouth disease virus (FMDV). We have analysed how reliably several freely available structure-based B cell epitope prediction programs can identify already known viral epitopes of FMDV in the context of the viral capsid. To do this we constructed a simple objective metric to measure the sensitivity and discrimination of such algorithms. After optimising the parameters for five methods using an independent training set, we used this measure to evaluate the methods. Individually, each algorithm performed rather poorly (three performing better than the other two), suggesting that there may be value in developing virus-specific software. A very conservative approach, requiring a consensus between all three top methods, predicts a number of previously described antigenic residues as potential epitopes on more than one serotype of FMDV, consistent with experimental results. The consensus results also identified novel residues as potential epitopes on more than one serotype. These include residues 190-192 of VP2 (not previously determined to be antigenic), residues 69-71 and 193-197 of VP3 spanning the pentamer-pentamer interface, and another region incorporating residues 83, 84 and 169-174 of VP1 (all only previously experimentally defined on serotype A).
The computer programs needed to create a semi-automated procedure for carrying out this epitope prediction method are presented.
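The consensus step described in this abstract can be illustrated with a minimal sketch. It assumes each prediction method returns a set of residue numbers and keeps only residues that every method agrees on. The function names, the method labels and the toy residue numbers are hypothetical, and the sensitivity shown is the standard fraction of known epitope residues recovered, not necessarily the metric constructed in the paper.

```python
from typing import Dict, Set


def consensus_residues(predictions: Dict[str, Set[int]]) -> Set[int]:
    """Return residues predicted as epitopes by every method (strict consensus)."""
    sets = list(predictions.values())
    if not sets:
        return set()
    return set.intersection(*sets)


def sensitivity(predicted: Set[int], known: Set[int]) -> float:
    """Fraction of known epitope residues that appear in the prediction."""
    if not known:
        raise ValueError("known epitope set is empty")
    return len(predicted & known) / len(known)


# Toy data only: residue numbers are invented for illustration.
predictions = {
    "method_a": {10, 11, 12, 40, 41, 42, 60},
    "method_b": {10, 11, 12, 41, 42, 43},
    "method_c": {11, 12, 30, 41, 42},
}
known_epitope = {11, 12, 13, 41}

consensus = consensus_residues(predictions)
score = sensitivity(consensus, known_epitope)
print(sorted(consensus), score)
```

Requiring agreement between all methods trades sensitivity for fewer false positives, which matches the conservative approach the abstract describes.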

The techniques used in protein production and structural biology have been developing rapidly, but techniques for recording the laboratory information produced have not kept pace. One approach is the development of laboratory information-management systems (LIMS), which typically use a relational database schema to model and store results from a laboratory workflow. The underlying philosophy and implementation of the Protein Information Management System (PiMS), a LIMS development specifically targeted at the flexible and unpredictable workflows of protein-production research laboratories of all scales, are described. PiMS is a web-based Java application that uses either Postgres or Oracle as the underlying relational database-management system. PiMS is available under a free licence to all academic laboratories either for local installation or for use as a managed service.

Uridine monophosphate (UMP) kinase is a conserved enzyme that catalyzes the ATP-driven conversion of uridylate monophosphate into uridylate diphosphate, an essential metabolic step. In prokaryotes, the enzyme exists as a homohexamer that is regulated by various metabolites. Whereas the enzymatic mechanism of UMP kinase (UK) is well-characterized, the molecular basis of its regulation remains poorly understood. Here we report the crystal structure of UK from Bacillus anthracis (BA1797) in complex with ATP at 2.82 Å resolution. It reveals that the cofactor, in addition to binding in the active sites, also interacts with separate binding pockets located near the center of the hexameric structure. The existence of such an allosteric binding site had been predicted by biochemical studies, but it was not identified in previous crystal structures of prokaryotic UKs. We show that this putative allosteric pocket is conserved across different bacterial species, suggesting that it is a feature common to bacterial UKs, and we present a structural model for the allosteric regulation of this enzyme.

A collaborative project between two Structural Proteomics In Europe (SPINE) partner laboratories, York and Oxford, aimed at high-throughput (HTP) structure determination of proteins from Bacillus anthracis, the aetiological agent of anthrax and a biomedically important target, is described. Based upon a target-selection strategy combining 'low-hanging fruit' and more challenging targets, this work has contributed to the body of knowledge of B. anthracis, established and developed HTP cloning and expression technologies and tested HTP pipelines. Both centres developed ligation-independent cloning (LIC) and expression systems, employing custom LIC-PCR, Gateway and In-Fusion technologies, used in combination with parallel protein purification and robotic nanolitre crystallization screening. Overall, 42 structures have been solved by X-ray crystallography, plus two by NMR through collaboration between York and the SPINE partner in Utrecht. Three biologically important protein structures, BA4899, BA1655 and BA3998, involved in tRNA modification, sporulation control and carbohydrate metabolism, respectively, are highlighted. Target analysis by biophysical clustering based on pI and hydropathy has provided useful information for future target-selection strategies. The technological developments and lessons learned from this project are discussed. The success rate of protein expression and structure solution is at least in keeping with that achieved in structural genomics programs.
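As an illustration of the hydropathy property mentioned in this abstract, the sketch below computes a grand average of hydropathy (GRAVY) score using the published Kyte-Doolittle scale. Whether the project used this particular measure is an assumption; the function name and the example sequence are hypothetical, and pI calculation is omitted.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy over all residues of a protein sequence."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence is empty")
    unknown = set(seq) - KYTE_DOOLITTLE.keys()
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Invented example sequence, for illustration only.
example = "MKVLAAGIVG"
print(round(gravy(example), 3))
```

Positive scores indicate a more hydrophobic protein overall and negative scores a more hydrophilic one, so a score of this kind could serve as one axis when grouping targets by biophysical properties.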

MOTIVATION: Recent studies have found many proteins containing regions that do not form well-defined three-dimensional structures in their native states. The study and detection of such disordered regions is important both for understanding protein function and for facilitating structural analysis since disordered regions may affect solubility and/or crystallizability. RESULTS: We have developed the regional order neural network (RONN) software as an application of our recently developed 'bio-basis function neural network' pattern recognition algorithm to the detection of natively disordered regions in proteins. The results of blind-testing a panel of nine disorder prediction tools (including RONN) against 80 protein sequences derived from the Protein Data Bank show that, based on the probability excess measure, RONN performed the best.

Naive T cell activation requires signaling by the T cell receptor and by nonclonotypic cell surface receptors. The most important costimulatory protein is the monovalent homodimer CD28, which interacts with CD80 and CD86 expressed on antigen-presenting cells. Here we present the crystal structure of a soluble form of CD28 in complex with the Fab fragment of a mitogenic antibody. Structural comparisons redefine the evolutionary relationships of CD28-related proteins, antigen receptors and adhesion molecules and account for the distinct ligand-binding and stoichiometric properties of CD28 and the related, inhibitory homodimer CTLA-4. Cryo-electron microscopy-based comparisons of complexes of CD28 with mitogenic and nonmitogenic antibodies place new constraints on models of antibody-induced receptor triggering. This work completes the initial structural characterization of the CD28-CTLA-4-CD80-CD86 signaling system.

Semaphorins, proteins characterized by an extracellular sema domain, regulate axon guidance, immune function and angiogenesis. The crystal structure of SEMA4D (residues 1-657) shows the sema topology to be a seven-bladed beta-propeller, revealing an unexpected homology with integrins. The sema beta-propeller contains a distinctive 77-residue insertion between beta-strands C and D of blade 5. Blade 7 is followed by a domain common to plexins, semaphorins and integrins (PSI domain), which forms a compact cysteine knot abutting the side of the propeller, and an Ig-like domain. The top face of the beta-propeller presents prominent loops characteristic of semaphorins. In addition to limited contact between the Ig-like domains, the homodimer is stabilized through extensive interactions between the top faces in a sector of the beta-propeller used for heterodimerization in integrins. This face of the propeller also mediates ligand binding in integrins, and functional data for semaphorin-receptor interactions map to the equivalent surface.

The structure of unliganded HIV-1 reverse transcriptase (RT) has been determined at 2.35 Å resolution and refined to an R-factor of 0.219 (for all data) with good stereochemistry. The unliganded structure was produced by soaking out a weak binding non-nucleoside inhibitor, HEPT, from pregrown crystals. Comparison with the structures of four different RT and non-nucleoside inhibitor complexes reveals that only minor domain rearrangements occur, but there is a significant repositioning of a three-stranded beta-sheet in the p66 subunit (containing the catalytic aspartic acid residues 110, 185 and 186) with respect to the rest of the polymerase site. This suggests that non-nucleoside inhibitors (NNIs) inhibit RT by locking the polymerase active site in an inactive conformation, reminiscent of the conformation observed in the inactive p51 subunit.