Short Biography

Professor Yeates received his BS in 1983 and his PhD in 1988, both at UCLA. After working as a postdoctoral fellow at The Scripps Research Institute in La Jolla, he returned to UCLA as a member of the faculty in 1990.

Biography

After earning his Bachelor's degree at UCLA, Yeates stayed on to do his PhD research under the direction of Prof. Douglas Rees. There he helped determine the crystal structure of the bacterial photosynthetic reaction center as part of a team racing to determine the first crystal structures of membrane proteins. He then moved to The Scripps Research Institute to do his postdoctoral research on the structure of poliovirus with Prof. James Hogle. Yeates returned to UCLA in 1990 to join the faculty in the Department of Chemistry and Biochemistry. His interdisciplinary research, combining molecular biology with computing and mathematics, has focused on macromolecular structure and computational genomics. His research findings include: an explanation for why proteins crystallize in certain favored arrangements, the discovery of thermophilic microbes rich in intracellular disulfide bonds, co-development of phylogenetic profile methods in genomics, development of designed protein cages or 'nanohedra', the discovery of novel topological features such as slipknots in thermostable proteins, and the elucidation of the structures of the carboxysome shell proteins. Yeates is a member of the Molecular Biology Institute, the California Nanosystems Institute, and the Institute of Genomics and Proteomics, and is a Fellow of the American Association for the Advancement of Science. He has published approximately 100 research papers.

Research Interest

Molecular, Structural and Computational biology

Our research covers the areas of molecular, structural and computational biology.

In the area of structural biology, our emphasis is on supra-molecular protein assemblies. Much of our recent work has focused on bacterial microcompartments -- extraordinary protein assemblies, reminiscent of viral capsids, that are composed of thousands of subunits. They encapsulate a series of enzymes within a protein shell, which controls the transport of substrates and products into and out of the microcompartment interior. They serve as primitive metabolic organelles in many bacteria. Our structural studies on these systems provided the first three-dimensional views of the shell proteins, and have generated long-needed mechanistic hypotheses for how bacterial microcompartments function.

Publications

2012

Although natural proteins are chiral and are all of one "handedness," their mirror image forms can be prepared by chemical synthesis. This opens up new opportunities for protein crystallography. A racemic mixture of the enantiomeric forms of a protein molecule can crystallize in ways that natural proteins cannot. Recent experimental data support a theoretical prediction that this should make racemic protein mixtures highly amenable to crystallization. Crystals obtained from racemic mixtures also offer advantages in structure determination strategies. The relevance of these potential advantages is heightened by advances in synthetic methods, which are extending the size limit for proteins that can be prepared by chemical synthesis. Recent ideas and results in the area of racemic protein crystallography are reviewed.

Racemic protein crystallography offers two key features: an increased probability of crystallization and the potential advantage of phasing centric diffraction data. In this study, a phasing strategy is developed for the scenario in which a crystal is grown from a mixture in which anomalous scattering atoms have been incorporated into only one enantiomeric form of the protein molecule in an otherwise racemic mixture. The structure of a protein crystallized in such a quasi-racemic form has been determined in previous work [Pentelute et al. (2008), J. Am. Chem. Soc. 130, 9695-9701] using the multiwavelength anomalous dispersion (MAD) method. Here, it is shown that although the phases from such a crystal are not strictly centric, their approximate centricity provides a powerful way to break the phase ambiguity that ordinarily arises when using the single-wavelength anomalous dispersion (SAD) method. It is shown that good phases and electron-density maps can be obtained from a quasi-racemic protein crystal based on single-wavelength data. A prerequisite problem of how to establish the origin of the anomalous scattering substructure relative to the center of pseudo-inversion is also addressed.
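The idea of centric versus approximately centric phases can be illustrated with a small numerical sketch. It assumes made-up atomic positions and scattering factors (none of these values come from the paper) and only shows the general principle: when every atom has an inversion-related partner with the same scattering factor, the structure factor is real, so its phase is restricted to 0 or 180 degrees. When an anomalous scatterer is present in only one enantiomer, the structure factor acquires an imaginary part.

```python
import cmath
import math

def structure_factor(h, atoms):
    """Sum f * exp(2*pi*i * h.x) over atoms given as (f, (x, y, z))."""
    total = 0j
    for f, xyz in atoms:
        angle = 2 * math.pi * sum(hi * xi for hi, xi in zip(h, xyz))
        total += f * cmath.exp(1j * angle)
    return total

def invert(atoms):
    """Return the mirror-image copy related by an inversion center at the origin."""
    return [(f, tuple(-c for c in xyz)) for f, xyz in atoms]

# Hypothetical light atoms of the L-form (illustrative values only).
l_form = [(6.0, (0.11, 0.23, 0.37)), (16.0, (0.42, 0.05, 0.18))]

# True racemate: every atom has an identical inversion-related partner.
racemic = l_form + invert(l_form)

# Quasi-racemate: an anomalous scatterer (complex f, hypothetical value)
# sits in one enantiomer, and an ordinary atom sits at the inverted site.
site = (0.30, 0.40, 0.10)
quasi_racemic = racemic + [(34.0 + 3.0j, site),
                           (16.0, tuple(-c for c in site))]

h = (1, 2, 3)
f_rac = structure_factor(h, racemic)
f_quasi = structure_factor(h, quasi_racemic)
print("racemic imaginary part:", f_rac.imag)
print("quasi-racemic phase (degrees):", math.degrees(cmath.phase(f_quasi)))
```

In this toy model the quasi-racemic phase is no longer exactly 0 or 180 degrees, which is the situation the abstract describes as approximate centricity.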

Two new crystal structures of the Escherichia coli high affinity methionine uptake ATP Binding Cassette (ABC) transporter MetNI, purified in the detergents cyclohexyl-pentyl-β-D-maltoside (CY5) and n-decyl-β-D-maltopyranoside (DM), have been solved in inward facing conformations to resolutions of 2.9 and 4.0 Å, respectively. Compared to the previously reported 3.7 Å resolution structure of MetNI purified in n-dodecyl-β-D-maltopyranoside (DDM), the higher resolution of the CY5 data enabled significant improvements to the structural model in several regions, including corrections to the sequence registry, and identification of ADP in the nucleotide binding site. CY5 crystals soaked with selenomethionine established details of the methionine binding site in the C2 regulatory domain of the ABC subunit, including the displacement of the side chain of MetN residue methionine 301 by the exogenous ligand. When compared to the CY5 or DDM structures, the DM structure exhibits a significant repositioning of the dimeric C2 domains, including an unexpected register shift in the intermolecular β-sheet hydrogen bonding between monomers, and a narrowing of the nucleotide binding space. The immediate proximity of the exogenous methionine binding site to the conformationally variable dimeric interface provides an indication of how methionine binding to the regulatory domains might mediate the phenomenon of transinhibition.

We describe a general computational method for designing proteins that self-assemble to a desired symmetric architecture. Protein building blocks are docked together symmetrically to identify complementary packing arrangements, and low-energy protein-protein interfaces are then designed between the building blocks in order to drive self-assembly. We used trimeric protein building blocks to design a 24-subunit, 13-nm diameter complex with octahedral symmetry and a 12-subunit, 11-nm diameter complex with tetrahedral symmetry. The designed proteins assembled to the desired oligomeric states in solution, and the crystal structures of the complexes revealed that the resulting materials closely match the design models. The method can be used to design a wide variety of self-assembling protein nanomaterials.
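The subunit counts quoted above follow from the orders of the rotational point groups. The sketch below is only a counting aid, not the design method itself; it assumes one protein subunit per asymmetric unit and building blocks whose own symmetry axis coincides with an axis of the cage. The function name `cage_composition` is hypothetical.

```python
# Orders of the chiral (rotation-only) polyhedral point groups.
ROTATION_GROUP_ORDER = {
    "tetrahedral": 12,
    "octahedral": 24,
    "icosahedral": 60,
}

def cage_composition(symmetry, oligomer_size):
    """Count subunits and oligomeric building blocks in a symmetric cage.

    Assumes one subunit per asymmetric unit, so the subunit count equals
    the order of the rotation group.
    """
    order = ROTATION_GROUP_ORDER[symmetry]
    if order % oligomer_size != 0:
        raise ValueError("building block size must divide the group order")
    return {"subunits": order, "building_blocks": order // oligomer_size}

print(cage_composition("octahedral", 3))
print(cage_composition("tetrahedral", 3))
```

Under these assumptions, trimeric building blocks give 8 trimers (24 subunits) for the octahedral complex and 4 trimers (12 subunits) for the tetrahedral one.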

Designing protein molecules that will assemble into various kinds of ordered materials represents an important challenge in nanotechnology. We report the crystal structure of a 12-subunit protein cage that self-assembles by design to form a tetrahedral structure roughly 16 nanometers in diameter. The strategy of fusing together oligomeric protein domains can be generalized to produce other kinds of cages or extended materials.

2011

Bacterial microcompartments are large supramolecular assemblies, resembling viruses in size and shape, found inside many bacterial cells. A protein-based shell encapsulates a series of sequentially acting enzymes in order to sequester certain sensitive metabolic processes within the cell. Crystal structures of the individual shell proteins have revealed details about how they self-assemble and how pores through their centers facilitate molecular transport into and out of the microcompartments. Biochemical and genetic studies have shown that enzymes are directed to the interior in some cases by special targeting sequences in their termini. Together, these findings open up prospects for engineering bacterial microcompartments with novel functionalities for applications ranging from metabolic engineering to targeted drug delivery.

Disulfide bonds are generally not used to stabilize proteins in the cytosolic compartments of bacteria or eukaryotic cells, owing to the chemically reducing nature of those environments. In contrast, certain thermophilic archaea use disulfide bonding as a major mechanism for protein stabilization. Here, we provide a current survey of completely sequenced genomes, applying computational methods to estimate the use of disulfide bonding across the Archaea. Microbes belonging to the Crenarchaeal branch, which are essentially all hyperthermophilic, are universally rich in disulfide bonding while lesser degrees of disulfide bonding are found among the thermophilic Euryarchaea, excluding those that are methanogenic. The results help clarify which parts of the archaeal lineage are likely to yield more examples and additional specific data on protein disulfide bonding, as increasing genomic sequencing efforts are brought to bear.

Protein crystallization continues to be a major bottleneck in X-ray crystallography. Previous studies suggest that symmetric proteins, such as homodimers, might crystallize more readily than monomeric proteins or asymmetric complexes. Proteins that are naturally monomeric can be made homodimeric artificially. Our approach is to create homodimeric proteins by introducing single cysteines into the protein of interest, which are then oxidized to form a disulfide bond between the two monomers. By introducing the single cysteine at different sequence positions, one can produce a variety of synthetically dimerized versions of a protein, with each construct expected to exhibit its own crystallization behavior. In earlier work, we demonstrated the potential utility of the approach using T4 lysozyme as a model system. Here we report the successful application of the method to Thermotoga maritima CelA, a thermophilic endoglucanase enzyme with low sequence identity to proteins with structures previously reported in the Protein Data Bank. This protein had resisted crystallization in its natural monomeric form, despite a broad survey of crystallization conditions. The synthetic dimerization of the CelA mutant D188C yielded well-diffracting crystals with molecules in a packing arrangement that would not have occurred with native, monomeric CelA. A 2.4 Å crystal structure was determined by single anomalous dispersion using a seleno-methionine derivatized protein. The results support the notion that synthetic symmetrization can be a useful approach for enlarging the search space for crystallizing monomeric proteins or asymmetric complexes.

Details are emerging on the structure and function of a remarkable class of capsid-like protein assemblies that serve as simple metabolic organelles in many bacteria. These bacterial microcompartments consist of a few thousand shell proteins, which encapsulate two or more sequentially acting enzymes in order to enhance or sequester certain metabolic pathways, particularly those involving toxic or volatile intermediates. Genomic data indicate that bacterial microcompartment shell proteins are present in a wide range of bacterial species, where they encapsulate varied reactions. Crystal structures of numerous shell proteins from distinct types of microcompartments have provided keys for understanding how the shells are assembled and how they conduct molecular transport into and out of microcompartments. The structural data emphasize a high level of mechanistic sophistication in the protein shell, and point the way for further studies on this fascinating but poorly appreciated class of subcellular structures.

The polypeptide backbones of a few proteins are tied in a knot. The biophysical effects and potential biological roles of knots are not well understood. Here, we test the consequences of protein knotting by taking a monomeric protein, carbonic anhydrase II, whose native structure contains a shallow knot, and polymerizing it end-to-end to form a deeply and multiply knotted polymeric filament. Thermal stability experiments show that the polymer is stabilized against loss of structure and aggregation by the presence of deep knots.

Combining the concepts of synthetic symmetrization with the approach of engineering metal-binding sites, we have developed a new crystallization methodology termed metal-mediated synthetic symmetrization. In this method, pairs of histidine or cysteine mutations are introduced on the surface of target proteins, generating crystal lattice contacts or oligomeric assemblies upon coordination with metal. Metal-mediated synthetic symmetrization greatly expands the packing and oligomeric assembly possibilities of target proteins, thereby increasing the chances of growing diffraction-quality crystals. To demonstrate this method, we designed various T4 lysozyme (T4L) and maltose-binding protein (MBP) mutants and cocrystallized them with one of three metal ions: copper (Cu²⁺), nickel (Ni²⁺), or zinc (Zn²⁺). The approach resulted in 16 new crystal structures--eight for T4L and eight for MBP--displaying a variety of oligomeric assemblies and packing modes, representing in total 13 new and distinct crystal forms for these proteins. We discuss the potential utility of the method for crystallizing target proteins of unknown structure by engineering in pairs of histidine or cysteine residues. As an alternate strategy, we propose that the varied crystallization-prone forms of T4L or MBP engineered in this work could be used as crystallization chaperones, by fusing them genetically to target proteins of interest.

SsfX3 is a GDSL family acyltransferase that transfers salicylate to the C-4 hydroxyl of a tetracycline intermediate in the penultimate step during biosynthesis of the anticancer natural product SF2575. The C-4 salicylate takes the place of the more common C-4 dimethylamine functionality, making SsfX3 the first acyltransferase identified to act on a tetracycline substrate. The crystal structure of SsfX3 was determined at 2.5 Å, revealing two distinct domains as follows: an N-terminal β-sandwich domain that resembles a carbohydrate-binding module, and a C-terminal catalytic domain that contains the atypical α/β-hydrolase fold found in the GDSL hydrolase family of enzymes. The active site lies at one end of a large open binding pocket, which is spatially defined by structural elements from both the N- and C-terminal domains. Mutational analysis in the putative substrate binding pocket identified residues from both domains that are important for binding the acyl donor and acceptor. Furthermore, removal of the N-terminal carbohydrate-binding module-like domain rendered the stand-alone α/β-hydrolase domain inactive. The additional noncatalytic module is therefore proposed to be required to define the binding pocket and provide sufficient interactions with the spatially extended tetracyclic substrate. SsfX3 was also demonstrated to accept a variety of non-native acyl groups. This relaxed substrate specificity toward the acyl donor allowed the chemoenzymatic biosynthesis of C-4-modified analogs of the immediate precursor to the bioactive SF2575; these were used to assay the structure activity relationships at the C-4 position.

2010

Some bacteria contain organelles or microcompartments consisting of a large virion-like protein shell encapsulating sequentially acting enzymes. These organized microcompartments serve to enhance or protect key metabolic pathways inside the cell. The various bacterial microcompartments provide diverse metabolic functions, ranging from CO₂ fixation to the degradation of small organic molecules. Yet they share an evolutionarily related shell, which is defined by a conserved protein domain that is widely distributed across the bacterial kingdom. Structural studies on a number of these bacterial microcompartment shell proteins are illuminating the architecture of the shell and highlighting its critical role in controlling molecular transport into and out of microcompartments. Current structural, evolutionary, and mechanistic ideas are discussed, along with genomic studies for exploring the function and diversity of this family of bacterial organelles.

Many bacterial cells contain proteinaceous microcompartments that act as simple organelles by sequestering specific metabolic processes involving volatile or toxic metabolites. Here we report the three-dimensional (3D) crystal structures, with resolutions between 1.65 and 2.5 angstroms, of the four homologous proteins (EutS, EutL, EutK, and EutM) that are thought to be the major shell constituents of a functionally complex ethanolamine utilization (Eut) microcompartment. The Eut microcompartment is used to sequester the metabolism of ethanolamine in bacteria such as Escherichia coli and Salmonella enterica. The four Eut shell proteins share an overall similar 3D fold, but they have distinguishing structural features that help explain the specific roles they play in the microcompartment. For example, EutL undergoes a conformational change that is probably involved in gating molecular transport through shell protein pores, whereas structural evidence suggests that EutK might bind a nucleic acid component. Together these structures give mechanistic insight into bacterial microcompartments.

Hundreds of bacterial species produce proteinaceous microcompartments (MCPs) that act as simple organelles by confining the enzymes of metabolic pathways that have toxic or volatile intermediates. A fundamental unanswered question about bacterial MCPs is how enzymes are packaged within the protein shell that forms their outer surface. Here, we report that a short N-terminal peptide is necessary and sufficient for packaging enzymes into the lumen of an MCP involved in B₁₂-dependent 1,2-propanediol utilization (Pdu MCP). Deletion of 10 or 14 amino acids from the N terminus of the propionaldehyde dehydrogenase (PduP) enzyme, which is normally found within the Pdu MCP, substantially impaired packaging, with minimal effects on its enzymatic activity. Fusion of the 18 N-terminal amino acids from PduP to GFP, GST, or maltose-binding protein resulted in their encapsulation within MCPs. Bioinformatic analyses revealed N-terminal extensions in two additional Pdu proteins and three proteins from two unrelated MCPs, suggesting that N-terminal peptides may be used to package proteins into diverse MCPs. The potential uses of MCP assembly principles in nature and in biotechnology are discussed.

The crystal structure of a putative NTPase, YP_001813558.1 from Exiguobacterium sibiricum 255-15 (PF09934, DUF2166) was determined to 1.78 Å resolution. YP_001813558.1 and its homologs (dimeric dUTPases, MazG proteins and HisE-encoded phosphoribosyl ATP pyrophosphohydrolases) form a superfamily of all-α-helical NTP pyrophosphatases. In dimeric dUTPase-like proteins, a central four-helix bundle forms the active site. However, in YP_001813558.1, an unexpected intertwined swapping of two of the helices that compose the conserved helix bundle results in a 'linked dimer' that has not previously been observed for this family. Interestingly, despite this novel mode of dimerization, the metal-binding site for divalent cations, such as magnesium, that are essential for NTPase activity is still conserved. Furthermore, the active-site residues that are involved in sugar binding of the NTPs are also conserved when compared with other α-helical NTPases, but those that recognize the nucleotide bases are not conserved, suggesting a different substrate specificity.

The 50-residue snake venom protein L-omwaprin and its enantiomer D-omwaprin were prepared by total chemical synthesis. Radial diffusion assays were performed against Bacillus megaterium and Bacillus anthracis; both L- and D-omwaprin showed antibacterial activity against B. megaterium. The native protein enantiomer, made of L-amino acids, failed to crystallize readily. However, when a racemic mixture containing equal amounts of L- and D-omwaprin was used, diffraction quality crystals were obtained. The racemic protein sample crystallized in the centrosymmetric space group P2₁/c and its structure was determined at atomic resolution (1.33 Å) by a combination of Patterson and direct methods based on the strong scattering from the sulfur atoms in the eight cysteine residues per protein. Racemic crystallography once again proved to be a valuable method for obtaining crystals of recalcitrant proteins and for determining high-resolution X-ray structures by direct methods.

A very small number of natural proteins have folded configurations in which the polypeptide backbone is knotted. Relatively little is known about the folding energy landscapes of such proteins, or how they have evolved. We explore those questions here by designing a unique knotted protein structure. Biophysical characterization and X-ray crystal structure determination show that the designed protein folds to the intended configuration, tying itself in a knot in the process, and that it folds reversibly. The protein folds to its native, knotted configuration approximately 20 times more slowly than a control protein, which was designed to have a similar tertiary structure but to be unknotted. Preliminary kinetic experiments suggest a complicated folding mechanism, providing opportunities for further characterization. The findings illustrate a situation where a protein is able to successfully traverse a complex folding energy landscape, though the amino acid sequence of the protein has not been subjected to evolutionary pressure for that ability. The success of the design strategy--connecting two monomers of an intertwined homodimer into a single protein chain--supports a model for evolution of knotted structures via gene duplication.

Here we report the total synthesis of kaliotoxin by 'one pot' native chemical ligation of three synthetic peptides. A racemic mixture of D- and L-kaliotoxin synthetic protein molecules gave crystals in the centrosymmetric space group P1̄ that diffracted to atomic resolution (0.95 Å), enabling the X-ray structure of kaliotoxin to be determined by direct methods.

Bacterial microcompartments are a functionally diverse group of proteinaceous organelles that confine specific reaction pathways in the cell within a thin protein-based shell. The propanediol utilizing (Pdu) microcompartment contains the reactions for metabolizing 1,2-propanediol in certain enteric bacteria, including Salmonella. The Pdu shell is assembled from a few thousand protein subunits of several different types. Here we report the crystal structures of two key shell proteins, PduA and PduT. The crystal structures offer insights into the mechanisms of Pdu microcompartment assembly and molecular transport across the shell. PduA forms a symmetric homohexamer whose central pore appears tailored for facilitating transport of the 1,2-propanediol substrate. PduT is a novel, tandem domain shell protein that assembles as a pseudohexameric homotrimer. Its structure reveals an unexpected site for binding an [Fe-S] cluster at the center of the PduT pore. The location of a metal redox cofactor in the pore of a shell protein suggests a novel mechanism for either transferring redox equivalents across the shell or for regenerating luminal [Fe-S] clusters.

Many of the functional units in cells are multi-protein complexes such as RNA polymerase, the ribosome, and the proteasome. For such units to work together, one might expect a high level of regulation to enable co-appearance or repression of sets of complexes at the required time. However, this type of coordinated regulation between whole complexes is difficult to detect by existing methods for analyzing mRNA co-expression. We propose a new methodology that is able to detect such higher order relationships.

Carboxysomes are primitive bacterial organelles that function as a part of a carbon concentrating mechanism (CCM) under conditions where inorganic carbon is limiting. The carboxysome enhances the efficiency of cellular carbon fixation by encapsulating together carbonic anhydrase and the CO₂-fixing enzyme ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO). The carboxysome has a roughly icosahedral shape with an outer shell between 800 and 1500 Å in diameter, which is constructed from a few thousand small protein subunits. In the cyanobacterium Synechocystis sp. PCC 6803, the previous structure determination of two homologous shell protein subunits, CcmK2 and CcmK4, elucidated how the outer shell is formed by the tight packing of CcmK hexamers into a molecular layer. Here we describe the crystal structure of the hexameric shell protein CcmK1, along with structures of mutants of both CcmK1 and CcmK2 lacking their sometimes flexible C-terminal tails. Variations in the way hexamers pack into layers are noted, while sulfate ions bound in pores through the layer provide further support for the hypothesis that the pores serve for transport of substrates and products into and out of the carboxysome. One of the new structures provides a high-resolution (1.3 Å) framework for subsequent computational studies of molecular transport through the pores. Crystal and solution studies of the C-terminal deletion mutants demonstrate the tendency of the terminal segments to participate in protein-protein interactions, thereby providing a clue as to which side of the molecular layer of hexameric shell proteins is likely to face toward the carboxysome interior.

Bacterial microcompartments are supramolecular protein assemblies that function as bacterial organelles by compartmentalizing particular enzymes and metabolic intermediates. The outer shells of these microcompartments are assembled from multiple paralogous structural proteins. Because the paralogs are required to assemble together, their genes are often transcribed together from the same operon, giving rise to a distinctive genomic pattern: multiple, typically small, paralogous proteins encoded in close proximity on the bacterial chromosome. To investigate the generality of this pattern in supramolecular assemblies, we employed a comparative genomics approach to search for protein families that show the same kind of genomic pattern as that exhibited by bacterial microcompartments. The results indicate that a variety of large supramolecular assemblies fit the pattern, including bacterial gas vesicles, bacterial pili, and small heat-shock protein complexes. The search also retrieved several widely distributed protein families of presently unknown function. The proteins from one of these families were characterized experimentally and found to show a behavior indicative of supramolecular assembly. We conclude that cotranscribed paralogs are a common feature of diverse supramolecular assemblies, and a useful genomic signature for discovering new kinds of large protein assemblies from genomic data.

Simvastatin is the active pharmaceutical ingredient of the blockbuster cholesterol lowering drug Zocor. We have previously developed an Escherichia coli based whole-cell biocatalytic platform towards the synthesis of simvastatin sodium salt (SS) starting from the precursor monacolin J sodium salt (MJSS). The centerpiece of the biocatalytic approach is the simvastatin synthase LovD, which is highly prone to misfolding and aggregation when overexpressed from E. coli. Increasing the solubility of LovD without decreasing its catalytic activity can therefore elevate the performance of the whole-cell biocatalyst. Using a combination of homology structural prediction and site-directed mutagenesis, we identified two cysteine residues in LovD that are responsible for nonspecific intermolecular crosslinking, which leads to oligomer formation and protein aggregation. Replacement of Cys40 and Cys60 with alanine residues resulted in marked gain in both protein solubility and whole-cell biocatalytic activities. Further mutagenesis experiments converting these two residues to small or polar natural amino acids showed that C40A and C60N are the most beneficial, affording 27% and 26% increase in whole cell activities, respectively. The double mutant C40A/C60N combines the individual improvements and displayed approximately 50% increase in protein solubility and whole-cell activity. Optimized fed-batch high-cell-density fermentation of the double mutant in an E. coli strain engineered for simvastatin production quantitatively (>99%) converted 45 mM MJSS to SS within 18 h, which represents a significant improvement over the performance of wild-type LovD under identical conditions. The high efficiency of the improved whole-cell platform renders the biocatalytic synthesis of SS an attractive substitute over the existing semisynthetic routes.

Human thymocyte nuclear protein 1 contains a unique DUF55 domain consisting of 167 residues (55-221), but its cellular function remains unclear. Crystals of DUF55 belonged to the trigonal space group P3₁, but twinning caused the data to approach apparent 622 symmetry. Two data sets were collected to 2.3 Å resolution. Statistical analysis confirmed that both data sets were partially twinned by tetartohedry. Tetartohedral twin fractions were estimated. After the structure had been determined, only one twofold axis of rotational pseudosymmetry was found in the crystal structure. Using the DALI program, a YTH domain, which is a potential RNA-binding domain from human YTH-domain-containing protein 2, was identified as having the most similar three-dimensional fold to that of DUF55. It is thus implied that DUF55 might be a potential RNA-related domain.

Lattice-translocation or crystal order-disorder phenomena occur when some layers or groups of molecules in a crystal are randomly displaced relative to other groups of molecules by a discrete set of vectors. In previous work, the effects of lattice translocation on diffraction intensities have been corrected by considering that the observed intensities are the product of the intensities from an ideal crystal (lacking disorder) multiplied by the squared magnitude of the Fourier transform of the set of translocation vectors. Here, the structure determination is presented of carboxysome protein CsoS1C from Halothiobacillus neapolitanus in a crystal exhibiting a lattice translocation with unique features. The diffraction data are fully accounted for by a crystal unit cell composed of two layers of cyclic protein hexamers. The first layer is fully ordered (i.e. has one fixed position), while the second layer randomly takes one of three alternative positions whose displacements are related to each other by threefold symmetry. Remarkably, the highest symmetry present in the crystal is P3, yet the intensity data (and the Patterson map) obey 6/m instead of 3̄ symmetry; the intensities exceed the symmetry expected from combining the crystal space group with an inversion center. The origin of this rare phenomenon, known as symmetry enhancement, is discussed and shown to be possible even for a perfectly ordered crystal. The lattice-translocation treatment described here may be useful in analyzing other cases of disorder in which layers or groups of molecules are shifted in multiple symmetry-related directions.
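The simple product form attributed above to previous work can be sketched numerically. This is an illustration only: it assumes the whole structure is displaced by each translocation vector with a given occupancy (the `weights` argument, a hypothetical parameter), and it does not reproduce the layer-specific treatment needed for the CsoS1C crystal, where only one of two layers is disordered.

```python
import cmath
import math

def translocation_factor(h, vectors, weights):
    """Squared magnitude of the weighted Fourier transform of the
    translocation vectors, evaluated at reflection h = (h, k, l).

    vectors are fractional displacements; weights are assumed occupancies
    that sum to 1.
    """
    total = 0j
    for w, t in zip(weights, vectors):
        dot = sum(hi * ti for hi, ti in zip(h, t))
        total += w * cmath.exp(2j * math.pi * dot)
    return abs(total) ** 2

def corrected_intensity(i_obs, h, vectors, weights, floor=1e-6):
    """Divide an observed intensity by the modulation factor.

    Returns None when the factor is too close to zero for the
    correction to be meaningful.
    """
    factor = translocation_factor(h, vectors, weights)
    if factor < floor:
        return None
    return i_obs / factor

# Toy example: half of the crystal shifted by a/2 (values are made up).
vectors = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
weights = [0.5, 0.5]
for h in [(1, 0, 0), (2, 0, 0)]:
    print(h, translocation_factor(h, vectors, weights))
```

In this toy case reflections with odd h are extinguished and those with even h are unaffected, which shows how a discrete set of displacements modulates intensities across reciprocal space.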

Enzymes from natural product biosynthetic pathways are attractive candidates for creating tailored biocatalysts to produce semisynthetic pharmaceutical compounds. LovD is an acyltransferase that converts the inactive monacolin J acid (MJA) into the cholesterol-lowering lovastatin. LovD can also synthesize the blockbuster drug simvastatin using MJA and a synthetic alpha-dimethylbutyryl thioester, albeit with suboptimal properties as a biocatalyst. Here we used directed evolution to improve the properties of LovD toward semisynthesis of simvastatin. Mutants with improved catalytic efficiency, solubility, and thermal stability were obtained, with the best mutant displaying an approximately 11-fold increase in an Escherichia coli-based biocatalytic platform. To understand the structural basis of LovD enzymology, seven X-ray crystal structures were determined, including the parent LovD, an improved mutant G5, and G5 cocrystallized with ligands. Comparisons between the structures reveal that beneficial mutations stabilize the structure of G5 in a more compact conformation that is favorable for catalysis.

Bacterial microcompartments (BMCs) are large intracellular bodies that serve as simple organelles in many bacteria. They are proteinaceous structures composed of key enzymes encapsulated by a polyhedral protein shell. In previous studies, the organization of these large shells has been inferred from the conserved packing of the component shell proteins in two-dimensional (2D) layers within the context of three-dimensional (3D) crystals. Here, we show that well-ordered, 2D crystals of carboxysome shell proteins assemble spontaneously when His-tagged proteins bind to a monolayer of nickelated lipid molecules at an air-water interface. The molecular packing within the 2D crystals recapitulates the layered hexagonal sheets observed in 3D crystals. The results reinforce current models for the molecular design of BMC shells.

2008

The carboxysome is a bacterial microcompartment that functions as a simple organelle by sequestering enzymes involved in carbon fixation. The carboxysome shell is roughly 800 to 1400 angstroms in diameter and is assembled from several thousand protein subunits. Previous studies have revealed the three-dimensional structures of hexameric carboxysome shell proteins, which self-assemble into molecular layers that most likely constitute the facets of the polyhedral shell. Here, we report the three-dimensional structures of two proteins of previously unknown function, CcmL and OrfA (or CsoS4A), from the two known classes of carboxysomes, at resolutions of 2.4 and 2.15 angstroms. Both proteins assemble to form pentameric structures whose size and shape are compatible with formation of vertices in an icosahedral shell. Combining these pentamers with the hexamers previously elucidated gives two plausible, preliminary atomic models for the carboxysome shell.

The structure of actin in its monomeric form is known at high resolution, while the structure of filamentous F-actin is only understood at considerably lower resolution. Knowing precisely how the monomers of actin fit together would lead to a deeper understanding of the dynamic behavior of the actin filament. Here, a series of crystal structures of actin dimers are reported which were prepared by cross-linking in either the longitudinal or the lateral direction in the filament state. Laterally cross-linked dimers, comprised of monomers belonging to different protofilaments, are found to adopt configurations in crystals that are not related to the native structure of filamentous actin. In contrast, multiple structures of longitudinal dimers consistently reveal the same interface between monomers within a single protofilament. The reappearance of the same longitudinal interface in multiple crystal structures adds weight to arguments that the interface visualized is similar to that in actin filaments. Highly conserved atomic interactions involving residues 199-205 and 287-291 are highlighted.

Cystathionine beta-synthase domains are found in a myriad of proteins from organisms across the tree of life and have been hypothesized to function as regulatory modules that sense the energy charge of cells. Here we characterize the structure and stability of PAE2072, a dimeric tandem cystathionine beta-synthase domain protein from the hyperthermophilic crenarchaeon Pyrobaculum aerophilum. Crystal structures of the protein in unliganded and AMP-bound forms, determined at resolutions of 2.10 and 2.35 A, respectively, reveal remarkable conservation of key functional features seen in the gamma subunit of the eukaryotic AMP-activated protein kinase. The structures also confirm the presence of a suspected intermolecular disulfide bond between the two subunits that is shown to stabilize the protein. Our AMP-bound structure represents a first step in investigating the function of a large class of uncharacterized prokaryotic proteins. In addition, this work extends previous studies that have suggested that, in certain thermophilic microbes, disulfide bonds play a key role in stabilizing intracellular proteins and protein-protein complexes.

The Pdu microcompartment is a proteinaceous, subcellular structure that serves as an organelle for the metabolism of 1,2-propanediol in Salmonella enterica. It encapsulates several related enzymes within a shell composed of a few thousand protein subunits. Recent structural studies on the carboxysome, a related microcompartment involved in CO(2) fixation, have concluded that the major shell proteins from that microcompartment form hexamers that pack into layers comprising the facets of the shell. Here we report the crystal structure of PduU, a protein from the Pdu microcompartment, representing the first structure of a shell protein from a noncarboxysome microcompartment. Though PduU is a hexamer like other characterized shell proteins, it has undergone a circular permutation leading to dramatic differences in the hexamer pore. In view of the hypothesis that microcompartment metabolites diffuse across the outer shell through these pores, the unique structure of PduU suggests the possibility of a special functional role.

Many bacteria contain intracellular microcompartments with outer shells that are composed of thousands of protein subunits and interiors that are filled with functionally related enzymes. These microcompartments serve as organelles by sequestering specific metabolic pathways in bacterial cells. The carboxysome, a prototypical bacterial microcompartment that is found in cyanobacteria and some chemoautotrophs, encapsulates ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) and carbonic anhydrase, and thereby enhances carbon fixation by elevating the levels of CO2 in the vicinity of RuBisCO. Evolutionarily related, but functionally distinct, microcompartments are present in diverse bacteria. Although bacterial microcompartments were first observed more than 40 years ago, a detailed understanding of how they function is only now beginning to emerge.

In most cases of merohedral twinning, two different twin-domain orientations are present. A rarer type of merohedral twinning exists in which there are four different twin-domain orientations. The former case is referred to as hemihedral twinning, while the latter more complex type is referred to as tetartohedral twinning. In tetartohedral twinning, each observed reflection is the weighted sum of four twin-related but otherwise independent reflection intensities. The weights that determine how the true crystallographic intensities combine to give the observed intensities are described by four twin fractions representing the fractional volumes of the four different domain orientations within the specimen. Here, equations are developed to determine values for the four tetartohedral twin fractions based on a statistical comparison of quadruplets of twin-related reflections. Knowledge of the twin fractions is important in working backwards to obtain values for the true crystallographic intensities. Use of the equations is demonstrated with synthetic intensity data simulated according to given values of the twin fractions.
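The weighted-sum relationship described above, and the step of working backwards from observed to true intensities, can be sketched in Python as follows. This is a minimal illustration and not the statistical method developed in the paper for estimating the twin fractions. It assumes the four twin fractions are already known and that the four twin operations pair up the reflections in the pattern encoded by `twin_matrix` (an assumption made here for illustration). All function names are hypothetical.

```python
def twin_matrix(fractions):
    """4x4 mixing matrix for a quadruplet of twin-related reflections.

    Assumes the twin operations permute the four reflections in the
    pairwise-swap pattern below (an illustrative assumption).
    """
    a1, a2, a3, a4 = fractions
    return [
        [a1, a2, a3, a4],
        [a2, a1, a4, a3],
        [a3, a4, a1, a2],
        [a4, a3, a2, a1],
    ]

def apply_twinning(true_intensities, fractions):
    """Observed intensities as weighted sums of the four true intensities."""
    m = twin_matrix(fractions)
    return [sum(m[r][c] * true_intensities[c] for c in range(4)) for r in range(4)]

def solve_linear(m, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("twin matrix is singular; intensities cannot be detwinned")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def detwin(observed, fractions):
    """Recover the true intensities from observed ones, given the twin fractions."""
    return solve_linear(twin_matrix(fractions), observed)

# Synthetic example: simulate twinned data, then recover the true values.
true_i = [100.0, 40.0, 10.0, 250.0]
alphas = [0.5, 0.2, 0.2, 0.1]
observed = apply_twinning(true_i, alphas)
print(observed)
print(detwin(observed, alphas))
```

When all four fractions equal 0.25 (perfect twinning), the matrix is singular and `detwin` raises an error, reflecting the fact that the true intensities cannot be recovered by algebra alone in that limit.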

The Gram-negative bacterium Vibrio cholerae is the causative agent of a severe diarrheal disease that afflicts three to five million persons annually, causing up to 200,000 deaths. Nearly all V. cholerae strains produce a large multifunctional-autoprocessing RTX toxin (MARTX(Vc)), which contributes significantly to the pathogenesis of cholera in model systems. The actin cross-linking domain (ACD) of MARTX(Vc) directly catalyzes a covalent cross-linking of monomeric G-actin into oligomeric chains and causes cell rounding, but the nature of the cross-linked bond and the mechanism of the actin cytoskeleton disruption remained elusive. To elucidate the mechanism of ACD action and effect on actin, we identified the covalent cross-link bond between actin protomers using limited proteolysis, X-ray crystallography, and mass spectrometry. We report here that ACD catalyzes the formation of an intermolecular iso-peptide bond between residues E270 and K50 located in the hydrophobic and the DNaseI-binding loops of actin, respectively. Mutagenesis studies confirm that no other residues on actin can be cross-linked by ACD both in vitro and in vivo. This cross-linking locks actin protomers into an orientation different from that of F-actin, resulting in strong inhibition of actin polymerization. This report describes a microbial toxin mechanism acting via iso-peptide bond cross-linking between host proteins and is, to the best of our knowledge, the only known example of a peptide linkage between nonterminal glutamate and lysine side chains.

Many bacteria conditionally express proteinaceous organelles referred to here as microcompartments (Fig. 1). These microcompartments are thought to be involved in at least seven different metabolic processes and the number is growing. Microcompartments are very large and structurally sophisticated. They are usually about 100-150 nm in cross section and consist of 10,000-20,000 polypeptides of 10-20 types. Their unifying feature is a solid shell constructed from proteins having bacterial microcompartment (BMC) domains. In the examples that have been studied, the microcompartment shell encases sequentially acting metabolic enzymes that catalyze a reaction sequence having a toxic or volatile intermediate product. It is thought that the shell of the microcompartment confines such intermediates, thereby enhancing metabolic efficiency and/or protecting cytoplasmic components. Mechanistically, however, this creates a paradox. How do microcompartments allow enzyme substrates, products and cofactors to pass while confining metabolic intermediates in the absence of a selectively permeable membrane? We suggest that the answer to this paradox may have broad implications with respect to our understanding of the fundamental properties of biological protein sheets including microcompartment shells, S-layers and viral capsids.

We report an effective method to fabricate two-dimensional (2D) periodic oxide nanopatterns using S-layer proteins as a template. Specifically, S-layer proteins with a unit cell dimension of 20 nm were reassembled on silicon substrate to form 2D arrays with ordered pores of nearly identical sizes (9 nm). Octadecyltrichlorosilane (ODTS) was utilized to selectively react with the S-layer proteins, but not the Si surface exposed through the pores defined by the proteins. Because of the different surface functional groups on the ODTS-modified S-layer proteins and Si surface, area-selective atomic layer deposition of metal oxide-based high-k materials, such as hafnium oxide, in the pores was achieved. The periodic metal oxide nanopatterns were generated on Si substrate after selective removal of the ODTS-modified S-layer proteins. These nanopatterns of high-k materials are expected to facilitate further downscaling of logic and memory nanoelectronic devices.

2007

The question of whether novel, structurally different protein folds might have arisen from existing ones is crucial to understanding protein evolution. Recent work on cysteine-rich domains in Hydra proteins illuminates how evolutionary transitions between dramatically different structures might occur.

A growing number of organisms have been discovered inhabiting extreme environments, including temperatures in excess of 100 degrees C. How cellular proteins from such organisms retain their native folds under extreme conditions is still not fully understood. Recent computational and structural studies have identified disulfide bonding as an important mechanism for stabilizing intracellular proteins in certain thermophilic microbes. Here, we present the first proteomic analysis of intracellular disulfide bonding in the hyperthermophilic archaeon Pyrobaculum aerophilum. Our study reveals that the utilization of disulfide bonds extends beyond individual proteins to include many protein-protein complexes. We report the 1.6 A crystal structure of one such complex, a citrate synthase homodimer. The structure contains two intramolecular disulfide bonds, one per subunit, which result in the cyclization of each protein chain in such a way that the two chains are topologically interlinked, rendering them inseparable. This unusual feature emphasizes the variety and sophistication of the molecular mechanisms that can be achieved by evolution.

The carboxysome is a bacterial organelle that functions to enhance the efficiency of CO2 fixation by encapsulating the enzymes ribulose bisphosphate carboxylase/oxygenase (RuBisCO) and carbonic anhydrase. The outer shell of the carboxysome is reminiscent of a viral capsid, being constructed from many copies of a few small proteins. Here we describe the structure of the shell protein CsoS1A from the chemoautotrophic bacterium Halothiobacillus neapolitanus. The CsoS1A protein forms hexameric units that pack tightly together to form a molecular layer, which is perforated by narrow pores. Sulfate ions, soaked into crystals of CsoS1A, are observed in the pores of the molecular layer, supporting the idea that the pores could be the conduit for negatively charged metabolites such as bicarbonate, which must cross the shell. The problem of diffusion across a semiporous protein shell is discussed, with the conclusion that the shell is sufficiently porous to allow adequate transport of small molecules. The molecular layer formed by CsoS1A is similar to the recently observed layers formed by cyanobacterial carboxysome shell proteins. This similarity supports the argument that the layers observed represent the natural structure of the facets of the carboxysome shell. Insights into carboxysome function are provided by comparisons of the carboxysome shell to viral capsids, and a comparison of its pores to the pores of transmembrane protein channels.

Many proteins self-assemble to form large supramolecular complexes. Numerous examples of these structures have been characterized, ranging from spherical viruses to tubular protein assemblies. Some new kinds of supramolecular structures are just coming to light, while it is likely there are others that have not yet been discovered. The carboxysome is a subcellular structure that has been known for more than 40 years, but whose structural and functional details are just now emerging. This giant polyhedral body is constructed as a closed shell assembled from several thousand protein subunits. Within this protein shell, the carboxysome encapsulates the CO(2)-fixing enzymes, Rubisco (ribulose-1,5-bisphosphate carboxylase/oxygenase) and carbonic anhydrase; this arrangement enhances the efficiency of cellular CO(2) fixation. The carboxysome is present in many photosynthetic and chemoautotrophic bacteria, and so plays an important role in the global carbon cycle. It also serves as the prototypical member of what appears to be a large class of primitive protein-based organelles in bacteria. A series of crystal structures is beginning to reveal the secrets of how the carboxysome is assembled and how it enhances the efficiency of CO(2) fixation. Some of the assembly principles revealed in the carboxysome are reminiscent of those seen in icosahedral viral capsids. In addition, the shell appears to be perforated by pores for metabolite transport into and out of the carboxysome, suggesting comparisons to the pores through oligomeric transmembrane proteins, which serve to transport small molecules across the membrane bilayers of cells and eukaryotic organelles.

In some cyanobacteria, the genes for the large and small subunits of the enzyme RuBisCO are separated on the bacterial chromosome by the insertion of a gene coding for a protein designated RbcX, which acts as a chaperone for RuBisCO. A recent structural study [Saschenbrecker et al. (2007), Cell, 129, 1189-1200] has shed light on the mechanism by which RbcX assists RuBisCO assembly. Here, the crystal structure of RbcX from another cyanobacterium, Synechocystis sp. PCC6803, is reported, revealing an unusually long protruding C-terminal helix, as well as a bound polyethylene glycol molecule in the protein substrate-binding site.

Among the thousands of known three-dimensional protein folds, only a few have been found whose backbones are in knotted configurations. The rarity of knotted proteins has important implications for how natural proteins reach their natively folded states. Proteins with such unusual features offer unique opportunities for studying the relationships between structure, folding, and stability. Here we report the identification of a unique slipknot feature in the fold of a well-known thermostable protein, alkaline phosphatase. A slipknot is created when a knot is formed by part of a protein chain, after which the backbone doubles back so that the entire structure becomes unknotted in a mathematical sense. Slipknots are therefore not detected by computational tests that look for knots in complete protein structures. A computational survey looking specifically for slipknots in the Protein Data Bank reveals a few other instances in addition to alkaline phosphatase. Unexpected similarities are noted among some of the proteins identified. In addition, two transmembrane proteins are found to contain slipknots. Finally, mutagenesis experiments on alkaline phosphatase are used to probe the contribution the slipknot feature makes to thermal stability. The trends and conserved features observed in these proteins provide new insights into mechanisms of protein folding and stability.

Among proteins of known three-dimensional structure, only a few possess complex topological features such as knotted or interlinked (catenated) protein backbones. Such unusual proteins offer potentially unique insights into folding pathways and stabilization mechanisms. They also present special challenges for both theorists and computational scientists interested in understanding and predicting protein-folding behavior. Here, we review complex topological features in proteins with a focus on recent progress on the identification and characterization of knotted and interlinked protein systems. Also, an approach is described for designing an expanded set of knotted proteins.

CsoSCA (formerly CsoS3) is a bacterial carbonic anhydrase localized in the shell of a cellular microcompartment called the carboxysome, where it converts HCO(3)(-) to CO(2) for use in carbon fixation by ribulose-bisphosphate carboxylase/oxygenase (RuBisCO). CsoSCA lacks significant sequence similarity to any of the four known classes of carbonic anhydrase (alpha, beta, gamma, or delta), and so it was initially classified as belonging to a new class, epsilon. The crystal structure of CsoSCA from Halothiobacillus neapolitanus reveals that it is actually a representative member of a new subclass of beta-carbonic anhydrases, distinguished by a lack of active site pairing. Whereas a typical beta-carbonic anhydrase maintains a pair of active sites organized within a two-fold symmetric homodimer or pair of fused, homologous domains, the two domains in CsoSCA have diverged to the point that only one domain in the pair retains a viable active site. We suggest that this defunct and somewhat diminished domain has evolved a new function, specific to its carboxysomal environment. Despite the level of sequence divergence that separates CsoSCA from the other two subclasses of beta-carbonic anhydrases, there is a remarkable level of structural similarity among active site regions, which suggests a common catalytic mechanism for the interconversion of HCO(3)(-) and CO(2). Crystal packing analysis suggests that CsoSCA exists within the carboxysome shell either as a homodimer or as extended filaments.

Numerous diseases are characterized by the formation of insoluble, amyloid protein fibrils. Intensive investigations are beginning to unravel the detailed molecular and structural principles that underlie the spontaneous formation of these fibrils. The amyloid protein transthyretin serves as an excellent system for dissecting the conformational changes and ensuing subunit-subunit associations that lead to amyloid. One working model for transthyretin amyloid involves the exposure of an "unprotected" edge beta strand, followed by symmetric assembly of subunits to give head-to-head and tail-to-tail protofibrils. The models and principles emerging from studies on transthyretin lead to connections to other amyloid systems.

In a natively folded protein of moderate or larger size, the protein backbone may weave through itself in complex ways, raising questions about what sequence of events might have to occur in order for the protein to reach its native configuration from the unfolded state. A mathematical framework is presented here for describing the notion of a topological folding barrier, which occurs when a protein chain must pass through a hole or opening, formed by other regions of the protein structure. Different folding pathways encounter different numbers of such barriers and therefore different degrees of frustration. A dynamic programming algorithm finds the optimal theoretical folding path and minimal degree of frustration for a protein based on its natively folded configuration. Calculations over a database of protein structures provide insights into questions such as whether the path of minimal frustration might tend to favor folding from one or from many sites of folding nucleation, or whether proteins favor folding around the N terminus, thereby providing support for the hypothesis that proteins fold co-translationally. The computational methods are applied to a multi-disulfide bonded protein, with computational findings that are consistent with the experimentally observed folding pathway. Attention is drawn to certain complex protein folds for which the computational method suggests there may be a preferred site of nucleation or where folding is likely to proceed through a relatively well-defined pathway or intermediate. The computational analyses lead to testable models for protein folding.

Previous studies of symmetry preferences in protein crystals suggest that symmetric proteins, such as homodimers, might crystallize more readily on average than asymmetric, monomeric proteins. Proteins that are naturally monomeric can be made homodimeric artificially by forming disulfide bonds between individual cysteine residues introduced by mutagenesis. Furthermore, by creating a variety of single-cysteine mutants, a series of distinct synthetic dimers can be generated for a given protein of interest, with each expected to gain advantage from its added symmetry and to exhibit a crystallization behavior distinct from the other constructs. This strategy was tested on phage T4 lysozyme, a protein whose crystallization as a monomer has been studied exhaustively. Experiments on three single-cysteine mutants, each prepared in dimeric form, yielded numerous novel crystal forms that cannot be realized by monomeric lysozyme. Six new crystal forms have been characterized. The results suggest that synthetic symmetrization may be a useful approach for enlarging the search space for crystallizing proteins.

2005

In several natural settings, the standard genetic code is expanded to incorporate two additional amino acids with distinct functionality, selenocysteine and pyrrolysine. These rare amino acids can be overlooked inadvertently, however, as they arise by recoding at certain stop codons. We report a method for such recoding prediction from genomic data, using read-through similarity evaluation. A survey across a set of microbial genomes identifies almost all the known cases as well as a number of novel candidate proteins.

Bacterial microcompartments are primitive organelles composed entirely of protein subunits. Genomic sequence databases reveal the widespread occurrence of microcompartments across diverse microbes. The prototypical bacterial microcompartment is the carboxysome, a protein shell for sequestering carbon fixation reactions. We report three-dimensional crystal structures of multiple carboxysome shell proteins, revealing a hexameric unit as the basic microcompartment building block and showing how these hexamers assemble to form flat facets of the polyhedral shell. The structures suggest how molecular transport across the shell may be controlled and how structural variations might govern the assembly and architecture of these subcellular compartments.

The 2.5-A resolution crystal structure is reported for an actin dimer, composed of two protomers cross-linked along the longitudinal (or vertical) direction of the F-actin filament. The crystal structure provides an atomic resolution view of a molecular interface between actin protomers, which we argue represents a near-native interaction in the F-actin filament. The interaction involves subdomains 3 and 4 from distinct protomers. The atomic positions in the interface visualized differ by 5-10 A from those suggested by previous models of F-actin. Such differences fall within the range of uncertainties allowed by the fiber diffraction and electron microscopy methods on which previous models have been based. In the crystal, the translational arrangement of protomers lacks the slow twist found in native filaments. A plausible model of F-actin can be constructed by reintroducing the known filament twist, without disturbing significantly the interface observed in the actin dimer crystal.

Thermophilic organisms flourish in varied high-temperature environmental niches that are deadly to other organisms. Recently, genomic evidence has implicated a critical role for disulfide bonds in the structural stabilization of intracellular proteins from certain of these organisms, contrary to the conventional view that structural disulfide bonds are exclusively extracellular. Here both computational and structural data are presented to explore the occurrence of disulfide bonds as a protein-stabilization method across many thermophilic prokaryotes. Based on computational studies, disulfide-bond richness is found to be widespread, with thermophiles containing the highest levels. Interestingly, only a distinct subset of thermophiles exhibit this property. A computational search for proteins matching this target phylogenetic profile singles out a specific protein, known as protein disulfide oxidoreductase, as a potential key player in thermophilic intracellular disulfide-bond formation. Finally, biochemical support in the form of a new crystal structure of a thermophilic protein with three disulfide bonds is presented together with a survey of known structures from the literature. Together, the results provide insight into biochemical specialization and the diversity of methods employed by organisms to stabilize their proteins in exotic environments. The findings also motivate continued efforts to sequence genomes from divergent organisms.

The wealth of available genomic data has spawned a corresponding interest in computational methods that can impart biological meaning and context to these experiments. Traditional computational methods have drawn relationships between pairs of proteins or genes based on notions of equality or similarity between their patterns of occurrence or behavior. For example, two genes displaying similar variation in expression, over a number of experiments, may be predicted to be functionally related. We have introduced a natural extension of these approaches, instead identifying logical relationships involving triplets of proteins. Triplets provide for various discrete kinds of logic relationships, leading to detailed inferences about biological associations. For instance, a protein C might be encoded within an organism if, and only if, two other proteins A and B are also both encoded within the organism, thus suggesting that gene C is functionally related to genes A and B. The method has been applied fruitfully to both phylogenetic and microarray expression data, and has been used to associate logical combinations of protein activity with disease state phenotypes, revealing previously unknown ternary relationships among proteins, and illustrating the inherent complexities that arise in biological data.

2004

The advent of whole-genome sequencing has led to methods that infer protein function and linkages. We have combined four such algorithms (phylogenetic profile, Rosetta Stone, gene neighbor and gene cluster) in a single database--Prolinks--that spans 83 organisms and includes 10 million high-confidence links. The Proteome Navigator tool allows users to browse predicted linkage networks interactively, providing accompanying annotation from public databases. The Prolinks database and the Proteome Navigator tool are available for use online at http://dip.doe-mbi.ucla.edu/pronav.

The three-dimensional structure of the RNA-modifying enzyme, psi55 tRNA pseudouridine synthase from Mycobacterium tuberculosis, is reported. The 1.9-A resolution crystal structure reveals the enzyme, free of substrate, in two distinct conformations. The structure depicts an interesting mode of protein flexibility involving a hinged bending in the central beta-sheet of the catalytic module. Key parts of the active site cleft are also found to be disordered in the substrate-free form of the enzyme. The hinge bending appears to act as a clamp to position the substrate. Our structural data furthers the previously proposed mechanism of tRNA recognition. The present crystal structure emphasizes the significant role that protein dynamics must play in tRNA recognition, base flipping, and modification.

The Genomic Disulfide Analysis Program (GDAP) provides web access to computationally predicted protein disulfide bonds for over one hundred microbial genomes, including both bacterial and archaeal species. In the GDAP process, sequences of unknown structure are mapped, when possible, to known homologous Protein Data Bank (PDB) structures, after which specific distance criteria are applied to predict disulfide bonds. GDAP also accepts user-supplied protein sequences and subsequently queries the PDB sequence database for the best matches, scans for possible disulfide bonds and returns the results to the client. These predictions are useful for a variety of applications and have previously been used to show a dramatic preference in certain thermophilic archaea and bacteria for disulfide bonds within intracellular proteins. Given the central role these stabilizing, covalent bonds play in such organisms, the predictions available from GDAP provide a rich data source for designing site-directed mutants with more stable thermal profiles. The GDAP web application is a gateway to this information and can be used to understand the role disulfide bonds play in protein stability both in these unusual organisms and in sequences of interest to the individual researcher. The prediction server can be accessed at http://www.doe-mbi.ucla.edu/Services/GDAP.
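The idea of applying distance criteria to cysteines mapped onto a homologous structure can be sketched in a few lines of Python. This is only an illustration of the general approach, not the GDAP implementation: the abstract does not state which atoms or cutoffs GDAP uses, so the choice of C-beta atoms, the 4.5 A cutoff, and the function and parameter names below are all assumptions.

```python
from itertools import combinations
from math import dist

def predict_disulfides(cys_cb_coords, max_cb_distance=4.5):
    """Return candidate disulfide pairs from mapped cysteine positions.

    cys_cb_coords: dict mapping residue number -> (x, y, z) of the C-beta
        atom at the structural position each cysteine maps to (assumed input).
    max_cb_distance: hypothetical C-beta to C-beta cutoff in angstroms.
    """
    pairs = []
    for (res_i, xyz_i), (res_j, xyz_j) in combinations(sorted(cys_cb_coords.items()), 2):
        d = dist(xyz_i, xyz_j)
        if d <= max_cb_distance:
            pairs.append((res_i, res_j, round(d, 2)))
    return pairs

# Made-up coordinates: residues 10 and 45 are close enough to pair, 80 is not.
example = {
    10: (0.0, 0.0, 0.0),
    45: (3.8, 0.0, 0.0),
    80: (20.0, 0.0, 0.0),
}
print(predict_disulfides(example))
```

A production tool would also need to handle alignment gaps, multiple chains, and cysteines that fall within range of more than one partner.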

A major focus of genome research is to decipher the networks of molecular interactions that underlie cellular function. We describe a computational approach for identifying detailed relationships between proteins on the basis of genomic data. Logic analysis of phylogenetic profiles identifies triplets of proteins whose presence or absence obeys certain logic relationships. For example, protein C may be present in a genome only if proteins A and B are both present. The method reveals many previously unidentified higher order relationships. These relationships illustrate the complexities that arise in cellular networks because of branching and alternate pathways, and they also facilitate assignment of cellular functions to uncharacterized proteins.
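The example of a protein C that tracks the joint presence of A and B can be made concrete with a short Python sketch. It scores how much of the uncertainty in C's presence/absence profile is explained by A alone, by B alone, and by the logical combination A AND B. The entropy-based score used here is one reasonable choice made for illustration; the abstract does not specify the scoring function, and the function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def uncertainty_coefficient(x, y):
    """Fraction of the entropy of x explained by y: [H(x) + H(y) - H(x,y)] / H(x)."""
    hx = entropy(x)
    if hx == 0:
        return 0.0
    return (hx + entropy(y) - entropy(list(zip(x, y)))) / hx

def and_logic_scores(a, b, c):
    """Scores for predicting profile c from a, from b, and from (a AND b)."""
    combined = [ai & bi for ai, bi in zip(a, b)]
    return (
        uncertainty_coefficient(c, a),
        uncertainty_coefficient(c, b),
        uncertainty_coefficient(c, combined),
    )

# Toy presence (1) / absence (0) profiles across eight genomes.
a = [1, 1, 0, 0, 1, 1, 0, 0]
b = [1, 0, 1, 0, 1, 0, 1, 0]
c = [1, 0, 0, 0, 1, 0, 0, 0]  # present only where both a and b are present
print(and_logic_scores(a, b, c))
```

A triplet is of interest when the combined score is high while both single-protein scores are low, since the relationship would then be missed by pairwise profile comparison.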

Carotenoids undergo a wide range of photochemical reactions in animal, plant, and microbial systems. In photosynthetic organisms, in addition to light harvesting, they perform an essential role in protecting against light-induced damage by quenching singlet oxygen, superoxide anion radicals, or triplet-state chlorophyll. We have determined the crystal structure of a water-soluble orange carotenoid protein (OCP) isolated from the cyanobacterium Arthrospira maxima at a resolution of 2.1 A. OCP forms a homodimer with one carotenoid molecule per monomer. The carotenoid binding site is lined by a striking number of methionine residues. The structure reveals several possible ways in which the protein environment influences the spectral properties of the pigment and provides insight into how the OCP carries out its putative functions in photoprotection.

The crystal structure at 1.54 A resolution of a double mutant of interleukin-1beta (F42W/W120F), a cytokine secreted by macrophages, was determined by multiple-wavelength anomalous dispersion (MAD) using data from highly twinned selenomethionine-modified crystals. The space group is P4(3), with unit-cell parameters a = b = 53.9, c = 77.4 A. Self-rotation function analysis and various intensity statistics revealed the presence of merohedral twinning in crystals of both the native (twinning fraction alpha approximately 0.35) and SeMet (alpha approximately 0.40) forms. Structure determination and refinement are discussed with emphasis on the possible reasons for successful phasing using untreated twinned MAD data.

Mutations in the SOD1 gene cause the autosomal dominant, neurodegenerative disorder familial amyotrophic lateral sclerosis (FALS). In spinal cord neurons of human FALS patients and in transgenic mice expressing these mutant proteins, aggregates containing FALS SOD1 are observed. Accumulation of SOD1 aggregates is believed to interfere with axonal transport, protein degradation and anti-apoptotic functions of the neuronal cellular machinery. Here we show that metal-deficient, pathogenic SOD1 mutant proteins crystallize in three different crystal forms, all of which reveal higher-order assemblies of aligned beta-sheets. Amyloid-like filaments and water-filled nanotubes arise through extensive interactions between loop and beta-barrel elements of neighboring mutant SOD1 molecules. In all cases, non-native conformational changes permit a gain of interaction between dimers that leads to higher-order arrays. Normal beta-sheet-containing proteins avoid such self-association by preventing their edge strands from making intermolecular interactions. Loss of this protection through conformational rearrangement in the metal-deficient enzyme could be a toxic property common to mutants of SOD1 linked to FALS.

First, the crystal structure of cytochrome c-550 (the psbV1 gene product) from the thermophilic cyanobacterium Thermosynechococcus elongatus has been determined to a resolution of 1.8 A. A comparison of the T. elongatus cytochrome c-550 structure to its counterparts from mesophilic organisms, Synechocystis 6803 and Arthrospira maxima, suggests that increased numbers of hydrogen bonds may play a role in the structural basis of thermostability. The cytochrome c-550 in T. elongatus also differs from that in Synechocystis 6803 and Arthrospira maxima in its lack of dimerization and the presence of a trigonal planar molecule, possibly bicarbonate, tightly bound to the heme propionate oxygen atoms. Cytochromes c-550 from T. elongatus, Synechocystis 6803 and Arthrospira maxima exhibit different EPR spectra. A correlation has been drawn between the geometries of the heme axial ligands and the rhombicity calculated from the EPR spectra. This correlation indicates that binding of cytochrome c-550 to Photosystem II is accompanied by structural changes in the heme vicinity. Second, the psbV2 gene product has been found and purified. The UV-visible, EPR and Raman spectra are reported. From the spectroscopic data and from a theoretical structural model based on the cytochrome c-550 structure, it is proposed that the 6th ligand of the heme iron is Tyr86.

Ketopantoate hydroxymethyltransferase (KPHMT) catalyzes the first committed step in the biosynthesis of pantothenate, which is a precursor to coenzyme A and is required for penicillin biosynthesis. The crystal structure of KPHMT from Mycobacterium tuberculosis was determined by the single anomalous substitution (SAS) method at 2.8 A resolution. KPHMT adopts a structure that is a variation on the (beta/alpha) barrel fold, with a metal binding site proximal to the presumed catalytic site. The protein forms a decameric complex, with subunits in opposing pentameric rings held together by a swapping of their C-terminal alpha helices. The structure reveals KPHMT's membership in a small, recently discovered group of (beta/alpha) barrel enzymes that employ domain swapping to form a variety of oligomeric assemblies. The apparent conservation of certain detailed structural characteristics suggests that KPHMT is distantly related by divergent evolution to enzymes in unrelated pathways, including isocitrate lyase and phosphoenolpyruvate mutase.

A new approach to analyzing macromolecular single-crystal X-ray diffraction intensity statistics is presented. Instead of considering reflections in resolution shells, differences between local pairs of reflection intensities are taken and normalized separately. When the two reflections to be compared (having intensities I(1) and I(2), respectively) are chosen appropriately, the behavior of the parameter L = (I(1) - I(2))/(I(1) + I(2)) is insensitive to phenomena that tend to confound traditional intensity statistics, such as anisotropic diffraction and pseudo-centering. The distributions and expected values for L take simple forms when the intensity data are from ordinary crystals or from perfectly twinned specimens. The robustness of the approach is demonstrated with examples using real proteins whose diffraction data appear aberrant by other methods of intensity analysis. The new statistic is better suited than other available methods for diagnosing perfect hemihedral twinning.
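The behavior of L can be checked numerically with a small simulation. The sketch below is an illustration under stated assumptions, not the published procedure: it assumes acentric reflections whose true intensities follow an exponential (Wilson) distribution, models a perfect hemihedral twin by averaging two independent intensities, and uses randomly generated pairs in place of the local reflection pairs chosen in the real method. Under those assumptions the mean of |L| should be close to 1/2 for untwinned data and 3/8 for a perfect twin.

```python
import random


def l_values(pairs):
    """Compute L = (I1 - I2) / (I1 + I2) for each pair of intensities."""
    return [(i1 - i2) / (i1 + i2) for i1, i2 in pairs if i1 + i2 > 0]


def mean_abs(values):
    """Mean absolute value, i.e. the <|L|> statistic."""
    return sum(abs(v) for v in values) / len(values)


def simulate_pairs(n, twinned, seed=0):
    """Simulate n intensity pairs (assumption: exponential acentric intensities)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        if twinned:
            # Perfect twin: each observation averages two independent intensities.
            i1 = 0.5 * (rng.expovariate(1.0) + rng.expovariate(1.0))
            i2 = 0.5 * (rng.expovariate(1.0) + rng.expovariate(1.0))
        else:
            i1 = rng.expovariate(1.0)
            i2 = rng.expovariate(1.0)
        pairs.append((i1, i2))
    return pairs


mean_l_untwinned = mean_abs(l_values(simulate_pairs(20000, twinned=False)))
mean_l_twinned = mean_abs(l_values(simulate_pairs(20000, twinned=True)))
```

Because L is a ratio of a difference to a sum, any scale factor shared by the two reflections cancels, which is one way to see why the statistic is insensitive to effects that vary smoothly across reciprocal space.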

The iron-containing superoxide dismutase (FeSOD) from the thermophilic cyanobacterium Thermosynechococcus elongatus has been isolated. The protein crystallizes readily and we have determined the structure to 1.6 A resolution. This is the first structural characterization of an FeSOD isolated from a cyanobacterium and one of the highest resolution FeSOD structures determined to date. The activity of the T. elongatus FeSOD has been measured both at 25 degrees C and 50 degrees C and it has been spectroscopically characterized. The T. elongatus FeSOD EPR spectra at pH 5.1, 7.5 and 10.0 are similar. This indicates that no change in the geometry of the Fe(III) site occurs over a wide range of pH. This is in contrast to the other FeSODs described in the literature.

Genome-wide functional linkages among proteins in cellular complexes and metabolic pathways can be inferred from high throughput experimentation, such as DNA microarrays, or from bioinformatic analyses. Here we describe a method for the visualization and interpretation of genome-wide functional linkages inferred by the Rosetta Stone, Phylogenetic Profile, Operon and Conserved Gene Neighbor computational methods. This method involves the construction of a genome-wide functional linkage map, where each significant functional linkage between a pair of proteins is displayed on a two-dimensional scatter-plot, organized according to the order of genes along the chromosome. Subsequent hierarchical clustering of the map reveals clusters of genes with similar functional linkage profiles and facilitates the inference of protein function and the discovery of functionally linked gene clusters throughout the genome. We illustrate this method by applying it to the genome of the pathogenic bacterium Mycobacterium tuberculosis, assigning cellular functions to previously uncharacterized proteins involved in cell wall biosynthesis, signal transduction, chaperone activity, energy metabolism and polysaccharide biosynthesis.

Protein l-isoaspartate-(d-aspartate) O-methyltransferases (EC ), present in a wide variety of prokaryotic and eukaryotic organisms, can initiate the conversion of abnormal l-isoaspartyl residues that arise spontaneously with age to normal l-aspartyl residues. In addition, the mammalian enzyme can recognize spontaneously racemized d-aspartyl residues for conversion to l-aspartyl residues, although no such activity has been seen to date for enzymes from lower animals or prokaryotes. In this work, we characterize the enzyme from the hyperthermophilic archaebacterium Pyrococcus furiosus. Remarkably, this methyltransferase catalyzes both l-isoaspartyl and d-aspartyl methylation reactions in synthetic peptides with affinities that can be significantly higher than those of the human enzyme, previously the most catalytically efficient species known. Analysis of the common features of l-isoaspartyl and d-aspartyl residues suggested that the basic substrate recognition element for this enzyme may be mimicked by an N-terminal succinyl peptide. We tested this hypothesis with a number of synthetic peptides using both the P. furiosus and the human enzyme. We found that peptides devoid of aspartyl residues but containing the N-succinyl group were in fact methyl esterified by both enzymes. The recent structure determined for the l-isoaspartyl methyltransferase from P. furiosus complexed with an l-isoaspartyl peptide supports this mode of methyl-acceptor recognition. The combination of the thermophilicity and the high affinity binding of methyl-accepting substrates makes the P. furiosus enzyme useful both as a reagent for detecting isomerized and racemized residues in damaged proteins and for possible human therapeutic use in repairing damaged proteins in extracellular environments where the cytosolic enzyme is not normally found.

The enzyme l-isoaspartyl methyltransferase initiates the repair of damaged proteins by recognizing and methylating isomerized and racemized aspartyl residues in aging proteins. The crystal structure of the human enzyme containing a bound S-adenosyl-l-homocysteine cofactor is reported here at a resolution of 2.1 A. A comparison of the human enzyme to homologs from two other species reveals several significant differences among otherwise similar structures. In all three structures, we find that three conserved charged residues are buried in the protein interior near the active site. Electrostatics calculations suggest that these buried charges might make significant contributions to the energetics of binding the charged S-adenosyl-l-methionine cofactor and to catalysis. We suggest a possible structural explanation for the observed differences in reactivity toward the structurally similar l-isoaspartyl and d-aspartyl residues in the human, archaeal, and eubacterial enzymes. Finally, the human structure reveals that the known genetic polymorphism at residue 119 (Val/Ile) maps to an exposed region away from the active site.

Disulfide bonds have only rarely been found in intracellular proteins. That pattern is consistent with the chemically reducing environment inside the cells of well-studied organisms. However, recent experiments and new calculations based on genomic data of archaea provide striking contradictions to this pattern. Our results indicate that the intracellular proteins of certain hyperthermophilic archaea, especially the crenarchaea Pyrobaculum aerophilum and Aeropyrum pernix, are rich in disulfide bonds. This finding implicates disulfide bonding in stabilizing many thermostable proteins and points to novel chemical environments inside these microbes. These unexpected results illustrate the wealth of biochemical insights available from the growing reservoir of genomic data.

Cytochrome c(6) from the cyanobacterium Arthrospira maxima is present in isoforms that can be resolved by size-exclusion chromatography. One isoform crystallized in space group I4(1)32 with eight protein molecules in the asymmetric unit and a total of 384 molecules in the unit cell. Within the crystal, the molecules are arranged as clusters of 24 cytochrome c(6) molecules. Each cluster is a hollow shell with approximate octahedral (432) symmetry. Structural and biochemical studies of cytochrome c(6) isolated from other cyanobacteria and algae have led to the suggestion that cytochrome c(6) forms oligomers. The cytochrome c(6) complex described here is the largest assembly of cytochrome c(6) molecules observed thus far.

Many natural proteins self-assemble, either to fulfill their biological function or as part of a pathogenic process. Biological assembly phenomena such as amyloidogenesis, domain swapping and symmetric oligomerization are inspiring new strategies for designing proteins that self-assemble to form supramolecular complexes. Recent advances include the design of novel proteins that assemble into filaments, symmetric cages and regular arrays.

Proteins bearing the widely distributed SET domain have been shown to methylate lysine residues in histones and other proteins. In this issue, three-dimensional structures are reported for three very different SET domain-containing proteins. The structures reveal novel folds for several new domains, including SET, and provide early insights into mechanisms of catalysis and molecular recognition in this family of enzymes.

Amyloid fibrils are associated with several disease states, but their structures have yet to be fully defined. Here we use site-directed spin labeling to explain some of the specific interactions that are formed between subunits when the protein transthyretin (TTR) assembles into amyloid fibrils, which are associated with both spontaneous and familial amyloid diseases in humans. The results suggest that fibrils are formed when a major conformational change displaces the terminal beta-strand from the edge of a beta-sheet in the native structure, exposing the penultimate strand. The newly exposed strand then allows a novel beta-sheet interaction to form between the TTR subunits. This interaction and another previously identified subunit association lead to a plausible model for the specific sequence of beta-strands in one of the indefinitely repeating beta-sheets of TTR amyloid, which is formed by a head-to-head, tail-to-tail arrangement of subunits.

The alpha-helix containing the thiols, SH1 (Cys-707) and SH2 (Cys-697), has been proposed to be one of the structural elements responsible for the transduction of conformational changes in the myosin head (subfragment-1 (S1)). Previous studies, using a method that isolated and measured the rate of the SH1-SH2 cross-linking step, showed that this helix undergoes ligand-induced conformational changes. However, because of long incubation times required for the formation of the transition state complexes (S1.ADP.BeF(x), S1.ADP.AlF(4)-, and S1.ADP.V(i)), this method could not be used to determine the cross-linking rate constants for such states. In this study, kinetic data from the SH1-SH2 cross-linking reaction were analyzed by computational methods to extract rate constants for the two-step mechanism. For S1.ADP.BeF(x), the results obtained were similar to those for S1.ATPgammaS. For reactions involving S1.ADP.AlF(4)- and S1.ADP.V(i), the first step (SH1 modification) is rate limiting; consequently, only lower limits could be established for the rate constants of the cross-linking step. Nevertheless, these results show that the cross-linking rate constants in the transition state complexes are increased at least 20-fold for all the reagents, including the shortest one, compared with nucleotide-free S1. Thus, the SH1-SH2 helix appears to be destabilized in the post-hydrolysis state.

Few examples of pseudomerohedrally twinned macromolecular crystals have been described in the literature. This unusual phenomenon arises when a fortuitous unit-cell geometry makes it possible for twinning to occur in a space group that ordinarily does not allow twinning. Here, the crystallization, structure determination and refinement of the cocaine hydrolytic antibody 15A10 at 2.35 A resolution are described. The crystal belongs to space group P2(1), with two molecules in the asymmetric unit and unit-cell parameters a = 37.5, b = 108.4, c = 111.3 A and beta fortuitously near 90 degrees; the refined twinning fraction is alpha = 0.43. Interestingly, the non-crystallographic symmetry (NCS) and twin operators are nearly parallel, which appears to be a relatively frequent situation in protein crystals twinned by merohedry or pseudomerohedry.

A general strategy is described for designing proteins that self assemble into large symmetrical nanomaterials, including molecular cages, filaments, layers, and porous materials. In this strategy, one molecule of protein A, which naturally forms a self-assembling oligomer, A(n), is fused rigidly to one molecule of protein B, which forms another self-assembling oligomer, B(m). The result is a fusion protein, A-B, which self assembles with other identical copies of itself into a designed nanohedral particle or material, (A-B)(p). The strategy is demonstrated through the design, production, and characterization of two fusion proteins: a 49-kDa protein designed to assemble into a cage approximately 15 nm across, and a 44-kDa protein designed to assemble into long filaments approximately 4 nm wide. The strategy opens a way to create a wide variety of potentially useful protein-based materials, some of which share similar features with natural biological assemblies.

Cytochrome c(6) and cytochrome c-549 are small (89 and 130 amino acids, respectively) monoheme cytochromes that function in photosynthesis. They appear to have descended relatively recently from the same ancestral gene but have diverged to carry out very different functional roles, underscored by the large difference between their midpoint potentials of nearly 600 mV. We have determined the X-ray crystal structures of both proteins isolated from the cyanobacterium Arthrospira maxima. The two structures are remarkably similar, superimposing on backbone atoms with an rmsd of 0.7 A. Comparison of the two structures suggests that differences in solvent exposure of the heme and the electrostatic environment of the heme propionates, as well as in heme iron ligation, are the main determinants of midpoint potential in the two proteins. In addition, the crystal packing of both A. maxima cytochrome c-549 and cytochrome c(6) suggests that the proteins oligomerize. Finally, the cytochrome c-549 dimer we observe can be readily fit into the recently described model of cyanobacterial photosystem II.

Amyloid and prion diseases appear to stem from the conversion of normally folded proteins into insoluble, fiber-like assemblies. Despite numerous structural studies, a detailed molecular characterization of amyloid fibrils remains elusive. In particular, models of amyloid fibrils proposed thus far have not adequately defined the constituent protein subunit interactions. To further our understanding of amyloid structure, we employed thiol-specific cross-linking and site-directed spin labeling to identify specific protein-protein associations in transthyretin (TTR) amyloid fibrils. We find that certain cysteine mutants of TTR, when dimerized by chemical cross-linkers, still form fibers under typical in vitro fibrillogenic conditions. In addition, site-directed spin labeling of many residues at the natural dimer interface reveals that their spatial proximity is preserved in the fibrillar state even in the absence of cross-linking constraints. Here, we present the first view of a subunit interface in TTR fibers and show that it is very similar to one of the natural dimeric interchain associations evident in the structure of soluble TTR. The results clarify varied models of amyloidogenesis by demonstrating that transthyretin amyloid fibrils may assemble from oligomeric protein building blocks rather than structurally rearranged monomers.

Protein L-isoaspartyl (D-aspartyl) methyltransferases (EC 2.1.1.77) are found in almost all organisms. These enzymes catalyze the S-adenosylmethionine (AdoMet)-dependent methylation of isomerized and racemized aspartyl residues in age-damaged proteins as part of an essential protein repair process. Here, we report crystal structures of the repair methyltransferase at resolutions up to 1.2 A from the hyperthermophilic archaeon Pyrococcus furiosus. Refined structures include binary complexes with the active cofactor AdoMet, its reaction product S-adenosylhomocysteine (AdoHcy), and adenosine. The enzyme places the methyl-donating cofactor in a deep, electrostatically negative pocket that is shielded from solvent. Across the multiple crystal structures visualized, the presence or absence of the methyl group on the cofactor correlates with a significant conformational change in the enzyme in a loop bordering the active site, suggesting a role for motion in catalysis or cofactor exchange. We also report the structure of a ternary complex of the enzyme with adenosine and the methyl-accepting polypeptide substrate VYP(L-isoAsp)HA at 2.1 A. The substrate binds in a narrow active site cleft with three of its residues in an extended conformation, suggesting that damaged proteins may be locally denatured during the repair process in cells. Manual and computer-based docking studies on different isomers help explain how the enzyme uses steric effects to make the critical distinction between normal L-aspartyl and age-damaged L-isoaspartyl and D-aspartyl residues.

2000

Adenylosuccinate lyase is an enzyme that plays a critical role in both cellular replication and metabolism via its action in the de novo purine biosynthetic pathway. Adenylosuccinate lyase is the only enzyme in this pathway to catalyze two separate reactions, enabling it to participate in the addition of a nitrogen at two different positions in adenosine monophosphate. Both reactions catalyzed by adenylosuccinate lyase involve the beta-elimination of fumarate. Enzymes that catalyze this type of reaction belong to a superfamily, the members of which are homotetramers. Because adenylosuccinate lyase plays an integral part in maintaining proper cellular metabolism, mutations in the human enzyme can have severe clinical consequences, including mental retardation with autistic features.

Faced with the avalanche of genomic sequences and data on messenger RNA expression, biological scientists are confronting a frightening prospect: piles of information but only flakes of knowledge. How can the thousands of sequences being determined and deposited, and the thousands of expression profiles being generated by the new array methods, be synthesized into useful knowledge? What form will this knowledge take? These are questions being addressed by scientists in the field known as 'functional genomics'.

Adenylosuccinate lyase catalyzes two separate reactions in the de novo purine biosynthetic pathway. Through its dual action in this pathway, adenylosuccinate lyase plays an integral part in cellular replication and metabolism. Mutations in the human enzyme can result in severe neurological disorders, including mental retardation with autistic features. The crystal structure of adenylosuccinate lyase from the hyperthermophilic archaebacterium Pyrobaculum aerophilum has been determined to 2.1 A resolution. Although both the fold of the monomer and the architecture of the tetrameric assembly are similar to adenylosuccinate lyase from the thermophilic eubacterium Thermotoga maritima, the archaebacterial lyase contains unique features. Surprisingly, the structure of adenylosuccinate lyase from P. aerophilum reveals that this intracellular protein contains three disulfide bonds that contribute significantly to its stability against thermal and chemical denaturation. The observation of multiple disulfide bonds in the recombinant form of the enzyme suggests the need for further investigations into whether the intracellular environment of P. aerophilum, and possibly other hyperthermophiles, may be compatible with protein disulfide bond formation. In addition, the protein is shorter in P. aerophilum than it is in other organisms. This abbreviation results from an internal excision of a cluster of helices that may be involved in protein-protein interactions in other organisms and may relate to the observed clinical effects of human mutations in that region.

An empirical function is developed to measure the protein-like character of electron-density maps. The function is based upon a systematic analysis of numerous local and global map properties or descriptors. Local descriptors measure the occurrence throughout the unit cell of unique patterns on various defined templates, while global descriptors enumerate topological characteristics that define the connectivity and complexity of electron-density isosurfaces. We examine how these quantitative descriptors vary as error is introduced into the phase sets used to generate maps. Informative descriptors are combined in an optimal fashion to arrive at a predictive function. When the topological and geometrical analysis is applied to protein maps generated from phase sets with varying amounts of error, the function is able to estimate changes in average phase error with an accuracy of better than 10 degrees. Additionally, when used to monitor maps generated with experimental phases from different heavy-atom models, the analysis clearly distinguishes between the correct heavy-atom substructure solution and incorrect heavy-atom solutions. The function is also evaluated as a tool to monitor changes in map quality and phase error before and after density-modification procedures.

1999

Different types of crystal twinning are reviewed with an emphasis on how to detect the phenomenon from protein diffraction data. The recent literature and a database survey both serve as reminders to perform routine checks whenever twinning is a possibility.

Determining protein functions from genomic sequences is a central goal of bioinformatics. We present a method based on the assumption that proteins that function together in a pathway or structural complex are likely to evolve in a correlated fashion. During evolution, all such functionally linked proteins tend to be either preserved or eliminated in a new species. We describe this property of correlated evolution by characterizing each protein by its phylogenetic profile, a string that encodes the presence or absence of a protein in every known genome. We show that proteins having matching or similar profiles strongly tend to be functionally linked. This method of phylogenetic profiling allows us to predict the function of uncharacterized proteins.
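A phylogenetic profile as described above can be sketched as a bit string with one position per genome. The example below is a hedged illustration only: the four proteins, their profiles, the `max_diff` threshold, and the use of a plain Hamming distance to decide what counts as "similar" are all assumptions made for the sketch.

```python
def hamming(profile_p, profile_q):
    """Number of genomes in which two presence/absence profiles differ."""
    return sum(a != b for a, b in zip(profile_p, profile_q))


def linked_proteins(name, profiles, max_diff=1):
    """Return proteins whose profiles differ from `name` in at most max_diff genomes."""
    target = profiles[name]
    return sorted(
        other
        for other, profile in profiles.items()
        if other != name and hamming(target, profile) <= max_diff
    )


# Hypothetical profiles: one character per genome, '1' = present, '0' = absent.
profiles = {
    "P1": "110110",
    "P2": "110110",
    "P3": "011001",
    "P4": "110111",
}

candidates = linked_proteins("P1", profiles)
```

Here P2 (identical profile) and P4 (one mismatch) are returned as candidate functional partners of P1, while P3, which differs in five of six genomes, is not.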

We present a fast algorithm to search for repeating fragments within protein sequences. The technique is based on an extension of the Smith-Waterman algorithm that allows the calculation of sub-optimal alignments of a sequence against itself. We are able to estimate the statistical significance of all sub-optimal alignment scores. We also rapidly determine the length of the repeating fragment and the number of times it is found in a sequence. The technique is applied to sequences in the Swissprot database, and to 16 complete genomes. We find that eukaryotic proteins contain more internal repeats than those of prokaryotic and archaeal organisms. The finding that 18% of yeast sequences and 28% of the known human sequences contain detectable repeats emphasizes the importance of internal duplication in protein evolution.
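The core idea of aligning a sequence against itself can be sketched with a plain Smith-Waterman recursion restricted to the upper triangle of the score matrix, so that the trivial alignment of every residue with itself is excluded. This is a simplified illustration, not the algorithm from the abstract: it finds only the single best-scoring repeat, does not compute sub-optimal alignments or statistical significance, and the match, mismatch, and gap scores are arbitrary assumptions.

```python
def best_internal_repeat(seq, match=2, mismatch=-1, gap=-2):
    """Return (score, end_i, end_j) of the best local self-alignment with i < j.

    end_i and end_j are 1-based end positions of the two copies of the repeat.
    """
    n = len(seq)
    score = [[0] * (n + 1) for _ in range(n + 1)]
    best = (0, 0, 0)
    for i in range(1, n + 1):
        # Only cells above the main diagonal: a residue is never paired with itself.
        for j in range(i + 1, n + 1):
            pair = match if seq[i - 1] == seq[j - 1] else mismatch
            score[i][j] = max(
                0,                           # local alignment may restart anywhere
                score[i - 1][j - 1] + pair,  # extend by aligning seq[i-1] with seq[j-1]
                score[i - 1][j] + gap,       # gap in the second copy
                score[i][j - 1] + gap,       # gap in the first copy
            )
            if score[i][j] > best[0]:
                best = (score[i][j], i, j)
    return best


# Toy sequence in which the fragment "ABCD" occurs twice.
result = best_internal_repeat("ABCDXXABCD")
```

For the toy sequence the best score is reached where the first copy ends at position 4 and the second at position 10, corresponding to the two occurrences of the four-residue fragment.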

A computational method is proposed for inferring protein interactions from genome sequences on the basis of the observation that some pairs of interacting proteins have homologs in another organism fused into a single protein chain. Searching sequences from many genomes revealed 6809 such putative protein-protein interactions in Escherichia coli and 45,502 in yeast. Many members of these pairs were confirmed as functionally related; computational filtering further enriches for interactions. Some proteins have links to several other proteins; these coupled links appear to represent functional interactions such as complexes or pathways. Experimentally confirmed interacting pairs are documented in a Database of Interacting Proteins.
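The domain-fusion inference can be sketched as follows. This is a toy illustration under strong assumptions: each protein is reduced to a set of hypothetical domain-family labels, whereas the actual method works from sequence comparisons across genomes; the protein names, family labels, and the function `fusion_links` are all invented for the example.

```python
from itertools import combinations


def fusion_links(query_proteins, other_proteins):
    """Predict links between query proteins whose families co-occur in one fused chain.

    query_proteins: dict mapping protein name -> its single domain family.
    other_proteins: dict mapping protein name -> set of domain families it contains.
    Returns a sorted list of (protein_a, protein_b, fused_protein) tuples.
    """
    links = set()
    for a, b in combinations(sorted(query_proteins), 2):
        family_a, family_b = query_proteins[a], query_proteins[b]
        if family_a == family_b:
            continue  # paralogs, not evidence of a fusion between distinct proteins
        for fused, families in other_proteins.items():
            if family_a in families and family_b in families:
                links.add((a, b, fused))
    return sorted(links)


# Hypothetical query genome: three separate single-domain proteins.
query = {"protA": "family_1", "protB": "family_2", "protC": "family_3"}

# Hypothetical second genome: one chain fuses family_1 and family_2.
other = {"fusedX": {"family_1", "family_2"}, "soloY": {"family_3"}}

predicted = fusion_links(query, other)
```

Only protA and protB are linked, because homologs of both appear together in the single chain fusedX; protC has no fused counterpart and receives no prediction.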

In this study, we analyzed all known protein sequences for repeating amino acid segments. Although duplicated sequence segments occur in 14% of all proteins, eukaryotic proteins are three times more likely to have internal repeats than prokaryotic proteins. After clustering the repetitive sequence segments into families, we find repeats from eukaryotic proteins have little similarity with prokaryotic repeats, suggesting most repeats arose after the prokaryotic and eukaryotic lineages diverged. Consequently, protein classes with the highest incidence of repetitive sequences perform functions unique to eukaryotes. The frequency distribution of the repeating units shows only weak length dependence, implicating recombination rather than duplex melting or DNA hairpin formation as the limiting mechanism underlying repeat formation. The mechanism favors additional repeats once an initial duplication has been incorporated. Finally, we show that repetitive sequences are favored that contain small and relatively water-soluble residues. We propose that error-prone repeat expansion allows repetitive proteins to evolve more quickly than non-repeat-containing proteins.

The protein sequence database was analyzed for evidence that some distinct sequence families might be distantly related in evolution by changes in frame of translation. Sequences were compared using special amino acid substitution matrices for the alternate frames of translation. The statistical significance of alignment scores was computed in the true database and shuffled versions of the database that preserve any potential codon bias. The comparison of results from these two databases provides a very sensitive method for detecting remote relationships. We find a weak but measurable relatedness within the database as a whole, supporting the notion that some proteins may have evolved from others through changes in frame of translation. We also quantify residual homology in the ordinary sense within a database of generally unrelated sequences.

The availability of over 20 fully sequenced genomes has driven the development of new methods to find protein function and interactions. Here we group proteins by correlated evolution, correlated messenger RNA expression patterns and patterns of domain fusion to determine functional relationships among the 6,217 proteins of the yeast Saccharomyces cerevisiae. Using these methods, we discover over 93,000 pairwise links between functionally related yeast proteins. Links between characterized and uncharacterized proteins allow a general function to be assigned to more than half of the 2,557 previously uncharacterized yeast proteins. Examples of functional links are given for a protein family of previously unknown function, a protein whose human homologues are implicated in colon cancer and the yeast prion Sup35.

The ycaC gene comprises a 621 base pair open reading frame in Escherichia coli. The ycaC gene product (ycaCgp) is uncharacterized and has no assigned function. The closest sequence homologs with an assigned function belong to a family of bacterial hydrolases that catalyze isochorismatase-like reactions, but these have only low sequence similarity to ycaCgp (approximately 20% amino acid identity). The ycaCgp was obtained and identified during crystallization trials of an unrelated E. coli protein with which it co-purified.

The crystal structure of the high-potential iron-sulfur protein (HiPIP) isolated from Chromatium purpuratum is reported at 2.7 A resolution. The three HiPIP molecules in the asymmetric unit of the crystals form one and one-half dimers. Two molecules are related by a noncrystallographic symmetry rotation of approximately 175 degrees with negligible translation along the dyad axis. The third molecule in the asymmetric unit also forms a dimer with a second HiPIP molecule across the crystallographic 2-fold symmetry axis. The Fe4S4 clusters in both the crystallographic and noncrystallographic dimers are separated by approximately 13.0 A. Solution studies give mixed results regarding the oligomeric state of the C. purpuratum HiPIP. A comparison with crystal structures of HiPIPs from other species shows that HiPIP tends to associate rather nonspecifically about a conserved, relatively hydrophobic surface patch to form dimers.

1997

Twinning is fairly common in protein crystals. In its merohedral form, twinning is not apparent in the diffraction pattern, but the observed intensities do not represent individual crystallographic intensities. Since partial twinning (twin fraction less than 1/2) and perfect twinning (twin fraction of 1/2) can both be identified relatively easily by examining intensity statistics, the appropriate tests should be performed routinely when working in space groups that support merohedral twinning.
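One commonly used intensity statistic is the second-moment ratio <I^2>/<I>^2, which is about 2 for untwinned acentric data and about 1.5 for perfectly twinned data. The sketch below simulates this under simplifying assumptions that are not taken from the abstract: ideal Wilson (exponential) acentric intensities, independent twin-related reflections, and no measurement error. The function names are hypothetical.

```python
import random

def second_moment_ratio(intensities):
    """Return <I^2> / <I>^2 for a list of intensities."""
    n = len(intensities)
    mean = sum(intensities) / n
    mean_sq = sum(i * i for i in intensities) / n
    return mean_sq / (mean * mean)

def simulate(n, twin_fraction, seed=0):
    """Simulate observed intensities as a weighted sum of two twin-related reflections."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        i1 = rng.expovariate(1.0)  # assumed acentric Wilson distribution
        i2 = rng.expovariate(1.0)
        observed.append((1.0 - twin_fraction) * i1 + twin_fraction * i2)
    return observed

untwinned = second_moment_ratio(simulate(200_000, 0.0))
perfect = second_moment_ratio(simulate(200_000, 0.5))
print(round(untwinned, 2), round(perfect, 2))  # expected to be close to 2.0 and 1.5
```

Real data depart from these ideal values because of measurement error, anisotropy, and pseudo-symmetry, so the ratio is best read as a warning sign rather than a measurement of the twin fraction.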

To attempt to understand the physical principles underlying protein crystallization, an algorithm is described for simulating the crystal nucleation event computationally. The validity of the approach is supported by its ability to reproduce closely the well-known preference of proteins for particular space group symmetries. The success of the algorithm supports a recent argument that protein crystallization is limited primarily by the entropic effects of geometric restrictions imposed during nucleation, rather than particular energetic factors. These simulations provide a new tool for attacking the problem of protein crystallization by allowing quantitative evaluation of new ideas such as the use of racemic protein mixtures.

The conformation of NAD bound to diphtheria toxin (DT), an ADP-ribosylating enzyme, has been compared to the conformations of NAD(P) bound to 23 distinct NAD(P)-binding oxidoreductase enzymes, whose structures are available in the Brookhaven Protein Data Bank. For the oxidoreductase enzymes, NAD(P) functions as a cofactor in electron transfer, whereas for DT, NAD is a labile substrate in which the N-glycosidic bond between the nicotinamide ring and the N-ribose is cleaved. All NAD(P) conformations were compared by (1) visual inspection of superimposed molecules, (2) RMSD of atomic positions, (3) principal component analysis, and (4) analysis of torsion angles and other conformational parameters. Whereas the majority of oxidoreductase-bound NAD(P) conformations are found to be similar, the conformation of NAD bound to DT is found to be unusual. Distinctive features of the conformation of NAD bound to DT that may be relevant to DT's function as an ADP-ribosylating enzyme include (1) an unusually short distance between the PN and N1N atoms, reflecting a highly folded conformation for the nicotinamide mononucleotide (NMN) portion of NAD, and (2) a torsion angle chi N approximately 0 degree about the scissile N-glycosidic bond, placing the nicotinamide ring outside of the preferred anti and syn orientations. In NAD bound to DT, the highly folded NMN conformation and torsion angle chi N approximately 0 degree could contribute to catalysis, possibly by orienting the C1'N atom of NAD for nucleophilic attack, or by placing strain on the N-glycosidic bond, which is cleaved by DT. The unusual overall conformation of NAD bound to DT is likely to reflect the structure of DT, which is unusual among NAD(P)-binding enzymes. In DT, the NAD binding site is formed at the junction of two antiparallel beta-sheets. 
In contrast, although the 23 oxidoreductase enzymes belong to at least six different structural classes, almost all of them bind NAD(P) at the C-terminal end of a parallel beta-sheet. The structural alignments and principal component analysis show that enzymes of the same structural class bind to particularly similar conformations of NAD(P), with few exceptions. The conformation of NAD bound to DT superimposes closely with that of an NAD analogue bound to Pseudomonas exotoxin A (ETA), an ADP-ribosylating toxin that is structurally homologous to DT. This suggests that all of the ADP-ribosylating enzymes that are structurally homologous to DT and ETA will bind a highly similar conformation of NAD.

Crystals of a carotenoid protein from the cyanobacterium Arthrospira maxima have been grown in space group C2 with unit-cell dimensions a = 219.6, b = 40.3, c = 75.5 A and beta = 95.5 degrees. The crystals diffract X-rays to 2.3 A resolution and display unusual optical properties in polarized light that suggest that all of the carotenoid molecules in the crystals are oriented similarly. A slight increase in the concentration of a crystallization additive in the mother liquor induces macroscopic twinning, which is also visible when the crystals are illuminated with polarized light.

In the course of refining atomic protein structures, one often encounters difficulty with molecules that are unusually flexible or otherwise disordered. We approach the problem by combining two relatively recent developments: simultaneous refinement of multiple protein conformations and highly constrained refinement. A constrained Langevin dynamics refinement is tested on two proteins: neurotrophin-3 and glutamine synthetase. The method produces closer agreement between the calculated and observed scattering amplitudes than standard, single-copy, Gaussian atomic displacement parameter refinement. This is accomplished without significantly increasing the number of fitting parameters in the model. These results suggest that loop motion in proteins within a crystal lattice can be extensive and that it is poorly modeled by isotropic Gaussian distributions for each atom.

1996

The thrombin-binding aptamer d(GGTTGGTGTGGTTGG) is one of a family of DNA oligonucleotides that were identified by in vitro selection to bind specifically and with high affinity to thrombin. Two groups independently determined the tertiary structure in solution by NMR, and at about the same time the X-ray crystal structure of the aptamer in complex with thrombin was reported. In all cases, the thrombin-binding aptamer was found to fold into a structure containing two planar guanine quartets as its core. The NMR and crystal structures, however, have fundamentally different folding patterns owing to differences in the way these central bases are connected. We discuss the distinctions between the refined crystal and solution structures and show that the NMR model is consistent with the X-ray diffraction data.

Adenylosuccinate lyase (ASL) from Bacillus subtilis has been crystallized and structural analysis by X-ray diffraction is in progress. ASL is a 200-kDa homotetramer that catalyzes two distinct steps of de novo purine biosynthesis leading to the formation of AMP and IMP; both steps involve the beta-elimination of fumarate. A single point mutation in the human ASL gene has been linked to mental retardation with autistic features. In addition, ASL plays an important role in the bioprocessing of anti-HIV therapeutics. B. subtilis ASL, which shares 30% sequence identity and 70% sequence similarity with human ASL, has been crystallized and data to 3.3 A have been collected at 100 K. The space group is P6(1)22 or P6(5)22 with a = b = 129.4 A; the length of the c-axis varies between 275 and 290 A, depending on the crystal. An analysis of solvent content indicates a dimer in the asymmetric unit, although a self-rotation function and an analysis of native Pattersons failed to identify unambiguously the location of any noncrystallographic symmetry axes. Structure determination by isomorphous replacement is in progress.

Several soluble electron transfer proteins were isolated and characterized from the marine purple-sulfur bacterium Chromatium purpuratum. The C. purpuratum flavocytochrome c is similar in molecular mass (68 kDa) and isoelectric point (6.5) to flavocytochromes isolated from other phototrophs. Redox titrations of the flavocytochrome c hemes show two components with midpoint potential values of +15 and -120 mV, behavior similar to that observed with the flavocytochrome isolated from the thermophilic Chromatium tepidum. Moreover, N-terminal amino acid sequence analysis of both the flavin and the cytochrome subunit indicates substantial homology to the primary structure of the flavocytochrome c of Chromatium vinosum. In contrast, the C. purpuratum high-potential iron-sulfur protein (HiPIP) differs from those isolated from other photosynthetic bacteria in its relatively high midpoint potential (+390 mV) and the possibility that it exists as a dimer in solution. Two low molecular mass c-type cytochromes were also characterized. One appears to be a high-potential (+310 mV) c8-type cytochrome. Amino acid sequencing suggests that the second cytochrome may be a homologue of the low-potential cytochrome c-551, previously described in two species of Ectothiorhodospirillaceae.

1995

Algorithms are presented for characterizing the long-range accessibilities of protein surfaces. First, we describe an analytical method for determining the maximum contact radius for each atom in a structure. The problem is simplified greatly by geometric inversion in a sphere, a type of conformal mapping. Second, we introduce the concept of diffusion accessibility of a protein surface, which we evaluate either by random-walk simulations or by numerical solution of the equations of diffusion with the protein acting as an adsorber. These two measures of exposure are compared to each other as well as to the more common notion of solvent accessibility. These new procedures provide longer-range descriptions of surface geometry which may be useful in docking studies and other areas where surface comparison is required.
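The usefulness of geometric inversion here comes from a standard property: every sphere that passes through the inversion center is mapped to a plane. The sketch below only demonstrates that property numerically. It is not the contact-radius algorithm itself, and the function names, the unit inversion radius, and the sample sphere are all hypothetical choices.

```python
import math

def invert(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Invert point p in the sphere with the given center and radius."""
    d = [pi - ci for pi, ci in zip(p, center)]
    d2 = sum(x * x for x in d)
    if d2 == 0.0:
        raise ValueError("the inversion center itself has no image")
    scale = radius * radius / d2
    return tuple(ci + scale * x for ci, x in zip(center, d))

def point_on_sphere(r, theta, phi):
    """Point on the sphere of radius r centered at (r, 0, 0), which touches the origin."""
    return (
        r + r * math.sin(theta) * math.cos(phi),
        r * math.sin(theta) * math.sin(phi),
        r * math.cos(theta),
    )

# Sample a sphere of radius 2 that passes through the inversion center (the origin).
samples = [point_on_sphere(2.0, t, f) for t, f in [(0.3, 0.2), (1.0, 2.0), (2.5, 4.0)]]
images = [invert(p) for p in samples]

# All images share the same x coordinate, 1 / (2 * r) = 0.25: they lie in a plane.
for image in images:
    print(tuple(round(v, 4) for v in image))
```

If the inversion is centered on an atom, spheres touching that atom become planes, so a question about the largest touching sphere turns into a question about planes and half-spaces, which is generally easier to handle.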

The molecular structure of cytochrome c6 from the green alga Chlamydomonas reinhardtii has been determined from two crystal forms and refined to 1.9 A resolution. The two crystal forms are likely the result of different levels of post-translational modification of the protein. This is the first report of a high-resolution structure of a chloroplast-derived class I c-type cytochrome. The overall fold is similar to that of other class I c-type cytochromes, consisting of a series of alpha-helices and turns that envelop the heme prosthetic group. There is also a short two-stranded anti-parallel beta-sheet in the vicinity of the methionine axial ligand to the heme; this region of the molecule is formed by the most highly conserved residues in c6-type cytochromes. Although class I c-type cytochromes are assumed to function as monomers, both crystal forms of cytochrome c6 exhibit oligomerization about the heme crevice that is, in part, mediated by the short anti-parallel beta-sheet. The functional significance of this oligomerization is supported by the appearance of similar interfaces in other electron transfer couples, HPLC and light-scattering data, and is furthermore consistent with kinetic data on electron transfer reactions of c6-type cytochromes.

One of the most puzzling observations in protein crystallography is that the various space-group symmetries occur with striking non-uniformity. Molecular close-packing has been invoked to explain similar observations for crystals of small organic compounds, but does not appear to be the dominant factor for proteins. Instead, we find that the observed frequencies for both two- and three-dimensional crystals can be explained by an entropic model. Under a requirement for connectivity, the favoured space groups are simply less restrictive than others in that they allow the molecules more rigid-body degrees of freedom and can therefore be realized in a greater number of ways. This result underscores the importance of the nucleation event in crystallization and leads to specific ideas for crystallizing water-soluble and membrane proteins.

1994

Plastocyanin is one of the best characterized of the photosynthetic electron transfer proteins. Since the determination of the structure of poplar plastocyanin in 1978, the structure of algal (Scenedesmus, Enteromorpha, Chlamydomonas) and plant (French bean) plastocyanins has been determined either by crystallographic or NMR methods, and the poplar structure has been refined to 1.33 A resolution. Despite the sequence divergence among plastocyanins of algae and vascular plants (e.g., 62% sequence identity between the Chlamydomonas and poplar proteins), the three-dimensional structures are remarkably conserved (e.g., 0.76 A rms deviation in the C alpha positions between the Chlamydomonas and poplar proteins). Structural features include a distorted tetrahedral copper binding site at one end of an eight-stranded antiparallel beta-barrel, a pronounced negative patch, and a flat hydrophobic surface. The copper site is optimized for its electron transfer function, and the negative and hydrophobic patches are proposed to be involved in recognition of physiological reaction partners. Chemical modification, cross-linking, and site-directed mutagenesis experiments have confirmed the importance of the negative and hydrophobic patches in binding interactions with cytochrome f and Photosystem I, and validated the model of two functionally significant electron transfer paths in plastocyanin. One putative electron transfer path is relatively short (approximately 4 A) and involves the solvent-exposed copper ligand His-87 in the hydrophobic patch, while the other is more lengthy (approximately 12-15 A) and involves the nearly conserved residue Tyr-83 in the negative patch.

The purification and characterization of the peripheral antenna and the preliminary characterization of a carotenoid-protein complex from the purple-sulfur bacterium Chromatium purpuratum are described. The peripheral antenna of C. purpuratum is unusual among purple bacteria in that it can be resolved by SDS-PAGE into six subunits, the largest number observed thus far for a spectrally pure antenna complex. N-terminal sequence analyses of these subunits suggest that they may have an additional bacteriochlorophyll-binding site located outside the transmembrane domain. The results of pigment-protein quantification are also consistent with additional pigment-binding sites in the C. purpuratum LH2. Furthermore, CD measurements and sequence analysis suggest the presence of considerable beta-type in addition to alpha-helical secondary structure. Thus, the secondary and quaternary structures of this complex differ significantly from light-harvesting complexes of other purple photosynthetic bacteria. A carotenoid-protein complex is also described; it is an apparent association of three proteins and carotenoid and is closely associated with the peripheral antenna. The purple-sulfur bacteria are evolutionarily older than the relatively better characterized purple-nonsulfur organisms. The phenotypic features described here of the C. purpuratum photosynthetic apparatus are related to those of other purple bacteria and green-sulfur bacteria and may reflect the evolutionary position of this organism.

Two complexes, the reaction center light-harvesting complex 1 (RC-LH1) and the B820 subunit of the LH1, have been isolated and characterized from the purple-sulfur photosynthetic bacterium Chromatium purpuratum. The RC-LH1 consists of the B870 antenna and a P-870 RC with an associated tetraheme cytochrome. This complex can be further fractionated to yield the B820 subunit of the LH1. The C. purpuratum B820 subunit is the first isolated from a purple-sulfur bacterium. It is also the first that retains its carotenoid absorption properties. CD spectra in the Qy region of bacteriochlorophyll a in both the RC-LH1 and the B820 subunit are bathochromically shifted as compared to other such complexes. Comparison of the sequence of the LH1 beta polypeptide to other LH1 beta polypeptides reveals the presence of additional aromatic amino acids in the vicinity of both of the conserved histidines in the C. purpuratum beta polypeptide. The CD spectra of these C. purpuratum pigment-protein complexes can be interpreted in terms of exciton interaction between bacteriochlorophylls in the B820 subunit of the LH1 and in the B870, with additional spectral characteristics arising from interactions of the pigments with their protein environment.

Neurotrophin-3 (NT-3) has been crystallized in 2 forms. Orthorhombic crystals, space group P2(1)2(1)2, diffracted to 2.8 A and have cell dimensions a = 39.1 A, b = 54.0 A, and c = 65.5 A. The second form is space group P4(3)2(1)2, with cell dimensions a = b = 67.1 A, and c = 107.9 A. The tetragonal crystals diffract to 2.8 A at room temperature and 2.5 A at -100 degrees C. The unit cell dimensions change significantly upon freezing, a = b = 66.1 A, and c = 102.8 A. Phases for the orthorhombic form were obtained by molecular replacement using nerve growth factor as the search model. A partially refined model of the NT-3 dimer (75% complete) was then oriented and positioned in the tetragonal cell.

Rotation functions between Patterson functions can be calculated and analyzed more efficiently when it is possible to consider only a unique or asymmetric region of rotation space. Previous authors have succeeded in characterizing the symmetries and asymmetric units of rotation functions between Patterson functions whose symmetries are less than cubic. Here we describe a simple and general solution that applies to rotation functions between Patterson functions of any symmetry, including cubic. The method relies on partitioning rotation space into Dirichlet domains.

The structure determination of a macromolecule from a hemihedrally twinned crystal specimen with a twinning fraction of one-half is described. Twinning was detected by analysis of crystal-packing density and intensity statistics. The structure was solved using molecular replacement, and the positioned search model was used to overcome the twinning by a novel method of 'detwinning' the observed data. Estimates of the unobservable crystallographic intensities from each of the twin domains were obtained and used to refine the model. The structure of a new algal plastocyanin from Chlamydomonas reinhardtii was determined by this method to 1.6 A resolution with a 'twinned' R factor of 15.6%. Additional data from a crystal specimen with a low twinning fraction were used to establish the accuracy of the structure solution from the perfectly twinned data, and to finalize the refinement to 1.5 A resolution and a true R factor of 16.8%. Methods for detecting twinning and obtaining a molecular-replacement solution in the presence of twinning are discussed.
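The reason perfect twinning calls for a special approach can be seen from the standard algebra for hemihedral twinning. The relations below are a sketch using assumed notation that does not appear in the abstract: J_1 and J_2 are the observed twin-related intensities, I_1 and I_2 the true crystallographic intensities, and alpha the twin fraction.

```latex
\begin{aligned}
J_1 &= (1-\alpha)\, I_1 + \alpha\, I_2, &
J_2 &= \alpha\, I_1 + (1-\alpha)\, I_2, \\[4pt]
I_1 &= \frac{(1-\alpha)\, J_1 - \alpha\, J_2}{1 - 2\alpha}, &
I_2 &= \frac{(1-\alpha)\, J_2 - \alpha\, J_1}{1 - 2\alpha}.
\end{aligned}
```

For partial twinning the second line recovers the true intensities directly. At alpha = 1/2 the denominator vanishes and J_1 = J_2, so the observations alone cannot separate I_1 from I_2, and extra information such as a positioned molecular-replacement model is needed to estimate them.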

A novel method for differentiating between correctly and incorrectly determined regions of protein structures based on characteristic atomic interaction is described. Different types of atoms are distributed nonrandomly with respect to each other in proteins. Errors in model building lead to more randomized distributions of the different atom types, which can be distinguished from correct distributions by statistical methods. Atoms are classified in one of three categories: carbon (C), nitrogen (N), and oxygen (O). This leads to six different combinations of pairwise noncovalently bonded interactions (CC, CN, CO, NN, NO, and OO). A quadratic error function is used to characterize the set of pairwise interactions from nine-residue sliding windows in a database of 96 reliable protein structures. Regions of candidate protein structures that are mistraced or misregistered can then be identified by analysis of the pattern of nonbonded interactions from each window.
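The counting step behind this kind of analysis can be sketched as follows. This is an illustration under assumptions rather than the published procedure: the 3.5 angstrom cutoff, the rule that atoms in the same residue are skipped as a stand-in for "noncovalently bonded", the tuple layout, and the function names are all hypothetical, and the quadratic error function itself is not reproduced.

```python
import math
from collections import Counter
from itertools import combinations

PAIR_TYPES = ("CC", "CN", "CO", "NN", "NO", "OO")

def pair_fractions(atoms, cutoff=3.5):
    """Fractions of the six pair types among close contacts.

    atoms: list of (element, residue_number, (x, y, z)) with element in C, N, O.
    """
    counts = Counter()
    for (e1, r1, p1), (e2, r2, p2) in combinations(atoms, 2):
        if r1 == r2:
            continue  # crude stand-in for excluding covalently bonded neighbors
        if math.dist(p1, p2) <= cutoff:
            counts["".join(sorted((e1, e2)))] += 1
    total = sum(counts.values())
    return {t: (counts[t] / total if total else 0.0) for t in PAIR_TYPES}

def sliding_windows(atoms, first, last, window=9, cutoff=3.5):
    """Yield (start_residue, fractions) for each window of consecutive residues."""
    for start in range(first, last - window + 2):
        subset = [a for a in atoms if start <= a[1] < start + window]
        yield start, pair_fractions(subset, cutoff)

# Tiny hypothetical example with four atoms in three residues.
atoms = [
    ("C", 1, (0.0, 0.0, 0.0)),
    ("N", 2, (3.0, 0.0, 0.0)),
    ("O", 3, (0.0, 3.0, 0.0)),
    ("C", 1, (1.0, 0.0, 0.0)),
]
fractions = pair_fractions(atoms)
print(fractions)
```

In a real application the per-window fractions would be compared with the distribution seen in reliable structures, and windows that deviate strongly would be flagged for inspection.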

The crystal structure of plastocyanin from the green alga Chlamydomonas reinhardtii has been determined at 1.5-A resolution with a crystallographic R factor of 16.8%. Plastocyanin is a small (98 amino acids), blue copper-binding protein that catalyzes the transfer of electrons in oxygenic photosynthesis from cytochrome f in the quinol oxidase complex to P700+ in photosystem I. Chlamydomonas reinhardtii plastocyanin is an eight-stranded, antiparallel beta-barrel with a single copper atom coordinated in quasitetrahedral geometry by two imidazole nitrogens (from His-37 and His-87), a cysteine sulfur (from Cys-84), and a methionine sulfur (from Met-92). The molecule contains a region of negative charge surrounding Tyr-83 (the putative distant site of electron transfer) and an exclusively hydrophobic region surrounding His-87; these regions are thought to be involved in the recognition of reaction partners for the purpose of directing electron transfer. Chlamydomonas reinhardtii plastocyanin is similar to the other plastocyanins of known structure, particularly the green algal plastocyanins from Enteromorpha prolifera and Scenedesmus obliquus. A potential "through-bond" path of electron transfer has been identified in the protein that involves the side chain of Tyr-83, the main-chain atoms between residues 83 and 84, the side chain of Cys-84, the copper atom, and the side chain of His-87.

The crystal structure of V510, a chimeric type 2/type 1 poliovirus, has been determined at 2.6 A resolution. Unlike the parental Mahoney strain of type 1 poliovirus, V510 is able to replicate in the mouse central nervous system, due entirely to the replacement of six amino acids in the exposed BC loop of capsid protein VP1. Significant structural differences between the two strains cluster in a major antigenic site of the virus, located at the apex of the radial projection which surrounds the viral five-fold axis. Residues implicated in the mouse-virulence of poliovirus by genetic studies are located in this area, and include the residues which are responsible for stabilizing the conformation of the BC loop in V510. Despite evidence that this area is not involved in receptor binding in cultured primate cells, the genetic and structural observations suggest that this area plays a critical role in receptor interactions in the mouse central nervous system. These results provide a structural framework for further investigation of the molecular determinants of host and tissue tropism in viruses.

A program is described that performs least-squares group refinement of oriented molecular replacement models whose positions in the unit cell are unknown. The program (INTREF) is designed to produce improved models for use in a translation function by optimizing the orientations and relative translations of the model domains. The molecular contents of the asymmetric unit are refined as a small number of rigid bodies whose origins relative to each other may be unknown. More than one molecule in the asymmetric unit can be accommodated. The refinement seeks to minimize the residual error between the observed and calculated intensities that have been modified to produce the equivalent of a radial weighting in Patterson space. Calculated intensities include contributions from all symmetry-related molecules, enabling meaningful refinement in high-symmetry space groups. Derivatives of the intensities with respect to the rigid-body parameters are evaluated numerically using fast Fourier transforms and the shifts are obtained by non-linear least-squares analysis. Results with test cases show that the program is capable of adjusting the orientations and relative translations of protein domains to give models that more closely resemble the known structures. Consequently, the resulting models produce more accurate and more interpretable results in translation functions. The importance of including all crystallographically related molecules and of downweighting the contribution of the longer-radius region of the Patterson function is demonstrated.

A simple method is described for determining the reference coordinate system of a list of atomic coordinates. The reference system is characterized by finding the optimal metric tensor on the basis of the expected bond lengths. The ability to identify the correct frame of reference is important for structures solved in non-orthogonal unit cells.
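The role of the metric tensor can be shown with the standard crystallographic distance formula, d^2 = delta^T G delta, where delta is a coordinate difference and G is built from the unit-cell parameters. The sketch below does not fit an optimal tensor as the abstract describes. It only shows how a candidate frame can be checked against an expected bond length, using a hypothetical monoclinic cell and hypothetical coordinates.

```python
import math

def metric_tensor(a, b, c, alpha, beta, gamma):
    """Metric tensor for a unit cell (lengths in angstroms, angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return [
        [a * a, a * b * cg, a * c * cb],
        [a * b * cg, b * b, b * c * ca],
        [a * c * cb, b * c * ca, c * c],
    ]

def distance(g, p, q):
    """Distance between points p and q in the frame described by metric tensor g."""
    d = [qi - pi for pi, qi in zip(p, q)]
    return math.sqrt(sum(d[i] * g[i][j] * d[j] for i in range(3) for j in range(3)))

# The identity tensor corresponds to coordinates already in orthogonal angstroms.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical cell and a hypothetical pair of atoms given in fractional coordinates.
cell = metric_tensor(10.0, 20.0, 30.0, 90.0, 120.0, 90.0)
p, q = (0.1, 0.2, 0.3), (0.2, 0.2, 0.4)

as_fractional = distance(cell, p, q)       # about 2.65 angstroms
as_orthogonal = distance(IDENTITY, p, q)   # about 0.14 angstroms, far too short for any bond
print(round(as_fractional, 3), round(as_orthogonal, 3))
```

Comparing such distances with expected bond lengths over many atom pairs is one way to tell which frame of reference a coordinate list is likely to be in.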

1988

The statistics of intensity data from hemihedrally twinned specimens are analyzed in terms of a new parameter and are shown to take a simple form in both the centrosymmetric and non-centrosymmetric cases. This analysis provides a sensitive method for determining the twinning fraction. The effects of intensity measurement errors on the observed statistics are discussed.
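A small simulation can show how a ratio statistic yields the twin fraction. The sketch assumes the parameter is H = |J1 - J2| / (J1 + J2) for twin-related observed intensities J1 and J2, which the abstract does not spell out, and it assumes ideal acentric Wilson statistics without measurement error. Under those assumptions the mean of H is 1/2 minus the twin fraction. Function and variable names are hypothetical.

```python
import random

def observed_pair(i1, i2, alpha):
    """Mix two true intensities into the two twin-related observed intensities."""
    return (1 - alpha) * i1 + alpha * i2, alpha * i1 + (1 - alpha) * i2

def estimate_twin_fraction(pairs):
    """Estimate the twin fraction from acentric pairs using <H> = 1/2 - alpha."""
    h_mean = sum(abs(j1 - j2) / (j1 + j2) for j1, j2 in pairs) / len(pairs)
    return 0.5 - h_mean

rng = random.Random(1)
true_alpha = 0.2
pairs = [
    observed_pair(rng.expovariate(1.0), rng.expovariate(1.0), true_alpha)
    for _ in range(100_000)
]
estimate = estimate_twin_fraction(pairs)
print(round(estimate, 3))  # expected to be close to the simulated value of 0.2
```

Measurement errors, which the abstract notes are discussed in the paper, would bias such an estimate in real data, so the simulation should be read as an idealized illustration.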

The three-dimensional structures of the cofactors and protein subunits of the reaction center (RC) from the carotenoidless mutant strain of Rhodobacter sphaeroides R-26 and the wild-type strain 2.4.1 have been determined by x-ray diffraction to resolutions of 2.8 A and 3.0 A with R values of 24% and 26%, respectively. The bacteriochlorophyll dimer (D), bacteriochlorophyll monomers (B), and bacteriopheophytin monomers (phi) form two branches, A and B, that are approximately related by a twofold symmetry axis. The cofactors are located in hydrophobic environments formed by the L and M subunits. Differences in the cofactor-protein interactions between the A and B cofactors, as well as between the corresponding cofactors of Rb. sphaeroides and Rhodopseudomonas viridis [Michel, H., Epp, O. & Deisenhofer, J. (1986) EMBO J. 5, 2445-2451], are delineated. The roles of several structural features in the preferential electron transfer along the A branch are discussed. Two bound detergent molecules of beta-octyl glucoside have been located near BA and BB. The environment of the carotenoid, C, that is present in RCs from Rb. sphaeroides 2.4.1 consists largely of aromatic residues of the M subunit. A role of BB in the triplet energy transfer from D to C and the reason for the preferential ease of removal of BB from the RC are proposed.

The three-dimensional structure of the reaction center (RC) from Rhodobacter sphaeroides has been determined by x-ray diffraction to a resolution of 2.8 A with an R value of 24%. The interactions of the protein with the primary quinone, QA, secondary quinone, QB, and the nonheme iron are described and compared to those of RCs from Rhodopseudomonas viridis. Structural differences between the QA and QB environments that contribute to the function of the quinones (the electron transfer from QA- to QB and the charge recombination of QA-, QB- with the primary donor) are delineated. The protein residues that may be involved in the protonation of QB are identified. A pathway for the doubly reduced QB to dissociate from the RC is proposed. The interactions between QB and the residues that have been changed in herbicide-resistant mutants are described. The environment of the nonheme iron is compared to the environments of metal ions in other proteins.

Photosynthetic reaction centers from purple bacteria exhibit an approximate twofold symmetry axis, which relates both the cofactors and the L and M subunits. For the reaction center from Rhodobacter sphaeroides, deviations from this twofold symmetry axis have been quantitated by superposing, by a 180 degrees rotation, the cofactors of the B branch onto the A branch and the M subunit onto the L subunit. An alignment of the sequences of the L and M subunits from four purple bacteria, one green bacterium, and the D1 and D2 subunits of a photosystem II-containing green alga is presented. The residues that are conserved in all six species are shown in relation to the structure of Rb. sphaeroides and their possible role in the function of the reaction center is discussed. A method is presented for characterizing the exposure of alpha-helices to the membrane based on the periodicity of conserved residues. This method may prove useful for modeling the three-dimensional structures of membrane proteins.
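One simple way to quantify periodicity in a per-residue property along a helix is a Fourier-style moment evaluated at 100 degrees per residue, the rotation of an ideal alpha-helix (3.6 residues per turn). The sketch below is an assumption about how such an analysis might look, not the method of the paper. The conservation scores are synthetic and the function name is hypothetical.

```python
import cmath
import math

def periodicity_moment(scores, degrees_per_residue=100.0):
    """Complex moment of a per-residue score at the given angular period."""
    omega = math.radians(degrees_per_residue)
    return sum(s * cmath.exp(1j * omega * k) for k, s in enumerate(scores))

# Synthetic 18-residue profile (five full turns) whose conservation peaks on one helix face.
profile = [1.0 + math.cos(math.radians(100.0) * k) for k in range(18)]

helical = abs(periodicity_moment(profile, 100.0))     # strong signal at the helical period
off_period = abs(periodicity_moment(profile, 160.0))  # essentially no signal elsewhere
print(round(helical, 3), round(off_period, 3))
```

The magnitude of the moment measures how strongly conservation follows the helical repeat, and its phase indicates which face of the helix carries the conserved residues. Since membrane-exposed residues tend to be less conserved, the opposite face would be the candidate for lipid exposure.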

The three-dimensional structure of the cofactors of the reaction center of Rhodobacter sphaeroides R-26 has been determined by x-ray diffraction and refined at a resolution of 2.8 A with an R value of 26%. The main features of the structure are similar to the ones determined for Rhodopseudomonas viridis [Michel, H., Epp, O. & Deisenhofer, J. (1986) EMBO J. 5, 2445-2451]. The cofactors are arranged along two branches, which are approximately related to each other by a 2-fold symmetry axis. The structure is well suited to produce light-induced charge separation across the membrane. Most of the structural features predicted from physical and biochemical measurements are confirmed by the x-ray structure.

The energetics of membrane-protein interactions are analyzed with the three-dimensional model of the photosynthetic reaction center (RC) from Rhodobacter sphaeroides. The position of the RC in the membrane and the thickness of the membrane were obtained by minimizing the hydrophobic energy with the energy function of Eisenberg and McLachlan. The 2-fold symmetry axis that relates the L and M subunits is, within the accuracy of 5 degrees, parallel to the normal of the membrane. The thickness of the membrane is estimated to be 40-45 A. Residues that are exposed to the membrane are relatively poorly conserved in the sequences of homologous RC proteins. The surface area of the RC is comparable to the surface areas of water-soluble proteins of similar molecular weight. The volumes of interior atoms in the RC are also similar to those of water-soluble proteins, indicating the same compact packing for both types of proteins. The electrostatic potential of the cofactors was calculated. The results show an asymmetry in the potential between the two possible pathways of electron transfer, with the A branch being preferred electrostatically.

The three-dimensional structure of the protein subunits of the reaction center (RC) of Rhodobacter sphaeroides has been determined by x-ray diffraction at a resolution of 2.8 A with an R factor of 26%. The L and M subunits each contain five transmembrane helices and several helices that do not span the membrane. The L and M subunits are related to each other by a 2-fold rotational symmetry axis that is approximately the same as that determined for the cofactors. The H subunit has one transmembrane helix and a globular domain on the cytoplasmic side, which contains a helix that does not span the membrane and several beta-sheets. The structural homology with RCs from other purple bacteria is discussed. A structure of the complex formed between the water soluble cytochrome c2 and the RC from Rb. sphaeroides is proposed.

Crystals of the reaction center (RC) from Rhodopseudomonas sphaeroides with the space group P2(1)2(1)2(1) have been studied by x-ray diffraction. The Patterson search (molecular replacement) technique was used to analyze the data, with the structure of the reaction center from Rhodopseudomonas viridis as a model system. A preliminary electron density map of the reaction center from R. sphaeroides has been obtained. Comparison of the structure of the RC from R. sphaeroides with that from R. viridis showed the following conserved features: five membrane-spanning helices in each of the L and M subunits, a single membrane-spanning helix in the H subunit, a 2-fold symmetry axis, and similar positions and orientations of the cofactors. Unlike the RCs from R. viridis, both quinones are retained in the RCs from R. sphaeroides. The secondary quinone is located near the position related by the 2-fold symmetry axis to the primary quinone.