While genome projects and individual labs have deciphered the nucleotide sequences of genes and the linear protein sequences of their gene products, the functions of proteins and other biomolecules ultimately depend upon their shape. Because of this, the study of structural biology is an important complement to genomics. Together, those fields contribute insights into the biology of thousands of organisms and provide a foundation for yet more research on protein functions and classifications, the chemicals to which they bind, biological systems, and more.

In the illustration to the right, for example, the P53 tumor suppressor (accession 1TUP) is bound to double-stranded DNA, as viewed in the free Cn3D program. The three-dimensional structure shows the functional shape of the protein and can be used to infer the specific amino acids that are active in binding to DNA. Here, yellow spheres represent amino acids within 5 Angstroms of the DNA strands. (Click on the image for step by step instructions on how to generate that particular view using Cn3D.) A number of the mutations (allelic variants) observed in patients with Li-Fraumeni syndrome and various cancers appear to have occurred in or near those regions of the protein, based on an alignment of the 393 amino acid TP53 protein discussed in Online Mendelian Inheritance in Man (OMIM 191170) to the 3D structure's protein sequence data. Together, the sequence data, 3D structure, and phenotypic observations yield a greater understanding of the protein and its biological function than any one of them alone could. Open the structure record (accession 1TUP) to read more about it, download the Cn3D program, and interactively view the structure and its corresponding sequence data.

Throughout this help document, the structures of the P53 tumor suppressor (1TUP) and prostaglandin-endoperoxide synthase (1PTH, discussed in the sequence-structure-function section of this document) are used in search examples and illustrations to show the ways in which the Molecular Modeling database can be searched and to describe the contents and features of a structure record.

Four Levels of Protein Structure

A linear protein (referred to as the primary structure) consists of amino acids with varying chemical properties. Forces of attraction among the amino acids cause regions of the protein molecule to fold into one of two basic shapes, which are referred to as secondary structures and take the shape of alpha-helices and beta-sheets (also known as pleated-sheets). Depending on its length and composition, a single protein molecule can contain one or more secondary structures; for example, some regions of the molecule might fold into alpha-helices while another folds into a beta-sheet. The three-dimensional shape of the complete protein molecule is called its tertiary structure. Some biological molecules are composed of two or more proteins that are assembled into a complex, and the shape of the overall complex is called its quaternary structure. These levels of structure are shown in the illustration to the right.

An example of a biomolecule with a quaternary structure is the human P53 tumor suppressor (accession 1TUP). It is composed of three protein molecules, as shown in brown, blue, and pink portions of the illustration for "what are macromolecular structures?". Open the 1TUP record in MMDB and then view it in the Cn3D program to see: (a) its linear protein sequences (primary structures); (b) the secondary structures into which each protein molecule folds (alpha helices are shown as green spirals and beta sheets as yellow bands in Cn3D's default view); and (c) how the three proteins come together (tertiary and quaternary structures) to form the biologically active molecule that binds with DNA.

Experimental Methods

Most structure data are obtained from X-ray crystallography and NMR-spectroscopy. X-ray crystallography determines the arrangement of atoms within a protein by passing X-rays through a crystallized form of the protein and analyzing the resulting X-ray diffraction pattern. This technique provides the highest resolution and usually yields only one model of a structure. Nuclear magnetic resonance (NMR) determines the structure of a protein in solution and generally yields multiple models, which allow for characterization of the biomolecule's motion in solution. An example of each type of structure is shown in the section of this document on "record types", and additional experimental methods are listed in the ExpMethod search field of the database.

As an alternative to these experimental methods, some researchers use computational modeling to predict the structure of a protein by simulating the forces that act on each atom in a molecule of known composition. However, this method produces non-experimental models and the least reliable results. For these reasons, the Molecular Modeling Database excludes computationally generated structures or other theoretical models and includes only experimentally determined structures.

How can 3D structures be used to learn more about proteins and other biomolecules?

Examine Sequence-Structure-Function Relationships:
The sequence-structure relationship of all structures in the Molecular Modeling Database can be interactively explored using the free Cn3D software program. In addition, when structures include a bound chemical or other observed interactions, the function of the biomolecule is elucidated. For example, the illustration to the right shows the 3-D structure of an ovine prostaglandin H2 synthase protein (1PTH), which reveals the inferred structural basis of aspirin activity. The homologous human protein (NP_000953, prostaglandin-endoperoxide synthase 1) does not yet have a resolved structure but can be aligned to the sheep's protein sequence in Cn3D, and the relationship between the two sequences and corresponding 3D structure can then be examined interactively. Click on the image to view step by step instructions on how to do this. The Cn3D tutorial provides additional details on how to use the program.

View 3D Structures of Conserved Core Motifs:
The Conserved Domain Database (CDD), a related resource maintained by the NCBI Structure Group, includes an NCBI-curated data set whose goal is to provide insights into how patterns of residue conservation and divergence in a protein family relate to functional properties, and to provide useful links to more detailed information that may help to understand those sequence/structure/function relationships. To achieve this, the curators combine information about conserved domains from multiple sequence alignments with what we can infer from three-dimensional structure and three-dimensional structure superposition. As a result, the NCBI-curated conserved domain records include representations of conserved structural core motifs whenever possible, and the 3D structure images in the domain's conserved feature summary box link to specially annotated views of the 3D structures that highlight the conserved feature.

Identify Putative Active Site Residues:
The free Cn3D program can be used to identify putative active site residues. To do this, use the "Show/Hide:Select by Distance" option to highlight amino acids within a specified distance (e.g., 5 Angstroms) of a molecule of interest. Examples are shown in the image to the right and in the human P53 Tumor Suppressor protein image shown in "What are macromolecular structures?". Click on either image to open a separate page with step-by-step instructions on how to generate that view. The Cn3D tutorial provides additional details on how to use the program.
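The distance-based selection described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Cn3D's implementation: the function name `residues_within`, the input layout, and the toy coordinates are all assumptions made for the example.

```python
import math

def residues_within(protein_atoms, target_atoms, cutoff=5.0):
    """Return the sorted IDs of residues that have at least one atom
    within `cutoff` Angstroms of any target atom.

    protein_atoms: iterable of (residue_id, (x, y, z)) pairs
    target_atoms:  iterable of (x, y, z) tuples, e.g. atoms of a bound molecule
    """
    targets = list(target_atoms)
    selected = set()
    for residue_id, xyz in protein_atoms:
        if residue_id in selected:
            continue  # one close atom is enough to select the residue
        if any(math.dist(xyz, t) <= cutoff for t in targets):
            selected.add(residue_id)
    return sorted(selected)

# Toy coordinates (hypothetical, not taken from any structure record)
protein = [
    (10, (0.0, 0.0, 0.0)),
    (10, (1.5, 0.0, 0.0)),
    (11, (9.0, 0.0, 0.0)),
    (12, (3.0, 0.0, 0.0)),
]
bound_molecule = [(0.0, 0.0, 4.0), (20.0, 0.0, 0.0)]

near = residues_within(protein, bound_molecule)
print(near)  # [10, 12]
```

Residues selected this way are only candidates; whether they are functionally important still has to be judged from the structure and the literature.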

The NCBI-curated data set in CDD also identifies amino acids involved in catalysis and binding whenever possible and describes their function in the conserved feature summary box of a conserved domain record. The specific amino acids involved in the conserved feature are marked with hash signs (#) in the domain model's multiple sequence alignment and highlighted in specially annotated 3D structures, when available.

Useful Features of the Molecular Modeling Database

Facilitate computation on 3D structure data

Uniform processing and validation of 3D structure data enables a variety of computational analyses within individual structure records and across the complete MMDB database, in order to identify salient features of 3D structures and relationships among them.

The results of the analyses, along with the connection of structure records to associated data throughout the Entrez system, permit the retrieval of data sets that have certain attributes, as well as the association of proteins that do not yet have resolved 3D structures with those that do. For example, in MMDB it is possible to:

A variety of computational analyses are performed during MMDB data processing in order to identify salient features of individual 3D structures, and to identify relationships among structures across the database:

Structure data are also incorporated into NCBI-curated conserved domains whenever possible in order to combine information that has been derived from multiple sequence alignments with what we can infer from three-dimensional structure and three-dimensional structure superposition, providing insights into how patterns of residue conservation and divergence in a protein family relate to functional properties. These sequence-structure associations also make it possible to view 3D structures of conserved core motifs and identify putative active site residues.

Evolutionary relationships among 3D structures

The Vector Alignment Search Tool (VAST) computer algorithm was developed to identify similar protein 3-dimensional structures by purely geometric criteria, and to identify distant homologs that cannot be recognized by sequence comparison.

To do this, VAST identifies 3D domains (substructures) within each protein structure in the Molecular Modeling Database (MMDB), and then finds other structures that contain similarly shaped protein molecules. This output, referred to as "Original VAST," reflects comparisons between individual protein molecules, which can share a similar shape along their entire length, or only along a fraction of their length, such as a single 3D domain.

In addition, VAST+, an expanded version of the program, finds macromolecular structures that have similarly shaped biological units (also referred to as "biounits"), not just those that share similarly shaped individual protein molecules or fragments.

(The VAST Search page can also be used to compare the coordinates of a newly resolved structure in PDB format against all structures in MMDB to find its neighbors.)

Interactive views of sequence-structure relationships

All structures in MMDB can be viewed with the free Cn3D program, which was developed as a companion resource in order to visualize three-dimensional structures with an emphasis on interactive examination of sequence-structure relationships. Specifically, Cn3D simultaneously displays a 3D structure and its corresponding sequence data, and allows you to select items of interest (e.g., entire protein or nucleotide molecules, spans of sequence data, or individual amino acids or nucleotides, as desired) in either view in order to examine their location in both views. An example is shown featuring the human P53 tumor suppressor, in which amino acids within 5 Angstroms of the bound DNA are highlighted in yellow in Cn3D's structure and sequence view windows.

For each structure in MMDB, the data processing procedure identifies associated literature, molecular, and chemical data throughout the Entrez system, and then establishes connections among those data sets. These related data are accessible as Links on the MMDB search results and structure summary pages.

The content of an individual structure record reflects the data provided by the submitter, and the literature associated with a structure record provides more details about it. Note that various data submitters might use different terminology to describe the same gene or protein (for example, some might use the term "suppressor" while others use the term "inhibitor"), so it is often helpful to include synonyms, such as acronyms, full spellings, and disease names, if appropriate, when searching the database (see search tips).

When PDB structure records are imported into MMDB, the information in each structure record is reorganized and validated in a way that enables cross-referencing between the chemistry and the three-dimensional structure of macromolecules. While the PDB data model provides an elegant and concise description of a crystal structure, there is no one-to-one correspondence between a site, a structure, and an atom in the chemical sense. MMDB provides this chemical information in an explicit manner. Its data specification includes a description of a biopolymer's spatial structure, a description of how it is organized chemically, and a set of pointers linking the two.

The first step in creating MMDB is getting an accurate sequence that is consistent with the atom site coordinates in PDB. For example:

The SEQRES records in an original PDB file are generally intended to represent the molecule that was purified, crystallized, and measured. However, it might not have been possible to experimentally resolve the atomic coordinates for all of the amino acids in some structures, especially in flexible regions of proteins such as the N- and C-termini. In addition, sometimes the atomic coordinates might indicate the presence of additional residues not listed in the SEQRES records. In the latter case, MMDB derives the biopolymer sequence from the atomic coordinates and not from the original SEQRES records. The derived biopolymer sequence will then appear in the MMDB record, and in the SEQRES records of the PDB-formatted file saved from the MMDB database.

Some PDB records may have discontinuous residue numbers, which exist in a free text field. MMDB assigns a consecutive series of positive integers to residues in biopolymers, using a numerical data field. This ensures correspondence between the residue numbers in the structure file and those in the corresponding protein and/or nucleotide sequence records.

The second step is to construct a complete chemical graph for the molecule, representing all bonds and chirality. An important component of this second step matches the amino acid and nucleotide groups defined by PDB against a dictionary that defines all bond and atom types.

The third and final step is to recover disorder information in the structure.

(Note: Because such changes may occur during data processing, the content of a PDB-formatted file that you save from the MMDB database might differ from the original PDB file.)
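The first processing step, deriving a sequence from the residues seen in the coordinates and giving them consecutive integer numbers, might be sketched as follows. This is a simplified illustration rather than MMDB's actual procedure: the function `renumber_chain`, the input layout, and the sample residue labels are hypothetical, and the sketch does not address unresolved residues.

```python
# Standard three-letter to one-letter amino acid codes
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def renumber_chain(residues):
    """Derive a one-letter sequence and a consecutive numbering for a chain.

    residues: list of (author_label, three_letter_name) in coordinate order,
              where author_label is the free-text residue number from the file.
    Returns (sequence, numbering), with numbering mapping 1, 2, 3, ... to the
    original author label so the two schemes can be cross-referenced.
    """
    letters = []
    numbering = {}
    for new_number, (author_label, name) in enumerate(residues, start=1):
        letters.append(THREE_TO_ONE.get(name, "X"))  # "X" for unknown residues
        numbering[new_number] = author_label
    return "".join(letters), numbering

# Hypothetical chain with a numbering gap and an insertion code
chain = [("94", "SER"), ("95", "SER"), ("95A", "VAL"), ("101", "PRO")]
sequence, numbering = renumber_chain(chain)
print(sequence)   # SSVP
print(numbering)  # {1: '94', 2: '95', 3: '95A', 4: '101'}
```

Keeping the map back to the author's labels is a design choice for the example; it lets a position in the derived sequence be traced to the residue label used in the original file.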

In addition to providing the spatial (x,y,z) coordinates of every atom in a 3D macromolecular structure, a structure record includes the sequence data for each component nucleotide (DNA, RNA) and/or protein molecule. As part of MMDB data processing, the sequence data for each molecule are deposited into the Entrez Nucleotide or Entrez Protein database, as appropriate. The data processing procedures for those databases, in turn, identify relationships (i.e., similarities) among the sequence data from 3D structures and the other sequences in those databases, facilitating the use of 3D structure data to learn more about proteins and other biomolecules.

The biochemically active form of a biomolecule can range from a monomer (single protein molecule) to an oligomer of 100+ protein molecules, and is referred to as the "biological unit" for brevity.

The raw data present in structure records resolved by x-ray crystallography or neutron diffraction of a crystal are often casually referred to as the "asymmetric unit." These data can represent either: (a) the complete biological unit, (b) a portion of the biological unit, or (c) multiple copies of the biological unit, as in the human hemoglobin examples shown below. Authors of structure records use programs such as PISA to identify the biological unit within a structure record. If multiple interpretations of the biological unit exist, the author may choose to annotate the various interpretations in their record. The MMDB data processing pipeline applies several procedures to identify a structure's biological unit(s) and displays it by default on a structure summary page. (See technical note about asymmetric unit.)

The asymmetric unit is equivalent to the biological unit in approximately 60% of structure records resolved by x-ray crystallography or neutron diffraction of crystals. In the remaining 40% of the records, the asymmetric unit represents a portion of the biological unit that can be reconstructed using crystallographic symmetry, or it represents multiple copies of the biological unit.

Additionally, some structures exceed the size limits implicit to the PDB file format and are therefore split by PDB into several files. In those cases, the biological unit might be spread across multiple PDB files. The MMDB data processing pipeline merges the split files into a single structure record. In such cases, "asymmetric unit" is the only display option for merged PDB split files from crystallographic studies, because the biological unit of the complete structure is not specified in a computer readable way in the PDB source files. The structure summary page for a merged crystallographic structure therefore simply uses the label of "asymmetric unit" above the molecular graphic, because it represents the unification of raw data from the original PDB files. The asymmetric unit can represent the structure's complete biological unit, a portion of the biological unit, or multiple copies of the biological unit. In the case of structures resolved by electron microscopy (EM) or nuclear magnetic resonance (NMR), the term "asymmetric unit" does not apply, and the term "biological unit" is shown instead on the summary page for a merged structure from either of those technologies. Please refer to the corresponding publication for a structure, if/as available, for the author's description of its biologically active form.

Asymmetric unit (raw data) → Biological unit (default display)

Example: To see the varying degrees to which a biological unit can be represented by the raw data in a structure record, compare the following records for human hemoglobin. Each one contains the spatial coordinates and sequence data for a different number of protein molecules, yet the fundamental biological unit in all three structures is a tetramer consisting of two alpha subunits, two beta subunits, and four heme groups. By default, an MMDB structure summary page displays the biological unit:

(Although the raw data in this structure record represents only half of the tetramer, MMDB's automated data processing procedure applies the transformations derived from crystallographic symmetry to generate the other half, as shown in the corresponding biological unit.)

Two copies of the tetramer (four alpha subunits and four beta subunits)

Tetramer with two alpha subunits, two beta subunits, and four heme groups. A corresponding schematic shows the interactions among the components:
The summary page also provides display options to view all biological units (if applicable) or the asymmetric unit, if desired.

Procedures to identify the biological unit(s) within a structure record:

The "REMARK 350" record of a PDB source file specifies the biological unit (oligomeric state) of the structure and lists the protein molecules of which it is composed. The REMARK 350 also indicates how the biological unit was determined -- by the author and/or a software program, and if the latter, which software program was used (e.g., PISA, PQS).

MMDB parses that information to identify the biological unit(s) within the structure record, compares biological units to each other if two or more are present in order to determine if they are similar or distinct, and uses the results of the parsing and comparison steps to provide a variety of display options for a structure, such as a concise view showing only the default biological unit, a comprehensive view of all biological units, or the asymmetric unit. If biological units are displayed, the MMDB summary page indicates the method by which each was determined, as extracted from the "REMARK 350" record of the PDB source file.
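As a rough illustration of the parsing step, the sketch below reads a few "REMARK 350" lines and records how each biological unit was determined. It is not MMDB's parser. The function name `parse_remark_350` is hypothetical, and the sample lines are illustrative text modeled on wording commonly seen in PDB files; real records vary and contain additional line types that this sketch ignores.

```python
def parse_remark_350(lines):
    """Collect basic facts about each biological unit from REMARK 350 lines.

    Assumes 'KEY: VALUE' style lines such as those in the sample below.
    """
    units = []
    for line in lines:
        if not line.startswith("REMARK 350"):
            continue
        key, sep, value = line[10:].partition(":")
        key, value = key.strip(), value.strip()
        if not sep:
            continue  # not a KEY: VALUE line (e.g. a transformation matrix row)
        if key == "BIOMOLECULE":
            units.append({"id": value, "determined_by": {},
                          "software_used": None, "chains": []})
        elif not units:
            continue  # ignore annotations that precede any BIOMOLECULE line
        elif key.startswith("AUTHOR DETERMINED"):
            units[-1]["determined_by"]["author"] = value
        elif key.startswith("SOFTWARE DETERMINED"):
            units[-1]["determined_by"]["software"] = value
        elif key == "SOFTWARE USED":
            units[-1]["software_used"] = value
        elif key.endswith("TO CHAINS"):
            units[-1]["chains"] += [c.strip() for c in value.split(",") if c.strip()]
    return units

# Illustrative sample (assumed wording, not copied from a specific record)
sample = [
    "REMARK 350 BIOMOLECULE: 1",
    "REMARK 350 AUTHOR DETERMINED BIOLOGICAL UNIT: TETRAMERIC",
    "REMARK 350 SOFTWARE DETERMINED QUATERNARY STRUCTURE: TETRAMERIC",
    "REMARK 350 SOFTWARE USED: PISA",
    "REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C, D",
]
units = parse_remark_350(sample)
print(units[0]["determined_by"])  # {'author': 'TETRAMERIC', 'software': 'TETRAMERIC'}
```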

MMDB also identifies the non-biopolymers (e.g., chemicals, ions, heme groups, etc.) that are part of the biological unit by analyzing the interactions observed within the structure. If a non-biopolymer has five or more contacts with a biopolymer at an interatomic distance of 4 Å or less, the non-biopolymer is grouped into the relevant biological unit(s). If a non-biopolymer contacts two or more biopolymers, the interaction with the greatest number of contacts takes precedence. Chemicals that are not biologically significant to the structure, such as crystallization agents, water molecules, detergents, etc. are ignored.
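The grouping rule above (five or more contacts at 4 Å or less, with the greatest number of contacts taking precedence) can be sketched as below. This is a simplified illustration under stated assumptions: a "contact" is counted here as one atom pair, ties go to whichever biopolymer is listed first, and the names `count_contacts` and `assign_ligand` are hypothetical. The source text does not specify those details.

```python
import math

CONTACT_DISTANCE = 4.0  # Angstroms, from the rule described above
MIN_CONTACTS = 5

def count_contacts(ligand_atoms, polymer_atoms, max_dist=CONTACT_DISTANCE):
    """Count atom pairs (assumption: one pair = one contact) within max_dist."""
    return sum(
        1
        for a in ligand_atoms
        for b in polymer_atoms
        if math.dist(a, b) <= max_dist
    )

def assign_ligand(ligand_atoms, polymers):
    """Return the name of the biopolymer the ligand is grouped with, or None.

    polymers: dict mapping a biopolymer name to its list of (x, y, z) atoms.
    """
    counts = {name: count_contacts(ligand_atoms, atoms)
              for name, atoms in polymers.items()}
    if not counts:
        return None
    best = max(counts, key=counts.get)  # most contacts takes precedence
    return best if counts[best] >= MIN_CONTACTS else None

# Toy coordinates (hypothetical)
ligand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
polymers = {
    "A": [(0.0, 2.0, 0.0), (1.0, 2.0, 0.0), (0.5, -2.0, 0.0)],  # 6 contacts
    "B": [(0.0, 0.0, 3.0)],                                     # 2 contacts
}
print(assign_ligand(ligand, polymers))  # A
```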

(NOTE: The biological unit display option is not available for merged PDB split files from crystallographic studies, because the biological unit of the complete structure is not specified in a computer readable way in the PDB source files. The structure summary page for a merged crystallographic structure therefore simply uses the label of "asymmetric unit." In such cases, please refer to the corresponding publication, if/as available, for the author's description of the structure's biologically active form.)

apply transformations derived from crystallographic symmetry

If the raw data in a structure record represents a portion of the biological unit, and if the "REMARK 350" record of the PDB source file specifies the rotational and translational transformations that should be applied to the raw data, MMDB automatically applies these transformations to reconstruct the complete biological unit.
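Applying a rotational and translational transformation to a set of atomic coordinates amounts to computing x' = Rx + t for every atom. The following minimal sketch shows that arithmetic; the function name `apply_transform` and the two-fold rotation operator are illustrative assumptions, not values taken from any PDB file.

```python
def apply_transform(coords, rotation, translation):
    """Return new coordinates x' = R x + t for each (x, y, z) in coords.

    rotation:    3x3 matrix as a list of three rows
    translation: list of three offsets, in Angstroms
    """
    transformed = []
    for x, y, z in coords:
        transformed.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + shift
            for row, shift in zip(rotation, translation)
        ))
    return transformed

# Hypothetical operator: a two-fold rotation about the z axis, no translation
TWO_FOLD_Z = [[-1.0, 0.0, 0.0],
              [0.0, -1.0, 0.0],
              [0.0, 0.0, 1.0]]
NO_SHIFT = [0.0, 0.0, 0.0]

half = [(1.0, 2.0, 3.0), (4.0, 0.0, -1.0)]          # raw data: half of a unit
other_half = apply_transform(half, TWO_FOLD_Z, NO_SHIFT)
complete_unit = half + other_half                    # reconstructed whole
print(other_half[0])  # (-1.0, -2.0, 3.0)
```

The original coordinates are kept and the generated copy is appended, which mirrors the idea of reconstructing a complete biological unit from a portion of it.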

If any protein or nucleotide molecules in the structure were generated by applying transformations from crystallographic symmetry, they are depicted in the interactions schematic and molecular components summary table of an MMDB summary page with labels that have alphanumeric combinations (for example, or ), indicating the source molecule from which they were generated and the copy number. Chemicals that interact only with such molecules were also generated by applying transformations from crystallographic symmetry.

compare biological units within a record to each other to identify distinct forms

If multiple biological units exist within a single structure record, or if multiple interpretations of the biological unit have been annotated in the record, MMDB uses an algorithm to compare them to each other and determine if they are similar or distinct.

Biological units are considered similar if they contain the same number and type of molecular components and meet a threshold for sequence and structural similarity. In such a case, they will be assigned the same "type" code on the MMDB summary page display of "all biological units." The thresholds currently used are 90% or more sequence similarity and an RMSD of 2 Å or less for a global superposition of the biological units. (RMSD is the root mean square superposition residual in Angstroms. This number is calculated after optimal superposition of two structures, as the square root of the mean square distances between equivalent C-alpha atoms. Note that the RMSD value scales with the extent of the structural alignments and that this size must be taken into consideration when using RMSD as a descriptor of overall structural similarity.)

Biological units are considered to be distinct if they do not meet the above thresholds. In that case, each one will be assigned a different "type" code on the MMDB summary page display of "all biological units."

For example, if the author has determined that the biological unit of the structure is a tetramer, and a software program has determined it to be a dimer, the interpretations of the biological unit are distinct from each other and each one will be assigned a different "type" code on the MMDB summary page display of all biological units, along with a corresponding annotation noting how each was determined.
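The RMSD calculation and the similarity thresholds described above can be sketched as follows. The sketch assumes the two sets of C-alpha coordinates have already been optimally superposed and paired up as equivalent atoms; the superposition itself is omitted. The function names `rmsd` and `same_type`, and the representation of molecular components as a list of labels, are assumptions made for the example.

```python
import math

def rmsd(coords_a, coords_b):
    """Square root of the mean squared distance between equivalent atoms.

    Both inputs are equal-length lists of (x, y, z) tuples that are assumed
    to be superposed already.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def same_type(components_a, components_b, sequence_similarity, rmsd_value,
              min_similarity=90.0, max_rmsd=2.0):
    """Apply the stated thresholds: same components, >= 90% sequence
    similarity, and RMSD <= 2 Angstroms."""
    return (sorted(components_a) == sorted(components_b)
            and sequence_similarity >= min_similarity
            and rmsd_value <= max_rmsd)

# Toy example (hypothetical coordinates and component labels)
unit_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
unit_2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(unit_1, unit_2)
print(value)  # 1.0
print(same_type(["protein", "protein"], ["protein", "protein"], 95.0, value))  # True
```

Because RMSD depends on how many atoms are aligned, a single value is best read together with the alignment length, as the note above points out.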

note about biological unit in merged PDB split files

Some structures exceed the size limits implicit to the PDB file format and are therefore split into several PDB files. The MMDB data processing procedures merge the PDB split files into a single structure record.

The biological unit specification is contained in a free text field of the individual PDB source files. When a structure record has been reconstructed by merging two or more PDB split files, that information cannot be parsed in an automated way for the complete structure. Therefore, only the asymmetric unit is displayed for merged crystallographic structures, representing the unification of raw data from the original PDB files. In the case of structures resolved by electron microscopy (EM) or nuclear magnetic resonance (NMR), the term "asymmetric unit" does not apply, and the term "biological unit" is shown instead on the summary page for a merged structure from either of those technologies. Please refer to the corresponding publications for those structures, if/as available, for the author's description of their biologically active form.

The raw data in a structure record (generated by x-ray crystallography or neutron diffraction) are often casually referred to as the "asymmetric unit." These data, which were submitted by the author and stored in the source PDB record, can represent either: (a) the complete biological unit (i.e., the biochemically active form of a biomolecule); (b) a portion of the biological unit; or (c) multiple copies of the biological unit, as shown in the illustrated example of three different human hemoglobin structure records. The display options on an MMDB summary page for an individual structure allow you to view your choice of biological unit(s) or asymmetric unit, with the biological unit shown by default.

The "asymmetric unit" is equivalent to the biological unit in approximately 60% of structure records.

The concepts of asymmetric unit and biological unit do not apply to structure records resolved by experimental methods other than x-ray crystallography and neutron diffraction.

Note: The technical definition of asymmetric unit is somewhat different from its casual meaning. Technically, an asymmetric unit is the smallest part of a 3D structure from which the complete structure can be built using a specific set of rotational and translational matrices that describe the symmetry of the structure. The PDB help document provides additional details.

Merging PDB split files into a single MMDB structure record

Some structures exceed the size limits implicit to the PDB file format and are therefore split into several PDB files. The MMDB data processing procedures merge the PDB split files into a single structure record. The merged structures now make it possible to display and/or download large macromolecular structures in their entirety, and to interactively view the sequence-structure relationships using Cn3D 4.3 (install).

Please note that "asymmetric unit" is the only display option for merged PDB split files from crystallographic studies, because the biological unit of the complete structure is not specified in a computer readable way in the PDB source files. The structure summary page for a merged crystallographic structure therefore simply uses the label of "asymmetric unit" above the molecular graphic, because it represents the unification of raw data from the original PDB files. The asymmetric unit can represent the structure's complete biological unit, a portion of the biological unit, or multiple copies of the biological unit. In the case of structures resolved by electron microscopy (EM) or nuclear magnetic resonance (NMR), the term "asymmetric unit" does not apply, and the term "biological unit" is shown instead on the summary page for a merged structure from either of those technologies. Please refer to the corresponding publication for a structure, if/as available, for the author's description of its biologically active form.

Example: The viral capsid for the Adeno-associated Virus Serotype 6 (Aav-6) by Xie et al. was split into PDB records 1VU0, 1VU1, 3TSX, and was merged at MMDB into a single record with the MMDB ID 99554. Click on the thumbnail image of the merged file, below, to open it interactively in Cn3D 4.3. If you do not yet have Cn3D 4.3 on your computer, install it before clicking the thumbnail. Alternatively, open the structure summary page for MMDB ID 99554 in the Molecular Modeling Database, then click on the spin icon near the bottom of the molecular graphic to interactively view the structure with iCn3D, a web-based 3D viewer that loads the structure within the web page without the need to install a separate application.

Click on the thumbnail image above to open the merged file in Cn3D 4.3 and interactively view the entire structure and its sequence data.

Example: The rat liver vault by Tanaka et al. was split into PDB records 2ZUO, 2ZV4, 2ZV5, and was merged at MMDB into a single record with the MMDB ID 99596. Click on the thumbnail image of the merged file, below, to open it interactively in Cn3D 4.3. If you do not yet have Cn3D 4.3 on your computer, install it before clicking the thumbnail.
Alternatively, open the structure summary page for MMDB ID 99596 in the Molecular Modeling Database, then click on the spin icon near the bottom of the molecular graphic to interactively view the structure with iCn3D, a web-based 3D viewer that loads the structure within the web page without the need to install a separate application.
(Note: The merged file represents half of the biological unit, as it was submitted by the author. The procedures to identify biological units cannot be applied in an automated way to a merged file; therefore, the asymmetric unit is displayed instead. Please refer to the corresponding publication for the author's description of the structure's biologically active form.)

Click on the thumbnail image above to open the merged file in Cn3D 4.3 and interactively view the entire structure and its sequence data.

Click on the thumbnail image above to open the merged file in Cn3D 4.3 and interactively view the entire structure and its sequence data.
Alternatively, open the structure summary page for MMDB ID 99580 in the Molecular Modeling Database, then click on the spin icon near the bottom of the molecular graphic to interactively view the structure with iCn3D, a web-based 3D viewer that loads the structure within the web page without the need to install a separate application.

Note: the interactions schematic, shown above and also visible on the structure summary page for MMDB ID: 99580, indicates that there are two copies of the ribosome in the structure file, reflecting the data submitted by the author.

In summary, the merged structure files, such as the viral capsid, the rat liver vault, and the ribosome illustrated above, now make it possible to view and/or download large macromolecular structures in their entirety, and to interactively view the sequence-structure relationships using Cn3D 4.3 (install) or the iCn3D web-based 3D viewer. You can also retrieve all merged files from the Molecular Modeling Database, if desired.
Please refer to the corresponding publications for those structures, if/as available, for the author's description of their biologically active form.

A contact is defined as a distance of 4 Å or less between the heavy atoms of biopolymers (proteins, DNA, and/or RNA). Interactions are identified in a pairwise fashion. For example, if protein molecules A, B, and C form a trimer, the interactions will be reported between each pair of proteins (e.g., A:B, B:C, and A:C).
Interactions between the heavy atoms of biopolymers and chemicals are also reported.

5 or more contacts

An interaction between two molecular components is reported on a structure's summary page if five or more contacts exist between those molecules.
For example, atoms from at least 5 amino acids or nucleotides in a biopolymer (protein, DNA, or RNA) must be closer than, or as close as, 4 Angstroms from one or more atoms in the "other molecule" in order for the interaction to be reported.
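The reporting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MMDB pipeline itself: the coordinates are invented toy data, the function names are hypothetical, and a "contact" is counted here as one residue of the first molecule that has a heavy atom within 4 Å of any heavy atom of the other molecule.

```python
import math
from itertools import combinations

CONTACT_CUTOFF = 4.0  # angstroms, as in the contact definition above
MIN_CONTACTS = 5      # reporting threshold, as described above


def residues_in_contact(mol_a, mol_b):
    """Count residues of mol_a with a heavy atom within the cutoff of any heavy atom of mol_b."""
    atoms_b = [atom for atoms in mol_b.values() for atom in atoms]
    return sum(
        1
        for atoms in mol_a.values()
        if any(math.dist(a, b) <= CONTACT_CUTOFF for a in atoms for b in atoms_b)
    )


def reported_interactions(molecules):
    """Check every pair of molecules and keep the pairs that reach the threshold."""
    reported = []
    for name_a, name_b in combinations(sorted(molecules), 2):
        if residues_in_contact(molecules[name_a], molecules[name_b]) >= MIN_CONTACTS:
            reported.append(f"{name_a}:{name_b}")
    return reported


# Toy coordinates (hypothetical): one heavy atom per residue, six residues per molecule.
molecules = {
    "A": {i: [(3.0 * i, 0.0, 0.0)] for i in range(6)},
    "B": {i: [(3.0 * i, 3.5, 0.0)] for i in range(6)},   # 3.5 angstroms from A
    "C": {i: [(3.0 * i, 20.0, 0.0)] for i in range(6)},  # far from both A and B
}
print(reported_interactions(molecules))
```

In this toy trimer, all three pairs (A:B, A:C, B:C) are examined, but only a pair that reaches five contacts would appear in the interaction schematic.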

Note: Ions that interact with the biomolecules in the structure but do not reach the 5 contact threshold will be absent from the interaction schematic; however, they will be listed in the tabular summary of molecular components. Interactions for short peptides, or for molecule types other than protein, DNA/RNA, and chemical, are not calculated. Molecules, such as crystallization agents, etc., that are not part of the biologically active molecule are absent from both the interaction schematic and the molecular components list.

Secondary structures (alpha helices and beta strands, as shown in the illustration on four levels of structure) in each protein molecule are identified algorithmically using purely geometric criteria, and the residue span of each secondary structure is noted in the MMDB record. (Note that because the spans are identified algorithmically, they might differ from the secondary structure residue spans annotated in the original PDB file by the data submitter.)

3D domains are compact structural units within a protein that are identified automatically during MMDB data processing using purely geometric criteria. A protein molecule can contain one or more 3D domains, which often correspond with conserved domains (illustrated example) observed in molecular evolution. Additionally, proteins that are dissimilar in sequence might contain geometrically similar 3D domains, indicating a distant homology that cannot be recognized by sequence comparison. 3D domains are used in the identification of VAST similar structures. They are also displayed as footprints on individual protein molecules (illustrated example, additional details) in the graphical portion of structure summary pages.

Identify the gene that corresponds to each protein:

gene symbols

A gene symbol, if/as available, appears beside the name of each protein molecule in the tabular list of molecular components. The protein-gene association is determined in the following way:

(2) The NCBI Gene database generates data files on its FTP site that provide mappings between protein identifiers and gene identifiers. Specifically: (a) the "gene_refseq_uniprotkb_collab.gz" file lists the correspondence between UniProt and RefSeq protein accessions; and (b) the "gene2accession.gz" file lists the correspondence between RefSeq protein accessions and Gene IDs. The MMDB data processing pipeline creates a join between these two tables in order to map each UniProt ID to its corresponding Gene ID, and to link to the NCBI Gene record.
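The join described in this step can be sketched as follows. The sketch assumes each file has already been reduced to two-column rows; the real files contain additional columns, and the identifiers below are placeholders invented for illustration, not real accessions or Gene IDs.

```python
def map_uniprot_to_gene(collab_rows, accession_rows):
    """Join UniProt->RefSeq rows with RefSeq->Gene ID rows on the RefSeq accession."""
    refseq_to_gene = {refseq: gene_id for refseq, gene_id in accession_rows}
    uniprot_to_gene = {}
    for uniprot, refseq in collab_rows:
        gene_id = refseq_to_gene.get(refseq)
        if gene_id is not None:  # skip RefSeq accessions with no Gene ID row
            uniprot_to_gene.setdefault(uniprot, set()).add(gene_id)
    return uniprot_to_gene


# Placeholder rows standing in for the two FTP files (hypothetical values).
collab_rows = [
    ("UNIPROT_1", "REFSEQ_1"),
    ("UNIPROT_2", "REFSEQ_2"),
    ("UNIPROT_3", "REFSEQ_9"),  # no matching Gene ID row below
]
accession_rows = [("REFSEQ_1", 101), ("REFSEQ_2", 102)]

mapping = map_uniprot_to_gene(collab_rows, accession_rows)
print(mapping)
```

A set is used for the Gene IDs so that a UniProt accession that maps through several RefSeq accessions is not silently overwritten.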

(Note that the protein sequence in the structure record is not necessarily identical to the protein product of the gene. For example, a structure record might only contain a fragment of the protein rather than the whole protein. So there is a mapping between the structure's protein molecule and the gene product, but not necessarily an exact sequence match.)

Identify relationships among 3D structures:

find similar 3D structures using VAST algorithm

The VAST algorithm is used to identify structures that are similar in 3D shape, regardless of their degree of sequence similarity, in order to identify distant homologs that cannot be recognized by sequence comparison. The region of similarity can span the entire length of a protein molecule, or a portion of it, as indicated by the footprints on the similar structures graphic display. If a structure contains more than one protein molecule, Similar Structures are shown for each one.

In addition, VAST+, an expanded version of the program, has been applied to each structure in MMDB in order to find macromolecular structures that have similarly shaped biological units, also referred to as "biounits".

Reciprocal links are created among the similar 3D structures and are accessible from the structure summary page by either: (a) clicking on the "Similar Structures: VAST+" link near the upper right corner of the page; or (b) viewing the Protein annotation graphic for any protein molecule of interest, then clicking on the bar graphic for the overall protein molecule or for any 3D domain it contains in order to view a list of other structures that are similar in shape to the molecule or 3D domain you selected.

Each structure record has one-to-one relationships with specific records in other Entrez databases, such as links to the protein sequence, nucleotide sequence, and chemical records that were created from the structure's molecular components.

A structure record also has links to the PubMed records for articles cited in the structure record and to the NCBI Taxonomy record(s) for the source organism(s). Reciprocal links between the structure record and these molecular component and literature records are created, making it possible to start in any one of the databases and traverse to associated records in another database.

Example: The structure 3Q5S (MMDB ID 91866): "Crystal Structure of Bmrr Bound to Acetylcholine" is composed of protein and nucleotide molecules and the chemical acetylcholine. The structure therefore has links to the specific protein sequence, nucleotide sequence, and small molecule records that contain data extracted from the source PDB record for each of those molecular components. In addition, the structure record contains links to the NCBI Taxonomy database record for the source organism, Bacillus subtilis, and to the PubMed record PMID: 21690368 for the published reference.

Records that are directly linked to a structure may in turn have associations with other types of data in the Entrez system. Links are therefore also created from the structure record to those additional data types. The methods by which those links are made are explained in more detail in the section on search results: find related data.

The structure record will also have links to additional protein sequences that are cited as cross-references in the "DBREF" record of the PDB source file, to the genes that code for those proteins, and to any other protein sequences that are identical in length, composition, and source organism to the proteins cited in the "DBREF" record of the PDB source file. (The documentation about PDB file format provides more information about the various "records" (data fields) that are present in PDB source files.)

As a final example of an indirect link, if the protein in a structure record is the target of a bioassay, or is involved in the biological process described in the bioassay experiment, a link between the structure record and the biological activity data (PubChem BioAssay) is established, if the submitter of the bioassay data provided the link to the structure record's protein.

Example: The structure 3Q5S (MMDB ID 91866): "Crystal Structure of Bmrr Bound to Acetylcholine" includes the protein, "Multidrug-efflux Transporter 1 Regulator," which has been annotated with two conserved domain superfamilies. There are also other protein sequence records linked to the structure (beyond the protein record that was created directly from the PDB source file) because they are either: (a) cited in the "DBREF" record of the PDB source file; (b) listed in the same Entrez Gene record (bmrR, Gene ID 938676) as the protein accession that was cited in the "DBREF" record of the PDB source file; or (c) identical in length, composition, and source organism to any of the proteins in (a) or (b). Of course, the 3Q5S structure also has a link to the gene record itself.

X-ray crystallography determines the arrangement of atoms within a protein by passing X-rays through a crystallized form of the protein and analyzing the resulting X-ray diffraction pattern. This technique provides the highest resolution and usually yields only one model of a structure. Nuclear magnetic resonance (NMR) determines the structure of a protein in solution and generally yields multiple models, which allow for characterization of the biomolecule's motion in solution. Additional experimental methods, such as neutron diffraction, electron microscopy, and more, are listed in the ExpMethod search field, which can be browsed by using the Show index link on the Advanced Search page.

Molecule Types

The biomolecules in MMDB can be composed of protein molecules, RNA molecules, DNA molecules, as in the examples shown here, or combinations of these components, as shown in the earlier illustration of the P53 Tumor Suppressor.

In addition, links to related data are updated on a regular basis for all structures in the database. This ensures that new data in other Entrez databases are reciprocally linked to 3D structures. For example, as new sequences are deposited into the Entrez Protein database, the CBLAST program is used to create links from those proteins to existing and/or new 3D structures that are similar in sequence (available as Related Structures from the links menu of protein sequence records; illustrated example).

This help document focuses on how to search for 3D macromolecular structures using the Entrez search system, which allows you to retrieve records that contain desired text terms. Additional search methods allow you to search the database with a query protein sequence (using CBLAST) or with the 3D coordinates for a newly resolved structure (using the VAST tool); separate help documents exist for those search systems.

In the Entrez Structure search interface, you can retrieve structure records by searching for:

text terms (key words): A wide variety of text terms, such as names of proteins, bound chemicals, authors, and more can be used to search the Entrez Structure database. You can also search for other words that might be present in any of the other text-containing search fields of a record.
Because terminology can vary across records, it can be helpful to include synonyms in your query, for example:

suppressor OR inhibitor

NF1 OR neurofibromin OR neurofibromatosis

PTGS1 OR "prostaglandin endoperoxide synthase 1" (see note about use of quotes)

It is also possible to search for a word stem by using an asterisk (*) as a wild card. For example, a search for inhibit* will retrieve records with terms such as inhibit, inhibited, inhibition, inhibitor, etc. The Entrez Help document provides additional information about truncating search terms in this way.
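The synonym and word-stem examples above follow a simple pattern that can be sketched in code. The helper names below are hypothetical, and the stem matcher only imitates locally what a trailing asterisk is described as doing; it is not how Entrez itself is implemented.

```python
def or_query(synonyms):
    """Join synonyms with OR, quoting multi-word terms so they are searched as phrases."""
    parts = []
    for term in synonyms:
        term = term.strip()
        parts.append(f'"{term}"' if " " in term else term)
    return " OR ".join(parts)


def stem_matches(pattern, word):
    """Local imitation of a trailing-asterisk wild card: compare the stem to the start of a word."""
    if not pattern.endswith("*"):
        return word.lower() == pattern.lower()
    return word.lower().startswith(pattern[:-1].lower())


print(or_query(["PTGS1", "prostaglandin endoperoxide synthase 1"]))
print([w for w in ["inhibit", "inhibited", "inhibitor", "inhibin"] if stem_matches("inhibit*", w)])
```

Quoting only the multi-word synonyms keeps single words such as PTGS1 open to whatever term handling Entrez applies, while the longer name is kept together as a phrase.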

organism: To retrieve structure records for a specific organism or organism group, you can enter its common name (e.g., human) or scientific name (e.g., Homo sapiens), or other taxonomic node (e.g., Primates) in the Organism [orgn] search field. Note that some structure records contain protein or nucleotide sequences from more than one organism, and they will be retrieved if they contain one or more sequences from the organism or taxon specified in your query. If you specifically want to retrieve structure records that contain data from more than one source organism, simply enter the desired organism names with a Boolean AND (e.g., human[orgn] AND HIV1[orgn]).

database subset: It is possible to retrieve subsets of records that have certain attributes, such as structures generated by specific experimental methods or containing specific molecule types (protein, DNA, RNA) or bound chemicals.
Additionally, the Filter field allows you to limit a search to records that have links to another Entrez database of interest. For example, a search for structure_biosystems[filter] will retrieve structure records that have links to the NCBI BioSystems database; a search for structure_omim[filter] will retrieve structure records that have links to the Online Mendelian Inheritance in Man (OMIM) database; and a search for structure_biosystems[filter] AND structure_omim[filter] will retrieve the subset of records that have links to both of those databases.

and more... The Structure database can also be searched by terms that appear in any of the other search fields.

Search Methods

A variety of techniques can be used to search the database, offering varying degrees of control over your query. In some cases, they offer alternative ways of executing the same search (as is true for sample searches #4, #5, and #6 below), with each method offering different benefits. The search methods include:

Range search (range of values in numerical fields such as dates, counts, and resolution).

Method

Description

Example

Basic Search

Just enter search terms without specifying search fields, other limits, or Boolean operators.

The "Search Details" box in the right margin of the search results page shows exactly how Entrez parsed and handled your query. If desired, you can edit the query in that box and press the "Search" button to run the modified query.

The "See more..." link at the bottom of the "Search Details" box opens a more detailed display:

The Query Translation box shows the search strategy used to run the search

To edit the search in the Query Translation box, add or delete terms and then click Search.

Click URL to display the current search as a URL to bookmark for future use. Searches created using History numbers cannot be saved using the URL feature.

Some of the structure records might not contain proteins or nucleotide sequences from human because we did not limit that search term to the Organism search field. In such cases, the term "human" might appear in a comment or some other field of the record.

Similarly, the term p53 tumor suppressor can appear anywhere in the record, and the words may or may not be adjacent to each other in a record, depending on how Entrez parsed the query (as shown in the Search Details for a given search). To force terms to be searched as a phrase, use quotes. To refine your search in other ways, use the Limits option or the Advanced Search methods described below.

Limits

The Limits page allows you to restrict your search in various ways.

At a minimum, the Limits page displays the list of available search fields. You can do a separate search for each term or phrase in your query, as shown in sample Search #2 and #3 to the right, and select the desired search field for each one. (If desired, you can then combine the searches by using the Search Builder or History section of the Advanced Search page.)

For some databases, the Limits page also provides other commonly used options, as check boxes and/or pull-down menus, for restricting your search results to records with specific characteristics. These check boxes and pull-down menus generally represent a commonly used subset of the choices that are available from the Advanced Search page and are placed on the Limits page for easy access.

IMPORTANT NOTE: Once you have used a particular Limit, a warning sign will appear near the top of your search results page that indicates which Limit(s) are currently in effect, for example:

Note that the Limit will remain in effect for all subsequent searches in the current database unless you change or remove that limit. In the illustrated example above, any search you do will be limited to the Titles of records, until you remove the limit.

Search #2:

On the Entrez Structure search page, click on the Limits link, select the Organism search field, and enter the following query:

and press "GO". That will retrieve only records containing those terms in the title of a structure record.

If desired, you can then combine the searches on the Advanced Search page, either by using the Search Builder, as shown in sample Search #4, or by using the History section of that page, as shown in sample Search #5.

Advanced Search

The Advanced Search page allows you to exercise greater control over your search, for example, by enabling you to:

Browse the index of any search field and add term(s) of interest from the index to the active query box at the top of the page.

View your search History and combine or subtract searches from each other.

As you build a query, either by using the Search Builder's pull-down menus, or by using the "Add" links in the "History" portion of the page to combine previous searches, the grey text box at the top of the page will display your current query.

You can also manually edit the current query by clicking the "Edit" link beneath the grey text box. That will allow you to type terms/search numbers/etc. directly into the box, add parentheses for nesting if desired, change Boolean operators, etc.

In addition, the following types of advanced searches can be entered in the query box of any Entrez search page (i.e., in the query box of the database's Home page, Limits page, or Advanced Search page):

The "Search Builder" section of the Advanced Search page allows you to build your query step by step, adding a new search term and selecting a new search field at each step. It also allows you to browse the index of any search field to view the available terms.

(2) Type a term(s) in the text box beside the search field menu. Or, use the "Show index list" link to see the index of the search field and select the desired term from the index. (tips on using the "Show Index List")

(3) Select the Boolean operator (AND, NOT, OR) that should precede the term when it is added to the active query at the top of the page.

Continue the above steps, as desired, to add more term/search field combinations to your query.

As you use the Search Builder, the grey text box at the top of the page will show your current query.

You can manually edit the current query by clicking the "Edit" link beneath the grey text box. That will allow you to type terms/search numbers/etc. directly into the box, add parentheses for nesting if desired, change Boolean operators, etc.

Press the Search button to display the records retrieved by your search (i.e., it displays the search results page).

Click on the "Add to history" link if you prefer to simply add the query to your search history and remain on the Advanced Search page, where you can continue building your query.

Tips on using the "Show Index List" function on the Advanced Search page:

The "Show Index List" function allows you to browse the index of any Search Field. If you select a search field and press the "Show Index" link without entering a term in the box, you will be taken to the top of the index. If you enter a term first, you will be taken to the part of the index that contains your term (or the closest alphabetical location, if your term is not present in the index).

The number of records that contain the term will appear in parentheses. You can also browse the index to explore the variety of terms available (for example, select "All Fields", enter "Huntington", and click on the "Show Index" link to see additional spellings and/or related terms, such as Huntington disease, Huntington's, Huntington's disease).

To select a range of terms from the index, use the Shift key while selecting the first and last term. Then use the AND, OR, or NOT buttons to add that group of terms to the active query.

To select multiple terms that do not fall within a continuous range from the index, use the Control key while selecting the terms of interest. Then use the AND, OR, or NOT buttons to add that group of terms to the active query.

Note: When multiple terms are selected from the index window, they are OR'ed together within parentheses and then appended to your query with whatever Boolean operator you have selected.
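The grouping behavior in the note above can be sketched as a small string-building function. The function name is hypothetical, and the exact text that Entrez writes into the query box (for instance, whether multi-word terms are quoted) is an assumption made for this illustration.

```python
def append_index_terms(current_query, selected_terms, field, operator):
    """OR the selected index terms together in parentheses, then append them to the query."""
    if operator not in ("AND", "OR", "NOT"):
        raise ValueError("operator must be AND, OR, or NOT")
    clauses = []
    for term in selected_terms:
        # Assumption: quote multi-word index terms so each is treated as a phrase.
        text = f'"{term}"' if " " in term else term
        clauses.append(f"{text}[{field}]")
    group = "(" + " OR ".join(clauses) + ")"
    return f"{current_query} {operator} {group}" if current_query else group


query = append_index_terms(
    "human[Organism]", ["huntington", "huntington disease"], "All Fields", "AND"
)
print(query)
```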

Search #4:

On the Entrez Structure search page, click on Advanced Search and build your search one step at a time:

(a) Using the first pull-down menu in Search Builder, select the Organism search field and enter the following query:

human

and select "AND" as the Boolean operator. That term/search field selection will automatically be displayed in the grey text box at the top of the page, which shows your current query.

(b) Using the second pull-down menu in Search Builder, select the Title search field and enter the following query:

p53 tumor suppressor

and select "AND" as the Boolean operator. That newest term/search field selection will automatically be added to the grey text box at the top of the page.

Press the Search button if you want to display the records retrieved by your search (i.e., it displays the search results page).

Or, click on the "Add to history" link if you prefer to just add the query to your search history and remain on the Advanced Search page, where you can continue building your query.

Note that this search will produce the same results as sample searches #5 and #6. It is simply executed in a different way. That is, you remain on a single query page (Advanced search) and can browse the index of any search field as you build your query one step at a time.
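The step-by-step construction in Search #4 can be sketched as follows. The function name is hypothetical, and the resulting text is only an approximation of what the grey query box might display.

```python
def add_term(current_query, term, field, operator="AND"):
    """Append one term/search-field pair to the query with the chosen Boolean operator."""
    operator = operator.upper()  # Boolean operators are written in upper case
    if operator not in ("AND", "OR", "NOT"):
        raise ValueError("operator must be AND, OR, or NOT")
    clause = f"{term}[{field}]"
    if not current_query:
        return clause  # the first term has nothing to be combined with
    return f"{current_query} {operator} {clause}"


query = add_term("", "human", "Organism")                 # step (a)
query = add_term(query, "p53 tumor suppressor", "Title")  # step (b)
print(query)
```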

History

The "History" section of the Advanced Search page displays the searches you have done in the current database.

You can combine or subtract searches from each other by entering the search numbers and the AND, OR, or NOT Boolean operators in the query box, for example: #2 AND #3. If the query contains several search numbers and Boolean operators, the Boolean operators are processed from left to right unless parentheses are used for nesting. If parentheses are used, the portions of the query in parentheses will be processed first, then the remaining Boolean operators will be processed from left to right.

Additional details about Search History:

The Search History will be lost after 8 hours of inactivity. (To save a search indefinitely, click on the search # and select "Save in My NCBI".)

Click "Clear History" to delete all searches from History.

Entrez will move a search statement number to the top of the History if a new search is the same as a previous search.

History search numbers may not be continuous because some numbers are assigned to intermediate processes, such as displaying a citation in another format.

The maximum number of searches held in History is 100. Once the maximum number is reached, PubMed will remove the oldest search from the History to add the most current search.

A separate Search History will be kept for each database, although the search statement numbers will be assigned sequentially for all databases.

PubMed uses cookies to keep a history of your searches. For you to use this feature, your Web browser must be set to accept cookies.

Database records that you have copied to the Clipboard are represented by the search number #0, which may be used in Boolean search statements. For example, to limit the records you have collected in the Clipboard to those from human, use the following search: #0 AND human[organism]. This does not change or replace the Clipboard contents.

Search #5:

Use the search numbers shown in the "History" section of the Advanced Search page to combine previous searches (for example, searches #2 and #3 shown above).

To do that, you can either:

Click on the "Edit" link beneath the grey text box and type in a search statement such as:

#2 AND #3

Or, instead of typing the search statement, use the "Add" link beside any search number in the "History" section of the Advanced Search page to add that search number into the grey text box.

That will retrieve only records that contain human in the Organism field (i.e., records that contain at least one molecular component -- protein, DNA, or RNA -- from human) and p53 tumor suppressor in the Title field. Compare the retrieval from this search with that of the sample basic search above.

(Note that your search numbers might be different from those shown here, if you did earlier searches in the Entrez system before trying these examples.)

Enter a search in command language, specifying your exact combination of desired search terms, search fields, and Boolean operators, as shown in the examples to the right. The syntax is:

term[field] BOOLEAN term[field] BOOLEAN term[field] etc.

Search Field names must be placed in square brackets [], and can be written as either the full name, for example, [Database], or as the corresponding search field abbreviation, for example, [db] (additional examples).

Boolean operators (AND, OR, NOT) must be written in UPPER CASE.

Boolean operators are processed from left to right unless parentheses are used for nesting. If parentheses are used, the portions of the query in parentheses will be processed first, then the remaining Boolean operators will be processed from left to right.

Boolean operators can also be used to combine or subtract searches from each other (i.e., to find the union, difference, or intersection of the data sets retrieved by various searches). To do this, use the Search History section of the Advanced Search page and simply enter the search numbers and desired Boolean operators in the query box.

For example, to identify the records that were retrieved by Search #2 of your search history, and also by Search #3, you could enter the following query:

#2 AND #3

To identify the records that were retrieved by Search #2 but not by Search #3, you could enter the following query:

#2 NOT #3
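The processing order described above (left to right, with parenthesized portions first) can be illustrated with a small evaluator over sets of record identifiers. This is a sketch under stated assumptions: the history contents are invented, the function is hypothetical, and it handles only search numbers, the three Boolean operators, and parentheses.

```python
import re


def evaluate(query, history):
    """Evaluate a History query left to right, processing parenthesized portions first."""
    tokens = re.findall(r"#\d+|AND|OR|NOT|\(|\)", query)
    pos = 0

    def parse_operand():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_expression()
            pos += 1  # skip the closing parenthesis
            return result
        return set(history[int(token[1:])])

    def parse_expression():
        nonlocal pos
        result = parse_operand()
        while pos < len(tokens) and tokens[pos] in ("AND", "OR", "NOT"):
            operator = tokens[pos]
            pos += 1
            right = parse_operand()
            if operator == "AND":
                result = result & right   # intersection
            elif operator == "OR":
                result = result | right   # union
            else:
                result = result - right   # difference
        return result

    return parse_expression()


# Invented search history: search number -> set of retrieved record labels.
history = {1: {"A", "B"}, 2: {"B", "C"}, 3: {"C", "D"}}
print(evaluate("#2 AND #3", history))
print(evaluate("#2 NOT #3", history))
print(evaluate("#1 OR #2 AND #3", history))
print(evaluate("#1 OR (#2 AND #3)", history))
```

The last two queries differ only in the parentheses, yet they return different sets, which is why nesting matters when a query mixes operators.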

Search #6:

Simply enter all search terms and search fields as a single statement into the query box:

This search will retrieve structure records that contain the terms prostaglandin H2 synthase OR prostaglandin endoperoxide synthase in any field, but that will not contain molecular components (proteins, DNA, RNA) from organisms in the taxonomic orders Primata or Rodentia.

Range Search

Range queries are constructed by specifying a lower and upper numerical value separated by a colon (:) to specify the range, followed by a search field name or abbreviation in square brackets, as shown in the examples to the right. You can insert a space on each side of the colon but that is not necessary; the search will work either way.

All dates and all 'counts' (such as residue counts, molecule counts, etc.) fields can be range queried. Apart from that, there are two additional fields that can be range queried: Resolution [RESO] in the Entrez Structure database, and MolWeight [MWT] in the Entrez Protein database (from which you can link to the Structure database).

Range queries on Resolutions [RESO] (in angstroms) must have the following format:

fromResolution : toResolution [RESO]

Range queries on MolecularWeights [MWT] (in daltons) must have the following format:

fromMolecularWeight : toMolecularWeight [MWT]

Note that searches by molecular weight are currently possible only in the Entrez Protein database. When you are searching that database, simply append "AND srcdb_pdb[prop]" to your query if you want to retrieve only the protein sequences that were derived from 3D structure records. For example:

_____:_____[molwt] AND srcdb_pdb[prop]

That will retrieve protein sequences that fall within the specified molecular weight range and that were derived from Protein Data Bank (PDB), the source database for 3D structure records. A specific example is provided in Search #10 to the right.

Range queries on Dates have a similar format:

FromDate : ToDate [fieldname]

Note: The FromDate and ToDate values can specify an exact date, a month, or a year, and are written in the format: YYYY/MM/DD, YYYY/MM, or YYYY. The search fields summary table includes the names and abbreviations for the various "date" fields.

Range queries on "counts" have the format:

FromCount : ToCount [fieldname]

Note: The FromCount and ToCount values are integers. The search fields summary table includes the names and abbreviations for the various "counts" fields.

That will retrieve protein sequences that have a molecular weight between 4060 and 4075 Daltons and that were derived from 3D structure records. Each protein sequence record will have a link to the corresponding structure record. Alternatively, you can select the "Find Related Data:Structure" option in the right margin of the search results page to retrieve the complete set of structure records that corresponds to the set of protein records you retrieved. (more details about protein → structure links...)
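The range-query formats described in this section can be sketched as string builders. The function names are hypothetical; [RESO] and [MWT] are the field abbreviations given above, while "fieldname" is kept as the same placeholder used in the format descriptions because the actual date field names come from the search fields summary table.

```python
import re

# Accepts YYYY, YYYY/MM, or YYYY/MM/DD, the date formats listed above.
DATE_PATTERN = re.compile(r"\d{4}(/\d{2}(/\d{2})?)?")


def range_query(lower, upper, field):
    """Build a lower:upper[field] range query."""
    return f"{lower}:{upper}[{field}]"


def date_range_query(from_date, to_date, field):
    """Build a date range query after checking the format of both dates."""
    for value in (from_date, to_date):
        if not DATE_PATTERN.fullmatch(value):
            raise ValueError(f"expected YYYY, YYYY/MM, or YYYY/MM/DD, got {value!r}")
    return range_query(from_date, to_date, field)


print(range_query(1.5, 2.5, "RESO"))
print(range_query(4060, 4075, "MWT") + " AND srcdb_pdb[prop]")
print(date_range_query("2010/01", "2010/12/31", "fieldname"))
```

The format check is deliberately shallow: it verifies the shape of the date, not whether the month or day exists.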

Search fields can be selected from pop-up menus on either the Limits or Advanced Search page, or can be typed directly in your query by surrounding field names with square brackets [], for example, [Organism] or [Orgn].* The Show index link on the Advanced Search page allows you to browse the index of each search field, where you can see the available terms, the number of records containing each term or phrase, as well as the syntax for entering values in search fields such as dates and EC/RN number.

will retrieve the structure records that contain the phrase "p53 tumor suppressor" in any field of the record.

(Compare these search results with those obtained by the sample Citation Abstract Field search, which will retrieve structure records containing that phrase in the abstract of an associated PubMed record, and with those obtained by the sample Title field search, which will retrieve records containing that phrase only in the title of an associated PubMed record.)

The quotes surrounding the search terms ensure they are searched as a phrase.**

Abstract

[Abstract][ABS][ABST]

The abstract (if available) of any PubMed reference linked to the structure.

will retrieve the structure records that contain the phrase "p53 tumor suppressor" in the abstract of a PubMed reference associated with the structure.

(Compare these search results with those obtained by the sample All fields search, which will retrieve records containing that phrase in any field of the structure record, and with those obtained by the sample Title field search, which will retrieve records containing that phrase only in the structure title.)

The quotes surrounding the search terms ensure they are searched as a phrase.**

Note: Some structures may have a biopolymer count of zero, and can be retrieved by a search for: 0[AsuBiopolymerCount]
These can include structure records that contain only chemicals (such as peptide-like antibiotics), peptide nucleic acids (PNAs), or protein or nucleotide sequences composed of ≥ 50% modified amino acids or nucleotides.

In addition, the "Retrieve 3D Structures that have..." blue buttons near the bottom of the 3D Macromolecular Structures resources page and Entrez Structure search page allow you to retrieve various molecule combinations (Protein+Chemical, RNA+Chemical, etc.) with a single click.

The number of different types of chemicals (not the total number of bound chemicals) in the raw data for the structure (i.e., in the "asymmetric unit," or "ASU"). The bound chemicals are sometimes referred to as "ligands," hence the abbreviation [AsuLigCount]. (Compare with "BioUnit Ligand Count.")

The number of molecules in the raw data for the structure (i.e., in the "asymmetric unit," or "ASU") that are not classified as a protein, DNA, RNA, or chemical, and therefore fall into the category of "other." (Compare ASU Other Molecule Count, described here, with "BioUnit Other Molecule Count.")

The "other" molecules are generally non-standard biopolymers. Examples include nucleotide or protein sequences that contain a large percentage of non-standard residues, long sugar chains (e.g., 1HPN), artificial constructs that contain a polypeptide backbone and nucleotide side chains (e.g., 1PUP), etc.

The name of any author associated with any PubMed reference linked to the structure.

The format to search this field is: last name followed by a space and up to the first two initials followed by a space and a suffix abbreviation, if applicable, all without periods or a comma after the last name (e.g., o'neil kt[auth] OR o'connell jd 3rd[auth]).

Entrez automatically truncates on an author's name to account for varying initials, e.g., o'neil k [au] will retrieve o'neil ka, o'neil kt, etc., in addition to o'neil k. To turn off this automatic truncation, enclose the author's name in double quotes, e.g., a search for "o'neil k"[auth] will retrieve just o'neil k.

Initials and suffixes may be omitted when searching, if desired. In that case, all authors with the specified last name will be retrieved, regardless of their initials.
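The author format described above can be sketched as a small helper that assembles the query text. This is only an illustration: the function name `author_query` and its parameters are hypothetical and are not part of Entrez; the helper just applies the rules stated here (no periods or commas, at most two initials, optional suffix, optional double quotes to turn off automatic truncation).

```python
def author_query(last_name, initials="", suffix="", exact=False):
    """Assemble an author query such as o'neil kt[auth] (hypothetical helper)."""
    # Periods and commas are not used in the author field.
    last = last_name.replace(".", "").replace(",", "").strip().lower()
    # Keep at most the first two initials, without periods or spaces.
    inits = "".join(ch for ch in initials if ch.isalpha())[:2].lower()
    suff = suffix.replace(".", "").replace(",", "").strip().lower()

    name = " ".join(part for part in (last, inits, suff) if part)
    if exact:
        # Double quotes switch off the automatic truncation on initials.
        name = f'"{name}"'
    return f"{name}[auth]"


print(author_query("O'Neil", "K.T."))           # o'neil kt[auth]
print(author_query("O'Neil", "K", exact=True))  # "o'neil k"[auth]
print(author_query("O'Neil"))                   # o'neil[auth]
```

Omitting the initials, as in the last call, corresponds to searching for every author with that last name.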

Note: Some structures may have a biopolymer count of zero, and can be retrieved by a search for: 0[BiopolymerCount]
These can include structure records that contain only chemicals (such as peptide-like antibiotics), peptide nucleic acids (PNAs), or protein or nucleotide sequences composed of ≥ 50% modified amino acids or nucleotides.

The number of different types of bound chemicals (not the total number of bound chemicals) in the biological unit ("biounit") of the structure. The bound chemicals are sometimes referred to as "ligands," hence the abbreviation [LigCount]. (Compare with "ASU Chemical Count.")

The number of molecules in the biological unit ("biounit") of the structure that are not classified as a protein, DNA, RNA, or chemical, and therefore fall into the category of "other." (Compare BioUnit Other Molecule Count, described here, with "ASU Other Molecule Count.")

The "other" molecules are generally non-standard biopolymers. Examples include nucleotide or protein sequences that contain a large percentage of non-standard residues, long sugar chains (e.g., 1HPN), artificial constructs that contain a polypeptide backbone and nucleotide side chains (e.g., 1PUP), etc.

The name of a ligand (chemical) that is present in a 3D structure record. This was derived from the "HETNAM"* record of the PDB source file and represents the name that the author of the structure used for the chemical.

The same chemical might also be known by other names, which are indexed in the Chemical Synonyms search field. Use that field if you would like more comprehensive search results.

For example, the author of the 1PTH structure used the term "2-HYDROXYBENZOIC ACID" as the chemical name for the aspirin molecule bound to Prostaglandin H2 Synthase. A search of the "Chemical Name" field for "2-Hydroxybenzoic Acid" will therefore retrieve 1PTH (along with other structures in which the authors used the same chemical name). However, if you search the "Chemical Name" field for a term other than the one the author used in the HETNAM record of their PDB source file, you will not retrieve those structures.

For broader search results, use the "Chemical Synonyms" field instead. That will allow you to enter any one of many names by which a chemical has been known. For example, you could search for either "2-Hydroxybenzoic Acid" or "salicylate" or "2-Carboxyphenol" (or another synonym) and you will retrieve all macromolecular structures that contain salicylic acid, regardless of the chemical name that the authors used for it.

* Note: "HETNAM" is the PDB terminology for "heterogen name," which refers to any non-biopolymer that is present in a 3D structure record. The documentation about PDB file format provides more information about the various "records" (data fields), such as HETNAM, that are present in PDB source files.

will retrieve structure records in which the author used the term "2 hydroxybenzoic acid" as the name of the chemical present in the 3D structure.

Tip: To search for other names by which the chemical has been known, such as "salicylate" or "2-Carboxyphenol," use the Chemical Synonyms search field.

Chemical Synonyms

[ChemSyn][CSYN]

The various names by which a given chemical structure has been known.

For example, the terms "salicylate," "2-Hydroxybenzoic acid," "o-hydroxybenzoic acid," "2-Carboxyphenol," "o-Carboxyphenol," "2-hydroxy(1-14c)benzoic acid," etc. have been used to refer to the chemical structure of salicylic acid. You can search the "Chemical Synonyms" field for any of those terms in order to retrieve all of the 3D macromolecular structures that contain the chemical that is described in the corresponding PubChem Compound record (CID 338).

The chemical names in this search field represent the filtered synonyms from PubChem Compound records that correspond to the chemicals present in the 3D macromolecular structure records.

will retrieve 3D macromolecular structure records that contain the chemical shown in the PubChem Compound record for salicylic acid (CID 338), regardless of the chemical name that was used by the submitter of the 3D macromolecular structure.

This search, for example, will retrieve the 1PTH structure (among others), even though the submitter of 1PTH used the term "2-Hydroxybenzoic Acid" instead of the term "salicylate" to refer to the chemical that is bound to Prostaglandin H2 Synthase.

Example: "sedolisin" is a term in the description of the NCBI-curated conserved domain model cd04056, which has the short name "Peptidases_S53," full title "Peptidase domain in the S53 family," and PSSMID 173788.

Note: A search of this field will retrieve 3D structures that contain at least one protein molecule annotated with a specific hit to a conserved domain model whose description includes your query term.

Example: "subtilisin" is a term in the description of the conserved domain superfamily cl10459, which has the short name "Peptidases_S8_S53 Superfamily," full title "Peptidase domain in the S8 and S53 families," and PSSMID 209143.

Note: A search of this field will retrieve 3D structures that contain at least one protein molecule annotated with a conserved domain superfamily whose description includes your query term.

will retrieve 3D structures that contain at least one protein molecule annotated with a conserved domain superfamily that has the term "peptidase" in its title.

DNA Name

[DNAM][DNAME][DNAName]

The name of a DNA molecule in a structure record. The names of nucleotide molecules, including DNA and RNA, are derived from the COMPND record of the PDB source file.

(The documentation about PDB file format provides more information about the various "records" (data fields) that are present in PDB source files.)

The DNA name often reflects the sequence of nucleotides in the molecule itself.

EC/RN Number

[EC]

The Enzyme Commission (EC) number of the PDB structure, representing the classification of an enzyme based on the chemical reactions it catalyzes. The EC number is extracted from the "COMPND" record (data field) of a PDB file.

This field can be queried with the wild-card (*) feature, for example:

3.2.1.114 [EC]
3.2.1.* [EC]
3.2.*.* [EC]
3.2.* [EC]

and so on. Note that the queries 3.2.*.* [EC] and 3.2.* [EC] return an identical set of PDB structures, so the two queries are equivalent.
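The wild-card pattern above can be generated from however many EC classification levels are known. The following is a minimal sketch; `ec_query` is a hypothetical helper name, and the code only builds the query text shown in the examples. Because 3.2.*.* [EC] and 3.2.* [EC] are equivalent, a single trailing wild card is enough.

```python
def ec_query(*levels):
    """Build an EC query from one to four known levels (hypothetical helper)."""
    if not 1 <= len(levels) <= 4:
        raise ValueError("an EC number has between one and four levels")
    parts = [str(level) for level in levels]
    if len(parts) < 4:
        # A single trailing wild card matches all remaining levels.
        parts.append("*")
    return ".".join(parts) + " [EC]"


print(ec_query(3, 2, 1, 114))  # 3.2.1.114 [EC]
print(ec_query(3, 2, 1))       # 3.2.1.* [EC]
print(ec_query(3, 2))          # 3.2.* [EC]
```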

use the wild card (*) to retrieve structure records that contain the digits specified, followed by any other digits.

You can click on the Details folder tab of a search results page to see exactly how a query was handled by the Entrez system.

Experimental Method

[EXP][EXPM]

The experimental method used to characterize the protein structure. Most structures are resolved using X-ray crystallography or nuclear magnetic resonance (NMR) but additional methods also exist (e.g., electron microscopy).

To see the full list of experimental methods available, open the Advanced Search page, select the ExpMethod search field in the Search Builder section, and press the Show index link to browse the index of available terms.

will retrieve structure records that contain the protein product of any gene that contains the term "tumor suppressor" in the gene's description.

The quotes surrounding the search terms ensure they are searched as a phrase.**

Gene Name

[GN][GENE][GNAME][GeneName]

The name of the gene that codes for a protein molecule present in the structure record.

Because a gene may be known by a variety of names, this search field includes the official symbol and the alternative ("also known as") gene symbols that are listed in the corresponding Entrez Gene record.

For example, the Entrez Gene record for human tumor protein p53 lists the following names: Official Symbol: TP53; Also known as: P53; LFS1; TRP53

You can enter any of those terms in a search of the Gene Name field in order to retrieve structures that contain the protein product.

The association between the gene names and the protein molecules has been made using the method described under "Find related data."

will retrieve structure records that contain the protein product of the TP53 (tumor protein p53) gene.

Filter

[FILT]

The "Filter" search field allows you to narrow your retrieval to records that have certain attributes, such as record type (e.g., structures resolved using x-ray crystallography or NMR, which can also be retrieved via the ExpMethod field).

The "Filter" field also allows you to limit search results to structure records that have links to other Entrez databases of interest, as shown in the sample search to the right. A detailed explanation of each type of link is provided in the description of an Entrez search results page.

The Filter field can also be used to view current database statistics, by entering a search for All[Filt], as shown in the example in the next column.

will retrieve the structure records that have associated data (i.e., bound chemicals) in the PubChem Compound database.

You can then open the "Display" menu near the top of the Structure search results page and select "Chemicals/PubChem Compound" to retrieve the PubChem records for bound chemicals that are present in the structures you have retrieved, or only for those whose checkboxes have been activated. (Conversely, it is possible to retrieve 3D structures that are bound to a specific chemical.)

will retrieve all of the structure database records, showing the total number retrieved. (Additional database statistics are available on the news page.)

Journal

[JOUR]

The journal of the publication that reported the PDB structure findings. If more than one PubMed reference is associated with a structure record, the journal of each article has been indexed.

Journal names can be written as full names or abbreviations. To see the list of journals, open the Advanced Search page, select the "Journal" search field in the Search Builder section, and press the Show index link to browse the index of available terms.

The first date on which a particular MMDB ID appeared. This can represent the date on which a new Protein Data Bank structure record (i.e., a particular PDB accession) was first imported into MMDB, or the date on which a previously existing PDB record was significantly changed and therefore received a new MMDB ID.

The syntax for searching the field is YYYY/MM/DD, YYYY/MM, or YYYY. The colon (:) can be used to search for a range of dates, for example, YYYY/MM/DD:YYYY/MM/DD[MDAT].

Searches of this field will retrieve: (a) new structure records (PDB accessions) that were not previously in MMDB, and (b) PDB accessions that were previously in the database but that have changed in some significant way and have therefore received a new MMDB ID. For example, if the atoms in a previously available PDB data file were re-ordered during a PDB remediation (e.g., September 2007 or March 2009 remediations), the PDB accession will remain the same but it will receive a new MMDB ID and a new MMDBEntryDate.

The unique identifier (MMDB ID) of the structure record in the Molecular Modeling Database (MMDB). It is an integer assigned consecutively to each structure record processed by NCBI. For example, 50885 is the MMDB ID for sheep prostaglandin H2 synthase. (The summary page for a structure record shows both of its identifiers: MMDB ID and corresponding PDB ID. The latter is searchable in the PDB Accession field.)

If you enter an integer as a query and do not specify a search field, the MMDB ID field will be searched by default.

Note: The MMDB ID assigned to a PDB accession can change if there have been significant changes to the data in a record. For example, if the atoms in a previously available PDB data file were re-ordered during a PDB remediation (e.g., September 2007 or March 2009 remediations), the PDB accession will remain the same but it will receive a new MMDB ID and a new MMDBEntryDate. Obsolete MMDB IDs (e.g., 6543) cannot be retrieved through the Entrez Structure search interface, even with direct searches of the UID field, because they are no longer indexed. However, those obsolete MMDB IDs can be retrieved from the archival copy of the database by using the "Direct Fetch via UID" option on the MMDB Search Methods page.

will also retrieve that same structure record, because the MMDB ID field is searched by default for queries that are only a string of digits.

MMDB Modify Date

[MDAT]

The date on which the structure record was last modified. If no modifications were made since the record was deposited into MMDB, then MMDBModifyDate will be the same as the MMDBEntryDate.

The syntax for searching the field is YYYY/MM/DD, YYYY/MM, or YYYY. The colon (:) can be used to search for a range of dates, for example, YYYY/MM/DD:YYYY/MM/DD[MDAT].
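A date-range query in this syntax can be assembled from two calendar dates. This is a minimal sketch, assuming only the YYYY/MM/DD:YYYY/MM/DD[MDAT] form described above; the helper name `date_range_query` is hypothetical.

```python
from datetime import date


def date_range_query(start, end, field="MDAT"):
    """Format two dates as a YYYY/MM/DD:YYYY/MM/DD[field] range query."""
    if end < start:
        raise ValueError("end date must not precede start date")
    fmt = "%Y/%m/%d"
    return f"{start.strftime(fmt)}:{end.strftime(fmt)}[{field}]"


# All records modified during 2009.
print(date_range_query(date(2009, 1, 1), date(2009, 12, 31)))
# 2009/01/01:2009/12/31[MDAT]
```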

Note about this field: When PDB undergoes a database remediation, in which most or all PDB records are updated in some way, MMDB imports the complete set of updated records. This was the case when the PDB database underwent a September 2007 remediation. Because the complete revised PDB data set was loaded into MMDB at that time, the earliest available value in the MMDBModifyDate field is 2007. Similarly, the release of PDB Archive Version 3.15 in March 2009 resulted in changes to a large subset of records, which is reflected in an MMDB MDAT of 2009/07 for approximately 20,000 records.

The following searches will retrieve updated structure records that were previously in MMDB but that have changed in some way, as well as new structure records that became available during the specified period of time:

As noted in the section of this document that describes the procedures used to identify the biological unit, the oligomeric state is derived from the "REMARK 350" record of the PDB source file. (The documentation about PDB file format provides more information about the various "records" (data fields) that are present in PDB source files.)

Also note that the oligomeric state of a structure might reflect its bound state. For example, the PDB source file for 1TUP: "Tumor Suppressor P53 Complexed With DNA" defines the oligomeric state as pentameric (a trimer protein complexed with a DNA double helix).

Organism

[ORGN]

The source organism(s) of the protein and/or nucleotide molecules in the structure record. A common name (e.g., human), scientific name (e.g., Homo sapiens), or other taxonomic node (e.g., Primates or Primata) can be entered as a query.

If a structure record contains protein or nucleotide sequences from more than one organism (e.g., human AND HIV1), the record can be retrieved by searching for any one of the source organisms.

will retrieve structures with at least one molecular component from any species falling in the order Primata.

Other Molecule Name

[ONAM][ONAME][OtherMoleculeName]

The name of a molecule -- other than a protein, DNA, RNA, or ligand -- that is present in a structure record. The name is derived from the COMPND record of the PDB source file and represents the term used by the author for the molecule.

(The documentation about PDB file format provides more information about the various "records" (data fields) that are present in PDB source files.)

PDB Accession

[ACCN][PACC][PDBACC]

The accession number of the Protein Data Bank (PDB) record from which the MMDB record was derived, sometimes referred to as the PDB ID. It is generally a four-character alphanumeric combination (e.g., 1PTH is the source record for MMDB ID 50885).

The classification of the PDB structure, as provided by the submitter in their data file.

PDB Comment

[PCOM][PCMT]

A more detailed description of the PDB structure. This field contains text from the REMARK records in the PDB data file.

PDB Deposit Date

[PDDAT]

The earliest date that Protein Data Bank associates with an accession, generally representing the date on which the record was submitted to the PDB.

The syntax for searching the field is YYYY/MM/DD, YYYY/MM, or YYYY. The colon (:) can be used to search for a range of dates, for example, YYYY/MM/DD:YYYY/MM/DD[PDDAT].

(Note that the PDB Deposit Date is not necessarily the date on which the record became publicly available, and may be significantly different from the release date if submitters requested their data remain confidential until publication.)

The source organism of each protein and/or nucleotide molecule, as noted in the original PDB data file.

Note: During MMDB data processing, the source organism names in the PDB data file are compared against the organism names in the NCBI Taxonomy database. If there is a difference, the MMDB version of the data file will contain the organism name from the NCBI taxonomy database (based on the results of a BLAST search), and that name will be searchable in the Organism field. However, the source organism name noted in the original PDB file will still also be searchable via the PDBSource field.

Protein Name

[PNAM][PNAME][ProteinName]

The name of a protein molecule in a structure record, derived from the COMPND record of the PDB source file. This represents the term used by the author for the protein.

(The documentation about PDB file format provides more information about the various "records" (data fields) that are present in PDB source files.)

Resolution

[RES][RESL][RESO]

The resolution (in Angstroms) of a protein structure resolved by diffraction or electron microscopy. This field can be queried for a single value or a range of values.

(Compare these search results with those obtained by the sample All Fields search, which will retrieve structure records containing that phrase anywhere in the record, and those obtained by the sample Citation Abstract Field search, which will retrieve structure records containing that phrase in the abstract of an associated PubMed record.)

The quotes surrounding the search terms ensure they are searched as a phrase.**

* In a query, the field name may be typed as the full name or abbreviation, and may be in upper, lower, or mixed case. If more than one abbreviation is shown, any one of them can be used. The field name must be surrounded by square brackets []. A space between the search term and the field specifier is optional. If desired, surround a phrase with quotes to force an adjacency search. For example, the sample queries below will work equally:
"p53 tumor suppressor"[TI]
"p53 tumor suppressor"[TITL]
"p53 tumor suppressor" [TITL]
"p53 tumor suppressor" [titl]
"p53 tumor suppressor"[Title]
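The equivalence of these field-name spellings can be illustrated with a small helper that assembles a fielded query. This is only a sketch: `fielded_query` is a hypothetical name, not part of Entrez, and it simply writes out the query text described in this note.

```python
def fielded_query(term, field, phrase=True):
    """Return term[field], quoting the term to force a phrase search."""
    term = term.strip()
    if phrase:
        # Quotes force the words to be searched as an adjacent phrase.
        term = f'"{term}"'
    return f"{term}[{field}]"


# Any listed abbreviation or the full field name may be used, in any case.
for field in ("TI", "TITL", "titl", "Title"):
    print(fielded_query("p53 tumor suppressor", field))
```

Each printed line is a different spelling of the same Title-field search.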

** The quotes surrounding the query terms in some of the sample searches force the terms to be searched as a phrase. If quotes are not used, the Entrez system may still recognize and handle the terms as a phrase, if they are present in a phrase dictionary used by the search engine. If the terms are not present in the phrase dictionary and are not surrounded by quotes, Entrez will insert a Boolean AND between the terms; in that case, they may or may not appear adjacent to each other in the retrieved records. The "Details" folder tab on a search results page will show you exactly how the Entrez system parsed your query. More search tips are provided in the PubMed help document and Entrez help document.

It is also possible to search for a word stem by using an asterisk (*) as a wild card; for example, inhibit* will retrieve records with terms such as inhibit, inhibited, inhibition, inhibitor, etc. The Entrez Help document provides additional information about truncating search terms in this way.
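Queries written in this syntax can also be passed to a program rather than typed into the search box. The sketch below only builds a request URL and does not contact any server. It assumes the NCBI E-utilities `esearch.fcgi` endpoint and the database name `structure`, neither of which is described in this help document, so treat both as assumptions to verify; the function name `esearch_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint (NCBI E-utilities); not taken from this help document.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="structure", retmax=20):
    """Build (but do not send) a search request URL for a query string."""
    # urlencode percent-encodes quotes, brackets, and the asterisk wild card.
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


print(esearch_url('"p53 tumor suppressor"[TITL]'))
print(esearch_url("inhibit*"))
```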

Link from other Entrez Database

The Entrez databases to which structure records have been linked (via the data processing pipeline) generally have reciprocal links from their records back to the corresponding Structure database records.

Therefore, if you start your search in an Entrez database other than Structure, you can view the "Related Information" menu in the right hand margin of any record you have retrieved to see if it has links to associated information in the Structure database, as shown in the illustrated example below.

Alternatively, you can use the "Find Related Data" menu in the right hand margin of an Entrez search results page (in whatever database you have chosen to search) and select "Structure" to view the associated structure records for all items (default) displayed on the search results page or for those you have selected using their checkboxes.

Protein sequence records can have two different types of links to 3D structure records. One, both, or neither link can be present, depending upon the data available for a particular protein sequence:

Structure - Protein sequence records that have a direct association with the structure record because at least one of the following is true: (a) the protein sequence record was derived directly from a 3D structure record (as described in MMDB data processing); (b) the accession number of the protein sequence record was listed in the DBREF record of the PDB source file; (c) the protein accession listed in the DBREF record of the PDB source file is also found in an Entrez Gene record, and that Gene record also has links to other protein accession(s); in such a case, all of the protein accessions in the Entrez Gene record will have "Structure" links (and will show a thumbnail image of a corresponding 3D structure in their protein sequence record display); or (d) the protein is identical in composition and sequence length to any of the proteins noted in (a), (b), or (c).

Related Structures - Protein sequences from experimentally resolved 3D structures that are related to the query protein, based on sequence similarity. Those are referred to as "related structures" and were identified by the Related Structures (CBLAST) service. The related structures might align to the full length of the query sequence, or only to a portion of it, and the CBLAST results page provides a graphical display that summarizes the extent of each match.

Related Structures (List) - Opens a Molecular Modeling Database (MMDB) display that lists the experimentally resolved 3D structure records that contain one or more protein molecules similar in sequence to the current protein(s). Each 3D structure and its corresponding sequence data can be viewed interactively in the free Cn3D tool.

Related Structures (Summary) - Opens a Related Structures (CBLAST) graphical summary that: (a) lists the individual proteins from experimentally resolved 3D structures that are related to the query protein, based on sequence similarity, and (b) shows alignment footprints (as pink bars) that indicate regions of similarity between the query protein and the structure-based protein, with an option to view the 3D structure and corresponding sequence alignment interactively in the free Cn3D tool.

See frames B and C of an illustrated example to see the "Related Structures" links that appear in the right hand margin of protein sequence record displays.

As of July 2012, approximately 0.7% of the 53+ million sequence records in Entrez Protein have a "Structure" link, because they were derived from 3D structure records or have another type of direct association with a 3D structure. However, approximately 44% of the total protein sequence records have a "Related Structures" link.

If the "Related Information" menu for an individual protein sequence record does not contain an option for "Related Structures", then no structure-based protein sequences were similar enough to your protein of interest to pass the CBLAST score cutoff. However, other records in your protein search results might have a "Related Structures" link. Alternatively, you can BLAST the protein sequence against the PDB (structure) database and adjust the algorithm parameters to decrease the stringency of the search, if desired.

The initial search results provide a list (document summary, or "docsum") of the structure records that contain your search term, which can appear in any field of the record, unless a search field was specified in the query.
If desired, you can narrow your search by restricting the query to a search field of interest or adding more terms with a Boolean AND.
Alternatively, you can broaden your search by adding more terms (e.g., synonyms) to your query with a Boolean OR.
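Narrowing with AND and broadening with OR can be sketched as simple string assembly. The helper name `combine` is hypothetical, and the example terms are illustrative only; the parentheses group each combined clause so that the clauses can be nested.

```python
def combine(operator, *clauses):
    """Join query clauses with a Boolean AND or OR, grouped in parentheses."""
    operator = operator.upper()
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    if not clauses:
        raise ValueError("at least one clause is required")
    return "(" + f" {operator} ".join(clauses) + ")"


# Broaden: accept either of two names in the Title field.
broad = combine("OR", "p53[Title]", "tp53[Title]")
# Narrow: additionally require a human source organism.
narrow = combine("AND", broad, "human[ORGN]")

print(broad)   # (p53[Title] OR tp53[Title])
print(narrow)  # ((p53[Title] OR tp53[Title]) AND human[ORGN])
```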

Once you are satisfied with your search results, click on the thumbnail image, PDB Accession, or MMDB ID of any record on the DocSum page to view its structure summary page. In addition, the following options are available for viewing the search results:

The "Display settings" menu acts upon all of the structure records (default) in your search results, or on the subset you have selected with checkboxes. You can select items from multiple pages of the search results, if desired.

Format

Summary -- a summary of all of the structure records (default) retrieved by your search, or for those you have selected with checkboxes, in HTML format. The information shown for each record may include the following, as available:

A subset of links to additional information about the structure, including a "View in Cn3D" link that opens an interactive view of the 3D structure in NCBI's free Cn3D structure viewing program and links to related data in other Entrez databases. (Note: The "Find Related Data" menu in the right margin of the search results page provides a complete list of links. That menu retrieves related data for all records (default) retrieved by your search, or for the subset of records you have selected with checkboxes.)

Summary (text) -- a summary of the records retrieved by your search, in plain text format. By default, all records from your search result are listed. If you are interested only in specific records, select their checkboxes, select the desired display settings, and press "Apply" to view only those records. The information shown for each record is the same as in the "Summary" format described above, but does not include the subset of links to additional information.

UI List -- a list of the unique identifiers (UIs) for all of the structure records (default) retrieved by your search, or for those you have selected with checkboxes.

Search results are displayed in order of decreasing relevance with respect to the query. Many search fields have a score or rank associated with them; for example, the Title and Organism fields have a high rank, while the PdbComment field has a lower rank. The presence of a search term in any one or more of the fields is scored accordingly by the search system, and the total score given to a hit is used in determining its relevance to the query and therefore its placement on the search results page.

Copies all the hits retrieved by your search (default), or those you have selected with check boxes, into a Clipboard, which temporarily stores up to 500 items (they will be lost after 8 hours of inactivity).

Click on the "Clipboard: XX items" link in the upper right corner of the page to view the items in any format for up to 8 hours after your last activity in the database.

The Clipboard will not add an item that is currently in the Clipboard; it will not create duplicate entries. You can remove items from the Clipboard, if desired.

Entrez uses cookies to add your selections to the Clipboard. For you to use this feature, your Web browser must be set to accept cookies.

Items in the Clipboard are represented by the search number #0, which may be used in Boolean search statements. For example, to limit the items you have collected in the Clipboard to those from human, use the following search: #0 AND human[organism]. This does not affect or replace the Clipboard contents.

The Clipboard's "Send to" menu offers you the same "File" and "Collections" options as offered on the original search results page. The latter option saves all items (default), or the subset of items selected with check boxes, indefinitely in the My NCBI Collections section of your My NCBI account.

Saves all the hits retrieved by your search (default), or those you have selected by using their checkboxes, into the My NCBI Collections section of your My NCBI account.

Filter your results

The "Filter your results" area in the upper right corner of a search results page allows you to see all the records (default) retrieved by your search, or subsets of your search results that reflect commonly requested categories of records, and shows the corresponding number of records in each case.

The "NMR" and "X-ray" folder tabs show the number of structures in your search results that were resolved by those experimental methods and enable you to view those subsets of your search results, if desired. The Refine your results box enables you to view other subsets from your search results.

Refine your results

The "Refine Your Results" box that appears in the upper right corner of a search results page displays some aggregate information that characterizes your search results and allows you to view the corresponding subsets of the retrieved structure records, for example:

Families - 3D structures containing at least one protein molecule annotated with a specific hit to a conserved domain, suggesting a high confidence level for the inferred function of the protein. Subsets under this header list the top five conserved domains found as specific hits in the structures retrieved by your search. The number in parentheses represents the subset of structures from your search results that contain one or more protein molecules annotated with a specific hit to the listed domain; clicking on the number will retrieve that subset of structure records. The "All XX Families" link will open a list of all the conserved domain models (in the Conserved Domain Database) that had at least one specific hit to a protein component of any structure found by your search.

Superfamilies - 3D structures containing at least one protein molecule annotated with any type of hit to a conserved domain, inferring that protein's function and therefore its membership in a superfamily. Subsets under this header show the top five conserved domain superfamilies found in the structures retrieved by your search. The number in parentheses represents the subset of structures from your search results that contain one or more protein molecules annotated with the listed superfamily; clicking on the number will retrieve that subset of structure records. The "All XX CDD Superfamilies" link will open a list of all the superfamilies (in the Conserved Domain Database) that were annotated on protein components of the structures found by your search.

Subsets under this header show the five organisms most frequently found in the structures retrieved by your search. The number in parentheses represents the subset of structures from your search results that contain one or more protein molecules from the listed organism; clicking on the number will retrieve that subset of structure records. The "All XX Organisms" link shows the total number of different organisms found in the search results and opens that list of organisms in the NCBI Taxonomy database.

The "Related information" box that appears in the right margin of the display for an individual record allows you to retrieve related data for that particular structure. For example, if you select "Conserved Domains" when you are viewing the record for accession 1PTH, you will retrieve the domain models from the Conserved Domain Database that have been annotated on the protein molecules in that structure. Many of the links are also available on a structure's summary page.

A "Find Related Data" box (instead of a "Related information" box) will appear in the right margin of an Entrez Structure search results page if you retrieved two or more records. The "Find Related Data" box allows you to retrieve related data for all the records retrieved by your search (default), or for the records you have selected with checkboxes.

VAST identifies similar protein 3-dimensional structures by purely geometric criteria, in order to identify distant homologs that cannot be recognized by sequence comparison. The region of similarity can span the entire length of a protein molecule, or a portion of it. If a structure contains more than one protein molecule, similar structures are shown for each one.

VAST+, an expanded version of the program, has also been applied to each structure in MMDB in order to find macromolecular structures that have similarly shaped biological units, also referred to as "biounits".

By default, VAST+ search results are shown for a structure, if/as available. If you prefer to see the original style VAST results, which focus on similarities between individual protein molecules or 3D domains (compact substructures) within the query structure and hits, follow the link for "original VAST" near the upper right corner of the VAST+ search results page.

The "Similar Structures" link appears for individual records, as available. Some structure records do not have any VAST neighbors (read more).

The "Find related data: Structure" menu in the right margin of an MMDB search results page also enables you to retrieve similar structures. By default, that menu acts upon all of the items in your search results (i.e., it will retrieve the full set of structures that are 3D-similar to any/all of the structures listed on your search results page). If you want to retrieve similar structures for only one record, or a subset of records from your search results, then activate the check boxes of the structures of interest before selecting "Find related data: Structure."

Nucleotide sequences that comprise the 3D structure (see molecular components). As noted in the data processing section of this document, DNA or RNA sequence data present in 3D structure records are deposited into the Entrez Nucleotide database, and the "Nucleotide" link will retrieve those sequence records.

Protein sequence records that are directly associated with the structure record because at least one of the following is true:

(a) the protein sequence record was created from a 3D structure record (as described in the section on MMDB data processing); or
(b) the accession number of the protein sequence record was listed in the "DBREF" record of the PDB source file (the documentation about PDB file format provides more information about the various "records" (data fields) that are present in PDB source files); or
(c) the protein accession found in the "DBREF" record of the PDB source file is also listed in an Entrez Gene record, and that Gene record also lists other protein accession(s); in such a case, the structure record will link to all of the protein accessions listed in the Entrez Gene record (those proteins will have reciprocal links back to the structure record, and will show a thumbnail image of a corresponding 3D structure in their protein sequence record display); or
(d) the protein is identical in composition, sequence length, and source organism to any of the proteins noted in (b) or (c).

The association between the structure record and a gene record is made in the following way:

When a 3D structure includes one or more protein molecules, it also includes sequence data for each protein molecule.

In addition to that sequence data, the structure record may also include a cross-reference to other protein sequences (often Swiss-Prot) by listing their accession number in the "DBREF" record of the PDB source file. (The documentation about PDB file format provides more information about the various "records" (data fields) that are present in PDB source files.)

If the protein accession from the "DBREF" record is also listed in an Entrez Gene record, a link is created between the structure record and the Gene record.
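The association steps above can be sketched as follows. This is not NCBI's pipeline: the function names are hypothetical, the DBREF line is an illustrative example written in the style of a PDB record rather than copied from a source file, and the accession-to-gene lookup table stands in for the Entrez Gene data. The parser splits on whitespace for brevity, whereas the PDB format is column-based.

```python
def dbref_accessions(pdb_text):
    """Collect (chain, database, accession) tuples from DBREF lines.

    Splits on whitespace for brevity; a production parser should slice
    the fixed columns defined by the PDB file format instead.
    """
    refs = []
    for line in pdb_text.splitlines():
        if line.startswith("DBREF "):
            fields = line.split()
            # DBREF, idCode, chain, seqBegin, seqEnd, database, accession, ...
            if len(fields) >= 7:
                refs.append((fields[2], fields[5], fields[6]))
    return refs


def link_to_genes(refs, gene_index):
    """Return gene IDs whose listed protein accessions appear in refs."""
    return sorted({gene_index[acc] for _, _, acc in refs if acc in gene_index})


# Illustrative input (not copied from an actual PDB file).
sample = (
    "HEADER    ILLUSTRATIVE ENTRY\n"
    "DBREF  1TUP A   94   312  UNP    P04637   P53_HUMAN       94    312\n"
)
# Hypothetical lookup table: protein accession -> gene ID.
gene_index = {"P04637": "7157"}

refs = dbref_accessions(sample)
genes = link_to_genes(refs, gene_index)
```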

If the protein in the structure record is the target of a bioassay, or involved in the biological process described in the bioassay experiment, a link between the structure record and the PubChem BioAssay record is established (if the submitter of the bioassay data provided the link to the structure record's protein).

BioSystem

Biosystems that include protein sequences identical to any one of the protein molecules in a structure.

Records from the Online Mendelian Inheritance in Man (OMIM) database that cite one or more of the PubMed records associated with a structure. For example, if an OMIM record cites a particular PubMed ID in its reference list, and that article is one of the cited references in a structure record, a link will be established from the structure to the OMIM record.
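The citation-based rule amounts to a set intersection on PubMed IDs. The sketch below uses placeholder identifiers and a hypothetical function name; it illustrates the rule only and does not reflect how the links are actually computed at NCBI.

```python
def omim_links(structure_pmids, omim_citations):
    """Return OMIM IDs whose reference lists share a PubMed ID with the structure."""
    cited = set(structure_pmids)
    return sorted(
        omim_id
        for omim_id, pmids in omim_citations.items()
        if cited & set(pmids)
    )


# Placeholder identifiers, invented for the demonstration.
structure_pmids = [1001, 1002]
omim_citations = {"omim_A": [1002, 2000], "omim_B": [3000]}
links = omim_links(structure_pmids, omim_citations)
```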

This link retrieves the NCBI Taxonomy database record for the source organism(s) of the protein and/or nucleotide molecules in the structure record. If a structure record contains protein or nucleotide sequences from more than one organism (e.g., human and HIV1), each source organism is listed and links to the corresponding taxonomic information in the NCBI Taxonomy database, including the organism's Taxonomy ID (TaxID) and lineage.

The "Search" box and button in the upper right hand corner of a structure summary page allow you to retrieve a 3D structure record directly from the backend database by entering its unique identifier (UID), in the form of a PDB accession or an MMDB ID. If you would like to search for structures using other methods, such as text term search, protein sequence query, or the 3D coordinates of a resolved structure, you can access those options from the MMDB search methods page.

Structure Record Identifiers

PDB ID: The accession number of the Protein Data Bank (PDB) record from which the MMDB record was derived. It is generally an alphanumeric combination (e.g., 1PTH, which served as the source record for MMDB ID 50885). The PDB ID on the structure summary page links to the source record on the PDB web site. If two or more PDB IDs are listed on a structure summary page, that indicates the MMDB record has been merged from PDB split files. By merging the files, MMDB enables you to view and/or save the complete structure, as shown in the illustrated example of the ribosome.

The "Download" button beside the PDB ID downloads the original PDB source file from which the MMDB record was derived. That file contains data for the asymmetric unit of the structure.

Note that the "Download Structure Data" section of a structure summary page also provides a "PDB" option in the "Format" pulldown menu. That option does not save a copy of the PDB source file, but instead saves a copy of the structure record, in PDB file format, after it has undergone MMDB data processing.

MMDB ID: The unique identifier of the structure record in the Molecular Modeling Database (MMDB). It is a string of digits (e.g., 50885 for sheep prostaglandin H2 synthase) that are assigned consecutively to each structure record processed by NCBI. (This is also referred to as the structure's unique identifier, or UID.)

Note: The MMDB data processing pipeline will assign a new MMDB ID number to a structure if the 3D coordinates and/or sequence data in the corresponding PDB source file have changed as a result of updates to the structure record. (The PDB ID will remain the same, however.)

For example, if the atoms in a previously available PDB data file were re-ordered during a PDB remediation (e.g., September 2007 or March 2009 remediations), the PDB accession will remain the same but it will receive a new MMDB ID and a new MMDBEntryDate.
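When handling identifiers in a script, it can help to tell the two kinds apart before looking a record up. The sketch below is a heuristic under stated assumptions: the four-character pattern beginning with a digit is the classic PDB ID convention rather than something defined in this document, an all-digit string is treated as an MMDB ID even if it is four characters long, and the function name is hypothetical.

```python
import re


def classify_identifier(text):
    """Guess whether text looks like an MMDB ID or a PDB ID."""
    value = text.strip().upper()
    if value.isdigit():
        return ("mmdb_id", value)  # string of digits, e.g. 50885
    if re.fullmatch(r"[0-9][A-Z0-9]{3}", value):
        return ("pdb_id", value)   # four characters, e.g. 1PTH
    return ("unknown", value)


kinds = [classify_identifier(s) for s in ("50885", "1pth", "p53")]
```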

Descriptive Information

Title: The title of the structure record, derived from the TITLE field of the PDB source file. It may or may not be the same as the title of the citation.

Citation: The primary journal article that describes the structure. The article title opens the corresponding PubMed record. If additional references about the structure are available, an "All References" link will be present and will retrieve the primary as well as additional references from PubMed. Reference information will be absent from summary pages of structures that do not have any corresponding publications.

PDB Deposition Date: The date on which the record was deposited into the Protein Data Bank. It is extracted from the HEADER record of the PDB source file and is searchable in the PDBDepositDate field of MMDB. Note that this is not necessarily the date on which the record became publicly available, and may be significantly different from the release date if submitters requested their data remain confidential until publication.

Updated in MMDB: The date on which the record was last modified. This may reflect the date on which a new version of the PDB source record was imported into MMDB, or the date on which changes were made to MMDB's version of the record as a result of enhancements to NCBI data processing procedures, and is searchable in the MMDBModifyDate field of MMDB.

Note: You can use the MMDBModifyDate search field of MMDB to retrieve records that were modified on a given date or between a range of dates.
If no modifications were made since the record was deposited into MMDB, then MMDBModifyDate will be the same as the MMDBEntryDate.
If PDB undergoes a database remediation, in which most or all PDB records are updated in some way, many or all records will share the same update date. (more...)
If the 3D coordinates and/or sequence data in a PDB source file change as a result of updates to the structure record, the MMDB data processing pipeline will assign a new MMDB ID number to that record, although the PDB ID remains the same.
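A date or date-range query on the MMDBModifyDate field can be assembled programmatically. The sketch below assumes the general Entrez conventions of YYYY/MM/DD dates and a colon between the two ends of a range; the function name is hypothetical.

```python
from datetime import date


def modify_date_query(start, end=None):
    """Build an MMDBModifyDate query for one date or an inclusive range."""
    fmt = "%Y/%m/%d"
    if end is None or end == start:
        return f"{start.strftime(fmt)}[MMDBModifyDate]"
    if end < start:
        raise ValueError("end date precedes start date")
    return f"{start.strftime(fmt)}:{end.strftime(fmt)}[MMDBModifyDate]"


single_day = modify_date_query(date(2009, 3, 1))
march_2009 = modify_date_query(date(2009, 3, 1), date(2009, 3, 31))
```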

Source Organism: The source organism(s) of the protein and/or nucleotide molecules in the structure record. If a structure record contains protein or nucleotide sequences from more than one organism (e.g., human and HIV1), each source organism is listed and links to the corresponding taxonomic information in the NCBI Taxonomy database, including the organism's Taxonomy ID (TaxID) and lineage.

Resolution: The resolution of the structure in Angstroms (Å), extracted from the REMARK 2 record of the PDB source file. The PDB website provides additional information about resolution.

Experimental Method: The experimental method that was used to resolve the structure, extracted from the "EXPDTA" record of the PDB source file.

Similar Structures: VAST+: The "Similar Structures: VAST+" link near the upper right hand corner of a structure summary page allows you to retrieve the structures that are similar in 3D shape to the one currently being viewed.

The similar structures were found by the Vector Alignment Search Tool (VAST), which identifies structures that are similar in 3D shape, using purely geometric criteria, regardless of their degree of sequence similarity. In this way, VAST can identify distant homologs that cannot be recognized by sequence comparison.

Alternatively, original VAST similar structures can be retrieved from the structure summary page by scrolling down to the table of molecules and interactions, viewing the "show annotation" graphic for a protein of interest, and then clicking on the bar graphic for the overall protein molecule or for any 3D domain it contains in order to view a list of structures that are similar in shape to the molecule or 3D domain you selected.

(Note: if you have a new structure that is not yet publicly available in MMDB, you can use the VAST Search page to input the coordinates of that newly resolved structure in PDB file format, and compare it against all structures in MMDB to find its neighbors.)

Default Biological Unit: This option is selected by default and displays the first author-determined biological unit that is listed in the PDB source file. If a PDB source file lists only software-determined biounits, then the first one listed is displayed as the default biounit. Additional information about the identification of biological units is provided in the data processing section of this document.
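The selection rule for the default biological unit can be expressed in a few lines. The record layout (a list of dictionaries with a "determined_by" key) and the function name are assumptions for this sketch; only the rule itself comes from the description above.

```python
def default_biological_unit(biounits):
    """Pick the first author-determined unit, else the first listed unit."""
    for unit in biounits:
        if unit["determined_by"] == "author":
            return unit
    # Only software-determined units (or none at all) are listed.
    return biounits[0] if biounits else None


# Hypothetical biounit lists, in the order given by the source file.
mixed = [
    {"id": 1, "determined_by": "software"},
    {"id": 2, "determined_by": "author"},
    {"id": 3, "determined_by": "author"},
]
software_only = [
    {"id": 1, "determined_by": "software"},
    {"id": 2, "determined_by": "software"},
]
chosen_mixed = default_biological_unit(mixed)
chosen_software = default_biological_unit(software_only)
```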

Asymmetric Unit: This option displays the data that were provided by the submitter of the record. These data are often casually referred to as the asymmetric unit and can represent either: (a) the complete biological unit; (b) a portion of the biological unit; or (c) multiple copies of the biological unit, as shown in the illustrated example of three different human hemoglobin structure records.
(Note: "Asymmetric unit" is the only display option for merged PDB split files from crystallographic studies.)

When you use the options to "Download Structure Data," they will act upon the biological unit(s) or asymmetric unit currently displayed in the browser window. For example, if you are viewing the default biological unit and choose to display the 3D structure, only that biological unit will be shown in the 3D structure viewer, regardless of how many copies of the biological unit exist in the raw data that were deposited by the submitter. To see the raw data, change the display to "asymmetric unit" before selecting the desired "View or Save 3D Structure" options.

Note: The asymmetric unit is equivalent to the biological unit in approximately 60% of structure records resolved by x-ray crystallography or neutron diffraction of crystals. In such cases, all three of the above displays will be the same (i.e., default biological unit = all biological units = asymmetric unit). In the remaining 40% of the records, the asymmetric unit represents a portion of the biological unit that can be reconstructed using crystallographic symmetry, or it represents multiple copies of the biological unit. In those cases, the biological unit displays will be different from the asymmetric unit display.

If you are viewing a structure resolved by an experimental method other than x-ray crystallography or neutron diffraction of a crystal, the above display options will not be present, as the concepts of asymmetric unit and biological unit do not apply to structures resolved by other methods.

Finally, the "biological unit" display option is not available for merged PDB split files from crystallographic studies, because the biological unit of the complete structure is not specified in a computer readable way in the PDB source files. The structure summary page for a merged crystallographic structure therefore simply uses the label of "asymmetric unit" above the molecular graphic, as it represents the unification of raw data from the original PDB files. The asymmetric unit can represent the structure's complete biological unit, a portion of the biological unit, or multiple copies of the biological unit. In the case of structures resolved by electron microscopy (EM) or nuclear magnetic resonance (NMR), the term "asymmetric unit" does not apply, and the term "biological unit" is shown instead on the summary page for a merged structure from either of those technologies. In such cases, please refer to the corresponding publication, if/as available, for the author's description of the structure's biologically active form.

provides a type classification based on a comparison of the biological units identified in the structure record, if the record contains multiple biological units. If two or more biological units meet a threshold for sequence and structural similarity, they will receive the same type code; if they do not meet that threshold, they are considered distinct from each other and receive different type codes.

indicates the oligomeric state (dimer, trimer, tetramer, etc.) and the method by which it was determined

STATIC IMAGE: Upon first opening a structure summary page, the molecular graphic shows a static image of the 3D structure.
The static image generally shows the default biological unit of the structure.

Click the spin icon in the lower left corner of the static image to load an interactive view that uses iCn3D ("I see in 3D"), NCBI's WebGL-based 3D structure viewer.

The interactive display will load only if your web browser supports WebGL. If it doesn't, the static image will be shown instead. To see the interactive view, modify the settings in your web browser to enable WebGL, or, if needed, update your web browser to a newer version that supports WebGL. (See the WebGL site for more information about compatibility with various web browsers.)

Click an icon in the corresponding interactions schematic to highlight that molecule in both the schematic and the molecular graphic.

Right click on the structure to open a menu that allows you to control various aspects of the display (background color, display solvent accessible surface) and/or to "export image."

If you select "Export Image" from the menu, the molecular graphic will open in a separate window. From there, you can:

Right click on the exported image to use the browser's "save image as" function.

Reload the MMDB summary page to reveal the spin icon again, then repeat the process as many times as desired in order to save snapshots of the structure at the desired angles.

Each time you select "Export Image," a separate window will open, making it possible to view the structure from many angles simultaneously.

Click the icon to open a larger view of the structure, which will occupy 95% of the browser window's height or width, whichever is larger.

LAUNCH FULL iCn3D: Click the icon to launch the advanced (full feature) version of iCn3D in another window.

The full feature version provides many additional controls for rendering, labeling, coloring, and saving the structure, as well as viewing corresponding sequence data.

Note that iCn3D will launch only if your web browser supports WebGL. If it doesn't, modify the settings in your web browser to enable WebGL, or, if needed, update your web browser to a newer version that supports WebGL. (See the WebGL site for more information about compatibility with various web browsers.)

Additional structure viewing options: The "Download Structure Data" dialog box (that appears to the right of the molecular graphic on the structure summary page) provides options for downloading the structure data in a variety of file formats. The ASN.1 format, for example, can be interactively viewed in Cn3D, NCBI's free 3D structure viewing application, which provides a wide range of options for rendering, labeling, coloring, annotating, viewing sequence data, and more. Installation takes only a couple of minutes and a tutorial describes the program's features and functions.

The interactions schematic shows the molecular components of the biological unit and the interactions among them. (Note: Structures that only have alpha carbons, and no side chains, do not show interactions. In those cases, the schematic just shows the structure's molecular components (proteins, nucleotide sequences, and ligands) as free floating (disconnected) icons.)

Non-standard biopolymers, if present, are shown as parallelograms:
(These are molecules such as nucleotide or protein sequences that contain a large percentage of non-standard residues.)

If any protein or nucleotide molecules in the structure were generated by applying transformations from crystallographic symmetry, their labels are shown as alphanumeric combinations containing an underscore bar, indicating the source molecule from which they were generated (to the left of the underscore bar) and the copy number (to the right of the underscore bar). Chemicals that interact only with such molecules were also generated by applying transformations from crystallographic symmetry; their icon labels also include an underscore bar, with a number on either side of the underscore bar to indicate the source chemical and the copy number, respectively.
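A label of this kind can be split into its two parts in a script. The label strings used below (such as "A_2") are hypothetical examples of the "source, underscore, copy number" pattern described above, and the function name is an assumption for this sketch.

```python
def parse_symmetry_label(label):
    """Split 'source_copy' into (source, copy number); None if it does not fit."""
    source, sep, copy = label.partition("_")
    if not sep or not source or not copy.isdigit():
        return None  # no underscore, empty source, or non-numeric copy number
    return source, int(copy)


protein_copy = parse_symmetry_label("A_2")   # hypothetical protein label
chemical_copy = parse_symmetry_label("3_1")  # hypothetical chemical label
original = parse_symmetry_label("A")         # label without an underscore
```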

The protein and nucleotide icons are scaled to show the relative sizes of those molecular components, so they are roughly comparable to each other based on molecular weight. All chemical icons are the same size.

There is no meaning to the length of the lines in the interaction schematic. After the interactions are drawn, the diagram is flattened out to fit into the square, lengthening or shortening lines as needed.

Because of the latter thresholds, ions that are part of the biological unit may be missing from the interaction diagram, but they will be listed in the table of molecular components and interactions. Interactions for short peptides, or for molecule types other than protein, DNA/RNA, and chemical, are not calculated. Molecules, such as crystallization agents, etc., that are not part of the biologically active molecule are absent from both the interaction schematic and the molecular components list.

The actions taken by the interactions schematic when you click on a node depend on whether the static or interactive version of the molecular graphic is displayed on the page.

If the static molecular graphic is displayed:

Mouse over any node in the interactions schematic to view the molecule name.

Click on a node in the interaction schematic to jump down to the corresponding part of the Molecules and Interactions table, which provides additional information about the molecule.

If the interactive molecular graphic is displayed:

Each node in the interaction schematic works as a toggle switch to highlight a molecule on/off.

Click on a node in the schematic to highlight just that molecule in the 3D structure (and render all the other molecules in grey).

Click on that molecule again in the interaction schematic to un-highlight the molecule and revert to the previous view of the 3D structure.

To highlight more than one molecule, press the Control key (on a PC) or the Command key (on a Mac) while you click on the molecules of interest.

To highlight all molecules of the same type, click on the term "protein," "nucleotide," or "chemical" that appears at the bottom of the interactions schematic. (Click on the term again to toggle the highlight off, if desired.)

Displays the detailed model, showing the coordinates of each atom in the structure. This option, which is the default, transmits a large amount of structure data and it may therefore take some time to load the structures.

All 3D Structures

This option is available only when the Cn3D file format is selected. It displays all members of NMR ensembles or correlated disorder sets from crystallography. You can also see movie-like animations of multiple models with Cn3D.

Alpha Carbons

Displays only alpha-carbon (protein) or phosphate (DNA) coordinates for simple representation of protein or nucleic acid backbones, respectively. This option transmits only a subset of the data points from a structure record and therefore loads relatively quickly. This option is selected by default for structures with >25,000 atoms. If you are viewing the structure summary page for an NMR ensemble or a correlated disorder set from crystallography, this option will download backbone data only for the first model in the set.
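To make the idea of a backbone-only subset concrete, the sketch below filters PDB-format ATOM records down to alpha-carbon (CA) and phosphorus (P) atoms, using the fixed column positions of the PDB ATOM record. It illustrates the concept only and is not the method MMDB uses; the function names are hypothetical and the coordinates in the demonstration are made up.

```python
def backbone_trace(pdb_lines):
    """Return (atom name, chain, x, y, z) for CA and P atoms in ATOM records."""
    trace = []
    for line in pdb_lines:
        if not line.startswith("ATOM  "):
            continue  # skip HETATM and all other record types
        name = line[12:16].strip()  # atom name, columns 13-16
        if name in ("CA", "P"):
            # x, y, z occupy columns 31-38, 39-46, and 47-54
            x, y, z = (float(line[i:i + 8]) for i in (30, 38, 46))
            trace.append((name, line[21], x, y, z))  # chain ID is column 22
    return trace


def atom_line(serial, name, res, chain, resseq, x, y, z):
    """Format a minimal ATOM record for the demo (coordinates are made up)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{resseq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


demo = [
    atom_line(1, " N  ", "MET", "A", 1, 1.0, 2.0, 3.0),
    atom_line(2, " CA ", "MET", "A", 1, 1.5, 2.5, 3.5),
    atom_line(3, " P  ", " DG", "B", 1, 4.0, 5.0, 6.0),
]
trace = backbone_trace(demo)
```

Only two of the three demonstration atoms survive the filter, which is why a backbone-only download is much smaller than the full atomic model.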

ASN.1 Format: To save the structure's data file in ASN.1 format, an International Standards Organization (ISO) data format that is viewable in the free Cn3D program, select the following combination of options (on the web interface, or through the Web API):

Details about the data that are saved:
(1) For X-ray crystallography or neutron diffraction of crystal structures: (a) If you have chosen to display the "first biological unit" or "all biological units" on the structure summary page, the "Download" operation will save the data for the specific biological unit displayed in the molecular graphic. The saved file will include sequence and spatial coordinate data that were present in the original PDB source file as well as data that were generated at NCBI by applying transformations from crystallographic symmetry, if applicable to that biological unit. (b) If you have selected the "asymmetric unit" display option, the "Download" operation will save the data that were present in the PDB source file, whether those data represented all, part, or multiple copies of a biological unit. The saved file will not include any data generated at NCBI by applying transformations from crystallographic symmetry.
(2) For structures resolved by experimental methods other than X-ray crystallography or neutron diffraction of crystal structures, the "Download" operation will save the data that were provided by the author in the PDB source file. The concepts of asymmetric unit, biological units, and crystallographic symmetry do not apply to these structures.
Note for both (1) and (2) above: The saved file may also include some modifications (relative to the original PDB source file) that occurred as a standard part of MMDB data processing. Some examples are provided below in the notes about PDB format.

PDB Format: To save the structure's data file in PDB format, which is viewable in Rasmol or other programs that accept PDB format, select the following combination of options (on the web interface, or through the Web API):

Details about the data that are saved: The PDB-formatted file that is downloaded when you select "Format: PDB" (in the "Download Structure Data" section of a structure summary page) has undergone content validation that is a standard part of data processing. Its content may therefore be somewhat different from that of the original PDB record. For example, some PDB records may have discontinuous residue numbers, which exist in a free text field. MMDB assigns a consecutive series of positive integers to residues in biopolymers, using a numerical data field. In addition, MMDB resolves some discrepancies that might exist between the SEQRES records and the atomic coordinates. For example, if the structure's atomic coordinates reveal the presence of amino acids or nucleotides that are not listed in the SEQRES records of an original PDB file, MMDB will derive the biopolymer sequence from the atomic coordinates and not from the original SEQRES records. The derived biopolymer sequence will then appear in the MMDB record, and in the SEQRES records of the PDB-formatted file saved from the MMDB database. As a third example, the spans of secondary structures annotated on proteins might vary between PDB and MMDB records, as NCBI algorithmically identifies alpha helices and beta strands using purely geometric criteria and annotates the proteins using that information rather than the spans indicated in the original PDB file. Therefore, the content of a PDB-formatted record you save from an MMDB structure summary page may be different from the content of the original PDB file.
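The residue renumbering described above can be pictured as a simple mapping from the author's residue labels to consecutive positive integers. The sketch below is an illustration under stated assumptions, not MMDB's implementation: the labels are invented, the function name is hypothetical, and each label is assumed to occur once within a chain.

```python
def renumber_residues(author_labels):
    """Map each author residue label, in order, to 1, 2, 3, ..."""
    return {label: index for index, label in enumerate(author_labels, start=1)}


# Invented labels with a negative number, a gap (2 -> 5), and an insertion code.
labels = ["-1", "0", "1", "2", "5", "5A", "6"]
mapping = renumber_residues(labels)
```

In this example the gap between author residues 2 and 5 disappears in the new numbering, and the insertion-code residue 5A receives its own integer.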

Click the spin icon in the lower left corner of the image to load an interactive view of the structure. (The interactive view uses a basic version of iCn3D, NCBI's web-based 3D structure viewer, and requires a web browser that supports WebGL.)

Once the structure spins to the desired position, click on the structure to stop the spin.

Right click to open a menu that allows you to control various aspects of the display and/or to "export image."

Select "Export Image" to open the view in a separate window.

Right click on the exported image to use the browser's "save image as" function.

Reload the page to reveal the spin icon again, then repeat the process as many times as desired in order to save snapshots of the structure at the desired angles.

Each time you select "Export Image," a separate window will open, making it possible to view the structure from many angles simultaneously.

To customize the rendering style of the structure, highlight selected regions of the structure and/or corresponding sequence data, add labels, etc., and then save the state of the structure so you can reload it in the advanced (full feature) version of iCn3D in the future:

Click the spin icon in the lower left corner of the static image to load an interactive view of the structure. (The interactive view uses iCn3D, NCBI's web-based 3D structure viewer, and requires a web browser that supports WebGL.)

Click the icon to launch the full feature version of iCn3D in another window.

Use the various menu options to render the structure with the desired style, color, labels, viewing angle, etc.

Select File/Export State to save the state of the structure in a file (which is named "statefile" by default, unless you select the "Save As" option in your browser; the saved file will be in *.txt format). You can later open the statefile through the iCn3D "File/Open State" menu option.

Alternatively, if you prefer to save just an image of the structure, select File/Export Image to save the image in a file (which is named "canvas" by default, unless you select the "Save As" option in your browser; the saved file will be in *.png format).

To render and save images using the wide range of controls that are available in NCBI's free standalone Cn3D structure-viewing program:

Open the file in any 3D structure viewer (e.g., Rasmol) that reads PDB file format.

Render, label, color, and save the structure as desired, according to the instructions provided by the structure viewing program's help documentation.

Save structure components

The sequence and/or chemical records for the molecular components of a structure can be retrieved by: (a) following the link for each component displayed in the tabular summary at the bottom of a structure record to its corresponding record in the Entrez Protein, Nucleotide, and/or PubChem database, or (b) selecting the appropriate items from the "Links" pop-up menus on the search results (docsum) page for the structure.

Once you are viewing the components in the relevant Entrez database, you can display and/or save those records in any format that is available for that database. For example, records from the Entrez Protein database can be saved in FASTA format (which is convenient for sequence analysis), as a list of GI numbers, or in other formats such as GenPept (which contains sequence data plus annotations, similar to GenBank format). The Entrez help document provides additional information about sequence database record formats. The Entrez Gene help and PubChem help documents describe record formats for genes and small molecules, respectively.

The table near the bottom of a structure summary page lists the molecular components of the structure, which may include proteins, nucleotide sequences (DNA, RNA), and chemicals. The graphics and other links in the table open more detailed displays. For example, mouse over any icon in the graphic display on a live structure summary page (e.g., 1PTH) for more information about that component or feature annotation.

If any protein or nucleotide molecules in the structure were generated by applying transformations from crystallographic symmetry, their labels are shown as alphanumeric combinations that indicate the source molecule from which they were generated and the copy number.
Chemicals that interact only with such molecules were also generated by applying transformations from crystallographic symmetry; their labels include an underscore bar, with a number on either side of the underscore bar to indicate the source chemical and the copy number, respectively.
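As a rough illustration of the underscore convention for chemical labels, the sketch below splits a label into its source and copy number. The label strings used here (such as "1_2") are hypothetical examples, and the function is an assumption for illustration rather than part of any NCBI tool.

```python
def parse_symmetry_label(label):
    """Split a label such as "1_2" into (source, copy_number).

    Returns (label, None) when the label has no underscore, which this
    sketch takes to mean the item came directly from the PDB source file.
    """
    source, separator, copy_number = label.partition("_")
    if not separator:
        return label, None
    return source, int(copy_number)


print(parse_symmetry_label("1_2"))  # source chemical "1", copy number 2
print(parse_symmetry_label("3"))    # no symmetry copy
```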

If you are viewing the structure's biological unit, the count reflects the number of molecules that were present in the PDB source file plus any copies that were generated by applying transformations from crystallographic symmetry.

If you are viewing the structure's asymmetric unit, the count reflects only the number of molecules that were present in the PDB source file.

The name or other descriptive identifier of the molecule:

Protein names are derived from the COMPND record of the PDB source file.

LABEL: Labels for protein molecules are derived from their single letter chain codes in the PDB source file, and are shown as circle icons in the interaction schematic.
Labels for protein molecules that were generated at NCBI by applying transformations from crystallographic symmetry are shown as an alphanumeric combination that indicates the source chain from which they were generated and the copy number.

COUNT: If you are viewing the structure's biological unit, the count reflects the number of protein molecules that were present in the PDB source file plus any copies that were generated by applying transformations from crystallographic symmetry. If you are viewing the structure's asymmetric unit, the count reflects only the number of molecules that were present in the PDB source file.

MOLECULE: The name of the protein, derived from the COMPND record of the PDB source file. If a particular protein name has been applied to multiple molecules (e.g., PDB chains A, B, etc.) within the PDB source file, those molecules are considered to be the same. A non-redundant list of protein molecules is then displayed, with the "count" column indicating the number of instances of each protein molecule in the structure's biological unit or asymmetric unit, depending on what you are viewing in the current display. Each protein molecule is represented with a sequence graph and annotated with features such as 3D domains and domain families, as described below.

GENE: A gene symbol, if/as available, appears beside the name of each protein molecule in the tabular list of molecular components. The protein-gene association is determined in the following way:
(1) The source database, PDB, provides a UniProt ID for each protein chain in a structure record.
(2) The NCBI Gene database generates data files on its FTP site that provide mappings between protein identifiers and gene identifiers. Specifically: (a) the "gene_refseq_uniprotkb_collab.gz" file lists the correspondence between UniProt and RefSeq protein accessions; and (b) the "gene2accession.gz" file lists the correspondence between RefSeq protein accessions and Gene IDs. The MMDB data processing pipeline creates a join between these two tables in order to map each UniProt ID to its corresponding Gene ID, and to link to the NCBI Gene record. (Note that the protein sequence in the structure record is not necessarily identical to the protein product of the gene. For example, a structure record might contain only a fragment of the protein rather than the whole protein. So there is a mapping between the structure's protein molecule and the gene product, but not necessarily an exact sequence match.)
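The two-step mapping described above amounts to a join on the RefSeq protein accession. The following is a minimal sketch of that idea, not the MMDB pipeline itself. The rows are placeholder values, and the two-column layout is a simplifying assumption: the real files contain additional columns and real accessions.

```python
# Placeholder rows standing in for gene_refseq_uniprotkb_collab: (UniProt ID, RefSeq protein accession).
uniprot_to_refseq = [
    ("UNIPROT_X1", "REFSEQ_P1"),
    ("UNIPROT_X2", "REFSEQ_P2"),
]

# Placeholder rows standing in for gene2accession: (RefSeq protein accession, Gene ID).
refseq_to_gene = [
    ("REFSEQ_P1", 1001),
    ("REFSEQ_P2", 1002),
    ("REFSEQ_P3", 1003),
]


def map_uniprot_to_gene(uniprot_refseq_rows, refseq_gene_rows):
    """Join the two tables on the RefSeq protein accession."""
    gene_by_refseq = dict(refseq_gene_rows)
    mapping = {}
    for uniprot_id, refseq_accession in uniprot_refseq_rows:
        gene_id = gene_by_refseq.get(refseq_accession)
        if gene_id is not None:
            # A set allows for a UniProt ID that maps to more than one Gene ID.
            mapping.setdefault(uniprot_id, set()).add(gene_id)
    return mapping


uniprot_gene = map_uniprot_to_gene(uniprot_to_refseq, refseq_to_gene)
print(uniprot_gene)
```

A UniProt ID with no matching RefSeq accession is simply left out of the result, which corresponds to a protein molecule that shows no gene symbol.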

Protein annotation graphic

Sequence graph

The sequence bar graph for each protein molecule in the molecular components table shows the protein's length in amino acids. Beneath that is an interactive graphic of the geometrical and biological features annotated on the protein, such as 3D domains and domain families (protein classifications), respectively. For example, the illustration below shows the features annotated on the Prostaglandin H2 Synthase 1 protein, which is a component of the prostaglandin H2 synthase structure (1PTH) from sheep. Click on the image to open the live MMDB structure summary record for 1PTH, which in turn includes a live, clickable protein annotation graphic:

3D Domains

3D domains are compact structural units within a protein that are identified automatically in MMDB using purely geometric criteria. A protein molecule can contain one or more 3D domains, which often correspond with conserved domains (illustrated example) observed in molecular evolution. Additionally, proteins that are dissimilar in sequence might contain geometrically similar 3D domains, indicating a distant homology that cannot be recognized by sequence comparison. 3D domains are used in the identification of VAST Similar Structures.

The colored bars in the "3D Domains" line in a protein molecule's sequence graph indicate the 3D domain boundaries. Click on the bar for any 3D domain in the "show annotation" display to retrieve similar structures identified by the VAST algorithm.

Note that a protein molecule can contain one or more 3D domains. A 3D domain may be composed of a single region of protein sequence, or two or more non-contiguous regions of the protein sequence.

If no compact substructures have been found within a protein molecule, then the overall molecule is regarded as a 3D domain in its own right. In that case, the "3D Domains" line does not appear in the "show annotation graphic" and you can click on the sequence bar itself to retrieve similar structures identified by the VAST algorithm. That will retrieve other structures that are similar in 3D shape to the overall protein molecule.

(3D domains can also be seen in the interactive 3D structure view by displaying the structure in the free Cn3D structure visualization program and selecting the "Color by domain" option.)

Domain Families (Protein classification)

The "Domain Families" text link in a protein molecule's sequence graph opens the CD-Search results for that protein sequence, showing the conserved domains found in the protein, from which protein function can be inferred. These are the results of an RPS-BLAST search of the protein molecule against the Conserved Domain Database.

Mouse over the cartoon representing a conserved domain for brief information about it, and click on the cartoon to open the corresponding, detailed record in the Conserved Domain Database. More details about each type of conserved domain hit are below:

Specific Hits

A Specific Hit meets or exceeds a domain-specific e-value threshold and represents a very high confidence that the query sequence belongs to the same protein family as the sequences used to create the domain model. Therefore, there is also a high confidence level for the inferred function of the protein query sequence. (Details and illustrations are provided in the Conserved Domain Database help document.)

Superfamilies

A Superfamily is the domain cluster to which the specific and/or non-specific hits belong. This is a set of conserved domain models that generate overlapping annotation on the same protein sequences and are assumed to represent evolutionarily related domains. See additional details, including information about clustering methodology, in the CDD help document section on "What is a superfamily?"

LABEL:
Labels for nucleotide molecules are derived from their single letter chain codes (e.g., C, D) in the PDB source file. They are shown as square icons in the interaction schematic.
Labels for nucleotide sequences that were generated at NCBI by applying transformations from crystallographic symmetry are shown as an alphanumeric combination that indicates the source chain from which they were generated and the copy number.

COUNT: If you are viewing the structure's biological unit, the count reflects the number of nucleotide molecules that were present in the PDB source file plus any copies that were generated by applying transformations from crystallographic symmetry. If you are viewing the structure's asymmetric unit, the count reflects only the number of molecules that were present in the PDB source file.

MOLECULE: The name of the nucleotide sequence, derived from the COMPND record of the PDB source file, with the "count" column indicating the number of instances of each molecule in the structure's biological unit or asymmetric unit, depending on what you are viewing in the current display.

Bar graph for each nucleotide molecule:

Sequence graph

The bar graph shown for each nucleotide sequence molecule in a structure record shows the molecule's length in nucleotides. Follow the "N Nucleotide" text link (that appears to the left of the molecule's bar graph) to open the corresponding sequence record in the Entrez Nucleotide database.

Chemicals

LABEL: If chemicals are present in the structure, they are shown as diamond-shaped icons in the interaction schematic and labeled with integers. If several chemicals have the same molecule name, they are labeled with the same number.
If a chemical interacts only with a protein or nucleotide molecule that was generated by applying transformations from crystallographic symmetry, then the chemical was also generated by crystallographic symmetry. Icon labels for chemicals generated by crystallographic symmetry include an underscore bar, with a number on either side of the underscore bar to indicate the source chemical and the copy number, respectively.

COUNT: If you are viewing the structure's biological unit, the count reflects the number of chemicals that were present in the PDB source file plus any copies that were generated by applying transformations from crystallographic symmetry. If you are viewing the structure's asymmetric unit, the count reflects only the number of chemicals that were present in the PDB source file.

MOLECULE: The name of the chemical, derived from the HETNAM record of the PDB source file or from the MeSH terms associated with the corresponding PubChem Compound or Substance record. In order to provide a non-redundant list of chemicals found in the structure, the name of each unique chemical is listed only once. If two or more non-biopolymers were assigned the same HETNAM by PDB, they are grouped together under that name in the molecular components table. If their chemical structures are slightly different, they will be linked to separate PubChem substance IDs (SIDs). The "count" column indicates the number of instances of each chemical in the structure's biological unit or asymmetric unit, reflecting what you are viewing in the current display.
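The grouping behind the non-redundant list and its "count" column can be sketched as a simple tally by name. This is an illustration only, with hypothetical chemical names standing in for HETNAM values; it does not model the PubChem links or the choice between biological unit and asymmetric unit.

```python
from collections import Counter

# Hypothetical chemical instances in one display (one entry per instance).
chemical_names = [
    "EXAMPLE LIGAND A",
    "EXAMPLE ION B",
    "EXAMPLE LIGAND A",
    "EXAMPLE ION B",
    "EXAMPLE ION B",
]


def summarize_chemicals(names):
    """Collapse instances that share a name into one (name, count) row each."""
    counts = Counter(names)
    # Counter keeps first-seen order, so rows follow the order of the input.
    return list(counts.items())


rows = summarize_chemicals(chemical_names)
for name, count in rows:
    print(f"{count} x {name}")
```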

Note: Ions that interact with the biomolecules in the structure but do not reach the 5 contact threshold will be absent from the interaction schematic; however, they will be listed in the tabular summary of molecular components. Interactions for short peptides, or for molecule types other than protein, DNA/RNA, and chemical, are not calculated. Molecules, such as crystallization agents, etc., that are not part of the biologically active molecule are absent from both the interaction schematic and the molecular components list.

Thumbnail image for each chemical:

Thumbnail graphic

The thumbnail graphic for each chemical links to corresponding information about the physicochemical and biological properties of that chemical in the PubChem Compound or PubChem Substance database.

Non-standard Biopolymers

Non-standard biopolymers are molecules such as nucleotide or protein sequences that contain a large percentage of non-standard residues. As an example, view the MMDB summary page for 4GLS, "Crystal Structure of Chemically Synthesized Heterochiral {D-Protein Antagonist plus VEGF-A} Protein Complex in space group P21."

LABEL: If non-standard biopolymers are present in the structure, they are shown as parallelograms in the interaction schematic and labeled with letters.
Labels for non-standard biopolymers that were generated at NCBI by applying transformations from crystallographic symmetry are shown as an alphanumeric combination, indicating the source molecule from which they were generated (to the left of the underscore bar) and the copy number (to the right of the underscore bar).

COUNT: If you are viewing the structure's biological unit, the count reflects the number of non-standard biopolymers that were present in the PDB source file plus any copies that were generated by applying transformations from crystallographic symmetry. If you are viewing the structure's asymmetric unit, the count reflects only the number of non-standard biopolymers that were present in the PDB source file.

MOLECULE: The name of the non-standard biopolymer, derived from the COMPND record of the PDB source file. In order to provide a non-redundant list of non-standard biopolymers found in the structure, the name of each unique non-standard biopolymer is listed only once. If two or more non-standard biopolymers were assigned the same COMPND by PDB, they are grouped together under that name in the molecular components table. The "count" column indicates the number of instances of each non-standard biopolymer in the structure's biological unit or asymmetric unit, reflecting what you are viewing in the current display.

Web API

Web API: URL format for displaying or saving a structure record:

It is possible to view or save a 3D structure record by linking directly to it. The URL format, parameters, and allowable values, are as follows:

To save an exact copy of the original PDB source file, use the parameters "fileformat=pdb" and "complexity=4" together. In that case, the "buidx" argument is ignored. For any other "complexity" input value, the CGI creates an NCBI-style PDB-formatted data set with "complexity=3" (all atoms) only, using whatever "buidx" value you specify.

xml = This option renders the data in XML format.
If you specify this fileformat, the only display options available are "1" (save data to a file) and "2" (see data in web browser, which is the default display for XML format).

json = This option renders the data in JSON format.
If you specify this fileformat, the only display options available are "1" (save data to a file) and "2" (see data in web browser, which is the default display for JSON format).

Default: If the "fileformat" parameter is not included in the URL, the cn3d (ASN.1) file format will be returned by default. A separate section of this document provides additional details about file formats.

display

Specify what you would like the browser to do with the file. The allowable values are:

0 = launch the structure viewer, automatically opening the file in that program so you can view the structure interactively.

Note that the structure viewer you will use (e.g., NCBI's free Cn3D program or a PDB-format compatible viewer) must be installed on your computer and configured as a helper application for your browser in order for the display parameter of "0" to automatically open the 3D structure. If you already have Cn3D 4.1 or earlier on your computer, you will need to upgrade to Cn3D 4.3 (install) in order to view 3D structures that were reconstructed by applying transformations from crystallographic symmetry.

1 = save data to a file

2 = see data in the web browser

Note: If you specify "xml" or "json" for the "fileformat" parameter, the only display options available are "1" (save data to a file) and "2" (see data in web browser, which is the default display for xml and json format).

Defaults: If the display parameter is not included in the URL, the value of 0 (launch structure viewer) will be used by default if the fileformat parameter is set to "cn3d" or "pdb". The display value of "2" (see data in web browser) will be used by default if the fileformat parameter is set to either "xml" or "json".
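The rules for the "display" parameter can be summarized in a short sketch. The function below is a hypothetical client-side helper, not part of the NCBI service. Raising an error for an unavailable combination is an assumption, because this document does not say how the server responds to such a request.

```python
VIEWER_FORMATS = {"cn3d", "pdb"}    # display defaults to 0 (launch structure viewer)
BROWSER_FORMATS = {"xml", "json"}   # display defaults to 2 (see data in web browser)


def resolve_display(fileformat="cn3d", display=None):
    """Return the display value that applies for a given fileformat.

    fileformat defaults to "cn3d", matching the documented default.
    """
    if fileformat not in VIEWER_FORMATS | BROWSER_FORMATS:
        raise ValueError(f"unknown fileformat: {fileformat!r}")
    if display is None:
        return 0 if fileformat in VIEWER_FORMATS else 2
    if display not in (0, 1, 2):
        raise ValueError(f"display must be 0, 1, or 2, got {display!r}")
    if fileformat in BROWSER_FORMATS and display == 0:
        # Assumption: reject rather than guess; only 1 and 2 are documented for xml/json.
        raise ValueError("display=0 is not available for xml or json")
    return display


print(resolve_display())              # cn3d with no display given
print(resolve_display("json"))        # json with no display given
print(resolve_display("xml", 1))      # xml saved to a file
```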

complexity

Specify the desired complexity (data set) of the structure you want to view. The allowable values are:

1 = vector. This option is valid if fileformat=cn3d or xml or json. It returns data about the secondary structures identified in the asymmetric unit or biological unit, and their orientation (vector) in 3D space.

4 = PDB model. This option is valid only if fileformat=pdb.
If fileformat=pdb and complexity=4, the program will return the original PDB source file. In that case, the only available biounit value is buidx=0 (asymmetric unit); that value will be applied regardless of whether you insert any other value.

Default: If the complexity parameter is not included in the URL, the value of 3 (all atoms) will be used by default.

If a structure has >25,000 atoms, the value of 2 (backbone) is selected by default. If a structure record contains an NMR ensemble or a correlated disorder set from crystallography, this will download backbone data only for the first model in the set.

If the "fileformat" parameter is set to "pdb," the only complexity values available are 3 (all atoms) and 4 (PDB model); if any other number is specified, it will be invalid and will be set to 3.

Note: When the parameters of "fileformat=pdb" and "complexity=4" are used together, the "buidx" argument is ignored. For this reason, the "buidx" parameter is not included in the sample URL above. This is because the original PDB source file contains the asymmetric unit, so that is the only thing that can be returned.
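The interaction between "fileformat=pdb", "complexity", and "buidx" can be sketched as follows. The helper is hypothetical and builds only the query-string portion from parameter names given in this document; the base URL and any record identifier parameter are deliberately left out.

```python
from urllib.parse import urlencode


def resolve_pdb_request(complexity=None, buidx=None):
    """Apply the documented rules for fileformat=pdb and return the parameters."""
    if complexity not in (3, 4):
        # Covers both a missing value (default 3) and an invalid value (reset to 3).
        complexity = 3
    params = {"fileformat": "pdb", "complexity": complexity}
    if complexity == 4:
        # Original PDB source file: buidx is ignored, so it is omitted here.
        return params
    if buidx is not None:
        params["buidx"] = buidx
    return params


query = urlencode(resolve_pdb_request(complexity=4, buidx=1))
print(query)  # fileformat=pdb&complexity=4
```

Dropping "buidx" on the client side when "complexity=4" mirrors the documented behavior and keeps the request from suggesting that a biological unit was returned.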