Introduction to PDB Data

The PDB archive is a repository of atomic coordinates and other information describing
proteins and other important biological macromolecules. Structural biologists use methods
such as X-ray crystallography, NMR spectroscopy, and cryo-electron
microscopy to determine the location of each atom relative to the others in the molecule.
They then deposit this information, which the wwPDB annotates and publicly releases into
the archive.

The constantly growing PDB is a reflection of the research that is happening in laboratories
across the world. This can make it both exciting and challenging to use the database in research
and education. Structures are available for many of the proteins and nucleic acids involved in
the central processes of life, so you can go to the PDB archive to find structures for ribosomes,
oncogenes, drug targets, and even whole viruses. However, it can be a challenge to find the
information that you need, since the PDB archives so many different structures. You will often
find multiple structures for a given molecule, or partial structures, or structures that have
been modified or inactivated from their native form.

Looking at Structures is designed to help you chart a path through this material and avoid
a few common pitfalls. These chapters are intertwined with one another. To begin, select a
topic from the right menu or from the list below:

PDB Data

The primary information stored in the PDB archive consists of coordinate
files for biological molecules. These files list the atoms in each protein and their locations
in 3D space. The files are available in several formats (PDB, mmCIF, XML). A typical PDB-formatted file
includes a large "header" section of text that summarizes the protein, citation information, and the
details of the structure solution, followed by the sequence and a long list
of the atoms and their coordinates. The archive also contains the
experimental observations that are used to determine these atomic coordinates.
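
As a rough illustration of that layout, the sketch below splits a few PDB-format lines into header text and atom records. The sample lines are hand-written rather than taken from a real entry, and the function name split_pdb_text is made up for this example. The column positions follow the fixed-width PDB format, where each field sits in a set range of columns.

```python
# Hypothetical, hand-written fragment in PDB format (not a real entry).
SAMPLE = """\
HEADER    EXAMPLE FRAGMENT
TITLE     HAND-WRITTEN LINES FOR ILLUSTRATION
SEQRES   1 A    2  MET ALA
ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N
ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C
ATOM      3  N   ALA A   2      25.112  24.880   3.649  1.00 11.02           N
END
"""


def split_pdb_text(text):
    """Separate header-style records from atom records (illustrative helper)."""
    header, atoms = [], []
    for line in text.splitlines():
        record = line[0:6].strip()               # columns 1-6: record name
        if record in ("ATOM", "HETATM"):
            atoms.append({
                "name": line[12:16].strip(),     # columns 13-16: atom name
                "residue": line[17:20].strip(),  # columns 18-20: residue name
                "chain": line[21],               # column 22: chain ID
                "x": float(line[30:38]),         # columns 31-38: x coordinate
                "y": float(line[38:46]),         # columns 39-46: y coordinate
                "z": float(line[46:54]),         # columns 47-54: z coordinate
            })
        elif record in ("HEADER", "TITLE", "SEQRES"):
            header.append(line)
    return header, atoms


header, atoms = split_pdb_text(SAMPLE)
print(len(header), "header lines,", len(atoms), "atoms")
print(atoms[1])
```

Because the format is column-based, slicing by position is safer than splitting on whitespace, since adjacent fields can run together without a space between them.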

Visualizing Structures

While you can view PDB files directly using a text editor, it is often most useful to use a browsing or
visualization program to look at them. Online tools, such as the ones on the RCSB PDB website, allow you
to search and explore the information in the PDB header, including information on
experimental methods and the chemistry and biology of the protein. Once you have found the PDB entries
that you are interested in, you can use visualization programs to
read in the PDB file, display the protein structure on your computer, and create custom pictures of it.
These programs also often include analysis tools that allow you to measure distances and bond angles,
and identify interesting structural features.
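
The distance and angle measurements mentioned above come down to simple geometry on the atomic coordinates. The following is a minimal sketch of that arithmetic, not the method used by any particular visualization program. The function names and the three coordinates are invented for illustration; coordinates in PDB files are given in angstroms.

```python
import math


def distance(a, b):
    """Euclidean distance between two (x, y, z) points, in the input units."""
    return math.dist(a, b)


def angle_degrees(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    dot = sum(u * v for u, v in zip(ba, bc))
    cos_theta = dot / (math.hypot(*ba) * math.hypot(*bc))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


# Made-up coordinates in angstroms, standing in for three bonded atoms.
n = (0.0, 0.0, 0.0)
ca = (1.5, 0.0, 0.0)
c = (1.5, 1.5, 0.0)

print(round(distance(n, ca), 3))
print(round(angle_degrees(n, ca, c), 1))
```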

Reading Coordinate Files

When you start exploring the structures in the PDB archive, you will need to know a few things about
the coordinate files. In a typical entry, you will find a diverse
mixture of biological molecules, small molecules, ions, and water. Often, you can use the names and
chain IDs to help sort these out. In structures determined by crystallography, atoms are annotated
with temperature factors, which describe their vibration, and occupancies, which show whether they
are seen in several conformations. NMR structures often include several different models of the molecule.
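
To make these fields concrete, here is a small sketch that sorts PDB-format records by chain ID, counts water molecules and MODEL blocks, and flags atoms whose occupancy is below 1.0. The records are hand-written for illustration, and the function name summarize is hypothetical. It assumes waters are named HOH and that alternate conformations are labeled in column 17, as in the PDB format.

```python
from collections import defaultdict

# Hypothetical, hand-written records (not taken from a real entry).
SAMPLE = """\
MODEL        1
ATOM      1  CA  SER A  10      11.000  12.000  13.000  1.00 15.20           C
ATOM      2  CB ASER A  10      11.500  13.400  13.200  0.60 22.75           C
ATOM      3  CB BSER A  10      10.400  13.300  12.100  0.40 24.10           C
ATOM      4  CA  GLY B   5      20.000  21.000  22.000  1.00 18.90           C
HETATM    5 ZN    ZN A 101      15.000  16.000  17.000  1.00 12.30          ZN
HETATM    6  O   HOH A 201      18.000  19.000  20.000  1.00 30.45           O
ENDMDL
"""


def summarize(text):
    """Sort records by chain and count models, waters, and partial occupancies."""
    chains = defaultdict(list)
    models, waters, partial = 0, 0, []
    for line in text.splitlines():
        record = line[0:6].strip()
        if record == "MODEL":
            models += 1  # NMR entries often hold several MODEL blocks
        if record not in ("ATOM", "HETATM"):
            continue
        atom = {
            "hetero": record == "HETATM",     # small molecules, ions, water
            "name": line[12:16].strip(),
            "altloc": line[16].strip(),       # column 17: alternate location
            "residue": line[17:20].strip(),
            "chain": line[21],                # column 22: chain ID
            "occupancy": float(line[54:60]),  # columns 55-60
            "b_factor": float(line[60:66]),   # columns 61-66: temperature factor
        }
        if atom["residue"] == "HOH":
            waters += 1
        else:
            chains[atom["chain"]].append(atom)
        if atom["occupancy"] < 1.0:
            partial.append(atom)
    return {"models": models, "waters": waters,
            "chains": dict(chains), "partial": partial}


summary = summarize(SAMPLE)
for chain_id, chain_atoms in sorted(summary["chains"].items()):
    print(chain_id, [a["name"] + a["altloc"] for a in chain_atoms])
print("waters:", summary["waters"], "partial occupancy:", len(summary["partial"]))
```

In this sample, the two CB records with labels A and B describe the same atom in two conformations, and their occupancies add up to 1.0.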

Potential Challenges

You may run into several challenges as you explore the PDB archive. For example, many structures,
particularly those determined by crystallography, only include information about part of the
functional biological assembly. Fortunately, the PDB can help with
this. Also, many PDB entries are missing portions of the molecule that
were not observed in the experiment. These include structures that contain only alpha carbon
positions, structures with missing loops, structures of individual domains, or subunits from a
larger molecule. In addition, most of the crystallographic structure entries do not have
information on hydrogen atoms.
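
A few of these gaps can be spotted with a quick scan of the ATOM records. The sketch below checks whether any hydrogen atoms are present, whether every residue contains only an alpha carbon, and where the residue numbering jumps within a chain. The sample records and the function name check_completeness are made up for illustration. A jump in numbering is only a hint that residues may be missing, since numbering can skip for other reasons.

```python
from collections import defaultdict

# Hypothetical, hand-written records with residues 3-5 left out on purpose.
SAMPLE = """\
ATOM      1  N   GLY A   1      10.000  10.000  10.000  1.00 20.00           N
ATOM      2  CA  GLY A   1      11.400  10.200  10.100  1.00 20.00           C
ATOM      3  N   ALA A   2      12.000  11.500  10.300  1.00 20.00           N
ATOM      4  CA  ALA A   2      13.300  11.900  10.600  1.00 20.00           C
ATOM      5  N   SER A   6      18.000  15.000  12.000  1.00 20.00           N
ATOM      6  CA  SER A   6      19.200  15.600  12.400  1.00 20.00           C
"""


def check_completeness(text):
    """Report hydrogens, alpha-carbon-only models, and numbering gaps."""
    residues = defaultdict(set)  # (chain, residue number) -> atom names seen
    has_hydrogen = False
    for line in text.splitlines():
        if line[0:6].strip() != "ATOM":
            continue
        key = (line[21], int(line[22:26]))   # chain ID and residue number
        residues[key].add(line[12:16].strip())
        if line[76:78].strip() == "H":       # columns 77-78: element symbol
            has_hydrogen = True
    ca_only = bool(residues) and all(
        names == {"CA"} for names in residues.values()
    )
    gaps = []
    keys = sorted(residues)
    for (chain_a, num_a), (chain_b, num_b) in zip(keys, keys[1:]):
        if chain_a == chain_b and num_b - num_a > 1:
            gaps.append((chain_a, num_a, num_b))
    return {"has_hydrogen": has_hydrogen, "ca_only": ca_only, "gaps": gaps}


report = check_completeness(SAMPLE)
print(report)
```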

Except where noted, this feature is written and illustrated by David S. Goodsell.

About PDB-101

PDB-101 helps teachers, students, and the general public explore the 3D world of proteins and nucleic acids. Learning about their diverse shapes and functions helps us understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease to biological energy.

Why PDB-101? Researchers around the globe make these 3D structures freely available at the Protein Data Bank (PDB) archive. PDB-101 builds introductory materials to help beginners get started in the subject ("101", as in an entry level course) as well as resources for extended learning.

RCSB PDB (citation) is managed by two members of the Research Collaboratory for Structural Bioinformatics (RCSB):