Proteomics/Introduction to Proteomics

What is proteomics?

Information transfer in the central dogma of biology

The focus of proteomics is a biological group called the proteome. The proteome is dynamic, defined as the set of proteins expressed in a specific cell, given a particular set of conditions. Within a given human proteome, the number of proteins can be as large as 2 million. [1]

Proteins themselves are macromolecules: long chains of amino acids. This amino acid chain is constructed when the ribosome translates messenger RNA, which is in turn transcribed from DNA in the cell's nucleus. [2] The transfer of information within cells commonly follows this path, from DNA to RNA to protein.
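The DNA to RNA to protein path can be illustrated with a minimal Python sketch. It assumes the standard genetic code and a coding-strand DNA sequence that begins at the first codon. The function names and the example sequence are invented for illustration, and steps such as RNA processing are ignored.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Return the mRNA matching a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical example sequence: ATG GCC AAA TAA
print(translate(transcribe("ATGGCCAAATAA")))
```

The output is the primary structure of the protein, written in the one-letter amino acid alphabet.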

Proteins can be organized in four structural levels:

Primary (1°): The amino acid sequence, containing members of a (usually) twenty-unit alphabet

Secondary (2°): Local folding of the amino acid sequence into α helices and β sheets

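Tertiary (3°): Folding of a single amino acid chain into its overall three-dimensional conformation

Quaternary (4°): Interaction between multiple small peptides or protein subunits to create a large unit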

Each level of protein structure is essential to the finished molecule's function. The primary sequence of the amino acid chain determines where secondary structures will form, as well as the overall shape of the final 3D conformation. The 3D conformation of each small peptide or subunit determines the final structure and function of a protein conglomerate. [3]

Proteomics has both a physical laboratory component and a computational component. These two parts are often linked together; at times data derived from laboratory work can be fed directly into sequence and structure prediction algorithms. Mass spectrometry of multiple types is used most frequently for this purpose. [5]

The importance of proteomics

Proteomics is a relatively recent field; the term was coined in 1994, and the science itself had its origins in electrophoretic separation techniques of the 1970s and 1980s. [6] The study of proteins, however, has been a scientific focus for much longer. Studying proteins generates insight into how proteins affect cell processes. Conversely, this study also investigates how proteins themselves are affected by cell processes or the external environment.

Proteins provide intricate control of cellular machinery, and are in many cases components of that same machinery. [7] They serve a variety of functions within the cell, and there are thousands of distinct proteins and peptides in almost every organism. This great variety comes in part from a phenomenon known as alternative splicing, in which the transcript of a particular gene can be processed in different ways to yield multiple protein types, based on the demands of the cell at a given time.

The goal of proteomics is to analyze the varying proteomes of an organism at different times, in order to highlight differences between them. Put more simply, proteomics analyzes the structure and function of biological systems. [8] For example, the protein content of a cancerous cell is often different from that of a healthy cell. Certain proteins in the cancerous cell may not be present in the healthy cell, making these unique proteins good targets for anti-cancer drugs. The realization of this goal is difficult; both purification and identification of proteins in any organism can be hindered by a multitude of biological and environmental factors. [9]

Proteomics Workflows

The first step of a proteomics workflow is sample preparation, in which proteins are extracted from cells. In the second step, methods such as 2D electrophoresis are used to separate the different proteins. In the third step, the proteins are cut into peptides, since peptides are easier to detect. In the fourth step, mass spectrometry is used to detect peptides and peptide fragments. Finally, the sequence of the protein is determined by interpreting the data obtained.
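The step of cutting proteins into peptides can be sketched in silico. The example below assumes trypsin as the protease (the text does not name one) and uses the common simplified rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The residue masses are standard monoisotopic values; the function names and the example sequence are hypothetical.

```python
# Monoisotopic residue masses (Da) of the twenty standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(protein):
    """Split a protein after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):  # trailing peptide without a cleavage site
        peptides.append(protein[start:])
    return peptides


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of a peptide in daltons."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


# Hypothetical example sequence, for illustration only.
for peptide in tryptic_digest("MKWVTFISLLLLFSSAYSRGVFRR"):
    print(peptide, round(monoisotopic_mass(peptide), 4))
```

Peptide masses computed this way are what a mass spectrometer would be expected to observe for a fully digested, unmodified protein; real digests also contain missed cleavages and modified residues, which this sketch ignores.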

Broad-Based Proteomics

Broad-based Proteomics Approach vs traditional focused approach

Because proteomics is growing at a very rapid pace, the field is shifting away from a specialized, focused way of conducting studies and towards a more global perspective. Broad-based proteomics takes this general perspective by setting out to understand the proteome as a whole. A critical aspect of this strategy is planning ahead, so that the most appropriate plans and technologies can be implemented in the most efficient manner. By developing a strategy tailored to understanding a particular proteome, problems and setbacks can be avoided during the study.

The first step when utilizing broad-based proteomics is to develop a hypothesis specific to the proteome being studied. It is best to choose organisms that already have a great deal of genomic information available, since the genome is always a useful supplement to proteomic information. Once a hypothesis and organism are established, the proper technologies should be chosen, and these technologies should be compatible with whatever biological factors are present (e.g. sample type). Some important and relevant proteomic methods include HPLC, mass spectrometry, SDS-PAGE, two-dimensional gel electrophoresis, and perhaps in silico protein modeling.

Since many combinations of sample type, sample preparation, and analytical technology are possible, careful planning from a broad-based proteomic perspective is critical. By planning upfront, an efficient proteomic study can be conducted. And when the efforts of many broad-based proteomic studies are taken together, understanding the proteome in its entirety becomes a realistic possibility.

Articles Summarized

Advances in Proteomic Workflows for Systems Biology

The article summarizes recent improvements as well as some principal limitations of shotgun tandem mass spectrometry based proteomics. It also briefly introduces the steps of target-driven quantitative proteomics.

Summary

In recent years, great improvements have been made in all parts of non-targeted mass spectrometry based proteomics, including sample preparation, data acquisition, and data processing and analysis. In sample preparation, the introduction of isoelectric focusing (IEF) separation has greatly improved on the resolution obtained from classical two-dimensional chromatographic peptide separation. Data quality has also improved through the development of highly reproducible capillary chromatography methods and quantitative analysis by stable isotope labeling. In data acquisition, high mass resolution and accuracy can now be achieved by different types of mass spectrometers, such as TOF-TOF and Q-TOF instruments. Furthermore, different types of mass analyzers and ion sources have been combined to increase proteome coverage. In data processing and analysis, the development of database search tools allows the quality of proteomics data to be assessed and estimated more accurately.

Despite these improvements, limitations remain in shotgun approaches. For example, shotgun MS datasets are extremely redundant, which greatly affects the identification of peptides present in proteomic samples. The existence of semi-tryptic or non-tryptic peptides makes samples more complex. Saturation effects greatly reduce the discovery rate of new proteins. Many peptides detected by mass spectrometry cannot be identified, making it difficult to compare one sample with another.

The limitations of shotgun approaches made the development of target-driven quantitative proteomics necessary. The first step of target-driven quantitative proteomics is protein and peptide selection, which can be carried out both experimentally and computationally. In the following step, multiple reaction monitoring (MRM) is applied to the selected peptides and the resulting data are analyzed.

Relevance to the course: this source is a brief overview of recent improvements in, and some limitations of, non-targeted mass spectrometry based proteomics (one method of proteomics). It also introduces another area of proteomics: target-driven quantitative proteomics.

New Terms

MALDI-TOF

MALDI (matrix-assisted laser desorption/ionization) is a soft ionization technique used in mass spectrometry, allowing the analysis of biomolecules (biopolymers such as proteins, peptides and sugars) and large organic molecules (such as polymers, dendrimers and other macromolecules), which tend to be fragile and fragment when ionized by more conventional ionization methods (http://en.wikipedia.org/wiki/MALDI-TOF)

PeptideAtlas

A multi-organism, publicly accessible compendium of peptides identified in a large set of tandem mass spectrometry proteomics experiments (http://www.peptideatlas.org/)

Multiple reaction monitoring

MRM experiments, using a triple quadrupole instrument, are designed for obtaining the maximum sensitivity for detection of target compounds. This type of mass spectrometric experiment is widely used in detecting and quantifying drugs and drug metabolites in the pharmaceutical industry (http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2291721)

FT-ICR mass spectrometry

Fourier transform ion cyclotron resonance mass spectrometry, also known as Fourier transform mass spectrometry, is a type of mass analyzer (or mass spectrometer) for determining the mass-to-charge ratio (m/z) of ions based on the cyclotron frequency of the ions in a fixed magnetic field (http://en.wikipedia.org/wiki/Fourier_transform_ion_cyclotron_resonance)
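The relationship between cyclotron frequency and m/z can be sketched numerically. The example below uses the ideal textbook relation f = qB / (2πm) and ignores the electric trapping fields and other corrections a real instrument must account for. The function names and the 7 tesla field strength are illustrative assumptions.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms


def cyclotron_frequency(mz, b_tesla):
    """Ideal cyclotron frequency (Hz) for an ion of given m/z (Da per charge)."""
    return ELEMENTARY_CHARGE * b_tesla / (2 * math.pi * mz * DALTON)


def mz_from_frequency(freq_hz, b_tesla):
    """Invert the ideal relation: recover m/z from a measured frequency."""
    return ELEMENTARY_CHARGE * b_tesla / (2 * math.pi * freq_hz * DALTON)


# Assumed example: an ion of m/z 1000 in a 7 T magnet.
frequency = cyclotron_frequency(1000.0, 7.0)
print(f"{frequency:.0f} Hz")
print(f"{mz_from_frequency(frequency, 7.0):.3f} m/z")
```

Because frequency is inversely proportional to m/z, heavier ions orbit more slowly at a given field strength and charge.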

Course Relevance

This source is about non-targeted and targeted mass spectrometry approaches, which are important methods for the identification of proteins (an important step in proteomics).

This article summarizes what broad-based proteomics is and how one can design a study using this global-view strategy. It first briefly looks at the current technology in proteomics and then discusses how these technologies can be incorporated into a study.

Summary

Proteomics is becoming a daunting field to enter because many studies get lost in complicated, narrowly focused details. To help with this challenge, a researcher can employ broad-based proteomics. Broad-based proteomics is a strategy where careful planning is employed upfront to answer a question about a proteome (for instance, comparisons between a tissue in a diseased state and a normal state) using the most appropriate and applicable technologies available. By developing a strategy at the beginning of a proteomics study, possible setbacks during the study are avoided.

The first step is to develop a general hypothesis that is specific to the problem or issue that is being studied. Since proteomics mirrors genomics, a proteomic study is considerably more difficult when the genome of the model organism isn't known. For this reason, organisms where the majority of the genome is known (80% or greater) should be chosen. Once a proper organism has been chosen for study, the next factors to consider are the type of data that will be generated and the sample source. Some proteomic methods yield qualitative data, while others yield quantitative data, so the type of data needed should be determined before a method is chosen. At the same time, the source of the sample is important in determining the extraction and purification methods. Typical sample types include urine, blood (plasma/serum) and mucosal secretions. Protein concentration within the sample is important, and one should expect reasonable extraction if the protein can be visualized on a Coomassie blue-stained gel (> 300 ng). The separation technique chosen should reflect the characteristics of the protein(s) of choice (hydrophobic vs hydrophilic, molecular mass, etc.).

Another major factor in the planning process is estimating the difficulty of preparing the fractionated sample for mass spectrometry identification. Each mass spectrometry technique requires a different degree of preparation, and some are much more complicated than others (2DE with MS/MS analysis requires greater preparation than HPLC with MS, for instance). Since mass spectrometry is often the step where many proteomic studies encounter difficulty (both in preparation and in interpretation of the results), it is very important to choose a method that is appropriate for the protein sample.

With the advent of proteomic databases in recent years, bioinformatics has had an increasing presence in proteomic studies. For this reason, almost all proteomic studies should incorporate bioinformatics, and consequently it's important for the research team to have some bioinformatics knowledge. Depending on how much data the chosen analysis methods will produce by the end of the study, the research team can determine how much bioinformatic analysis will be needed.

A final factor to consider is whether to bring in outside assistance or to attempt the study in a more self-contained way. Keeping it self-contained allows for the research team to keep its data integrated and also keeps miscommunication to a minimum. Bringing in outside help, on the other hand, could allow a researcher to tackle problems that would be large and normally not solvable with a smaller team. While bringing in outside assistance seems promising, it's important to not lose control over the data and to make sure that the team is not spread out trying to accomplish more than it can handle.

Since there are many ways to study a cell's proteome, careful planning should be implemented at all stages of a proteomics study. Through broad-based proteomics, a researcher can define a test plan before any actual study is performed. And when used appropriately, this strategy can lead to productive and efficient projects that will bring science one step closer to understanding the proteome as a whole.

New Terms

Subproteome

A subfractionated subset of the proteome. Often these subsets are defined by an area of the cell (an organelle, for instance) or by chemical properties.

Peptide mass fingerprinting (PMF)

An analytical technique for protein identification. The unknown protein of interest is first cleaved into smaller peptides, whose masses are determined using mass spectrometry; these masses are then compared to either a database containing known protein sequences or a genome. (http://en.wikipedia.org/wiki/Peptide_mass_fingerprinting)
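The comparison step of peptide mass fingerprinting can be sketched as a simple matching problem. The example below assumes that theoretical peptide masses for each candidate protein are already available, and it scores candidates by counting observed masses that fall within a mass tolerance. The protein names, mass values, and the 0.2 Da tolerance are all hypothetical; real search tools use more elaborate scoring.

```python
def count_matches(observed, theoretical, tolerance_da=0.2):
    """Count observed masses lying within tolerance of any theoretical mass."""
    return sum(
        any(abs(obs - theo) <= tolerance_da for theo in theoretical)
        for obs in observed
    )


def rank_candidates(observed, database, tolerance_da=0.2):
    """Rank candidate proteins by number of matched peptide masses."""
    scores = {
        name: count_matches(observed, masses, tolerance_da)
        for name, masses in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical database of theoretical peptide masses (Da) per protein.
database = {
    "protein_A": [500.30, 842.51, 1045.56, 1479.80],
    "protein_B": [612.35, 842.45, 1300.70],
}
# Hypothetical peptide masses measured for the unknown protein.
observed = [842.50, 1045.60, 1479.75]

ranking = rank_candidates(observed, database)
print(ranking)
```

The candidate with the most matching masses is the most plausible identification, although a single shared mass (as both proteins have here near 842.5 Da) is not enough on its own to identify a protein.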

Course Relevance

This article is relevant because a global view of proteomics is becoming more important. As the wealth of information about proteins expands, understanding the proteome from a broad viewpoint is becoming more and more useful.

Websites Summarized

The Association of Biomolecular Resource Facilities: Proteomics Research Group (PRG)

This web page is about how the Association of Biomolecular Resource Facilities relates to proteomics. Of particular importance is the Proteomics Research Group within the ABRF.

Summary

The Association of Biomolecular Resource Facilities (ABRF) is an international association of research facilities and laboratories focused on core research in biotechnology. The association encourages the sharing of information through conferences, a quarterly journal, and group studies. The ABRF has a heavy influence on the field of proteomics, and there are five main research groups (RG) that deal with proteomics in some way: Protein Expression (PERG), Protein Sequencing (PSRG), Protein Informatics (iPRG), Proteomics (PRG), and Proteomics Standards (sPRG).

Of particular importance, the Proteomics Research Group allows proteomics researchers throughout the world to share their protein analysis information freely. Since understanding the proteome requires bringing together information on many different proteins (information that takes a great deal of effort, time, and money to obtain), the sharing of protein and subproteomic information is imperative to understanding a proteome in its entirety. This website has numerous links to studies performed by research groups throughout the world.

Course Relevance

This is an overview of the Association of Biomolecular Resource Facilities (ABRF) and how it relates to proteomics. There is a great deal of relevant information on this website that those in proteomics will find useful.

This web page is about the importance and challenges in proteomics. It also introduces major steps of proteomics briefly.

Summary

Proteomics is important for understanding biological processes, since the functions of the cell are accomplished by proteins. But because the number of proteins is so large and amino acids (the units of protein) are so small, the study is quite challenging. There are five steps in analyzing protein sequences: sample preparation, separation, ionization, mass spectrometry and informatics. First, cells are obtained and proteins are extracted from them. Then methods such as 2D electrophoresis are used to separate the different types of proteins. Next, a protease is used to cut the proteins into peptides. Mass spectrometry then allows individual peptides as well as peptide fragments to be identified. Finally, by interpreting the data, the sequence of the proteins can be determined.

New Terms

Biopsy

A biopsy is a medical test involving the removal of cells or tissues for examination. It is the removal of tissue from a living subject to determine the presence or extent of a disease (http://en.wikipedia.org/wiki/Biopsy)

TOF

The time of flight (TOF) describes the method used to measure the time that it takes for a particle, object or stream to reach a detector while traveling over a known distance (http://en.wikipedia.org/wiki/Time-of-flight)
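In mass spectrometry, the measured flight time can be converted to m/z. The sketch below assumes an idealized linear TOF analyzer in which every ion is accelerated through the same voltage and then drifts over a field-free tube, so that the energy balance zeU = mv²/2 gives t = L·sqrt(m / (2zeU)). The function names, the 20 kV voltage, and the 1 m drift length are illustrative assumptions.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms


def flight_time(mz, voltage, length_m):
    """Ideal drift time (s) for an ion of given m/z (Da per charge)."""
    return length_m * math.sqrt(mz * DALTON / (2 * ELEMENTARY_CHARGE * voltage))


def mz_from_flight_time(time_s, voltage, length_m):
    """Invert the ideal relation: recover m/z from a measured flight time."""
    return 2 * ELEMENTARY_CHARGE * voltage * time_s ** 2 / (DALTON * length_m ** 2)


# Assumed example: m/z 1000, 20 kV acceleration, 1 m drift tube.
t = flight_time(1000.0, 20000.0, 1.0)
print(f"{t * 1e6:.2f} microseconds")
print(f"{mz_from_flight_time(t, 20000.0, 1.0):.3f} m/z")
```

Flight time grows with the square root of m/z, so lighter ions reach the detector first.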

Quadrupole mass spectrometry

The quadrupole mass analyzer is one type of mass analyzer used in mass spectrometry. It consists of 4 circular rods, set perfectly parallel to each other. In a quadrupole mass spectrometer, the quadrupole mass analyzer is the component of the instrument responsible for filtering sample ions based on their mass-to-charge ratio (m/z). Ions are separated in a quadrupole based on the stability of their trajectories in the oscillating electric fields that are applied to the rods (http://en.wikipedia.org/wiki/Quadrupole_mass_analyzer)

Electrospray ionization

Electrospray ionization (ESI) is a technique used in mass spectrometry to produce ions. It is especially useful in producing ions from macromolecules because it overcomes the propensity of these molecules to fragment when ionized (http://en.wikipedia.org/wiki/Electrospray_ionization)

This website discusses the aims and definitions of proteomics. It also introduces two important methods in proteomics studies, 2D protein electrophoresis and mass spectrometry, as well as proteomics in medicine.

Summary

Proteomics is a broad field that includes expression proteomics, protein distribution in the subcellular compartments of organelles, post-translational modifications of proteins, structural proteomics, functional proteomics, clinical proteomics, and so on. Even though expression can be analyzed at the transcript level using RNA/cDNA microarrays, proteomics is still important, since not all mRNA is translated and processes such as RNA splicing and post-translational protein modification exist.

Two-dimensional (2D) protein electrophoresis is commonly used to separate proteins based on their isoelectric point (pI) and mass. Mass spectrometry is an important method in proteomics, since it can be used not only for protein identification but also for the analysis of post-translational protein modifications.

One of the major applications of proteomics in medicine is the identification of markers that are useful at each step of treating disease. Other applications include drug discovery and pharmacoproteomics.

New Terms

Human Proteome Organization (HUPO)

The Human Proteome Organization (HUPO) is an international scientific organization representing and promoting proteomics through international cooperation and collaborations by fostering the development of new technologies, techniques and training (http://www.hupo.org/)

Human Protein Atlas (HPA)

The Swedish Human Protein Atlas program (HPA), funded by the (non-profit) Knut and Alice Wallenberg Foundation, invites submission of antibodies from both academic and commercial sources to be included in the human protein atlas (http://www.proteinatlas.org)