Thursday, March 8, 2012

Overview:
~800 µg of peptides were divided in half.
1 half went to SCX on a 10 mM sodium phosphate (pH 2.8) gradient with an increase to 0.6 M KCl. 20 fractions were collected.
1 half was separated by peptide offgel using a high resolution strip, pH 3-10.
The peptides were desalted by ZipTips and loaded on our Velos using a standard top 10 method.
The MS/MS spectra were analyzed using Proteome Discoverer 1.3 with both Sequest and Mascot using Peptide Validator.
The results are shown above. Pretty one-sided. We'll see what the repeats look like.

Tuesday, March 6, 2012

Sometime last year it became just about impossible for me to find good autosampler vials for my Accela system. Thermo got tired of making them, or changed the catalog number or something. After weeks of searching and multiple calls to manufacturers, we were able to secure several boxes to set us up for the future. Then I started my new position.

We have been making do for the last 4-5 months by using the autosampler vials for our Shimadzu HPLC with spring-loaded glass inserts from Thermo that run about $70/100 inserts. Couple this with the price of the glass vials, and we're running a couple of bucks a sample vial.

Tonight I happened to be proofreading my recent entries and an ad for Colbert Associates popped up. Although I'm not supposed to click on the AdSense ads, I did in this case. (Google, you can take back the 0.02 cents that I earned for that click, I apologize). I don't think Colbert Associates would be too angry considering that I ordered 500 of their polypropylene inserts, which run about $0.15/insert. As long as the inserts are of good quality, we'll definitely be switching over to their autosampler vials. I need to check the numbers, but it looks like they are substantially cheaper than the ones we are using. It looks like we could save at least $1 per sample, which doesn't seem like much, but will definitely add up at 24 samples/day.
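
Here is a quick back-of-the-envelope version of that math. The prices and the 24 samples/day are the figures quoted above; the number of instrument run days per year (250) is purely my assumption for illustration, so treat the yearly figure as a rough guess.

```python
# Rough vial/insert cost check, done in cents to avoid floating-point noise.
THERMO_INSERT_CENTS = 7000 // 100   # $70 per 100 glass inserts -> 70 cents each
COLBERT_INSERT_CENTS = 15           # ~$0.15 per polypropylene insert
SAMPLES_PER_DAY = 24
SAVING_PER_SAMPLE_CENTS = 100       # the "at least $1 per sample" estimate (vial + insert)
RUN_DAYS_PER_YEAR = 250             # ASSUMPTION: not a number from our lab records


def insert_saving_cents():
    """Saving per sample from the insert alone, in cents."""
    return THERMO_INSERT_CENTS - COLBERT_INSERT_CENTS


def daily_saving_dollars():
    """Saving per day at the estimated per-sample saving."""
    return SAVING_PER_SAMPLE_CENTS * SAMPLES_PER_DAY / 100


def yearly_saving_dollars(run_days=RUN_DAYS_PER_YEAR):
    """Saving per year for a given (assumed) number of run days."""
    return daily_saving_dollars() * run_days


print(f"Insert alone: {insert_saving_cents()} cents saved per sample")
print(f"Per day:  ${daily_saving_dollars():.2f}")
print(f"Per year: ${yearly_saving_dollars():.2f} (assuming {RUN_DAYS_PER_YEAR} run days)")
```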

Sunday, March 4, 2012

I'm still investigating PepNovo for performing de novo sequencing on our data sets. The second paper from the list at CSE Bioinformatics is this 2008 paper by Ari Frank, which describes the PepNovo Plus algorithm.
While a lot of the statistics are a little beyond my level, there is a lot of very useful information in this somewhat long paper.
In the introduction, Dr. Frank points out the problem with most statistical models used in bioinformatics -- that "such models tend to oversimplify the phenomenon they describe and are consequently inaccurate."
To address these shortcomings, the paper describes the use of a machine learning boosting algorithm to analyze a large database of low-resolution MS/MS spectra.
The dataset contained more than 300,000 peptide-spectrum pairs.
Boosting "produces highly accurate prediction rules by combining many 'weak' rules that, each on their own, might be only moderately accurate."
The boosting algorithm, as described here, is able to make use of a combination of over 800 possible features produced by CID fragmentation of a peptide.
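To get my head around the "many weak rules" idea, here is a minimal sketch of generic AdaBoost with one-threshold rules ("stumps") on a toy data set. This is NOT the PepNovo code or the exact boosting variant from the paper; the function names and the toy data are made up for illustration. No single threshold can separate the toy labels, but three weighted thresholds together can.

```python
import math


def train_stump(X, y, w):
    """Pick the one-feature threshold rule with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[f] >= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best


def stump_predict(stump, row):
    """Apply a (err, feature, threshold, sign) rule to one example."""
    _, f, thr, sign = stump
    return sign if row[f] >= thr else -sign


def adaboost(X, y, rounds):
    """Combine `rounds` weak rules; labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)     # vote weight of this rule
        ensemble.append((alpha, stump))
        # Up-weight the examples this rule got wrong, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Weighted vote of all weak rules."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Toy data: positives on both ends, negatives in the middle.
X = [[0], [1], [2], [3], [4], [5]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
print([predict(model, row) for row in X])
```

The best single rule still gets a third of the examples wrong here, which is the "moderately accurate" part; the weighted vote is what fixes it.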
It's pretty clear that this algorithm is much more complicated than simpler programs like Sequest. Considering the amount of thought that has gone into the PepNovo program, I'm expecting big things from it once I can actually get the file to run.
A handicap is that Thermo RAW files cannot be read directly by the software. They must be converted before they will load successfully. I'm still working on that one....

Thursday, March 1, 2012

Today's lunchtime reading was an older paper. Some of the proteins we are interested in have highly variable regions. The way we've been dealing with them is a complex FASTA file containing all known sequences of these proteins from dozens of partially sequenced field isolates. Unfortunately, it looks like we're only seeing the tip of the iceberg. The next plan is to filter our peptide data and remove everything that matches. What we're going to be interested in is the stuff that doesn't match any entries in our database.
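The filtering step could look something like the sketch below. It is only an illustration: the function names and the example sequences are made up, and I'm assuming that I and L should be treated as the same residue because they have identical masses. A real pipeline would read the proteins from our FASTA file rather than a hard-coded list.

```python
def normalize(seq):
    """Upper-case and merge I into L, since the two are isobaric (assumption)."""
    return seq.upper().replace("I", "L")


def novel_peptides(peptides, proteins):
    """Return the peptides that are not a substring of any database protein."""
    database = [normalize(p) for p in proteins]
    return [pep for pep in peptides
            if not any(normalize(pep) in prot for prot in database)]


# Hypothetical example sequences, not real data.
proteins = ["MKTAYIAKQRQISFVK", "MSDNELLRK"]
peptides = ["TAYLAK", "GGGWK", "ELLR"]
print(novel_peptides(peptides, proteins))
```

Whatever comes out the other end is the pile of candidates worth a closer look.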
The first program I've chosen to evaluate is the PepNovo software.
The following paper was cited in the link above, and it's short, so I figured it was a good place to start (it first appeared in JPR in 2006).

The central concept of this paper is that of homeometric peptides, which the authors define as different peptides with similar theoretical MS/MS spectra. The authors cite a number of reasons that these can and do occur, though I'm sure they are FAR more likely when you are looking at lower resolution MS/MS spectra.
The authors propose that the software should produce multiple de novo sequencing outputs that can be narrowed down or filtered by other means.
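
To convince myself of the resolution point, here is a small sketch that compares the singly charged b and y ions of two peptides at different mass tolerances. The peptides are made-up examples that differ only by Q versus K (about 0.036 Da apart), the 0.5 Da and 0.01 Da tolerances are just my stand-ins for "low" and "high" resolution, and this is not how the paper itself measures similarity.

```python
# Monoisotopic residue masses (Da) for the residues used below.
MONO = {
    "P": 97.05276, "E": 129.04259, "T": 101.04768,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_masses(peptide):
    """Singly charged b and y ion m/z values for every backbone cleavage."""
    masses = [MONO[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y ion
    return sorted(ions)


def shared_fraction(pep_a, pep_b, tol):
    """Fraction of pep_a fragments with a pep_b fragment within tol Da."""
    a = fragment_masses(pep_a)
    b = fragment_masses(pep_b)
    hits = sum(1 for m in a if any(abs(m - n) <= tol for n in b))
    return hits / len(a)


for tol in (0.5, 0.01):
    frac = shared_fraction("PEPTQDE", "PEPTKDE", tol)
    print(f"tolerance {tol} Da: {frac:.0%} of fragments shared")
```

At the wide tolerance the two theoretical spectra are indistinguishable; at the tight one, every fragment containing the Q/K position splits apart.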

The big advance in this paper is the description of the Dancik scoring algorithm. From what I can understand of this algorithm, in addition to what normal de novo sequencers do, it ranks the fragment ions by intensity (1st most intense, 2nd, etc.). The most intense fragments are considered the most likely to be b or y ions, and the probability of the output peptide sequence takes this into consideration.
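
Just the ranking step, as I understand it, can be sketched like this. The peak list is invented, and how the rank actually feeds into the score is in the paper; I'm not attempting to reproduce that part here.

```python
def intensity_ranks(peaks):
    """Return the intensity rank of each (m/z, intensity) peak; 1 = most intense."""
    order = sorted(range(len(peaks)), key=lambda i: peaks[i][1], reverse=True)
    ranks = [0] * len(peaks)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


# Hypothetical peak list: (m/z, intensity)
peaks = [(175.1, 20.0), (262.2, 85.0), (375.3, 40.0), (488.3, 5.0)]
for (mz, inten), rank in zip(peaks, intensity_ranks(peaks)):
    print(f"m/z {mz}: intensity {inten}, rank {rank}")
```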

They then take this scoring algorithm and interrogate a dataset generated by a 7-tesla(!) FT machine and conclude that it is an improvement over other scoring methods.