Thursday, October 29, 2015

Alright, so...now I have a big list of proteins...what do I do now? What a great question. There are lots of things. If you own an institutional license for some expensive pathway software, you could try that. You could go to KEGG. If you're one of those highly employable people who know R really well, there are a ton of cool scripts, and on and on.

One thing you might want to check out is the Thermo Fisher Cloud. Why?
Cause it looks pretty cool. And it's free. And you get 10GB of free data storage on the Cloud just for registering and checking it out. Oh, and there are tools here I've never seen outside of papers on R scripts, like Pathway Over Representation and Pairwise Significants, that are super easy to use in this format. And if we generate interest in this, then more tools will be added, and faster. The bioinformatician behind the scenes in this project has some great insight into what this field needs, and I think we'll continue to see more cool things added to this interface all the time.

Wednesday, October 28, 2015

I just saw an update on the attendees for tomorrow's NIH PD workshop. 60+ people! I'm super psyched. Sorry the blog has been slow lately. I started a new role recently for my day job and I've been putting all of my free time into new content for the workshop. There are people flying in from far away to attend!!!!?!?! I don't want anyone to be disappointed.
Thank you PRIDE Repository and to you guys who put tons of cool experiments in there!

And to everyone who can't make it, I can't make promises yet, but I think at least some of the material should be accessible to you later. I'm working on it! Can't wait to get back to Maryland today!!!

EDIT: 10/29/15 So...I found out the hard way (after lugging a tripod and good camera into the NIH and through 3 security checkpoints...) that all video recording on NIH campuses is done by an organized and unionized group that considers any attempt to record on campus a threat to their livelihood. However, for the price of a good used car, they will record a workshop for you. We will have some slides to share, though!

It's thorough, up-to-date, and shockingly concise considering the history, reagents, and methodologies described. Even if you've done these experiments for years and with different instruments, there are still some great insights here!

There is a great section on doing PTM quantification with reporter ions (very phospho-centric) that brings up a really interesting methodology, a reversed ammonia gas spray across the front of the instrument (what?!? I gotta read that), that boosts TMT-phospho IDs (again, no idea!!).

The highlight in this, for me, is a concept I've never even considered, and I feel really dumb for not having come up with it myself: TARGETED ANALYSIS with reporter ions. We're getting more targeted all the time, especially with new high-certainty LC-MS methods like PRM (parallel reaction monitoring). These let us look at hundreds of peptides in a pathway, and each MS/MS scan gives us a ton of confirmatory data that we're looking at the right target. But what if we multiplexed it with TMT-10? Then you get the sensitivity of the targeted approach and the certainty of the PRM, and you can also get relative quan from up to 10 patients at once!!!
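To make the multiplexing idea concrete, here's a minimal sketch of my own (not from any paper) of pulling TMT 10-plex reporter intensities out of a single PRM MS/MS scan. The reporter m/z values are the commonly cited approximate masses; the matching tolerance and function names are my assumptions:

```python
# Approximate TMT 10-plex reporter ion m/z values (channel -> m/z)
TMT10 = {
    "126":  126.1277, "127N": 127.1248, "127C": 127.1311,
    "128N": 128.1281, "128C": 128.1344, "129N": 129.1315,
    "129C": 129.1378, "130N": 130.1348, "130C": 130.1411,
    "131":  131.1382,
}

def reporter_intensities(mz, intensity, tol_ppm=10.0):
    """Pull reporter ion intensities out of one centroided MS/MS scan.

    mz/intensity are parallel lists of peaks; each TMT channel gets the
    most intense peak within tol_ppm of its reporter mass (0.0 if none)."""
    out = {}
    for channel, target in TMT10.items():
        best = 0.0
        for m, i in zip(mz, intensity):
            if abs(m - target) / target * 1e6 <= tol_ppm and i > best:
                best = i
        out[channel] = best
    return out
```

Run that on every PRM scan for a target peptide and you've got relative quan across all ten channels riding along for free.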

Sorry, my mind is kind of blown. I'd better finish this coffee and get to work....

The cells were SILAC labeled, so they end up with really nice up/down regulation at the whole protein level. For the whole proteomics they did in-gel digestion with 40ug of protein, cut out 12 sections, and double digested (LysC and trypsin).

For the quantitative lysine acetylomics, the proteins were mixed, SCX fractionated and the peptides were incubated with a bead with an antibody that recognizes acetylated lysines and pulled down. Everything was LC-MS/MS'ed on an Orbitrap Elite.

On the data processing side, the files were run through once with a recalibration algorithm (similar to the Recalibration node in PD 1.4). Once recalibrated, the files were reprocessed. The data was processed with various tools, including Andromeda in MaxQuant, MS-Viewer in Protein Prospector, and Perseus. The combination of these analyses is a solid output that gives the changes at the whole protein level as well as the changes at the lysine acetylome level.
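The recalibration step is simpler than it sounds. This is not the actual algorithm from the paper or from PD, just a toy sketch of the usual idea: estimate the systematic ppm error from confident matches, then shift everything by it:

```python
from statistics import median

def recalibrate(observed_mz, calibrant_pairs):
    """Shift all observed m/z values by the median ppm error estimated
    from (observed, theoretical) pairs of confidently matched ions."""
    errors_ppm = [(obs - theo) / theo * 1e6 for obs, theo in calibrant_pairs]
    offset = median(errors_ppm)  # robust to a few bad matches
    return [mz / (1 + offset * 1e-6) for mz in observed_mz]
```

After a pass like this, the reprocessing search can use a much tighter mass tolerance, which is where the extra IDs come from.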

Oh, and they worked out a cool variation of the anti-oxidant response that appears to be mediated by the super cool Nrf protein(s). Solid paper! Having trouble getting the PRIDE repository reference number listed in the paper to lead me to anything, unfortunately, cause I'd love to see this RAW data, but super cool paper.

Wednesday, October 21, 2015

Ever wondered how the Skyline team has managed to create and support this awesome software? Turns out it's got a grant in the background holding it up -- a grant that is up for competitive renewal. SO...this awesome free software that thousands of us are using (8,000 PCs fired up Skyline just last week!!!) might disappear (or become..gasp!..not free!) if this grant goes away.

If you're thinking "what can I possibly do to protect my access to this software before I finish this extremely large mug full of espresso shots?" you should click on this link.

It takes you to a place where you can download a draft letter of support that, if you get it back to Brendan, will be attached to the R01 renewal application. The deadline for their application is coming up real fast, so time is of the essence.

Tuesday, October 20, 2015

Any time we quantify anything, we need to determine the priority for the measurement: is it absolute quantification, or is it lots of data points? This trade-off is always there, whether you're weighing out masses for buffers or doing global -omics quan.

We're absolutely no different. On Monday I worked with an awesome team developing an absolute quantification assay. One biomarker with heavy labeled spike-ins. One data point with precision and accuracy. But ONE measurement! Can we get that level of perfection with 16,000 measurements? Yeah...if you've got 10 years...(as technology improves...maybe 4 years?). And this will be a measurement where we've reduced the level of technical variation.
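The usual yardstick for that technical variation is the coefficient of variation across replicates. A quick sketch with made-up numbers, just to show the gap between a tight targeted assay and a noisy global one:

```python
from statistics import mean, stdev

def cv_percent(values):
    """Coefficient of variation (%): the standard way to express
    technical variation across replicate measurements."""
    return stdev(values) / mean(values) * 100

# Made-up replicate intensities for one analyte
targeted = [10.1, 10.0, 9.9]   # heavy-spike targeted assay
global_quan = [8.0, 12.0, 10.0]  # same analyte in a global run
```

The targeted replicates land around 1% CV; the global ones around 20%. Same mean, wildly different confidence in any one number.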

Okay. So what happens when we look at something from a global level, where both the technical variation and the biological variation are high? I just learned that you can gain a lot of insight from this output by actually looking at the variation itself!

Variance normalization isn't a new concept in reporter ion based quantification. IsobariQ, a software package a friend of mine has evaluated and really likes, was first showing off its variance normalization algorithm for iTRAQ at HUPO in 2011. And there is an R package out there that does something similar and, for the life of me, I can never remember what it is called. Someone just asked me about it last week and I still haven't come up with the name.
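For the flavor of what these tools are correcting, here's the simplest possible stand-in: median normalization of reporter channels. To be clear, the real variance-normalization algorithms (IsobariQ's included) are considerably fancier than this sketch, which I made up purely for illustration:

```python
from statistics import median

def median_normalize(channels):
    """channels: {name: [intensities]}. Scale each reporter channel so
    its median matches the grand median across channels, removing the
    crude loading/labeling differences before any ratio is computed."""
    medians = {name: median(vals) for name, vals in channels.items()}
    grand = median(medians.values())
    return {name: [v * grand / medians[name] for v in vals]
            for name, vals in channels.items()}
```

Only after a correction like this does it make sense to start asking what the remaining variation is telling you biologically.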

What is cool about this paper is that we actually see the normalization working in clinical samples. Maybe someone else has done this, but I haven't seen it. If you show that something works well in your cool cancer cells that are grown at exactly 37C and get exactly 30% N2, or whatever, that is one thing (and awesome! and I'm not going to put it down ever. Bravo! I can't keep cells alive at all!)

But if you show me that your technique can extract more meaningful biological data out of people who walked into a clinic -- people who may have eaten 10 minutes ago, or 6 hours ago, or who have immunity to this virus or that, or have any of the crazy differences each one of us walks around with every day? That's gonna make me think we should take a look at what you did!

Monday, October 19, 2015

De novo sequencing has traditionally been a pain in the neck. If you are looking at low resolution MS/MS spectra, there are often tens, maybe hundreds of possibilities out there to explain a given fragmentation spectrum. High resolution accurate mass MS/MS spectra, with fragments that are often only a few hundred parts per billion off in mass accuracy? That improves things a LOT! And that's why we're seeing things like de novo and wide mass accuracy windows making a resurgence.

You know what would be useful? A free (for academics...) super fast new de novo algorithm! And that is what Novor is. Novor makes some intelligent assumptions based on biological data (from spectral libraries) and can take a shortcut or two that other de novo algorithms can't. What you end up with is a crazy amount of sequencing speed without a loss in match certainty, since it's based on real data.
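Why does mass accuracy matter so much here? Because de novo sequencing is basically reading residues off of gaps between fragment ions. This little sketch (nothing to do with Novor's internals; masses are the standard monoisotopic residue values, tolerance is my assumption) shows how a tight ppm window collapses the candidate list:

```python
# Monoisotopic residue masses (Da) -- a handful of the standard values.
# Note L and I are exactly isobaric; only L is listed here.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}

def residue_from_gap(mz_low, mz_high, tol_ppm=10.0):
    """Given two fragment ions from the same series (singly charged
    assumed), return the residues that explain the mass gap within
    a ppm tolerance."""
    gap = mz_high - mz_low
    return [aa for aa, mass in RESIDUE_MASS.items()
            if abs(gap - mass) / mass * 1e6 <= tol_ppm]
```

At 10 ppm a gap almost always maps to one residue; at low resolution (say, a 0.5 Da window) K/Q and a bunch of other near-isobaric pairs all come flooding back in.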

Sunday, October 18, 2015

Man, time flies when you're solving all the biological problems no one else can! SEQUEST is 20 years old now! To commemorate the fact that this awesome step forward for science will be old enough to have a drink with me next year, JASMS has a special issue focusing on and around this accomplishment.

Saturday, October 17, 2015

Pretty soon I'm going to be rambling on about all sorts of improvements in reporter ion quan due to improvements we'll be seeing in Proteome Discoverer 2.1.

In the meantime, however, here's a great new node for PD 1.4 that improves TMT quan for this software package.

How's it work? Well, it takes a bunch of things into account, including the reporter ion isotope distribution! I can't wait to give it a shot. Unfortunately, I think I've let my PD 1.4 licenses expire and I've got to get on that....

Friday, October 16, 2015

Here is the gist of it: mutations are gonna happen. If they didn't, evolution, to a large degree, wouldn't happen either (hugely complicated higher organism silly reproductive procedures aside). What Sahand Hormoz is proposing here is that it is a mathematical certainty that evolutionary pressure has shaped essential proteins and their structures to minimize the nasty effects of mutations.

Here, I've stolen the genetic coding table from this Wikipedia page. Say you've got an essential leucine at this essential location. That's great! Cause many point mutations can happen to that section of the DNA and the protein won't change at all! You can change UUA to UUG or CUA and you're still gonna have leucine! Now, if it goes to UAA...well, you've got a truncated protein...so maybe this wasn't the best example.
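You can actually count this up straight from the standard genetic code. A quick sketch of my own (RNA alphabet, standard codon table built from the usual NCBI ordering):

```python
# Standard genetic code: codons ordered by base1, base2, base3 in UCAG order.
BASES = "UCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AAS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def point_mutants(codon):
    """All 9 single-nucleotide substitutions of a codon."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]

def mutation_outcomes(codon):
    """Classify every point mutation of a codon."""
    aa = CODON_TABLE[codon]
    outcomes = {"synonymous": 0, "missense": 0, "nonsense": 0}
    for m in point_mutants(codon):
        new = CODON_TABLE[m]
        if new == aa:
            outcomes["synonymous"] += 1
        elif new == "*":
            outcomes["nonsense"] += 1
        else:
            outcomes["missense"] += 1
    return outcomes
```

For UUA, 2 of the 9 possible point mutations are silent (UUG and CUA, still leucine), 2 hit a stop codon (UAA and UGA, the truncation case above), and 5 swap the amino acid.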

There are also changes where we switch amino acids and it isn't all that bad either. For example, a mutation that changes a polar amino acid with a small functional group to another similar one isn't a big deal, but changing it to one with a huge non-polar group would be a whole lot worse.

From the genetic perspective, this makes all sorts of sense. What this guy did was show that you can calculate it with that Maths stuff to show that this wasn't accidental at all. This was evolutionary pressure over billions of years to find the best possible way to protect the proteins that life would end without. Really cool read, even if you don't know what any of the equations mean!

Hey! I know this is self-serving, but enrollment for the NIH Proteome Discoverer 2.0 workshop number 2 is still kinda low. I don't think it's in danger of being cancelled, but if more attendees register then I'll be more confident we can have this thing.

The program is starting to take shape. The advanced concepts in the afternoon will definitely include:
How to combine quantitative whole protein and quantitative PTM (probably phosphoproteomics) into a single report AND
How to combine the results of multiple TMT and iTRAQ experiments.

All of our friends on the bioinformatics side of the proteomics world have been throwing out all these funny letters for years. They tend to start with an "m" and end with an "l" and have something random in the middle. mzML, mzXML, mzTab (no L! cheater!), mzIdentML, and on and on. On cursory examination, these are all attempts to store our data with better efficiency without the loss of data that we see when converting our data to MGF (where we lose almost all of our MS1 data!)

Problem is, some of us have used one or another of these things, and the public repositories may have cool data hidden in any of these formats.

This new program (definitely meant for the bioinformaticians out there who can code and stuff!) is called ms-data-core-api. It is an Application Programming Interface that should take care of all these formats for you. Adding this to your programs will allow you to pull data in from any of these sources and read the data in a unifying format so you aren't all jumbled in your downstream processing.
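The real ms-data-core-api is Java, and nothing below is from it. But to show the flavor of the unified-reader idea, here's a toy Python sketch: a minimal parser for one format (MGF, heavily simplified) plus a single extension-to-reader lookup, so downstream code never has to care which format a repository handed you:

```python
def read_mgf(text):
    """Minimal toy MGF reader: returns a list of spectra, each a dict
    with 'params' and a 'peaks' list of (mz, intensity) tuples.
    (Real MGF has more header types; this is just the gist.)"""
    spectra, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            spectra.append(current)
            current = None
        elif current is not None and line:
            if "=" in line:
                key, _, val = line.partition("=")
                current["params"][key] = val
            else:
                mz, inten = line.split()[:2]
                current["peaks"].append((float(mz), float(inten)))
    return spectra

# The unified-API idea: one dispatch table, one output shape.
# A real library would register .mzML, .mzXML, .mzid, ... the same way.
READERS = {".mgf": read_mgf}
```

The payoff is exactly what the paragraph above says: your downstream processing sees one data shape no matter what went in.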

Wednesday, October 14, 2015

I love webinars! In order of importance, this is what the internet has given my life:

1) Elvis Pugsley

(proof that I'm not the weirdest person in the world? or maybe just hilarious!!!)

2) Webinars! The ability to learn from lectures from wherever I happen to be. Now, webinars come in different levels of quality and topic interest, but when I get an email from the MacCoss lab about a webinar, I'm going to check it out.

And if it is on something I've never heard of (Panorama?) and it talks about how I can "AutoQC" my instruments (y'all might have realized I'm kind of a dork for quality control!) then I'm gonna sign up for it.

It's a cool story, too. There are really no targeted chemotherapies out there that will work on KRAS-specific cancers. The functions of the various GTPases and their pathways are complicated and convoluted. When they are working right, they are supposed to function in signaling by GTP to GDP conversions. This communication system is so critical to normal cell functioning that dysregulation of these proteins has the nasty outcome of the cell dying or becoming a cancer cell. As in any biological system, it's certainly more complex than this, because years of work with these things have come up with a whole lot more info and no clear, simple answers.

This is where we come in. Turns out that this group did a big shRNA screen (this is where you transfect cells with a great big mixture of short hairpin RNA that knocks down RNA production, and therefore protein production, on a huge scale). The readout is typically a phenotype. In this case, I'm assuming what they did (I'm sure it's in the paper; not my area of expertise) was see what cells did or did not become cancerous and then go back and figure out what gene they knocked out.

Of the many observations, they came up with 2 ligases that are the only ones known to be involved in the SUMO E1 and E2 pathways (controlling the SUMO PTM; not sure if I have the nomenclature 100% correct here). Anyway...SUMOs are small proteins that are ligated to big proteins and modulate their function as post-translational modifications (PTMs). There are a bunch of them and a bunch of pathways, but here you have two major regulators of SUMOylation (more info on this PTM here) that are somehow implicated in KRAS oncogenesis? Tell me more!

So, they go in and construct an RNA interference to directly deplete these SUMOylation ligases (the things that attach the SUMO PTMs) in some cells that are crazy KRAS cancer cells. Turns out that if you can't produce these proteins, even a KRAS cancer cell gets subdued. Then they study it by labeling some of the cells with SILAC and repeating the knockdowns to try to figure out the mechanism by which all this is happening, and they come up with a group of proteins they call KASPs, short for KRAS Associated SUMOylated Proteins, that are involved in this mechanism.

To sum up: We start with a common cancer mutation we don't have drugs for, and we figure out not just one protein but a whole series of proteins that may be potential targets for treatment when someone has this type of cancer. Inhibit these proteins and maybe you have a new chemotherapy. And along the way, we learn an entirely new biological modulation pathway?

I highly recommend picking this one up. It's nice to see what we do fitting seamlessly into a biological study alongside the cutting edge tools the molecular biologists are using these days!

Monday, October 12, 2015

Alzheimer's disease is some terrifying stuff. Fortunately, however, it is a disease that appears to be amenable to study with the advanced tools we have these days! Pull up a Google Scholar search for "Alzheimer's disease proteomics" and you'll find a load of great studies that show that protein mass spectrometry may be exactly the way that we need to approach this for early disease diagnosis, stage monitoring, and hopefully!!!! for finding the upstream stuff so we can fix it.

Case in point: This open access study from groups at several institutions (hey! my good friend Katie is an author!) who used a combination of differential proteomic analysis and targeted MS/MS to profile disease progression. For the initial analysis they simply use high resolution MS1 monitoring and compare the peak profiles between the CSF samples of different stages of the disease. These differential lists provide them with ions to go after in their targeted assays! This is a pretty awesome application of a way a lot of us have thought of doing proteomics over the years...by just going after the stuff that is different!

How'd they find the stuff that was different? With Elucidator (and I'm not sure it exists for today's instruments...) but we have lots of tools we could use for things like this, like SIEVE and OpenMS.

Big highlights of this assay? Seeing biomarkers without any level of antibody-based enrichment or pulldown. Just finding them by high resolution differentials!
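The core of a differential screen like this is almost embarrassingly simple. This sketch is not Elucidator's (or SIEVE's, or OpenMS's) algorithm, just my own toy version of the gist: compare matched MS1 feature intensities between conditions and keep what changes:

```python
import math

def differential_features(case, control, min_fold=2.0):
    """Compare MS1 feature intensities between two conditions and keep
    features that change at least min_fold in either direction.
    case/control: {feature_id: intensity}. Returns {feature_id: log2 fold}."""
    hits = {}
    for feature, a in case.items():
        b = control.get(feature)
        if b:  # feature must be detected in both conditions
            fold = a / b
            if fold >= min_fold or fold <= 1.0 / min_fold:
                hits[feature] = round(math.log2(fold), 2)
    return hits
```

Real tools also align retention times and handle missing values, but the output is the same kind of shortlist: the ions worth building a targeted assay around.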

Friday, October 9, 2015

I just downloaded this and I'm digging for something cool to feed it. This is free, easy-to-use software for analysis of your phosphoproteomics data! And it runs through Java, so it should be accessible to just about everybody!

Did you just jump up out of your chair and yell a happy profanity? Or was that me?

It seems too good to be true to me, but I sure have a JAR file, and an instruction manual, and a practice dataset that I downloaded here.

If you get to check it out first, I'd love to know your impressions. I had a weird Java permissions issue (probably me and my PC settings) and I had to "unblock" the .JAR file, but I almost always have to.

Splicing? Well, that's when DNA that should be over here making this protein -->
ends up hanging out
with DNA
that's way over here -->
(who says blogging can't be hi-tech!)

and you end up with a transcript (and therefore a protein!) that, from a purely DNA perspective, TOTALLY SHOULDN'T EXIST AT ALL! (Definitely isn't in Uniprot/Swissprot...!)

Sorry for shouting. I'm excited. This paper focused on mouse brains and found a whole ton of these things. In regards to some of the recent discoveries in brain proteins, maybe this isn't that big of a deal, but tissue-wise, holy cow...should we be using a different FASTA file if we are profiling liver tissue than if we are doing tissue that came from brain samples? Sure looks like it! Heck, if nothing else, it's another great argument for PROTEOGENOMICS! (Yeah, I'm shouting again!)

Worth a read. Sorry it isn't open access. And sorry if this is jumbled. It's kind of late and I've been excited about one thing or another for the last couple of days.

Wednesday, October 7, 2015

What happens when 64 different labs submit BSA samples that they run every month for 9 months and people sit down and assess the data? Sounds like an ABRF study to me!

As we've come to expect, intralab variability (same lab over the 9 months) was smaller than interlab variability (from one place to another...I get them mixed up). That makes sense. My LC, my mass spec: I'm going to keep it pretty consistent for 9 months, compared to the way I run it versus those wackos over at Whats-It-Called University.

Variability among all the samples really doesn't look all that bad. It is, however, a single protein digest -- so we'd kind of expect that. Sampling 100,000 peptides from a normal mammalian line might be a more sensitive indicator, but I still think this is a promising measurement. As a field, we're getting better all the time!

Interestingly, the real outliers seem to show up right after LC-MS preventative maintenance (PM). And this makes sense, too. If you've had your LC open and changed some thingies in it recently, then peak widths and retention times might have shifted a bit between pre- and post-opening it. Sure does emphasize the value of frequently running and recording quality control standards, particularly after maintenance and things.
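If you are logging QC standards anyway, catching a post-PM outlier can be as simple as a z-score check on one metric. A toy sketch of my own (a real QC system tracks many metrics, and with small sample sizes an outlier inflates the standard deviation, so the cutoff needs thought):

```python
from statistics import mean, stdev

def flag_outlier_runs(rt_values, z_cut=3.0):
    """Flag QC runs whose retention time for a standard peptide drifts
    more than z_cut standard deviations from the mean across runs.
    Returns the indices of the flagged runs."""
    mu, sigma = mean(rt_values), stdev(rt_values)
    return [i for i, rt in enumerate(rt_values)
            if sigma > 0 and abs(rt - mu) / sigma > z_cut]
```

Run your QC standard right after every PM, push the retention times through something like this, and the "post-maintenance outlier" pattern the ABRF study saw falls right out.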

Tuesday, October 6, 2015

Okay, y'all know I've been trying to find a good, easy, and reproducible protein digestion method to get behind. And I've mentioned the SMART (previously Perfinity FLASH) digest kits before. The big question that is floating around is: it works for single proteins just fine, but how well does it work for proteomics?

According to this paper it works pretty darned well. Now, this is just one paper and all, but I really can't come up with a reason that a method that digests one protein wouldn't digest a whole bunch of 'em and do it well (so long as the protein to enzyme ratios aren't all wonky).

Monday, October 5, 2015

Over the weekend I finally got to toy around with one of the cool free nodes from the Kohlbacher lab that we can install into Proteome Discoverer 2.0. LFQ is short for "Label Free Quan" and the nodes are freely available for anybody to download here. Now, before I go forward I should probably reiterate something that is on that page. These are 2nd party nodes and they won't be supported by Thermo's Proteome Discoverer team. Questions should be directed to the node developers. Fortunately, they seem quite straightforward!

Here are some early impressions of the nodes.

1) They are easy to install. Download the file from SourceForge, make sure all Proteome Discoverer versions on the PC are closed and run the file. When you reopen, the nodes are there!

2) The node developers even have workflows ready for us! There is a Processing Workflow and a Consensus Workflow. Which is great! Cause, honestly, I wouldn't have thought to set them up that way....

3) Interesting note. SequestHT and Percolator are mandatory. Gotta have 'em or you won't go anywhere, it seems.

4) LFQProfiler appears to multithread.

Windows performance loggers are always kind of hard to interpret, but all 8 cores on this desktop appeared to be doing something when the LFQProfiler kicked in. In the consensus step you can actually tell the LFQ node how many cores it is allowed to use! On other runs, it looked like I was maybe only using 4 cores, but this really isn't a good measurement.

5) Disclaimer here: I've got like 10 versions of Proteome and Compound Discoverer on my desktop because I have been alpha/beta testing them for years. I've got some versions that are locked down for different projects, so my working environment is probably sub-ideal. But...I'm gonna be honest here, and I'm likely doing something wrong, but I'm finding the node a little difficult to integrate into my workflows, in an odd way. I keep getting "Execution failed" in my Administration tab, but the failed workflow can be opened and looks just fine. I do have to unhide my intensity values, but the numbers are there and it looks like it ran real fast!

So...first impressions. The LFQ node installs easily, has convenient pre-made workflows (additional downloads required), and seems to run fast. More analysis required to see how it works, but it's Sunday and this is all the PD I think I'll do today!

Sunday, October 4, 2015

We have an awful lot of search engines these days and we have almost as many (more?) ways of working out our automatic false discovery rates. The Qu lab seems to have stepped back and said, let's try to sort it out, meaning, which FDR is more appropriate for large datasets -- and when?

This is a heavy analysis of three different search engines available for running in or through Proteome Discoverer as well as an analysis of what false discovery rate algorithm/method or filter will leave you with the best possible results. Interestingly, the answers appear to be very analyzer and fragmentation-type dependent.

I'll leave this here for you guys who took more maths! The answer appears to be...there is no easy answer...these are things we definitely need to spend more time working on as proteomics moves further and further into the BIG DATA world.
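For anyone who hasn't had to implement it, the classic target-decoy FDR estimate at the heart of all these comparisons is short enough to sketch. This is the simplified textbook version, not the paper's analysis or any particular engine's method:

```python
def fdr_filter(psms, threshold=0.01):
    """Classic target-decoy FDR filtering (simplified).

    psms: list of (score, is_decoy) tuples, higher score = better match.
    Walk down the ranked list, estimating FDR as decoys/targets, and
    cut the list where the estimate first exceeds the threshold."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    kept, decoys, targets = [], 0, 0
    for score, is_decoy in ranked:
        decoys += is_decoy
        targets += not is_decoy
        if targets and decoys / targets > threshold:
            break
        kept.append((score, is_decoy))
    return kept
```

The devil is in everything this sketch glosses over: how decoys are generated, whether you filter at the PSM, peptide, or protein level, and how score distributions shift with analyzer and fragmentation type, which is exactly what the paper digs into.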

Saturday, October 3, 2015

It turns out that most of the histone work that has been done out there has been done on mouse cell lines or immortalized human lines. While this is undoubtedly useful information, immortalized cell lines tend to be kind of messed up, and we all know about the pluses and minuses of studying mice.

The Ciborowski lab has a plan for long-term, in-depth study of histones and their post-translational modifications in normal human macrophages. In this first study (available here, open access) they work on establishing their normal, baseline conditions for resting macrophages. Once they get that, they can go on to further studies.

For this analysis they are primarily using an LTQ-Orbitrap XL with ETD and employing both CID and ETD fragmentation. Surprisingly, the majority of the information being obtained for PTM matches is not coming from the ETD. This is likely due to the lower speed/efficiency of this earliest ETD system compared to the ones I normally get to mess around with. It does, however, contribute meaningfully to the study. This is a nice, clear study, but I mostly highlight it here because I'm very interested in what they are going to do next AND how this data is going to line up with other well established histone PTM datasets we have from other models. So...this post is kind of to remind myself to check back on these guys later...sorry...