Tuesday, January 30, 2018

PepMap C-18 is a really common LC material these days. After looking at the last week of runs on 3 different instruments, I feel like they should slap a warning label on it. It should say something like

ALL OF THE PEPTIDES COME OFF AT LOW LOW LOW ORGANIC!

I've got better runs than this now, but I haven't processed them yet. 200ng of human lysate digest. In the top run you have a normal-ish ramp of 5-35% acetonitrile in 100 minutes. On a whole cell human lysate, I'm getting around 3,700 unique protein groups on OT-OT scans. The peaks are just bunched too closely in the middle to get much more.

On the second one -- no changes to the instrument method -- just using a 2-stage ramp (to 20% B!!), I'm over 4,500 protein groups. I'm picking up around 3,000 new PSMs just by compensating for the fact that PepMap is ditching all my peptides super early in the organic. Again, I've improved upon what is shown above, but it gives you a feel for where I'm heading with this!
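Since the whole trick is reshaping when %B climbs, here's a minimal sketch of the idea. The breakpoints are made up for illustration -- the post doesn't give the exact segment times -- but the shape is the point: spend most of the gradient below 20% B, where PepMap actually elutes things.

```python
# Illustrative gradient shapes (hypothetical breakpoints, not the actual method).

def linear_ramp(t, t_total=100.0, b_start=5.0, b_end=35.0):
    """%B at time t (min) for a single linear 5-35% ramp."""
    return b_start + (b_end - b_start) * min(t, t_total) / t_total

def two_stage_ramp(t, t_break=80.0, t_total=100.0,
                   b_start=5.0, b_break=20.0, b_end=35.0):
    """%B at time t: shallow ramp to 20% B, then a fast finish."""
    if t <= t_break:
        return b_start + (b_break - b_start) * t / t_break
    return b_break + (b_end - b_break) * (min(t, t_total) - t_break) / (t_total - t_break)

# At the gradient midpoint, the two-stage method is still in the
# low-organic region where most of the peptides are coming off.
print(linear_ramp(50))     # 20.0 %B
print(two_stage_ramp(50))  # 14.375 %B
```

The payoff is that the crowded middle of the run gets stretched out over more gradient time instead of everything dumping off at once.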

Sunday, January 28, 2018

Ever have one of those weeks where your alma mater decides to require 2-factor authentication on all of the sites it manages and that leaves you locked out of your personal email, remote instrument connect, Google drive, and your blogs? It's been that kind of week here. But I'm back in! And I've read something like 40 papers this week and at least 8 of them have been worth telling other people about. Prepare for an onslaught this weekend!

The amazing software team at Michigan is working to take their proteomic dark matter elucidating software and make it far more accessible to non-bioinformaticians.

MSFragger is ridiculously powerful -- but both the input and output have been...a bit overwhelming... Even better than the GUI? The team has enabled direct output of the data MSFragger creates into Philosopher!

Saturday, January 27, 2018

I'm admittedly biased, but I unabashedly love the IonStar methodology.

#1 reason? The Qu lab developed a pipeline for sample prep and for running their instruments, and they don't vary from it.

An awesome thing about our field is that we're tinkerers. Let's try this new sample prep method and we'll get 4% more IDs. Or let's mess with the gradient here and that'll get us another 2%.

Could the Qu lab take every sample that comes in the door, reoptimize their sample prep and instrument parameters, and maybe get a little better data? Probably. Could they introduce more sample handling, like sample-specific fractionation protocols, and get deeper coverage? Definitely. But the number one thing they focus on is run-to-run reproducibility -- feel free to pull some RAW files and line up the peaks. They are shockingly consistent from one study to the next. These aren't the results you get if you tinker from one experiment to the next. They aren't the results you get if your department's big-deal MD sends half of a sample he/she is totally fixated on to one proteomics lab and the other half to another.

One of the biggest core problems oncologists face is -- what nasty, probably toxic chemotherapy drug do you subject this poor patient to that will have the best chance of killing the tumor and the least chance of killing the patient? Complicate that with the fact that combination therapies are probably the way they have to go -- and they are often making these decisions with less information than they'd really like to have. Any extra info might help them make a better choice. These aren't trivial choices. A person's life might hinge on picking the right 2 drugs and how much of them...based on a couple of ELISAs and amplification of...what?...maybe 50 genetic targets...?

In this study the authors take pancreatic cancer cells and characterize their response to single and combination therapies. This allows them to characterize the downstream effects and helps elucidate why the combination ends up being effective in these specific cell lines.

Okay -- and here is the best part and I'll get off my soap box. They looked at 40 samples here. If someone brings them in 20 more samples from, for example, patients who have been treated with these drugs, they'll be able to run those new samples and line them back up. The same sample prep, the same trap and column manufacturer, and the same gradient, LC, and MS parameters will allow them to continue to add data to this cohort and extract useful quantitative measurements across all of the samples from beginning to end. I'd argue that in many contexts this is every bit as valuable -- if not more so -- than picking up 10% more peptide IDs in the second cohort and losing some of your original measurements in the process.

Don't get me wrong. It's AWESOME that we're tinkering with proteomics and coming up with new methods and techniques and pushing our field forward. This works for theoretical proteomics, and you could argue that the labs innovating like this are our field's major driving force. Unfortunately, it is also the number 1 thing slowing down the realization of clinical proteomics, and this study is another example of how focusing on reproducible measurements over innovation is what we'll have to do to bridge the gap.

Friday, January 26, 2018

Native mass spec is hard to do -- top down proteomics is hard to do -- the two together might be the mass spectrometrist Kobayashi Maru (reference -- like you need one...nerd...) ;)

Characterizing a 1.8MDa complex is nothing to scoff at. Nowhere near the record (which was around 10x higher last I looked). The fact that this team can not only get the mass but also top-down characterization is seriously impressive.

Unfortunately for me, my new facility retired the FTICR due in large part to crazy Helium prices before I got there -- but....one of these things just showed up...

It is the first one to be installed in the whole state of Maryland (and it's not mine -- it belongs to a really good friend who has tons of really cool intact/native stuff lined up for it. I am, however, hoping that a time will come that maybe she would really need access to, I dunno, ETHcD and we can work out a deal for some EMR run time!) FTICR or not, this is how native protein and complex characterization is gonna go down here!

Yeah...ETD/ECD on the native complex like the Nature paper above uses would be nice, but SID and (stepped) HCD gives you a lot of power. And the Helium/year for an FTICR would almost pay for a second EMR...so it seems like a fair trade off for now!

EDIT (1/28/18): Great! Almost no one has read this awful post yet. (Blogger actually tells me how many people have seen this, and the joke:useful ratio is way off.)

Take the extreme complex on the end. If you're used to shotgun proteomics, you are probably very aware of how much harder it is to manually pick through a +4 peptide compared to a +2. At 10,000 m/z, the 490kDa native protein is still around +50. Imagine how hard it is to make sense of it if it were around 1,000 m/z. It would have to pick up around 500 (5e2!) charges. And with 490kDa of protein -- it probably would!

Also consider this: in shotgun proteomics on the more sensitive instruments these days, you are almost always seeing your peptide show up as +2 and +3 (and if you look hard and it is a large peptide...maybe also +4) -- it's all pH/pKa gobbledygook that determines this. If you have 500 charges on something as your dominant peak, then 499, 501, 498, 502, etc., etc., are going to be there (in decreasing likelihood -- translating to intensity), diluting your charge envelope until the maximum number of ions your C-trap lets you load is still hardly enough to get you above the noise.
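The back-of-the-envelope math here is simple enough to sketch (standard proton mass, and the ~490 kDa protein from above):

```python
PROTON = 1.00728  # Da, mass of a proton

def mz(mass_da, z):
    """m/z of a species carrying z protons."""
    return (mass_da + z * PROTON) / z

def charges_for_mz(mass_da, target_mz):
    """Approximate charge state needed to land at a target m/z."""
    return round(mass_da / (target_mz - PROTON))

# Native-MS conditions: ~+50 puts the 490 kDa complex around 10,000 m/z.
print(round(mz(490_000, 49)))         # ~10001 m/z
# To sit near 1,000 m/z instead, it would need roughly 490 charges.
print(charges_for_mz(490_000, 1000))  # ~490
```

Which is exactly why native (low-charge, high-m/z) spectra of huge complexes are tractable at all -- the alternative is a ~490-charge envelope smeared across its neighbors.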

I think this is the point that is so nicely made in the image above.

One more since the espresso just kicked in. Check out this (by perspective of the studies mentioned above) much smaller intact problem -- intact antibodies (around 150kDa) nicely shown in this recent study.

I think this makes the point clearer. The denatured mAb with 45 charges is clearly more complex than the same protein in native MS. Even doubling the number of charges makes it more difficult to determine structural modifications as large as glycans (~202 Da!). Still do-able, but harder than when your dominant peak is only 25 charges. This third paper spends a lot of time on antibody-drug conjugates (ADCs), where they further complicate the intact antibody by adding things to it to turn the mAb into a disease-seeking missile for drug delivery. In case you're wondering, this doesn't make the mass of the mAb easier to resolve.
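To put a number on why charge state matters for spotting glycoforms: a fixed mass difference shrinks in m/z space as charge goes up. A quick sketch, using the ~203 Da HexNAc residue mass as a stand-in for the roughly 200 Da glycan shift mentioned above:

```python
GLYCAN = 203.08  # Da; a HexNAc residue, close to the ~202 Da mentioned above

def mod_spacing(delta_mass, z):
    """m/z spacing produced by a modification of delta_mass on a z+ ion."""
    return delta_mass / z

for z in (25, 45, 500):
    print(z, round(mod_spacing(GLYCAN, z), 2))
# +25  -> 8.12 m/z between glycoforms: easy to see
# +45  -> 4.51 m/z: doable, but harder
# +500 -> 0.41 m/z: buried in the isotope envelope
```

Same modification, same instrument -- the only thing that changed is how many charges the protein picked up.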

Thursday, January 25, 2018

RAS mutants are bad news in cancer. KRAS drives something like 90% of pancreatic cancer cases -- and I'm sure I don't need to tell you the most common ending of that story. Scientists around the world have been looking for RAS inhibitors for decades, and a lot of modern chemotherapies have resulted from this work -- just...well...none of them really target this mutation.
Maybe it is time to flip the switch and look at this a different way!

In this study, the authors start with an exploratory look at the cell surface of SILAC-labeled cell lines that are KRAS wild-type and induced mutant strains. The cell surface proteome is enriched in this way (as I understand it):

1) Live cells are incubated with a compound that modifies the glycopeptide/proteins on the cell surface.
2) The modified cells are washed of unused glycoprotein modifying compound and flash frozen
3) The cells are thawed with protease inhibitors and a slurry of some sort of avidin that binds to the modified glycoprotein surface
4) The proteins that aren't bound are washed off the beads and the proteins that are still stuck are subjected to on-bead digestion.

If this sounds like a glycoprotein enrichment protocol -- I think it is. The trick, I guess, is that since you are only modifying glycoproteins while the cell is still alive, you aren't modifying the ones on the inside of the cell.

5) The proteins are treated with PNGase, which cleaves the glycans off the peptides. (My first thought is -- wait -- can you modify this so you can keep the glycans on? Or did the chemical modification mean the glycans wouldn't provide much/any useful information anyway?) I'm going to ask the authors a bit later today.

Here the SILAC membrane-enriched peptides are analyzed with a Q Exactive Plus, and the peptide ID and quantification all appear to have been done in ProteinProspector. And this is where the paper goes kinda crazy. This group doesn't stop anywhere near here.

Proteomics is the smallest part of this study. The validation is the biggest part. CRISPR, phage displays, ELISAs, and Antibody Drug Conjugate (ADC) killing assays(!!) are the stars of this study. Proteomics just tells them where to look.

Even I have to admit that the story here is that they end up showing cell surface targets that appear on KRAS mutant cell lines that they can target with antibodies and kill the cells. We're talking real potential therapeutic targets for KRAS mutant cells. I'd say I'd hope that some pharma companies will jump on this and start developing ADCs right now -- but, come on, you know people have this paper in hand and are working on these right now somewhere!

Tuesday, January 23, 2018

Are you in one of those facilities (all of them...?) where you aren't allowed to install new software or updates on your PCs anytime you want? Do you wonder why you don't keep liquor in your desk when you realize your only option is to call your IT security guy? Are you currently considering the logistics of getting your own PowerPC and filling the ethernet and USB ports with hot glue so it is CLEARLY not a security risk to anyone and lug it around with you so you don't ever have to make that phone call again?

I am going to say without any hesitation at all that the new MetaMorpheus with FlashLFQ is 100% worth interrupting your IT security person during his/her 35th straight hour of Dungeons and Dragons knockoff M.M.O.R.P.G. to come up and put his/her administrator password into your PC so you can install it. Yes, I know how long it takes to get Dorito(TM) crumbs out of your keyboard...yet again.... STILL WORTH IT.

Why? Because MetaMorpheus can do almost everything now.

Crazy fast proteomic analysis
Scary accurate label free quan
Crosslink analysis (what!?! I haven't checked this part out yet)
Find just about any PTM and localize it before you've even reduced your RAW file to a peak-picked list in other software (through G-PTM-D), which, btw, takes UniProt XML files (that contain the known PTMs -- you don't even have to unzip the files!)

The advancements in the LFQ -- FlashLFQ! -- were just described here. FlashLFQ may also work separately on its own, but once I realized that, I just upgraded my MetaMorpheus (which happened automatically...on my own PC, of course, which I'm not carrying around with me...yet...)

Somehow -- MetaMorpheus does all this stuff without being an intimidating mess. It looks very straightforward, and it even has a button that says

(I added some things)

It's fast, too. I think I mentioned in my post about G-PTM-D that it still took a few days to search every known modification against a human database. This is not the case with MetaMorpheus. I took 6 Orbitrap Fusion files (OT-OT) and searched with every modification in UniProt. My MaxDestroyer worked on this for about 30 minutes.

..and then it was done!

It uses very little RAM. You can change that and speed the search up massively if you turn off the "conserve memory" function -- this assumes you've got a lot of RAM and that your RAM is fast. 8GB of RAM on my old Proteome Destroyer was like half of it. On this one -- what is that? A tenth?

What do you get from this? If you aren't impressed that I searched 6 Fusion files with around 1200 dynamic modifications and did it in 30 minutes, you might not be on the right blog. Oh -- and I quantified everything as well with FlashLFQ.

You can get your output as XML that you can bring right into Perseus -- or open in Excel (it will complain) and interrogate your results with the Ctrl+F function.

Now...I think the paper is en route that fully describes this amazing software. Maybe if it is really hard for you to get software installed you should wait until you have the publication in your hands -- or maybe you're like me and you just want to search all your data so you can move on to the next project. In that case, maybe you should check out MetaMorpheus and tell that nerd in the basement to pause his campaign.... ;)

Saturday, January 20, 2018

I'll be honest -- I may have just made fun of using MudPIT in a meeting recently. In my immediate defense, I was only thinking about it in context of our Fusion 1 system because it has been my primary focus the last few days and nights.

This study isn't in Cell because they didn't have anything else to put in the journal that month. This study is in Cell because it's flippin' awesome!

MLL is a gene that has a wild-type form (which is important) but can have weird translocations and produce strange chimeric proteins that are seriously bad news for the patient. Over 70 different chimeric proteins have been identified -- and they all sound like they sucked.

As you can imagine -- this is kind of a moving target. Seventy different protein variants? How do you even start to study this? This group says "oh. that's simple. we'll study it with EVERY analytical technique you've ever heard of"!!

This study has cell sorting, RNA-Seq, induced mutations, purifiable (via FLAG tag) proteins, more cell line combinations than you can shake a stick at, bone marrow transplants in mice that are THEN irradiated -- you name it. They threw it at this problem. Oh yeah! And they did proteomics!

What did they get out of this? Oh -- just the most thorough picture of how MLL translocations lead to the destruction of the important wild-type protein and a darned good picture of how the entire mechanism of MLL leukemia works. You know...nothing special...

An immediate question I had was: how did this relatively small number of authors do ALL OF THESE THINGS? I looked, expecting it to have 40 names on it. I'm impressed, for real.

Side note: An LTQ is still an awesome instrument if you give it the right problems to solve!

Friday, January 19, 2018

Global proteomics is awesome. I LOVE to give the elevator pitch to someone about what proteomics is. I've run through it so many times that I've got it perfected, and I imagine that just about everyone doing this has one that is better than mine.

However -- there are some clear downsides to all of the statistics that are necessary to match every ion the instrument sees and/or fragments to a theoretical database containing somewhere between tens of thousands and hundreds of thousands (millions?) of theoretical sequences. Just a reminder: a 1% false discovery rate (FDR 0.01) on 1,000,000 peptide spectral matches means 10,000 matches that could have occurred purely by chance.
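That reminder is just multiplication, but it's worth seeing how fast it scales with dataset size:

```python
def expected_false_matches(n_psms, fdr):
    """Expected number of false PSMs accepted at a given FDR threshold."""
    return int(n_psms * fdr)

# The example from the post: 1% FDR on a million PSMs.
print(expected_false_matches(1_000_000, 0.01))  # 10000
# The same threshold on a small targeted-scale experiment.
print(expected_false_matches(5_000, 0.01))      # 50
```

Same 1% threshold, wildly different absolute numbers of junk matches -- which is the core of the argument for narrowing the search space below.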

On the other extreme end -- you have the targeted proteomics stuff -- where you specifically look at a small set of things you are interested in. This new study bridges this gap.

This study focuses purely on cancer biomarkers. To go after them, they narrow the definition of what a "biomarker" is by interrogating databases to build a list of around 1,000 proteins that have been linked to cancer in some way. I haven't looked at this list yet, but I like the number. If you are searching 1-10 proteins, I do not trust global FDR approaches like target-decoy -- or even Percolator/Elucidator. They're great, but I think they need a lot of data to work right. Around 1,000 proteins? I'd use the global tools without hesitation (I hope it goes without saying that I would manually look through the matches, though!). Here the spectra appear to be searched against all of human UniProt/SwissProt, but the downstream analysis is informed by the biomarker list. I'm thinking that I might look at some other datasets and limit the FASTA to just the biomarkers this team has identified.
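If you wanted to try that FASTA-limiting experiment yourself, the filtering step is simple. Here's a minimal sketch (the function and demo entries are mine, not from the paper) that keeps only entries whose UniProt accession is on a list, assuming standard ">sp|ACCESSION|NAME" headers:

```python
# Hypothetical helper: restrict a FASTA to a list of accessions
# (e.g. a ~1,000-protein biomarker list).

def filter_fasta(fasta_text, keep_accessions):
    """Keep only FASTA entries whose accession is in keep_accessions."""
    keep = set(keep_accessions)
    out, writing = [], False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split("|")
            # UniProt headers look like sp|P04637|P53_HUMAN ...
            acc = parts[1] if len(parts) > 1 else parts[0].split()[0]
            writing = acc in keep
        if writing:
            out.append(line)
    return "\n".join(out)

demo = (">sp|P04637|P53_HUMAN Cellular tumor antigen p53\n"
        "MEEPQSDPSV\n"
        ">sp|P00000|FAKE_HUMAN Not on the list\n"
        "AAAAA")
print(filter_fasta(demo, ["P04637"]))
```

The trade-off, as discussed above, is that a 1,000-protein database is small enough that you'd want to sanity-check whether target-decoy FDR still behaves.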

The team then develops a kind of extreme phenotype to assess how well this approach works. By arresting cancer cells of different types at different cell cycle checkpoints, they have a really interesting and complex model system to test it on. And it works! An LTQ (yup! Linear Tion trQp!) can identify and quantify more than 1/3 of the biomarkers from their starting list. Since we know that cancer is almost never just one protein being messed up, and is instead dozens or hundreds of proteins working together -- 300+ quantified proteins is more than enough to point you toward the pathways being affected!

Thursday, January 18, 2018

It's funny how often I am "introduced" to something and the search bar on this blog shows proof that I read a paper on the topic at some point. JUMP is one such thing!

I have an excuse for forgetting about it. I didn't have a Linux computer to run it on. Yesterday, however, my application for access to a small Linux computer....

...with 72,000 (seventy two THOUSAND!!!) processing cores was approved...and it's time to see what this little guy can do (and...what they bill for it...since I now have a billing account for it as well...)

Of course it has around 50 programs installed for next gen sequencing analysis, but it has 2 programs for proteomics and the first is JUMPg, which was recently described here.

My hopes were dashed a little when I found out it is only enabled to run on one node at a time (so... 28 or so of those 72,000 cores...) but it appears that multiple instances can be started. I'll send it something tough and see how it goes.

It seems like most institutions have super computing resources these days (they have to in order to support the "next gen" sequencing stuff). Maybe JUMPg is a resource you can sneak onto one near you as well!

Wednesday, January 17, 2018

One of the first things I noticed after being out of the lab for a few years is that the scientists have been getting younger -- or...well...something else I'd rather not consider has occurred.

[Go Go Gadget Denial!]

An upside of this is a running notepad on my telephone of the cool new tools I'm learning about that are being used in school these days/recently that I need to check out. The first I'm getting to this morning is Avogadro. It was described originally here, but appears to have evolved a lot since.

In its simplest form, it is a really powerful molecule editor -- akin to ChemSketch/Draw but with a simpler and more intuitive interface. If it has something comparable to the "mass spec scissors," I haven't found it yet (but I often just delete a bond anyway).

Imagine that you are sitting there minding your own business and someone walks in with some Louisiana crayfish peptides. As cool as proteomics is, you'll have to convince me that there is a more appropriate usage of Crayfish than this....

...but let's assume that this is super important (and we only need a few micrograms of peptides anyway)

What if there isn't a good sequenced crayfish FASTA database? I don't know if there is; I'm working on something else and I only chose this example because I'm hungry. Before you go all out and start de novo sequencing everything, maybe you can start with just a giant FASTA (I learned today that this is a hard "A": fast-AY, or fast-(Canadian)-Eh.) Who knew? Everybody?

You can start by building a FASTA that has all related organisms. If you have Mascot, you're in luck. You can just choose the taxonomy in your pulldown (assuming the complete database has been loaded). I don't have Mascot access at home so I went to Google and the first link was some terrifying exercise in BLASTP+ from command line where you cross-reference your taxonomy list from UniProt to the complete FASTA....

Then I remembered one of the perks of having PD maintenance -- something about FASTA downloads. Turns out it is pretty cool. If you look up crayfish (p.s. in my state we call them crawldaddies, no idea why) in Wikipedia, you can find the entire taxonomy. You can then follow either of the links in the box at the top image (this pops up when you go FASTA Database Utilities --> Download from ProteinCenter). I chose Arthropoda in taxonomy, which gave me that number, and then I just hit Download.

It queues up and does everything, building you a huge database (prepare to wait a while if you choose TrEMBL) that you can then run to see if it actually finds you some hits.
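If you don't have PD maintenance (or Mascot), you can get much the same thing straight from UniProt's REST interface. Here's a sketch that just builds the download URL for everything under a taxonomy node -- the endpoint and query syntax are my assumptions about the current API, so check UniProt's documentation before leaning on it:

```python
from urllib.parse import urlencode

def uniprot_fasta_url(taxon_id, reviewed_only=True):
    """Build a UniProt stream URL for all proteins under an NCBI taxon ID.
    Endpoint/query syntax assumed from UniProt's REST API docs."""
    query = f"taxonomy_id:{taxon_id}"
    if reviewed_only:
        query += " AND reviewed:true"  # SwissProt only; drop this for TrEMBL too
    params = urlencode({"query": query, "format": "fasta"})
    return f"https://rest.uniprot.org/uniprotkb/stream?{params}"

# Arthropoda is NCBI taxon 6656 -- the same broad net as the ProteinCenter route.
url = uniprot_fasta_url(6656)
print(url)
# Then something like: urllib.request.urlretrieve(url, "arthropoda.fasta")
```

Same warning applies as with ProteinCenter: an all-of-Arthropoda TrEMBL download is enormous, so expect to wait.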

Sunday, January 14, 2018

NeuCode (neutron coding!) has seemed an almost inevitable replacement for a lot of our labeled proteomics techniques for a few years now. However, the fact that you are simply switching neutrons in different atoms has made some of us kind of gasp at how much resolution you need to pull it off.

(There are over 20 theoretical tags that can exist in something like a 0.030 Da space!)

This new Nature Protocols paper walks you through the entire thing -- including the really smart instrument method shown above. The trick is using a lower-resolution MS1 scan to pick your data-dependent ions for fragmentation, and then obtaining those fragments in the ion trap simultaneously with your 500,000-resolution scan for quantification.
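The "how much resolution do you need" question from above is easy to estimate. If ~20 tags sit in a ~0.030 Da window, neighbors are ~0.0015 Da apart, and resolving power scales as m/Δm. A rough sketch (the safety factor is my own hand-waving for baseline separation, not from the protocol):

```python
def resolving_power_needed(mz_val, delta_m, safety=2.0):
    """Rough FWHM resolving power to split peaks delta_m apart at mz_val.
    safety > 1 because baseline separation needs more than R = m/delta_m."""
    return safety * mz_val / delta_m

# ~20 theoretical tags packed into ~0.030 Da -> neighbors ~0.0015 Da apart.
spacing = 0.030 / 20
print(round(resolving_power_needed(1000, spacing)))  # ~1,333,333 at m/z 1000
```

Which is exactly why nobody tries to resolve the full set of theoretical tags, and why even the practical subsets need those 500,000-resolution quantification scans.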

In case you're thinking...umm...how would you possibly write that method... Don't worry! They provide step by step instructions for both instruments!

MassSpecPro has also put some out there, but he's an ion physics guy so they're a little beyond this biologist and maybe intro courses (but good for physics classes, I bet!). I've seen them on Twitter, but I'm sure they're on his website here.

Saturday, January 13, 2018

OpenChrom Community Edition is open-source, community-driven software for looking at chromatography and mass spec data from virtually any device -- in the same interface.

There is an Enterprise Division if you're really serious and you like it, but if you meet the requirements to use the Community Edition and you're real tired of looking at output from 6 different vendor instruments, this could be huge.

What I was looking for was something like this:

You can't tell me that with today's quality of home electronics -- stuff that can be built with Arduino and controlled with Python and/or a Raspberry Pi -- I can't recruit some summer students in mechanical engineering and have them build a functional HPLC.

I have access to a number of new/newish HPLCs now, and I was surprised to see that many of the features that were standard on our old stack in grad school (which was controlled with a monochrome Macintosh II...and I wonder if it is still cranking along making beautiful chromatograms -- yup! sure is!) aren't things that we have now. Sure, the pressure is higher on the pumps, the mixing is supposedly more uniform, and there are cool things like what the print cartridge manufacturers use to make sure I'm using the instrument vendor's columns. There are even cooler things like 1/16th-inch screw unions that can't be used together, and inconsistent labeling schemes for solvent delivery line diameters -- from the same vendor.

Important side note: Do not use NanoVuper and ZenFut unions interchangeably! They are both 1/16th. They are not interchangeable or compatible! They just look compatible and the consequences of using the wrong one can mean pulling a switching valve and trying to remove the dead volume seal. I didn't make this mistake, I just heard about it ;)

Maybe I'm just mad about pump seals and a fraction collector that doesn't automatically progress to the next 96 well plate so we can't batch fractionations (different vendors, btw, but the latter makes reproducing the ultradeep proteomics methods out of the Olsen lab seem much more difficult). I'm probably not angry enough to actually spend my Saturday morning emailing engineers and chromatographers I know to see if they'd like to pool resources on such a ridiculous after-school endeavor. No one is that weird, right?

Thursday, January 11, 2018

This isn't even close to the first time someone has set up GPU based data processing, but it's still really cool!

For people who aren't as obsessed with today's computers, this is how most of them work:
You have a Central Processing Unit (CPU). There is probably a sticker on your PC that says what it is: i7 or Xeon -- which...really...means nothing, because there can be a 3-order-of-magnitude difference in power and efficiency between different generations of these processor families. It's probably more an indication of how much the PC cost when new, i3 being the least and Xeon being the most. A crazy huge CPU has 20 cores (though multi-chip setups might allow near 100 these days).

Your computer also has a Graphics Processing Unit (GPU). Lower power ones may be built directly into the motherboard. Higher power ones will be completely separate units attached to the motherboard. Modern GPUs have THOUSANDS of cores. These cores are generally taxed with controlling a small number of pixels and their work load isn't very hard. There isn't a real reason to give them tons of memory.

Side note: Who is old enough to remember when you had to purchase a secondary "math processing unit" for your PC so it could handle big data like multiplying 6 digit numbers....? We've come a long way!

My understanding is that one of the big problems with searching a spectrum on a GPU is the memory issue -- a spectrum is too large to fit in the memory available to that tiny little processing core.

G-MSR reduces the data that each GPU core gets hit with to make sure that each tiny core can handle what it is told to process. The authors hit it with 14 different historic datasets (I think they're all high-resolution MS/MS), and it can process all of them.
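As a toy illustration of the general pattern -- not G-MSR's actual implementation, which I haven't read closely -- the idea is to split a peak list into chunks small enough for one core's local memory, score each chunk independently, then combine the partial results:

```python
# Toy sketch of per-core chunking; all names and numbers here are invented.

def chunk_peaks(peaks, chunk_size):
    """Split a (mz, intensity) peak list into core-sized chunks."""
    return [peaks[i:i + chunk_size] for i in range(0, len(peaks), chunk_size)]

def score_chunk(chunk, fragment_mzs, tol=0.02):
    """Count peaks in a chunk matching any theoretical fragment m/z within tol."""
    return sum(1 for mz, inten in chunk
               if any(abs(mz - f) <= tol for f in fragment_mzs))

peaks = [(147.11, 1e4), (175.12, 5e3), (401.28, 2e4), (512.30, 8e3)]
frags = [147.113, 401.277]  # made-up theoretical fragments
chunks = chunk_peaks(peaks, 2)
total = sum(score_chunk(c, frags) for c in chunks)
print(len(chunks), total)  # 2 chunks, 2 matched peaks
```

The partial scores sum cleanly because each peak lands in exactly one chunk -- which is what makes this kind of work embarrassingly parallel across thousands of small cores.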

I looked up the GPU they are using and it's $4,000! It looks like one specifically designed for industrial applications, but it doesn't look like it has more cores or memory than the standard $300-$500 GPU you'd use for 3D rendering, playing Injustice 2, or mining applications. It would be interesting to see how this algorithm would do with something that would be easy to array 5-10 of.... The program is GPL open source and can be downloaded from GitHub here.