The second one I get. For global proteomics I also go with MS2-based TMT quan. I'd rather get 15 peptides per protein with more isolation interference than 9 peptides per protein with less interference. The first one -- cheeeeeeeese -- I can't wrap my head around....

The authors report 16,700 TMT10-plex labeled phosphopeptides. I'm pretty sure I got around 10k when I reprocessed it myself (and I'm a picky jerk about PTMs), with offline fractionation and short gradients (6 hours total run time or something ridiculously short).

And maybe it's the offline fractionation that improves the coisolation? And maybe phosphopeptides are just simpler? Because on the HF-X -- at least at launch -- APD was always on....

If you're worried that I'm just making this up, call tech support or your favorite FSE and ask; they'll be going around doing these patches soon. For reference, this is Factory Communication 2018.020.

Friday, October 26, 2018

(This image was floating around unacknowledged on Google Images. It is copyright of Steve Graepel and originally appeared here. I'm using it without permission, but better that I hunted down the guy who created it, right? As always, let me know if this is a problem and I'll take it down!)

Okay -- so those degraded peptides?? THOSE ARE A SUPER BIG DEAL! What if a big team decided to do something crazy and profile those???

I'll be honest, I'm not 100% sure how they did this. I believe the proteasomes were purified and then the degraded peptides were knocked loose from them somehow. Then MaxQuant was used for an enzyme-nonspecific search against the entire proteome. Multiple rounds of digested proteomes were used for comparison to make sure they were on the right track.

And -- I can't even wrap my head around all the potential here, but I'm going to try.

1) The "dark proteome" stuff -- which might have a different definition now than the one I normally put with it. I consider it all the stuff that passes MIPS (or Peptide Match) -- so it isotopically looks like a peptide and elutes off C18 when a peptide should, but we don't know what the Albert Heck it is.

The proteasomes are, presumably, active ALL THE TIME. So a lot of those background peptides may have just been profiled in this paper!

2) How these differ between disease states could open up a whole new field in diagnostics! The proteasomes are tightly regulated by a series of complex processes (typically modulated by ubiquitin, as far as we can tell, right?). Some proteins are labeled for degradation just because they're old (there is an N-terminal instability signal -- the N-end rule -- that marks old proteins), and others are degraded as part of specific, complex, and poorly understood mechanisms.

What if we didn't need to learn the degradation patterns themselves and could just monitor the degraded peptides coming out of the system??? These authors do this here and show the potential this may have -- there are big differences in different diseases!

I'm super psyched to discuss this paper with people who understand the biology behind this and congrats to this team for ---

The first Tweet is my perception of this great paper. The second Tweet -- well -- that's pretty funny...

Thursday, October 25, 2018

Virtually all of today's proteomics data processing requires a proper monoisotopic assignment to make a match. It's also probably no surprise that today's instruments are tuned for perfect tryptic digests.

What if I told you that there is a huge spreadsheet showing that monoisotopic assignments of big peptides (like crosslinked peptide species) are messed up a large percentage of the time?!?
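A quick back-of-envelope sketch (mine, not from that spreadsheet) shows why monoisotopic picking gets ugly for big species. Assuming only 13C contributions (~1.07% per carbon) and some rough carbon counts I'm guessing at, the monoisotopic peak of a ~5 kDa cross-linked species is a small fraction of the envelope and isn't even the tallest peak anymore:

```python
from math import comb

P13C = 0.0107  # approximate natural abundance of carbon-13

def isotope_peak_fractions(n_carbons, max_peak=6):
    """Binomial probability that a molecule contains k heavy carbons,
    i.e. the relative sizes of the M, M+1, M+2, ... isotope peaks
    (carbon-only approximation -- N, O, S isotopes ignored)."""
    return [comb(n_carbons, k) * P13C**k * (1 - P13C)**(n_carbons - k)
            for k in range(max_peak)]

# Rough guesses: a ~1 kDa tryptic peptide has ~45 carbons;
# a ~5 kDa cross-linked species has ~225.
small = isotope_peak_fractions(45)
big = isotope_peak_fractions(225)

print(f"~1 kDa: monoisotopic fraction {small[0]:.2f}, tallest peak M+{small.index(max(small))}")
print(f"~5 kDa: monoisotopic fraction {big[0]:.2f}, tallest peak M+{big.index(max(big))}")
```

Under these assumptions the big peptide's monoisotopic peak carries less than 10% of the signal and the apex of the envelope sits a couple of isotopes over -- exactly the situation where instrument software grabs the wrong peak.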

Wednesday, October 17, 2018

EDIT: This blog exists in a wobbly space in time and space. This half-day course is on October 17, 2018!

I know this is the proteomics blog, but if you're also dabbling in the dark side with small mass ions, you can't find something as powerful and easy to use as XCMSOnline.

There is a half-day course that starts at 8:30 AM California time (if it were East Coast I would neither attend nor tell you about it -- I'm going to assume no one attends in person, because that is when people are supposed to be sleeping, NOT sitting in meetings....), but they are livecasting it!

I've got to give a talk that I should probably...start...writing....umm....soon... but then I'll log in and try not to ask questions that are too dumb....

A lot of the scientists we get samples from are getting used to statistics. If I sit down with a biologist younger than me, it's pretty much a given that I won't have to explain what a PCA plot is, because the computers they had in their stats class were powerful enough to run one. My stats class in the 90s didn't have a PC element to it. And -- if I had started clustering or building a PCA plot on 20 samples on a 486 computer, I bet you it STILL wouldn't be done. (However -- it WOULD have had Minesweeper built in. The future isn't always pressing forward in every regard.)
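For what it's worth, the PCA plot those stats classes run boils down to an SVD on mean-centered data, which is trivial for 20 samples on any modern machine. A toy sketch with made-up intensities (two groups of 10 samples, offset so PC1 separates them):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 20 samples x 500 "protein" intensities, two groups,
# with group A shifted so the groups split along PC1.
X = rng.normal(size=(20, 500))
X[:10] += 1.5  # group A offset (purely illustrative)

Xc = X - X.mean(axis=0)           # mean-center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S                    # PCA scores; column 0 is PC1

# The two groups land on opposite sides of PC1.
print(scores[:10, 0].mean(), scores[10:, 0].mean())
```

This runs in milliseconds -- the 486 anecdote above is exactly why nobody my age saw one of these in class.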

What does all this rambling have to do with anything other than Ben's love of espresso and rapidly striking ergonomic keyboards???

INSTANT CLUE SOLVES ALL OF THIS!!

Look -- if you've mastered Perseus -- good for you. You're awesome. And you probably don't need this. If you're an R superstar and it's easier for you to do everything in R (did you know some people make their slide decks in R rather than Powerpoint? (@AlexisLNorris)) -- then you probably don't need this either.

But if you need
1) A GUI that works on Windows, Macintosh, and Linux
2) Amazing flexibility for uploading your data
3) Short, premade, well-scripted tutorial videos in case you get stuck
4) Every stat you ever heard of and a bunch maybe you weren't sure if you really heard or if it was someone who started to say a real word and then accidentally burped a little, threw in a muffled apology, and then finished what they were saying without inhaling (tell me that's not what "latent semantic analysis" sounds like)
5) A way to rapidly export the cool stuff you find
6) Software you can get started with even if your number one goal for the day is to not read anything smart at all --

Sunday, October 14, 2018

Membrane proteins are hard to get to. They've got super hydrophobic regions for stuffing inside membranes, they often have multiple glycan domains, and they can have annoying 3D structures that are just clumpy (best term I've got this morning). Even the most comprehensive global proteomics studies we've ever seen appear to under-represent membrane proteins. (Post I wrote on that topic last year.)

This group essentially enriches for hydrophobic peptides by throwing in a high organic separation that results in a downstream loss of the most hydrophilic peptides. EDIT: Loss isn't the correct word. Let's go with "enrichment of hydrophobic peptides in relation to the general peptide population."

The RAW files are up at PeptideAtlas here and it's striking how much signal they get in the high organic section of their chromatogram through this process.

It's fair, I think, to mention that this approach isn't entirely new....

...but the changes are definitely novel enough to warrant checking out this iteration if you're looking at membrane proteins!

Saturday, October 13, 2018

It seems less bad if you start with the fact that this team has a (computationally expensive) solution, and I think it's already live in Crux.

Look -- we all know that all our FDR shortcut things (target-decoy, Percolator, Elutator, and so on) are imperfect. And -- we know they need appropriate datasets to work right. This study starts out by showing what happens if the dataset that hits the FDR calculator IS NOT right. Fluctuations of as much as 20% in your peptide IDs, just by reshuffling your decoy sequences and searching the same data again??? Ummm.....

....yeah....fortunately for those of us who use...well...BASICALLY EVERY PIECE OF SOFTWARE I USE....when you make your decoy sequence, you end up using that one pretty much forever.

Let's see....when was my UniProt human decoy FASTA generated.....

Oh. The week I installed software on my new computer?

The reason this is so disturbing: if I were using a program that reshuffled my decoy FASTA on every run, I would actually see this problem, because with random shuffling my results could be meaningfully different every time I press the <RUN> button. Honestly, from a reproducibility standpoint, making one decoy database and sticking with it is a good thing -- it keeps people from asking questions like "wait. are you running my results through a random number generator?!?" and I'm grateful I don't have to answer that one. But sticking with a single decoy doesn't fix the underlying problem; it just hides it.

Okay -- but -- at the end of the day I want to give people the list that is the absolute closest representation of what the proteins that I can detect in the cells they gave me are doing. And if my current FDR methods are simply masking issues with the data that can be as extreme as described here -- I think upgrading the way I generate my lists and tell true from false needs to be put at the top of my priority list.

Thursday, October 11, 2018

Our spectral matching tools are heading into unprecedented territory right now in terms of the ridiculous power that they have. I doubt we'll ever get to a point where we disregard SEQUEST and what it has continued to evolve into, but -- holy cow -- there is some amazing stuff out there right now.

A new entry in this amazing category is Open-pFind. Here is the bioRxiv link, but it's now in Nature something-or-other (can't find the link yet), which seems to be a more advanced version than the preprint.

pFind isn't new, but this is pFind 3.1. As far as I can tell, it's a totally free GUI (you just have to go through a licensing procedure so they can keep track!) that you can get here.

What's it do? Well -- like the other entries in the category (the ones I use the most right now are Fragger/FragPipe and MetaMorpheus, but there are obviously others!) pFind doesn't care what modifications you're looking for. It blows up the search space to a huge level and then starts pulling out the modifications. You can find what you never thought to look for, in large scale.
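The core open-search move is roughly this (a sketch of the general idea, NOT pFind's actual algorithm -- all peptide names and masses below are invented): open the precursor tolerance way up, record the mass difference to the nearest theoretical peptide, and histogram the deltas. Modifications you never thought to look for show up as recurring deltas:

```python
from collections import Counter

# Toy inputs: two theoretical peptide masses and some observed
# precursor masses (all values made up for illustration).
theoretical = {"PEPTIDEK": 500.25, "SAMPLEK": 900.45}
observed = [500.25, 516.24, 580.22, 900.45, 916.44, 980.42]

def delta_mass_profile(observed, theoretical, tol=500.0):
    """Match each precursor to the nearest theoretical peptide within a
    very wide tolerance and histogram the mass deltas; recurring deltas
    (e.g. +15.99, +79.97) point at modifications."""
    deltas = Counter()
    for m in observed:
        peptide, th = min(theoretical.items(), key=lambda kv: abs(m - kv[1]))
        if abs(m - th) <= tol:
            deltas[round(m - th, 2)] += 1
    return deltas

profile = delta_mass_profile(observed, theoretical)
print(profile)  # deltas near +15.99 (oxidation) and +79.97 (phospho) recur
```

Real tools obviously do vastly more (fragment-level scoring, FDR, speed tricks), but the "blow up the search space, then pull the modifications out of the deltas" shape is the same.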

It looks like it uses a different type of mechanism for making matches than the others do.

You know what this blog needs? A DEATHMATCH. There hasn't been a software DEATHMATCH in forever. Time to get a great dataset, these 3 software packages and pit them against one another in a vaguely scientific and moderately unbiased manner. Gotta come up with some rules, though....

Don't be distracted by my rambling -- there is some seriously important stuff to learn in this paper. Their tests are extensive and show both the power and the weaknesses of other programs out there.

There are also some surprising insights (to me, at least) into the "Dark Proteome" stuff. And... well... even about trypsin. It only cuts K/R, right? Right??

Obviously, we've had Percolator and other programs for a long time -- but outside of those they haven't impacted us all that much. BOOM! NEW ENTRY!!

I'm on the wrong computer so I can't read this yet -- but -- if this is real -- this is a seriously big deal. These people train a machine learning program to learn the profiles of different tissues and cells. I'm super motivated to get to a computer where I can read this --- I just need a huge data transfer to finish first!!

EDIT: Okay...we probably all knew that we could probably do this, right? The best part about this might be the fact that this group did.

Tuesday, October 9, 2018

One of the gems of Baltimore is the National Institute on Aging. They do all sorts of cool stuff over there, but the one I always think of first is the Baltimore Longitudinal Study of Aging (BLSA), which has been running since the 1950s! The goal is to establish some understanding of what healthy aging is and what it is not....

In this study they don't use LC-MS, instead opting for the SomaScan thing (which is up to 1,300 targets, now? That's a big bump since the last time I'd heard anything from it!)

I like this study because it shows that we don't always have to push for the highest number of targets to draw conclusions. Maybe there is just as much to learn if you use the same amount of time to run more samples and allow the use of better statistics!

Do y'all know about this PaperSpray thing? You literally just put a drop of blood or whatever on a piece of paper and charge the paper like it is a nanospray emitter. The liquid ionizes right off the edge of the paper and into your mass spec. Cool stuff, but I don't track toxic inorganic compounds or anything, so I haven't needed to do it.

BUT -- here -- this group blows the doors off. While their end goal is instant tracking of some terrifying sounding biological weapons in people -- what they also do here is quantify a neurotransmitter! From a tiny amount of blood! And they do a reaction on this piece of paper that is their ionization source.

Am I (just) crazy or does this now sound like PaperSpray has moved over from the "cool toy" to "this belongs in the clinic" category?!?

I don't know if anyone else will ever find this tool useful. I know that I'm using it daily (and, if I'm perfectly honest, that's all I care about -- but I really, really wanted it out there just in case, and for everyone to see how smart the people around me are!)

Here is the scenario it was invented for:

I've got 24 fractions of reporter ion quan stuff. The LC-MS/MS was run by an expert's expert (PNNL, FTW, yo!) and this phenotype is as extreme as you can possibly get. The control channels? Yeah... they were still alive when the samples went on the instrument....

And you know what I have from the total protein quan? Besides some concerns regarding my capabilities as a scientist? NOTHING. And, yeah, today I can delta mass search and I can de novo everything and whatever. There are a lot of tools now that weren't around when I got these files. What if I use these? I get a big ol' list of things. But...if the answer is here in these million spectra? I can't make sense of it. And let's face it, sometimes the best quan software tools aren't found in the same place as the best discovery tools. My favorite tools for discovery don't yet have this kind of quan -- and I need it here.

Okay -- so what if I get RIDAR from Conor's GitHub here. And I take my MGFs and I say -- only keep the MS/MS spectra that are >2-, 5-, or 10-fold different between my controls and everybody else? (You have to edit a text file, but I've requested a GUI. That's how hard I am to work with, btw.... "Thanks for doing this amazing thing...can you make it so I don't have to open this document, change this number and then save it? That's.too.hard. Thhhaaaannnnkkkkssss.....!")

What does this enable?
1) I know these spectra that RIDAR keeps are quantitatively interesting. Now this opens up tools I love that don't have reporter quan built in. Fragger (is it FragPipe yet?), SearchGUI, Metamorpheus. KER-POW. ALL THE POWER.

2) At 10-fold? I've only got a few thousand spectra -- and you know what they look like? PTM hotspots. Is it real? I don't know yet, but I do know it's the first lead I've ever had on these files. In the study we look at CPTAC data and -- you know what? -- it's similar. Sure, the proteins that change the most come to the top (you'll see bunches of peptides for them), but then you also see loads of spectra that come from a single peptide/protein -- and it's PTMs EVERYWHERE.
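The filtering idea itself is simple enough to sketch (this is NOT Conor's actual code -- go to his GitHub for that -- just a minimal illustration with made-up channel names and intensities, and a plain mean-ratio in place of whatever normalization RIDAR really does):

```python
def keep_interesting(spectra, control_channels, min_fold=10.0):
    """Keep scans where the mean reporter intensity of the non-control
    channels differs from the controls by at least `min_fold` in either
    direction. `spectra` maps scan id -> {channel: intensity}."""
    kept = {}
    for scan, channels in spectra.items():
        ctrl = [v for ch, v in channels.items() if ch in control_channels]
        rest = [v for ch, v in channels.items() if ch not in control_channels]
        c = sum(ctrl) / len(ctrl)
        r = sum(rest) / len(rest)
        if c == 0 or r == 0:
            continue  # can't form a ratio; skip the scan
        if max(r / c, c / r) >= min_fold:
            kept[scan] = channels
    return kept

# Toy TMT-style reporter intensities (invented numbers): channels
# 126/127 are the controls, the rest are the extreme phenotype.
spectra = {
    "scan_0001": {"126": 100, "127": 110, "128": 105, "129": 95},   # flat
    "scan_0002": {"126": 10,  "127": 12,  "128": 900, "129": 1100}, # way up
}
hits = keep_interesting(spectra, control_channels={"126", "127"}, min_fold=10)
print(sorted(hits))
```

Everything downstream (Fragger, SearchGUI, MetaMorpheus) then only ever sees the scans that survive -- that's the "quantitatively interesting first, identify second" flip.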

Sorry if this seems self-promotional (said the blogger, lol!). I didn't make this. It ended up smarter than I ever imagined (I still don't understand the normalization thing they came up with, but it works!), and now I have a tool I've wanted for years!

Saturday, October 6, 2018

In my house, Jeff Leek is a hero. Maybe in a lot of other houses, too. I've never met him, but I've seen him speak and taken an online course he taught. The dude does awesome science and somehow makes it approachable to more people than you'd believe possible.

Okay -- so this is right in line with stuff we're working on in Frederick -- how the heck do you become a data scientist if 24 patient samples is 300GB of Lumos data and you've got a PC with 2GB of RAM? Answer? No idea.

Friday, October 5, 2018

My list of things to blog about is about 100 items long at this point. There is ridiculously cool stuff out of Max Planck and the Smith, Glaros, Pandey, Coon, and Gundry labs at the very top of my "you've gotta see this!!" list, and I keep getting distracted by off-target stuff that matters to what we're working on in Frederick. Oh -- and HUPO was last week?!?!

And I'm rambling about what matters directly to what we're doing in Frederick.