Sunday, September 30, 2018

This concept is BIG, has far reaching and hard-to-fathom potential consequences, and is something we should all be thinking about.

I'm not qualified to talk about any of that (it may not stop me) but what about the stuff I do kinda understand? First off -- this is the paper (direct link here)!

What is the Exposome? The authors define it here as "...human airborne environmental biotic and abiotic exposures..." -- so, the stuff in our air that we're getting exposed to, coming from living things or non-living ones.

Interested? You should be!

Okay -- so they monitored 15 individuals around the world for up to 2 years. They had a "wearable device" of some kind that collected samples of the stuff they're exposed to. They do a ton of genetics stuff on the people and the samples that are collected (I think -- this is a Cell paper, it's like 100 pages, and I do have a job).

Now the interesting stuff ---> for the exposure analysis, the samples were run on a UHPLC-coupled Exactive using a cool mixed-mode column (presumably to separate both polar and nonpolar compounds well) and -- the details are kind of fuzzy in the methods -- but it appears they ran each sample in positive and negative separately, or with pos/neg switching, at 100,000 resolution.

The data was searched with XCMS and someone on this team is an R fanatic (or epidemiologist -- which might be redundant). I've never seen so many individual packages utilized in a single study -- but the genomics and the geographic data are all statistically tied together and ----

We're exposed to TONS of stuff, both from living and non-living sources. And -- geography plays a huge role. And -- there are some clear-looking (though mysterious in their actual meaning) links between what you are exposed to and what is going on in your genetics.

Probably not the right response -- but I am certainly definitely completely not qualified to judge. However, this is a really thought-provoking paper in a field where our technologies will obviously be able to help!

Thursday, September 27, 2018

I've been dying to talk about this one since seeing a talk sometime in the spring about it!

Did you know that phosphorylations are commonly associated with phosphorylations on amino acids right beside them or just a few amino acids away?!? I didn't, but I've asked a bunch of biologists and they said it's true.

Before you get too excited -- Thesaurus is for DIA and PRM data. Wait -- You're more excited?!?!

(Groans.....okay...last one, probably....)

Thesaurus is software. You might have guessed that from a couple of the names on the paper. And it -- okay -- figure 1 is awesome and explains it better than I possibly could.

Somebody is good at making flowcharts. The end result of running through that logical circle is going to be a test of whether phosphorylation at E or F is the best match ---

Okay ---- last stolen picture for this post -- but this is the ABSOLUTE COOLEST PART --- what if it is both of them? Because it biologically makes sense that it could be. No -- not that phosphoRS doesn't have enough information to discern which one it is and gives you 50/50 so you just report both --- I mean that, biologically, it can and totally does happen that you'll get an almost perfectly co-eluting pair of peptides that is both phosphopeptide E and phosphopeptide F (obviously not in the example above, but it really does happen a lot -- proof in this great paper!).

But this is the ABSOLUTE COOLEST PART (wait. I said that. I'm excited.) in modern dd-MS2 -- we skip the second one! Almost always! We're so certain of our massively improved peak shapes and the efficiency of our instruments in making an ID on the first fragmentation that most of us use dynamic exclusion to trigger at some (by historical standards) ludicrously low peptide intensity -- and then we exclude peptides of that exact mass from being fragmented for huge amounts of time. So if there are 2 positional isomers eluting at almost the same time -- we don't see it.
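The dynamic exclusion behavior described above can be sketched as a toy simulation (all m/z values, tolerances, and the exclusion window below are hypothetical numbers for illustration, not any vendor's actual logic):

```python
# Toy simulation of dynamic exclusion: once a precursor m/z is fragmented,
# that m/z (within tolerance) is skipped for the exclusion window -- so a
# co-eluting positional isomer at the identical mass never gets an MS/MS scan.

def select_for_ms2(precursors, exclusion_sec=30.0, tol_mz=0.01):
    """precursors: list of (retention_time_sec, mz). Returns the subset fragmented."""
    fragmented = []
    excluded = []  # list of (mz, excluded_until_sec)
    for rt, mz in sorted(precursors):
        excluded = [(m, t) for m, t in excluded if t > rt]  # expire old entries
        if any(abs(m - mz) <= tol_mz for m, _ in excluded):
            continue  # still on the exclusion list -- skipped
        fragmented.append((rt, mz))
        excluded.append((mz, rt + exclusion_sec))
    return fragmented

# Two positional isomers of the same phosphopeptide eluting 5 s apart:
peaks = [(100.0, 750.3621), (105.0, 750.3621)]
print(select_for_ms2(peaks))  # only the first isomer is fragmented
```

The second isomer only gets sampled if it elutes after the exclusion window expires -- which is exactly the scenario the post is worried about.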

Is it possible that our improved methods and instruments are actually decreasing our phosphopeptide ID recovery? Yeah, it totally is.

EDIT: Forgot this part --> In DIA and PRM you are constantly acquiring MS/MS spectra for your mass range in a cycle. So you can see fragmentation patterns of two almost completely co-localizing phosphopeptides and Thesaurus can help you identify them!

I think DIA has kinda been floating around looking for something that it's good at -- or better at than dd-MS2 -- this might actually be that thing.

Wednesday, September 26, 2018

We don't always have 11 samples to run. In cost per reaction though, (at least with my old sales rep -- the new one doesn't seem to appreciate the level of discount I expect 💔😇😇) TMT-11plex is about the cheapest way to go.

If you don't have 11 samples -- for example, you have 6 -- you can get a boost in your number of peptide and protein IDs by skipping every other channel.

If I have 6 samples I will pick an N or C variant for each unit mass and stick to it. For example

126
127 N
128 N
129 N
130 N
131 N

And make sure to not mix in any of the C variants from the same unit mass. To fully resolve 129N/C you need 43,225 resolution @ m/z of 200. This means you need to use 50,000 resolution on a Tribrid, QE HF or QE HF-X. Or 70,000 resolution on a QE or QE Plus.
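A back-of-the-envelope check on why the N/C variants are so demanding (a sketch: the isotope masses are standard values, and the roughly factor-of-two from FWHM-level to baseline separation is my assumption, which is in the ballpark of the ~43k figure above):

```python
# The N/C variants of a TMT reporter differ only in where the heavy isotopes
# sit: swapping a 13C for a 15N shifts the mass by the difference between the
# two isotopes' mass additions (~6.32 mDa).
delta_13C = 13.003355 - 12.0       # mass added by one 13C
delta_15N = 15.000109 - 14.003074  # mass added by one 15N
delta_m = delta_13C - delta_15N    # ~0.00632 Da between e.g. 129N and 129C

mz_reporter = 129.131              # approximate 129N reporter m/z
r_fwhm = mz_reporter / delta_m     # FWHM-level resolving power needed
print(round(delta_m, 5), round(r_fwhm))  # ~0.00632 Da, ~20,400 at FWHM --
# true baseline separation needs roughly double that.
```

Contrast that ~6 mDa gap with the full 1 Da spacing you get by sticking to one variant per unit mass, and the speed win below makes sense.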

Sooooooo slooooow..... Great data...but...sloooooowww....

If you skip the N/C variants for each unit mass, then your reagent set is essentially the TMT6 kit (with shuffling of the N/C).

At 1 Da apart, it really doesn't matter what MS/MS resolution you use. I drop it all the way down (30 Hz on the Tribrid!). You aren't going to get 6 times the number of scans -- but unless you are fill-time limited (having trouble hitting that AGC target) you're going to get a lot more MS/MS scans, peptides, and protein IDs than if you'd mixed your N/C labels!

Worth noting -- if you are using Proteome Discoverer you probably want to create a new quan method for the channels you use -- if you want PD to normalize the data

-- if you aren't normalizing and/or imputing quan you probably don't need to worry about it; you can just hide those channels in the output report. What you don't want is background noise in your quan region (or a reporter M+1 isotope) being mistaken for real quan -- getting scaled up as if you'd loaded only 1% of peptide in channel 128C, with some noise amplified 100x. Probably harmless in the end, but it totally looks weird, and it worries me that it might actually affect something.

Tuesday, September 25, 2018

I have a tendency to give people a really hard time for animal models. Or Arabidopsis. I understand these things have uses, I'm mostly just annoyed when people use these models unnecessarily -- or deceptively. Veterinary medicine is woefully behind a lot of other sciences. Maybe our technologies can do something about it?

Okay -- so some lucky person on this team got to collect urine from almost 20 sea lions. 11 of them were sick. What could be more fun than collecting sea lion pee? Collecting it from one that doesn't feel well! Dedication to your craft, FTW!

Discovery-based LC-MS was used on a Tribrid and found nearly 3,000 proteins. Which made me almost try to find out how many proteins are normally found in urine. That sounds like a lot, right? And a number of obvious differential protein biomarkers. The authors mention in the paper that this bacterium affects a lot of other sea mammals, and they decided something with the word LION in its name was a better place to start than a cuddly otter.

Sunday, September 23, 2018

It probably is no surprise to you that using proteomics for structural biology is growing like crazy right now. It seems like every day another one of our 250 or so researchers sees some data from someone else's DSSO protein crosslinking study we knocked out and is excited to send us samples. At this point I think the queue for our only Fusion is longer than most reasonable estimates for my life expectancy and DSSO might be the main reason.

But protein crosslinking is just one of many structural biology techniques that is benefiting from our massive improvements in instrument speed, resolution, sensitivity -- and, more importantly (maybe?), our ability to alter our instrument experiment logic (seriously -- maybe above all else, the reason the Fusion is so powerful -- though MaxQuant Live may offset that somewhat when it launches).

It goes into techniques you've probably heard of and maybe forgot years ago when it sounded smart, but the hardware just couldn't pull it off. Maybe it's time to revisit these with what we can do today! (Honestly, it has a bunch I swear I've never heard of at all -- and those are cool too!!)

Somewhat related and something that will go in that section over there --> as well -- if you do the Twitter thing -- @BioTweeps posted a really concise and well-written overview of Mass Spectrometry that fits surprisingly well within the Twitter character limits. You can find it here.

Thursday, September 20, 2018

Okay -- cool developments with our toys aside for a second -- this might be one of my all-time favorite papers. If you can look at that picture without getting inspired by how far proteomics has come -- AND WHAT WE MIGHT DO NEXT?? you probably didn't click the right link to end up here.

I think I might have passed by this paper once because I didn't know what a lot of the words were in that title.

To be honest -- I have one criticism of this paper. The title is terrible.

What I would have made the title --

In a clinically relevant time frame we can help diagnose a cancer patient and pick a personalized therapy to kill their tumor and massively increase the chance the patient survives!!

How'd they do it? They did label free proteomics on slides (? pretty sure?) from an excised tumor as well as the surrounding stromal tissue.

They rapid tip digest it (they use the rapid combined reduction/alkylation method as well) and they pop 1ug of peptides onto an in-house 40cm column (probably cost them $5 to make?) on a Q Exactive HF. Yup, just an HF. Like the one in your lab right now that's running brain digests from mice with generalized anxiety disorders or something -- the one you were considering trading in for something more expensive to run mouse brain digests with? Same one!

They use the always free MaxQuant and Perseus for the data processing and downstream analysis/stats.

Why am I fixating on prices? Because the first argument in clinical anything (at least in the US) is "how can we make an absurd profit for our shareholders again this year if the test we charge $8000 for costs more than $1.16 to actually run? Do you think we're running a charity in this hospital?!?!?"

This group just showed us that we could do personalized proteomics to help patients TODAY with an aging benchtop Orbitrap (that will fit neatly into any clinical lab -- have you seen how much smaller colorimetric blood analyzers are now? They're tiny! Boom -- put an HF there). If you consider free software, virtually no cost for reagents (10uL of acetonitrile and some trypsin) and I don't think we're too far off that $1.16 target for the assay cost. We can stop talking about personalized medicine and actually start doing it already!

Monday, September 17, 2018

I'm torn on this one, cause this is real footage of me the first, last and only time I ever packed a nanoLC column...

...you know what would make this even better? Throwing in an extra 25,000 PSI? Ummmm.... no...not the best idea for me....

However -- if you have successfully packed a nanoLC column without injuring yourself or others AND can make one that doesn't negatively impact the performance of that $1M instrument you're using...maybe this is a solution that would work for you!

Saturday, September 15, 2018

This has come up a lot over the years and I was surprised to see I couldn't find a case of me rambling about it here! So....I present Ben's guide to run Proteome Discoverer way slower and with lots of weird random chaotic surprises!

Tip 1 (picture above):

Keep all your stuff on separate drives while processing! Bonus points if you keep your RAW files on network drives. Double bonus points if you process your RAW files over your network from one drive and then deposit the results on a different processing drive!! Want to level all the way up? Pull your RAW files from one network storage drive. THEN transmit your processed results to a DIFFERENT network storage drive!

...hours of processing....

Besides the fact that you've gone from eSATA data transfer rates (according to Google -- 6 Gbps) down to network speeds -- even on a true gigabit ethernet LAN, real-world throughput is often a whopping 100 Mbps, a minimum of 60x slower -- you also get to deal with a bunch of cool extra things that are described well in this page.

It totally cracks me up that the physical distance between your network drive and your PC is a tangible factor that can affect your network rate. High traffic on your network doesn't speed things up either (a win for us nocturnal scientists!), but that is often negated by the huge FAIL that the drives tend to do things like perform their backups and security scans at 2am when there is only the one weird guy in the building using them.

Honestly, our files aren't all that big. We just did some deep fractionated proteomes (15 fractions) and they're maybe 24GB per patient. Transferring at 6 Gbps versus 100 Mbps shouldn't be that big of a change, even if you had 10 of them, right? However, it isn't just one reading step. It's constant R/W steps (have you seen the funny huge ".scratch" file that is generated while you're running?). You are constantly reading that back and forth across the network.
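For concreteness, here's a quick sketch of what those link speeds mean for a single pass over a 24GB fraction set (pure transfer time only -- the constant R/W traffic described above makes the real gap much worse):

```python
# Rough single-pass transfer times for a ~24 GB set of RAW files.
# Link speeds are in bits per second; file size in bytes.
size_bytes = 24 * 10**9

def transfer_minutes(bits_per_sec):
    """Minutes to move size_bytes over a link of the given speed."""
    return size_bytes * 8 / bits_per_sec / 60

print(f"eSATA  (6 Gbps):  {transfer_minutes(6e9):6.1f} min")   # well under a minute
print(f"LAN  (100 Mbps):  {transfer_minutes(100e6):6.1f} min") # about half an hour
```

And that half hour is paid over and over as PD reads and writes scratch data across the wire.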

On top of the fact that PD is super slow -- you get all sorts of hilarious strange bugs. This week I saw one where PD would claim there was something wrong with the name of the output file that someone was trying to use! Wins all around!

Tip 2: Even on the same PC --- process your data on different drives!

I think I have proof around here somewhere. I think I worked it out to 24x slower if you process the same data all on one drive as opposed to R/W to different drives. I think it's on my old PC....I'll update if I find it, it's striking.

Wait -- side note --- did you know that even HDDs can have markedly different speeds? They totally do! There are drives designed for storage that are much slower than ones meant for working on. I've described my problem with that recently on here I think.

This is from a paper currently in review from our lab, but I think it's cool to use it here out of context ---

The cool part is how our new software makes processing huge proteomics sets much faster while kicking out the same data -- but what is pertinent in this ramble is the two shorter bars. Using the exact same files, huge multi-gig proteogenomic FASTA and software settings, we can drop a processing run from 24 hours down to 14 or so just by moving everything from an HDD to a faster standard commercial Solid State Drive (SSD). If you aren't processing on these, I'd recommend checking them out again. They are getting cheaper every day. I think we just ordered some 1TB ones for less than $200. Bonus: I've still never had an SSD fail. And I've got 2 HDDs on different boxes that sound like they are popping popcorn (not the best sign ever) that aren't as old as the SSDs sharing space with them.

Can I call this a "guide" if there are only two tips? On the first Saturday in approximately 3 years in Maryland where the sun is shining? Looks like I sure can. I need to put on some brake pads.

TL/DR: PD HATES processing over network drives. Move your data and output files to the same drive when running PD then put them back. Yeah, transferring is a pain, but you'll more than make up for it in processing your data faster and with less random chaos.

Big shoutout to the two great scientists who introduced me to new PD errors this week that inspired this post! I promise I'm not making fun. This really does come up a lot. It's too tempting to use your >100TB network storage rather than move things around, but I think the system architecture needs to be improved before you can do it bug-free.

Now...I'm unclear on how the ANN (Approximate Nearest Neighbor) part of this differs from the NIST Open Search functionality added to MSPepSearch last year. At first glance it seems interesting that the authors use the NIST library here but don't appear to compare their code to MSPepSearch + HybridSearch. They do use other libraries, and since MSPepSearch only utilizes the NIST library format, maybe the comparison isn't possible? I would be very interested in seeing a comparison between the two.
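For intuition, spectral-library matching boils down to scoring a query spectrum against library entries with something like a normalized dot product -- here's a brute-force sketch of the exact search that ANN indexing approximates with a sublinear lookup (every peptide, peak list, and bin width below is made up for illustration):

```python
import math

def to_vector(peaks, bin_width=1.0005):
    """Bin a centroided spectrum into a sparse unit vector keyed by m/z bin."""
    vec = {}
    for mz, intensity in peaks:
        b = int(mz / bin_width)
        vec[b] = vec.get(b, 0.0) + intensity
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {b: v / norm for b, v in vec.items()}

def cosine(a, b):
    """Dot product of two sparse unit vectors (i.e., cosine similarity)."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())

# Brute force: score the query against every library spectrum, keep the best.
# ANN methods replace this full scan with an approximate nearest-neighbor index.
library = {
    "PEPTIDEK": [(147.1, 30.0), (263.1, 80.0), (376.2, 55.0)],
    "LVEYK":    [(147.1, 20.0), (285.2, 90.0), (404.3, 40.0)],
}
query = [(147.1, 25.0), (263.1, 75.0), (376.2, 60.0)]

q = to_vector(query)
best = max(library, key=lambda name: cosine(q, to_vector(library[name])))
print(best)  # PEPTIDEK
```

The "open" part of open search additionally allows a precursor mass shift between query and library peptide, which blows up the number of candidates -- exactly why an approximate index starts to pay off.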

Unfortunately, while Open Search has an .exe that I can run and use, ANN-Solo requires Python with NumPy to work, and I'll have to ask for help if I want to try it. Honestly, with results as good as the paper reports -- 100% worth it.

Be warned, I'm already planning an award ceremony for when someone pulls this off without looking at the nucleotide data.

My proposal -- we should have an award ceremony for ourselves at ASMS or HUPO next year. I also propose it features the great Dr. Jurgen Cox coming in and kicking over a stool that has a Mi-Seq on it. Come on, tell me you had trouble visualizing that happening when you read it!

Wednesday, September 12, 2018

Hopefully if you're doing proteomics you're always throwing in some great FASTA entries from cRAP or the MaxQuant contaminant database or have even generated your own list of stuff that you find in every water blank (or a combination of all 3).

Have you ever seen a way to keep track of the junk in your sample that isn't from sheep wool or gorilla keratin peptides?

Loads of reasons to read this paper.
1) PEG is in just about every sample in some way. It's only when there are tons of it that it's a serious problem. This method can help you keep track of it!

2) PEG is the first thing you might think of, but there are other contaminants as well. And this method doesn't just work for proteomics. It'll work for any LC-MS experiment.

3) The author totally pulls off a full (and awesome) application note as the single author. It's a great precedent for people with a bunch of stuff on their desktop that they felt funny about writing alone. Writing "I" a lot in a paper feels really weird while you're doing it, just because you're so used to reading "we". It doesn't come off as weird when you read someone else who wrote it that way.

4) In Excel you can =MROUND([Cell],5) to round to the nearest 5. Which no person ever in the history of the world has ever needed. You're welcome.
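The same nearest-5 rounding in Python, for anyone tracking contaminant series outside Excel (the function name just mirrors Excel's):

```python
def mround(x, multiple=5):
    """Round x to the nearest multiple -- the equivalent of Excel's =MROUND([Cell],5)."""
    return multiple * round(x / multiple)

print(mround(437.2))  # 435
print(mround(438.0))  # 440
```

Handy for snapping observed m/z values onto a repeating polymer spacing.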

Tuesday, September 11, 2018

The NIST reference antibody was digested and spiked at different levels into a universal concentration of a standard yeast digest. The Lumos was operated in different ways to determine relative sensitivity by picking up the mAB digest at different spike levels.

The most interesting comparisons are probably when the ion trap and Orbitrap are compared and when the Lumos is compared head-to-head with a Q Exactive Plus instrument.

While the Lumos comes out ahead in every comparison, it's only when the ion trap is involved that the gap between the two instruments becomes something you couldn't overcome with some optimization and gradient lengthening -- the gap is just too large.

There are a lot of gems in this study that help guide instrument selection and method optimization on this great platform.

Monday, September 10, 2018

Sometimes we need to ride some coattails to move science forward. Case in point?

First of all -- there is an entire journal called "Computer-Aided Molecular Design" !?!?

Second --- you might also be vaguely aware that VR headsets are out there, from videos of how stupid people look while playing games with them....

Okay -- but what if you could take one of these things and with shockingly little code, that is freely available here, use these things to immerse yourself in protein structures from the immense PDB databases all those weird structural people are already uploading?

The better the PDB structure present (newer ones tend to have way more snapshots of the protein from different angles) the better this all works, but if you can't seem to sort out those protein interactions, maybe a visual/pseudo-kinesthetic approach will help you get that breakthrough!

Cool stuff from this study -- there are companies where you can just buy commercial human body fluids! They just bought a bunch of CSF!

They digest the CSF, TMT6-plex and then they break out the OFF-GEL and use the high resolution fractionator (24 isoelectric peptide fractions).

It looks like they take the peptides directly from the OFF-GEL and desalt online (! awesome if true !) and run a complex 171.354 minute gradient (my math) on a 50cm column into an Orbitrap Fusion Lumos running in OT-OT mode (120k MS1 15k MS/MS).

That's 68 hours of Lumos time and the highest number of peptides and proteins from CSF to-date, by a large margin! Now that there is an improved baseline for "normal" is it time to re-evaluate some of these historic datasets from studies on different pathologies? I'd think so!
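The instrument-time arithmetic above checks out (using the post's own numbers -- 24 fractions and the ~171.354 min gradient):

```python
fractions = 24            # OFF-GEL high-resolution isoelectric fractions
gradient_min = 171.354    # per-fraction gradient length, as computed above
total_hours = fractions * gradient_min / 60
print(round(total_hours, 1))  # ~68.5 -- the "68 hours of Lumos time"
```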