Friday, January 19, 2018

Informing proteomic biomarker discovery with genomic databases!

Global proteomics is awesome. I LOVE to give the elevator pitch to someone about what proteomics is. I've run through it so many times that I've got it perfected, and I imagine that just about everyone doing this has one that is better than mine.

However -- there are some clear downsides to all of the statistics that are necessary to match every ion the instrument sees and/or fragments to a theoretical database containing somewhere between tens of thousands and hundreds of thousands (millions?) of theoretical sequences. Just a reminder: A 1% false discovery rate (FDR 0.01) on 1,000,000 peptide-spectrum matches (PSMs) is 10,000 matches that could have occurred purely by chance.
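To make that arithmetic concrete, here is a tiny sketch. The function name `expected_false_matches` is just something made up for illustration; all it does is multiply the number of accepted PSMs by the FDR threshold.

```python
def expected_false_matches(n_psms: int, fdr: float) -> float:
    """Expected number of incorrect PSMs among n_psms accepted at a given FDR."""
    if n_psms < 0:
        raise ValueError("n_psms must be non-negative")
    if not 0.0 <= fdr <= 1.0:
        raise ValueError("fdr must be between 0 and 1")
    return n_psms * fdr


# Same 1% threshold, very different absolute numbers of chance matches.
for n in (1_000, 100_000, 1_000_000):
    false_hits = expected_false_matches(n, 0.01)
    print(f"{n:>9,} PSMs at 1% FDR: about {false_hits:,.0f} expected false matches")
```

The percentage stays the same, but the absolute pile of questionable matches grows right along with the size of the experiment.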

At the other extreme -- you have the targeted proteomics stuff -- where you specifically look at a small set of things you are interested in. This new study bridges this gap.

This study focuses purely on cancer biomarkers. To go after them they narrow the definition of what a "biomarker" is by interrogating databases to build a list of around 1,000 proteins that have been linked to cancer in some way. I haven't looked at this list yet, but I like the number. If you are searching 1-10 proteins, I do not trust global FDR approaches like target decoy -- or even Percolator/Elucidator. They're great, but I think they need a lot of data to work right. Around 1,000 proteins? I'd use the global tools without hesitation (I hope it goes without saying that I would manually look through the matches, though!). Here the spectra appear to be searched against all of Human UniProt/SwissProt, but the downstream analysis is informed by the biomarker list. I'm thinking that I might look at some other datasets and limit the FASTA to just the biomarkers this team has identified.
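If I do try limiting the FASTA, a minimal sketch of the filtering step might look like the following. This is not anything from the study: the function names are hypothetical, the sequences in the demo are toy placeholders rather than real protein sequences, and it assumes the usual UniProt-style header where the accession sits between the first two pipes (`>sp|P04637|P53_HUMAN ...`).

```python
from typing import Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def uniprot_accession(header: str) -> str:
    """Pull the accession out of a 'db|ACCESSION|NAME ...' style header.

    Falls back to the first whitespace-separated token for other header styles.
    """
    parts = header.split("|")
    if len(parts) >= 3:
        return parts[1]
    tokens = header.split()
    return tokens[0] if tokens else ""


def filter_fasta(lines: Iterable[str], keep: Set[str]) -> Iterator[Tuple[str, str]]:
    """Keep only entries whose accession is in the biomarker set."""
    for header, seq in read_fasta(lines):
        if uniprot_accession(header) in keep:
            yield header, seq


# Toy demo: placeholder sequences, NOT the real protein sequences.
demo = [
    ">sp|P04637|P53_HUMAN Cellular tumor antigen p53",
    "MAAAKKK",
    "LLLRRR",
    ">sp|P68871|HBB_HUMAN Hemoglobin subunit beta",
    "MVVVKKK",
]
biomarkers = {"P04637"}  # hypothetical stand-in for the ~1,000 protein list

for header, seq in filter_fasta(demo, biomarkers):
    print(f">{header}\n{seq}")
```

One thing to keep in mind with a database this small: the decoy side of a target-decoy search shrinks too, so the FDR estimate deserves an extra-careful manual look.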

The team then develops a kind of extreme phenotype to assess how well this approach works. By arresting cancer cells of different types at different cell cycle checkpoints, they have a really interesting and complex model system to test it on. And it works! An LTQ (yup! Linear Tion trQp!) can identify and quantify more than 1/3 of the biomarkers from their starting list. Since we know that cancer is almost never just one protein being messed up, and is instead dozens or hundreds of proteins working together -- 300+ quantified proteins is more than enough to point you toward the pathways being affected!