I read that protein interactions found using high-throughput methods are less trustworthy than those found using low-throughput methods.

For example, searching for interactions of BRCA2 on BioGRID, I got this result.

So we can't be sure about the interaction of BRCA2 with SIRT1, as it was found with a high-throughput method.

My question: could you give me a very simple explanation of what high- and low-throughput techniques are? (I have a vague idea from reading about it on the web, but I am not satisfied with my understanding.) Why is high throughput not reliable in this case?

1 Answer

This article is a bad example: it has been retracted. See here.
However, that doesn't matter for the methods themselves.
This is usually done with genome-wide expression studies. There you compare gene expression in your group of interest with a "normal" control group to find genes that are differentially expressed. This is something we did: we overexpressed a transcription factor in a cell line and compared it to the untransfected cell line, which doesn't have this factor. The comparison brings up differentially expressed genes. This is done either with microarrays or, more likely today, by RNA sequencing (let me know if you want to know more; I can go into the details).

When you analyze such data, you first discard everything below a certain threshold to reduce the number of false positives. A low threshold for calling a change means that you are more likely to have false-positive genes in your final list. These can arise from changes in the cells or the conditions, secondary effects, and so on. You will also get signals that appear purely by statistical chance and are false positives.

If you set your threshold too high, you will reduce the number of false positives, but on the downside you will also generate false negatives, since you reject genes that show only a relatively small change but are real. Striking this balance is not always easy. As a rule of thumb, usually everything that changes at least twofold in expression is kept (unless you have better controls).
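To make the trade-off concrete, here is a minimal Python sketch. The gene names, fold changes, and "truly regulated" labels are all invented for illustration; in a real experiment you never know the true labels, which is exactly the problem. The function `count_errors` is a hypothetical helper, not part of any analysis package.

```python
import math

# Hypothetical toy data: (gene, fold change vs. control, truly regulated?)
# All values are invented for illustration only.
GENES = [
    ("geneA", 4.0, True),
    ("geneB", 2.5, True),
    ("geneC", 1.6, True),   # real, but only a weak change
    ("geneD", 2.2, False),  # noise that happens to look strong
    ("geneE", 1.3, False),
    ("geneF", 0.4, True),   # 2.5-fold down-regulated
    ("geneG", 0.9, False),
]


def count_errors(genes, min_fold):
    """Call a gene 'changed' if it is up or down by at least min_fold,
    then count false positives and false negatives against the labels."""
    cutoff = math.log2(min_fold)
    false_pos = false_neg = 0
    for _name, fold, is_real in genes:
        # log2 makes up- and down-regulation symmetric around zero
        called = abs(math.log2(fold)) >= cutoff
        if called and not is_real:
            false_pos += 1
        elif not called and is_real:
            false_neg += 1
    return false_pos, false_neg


for min_fold in (1.2, 2.0, 3.0):
    fp, fn = count_errors(GENES, min_fold)
    print(f"min fold change {min_fold}: {fp} false positives, {fn} false negatives")
```

With this toy data, raising the threshold removes the false positives one by one, but the weakly changed real genes start to drop out as false negatives.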

Even when filtered, the output of the experiment will still contain some false positives. You will also pick up secondary effects, meaning that your gene of interest regulates a second gene, which in turn regulates a third. Since both will come up in the analysis, it will look as if the third gene is regulated by the first (and not by the second, which would be correct). These are the reasons why every interaction between two genes (or better: between a protein and a gene when talking about a transcription factor, or between two proteins) usually needs to be verified in the wet lab. And this work can be pretty hard.
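The secondary-effect problem can be sketched the same way. The three-gene network below is made up, and `genes_that_change` is a hypothetical helper: it simply collects everything downstream of the perturbed gene, which is roughly what an expression screen sees, without distinguishing direct from indirect targets.

```python
# Hypothetical regulatory chain: each gene maps to the genes it regulates directly.
DIRECT_TARGETS = {
    "gene1": ["gene2"],
    "gene2": ["gene3"],
    "gene3": [],
}


def genes_that_change(network, perturbed):
    """Return every gene downstream of `perturbed`, whether direct or indirect."""
    seen = set()
    stack = list(network.get(perturbed, []))
    while stack:
        gene = stack.pop()
        if gene not in seen:
            seen.add(gene)
            stack.extend(network.get(gene, []))
    return seen


observed = genes_that_change(DIRECT_TARGETS, "gene1")
direct = set(DIRECT_TARGETS["gene1"])

print("changed in the screen:    ", sorted(observed))
print("direct targets only:      ", sorted(direct))
print("indirect, needs follow-up:", sorted(observed - direct))
```

The screen reports both gene2 and gene3 as responding to gene1, although only gene2 is a direct target. Telling the two apart is what the wet-lab verification is for.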