My intuition is that if you have a true knockout, this can replace your Input samples as a better control right? Because that still accounts for any non-specific binding by the beads, for example.

This is what I have read as well.

Quote:

Originally Posted by fanli

I don't think it is surprising that you get different gene lists using your two approaches. Consider the case where a gene has multiple binding sites - how do you handle this with Option 2? Do you take something like the mean binding affinity across all sites in a given gene? What about upstream binding activity?

I think you are definitely right here and this is one of the major reasons I was hesitant to use thse tools for RIP-SEQ. I wasn't sure what the best approach was.

From reading into this more in the past few months, this is what I have discovered:

1) Inputs can be used for normalization (and seem to be more popular). To calculate enrichment that shows the success of the IP you need input from both WT and KO. Unfortunately, in my case I only had the WT input. I could just sequence KO tissue but I was worried about technical variability that might occur because of this months after the initial experiment. If you have input and KOs you could try normalizing to the input first (enrichment for WT and KO) and then exclude any genes that show up in the KO.

2) DESeq2/EdgeR, and other similar tools are not really designed to handle the case of comparing two conditions that are themselves compared to their KO counterparts. You can analyze the WT and KO conditions separately though. One filtering approach I used was only possible because there is a known target list available for the WT condition and a list of likely "non-targets" from the literature. I checked the WT/KO count ratio for the known targets and likely "non-targets" and found that the ratio was also much closer to 1 in the "non-targets" case (i.e., high KO background). This allowed me to set a cutoff to filter potential non-targets.

RIPSeeker, ASPeak, and Piranha are all designed to analyze RIP-SEQ data but they are all pretty new. Personally, I had some issues running them and getting data that made any sense to me. But they address the issue @fanli pointed out in their post about binding intervals. I think some of the tools recommend setting bins that are the size of the sequenced fragments.

3) Normalization (such as by DESeq2) can obliterate count differences between WT and KO samples. These inflated KO counts are not terribly useful for analysis if you are trying to assess background. I found that the upper quartile normalization seemed to work better for the purpose of maintaining this WT-to-KO ratio.

The best workaround for this issue is to use spike-ins that can be used as a normalization factor to ensure the ratio between the WT and KO libraries are maintained. Another alternative would be to scale back the KO counts based on some kind of factor if you have the ratio of concentrations between WT and KO from BioAnalyzer. This assumes the ratios are maintained through sequencing. Not ideal but it might get you started...