When running GOseq on RNA-seq data, I often find many 'false positives'. By that I mean that some genes in certain categories are not really involved in the process, but are annotated to it only through 'Inferred from Electronic Annotation' (IEA) evidence. Most of the annotations are IEA. A list of all evidence codes is here.

Is it possible to (easily) do GOseq analysis without this evidence group?

I know how to do it the hard way (build a custom GO-to-gene annotation, e.g. with biomaRt, drop every annotation with the IEA evidence code, and run GOseq with that custom annotation). But is there also an easy way available in GOseq?
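To make the "hard way" concrete, here is a minimal sketch of just the filtering step, written in Python for illustration. The rows, gene names, and the `build_gene2cat` helper are hypothetical; in practice the (gene, GO term, evidence code) triples would come from an annotation export such as a biomaRt query.

```python
from collections import defaultdict

# Hypothetical annotation rows: (gene_id, go_id, evidence_code).
# These values are made up for illustration; real rows would come from
# an annotation export (for example a biomaRt query or a GO annotation file).
ANNOTATIONS = [
    ("geneA", "GO:0007283", "IEA"),
    ("geneA", "GO:0006915", "IDA"),
    ("geneB", "GO:0007283", "IMP"),
    ("geneC", "GO:0008150", "IEA"),
]


def build_gene2cat(rows, excluded_codes=frozenset({"IEA"})):
    """Map each gene to its GO terms, skipping excluded evidence codes."""
    gene2cat = defaultdict(set)
    for gene_id, go_id, evidence in rows:
        if evidence in excluded_codes:
            continue  # drop annotations supported only by this evidence code
        gene2cat[gene_id].add(go_id)
    # Sort the terms so the output is deterministic.
    return {gene: sorted(terms) for gene, terms in gene2cat.items()}


gene2cat = build_gene2cat(ANNOTATIONS)
print(gene2cat)
```

A mapping of this shape is what would then be handed to GOseq as the custom gene-to-category annotation (as far as I know, through the gene2cat argument of goseq). Note that a gene whose annotations are all IEA, like geneC here, ends up with no categories at all, which changes the background set for the enrichment test.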

For what it is worth, some papers report that the quality of IEA annotations is not so bad (comparable to other categories). The GO consortium also says that the evidence codes are not designed to be used as quality filters.
– llrs Mar 28 '18 at 9:33

Thanks @Llopis, I mostly agree with their statement about IEA quality. However, I was thinking of ways to avoid getting enriched GO terms such as 'spermatogenesis' in experiments on female mice... Another motivation is to get 'real' target genes for further investigation (you often run into false positives when following up).
– benn Mar 28 '18 at 9:50

Then I recommend taking a look at the topGO package; it should filter most of them out by taking the DAG structure of GO into account.
– llrs Mar 28 '18 at 9:59

@Llopis, do you mean the elim algorithm? That reduces the number of significant GO terms, but it won't eliminate false-positive genes (targets) for further research.
– benn Mar 28 '18 at 10:10

I think the answer to the question is that there is no easy way to do it. But as a comment, I'd say that I would never trust a GO analysis to provide "answers", only suggestions for further analysis, and as such any genes in significant categories will always have to be examined by a domain expert.
– Ian Sudbery Mar 29 '18 at 10:14