I'm re-analyzing data from Yuan et al. 2018 (https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-018-0567-9) with 8 high-grade glioma samples. For a particular sample, I log-normalize the data using the method of Lun et al. (2016) and run GSVA on a subset of cells (putative cancer cells) using some gene signatures. For one gene signature, I see what appear to be consistently positive scores. Here's the plot:

As you can see, for the gene set 'RNA.GSC.c2' (which is composed of about 1200 of the 4800 genes used in this analysis), we have very few samples below 0. Since GSVA's rank-based score deals with genes ranked by relative expression in a dataset, I was a bit surprised by this result. Do you think this could be due to the existence of outliers with extremely low log counts?

I would definitely remove lowly expressed genes prior to running GSVA, just as you would for differential expression. The fact that a gene set has consistently positive scores across samples suggests to me that its constituent genes are highly ranked in expression values across samples.
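To make the filtering suggestion concrete, here is a minimal sketch in Python of one common way to drop lowly expressed genes: keep a gene only if it is detected in at least some fraction of cells. The function name `filter_low_expressed_genes`, the 10% cutoff, and the toy matrix are all illustrative assumptions, not values taken from this dataset or from GSVA itself; the same row filter would be applied to the expression matrix before passing it to GSVA.

```python
import numpy as np

def filter_low_expressed_genes(logcounts, min_fraction=0.1, min_value=0.0):
    """Filter a genes-by-cells log-expression matrix.

    A gene is kept when it is expressed above `min_value` in at least
    `min_fraction` of the cells. Both thresholds are hypothetical defaults
    and should be tuned to the data at hand.
    Returns the filtered matrix and the boolean mask of kept genes.
    """
    logcounts = np.asarray(logcounts, dtype=float)
    detected = logcounts > min_value           # per-cell detection calls
    fraction_detected = detected.mean(axis=1)  # fraction of cells per gene
    keep = fraction_detected >= min_fraction
    return logcounts[keep], keep

# Toy matrix: 4 genes (rows) by 20 cells (columns), made up for illustration.
toy = np.zeros((4, 20))
toy[0, :] = 3.0    # detected in every cell
toy[1, :10] = 1.5  # detected in half of the cells
toy[2, 0] = 2.0    # detected in a single cell (5% of cells)
# Row 3 is never detected.

filtered, keep = filter_low_expressed_genes(toy, min_fraction=0.1)
print(keep)            # which genes pass the filter
print(filtered.shape)  # dimensions of the matrix that would go into GSVA
```

It may be worth rerunning GSVA on the filtered matrix and checking whether the scores for 'RNA.GSC.c2' remain mostly positive; if they do, that would be consistent with the genes in that set simply ranking high in most cells.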