We are currently designing a community where users can upload content (User Generated Content). This content is voted on by those who have used the uploaded content (usage is a criterion for voting), and we are now deciding how to display "Highest Rated" content.

Users vote up or vote down content.

We have these variables to use, and I am looking for help on how to sort by them to best match users' expectations for a list titled "Highest Rated Content":

Vote count

Percentage of votes that are positive

I don't have access to more data and cannot get to the granular data. When votes were cast, for instance, is not accessible, which rules out calculating a rating based on time decay of votes.

(Apologies for being vague about the content/community type; this is due to client confidentiality.)

3 Answers

The best method is to use the lower bound of a statistical confidence interval.

I won't go into detail about how to do this, as Evan Miller has a great post on How NOT to sort by average rating for a Bernoulli distribution - which is what you have.

The main reason that you would use this method is to find a balance between the average vote and the number of votes. We all instinctively know that 2 upvotes and no downvotes is less of an indication of quality than 254 upvotes and 30 downvotes, even though the former's average is higher. This method is the best that I have found to balance the two.
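For reference, here is a minimal sketch of that lower-bound calculation in Python (the function name and the 95% confidence level are my own choices, not something prescribed by the post):

    from math import sqrt

    def wilson_lower_bound(upvotes, downvotes, z=1.96):
        """Lower bound of the Wilson score interval for a Bernoulli
        parameter; z = 1.96 corresponds to a 95% confidence level."""
        n = upvotes + downvotes
        if n == 0:
            return 0.0
        phat = upvotes / n  # observed fraction of positive votes
        return (phat + z*z/(2*n) - z*sqrt((phat*(1 - phat) + z*z/(4*n))/n)) / (1 + z*z/n)

    # 2 up / 0 down ranks below 254 up / 30 down, despite its higher raw average:
    print(wilson_lower_bound(2, 0))     # ≈ 0.34
    print(wilson_lower_bound(254, 30))  # ≈ 0.85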

This gives the ordering A > B > C > D. Intuitively, I'd expect A > B > D > C, since D has only had 4 votes. This problem arises because we are using the lower interval limit.

The above approach works by pushing highly rated content that we are unsure about towards the bottom. Actually, we want to push it down towards the average. And low-rated content that we are unsure about should be pushed up towards the average.
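One simple way to express "push towards the average" is the weighted average formulated in the comment below, WR = (s + m × C) / (v + m). A rough sketch, where the prior mean C (e.g. the site-wide average upvote fraction) and the pseudo-vote count m are parameters you would have to choose yourself:

    def weighted_rank(upvotes, downvotes, C=0.7, m=10):
        """Pull items with few votes toward C; m is the number of
        pseudo-votes carrying that average. For up/down votes,
        s = upvotes and v = upvotes + downvotes."""
        v = upvotes + downvotes
        return (upvotes + m * C) / (v + m)

    print(weighted_rank(2, 0))     # ≈ 0.75 -- few votes, stays near C
    print(weighted_rank(254, 30))  # ≈ 0.89 -- many votes, close to its own average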

Note that the BeerAdvocate method can also be formulated in terms of pseudocounts: writing R = s/v, where s is the sum of the actual review scores, we get WR = (s + m × C) / (v + m). That is, to compute the weighted rank, we augment the actual reviews with a total of m pseudo-reviews such that the average of the pseudo-reviews is C.
– Ilmari Karonen, Feb 16 '13 at 16:47

+1 for an interesting post. It deserves more upvotes than it has.
– JohnGB♦, Apr 28 '13 at 18:48

As the other answers generally agree, what you basically want to do is, in effect, to bias the rankings for items with low vote counts towards some "default" rank — which might be the mean rank, if you want an unbiased estimate, or a very low rank if you subscribe to the idea that an item should be ranked low until it's proven to deserve a higher rank.

The Wilson score interval method suggested in the link given by JohnGB certainly works for the latter approach, and can be adjusted to achieve the former by taking some other point on the interval (e.g. the midpoint rather than the low endpoint). However, if you'd prefer something mathematically and conceptually simpler, you can instead use additive smoothing by adding pseudocounts — essentially, a fixed number of "virtual" up- and downvotes — to the vote counts for each item before calculating the average score.

In particular, adding exactly one pseudo-upvote and one pseudo-downvote for each item corresponds to Laplace's rule of succession, which, in modern (Bayesian) terms, gives the mean expected (posterior) fraction of positive votes on the item, given the observed votes so far and based on the assumptions that a) the votes are independent and b) before any votes are observed, all upvote fractions between 0 and 1 are considered equally likely a priori.
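In code, the rule of succession is just one extra upvote and one extra downvote before averaging (a sketch; the function name is mine):

    def rule_of_succession(upvotes, downvotes):
        """Mean posterior upvote fraction under a uniform prior:
        add one pseudo-upvote and one pseudo-downvote, then average."""
        return (upvotes + 1) / (upvotes + downvotes + 2)

    print(rule_of_succession(0, 0))     # 0.5  -- an unvoted item starts in the middle
    print(rule_of_succession(2, 0))     # 0.75
    print(rule_of_succession(254, 30))  # ≈ 0.89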

It's also possible to use different pseudocounts to express different prior beliefs about the vote distribution, and/or different levels of optimism or pessimism about uncertain results (corresponding to the choice of confidence intervals in the Wilson method). For example, adding 4 pseudo-downvotes (and zero pseudo-upvotes) to each post gives an estimated upvote fraction which is very close to the lower bound of the Wilson 95% confidence interval (which the article linked to by JohnGB recommends), while adding 2 pseudo-upvotes and 2 pseudo-downvotes gives an even closer approximation of the center of this interval.

(The number 4 here comes from the fact that the Wilson formula involves the value z², where z is the standard normal quantile corresponding to the desired confidence interval around the mean, e.g. the 97.5th percentile for a 95% confidence interval. This particular quantile is approximately 1.96 (yes, Wikipedia really has an article on everything), or pretty close to 2, and 2² = 4. Indeed, using exactly z²/2 pseudo up- and downvotes, for any z, gives the exact value of the center of the corresponding Wilson confidence interval, while making all z² of the pseudo-votes positive or negative gives a fairly good approximation of its upper or lower bound respectively.)

For comparison, I've plotted the lower bound of the Wilson 95% confidence interval (in green) and the simple upvote fraction with four pseudo-downvotes added (in red) below:
The horizontal axes give the positive and negative vote counts (from 0 to 20) respectively, while the vertical axis gives the score (which is actually a probability, and hence ranges from 0 to 1) calculated using the two methods. Generally, the methods give nearly identical results at the extremes (mostly up- or mostly downvotes), but the Wilson method assigns somewhat lower values to items with intermediate up/downvote ratios. Note that the difference between the methods actually peaks at 6 upvotes and 6 downvotes (for which the Wilson method gives a score of about 0.254 while the pseudocount method gives 6/(6+6+4) ≈ 0.375) and gradually decreases thereafter.
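If you'd rather check the comparison numerically than read it off the plot, here is a small self-contained sketch (it repeats the Wilson lower-bound helper; the function names and sample vote counts are my own):

    from math import sqrt

    def wilson_lower_bound(up, down, z=1.96):
        n = up + down
        if n == 0:
            return 0.0
        p = up / n
        return (p + z*z/(2*n) - z*sqrt((p*(1 - p) + z*z/(4*n))/n)) / (1 + z*z/n)

    def pseudo_downvote_score(up, down, pseudo_down=4):
        """Upvote fraction with 4 pseudo-downvotes added (≈ Wilson 95% lower bound)."""
        return up / (up + down + pseudo_down)

    for up, down in [(2, 0), (6, 6), (20, 0), (20, 20)]:
        print(up, down,
              round(wilson_lower_bound(up, down), 3),
              round(pseudo_downvote_score(up, down), 3))
    # The gap is largest at 6 up / 6 down: ≈ 0.254 (Wilson) vs 0.375 (pseudocounts).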

Of course, you need not stick to these particular pseudocount values; you can tweak them to get the ordering you like. The pseudocounts don't even need to be integers. A good way to understand what changing the pseudocounts does to the rankings is to keep in mind that the fraction of pseudo-upvotes among all the pseudo-votes directly gives the estimated score of a new, unvoted item, while scaling both pseudocounts up by the same factor leaves the score of new items unchanged but increases the number of actual votes needed to overcome this initial bias.

Indeed, the pseudocount method also generalizes nicely to schemes with multiple options (e.g. 1 to 5 star ratings), or even multiple orthogonal axes (e.g. polls with three or more alternative options per item). Here it may be more convenient to think in terms of the total number of pseudocounts and their average value, rather than in terms of individual pseudo-votes; for example, in a five-star rating scheme, it doesn't really matter whether you add, say, 5 one-star and 5 five-star pseudo-ratings, or simply add 10 identical pseudo-ratings each with a value of 3 stars.
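As a sketch of that generalization, here is the same smoothing for a 1-to-5 star scheme (the choice of 10 pseudo-ratings averaging 3 stars simply mirrors the example above and is not a recommendation):

    def smoothed_star_rating(ratings, pseudo_count=10, pseudo_mean=3.0):
        """Additively smoothed average for 1-5 star ratings: act as if the item
        already had pseudo_count ratings averaging pseudo_mean stars."""
        return (sum(ratings) + pseudo_count * pseudo_mean) / (len(ratings) + pseudo_count)

    print(smoothed_star_rating([]))         # 3.0 -- a new item starts at the pseudo-rating mean
    print(smoothed_star_rating([5, 5, 5]))  # ≈ 3.46
    print(smoothed_star_rating([5] * 50))   # ≈ 4.67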

To sum up all this, if you have the total vote count v and the fraction of positive votes R (i.e. the percentage expressed as a number between 0 and 1), you can calculate the additively smoothed score S as:

S = (v * R + m * C) / (v + m)

where m (the number of pseudo-votes) and C (the average of the pseudo-votes) are arbitrary parameters you can choose to tweak the sorting. If in doubt, try e.g. m = 4 and C somewhere between 0 and ½ depending on what you want the initial score of a new item to be.
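A direct translation of that formula, together with one way you might use it to sort, looks something like this (the example items and the choice of m = 4, C = 0.25 are purely illustrative):

    def smoothed_score(v, R, m=4, C=0.25):
        """S = (v*R + m*C) / (v + m), where v is the total vote count, R the
        fraction of positive votes (0..1), m the number of pseudo-votes and
        C their average value."""
        return (v * R + m * C) / (v + m)

    # Sort items, each given as (name, total votes, positive fraction), highest rated first.
    items = [("widget", 284, 254 / 284), ("gadget", 2, 1.0), ("gizmo", 12, 0.5)]
    items.sort(key=lambda it: smoothed_score(it[1], it[2]), reverse=True)
    print(items)  # widget first, then gadget, then gizmo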