You are right. I saw it after posting, and I'm doing some more tests to understand why (the code is updated if you want to test it).
I also sent a message to the author of the second article asking for help with the formula.

For certain less popular entries, the historical aggregate score was also added to the final score, causing some of them to rank above entries that are genuinely more popular based on the past 30-day total.

My apologies for this error.

The new top 10 is a mixture of old and new compared to the all-time list:

I'm studying the new formula. Is the idea still valid? I think the current solution has only limited value and doesn't correctly reflect popularity, but I'd like to hear your opinion and others' about it.

A couple of questions about the database: does it still store all votes? (so that old votes could eventually be restored if needed) And could you give me some example software ratings (+ and - vote counts for each app, say 20 apps) to run tests with?
Because if plus votes generally far outnumber minus votes, the current formula could already be good enough.

I updated the code and the screenshot again. There are now 5 solutions available:
1. Wilson formula (analyzes only a single software's rating and gives a limited weight to the number of votes)
2. Bayes formula (compares a software's rating with the average of all software ratings, but seems unreliable for ratings with a significant percentage of negative votes)
3. Lupo formula (the simple solution I proposed in my previous post; apparently good, but it needs to be verified)
4. positive / total formula (doesn't consider the number of votes at all)
5. (positive - negative) / total formula (another reference formula, fully independent of the number of votes)
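To make the options concrete, here is a sketch in Python of formulas 1, 2, 4 and 5 (I leave formula 3 out, since its exact definition isn't given in this thread). The function names, and the prior parameters of the Bayesian average, are my own choices:

```python
import math

def wilson_lower_bound(pos, neg, z=1.96):
    """Lower bound of the Wilson score interval (z=1.96 ~ 95% confidence).
    Rewards a high positive ratio but discounts entries with few votes."""
    n = pos + neg
    if n == 0:
        return 0.0
    p = pos / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)

def bayesian_average(pos, neg, prior_mean, prior_weight):
    """Bayesian (shrinkage) average: pulls the raw ratio towards the
    site-wide mean rating; prior_weight acts like phantom votes."""
    n = pos + neg
    return (prior_weight * prior_mean + pos) / (prior_weight + n)

def positive_ratio(pos, neg):
    """Formula 4: positive / total, ignores the number of votes."""
    n = pos + neg
    return pos / n if n else 0.0

def net_ratio(pos, neg):
    """Formula 5: (positive - negative) / total, also count-independent."""
    n = pos + neg
    return (pos - neg) / n if n else 0.0
```

For example, `wilson_lower_bound(600, 400)` scores higher than `wilson_lower_bound(60, 40)` even though both have a 60% positive ratio, because the larger sample gives more confidence in that ratio.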

I'd like to have some example ratings to test them with realistic values, but in any case my opinion is that the Wilson formula is the best one. Alternatively, my simple formula could be used, since it allows specifying the weight of each rating parameter.
About the Bayes formula, I'm waiting for an answer from the author of a related article: I think I have implemented it correctly, but the results are a little strange.

Before we continue, let me summarize the current system as it is (after digging into the code).

Every time a browser is interested enough to click on the "Website" or "Download" link, a "+1" score is added to a "points" table. Let's call this the "activity score". For simplicity, I am not going to go into duplicate detection (another table stores the IP addresses for such actions, so if a user clicks "Website", then "Download", it will still count as "+1").

Every time a browser is interested enough to rate an entry, a "+5" or "-5" score is added to the "points" table depending on whether it's a "rocks" or "sucks" vote. Let's call this the "voting score". Separately, an internal score corresponding to the user's rank (if available) is added to an "appscore" table.

On a daily basis, the scores for each entry in the "points" table are consolidated via an SQL sum() and inserted into another "points2" table, then all entries in "points" are cleared for the next day (to save space).

As is obvious now, the popularity score is computed by summing the "points2" table for each entry (range is currently 30 days, previously it was the entire range).
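The pipeline above can be sketched with SQLite. The table names "points" and "points2" come from the description; the column names and schema are my guesses:

```python
import sqlite3
import datetime

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE points  (entry_id INTEGER, score INTEGER);            -- raw events, cleared nightly
    CREATE TABLE points2 (entry_id INTEGER, day TEXT, total INTEGER);  -- daily consolidated totals
""")

# One day's raw events: +1 activity clicks, +5/-5 "rocks"/"sucks" votes
conn.executemany("INSERT INTO points VALUES (?, ?)",
                 [(1, 1), (1, 1), (1, 5), (2, 1), (2, -5)])

# Nightly job: consolidate per-entry sums into points2, then clear points
today = datetime.date.today().isoformat()
conn.execute("""
    INSERT INTO points2 (entry_id, day, total)
    SELECT entry_id, ?, SUM(score) FROM points GROUP BY entry_id
""", (today,))
conn.execute("DELETE FROM points")

# Popularity score: sum the consolidated totals over the last 30 days
rows = conn.execute("""
    SELECT entry_id, SUM(total) AS popularity FROM points2
    WHERE day >= date('now', '-30 days')
    GROUP BY entry_id ORDER BY popularity DESC
""").fetchall()
print(rows)
```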

So the only "neg" score available is the "sucks" vote. (The "appscore" table does not record the "sucks" vote. It is merely a record of all the "plus" votes by registered users of each entry).

Also, the "voting score" is overwhelmed by the "activity score". Very few anonymous browsers vote. For example, at this moment, the "activity score" is in the thousands, while the "voting score" is in the tens.

Due to the negligible presence of "neg" score in the current system, I suspect the formula won't make much of a difference. We can increase the weights assigned to registered users, but unless it's some ridiculous number, they will be overwhelmed by the "activity score" of anonymous browsers.

The point of contention now, I think, is whether the list should be based on the score over the entire range, or just the past x days (or some exponentially decreasing window applied to the scores, so that older scores have lower weights).

My current thinking is that having two lists, i.e. recent popular titles + all-time favorites, as some of you have suggested, could somewhat resolve this debate. The reason is that I don't think a single list can cater to both recent scores and perpetual scores simultaneously.

Andrew Lee wrote:My current thinking is that having two lists, i.e. recent popular titles + all-time favorites, as some of you have suggested, could somewhat resolve this debate. The reason is that I don't think a single list can cater to both recent scores and perpetual scores simultaneously.

Andrew Lee wrote:The point of contention now, I think, is whether the list should be based on the score over the entire range, or just the past x days (or some exponentially decreasing window applied to the scores, so that older scores have lower weights).

Now the current solution is clear. The idea of using activity in the rating is good.

I think you could use a unified formula like my solution to get a more accurate rating, giving one weight to the activity score, one to the unregistered voting score, and one to the registered voting score. It would resolve the "problem" of one score overwhelming another, because the importance of each aspect has its own percentage. For example, you could evaluate popularity in this way:

In ActivityScore you could add +1 for a "Website" click and +2 for a "Download" click (independently of which solution you use, I think a Download click is more significant than a Website click).
MaxActivityScore is evaluated by checking all ActivityScore counters once per day.
For this solution you also need separate counters of positive and negative votes for each app. But you could consider adding registered and unregistered votes together, giving +1 to unregistered users and +2 * UserLevel to registered users.
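A minimal sketch of this kind of unified score, assuming the example weights above (the function signature and the `(user_level, is_positive)` vote encoding are hypothetical):

```python
def popularity(website_clicks, download_clicks,
               unreg_pos, unreg_neg, reg_votes,
               max_activity_score,
               w_activity=0.5, w_votes=0.5):
    """Unified popularity score in [0, 1] (all weights are examples).

    reg_votes is a list of (user_level, is_positive) tuples.
    """
    # Activity: a Download click counts double a Website click
    activity = website_clicks + 2 * download_clicks
    # Normalise against the site-wide daily maximum (MaxActivityScore)
    activity_part = activity / max_activity_score if max_activity_score else 0.0

    # Votes: +1 per unregistered vote, +2 * UserLevel per registered vote
    pos = unreg_pos + sum(2 * level for level, up in reg_votes if up)
    neg = unreg_neg + sum(2 * level for level, up in reg_votes if not up)
    total = pos + neg
    vote_part = pos / total if total else 0.0

    return w_activity * activity_part + w_votes * vote_part
```

For example, `popularity(10, 5, 3, 1, [(2, True), (1, False)], max_activity_score=20)` gives 0.85: the activity part is (10 + 2*5)/20 = 1.0 and the vote part is 7/10 = 0.7, each weighted at 50%.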

Another good improvement could be to add a simple time factor, for example giving a different weight to scores from different dates (I have something in mind, but I'll avoid writing too long a message now).

Alternatively, keeping the current rating solution, I think these aspects could be improved:
1. give a different weight to Download and Website clicks (as described above)
2. give a bigger weight to the "voting score" (or eventually keep votes permanently, not only the last 30 days)
3. offer the two scores you proposed (last 30 days and all-time)

@Hydaral: My point is, I don't think using exponential moving averages will eliminate the need for two separate lists. Right?

@Lupo73: Some of your suggestions can be readily implemented (e.g. different scores for Website and Download clicks). Others will need considerably more work and changes, which I will KIV until I have sorted out the current situation.

I think we need to discuss whether we can do with only one list, or whether we need two. If we have two, what should the popularity score for an app be (past x days, or the entire range)?

More to the point, would using an exponential moving average eliminate the need for two lists? If so, what is the ideal formula for the window?

What about using an EMA for the last x days (30?) and then adding a percentage of that app's total votes (5%?)? The EMA will weight new apps that have been voted up a lot recently, and the percentage of the total will add weight to all-time popular apps.
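That hybrid could be sketched like this (the 5% slice and the smoothing factor `alpha` are tunable assumptions; `alpha` controls the effective window, with higher values forgetting old days faster):

```python
def ema_score(daily_totals, alpha=0.1):
    """Exponential moving average of per-day score totals (oldest first).
    Recent days dominate; older days decay by a factor of (1 - alpha)."""
    ema = 0.0
    for total in daily_totals:
        ema = alpha * total + (1 - alpha) * ema
    return ema

def hybrid_score(daily_totals, all_time_total, pct=0.05, alpha=0.1):
    """EMA of recent activity plus a small slice (pct) of the all-time
    total, so perennial favourites keep some weight in a 'recent' list."""
    return ema_score(daily_totals, alpha) + pct * all_time_total
```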

Some considerations:
- my first doubt is about the unified counter for activity and votes: the risk is that the more frequently a software is updated, the more popular it will appear (I think this is the reason for your limit on updates per month, but it may not be enough)
- a second doubt about the unified counter is that voting for an app loses its importance if it is overwhelmed by the activity score (a separate parameter would give much more relevance to votes and encourage users to vote for apps)
- another consideration is that the current solution of separate counters for registered and unregistered users is not very useful; a unified solution for them could be studied (possibly keeping the ability to see other registered users' preferences)

After these considerations, my opinion is that a good solution could be two Popularity Scores:
1. a Rating Score without a time limit, unified for registered and unregistered users
2. an Activity Score with a time limit (e.g. 30 days) or without one (using an EMA formula)

For the first score you could use the Wilson formula and give different weights to registered and unregistered users (for example +1 for unregistered, +2*level for registered).
For the second score you need to decide on the formula and give different weights to Download and Website clicks (for example +1 for Website, +3 for Download).
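Putting the two proposed scores together, a minimal sketch using the example weights above (function names and the `(user_level, is_positive)` vote encoding are hypothetical):

```python
import math

def wilson_lower_bound(pos, neg, z=1.96):
    """Lower bound of the Wilson score interval (z=1.96 ~ 95% confidence)."""
    n = pos + neg
    if n == 0:
        return 0.0
    p = pos / n
    return ((p + z * z / (2 * n)
             - z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n))
            / (1 + z * z / n))

def rating_score(unreg_pos, unreg_neg, reg_votes):
    """Score 1: Wilson lower bound over weighted, all-time votes.
    reg_votes is a list of (user_level, is_positive) tuples."""
    pos = unreg_pos + sum(2 * level for level, up in reg_votes if up)
    neg = unreg_neg + sum(2 * level for level, up in reg_votes if not up)
    return wilson_lower_bound(pos, neg)

def activity_score(website_clicks, download_clicks):
    """Score 2: weighted clicks over the chosen window (e.g. 30 days)."""
    return website_clicks + 3 * download_clicks
```

Note how the Wilson bound handles the vote-count problem on its own: an app with 10 positive votes and none negative outranks one with 5 and none, even though both have a 100% positive ratio.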