Indonesian Perl mongers, blogging about Perl and related stuff in Bahasa Indonesia and English. Come join us if you are a fellow Indonesian monger!

Monday, 12 October 2009

CPAN download counter? +1!

From time to time people (including me once, a few years back) ask questions like: what are the 'top' (or 'most popular', 'most widely used', or 'most downloaded') dists/modules on CPAN? Is there a download counter for each dist/module? Like this one from prz, a budding module author.

The answer is that there isn't one, because CPAN is just a bunch of static files. The upside is that CPAN is very easily mirrored (e.g. via FTP or rsync, or offline via CDs) and served (e.g. via FTP, HTTP, or a local filesystem). The downside is that there isn't a place for much intelligence/logic on the serving side.

To implement this feature, we can put some stats gathering code on the client side, like what Debian has been doing for a while; in fact you can already see the list of most widely installed Perl modules from the data. Or we can add some stats to search.cpan.org like most viewed/clicked/downloaded dists and modules, and maybe top search keywords. Not representative of all mirrors, sure, but it's better than nothing.

A download counter, or at least a Popular/Top Downloads list, is a common feature on download/catalog/shopping/news sites, from freshmeat and Download.com to Amazon and the iTunes Store. It is so common that many users expect it to be there as a standard feature.

It's not hard to imagine why people like to know what's popular, what everybody else is using/doing, what's in, what's hot. It's a social side of human nature. And it's beneficial to know which modules are getting downloaded and used more, so that development efforts can be directed to the more important stuff. Volunteers can surely take the top modules list as one consideration when picking which project to spend their valuable time on.

What I'm not very clear on, though, is why, aside from PHP, many programming languages' communities don't like this particular feature. Do we hate competition, do we hate popularity contests, or are we just plain lazy?

6 comments:

Speaking as the maintainer of one of the fastest CPAN mirrors, one of the reasons that no-one has implemented a counter feature is that it requires a tool to monitor downloads and then send that information to a central resource, which can then collate the information. If someone had the time and motivation to do it, I would think many mirror admins would be happy to run it.

With search.cpan.org, you have to remember that it is a distributed system too. There isn't one web server; there are several around the world to reduce latency. It would be possible to aggregate the server logs, but again no-one has had the time or motivation to do it.

The problem with a distributed system is that gathering the information requires the numerous small parts to all work together, otherwise any information you provide is going to be inaccurate.
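To make the collation step concrete, here is a minimal sketch in Python of what a central resource might do with logs collected from several mirrors. It assumes each mirror writes Common Log Format lines and serves tarballs under an authors/id/ path; the function names, the sample author directory and the dist names are all hypothetical, and nothing like this is known to run on any actual mirror.

```python
import re
from collections import Counter

# Assumption: mirrors log in Common Log Format. This picks out the request
# path and status code of a GET for a .tar.gz file under authors/id/.
LOG_RE = re.compile(
    r'"GET (?P<path>/\S*authors/id/\S+?\.tar\.gz) HTTP/[\d.]+" (?P<status>\d{3}) '
)

# Splits a filename such as "Foo-Bar-1.23.tar.gz" into dist name and version.
DIST_RE = re.compile(r'^(?P<dist>.+)-v?(?P<version>\d[\w.]*)\.tar\.gz$')


def count_downloads(log_lines):
    """Count successful tarball downloads per dist in one mirror's log."""
    counts = Counter()
    for line in log_lines:
        request = LOG_RE.search(line)
        if not request or request.group("status") != "200":
            continue  # not a tarball request, or not a successful one
        filename = request.group("path").rsplit("/", 1)[-1]
        dist = DIST_RE.match(filename)
        if dist:
            counts[dist.group("dist")] += 1
    return counts


def merge_mirrors(per_mirror_counts):
    """Collate the per-mirror counters into one global counter."""
    total = Counter()
    for counts in per_mirror_counts:
        total.update(counts)
    return total


# Hypothetical log excerpts from two mirrors.
mirror_a = [
    '1.2.3.4 - - [12/Oct/2009:10:00:00 +0000] "GET /pub/CPAN/authors/id/E/EX/EXAMPLE/Foo-Bar-1.23.tar.gz HTTP/1.1" 200 12345',
    '1.2.3.5 - - [12/Oct/2009:10:00:05 +0000] "GET /pub/CPAN/authors/id/E/EX/EXAMPLE/Baz-0.01.tar.gz HTTP/1.1" 404 210',
]
mirror_b = [
    '5.6.7.8 - - [12/Oct/2009:11:00:00 +0000] "GET /authors/id/E/EX/EXAMPLE/Foo-Bar-1.23.tar.gz HTTP/1.0" 200 12345',
    '5.6.7.9 - - [12/Oct/2009:11:00:09 +0000] "GET /authors/id/E/EX/EXAMPLE/Baz-0.01.tar.gz HTTP/1.1" 200 2048',
]

totals = merge_mirrors([count_downloads(mirror_a), count_downloads(mirror_b)])
top = totals.most_common(1)
```

The merged counter is only as complete as the set of mirrors that send in their logs, which is exactly the accuracy problem described above.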

Another reason, perhaps, why no-one has implemented it is the Flash Crowd effect. A module featuring in a top 20 most viewed/downloaded list doesn't mean it is the best module for the job. It just means that because it featured in the top 20, several hundred or thousand people have now viewed/downloaded it to see what the fuss was about, thus sustaining its position in the top 20.

"Another reason perhaps why no-one has implemented it is because of the Flash Crowd effect. Some module featuring in a top 20 most viewed/downloaded list doesn't mean it is the best module for the job. It just means that because it featured in the top 20, several hundred/thousand people have now viewed/downloaded it to see what the fuss was about, thus sustaining its position in the top 20."

Agreed. Personally, the ratings and reviews are far more interesting than aggregate download statistics. I wish more folks would take time to rate / review modules and -- when necessary -- to use the annotation feature to update the documentation.

If some module in the top 20 were so mediocre and should not even be there, wouldn't there be a natural reaction from the community? E.g. actively promoting an alternative, improving the module, forking the module, or creating an alternative top-N list based on some other criteria. Won't that reactive movement itself bring positive results?

As mentioned in the article, we can add stats gathering code on the client side (as in: the CPAN::* modules and/or the command-line clients). That way the CPAN::Mini module can skip counting downloads when it is mirroring. We can also track the number of installations/upgrades/other activities.
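A rough sketch of that client-side idea, written in Python purely for illustration: the client records install/upgrade events per dist, ignores anything done by a mirroring tool, and builds a report that could later be sent to a collector. The client identifiers, action names and function names are all assumptions made for this example; no existing CPAN client is known to expose such an interface.

```python
import json
from collections import defaultdict

# Hypothetical client identifiers: tools that only mirror files and
# therefore should not count as real usage.
MIRROR_CLIENTS = {"minicpan"}

# Hypothetical set of activities worth tracking.
TRACKED_ACTIONS = ("install", "upgrade")


def new_stats():
    """Create an empty per-dist table with one zeroed counter per action."""
    return defaultdict(lambda: dict.fromkeys(TRACKED_ACTIONS, 0))


def record_event(stats, client, action, dist):
    """Record one event; return False if it came from a mirroring tool."""
    if client in MIRROR_CLIENTS:
        return False  # mirroring traffic is skipped, not counted
    if action not in TRACKED_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    stats[dist][action] += 1
    return True


def build_report(stats):
    """Serialize the counters as JSON, ready to send to a collector."""
    return json.dumps(stats, sort_keys=True)


stats = new_stats()
record_event(stats, "cpan", "install", "Foo-Bar")      # counted
record_event(stats, "minicpan", "install", "Foo-Bar")  # skipped
record_event(stats, "cpanp", "upgrade", "Foo-Bar")     # counted
report = build_report(stats)
```

Counting on the client instead of the server means mirrors stay plain static file trees, at the cost of relying on users to opt in to reporting.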