I have noticed some problems with how Debian is using the
popularity-contest data.

popcon units are unknown

Using the popcon score of a package to measure its use is like using
the bleeple score of a trip to measure its distance. Both scores have
no sensible units attached, though they may be loosely derived from
a unit value. Is a trip with a bleeple score of 99 a long trip?
Is a package with a popcon score of 99 a rarely used package?

The only way to resolve this ambiguity at all is to compare ratios of
values, so the problematic units cancel out. A flight from NYC to AMS with
a bleeple score of 99 is 50 times as bleeple as my drive home, which
scores 2.
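That ratio arithmetic, using the made-up scores from the example, fits in a few lines:

```python
# Scores are in an unknown unit ("bleeple"), so only ratios between
# them carry meaning: dividing one score by another cancels the unit.
flight_score = 99  # NYC -> AMS, in unknown units
drive_score = 2    # my drive home, same unknown units

ratio = flight_score / drive_score
print(f"The flight is {ratio:.1f} times as bleeple as the drive")
# roughly the "50 times" from the text
```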

So, any statement like "low popcon score" is basically so lacking in
context as to be meaningless. Such statements are deprecated, and
should be ignored.

not all popcon scores are comparable

The above example is intentionally bad. Plane flights and
car trips are not very comparable when you don't know what units
(time / CO2 / distance / number of people sharing a confined space /
security theater points) are being used.

Similarly, comparing a high popcon package like gnome-terminal with
a relatively low popcon package like udhcpc is very deceptive. The former
is installed by default in the desktop task, but plenty of desktop users
would not miss it. The latter is installed only on embedded systems, which
can exist in absurd numbers, and which tend not to report to popcon
at all.

So, any attempt to compare popcon scores should include a rationale
for why the two scores are comparable. For example, gnome-terminal
and rxvt are somewhat comparable since they are both terminal emulators.
But only the vote scores, not the inst scores, should be compared, since
gnome-terminal is installed by default. dhcp3-client and udhcpc are not
comparable despite being similar packages.
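A minimal sketch of that comparison rule, assuming popcon-style per-package counts; the numbers here are invented for illustration, not real popcon data:

```python
# Hypothetical popcon-style counts: "inst" = machines reporting the
# package installed, "vote" = machines where it was actually used
# recently. Numbers are made up.
popcon = {
    "gnome-terminal": {"inst": 90000, "vote": 40000},
    "rxvt":           {"inst": 9000,  "vote": 3000},
}

def vote_ratio(a, b, data):
    """Compare vote scores only; inst is inflated for packages
    installed by default, so it would mislead here."""
    return data[a]["vote"] / data[b]["vote"]

print(vote_ratio("gnome-terminal", "rxvt", popcon))
```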

popcon scores do not measure long tail effects

A strength of Debian is that not only commonly used software, but also
uncommon and niche software, is packaged. Popcon does not measure the
benefit of some little-used piece of software being there, packaged and
ready to use, when a user needs it.

For six years I kept satutils in Debian, despite it probably having
no users. It has a very specific use case: to control a motorized
internet satellite dish typically installed on an RV. I did that because
it was essentially no work (the package was approximately bug-free, and
had required no changes since 2007), and because of the possible payoff
if someone needed this thing and there it was, in Debian.
The value of Debian on that occasion would spike to a value that,
while not directly comparable with a popcon score, would be pretty epic
for that one user, as they pushed arrow keys to move a satellite dish
around.

(It also had the best WITHOUT WARRANTY statement I've had the pleasure to
write: "If you break your dish off your vehicle using this software,
you get to keep both pieces.")

Every removal of a package for "low popcon score" runs the risk of
silently degrading this overall value of Debian.

who wants to be popular?

Part of the problem is that popcon has been around long enough that
the connotations of its name, "popularity contest" have been dulled by
repetition (and abbreviation). Popularity contests are not pleasant
things. They rarely reach the best result. They embody the tyranny of the
majority. The name was originally, to the best of my knowledge, chosen
exactly to imply all these failings, to say that hey, popularity-contest
is deeply flawed, but is better than nothing for this one specific use
case (ordering packages to place on CD sets). We no longer think of
popcon with these caveats. That is a regression in your brain. Fix it.

By removing packages that appear unpopular, we run the risk of Debian
becoming bland and homogeneous.

Your comment hit the nail on the head so hard that I just had to post a comment (something that I rarely do): your "long tail" argument is something that I had in my mind but that I just lacked the words to describe.

Thank you very much for this most insightful post. I hope that other people stop by and read this. I also hope that they have, at one time or another, been in the minority (e.g., like Linux users being treated as 2nd- or 3rd-class citizens, anyone?) so that they can appreciate the relief it is to have something that works, even when you have nowhere else to run to.

Great
Thanks for this very nice summary. Almost everything that I do in Debian deals with packages that have popcon < 100. "low-popcon" sounds bad (we even have a dedicated QA tag for that), but at least in my case there are quite a few very happy people (including me) behind these scores that just love Debian, because of its unprecedented diversity. Debian does a great job as a mainstream distribution, but (maybe more importantly) it excels in so many fields that others haven't even thought of supporting.
Comment by
Michael
— Wednesday evening, April 20th, 2011

Just to add a tangential issue: popcon scores are "leaky". Major derived distributions simply divert popcon submissions to their own servers, instead of submitting to both Debian's and their own, which is easily possible [1]. This makes it difficult to adequately assess the popularity of a package, especially a niche one that could be used by 50% of a specialized derived distribution while being used by less than 0.1% of native Debian users.

Great post
The work that you maintainers do is simply fantastic. The thing I love most about Debian is how I can find software using apt for nearly any task I want. It's awesome, and I think the spirit of your post is what lets users have this sort of comfort: that their needs will almost always be met by some rarely-used program that is packaged because someone cared.
Comment by
roshan-george
— terribly early Thursday morning, April 21st, 2011

Normalisation is indeed an issue - against the number of reports of the popularity-contest package itself, or against some package that dominates a particular field, say some IRC library that many different tools share. I had suggested this to upstream and was given instructions on where to send patches, and ... I just did not get around to it.

When I read Joey's post, I instantly thought about measuring the importance a tool may have for everyday life. Here I thought about the vote-by-inst ratio. With people like me not setting the atime this is difficult, but one gets an idea. Icedove and Iceweasel, for instance, I use daily. Some sequence alignment tool - not. Such a ranking may shed some new light on some tools, much along the lines (though not perfectly) of Joey's posting as I perceived it.
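The vote-by-inst ratio described above could be computed like this (package names and numbers are invented for illustration):

```python
# vote/inst near 1.0: most machines that install the package actually
# use it regularly; near 0.0: installed but rarely run.
# All counts below are made up.
packages = {
    "iceweasel":           {"inst": 80000, "vote": 60000},
    "some-alignment-tool": {"inst": 500,   "vote": 20},
}

# Rank packages by how often an install translates into actual use.
for name, scores in sorted(packages.items(),
                           key=lambda kv: kv[1]["vote"] / kv[1]["inst"],
                           reverse=True):
    ratio = scores["vote"] / scores["inst"]
    print(f"{name}: vote/inst = {ratio:.2f}")
```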

Imagine how I do this (in Ubuntu, but it does not matter much):
I select all players and sort them in ascending order of popularity,
then look at the top 5-10 of them.
The players of low popularity are the best. Usually they are something like Aqualung or AlsaPlayer.
Simple, understandable, and it works.