moar, and moar, and moar debsources stats

A while ago I announced the
availability of several
stats about Debian source code on http://sources.debian.net. Since
then the statistical basis of those stats has increased a lot, and
now includes all Debian historical releases, from
hamm (July
1998) onward. This allows us to appreciate macro-level evolution
trends in Free Software, over a period of more than 15 years,
through the eyes of a distro that sits at the nice intersection of
the oldest, largest, and most reputable distros.

To get there I've added support for sticky
suites to the plumbing layer of debsources,
and then injected historical releases from http://archive.debian.org. The
injection process took about a week (with no parallelism at all,
on pretty slow disks, and while computing sha256 checksums, ctags,
and sloccount on all source files) and has been an
"interesting" experience.
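
To give an idea of what the checksum part of that work looks like, here is a minimal sketch of walking an unpacked source tree and computing a sha256 for every file. It is not the actual debsources code: the function names `sha256_of_file` and `checksum_tree` are made up for illustration, and the real plumbing surely differs.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root):
    """Map each regular file under root (as a relative path) to its SHA-256.

    Hypothetical helper: symlinks are skipped, directories are ignored.
    """
    root = Path(root)
    return {
        str(path.relative_to(root)): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file() and not path.is_symlink()
    }
```

Run sequentially over every file of every package, a loop like this is bound by disk reads, which fits with slow disks and no parallelism adding up to days of work.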

When you go back decades in technology time, bit
rot is just around the corner, and I've found my share
while injecting archive.d.o into
sources.d.n. In both cases the respective maintainers
(Guillem and Ganneff, kudos) have been positive about and helpful
in improving the situation, despite the low impact of the bugs I've
found on the average user. That's quite important for the
long-term preservation of digital information in
general, and for the perennity of access to Free Software in the
specific case of Debian.

While we are at it, I'm now maintaining a list of
bugs affecting sources.d.n but belonging to other
packages, in case you fancy helping out but are not a Python
hacker. Interestingly enough, quite a few of those bugs are related
to the fact that tools debsources uses (e.g. ctags, sloccount) are
also starting to show their age.

You might wonder why buzz, rex, and bo are still missing from
sources.d.n. That's in fact for similar reasons.
Before hamm Debian didn't have complete archive coverage in terms
of Sources indexes and .dsc files. Given
that debsources relies on both to extract source packages, it first
needs to grow an additional abstraction layer that can cope with
their absence. It's SMOP, and planned.
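
For the curious, here is a rough sketch of why both pieces matter: a Sources index is a list of blank-line-separated stanzas, and each stanza's Files field names the .dsc from which a package can be unpacked. The parser below is a simplified illustration, not what debsources actually does; `parse_stanzas`, `dsc_files`, and the sample data (package names, checksums, sizes) are all made up.

```python
def parse_stanzas(text):
    """Yield one dict per blank-line-separated stanza of a control-style file."""
    stanza, last_key = {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current stanza.
            if stanza:
                yield stanza
            stanza, last_key = {}, None
        elif line[0] in " \t" and last_key is not None:
            # Continuation line: append to the previous field.
            stanza[last_key] += "\n" + line.strip()
        elif ":" in line:
            last_key, _, value = line.partition(":")
            stanza[last_key] = value.strip()
    if stanza:
        yield stanza


def dsc_files(stanza):
    """Return the .dsc file names listed in a stanza's Files field."""
    names = []
    for entry in stanza.get("Files", "").splitlines():
        parts = entry.split()  # expected shape: checksum, size, file name
        if len(parts) == 3 and parts[2].endswith(".dsc"):
            names.append(parts[2])
    return names


# Made-up sample: the second package has no .dsc at all.
SAMPLE = """\
Package: hello
Version: 1.0-1
Files:
 0123456789abcdef0123456789abcdef 512 hello_1.0-1.dsc
 fedcba9876543210fedcba9876543210 1024 hello_1.0.orig.tar.gz

Package: world
Version: 2.0-1
Files:
 00000000000000000000000000000000 256 world_2.0-1.tar.gz
"""

for stanza in parse_stanzas(SAMPLE):
    print(stanza["Package"], dsc_files(stanza))
```

A package like the second one, with no .dsc to start from, is exactly the case that needs an extra abstraction layer.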

For example, for 'disk usage', on the 20 years graph the x axis legend
spans Aug 2013 to Apr 2014. Its shape is basically the same as the 1 and 5
years graphs, only with fewer sample points. Maybe there's
something I'm really misunderstanding here?

Right. So, it's not a bug in the data, but arguably a bug in how
it is presented --- we can definitely do better on that front.

First of all, the 20-year data graphs are not meant to cover
the historical evolution of Debian releases. Those data are
currently available only on the per-release pages.

The 20-year data are rather meant to cover the historical
evolution of the sources.d.n dataset. We have only about 1 year of
history, as sources.d.n didn't exist earlier on. That could be made
clearer by having longer x-axes, going back 20 years; but the data
would be invariably 0 for the years before 2013, so I'm not
sure what we would really gain by doing that.