I've heard several complaints that modem/slow connection users
struggle to keep up - even tracking security updates can take a
while. When there is a security update, the client machine will have
to download the entire Packages file each time. This problem will get
worse as time goes on and the number of packages grows. And for people
tracking testing or unstable over a modem (such people do exist!), it
already takes ages for them to just sync Packages files, let
alone actually downloading and installing the new packages they
want.

How do other people do this?

Microsoft have a central pool of servers for Windows updates which
keeps a database of updates. Each client machine connects to an update
server and checks for any updates that have not yet been installed on
the client. This works for Microsoft, but they have to maintain this
huge central server pool which will be hammered solidly, constantly
from the millions of client machines scattered across the world.

We could do something similar too. We'd need to write a new
application/server/cgi/something to run on security.debian.org and
modify apt and friends to talk to that program. This could be done,
but it's reinventing the wheel. We ask people not to mirror the
security site, but it happens anyway; people would not be able to use
the mirrors for this service unless those mirrors have the same
program installed. That's a problem.

If we want the mirrors to be able to work, we need to push out the
information in a standard form (files/directories) that will propagate
easily to mirrors via existing channels: HTTP/FTP/rsync/whatever. We
need to keep some state over time so that client machines can compare
timestamps on the information they already have and then only retrieve
the changes to get them up to current state. If the client does
not have any state, or if its idea of state is too old, then this
should be quickly recognised and the client should download all
current state; we don't want to slow these users down any more.

Various people have discussed ways to do this in the
past. Suggestions have included providing periodic (e.g. daily) diffs
of the Packages files that clients can download. These have never
really taken off.

There is a much simpler solution to the problem, found after some
discussion at the UKUUG Conference 2004. Apt and dpkg already cope
with Packages/Sources stanzas containing extra fields that they do not
understand; they simply ignore the extra fields. Equally, they do not
care about the sort order of the files.
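For instance, a stanza like the following (the X-New-Field name is purely illustrative) passes through current apt/dpkg untouched; the unknown field is silently skipped:

```
Package: hello
Version: 2.1.1-4
Architecture: i386
X-New-Field: tools that do not understand this line simply ignore it
Filename: pool/main/h/hello/hello_2.1.1-4_i386.deb
```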

My proposed way to solve the problem is:

- add a Timestamp: field to each entry (a simple time_t would be
  easy), recording the date/time at which that entry was first
  added to the file

- sort the entries in the Packages and Sources files on that
  timestamp, with the most recent first
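A minimal sketch of the archive-side sort, assuming stanzas are separated by blank lines and each carries the proposed Timestamp: field as a Unix time_t (all function names here are illustrative, not part of any existing tool):

```python
def parse_stanzas(text):
    """Split a Packages-style file into individual stanza strings."""
    return [s for s in text.split("\n\n") if s.strip()]

def stanza_timestamp(stanza):
    """Extract the proposed Timestamp: field (0 if absent)."""
    for line in stanza.splitlines():
        if line.startswith("Timestamp:"):
            return int(line.split(":", 1)[1])
    return 0

def sort_newest_first(text):
    """Emit the stanzas with the most recent timestamp first."""
    stanzas = sorted(parse_stanzas(text), key=stanza_timestamp, reverse=True)
    return "\n\n".join(stanzas) + "\n"
```

Since apt and dpkg ignore both the extra field and the ordering, the resulting file stays a valid Packages/Sources file for existing clients.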

This way, clients can simply download new versions of these files
and stop once a timestamp is older than the most recent timestamp of
the last version they downloaded. If they do not have an older version
or their old version is ancient, they will just end up downloading the
entire file this time. The client program doing the download can then
merge the old file version with the new information. It would even be
possible to create a normal standard-format Packages/Sources file if
that is still wanted.
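The client side might look roughly like this; again an illustrative sketch, with invented helper names, assuming the same blank-line-separated stanzas sorted most recent first:

```python
def stanza_timestamp(stanza):
    """Extract the proposed Timestamp: field (0 if absent)."""
    for line in stanza.splitlines():
        if line.startswith("Timestamp:"):
            return int(line.split(":", 1)[1])
    return 0

def package_name(stanza):
    """Name of the binary or source package a stanza describes."""
    for line in stanza.splitlines():
        if line.startswith("Package:") or line.startswith("Source:"):
            return line.split(":", 1)[1].strip()
    return None

def take_new_stanzas(stanzas, last_seen):
    """Stop reading as soon as we reach an entry we already have;
    last_seen is the newest timestamp from the previous download."""
    new = []
    for stanza in stanzas:  # file is sorted most recent first
        if stanza_timestamp(stanza) <= last_seen:
            break
        new.append(stanza)
    return new

def merge(old_stanzas, new_stanzas):
    """Newer stanzas replace older ones for the same package."""
    merged = {package_name(s): s for s in old_stanzas}
    for s in new_stanzas:
        merged[package_name(s)] = s
    return list(merged.values())
```

If last_seen is zero (no previous state, or state too old to match anything), take_new_stanzas simply returns the whole file, which is exactly the fall-back behaviour described above.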

One issue that this does not cope with is removed packages and
sources. There is an easy way to do that too: add a new small stanza
for a binary/source package with a new Removed-time: field. When the
client sees this stanza, it will know to remove older information
about that package/source from its merged output.
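Extending the merge to honour such removal stanzas could be sketched as follows (illustrative only; it assumes entries are applied oldest to newest and that a removal is a short stanza naming the package plus the proposed Removed-time: field):

```python
def field(stanza, name):
    """Return the value of a named field in a stanza, or None."""
    for line in stanza.splitlines():
        if line.startswith(name + ":"):
            return line.split(":", 1)[1].strip()
    return None

def apply_removals(stanzas):
    """Merge stanzas oldest to newest, dropping packages whose most
    recent stanza carries the proposed Removed-time: field."""
    merged = {}
    for stanza in stanzas:
        name = field(stanza, "Package") or field(stanza, "Source")
        if field(stanza, "Removed-time") is not None:
            merged.pop(name, None)  # forget everything we knew about it
        else:
            merged[name] = stanza
    return list(merged.values())
```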

Creating the new Packages/Sources files should (I hope) be easy; in
the main archive, Katie already uses a database backend when
processing packages, so dumping timestamps should take little
effort.

Comments? I'm sure I must have missed something here, but I can't
see any holes...

Re: Bandwidth problems with large Packages and Sources files
John wrote on Sun, 08 Aug 2004 13:08

I'm one of those people :) :
And for people tracking testing or unstable over a modem (such people do exist!), it already takes ages for them to just sync Packages files, let alone actually downloading and installing the new packages they want.

I've created a server/applet combination that relates to this. The applet does not download updates itself, but rather contacts a server to query for updated packages. The server periodically scans repositories and updates a database with new/updated packages and timestamps.
See:
http://emeitner.f2o.org/projects/dus/
http://emeitner.f2o.org/projects/debupdate/
This is still a work in progress and I am about to release a new version of the applet. I will soon be uploading both to mentors.debian.net.