PyPI hosts over 6000 projects and is used daily by people to build
applications. Tools such as easy_install and zc.buildout in particular
make intensive use of PyPI.

For people making intensive use of PyPI, it can act as a single point
of failure. People have started to set up mirrors, both private and
public. These are active mirrors, which means that they crawl PyPI
themselves to stay in sync.

In order to make the system more reliable, this PEP describes:

the mirror listing and registering at PyPI

the pages a public mirror should maintain. These pages will be used
by PyPI, in order to get hit counts and the last modified date.

People who want to mirror PyPI make a proposal on catalog-SIG.
When a mirror is proposed on the mailing list, it is manually
added to the mirror list in the PyPI application after it
has been checked for compliance with the mirroring rules.

The mirror list is provided as a list of host names of the
form

X.pypi.python.org

The values of X are the sequence a,b,c,...,aa,ab,...
a.pypi.python.org is the master server; the mirrors start
with b. A CNAME record last.pypi.python.org points to the
last host name. Mirror operators should use a static address,
and report planned changes to that address in advance to
distutils-sig.
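
The naming scheme above lets a client enumerate every host once it knows
the name that last.pypi.python.org points to. The following is a minimal
sketch of that enumeration; the function names are hypothetical, and the
DNS lookup of the CNAME record is deliberately left out so that the
example runs offline.

```python
import itertools
import string


def mirror_labels():
    """Yield the labels a, b, ..., z, aa, ab, ... indefinitely."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(letters)


def mirror_hosts(last_host, domain="pypi.python.org"):
    """List every host from a.<domain> up to and including last_host.

    last_host is the name the 'last' CNAME record resolves to,
    for example 'h.pypi.python.org'.
    """
    last_label = last_host.split(".", 1)[0]
    # Reject labels outside the a..z alphabet so the loop below terminates.
    if not last_label or any(c not in string.ascii_lowercase for c in last_label):
        raise ValueError("unexpected mirror label: %r" % last_label)
    hosts = []
    for label in mirror_labels():
        hosts.append("%s.%s" % (label, domain))
        if label == last_label:
            break
    return hosts
```

The first entry of the result is the master server; a client that only
wants mirrors would skip it.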

The new mirror also appears at http://pypi.python.org/mirrors
which is a human-readable page that gives the list of mirrors.
This page also explains how to register a new mirror.

With a distributed mirroring system, clients may want to verify that
the mirrored copies are authentic. There are multiple threats to
consider:

the central index may get compromised

the central index is assumed to be trusted, but the mirrors might
be tampered with.

a man in the middle between the central index and the end user,
or between a mirror and the end user might tamper with datagrams.

This specification only deals with the second threat. Some provisions
are made to detect man-in-the-middle attacks. To detect the first
threat, package authors need to sign their packages using PGP keys, so
that users can verify that the package comes from an author they trust.

The central index provides a DSA key at the URL /serverkey, in the PEM
format as generated by "openssl dsa -pubout" (i.e. RFC 3280
SubjectPublicKeyInfo, with the algorithm 1.3.14.3.2.12). This URL must
not be mirrored, and clients must fetch the official serverkey from
PyPI directly, or use the copy that came with the PyPI client software.
Mirrors should still download the key, to detect a key rollover.

For each package, a mirrored signature is provided at
/serversig/<package>. This is the DSA signature of the parallel URL
/simple/<package>, in DER form, using SHA-1 with DSA (i.e. as an RFC
3279 Dsa-Sig-Value, created by algorithm 1.2.840.10040.4.3).

Clients using a mirror need to perform the following steps to verify
a package:

1. download the /simple page, and compute its SHA-1 hash

2. compute the DSA signature of that hash

3. download the corresponding /serversig, and compare it (byte-for-byte)
   with the value computed in step 2.

4. compute and verify (against the /simple page) the MD-5 hashes
   of all files they download from the mirror.
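
The hashing parts of this procedure (the SHA-1 hash of the /simple page
and the MD-5 check of downloaded files) can be sketched with the standard
library alone. The DSA steps are omitted here because they need a
cryptography library. The sketch assumes that the /simple page carries
each file's hash as an "#md5=" fragment on its download link; that link
format and all function names are assumptions made for illustration.

```python
import hashlib
import re

# Assumed link shape: href=".../<filename>#md5=<32 hex digits>"
MD5_LINK = re.compile(r'href="(?:[^"#]*/)?([^"/#]+)#md5=([0-9a-f]{32})"')


def simple_page_sha1(page_bytes):
    """Return the raw SHA-1 digest of a /simple page."""
    return hashlib.sha1(page_bytes).digest()


def expected_md5s(page_text):
    """Map each linked filename to the MD-5 hex digest listed on the page."""
    return dict(MD5_LINK.findall(page_text))


def file_matches(page_text, filename, file_bytes):
    """Check a downloaded file against the hash listed on the /simple page."""
    expected = expected_md5s(page_text).get(filename)
    if expected is None:
        # A file that the page does not list cannot be verified.
        return False
    return hashlib.md5(file_bytes).hexdigest() == expected
```

A client would refuse to install any file for which file_matches()
returns False, since the /simple page is the only part covered by the
server signature.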

Verification is not needed when downloading from the central index, and
should be avoided to reduce the computation overhead.

About once a year, the key will be replaced with a new one. Mirrors
will have to re-fetch all /serversig pages. Clients using mirrors need
to find a trusted copy of the new server key. One way to obtain one
is to download it from https://pypi.python.org/serverkey. To detect
man-in-the-middle attacks, clients need to verify the SSL server
certificate, which will be signed by the CACert authority.

The counting starts the day the mirror is launched, and there is one
file per day, compressed using the bzip2 format. Each file is named
after the day it covers. For example, 2008-11-06.bz2 is the file for
the 6th of November 2008.
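
As an illustration of the naming rule, the following sketch derives the
file name for a given day and lists the files a mirror would be expected
to hold between its launch and a later date. The function names are
hypothetical and only the naming convention itself comes from the text
above.

```python
from datetime import date, timedelta


def stats_filename(day):
    """Return the statistics file name for one day, e.g. 2008-11-06.bz2."""
    return day.isoformat() + ".bz2"


def stats_filenames(launch_day, last_day):
    """List one file name per day from launch_day to last_day inclusive."""
    span = (last_day - launch_day).days
    return [stats_filename(launch_day + timedelta(days=n)) for n in range(span + 1)]
```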

A mirroring protocol called Simple Index was described and
implemented by Martin v. Loewis and Jim Fulton, based on how
easy_install works. This section synthesizes it and gives a few
relevant links, plus a small part about User-Agent.

Mirrors must reduce the amount of data transferred between the central
server and the mirror. To achieve that, they MUST use the changelog()
PyPI XML-RPC call, and only refetch the packages that have been
changed since the last time. For each package P, they MUST copy
documents /simple/P/ and /serversig/P. If a package is deleted on the
central server, they MUST delete the package and all associated files.
To detect modification of package files, they MAY cache the file's
ETag, and MAY skip re-downloading an unchanged file by sending that
ETag in the If-None-Match header.
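
A minimal sketch of the incremental step follows. It assumes that
changelog(since) returns a list of entries whose first element is the
package name, and that the caller passes in an XML-RPC proxy object; the
function names and the endpoint URL in the comment are assumptions, not
part of this specification.

```python
def changed_packages(index, since):
    """Return the sorted names of packages changed after the 'since' timestamp.

    'index' is any object with a changelog(since) method, for example
    xmlrpc.client.ServerProxy("http://pypi.python.org/pypi")  (assumed URL).
    Each entry is assumed to look like (name, version, timestamp, action).
    """
    names = set()
    for entry in index.changelog(since):
        names.add(entry[0])
    return sorted(names)


def documents_to_copy(package):
    """Return the two documents a mirror must copy for one package."""
    return ["/simple/%s/" % package, "/serversig/%s" % package]
```

A mirror would record the time of each successful run and pass it as
"since" on the next run, so that only changed packages are refetched.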

Each mirroring tool MUST identify itself using a descriptive User-Agent
header.
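
The header requirements can be sketched with urllib as shown below. The
helper name and the example User-Agent string are hypothetical; the
optional ETag argument illustrates the conditional request that a mirror
MAY send for a file it has already cached.

```python
import urllib.request


def build_request(url, user_agent, etag=None):
    """Build a request that identifies the mirroring tool.

    When a cached ETag is given, If-None-Match asks the server to answer
    304 Not Modified instead of sending the file again.
    """
    headers = {"User-Agent": user_agent}
    if etag is not None:
        headers["If-None-Match"] = etag
    return urllib.request.Request(url, headers=headers)
```

For example, build_request(url, "examplemirror/0.1", etag='"abc"') could
be passed to urllib.request.urlopen(); with the default opener a 304
response surfaces as an urllib.error.HTTPError whose code is 304, which
the caller can treat as "keep the cached copy".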

The pep381client package [2] provides an application that
respects this protocol to browse PyPI.

Some packages will not be uploaded to PyPI, either because they are
private or because the project maintainer runs their own server from
which people can get the project's packages. However, it is strongly
encouraged that a public package index follows the PyPI and Distutils
protocols.

In other words, the register and upload commands should be
compatible with any package index server out there.

Software that is compatible with PyPI and Distutils so far:

PloneSoftwareCenter [7], which is used to run the plone.org products section.

When a client needs to get some packages from several distinct
indexes, it should be able to use each one of them as a potential
source of packages. Different indexes should be defined as a sorted
list for the client to look for a package.
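
One way a client could walk such an ordered list is sketched below. The
fetch callable is hypothetical: it stands for whatever the client uses to
download a page and is assumed to return None when the index does not
know the package. The sketch also assumes that every index exposes its
pages under /simple/, as PyPI does.

```python
def find_package(name, indexes, fetch):
    """Return (index_url, page) from the first index that has the package.

    'indexes' is the ordered list of index base URLs; earlier entries win.
    Returns None when no index provides the package.
    """
    for index_url in indexes:
        url = "%s/simple/%s/" % (index_url.rstrip("/"), name)
        page = fetch(url)
        if page is not None:
            return index_url, page
    return None
```

Because the list is ordered, placing a private index first lets it
shadow a public package of the same name.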

Each independent index can of course provide a list of its mirrors.

XXX define how to get the hostname for the mirrors of an arbitrary
index.

That permits all combinations at client level, for a reliable
packaging system with all levels of privacy.