A method of free speech on the Internet: random pads

Please help me make this page easier to understand. I know that I am
not expressing myself adequately. I appreciate any suggestions or
comments that would improve this document.

I also would like your help turning this idea into a standard. Link
this page from your own web pages, or copy it, and spread the idea of
free communication through random pads around you. If you are
adventurous, make your own pads, or mirror other people's pads.

The Internet is a wonderful place for free speech. However, this free
speech is sometimes actively suppressed by various parties. For
example, some people want to prevent certain kinds of information from
spreading on the Internet. In some cases this is justified: it is
understandable, for instance, to try to prevent terrorists from using
the Internet as a medium for organizing their unethical actions. In
most cases, however, it is an attempt at censorship.

Now it is a fact that all such attempts must fail. We can rejoice or
be chagrined, but there is no sense in trying to forbid information
from being spread, because the very nature of information makes it
impossible to contain or localize. The method outlined on this page
makes this obvious: it makes it possible to spread any kind of
information, in any form, in a completely undetectable way.

Let me make this clear: I am not completely persuaded that
this possibility of free speech is entirely a good thing. If abused,
it may even be a dangerous and terrible thing. But it exists, and it
makes no sense to suppress it. It can be used for good, and it can be
used for evil. Please stick to the former. Thank you.

Disclaimer: In case I have not been clear enough: I
am opposed to any criminal actions. I do not wish the technique
described here to be used to such ends. I firmly condemn them.

(Read this carefully. It may not seem to make sense at first if you
are not used to this kind of method. But it is, in fact, all very
trivial.)

The principle is very simple. People distribute samples of random
data. We call these samples pads. A pad is a file
containing random bits, completely indistinguishable from white noise,
and of a fixed length. I propose a standard length of 128kB (131072
bytes) for each pad.
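As a minimal sketch (in Python, not the Perl script this page refers to), a pad of the proposed standard size can be drawn from the operating system's cryptographically secure generator. The name `make_random_pad` is only illustrative; the point is that a predictable generator must never be used for pads.

```python
import secrets

PAD_SIZE = 131072  # proposed standard pad length: 128 kB


def make_random_pad(size: int = PAD_SIZE) -> bytes:
    """Return `size` bytes from the OS's cryptographically secure source.

    secrets.token_bytes is backed by os.urandom; the `random` module would
    NOT be suitable here, since its output is predictable.
    """
    return secrets.token_bytes(size)


if __name__ == "__main__":
    pad = make_random_pad()
    print(len(pad))  # prints 131072
```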

Each pad should be given a name such as
pad-md5-d41d8cd98f00b204e9800998ecf8427e.dat, where the
32 hexadecimal digits following the prefix pad-md5- are
the MD5 fingerprint of the pad's data. This is so the pads can be
recognized. This naming convention guarantees that in practice we
don't risk a collision. It is essential that there be nothing in a
pad to tell when it was created (no date information). (A former
version of this document suggested naming pads after the first 8 bytes
of their actual data. That scheme is now deprecated.)
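The naming rule can be sketched in a few lines of Python (the helper names are hypothetical). Incidentally, the example name above, d41d8cd98f00b204e9800998ecf8427e, is the MD5 of empty input, so it only illustrates the format; a real 131072-byte pad has a different fingerprint.

```python
import hashlib
from pathlib import Path


def pad_name(data: bytes) -> str:
    """Name a pad pad-md5-<32 lowercase hex digits>.dat after its MD5."""
    return "pad-md5-" + hashlib.md5(data).hexdigest() + ".dat"


def save_pad(data: bytes, directory: str = ".") -> Path:
    """Write the pad under its content-derived name.

    Note: the file system still records a modification time here.
    """
    path = Path(directory) / pad_name(data)
    path.write_bytes(data)
    return path


if __name__ == "__main__":
    # The example name used in the text is the MD5 of empty input.
    print(pad_name(b""))  # pad-md5-d41d8cd98f00b204e9800998ecf8427e.dat
```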

If you need a tool to compute an MD5 fingerprint, you can find one on my FTP
site (but note that many systems ship with an md5sum tool as
standard).

Pads should be mirrored as much as possible around the Internet.
However, no single site should ever mirror all the pads, nor too large
a fraction of them.

Each pad by itself is completely without value. It is a mere hunk of
random data. However, if you combine several pads together by XORing
them, you can recover data that are hidden in them. The information
is in no single pad; it is delocalized across all the pads together.
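The combining operation itself is just bytewise XOR. The Python sketch below (the name `xor_pads` is illustrative) shows it. Because XOR is associative and commutative, the order in which pads are combined does not matter, and XORing a pad with itself cancels it out.

```python
from functools import reduce


def xor_pads(*pads: bytes) -> bytes:
    """XOR any number of equal-length pads together, byte by byte."""
    if not pads:
        raise ValueError("need at least one pad")
    length = len(pads[0])
    if any(len(p) != length for p in pads):
        raise ValueError("all pads must have the same length")
    # Treat each pad as one big integer; integer XOR is bitwise XOR,
    # and to_bytes(length) restores any leading zero bytes.
    combined = reduce(lambda acc, p: acc ^ int.from_bytes(p, "big"), pads, 0)
    return combined.to_bytes(length, "big")


if __name__ == "__main__":
    print(xor_pads(b"\x0f\xf0", b"\xff\x00").hex())  # f0f0
```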

The point is that if a suppressed piece of information can be
recovered from combining n pads (and no fewer), no single
person is distributing anything of value, since all each one is doing
is giving out a sample of random data. Indeed, most of them might not
even be aware that their pad can be used to produce the information in
question.

Again, each pad is truly mathematically indistinguishable from random
noise, and it is completely impossible, in the absence of date
information, to know who put the data in a set of pads.

If you wish to store some information in a set of pads, you
must choose a certain number of pads that other people have
produced, and which are stored on different sites (please do
not choose several pads from a single site). Choose about
five such pads (three is an absolute minimum, unless the data you are
storing is really innocuous, and seven is a maximum, beyond which
retrieving the data will be too much of a pain). Then take the data
you wish to store (it must be at most 128 kilobytes), and XOR it with
all the pads you have selected (using the Perl
script given below). This will give you a new pad: it is also
made of completely random data, but XORing it together with the pads
you have selected will give back the hidden data, padded (pun
unintended) with zeroes.
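The storing step can be sketched as follows, in Python rather than the Perl script mentioned above. `store_message` is a hypothetical name, and the random pads in the demo merely stand in for pads fetched from other people's repositories.

```python
import secrets
from functools import reduce

PAD_SIZE = 131072  # 128 kB


def xor_pads(*pads: bytes) -> bytes:
    """Bytewise XOR of equal-length byte strings."""
    length = len(pads[0])
    if any(len(p) != length for p in pads):
        raise ValueError("all pads must have the same length")
    combined = reduce(lambda acc, p: acc ^ int.from_bytes(p, "big"), pads, 0)
    return combined.to_bytes(length, "big")


def store_message(message: bytes, *chosen_pads: bytes) -> bytes:
    """Return a new pad that, XORed with chosen_pads, gives back the message.

    The message is zero-padded to PAD_SIZE first, as the text describes.
    """
    if len(message) > PAD_SIZE:
        raise ValueError("message must be at most 128 kB")
    if not chosen_pads:
        raise ValueError("choose some existing pads first")
    padded = message + bytes(PAD_SIZE - len(message))
    return xor_pads(padded, *chosen_pads)


if __name__ == "__main__":
    # Stand-ins for about five pads taken from different repositories.
    others = [secrets.token_bytes(PAD_SIZE) for _ in range(5)]
    new_pad = store_message(b"hello, world", *others)
    recovered = xor_pads(new_pad, *others)
    print(recovered[:12])  # b'hello, world'
```

The new pad is itself indistinguishable from random data, since it is the message XORed with at least one truly random pad.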

You must name this new pad with the same convention used to name
randomly generated pads (see above). You must make sure it is
completely indistinguishable from such pads. If it can be proved that
your pad was generated after the other ones, you lose. It is up to
you to find ways to make this practically impossible.
Some suggestions follow:

Generate one or more random pads at the same time as you store
your message, use them as constituent pads in the storing, and
then dispatch all your pads to different pad repositories.

Generate several innocuous random pads, and start your
own pad repository with them.

Delay as much as possible the announcement of which pads are to be
combined to retrieve the message.

At some point, you or someone else must release the information that
a given set of pads, when XORed together, produces the hidden data. It
is best if the person disclosing this information is not distributing
any pads. The pads, of course, are identified only by their
32-hex-digit MD5 fingerprints.

Now suppose you want to recover some data. Your first task is to
locate an announcement stating that the data you want are recoverable
by XORing a given set of pads. Then you must locate the pads in
question by their MD5 names. They will be sitting on different pad
repositories; otherwise the security of the data would be
questionable. Perhaps someone could implement a pad search engine or
maintain a list of pad repositories. Anyway, once you have all the
pads, you simply XOR them together using the Perl script given below, and you get the data.
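For illustration, here is a Python sketch of the recovery step (function names are hypothetical). It strips the trailing zero padding, which assumes the hidden data do not themselves end in zero bytes; a real tool might keep the raw result instead.

```python
from functools import reduce
from pathlib import Path


def xor_pads(*pads: bytes) -> bytes:
    """Bytewise XOR of equal-length byte strings."""
    length = len(pads[0])
    if any(len(p) != length for p in pads):
        raise ValueError("all pads must have the same length")
    combined = reduce(lambda acc, p: acc ^ int.from_bytes(p, "big"), pads, 0)
    return combined.to_bytes(length, "big")


def recover_from_pads(*pads: bytes) -> bytes:
    """XOR the announced pads and strip the zero padding.

    Caveat: rstrip also removes zero bytes that genuinely ended the data.
    """
    return xor_pads(*pads).rstrip(b"\x00")


def recover_from_files(*paths: str) -> bytes:
    """Read the downloaded pad files (in any order) and recover the data."""
    return recover_from_pads(*(Path(p).read_bytes() for p in paths))
```

In practice one would pass the paths of the downloaded pad files to `recover_from_files`.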

A provably innocent pad is one whose contents are produced
not truly at random but by some method that yields seemingly
random data and yet can be easily described. Here are some
examples:

Take the complete works of Shakespeare, keep only the low-order
bits, and pack them all together.

Concatenate the MD5 fingerprints of every line in the Bible.

XOR the binary digits of the square root of 42 with those of the
square root of 1729.

Encrypt a photograph of the Mona Lisa using Blowfish with a simple
key.

(Well, you get the idea…)
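As an illustration of the second example, the following Python sketch builds a pad from the MD5 fingerprints of successive lines. The conventions it uses (UTF-8 encoding, line terminators stripped, stopping after 8192 lines) are assumptions; whatever conventions are chosen would have to be published along with the pad so that anyone can reproduce it.

```python
import hashlib

PAD_SIZE = 131072   # 128 kB
DIGEST_SIZE = 16    # bytes in one MD5 fingerprint


def md5_lines_pad(lines) -> bytes:
    """Concatenate the MD5 fingerprints of successive lines into one pad.

    Needs at least PAD_SIZE // DIGEST_SIZE (8192) lines; extra lines are
    ignored. Lines are hashed as UTF-8 without their line terminators
    (an assumed convention).
    """
    out = bytearray()
    for line in lines:
        out += hashlib.md5(line.rstrip("\r\n").encode("utf-8")).digest()
        if len(out) >= PAD_SIZE:
            return bytes(out[:PAD_SIZE])
    raise ValueError(f"need at least {PAD_SIZE // DIGEST_SIZE} lines")
```

With a text file opened in UTF-8 mode (say a hypothetical `bible.txt`), the file object can be passed directly as `lines`.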

Producing a provably innocent pad is less risky than producing a truly
random one, because you always have the option of proving your
innocence by showing how the pad was generated (which you do not have
in the case of a truly random pad, since in the latter case you cannot
prove that your pad was truly random).

The following paragraph is very misleading, if not
actually wrong. I should probably remove it; I am only leaving it so as
not to break existing references. But please ignore
it.

Assume there are about 200 pads floating around. The
number of files that can be obtained by XORing 6 of them is over 50
billion. It is hardly conceivable to examine them all to search for
recognizable data. Thus, data can be effectively hidden in the set of
pads, and will not be found until someone issues a notice that such a
combination of pads gives something interesting.

The ultimate goal of the system is to have a Whole Mess Of
Pads, distributed all around the Internet. Most of them will be
completely innocuous random data. Some of them will even be provably innocent. A few will have been
generated so as to produce certain data when XORed with other pads,
but it is mathematically meaningless to try to isolate these from the
others. Some texts (hopefully nothing unethical, let alone criminal)
will be obtainable by XORing some pads. Some XORs will be widely
known and publicized, some will be kept confidential, or even highly
secret.

Eventually the whole system becomes just one Gigantic Mass of
information, from which it is possible to draw all sorts of things.
Information is completely delocalized. Trying to pinpoint anything,
or prevent anyone in particular from speaking freely is hopeless.

Some people have asked whether random padding would not be more
secure than zero padding. In my opinion, this is not the case: it
merely adds a false sense of security. The reason is that the data
you are hiding is, in any case, in no way random. If it is encrypted
using a strong crypto cipher, then using random padding might be a
good idea, but otherwise it is not worthwhile. (Of course, you are
free to do it anyway if you feel like it.)

I am not a lawyer. The law is so fundamentally
perverse that it might end up deciding that it is illegal to
distribute a 128-kilobyte-long block of random data.

If you distribute illegal information, regardless of the method used
to hide the information, you may be caught. There is a fundamental
difference between wanting freedom of speech and wanting to break the
law.

That being said, consider the following. Suppose something illegal
has been distributed using a set of pads. That is, the XOR of a
certain number of pads gives a file whose distribution is considered
illegal.

Then what? Can anyone be convicted? The people distributing the
pads might not know about the data in the first place. In fact, most
of them have merely uploaded a block of random data with a funny name:
can this truly be considered a crime? (Remember also that
some of the pads might be provably innocent,
although they are not immediately revealed as such. Certainly it is
not a crime to distribute an encrypted version of Homer's
Odyssey.)

Nor can it be claimed that the pad system was devised for illegal
purposes, because it was not. The point of this system is to promote
free speech on the Internet, nothing else.

Nor can anyone order that every pad making up the set be removed or
destroyed, because every one of them might be used in some other XOR
operation to produce a completely innocuous piece of text. So freedom
of speech prevents such an injunction from being issued.

This site has been mentioned in an article
on Slashdot (2000/06/18). Many
comments have been posted and many people have sent me emails with
various suggestions. As I can't answer them individually, I am
publishing the following response, which I also posted on Slashdot.

Please note that this was written in a hurry, so it is probably even
more lousy than the rest of this page.

Hi. I'm the author of the page in question, and an unwitting victim of
the Slashdot effect (well, not truly unwitting: Erik Moeller, who posted
the story, was kind enough to notify me in time). I received many emails
about it, all of which I've read, as well as a good many posts in the
current discussion. I can't possibly reply to them all, but I'll try
to answer some of the most frequent or important comments here.

First note that the page was written in February (2000/02/19 to
2000/02/23 to be precise), so it is not new. However, I do not claim
any kind of originality, nor paternity of the idea: it is a small
variation on the protocol described in section 6.3 ("Anonymous Message
Broadcast") of Bruce Schneier's book on cryptography. In any case, I
think it is pretty obvious in the first place. I am merely suggesting
a few practical ideas to make it workable. There is nothing great or
revolutionary about any of this, and I never made that claim.

One thing should be made clear from the start: the whole idea is
not about obscuring what the data is (i.e. it is not strictly
speaking cryptography) but about who is sending the data.
And, even more specifically, it is about making legal conviction
impossible so long as the presumption of innocence is maintained
(whether the presumption of innocence still means anything in these
dark days is another question :-/); thus, it is natural that
the story appeared in Slashdot's "Your Rights Online" section.

Please also note that I am not making a political
statement. This is not a libertarian manifesto. I am
not stating that you should use this system to send out
assassination messages against the President / the Prime Minister /
the King / the Pope / <insert your favorite assassination victim
here>; I am merely stating that you can, and that
this is none of my business.

Many have pointed out that my original way of naming pads (by their first 8 bytes) is bad.
That's true: using the MD5 (or SHA1 or any other kind of hash)
signature would be a better idea. But it doesn't really matter all
that much what the pads are named unless we want the system to be
resistant to malicious tampering, which was not one of my avowed
goals. Indeed, we can get this almost for free, so we might as well.
Let's say we could have a symlink pointing from pad_md5_whatever.dat
to the pad of the given md5 for each pad in each repository, and
"combination recipes" could be given with these links so as to make
them resistant to tampering.
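With content-derived names, a downloader can check every pad before combining it, and that check is where the tamper resistance comes from. A minimal Python sketch of it follows (`verify_pad` is a hypothetical helper). One caveat: MD5 collisions can now be produced deliberately, so a scheme built today might prefer a stronger hash such as SHA-256; the check would be otherwise identical.

```python
import hashlib
import re

# Matches the naming convention pad-md5-<32 lowercase hex digits>.dat
PAD_NAME = re.compile(r"pad-md5-([0-9a-f]{32})\.dat")


def verify_pad(name: str, data: bytes) -> bool:
    """Check that a downloaded pad's contents match the MD5 in its name."""
    match = PAD_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a pad file name: {name!r}")
    return hashlib.md5(data).hexdigest() == match.group(1)
```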

Similarly for secret sharing: my idea was not to have a system
which is hard to censor (there are other, far better, solutions for
this), but to have one which is hard to track.

Another thing I should make quite clear is that the system in
itself is not used to hide data: it is used to hide the
origin of data. This is why all comments on the "OTP is
secure as long as the pad is truly one-time" line, or all remarks to
the effect that it is trivial to find all relevant data among the
padset, are quite true but completely irrelevant. If you want to hide
the data on top of hiding the origin, then you use a
traditional cipher; for example, you encrypt your data using blowfish
and you use that data (the ciphertext, which for all intents
and purposes is random) as input to the pad system. So long as you
don't release the key, nobody can tell that there are
blowfish-encrypted data hidden in the pad system. The two are
completely orthogonal. (It is true that my remark about the
difficulty of finding "recognizable data" in the pad system is very
misleading and irrelevant. I should remove that: never mind that
part.) As for my comment about the birthday effect, it is merely
about accidental collisions, not at all about malicious
action.

Somebody asks what is wrong with storing all pads in the same place
since anyone can download them all. That is true, but that is beside
the point. The point is that as long as a site does not have a
complete set of pads yielding readable data, it is not, by itself,
breaking any law, and all it is distributing is white noise; whereas
if it stores one complete set of pads, then it is distributing the
forbidden document in some form. Naturally, collecting a complete set
of pads for yourself is fine; but distributing that set is
dangerous.

Finally, there is the central question of whether the legal
argument (which is the crux of the matter) holds water. Presumably it
doesn't, but that will at least prove one thing: the argument
shows that any kind of law restricting free speech contradicts the
presumption of innocence. Some have pointed out that one
could monitor the pad system, and the last pad published in a set of
pads would always be the culprit: this is not true, because it might
have been delayed, or it might be provably innocent (which implies the
former, actually), and you can never quite be sure.

Imagine the following scenario: someone points out on some Usenet
group that eight publicly available pads, when XORed together, give
something like DeCSS code. Judge summons the 'someone' in question,
who claims that he just noticed that by randomly XORing pads together;
not unconvincing, so judge lets the guy go. Then judge summons the
pad owners. Starts with the most recently published pad: but the
owner explains "look, my pad is just an encryption using the key
'foobar' of the first 128kB of (some standard transcription of)
Shakespeare's Tempest; the idea had been floating around for
some time, I just decided to publish it". Judge checks statement:
it's true. So apparently the data was "published" earlier than was
thought, it just took some time to come out; that makes things rather
difficult to track. Second owner similarly points out that his pad is
just a sequence of decimals of pi in binary. Third owner is in a
country over which judge has no jurisdiction, so nothing to do there.
Fourth and fifth owners seem to have created their pads at the very
same time, and both state obstinately that they generated pure white
noise (following, say, a story on Slashdot about pads being a great
idea). Sixth owner says he generated his pad by XORing another dozen
other pads with an innocent message (which he shows to judge).
Seventh owner refuses to answer judge's question. Eighth owner posted
his pad before DeCSS even appeared, so must be innocent (or really?).
Now what does judge do? Convict some owners? All? None? Problem
is, judge is impressed with first poster's proof, and can't run the
risk of convicting someone who might afterward prove that his pad was
innocent. Presumption of innocence. Even if judge merely issues an
injunction that the pads be taken off the network, every owner appeals
on the ground that the pads were reused in making some other messages
(innocuous ones) and that removing them would be a serious breach of
first amendment (or whatever you call this thing about free speech).

Anyhow, this is the summary: there's nothing new or revolutionary
about the whole pad system; in fact, it's pretty trivial. But it does
make one point: that information is fundamentally delocalized and that
any attempt to pinpoint it or to find a culprit will fail. For better
or for worse.

Marcel Popescu (marcel@aiurea.com) has
written a program in Delphi, that runs under Microsoft Windows, that
will let you generate random pads, or XOR some pads together (to hide
data or to retrieve hidden data, just like the Perl script). You can download this
program from this FTP
directory. The author has put the program in the Public Domain
and claims no copyright over it.

I have not been able to test this program, as I do not have MS
Windows. Two days ago (2000/03/05), he gave me a new version (it is
the new version which is accessible with the links above), which
corrects a weakness in the previous version (the random number
algorithm used to generate the pads was not cryptographically secure).

The XOR of these three pads will produce a readable text (this is
contrary to the principles I have stated, according to which one
should only use the XOR of a set of pads no two of which are on the
same site, but since the text is innocuous, this is not a problem). I
encourage you to check this, so as to make sure you have understood
the principles. Notice that you cannot tell which two pads were
randomly generated and which one was obtained by XORing the two others
with the message (I myself do not know it any more).
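The reason the three pads cannot be told apart shows up in the algebra: if p3 = m XOR p1 XOR p2, then equally p1 = m XOR p2 XOR p3 and p2 = m XOR p1 XOR p3, so nothing marks one pad as the derived one. The Python sketch below checks this on small stand-in pads; the 64-byte size and the message are made up for the demonstration.

```python
import secrets

SIZE = 64  # tiny for illustration; real pads are 131072 bytes


def xor3(x: bytes, y: bytes, z: bytes) -> bytes:
    """Bytewise XOR of three equal-length byte strings."""
    if not len(x) == len(y) == len(z):
        raise ValueError("lengths differ")
    return bytes(a ^ b ^ c for a, b, c in zip(x, y, z))


message = b"an innocuous text".ljust(SIZE, b"\x00")  # zero-padded
p1 = secrets.token_bytes(SIZE)  # truly random
p2 = secrets.token_bytes(SIZE)  # truly random
p3 = xor3(message, p1, p2)      # derived from the message

# The three pads together give back the message...
assert xor3(p1, p2, p3) == message
# ...and each pad is equally "the message XORed with the other two",
# so the algebra gives no hint as to which pad was derived.
assert p1 == xor3(message, p2, p3)
assert p2 == xor3(message, p1, p3)
```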

If you set up a pad repository of your own, whether to mirror an
existing one, or to come up with your own set of pads, please tell me about it, so I can
mention it here. Maintaining a pad repository is a pretty easy
process. The one important point to remember is not to let out any
hint as to the date of creation of the pads you store. Remember that
even your HTTP server might give some information as to that without
your knowing about it: so please at least use the Unix
touch utility on your pads (e.g. touch -m -t
197004011428.57 pad*.dat) to avoid this.
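Where running touch is inconvenient, the same effect can be had from Python. This is a hedged sketch: the helper name is hypothetical, the fixed stamp is the one from the touch example (1970-04-01 14:28:57, local time), and the `pad*.dat` pattern simply mirrors that example.

```python
import os
import time
from pathlib import Path

# 1970-04-01 14:28:57 in local time, the stamp used in the touch example
FIXED_MTIME = time.mktime((1970, 4, 1, 14, 28, 57, 0, 0, -1))


def scrub_mtimes(directory: str, pattern: str = "pad*.dat") -> int:
    """Give every matching pad the same fixed modification time.

    Like `touch -m`, only the modification time changes; the access time
    is kept. Neither can reset the inode change time (ctime).
    """
    count = 0
    for path in Path(directory).glob(pattern):
        st = path.stat()
        os.utime(path, (st.st_atime, FIXED_MTIME))
        count += 1
    return count
```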

Generate a lot of random pads. And please do not start storing data
in the pad system until it has reached a decent size.

Of related interest is the Freenet project, which uses
a Java client/server program to establish a distributed and
decentralized data network on which to store information, so that it
would be very hard to censor. It might be possible to combine the
Freenet project with the pads system, that is, store some pads on
Freenet, to guarantee at once both anonymity of free speech and
security against removal attempts.

The Publius
project is vastly more ambitious than my simple pad suggestion. It
suggests using a true secret-sharing mechanism rather than simple XOR.
On the subject of secret sharing, you might want to look at the secret
sharing program (instructions for use are included in the source
itself) that is mentioned on my programs page.

Thanks to Jon Robertson (touri@pobox.com) for having
read through this page and made suggestions for improvements. Thanks
also to Erik Moeller (moeller@scireview.de) for
having written the Slashdot article about this page. Thanks to all
those who wrote to me (and sorry if I didn't have time to write back
to each individually).