But not paper with boring old plain text on it. Paper with arbitrary digital
data on it.

Paper is a great format on which to store really important information. Thieves seldom
bother to steal it. Magnetic fields or power surges don't damage it. Paper can also
tolerate much higher temperatures than any digital storage system. And if those high
temperatures are created by a housefire, paper in a simple wooden box, like the bottom
shelf of a chest of drawers, is actually very likely to survive.

If your house burns to the ground, then any paper not in a very fireproof safe (read
the small print before you buy one of those, to see what "fireproof" means to this particular
manufacturer...) will of course be gone along with everything else. And metal filing
cabinets pass heat through, sacrificing their contents to save themselves. But even
if the fire brigade take half an hour to turn up, paper in a wooden bottom-drawer will
probably survive a house fire.

And most ordinary cheap printer paper today is acid-free, so fifty years from now it
won't be brown and flaky like an old paperback book.

OK, if the roof leaks over your backup, then a flash drive will probably come out
better than paper would. The shelf life of modern flash memory ought to be at least a few
decades, too - but there's no guarantee that the hyperconductive thinking aluminum
computers of the year 2075 will have USB ports, or support for current filesystems.
Paper could, therefore, actually be more compatible in the future than any of today's
conventional data-storage options. Paper is really pretty awesome stuff.

You can fit something in the order of twelve thousand characters of small-but-legible
eight-point text on one sheet of A4 paper. At one byte per character, that's 11.7 kilobytes
(in the powers-of-two sense) of data per one-sided page, or more than 87 pages per megabyte.
You can fit considerably more on if you print it all tiny and squinty, but no human-legible
text gives you very good data capacity per page.
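Here is a quick check of that arithmetic as a minimal Python sketch. The 12,000-character figure is the estimate above. The one-byte-per-character figure assumes plain ASCII text.

```python
# Capacity of human-legible text on one A4 page (estimates from the text above).
CHARS_PER_PAGE = 12_000  # roughly, for small-but-legible 8-point text
BYTES_PER_CHAR = 1       # assumes plain ASCII, one byte per character

bytes_per_page = CHARS_PER_PAGE * BYTES_PER_CHAR
kib_per_page = bytes_per_page / 1024          # kilobytes in the powers-of-two sense
pages_per_mib = (1024 * 1024) / bytes_per_page

print(f"{kib_per_page:.1f} KiB per page")     # 11.7 KiB per page
print(f"{pages_per_mib:.1f} pages per MiB")   # 87.4 pages per MiB
```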

Well, they're usually square. UPS's released-into-the-public-domain "MaxiCode"
uses little circles with a bullseye in the middle, Microsoft have as usual invented
their own standard, and there's also the very distinctive-looking Palo Alto Research
Center (PARC) "DataGlyphs", the standard version of which encodes data as a rectangle of
little slashes and backslashes. The little lines can be printed in different weights and
in different colours without changing the data they encode, so you can make a halftone
image that contains "hidden" digital data. (For some reason, PARC seem to have abandoned
dataglyphs.com and scrubbed all mention of the things from parc.com. If you're reading
this a while after I wrote it, perhaps they'll have sorted themselves out.)

All of the matrix codes made for barcode sorts of jobs are, of course, only meant
to be used to store bar-code-y sorts of data. This means they usually have hard format
limits that make sure the matrix will fit on product packaging, and will be coarse-grained
enough to be "scanned" with a low-res camera, like those in cheap mobile phones. The
maximum capacity of an alphanumeric QR Code, for instance, is 4,296 characters. Data Matrix
tops out at 3,116 characters, and Aztec Code can do 3,067 alphabetic characters with no
numbers or punctuation, or 1,914 bytes of arbitrary data.

(For comparison, a standard IBM punched card, such as still survives here and there,
has 80 columns of 12 punch locations. That gives a theoretical maximum capacity of
960 bits, or 120 of today's conventional 8-bit bytes, each of which more or less equals
an alphanumeric character. In practice this full capacity was unattainable, though,
partly because no encoding system supported using every location for user-data storage -
80 characters of user data was actually the most that anybody ever got from an 80-column
card - and partly because a card with too many holes punched in it, also known as a
"lace card", would jam in the reader. And if you've enjoyed this digression, see also
"Rainbow Storage", a bold step forward for information theory into the realm of utter
bollocks.)
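The punched-card numbers work out like this. The sketch compares the theoretical figure, which treats every punch position as one bit, with the practical figure of one character per column.

```python
# IBM 80-column card: theoretical bit capacity versus what encodings actually stored.
COLUMNS = 80
PUNCH_ROWS = 12

theoretical_bits = COLUMNS * PUNCH_ROWS     # every punch position used as one bit
theoretical_bytes = theoretical_bits // 8   # expressed as modern 8-bit bytes
practical_chars = COLUMNS                   # one character per column in practice

print(theoretical_bits, theoretical_bytes, practical_chars)  # 960 120 80
```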

Let's stick for the moment with the job of storing plain text. English words generally
average about 5.5 characters each, plus one for a space or punctuation; that means about
660 words for QR Code, about 480 for Data Matrix, or about 300 for Aztec Code. (Here's
a neat online encoder that lets you create a Data Matrix or QR Code.)
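The word counts above come from dividing each symbol's maximum capacity by 6.5 characters per word. This sketch uses the figures quoted earlier. The Aztec line uses the 1,914-byte limit, which appears to be what the "about 300" estimate is based on.

```python
# Rough word capacity of one maximum-size symbol, using the text's estimate of
# 5.5 characters per word plus one space or punctuation mark.
CHARS_PER_WORD = 5.5 + 1

MAX_CHARS = {
    "QR Code (alphanumeric)": 4296,
    "Data Matrix": 3116,
    "Aztec Code (bytes)": 1914,
}

def words_that_fit(capacity_chars: int) -> int:
    """Approximate number of average English words that fit in capacity_chars."""
    return round(capacity_chars / CHARS_PER_WORD)

for name, chars in MAX_CHARS.items():
    print(f"{name}: about {words_that_fit(chars)} words")
```

QR Code's alphanumeric mode only covers digits, upper-case letters, space and a few symbols. Ordinary mixed-case prose would therefore in practice go in byte mode, which tops out at 2,953 bytes.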

A capacity of a few hundred words is actually quite useful, for some kinds of everyday
text. Newspaper stories, for instance, commonly come in at less than 400 words. The
Sunday paper would be a lot smaller if we were all able to read stories encoded as blocks
of dots.

And these capacity numbers are also very approximate. That's partly because
of the variability of text, but also because smarter encoding systems - a widely-understood
compression system, like the gzip used
by some Web servers, for instance - can push capacity up considerably. And, at the same
time, error-correction code can push capacity down, but make the data resistant to damage.
Many matrix-code systems use Reed-Solomon error correction, and some, such as Aztec Code,
let you dial the error-correction content up to 90% or more of the total encoded data.
That gives you a lot less space for user data, but makes the data extremely hard to destroy.
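To see how compression raises capacity, here's a sketch using Python's standard `gzip` module, which uses DEFLATE, the same scheme behind Web servers' gzip encoding. The sample is deliberately repetitive, so the ratio it prints is far better than ordinary prose would achieve. The point is only that the payload shrinks losslessly, and whatever reads the matrix code has to know to decompress it.

```python
import gzip

# Deliberately repetitive sample text; real prose compresses much less dramatically.
sample = ("It was the best of times, it was the worst of times, "
          "it was the age of wisdom, it was the age of foolishness. ") * 20
raw = sample.encode("ascii")

packed = gzip.compress(raw, compresslevel=9)
restored = gzip.decompress(packed)  # lossless round trip back to the original bytes

ratio = len(raw) / len(packed)
print(f"{len(raw)} bytes -> {len(packed)} bytes ({ratio:.1f}:1)")
```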

You probably only need 128 bits of entropy for functionally unbreakable encryption.
That's a tiny amount by computer standards, but makes for a fairly cumbersome password
or passphrase. If it's OK to turn the key into a physical object, though, you can encode
it as some kind of matrix code. You can easily fit 128 bits of key into the area of a
postage stamp, and still have room for enough error-correction data to make the key
highly resistant to folding, spindling or mutilation.
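To show why 128 bits makes a cumbersome password, this sketch generates a random 128-bit key with Python's `secrets` module. It then shows the key in two typeable forms and counts how many passphrase words it would take. The 7,776-word Diceware-style list is an assumption for illustration. Turning the key into an actual matrix symbol would need a third-party barcode library, which is left out here.

```python
import base64
import math
import secrets

KEY_BITS = 128

key = secrets.token_bytes(KEY_BITS // 8)          # 16 random bytes from the OS CSPRNG
hex_form = key.hex()                              # 32 hex digits
b32_form = base64.b32encode(key).decode("ascii")  # 26 characters plus "======" padding

# Words needed from an assumed 7,776-word Diceware-style list (~12.9 bits per word).
diceware_words = math.ceil(KEY_BITS / math.log2(7776))

print(hex_form)
print(b32_form)
print(diceware_words)  # 10
```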

(You can even tattoo matrix codes on yourself. Persons of ordinary dimensions are likely
to find it difficult, not to mention painful, to fit more than a very short message.)

But never mind all that. What about general-purpose backups?

Even if all you want to back up is a few megabytes of accounts data, a system that
can only store a few kilobytes per data matrix is useless.

This is a great shame, though, when you realise that fitting two or three kilobytes
into a one-inch square means a single sheet of A4 paper could hold at least a couple
of hundred kilobytes, even if you include plenty of error-correction redundancy to minimise
the chance of silverfish-related
data loss.
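Here is a rough check of the per-sheet figure. The sketch multiplies the printable area of an A4 sheet by two to three kilobytes per square inch. The 10 mm margin on every edge is an assumption, since real printers vary.

```python
MM_PER_INCH = 25.4
A4_MM = (210, 297)
MARGIN_MM = 10  # assumed unprintable margin on every edge

def printable_sq_inches(width_mm: float, height_mm: float, margin_mm: float) -> float:
    """Printable area of a sheet in square inches, after trimming the margins."""
    w = (width_mm - 2 * margin_mm) / MM_PER_INCH
    h = (height_mm - 2 * margin_mm) / MM_PER_INCH
    return w * h

area = printable_sq_inches(*A4_MM, MARGIN_MM)
for kib_per_sq_inch in (2, 2.5, 3):
    print(f"{kib_per_sq_inch} KiB/sq in: about {area * kib_per_sq_inch:.0f} KiB per sheet")
```

With these assumptions the printable area comes out at about 81.6 square inches. That puts a sheet somewhere around 160 to 245 KiB, which is in the couple-of-hundred-kilobyte ballpark.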

200 kilobytes ain't much if you're backing up your whole hard drive. But it's actually
pretty decent for a lot of really important files. Financial data, program source code.
The novel you're writing. Your university thesis.

There are already at least two data backup utilities that use matrix codes, expanded
to cover the whole of an arbitrary number of pages.

One of them is Twibright's "Optar"
(OPTical ARchiver), which can reliably pack 200 kilobytes of data onto one laser-printed
A4 page. Optar doesn't come as ready-to-go software, though; you have to compile the
C source code yourself.

This can actually be a plus, though. If the computing world as we know it, with x86
CPUs and USB ports, still exists when you have to restore your Optar backup, you can
just use the same software you compiled last time. And if you package a printout of
the 20-odd pages of "unoptar" C source code with your Optar backup, people fifty or
a hundred years from now will probably still be able to compile it. People are still
working in Fortran and
Lisp today (though
not always by choice...), and the original versions of those languages are more than
50 years old; C isn't quite middle-aged yet, but I don't think it's a stretch to say
that C will still be compilable in 2075, if we're not all busy fighting the rad-zombies
for Soylent.

All this is, of course, a bit much for someone who just wants to play with the technology.
Fortunately, there's also a ready-to-go free-software Windows paper-backup program,
inventively named "PaperBack".

The PaperBack source is downloadable too (C++,
this time), so PaperBack is another real option for long-term backups. And it can cram
about half a megabyte of data onto a 600dpi A4 page, though I wouldn't trust my cheap
laser printer with more than 180k per page.

PaperBack includes compression tuned to work very well with plain text, so it's an
ideal solution for backing up written works, program source code, lists of passwords
and exported data from your accounting program. I found that with compression turned
on, a 746-kilobyte plain-text version of Charles Dickens' A Tale of Two Cities only took
up about one and a quarter PaperBack pages...

...even using my crummy laser printer. Printed as tight-packed eight-point text,
it would have been more than sixty pages.
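The two page counts, and the compression ratio they imply, can be worked out from figures already given. This sketch assumes the 180 KiB-per-page figure is the user-data payload. Since "about one and a quarter" pages is itself approximate, treat the ratio as a ballpark only.

```python
BOOK_KIB = 746                 # plain-text A Tale of Two Cities, from the text
TEXT_CHARS_PER_PAGE = 12_000   # tight-packed 8-point text, one byte per character
PAPERBACK_KIB_PER_PAGE = 180   # the density trusted on a cheap laser printer
PAPERBACK_PAGES = 1.25         # "about one and a quarter" pages

text_pages = BOOK_KIB * 1024 / TEXT_CHARS_PER_PAGE
stored_kib = PAPERBACK_PAGES * PAPERBACK_KIB_PER_PAGE
implied_ratio = BOOK_KIB / stored_kib

print(f"as printed text: {text_pages:.1f} pages")           # 63.7
print(f"stored on paper: about {stored_kib:.0f} KiB")       # 225
print(f"implied compression: about {implied_ratio:.1f}:1")  # 3.3
```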

(General-purpose compression like Zip or 7-Zip
will give the best results with most files, but PaperBack's compression is clearly better
for plain text.)

PaperBack even has built-in encryption, though you can of course also encrypt
your data in some other way before backing it up. However you encrypt a backup, make
sure you remember the password, or separately back up the key certificates, or whatever
the key for your encryption scheme happens to be. If you don't, encryption can more
accurately be called the "delayed Recycle Bin". If your data doesn't need "real"
encryption, data-matrix encoding just by itself will stymie casual snoopers.

PaperBack also has error-correction, adjustable from enough for your data to survive
the loss of one little square block of dots in every ten, to enough to tolerate the
loss of one block in every two. I did my capacity tests with the default one-in-five
redundancy, and also tested the correction with a bit of hole-punching and scribbling.
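The following sketch shows how "survive the loss of one block in N" can work. It is not PaperBack's actual code, whose recovery scheme isn't described here. It is a toy that adds a single XOR parity block to each group of data blocks. That is the simplest scheme able to rebuild any one missing block. Proper Reed-Solomon codes can do considerably more. The sketch assumes we already know which block is lost, for example because it failed a per-block checksum.

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def add_recovery_block(data_blocks: list[bytes]) -> list[bytes]:
    """Append one recovery block: the XOR of every data block in the group."""
    return data_blocks + [xor_blocks(data_blocks)]

def rebuild(group: list[bytes], lost_index: int) -> bytes:
    """Reconstruct the block at lost_index from all the others.

    XOR-ing every block in the group, recovery block included, gives all zeros,
    so the missing block equals the XOR of the survivors.
    """
    survivors = [b for i, b in enumerate(group) if i != lost_index]
    return xor_blocks(survivors)

# Four data blocks plus one recovery block: any one block in five can be lost.
data = [b"ABCD", b"EFGH", b"IJKL", b"MNOP"]
protected = add_recovery_block(data)
print(rebuild(protected, 2))  # b'IJKL'
```

Making the groups smaller costs more paper but tolerates denser damage. One recovery block per two blocks is the extreme case.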

At 180 kilobytes per page, you'll need about 5,825 pages of A4 copy paper to back up a
gigabyte of data. And a few toner cartridges. And a paper-slave to keep feeding the
printer.
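The page count is just the data size divided by the per-page payload, rounded up to whole sheets. One GiB divided by 180 KiB is about 5,825.4, so rounding up gives 5,826. The sketch also shows the count at the roughly half-megabyte density quoted earlier, treated here as 500 KiB.

```python
import math

def sheets_needed(size_bytes: int, kib_per_page: float = 180) -> int:
    """Whole sheets needed to store size_bytes at kib_per_page of payload each."""
    return math.ceil(size_bytes / (kib_per_page * 1024))

GIB = 1024 ** 3
print(sheets_needed(GIB))                    # 5826 at the cheap-laser density
print(sheets_needed(GIB, kib_per_page=500))  # 2098 at ~half a megabyte per page
```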

All of my passwords and other login info are only 24,074 bytes, though. Even without
compression, PaperBack can fit that on an A7 index card.
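An A7 card (74 x 105 mm) has one-eighth the area of A4, since it is three halvings down the A-series. This sketch scales an A4 payload down by area and ignores margins, which is an assumption. On that basis, an uncompressed 24,074-byte file needs more than the conservative 180 KiB-per-page density, but it fits comfortably at PaperBack's denser setting of roughly half a megabyte per page.

```python
A4_MM = (210, 297)
A7_MM = (74, 105)
PASSWORD_FILE_BYTES = 24_074

def card_capacity_bytes(kib_per_a4_page: float, card_mm=A7_MM, page_mm=A4_MM) -> float:
    """Scale an A4 payload to a smaller card by area, ignoring margins (an assumption)."""
    fraction = (card_mm[0] * card_mm[1]) / (page_mm[0] * page_mm[1])
    return kib_per_a4_page * 1024 * fraction

for density in (180, 500):  # cheap-laser setting, and the ~half-megabyte 600dpi figure
    capacity = card_capacity_bytes(density)
    fits = capacity >= PASSWORD_FILE_BYTES
    print(f"{density} KiB per A4 page -> about {capacity:.0f} bytes per A7 card; fits: {fits}")
```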

And ten years of my business accounts zip down to about 2.7MB. That's only about fifteen
cheap-laser pages.