Towards a Zero User Interface backup system

I’ve spoken before about ZUI (Zero User Interface) and how often it’s the right interface.

One important system that often has too complex a UI is backup. Because of that, backups
often don’t get done. That is especially true of offsite backups, which are the only way
to deal with fire and similar catastrophes.

Here’s a rough design for a ZUI offsite backup. The only UI at a basic level is just
installing and enabling it — and choosing a good password (that’s not quite zero UI but
it’s pretty limited.)

Once enabled, the backup system will query a central server to start looking for backup
buddies. It will be particularly interested in buddies on your own LAN (though it will
not consider them offsite.) It will also look for buddies on the same ISP or otherwise close
by, network-topology wise. For potential buddies, it will introduce the two machines and let
them run tests to measure the bandwidth between them.
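
As a rough sketch of how the buddy selection above might rank candidates, assuming the central server hands back a list of peers with a measured bandwidth figure: the `Candidate` fields and the `rank_buddies` function are hypothetical names for illustration, and the ordering rule (same ISP first, then bandwidth) is just one plausible reading of "close by."

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    same_lan: bool   # on your LAN: fast, but does not count as offsite
    same_isp: bool   # close by, network-topology wise
    mbps: float      # bandwidth measured during the introduction test

def rank_buddies(candidates):
    """Return (local, offsite) lists, each sorted best-first."""
    local = [c for c in candidates if c.same_lan]
    offsite = [c for c in candidates if not c.same_lan]
    local.sort(key=lambda c: c.mbps, reverse=True)
    # Assumed rule: prefer same-ISP peers, then higher measured bandwidth.
    offsite.sort(key=lambda c: (c.same_isp, c.mbps), reverse=True)
    return local, offsite
```

LAN peers are kept in a separate list because they are useful for speed but never satisfy the offsite requirement.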

At night, the tool would wait for your machine and network to go quiet, and likewise the
buddy’s machines. It would then do incremental backups over the network. These would
be encrypted with secure keys. Those secure keys would in turn be stored on your own
machine (in the clear) and on a central server (encrypted by your password.)
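
Here is a minimal sketch of the "keys encrypted by your password" step, using only the Python standard library. The names `wrap_key`, `unwrap_key` and `ITERATIONS` are made up for illustration, and the XOR pad is a stand-in: a real tool would use an authenticated cipher from a vetted crypto library rather than this construction.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor, not a figure from this post

def _derive(password: str, salt: bytes) -> tuple[bytes, bytes]:
    """Stretch the password into a 32-byte pad and a 32-byte MAC key."""
    material = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), salt, ITERATIONS, dklen=64
    )
    return material[:32], material[32:]

def wrap_key(backup_key: bytes, password: str) -> dict:
    """Protect a 32-byte backup key for storage on the central server."""
    if len(backup_key) != 32:
        raise ValueError("this sketch only handles 32-byte keys")
    salt = secrets.token_bytes(16)  # fresh salt, so the pad is never reused
    pad, mac_key = _derive(password, salt)
    wrapped = bytes(a ^ b for a, b in zip(backup_key, pad))
    tag = hmac.new(mac_key, salt + wrapped, hashlib.sha256).digest()
    return {"salt": salt, "wrapped": wrapped, "tag": tag}

def unwrap_key(blob: dict, password: str) -> bytes:
    """Recover the backup key, or fail if the password is wrong."""
    pad, mac_key = _derive(password, blob["salt"])
    expected = hmac.new(
        mac_key, blob["salt"] + blob["wrapped"], hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, blob["tag"]):
        raise ValueError("wrong password or corrupted blob")
    return bytes(a ^ b for a, b in zip(blob["wrapped"], pad))
```

The point of the design is that the central server only ever holds the wrapped blob, so a weak password is the main way the keys could leak.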

The backup would be clever. It would identify files on your system which are common
around the network (i.e. files of the OS and installed software packages) and know it
doesn’t have to back them up directly. It just has to record their presence and the
fact that they exist in many places. It only has to transfer your own created files.
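
One way to picture the common-file check is by content hash. This is a sketch under the assumption that the network can supply a set of SHA-256 digests of widely held files; `plan_backup` and `common_digests` are hypothetical names, not part of any existing tool.

```python
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB blocks to keep memory use flat."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def plan_backup(paths, common_digests):
    """Split files into those to upload and those only recorded by hash."""
    to_upload, record_only = [], []
    for path in paths:
        digest = file_digest(path)
        if digest in common_digests:
            # Exists in many places already: note its presence, skip transfer.
            record_only.append((path, digest))
        else:
            to_upload.append((path, digest))
    return to_upload, record_only
```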

Your backups are compressed and sent to two or more different buddies each. Regular checks
are done to see if each buddy is still around. If a buddy leaves the net, the system quickly
finds other buddies to store data on. Alas, some files, like video, images and
music, are already compressed and won’t shrink further, so with two copies the backup needs
twice as much storage as the files took, though only for your own generated files. So you do
have to have a very big disk, about 3 times bigger than you need for your own data, because
you must store data for the buddies just as they are storing for you. But disk is getting very cheap.

(Another alternative is RAID-5 style. Here you distribute each
file across 3 or more buddies using RAID-5-style parity, so that any
one buddy can vanish and you can still recover the file. This means you
may be able to get away with much less excess disk space. There are also
redundant storage algorithms that let you tolerate the loss of 2 or even 3
of a larger pool of storers, at a much more modest cost than using double
the space.)
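
To make the parity idea concrete, here is a small sketch using plain XOR parity. It keeps parity in one fixed chunk, whereas real RAID-5 rotates parity across disks, and the function names are invented for illustration. Each returned chunk would go to a different buddy.

```python
def split_with_parity(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal chunks plus one XOR parity chunk."""
    size = -(-len(data) // k)                  # ceiling division
    padded = data.ljust(size * k, b"\0")       # zero-pad the last chunk
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytearray(size)
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return chunks + [bytes(parity)]

def recover(chunks: list, original_len: int) -> bytes:
    """Rebuild the data when at most one chunk (marked None) is missing."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost chunk")
    if missing:
        size = len(next(c for c in chunks if c is not None))
        rebuilt = bytearray(size)
        for c in chunks:
            if c is not None:
                for i, byte in enumerate(c):
                    rebuilt[i] ^= byte         # XOR of survivors = lost chunk
        chunks = list(chunks)
        chunks[missing[0]] = bytes(rebuilt)
    return b"".join(chunks[:-1])[:original_len]
```

With `k = 3`, the buddies together hold four chunks that add up to about 4/3 of the file's size, compared with 2 times the size for two full copies.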

All this is, as noted, automatic. You don’t have to do anything to make it happen,
and if it’s good at spotting quiet times on the system and network, you don’t even
notice it’s happening, except a lot more of your disk is used up storing data for
others.

It is the automated nature that is so important. There have been other proposals
along these lines, such as MNET and some commercial network backup apps, but never an app you
just install, do quick setup and then forget about until you need to restore a
file. Only such an app will truly get used and work for the user.

Restore of individual files (if your system is still alive) is easy. You have
the keys on file, and can pull your file from the buddies and decrypt it with
the keys.

Loss of a local disk is more work, but if you have multiple computers in
the household, the keys could be stored on other computers on the same
LAN (alas this does require UI to approve this) and then you can go to
another computer to get the keys to rebuild the lost disk. Indeed, using
local computers as buddies is a good idea due to speed, but they don’t
provide offsite backup. It would make sense for the system, at the cost of
more disk space, to do both same-LAN backup and offsite. Same-LAN for
hardware failures, offsite for building-burns-down failures.

In the event of a building-burns-down failure, you would have to go
to the central server and decrypt your keys with your password. Then you can get your
keys and find your buddies and restore your files. Restore would not
be ZUI, because we need no motivation to do a restore. It is doing regular
backups we lack motivation for.

Of course, many people have huge files on disk. This is particularly true
if you do things like record video with MythTV or make giant photographs,
as I do. These may be too large for backup over the internet.

In this case, the right thing to do is to back up the smaller files first,
and have some UI. This UI would warn the user about this, and suggest
options. One option is to not back up things like recorded video. Another
is to rely only on local backup if it’s available. Finally, the system
should offer a manual backup of the large files, where you connect a
removable disk (USB disk for example) and transfer the largest files to
it. It is up to you to take that offsite on a regular basis if you can.
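
The smaller-files-first rule can be sketched as a simple nightly plan. The byte budget here is an assumption (for example, upload rate times the hours the network is quiet), and `plan_night` is a hypothetical name; anything that ends up deferred is what the UI would warn about.

```python
def plan_night(files, nightly_budget_bytes):
    """files: iterable of (name, size_bytes). Returns (tonight, deferred)."""
    tonight, deferred = [], []
    remaining = nightly_budget_bytes
    # Smallest first, so many small documents are protected before any giant file.
    for name, size in sorted(files, key=lambda f: f[1]):
        if size <= remaining:
            tonight.append(name)
            remaining -= size
        else:
            deferred.append(name)   # candidate for local or removable-disk backup
    return tonight, deferred
```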

However, while this has a UI and physical tasks to do, if you don’t do
it, it’s not the end of the world. Indeed, your large files may get
backed up, slowly, if there’s enough bandwidth.

Distributed backups like DIBS have been around for a while. I do offsite backup today over the net because I have servers in multiple locations, though it could be easier than the way I do it.

My goal here is getting as close to ZUI as you can. This means not only that it's easy to install, and that you actually use it and don't forget to do things needed for your backup, but also that lots of other people use it too, so it's easy to find partners who have good bandwidth to you.

I only meant to point out a related open source solution that could be useful when implementing what you want. It might make sense to share your ideas with the DIBS folks, and find out where they are going with their project.

Your ZUI philosophy is a step in the right direction, but it's easier said than done. Have you considered spending some time adding such a ZUI to existing open source backup tools (e.g. DIBS)? I'm sure the developers could use your help.

One big difference - we back up all your files to as many destinations as you choose. Our research showed people liked knowing where their data was. Also, for practical restore reasons, it's best to know which machine has all your data so you're back up and running quickly.

One of the big challenges was teaching ZUI users how to back up to another machine. Obviously you can't discover your friend Bob down the street automatically. How do you "buddy up"?

We decided to go with the "friends" list where you invite people, and then allow them to use a specific machine. Of course, this is only required if it's not another one of your machines you're backing up to.

One of the big lessons learned about this ZUI app is the challenge in communicating its power. It looks so SIMPLE. How do you charge for something that's more advanced yet looks less sophisticated?

Imagine two cars. One has tons of really sexy buttons, dials on the dashboard, etc. It looks hi-tech. It looks complicated, wow, lots of value there.

Now there is our car, no buttons anywhere. The windows go up and down automatically, the AC? Automatic. Steering wheel? It drives itself.

We hid a few buttons and controls in a dusty corner, but you get my point. People often decide based on apparent sophistication.

Just because I don't display 20.392Kb/sec on the screen doesn't mean I don't know it and am not using that info to better your user experience.

There is no need to convince people to buy it. The central server needs some funding but only minimal.

However, one workable way to sell it is to make the product free, and then charge a fee — very clearly explained up front — when a restore is needed.

You can also pre-sell the restore fee, at a discount to paying it at restore time. I.e. “You can buy the product now including the restore tools and service for $30 or pay $60 when you need to restore.”

However, I was expecting this philosophy might eventually be adopted by an open source tool. For one thing, you want the comfort of knowing that you can see the source code of a tool that is going to store your most precious data on other people’s machines (friends or otherwise.)

I don’t consider it ZUI if it’s only easy to use because you have a paid sysadmin who installs and maintains it for you — or you end up paying a service provider to manage it. Of course we can solve our system management problems that way. (Or should be able to.)

The key to my proposal is that it has effectively no cost because it uses existing resources at night. It does cost disk space of course but that’s traded P2P. And it has close to zero administrative load even for the ordinary user without a paid sysadmin.

I am aware of many automated backup systems which require setup or paying money.

While there are reasons one might not want to go this far, it is conceivable an OS distribution could come with a backup system like this already turned on, so it is giving you offsite backup without you doing a single thing or paying a dime, unless you turn it off. Backup is one of those things that should “just happen.”

Right now you wouldn’t turn it on by default because people are not quite ready for the disk space cost. In addition, if they don’t choose a secure password for their account (to be invisible you would use the password the OS already demands they provide) they would be taking a greater risk with their confidential data, which again should not happen by surprise. Finally, they might not have flat rate bandwidth. However, we’re getting closer to a world where points 1 and 3 become unimportant, and the security issue would be the only question.