Interesting tarsnap statistics

I admit it: I'm a numbers junkie. I like taking streams of numbers and
looking for patterns; and I like trying to figure out the reasons behind
those patterns. Running my tarsnap
online backup service has provided me with a great source of numbers: I
keep extensive logs, and there are enough tarsnap users now that the
randomness of individual users is starting to get washed away. In the
interest of science, then -- or if not science, at the very least curiosity
-- here are some statistics I've gathered.

Average data stored: Ignoring inactive accounts (people who sign
up for tarsnap but never use it), the amount of data tarsnap users have
stored closely follows a lognormal distribution; the median is roughly
1 GB, while the mean is roughly 8 GB.
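
For anyone wondering how the mean can be eight times the median: in a
lognormal distribution the median is exp(mu) and the mean is
exp(mu + sigma^2/2), so the two figures above pin down both parameters.
The following is a minimal sketch of that arithmetic, assuming the fit is
exact (it is only approximate) and using the rounded 1 GB and 8 GB values;
the function names are mine, for illustration only.

```python
import math

def lognormal_params(median, mean):
    """Return (mu, sigma) of a lognormal with the given median and mean.

    Uses median = exp(mu) and mean = exp(mu + sigma**2 / 2).
    """
    mu = math.log(median)
    sigma = math.sqrt(2.0 * math.log(mean / median))
    return mu, sigma

def share_above(x, mu, sigma):
    """Fraction of a lognormal(mu, sigma) population with a value above x."""
    z = (math.log(x) - mu) / sigma
    # 1 - Phi(z), with the normal CDF written in terms of erf.
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

# Rounded figures from the text, in GB.
mu, sigma = lognormal_params(median=1.0, mean=8.0)
share = share_above(8.0, mu, sigma)

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"share of users above the mean under this model: {share:.1%}")
```

Under this idealized model sigma comes out a little above 2, and only
about 15% of users would sit above the mean. That is a property of the
fitted curve, not a number measured from the logs.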

Machines per user: Users can register any number of machines as
belonging to the same account; the service treats them entirely
independently aside from the financial/accounting aspects. Out of the
set of active tarsnap users, 57% have just one machine registered; 22%
have two machines registered; 9% have three machines registered; 7% have
four machines registered; and 5% have five or more machines registered.

Data downloaded: Backups are often described as a "write once,
read never" storage problem -- it's important that the data be available
if and when needed, but the hope is always that you won't ever need it.
So far, on average 3% of data stored on tarsnap has been downloaded each
month.

Frequency of archives: Approximately 30% of systems running
tarsnap have created an archive in the past 24 hours. Subjectively
(i.e. I'm too lazy to write a script to figure out exact statistics for
this, but I've noticed this by eye) it looks like most of these systems
are creating backups from cron jobs, since they create archives at the
same time each day.

Archive creation time of day: Archive creation is spread quite
evenly around the clock; the only statistically significant peaks are
at 06:00-06:59, 10:00-10:59, and 13:00-13:59. Cron jobs running at
6AM in UTC, EDT, and PDT time zones respectively, perhaps?
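
The time zone guess is easy to check. Here is a small sketch, assuming the
hours quoted above are in UTC and using fixed offsets of UTC-4 for EDT and
UTC-7 for PDT; the date is arbitrary and only there because datetime
objects need one.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets in hours from UTC (daylight time for the two US zones).
OFFSETS = {"UTC": 0, "EDT": -4, "PDT": -7}

def utc_hour_of_local_6am(offset_hours):
    """Return the UTC hour at which it is 06:00 in the given fixed offset."""
    tz = timezone(timedelta(hours=offset_hours))
    local = datetime(2000, 1, 1, 6, 0, tzinfo=tz)  # arbitrary date
    return local.astimezone(timezone.utc).hour

utc_hours = {name: utc_hour_of_local_6am(off) for name, off in OFFSETS.items()}

for name, hour in utc_hours.items():
    print(f"06:00 {name} is {hour:02d}:00 UTC")
```

The three results are 06, 10, and 13, which line up with the three peaks.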

Archive creation time of hour: In contrast, archive creation is
not evenly spread around the hour: There is a large spike in traffic
at :00, and smaller spikes at :10, :15, :25, :30, and :50. This is a
very clear sign of cron jobs.
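
Counting archives by minute of the hour takes only a few lines. The sketch
below is illustrative rather than the script I actually use: the timestamp
format and the sample values are made up, and a real log would need its
own parsing.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamp format; adjust to match the real log.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

def minute_histogram(timestamps):
    """Count how many timestamps fall on each minute of the hour (0-59)."""
    return Counter(
        datetime.strptime(ts, TIMESTAMP_FORMAT).minute for ts in timestamps
    )

# Made-up sample data, only to show the shape of the result.
sample = [
    "2000-01-01 06:00:03",
    "2000-01-01 10:00:01",
    "2000-01-02 13:15:00",
    "2000-01-02 03:37:42",
]
hist = minute_histogram(sample)

for minute, count in sorted(hist.items()):
    print(f":{minute:02d}  {count}")
```

With real data, cron-driven backups show up as tall bars at the minutes
people tend to type into a crontab, while hand-run backups spread out
across all sixty minutes.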

Unearned revenue: Tarsnap works by having people prepay for their
usage (with so many people storing under 1 GB and paying under $0.30
each month, charging credit cards every month would be infeasible). The
money people have sitting in their tarsnap accounts waiting to be spent
is defined by accountants as "unearned revenue". At the present time,
tarsnap has roughly 6 months of unearned revenue -- that is, on
average, tarsnap users have enough money in their accounts to pay
for the next 6 months of their usage. Naturally, this number varies
dramatically from account to account, and is negatively correlated with
the amount of data stored -- if you only have 1 MB of data stored, $5
will last you over a thousand years. (For the record: The money tarsnap
users have deposited into their accounts but not spent yet is sitting
and waiting safely -- it's not my money yet, so I'm not going to do
anything crazy with it.)
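
The "thousand years" figure follows from the storage price alone. A quick
sketch of the arithmetic, assuming 1 MB = 0.001 GB, the $0.30 per GB-month
storage price, and no bandwidth charges at all (so it overstates how long
the money lasts for anyone who actually uploads or downloads data):

```python
STORAGE_PRICE_PER_GB_MONTH = 0.30  # USD, from the pricing quoted in this post

def months_of_storage(balance_usd, stored_gb):
    """Months a balance covers if storage is the only charge."""
    return balance_usd / (stored_gb * STORAGE_PRICE_PER_GB_MONTH)

months = months_of_storage(balance_usd=5.00, stored_gb=0.001)  # 1 MB stored
years = months / 12

print(f"{months:,.0f} months, or about {years:,.0f} years")
```

That works out to roughly 16,667 months, or a bit under 1,400 years.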

Payment sizes: Tarsnap users can deposit money into their accounts
whenever they like, in any amount so long as it's $5 or more (allowing
smaller payments would result in too much being eaten up by processing
fees). Of the payments received to date, 26% have been $5; 24% have
been $10; 15% have been $20; 11% have been $50; 5% have been $100;
5% have been $15; 4% have been $30; 3% have been $25; and 7% have been
other sizes. The popularity of 5/10/20/50/100 is unsurprising (give
people freedom to pick numbers, and they'll usually pick round numbers),
but I'm not sure why $15 and $30 are so popular (even at 5% and 4% of
the payments, their popularity is statistically significant). Perhaps
tarsnap's pricing of $0.30 per GB of bandwidth and $0.30 per GB-month
of storage is responsible for making people "think three"?

Is there anything else interesting I can pull out of my log files?
Submit questions via the comments below. I will not publish information
from which tarsnap's revenue or profits can be derived (since tarsnap's
profits are my income, I consider that to be personally private), nor
will I publish information which could be tied to individual tarsnap
users (e.g., I will not answer questions like "what is the most data any
one user has stored"); but aside from those limitations, anything is
fair game.