Arq recently reported hundreds of GB of missing files, across multiple backup targets. This is so at odds with Amazon Glacier’s reputed 11-nines durability that I’m guessing it’s due to an application bug. It would not surprise me if the files are still there; Arq just isn’t seeing them. In any event, my strategy is to have multiple cloud backups—Arq and CrashPlan (which has been working very well recently)—so this got me thinking about possibly adding a third.

The obvious choice is Backblaze. It has a native Mac app, is developed by ex-Apple engineers, and sponsors many fine podcasts.

I’d previously been hesitant about Backblaze because of the way it handles external drives. I’ve read about problems with large bzfileids.dat files sucking RAM and preventing backups entirely once they get too large. It’s also worrisome that it only retains deleted files for 30 days, meaning that a file is truly lost if I don’t notice that it’s missing right away. And if, for some reason, my Mac doesn’t back up for 6 months, Backblaze will expunge all my data, even if my subscription is still paid-up. The situations in which my Mac is not able to back up for a while are exactly the ones in which I (or my survivors) would want to be able to depend on a cloud backup!

As a programmer, I especially care about metadata. But I think most users would as well, if they knew to think about it. For example, losing dates can make it harder to find your files (e.g. they disappear from smart folders or sort incorrectly), even leading to errors (e.g. not finding the correct set of invoices for a time period). You would never use a backup app that didn’t remember which folders your files were in, so I don’t know why people consider it acceptable to lose their Finder tags. (If you use EagleFiler, it can restore the tags for you.)

This actually tests disk imaging products, a bad test for backup as items we fail on shouldn’t be backed up by data backup service.

Some people accept this explanation. I think it’s misguided and borderline nonsensical. True, Backup Bouncer tests some rather esoteric features, but Backblaze fails the basic tests, too. It would be one thing to say that there’s a limitation whereby dates, tags, comments, etc. aren’t backed up, but they’re actually saying that these shouldn’t be backed up. As if products that do back them up are in error. So presumably Backblaze doesn’t consider this a bug and won’t be fixing it.

Lastly, it’s a shame that Backblaze isn’t upfront about what metadata it supports. Some users are technical enough to investigate these things themselves. Others will have read the excellent Take Control of Backing Up Your Mac and seen its appendixes, which give Backblaze a C for metadata support. But most Backblaze users won’t know that a poor choice has been made for them until they need to restore from their backup.

Backblaze absolutely backs up and restores the “file creation date” and “file last modified date”. With these two caveats: Backblaze is only accurate down to Milliseconds (1/1,000ths of a single second) if you restore by USB hard drive restore, and only accurate to the second if you prepare a ZIP file restore. The latter is because that is a limitation of the ZIP file format.

The tool “Backup Bouncer” fails Backblaze on this test, and it irritates me. I feel “wronged” by this. The new APFS Macintosh file system has the ability to set the file creation date down to one BILLIONTH of a second, and I assume that just to be totally difficult Backup Bouncer gleefully sets every last bit.
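The precision limits mentioned in the quote can be illustrated with a small sketch. It relies on Python's standard `zipfile` module, which reads and writes only the base ZIP timestamp field; that field uses the MS-DOS format, whose seconds are stored in 2-second steps. ZIP extensions can carry finer timestamps, and nothing here shows which fields Backblaze actually writes, so treat this as an illustration of the format rather than of Backblaze's restore. The helper names are hypothetical.

```python
import io
import zipfile


def zip_roundtrip_timestamp(date_time):
    """Store one empty entry with the given timestamp and read it back.

    date_time is (year, month, day, hour, minute, second); the base ZIP
    field has no room for anything finer than that.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("example.txt", date_time=date_time)
        zf.writestr(info, b"")
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        return zf.getinfo("example.txt").date_time


def truncate_ns_to_ms(timestamp_ns):
    """Drop everything below one millisecond from a nanosecond timestamp."""
    return timestamp_ns // 1_000_000


original = (2017, 8, 24, 7, 28, 31)
restored = zip_roundtrip_timestamp(original)
# The odd second 31 comes back as 30 because of the 2-second granularity.
print(original, "->", restored)
```

Either kind of truncation changes a timestamp by a couple of seconds at most, which is a different failure from a creation date being replaced by some other date entirely.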

I’ve asked for clarification, but as far as I can tell the response is spreading incorrect information and seems to misunderstand several of the issues involved.

I started a Backblaze trial in order to verify the claim that the creation date is preserved, but I was unable to get an answer because 4 hours after Backblaze says that it backed up my test files, they were still not showing up in the restore interface, even though it purports to show the latest files as of this minute. After 5 hours, the files were available, I restored them, and the file creation dates were lost and changed to the modification date. The Backblaze restore also messed up the files’ modes, making them executable when they had not been.
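A check like the one described above can be scripted. The following is a minimal sketch, not the procedure actually used for this post: it compares the permission bits, modification date, and (where the platform exposes it) creation date of an original file and its restored copy. The function name and the 1 ms tolerance are assumptions; the tolerance is chosen to match the millisecond accuracy claimed in the quote. `st_birthtime` is available on macOS but not on every platform, so the sketch skips it when missing.

```python
import os
import stat


def metadata_diff(original, restored, tolerance_s=0.001):
    """Return a dict of metadata fields that differ between two files."""
    a, b = os.stat(original), os.stat(restored)
    diffs = {}

    # Permission bits only (e.g. whether the file became executable).
    mode_a, mode_b = stat.S_IMODE(a.st_mode), stat.S_IMODE(b.st_mode)
    if mode_a != mode_b:
        diffs["mode"] = (oct(mode_a), oct(mode_b))

    # Modification date, allowing for sub-millisecond truncation.
    if abs(a.st_mtime - b.st_mtime) > tolerance_s:
        diffs["mtime"] = (a.st_mtime, b.st_mtime)

    # Creation date, only where the OS reports one (macOS does).
    birth_a = getattr(a, "st_birthtime", None)
    birth_b = getattr(b, "st_birthtime", None)
    if birth_a is not None and birth_b is not None:
        if abs(birth_a - birth_b) > tolerance_s:
            diffs["birthtime"] = (birth_a, birth_b)

    return diffs
```

An empty result means the restored copy matched on every field that was checked; a `birthtime` or `mode` entry would correspond to the two problems reported above.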

Update (2017-08-24): Backblaze support explained to me that it’s normal for there to be a delay, which can be from 1–8 hours, before the files are actually available for restore. This is because, although the file data has been sent to the server, the server can’t access the files until the client has sent the index that describes the changes. It typically waits a few hours before doing this. What this means is that, during those hours, the Backblaze client reports that the backup is complete (“You are backed up as of: Today, 7:28 AM”), but it’s actually not. If your Mac breaks or goes offline (e.g. you pack up your MacBook for a trip) before the index has been uploaded, it’s as if the backup never happened. I assume the delay before sending the index is some sort of optimization, so perhaps it’s justified, but I consider it a major bug that the client reports the files as backed up when you can’t actually restore them (no matter how long you wait).

The Backblaze employee replied about the file creation date issue. The gist of it is that the dates are not preserved when restoring via the network. However, you can pay $99 (flash drive) or $189 (hard drive) for them to mail you your data, and in this format the dates will be preserved. If you mail the drive back (sounds like you have to pay shipping) they will refund the cost. I have not verified that this method works; however, I can confirm that the index file that’s sent to the Backblaze server contains the correct information for the creation dates.

I started getting emails warning that all of my external drives were offline and my data would soon be deleted. Instead of “Very sorry about that, here’s how to fix the issue,” I got this long response about the ways their system looks for new files in serial and it can get jammed and start ignoring everything, with no apology, no acknowledgement this was their issue, and no solution. I had to go fishing for solutions and drag the information out of them to finally figure out what I needed to do. Which it turns out is to get back an internal drive (totally unrelated to the other drives Backblaze abandoned) I had physically removed and repurposed, put it back in the computer, wait a long time for Backblaze to see it, then uncheck that drive in Backblaze and remove it again.

[…]

The client will lie to you and you never know what’s really backed up. Even if you use the secret alt-click to force a full drive scan, it can still miss files and tell you fully backed up when files from days ago are still nowhere to be found. Luckily I’ve never actually needed to do a restore, but I almost thought I did one time and would have been furious at all the missing files I noticed.

18 Comments

Unfortunately, I have had the same thing happen to me multiple times with Arq. It lost track of a couple backup sets and Stefan couldn't really explain why that happened. Then, it lost track of my Glacier backup, which was over 500 GB of stuff. Deleting it or fixing it on Glacier takes days or weeks to resolve, and it would take me about a month to push that much data back up over Comcast. So, reluctantly, I ditched Arq and went to BackBlaze, which appears to have been backing up perfectly well for the past 16+ months.

The one time I had a real issue with a hard drive, though, that affected my Aperture library, I considered restoring from BackBlaze. They basically give you a big disk image, and it takes time to "assemble" it. With my (now larger) 700 GB Aperture library, it took them roughly 36 hours to do this, and it was going to be a 700 GB disk image! Given this wasn't a "your house burned down" situation, I elected instead to use Time Machine, which restored things perfectly. So I'd probably only be restoring from BackBlaze as an absolute last resort, and be wary of the inconveniences it has for "big" things where you want to do large restores. Maybe they can mail a hard drive; that's probably what I'd look into first in a disaster recovery situation. If it were a DR situation I could recreate tags and the like, it wouldn't be a huge deal to me. And my file names have date info in them for those that are critical (because OS X loses creation dates if you use DropBox or iCloud and have the files on 2 machines); I could use a script to reset the creation dates based on file names if need be.
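The first half of such a script, recovering a date from a file name, might look like the sketch below. The comment does not say which naming scheme is used, so the `YYYY-MM-DD` pattern and the function name are assumptions. Writing the recovered date back as a creation date is platform-specific and is left out on purpose: Python's `os.utime` only sets access and modification times.

```python
import re
from datetime import datetime

# Assumed naming scheme: a YYYY-MM-DD date somewhere in the file name.
DATE_IN_NAME = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def date_from_filename(name):
    """Return the date embedded in a file name, or None if there is none."""
    match = DATE_IN_NAME.search(name)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return datetime(year, month, day)
    except ValueError:
        # Digits matched the pattern but do not form a real calendar date.
        return None
```

For example, `date_from_filename("2014-03-09 Invoice.pdf")` yields March 9, 2014, while a name without a date yields `None` so the file can be skipped.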

These issues with Backblaze were the reason I switched to Crashplan two years ago. The issue with external drives was my main concern but also metadata and control over what is being backed up. Also, as I remember it, Backblaze would not allow backups of network attached storage while Crashplan did.

Looking more into it I found more advantages with Crashplan. The family plan is great. The ability to also back up to an external drive, a drive on the LAN or even a drive over WAN simultaneously meant I could kiss Time Machine goodbye (no more start-over-because-of-corrupt-Time-Machine-backup, no more fans kicking in every hour). Also, there are a lot of settings for those who want more control, like which network interface to back up over. For the security minded there is the possibility to use your own encryption keys separate from the CrashPlan user account.

CrashPlan is far from perfect (initial backup takes forever, at least from Sweden) but I found it far better than Backblaze even though Backblaze isn't really bad, just not as good. I really don't see Java as a big problem with CrashPlan and I certainly don't think it's a deal breaker. A lot of other factors are more important.

The main disadvantage of Java (and CrashPlan) is memory usage. It depends on your backup set sizes, but on my family's Macs it often uses the best part of 1 GB (currently 828 MB RPRVT on my stepmom's Mac backing up all user data, versus 162 MB on my Mac where CrashPlan is just backing up VM images). It doesn't suck a lot of CPU and has good throttling controls to avoid backing up while you're using your Mac or while you're on battery, on a particular wireless network, etc. if you wish.

How much RAM does Backblaze use? I've never tried it but just assumed it was a lot less.

On the other hand, memory is cheap and unless you're extremely cost-constrained I see no reason not to get the maximum RAM on every Mac you buy. CrashPlan is by far the most reliable piece of Mac backup software I've ever used. I've seen a few small glitches here and there, mostly network backups stalling for no apparent reason, but they've only affected one of multiple backup destinations, they've all resolved themselves in a day or two, and there's absolutely never been any associated data loss.

Personally, I think I'm finally ready to dump Time Machine for CrashPlan for user files + a weekly SuperDuper! clone for everything else. I've been meaning to write a blog post on this for months.

@Nicholas The best CrashPlan tip I have is to turn off the live filesystem watching. It slows things down (especially if you don’t have an SSD) for little benefit. CrashPlan’s using 672 MB of memory for me right now to back up about 1.4 million files. I used to have lots of problems with CrashPlan not being able to connect to the server, for weeks at a time, but lately it’s been working well. Never had any problems with it for other family members.

Based on what I read about the bzfileids.dat files, it sounded like Backblaze also uses lots of RAM, and has limitations with large numbers of files. Arq seems to be the most efficient for lots of files.

"Arq recently reported hundreds of GB of missing files, across multiple backup targets. This is so at odds with Amazon Glacier’s reputed 11-nines durability that I’m guessing it’s due to an application bug. It would not surprise me if the files are still there; Arq just isn’t seeing them."

Hasn't Arq had intermittent problems with Glacier since they first implemented support for it? Or put another way, doesn't Glacier perhaps have hacky support for things like Arq?

Thanks for putting this all in one article. I've known for a while about Backblaze's metadata problems, but not with a recent summation like this one.

You might want to check out the new Arq 4 Glacier backup method. I believe the author is using a new Glacier backup method that is less error-prone. I haven't moved my files over to it yet, but it may be better than what you experienced.

I picked up the link to this thread from a podcast and I'm seeing now that it is pretty old…

I'll post my thoughts about Backblaze nevertheless.

Volatile backups

The absolute show stopper with Backblaze is indeed the fact that it forgets your backups when you haven't been backing up for some weeks. For example external disks. WTH!? This giant flaw is not even worth being discussed.

The meta data

Last time I seriously tried Backblaze it still didn't respect the com_apple_backup_excludeItem extended attribute. This attribute works with Time Machine, with CrashPlan and with Arq. It might seem finicky but I expect a backup system to respect the rules of the OS it is working on.

On their page you find cheese like "Made by ex-Apple Employees" and "Native and Integrated". Obviously they are aiming at CrashPlan's Java nature. And indeed when you first open the Backblaze PrefPane you truly have the impression of seeing an app written for the Mac (contrary to CrashPlan).

But this first impression is short-lived:

When you start to include/exclude files/folders you are already facing an interface that is absolutely inferior to CrashPlan's Java app, and even inferior to Arq's pretty clunky interface.

Besides that, have you noticed the default exclusions? .dmg and .sparseimage are excluded by default. Heck, a good part of my data is stored as dmg or as sparseimage! By the same logic you could exclude all zip or tar archives by default. I guess these guys don't even know what these extensions mean.

And then the above-mentioned meta-data issues. So, "Native and integrated"? Yeah, right.
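To see how much data the default exclusions mentioned in this comment would leave out, one could list the affected files before trusting a backup. This is a rough sketch: the two suffixes come from the comment itself, while the function name and the case-insensitive matching are assumptions and may not reflect how Backblaze actually matches file names.

```python
from pathlib import Path

# Extensions named in the comment as excluded by default.
EXCLUDED_SUFFIXES = {".dmg", ".sparseimage"}


def files_skipped_by_extension(root, suffixes=EXCLUDED_SUFFIXES):
    """Return all files under root whose extension is in the excluded set."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in suffixes
    )
```

Summing `path.stat().st_size` over the result gives a quick estimate of how much data would need its exclusion rule removed or a different backup destination.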

[…] in at $50/device/year for unlimited data with no weird file restrictions, but there’s some wonkiness about file permissions and time stamps, and it also only retains old file versions/deleted files for 30 […]