How to roll your own backup solution with BorgBackup, Rclone, and Wasabi cloud storage

Protect your data with an automated backup solution built on open source software and inexpensive cloud storage.


For several years, I used CrashPlan to back up my family's computers, including machines belonging to my wife and siblings. The fact that CrashPlan was essentially "always on" and doing frequent backups without ever having to think about it was fantastic. Additionally, the ability to do point-in-time restores came in handy on several occasions. Because I'm generally the IT person for the family, I loved that the user interface was so easy to use that family members could recover their data without my help.

Recently CrashPlan announced that it was dropping its consumer subscriptions to focus on its enterprise customers. It makes sense, I suppose, as it wasn't making a lot of money off folks like me, and our family plan was using a whole lot of storage on its system.

I decided that the features I would need in a suitable replacement included:

Cross-platform support for Linux and Mac

Automation (so there's no need to remember to click "backup")

Point-in-time recovery (or something close) so if you accidentally delete a file but don't notice until later, it's still recoverable

Low cost

Replicated data store for backup sets, so data exists in more than one place (i.e., not just backing up to a local USB drive)

Encryption in case the backup files fall into the wrong hands

I searched around and asked my friends about services similar to CrashPlan. One was really happy with Arq, but no Linux support meant it was no good for me. Carbonite is similar to CrashPlan but would be expensive, because I have multiple machines to back up. Backblaze offers unlimited backups at a good price (US$ 5/month), but its backup client doesn't support Linux. BackupPC was a strong contender, but I had already started testing my solution before I remembered it. None of the other options I looked at matched everything I was looking for. That meant I had to figure out a way to replicate what CrashPlan delivered for me and my family.

I knew there were lots of good options for backing up files on Linux systems. In fact, I've been using rdiff-backup for at least 10 years, usually for saving snapshots of remote filesystems locally. I had hopes of finding something that would do a better job of deduplicating backup data though, because I knew there were going to be some things (like music libraries and photos) that were stored on multiple computers.

I think what I worked out came pretty close to meeting my goals.

My backup solution

Ultimately, I landed on a combination of BorgBackup, Rclone, and Wasabi cloud storage, and I couldn't be happier with my decision. Borg fits all my criteria and has a pretty healthy community of users and contributors. It offers deduplication and compression, and works great on PC, Mac, and Linux. I use Rclone to synchronize the backup repositories from the Borg host to S3-compatible storage on Wasabi. Any S3-compatible storage will work, but I chose Wasabi because its price can't be beat and it outperforms Amazon's S3. With this setup, I can restore files from the local Borg host or from Wasabi.

Installing Borg on my machine was as simple as sudo apt install borgbackup. My backup host is a Linux machine that's always on with a 1.5TB USB drive attached to it. This backup host could be something as lightweight as a Raspberry Pi if you don't have a machine available. Just make sure all the client machines can reach this server over SSH and you are good to go.

On the backup host, initialize a new backup repository with:

$ borg init /mnt/backup/repo1

Depending on what you're backing up, you might choose to make multiple repositories per machine, or possibly one big repository for all your machines. Because Borg deduplicates, if you have identical data on many computers, sending backups from all those machines to the same repository might make sense.
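For example, you might create one repository per machine on the backup host and point each client at its repository over SSH. The paths and hostnames below are hypothetical, and note that newer Borg releases require choosing an encryption mode at init time:

```shell
# On the backup host: one repository per machine. The repokey mode stores
# the encryption key inside the repo, protected by your passphrase.
borg init --encryption=repokey /mnt/backup/laptop1
borg init --encryption=repokey /mnt/backup/desktop1

# On a client: address its repository over SSH.
export REPOSITORY='backupuser@backuphost:/mnt/backup/laptop1'
borg list "$REPOSITORY"
```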

Installing Borg on the Linux client machines was straightforward. On Mac OS X I needed to install Xcode and Homebrew first. I followed a how-to to install the command-line tools, then used pip3 install borgbackup.

Backing up

Each machine has a backup.sh script (see below) that is kicked off by cron at regular intervals; it will make only one backup set per day, but it doesn't hurt to try a few times in the same day. The laptops are set to try every two hours, because there's no guarantee they will be on at a certain time, but it's very likely they'll be on during one of those times. This could be improved by writing a daemon that's always running and triggers a backup attempt anytime the laptop wakes up. For now, I'm happy with the way things are working.
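The scheduling itself is plain cron. As an illustration (the script path and log file are hypothetical), a laptop's crontab entry to attempt a backup every two hours might look like:

```shell
# m  h    dom mon dow  command
0  */2  *   *   *    /home/user/bin/backup.sh >> /home/user/.backup.log 2>&1
```

Because the script exits early if a backup set already exists for today, the repeated attempts are cheap.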

I could skip the cron job and provide a relatively easy way for each user to trigger a backup using BorgWeb, but I really don't want anyone to have to remember to back things up. I tend to forget to click that backup button until I'm in dire need of a restoration (at which point it's way too late!).

The backup script I'm using came from the Borg quick start docs, plus I added a little check at the top to see if Borg is already running, which will exit the script if the previous backup run is still in progress. This script makes a new backup set and labels it with the hostname and current date. It then prunes old backup sets with an easy retention schedule.

# Setting this, so you won't be asked for your repository passphrase:
export BORG_PASSPHRASE='thisisnotreallymypassphrase'
# or this to ask an external program to supply the passphrase:
export BORG_PASSCOMMAND='pass show backup'

# Use the `prune` subcommand to maintain 7 daily, 4 weekly and 6 monthly
# archives of THIS machine. The '{hostname}-' prefix is very important to
# limit prune's operation to this machine's archives and not apply to
# other machines' archives also.
borg prune -v --list $REPOSITORY --prefix '{hostname}-' \
    --keep-daily=7 --keep-weekly=4 --keep-monthly=6
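The excerpt above omits the step that actually creates the archive. In the Borg quick-start script it is a `borg create` call along these lines (the backed-up path and excludes here are placeholders):

```shell
# Create a new archive labeled with the hostname and today's date,
# backing up the home directory while skipping the cache.
borg create -v --stats \
    $REPOSITORY::'{hostname}-{now:%Y-%m-%d}' \
    /home/user \
    --exclude '/home/user/.cache'
```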

The first synchronization of the backup set to Wasabi with Rclone took several days, but it was around 400GB of new data, and my outbound connection is not super-fast. But the daily delta is very small and completes in just a few minutes.
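The synchronization itself is a single Rclone command run on the backup host. The remote name `wasabi` and the bucket name are whatever you set up with `rclone config`:

```shell
# Mirror the local Borg repositories to the Wasabi bucket.
rclone sync /mnt/backup wasabi:my-backup-bucket
```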

Restoring files

Restoring files is not as easy as it was with CrashPlan, but it is relatively straightforward. The fastest approach is to restore from the backup stored on the Borg backup server. Here are some example commands used to restore:
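For example, finding and pulling back a single file from the repository on the backup host looks something like this (the archive and file names are hypothetical):

```shell
# List the archives in the repository.
borg list /mnt/backup/repo1

# List the contents of one archive.
borg list /mnt/backup/repo1::myhostname-2017-10-01

# Extract one file from that archive into the current directory.
# Note that extract paths are relative (no leading slash).
borg extract /mnt/backup/repo1::myhostname-2017-10-01 home/user/Documents/important.txt
```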

If something happens to the local Borg server or the USB drive holding all the backup repositories, I can also easily restore directly from Wasabi. If the machine has Rclone installed, using rclone mount I can mount the remote storage bucket as though it were a local filesystem:
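A sketch of that fallback, with the remote, bucket, and mountpoint names hypothetical:

```shell
# Mount the Wasabi bucket locally via FUSE.
mkdir -p /mnt/wasabi
rclone mount wasabi:my-backup-bucket /mnt/wasabi &

# Then restore with borg against the mounted repository,
# exactly as if it were local.
borg list /mnt/wasabi/repo1
borg extract /mnt/wasabi/repo1::myhostname-2017-10-01 home/user/Documents/important.txt
```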

How it's working

Now that I've been using this backup approach for a few weeks, I can say I'm really happy with it. Setting everything up and getting it running was a lot more complicated than just installing CrashPlan of course, but that's the difference between rolling your own solution and using a service. I will have to watch closely to be sure backups continue to run and the data is properly synchronized to Wasabi.

But, overall, replacing CrashPlan with something offering comparable backup coverage at a really reasonable price turned out to be a little easier than I expected. If you see room for improvement please let me know.

This was originally published on Local Conspiracy and is republished with permission.


About the author

Christopher Aedo - Christopher Aedo has been working with and contributing to open source software since his college days. Most recently he can be found at Teradata where he serves as Director of Open Source, focusing on helping the organization embrace open source software through internal use and external contributions. When he’s not at work or speaking at a conference, he’s probably using a RaspberryPi to brew and ferment a tasty homebrew in Portland OR.

27 Comments

Hi Christopher, thanks for your great article. One doubt only: what do you mean with: "works great on PC, Mac, and Linux"? Do you mean "Windows PC, Mac, and Linux"? In my family there are some Windows PCs so I'm wondering if it's possible to extend your solution to support them.

Flavio, yes, I should have said Windows PCs. Borg Backup will run under the Windows 10 Linux subsystem, though their site says it's currently considered experimental. Presumably Rclone would work under the Linux subsystem as well, so it should be possible to run all of this on Windows 10.

Wow! I wasn't aware of Wasabi. I ended up replacing CrashPlan on my home server with a combination of CloudBerry and BackBlaze B2. It's stupid cheap for the amount of data that I have. There is a GUI for people who prefer it, but a CLI for admins and for a one-time payment of $30, it's hard to beat.

Yes, Borg can do encryption according to its site, so the backup could also be encrypted. Then it would not be so easily restorable directly from Wasabi; one would need to replace the Borg server first and then resync the Borg server with the encrypted data from Wasabi.

You can actually restore individual files easily, using 'borg mount' which will mount a backup as a FUSE filesystem.
You also don't need to check if borg is running in your script; operations like create will put a lock on the repo, and any subsequent operation will fail.

Hi Christopher,
what happens if you back up almost-identical directories to the same repo? (e.g., Box A backs up, then Box B, but B is missing some files that exist on A). Will these files vanish on A if you restore A from the repo?
How do you organize this?

Yes, this is indeed a problem! There need to be consistency checks before uploading!

First: Serious corruption will -- afaics -- occur when you use Rclone while one of your computers has an ongoing commit/backup. You said that the laptops back up irregularly, and afaics there is no checking for this involved here.
You hence need at least to add the `pidof -x borg` check from your backup.sh to your Rclone script.

Second: Mild corruption will occur when the backup process breaks at some point. E.g. laptop shuts off/powers off, WiFi breaks, ... You said that you want things to work unnoticed in the background. That makes a realistic chance for this to happen. Divide the risk by the awareness of the end user. :-)
Usually borg is able to repair these cases. BUT i) you have to notice and do this manually; ii) I don't know whether this always works / is bulletproof; iii) in the current setup described in the post, there is no checking whether the backup succeeded. With cron running in the background, it might happen that your backup fails once and all subsequent backups fail unnoticed.

Third: There are myriads of other potential vectors (e.g. data degradation [1], failing hardware [2], ...). You might add a `borg check` as well as a separate checksum comparison of both archives prior to Rclone.

I reply to myself :
The case described above is valid only for a small total file size.
I continued my tests with more data (2 GB) and only the newly added files were uploaded to HubiC.
To summarize: the best backup solution (encrypted and compressed) for my needs.

I observed the same thing, and even though it does start reusing files at some point, having to upload all the data every day bugs me. I'm going to find another backup program because of that. Duplicati is promising, but it's been alpha for ages and just went beta.

Maybe you should also mention the privacy aspect of a backup solution. All your family images go out to an external cloud server. I would not like to give all my private images/docs of kids and parents out of hand just to save a few bucks or have comfort.

Today it's very easy to do a backup system in house. If you use macOS Family, it has auto backups included to multiple external hard drives, and it's encrypted; best of all, it does all this without pushing any button. If you also need Linux, I think a Raspberry Pi solution with external backup discs is a good investment.

By the way, the comments captcha does not work if you have privacy mode on; I had to turn off safe sites to post this comment. It's also not good to give out captcha data from all your commenting users to an external commercial company like G who collects and uses data.

both borg and rclone have options to encrypt the data before sending it to remote storage, so the cloud server can't read it without a key. Of course, this means that you need to keep the key in a separate, safe place.

It's important to have off-site backups in case e.g. your house burns down.

borg requires you to choose an encryption mode when creating the repository, hence you would really have to want your files to lie unencrypted somewhere in order to end up that way. Otherwise BLAKE2b and SHA-256 are pretty solid and can be considered safe for uploading somewhere. And if you're really paranoid or eager, you can set your SSH encryption accordingly so a weaker cipher isn't used for uploading.

Great post and walk-thru of your efforts! I too have been using rclone and recently decided to package it with Docker to simplify deployment consistently across machines. Also allows easier run time configurations. Enjoy!

Thank you Christopher for this article! I'm facing the same issue, and a few weeks ago I decided to implement both backups.
However, I didn't know about Wasabi cloud storage. You recommend mounting the Wasabi storage to restore a specific file if needed (via borg mount). How is it seen from Wasabi's point of view? Isn't it considered a full download of the archive? If so, I think the solution is less cost-effective because Wasabi will apply fees for the full restoration.

I'm going to start this response with I COULD BE WRONG! (I don't want to take responsibility for someone being hit with a big transfer bill if I am mistaken...)

That said - my understanding is that when you mount an s3 bucket as a file system, you're able to access blocks randomly. So you wouldn't need to stream the entire backup just to get a small portion of it. In my own poking around for tests I was definitely able to pull out a file without (as far as I could tell) streaming the entire backup. I would suggest a quick test though to validate, as I am pretty sure you can do a partial restore without streaming the entire backup archive.

I read the whole section related to the billing policy. It's pretty clear but I want to make sure about the billing before sending my data.

---My use case---
I plan to send my local data backup archive every day to the Wasabi Cloud Storage. The backup archive will always have the same name so the archive on Wasabi will be overwritten every day.

Let's say my archive is 1 TB; the bill for 1 month should be $3.99.
If I understand correctly, I will be charged for 90 days of storage for this archive, even if I overwrite it the second day, right?

Then, I understand that 1 archive file will cost me $3.99 / 30 * 90 = $11.97.
As I will overwrite this archive every day with a new daily archive, is it right to think that for 1 year I will be charged:

Just a side note: if you save your passphrase in plain text, make sure to restrict access to that file, e.g. by setting file permissions to 700. On larger multi-user networks, it often happens that home folders are world-readable.

Hey Christopher, good post, thanks for taking the time! My biggest issue is having a stable solution to back up the Windows machines. For all my Ubuntu boxes, borg is an awesome way to go... I've been breaking my head trying to make the Windows port of it function, but no luck. I have an Ubuntu file server (that has borg on it at the moment) which is a VM backed by a Ceph cluster. What comes to mind for my use case? What would be a good way to get Windows files to the borg server? I'm almost thinking of setting up a temporary in-between Ubuntu server that would receive raw data from the Windows clients and then borg it over to the main repo... but it doesn't feel clean!

Currently, my backup solution is 2 BTRFS HDDs, with snapshots that are rotated: every day, when a new snapshot is taken and copied to the other disk, the oldest one is deleted. Example:

Day 1 bkps:

A, B, C, D

Day 2 bkps:

B, C, D, E

I'm thinking of getting rid of the second BTRFS snapshot backup and turning it into a borg repo; however, I'm failing to see whether it would be possible to delete an older borg backup and keep my data, the same way I do with BTRFS.

Because I'm afraid that if, let's say, 2 years from now I have to restore all my files because a meteor hit my house, I will have to have over 720 files (one per day, for 2 years) to get all my data. Is it possible to keep deleting the oldest borg backup and keep all the files?
