Version-controlled, automagical backup and file sharing system with Sparkleshare and Fedora

Photo credit: ardenswayoflife on Flickr, used under a CC-BY-SA 2.0 license.

The Burden of files

Okay, there’s a lot of problems:

Backing up your files is a pain in the butt.

Every time you upgrade your system, either in-place or a fresh install, it is a royal hassle to restore your files.

That file looks great on your laptop, but how do you show it to a colleague not sitting next to you easily? Ughh.

You upload files to a random directory on some web server you have some space on, quickly to show an idea to someone. Fast-forward some time, and you’ve got disorganized, poorly-named files scattered across multiple shell / other accounts all over the web, and you’re not sure what you have a copy of where, or which ones are being referred to from other places, so you’re terrified to delete any of them.

Well, crap. You’ve made a mistake. You can’t go back, can you? No version control…

I think we all know these problems pretty well. I’ve built a solution using Fedora and Sparkleshare – completely free and open source software – that over the past week has addressed all of these issues and has substantially improved the quality of my computing life. It backs my work files up to an internal corporate server and it backs my Fedora files up to a Fedora-maintained public server. I’m planning to configure it to back up some personal files to my Dreamhost account and some to my NAS at home.

What? What are you talking about?

Here’s how it works: I have a ‘SparkleShare’ folder in my home directory. Under that, I have a couple of subfolders:

workstuff

design-team (this is for Fedora stuff)

When I start a project, say it’s for work, then I create a new subfolder under the appropriate directory. So I’m starting a new project, let’s say it’s a logo design for project X. I have the following directory tree:

~/SparkleShare/workstuff/logos/

I open up Inkscape, and start working on the logo. I work on it for a while. Now I’ve got a draft I’d like to save and show to a colleague. From within Inkscape, I navigate to the SparkleShare > workstuff > logos folder and save the logo there. Then I can go to my internal server, and automagically, the logo file I just saved is at http://hostname/workstuff/logos/mylogo.svg. I can quickly copy/paste that URL from my browser and send it via IRC or email to my colleague.

No uploading, no fumbling to find a server that has space in an appropriate location, no waiting for the upload to happen, no coming up with some lame variation of ‘temp’, ‘tmp’, ‘foobar’, ‘blah’ to make a fresh folder to upload it to. Nope. It’s just there.
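To make the folder-to-URL mapping concrete, here is a small shell sketch. The helper name `share_url` and the base URL `http://hostname` are placeholders, not anything Sparkleshare provides, and it assumes the web server exposes the same directory layout as your SparkleShare folder, as in the logo example.

```shell
# Hypothetical helper: turn a path under the SparkleShare folder into the
# URL where the web mirror would serve it. Both defaults are assumptions.
share_url() {
    file=$1
    root=${2:-"$HOME/SparkleShare"}
    base=${3:-"http://hostname"}
    # Strip the local root prefix and append what is left to the base URL.
    printf '%s/%s\n' "$base" "${file#"$root"/}"
}
```

Calling `share_url ~/SparkleShare/workstuff/logos/mylogo.svg` would then print the link to paste into IRC or email.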

An SSH key, preferably with a passphrase. I am assuming you already have your SSH key configured to give you access to the remote server you intend to set your main git repo on. If not, the following resources might be helpful:

I’m assuming you’re using GNOME. Sparkleshare does work on KDE or XFCE or LXDE. What you lose by not using GNOME is the nautilus right-click extension that lets you – from gitorious or github repos – check out older versions of a document or grab a web link to the document. Not a huge deal.

Remove nautilus-python (I don’t know why, but if you have this installed and install Sparkleshare, Nautilus will segfault on startup. If you remove it, though, everything works fine.)

sudo yum remove -y nautilus-python

You now have Sparkleshare installed.

Step 2: Set up your main git repo

First you have to figure out where you want your main git repo to be. It needs to be in an account that your desktop/laptop can connect to. I set mine up in a remote home directory I own. Here’s how to do it:

We’re going to create the empty new git repo. In your remote home directory, type

git init --bare repo.git

(If this command doesn’t seem to work and you have permissions to do so, try installing openssh-server and updating git.)

Now we’re going to make a dummy clone of the repo to add an initial file and branch. To make things more interesting, we’re going to do this in your public_html directory, but you can do it anywhere you’d like in your home directory:

Go into your public_html directory.

cd ~/public_html

Clone it! Type

git clone ~/repo.git

Go into your new clone with

cd repo

Now make a new dummy file,

echo 'Fedora 15 rocks!' >> test.txt

Now commit it.

git add test.txt; git commit test.txt -m 'initial commit2'

Push it, making your initial branch.

git push origin master
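Put together, Step 2 amounts to something like the sketch below. The function name `init_main_repo` and its one argument (the directory that holds both repo.git and public_html) are made up for illustration, and it assumes git already knows your name and email for commits.

```shell
# Sketch of Step 2 as a single function. The name and argument are
# illustrative only; the base directory defaults to your home directory.
init_main_repo() {
    base=${1:-"$HOME"}
    git init --quiet --bare "$base/repo.git" || return 1
    mkdir -p "$base/public_html" || return 1
    # Cloning an empty repo may print a warning; that is expected here.
    git clone --quiet "$base/repo.git" "$base/public_html/repo" || return 1
    (
        cd "$base/public_html/repo" || exit 1
        echo 'Fedora 15 rocks!' >> test.txt
        git add test.txt
        git commit --quiet -m 'initial commit' || exit 1
        # Push whichever branch the clone started on (master on older
        # gits) and record it as the upstream for later pulls.
        git push --quiet -u origin HEAD
    )
}
```

Run it on the remote system as `init_main_repo` to use your home directory, or pass another directory to try it out somewhere disposable first.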

Step 3: Hook your local system up to your repo

Okay, now we’re going to get your local system hooked up to the repo you just created on your remote system. On your local laptop/desktop:

Turn Sparkleshare on. Click on its icon in your Applications menu, or run it from the terminal:

sparkleshare start

Right-click on the Sparkleshare icon in your system tray, and select “Add Remote Folder…”

Where Sparkleshare asks you: “Where is your remote folder?” select “On my own server.” In the field after “On my own server,” replace my fake credentials here with your own:

user@hellokitty.ponies.com

Where Sparkleshare asks you for the “Folder name”, fill in something like this, replacing my directory structure with your own on the remote server, and don’t forget the ‘.git’ on the end:

/home/user/repo.git

Sparkleshare should say something about ‘coffee o’clock’ and it should start pulling the remote files down.

Look in your Sparkleshare folder for the repo you just connected. Are the files from the server there?

Now we’re going to make sure it’s set up right. Create a text file in your new Sparkleshare repo directory (it will be somewhere like ‘/home/user/SparkleShare/repo’).
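From the terminal, that test might go something like this. Treat all of it as an assumption: the helper name, the commit message, and doing the commit and push by hand with git are illustrative choices. It appends to test.txt, since that is the file checked on the server afterwards.

```shell
# Hypothetical helper: append a line to test.txt in a checkout, then
# commit and push it by hand.
push_test_message() {
    repo_dir=$1
    message=$2
    (
        cd "$repo_dir" || exit 1
        echo "$message" >> test.txt
        git add test.txt
        git commit --quiet -m 'test message from my laptop' || exit 1
        # Push the current branch, whatever it happens to be called.
        git push --quiet origin HEAD
    )
}
```

For example: `push_test_message ~/SparkleShare/repo 'Pandas are awesome!'`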

Okay, that should have pushed up to the server, no issues. Make sure it did. On your remote system where git is:

Go into your remote clone.

cd ~/public_html/repo

Pull down the changes!

git pull

Make sure the changes worked. You can either visit the web side of your clone at (replace with the relevant hostname and username) http://hellokitty.ponies.com/~user/repo or you can just cat it, making sure our new panda message is present:

cat test.txt

All right, hopefully that wasn’t an issue either and your remote clone was able to pick up the panda PSA you pushed from your local laptop/desktop. Now we’re going to throw Sparkleshare into the mix. On your local laptop/desktop where Sparkleshare is installed:

Restart Sparkleshare. You can click on the icon or

sparkleshare start

Go to the Sparkleshare repo directory.

cd ~/SparkleShare/repo

Make a new test file. In honor of towel day today:

echo '42' >> theanswer.txt

Let’s see if it worked!!!!! On your remote system:

Go into the clone:

cd ~/public_html/repo

Pull it down!

git pull

Did it work??? It worked for me!!
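If you’d rather not log into the server every time you want to check, you can ask the remote from your laptop instead. This is only a sketch: the helper name `is_pushed` is made up, and it assumes the checkout’s remote is called origin, which is what a plain git clone gives you.

```shell
# Hypothetical helper: succeed if the checkout's current commit is also
# the tip of the same branch on the remote.
is_pushed() {
    repo_dir=$1
    branch=$(git -C "$repo_dir" rev-parse --abbrev-ref HEAD) || return 1
    local_tip=$(git -C "$repo_dir" rev-parse HEAD) || return 1
    # ls-remote asks the server directly; each output line is
    # "<commit id><TAB><ref name>", so keep only the first field.
    remote_tip=$(git -C "$repo_dir" ls-remote origin "refs/heads/$branch" | cut -f1)
    [ "$local_tip" = "$remote_tip" ]
}
```

Something like `is_pushed ~/SparkleShare/repo && echo 'synced'` then tells you whether your latest change has reached the server.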

Step 4: Set up automagical web mirroring of your repo’s content

Okay, so I did a sneaky thing. The clone of our repo, in the examples above, was set up in your remote public_html directory, on the same system your main git repo is on. Now, you could follow these directions on a third remote system if you’d like; no problem. Just make sure you’ve got your public SSH key configured on that third system and that you have access to write to a web-readable directory. These instructions should not be difficult to modify for that case; simply git clone your repo in a web-readable directory.

We are now going to hook this all up so that when you save a file on your local laptop/desktop, and SparkleShare checks it in, the web-readable clone of it all also automagically pulls down your change and makes the file available in a web browser. Woo!

On your remote system:

Get into the hooks directory where your main (not clone) git repo is (don’t forget the .git!):

cd ~/repo.git/hooks

We’re gonna set up a hook so whenever you push a file to the repo, it tells the web clone to pull. Create a file called ‘post-receive’ in ~/repo.git/hooks with the following in it, replacing ‘user’ with your own username on the remote system:
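A minimal sketch of such a hook, written as a helper that installs it for you. The helper name `install_web_hook` is made up, and the paths in the comments assume the layout used in this tutorial (main repo at /home/user/repo.git, web clone at /home/user/public_html/repo); adjust them to your own.

```shell
# Hypothetical helper: write a post-receive hook into the main (bare) repo
# that makes the web-readable clone pull after every push.
install_web_hook() {
    main_repo=$1    # e.g. /home/user/repo.git
    web_clone=$2    # e.g. /home/user/public_html/repo
    mkdir -p "$main_repo/hooks" || return 1
    cat > "$main_repo/hooks/post-receive" <<EOF
#!/bin/sh
# Git points GIT_DIR at the bare repo while a hook runs; clear it so the
# commands below operate on the web clone instead.
unset GIT_DIR
cd "$web_clone" || exit 1
git pull --quiet
EOF
    # Git only runs hooks that are executable.
    chmod +x "$main_repo/hooks/post-receive"
}
```

On the remote system you would run it once, for example `install_web_hook /home/user/repo.git /home/user/public_html/repo`. If the web clone lives on a third machine, the hook would need to reach it over ssh instead, which this sketch does not cover.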

Okay, let’s see if it worked. On your local laptop/desktop where Sparkleshare is installed:

Make sure Sparkleshare is running. If it’s not, start it.

sparkleshare start

Go into the right directory and make another test file.

cd ~/SparkleShare/repo; echo 'Fedora 15' >> latestfedora.txt

Okay, moment of truth. Is your new test file in your web directory? Either visit (replacing the username and hostname of course) http://hellokitty.ponies.com/~user/repo and look for a file named latestfedora.txt (if it’s there, it worked!) or on the remote system:

Go into the web-readable repo clone:

cd ~/public_html/repo

Look for the file.

ls

It worked!!

What do you think?

So, what do you think? Pretty awesome, right? Goodbye, Dropbox – a worthless solution when your corporate IT policy rightly frowns upon copying internal documents to third-party public servers. You can create as many repos like this as you like. You can host them on a remote server you have shell access to, as this tutorial assumed, or you can set them up on Gitorious (I recommend this one since it’s backed by free & open source software) or Github. Or a Fedora Hosted repo if you’ve got one. Each repo will have a folder under your ‘SparkleShare’ folder in your home directory, and anything in that folder is remotely backed up as soon as you save it there (provided Sparkleshare is running).

Some things you might want to consider with this setup

Set Sparkleshare to always run when your machine is booted. For GNOMEies:

mkdir -p ~/.config/autostart && cp /usr/share/applications/sparkleshare.desktop ~/.config/autostart/

Write down the remote git URLs of all of the remote repos your stuff is backed up to and keep it in a safe place. If disaster strikes and a panda mistakes your laptop for bamboo, it will make it that much easier to recover.
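You don’t have to collect those URLs by hand; git already knows them. A rough sketch, assuming every folder directly under ~/SparkleShare is an ordinary git checkout whose remote is called origin (the helper name is made up):

```shell
# Hypothetical helper: print "<folder><TAB><remote URL>" for every git
# checkout directly under the SparkleShare folder.
list_backup_remotes() {
    root=${1:-"$HOME/SparkleShare"}
    for dir in "$root"/*/; do
        # Skip anything that is not a git checkout.
        [ -d "${dir}.git" ] || continue
        printf '%s\t%s\n' "$(basename "$dir")" \
            "$(git -C "$dir" config --get remote.origin.url)"
    done
}
```

Running `list_backup_remotes > ~/sparkleshare-remotes.txt` gives you a file to print out or stash somewhere that is not your laptop.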


About Máirín Duffy

Máirín is a principal interaction designer at Red Hat. She is passionate about software freedom and free & open source tools, particularly in the creative domain: her favorite application is Inkscape. You can read more from Máirín on her blog at blog.linuxgrrl.com.

The patents have nothing to do with syntax. If they did, then either Gosling, Stroustrup, Ritchie, or Richards would be the owner of all the relevant patents. :p

The patents, if they are valid and exist at all, would be on language runtime implementation details. Many of these probably have no effect on Mono, either, as it is not implemented the same as upstream .NET (which is way faster than Mono, due to a significantly more advanced JIT compiler and runtime), and the ones that do affect Mono are just as likely to affect any implementation of any kind of any statically-typed register-based virtual-machine, which would pretty much mean the language design and development segment of the CS community is permanently fubar’d.

So far as why not Python, there’s plenty of reasons. However, since it’s impossible to offer criticism of Python without a Python fanatic going off the deep-end and calling you a slightly retarded Java user, I’ll just say that “it’s probably a matter of personal preference” and leave it at that.

So far as why not C+GObject, it’s probably because the authors value their sanity, and realize that it being 2011 means that they don’t have to pretend that C is still the ideal language for writing anything. And seriously, OOP in C is an abomination unto mankind. If we don’t curb our use of the GObject macro-hell soon, we may very well awaken Godzilla. True story. 😉

IIUC the data is stored in a plain Git repo (or are there any Sparkleshare specific extensions?); so while the client software might be problematic, the data is in an open format. If you suddenly have to pay fees for using Mono, you can still extract your files with command-line git, or by writing a Sparkleshare replacement in your favorite language.

The python/nautilus segfault problem should be vaguely worked out in the packages: it’s been removed from the Fedora 15 versions. Nautilus apparently expects its extensions to be python3, but pygtk is python2 – the two get mixed and it seems that hilarity ensues.

I don’t know how I’m going to solve that any time soon, it might be that we have to wait for pygtk to catch up or something, not sure. But at any rate, that bug shouldn’t raise its head any more.

I’m hoping we can also look at some seahorse integration or something to take more of the pain out of the keys and stuff.

I think this is a really awesome piece of the full puzzle. I mean, if I wanted to run all my own servers at home, including cloud, mail, file sharing, NAS, print server and so on, this is one of the apps I would like to see among them. Indeed, it could be an awesome Fedora remix if we collected services that keep my private data on my own machine, based on free, open source GPL programs.

I believe you may also agree that this is not an easy solution that can be happily implemented by a non-technical user. I have a better solution (imho) which is to use http://www.wuala.com

How it works is, you store files on your computer and they get synchronized to the cloud. I know Dropbox also does this. But the big difference is, the files are encrypted before they are synchronized to the server. The encryption key is with you and it never leaves your machine. So you can rest assured that others cannot read your contents.

The point here is that the solution documented here is 100% free and open source and you can deploy your own server wherever you want without handing your files over to some third party company, potentially signing over your rights to your own files via some strange implicit licensing agreements. Wuala is not free & open source software, so I have absolutely no interest in it.

Thanks for the suggestion for others who might not be so picky’s benefit, though!🙂

> Thanks for the suggestion for others who might not be so picky’s benefit, though!

Yes, I pasted for that reason.

Regarding the open-ness aspect that you mentioned, I see my data just as I see my money. I should be able to access it whenever I want and it should not be refused to me when I want to reach it. But it is impossible for me to lock all my money in my house. So I have to trust a bank to make my life easy. In the same way, I am trusting some online providers to make my life easy.

However, 99% of online vendors will have the capability to read my data. So I become choosy and not trust everyone. I trust only GMail and Wuala (because there is no cheaper alternative for such a huge high-available mailing system) The reason why I trust wuala is no one apart from me can read the data that I store with them. The way in which they get money is by paid accounts. I like this model (much like flickr).

The prime reason why none of these online services will have open source as their prime product is that it would affect their revenue. Consider if wuala was open source: somebody like Facebook could just take their source and kill their service. We have shifted from a desktop/server applications paradigm to a cloud storage paradigm where open-source tools will play a big role (apache, ha, hadoop etc.) but the prime business running service will not be open-sourced. We may have to get used to it.

I am not brain-washing you to start using wuala or some cloud service 😉 I am just trying to share my opinion on why open-ness may not matter more in cloud based services in the future (where the world is heading)🙂

> I am just trying to share my opinion on why open-ness may not matter more in cloud based services in the future (where the world is heading)🙂

Sure, and I wasn’t really taking it any other way. Although to be fair, I don’t think ‘cloud’ is an excuse to sweep software freedom under the rug. You say that open sourcing cloud solutions is a bad idea because anybody can take it and undercut you – I don’t think it’s that simple. I don’t want my data (no matter what kind of data it is) locked into some particular cloud vendor’s proprietary way of doing things – be it their particular API or way of organizing things – and not have the option to take it and move it to wherever else I want. Only with standardization is the user going to have any chance of having a decent solution and the software of having any impetus to not suck.

On the undercutting part, honestly, I am paying dreamhost right now to host my stuff. But I can take any of my things on Dreamhost in a heartbeat and move them anywhere else. I would very much prefer someone else to host and admin my stuff, since I don’t particularly enjoy doing it, but not at the cost of using proprietary software. Help me host open source server applications like Dreamhost does and admin them for me like they do, and I will gladly pay you whatever. (Oh, haha, this very blog is a good example, I’m using wordpress.com’s hosted service but WordPress is open source and I’m totally cool with that.)

I had an absolutely awful experience with gmail back in 2003 which led me to be so paranoid, I would never trust them (good luck to you.)

This is great stuff but I am wondering if you have looked at how one can do automatic encryption at the client side before sending data across the network. I would love to do automatic syncs to multiple servers but only if the data is encrypted transparently.

Hey Rahul, I haven’t tried it yet but I plan on experimenting with encryption when I set up my dreamhost backup. My thought is that you could do, client-side, git hooks for pre-push to encrypt and post-pull to decrypt… but I don’t know if it’d kill performance as to be useless. Also, it kills the web sharing aspect but you probably don’t care too much about that if you’re encrypting the data.🙂

Very nice Mairin. I’ve been running a system like this for many years now using Unison.

It syncs all my computers. On the server I can just symlink files and folders that I want to share. So I’ve symlinked e.g. my “public” folder, but also some “private” folders, like in Documents/Projects/Lumiera/images/ symlinked so that I can share it directly.

I’m not too fond of SparkleShare because dumping big files into Git just doesn’t sound all that nice. And well, unison is well proven technology and it Just Works(tm) and has for many years, so no use changing that. Also, it’s easy to set up once I wrote the script that I use.

(Yes, it also has version “backups” (3 older versions of all files, when they change), and if you want instant-sync you can use incron to watch for FILE_CLOSE and run the script on that)

Really nice software, indeed. Some problems I have with this, however:
– doesn’t run on my RHEL5 workstation at work (not sure if it’s due to RHEL5 or due to the NFS home directory)
– always connects to some public IRC channel on freenode on startup – WTF? Big no-go for usage at work

Let’s hope these will be fixed in future versions… Or if not, well the software is at least an inspiration on how to make a Dropbox replacement🙂