Synchronizing Your Life

Once upon a time, one computer was all you needed. All of your
documents lived on that computer, or a stack of floppies or
CD-ROMs nearby, and nowhere else. Those days are gone, much like
the one-car, one-TV, and one-iPod days.

Today I have my home computer and my wife has hers. There's
also my laptop, my daughter's laptop, my work computer, and my
file server. At any time I could find myself sitting in front of
any of these. Wherever I happen to be sitting, there is bound
to be a file sitting on one of the others that I would prefer to
have readily available. These files are mostly current projects
I'm working on. If inspiration strikes, I want to be able to open
the appropriate file, or create a new one, and start writing
without worry. The worry comes from keeping these files
synchronized across all of my logins on the various computers I
might sit in front of in a single day.

There are many ways of keeping files up-to-date across
multiple computers. The simplest is to carry everything around
with me on a USB key or other writable removable media. I do use
this for some files, mainly those that I want to keep very
secure. USB keys are sometimes inconvenient though. My file
server, for example, is stuck away in a closet, with the
keyboard, monitor, and mouse routed out to a little desk that
sits outside the closet. Getting to the USB ports on the back of
the server is not easy.

Another simple method is to copy the files back and forth
using scp like so:

scp -rp /home/me/Documents me@192.168.0.2:/home/me/

This works, but I quickly run into problems when I have
modified one file on the home computer and a different file on
the laptop. When I next scp, I am going to overwrite one of the
files, depending on which computer I initiate the copy from. To
prevent this, I need to scp at the end of every editing session,
but I don't always remember to do that.

Another problem with scp is that it always copies everything,
even if an identical copy exists at the destination. This is one
of the problems that rsync solves quite well. The above scp
command can be replicated with rsync like so:

rsync -avP /home/me/Documents me@192.168.0.2:/home/me

With rsync, any files that already exist unchanged at the
destination are not transferred again. This speeds up the
transfer time considerably. However, there is still the problem
of having modifications made on both sides. By default, rsync
only looks to see if the files differ in size and timestamp. It
doesn't care which file is newer; if a file is different, it gets
overwritten. You can pass the '--update' flag
to rsync which will cause it to skip files on the destination if
they are newer than the file on the source, but only so long as
they are the same type of file. What this means is that if, for
example, the source file is a regular file and the destination is
a symlink, the destination file will be overwritten, regardless
of timestamp. Even looking past its quirks, the --update flag
does not solve the problem because all it does is skip files on
the destination if they are newer, it doesn't pull those changes
down to the source computer.

Another problem that both scp and rsync have is versioning.
Once the files on the destination are overwritten there's no
going back to what was there before.

In order to keep files in sync on multiple machines and keep a
history of changes, the obvious choice is to use one of the many
version control systems that are out there. Git and Bazaar are
two popular choices. They have a steep learning curve, but once
you get past that, they become very useful in many situations.
Packages for both can be found in most package repositories. On
Ubuntu the packages for git and Bazaar are called git-core and
bzr respectively.

To use one of these to keep files in sync on multiple
computers, the sequence of events goes something like the
following. In the example I use git, but Bazaar is similar. One
final note on the example: computer 1 has an IP address of
192.168.0.1 and computer 2 has an IP address of 192.168.0.2.

To get started with git on computer 1:

cd /home/me/Documents/shared
git init
git add *
git commit -a

In the above commands I switch to the directory I want to put
under version control and use the 'git init' command to turn the
directory into a git repository. I then use 'git add *' to add
everything in the directory to the new repository. Lastly I check
everything in. Now on computer 2 I clone the repository from
computer 1 over ssh:

cd /home/me/Documents
git clone ssh://me@192.168.0.1/home/me/Documents/shared

This gives computer 2 its own complete copy of the repository.
After editing files on computer 2, I commit the changes there
with 'git commit -a'. Then, when I am back on computer 1, I pull
those changes down:

cd /home/me/Documents/shared
git pull ssh://me@192.168.0.2/home/me/Documents/shared

Now both computer 1 and computer 2 are again in sync. On the
off chance that the same files have been edited on computer 1 and
computer 2, git will let me know that there is a merge conflict.
These conflicts are usually easy to fix and nothing is lost since
a history of all changes is kept and I can revert back to any
previous version at any time.

The bad thing, as you may have noticed, is that the process is
labor and memory intensive. I say memory because I need to
remember to commit after making changes and then when I am on a
different computer I need to remember to pull down the changes
from the computer that I was on. In the example above that's not
a huge problem because there are only two computers, but with all
of the computers I regularly use, remembering where I've been
is a problem. I could
pull from every other computer every time I sit down in front of
one, but that is tedious and disruptive to my work flow. What I
really want is for the synchronization to happen in the
background.
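One rough way to push this bookkeeping into the background is a small helper run periodically from cron on each machine: commit anything new locally, then pull from a peer. The sketch below is only an illustration of the idea, not anything from the article; the repository path, peer address, and script name are placeholders following the example setup above, and it does nothing clever about merge conflicts, which still have to be resolved by hand.

```shell
# autosync: commit any local changes, then pull from a peer repository.
# Written as a shell function; to run it from cron, put a call to it in
# a script, e.g.:  */15 * * * * /home/me/bin/autosync.sh
autosync() {
    repo=$1    # local repository, e.g. /home/me/Documents/shared
    peer=$2    # peer to pull from, e.g. ssh://me@192.168.0.1/home/me/Documents/shared
    (
        cd "$repo" || return 1
        git add -A
        # Commit whatever changed; this is a harmless no-op when nothing has.
        git commit -q -m "auto-sync $(date '+%F %T')" 2>/dev/null || true
        git pull -q "$peer"
    )
}
```

With this in cron on every machine, each computer periodically records its own changes and picks up everyone else's, at the cost of an occasional merge conflict to clean up.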

I should mention one other way of using git or Bazaar to
manage documents: to work with one repository and then rsync it
to the different computers. The benefits of rsync still apply,
and you get versioning thanks to git or Bazaar. The downsides of
each method still exist though, including the problem of rsync
assuming that the source is the correct version and the
destination can be overwritten. With the addition of versioning,
this method is an improvement over rsync alone, but not by
much.

Wua.la was one option that I considered. It allows you to
trade storage with others on the Internet securely and there is
filesystem integration through a built-in nfs server. I wrote
about Wua.la here:
http://www.linuxjournal.com/content/online-storage-wuala.
Even though at that time I used Wua.la's NFS integration, I don't
do so now because I found it too buggy. So while I use Wua.la for
backups, it's not something I trust for behind-the-scenes
synchronization, and it does not do versioning.

What I want is something simple, integrated with my file
manager (Nautilus) and which works in the background without me
having to think about it. It should "just work".

There is one new program+service that, on first glance, fits
the bill perfectly: Dropbox.

Dropbox allows you to store your files online and keep them
synchronized between various computers. They provide clients for
Windows, Macintosh, and Linux, so it is about as cross-platform
as they come.

Setting up Dropbox on Linux involves installing their
nautilus-dropbox plugin for Nautilus and the dropboxd daemon that
communicates with the Dropbox servers. Packages that include both
programs are available on the http://getdropbox.com website for
Fedora 9, Ubuntu 7.10, and Ubuntu 8.04. The plugin is GPL'd, so
the source to it can also be downloaded and compiled manually if
you wish. The dependencies for the source include GTK 2.12 or
higher, GLib 2.14 or higher, Nautilus 2.16 or higher, Libnotify
0.4.4 or higher, and Wget 1.10 or higher. The dropboxd daemon is
closed-source and proprietary, unfortunately, so if you are not
on an x86 or x86_64 platform, you are out of luck.

Once the package is installed, to get Dropbox working all you
have to do is restart Nautilus with "killall nautilus" from a
terminal window or you can log out and then back in.

With that done, a little icon will appear in the notification
area and a configuration wizard will appear. After going through
the simple signup process (or connecting to an existing account)
a "Dropbox" folder will appear in your home directory and a brief
tour will appear. There is also a Dropbox contextual menu that
appears when right-clicking while in the Dropbox folder or its
subfolders.

Every file you put or create in the Dropbox folder is
automatically synchronized to your Dropbox account on
getdropbox.com and from there to every other computer that you
have Dropbox running on. This synchronization is automatic and
happens every time a file is saved, moved, or updated in any
way.

To help you keep track of the status of files, Dropbox adds
several emblems to Nautilus. Emblems are little icons that you
can add to other icons to indicate the status of a file. Dropbox
automatically adds these emblems and changes them as necessary.
This makes it very easy to see at a glance which files have been
successfully synchronized (green circle with a checkmark), and
which files are in the process of being synced (blue circle with
arrows). The notification area icon also animates to indicate
status.

Since Dropbox is a web-enabled technology, there is of course
a web front-end to your files. This comes in very handy when I am
on a computer I don't own and need to access a document.

One other nice thing that Dropbox does is versioning. Using
the web interface you can see previous versions and revert back
to them.

On the surface, Dropbox is everything I am looking for. It
keeps the files I'm working on in sync across all of the
computers I use, it does this in the background and provides
simple versioning in case I want to revert back to a previous
version of a file. Dropbox is not without issues though.

One issue is that it does not tolerate case changes in
filenames. I had one directory in my Dropbox directory named
'writing' that for some reason I wanted to rename to 'Writing'.
When I did this, Dropbox went crazy and started creating new
directories in an attempt to resolve the conflict. These new
directories kept proliferating to the point where I had to stop
Dropbox, delete all of them, and rename the 'Writing' directory
to 'My Writing'.

Another issue is that when I'm editing a file, I tend to save
often, and occasionally when I try to save, my text editor
reports that "the file has been modified" since my last save. I
don't know what Dropbox has done, but it has obviously done
something to make my editor think the file has been changed
outside of its control. I haven't lost any work as far as I can
tell, but messages like that worry me.

Program bugs aside, the biggest issue I have with Dropbox is
that it is not fully open source. The Nautilus plugin is, but
the plugin is useless without the behind-the-scenes service. With
the dropboxd service daemon not being open, what happens if
Evenflow (the company behind Dropbox) goes belly-up? I have no
idea what their financial situation is, but in today's economic
climate, anything is possible. Also, this is my data, and while
they say it is encrypted and protected, I don't trust them (or
anyone else, for that matter). There are too many horror stories
of supposedly private and secured data that has "gone missing" or
been out-and-out stolen.

I suppose what I really want to be able to do is run my own
Dropbox-like server on hardware that I have complete control over
using encryption I trust.

So, while Dropbox is a wonderfully useful program, the issue
of personal control and trust pretty much rules it out for the
synchronization of important files across the various computers I
use. Instead I use it for project files where the convenience
outweighs the loss of total security and control.

For those files I don't trust with Dropbox, I use a mixture of
the other methods, depending on the file, how paranoid I am
about losing it, and whether or not I want versioning. The great
thing is, there are a lot of options out there for making sure I
have the files I want, when and where I want them.

I'm no IT pro, but encrypted volumes are a neat way to ensure maximum security with Dropbox syncs, though you can forget about any merge solution or about multiple users working on the same file at the same time... I've never used TrueCrypt, but on OS X you can natively create small, encrypted "sparseimage" files with Disk Utility that are really easy to mount/unmount (like DMGs). Sparse images only take the space they need, though with a little overhead, maybe due to the encryption.

Dropbox is really neat once you get used to it. I would really enjoy a similar solution that lets you choose your own storage/versioning server to allow faster syncs (over a LAN or a VPN on a T1, say).

I have searched high and low for a good product to use for this. Most of them are Windows-only, make you pay for the product, or just don't do the right thing.

If you have patience, time, and pure will, you can give Novell's iFolder product a try (www.ifolder.com). They open-sourced some of it, but it's not very polished, and they don't appear to have updated the site in a long time. It has Linux and Windows clients. The backend server runs on Apache/Linux.

It doesn't do anything with version control, but I don't have much use for that. It does notify you about file conflicts, though. You install the client, select which folders you want to sync, set the sync time, and let 'er rip. It chugs along in the background and syncs all my files between my two workstations and my server. Everything stays in my network, in my control, and on three machines. It's not perfect, but it has been syncing my files for the past three years.

I have been using a central file server and mounting my folders over sshfs using the FQDN. This gives me full access to all of the data on my server wherever I go, as long as I have an Internet connection. I modified this recently to sync my Documents folder with the server rather than mounting it, so that I can access the data even when offline. I wrote a couple of scripts for NetworkManager, and it auto-syncs my data whenever it has a connection.
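For anyone curious how such NetworkManager hooks are wired up: a script dropped into /etc/NetworkManager/dispatcher.d/ is run with the interface name and an action ("up", "down", and so on) as its two arguments. The following is only a guess at a minimal version of what the commenter describes, with a placeholder server name and paths:

```
#!/bin/sh
# Hypothetical /etc/NetworkManager/dispatcher.d/50-docsync
# $1 = interface name, $2 = action
case "$2" in
    up)
        # A connection just came up: push local changes to the server,
        # then pull down anything newer from it.
        rsync -a --update /home/me/Documents/ me@fileserver:/home/me/Documents/ &&
        rsync -a --update me@fileserver:/home/me/Documents/ /home/me/Documents/
        ;;
esac
```

A two-way rsync like this still carries the conflict caveats from the article; it is convenience, not real merging.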

I don't see the point of synchronization if you have a (central) file server. I have a Linode where I put my data and use OpenVPN on all my machines. The Linode has WebDAV functionality protected by a self-signed browser cert (SSLVerifyClient require and SSLVerifyDepth 1).

I do see a point in using tools like Subversion, git, and the like when you work with multiple people on the same (source) files.

Lastly, I cannot grasp why people *voluntarily* hand out their data to strangers (companies) and blindly trust those companies. Especially the audience reading this type of magazine should know better and, more importantly, can do better.

This is a bit lengthy, but bear with me as I make a few points distinguishing 100% OSS from mixed-source software.

Companies behind Dropbox and other apps (like Zotero) may release plugins for browsers, file managers, etc. as open source, but the services stay closed. I'd almost rather they choose one or the other: free/open or proprietary/closed. This practice of hiding proprietary applications behind an open plugin is deceptive. The plugin download page may have "open source" written all over it, but never is it mentioned that the core of the system is closed. Still, Mr. Bartholomew goes to great lengths talking about how to install and use Dropbox, and how it has helped remedy some of his problems. He is promoting it. He does mention that the core/backend is not open, but it comes almost as a side note, too close to the end, after he has spent almost half the article praising Dropbox. And still, at the end of the article, he says it's incomplete and he doesn't trust it. If that's the truth, then why did he bring our attention to it? Why not just post a feature list and ask if anyone knows of a 100% OSS product that fits it? Articles like this are not fostering open source.

Any part of a system or application that is closed is a component that could vanish or become incompatible at any time. Then, to get access to your pictures, documents, source code, helper scripts, recipes, address books, etc., you must do whatever it is the corporation wants before you can get access to YOUR data. The corporation might just want you to use their new client, but that client might have bugs that won't get fixed for a while, or it may be too processor-intensive to run on your computer. The corporation may start charging for what was once a free service. The corporation may decide that your data, in some part, violates a copyright and simply delete it, even if it doesn't.

We need to encourage the truly 100% open alternatives out there and make them the best available. Use the 100% open software, give feedback, help code, donate cool graphics and stable plugins for browsers. Do this in your home, at your church, in your place of work. Talk about them at parties and other community events.

--

I just read Spideroak's comment. Thank you for letting us all know about that OSS option. I'll look into it tonight.

I've also used Unison in the past, and I am learning about its descendants, Harmony and Boomerang. I think if either of these could be combined with a version control system like bzr or git, with some good plugins for access and conflict resolution, then we'd really have something.

You are correct in your assessment of open source applications that are tied to proprietary services: The proprietary bit could disappear at any time, and any data that is tied to it along with it. User beware.

That said, am I promoting Dropbox? Yes. Absolutely. The reason is that I don't know of any other application+service that does what Dropbox does that is as easy to set up and use. I find Dropbox useful in certain situations, and I'm betting that others will too, which is why I wrote the article. I did not write the article to foster or promote open source, as heretical as that may sound. The purpose of the article was to put forth some solutions to the problem of keeping data in sync across multiple computers.

Do I like that the backend is proprietary? No. Do I wish that there was some fully open source version that I could use on my own server and desktop computers? Yes. But those concerns are secondary to the purpose of the article.

Since you brought it up, are there any truly open alternatives to Dropbox? I don't know of any. Spideroak has been mentioned, but as far as I can see they have only released the code to certain tools and libraries that they use. This is at best the same effort that Evenflow has done in releasing the source to their Nautilus plugin. The backend services in both cases are controlled by their respective companies and you can lose your data at any time at their discretion. Make backups!

Does this lack of open alternatives mean we should stay away from the imperfect solutions that do exist and tell everyone else to stay away as well? No. In my mind an imperfect solution is better than no solution at all.

If anything, the existence of these services should inspire developers to come up with truly open alternatives. This is what happened in the case of Apache, MySQL, Samba, and a host of other open source projects. But until those alternatives arrive, and are as easy to set up and use, there is nothing wrong in my mind with using the ones that exist today, even if they are flawed.

Of course, I may be completely wrong about there not being any completely open alternatives. If so, I welcome the correction. Please, bring out the alternatives! If they can pass my 10-minute up-and-running test, I might even use them. I'll promote them too. I'm already going to look into Spideroak, and I'm very willing to look at others.

Thank you for the clarification. I'm very gung-ho about OSS (as if you can't tell), and I sometimes forget that not all solutions are available as 100% OSS. I did research Spideroak (a bit), discovered the same thing you did, and was disappointed. Apparently, companies who provide this kind of solution feel there is a need to hold back. Where I work, it is the perceived value of the company to investors/shareholders that influences management to disapprove of releasing products as OSS. I believe they are afraid of the competition getting hold of their 'capital' and beating them at their own game. However, they forget that by using our code, the competition would be required to release their additions to the community, so we could all benefit. This is the next battleground for OSS. So, if you, dear reader, work for a company like this, I encourage you to keep up vocal and constant encouragement of decision-makers: everyone will benefit from open source.

I like Dropbox a lot; it's very polished and reliable (so far). As you say, the problem is that your data ends up behind a proprietary 'lock and key'. The other thing for me is the 2GB limit, although being free, it is understandable. ;-)

As well as DropBox, I'm also using Jungledisk, which is a front-end to the Amazon S3 cloud. In this case there is no limit on size, although you do pay Amazon for storage - I paid the grand total of 24 cents last month!

I should point out I'm only using it as a backup target for multiple machines, not to keep them all in sync.

> My file server, for example, is stuck away in a
> closet, with the keyboard, monitor, and mouse routed
> out to a little desk that sits outside the closet.
> Getting to the USB ports on the back of the server
> is not easy.

Yes, either turn the machine around or use it as a real file server! My file server is actually a NAS running FreeNAS. I rsync to it every day so it always has the latest versions of my files. And of course, since it's NAS, it is accessible by all machines on my network.

So I guess I don't understand why you would ever need a USB stick to grab files off it, unless for some reason you're not using it as a file server? There are still the versioning problems you mentioned, and when you're away from home your documents aren't accessible by default, but you could certainly open up your NAS so that they are!

The USB key is not for pulling files off of the server, it is to access files that aren't on the server and that I don't trust to transfer over the network. There are some things I don't trust ssh for.

These files include things like my private pgp keys: basically stuff that I never want to reveal to anyone. That's why I keep them on my USB key inside an encrypted Truecrypt volume.

You just gave some ideas about how to do the sync between *two* computers, but you never gave a final conclusion on how the method works for the many-computer case you described at the beginning. You discussed an online service at length (there are many of them), a solution which doesn't work for people with a lot of files and limited bandwidth. Regarding your solution with a repo, I am using something like that, but if you have binary files (and in my home directory I have many), it doesn't work well either. Thanks anyway.

My rsync and scp examples only used two computers because those synchronization methods are designed to work on a primary to secondary computer basis. What I mean by this is you have the primary source and then you have a secondary source (or multiple secondary sources). The secondaries are kept in sync with the primary, not the other way around. If you try to have more than one primary you will quickly run into trouble and you will lose data. These two methods can work well in a two computer setup, but not so well in a three or more computer setup.

I used two computers in the git example because it is easier to explain. To expand to 'N' computers, you just follow the same steps.

Once the repository is initialized, and you've cloned from computer one to computer two, do the same thing on computer three and computer four (and on all other computers) that you did on computer two. The trick is to remember to do a git pull from every computer you have made changes on whenever you sit down to make new changes (wherever you happen to be sitting). Like I said in the article:

I could pull from every other computer every time I sit down in front of one, but that is tedious and disruptive to my work flow.

The key word I want to point out is "every". In order to make git work across 2, 3, 4, or more computers, you have to make sure to pull from all others that you've made changes on.

That is why I like Dropbox. You just install it on every computer that you want to synchronize, and it handles everything. You can make changes to any of your documents in the Dropbox folder on any computer and Dropbox will keep things in sync. That's how I am able to keep my project files in sync across all of the computers I mentioned. Sorry if that wasn't clear enough in the article.

You can use incron to watch for changes to files in a directory. On any change, incron executes a script which would:
- exec git commit
- then ssh to each machine and do a pull (authenticating with an rsa_key or dsa_key)
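For reference, incron rules live in a per-user incrontab (edited with 'incrontab -e') in the form '<path> <event mask> <command>'. A rule along the lines the commenter describes might look like this, with a hypothetical script path; note that incron watches only the named directory, not its subdirectories:

```
/home/me/Documents/shared IN_CLOSE_WRITE,IN_MOVED_TO /home/me/bin/commit-and-push.sh
```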

"...the biggest issue I have with Dropbox is that it is not fully open source"
You can use Spideroak: https://spideroak.com/
- completely Open Source
- client-side encryption to ensure privacy
- first 2 GB is free
