Peace between git and Dropbox with git-worktree

This is the first of what may be a series of short bits about ways of harmonizing workflows between people who work in low-level, programmatic ways with goals of reproducibility and automation, and those who work with high-level graphical tools with less support for those things. I won’t go too far into the why here; suffice it to say that it’s a problem many people face, and

I think specialization is good; some people should learn to code, others have other things to focus on

High-level, popular GUI tools work for a lot of people for a lot of things

People and organizations’ practices change slowly. Baby steps, etc…

With that, our first installment is about git/GitHub and Dropbox. Both are exceptionally useful tools, but for the most part they don’t work that well together when directories are being shared with multiple people via Dropbox. Conflicts in the .git folder can wreak havoc, Dropbox gets bogged down with the many small files in the repo, and in general the two are designed around fundamentally different approaches to collaboration. However, in a team it’s likely that Dropbox is the collaboration tool of choice for non-programmers.

I’ve found a useful solution recently with git-worktree. This is a git command that allows one to use different directories to represent different branches of a project. It’s good for many things, such as using a different directory for the gh-pages branch of a project. With git worktree, you have one or more “linked working trees” (linked directories) connected to the “main working tree” (main directory). Importantly, linked directories do not have .git folders. Instead, they have .git files, which point them to the main directory. This means they are much less trouble to sync with Dropbox.

To set up a project this way, I start with my main project directory outside of Dropbox. Then I run the following:

git worktree add -b dropbox ~/Dropbox/project-repo-dropbox

This creates a new branch called dropbox and gives me a directory structure like this:

home
├── project-repo
└── Dropbox
    └── project-repo-dropbox

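If you cd into project-repo-dropbox, you’ll find you’re in a git directory on the dropbox branch, but if you cat .git, you’ll get a single line that starts with gitdir: followed by the absolute path of this worktree’s entry under the main directory’s .git/worktrees folder.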

The contents of this .git file are how git knows where the actual repository lies.
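The setup above can be tried in a throwaway directory before touching a real Dropbox folder. This is only a sketch: the temporary paths stand in for the home directory and ~/Dropbox, and the empty initial commit is an assumption added so that there is a HEAD to branch from.

```shell
#!/bin/sh
# Throwaway demonstration: every path below is hypothetical and lives in a temp dir.
set -eu

demo=$(cd "$(mktemp -d)" && pwd -P)            # resolved path of a scratch directory
main="$demo/project-repo"                      # main working tree (outside Dropbox)
linked="$demo/Dropbox/project-repo-dropbox"    # stand-in for ~/Dropbox/project-repo-dropbox

mkdir -p "$main" "$demo/Dropbox"
git init -q "$main"
# An initial commit is needed before a new branch can be created from HEAD.
git -C "$main" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Same command as in the post, run from the main working tree.
(cd "$main" && git worktree add -b dropbox "$linked") >/dev/null 2>&1

# In the linked directory, .git is a small text file rather than a folder.
gitfile=$(cat "$linked/.git")
branch=$(git -C "$linked" rev-parse --abbrev-ref HEAD)
echo "$gitfile"
echo "on branch: $branch"
```

The first line printed is the content of the .git file; the second confirms that the linked directory is on the dropbox branch.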

Now you can share project-repo-dropbox with team members via Dropbox and it won’t clobber your git repository. Dropbox-using team members can edit files, and you can commit those changes as needed. I typically push changes from dropbox to origin master before any time I might pull from origin master to master. I pull from origin master to dropbox right after I push from master to origin master. The origin remote doesn’t have a dropbox branch at all, though I’m sure there might be reason to have one in some workflows.

It gets a bit tricky if more than one person needs to use git and Dropbox. I haven’t had this problem yet, but it’s likely to come up if multiple git users need to pull in changes coming from Dropbox users. First, gitdir: stores an absolute path, so all git users will have to tell Dropbox not to sync the .git file (Selective Sync can do this). Second, users after the first will likely have to create a separate linked directory, copy the .git file into the shared Dropbox directory, then manually edit both the .git file and its equivalent in the main directory, which is

home/project-repo/.git/worktrees/project-repo-dropbox/gitdir
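The manual edit of those two files might look roughly like the sketch below. The function name relink and its three arguments are hypothetical, and it assumes the worktree entry already exists under the main directory's .git/worktrees folder. Newer Git releases also have a git worktree repair subcommand aimed at this kind of breakage; it is not used here so that the two files stay visible.

```shell
#!/bin/sh
# Hypothetical helper: relink MAIN LINKED NAME
#   MAIN   = main working tree (contains the real .git folder)
#   LINKED = linked directory (for example the shared Dropbox folder)
#   NAME   = existing entry under MAIN/.git/worktrees
relink() {
    main=$(cd "$1" && pwd -P)
    linked=$(cd "$2" && pwd -P)
    name=$3
    admin="$main/.git/worktrees/$name"
    [ -d "$admin" ] || { echo "no worktree entry: $admin" >&2; return 1; }
    # The .git file in the linked directory points at the entry in the main repo...
    printf 'gitdir: %s\n' "$admin" > "$linked/.git"
    # ...and the entry's gitdir file points back at that .git file.
    printf '%s\n' "$linked/.git" > "$admin/gitdir"
}
```

Both paths are written as absolute paths for the current machine, which is why each git user would need their own unsynced copy of the .git file.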

If anyone tries this final step, or any of this, let us know how it works!

Cool idea Noam! I am still a bit confused about the relationship to origin. So you are pushing directly from dropbox to origin master, and not to origin dropbox? So this means you don’t merge dropbox into master locally? Can you think of reasons why one or the other might be better, or does it not really matter either way?

Alternatively, if you had access to a running remote server (not GitHub, but even an Amazon EC2 micro instance would work), you could set up an rsync script via a cron job to pull changes from a Dropbox folder into a git repo, commit those changes to the dropbox branch, and push to a GitHub remote, all on branch dropbox. That way everyone who wanted access to that via git would have it, and no extra setup is required beyond getting the server set up. You could conceivably run this cron job every few minutes, since rsync would be smart enough not to do anything if there are no changes, and git as well.

Not perfect if multiple people are editing something at the same time, but that is a problem with git and with Dropbox more generally. And I suppose a cron job running almost constantly would help with that.
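A rough sketch of such a sync script follows. The function name, the commit identity, and the install path in the crontab comment are all assumptions; it also assumes the server-side clone already has the dropbox branch checked out and an origin remote configured.

```shell
#!/bin/sh
# Hypothetical helper: sync_dropbox_branch SRC REPO
#   SRC  = Dropbox folder to mirror (assumed path)
#   REPO = clone with the dropbox branch checked out and an "origin" remote
sync_dropbox_branch() {
    src=$1
    repo=$2
    # Mirror the folder into the checkout; anything named .git is excluded,
    # which also protects the clone's own .git from --delete.
    rsync -a --delete --exclude='.git' "$src"/ "$repo"/
    git -C "$repo" add -A
    # Commit and push only when something is actually staged.
    if ! git -C "$repo" diff --cached --quiet; then
        git -C "$repo" -c user.name="dropbox-sync" -c user.email="sync@example.com" \
            commit -q -m "Sync from Dropbox $(date -u +%Y-%m-%dT%H:%M:%SZ)"
        git -C "$repo" push -q origin dropbox
    fi
}

# Example crontab entry (every five minutes), assuming a wrapper script saved as
# /usr/local/bin/sync-dropbox.sh that calls sync_dropbox_branch with real paths:
# */5 * * * * /usr/local/bin/sync-dropbox.sh
```

When nothing has changed, rsync copies nothing and the staged-changes check skips the commit, so running it every few minutes should be cheap.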

On second thought maybe configuring all this is not as easy as just using git-worktree…

I am pushing from dropbox to origin master, in general, just so that the GitHub repo is updated slightly faster than when I might push my local master changes. I don’t think it really matters either way.
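For concreteness, that push and pull cadence could be sketched as two small helpers. The function names are hypothetical, and the sketch assumes the linked directory is on the dropbox branch, a git identity is configured, and the push to origin master is a fast-forward (otherwise git will reject it).

```shell
#!/bin/sh
# Hypothetical helpers; LINKED is the linked directory inside Dropbox.

# Commit whatever Dropbox collaborators changed, then push the local
# dropbox branch to the remote's master branch.
publish_dropbox() {
    linked=$1
    msg=$2
    git -C "$linked" add -A
    git -C "$linked" diff --cached --quiet || git -C "$linked" commit -q -m "$msg"
    git -C "$linked" push -q origin dropbox:master
}

# Right after master has been pushed from the main directory, bring the same
# commits into the Dropbox folder so collaborators see them.
refresh_dropbox() {
    linked=$1
    git -C "$linked" pull -q --no-rebase --no-edit origin master
}
```

A typical round would be publish_dropbox before pulling origin master into master, and refresh_dropbox after pushing master.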

I like the idea of a dropbox branch always automatically reflecting the state of the Dropbox folder, and one might be able to do this with some webhook approach that doesn’t require a server. The Dropbox API also lets you get version history, so one could use that to make a bunch of auto-commits to a dropbox branch that reflect the changes.

I think the issue with both Noam’s approach and my approach is as in the discussion here – they’re both basically one-way. Easy to get from dropbox changes into git, or with one person using git, git changes into dropbox, but what if there are multiple people working in git and want to get the changes back to dropbox?

Although I suppose if you use the git-worktree approach, any changes you merge into the dropbox branch would get synced to everyone else via Dropbox. So maybe a hybrid approach solves this problem? A webhook or rsync/cron job to keep the dropbox branch of the GitHub remote up to date (pull/push) with any changes made exclusively in Dropbox, and then all users who prefer to use git have a local master or local working branch, and merge into local dropbox before pushing?

It feels like this is close to being totally doable, but still just out of reach. Perhaps if there were a diagram and a list of possible workflow steps, it would make more sense. You’d have to account for different scenarios, one of which I would imagine is that the dropbox branch always needs to be the point of reference (and not master), so you don’t have to impose too many rules on Dropbox-only collaborators, and then some rules about the commit-merge flow for git-using collaborators (only commit to a local working branch which never goes to a shared origin, pull origin dropbox, merge working to dropbox, push origin dropbox, which then gets pulled into Dropbox for everyone else via the webhook/rsync/git-worktree combo).
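The commit-merge flow for a git-using collaborator described above could be sketched like this. The function name share_work is hypothetical; the sketch assumes a plain clone (not a worktree setup) with local branches named working and dropbox, a clean working tree, and an origin remote that carries the dropbox branch.

```shell
#!/bin/sh
# Hypothetical helper: share_work REPO
# Assumes REPO has local branches "working" and "dropbox", no uncommitted
# changes, and an "origin" remote with a dropbox branch.
share_work() {
    repo=$1
    git -C "$repo" checkout -q dropbox
    # Pick up changes that arrived from the Dropbox side.
    git -C "$repo" pull -q --no-rebase --no-edit origin dropbox
    # Fold in local work; the working branch itself is never pushed.
    git -C "$repo" merge -q --no-edit working
    # Hand the result back to whatever keeps Dropbox in sync.
    git -C "$repo" push -q origin dropbox
    git -C "$repo" checkout -q working
}
```

Merge conflicts are not handled here; the function simply stops making progress at the failing step and leaves the repository for manual resolution.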

oh my goodness, @noamross, this works wonderfully! I have a large group of collaborators who are writing several papers, including R code, in shared Dropbox folders. I have been nervously wishing I could secretly version control their work. Now I can!

I have found that it is impossible to add an existing dropbox folder as a git worktree, but it is possible to hack out an alternative. I created a git worktree folder next to my collaborators’ project folder. Then, I moved the .git file into the correct folder, and deleted the original folder.
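In shell terms, that hack might look something like the sketch below. The function name adopt_folder and the temporary folder suffix are hypothetical, and the last step (updating the back-pointer in the main repository) is an assumption added so that git does not later treat the entry as stale; it is not part of the steps described above.

```shell
#!/bin/sh
# Hypothetical helper: adopt_folder MAIN EXISTING BRANCH
#   MAIN     = main working tree
#   EXISTING = folder that already holds the collaborators' files
#   BRANCH   = new branch to check out there (for example dropbox)
adopt_folder() {
    main=$(cd "$1" && pwd -P)
    existing=$(cd "$2" && pwd -P)
    branch=$3
    tmp="$existing-worktree-tmp"                 # temporary sibling folder
    (cd "$main" && git worktree add -b "$branch" "$tmp") >/dev/null 2>&1 || return 1
    mv "$tmp/.git" "$existing/.git"              # move the .git file into the real folder
    rm -rf "$tmp"                                # discard the temporary checkout
    # Assumption: also point the main repo's record at the new location.
    admin=$(sed 's/^gitdir: //' "$existing/.git")
    printf '%s\n' "$existing/.git" > "$admin/gitdir"
}
```

Afterwards, git status in the existing folder compares the collaborators' files against the new branch, so files that are not yet in the repository show up as untracked and can be committed from there.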

It even works with RStudio! The awesome-paper folder has an .Rproj file. When you open that project, users can make commits to the dropbox branch (checked out in the worktree) and then later I can merge dropbox with master (checked out in project-repo/). Then I’m free to push from there to GitHub, as an extra backup!

Further refinements on this front: I use Dropbox Selective Sync (SS) to prevent the .git file from being shared. I alluded to this above, but here are all the steps now that I’ve done this a few times:

In project-repo-dropbox, temporarily rename .git to something like .git2

Then create a directory with mkdir .git

Go to the Dropbox UI on your client and, under Preferences > Account > Selective Sync, navigate to project-repo-dropbox and unselect the .git folder. This will remove the .git folder from the directory on your computer.

Go back to project-repo-dropbox in the command line and rename .git2 to .git

In the Dropbox web interface, navigate to project-repo-dropbox and delete the .git folder there.
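The command-line parts of these steps can be wrapped in two small helpers, sketched below. The function names are hypothetical, and the Selective Sync and web interface steps still have to be done by hand between the two calls.

```shell
#!/bin/sh
# Hypothetical helpers for the command-line steps; DIR is project-repo-dropbox.

# Before Selective Sync: set the real .git file aside and leave a placeholder
# folder named .git for Dropbox to see.
park_git_file() {
    dir=$1
    mv "$dir/.git" "$dir/.git2"
    mkdir "$dir/.git"
}

# After Selective Sync has removed the placeholder folder locally:
# put the real .git file back.
restore_git_file() {
    dir=$1
    [ ! -e "$dir/.git" ] || { echo "placeholder .git still present" >&2; return 1; }
    mv "$dir/.git2" "$dir/.git"
}
```

Git commands will not work in the directory while the .git file is parked, so it makes sense to do this when nothing needs committing.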

Yes, this is a lot of hoops to jump through, but basically none of it is visible/relevant to your dropbox collaborators.

Other git users can now do the same thing and have whatever folder structure they want without everyone’s .git files borking each other.

If multiple Dropbox users are using RStudio, though, you may want to advise them to use Selective Sync on the .Rproj.user directory (as suggested in the RStudio docs).

(Interestingly RStudio Server Pro, which has shared project support, splits up the .Rproj.user stuff into separate user folders to prevent conflicts and allow the users to each save different IDE states.)