Gitslave—gits

Introduction to GitSlave

Gitslave creates a group of related repositories (a superproject
repository and a number of slave repositories), all of which are
developed concurrently and on which all git operations should
normally operate. So when you branch, each repository in the project
is branched in turn. Similarly, when you commit, push, pull, merge,
tag, checkout, status, log, etc., each git command runs on the
superproject and on all slave repositories in turn. This sort of
activity may be very familiar to CVS and (to a lesser extent)
Subversion users. Gitslave is designed to keep normal git operations
simple.

Gitslave has been used for mid-sized product development with many
slave repositories (representing different programs and plugins),
branches, tags, and developers; and for single-person repositories
tracking groups of .emacs and .vim repositories (in the latter case,
it is basically used to keep the slave repositories up to date via a
single command).

The gits wrapper typically runs the indicated git command on each
repository in the project and combines the output from the individual
git commands (post-processing it for a few special commands) to make
everything clearer. This is very useful when you have a few dozen
slaves: a plain concatenation of the normally identical output of each
git command would bury the wheat in the chaff.
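To make the run-and-combine idea concrete, here is a minimal bash sketch. It is not gitslave's implementation: the repository list is passed on the command line (gits reads it from its own file in the superproject), the helper name `run_in_all` is hypothetical, and the grouping only merges byte-identical output. It needs bash 4 or later for the associative array.

```bash
#!/usr/bin/env bash
# run_in_all: hypothetical helper illustrating what gits does conceptually.
# Usage: run_in_all REPO... -- GIT-ARGS...
# Runs "git GIT-ARGS" in every REPO and prints each distinct output once,
# headed by the list of repositories that produced it.
run_in_all() {
  local repos=()
  while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
    repos+=("$1")
    shift
  done
  [ "$#" -gt 0 ] && shift          # drop the "--" separator

  local -A who=()                  # output text -> repositories that printed it
  local order=()                   # distinct outputs, in first-seen order
  local repo out key rc=0
  for repo in "${repos[@]}"; do
    # git -C behaves as if git had been started inside $repo
    out=$(git -C "$repo" "$@" 2>&1) || rc=1
    key="k:$out"                   # prefix: bash rejects an empty array key
    if [ -z "${who[$key]+set}" ]; then
      order+=("$key")
      who[$key]=$repo
    else
      who[$key]+=" $repo"
    fi
  done

  for key in "${order[@]}"; do
    printf 'On: %s\n%s\n\n' "${who[$key]}" "${key#k:}"
  done
  return "$rc"                     # non-zero if any repository failed
}
```

With a superproject and two slaves that are all clean, `run_in_all . lib plugin -- status --short` would print a single "On:" header listing all three, instead of three identical blocks.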

Gitslave does not take over your repository. You may continue to use
plain git commands both inside a gits-cloned repository and outside it
in a privately git-cloned repository. Gitslave is a value-added
supplement designed to speed up performing identical git actions
across all linked repositories. Aside from one new file in the
superproject, adjustments to .gitignore, and perhaps a few private
config variables, it does not otherwise affect your repositories.

Other options

git-submodules is the standard git solution for this sort of
activity. Submodules take a different approach: each submodule sits
at a semi-fixed commit. Making changes to a submodule is a little
annoying, because you must check out the correct branch in the
submodule, make the change, commit, and then go into the superproject
and commit the commit (or at least record the new location of the
submodule). Submodules were originally designed for third-party
projects on which you typically do not do active development (they
also work for active development, with a little inconvenience). By
default, most git commands performed on the superproject do not
recurse down into the submodules. As suggested above, submodules give
you a tight mapping between subproject commits and superproject
commits (you always know which commit a subproject was at for any
given superproject commit).
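For comparison, the submodule edit cycle described above looks roughly like this in plain git. The function names and the SUPER/SUB/BRANCH arguments are hypothetical; the actual file edits happen between the two calls.

```bash
#!/usr/bin/env bash
# Hypothetical helpers spelling out the submodule edit cycle in plain git.

# Step 1: "git submodule update" leaves the submodule on a detached HEAD,
# so first check out the branch you intend to commit on.
submodule_begin() {   # SUPER SUB BRANCH
  git -C "$1/$2" checkout "$3"
}

# Steps 2 and 3: commit inside the submodule, then commit again in the
# superproject so it records the submodule's new commit (the "gitlink").
submodule_finish() {  # SUPER SUB MESSAGE
  git -C "$1/$2" commit -a -m "$3" &&
    git -C "$1" add "$2" &&
    git -C "$1" commit -m "Update $2 to $(git -C "$1/$2" rev-parse --short HEAD)"
}
```

A typical round trip would be `submodule_begin . libfoo master`, then editing files under `libfoo`, then `submodule_finish . libfoo "Fix parser"`: two commits in two repositories for one logical change.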

Another option is to stick everything in one giant repository (either
natively or by the git subtree merge strategy). This might make your
repository annoyingly large and it is usually a bad idea to aggregate
multiple concepts in the same repository. It also does not work
conveniently (or at least efficiently) if the subsets are shared with
other superprojects or if your changes need to be shared with the
other superprojects or sent back upstream.

Another option is repo from Google, used with Android. Repo seems to
work much like gitslave from a high-level perspective, but I've not
seen a lot of documentation on using it for other projects. Gitslave
also came first.

Still another option is kitenet's mr, which supports multiple
repository types (CVS, SVN, git, etc.). It is absolutely the solution
for multi-SCM projects, but since it works at the lowest common
denominator, you lose much of the expressive power of git.

Gitslave is not perfect

Gitslave is imperfect in a few ways. It can complicate forensic
archeology; it may need special care and feeding if one or more of the
repositories are third-party repositories; you can have partial
success and partial failure (there are no atomic cross-repository
actions); not every git command that needs specific support in gits
has it; and things can get a little squirrelly if different
branches/tags have different attached slave repositories. However, we
have not had any significant problems in over two years of intensive
work on a project using this script, nor has anyone else reported
any. Do not mistake that for a warranty or a guarantee, for there is
none.

Gitslave complicates forensic archeology in two ways. Most obviously,
you cannot have gitk (or something similar) show the complete history
of all projects in all linked repositories. Less obviously, there is
only a very loose relationship between commits in different
repositories. You cannot easily and precisely determine which
commit/SHA any other repository was at when a particular commit was
made (though you can approximate it fairly easily, for example by
commit date). Only tags provide exact synchronization between
different repositories. Thus, gitslave may not be appropriate for
blame-based debugging or egofull programming.
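As an example of that approximation, the helper below guesses which commit a slave was at when a given superproject commit was made, by asking for the newest slave commit whose committer date is not later than the superproject commit's. It is only a heuristic (clock skew, rebases, and branches that were not checked out at the time all defeat it), and the function name and arguments are hypothetical.

```bash
#!/usr/bin/env bash
# approx_slave_commit: hypothetical helper; a date-based guess, not an exact mapping.
# Usage: approx_slave_commit SUPER_DIR SUPER_COMMIT SLAVE_DIR [SLAVE_BRANCH]
approx_slave_commit() {
  local when
  # Committer date of the superproject commit, as seconds since the epoch
  when=$(git -C "$1" log -1 --format=%ct "$2") || return 1
  # First commit listed from SLAVE_BRANCH (default HEAD) whose committer
  # date is not later than that moment
  git -C "$3" rev-list -1 --before="$when" "${4:-HEAD}"
}
```

When exactness matters, a `gits tag` at the interesting point is the reliable alternative, as noted above.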

Your setup may need special care and feeding if one or more of the
repositories is a third-party repository. If you blindly attach the
true upstream master to your local repository, you are at the mercy
of whatever is committed to the upstream master. If a change there is
not fully baked, you cannot refuse to accept it. You also cannot
easily use public branches, since you will probably be unable to push
those branches to the third-party repository. The solution is to:

Consider using a unique naming system for branches and tags. This
keeps your branches and tags separate from the upstream branches and
tags. This might even go as far as ditching master as the normal
branch for your project-specific repositories (running
`git symbolic-ref HEAD refs/heads/mymaster` in a bare repository
changes the default branch that clones of it check out).
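A minimal sketch of that last suggestion, assuming a bare project repository and a project branch named `mymaster` as in the text; the helper name and the start-point argument are hypothetical. HEAD of a bare repository is what `git clone` checks out by default, so repointing it changes the default branch for new clones.

```bash
#!/usr/bin/env bash
# set_project_default_branch: hypothetical helper.
# Usage: set_project_default_branch BARE_REPO BRANCH [START_POINT]
# Creates BRANCH in the bare repository (from START_POINT, default "master")
# if it does not exist yet, then makes it the branch new clones check out.
set_project_default_branch() {
  local bare=$1 branch=$2 start=${3:-master}
  if ! git -C "$bare" rev-parse --verify -q "refs/heads/$branch" >/dev/null; then
    git -C "$bare" branch "$branch" "$start" || return 1
  fi
  git -C "$bare" symbolic-ref HEAD "refs/heads/$branch"
}
```

Project tags can be kept apart in the same spirit with a prefix (for instance `myproj-1.0` rather than `1.0`; the prefix is only an example), since tags have no per-remote namespace.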

Choose one of the following schemes for updating:

Keep a project-local master mirror repository of the third-party
package as your project's upstream (`git clone --mirror --shared URL mydir`).
Periodically fetch in the bare repository. When you are
ready to bring in some/all changes, you can `git merge` from
remote/origin/ to . This has the
disadvantages of requiring server-side git commands (the fetch) to be
executed, of requiring a strict separation of reference namespaces,
and of requiring that you remember which upstream branches correspond
to which project branches, but at least you can see (via gitk) those
merges with the correct names.
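The first scheme could be scripted roughly as follows. This is a sketch under assumptions: the mirror lives on the project server, developers clone it, upstream branch names (such as `master`) never collide with project branch names (such as `mymaster`), and all function names and arguments are hypothetical. A mirror clone fetches with the forced refspec `+refs/*:refs/*`, so any project ref sharing a name with an upstream ref would be overwritten, which is why the strict namespace separation matters.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the "project-local mirror" scheme.

# One-time, on the project server: mirror the third-party repository.
create_mirror() {          # UPSTREAM_URL MIRROR_DIR
  git clone --mirror "$1" "$2"
}

# Periodically, on the project server: pull in new upstream commits.
refresh_mirror() {         # MIRROR_DIR
  git -C "$1" fetch origin
}

# In a developer clone of the mirror: merge an upstream branch into a
# project branch and publish the result back to the mirror.
merge_upstream_into() {    # CLONE_DIR UPSTREAM_BRANCH PROJECT_BRANCH
  git -C "$1" fetch origin &&
    git -C "$1" checkout "$3" &&
    git -C "$1" merge --no-edit "origin/$2" &&
    git -C "$1" push origin "$3"
}
```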

A slight variant on the above is to have a normal bare repository
as the project-local master and use a bare mirrored client repository
(with the projectmaster as a remote) as a proxy, to avoid having to run
commands on the project repository server. Fetch on origin and
(metaphorically) `git push --all --tags projectmaster`. You can then
have a normal clone merge origin/master into mymaster. As long as you
keep all local changes off the upstream branches, your transfer
repository can happily import changes from the true upstream to the
projectmaster, and a normal clone can merge as necessary. It still
requires a strict separation of reference namespaces, and you still
have to remember which upstream branches correspond to which project
branches, but at least you can see (via gitk) those merges with the
correct names.
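The proxy variant might look like the sketch below. The remote name `projectmaster` comes from the text; the helper names and arguments are hypothetical. Many git versions refuse to combine `--all` with `--tags` in one `git push`, so the metaphorical push becomes two pushes here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the "transfer proxy" variant.

# One-time, on any machine: a bare mirror of the true upstream, with the
# project's own bare repository added as a second remote.
create_proxy() {           # UPSTREAM_URL PROXY_DIR PROJECTMASTER_URL
  git clone --mirror "$1" "$2" &&
    git -C "$2" remote add projectmaster "$3"
}

# Periodically: fetch from the true upstream, then copy its branches and
# tags into the project repository. The branch pushes stay fast-forwards
# as long as nobody commits to the upstream branches in the project repo.
sync_proxy() {             # PROXY_DIR
  git -C "$1" fetch origin &&
    git -C "$1" push --all projectmaster &&
    git -C "$1" push --tags projectmaster
}
```

Nothing runs on the project server itself; the merge of origin/master into mymaster then happens in any ordinary clone of the project repository.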

The next variant gets rid of the requirement to strictly separate the
upstream namespace from your project namespace (except for the
namespaceless tags). You create a normal project-master bare
repository and a normal clone of it. That clone adds a remote for the
true upstream. This transfer clone then merges the upstream remote
branch into the project branch and pushes the result to origin as
normal. This still has the problem that there is no memorized mapping
between the upstream and project branches. Even worse, no one except
this repository (or any repository with upstream as a remote) will be
able to see (via gitk) the mapping. Everyone else will just see a
merge from an anonymous branch.
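A sketch of the transfer-clone variant, assuming the true upstream is added under the hypothetical remote name `upstream`; the helper names and arguments are likewise hypothetical. Fetching from `upstream` can also bring its tags into the clone's single, shared tag namespace, which is the exception noted above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the "transfer clone" variant.

# One-time: a normal clone of the project repository, plus the true
# upstream as an extra remote.
create_transfer_clone() {  # PROJECT_URL UPSTREAM_URL CLONE_DIR
  git clone "$1" "$3" &&
    git -C "$3" remote add upstream "$2"
}

# Whenever upstream changes should be imported: merge an upstream branch
# into a project branch (the names may be identical) and push to origin.
merge_from_upstream() {    # CLONE_DIR UPSTREAM_BRANCH PROJECT_BRANCH
  git -C "$1" fetch upstream &&
    git -C "$1" checkout "$3" &&
    git -C "$1" merge --no-edit "upstream/$2" &&
    git -C "$1" push origin "$3"
}
```

Other clones of the project repository see only a merge whose second parent is not on any branch they know about, which is the "anonymous branch" problem described above.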

Finally, there is the punting option. Have a normal bare repo as the
local master and create a vendor branch in the repository. When you
want to update, check out the vendor branch, replace the working
directory with the most recent checkout/tarball from the appropriate
upstream release/commit, and commit. Then merge the changes in. You
lose the detailed history of the upstream changes, but this is a very
easy and traditional method of importing changes. There is no
question of namespace contamination, but you must manually figure out
what to merge where in a normal checkout from your local project
master (though gitk can help you see what you did in the past). This
does not work at all conveniently if different local-project release
branches track different upstream-project release branches: creating
multiple vendor branches loses the simplicity which makes this option
attractive.
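The vendor-branch routine could be sketched like this, assuming the new upstream release has already been unpacked into a separate snapshot directory that has no `.git` of its own, and that the vendor branch already exists. Function names, arguments, and the commit message format are hypothetical. Untracked files in the working tree are not cleaned up, so start from a clean checkout.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the vendor-branch ("punting") option.

# Replace the vendor branch's contents with an unpacked upstream snapshot
# and record the result as a single import commit.
import_vendor_drop() {     # REPO VENDOR_BRANCH SNAPSHOT_DIR VERSION
  local repo=$1 vendor=$2 snap=$3 version=$4
  git -C "$repo" checkout "$vendor" || return 1
  # Remove every tracked file so that upstream deletions are recorded too
  git -C "$repo" rm -r -q --ignore-unmatch . || return 1
  cp -R "$snap"/. "$repo"/ || return 1
  git -C "$repo" add -A . &&
    git -C "$repo" commit -m "Import upstream $version"
}

# Merge the freshly imported vendor drop into a project branch.
merge_vendor_drop() {      # REPO VENDOR_BRANCH PROJECT_BRANCH
  git -C "$1" checkout "$3" &&
    git -C "$1" merge --no-edit "$2"
}
```

Because each import is one commit on the vendor branch, git's ordinary three-way merge carries upstream changes into the project branch while keeping local modifications.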

Some git subcommands need special support from gitslave because they
deal with (typically) repository URLs. For instance,
`gits remote add NAME URL` is special-cased because it has to figure
out the correct URL for each of the slave repositories based on the
superproject URL and the subproject information. However, not every
such command has been specially adapted for gits. See the manual page
for the list of those that have; in particular, `gits remote set-url`
and `gits branch --set-upstream` are not yet specially supported.
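To illustrate why `remote add` needs special handling, the function below resolves a slave URL recorded relative to the superproject (for example `../lib.git`) against a superproject URL. Storing slave locations this way is an assumption made for the sketch, not a description of gitslave's code; the function name is hypothetical, and scp-style URLs with no slash in the path are not handled.

```bash
#!/usr/bin/env bash
# resolve_slave_url: hypothetical helper, not part of gitslave.
# Usage: resolve_slave_url SUPERPROJECT_URL SLAVE_URL
# A SLAVE_URL starting with "../" is resolved against SUPERPROJECT_URL by
# dropping one trailing path component per "../"; anything else is
# returned unchanged.
resolve_slave_url() {
  local base=${1%/} rel=$2
  case $rel in
    ../*) ;;                              # relative: resolve below
    *) printf '%s\n' "$rel"; return 0 ;;  # absolute: use unchanged
  esac
  while [ "${rel#../}" != "$rel" ]; do
    rel=${rel#../}
    base=${base%/*}                       # strip the last path component
  done
  printf '%s/%s\n' "$base" "$rel"
}
```

Since `gits remote set-url` is not specially supported, one manual workaround is to run plain git per slave, e.g. `git -C lib remote set-url origin "$(resolve_slave_url "$NEW_SUPER_URL" ../lib.git)"`, where `lib` and `NEW_SUPER_URL` are placeholders.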

Even less perfect is the full and complete project documentation on
what gitslave does, how it does it, and the various features and
tweaks it might have. Gitslave isn't all that complex, so the hope is
that it doesn't need a lot. We have an extensive manual page, which
is a good first step, and there is a lengthy tutorial on basic
gitslave operations.

Summary: gitslave is a powerful tool when used for good

When you have a problem that calls for easy multi-repository
management without lots of synchronization, where you typically
want to run the same git command over every repository in your
project, gitslave is the solution for you.