gitology (n. from git, a contemptible fellow or DVCS): The pernicious study of the inner workings of git and how to apply them.

Thus gitology refers to the theoretical framework and mental model one must absorb as a prerequisite to becoming fully competent with git’s commonly-used features. Here a distinction must be made, because although there are many gitological concepts that only apply to git, many others are part of general DVCS theory. Gitology, however, sometimes interprets general DVCS concepts in its own way.

The following gitological concepts are not particular to git:

repositories (repos)

commits or changesets (csets)

directed acyclic graph (DAG)

branches

pushing and pulling changes

whole-repo tracking, not individual files

rebasing csets

pushing and pulling csets

The following are purely gitological and add unnecessary complexity, in addition to eventually being unavoidable:

Exposing the index/staging area

Exposing other implementation details: blobs, trees, commits, refs

refs and refspecs

Branches are refs

Detached HEADs (a.k.a “not on a branch”)

Distinguishing remote and local tracking branches

Choosing which branch to pull onto

Bare repos

Hard, soft, mixed resets

Porcelain vs plumbing

I will say more of each in turn further down.

Unavoidable gitology

If you are a user of git, I invite you to try to describe the purely gitological concepts above. If you have used git more than casually, I am sure you know what most of them are. If you don’t know most of them, it is likely because you haven’t used git for very long.

It is clear that gitology is unavoidable. A user of git must quickly become a student of gitology, for git will make no attempt to hide its ugly guts from said user. Consider for example one of the first tasks that a user of git will encounter, making a commit. This is how git’s manpage describes this operation (and manpage it is, in the classic style of nerd-only Unix documentation):

Stores the current contents of the index in a new commit along with a log message from the user describing the changes.

Immediately the first gitological concept pops up, the index. Now, the index is not unique to git, nor even to a DVCS. Practically every VCS must implement an index in one way or another. However, it is an implementation detail, which a VCS might decide not to implement for whatever reason. The great gitological stroke of genius was to gleefully expose and moreover force the user to manually handle the index. Any other common VCS only optionally makes the user handle it. Moreover, this is touted as a frequently-loved feature of git.

Let me pause here for a moment to describe why this is characteristic of git’s pathological design choices.

The index is an implementation detail. Git refers to implementation details as “plumbing” and the user interface as “porcelain”, in order to make it clear that git’s designers think of git with the same reverence I think of the instrument that handles the organic waste that my body produces. Git’s makers (I hesitate to suggest that they actually consciously designed this, so I won’t call them “designers” again) refer to the index as “porcelain”, whereas it should be “plumbing”, as evidenced by how hg, darcs, bzr, and yes, even crufty ol’ svn handle the index automatically.

They used manpages to document them. The manpage is a Unix developer format that tends to lead to terse or obscure documentation. It doesn’t have to do this (e.g. BSD tends to have great manpages), but as the name indicates, it used to be just a single page, really, just a cheat sheet with mnemonics to remind people what they already should have known about a program. The UNIX Haters’ Handbook explores this documentation problem in more detail. In this case, if you didn’t already know what an index was, the manpage is going to make it difficult for you to figure it out. For programs that are intended to be used by nerdy Unix developers only this wouldn’t be a problem; however git is primarily a tool for collaboration, very widespread collaboration, and expecting all collaborators to be nerdy Unix developers is unrealistic and hinders the very collaboration it’s supposed to encourage. They could be Windows users, they could be less technical contributors like translators or graphic artists (yes, it makes sense to put graphics under source control, even a DVCS, with some care). Manpages are inadequate for these people.

Exposing the index is frequently touted. Git encourages micro-management, and git’s users end up loving this micro-management (and blog about it, and write books, and have conferences, and so on ad nauseam). This is characteristic of the perversion that git promotes, focussing on details instead of getting work done. Of course it’s hard work to understand gitology, so of course people feel accomplished once they complete this work, but it’s work that shouldn’t exist. It’s not that it’s wrong to expose an implementation detail if it allows more advanced use. It’s wrong to have no way to hide this detail completely. There are workarounds, like passing options to commit, or using a front-end to git, but remember that I specifically do not hate front-ends to git (poor things are just doing the best they can to fix a horrible UI). It should be the other way around, though. It should be abstracted away from the user, and should the user want more advanced use, there should be a developer API upon which we can build more advanced tools.
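For readers who have not yet had the pleasure, here is a minimal sketch in Python of what the index amounts to. It is a toy model under my own assumptions, not git's actual data structures: the class `ToyRepo` and its methods are hypothetical names invented for illustration. It shows that a commit records what was staged, not what is on disk, and that the `all_tracked` flag (roughly what `git commit -a` does for modified files) is the kind of workaround mentioned above.

```python
# Toy model of a working tree, an index and a commit history.
# Illustration only: ToyRepo and its methods are hypothetical, not git's API.

class ToyRepo:
    def __init__(self):
        self.working_tree = {}   # path -> contents, what is on disk
        self.index = {}          # path -> contents, what the next commit records
        self.commits = []        # list of (message, snapshot) pairs

    def edit(self, path, contents):
        """Change a file on disk; the index does not notice."""
        self.working_tree[path] = contents

    def add(self, path):
        """Copy one file from the working tree into the index (like `git add`)."""
        self.index[path] = self.working_tree[path]

    def commit(self, message, all_tracked=False):
        """Record the index, not the working tree (like `git commit`).

        With all_tracked=True, first restage every file the index already
        knows about, roughly what `git commit -a` does for modified files.
        """
        if all_tracked:
            for path in self.index:
                if path in self.working_tree:
                    self.index[path] = self.working_tree[path]
        snapshot = dict(self.index)
        self.commits.append((message, snapshot))
        return snapshot


repo = ToyRepo()
repo.edit("README", "v1")
repo.add("README")
repo.commit("first")

repo.edit("README", "v2")                        # edited but never staged
plain = repo.commit("second")                    # still records "v1"
with_a = repo.commit("third", all_tracked=True)  # now records "v2"
```

The second commit silently ignores the edit because nobody staged it. Whether that is a feature or a trap is, of course, the whole argument.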

Why gitology?

So why does git do the things that it does? Why expose all the plumbing? The situation is basically the following:

While the makers of git are so happy about all the flexibility that git offers due to exposing its plumbing, its users that are not also of the same persuasion are aghast that it requires learning about the path that human refuse follows through git.

The ability of a tool to be so flexible as to allow me to shoot myself in the foot never did much for me other than leave me with injured feet.

A brief excursion into gitology

The gitological concepts I enumerated above are all deserving of a more thorough study in forthcoming blog posts. For the moment, I will briefly describe the remaining ones and why they are not necessary in order to have a working DVCS.

Blobs, trees, commits, refs

Git’s simple storage model is simple enough to be exposed to the user, so why not make everyone learn it? The more internal details we can expose, the better. It is a rite of passage for any serious student of gitology to read Git For Computer Scientists or equivalent.

refs and refspecs

Refs are part of git’s storage model. They’re basically pointers, and just like carelessly manipulating C pointers results in segfaults, carelessly manipulating git’s refs results in lost data. Most of the time it’s not a problem, so naturally they should also be exposed to the user. Refspecs are a general class of ways to specify a ref, and some of their syntax is based on what a certain hacker learned to type when he was inspecting the results when a certain kernel took a core dump on him.

Branches are refs

A branch should be a simple concept independent of any implementation detail: a line of development in the repo’s DAG. For the most part, this is what they are in git, except that git identifies branches with refs, so amongst other complications this leads to

Detached HEADs

When you check out an earlier commit in git, you end up with the cryptic “not in a branch” message… then where the hell are you? Isn’t the DAG made up exclusively of branches? No, a branch is a ref, so if you’re not at a commit (recursively) pointed to by a ref, you’re not anywhere and you might as well not exist and git will eventually garbage collect you.
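To see why being "nowhere" matters, here is a small Python sketch of the idea, under simplifying assumptions of my own. Commits form a DAG, refs are merely names pointing at commits, and anything no ref leads to is a candidate for garbage collection. The commit names and the helper functions `reachable` and `unreachable` are hypothetical, and the sketch ignores git's reflog, which in real life delays the collection for a while.

```python
# Toy model: commits form a DAG, refs are just names pointing at commits.
# All names here are made up for illustration; the reflog is ignored.

parents = {
    "a": [],        # root commit
    "b": ["a"],
    "c": ["b"],
    "d": ["b"],     # committed while HEAD was detached at "b"
}

refs = {"master": "c"}   # the only branch


def reachable(refs, parents):
    """Return every commit reachable from some ref by following parents."""
    seen = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def unreachable(refs, parents):
    """Commits that no ref leads to: candidates for garbage collection."""
    return set(parents) - reachable(refs, parents)


orphans_before = unreachable(refs, parents)   # "d" belongs to no branch
refs["rescue"] = "d"                          # like `git branch rescue d`
orphans_after = unreachable(refs, parents)    # nothing is orphaned now
```

In this model the only way to save commit "d" after you have moved elsewhere is to point a ref at it by hand, which is exactly the kind of bookkeeping a user should not have to do.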

Distinguishing remote and local tracking branches

A DVCS really shouldn’t care where branches are, and for symmetry (because symmetry is beautiful and simple) a branch shouldn’t change its nature if it’s here or there, or at least it shouldn’t appear this way to the user. Git, of course, disagrees and makes you remember the distinction between your local copy of the remote branch and the remote branch. It’s not such a big deal in the end, except when you run into

Choosing which branch to pull onto

In git, you can’t just say, “get over here whatever differences are over there” (i.e. “pull”, a common DVCS operation), because you have to remember where to pull it to. “Here” isn’t always enough to git, because the sad developmentally stunted self-described stupid content tracker can’t know which “here” you mean. You have to specify a branch. Because your branch isn’t the same as the branch over there, so it can’t always automatically go on whatever branch here corresponds to the branch there. This is all symptomatic of the asymmetry of remote and local tracking branches, as are

Bare repos

Because in git you can’t treat remote and local symmetrically (i.e. push and pull are not symmetric operations), you also can’t (or shouldn’t, rather) push to any repos willy-nilly. So for example, you can’t make local clones in git and push and pull amongst them, because git can’t figure out what to do with the working directory of each. The gitological solution to not knowing what to do with the current working directory is to remove it altogether in what is known as a bare repo. There are workarounds, of course (create a zillion branches, which is what git users love doing), but git’s design choices really hinder the naturality of cloning. There is also a deeper design problem lurking here that seeps through the stupidity of the content tracking.

Hard, soft, mixed resets

This one is admittedly more esoteric. There are several ways to do a “reset”, and really they all mean something completely different, but I chose this gitological example to point out another boneheaded aspect of git’s, shall we say, evolution. Gitology teaches that it’s best when a single command does too much and does completely different things depending on which options you choose but actually sort of the same thing if you realise that the underlying implementation is the same. It’s like having to “move” files if you want to “rename” them because that’s how filesystems are implemented, oh, wait, I start to see where these gitologists are getting their ideas from…
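For the record, the "sort of the same thing" can be sketched in a few lines of Python. This is a toy model under my own assumptions: the function `reset` and the three dictionaries standing in for HEAD, the index and the working tree are hypothetical, and real git moves a ref to a commit rather than copying a snapshot. The three modes differ only in how far the overwrite propagates.

```python
# Toy model of the three reset modes. Each "tree" is a dict of path -> contents.
# Illustration only: real git moves a ref to a commit instead of copying dicts.

def reset(state, target, mode="mixed"):
    """Return a new state after resetting to the snapshot `target`.

    soft:  move HEAD only
    mixed: move HEAD and overwrite the index (git's default mode)
    hard:  move HEAD, overwrite the index and the working tree
    """
    if mode not in ("soft", "mixed", "hard"):
        raise ValueError(f"unknown reset mode: {mode}")
    new = {name: dict(tree) for name, tree in state.items()}
    new["head"] = dict(target)
    if mode in ("mixed", "hard"):
        new["index"] = dict(target)
    if mode == "hard":
        new["working_tree"] = dict(target)
    return new


old = {"f": "old"}
state = {
    "head": {"f": "new"},
    "index": {"f": "new"},
    "working_tree": {"f": "new"},
}

soft = reset(state, old, "soft")     # only HEAD goes back
mixed = reset(state, old, "mixed")   # HEAD and index go back
hard = reset(state, old, "hard")     # everything goes back, edits are gone
```

One implementation, three user-visible behaviours, and only the last one destroys your work on disk.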

Coming up…

So this is just a taste of my git hate, because the fans were clamouring for more. I have so much more to hate about git, but I already feel a little useless having devoted this much breath to git, so I will probably take a while to write the next installment. It will come, though. It doesn’t look like git is getting any less hateful any time soon…

19 Responses

I’m really quite enamoured with this particular bit of githate. I especially enjoyed your commentary on the plumbing vs porcelain issue. Also, the fact that the entire piece is so incredibly scathing – it is as if you just can’t even fathom why git exists at all in its current state – exhibits excellent use of the prefrontal cortex.

Glad to know I helped. I have a forthcoming blog post about how “moving branches” is a really dumb “design” choice and exemplary of the general idiocy that pervades in git. With encouraging words like yours, it may happen sooner. :-)

Do you still have that blog post in the pipe? It currently seems I managed to avoid having git forced on most people in my project group, but that does not mean that this won’t change…

Bare repos are a nice example of a horrible design choice: The basic workflow of just having two versions of the code in two folders simply does not work with git.
But I think you could have elaborated a bit more on the problem of a zillion branches (they require an additional command even for trivial operations – though that might have helped getting git folks used to using many branches, which could be a strategically good thing, if that need were not enshrined in the basic design, so that it very likely isn’t just a startup learning phase but a permanent limitation).

Thank you! Everyone in my office is being forced to use GIT b/c, well, someone in management heard a buzzword and thought it would be great for all of us to use it. . .but, they want us to use it exactly like Subversion.

*bangs head on wall*

So far, I’ve had GIT commit the one file out of 3,000 that I wanted. When I looked at our remote GIT repository, it deleted 2,999 files and updated the one I changed. Fantastic.

Luckily, our Subversion repo is still around, so I didn’t really lose anything.

Not sure if I found this on your site or someplace else, but it describes how easy it is to use GIT for a simple, 2 developer, non open source, project. In Subversion, I change a file, I commit. That apparently is too difficult. Luckily, GIT has a much simpler work flow for us now:

“That is one of the simplest workflows. You work for a while, generally in a topic branch, and merge into your master branch when it’s ready to be integrated. When you want to share that work, you merge it into your own master branch, then fetch and merge origin/master if it has changed, and finally push to the master branch on the server.”

Subversion is indeed simple until you want to do something like working on your own without stepping on anyone’s toes. DVCS isn’t a problem. The problem is just how git presents it. Try reading this gitless Subversion reeducation and see if it makes sense for you. I’ve heard a lot of people report that even if they decide to use git anyways, this bit of Mercurial documentation helps them out.

Furthermore, Mercurial tries to cater to users of other VCSes, distributed or not, because it wants to present a consistent interface. Git makes no such concession.

While it takes a while to get used to a DVCS, it shouldn’t take as long as git makes it out to be.

You shouldn’t fear a DVCS and wish for Visual SourceSafe, which is in all ways an inferior version control system to git (yes, it’s even worse than git). I do recommend that you give Mercurial a try. It’s not as popular as git, but it’s just as powerful, and not as crazy.

Mercurial originally attempted to capture CVS and SVN users. Its UI was built to mimic the UI of those two. Git willfully ignored those existing UIs and started to create its own. Now, a decade later, we’re in the strange situation where everyone has learned the git UI and is unable to remember what a DVCS looked like before git.

I don’t have an “hg for git users” reference for you. Perhaps I should write it. All I can suggest for now is that you forget most everything you know about git and especially don’t expect words you learned in git to mean the same thing in Mercurial. You have to approach hg with a beginner’s mind.

I am curious about Mercurial, mainly because I know a few projects that I consider to be important to me (and that I occasionally write a pull request for). I really envy Mercurial users for being able to script it with Python – famously git is extended by shell scripts and that is not pretty.

A lot of criticism of git boils down to “Exposes implementation detail to the user” while pointing at the solution mostly something like “Mercurial has well-defined operations so you do not need to think about internal data structures.”

In my experience, data structures are easy to understand; processes are harder to understand. A directed acyclic graph of revisions is not terribly difficult to grasp. Tags as references to vertices in that DAG are not difficult to understand. Branches as tags that move with committing are slightly more difficult to understand, but they are quite understandable in the context of the underlying DAG. Within a commit, the tree structures etc. are also within my knowledge as a PC user, resembling a file system. I hardly see references to this in my daily git usage, though; this part of git is hardly exposed to the user.

With Mercurial, I have a hard time because most commands are defined and explained procedurally, not declaratively in terms of graph changes. Mercurial implements “user stories”, and this probably leads to more meaningful command names and more sensible defaults, but because it hides from me what is going on, I have less control and less trust, and generally I find it more challenging.

I hope I don’t sound like that person that just defends the first tool they have ever used and is not eager to learn another or accept another approach. Having used CVS and SVN, Git was a revelation to me because it was so SIMPLE and accessible.

I would love a “Mercurial for git users” tutorial from you that explains branches, merges and “rebase” workflows in Mercurial, because I am always looking to learn new things.

The problem is that git was put to use before it worked correctly,
and it is now impossible to “fix it”,
all the more so because its current users fear any kind of change.
So I guess the only solution is ANOTHER IMPLEMENTATION
of similar concepts, but in a CLEAN WAY.
Git is unintuitive, inconsistent, confusing, unfriendly, and sometimes dangerous.

About this blogger

I’m interested in several kinds of mathematics and free software, often in an interplay between them. Most of the free software to which I like to directly contribute is mathematical. I’m involved with GNU Octave and Mercurial, a DVCS frequently offered as an alternative to git. You may contact me at my email: jordigh@octave.org