GIT Under the hood (part 1 of 2)

When you get started with GIT for the first time, it might be a little overwhelming. There’s a lot going on: branches, merges, commits, fetching, pushing and pulling. Despite this, you will probably grasp the basics very quickly and work with GIT on a day-to-day basis without much hassle.
However, at a certain point you will want to use a solid workflow and, with that, more complicated branching structures. Concepts like fast-forwarding, rebasing and tagging will probably have to become part of your toolset to maintain a readable history and an organized flow of changes. If someone (unlike yourself of course 😉) screws up, you will want to be able to fix that. If you really understand the inner clockwork of GIT, and what specific actions do to your commits, you might just become the “GIT superhero” you always dreamed of being.

Before reading on…

This article assumes you have a very basic understanding of GIT.

That is, you are familiar with the following terms:

Fetching, pushing and pulling

Branches

Merging and solving conflicts

GIT Building blocks

A GIT repository is composed of branches, tags and commits. Furthermore, GIT is a decentralized SCM (source code management system), meaning that every GIT client has its own copy of the content.

It’s important to realize what this means and what properties all building blocks of a repository have.

Let’s look at all this in detail.

Commits: Building blocks

As you should know, commits are the building blocks of the current state of the content in your repository. Each commit is preceded by one or more commits (its parents); the sole exception is the very first commit in the repository, which has none. Strictly speaking, GIT stores each commit as a snapshot of the complete content plus a reference to its parent(s), but you can think of a commit as a diff on top of the commit(s) it succeeds.

A commit that is preceded by more than one commit is a merge commit. A merge commit brings changes that were made in parallel back together at a single point. In the next part we will look at this in more detail.
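
To make these parent relationships a bit more tangible, here is a small model of a history with one merge. It is only an illustrative sketch and not how GIT stores its data: the Commit class, its fields and the one-letter IDs are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical stand-in for a commit: an ID plus its parent IDs."""
    id: str
    parents: tuple = ()

    def is_root(self) -> bool:
        # The first commit in a repository has no parent at all.
        return len(self.parents) == 0

    def is_merge(self) -> bool:
        # A merge commit has more than one parent.
        return len(self.parents) > 1


# A small history: A is the root, B and C were made in parallel on top
# of A, and M merges the two lines of work back together.
a = Commit("A")
b = Commit("B", parents=("A",))
c = Commit("C", parents=("A",))
m = Commit("M", parents=("B", "C"))

history = {commit.id: commit for commit in (a, b, c, m)}


def ancestors(commit_id: str) -> set:
    """Collect the commit itself and everything reachable via parent links."""
    seen = set()
    todo = [commit_id]
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(history[current].parents)
    return seen


print(sorted(ancestors("M")))  # ['A', 'B', 'C', 'M']
```

Starting from the merge commit M you reach both parallel commits, while starting from B you only reach B and A.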

Commits are identified by their hash, which is derived from their content (including the reference to their parents). This way, a commit keeps the same unique ID no matter where it resides, whether a different branch points at it or it is transferred to another repository.
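
The idea of an ID derived from content can be tried out in a few lines. The sketch below assumes the classic SHA-1 object format, in which the hash is taken over a short header followed by the object's content; repositories can also be set up to use SHA-256. The helper names and the author details are made up for the example.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>', a NUL byte, and then the body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parents: list, message: str) -> bytes:
    """Build the text of a commit object from its parts."""
    # Author and committer lines are made-up example values.
    who = "Jane Doe <jane@example.com> 1700000000 +0000"
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    return ("\n".join(lines) + "\n").encode()


# The ID of an empty tree, computed rather than hard-coded.
empty_tree = git_object_id("tree", b"")

root = git_object_id("commit", commit_body(empty_tree, [], "Initial commit"))
same = git_object_id("commit", commit_body(empty_tree, [], "Initial commit"))
child = git_object_id("commit", commit_body(empty_tree, [root], "Initial commit"))

print(root == same)   # True: identical content gives an identical ID
print(root == child)  # False: a different parent gives a different ID
```

Because the parent reference is part of the hashed content, two commits with the same message and the same files still get different IDs when they sit on top of different parents.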

Branches: Parallel universes do exist

Branches are a very important part of GIT. By the time you fully understand what a branch really is and how commits come together and split up because of them, you have taken the most important steps towards fully understanding the inner workings of GIT.

You might be surprised to know that a branch is just a pointer: a name that points to a commit. This means that deleting a branch does not delete any content right away; it only makes that content harder to find afterwards (commits that nothing points to anymore may eventually be cleaned up by GIT's garbage collection). You can create a branch at any point in time, referring to any commit. That includes commits you didn’t push yet, or a long-forgotten commit far in the past. When you make a branch you can add to it whatever you want without interfering with other clients, as long as you don’t push your commits to their branches.

Branches can be made from commits in the past, so you can work from a situation in the past and merge it back into your current branch as if you were altering content from a parallel universe! Awesome!
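
To see what "a branch is just a pointer" means in practice, consider the toy model below. It is a sketch under simplifying assumptions, not GIT's real storage: the dictionaries, function names and one-letter commit IDs are all hypothetical.

```python
# Hypothetical miniature repository: commits map an ID to its parent IDs,
# branches map a name to the single commit ID the branch points at.
commits = {"A": [], "B": ["A"], "C": ["B"]}
branches = {"master": "C"}


def create_branch(name: str, commit_id: str) -> None:
    """A new branch is only a new name for an existing commit."""
    if commit_id not in commits:
        raise KeyError(f"unknown commit {commit_id}")
    branches[name] = commit_id


def commit_on(branch: str, new_id: str) -> None:
    """Add a commit on top of the branch and move the pointer forward."""
    commits[new_id] = [branches[branch]]
    branches[branch] = new_id


def delete_branch(name: str) -> None:
    """Remove the name only; the commits themselves stay where they are."""
    del branches[name]


create_branch("experiment", "A")   # branch off from a commit in the past
commit_on("experiment", "D")       # D has A as its parent, master is untouched
delete_branch("experiment")

print(branches)        # {'master': 'C'}
print("D" in commits)  # True
```

After the branch is deleted, commit D still exists; it has simply lost the name that made it easy to find.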

Branch Tips & Tricks:

Make a new branch from a commit ID

Make a branch from any commit, your own or someone else’s. This is useful when you want to start new work without building on the latest state of the repository.

Another problem you can solve with this technique is fixing problematic branches. Basically, you create a new branch from the last good commit and continue your work there, which leaves the unwanted commits behind.

Decide afterwards that the things you committed to a branch don’t belong there

What you can do is create a new branch from your current branch, and then delete or hard-reset your local copy of the branch you were previously working in. Be aware that this should only be done as long as you did not push your commits to a remote yet, because history that has already been shared should not be rewritten.
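
The second trick comes down to moving two pointers, which the following sketch tries to show. It is a simplified model with hypothetical names and one-letter commit IDs; on the real command line it corresponds roughly to running git branch feature, followed by git reset --hard with the commit you want to go back to.

```python
# Hypothetical state: commits X and Y were added to master by mistake,
# on top of C, and have not been pushed anywhere yet.
parents = {"A": None, "B": "A", "C": "B", "X": "C", "Y": "X"}
branches = {"master": "Y"}


def log(branch: str) -> list:
    """List the commits on a branch, newest first, by following parents."""
    result = []
    current = branches[branch]
    while current is not None:
        result.append(current)
        current = parents[current]
    return result


# Step 1: create a new branch from the current one, so X and Y keep a name.
branches["feature"] = branches["master"]

# Step 2: hard-reset the old branch, i.e. point it back at commit C.
branches["master"] = "C"

print(log("master"))   # ['C', 'B', 'A']
print(log("feature"))  # ['Y', 'X', 'C', 'B', 'A']
```

No commit was changed or copied along the way; only the branch names point somewhere else.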

The decentralized nature of GIT: local copies of remote branches

Branches are where you will notice that GIT is decentralized. Everyone has their own local copy of a remote branch. Therefore, when we refer to a remote branch, we refer to it like this: <nodename>/<branchname>, where <nodename> is the name of the remote, for example: origin/master.

When you want to work in a branch, you pull it and you get your own copy of the remote branch, which we could call local/master (GIT itself simply calls it master). When you add commits, they reside only in your local copy until you push them to the remote. Pushing can also be done to other clients that have a copy. These are called “remotes” and can be added to the repository to be tracked. GIT knows where you want to push your commits by default, because each local branch records a one-way relationship towards the remote branch it follows. In GIT, this is referred to as “tracking branches”.
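
The interplay between the remote, your copy of the remote branch and your own branch can be sketched as three sets of pointers. This is a deliberately simplified model: all names are hypothetical, commits are reduced to one-letter IDs, and real GIT performs checks (for example refusing a push that would discard commits on the remote) that are left out here.

```python
# Hypothetical model with three sets of pointers:
#   server          - the branches as they exist on the remote "origin"
#   remote_tracking - your local copies of those branches (origin/master)
#   local           - your own branches (master)
server = {"master": "C"}
remote_tracking = {}
local = {}
upstream = {}  # local branch name -> (remote name, remote branch name)


def fetch() -> None:
    """Refresh the local copies of the remote branches."""
    for name, commit_id in server.items():
        remote_tracking[f"origin/{name}"] = commit_id


def checkout_tracking(name: str) -> None:
    """Create a local branch from origin/<name> and remember its upstream."""
    local[name] = remote_tracking[f"origin/{name}"]
    upstream[name] = ("origin", name)


def push(name: str) -> None:
    """Send the local branch to the branch recorded as its upstream."""
    remote_name, remote_branch = upstream[name]
    server[remote_branch] = local[name]
    remote_tracking[f"{remote_name}/{remote_branch}"] = local[name]


fetch()
checkout_tracking("master")
local["master"] = "D"  # a new local commit; the remote does not know it yet

print(server["master"], remote_tracking["origin/master"])  # C C
push("master")
print(server["master"], remote_tracking["origin/master"])  # D D
```

The upstream entry plays the role of the tracking relationship: it is what lets push work without being told where the commits should go.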

Tags: Making snapshots of perfect moments

A tag is, just like a branch, a pointer to a commit. The intention behind tags is a little different, however: once we create a tag, we will (almost) never change it afterwards. This way we have a permanent reference to the state of the content at a specific point in time.

Tags are used to identify versions/releases or other important points in time in a GIT repository, like project milestones, important implementation moments etc.

Tag Tips & Tricks:

Make a new branch from a tag

Make a branch from a tag to apply a fix to a specific (older) version of your software. The fix should be given a new tag (e.g. 1.4 -> 1.4-rev1). Afterwards, trash your branch and only keep the tag.
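
The hotfix recipe above can be walked through with the same kind of pointer model. Again, this is only a sketch under simplifying assumptions; the dictionaries, the hotfix branch name and the one-letter commit IDs are hypothetical.

```python
# Hypothetical miniature repository: each commit ID maps to its parent.
parents = {"A": None, "B": "A", "C": "B"}
branches = {"master": "C"}
tags = {"1.4": "B"}  # release 1.4 was cut at commit B


def commit_on(branch: str, new_id: str) -> None:
    """Add a commit on top of a branch; only the branch pointer moves."""
    parents[new_id] = branches[branch]
    branches[branch] = new_id


# Create a branch from the tag and commit the fix on top of it.
branches["hotfix"] = tags["1.4"]
commit_on("hotfix", "F")

# Tag the fixed state, then throw the branch away.
tags["1.4-rev1"] = branches["hotfix"]
del branches["hotfix"]

print(tags)          # {'1.4': 'B', '1.4-rev1': 'F'}
print(parents["F"])  # B
```

The original tag never moved while the fix was made, and the fix stays easy to find through the new tag even though its branch is gone.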

In the next part:

In the next part we will put some more of this in-depth knowledge to use. We are going to talk about rebasing and fast-forwarding, what really happens when a merge is done, and how commits stay unique even when they are moved around, as in cherry-picking.