Archive for the ‘Version Control’ Category

Over the last few years, the version control community has fought what some people would call the “VCS war”. People argued on IRC, at conferences and on mailing lists, and they wrote blog posts and upvoted HN articles about which was the best version control system out there. The term “VCS war” is borrowed from the “editor wars”. That is a constant fight in which people argue over which of the major editors is best: Vim or Emacs, later TextMate or Sublime, and then Vim and Emacs again. It is similar to discussions about programming languages, shell environments, window managers and so on. What they all have in common is that software engineers use these tools daily, and therefore a lot of people have an opinion on them.

Git and Mercurial were both released in 2005, and Bazaar followed shortly after. That started the fight over which of the three was best. Early Mercurial versions seemed much easier to use than Git, but Git was already used for the Linux kernel and had built up a strong following. Things stayed even until 2008, when GitHub launched, changed the open source world and became what people would consider Git’s “killer app”. Mercurial’s equivalent, Bitbucket, never reached GitHub’s popularity. But people kept writing articles. They argued about merging versus rebasing, about performance and about the ability to rewrite history, and they wrote long blog posts about confusing branching strategies. Those strategies were complicated enough that people had to write helper tools, which gave them something new to write articles about… and so on and so forth.

Recently things have become quiet. But why is that? What happened to Git, Mercurial and Bazaar?

Bazaar

I haven’t followed Bazaar’s history much. Its most notable users were MySQL and Ubuntu. In its early development Bazaar lacked performance and couldn’t keep up with Git and Mercurial. It tried to solve this by changing its on-disk format a few times, which required users to upgrade their servers and clients. Development was mostly driven by Canonical, and Canonical had a hard time attracting more active developers. In the end there isn’t much to say about Bazaar. Its development slowly died down, and it is widely considered the big loser of the VCS wars. Bazaar is dead.

Mercurial

Mercurial was started for the very same reason Git was created, and it was developed while Linus wrote Git. Both had fast-growing, active development communities and were used about equally in the first years. Git was the “faster” decentralized version control system, and Mercurial was widely considered the more user-friendly one. Nevertheless, Mercurial lost traction with the rise of GitHub. Development continued anyway. While more and more people used Git and GitHub, the Mercurial community worked on some new ideas. Python picked it as its version control system in 2012, and Facebook moved to Mercurial in 2013. So what’s so interesting about Mercurial?

Mercurial is extensible: it is written mostly in Python and has a powerful extension API. It is fairly easy to write a proof of concept for a new backend or to add data that is transferred on clone. This is a big win for communities like Python or Mozilla, because it lets them adapt Mercurial to their needs.

Mercurial caught up on Git’s features and performance: Mercurial added “bookmarks”, “rebase” and various other commands to its core functionality and has constantly improved performance.

Mercurial has new ideas: Mercurial came up with three brilliant ideas in the last three years. First, it introduced a query language called “revsets”, which lets you easily query the commit graph. Second, it introduced “phases”. Phases are a barrier that prevents users from accidentally changing or rebasing changesets that were already published, which is a common mistake for Git users. And last but not least, Changeset Evolution is an experimental feature that helps you modify history safely and keeps track of how a rewritten commit changed over time.
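The phase barrier can be illustrated with a small Python model. This is a hypothetical toy, not Mercurial’s code: `PhasedRepo`, `RewriteError` and the method names are invented for illustration. The model follows two rules. A changeset is never in a lower phase than its parents, and publishing a changeset makes it and all of its ancestors public, after which rewriting it is refused.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Phases ordered from least to most mutable."""
    PUBLIC = 0
    DRAFT = 1
    SECRET = 2


class RewriteError(Exception):
    """Hypothetical error raised when a public changeset would be rewritten."""


class PhasedRepo:
    """Toy commit graph with Mercurial-style phases (illustration only)."""

    def __init__(self):
        self.parents = {}  # changeset -> tuple of parent changesets
        self.phase = {}    # changeset -> Phase

    def commit(self, node, parents=(), phase=Phase.DRAFT):
        # A changeset may never be in a lower phase than any parent,
        # so a child of a secret changeset is secret as well.
        inherited = max((self.phase[p] for p in parents), default=Phase.PUBLIC)
        self.parents[node] = tuple(parents)
        self.phase[node] = max(phase, inherited)

    def ancestors(self, node):
        """Return node together with all of its ancestors."""
        seen, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents[current])
        return seen

    def publish(self, node):
        # Publishing (e.g. pushing) makes the changeset and its ancestors public.
        for n in self.ancestors(node):
            self.phase[n] = Phase.PUBLIC

    def check_rewritable(self, node):
        # Rebase- or amend-like operations must refuse public changesets.
        if self.phase[node] == Phase.PUBLIC:
            raise RewriteError(f"refusing to rewrite public changeset {node}")
        return True
```

After `commit("a")`, `commit("b", ["a"])`, `commit("c", ["b"])` and `publish("b")`, the changeset `c` can still be rewritten. Rewriting `b` raises `RewriteError`.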

So while Mercurial is certainly not the winner, it has found a niche with a healthy and enthusiastic community. It’s worth a try if you are not 100% happy with Git.

Git

The big winner obviously is Git. The introduction of GitHub pushed Git usage. GitHub’s approachable fork-and-merge mechanism revolutionized open source development. Most younger projects no longer use mailing lists and rely on pull requests and discussions in GitHub issues instead. GitHub’s features and community are attractive enough that people learn Git just to use it. In addition, Git had a healthy and vocal community that created blog posts, introductory videos and detailed technical explanations. Nowadays Git’s market share is big enough that companies move from Subversion to Git, because a new hire is more likely to know Git than any other version control system (except maybe SVN). As an open source developer, there is no way around Git anymore. Moreover, development is moving at a rapid pace. The community constantly improves performance and is slowly approaching the v2.0 milestone. It remains to be seen whether they will port some of the ideas from Mercurial. Dealing with large repositories is still a major challenge for Git, and the Mercurial community has at least partly solved it. If you haven’t learned Git, learn it. There isn’t going to be a way around it anyway, so deal with it.

A conclusion

The war is over, and we are all back to working on interesting features with our favorite version control system. Nobody needs to write blog posts about which system is better anymore, and you certainly won’t be able to avoid Git entirely.

A short reminder to myself: there is a small undocumented feature that tries to linearize a changegroup. It does a depth-first walk of the revisions and stores them in that order. This creates long runs of revisions in which each revision is stored right after its parent. The assumption is that the changes between a child and its parent are minimal, so a delta against the previously stored revision stays small. To reorder an existing repository use:
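The following Python sketch illustrates the idea. It is not Mercurial’s implementation, and `linearize` and `stored_after_parent` are hypothetical names. `linearize` takes a parent map and emits a revision as soon as all of its parents have been emitted. It then descends into that revision’s children first, so a child usually lands directly after its parent. `stored_after_parent` counts how often that happens.

```python
def linearize(parents):
    """Depth-first topological order of a DAG given as {node: [parents]}.

    A node is emitted once all of its parents are emitted. Its children are
    visited next, so chains of parent -> child stay adjacent."""
    children = {node: [] for node in parents}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)
    remaining = {node: len(ps) for node, ps in parents.items()}
    roots = [node for node, ps in parents.items() if not ps]

    order = []
    stack = list(reversed(roots))  # reversed so the first root is popped first
    while stack:
        node = stack.pop()
        order.append(node)
        ready = []
        for child in children[node]:
            remaining[child] -= 1
            if remaining[child] == 0:  # all parents emitted
                ready.append(child)
        stack.extend(reversed(ready))  # first child ends up on top of the stack
    return order


def stored_after_parent(order, parents):
    """Count revisions stored immediately after one of their parents."""
    return sum(1 for prev, cur in zip(order, order[1:]) if prev in parents[cur])


if __name__ == "__main__":
    # Two interleaved lines of development in arrival order: a, b, x, c, y.
    graph = {"a": [], "b": ["a"], "x": ["a"], "c": ["b"], "y": ["x"]}
    print(linearize(graph))  # ['a', 'b', 'c', 'x', 'y']
```

In arrival order only `b` directly follows its parent. After linearizing, `b`, `c` and `y` do, which is the property that keeps deltas small.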

It’s been a long time since I wrote part I of the bookmarks revisited series. Bookmarks have changed a lot in the last two years. They became part of Mercurial’s core functionality, and a lot of tools became bookmark-aware.

The current state of bookmarks

As of Mercurial 1.8, bookmarks are part of Mercurial’s core, and you don’t have to activate the extension anymore. Every major Mercurial hosting platform supports bookmarks. Commands like hg summary or hg id display bookmark information. In addition, the push and pull mechanism changed. I will go into the details in Part III of the series.

It’s safe to say that bookmarks have become much more mature over the years because of their exposure. It’s time to take a look at how to use them.

Bookmark semantics

Bookmarks are pointers to commits. Think of a bookmark as a name for a specific commit. Unlike branches in Mercurial, bookmarks are not recorded in the changeset and have no history. If you delete one, it is gone forever.

Bookmarks were initially designed for short-lived branches, and that is how I use them. You can use them in other contexts, but I don’t. Be aware that they were initially intended to be similar to Git branches, but they often aren’t. They are not branches. They are bookmarks, and you should use them like a bookmark in a book: when you advance to the next page, you move the bookmark (or it gets moved).

A bookmark can be active. Only one bookmark can be active at any time, and it’s fine if none is active. If you have an active bookmark and commit a new changeset, the bookmark moves to the new commit. To make a bookmark active, update to it with hg update <name>. To deactivate it, update to the current revision with hg update . (a single dot).
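These semantics can be captured in a toy Python model. It is hypothetical and simplified and is not Mercurial’s implementation. `BookmarkRepo` and its methods are invented names, history is linear, and a bookmark becomes active only through update, as described above.

```python
class BookmarkRepo:
    """Toy model of bookmark semantics (hypothetical, simplified)."""

    def __init__(self):
        self.parent_of = {}  # changeset -> parent (linear history for brevity)
        self.wdir = None     # parent of the working directory
        self.bookmarks = {}  # bookmark name -> changeset
        self.active = None   # name of the active bookmark, or None

    def commit(self, node):
        self.parent_of[node] = self.wdir
        self.wdir = node
        # Only the active bookmark follows new commits.
        if self.active is not None:
            self.bookmarks[self.active] = node

    def bookmark(self, name):
        # A bookmark is just a name for a changeset; nothing is recorded
        # in the changeset itself.
        self.bookmarks[name] = self.wdir

    def delete(self, name):
        # There is no bookmark history, so the name is simply gone.
        del self.bookmarks[name]
        if self.active == name:
            self.active = None

    def update(self, target):
        if target in self.bookmarks:
            # Updating to a bookmark makes it the active one.
            self.wdir = self.bookmarks[target]
            self.active = target
        else:
            # Updating to anything else, including ".", deactivates it.
            if target != ".":
                self.wdir = target
            self.active = None
```

After `bookmark("feature")`, `update("feature")` and `commit("c1")`, the bookmark points at `c1`. After `update(".")`, further commits leave it where it is.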

A bookmark can have divergence markers. Diverged bookmarks have an @NAME suffix, for example test@default. They are created during push and pull and will be described in Part III.

Software projects choose languages based on their idioms. Languages can provide mechanisms and structures that support object orientation or functional programming. Less time is spent thinking about the backwards compatibility of language runtimes. This is usually a non-issue for short-lived software like websites or for software in tightly controlled environments. It becomes an issue for projects that need to guarantee backwards compatibility for years, for example a version control system.

The Mercurial project aims to support Python 2.4 to Python 2.7. It does not support Python 3. Why? Python 3 is a drastic change: Unicode is the default string type, old-style classes were removed, etc. The impact of these changes is similar to the move from PHP 4 to PHP 5. Most software projects have adopted the language changes. For projects that need to support LTS operating systems like RHEL or Solaris 9/10, however, it can become an issue. You could drop Python 2.x support and tell existing users to look for something else, but that is a no-go for a version control system. You could simply never support Python 3, but Python 2.7 has already reached its EOL. It’s just a matter of time until distributions stop shipping Python 2.x. Yet LTS operating systems might still lack Python 3 and rely on Python 2. Writing software that needs to stay backwards-compatible for 8 years can be a problem.

The source of the problem

Why is this not an issue for Java or C, but for Python, PHP and Ruby? Java compiles to bytecode that is guaranteed to be stable, and C compiles to machine code. A processor architecture won’t change anymore: an x86 processor supports x86 machine code, and that won’t change with the next software update. If you need to support old C code that modern compilers don’t understand anymore, use an old compiler. Java is similar in that regard. The JVM runtime has a defined set of instructions that won’t be changed anymore. It doesn’t matter which Java compiler you use, because in the end it produces bytecode that runs on any JVM. You still might have problems supporting multiple versions of a library, but at least the JVM will always run your compiled code.

Python and PHP compile to bytecode as well, similar to Java. There is, however, one difference: they do it in memory, and the VM that interprets the bytecode is bundled with the compiler. This is where the backwards-compatibility problem comes into play. You cannot run Python bytecode compiled by Python 3 with a Python 2 interpreter, and you cannot compile with PHP 5 and run the result on PHP 4. Either the interpreter simply fails on your old code, or the VM implementation is not guaranteed to be stable. That means in Python and PHP the underlying machine you compile for might change with the next update. Compare this to the x86 world. What if your next software update changed the x86 instruction set? You would have to recompile all your C code. Some old C code might not compile with modern C compilers, and old C compilers might not be able to target the new instruction set. Sounds painful, particularly if you really care about backwards compatibility.
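Python makes this coupling visible. Every .pyc file starts with a four-byte magic number that identifies the bytecode version of the interpreter that wrote it. An interpreter only uses bytecode that carries its own magic number. If the source file is present, the interpreter recompiles. If the source is missing, the import fails with a “bad magic number” error. The sketch below uses only the standard library (`py_compile`, `importlib.util.MAGIC_NUMBER`). The helper names are made up for illustration.

```python
import importlib.util
import pathlib
import py_compile
import tempfile


def pyc_magic(path):
    """Return the 4-byte magic number stored at the start of a .pyc file."""
    with open(path, "rb") as f:
        return f.read(4)


def compile_and_inspect(source):
    """Compile `source` to a .pyc file and check whether its magic number
    matches the one of the interpreter running this code."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "example.py"
        src.write_text(source)
        pyc = py_compile.compile(
            str(src), cfile=str(pathlib.Path(tmp) / "example.pyc"), doraise=True
        )
        magic = pyc_magic(pyc)
    return magic, magic == importlib.util.MAGIC_NUMBER


if __name__ == "__main__":
    magic, matches = compile_and_inspect("print('hello')\n")
    # The value differs between Python releases, which is exactly the point.
    print(magic.hex(), matches)
```

Running the same script under two different Python versions prints two different magic numbers. The bytecode format is tied to the interpreter release rather than to a stable, separately specified machine.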

Sidenote

I think Python, PHP and others made an architectural mistake. They bundled the VM and runtime with the compiler, so your language version defines your runtime and the underlying machine code. If you write a new language, write down a minimal instruction set, separate your VM from your compiler, and always support that instruction set. This can lead to interesting problems, and the implementation of Java generics is a good example. Nobody thought about generics when defining the instruction set, so the bytecode was not designed to retain information about generic types. That’s why the Java compiler checks the generic type information and then removes it, so that the resulting bytecode stays compatible with old JVM versions. This is known as type erasure. Python and PHP developers would probably just introduce new bytecodes without caring about BC. (Well, PHP devs would just pretend that PHP is a web language and web projects shouldn’t care about BC at all ;)).

Conclusion

If you seriously care about backwards compatibility for LTS systems that are 8 years old, choose a language that separates the VM from the compiler. Languages like Java (and probably C#) do this. Java developers won’t define behavior that requires a new opcode. PHP and Python are wonderful programming languages, but personally I am not sure it is wise to write something like a VCS in such a language.

Long story short: language choice matters for BC. If you write your own language, please separate your VM from your compiler. Better yet (as johannes pointed out), compile to an existing VM like the JVM, the CLR or LLVM.

We recently launched geocommit.com. Geocommit is a service that adds geolocation data to your commits. You only need a working WiFi connection. No GPS module is required.

This blog post shows how to use geocommit and the geocommit.com services. I’ll show how to use geocommits in your Git or Mercurial project and how to make GitHub and Bitbucket more beautiful with our Chrome and Firefox extensions. I’ll also show how to get a fancy map of your geocommits.

What is geocommit

First of all, geocommit is a text format for attaching geolocation data to version control commits. The geocommit website has detailed information about the format.
Second, geocommit is a service to store and analyse your geocommit data. We offer a set of tools and a web service to make geocommit cool. The Git implementation, git geo, runs on Mac OS X and Linux. The Mercurial implementation, hg-geo, runs only on Linux. Mac OS support is under way.

This will enable geocommit support in your project. If you commit something with git commit, git geo will try to get your current location and add a geocommit. If no WiFi connection is enabled, no geocommit will be created.

git geo push accepts the same options as git push. It pulls geocommits first, merges them and then pushes geocommits and the given branch to the remote repository.
That’s everything you need. Easy, isn’t it? So let’s see how to enable geocommits on Mercurial and then talk about the Chrome and Firefox extensions.

Deep dive
git geo stores geocommits in git notes, using the namespace geocommit. Git notes have some nice properties. They are metadata and don’t change the commit hash, so they can be added to a commit at any time. GitHub displays them, and they can be deleted without any problem. You can decide yourself when to push geocommits. You can also delete geocommits that were already pushed without breaking the repository or changing any commit SHA-1. The drawback is that git notes can be awkward to deal with at times. Notes are a new feature in Git and not yet fully supported. We had to write a script to merge notes, because git notes merge is not available before Git 1.7.7.
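The Python sketch below shows why notes leave the commit hash alone. It uses Git’s documented loose-object hashing, sha1(b"<type> <size>\0" + body). A note lives in its own blob that a notes tree references by commit id, so the commit object’s bytes never change. The commit layout is simplified (no signatures or encoding headers). The identity and the note payload are placeholder values and are not the real geocommit format.

```python
import hashlib


def git_object_id(obj_type, body):
    """SHA-1 id Git assigns to an object: sha1(b"<type> <size>\\0" + body)."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree, parents, ident, message):
    """Simplified commit object body (same identity used as author and committer)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return ("\n".join(lines) + "\n").encode()


EMPTY_TREE = git_object_id("tree", b"")

if __name__ == "__main__":
    ident = "Jane Doe <jane@example.com> 1300000000 +0100"  # placeholder identity
    body = commit_body(EMPTY_TREE, [], ident, "Initial commit")
    commit_id = git_object_id("commit", body)

    # A note is a separate blob, referenced from the notes tree by commit id.
    note_id = git_object_id("blob", b"placeholder geocommit payload\n")
    print(commit_id, "->", note_id)

    # Putting the data into the message instead would change the commit id.
    edited = commit_body(EMPTY_TREE, [], ident, "Initial commit\n\nplaceholder geo data")
    print(git_object_id("commit", edited) != commit_id)  # True
```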

Mercurial & geocommit

You can add support for geocommits to Mercurial by installing the hg-geo extension. Clone the extension and enable it in your hgrc:

Deep dive
Mercurial doesn’t have a way to store metadata, so we add the geocommit data to the commit message itself. The obvious advantage is that you can use hg-geo with plain Mercurial. You do not need to enable hg-geo on the remote side to push geocommits, as you would with Mercurial bookmarks. The disadvantage is that we modify the commit message and therefore the commit hash. There is no easy way to delete geocommits once they are created.
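The following simplified Python sketch shows why editing the message changes the hash. A Mercurial changeset id is the SHA-1 of the two parent ids in sorted order followed by the revision text, and the changelog entry’s text includes the commit message. The entry layout in `changelog_text` is abbreviated. The manifest id, user, date and appended “geo” line are placeholder values and are not the real hg-geo format.

```python
import hashlib

NULLID = b"\0" * 20  # id of the null revision, used for missing parents


def hg_node(text, p1=NULLID, p2=NULLID):
    """Revlog-style node id: sha1 over sorted parent ids, then the text."""
    a, b = sorted([p1, p2])
    return hashlib.sha1(a + b + text).hexdigest()


def changelog_text(manifest, user, date, files, description):
    """Abbreviated changelog entry: header lines, blank line, description."""
    header = "\n".join([manifest, user, date] + sorted(files))
    return (header + "\n\n" + description).encode()


if __name__ == "__main__":
    common = ("0" * 40, "jane", "1300000000 0", ["README"])  # placeholders
    plain = changelog_text(*common, "Fix typo")
    geo = changelog_text(*common, "Fix typo\n\ngeo: placeholder location data")
    # The description is part of the hashed text, so the ids differ.
    print(hg_node(plain) != hg_node(geo))  # True
```

This is the trade-off described above. The data travels with plain Mercurial, but it is baked into the changeset’s identity.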

bitbucket.org and github.com

We can push geocommits easily now. But how do we use them? Install the Firefox or Chrome extension, and it will display a map next to your commit!

Firefox
To install the geocommit extension for Firefox, you need Greasemonkey. Greasemonkey is a well-known and well-supported extension that lets user scripts safely modify the displayed website.

Install Greasemonkey from userscripts.org. You can then browse bitbucket.org or github.com and see a map of your geocommit:

Post Hook

We offer a POST hook that you can use with github.com and bitbucket.org. geocommit.com will track your commits. We will create a global map and a project-specific map and will provide further analytics as soon as possible.

github.com
To install the hook, go to the admin section of your repository and select Service Hooks.

Add http://hook.geocommit.com/api/github as a POST service hook.

bitbucket.org
Go to the admin section of your repository and select Services.

If you want to review remote Mercurial changes offline, you cannot use hg incoming. Fortunately there is a nice way to do it. Here is what I do to fetch changes from a repository and review them later without pulling them into my repository first. It also lets you review only the changesets that touch a given file, which hg incoming cannot do.

The -R incoming.bundle option tells Mercurial to use the bundle as an overlay for the current repository. The --no-merges option tells Mercurial not to display merges (I usually use it when reviewing patches), and the -p option prints the patches in the output. I use -- hgext/bookmarks.py to display only the changesets related to the bookmarks extension.
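The original commands are not shown here, so the following Python sketch is a reconstruction under assumptions. It stores the incoming changesets with hg incoming --bundle and then runs hg -R <bundle> log with the options described above. The helper names and the default bundle file name are hypothetical. Run it from inside the local repository, because Mercurial overlays the bundle on the repository in the current directory.

```python
import shutil
import subprocess


def fetch_bundle_cmd(source, bundle="incoming.bundle"):
    """Store incoming changesets in a bundle file without pulling them."""
    return ["hg", "incoming", "--bundle", bundle, source]


def review_cmd(bundle="incoming.bundle", path=None):
    """Log the repository overlaid with the bundle: no merges, with patches,
    optionally restricted to changesets touching one file."""
    cmd = ["hg", "-R", bundle, "log", "--no-merges", "-p"]
    if path is not None:
        cmd += ["--", path]
    return cmd


def review_offline(source, path=None, bundle="incoming.bundle"):
    """Run both steps. Requires the hg executable on PATH."""
    if shutil.which("hg") is None:
        raise RuntimeError("hg executable not found")
    fetched = subprocess.run(fetch_bundle_cmd(source, bundle))
    # hg incoming exits non-zero when there is nothing new (or on error).
    if fetched.returncode != 0:
        return None
    result = subprocess.run(review_cmd(bundle, path), capture_output=True, text=True)
    return result.stdout
```

For example, review_offline(url, "hgext/bookmarks.py") would fetch a bundle and return the log of the changesets that touch the bookmarks extension. Once the bundle is on disk, you can rerun review_cmd’s command without network access.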

Bookmarks is an extension to the Mercurial SCM, which adds git-like branches to Mercurial. The extension is distributed together with Mercurial.
Recently the extension has received a major update. Time to look back.

I really enjoy giving talks, particularly because I like to teach people and because I’m really enthusiastic about the technical topics I talk about. One of these topics is obviously decentralized version control systems, in particular Git and Mercurial. After two years of submitting talks to various conferences, people and conferences in the PHP community are finally starting to pick up this topic. It seems that 2010 is the year of DVCS, and I’m really looking forward to giving a talk about the advanced features of Git at

The talk will give a very brief overview of how Git works. It will then take a more detailed look at how Git handles commits, files, etc., so that people understand the concepts behind tools like git rebase, git reflog and git svn. The aim is to give attendees all the necessary information and a few examples. With that, they can recover lost commits, rebase their branches and design more complex Git workflows without searching the web or asking a guru.

A second talk will focus on beginners and developers coming from Subversion. It will be part of a series of talks that Deutsche Telekom is organizing. I’ll also give an extended version of it as an in-house workshop at a Munich-based company.

So it seems to me that, after five years, DVCS is mature enough to get into companies. We can expect a wide variety of companies to adopt new tools and workflows. Let’s see what’s coming…

Of course I still offer Git and Mercurial trainings, so feel free to contact me (dsp ~at~ php ~dot~ net).

Using a decentralized version control system (DVCS) like Mercurial (hg) or Git as a client for Subversion is very common. A DVCS gives a developer offline development and local branching, and the developer can still push to a Subversion server. This approach is often used in environments where Subversion is the mandated version control system. git-svn and hgsubversion provide this bi-directional push and pull mechanism, and it works perfectly for a single developer. It has limitations, however, when a team uses the usual DVCS push and pull concepts.

The following article outlines the current limitations of bi-directional DVCS-to-Subversion bridges and shows a simple approach to solving one instance of the problem.