2010-11-06

Some days ago, youtube-dl, my most popular project, moved from being managed using Mercurial at bitbucket.org to being managed using Git at github.com. Since the move, I’ve been wanting to write something about it. I’ve also been wanting to rewrite the program partly or completely every time I look at its source code, but that’s a different matter. Back to the main topic.

I should start by apologizing to anyone who thinks this is a bad move either because they may have to rebase all their work in a new repository, bringing all their changes back, or simply registered at bitbucket.org to follow the project. It currently has 17 forks and 100 followers, and I’m pretty sure some of them registered there just to follow youtube-dl, and the move to github.com is, if anything, a problem because they would have to create an account somewhere else to continue following the project. Again, apologies to anyone for whom the move has no practical aspects.

That said, I’d like to explain why I made the move. You may recall I wrote an article some time ago about Mercurial vs. Git. Apart from explaining what I considered were the main differences between the two, I also wanted to express my indecision about which one was better. While I think Mercurial is and was great, the balance has been leaning towards Git for some time now, and I tend to use Git for all my personal projects. Many of the reasons, if not all of them, have been expressed by other people in the past. It’s a good moment to quote a very well known blog post from Theodore Tso, written in 2007 when he was still planning to migrate e2fsprogs to Git from Mercurial:

The main reason why I’ve come out in favor of git is that I see its potential as being greater than hg, and so while it definitely has some ease-of-use and documentation shortcomings, in the long run I think it has “more legs” than hg, […]

I think that paragraph describes with great accuracy what I think too. In the medium and long run, Git’s problems almost vanish. Its documentation was a bit poor back then, but people have been writing more and more about Git and there are a few very good resources to learn its internals and basic features. Furthermore, once you have a simple idea about its internals and use it daily, you no longer need that much documentation. If you’re not sure how to do something, chances are a simple web search will tell you how to do what you wanted to achieve.

Also, as many people know, Mercurial was and is mostly about not modifying the project’s history, while Git has a lot of commands that directly modify the project’s history. With time, I’ve come to realize that modifying the project’s history is simply more practical in many cases and in a range of situations it leads to less confusion. In my day job, we are slowly moving from CVS to Subversion to manage the sources of a very old and important project, which exists since about 1984. At the same time, we are modifying our work flow here and there to take advantage of Subversion, and we’re heavily using branching and merging despite the fact that’s not one of Subversion’s greatest strengths, as you may know. That’s giving us some problems and it’s amazing how many times I caught myself thinking “this would be much easier if we were using git, because we would simply do this and that and job done”. Many of those actions would modify the project’s history and clean it up. I repeat, in real situations with a lot of people working on something and not doing everything exactly as it should be done, it’s only a matter of time that you miss a Git feature.

The only thing I don’t like about Git is its staging area. From a technical perspective, the staging area makes a lot of sense, and you can build many neat features based on it. However, one thing is having a staging area and a second thing is exposing it to end users. I think you can have a staging area and all the features it provides while hiding it from users in their most common work flows. Still, it’s something you get used to and everybody knows that, when your project is a bit mature, you spend way more time browsing the source code, debugging, running it and testing it than actually committing changes to the source tree. The staging area is not a big issue and “git commit -a” covers the most common cases.

Apart from Git itself, the move was partly motivated by the site, github.com. When I started using bitbucket.org I liked it a bit more than github.com, but things have changed slowly. github.com fixed a rendering bug that hid part of project top bar, got rid of its Flash-based file uploader and got an internal issue tracker with a web interface that works really really well. The site is very nice and the “pages” feature, that allows you to set up a simple web page for the project, is still not provided by bitbucket.org as far as I know. In addition, with the arrival of Firesheep, it quickly moved to using SSL for everything. It’s fantastic.

bitbucket.org was recently bought by Atlassian and their plans are indeed better. For me, however, the number of private repositories and private collaborators is not an issue, because all the projects I host on github.com are public. Still, it’s fair to mention their plans because it could be a deciding factor for some people.

I wouldn’t like to close this article without mentioning the big improvement that both sites bring to the typical free and open source software developer. I still host a few projects on sourceforge.net, and I can tell you I’m not going back to it despite the great service they have provided for years for which I thank them sincerely.

It’s been months since I last used it so I apologize if things have changed without me noticing, but back then it was very hard to get your code on sourceforge.net. You didn’t perceive it was hard because there was no github.com. Once you try github.com or bitbucket.org, you realize how much the process can be simplified. Two key aspects to note. First, the project name doesn’t have to be unique. It only needs to have a unique name among your own projects, which is much easier and simplifies choosing the project name a lot. Second, once the project is created and has a basic description, without filling any form and without having to wait for anything, you are only a few commands away from uploading your code to the Internet. It can literally take less than 5 minutes to create a project and have your code publicly available, and that’s fantastic and motivating. You don’t need to find time to upload your code or thinking if the process is worth it for the size of the project. You simply do it. That’s good news for everyone.

Let me finish by apologizing again to anyone for the inconveniences created by the move. I sincerely hope this will remain the project location for many years to come.