Forking responsibly

A previous post discussed surviving legacy code. Browsers (or the rendering engines within them) represent an interesting case of the mission-critical code described in that post. A few folks noticed yesterday that Google has started a new rendering engine based on the WebKit project (“This was not an easy decision,” according to the announcement).

Relative to moving legacy code forward, this raises some interesting product development challenges. This blog focuses on product development and the tradeoffs that invariably arise. It is definitely not about being critical of, or analyzing, choices made by others, as there are many other places to gain those perspectives. Still, it is worth looking at actions through the lens of the product development discipline.

In this specific case there is an existing code base, legacy code, and a desire to move the code base forward. Expressed in the announcement, however briefly, is the architectural challenge of maintaining the multi-process architecture. Relative to the taxonomy from the previous post, this is a clear case of the challenges of moving an architecture forward. The challenge is pretty cut and dried.

The approach taken looks very much like a break in the evolution of the code base, a “fork” as some have described it. Also at work are efforts after forking to delete unused code, which is another technique for managing legacy code described previously. These are perfectly reasonable ways to move a code base forward, but they also come with some challenges worth discussing.

What the fork?

(OK, I couldn’t resist that, or the title of this post).

Forking a code base is not just something one can do in the open source world, though it has something of a special meaning there. It is a general practice applicable to any code base. In fact, robust source code control systems deliberately support forks because that is how one experiments on a code base, evolves it asynchronously, or just maintains distinct versions of the code.

A fork can be a temporary state, sometimes called a branch when there are several and the intent to be temporary is clear. This is what one does to experiment on an alternate implementation or a new feature. After the experiment the changes are merged back in (or not) and the branch is closed off. Evolution of the code base moves forward as a singular effort.

A fork can also be permanent. This is where one can either reap significant benefits or introduce significant challenges, or both, in evolving the code. One can imagine permanent forks that look like one of two cases:

In the first case, the two paths stay in parallel. That’s an interesting approach. It is essentially saying that the code will do the same thing, but differently. You would use this approach if you wanted to maintain two variations of the same product but have different teams working on them. The differences between the two forks are known and planned. There’s a routine process for sharing changes as each of the branches evolves. In many ways, one could view the current state of WebKit as this state since at no point is there a definitive version in use by every party. You might just call this type of fork a parallel evolution.

In the second case, the two paths diverge and diverge more over time. This too is an interesting approach. This type of fork is a one-time operation and then the evolution of each of the branches proceeds at the discretion of each development team. This approach says that the goals are no longer aligned and different paths need to be followed. There’s no limitation to sharing or merging changes, but this would happen opportunistically, not systematically. Comments from both resulting efforts of the WebKit fork reinforce the loosely coupled nature of the fork, including deleting the code unused by the respective forks along with a commitment to stay in communication.
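To make the difference between the two cases concrete, here is a toy sketch. It is not how any real source control system works: a code base is reduced to a set of change names, and `CodeBase`, `share`, `divergence`, and `simulate` are all made up for illustration. It also oversimplifies the parallel case by sharing every change on every round, where a real parallel fork would keep its known, planned differences.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CodeBase:
    """Hypothetical model: a code base is just a named set of changes."""
    name: str
    changes: set[str] = field(default_factory=set)

    def fork(self, name: str) -> CodeBase:
        # A fork starts life as a full copy of the existing code.
        return CodeBase(name, set(self.changes))

    def commit(self, change: str) -> None:
        self.changes.add(change)


def share(a: CodeBase, b: CodeBase) -> None:
    """Routine two-way sharing: each side picks up the other's changes."""
    merged = a.changes | b.changes
    a.changes = set(merged)
    b.changes = set(merged)


def divergence(a: CodeBase, b: CodeBase) -> int:
    """Count the changes that exist in exactly one of the two code bases."""
    return len(a.changes ^ b.changes)


def simulate(rounds: int, sync: bool) -> list[int]:
    """Fork once, let both teams commit each round, and record divergence."""
    original = CodeBase("original", {"base-1", "base-2"})
    forked = original.fork("fork")
    history = []
    for i in range(rounds):
        original.commit(f"original-change-{i}")
        forked.commit(f"fork-change-{i}")
        if sync:
            share(original, forked)  # systematic sharing (parallel evolution)
        history.append(divergence(original, forked))
    return history


if __name__ == "__main__":
    print(simulate(3, sync=True))   # [0, 0, 0]
    print(simulate(3, sync=False))  # [2, 4, 6]
```

With systematic sharing the two paths stay together. Without it, every round of independent work widens the gap, which is the second case in miniature.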

For any given project, both of these could be appropriate. In terms of managing legacy code, both are making the statement that the existing code is no longer on the right evolutionary path—whether this is a technical, business, or engineering challenge.

Forking is a revolutionary change to a code base. It is sort of the punctuation in a punctuated equilibrium. It is an admission that the path the code and team were on is no longer working.

Maintaining functionality

The most critical choice to make when forking code is to have an understanding of where the functionality goes. In the taxonomy of managing legacy code, a fork is a reboot, not a recast.

From a legacy code perspective, the choice to fork is the same as a choice to rewrite. Forking is just an expedient way to get started. Rather than start from an empty source tree, one can visualize the fork as a tree copy of all the existing code to a new project and a fast start. This isn’t cheating. It can be a big asset or a big liability.

As an asset, if you start from all the same existing code then the chances of being compatible in terms of features, performance, and quality are pretty high. Early in the project your code base looks a lot like the one you started from. The differences are the ones you immediately introduce: deleting code you don’t think you need, rewriting some parts critical to you, refactoring/restructuring for better engineering. All of these are software changes and that means, definitionally, there will be regressions relative to the starting point.

On the other hand, a fork done this way can also introduce a liability. If you start from the same code you were just using, then of course you bring with it all the architecture and features that you had before. The question becomes: what were you moving away from? What was it that could not be worked into the code base the way it stood? The answers to these questions can provide insights into the balance between maintaining exact functionality out of the gate and how fast and well you can evolve towards your new goals down the road.

In both cases, the functionality of the other fork is not standing still (though on a project where your team controls both forks, you can decide resource levels or amount of change tolerated in one or the other fork). The functionality of the two code bases will necessarily diverge just because everything would need to be done twice and the same way, which will prove to be impossible. In the case of WebKit it is worth noting that it was derived from a fork of KHTML, which has since had a challenging path (see http://en.wikipedia.org/wiki/WebKit).

Point of view required

As said, the process of rebooting via any means is a perfectly viable way to move forward in the face of legacy code challenges. What makes it possible to understand a decision to fork is having (or communicating) a point of view as to why a fork (a reboot, rewrite) is the right approach. A point of view simply says what problem is being solved and why the approach solves the problem in a robust manner.

To arrive at such a conclusion, the team needs to have an open and honest dialog about the direction things need to go and the capabilities of the team and existing code to move forward. Not everyone will agree. Engineers are notoriously polarized, or some might say “religious”, at moments like this. Those who wrote the code are certain they know how to move it forward. Those who did not write the code cannot imagine how it could possibly move forward. All want ways to code with minimal distraction from their highest priorities. Open minds, experimentation, and sharing of data are the tools for the team to use to work (and work it is) toward a shared approach for the fork to succeed.

If the team chooses a reboot, the critical information to articulate is the point of view of “why”. In other words, what assumptions about the existing code are no longer valid in some new direction or strategy? Just as critical are the new bets or new assumptions that will drive decision making.

This is not a story for the outside world, but is critical to the successful engineering of the code. You really need to know what is different—and that needs to map to very clear choices where one set of assumptions leads to one implementation and another set of assumptions leads to very different choices. Open source turns this engineering dialog into an externally visible dialog between engineers.

Every successful fork is one that has a very clear set of assumptions that are different from the original code base.

If you don’t have a set of assumptions that is clearly different in the eyes of the developers doing the work, then the chances are you will just be forked and not really drive a distinct evolutionary path in terms of innovation.

Knowing this point of view – what are the pillars driving a change in code evolution – turns into the story that will get told when the next product releases. This story will not only need to explain what is new, but ultimately as a matter of engineering, will need to explain to all parties why some things don’t quite work the way they do with the other fork, past or present at time of launch.

If you don’t have this point of view when you start the project, you’re not going to be able to create one later in the project. The “narrative” of a project gets created at the start. Only marketing and spin can create a story different than the one that really took place.

2 Responses

As a recent participant in an Apache project, it was startling, and then satisfying, to me that in the DNA of the Apache Software Foundation forking is a feature. That does not mean folks don’t get their noses out of joint over forks. Nevertheless, efforts to block forks are counter-acted.

That there be responsible forking is certainly important for anyone who is doing so for anything beyond private entertainment.