Part 1: analysis of CGit and JGit

In this part, we describe the current state of CGit and JGit with regard to command line merging and its customization possibilities, then outline possible evolutions that would provide logical model support for these merge operations.

This part focuses on two different command line git providers: CGit, the most widely used front end to git, written in C, and JGit, the Java alternative developed within the Eclipse Foundation. We will not consider other potential command line providers. This document calls "logical model" a set of inter-related physical files that must always be considered together to read a unit of information.

Problem statement

We need to be able to launch merge operations from the command line while having support for logical model merging. When launching operations with inner merges from either of the two front ends, the standard behavior is to rely on textual, file-by-file merging. This may cause issues with logical models: a text merge might fail with conflicts although there is no logical conflict, or it might succeed even though there were logical conflicts (in which case the resulting model ends up corrupted and unloadable).

Git operations that will involve file merging are the following:

merge

cherry-pick

pull

rebase

revert

stash apply

submodule update

The merge operation is in charge of modifying the files and adding the necessary information in the index when conflicts are detected so that the merge tool can then be called on these files to solve said conflicts. Currently, git merge operations allow for the customization of the merge drivers.

A merge driver's responsibility is to handle the merge of a single file when it is not considered to be a trivial operation by the merge strategy. As such, drivers will be called on a file basis and only during non-trivial merges. With regard to logical model merging, this is far from sufficient since:

it works on a file-by-file basis (a merge driver can only modify the single file it has been called on, and thus cannot account for larger logical models), and

this mechanism is only called for non-trivial merges... in the textual sense. For example, deleting a file is considered a trivial merge by the default merging strategies available from git, whereas deleting part of a logical model mandates changes in other parts of said model, not to mention a lot of potential logical conflicts that are not reflected as textual conflicts.
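To make the file-by-file limitation concrete, here is a minimal sketch of a merge driver written in shell. It assumes the driver is registered in the git configuration with a command line ending in %O %A %B, which git replaces with temporary files holding the ancestor, current and other versions of one single path. The function name whole_file_driver and its all-or-nothing policy are purely illustrative and are not part of the proposal.

```shell
#!/bin/sh
# Hypothetical merge driver: git would call it with three temporary files
# (ancestor %O, current %A, other %B) that all belong to ONE path.
# The result must be left in the "current" file; the return status is 0 for
# a clean merge and non-zero for a conflict.
# Note that no other file of the logical model is visible from here.
whole_file_driver() {
    ancestor=$1 current=$2 other=$3

    if cmp -s "$current" "$other"; then
        return 0                      # both sides made the same change
    elif cmp -s "$ancestor" "$other"; then
        return 0                      # only our side changed: keep ours
    elif cmp -s "$ancestor" "$current"; then
        cp "$other" "$current"        # only their side changed: take theirs
        return 0
    fi
    return 1                          # both sides changed: report a conflict
}
```

Whatever logic is put in such a driver, its inputs stay limited to these three versions of a single file, which is why it cannot propagate a change to the other files of a model.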

The merge tool will also be called file-by-file, and only on the files that have been marked as conflicting by the merge operations. It is thus too late to handle the logical models as a whole, since trivial merges may already have corrupted them, and the conflicts detected were textual conflicts that did not take models into account at all.

However, even if we were able to merge multiple conflicting files at once through an individual merge tool, the “git mergetool” command would still fail, since it could then iterate on files that have already been merged through the logical model of another. We'll come back to this issue in the specification section.

We thus need to plug ourselves in before the merge tool can kick in, and at a higher level than what the merge drivers allow. The only potential candidates for such a pluggable behavior are the merge strategies themselves, since they are what decides what a trivial merge is and what is not.

Customizing merge strategies

Current Possibilities

CGit

CGit offers five distinct merge strategies available by default:

recursive

octopus

ours

resolve

subtree

Though all five strategies have their own specific uses, all are textual and none of them handles logical models. The recursive strategy is the default when merging two distinct commits, octopus being the default when merging more than two. The other three are only available, by name, through specific options of the commands. For example:

git merge -s ours <commit1> <commit2>

or

git cherry-pick --strategy ours <commit>

Trying to use any strategy other than the default five will end in a failure from the command line:

However, although this is undocumented, CGit allows users to add new, customized merge strategies to the list of available ones under the following conditions:

the merge strategy is implemented in shell,

the shell script implementing the strategy is named git-merge-<name>, where <name> is the name to use for this strategy, and

that shell script is available either in the same folder as the git command itself, or within the current user's PATH.
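As a minimal sketch of these conditions, a script named git-merge-logical (the name "logical" is hypothetical) could start with the argument handling below. It assumes, based on the stock shell strategies such as git-merge-resolve, that git invokes a strategy as git-merge-<name> <base>... -- <head> <remote>... and that an exit status of 2 means "this strategy cannot handle the merge". Since the mechanism is undocumented, both assumptions should be checked against the git version in use.

```shell
#!/bin/sh
# Sketch of the argument handling of a hypothetical git-merge-logical script.
# Assumed calling convention: <base>... -- <head> <remote>...
parse_strategy_args() {
    bases= head= remotes= sep_seen=
    for arg
    do
        if [ -z "$sep_seen" ]; then
            if [ "$arg" = "--" ]; then
                sep_seen=yes
            else
                bases="$bases$arg "
            fi
        elif [ -z "$head" ]; then
            head=$arg
        else
            remotes="$remotes$arg "
        fi
    done
}

logical_strategy() {
    parse_strategy_args "$@"
    # More than one remote is an octopus merge, which is out of scope here.
    case "$remotes" in
    ?*' '?*) return 2 ;;
    esac
    [ -n "$head" ] && [ -n "$remotes" ] || return 2
    # Placeholder: the logical merge itself would be delegated from here.
    echo "would merge ${remotes% } into $head (bases: ${bases% })"
}
```

A real script would end with a call such as logical_strategy "$@" and propagate its status. Once installed under that name, it would presumably be selected with git merge -s logical <commit>.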

JGit

JGit provides four merge strategies available by default:

recursive

resolve

ours

theirs

Once again, none of these four strategies can handle logical models since they all operate on a textual level. recursive is always the default regardless of the merge operation that is to be performed. The other three are only available through specific options of the merge-involving commands. For example:

jgit.sh merge -s ours <commit1> <commit2>

JGit does not allow for customized merge strategies to be used by the commands. Furthermore, it does not provide most of the commands involving merge operations. Of the seven such operations we previously listed, only “merge” is provided by JGit.

Specifications

First and foremost, take note that none of this can be contributed back to either CGit or JGit, since it involves changes that are too deeply rooted and too Eclipse-specific.

CGit

Implement a custom merge strategy

Since providing customized merge strategies is already possible, what we need to do here boils down to implementing a custom merge strategy in Shell.

In order to determine whether files are part of a larger logical model while remaining compatible with previous developments and existing tools, we need these files to be part of an Eclipse workspace. As such, this strategy must be able to check whether there are Eclipse projects in the repository on which it has been called, then launch a headless Eclipse with a temporary workspace containing these projects.

Any file for which there is an existing logical merger will then be handled from within this Eclipse container, while the merging strategy should be able to fall back to the default (recursive) strategy for all other files.
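The detection step described above could be sketched as follows. The sketch relies on the fact that an Eclipse project keeps a .project descriptor file at its root; the function name find_eclipse_projects is hypothetical, and launching the headless Eclipse itself is deliberately left out.

```shell
#!/bin/sh
# List the directories under a repository root that look like Eclipse
# projects, i.e. that directly contain a ".project" descriptor file.
# The .git directory is pruned so that repository internals are not scanned.
find_eclipse_projects() {
    repo_root=$1
    find "$repo_root" -name .git -prune -o -type f -name .project -print |
    while IFS= read -r descriptor
    do
        dirname "$descriptor"
    done | sort
}
```

An empty result would mean that the strategy can fall back to the default (recursive) strategy right away, without paying the cost of starting Eclipse.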

This cannot and will not handle octopus merges.

Implement a custom merge tool

The merge tool will be called on each file whose merge failed with conflicts. Since we've used our own custom merge strategy, these conflicts will have been properly detected as either logical conflicts (merge handled by the logical model merger) or textual conflicts (for files which didn't have a model merger).

The standard git mergetool command will execute the individual merge tools (specified through git attributes) sequentially on each file in conflict in the repository. When we detect conflicts on a logical model, we mark all the files constituting that logical model as being in conflict, which means that the merge tool would be launched several times in a row on that logical model, once for each of its files (three times for a model made of three files).

We thus cannot rely on the standard git mergetool command. What we propose here is to implement a new git command that will provide the same functionality as the standard merge tool while being capable of handling logical models. This new command could be called logicalmergetool and it would thus be callable through the command git logicalmergetool. This must be implemented in Shell.

Once again, the custom merge tool will need to check if the underlying repository contains Eclipse projects, but this time it will need to launch a full-fledged Eclipse (not a simple headless application) with a temporary workspace containing these files. The user will have to manually launch EGit's merge tool (Right-click > Team > Merge Tool) on the files in conflict to solve the issues and Add (Right-click > Team > Add) the resolved file to the index from there.

Any file that is in conflict but that is not contained in an Eclipse project will be handled by the custom merge tool command through a fall back to the standard individual merge tool defined for this file by the gitattributes.
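The dispatch between the two cases could look like the sketch below. The function names are hypothetical, and the list of conflicting files is taken as arguments; in a real script it could for instance come from git diff --name-only --diff-filter=U. The sketch only prints which path each file would follow, it does not launch any tool.

```shell
#!/bin/sh
# Succeeds when the given file lives inside an Eclipse project, i.e. when
# one of its parent directories (up to the repository root) holds a
# ".project" file. Both arguments are expected to be absolute paths.
is_in_eclipse_project() {
    repo_root=${1%/} dir=$(dirname "$2")
    while :
    do
        [ -f "$dir/.project" ] && return 0
        [ "$dir" = "$repo_root" ] && return 1
        [ "$dir" = "/" ] && return 1
        [ "$dir" = "." ] && return 1
        dir=$(dirname "$dir")
    done
}

# Hypothetical dispatch: print which tool each conflicting file would go to.
dispatch_conflicts() {
    repo_root=$1; shift
    for file
    do
        if is_in_eclipse_project "$repo_root" "$file"; then
            echo "eclipse $file"
        else
            echo "mergetool $file"
        fi
    done
}
```

Files reported as "eclipse" would be left to the user inside the launched Eclipse, while the others would go through the individual merge tool configured for them.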

JGit

Implement the missing merge-involving commands

There are already implementations of most of these commands in JGit, though the wrappers that allow these commands to be called from the command line JGit front end are missing. We need to implement wrappers for:

cherry-pick

pull

rebase

revert

stash apply

submodule update

Implement a means for command line users to register custom merge strategies

JGit only looks up its own registry for the merge strategies, without allowing a user to register new ones in there.

We need either to implement a new lookup so that JGit searches for strategies within the user's PATH, or to let the user register new strategies, through the repository's configuration or from the command line itself.

Implement a custom merge strategy

Once we have a means to register it, we'll need to implement a new custom strategy for JGit. The requirements for this will be very similar to what we previously outlined for the CGit variant.

This strategy needs to check whether the target repository contains Eclipse projects, then launch a headless Eclipse with a temporary workspace containing these projects. From there, it will look up each file's specific model merger and use it, or fall back to standard git merging for any file that is not part of a logical model, or whose model does not provide a custom logical merger.

This cannot and will not handle octopus merges, all the more so since JGit does not support them natively.

Implement a custom merge tool

JGit does not provide a mergetool command yet. We still propose to use a name distinct from mergetool, since this will not be contributed back to the project and JGit will most likely implement its own in the future to reflect what exists in other git front ends.

This task will be very similar to the same one that could be undertaken with CGit, with the same constraints to uphold apart from the coding language, since this one can be implemented directly in Java.

Part 2: general workflow and initial prototype

When users want to compare or merge EMF models from the command line, they need to do so in an Eclipse environment similar to the one used to create these models. As such, the environment requires some plugins to be installed, but it may also require some preferences to be set, some perspective to be activated, etc. Among these plugins are the mandatory ones that will be used to perform the compare/merge operation: EMF Compare and EGit. Several options are possible to provision such an environment.

The first one is the manual way. It is necessary to download the Eclipse environment and install all the required plugins. Then, the git repository(-ies) that contain the models have to be cloned and bound to the Eclipse environment. All these tasks have to be done manually, on each computer that needs to execute a comparison or a merge. Finally, it is necessary to write a program that launches and manages the comparison/merge from the command line interface.

The second one is the programmatic way. All the tasks done manually in the first method have to be done programmatically in this one. That means we need to find a way to let the user specify what to provision in an Eclipse environment. It can be a very long and tedious development that involves many different APIs. The advantage of this method is that the final program only has to be executed on each computer that needs to run a comparison or a merge; there are no further manual tasks.

Eclipse Oomph is a technology that can provision a set of plugins in an Eclipse IDE, clone Git repositories, bind Git repositories to this IDE, check out projects, set workspace preferences, and so on. The configuration is model driven, with files called Oomph setup model files. As such, Oomph seems to be a good framework on top of which we could implement the compare and merge command line. We only have to call the Oomph APIs instead of calling many different APIs from many technologies.

We think the Eclipse Oomph technology is the most appropriate for this need in terms of cost, time, maintainability, reliability and performance.

New shell commands

We will initially develop new shell scripts that will add new commands to git:

git logicalmerge

git logicaldiff

git logicalmergetool

To enable these commands, the scripts must be added on each computer that needs to run logical git operations from the command line interface.

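On Linux systems, to create a new git command named logicalmerge, the script must be named git-logicalmerge, since git resolves a command it does not know to an executable called git-<command> (a file named git-logicalmerge.sh would only be reachable as git logicalmerge.sh). The scripts then have to be reachable from your PATH and have execution permissions.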
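As an illustration, git runs an executable called git-<name> found on the PATH when asked for git <name>, so installing a new command comes down to the few steps sketched below. The function name install_logicalmerge_stub and the stub body are hypothetical; the real script would wrap the Eclipse standalone application.

```shell
#!/bin/sh
# Create a placeholder git-logicalmerge script in the given directory and
# make it executable. The script body is only a stub for illustration.
install_logicalmerge_stub() {
    bin_dir=$1
    mkdir -p "$bin_dir" || return 1
    cat > "$bin_dir/git-logicalmerge" <<'EOF'
#!/bin/sh
echo "logicalmerge called with: $*"
EOF
    chmod +x "$bin_dir/git-logicalmerge"
}
```

For example, after install_logicalmerge_stub "$HOME/bin" and with $HOME/bin on the PATH, running git logicalmerge mySetupModel.setup topic should reach the stub.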

Basically, each command will mimic its non-logical counterpart. Each takes an additional mandatory parameter: an Eclipse Oomph setup model file describing the environment in which the compare/merge operation should be handled. Initially, we will handle only a subset of the standard parameters of the counterpart commands.

git logicalmerge

The logicalmerge command is the logical version of the git merge command. To see a full description of the git merge command, please visit http://git-scm.com/docs/git-merge.

The command is specified as below:

git logicalmerge <setup> <commit>

Assume the following history exists and the current branch is master:

      A---B---C topic
     /
D---E---F---G master

Then git logicalmerge mySetupModel.setup topic will replay the changes made on the topic branch since it diverged from master (i.e., E) until its current commit (C) on top of master, and record the result in a new commit along with the names of the two parent commits and a log message from the user describing the changes.

      A---B---C topic
     /         \
D---E---F---G---H master

You can also replace the topic branch name with its commit id: git logicalmerge mySetupModel.setup 87ad5ff
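The command line described above could be validated along the following lines before anything costly is started. This is only a sketch: the function name parse_logicalmerge_args and the error messages are assumptions, and only the two positional parameters of the specification are handled.

```shell
#!/bin/sh
# Validate the command line of the proposed "git logicalmerge <setup> <commit>".
# Returns 0 and sets $setup and $commit, or prints a message and returns 1.
parse_logicalmerge_args() {
    if [ "$#" -ne 2 ]; then
        echo "usage: git logicalmerge <setup> <commit>" >&2
        return 1
    fi
    setup=$1 commit=$2
    if [ ! -f "$setup" ]; then
        echo "error: setup model file not found: $setup" >&2
        return 1
    fi
}
```

The <commit> value is kept as given (branch name or commit id), since resolving it is left to the git operation itself.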

git logicaldiff

The logicaldiff command is the logical version of the git diff command. To see a full description of the git diff command, please visit http://git-scm.com/docs/git-diff.

The command is specified as below:

git logicaldiff <setup> <commit> [<commit>] [-- <path>]

To see the changes between a revision and the HEAD revision, you should omit the second commit.

git logicaldiff <setup> <commit> [--] [<path>...]

In all cases, the [-- <path>] option restricts the diff to the files that match <path>.

In all cases, <commit> can refer to a branch name or a commit id.
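A possible way to split such a command line into its parts is sketched below. The function name parse_logicaldiff_args is hypothetical, and the sketch makes two simplifying assumptions: values are accumulated in space-separated strings, so paths containing spaces are not handled, and a missing second commit is replaced by HEAD as described above.

```shell
#!/bin/sh
# Split "<setup> <commit> [<commit>] [-- <path>...]" into $setup, $commits
# and $paths. Returns 1 when no commit or more than two commits are given.
parse_logicaldiff_args() {
    [ "$#" -ge 2 ] || return 1
    setup=$1; shift
    commits= paths= sep_seen=
    for arg
    do
        if [ -n "$sep_seen" ]; then
            paths="$paths$arg "
        elif [ "$arg" = "--" ]; then
            sep_seen=yes
        else
            commits="$commits$arg "
        fi
    done
    case "$commits" in
    '') return 1 ;;                        # no commit at all
    *' '*' '?*) return 1 ;;                # more than two commits
    *' '?*) ;;                             # exactly two commits
    *) commits="${commits}HEAD " ;;        # one commit: compare with HEAD
    esac
}
```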

git logicalmergetool

The logicalmergetool command is the logical version of the git mergetool command. To see a full description of the git mergetool command, please visit http://git-scm.com/docs/git-mergetool.

The command is specified as below:

git logicalmergetool <setup>

Runs logical merge conflict resolution tools to resolve logical merge conflicts. In our case, this means launching Eclipse and calling the EGit merge tool on the file(s) in conflict.

Workflow

Each shell script will be a wrapper around an Eclipse standalone application (provided by the EMF Compare project). This standalone application will itself call some Oomph APIs.

First, Oomph will provision an Eclipse with all appropriate plugins to launch the logical git operation. These plugins are EGit, EMF Compare and their dependencies. If the Oomph setup model provided as parameter contains other plugins (represented by the name of the repository and the name of the plugin/feature), they will be provisioned too.

For a given Oomph setup model file, the provisioning operation is executed only once: if you launch a git logical operation again with the same Oomph setup model file, the already provisioned Eclipse IDE corresponding to that setup model is reused. This avoids executing this potentially costly task each time.

In order to retrieve the Eclipse associated to a given Oomph setup model file, we will store all provisioned Eclipses in the temporary folder of the system. We will use a hash function on the Oomph setup model file to generate/retrieve a unique id. This unique id will be the name of the folder containing the provisioned Eclipse.
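The lookup could be sketched as follows, under several assumptions that the text above leaves open: the hash is computed on the content of the setup model file, the POSIX cksum utility stands in for whatever hash function is finally chosen (a stronger one such as SHA-256 would make collisions far less likely), and the logical-git- folder prefix is made up for the example.

```shell
#!/bin/sh
# Derive the folder of the provisioned Eclipse from the CONTENT of the setup
# model file. cksum prints "<crc> <size>" when reading standard input; both
# values are joined with a dash to build the id.
provisioned_eclipse_dir() {
    setup_file=$1
    [ -f "$setup_file" ] || return 1
    id=$(cksum < "$setup_file" | tr ' ' '-')
    echo "${TMPDIR:-/tmp}/logical-git-$id"
}

# Provision only when no Eclipse exists yet for this setup model.
ensure_provisioned() {
    dir=$(provisioned_eclipse_dir "$1") || return 1
    if [ -d "$dir" ]; then
        echo "reuse $dir"
    else
        mkdir -p "$dir"   # placeholder for the actual Oomph provisioning
        echo "provision $dir"
    fi
}
```

With this scheme, editing the setup model file changes the id and therefore triggers a new provisioning, while two identical files stored at different paths share the same provisioned Eclipse.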

Then, in this provisioned Eclipse, the list of tasks contained in the Oomph setup model will be executed.

This Oomph setup model will contain, at least:

The path where the workspace will be created.

The git repository(-ies) to clone/bind with the Eclipse IDE.

The project(s) (each represented by its path on the computer) to import into the workspace associated with the Eclipse IDE.

Once all Oomph tasks have been executed, EMF Compare will call the logical git operation with the other parameters provided on the command line.

Once the git logical operation has been executed, the user can see the results in the command line tool.

If the result shows conflict(s) on involved model(s), the user will call the git logicalmergetool command. This command will launch a full-fledged Eclipse IDE (not a simple headless application) with a workspace containing these files. This full-fledged Eclipse IDE is the same as the one provisioned previously by Oomph. The user will have to manually launch EGit's merge tool on the files in conflict to solve the issues, and then manually close the Eclipse to properly finish the process.

As a possible evolution, in case of conflict(s), EGit's merge tool could be launched automatically on the file(s) in conflict once the full-fledged Eclipse IDE has started.

Here is a schema representing the workflow of the process for the logical merge command (the workflow is nearly the same for the logical diff):