Abstract

SCM systems supporting concurrent development models have
existed for a long time. Today, more than ever, concurrent
development is crucial to productivity. Any paradigm
supporting concurrent development depends upon a trustworthy
3-way merge.

Although 3-way merge tools have been available for a long
time, their trustworthiness has been insufficient for a long
time as well. This results in development environments with:
a) concurrency forbidden, b) "gotta-do-a-merge" blues, and/or
c) additional CM staff to manually review all merge results.
All of these alternatives are productivity killers and feed
the black hole where all theoretical ROI goes.

This paper describes an advanced 3-way merge solution.
First, it discusses the many problems that have plagued 3-way
merge implementations -- test case kits are provided to
detect their existence. Then, it shows how solutions for
these problems were designed and implemented in a working
product, Guiffy SureMerge.

1. Introduction

Tools which support concurrent
development are more important today than ever before. Open
Source products, Offshore/Outsource development and
maintenance, customer customization of packaged application's
procedures and/or metadata are among the most challenging
concurrent development environments. While a great deal of
discussion about collaboration, XP, and Agility seems to
assume concurrency is practical and productive, very little
discussion has occurred regarding these other concurrent
environments which are NOT collaborating. They are not
collaborating in the sense that developers making changes are
for the most part unaware of each other. And, another
significant variance is the frequency of synchronization ---
collaborating teams will tend to synchronize often while
these other more challenging types of concurrent environments
will tend to synchronize infrequently.

Any paradigm supporting concurrent development depends upon a
trustworthy 3-way merge to automate the synchronization
process. In collaborating environments, the frequency of
synchronization and team awareness tend to minimize the
concurrency problems which may occur. However, 3-way merges
are used to automate the process, regardless of the Branching
Model [6] environment.

3-way merges take the original version of a file and two
separate versions, and merge both sets of changes back into a
new file. The changes of each separate version are determined
by comparing each version with the original. The 3-way merge
will automatically apply all the changes (which are not
overlapping) from each version. Then, if any changes were
overlapping, those are brought to the attention of the merge
operator for resolution.

SCM systems such as SCCS [4] and RCS [5] (plus dozens more
based upon these tools) supporting concurrent development
models have existed for a long time. Although 3-way merge
tools have been available for a long time, their
trustworthiness has been insufficient for a long time as
well. Almost every 3-way merge tool today is based on the diff3
algorithm. The research paper, "A Formal Investigation of Diff3" [2],
published in 2007 found the diff3 algorithm to be surprisingly unpredictable.
Another research project, "Testing Eclipse Compare's Algorithm for Three-way Merging" [1],
completed in 2010 employed black box testing on the diff3
based merge tool in Eclipse and found it to be correct on
only 86% of the test cases.
These problems can cause development environments to
forbid concurrency --- which is not always possible and
counter-productive. When concurrency is supported in the
environment, synchronization can become a dreadful,
depressing, counter-productive process for everyone involved.
In some environments the synchronization process is "managed"
by the CM staff. This isolates the pain, but requires
additional CM resources.

This paper describes an advanced 3-way merge solution. First,
it discusses the many problems that have plagued 3-way merge
implementations. Then, it shows how solutions for these
problems were designed and implemented in a working product,
Guiffy SureMerge. Test case kits are provided to detect the
problems discussed and demonstrate the SureMerge solutions.
It concludes with some observations on the benefits of a
trustworthy merge tool.

2. "Conflict" Definition

3-way merges look for
"conflicts" defined as changes that overlap. However,
implementations vary on what is defined as an overlapping
change. Some implementations do NOT treat lines inserted at
the same point in both version files as a conflict. Most of
the time this may not be a problem, but you would like to
know and perhaps switch the order of inserted changes. On the
other hand, if the inserted line(s) in both versions are
defining a new indexed value (i.e., a function number,
message type, or error code, etc.), both versions will be
adding a new index definition with the same value. If these
insertions are automatically merged, the merged code will
probably build properly, but one of the new indexed items
will NOT work. Because there is some central code like this
in almost every product, it is important for any trustworthy
3-way merge to consider inserts as potential conflicts.

The most dangerous deficiency found in some 3-way merges
results in losing one of the version's inserted changes.

The {1} Test Case Kit can be
used to evaluate how a 3-way merge handles inserts at the
same location.

3. Pathological Cases

The following scenarios pose
challenging exceptions which are known to confuse some 3-way
merges. Sometimes, the 3-way merge identifies the conflict
and non-conflicts satisfactorily, but it just looks
confusing. In the worst case scenario, the 3-way merge will
become confused --- not identify a conflict, lose a change,
or auto-merge the wrong change.

3.1. BOF/EOF Stingers

For a variety of "reasons",
3-way merges can get confused by conflicting changes at the
beginning or end of the files. Some 3-way merges post-process
their 3-way minimum lines of differences with heuristics in
order to improve their appearance or avoid problems caused by
shifting unique anchors. If they reach the beginning of the
file while looking back or reach the end of file while
looking forward, they can "give up".

The {2} Test Case Kit can be
used to evaluate how a 3-way merge handles changes in both
versions at the EOF.

3.2. Identical Twins

Most 3-way merges have been
enhanced to recognize identical changes and not call them a
conflict. If the tool's merge result view is in reverse,
though, such changes are usually visible and can be
distracting. 3-way reverse compare-merge result views may
seem easier to understand in simple cases. However, these
reverse views (displaying all changes by both versions to the
original) become so confusing for complicated change patterns
that users often quit the merge tool and do the job by
hand.

Using a 3-way compare
reverse view for merging is like driving down an Exit ramp in
reverse.

The {3} Test Case Kit can be
used to evaluate how a 3-way merge handles identical changes
and how they appear.

3.3. X-Tuplets

This change pattern usually started as
identical twin changes shared in both versions. Then, some
further revisions are made in one or both versions. As a
result, the identical change block contains a (subset)
change. Furthermore, it may even appear to be two or more
subset changes within the area of the original identical
change when comparing version 1 with version 2.

The {4} Test Case Kit can be
used to evaluate how a 3-way merge handles such change
patterns.

3.4. Hungry Blobs

This change pattern includes
separate changes in each version which are close or adjacent
to each other, resulting in one combined SureMerge
"Attention". Such changes will usually not be detected as a
"conflict" in other 3-way merges and will be auto-merged.

The {5} Test Case Kit can be
used to evaluate how a 3-way merge handles such change
patterns.

4. SureMerge "Attentions" --- Beyond
"Conflicts"

Guiffy's SureMerge considers changes of any
type (NOT just lines changed). Likewise, SureMerge's
"Attention Focus" goes beyond looking for conflicts and
catches changes which touch one another (but don't overlap)
or changes which are in close proximity to one another.

4.1. Minimum Blocks of Difference

Guiffy's SureMerge
is based upon a special proprietary algorithm designed for
the problems inherent in smart merging. Other 3-way merges
just apply the published compare algorithm [3] which they use
3 ways. Applying these "unique anchor" minimum lines of
difference algorithms 3 ways is the fundamental cause of most
of the pathological problems discussed above. Indeed, the
authors of those algorithms have acknowledged that the "the
3-way merge is problematic and was not considered when
designing the compare algorithm". In other words, it will not
work sometimes, and any post-processing attempts will be
"problematic" (not work in some cases).

4.2. 3-Way Auto Focus

SureMerge's Minimum Blocks of
Difference algorithm tends to express the differences in
fewer change blocks (by treating separate changes close to
each other as one change block). This naturally results in
expanded Attention Focus areas depending upon the size of the
change. Moreover, a user control option is provided to expand
the Attention Focus (if you want to be extra sure).

This screen shows how a typical 3-way merge would auto-merge
two changes (one from each version) without seeing a
"conflict".

And, this screen shows how SureMerge's Auto Focus would
identify the changes as an "Attention". Just one click of a
button would keep the first two lines from the first version
or the last two lines from the second version.

4.3. Hybrid Algorithm

As discussed in (4.1) above the SureMerge Minimum
Blocks of Difference algorithm avoids the problems of other 3-way auto merge implementations.
But, in some worse case scenarios, the Minimum Blocks of Difference algorithm would detect
sequences it knew were confusing and avoid doing any damage by alerting the user that the
merge would need to be done manually. That's better than blindly and silently producing
a corrupt merge result (which is what can happen with other 3-way auto merge tools in these
cases). But, "good enough" is NOT the idea behind SureMerge.

In 2008 we researched and developed an improved "Hybrid algorithm" which produces exceptional
merge results in these worse case scenarios. The "Hybrid algorithm" was originally released
as part of Guiffy SureMerge 8.4 in March, 2008. The implementation was further refined as
part of Guiffy SureMerge 8.6 in March, 2009. The "Hybrid algorithm" continues to be driven
by SureMerge's original "Minimum Blocks of Difference algorithm" and continues to include
advanced self-checking techniques to detect and avoid making an auto-merge mistake (corruption).
The "Hybrid algorithm" looks for unique anchors during the 3-way auto merge synchronization. The
algorithm applies some simple rules such as unique anchors can not be part of a change.
This simple idea implemented within the Minimum Blocks of Difference algorithm is what we came
to call the "Hybrid algorithm".

Just how exceptional are SureMerge's Hybrid algorithm results?
{6} Test Case Kit is a "real world",
example of a
3-way merge involving what we call "shifted unique anchors". This "real world" example
is from one of Guiffy Software's customers. At the customer's request the company and
product names were changed to protect their privacy.
This example involves XML project files generated by VisualStudio.
The same issue could arise with any IDE or tool
that keeps its information in XML files. The same issue could arise in non-XML files, in
your code. Anytime changes are made which move unique anchors (lines unique in both files
being compared), these auto merge mistakes can become an issue.
This example should be completely auto merged (no conflicts).
You'll see SureMerge does it right while most other 3-way merge tools
will report a confusing conflict and/or make an auto merge (corruption) mistake.

4.4. "Eat a Closer" Avoidance

As discussed in (2) above, its important to treat
inserts at the same point in the parent file as a "conflict". Sometimes the inserted changes
in each version have a similar structure and the last line of each inserted change may match.

In 2010 we researched and developed an improvement for SureMerge's algorithm which detects
these matching lines at the end of Attention inserts and avoids treating them as matching.
The improved SureMerge algorithm was released as part of Guiffy SureMerge 9.4 in March, 2011.

What do these "Eat a Closer" merge cases look like?
{7} Test Case Kit is an example of a
3-way merge involving what we call "Eat a Closer". This example
was made using Guiffy's Release Notes html document. The matching closer lines contain </LI>.
The same issue could arise in any type of structured XML files. And the same issue could
arise in non-XML files, in your procedure code (with a last line "}" to close a function).
You'll see SureMerge does it right while some other 3-way merge tools
will show the conflict in a way which makes it too easy to loose one of the closer lines
in the resolved merge result. With SureMerge one click on "Keep Both" and you're done.

4.5. Promote aware

As discussed in (2) above, its important to treat
inserts at the same point in the parent file as a "conflict". And sometimes as discussed
in (3.2) above the inserted changes in each version are identical. This occurs often when
doing what is called a "promote" merge.

The following scenario describes another case which happens often in a Promote merge:
Branch A(1st file) is being "promoted" to Branch B(2nd file). Branch B is a child of
(originally based on) Branch A. Originally Branch A (and Branch B) inserted 2 lines of code
after line 26. Branch A is updated adding another line to the lines previously added after
line 26. Without awareness that the merge being performed is a "promote" such a change is
treated as a "conflict".

In 2011 we researched and developed an improvement for SureMerge's algorithm as an option
which detects these additional lines in an inserted change and automtically merges
them. The SureMerge "Promote" option was released as part of Guiffy SureMerge 9.6 in
November, 2011.

What do these "promote" merge cases look like?
{8} Test Case Kit is a "real world",
example of a 3-way merge involving what we call "promote". This "real world" example
is from one of Guiffy Software's customers. This example involves header files for a Bug
Reporting application.
You'll see with SureMerge's "promote" option the merge is completed without any
Attentions (conflicts), while 3-way merge tools without
a promote option will treat the change as a conflict and require the user to manually resolve
it.

5. Conclusion

The problems discussed in this paper
have plagued 3-way merge tools for many years resulting in
the general warning from experienced veterans to "be careful
when you do that merge". To avoid these hazards, some folks
resort to forbidding concurrent development completely or
declare temporary periods of "code freezes", which can last
for weeks or months while CM/QA/Development do the release
"rain dance". In other environments concurrency is allowed,
but everyone faces the dreaded merge process. In larger
organizations, additional CM staff are sometimes tasked with
performing and/or manually reviewing all the merges.

Guiffy SureMerge avoids all these hazards and provides a
trustworthy 3-way merge capability. It can make the
difference between forbidding or embracing concurrent
development. It can eliminate the "dreaded merge blues".
It can shorten the time required for CM to
package bug fix builds or synch up a new release.

In today's world, "merge projects" to upgrade customizations
with a new Open Source or packaged product releases can take
a team weeks to complete without a trustworthy 3-way merge.
Teams using Guiffy SureMerge have completed merge projects in
half the time (sometimes less).

6. Test Case Kits

Each test case has 3 files: the two
versions and the parent (common ancestor or original). The
files are named with extensions .1st, .2nd, and .parent. To
aquaint yourself with each test case, begin by comparing the
parent and 1st, then compare the parent and 2nd. Then,
perform the 3-way merge.

{1} Conflict Definition and Inserts: conins.zip
This kit includes two test cases for lines inserted in each
version at the the same point. The FlownetController.cpp
files are an example from the real world of two concurrent
changes adding different code at the same locations. The
ErrorMsg.java files are an example of two concurrent changes
adding error message types with the same value.

{2} BOF/EOF Stingers: eofsting.zip
This test case includes changes at the Beginning and End of
File (BOF/EOF) in each version. In both the 1st and 2nd
Misc.java.versions code is added at the beginning and end of
the file.

{3} Identical Twins: idtwins.zip
This test case includes identical changes in each version. In
both the 1st and 2nd About.java versions line 7 in the parent
was changed and a line was added after that. These two lines
are identical in 1st and 2nd. The 2nd version has an
additional change --- it //ed several lines near the end.

{4} X-Tuplets: tuplets.zip
This test case includes a similar change in each version that
was shared to begin with, then slightly modified, resulting
in 2 conflict/Attentions. In both the 1st and 2nd
getCLIArgs.java versions a change block added the code for
processing the -fb and -fe arguments. Then, in the 2nd
version, improvements are made within that code. Other
changes are made in both versions to be realistic and provide
addtional test validation points.

{5} Hungry Blobs: blobs.zip
This test case includes separate changes in each version
which are adjacent to each other, resulting in one combined
"Attention". In the RearViewMirror 1st version line 7 is
changed to add an if condition. In the RearViewMirror 2nd
version line 6 is changed to add the sane condition in an
exsiting if statement. SureMerge's Auto Focus identifies the
changes as an "Attention". Just one click of a button would
keep the 1st or 2nd version.

{6} Unique Anchor Moves: moves.zip
This "real world" test case includes changes made in the 2nd version that move
unique anchors (lines unique in all 3 versions).
Compare each version to the Parent to become familiar with the changes made by that version.
The test case should be completely auto merged (no conflicts). Save the merge
result from Guiffy SureMerge. Compare the Parent with the Saved Merge Result
to verify manually the correctness of the auto merge result.
Try this test case with other merge tools. Most other 3-way merge tools
will report a confusing conflict and/or make an auto merge (corruption) mistake.
As a final step of your test, be sure to compare the merge result from other merge tools
with the saved merge result from SureMerge to find any mistakes.

{7} Avoiding "Eat a Closer": close.zip
This test case example was made using Guiffy's Release Notes html file. Version 9_4_1 inserts
4 lines after line 48 in the parent 9_4. Version 9_4_2 inserts 2 lines at the same point in
the parent. The last line inserted by both versions (a line with </LI>) match.
The test case should cause a "conflict". The "conflict" should be shown in a way so that
the user can easily resolve the conflict, keep both inserts, with each insert including its
last line. Is there a simple, one click button or shortcut keystroke which will do that (like
SureMerge's "Keep Both")?
Is there a simple, one click button or shortcut keystroke for reversing which version is placed
first in the merge result (like SureMerge's Edit->Flip)?

{8} Promote aware: promote.zip
This "real world" promote test case includes header files for a Bug Reporting application.
Branch A(1st file) is being "promoted" to Branch B(2nd file). Branch B is a child of
(originally based on) Branch A. Originally Branch A (and Branch B) inserted 2 lines of code
after line 26. Branch A is updated adding another line to the lines previously added after
line 26. Without selecting the promote option the change is treated as an Attention (conflict).
Now, select Options -> Compare/Merge... Promote merge and rerun the merge - the merge is
automatically completed without any Attentions (conflicts).