See the analysis done by the Solaris team for details as to why they picked Mercurial.
From what I have read and seen, the two choices for a fast, good DSCM (Distributed SCM)
were Mercurial and Git (both seem like excellent systems).
Using the same SCM as Solaris made a great deal of sense.
So far, remote developers in places where the internet connections are slow love Mercurial.
After an initial clone is created, the push/pull actions are very fast, since only the actual
changes or diffs are effectively going down the wire (so to speak).
Clones onto local disk and multiple clones on the same disk are very very fast and efficient
on Solaris and Linux, and Mercurial works on Windows too.
The ability to pull changes via http and push changes via ssh is a major advantage too.
And of course, it's "open source". ;^)

Why no imported history from TeamWare or the posted OpenJDK SubVersion repositories?

Legally we had constraints on what history we could even try to provide.
Technically, importing history from one SCM to another is not as easy as it sounds.
The ideal history is the history of the changesets actually created by the developers, and
creating any other kind of history (per snapshot or per promotion) seemed problematic to me.
If the end result was that the old history details were left in the old SCM data, and that
was guaranteed to be accurate history data, then it seemed right to leave it there and not
confuse the issue. I'm sure there are people that would disagree with me on this.
TeamWare doesn't have a changeset model; it has a loose "group of files" putback model that
doesn't carry enough information to create the kind of changesets Mercurial wants.
History conversion from something like CVS, SubVersion, or a more changeset-like SCM would be
another story.

Why a forest and not one big repository?

Unlike Solaris, we had already broken up the JDK sources into separate workspaces: originally
hotspot and j2se, and later we split j2se into jdk, langtools, corba, jaxp, and jaxws.
Each of these separate repositories represents a pile of sources managed by a separate team
or delivered
independently from the enclosing jdk product. Having them available as independently buildable
repositories was considered a major advantage, and more importantly, some of the developer
teams really wanted to keep themselves separate. It's that 'software silo' concept where
developers don't want anything changing on them except what's in their immediate area. ;^)
From a Release Engineering point of view, or for someone that has to build the whole thing, this
forest of repositories is a royal pain; a single repository would have been preferred.
But the Mercurial forest extension seemed to provide an answer, so we went with the forest.
Could the forest extension and Mercurial itself handle these nested repositories better?
I'm sure they could, and will over time.
I would not recommend creating separate repositories lightly; they need to be independently
buildable and relatively free of dependencies on the other repositories.

Why the dual repository setup (gates)?

When someone does a push to a Mercurial repository, and before the hooks are run that might
roll back those changesets, there is a window of time where someone pulling from that repository
could get these rolled-back changesets.
In addition, we have seen situations where people have managed to 'push -f' multiple heads (forgot
to merge) to a shared repository, creating very confused co-workers and potentially a chain of
silly merge changesets.
So we only have people push to a gate (you cannot pull from a gate), where the changesets are checked
and verified; only then does a simple push carry them to the matching pull gate.
Nobody can ever pull multiple heads, and nobody gets changesets that failed validation.
So having the dual setup will allow us to better protect the integrity of the repository.
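
As a rough illustration of the idea (not Mercurial code, and not the actual OpenJDK gate hooks), the push/pull pair can be modelled as a staging step that publishes a batch of changesets only when every check passes. The class and field names here are hypothetical, and the `valid` flag simply stands in for whatever checks a real gate runs.

```python
from dataclasses import dataclass, field


@dataclass
class Changeset:
    node: str              # changeset id
    parents: tuple = ()    # ids of parent changesets (two for a merge)
    valid: bool = True     # stand-in for the result of the gate's checks


def heads(changesets):
    """Return ids of changesets that no other changeset names as a parent."""
    has_child = {p for cs in changesets.values() for p in cs.parents}
    return sorted(n for n in changesets if n not in has_child)


@dataclass
class GatedRepo:
    published: dict = field(default_factory=dict)  # what people pulling can see

    def push(self, incoming):
        """Check a batch at the gate; publish it only if every check passes."""
        candidate = dict(self.published)
        candidate.update({cs.node: cs for cs in incoming})
        if not all(cs.valid for cs in incoming):
            return False              # failed validation, nothing is published
        if len(heads(candidate)) > 1:
            return False              # would leave multiple heads, rejected
        self.published = candidate    # the "simple push" to the pull side
        return True


repo = GatedRepo()
assert repo.push([Changeset("a")])
assert repo.push([Changeset("b", ("a",))])
assert not repo.push([Changeset("c", ("a",))])   # forgot to merge: second head
assert heads(repo.published) == ["b"]            # pullers never saw "c"
```

The point of the sketch is only that the published side changes in one step, after validation, so there is no window in which a pull can pick up changesets that are about to be rejected.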

Why so many different sets of repositories?

By having each team keep their changes isolated until they integrate into a master area,
they will be more likely to find their own mistakes and regressions before everyone else.
Testing is focused on the areas that have changed and can validate the entire team's contribution
to the product.
In addition, builds of these various team areas (if archived) can be used to isolate regressions,
quickly narrowing a regression down to the team; the changesets can then narrow it down
to the individual author, hopefully allowing the regression to be corrected quickly.
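
One way to do that narrowing, sketched here under assumptions of my own, is a plain binary search over the archived builds. The post does not say that bisection is what was used; `builds` and `is_good` are hypothetical names, and the sketch assumes the oldest build is good, the newest is bad, and the regression stays present once it appears.

```python
def first_bad_build(builds, is_good):
    """Return the first build (oldest first) for which is_good() is False.

    Assumes builds[0] is good, builds[-1] is bad, and that every build
    after the first bad one is also bad.
    """
    lo, hi = 0, len(builds) - 1      # builds[lo] known good, builds[hi] known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(builds[mid]):
            lo = mid
        else:
            hi = mid
    return builds[hi]


# Hypothetical archive of eight builds where the regression arrived in b06.
archive = ["b01", "b02", "b03", "b04", "b05", "b06", "b07", "b08"]
assert first_bad_build(archive, lambda b: b < "b06") == "b06"
```

With eight archived builds this needs three test runs instead of up to seven, which matters when each run means testing a full jdk build.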

Why not subsets of repositories for the teams?

Good question.
Initially we thought we could make the team areas sparse sets, but the complete jdk is a
forest or set of repositories, and a subset doesn't provide the teams with a complete set
of jdk sources.
So to make life easier for us (the initial Mercurial Transition engineers),
each team got a complete set.
Having a complete set of repositories provides the teams the possibility
of building a complete jdk with a known quantity.
This situation may change as we gain more experience with Mercurial and forests.

I'll add more to this as time goes on, assuming people find it useful.
Add your questions to the comments and I'll try to answer them.

Friday Nov 16, 2007

Just thought I would pull together some basic guidelines for anyone transitioning
from TeamWare workspaces to Mercurial repositories.

I'm assuming that multiple branches will not be maintained
in the Mercurial repositories in this information.
Mercurial will allow you to maintain multiple branches of development
in a single repository.
It's my opinion that multiple branches
of development can be
done more safely and more reliably by just having a separate clone
for each of the separate branches.
Anyone who has experienced this feature in SCCS files, where separate revision
trees can be maintained on a file, will probably agree.

TeamWare Basics

Skip this section if you are a regular TeamWare user.

A TeamWare workspace consists of a directory of files under SCCS control;
each file is managed individually.
Throughout the TeamWare workspace are directories called
SCCS that contain s.filename files.
Each s.filename file contains the original file as it was first entered into
the SCCS directory,
plus deltas to convert that original file to the various
increasing numeric revisions of the file.
A read-only file is kept in the parent directory of the SCCS directory;
editing the file requires you to use the 'sccs edit filename' command.
A 'sccs delget filename' command is used to
define a new revision of the file.
Each revision of a file can contain a comment.
SCCS manages source files and revisions to source files.
TeamWare manages batches of SCCS files.
TeamWare features and SCCS features get blurred sometimes,
but SCCS can be, and often is, used independently of TeamWare.

TeamWare allows for any number of workspaces with a child/parent
relationship
and also has a very nice code merging tool called filemerge.
TeamWare also allows you to have partial workspaces.
The top of the workspace contains a Codemgr_wsdata directory that holds
various TeamWare book-keeping files.
Its drawbacks, besides not being open source, are around performance
and the lack of features like changesets and revision markings (tags).
Over the years the shortcomings have been somewhat corrected with
various scripts
and tools written by various teams.

Mercurial For TeamWare Users

Mercurial (like TeamWare) allows you to have any number of
repositories (assume repository==workspace)
and allows you to access repositories via NFS paths
or with ssh:// or even http:// paths.

In many ways, at a high functional level,
your Mercurial experience will be similar to the experiences
you have had with TeamWare,
but the details are vastly different,
especially if you have become dependent on the specific
format of a TeamWare workspace or the contents of the SCCS files.

At the very top of the Mercurial repository is a hidden directory
called .hg which holds the Mercurial book-keeping files plus a
secure set of all "committed" sources and changes to those sources.

Unlike TeamWare, where the visible source files were read-only
until you explicitly used
'sccs edit' to check them out for editing,
the Mercurial "working set" sources are all read-write,
and you are free to edit these files at
any time.
So by default you will have a working set of read-write
sources and the more permanent
committed files that are saved in your .hg directory.

With Mercurial, all changes to a repository are done via
"changesets", which are
originally created with an 'hg commit' somewhere along the line.
An ideal changeset would be all the file changes/renames/deletes/adds
for one particular bug,
but a changeset can be small or very large.
New files, deleted files, and renamed files must all be done
via a changeset.
You use 'hg commit' to commit file changes into a "changeset" in
your own repository; it doesn't go anywhere unless someone
pulls it from your repository, or you push the changeset somewhere.
The 'hg pull' is like the TeamWare bringover command,
and 'hg push' is like the TeamWare putback command, well sort of.
Both 'hg push' and 'hg pull' move "changesets"
(committed changes) to and from the .hg
directories of repositories.
So your working set files are NOT automatically updated
when the files in the .hg directory (where changesets are kept) change;
you must explicitly run 'hg update' to update your working set files.
And it's important to note that
with Mercurial you do not "pull or push files";
you pull or push changesets, that is, changes to the entire repository.
This is very different from TeamWare which manages SCCS files, where
you could bringover or putback individual files.

The changeset concept is like a repository wide SCCS revision number,
one changeset id defines the state of the entire repository.
A changeset that has no child changesets is called a "head", and there
should only be one head, which is also called the tip.
But when you do a pull, you often end up with multiple "head" changesets,
and the goal is to perform an 'hg merge' and 'hg commit' a new
"merge" changeset that will become the single "head" or "tip".
Regardless of any specific file changes that might be conflicting,
a merge changeset will always be needed to get back to one "head".
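
To make the "head" idea concrete, here is a small sketch that treats the history as a table of changeset id to parent ids. This is only a model for illustration (the integer ids and the `heads` helper are made up), not how Mercurial stores anything.

```python
def heads(parents):
    """parents maps changeset id -> tuple of parent ids.

    A head is any changeset that never appears as somebody's parent.
    """
    with_children = {p for ps in parents.values() for p in ps}
    return sorted(cs for cs in parents if cs not in with_children)


# Local history 0 <- 1 <- 2, where 2 is my own commit, not yet pushed.
repo = {0: (), 1: (0,), 2: (1,)}
assert heads(repo) == [2]

# A pull brings in changeset 3, which someone else committed on top of 1.
repo[3] = (1,)
assert heads(repo) == [2, 3]      # two heads, even if no files conflict

# The merge changeset names both heads as its parents.
repo[4] = (2, 3)
assert heads(repo) == [4]         # back to a single head, the tip
```

Note that the second head appears purely because of the shape of the history, which is why a merge changeset is needed whether or not any file contents conflict.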

Roughly Equivalent Command Mappings

NOTE:
Optimally, the use of 'hg commit' should be done after all the
file adds, deletes, renames, and edits are done.
An ideal changeset is one that contains all the changes for
a particular feature or bug fix.

Using TeamWare and webrev to import a changeset

The tool webrev
creates a set of web pages that can be used to browse code changes, but
more recent versions also create patch files (which look like "diff -r -u" output)
that can be fed into gpatch (GNU patch) or a similar tool to apply the changes.

In this example /export2/build_integration/ws7/control is a path to a
TeamWare integration workspace and
/export2/build_integration/repos/control
is a path to an equivalent Mercurial repository.

Beginner Gotchas for TeamWare Users

Gotcha

Why?

Not setting up your ~/.hgrc file.

The name you define in ~/.hgrc with "[ui]" and "username="
is the name that will be permanently recorded in the
changesets you create with 'hg commit'.
I don't recommend adding your email address in username,
but that's up to you. Just keep in mind it will be public
information when your changesets reach a public repository.
TeamWare/SCCS used your system username, but very few
TeamWare workspaces were ever made public.

Forgot the 'hg update'

After an 'hg pull' (aka bringover),
don't forget the 'hg update', or use 'hg pull -u'.
The default pull and push just update the changesets
and don't update your read-write working set of files.
You need to be careful about updating the working set
files on shared repositories, since the files could change
while others are viewing them.

Forgot to merge

After an 'hg pull', you need to 'hg update', and if you
have committed changesets of your own that the parent does not
have yet, you will also need to 'hg merge' and 'hg commit'.
If you forget you will end up with multiple heads
and a more difficult time merging later.

Forgot to commit after a merge (multiple heads)

After 'hg merge' you need to 'hg commit'.
The merge just prepares you for the 'hg commit' of a merge
changeset.
If you forget you will end up with multiple heads
and a more difficult time merging later.

Making accidental edits

Mercurial working set files are always read-write
and ready to edit, no 'sccs edit' action is necessary.
Use 'hg status' to monitor what files you have changed.

Using the wrong relative path

File paths supplied to 'hg' commands are relative to the
current directory, whereas the TeamWare bringover and putback commands
want paths relative to the root of the workspace, regardless
of the current directory.

Not defining the file .hgignore for 'hg status'

The 'hg status' command tells you what outstanding changes
you have in your working set. By default it looks
at the entire repository, so if there are files
created during a build, you want 'hg status' to ignore
those files.
Make sure you define the .hgignore file so that 'hg status'
will only find files in
the directories you want managed by the repository.
TeamWare never really helped with the
problem of forgetting
to 'sccs create' your files; 'hg status' solves this
common problem by listing files it does not know about.

Using NFS/UFS for team integration areas

TeamWare for the most part was designed around
sharing data via NFS or UFS file systems.
Mercurial can work the same way, but when using it
for team integration areas we recommend
the use of the ssh:// parent
path mechanisms described in the Mercurial Book.
Unless everyone in the team or group is in
the same Unix group, has
the same default group, and uses 'umask 2', using
NFS/UFS will be problematic.
Mercurial obeys the strict Unix rules of file
creation and permissions, and over time TeamWare has
adjusted itself (perhaps improperly) to avoid the
file permission issues you can see with Mercurial.

Too quick on the 'hg commit'

Once a changeset is created (the 'hg commit'),
and pushed, it's pretty permanent.
Make sure that before the 'hg commit'
happens the changes
are correct, reviewed, the right ones, and complete;
otherwise you'll
be creating yet another changeset to correct your mistakes.

Doing a push with outstanding working set changes

The 'hg push' will not detect any outstanding changes to your
working set, it just pushes the existing changesets.
ALWAYS use 'hg status' before an 'hg push' to make
sure you have created all your changesets with 'hg commit',
unless of course you have changes you don't want to push.

Committing a sensitive file

Accidental additions of sensitive source files can be a
big problem.
Completely removing a sensitive file that has been
accidentally added to a repository is very difficult.
Be very careful what files you add to a repository!
Adding non-open source files to an 'open source'
repository will inflict major pain on many people.

Doing anything to the .hg files

Don't mess with the .hg data files, if you do you
are INSANE, leave that to the Mercurial professionals.
If you suspect they have been corrupted, use
'hg verify' to check.
Backups are always important, so make sure you keep
a relatively recent backup repository.
If you can't 'hg rollback', save the repository somewhere,
clone a fresh copy from your parent, remove the
working set files completely from the clone, and
copy in the working set from your corrupted repository
(but not the .hg files).
Now you can use the standard 'hg status' and 'hg diff'
to see what file changes you may have lost and adjust.

Using SCCS keywords

Mercurial by default does not support anything like
SCCS keywords in files.
You should remove these or find another solution.

Looking for putback comments or history files

Changeset comments represent BOTH the SCCS comments
and the effective TeamWare putback comment.

Using problematic filenames

Watch out for directory and filenames that only differ
in case (e.g. test and Test), at least on the Mac and
Windows these can be troublesome.
Long pathnames (>255 characters) can also be a problem.
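
Both filename problems are easy to check for before the first 'hg add'. Here is a minimal sketch, assuming you already have a list of repository-relative paths using '/' separators; the function name and the 255 default (taken from the limit mentioned above) are my own choices, not anything Mercurial provides.

```python
from collections import defaultdict


def problem_paths(paths, max_len=255):
    """Return (case_collisions, too_long) for repository-relative paths.

    case_collisions lists groups of directory or file names that differ
    only in case; too_long lists paths longer than max_len characters.
    """
    paths = list(paths)
    spellings = defaultdict(set)      # case-folded prefix -> spellings seen
    for path in paths:
        parts = path.split("/")
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            spellings[prefix.casefold()].add(prefix)
    collisions = sorted(sorted(s) for s in spellings.values() if len(s) > 1)
    too_long = sorted(p for p in paths if len(p) > max_len)
    return collisions, too_long


collisions, too_long = problem_paths(
    ["src/test/A.java", "src/Test/B.java", "README"])
assert collisions == [["src/Test", "src/test"]]
assert too_long == []
```

Checking every directory prefix, and not just the full path, is what catches the test versus Test case, where the two directories hold differently named files.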

Converting a TeamWare Workspace to a Mercurial Repository

Converting a TeamWare workspace to a Mercurial repository (without history) is pretty trivial.

A simple source tree can be turned into a Mercurial repository with just hg init; hg add.
Turning a TeamWare workspace into a plain source tree is relatively simple too:
I just create a separate workspace, purge a few files, make sure all the sources are in
'edit' mode, and remove the SCCS directories.
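
The cleanup step could be sketched roughly as follows. This is an illustration under assumptions, not the script that was actually used: it treats Codemgr_wsdata as one of the things to purge, approximates 'edit' mode by simply making every remaining file writable, and should only ever be run on a copy of the workspace.

```python
import os
import shutil
import stat


def strip_teamware(tree, bookkeeping=("SCCS", "Codemgr_wsdata")):
    """Delete TeamWare/SCCS book-keeping directories under `tree`
    and make the remaining files user-writable.

    Returns the sorted list of directories that were removed.
    """
    removed = []
    for dirpath, dirnames, filenames in os.walk(tree):
        for name in list(dirnames):
            if name in bookkeeping:
                target = os.path.join(dirpath, name)
                shutil.rmtree(target)
                dirnames.remove(name)     # do not descend into what was deleted
                removed.append(target)
        for name in filenames:
            path = os.path.join(dirpath, name)
            os.chmod(path, os.stat(path).st_mode | stat.S_IWUSR)
    return sorted(removed)
```

After something like `strip_teamware("/path/to/copy")` (a placeholder path), the tree is plain source and the hg init; hg add step applies to it directly.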

Performance Comparisons and Data

Nothing but good news in this area, for both time and space.

Many of the past tricks used to speed up TeamWare bringovers and putbacks,
especially over slow connections, should not be necessary with Mercurial
because it is very fast.
The initial 'hg clone' of a repository should be considerably faster,
but the most important
actions of 'hg pull' or 'hg push' will be so much faster
you may question if the
action actually happened.
Unlike TeamWare, only the changesets are transported,
and many fewer files are accessed
and in a more efficient manner.

The size of the repositories should also be smaller (at least 50% smaller)
than the equivalent TeamWare workspace.
This isn't surprising given the lack of compression in SCCS files and the age of the format.

Wednesday Oct 31, 2007

So how do you work with a Distributed SCM? There are many answers; the easy answer is that
you clone the forest, make the change in your local forest, create the changeset and push
the changeset. Well, that works. But maybe you are working on multiple fixes, and you don't
want to repeatedly clone over the network (even if it is fast), so here is another model
similar to the way many of the Sun developers worked with TeamWare:

The "incoming" forest is effectively just a local clone of the team forest for TL (Tools & Libraries),
or whatever team forest you decide your change belongs in.
Note that this TL forest may be sparse, it depends on the team as to what portion of the MASTER
forest the team forest will have.
The fix1-fix3 are also local forest clones where you might be working on specific fixes or
features. Once a fix was finalized, reviewed, tested, and ready to go, you would create the changeset
(or changesets)
with an 'hg commit' and push the changeset to the outgoing forest.
How long each fix takes will determine how often you may need to sync with the
TL area via the incoming clone. You can push your outgoing changes in batches or as frequently
as you want.
Pushing anything to a repository would, of course, first require a sync with the parent forest.

Some people like to sync often, others wait until just before doing the push.
One concern with Mercurial is that each sync may create a merge changeset, depending on
whether anything is pulled over.
So frequent syncs could create many unnecessary merge changesets.

My tendency is to investigate the Mercurial "mq" extension and see if the fix1-fix3 forests
could just be one forest using the "mq" extension.
See chapter 12 in the Mercurial Book.

Monday Oct 29, 2007

Sorry Dorothy, we aren't in Kansas anymore, and there isn't just one repository anymore. ;^)

The JDK team has been using TeamWare (also a Distributed SCM like Mercurial) for
a very long time, and the strategy adopted involves having different teams
(usually based on functionality) push changes through specific team areas rather than
everyone integrating into one MASTER area.
Each team can focus their testing on the changes their team is making, and also
protect themselves from regressions made by other teams.
It also allows for changes to be "baked" before being pushed into the MASTER area.

There is some overhead here, in that an assigned integrator for each team will need to
periodically sync up or merge with the MASTER area, test the merge, and push the resulting
merge up to the MASTER area. Sometimes this happens every few days, sometimes every week,
and sometimes every two weeks. It depends on many factors.
And some of these areas may not push directly to the MASTER area, it's up to the integrator
and the team to decide if they want another ply on the wheel (so to speak).
For example, the hotspot team has GC, Runtime, Compilers, and Serviceability
areas (sometimes called baselines)
that those hotspot teams push changes to, and those changes then get pushed to
the hotspot area (sometimes called "main" or "main/baseline").

And of course, all integrations to the MASTER area are done using a basic reservation model
so that the merge and integration is not interrupted or complicated with someone else
pushing changes to the MASTER area.

Hopefully this illustration will help.

At any given point in time, every one of these areas could be different in
different ways, depending on how often the integrators sync up with the MASTER area.
For the most part (with some exceptions) there is little overlap in the actual
files changed in these areas, so often the merges are fairly simple, but they
can get nasty. So if you need to talk to an integrator, remember, they don't
get paid extra for being an integrator, so be nice. ;^)

If you are considering a change to the OpenJDK, I recommend you go to the
OpenJDK email aliases
and connect with the appropriate team for the change you are making.

Sunday Oct 21, 2007

We are getting pretty close now to getting the OpenJDK Mercurial forest content.
A forest is just a directory tree or set of directories that can contain multiple
repositories.
Each of the repositories is independent; they are grouped only by the
location of their directories.
Here is an illustration that may help understand the layout and content of a full OpenJDK
forest:

In many cases developers may only need to deal with one or two repositories, so their own
local forests may be pretty sparse; they may not necessarily need all of the OpenJDK forest.
However, verifying that a change doesn't impact the build of the entire forest may require a
developer to have a full forest.

A distributed SCM like Mercurial (or TeamWare)
allows for distributed development but also isolated development between teams.
Each team has an integration area where team members push their changes to, and
one team member is assigned the task of integrating those changes into the MASTER
forest which is used to create the promoted builds and the final product.
A change will trickle from an individual's forest to the team's integration forest and
finally to the MASTER forest.
This is essentially the historic model that we have used with TeamWare workspaces.
Changes in an integration area are fresher for that team, but unless the area has been sync'd
with the MASTER, it may contain stale sources in other areas.

So there are decisions about what part of the forest you are interested in, and
what integration area you might want to see changes from.

Tuesday May 08, 2007

NetBeans 6 and Mercurial

I just happened to have a picture of the Village where NetBeans 6 and its new
Mercurial plugin were created.

Ok, not really, I'm just kidding, this is actually Plimoth Plantation,
a re-creation of the original 1627 settlement, in Plymouth, Mass.
But both are pretty cool, if you are ever in Massachusetts, make sure
you put Plymouth on the list. It's quite educational.

As for NetBeans 6, it's much easier to access than Plimoth Village :^)
Just go to www.netbeans.org,
download and install NetBeans 6 Preview (Milestone 9).
Once you get it started, go to Tools->Plugins, find the Mercurial plugin
and install it.
The editor has changed quite a bit, but seems pretty solid so far.
I have a small Java project called JPRT
(maybe 150 source files, maybe 35,000 lines of Java code) that has been in Mercurial for
quite some time now, and this NetBeans 6 and Mercurial plugin is just right for me.
Very handy. I'm in the process of converting to NetBeans 6 now, should be interesting.

The complete OpenJDK was released today at JavaOne 2007 and is
now available at openjdk.java.net.
Along with the OpenJDK sources are some NetBeans 6 projects, which are
documented at openjdk.java.net/groups/nb-projects.
As of today, these projects haven't been tested with an OpenJDK
Mercurial repository yet; we will be doing that in the next few months.

Granted we haven't released any OpenJDK Mercurial repositories just yet,
but as soon as we catch our breath from doing the OpenJDK launch, we will
start on the Mercurial transition.

By the way, a great many people have been working long hours to make this
OpenJDK release at JavaOne 2007 happen. I've been involved in some of it
myself, but it took a dedicated and talented team of people to make this all happen.
It might seem like a trivial thing to open up some sources, but trust me, it wasn't trivial.
My special thanks go out to the Release Engineering Team, who spent some long
nights getting those OpenJDK source bundles ready.