Month: May 2005

Found out about The Python Challenge. While you don’t need to use Python to solve most of the problems, a knowledge of the language certainly helps. While the initial problems are fairly easy, some of the later ones are quite difficult, and cover many topics.

If you decide to have a go, here are a few hints that might help:

Keep a log of what you do. Solutions to may provide insight into subsequent problems.

Look at ALL the information provided to you. If the solution isn’t apparent, look for patterns in the information and extrapolate.

If you are using brute force to solve a problem, there is probably a quicker and simpler method to get the answer.

Phillip: the majority of applications have no cut and paste code in them — they rely on the cut and paste behaviour of the standard widgets. Since the standard widgets like GtkEntry in GTK 2.6 mark their selections as being savable (in fact, any code that calls gtk_clipboard_set_text() will have its selection marked as savable). Most of the remaining cases are ones where you’d want to be selective in what gets saved (e.g. selecting cell ranges in Gnumeric, or regions of images in Gimp), so need to be handled specially anyway.

So if you have a desktop running with GTK 2.6 and have a clipboard manager running, saving of clipboard contents will just work. With similar changes to Qt, Mozilla and OpenOffice you cover pretty much everything the user will come into contact with. For extra points, patch Xt and Xaw, and you’ll get most of the ancient X programs as well.

As for the use of GTK in Anders’ sample clipboard manager, I’m not sure what the problem is here — the important thing is the protocol, which is not GTK specific. I would expect that most desktop environments will provide their own clipboard manager, possibly integrated into some existing desktop component such as gnome-settings-daemon. Then again, they could just use a standalone clipboard manager like Anders’ one if they want.

Lastly, you brought up console programs again. I see this as a red herring for the following reasons:

There needs to be a single synchronisation point that states who owns the clipboard. This is to ensure that there is at most one owner of the clipboard, and allows paste requests get the right data.

If you want to interoperate with the X clipboard, you’ll need to allow X to control the clipboard ownership. So if some app is connected to your clipboard daemon, the daemon will need to assert ownership of the X clipboard on behalf of the application.

If the console app is going to have to be modified to talk to a clipboard server, what is the benefit of making the program depend on your clipboard daemon instead of bypassing it and using Xlib? Conversely, if the console app doesn’t want to talk to an X server, what makes you think it will want to talk to some other clipboard daemon?

The remainder of your points seem to either fall under the subject of standardisation of clipboard formats (not directly related to clipboard managers), or things that can be experimented with using the clipboard manager spec.

This posting follows on from my previous postings about Bazaar, but is a bit more advanced. In most cases you don’t need to worry about this, since the tools should just work. However if problems occur (or if you’re just curious about how things work), it can be useful to know a bit about what’s going on inside.

Changesets vs. Tree Snapshots

A lot of the tutorials for Arch list “changeset orientation” as one of its benefits over other systems such as Subversion, which were said to be based on “tree snapshots”. At first this puzzled me, since from my mathematical background the relationship between these two concepts seemed the same as the relationship between integrals and derivatives:

A changeset is just the difference between two tree snapshots.

The state of a tree at a particular point in just the result of taking the initial tree state (which might be an empty tree), and applying all changesets on the line of development made before that point.

The distinction isn’t clear cut in the existing tools either — Subversion uses changesets to store the data in the repository while providing a “tree snapshot” style view, and Bazaar generates tree snapshots in its revision library to increase performance of some operations.

So the distinction people talk about isn’t a simple matter of the repository storage format. Instead the difference is in the metadata stored along with the changes that describes the ancestry of the code.

Changesets and Branching

In the simple case of a single branch, you end up with a simple series of changesets. The tree for each revision is constructed by taking the last revision’s tree and applying the relevant changeset. Alternatively, you can say that the tree for patch-3 contains the changesets base-0, patch-1, patch-2 and patch-3.

Branching fits into this model pretty well. As with other systems, a particular revision can have multiple children. In the diagram below, the trees for both patch-2 from the original branch and patch-1 from the new branch “contain” base-0 and patch-1 from the original branch. Any apparent asymmetry is just in the naming and storage locations — both revisions are branches are just patches against the same parent revision.

So far, there’s no rocket science. Nothing that Subversion doesn’t represent. Pretty much every version control system under the sun tracks this kind of linear revision ancestry (as can be seen using svn log or similar). The differences really only become apparent when merges are taken into consideration.

Merges

Just as a particular revision can have multiple child revisions (a.k.a. branching), a tree can have multiple parent revisions when merges occur. When you merge two revisions, the result should contain all the changes that exist in the parent revisions.

In the above diagram, we want to merge the changes made on the second branch back into the original one. The usual way to merge changes goes something like this:

Identify the most recent common ancestor of the two revisions.

Take the difference between one of the revisions to merge and apply those changes to the other revision.

If the changes on the two branches are to different parts of the tree, this process can be done without any extra user intervention. If the two branches touch the same bits of code, the conflicts will have to be resolved manually.

It is important to pick the most recent common ancestor, otherwise the real changes in the two branches will get mixed in with changes common to the two branches, which can result in spurious merge conflicts.

In this particular case, it is obvious which common ancestor to use: patch-1 from the original branch. In Arch, the result of the merge is represented as a changeset on the original branch that contains the changes found on the second branch. In addition to the changes, it adds some metadata (known as patch logs) that records that patch-1 and patch-2 from the second branch have been merged in. This becomes important when performing future merges between the two branches.

Repeated Merges

While it was possible to pick the correct merge ancestor in the previous example using just the linear revision ancestry of the two branches, that isn’t true for subsequent merges between the two branches. Consider the following merge that results in patch-6 on the original branch:

Here the best merge ancestor to use is patch-2 on the second branch. However, without the record of the previous merge, the same ancestor as the previous merge would be chosen (which is what CVS will do by default with repeated merges).

While the above ancestor could be selected by just recording when you last merged with a particular branch, that is not sufficient when there are merges between more than two branches.

More Than Two Branches

Below is a fairly simple example involving three branches, where some changes have been merged from the third branch (yellow) into the original branch (cyan) and the second branch (magenta). Finally, there is a merge between the magenta and cyan branches.

For this last merge, there are a number of possible merge ancestors. If we ignore the yellow branch, the latest common ancestor is the initial branch point. This would result in merging the changes in patch-1, patch-2, patch-3 and patch-4 from the second branch into the patch-5 tree on the original branch. However, this is likely to result in a number of conflicts, since both branches contain changes merged from the yellow branch, which are going to overlap.

The better common ancestor ancestor to choose in this case is patch-2 on the yellow branch, which avoids the common changes.

Bazaar’s merge command will handle this kind of merge ancestry just fine (something that isn’t true for of the older tla star-merge algorithm).

Conclusion

This article doesn’t cover all aspects of branching and merging with Bazaar. One aspect I have completely ignored is the concept of “cherry picking”. This refers to applying a particular change to a tree, without the other changesets that exist on that branch. Cherry picking is mostly orthogonal to standard merging — in fact, one of the complications in merge ancestor selection is that it needs to ignore cherry picked patches.

Network effects also come into play here — if you make your code available as an Arch branch, then Bazaar is more useful to others since they can branch and merge with your archive (and the reverse holds too). The Ubuntu Arch imports certainly help here, but to get the full advantage of the advanced merge capabilities both sides need to be tracking history.

I’ve moved my diary over to blogs.gnome.org, which offers a few extra features over advogato (the main ones I’m interested in are more control over the layout, and the ability to embed images). Overall it seems pretty good, although I have a few gripes:

The login cookie gets set for the path /nb.cgi/ only, so when I go to the front page of my diary, which is not under that path due to some mod_rewrite magic, it never thinks I’m logged in.

My login cookie gets sent to all pages under /nb.cgi/, including other hosted blogs. Given that I can put arbitrary HTML in the templates for my blog, it would be possible to capture the passwords of other NewsBruiser users on the system without much trouble (it’s a good thing we all trust each other). This one is a bit difficult to fix because of the URI structure for newsbruiser pages, which look like “/nb.cgi/verb/username/...“. If they were structured with the username first, it would be trivial to set up the cookie so it only gets sent when viewing one particular blog.

I got a few responses to the comparison between CVS, Subversion and Bazaar command line interfaces I posted earlier from Elijah, Mikael and David. As I stated in that post, I was looking at areas where the three systems could be compared. Of course, most people would choose Arch because of the things it can do with it that Subversion and CVS can’t. Below I’ll discuss two of those things: disconnected development and distributed development. I’ll follow on from the examples in the previous post.

Disconnected Development

Disconnected development allows you to continue working on some code while not having access to the main repository. I hinted at how to do this in the previous post, but left out most of the details. The basic steps are:

Create an archive on your machine

Branch the module you want to work on into your local archive.

Perform your development as normal

When you connect again, switch back to the mainline, merge your local branch and commit the changes.

To create the local archive, you follow the same procedure as for creating the original archive. Something like this:

This creates an archive named joe@example.com (archive names are required to be an email address, optionally followed by some extra info) stored in the user’s home directory.

Now we can create a branch in the local archive. From a working copy of the mainline branch, run the following command:

baz branch joe@example.com/modulename--devel--0

It was necessary to specify an archive name in this call to baz branch, because the branch was being created in a different archive to the one the working copy was pointing at. This leaves the working copy pointing at the new branch, so you can start working on it immediately.

You can commit as many revisions as you want, and compare to other revisions on the branch.

When you have access to the main repository again, it is trivial to merge your changes back into the mainline:

You can then ignore the branch in the joe@example.com archive, or continue to use it. If you want to continue working on the branch in that module, it is a simple matter to merge from the arch@example.org archive first to pick up the changes made while you were disconnected.

Distributed Development

In a distributed development environment, there is no main branch. Instead, each developer maintains their own branch, and pulls changes from other developers’ archives. A few things fall out from this model:

To start working on a distributed project, you need to branch off from another developer’s archive. This can be achieved using the same instructions as found in the “disconnected development” section above.

In order for other developers to pull changes from your archive, they will need to be able to access it. This isn’t possible if it only exists in your home directory.

If you never merge from a particular developer, you don’t even need to know they exist.

Conversely, you don’t need to ask for permission to work on a module (however, if you want your changes to appear in the other developers’ archives, you’ll need to ask them to merge from you).

So assuming you’ve branched off an existing developer’s branch of a module, and want other developers to merge your changes. Assuming they can’t access your local computer, it will be necessary to create a mirror of the archive. To make the archive most widely available, you should mirror it to a directory that is published by a web server. The following command will create a mirror of the local archive:

Once the archive is created, you can mirror all the changes in the local archive to the remote one using the following command:

baz archive-mirror joe@example.com

If you always have access to the mirror host, it is possible to set up a hook script that mirrors after every commit. However, if you often make changes while offline you might decide to mirror manually.

Now that the archive has been mirrored, other developers can merge your changes into their working copy using the following command:

baz merge http://hostname/~joe/joe@example.com/modulename--devel--0

(after they’ve used your archive once, they can use the short name for the archive, and it will use the same location as last time).

Conclusion

While Arch allows full distributed development, most projects don’t use it in a fully distributed manner. Often there will be a central archive that is the “official” one, which tarball releases are made from. The exact policies can differ from project to project. Some possible policies are:

A core of developers have commit access to an “official” archive, which releases are made from. Developers generally commit directly to this archive (this is the default CVS/Subversion model). External developers follow the distributed development model, and get core developers to merge their changes.

As above, but the core developers usually develop their changes on separate branches (usually in their own archives), and only merge them when ready. This is how some projects currently use CVS, but has the benefit of allowing disconnected development.

Control of the official archive is managed by arch-pqm. Authorized developers can send merge requests to PQM (using PGP for authentication). When a merge request is received, the branch is merged into the mainline. If there are no conflicts and the test suite runs successfully, the changes are committed.

I’m not sure which model would work best for Gnome. At least initially, one of the first two models would probably be a good choice. If you have good test coverage, PQM can help ensure that the mainline stays buildable, and changes don’t destabilise things.

As has been mentioned elsewhere, regularly updated mirrors of various CVS repositories are being set up at arch.ubuntu.com. You can find out whether a mirror has been created for a module by looking it up on Launchpad. If a branch exists, you can check it out or branch it by prepending “http://arch.ubuntu.com/” to the full branch name (e.g. http://arch.ubuntu.com/‌imendio@projects.ubuntu.com/‌gossip--MAIN--0).

With the current discussion on gnome-hackers about whether to switch Gnome over to Subversion, it has been brought up a number of times that people can switch from CVS to Subversion without thinking about it (the implication being that this is not true for Arch). Given the improvements in Bazaar, it isn’t clear that Subversion is the only system that can claim this benefit.

For the sake of comparison, I’m considering the case of a shared repository accessed by multiple developers over SSH. While this doesn’t exploit all the benefits of Arch, it gives a better comparison of the usability of the different tools.

Setup

Before using any of CVS, Subversion or Arch, you’ll need a repository. This can be done with the following commands (run on the repository server):

(the --signed flag can be omitted if you don’t want to cryptographically sign change sets)

Once the archive is created, you’d need to make sure that everyone has write access to the files, and new files will be created with the appropriate group ownership. This procedure is the same for each system.

Now before users of the arch archive can start using the archive, they will need to tell baz what user ID to associate. Each user only needs to do this once. The email address used should match that on your PGP key, if you’re using a signed archive.

baz my-id "Joe User <joe@example.com>"

Next you’ll want to import some code into the repository. This will be done from one of the client machines, from the source directory:

The user can then make changes to the working copy, adding new files with the add sub-command, and removing files with rm sub-command. For Subversion there are also mv and cp sub-commands. For Arch, the mv sub-command is supported.

To bring the working copy up to date with the repository, all three systems use the update sub-command. The main difference is that CVS and Subversion will only update the current directory and below, while Arch will update the entire working copy.

If there are any conflicts during the update, you’ll get standard three-way merge conflict markers in all three systems. Unlike CVS, both Subversion and Arch require you to mark each conflict resolved using the resolved sub-command.

To see what changes you have in your working copy, all three systems support a diff command. Again, this works on the full tree in Arch, while only working against a subtree in CVS and Subversion. In all three systems, you can request diffs for individual files by passing the filenames as additional arguments. Unfortunately baz requires you to pass “--” as an argument before the filenames, but hopefully that’ll get fixed in the future.

When it is time to commit the change, all three systems use the commit sub-command. This command also works on a full tree with Arch.

Unlike CVS and Subversion, the baz command will also switch the working copy over to the new branch. By default it will create a branch in the same repository, but can just as easily create a branch in another location.

To switch a working copy between branches, the following commands are used:

This is where Arch’s history sensitive merging starts to shine. Since the working copy retains a record of what changes it is composed of, the merge operation simply pulls over the changes that exist in the branch but not in the working copy — there is no need to tell it what range of changes you want to apply.

To merge more changes from the branch, the CVS and Subversion commands change, while the Arch one remains constant:

The current Bazaar command line interface isn’t that different from CVS and Subversion (it’s definitely worth a second look if tla scared you off). The main difference is that some of the operations work on the whole working copy rather than a subset by default. In practice, this doesn’t seem to be much of a problem.

The history sensitive merge capabilities would probably be quite useful for Gnome. For example, it would make it trivial to merge bug fixes made on the stable branch to the head branch.

Disconnected development is a natural extension to the branching and merging support mentioned earlier. The main difference is that you’d have to make a local archive, and then create your branch of the code in that archive instead of the main one. The rest is handled the same.

Shortly after the scandal over Cornelia Rau (a mentally ill Australian who was in detention for 10 months), another case gets some media attention: Vivian Young/Alvarez/Solon.

She is an Australian citizen born in the Phillipines, who also suffers from a mental illness. From the news reports, the sequence of events seems to be:

In 1984, Vivian moved to Australia to live with her new husband.

In 2001, she was involved in a car accident in NSW. While being treated at Lismore Hospital for her injuries, she lodged a citizenship application and the staff contacted the immigration officials. She gave her name as “Vivian Alvarez”.

On July 17, 2001, the Queensland Department of Families finally notified police that “Vivian Young” was missing.

Days later, she was deported to the Phillipines — neither the NSW or Qld police noticing that she was on the missing persons list. Apparently she was pushed onto the plane in a wheelchair, still suffering from head injuries.

In 2003, an immigration official discovered the mistake while looking through the missing persons list. It doesn’t seem that any action was taken at this time.

This month, the mistaken deportation becomes public. This is the first time that the family is notified — four years after the deportation, and two years after the mistake had been discovered. The government says they don’t know her location, but are doing everything in their power to find her.

Among the Australian family, she left behind a son who is still in foster care.

Rather than being an isolated case, it is quite likely that there have been other questionable deportations — this one getting more attention because the person in question is an Australian. This case has racial overtones too, since it is unlikely that a white Australian would have been deported under the same circumstances. Despite all this, the Minister for Immigration does not feel that a Royal Commission would be appropriate.

Recently Bryan posted about background channels — a system for automatic updating desktop wallpaper. One of the features of the design is a new URI scheme based on the same ideas as webcal://, which I think is a bad idea (as dobey has also pointed out).

The usual reasoning for creating a URI scheme like this go something like this:

You want to be able to perform some action when a link in a web page is clicked.

The action requires that you know the URI of the link (usually to allow contacting the original server again).

When the web browser activates a helper application bound to a MIME type, you just get the path to a saved copy of the resource, which doesn’t satisfy (2).

Helper applications for URI types get passed the full URI.

So the solution taken with Apple’s iCal and Bryan’s background channels is to strip the http: off the start of resource’s URI, and replace it with a custom scheme name. This works pretty well for the general case, but causes problems for a few simple use cases that’ll probably turn out to be more common than you think:

Serving a background channel (or calendar, or whatever) via a protocol other than http. The first alternative protocol you’ll probably run into is https, but there may be other protocols you want to support in the future.

Any links to a background channel will need to be fully qualified since they use a different scheme. If you move your site, you’ll need to update every page that links to the background channel. If you could use relative URIs in the links, this wouldn’t be the case.

One alternative to the “new URI scheme” solution, that doesn’t suffer from the above problems is to serve a “locator file” from the web server that contains the information needed to request the real information. Even though the helper application will only get the path of a temporary file, the content of the file lets the app connect to the server. This is the approach taken by BitTorrent, and various media players like RealPlayer.

The separate “locator file” can even be omitted by placing the background channel location inside the background channel itself. This is the approach taken for Atom, via a <link rel="self"/> link.