Understanding the Mercutio-GDevice Problem

This page describes a problem that occurs when applications unlock GDevice handles, combined with certain QuickDraw assumptions about the GDevice list. Although the problem surfaced because of a bug in the Mercutio MDEF, Apple DTS reports that other (non-Mercutio) applications, and even parts of the OS, unlock GDHandles and can cause the crash. The goal of this page is to document the problem as thoroughly as possible and to alert the developer community to possible solutions. It is geared towards Macintosh developers and technically savvy users.

When displaying popup menus, Mercutio uses a function called GetMenuScreen to determine which screen the menu should appear on. While walking the GDevice list, Mercutio locks and then unlocks each GDevice handle (GDHandle), which leaves the GDHandles unlocked regardless of their previous state.
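The lock/unlock pattern can be illustrated with a toy Python model; it does not call the real Toolbox. The helpers `hlock`, `hunlock`, `hget_state` and `hset_state` are hypothetical stand-ins named after the Memory Manager's HLock, HUnlock, HGetState and HSetState; only the lock bit is modeled, and the device "list" is a plain Python list rather than a chain walked with GetDeviceList/GetNextDevice. The second function shows a common save-and-restore idiom, which is not necessarily how later Mercutio releases fixed GetMenuScreen.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Handle:
    """Toy stand-in for a GDHandle; only the lock bit is modeled."""
    name: str
    locked: bool = True  # assumption: GDHandles start out locked


def hlock(h: Handle) -> None:
    h.locked = True


def hunlock(h: Handle) -> None:
    h.locked = False


def hget_state(h: Handle) -> bool:
    # The real HGetState returns a flags byte; this model keeps only the lock bit.
    return h.locked


def hset_state(h: Handle, state: bool) -> None:
    h.locked = state


def find_screen_buggy(devices: List[Handle], wanted: str) -> Optional[Handle]:
    """Lock, inspect, then unconditionally unlock every device visited."""
    result = None
    for dev in devices:
        hlock(dev)
        if dev.name == wanted:  # stand-in for testing the device's bounds
            result = dev
        hunlock(dev)  # leaves the GDHandle unlocked, whatever its prior state
    return result


def find_screen_safe(devices: List[Handle], wanted: str) -> Optional[Handle]:
    """Same walk, but the caller's lock state is saved and restored."""
    result = None
    for dev in devices:
        state = hget_state(dev)
        hlock(dev)
        if dev.name == wanted:
            result = dev
        hset_state(dev, state)  # restore instead of forcing "unlocked"
    return result
```

Run over two locked devices, `find_screen_buggy` returns the requested device but leaves both handles unlocked; `find_screen_safe` returns the same device and leaves both locked.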

According to Jim Reekes at Apple who initially tracked down the source of this problem (thanks Jim!), many parts of QuickDraw expect the GDHandles to be locked and dereference them without verifying this first:

The bug is in the PowerPC version of StdText() which has been shipping
since the first PowerPC release. It dereferences the device handle, and
then makes a call that can move memory. Thus the pointer is now invalid...
In fact, it appears that QuickDraw, video drivers, and the cursor code all assume the device handles are locked. The cursor code runs at interrupt level, so devices _must_ be locked at all times.

Thus, it seems that GDHandles differ from other handles in that they should never be unlocked. This is not currently documented in Inside Macintosh or elsewhere; hopefully a TechNote will be forthcoming.
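Why an unlocked GDHandle is dangerous can be sketched with a toy relocatable heap in Python. `ToyHeap`, its `compact` method and `std_text_like` are hypothetical: `compact` stands in for any call that can move memory, and `std_text_like` mimics the pattern Jim describes (dereference the handle once, make a call that moves memory, then keep using the pointer). Real heap compaction is far more involved; this model simply relocates every unlocked block.

```python
class ToyHeap:
    """Toy relocatable heap: a handle is an index into a master-pointer table."""

    def __init__(self):
        self.memory = {}   # address -> block contents
        self.master = []   # master pointers: handle -> current block address
        self.locked = []   # lock bit for each handle
        self.next_addr = 0x1000

    def _alloc(self):
        addr = self.next_addr
        self.next_addr += 0x100
        return addr

    def new_handle(self, contents, locked=False):
        addr = self._alloc()
        self.memory[addr] = contents
        self.master.append(addr)
        self.locked.append(locked)
        return len(self.master) - 1

    def deref(self, h):
        """Like *h in C: returns the block's address at this moment."""
        return self.master[h]

    def compact(self):
        """Stand-in for a call that can move memory: relocates unlocked blocks."""
        for h, addr in enumerate(self.master):
            if not self.locked[h]:
                new_addr = self._alloc()
                self.memory[new_addr] = self.memory.pop(addr)
                self.master[h] = new_addr  # the handle itself stays valid...

    def read(self, addr):
        # ...but a raw pointer to the old address no longer finds the block.
        return self.memory.get(addr, "<garbage>")


def std_text_like(heap, gd):
    """Dereference once, call something that may move memory, reuse the pointer."""
    p = heap.deref(gd)
    heap.compact()
    return heap.read(p)
```

With the block locked, the stale-pointer read still finds the data; with it unlocked, the same read returns garbage, which on a real machine would be whatever happened to occupy the old address. Going back through the handle (`heap.deref(gd)`) still works in both cases.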

Any application can cause this problem if it unlocks GDHandles. We have confirmed that other applications that do not use Mercutio exhibit this behavior, as do portions of Apple's own system software (!).

In the Mercutio MDEF, this code is part of the GetMenuScreen routine used to display popup menus. Therefore, we believe that although the Mercutio portion of this bug is present in all versions of Mercutio up to and including 1.3.4, it is only triggered in applications that use Mercutio to display popup menus, and only when the user displays one of those popup menus.
For example, initial investigation shows that BBEdit, which uses the System MDEF for popup menus, does not itself trigger the bug.

You can see the buggy code and how it has been fixed in subsequent releases on the GetMenuScreen page.

Because the crash only happens when memory gets moved under certain conditions, it does not always occur, and it does not necessarily occur when the problematic code is executed (i.e., when a user displays a Mercutio popup menu). Following the above example, BBEdit may crash because another application previously left the GDHandles unlocked.

However, we do observe the following:

The most common symptom of this bug is a crash in StdText() or NQDStdText().

The crash most often occurs while scrolling text (which uses NQDStdText()).

The crash is much more likely to happen if you've got more than one monitor; with two monitors, it is about twice as likely.

Because the problem has two components (Mercutio unlocking GDHandles, and system code assuming they stay locked), it needs to be solved in two parts.

Addressing the problem with Mercutio

An obvious solution is to fix the problem in Mercutio, ship a new version, and have developers upgrade their applications. The forthcoming Mercutio 1.5 fixes the problem; a beta is available now (see below).

Too many applications use Mercutio to expect every developer to update it and get a fix to their users. Even for those who do upgrade, it will take time to get the upgrades to users, and not all users will upgrade. A system-level solution is needed as well.

Addressing the problem with the OS

The problem can be solved by patching the OS with an INIT or a change to the system software. Based on the following alternatives, it looks like patching HUnlock is the best option:

Patching StdText and NQDStdText to lock the GDHandles. This would not fix all of the crashes since other QuickDraw routines assume the handles are locked (see Jim's characterization of the problem).

Patching PopUpMenuSelect to "clean up" after Mercutio by relocking the GDHandles. This is also only a partial solution:

patching PopUpMenuSelect to relock the gdevices before exiting can have
side effects. I just verified in MacsBug that the simple act of drawing
the pop-up itself can move the unlocked gdevice. Since it won't be
relocked until the pop-up is removed, the block is likely to be relocked
further up the System heap than it was originally allocated. This means
we could increase fragmentation of the System heap, aggravate low memory
problems, etc. Not all that likely perhaps, but it's a side effect that
needs to be at least considered and possibly explored a bit further. I
was able to easily see at least one gdevice move as the pop-up was drawn
(before it was removed) on both one-monitor and two-monitor machines.

Patching GetNextDevice will fix the Mercutio bug, because every GDevice handle that Mercutio unlocks is passed to GetNextDevice as a parameter. It will not help with handles that are obtained from other routines (e.g., GetDeviceList) and never passed to GetNextDevice. The MercutioGNDPatch implements this approach.

Patching HUnlock to ignore requests to unlock the GDHandles. This seems the most promising. The MercutioHUnlockPatch implements this approach.

It is too early to tell whether these patches provide a robust solution to the problem, so use them at your own risk. I suggest trying MercutioGetNextDevicePatch first, because it is smaller and will have less of an impact on your system.
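The idea behind the GetNextDevice approach can be modeled in a few lines of Python. This is a guess at the mechanism, not the MercutioGNDPatch source: `GDevice`, `get_next_device`, `patch_get_next_device` and `mercutio_style_walk` are hypothetical names, and the patch here simply relocks whatever handle it is passed before calling the original routine.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(eq=False)
class GDevice:
    """Toy GDevice record; `next_gd` plays the role of the gdNextGD link."""
    name: str
    next_gd: Optional["GDevice"] = None
    locked: bool = True


def get_next_device(gd: GDevice) -> Optional[GDevice]:
    """Unpatched behavior: just follow the link."""
    return gd.next_gd


NextFn = Callable[[GDevice], Optional[GDevice]]


def patch_get_next_device(original: NextFn) -> NextFn:
    """Wrap the original routine so the incoming handle is relocked first."""
    def patched(gd: GDevice) -> Optional[GDevice]:
        gd.locked = True  # undo a caller's HUnlock on this GDHandle
        return original(gd)
    return patched


def mercutio_style_walk(first: Optional[GDevice], next_device: NextFn) -> None:
    """Walk the list, locking and then unlocking each device along the way."""
    gd = first
    while gd is not None:
        gd.locked = True   # HLock
        # ... the real code inspects the device's bounds here ...
        gd.locked = False  # HUnlock: the problematic call
        gd = next_device(gd)
```

With the unpatched `get_next_device`, a two-device walk leaves both devices unlocked; with the wrapped version, each device is relocked as it is handed to GetNextDevice, so both end up locked. The model also shows the limitation: a handle that is unlocked and never passed to the patched routine is not touched.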

Mercutio MDEF 1.5 fixes this bug.

MercutioGetNextDevicePatch (a.k.a. MercutioGNDPatch) by Steve Sisak, with help from Eric Shapiro and Darin Adler. The extension patches GetNextDevice to stop Mercutio from unlocking GDHandles in popup menus; details are included in the ReadMe file. This patch may also stop similar bugs in other applications or extensions. It is a lower-overhead patch than MercutioHUnlockPatch (see below). Source code for the extension is available from
ftp://ftp.codewell.com//Pub/Patches/Mercutio/

MercutioHUnlockPatch (a.k.a. FixMercutio) by Brian Zuk: patches HUnlock to avoid unlocking GDHandles. Also available is MercutioHUnlockPatch_Debug, which drops into MacsBug when HUnlock is called on a GDHandle. This is useful for finding out which applications (Mercutio or other) unlock the GDHandles. Brian's source code for both extensions is also available.
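A similarly hedged Python model of the HUnlock approach, including the debugging variant: the patched routine ignores HUnlock calls whose argument is in the device list and can optionally report them (the real _Debug extension does this by dropping into MacsBug). `traps`, `install_hunlock_patch` and `on_gd_unlock` are hypothetical; this is not Brian Zuk's code, and a real patch would have to walk the GDevice list itself rather than receive it as a Python list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(eq=False)
class Handle:
    """Toy handle; only the lock bit is modeled."""
    name: str
    locked: bool = True


def hunlock(h: Handle) -> None:
    """Unpatched HUnlock: clear the lock bit."""
    h.locked = False


Traps = Dict[str, Callable[[Handle], None]]


def install_hunlock_patch(
    traps: Traps,
    device_list: List[Handle],
    on_gd_unlock: Optional[Callable[[Handle], None]] = None,
) -> None:
    """Replace traps["HUnlock"] with a version that never unlocks a GDHandle."""
    original = traps["HUnlock"]

    def patched(h: Handle) -> None:
        if any(h is gd for gd in device_list):
            if on_gd_unlock is not None:
                on_gd_unlock(h)  # debug variant: report the offending call
            return               # ignore the request; the GDHandle stays locked
        original(h)              # every other handle is unlocked as usual

    traps["HUnlock"] = patched
```

In this model every HUnlock call, not just those made during popup menus, pays for a scan of the device list, which may be part of why the GetNextDevice patch is described as the lower-overhead option.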

Several people have suggested writing an updater application that scans the user's hard drive to find applications that use Mercutio. The updater would then either install a new, safe version of Mercutio, or NOP-out the problematic code. Aside from the challenge of writing an updater that can recognize and change the various versions of Mercutio that are in use, this approach suffers from the following drawbacks:

Changing resources in an application will, at the very least, require great care on the part of the app's developer when preparing future updaters; if due care isn't exercised, then support nightmares will ensue.

Some applications use internal checksums as virus protection to detect any changes to their resources; a patcher would break this. Similarly, applications that use and expect compressed resources would also be problematic.

Adoption remains a problem -- simply releasing the updater does not ensure end-users will use it.

An updater is a one-time fix -- the bug will reappear if users reinstall the application from the original distribution disks, or download a fresh copy from the net.

The Mercutio source code has been licensed to 3rd parties, who may have made their own changes to the MDEF or API that would cause problems if patched.

In addition, Mercutio has had a few API changes in past versions, as documented in the version history, that would need to be taken into account. They are:

Version 1.3b2 changed the ID for the GetCopyright message.

Versions 1.3b5 and 1.3b6 changed the format of the Xmnu resource.

If we ignore the different beta releases, a patcher application would at the very least need to distinguish between pre-1.3 and post-1.3 versions.

Thanks!

Thanks to everyone who has helped track down this problem and provided constructive advice on how to solve it.