James Gosling on Window System Architecture; Reinvents DirectX

Saturday 21 August, 2004, 03:51 PM

James Gosling has had a hand in many interesting computing
technologies. He is perhaps best known today for his involvement with the development of Java, and of course
there was also EMACS... But what is less widely remembered is his work on early windowing systems.

He worked on two windowing systems: NeWS
and the Andrew project, both of which predate the
X11 window system which eventually became the de facto standard on UNIX systems (and later on Linux). X11
had a number of features in common with these predecessors. In particular, application code communicated with the
windowing system over the network stack in all three systems.

A lot of X11 advocates put this forward as evidence of what they see as the superiority of X over, say,
Windows, because it enables a remote application to present a UI on your desktop. So it's interesting that Gosling
has said that if he were to be designing a new windowing system
today, he would definitely not include that particular aspect of the design. It's not that he thinks that NeWS, Andrew, or
X11 did the wrong thing; rather, twenty years of evolution in computing means that some of the assumptions
that were reasonable when these old systems were designed no longer hold true. For example, twenty years ago
the cost of rendering graphics far outweighed the network communication overhead, so using a socket to connect
the application to the window system didn't look like a particularly expensive thing to do.

These days, even on a cheap PC the graphics rendering hardware is very much faster than the network. Attempting to
wedge a network stack in between the application and the graphics system makes it impossible to exploit fully
the graphics capabilities of a typical desktop computer. Simply removing the physical network hop by running the
application on the user's machine (as is the norm these days) is not sufficient - you still end up going through the
network stack for in-machine cross-process communication, and even that's too lardy to take full advantage of modern
graphics hardware. Graphics cards are staggeringly fast these days, and they can get a surprising amount of drawing
done in the time it takes to send a message from one process to another.
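
To get a rough feel for that messaging cost, here is a minimal Python sketch that times a one-byte request and reply over a local socket pair. It rests on several assumptions: the function names are invented for this illustration, the echo side runs in a thread rather than a separate process, and Python's own overhead inflates the figure. Treat the result as a ballpark for one trip through the operating system's socket layer, not as a measurement of X11 or any other windowing system.

```python
import socket
import threading
import time


def echo(conn, count):
    """Echo `count` one-byte messages back to the sender."""
    for _ in range(count):
        data = conn.recv(1)
        if not data:
            break
        conn.sendall(data)


def mean_round_trip_seconds(count=10_000):
    """Average time for a one-byte request and reply over a local socket pair."""
    # socketpair() gives two connected sockets on the same machine, so no
    # physical network is involved, only the OS socket layer.
    client, server = socket.socketpair()
    worker = threading.Thread(target=echo, args=(server, count))
    worker.start()
    start = time.perf_counter()
    for _ in range(count):
        client.sendall(b"x")
        client.recv(1)
    elapsed = time.perf_counter() - start
    worker.join()
    client.close()
    server.close()
    return elapsed / count


if __name__ == "__main__":
    rtt = mean_round_trip_seconds()
    print(f"mean local round trip: {rtt * 1e6:.1f} microseconds")
```

Whatever number this prints on your machine, the point is to compare it with how long your graphics hardware takes to fill a few thousand pixels.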

Gosling points out that this problem has not gone unnoticed by application developers; indeed, it has already
necessitated workarounds for the X11 architecture. In graphics intensive applications that run on X11, he observes
that it is becoming commonplace to sidestep the network layer, and use back doors that allow more direct access
to the hardware.

In short, windowing systems designed to allow distributed operation where the application
does not necessarily run on the desktop are demonstrably architecturally flawed - they cannot exploit modern graphics hardware.

What Gosling Would Do Today

Gosling highlights this increasing unsuitability of an X11-style architecture in a
paper in which he outlines how he would design a
windowing system if he were starting from scratch today. (Or more accurately, if he were starting
in 2002 when he wrote the paper.) Of course, as we all know, Microsoft is also in the process of designing
a new windowing system from scratch, so I think it's interesting to compare Gosling's proposal with Microsoft's
preview implementation.

To me, the most striking feature of Gosling's proposal is how low level it is. Indeed he states as a goal
that he would:

"make the 'window system' so minimal that it is almost non-existent."

In fact what he describes sounds pretty much like DirectX. His proposal could be summed up roughly as:
give any application that wants it unencumbered access to the graphics hardware. Or as close as you can
get to that in practice without threatening the security or stability of the system. Of course applications
won't be speaking directly to the graphics hardware in reality even if you could arrange for that without
threatening the stability of the system - you need some degree of abstraction if you want to support multiple
graphics hardware vendors. That's what DirectX is all about - an abstraction that is fast, lightweight, and
as close to the hardware as is practicable.

This is certainly a good way to enable high performance graphics, if modern DirectX-based games
are anything to go by. However, I'm not sure that this constitutes a 'window system.' If you've ever tried to write an
application with a GUI using DirectX, you'll know that it requires an order of magnitude more effort than working
with a higher-level windowing system such as is provided by the Win32 API, or the various OS X windowing
APIs.

Not that this is a flaw in Gosling's proposal - I'm just observing that his proposal has very limited scope.
He recognizes that user-mode libraries will need to be built on top of his system to provide a richer model;
in his paper he has only set out to propose the fundamental architecture, rather than the design of a whole windowing
system. And he has come up, more or less, with DirectX. And by an astonishing coincidence, Microsoft
has chosen DirectX as the underpinnings of Avalon, their next user interface system. So it looks like Gosling
and Microsoft are in agreement here. (Doubtless the rabid anti-Microsoft contingent will assume that this means
Microsoft read his paper and stole his idea... But I'm pretty sure the design and implementation of Avalon was
long underway by the time Gosling wrote this paper in December 2002. I think this is just convergent evolution
of ideas.)

Desktop Composition is Hard

There is one weakness in Gosling's proposal though. I think it's an interesting weakness, because it concerns an awkward
issue that always seems to be a fly in the ointment of any modern windowing system - desktop level
composition. If you've worked with the current publicly available builds of Longhorn, you'll know that Avalon supports
extremely rich graphical composition inside of an application window, but today, the desktop-level composition
effects are switched off by default. You can switch them on, but it doesn't half slow everything down. Similarly, while
OS X's Quartz Compositor is powerful, it does not offer quite the same rich set of transformation and
composition features that are available to you within a particular Quartz 2D drawing context. Exploiting the
graphics hardware in a cross-application way is intrinsically a much harder problem than letting applications
exploit the hardware within their own isolated worlds. Predictably, Gosling's proposal seems also to fall
somewhat short when it comes to inter-application composition.

Indeed, the composition model in Gosling's proposal feels curiously olde-worlde in comparison not just to
Avalon, but also to Mac OS X, or even good old Windows XP and Windows 2000. His description of clip lists and window stacking
describes a model that is essentially the one Windows used prior to Windows 2000: any given pixel on the screen
is owned by exactly one application at any given time. (He doesn't state that explicitly, but it does appear
to be one of his unstated assumptions.) Given the focus his proposal has on performance and simplicity,
this is understandable, but it precludes certain UI features that both Windows and Mac OS X users have already
had for some time. Windows introduced support for partially-transparent windows and other transparency features such as
drop shadows as far back as Windows 2000, with its layered window support. Apple added support for these
same features slightly later through their Quartz composition engine in OS X, first released in 2001.

As Gosling recognizes, any windowing system must manage composition - the process of combining
the output of many applications onto the screen is one of the primary jobs of a windowing system. (Indeed,
besides managing hardware resources and routing user input to the right application, composition is pretty
much the only other job that the windowing system Gosling proposes actually does.) He has just chosen
a simple but rather old-fashioned approach. Moreover, it's an inflexible approach, so I can't agree with
his claim:

"The window system knows nothing of rendering and imposes no
preconceived notions on it."

This clearly isn't true. It imposes the preconceived notion that applications cannot share a region of
the screen. In his model, when two regions owned by different applications overlap on screen, one of
them is deemed to be 'on top' and completely obscures the other one in the region where they overlap.
At the time Gosling wrote his proposal, neither of the two mainstream proprietary windowing systems
imposed this restriction. He seems not to have understood that the composition model does in fact
impose constraints on your rendering capabilities.
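
The difference between the two composition models can be sketched in a few lines of Python. This is purely illustrative: the Window class, the BACKGROUND constant, and both functions are hypothetical names invented here, not anything taken from Gosling's paper or from a real windowing API. The blended version uses the standard "source over" alpha formula, which is an assumption about how a compositing system might combine windows rather than a description of what Windows or OS X actually does internally.

```python
from dataclasses import dataclass

BACKGROUND = (0.0, 0.0, 0.0)  # desktop colour as (r, g, b), each in 0.0..1.0


@dataclass
class Window:
    x: int
    y: int
    width: int
    height: int
    color: tuple        # (r, g, b), each in 0.0..1.0
    alpha: float = 1.0  # 1.0 is fully opaque, 0.0 is fully transparent

    def contains(self, px, py):
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def owner_takes_all(stack, px, py):
    """Clip-list model: the topmost window covering the pixel owns it outright."""
    for win in reversed(stack):  # stack is ordered bottom to top
        if win.contains(px, py):
            return win.color
    return BACKGROUND


def blended(stack, px, py):
    """Compositing model: every window covering the pixel contributes to it."""
    r, g, b = BACKGROUND
    for win in stack:  # paint bottom to top, blending each layer over the result
        if win.contains(px, py):
            a = win.alpha
            wr, wg, wb = win.color
            r = wr * a + r * (1 - a)
            g = wg * a + g * (1 - a)
            b = wb * a + b * (1 - a)
    return (r, g, b)


if __name__ == "__main__":
    blue = Window(0, 0, 100, 100, (0.0, 0.0, 1.0))
    red = Window(50, 50, 100, 100, (1.0, 0.0, 0.0), alpha=0.5)
    stack = [blue, red]  # red is on top
    print(owner_takes_all(stack, 60, 60))  # red alone; blue is fully obscured
    print(blended(stack, 60, 60))          # a mix of red and the blue beneath it
```

In the first function the half-transparent red window has no way to show the blue one beneath it, because the pixel has exactly one owner. That is the rendering constraint the composition model imposes.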