Mac OS X 10.2 Jaguar

Performance

Jaguar is marketed as a "features" release mostly because there are so many new features. New features are also easier to sell than incremental performance increases. Mac OS X performance was examined extensively in the 10.1 review, and for the most part those observations continue to hold for 10.2. But while 10.2 does not provide as big a leap as 10.1 did, performance has improved in many areas.

Boot time is substantially faster in Jaguar, thanks largely to its ability to start services in parallel, provided there are no dependencies between them (e.g., network time synchronization can't start until the network is up). This is especially helpful when one of the startup items takes a long time to complete (e.g., requesting an IP address from a busy DHCP server). Rather than blocking the entire boot sequence while waiting for this long-running task to complete, Jaguar will continue to launch other independent services.
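To make the idea concrete, here is a minimal sketch of dependency-aware parallel startup. It is not Apple's startup code; the function name `start_services`, the service names, and the sleep standing in for a slow DHCP server are all assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def start_services(services, deps, max_workers=4):
    """Start every service, running independent ones in parallel.

    services: dict mapping a (hypothetical) service name to a callable.
    deps: dict mapping a service name to the names it must wait for.
    Returns the order in which services finished starting.
    """
    remaining = {name: set(deps.get(name, ())) for name in services}
    running = {}   # future -> service name
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or running:
            # Launch everything whose dependencies are already satisfied.
            for name in [n for n, d in remaining.items() if not d]:
                del remaining[name]
                running[pool.submit(services[name])] = name
            if not running:
                raise ValueError("dependency cycle or unknown dependency")
            # Wait for *any* service to finish, not for all of them.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                fut.result()  # re-raise any startup error
                finished.append(name)
                for d in remaining.values():
                    d.discard(name)
    return finished

# Hypothetical startup items: "network" is slow, "disk" is independent of it.
services = {
    "network": lambda: time.sleep(0.2),  # stand-in for a busy DHCP server
    "ntp": lambda: None,                 # needs the network first
    "disk": lambda: None,                # no dependencies
}
deps = {"ntp": {"network"}}
order = start_services(services, deps)
print(order)
```

The slow "network" item only holds up "ntp", which depends on it. The independent "disk" item starts and finishes without waiting.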

On the G3/400, the boot sequence was dominated by POST and the new "gray apple" startup screen. Once the startup progress bar appeared, it zipped to the end very quickly, shaving a full 20 seconds off the 10.1 boot time. The G4/800 got a bit faster as well, but it has a static IP address and was already very fast (no waiting for an address from a DHCP server.)

On the memory usage front, Jaguar now enables the previously dormant window buffer compression feature of the window server. In Jaguar, "inactive" window buffers are compressed in order to save memory. Since many windows contain large areas of "empty space" (i.e., long runs of the same color), they tend to compress very nicely even using a simple algorithm like RLE (run length encoding, e.g. "white pixel x 200" rather than "white pixel, white pixel, white pixel, ..."). Window compression is independent of Quartz Extreme, so everyone who can run OS X will benefit from it.
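For readers who haven't seen RLE before, here is a minimal sketch of the technique. It only illustrates run length encoding in general; the exact algorithm and data layout the window server uses are not documented here, and the pixel row below is made up.

```python
def rle_encode(pixels):
    """Collapse runs of identical values into [value, count] pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1      # extend the current run
        else:
            runs.append([p, 1])   # start a new run
    return runs

def rle_decode(runs):
    """Expand [value, count] pairs back into the original sequence."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

# One made-up scanline: mostly white, with a short black mark in it.
row = ["white"] * 200 + ["black"] * 3 + ["white"] * 97
encoded = rle_encode(row)
print(len(row), "pixels stored as", len(encoded), "runs")
```

A 300-pixel row of mostly empty space collapses to three runs, which is why window buffers are such a good fit for this kind of compression.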

Application launch speed has improved in Jaguar, but the improvement is not comparable to the huge leap that 10.1 made over 10.0. I actually repeated the entire test suite from the 10.1 review, but the results were somewhat inconclusive because almost all the applications tested have been revamped since 10.1. In general, the launch times were 1-3 seconds faster on Jaguar. Several bundled applications also stopped bouncing sooner than they did in 10.1.

The speed-up is likely due to changes in Jaguar's Mach kernel. Here's the explanation from Apple's technote 2053:

In order to reduce application launch times, the kernel now maintains
information about the working set of an application between launches
(in "/var/vm/app_profile").

These so-called "pre-heat files" are a variation on a well-known optimization for decreasing application launch time, but it's somewhat disappointing that the benefit is not more substantial. Nevertheless, it's good to see that Apple is working hard to improve this part of the user experience. (Yes, I'd definitely call modifying the kernel to improve application launch times "working hard" :-)
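The general idea behind this kind of optimization can be shown with a toy model. This is only a sketch under loud assumptions: the class `LaunchProfiler`, the page numbers, and the notion of counting "faults" are all hypothetical and say nothing about how Jaguar's kernel actually stores or uses its profile data.

```python
class LaunchProfiler:
    """Toy model: remember the pages an app touched at launch, prefetch them next time."""

    def __init__(self):
        self.profiles = {}  # app name -> pages touched during its last launch

    def launch(self, app, page_accesses):
        """Simulate one launch and return the number of on-demand page faults."""
        # Pages named in the saved profile are assumed to be loaded up front.
        resident = set(self.profiles.get(app, ()))
        faults = 0
        for page in page_accesses:
            if page not in resident:
                faults += 1          # had to fetch this page on demand
                resident.add(page)
        # Save the working set (unique pages, in first-touch order) for next time.
        self.profiles[app] = list(dict.fromkeys(page_accesses))
        return faults

profiler = LaunchProfiler()
accesses = [3, 7, 3, 12, 7, 40]   # made-up page numbers touched during launch
first = profiler.launch("SomeApp", accesses)
second = profiler.launch("SomeApp", accesses)
print(first, second)
```

In this model the second launch takes no on-demand faults because the working set is already known. A real system still has to read those pages from disk, which may be part of why the measured benefit is modest.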

The dreaded "spinning rainbow disc" has an all-new look in Jaguar, but it appears just about as often as it did in 10.1. Since the rainbow disc is merely an indication of when an application has not responded to events for a few seconds, it's really an application problem. But if the application is blocking on a system call that's taking much too long, it may also be an OS problem. Either way, it's annoying.

Jaguar minimizes the annoyance by changing the cursor back to normal when it is not over a window owned by the unresponsive application. This makes it clear to the user that other applications are still okay and will respond correctly if clicked on. While this was also true in earlier versions of OS X, the cursor did not change, causing many users to assume they had to wait.

Although scrolling is still often much slower than it should be, the performance has improved significantly in Jaguar. Previous versions of Mac OS X would redraw the entire contents of the window during scrolling. Jaguar simply shifts the existing content upward or downward as necessary (something that can be done in hardware on the video card, even without Quartz Extreme), and then redraws only the newly revealed portion of the window. (This effect is easy to see using the Quartz Debug application included on the developer tools disk. Just turn on the "flash screen updates" option and scroll a few windows.)
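The difference between the two approaches is easy to sketch. The code below is an illustration only, not how Quartz is implemented: a list of strings stands in for the window's pixel rows, and `draw_row` is a hypothetical stand-in for the expensive drawing code.

```python
def scroll_up(buffer, lines, draw_row, first_visible):
    """Scroll content up by `lines` rows, redrawing only the newly revealed rows.

    buffer: list of rendered rows currently on screen (modified in place).
    draw_row: hypothetical callable that renders document row i (the slow part).
    first_visible: index of the document row at the top before scrolling.
    Returns the new index of the top row.
    """
    height = len(buffer)
    lines = min(lines, height)
    # Shift the existing rows upward: a cheap copy, no redrawing involved.
    buffer[:height - lines] = buffer[lines:]
    new_first = first_visible + lines
    # Redraw only the rows revealed at the bottom of the window.
    for i in range(height - lines, height):
        buffer[i] = draw_row(new_first + i)
    return new_first

calls = []  # records which rows had to be drawn from scratch

def draw_row(i):
    calls.append(i)
    return f"row {i}"

screen = [draw_row(i) for i in range(10)]  # initial full draw of a 10-row window
calls.clear()
top = scroll_up(screen, 3, draw_row, 0)
print(top, calls)
```

Scrolling a 10-row window by 3 rows triggers only 3 calls to `draw_row` rather than 10, which is the saving that Quartz Debug's "flash screen updates" option makes visible.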

Window resizing remains very slow, especially in "brushed metal" applications like iTunes and iPhoto. Window resizing sometimes actually seems worse overall in Jaguar--an impression created by the proliferation of the brushed metal interface. Whatever is making window resizing so slow needs to be found and fixed. Quartz Extreme doesn't help in any measurable way, so obviously window composition overhead is not the culprit.

To explore the actual performance benefits of Quartz Extreme on the G4/800, I returned to the old standby: the transparent terminal window. First, I downloaded the largest version of the Star Wars Clone War trailer.

To set a baseline, I played the movie as-is. As expected, the G4/800 had no problem playing it at its full framerate of 24 fps. The torture began with the placement of a single transparent terminal window (80x24, 0.25 transparent, where 1 is totally opaque) on top of the movie. I noted the maximum and minimum sustained (for more than 1 second) framerate during playback. Then I repeated the test after positioning two transparent terminal windows on top of the movie, then three, then four, and so on. The results are shown in the table below:

Framerate (Min/Max)

# Win    Mac OS X 10.1      10.2 w/ QE
0        24fps              24fps
1        8.6fps - 11fps     24fps
2        5fps - 10fps       24fps
3        1fps - 6fps        24fps
4        0.5fps - 5fps*     24fps
5        0fps - 3fps*       24fps

* Sound drop-outs during playback

I actually ran tests with many more than 5 windows on top of the movie. In Mac OS X 10.1, the movie playback was pretty much shot after 5 windows were added. The movie actually stopped entirely (0fps) for a second or more several times, and the sound (which began dropping out with 4 windows) was completely broken up.

In Jaguar with Quartz Extreme enabled, the GPU really flexes its muscles. I actually kept adding transparent windows until I hit 25 and got bored. The framerate never dropped at all. Impressive!

It was time to pull out all the stops and unleash the dreaded shaking transparent terminal window. The test is the same as described above, except that the top-most terminal window is shaken as fast as humanly possible over the top of the rest of the pile. This is the test that caused iTunes to skip in Mac OS X 10.0 and could bring movie playback to its knees in 10.1. Here are the results for 10.2 with Quartz Extreme, listing the minimum sustained (for more than 1 second) framerate. (Results from 10.1 are shown just for completeness :-)

Minimum Framerate

# Win    Mac OS X 10.1      10.2 w/ QE
1        0fps               24fps
1        0fps               24fps
2        0fps               24fps
3        0fps               24fps
4        0fps               21fps
5        0fps               17fps

Things are grim in Mac OS X 10.1. I was able to stop the movie dead by shaking just one transparent terminal window vigorously. Not surprisingly, things didn't get any better as the number of (stationary) transparent terminal windows below it increased.

Jaguar with QE was a champ right up until the 4 window mark. After that, it began to degrade fairly linearly. Although this performance is worlds better than 10.1 without QE, I wanted to know exactly what it was about shaking the top window that caused the performance loss. After all, the amount of compositing work seemed equal, and could potentially be even less, since the top window occasionally moved away from the playing movie completely during shaking.

My next thought was that it might be a bandwidth problem. Perhaps shaking the window somehow caused the AGP bus to be swamped? But that didn't make much sense, especially with 64MB of VRAM and a set of windows that weren't changing at all (except for the movie). The unchanging window buffers for all the terminal windows could fit comfortably in the card's video memory.

So I tried to look at it more logically. QuickTime needs CPU cycles to decode the audio and video in the movie. A reduction in framerate means that QuickTime is not getting enough CPU cycles. Therefore shaking the transparent terminal window must be taking up CPU cycles. Running top verified it: the window server was taking roughly 70% of the G4/800's CPU cycles when I was shaking the window on top of four other windows. It took progressively less CPU time as the number of stationary transparent terminal windows decreased.

That explains why the movie's framerate started to drop, but what explains the window server's increased need for CPU cycles? The test with the stationary windows showed how well compositing calculations are handled by the GPU. With 25 stationary windows, there was no drop in framerate. Why does a shaking window require so much more CPU involvement? And, even more puzzling, why does the amount of CPU involvement increase in proportion to the number of stationary windows sitting below the shaking window?

Unfortunately, I didn't have time to explore the boundaries of Quartz Extreme much further, as there are many other interesting parts of Jaguar that required my attention. Furthermore, the example above is not likely to come up in daily use. Stationary windows are much more common than shaking ones, and QE handles them without breaking a sweat. Still, it would be nice if window movement was just as smooth.

While QE doesn't help window resizing and isn't responsible for the scrolling speed increase, you can most certainly make a Mac OS X system "feel" faster by playing to QE's big strength: compositing.

John Siracusa
John Siracusa has a B.S. in Computer Engineering from Boston University. He has been a Mac user since 1984, a Unix geek since 1993, and is a professional web developer and freelance technology writer. Email: siracusa@arstechnica.com // Twitter: @siracusa