386 VS. 030: THE CROWDED FAST LANE

Tyler Sperry

Past issues of DDJ have detailed the introduction and rise of Intel's 80386 microprocessor and its related software. The ability of 386 machines to maintain compatibility with MS-DOS software while giving substantially faster performance is something we all can respect. The 80386 isn't the only CPU that can lay claim to the title of fastest microprocessor, however. Motorola recently introduced the latest member of its 68000 line, the 68030, with performance claims that would leave the 80386 in the dust. Obviously, this claim bears some investigation.

Like many of you, my first experience with a 68000-based computer was the original 128K Macintosh. Disillusionment is the mildest term I'd use for my first encounter. Despite the promise of a superfast processor with lots of 32-bit general-purpose registers and plenty of memory (128K!), I was able to go back to my CP/M machine without regret. By burdening the CPU with updating the video and emulating a disk controller, the Mac seemed a perfect demonstration of Grosch's Law<fn1>; the CPU might be inherently faster than 8-bit machines, but you'd never be able to prove it by the performance.

In the last few years we've seen the introduction of a number of machines that have delivered on the promise of the 68000. The Mac line has matured to produce the 68020-equipped Macintosh II, which rivals the performance of the IBM PC AT. At the other end of the spectrum, Sun Microsystems has had enormous success with Unix boxes based on the 68000 and its descendants. Indeed, once you subtract a few proprietary CPUs (from IBM and DEC), workstations are powered almost exclusively by the 68000 family.

Fine, you say, but what difference does that make when there are more than 8 million DOS machines out there? And what about the 80386? Doesn't it blow away the 68020?

On the first point, I am happy to say that this article is concerned with comparing the latest offerings of the Intel and Motorola lines, not with software and hardware marketing. Still, those questions often come up in a discussion of the technical merits of competing CPUs. (The AMD 29000 looks to be a fantastic chip, but when will you get to program one?) For now, let it suffice to say that:

Any machine hoping to become the new standard in the personal computer marketplace will have to address the huge software "inertia" of the millions of DOS machines and their software compatibility demands.

The high-performance CPUs coming out in the next few years will probably make this point moot by providing PC emulation at speeds meeting or exceeding those of an AT. (This software already exists for Unix boxes.)

Even if software emulation isn't suitable, 286 "clone cards" are becoming easier to come by on a variety of buses.

At first glance, it might seem simple to compare the performance of the 80386 and the 68030. Just set up a test jig with some memory and the two CPUs and run some benchmarks. Although that approach might have worked with 8-bit CPUs (the infamous "good old days"), there are a few problems when you get to the new 32-bit processors.

There's memory, for example. Do you furnish the processors with the fastest static RAM available and let them run flat out, or do you run with a "typical" system (dynamic RAM) and slow the CPUs down with wait states? Arguments can be made on either side.
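To see why wait states enter the argument, it helps to do the arithmetic. The sketch below is a simplified model, not either vendor's timing specification: it assumes a zero-wait-state bus cycle of `base_clocks` clocks (2 by default, an assumption) and ignores address decode and buffer delays. The function names and the 150-ns DRAM figure are hypothetical.

```python
import math

def wait_states(clock_mhz: float, mem_access_ns: float, base_clocks: int = 2) -> int:
    """Wait states needed so a bus cycle lasts at least as long as the memory access.

    Simplified model (assumption): a zero-wait-state bus cycle is `base_clocks`
    clock periods long, and only the memory access time matters.
    """
    period_ns = 1000.0 / clock_mhz                      # one clock period in nanoseconds
    clocks_needed = math.ceil(mem_access_ns / period_ns)
    return max(0, clocks_needed - base_clocks)

def bus_cycle_ns(clock_mhz: float, mem_access_ns: float, base_clocks: int = 2) -> float:
    """Length of one memory bus cycle, wait states included, in nanoseconds."""
    clocks = base_clocks + wait_states(clock_mhz, mem_access_ns, base_clocks)
    return clocks * 1000.0 / clock_mhz

# Hypothetical 150-ns DRAM at three clock speeds.
for mhz in (16, 20, 25):
    print(f"{mhz} MHz: {wait_states(mhz, 150)} wait state(s), "
          f"{bus_cycle_ns(mhz, 150):.1f} ns per bus cycle")
```

Under these assumptions, 150-ns DRAM needs one wait state at 16 or 20 MHz but two at 25 MHz, so the 25-MHz bus cycle (160 ns) comes out longer than the 20-MHz one (150 ns). A faster clock buys nothing on memory-bound code unless something hides the memory.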

In the case of the 68030/80386 controversy, the chip manufacturers have actually addressed this concern. Both CPUs have provisions for coping with dynamic RAM that is slower than the CPU. Both support "burst mode" access, for example, which allows contiguous bytes of memory to be accessed without the delays normally associated with address setup and decoding. In some respects these two CPUs are more similar than different. There is one significant difference in their approach to memory access, however, and it has a substantial impact on performance.
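The payoff from burst mode is easy to estimate. The sketch below assumes (hypothetically) that an ordinary access costs `first_access` clocks, that each further word within a burst costs `burst_access` clocks, and that a burst covers at most `burst_len` words. Real clock counts depend on the chip and on the memory design, so treat the numbers as placeholders.

```python
from typing import Optional

def read_clocks(n_words: int, first_access: int,
                burst_access: Optional[int] = None, burst_len: int = 4) -> int:
    """Clocks to read n_words contiguous words from memory (toy model).

    Without burst mode (burst_access is None) every word pays the full
    first_access cost. With burst mode, words are grouped into bursts of up to
    burst_len: the first word of each burst pays first_access, the rest pay
    burst_access. All clock counts are hypothetical inputs.
    """
    if burst_access is None:
        return n_words * first_access
    full_bursts, leftover = divmod(n_words, burst_len)
    clocks = full_bursts * (first_access + (burst_len - 1) * burst_access)
    if leftover:
        clocks += first_access + (leftover - 1) * burst_access
    return clocks

print(read_clocks(16, 3))      # → 48 (every word pays the full access)
print(read_clocks(16, 3, 1))   # → 24 (four 3-1-1-1 bursts)
```

With these made-up figures, burst mode halves the cost of a 16-word sequential read; the gain shrinks as the access pattern becomes less sequential.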

Let's take a brief detour down memory lane (so to speak). Back in the days of the 6502 and 8080, access to memory was slow but relatively straightforward. If the processor wanted an instruction, it went out to the memory bus and fetched one.

This procedure began to change with the introduction of the Intel 8088. One of the features of the 8088 was a 4-byte prefetch instruction queue that attempted to separate memory bus activity from computation time. Program instructions were moved from memory into a prefetch queue and then acted upon. Although this sped things up a bit, it was of limited usefulness. (See this month's Letters for more on the subject.) Indeed, the less charitable have referred to the prefetch queue as the prefetch bottleneck.

Eager to please, the engineers at Intel improved things in subsequent designs: the 80386 has a 16-byte prefetch instruction queue. (A simplified schematic is shown in Figure 1a.)

Motorola's attempts at speeding things up became noteworthy with the introduction of the 68020. Rather than an instruction queue, the 68020 has a 256-byte instruction cache. Once an instruction is loaded into the cache from memory, it need not be reloaded unless another instruction that maps to the same cache entry has displaced it. Thus, a small, tight loop can run entirely from on-chip cache memory, with much faster results. An instruction queue, on the other hand, is by definition limited to operating as an instruction pipeline; any branch taken forces the queue to be reloaded.
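A toy simulation makes the difference concrete. It counts memory fetches for a loop run through a prefetch queue and through a direct-mapped instruction cache. Everything here is a simplifying assumption: one instruction per word, a queue that is always full when a branch is taken (so `depth` prefetched words are thrown away), and a cache holding one instruction per line. The 64-line default is meant to suggest a 256-byte cache of 4-byte entries, not to model the 68020 exactly.

```python
def loop_trace(start, body_len, iterations):
    """Word addresses executed by a tight loop of body_len straight-line instructions."""
    return [start + i for _ in range(iterations) for i in range(body_len)]

def fetches_with_queue(trace, depth=4):
    """Memory fetches through a prefetch queue (toy model).

    Every executed instruction is fetched from memory; nothing is reused.
    On a taken branch (next address not sequential) the `depth` words already
    prefetched past the branch are discarded, assuming the queue was full.
    """
    fetches = len(trace)
    for prev, cur in zip(trace, trace[1:]):
        if cur != prev + 1:
            fetches += depth
    return fetches

def fetches_with_cache(trace, lines=64):
    """Memory fetches through a direct-mapped instruction cache (toy model).

    Each line holds one instruction word; only misses go to memory.
    """
    cached = [None] * lines
    misses = 0
    for addr in trace:
        index = addr % lines          # the one entry this address may occupy
        if cached[index] != addr:
            cached[index] = addr      # miss: fetch from memory, evicting the old word
            misses += 1
    return misses

small = loop_trace(0, 20, 100)
print(fetches_with_queue(small), fetches_with_cache(small))   # → 2396 20
```

For a 20-instruction loop run 100 times, the queue model makes 2,396 memory fetches (2,000 instructions plus 99 flushes of 4 words), while the cache model makes 20. Stretch the loop to 100 instructions and it no longer fits: the cache model then misses 72 times per pass after the first.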

As you might expect, the addition of an instruction cache can substantially improve a processor's performance. The amount of improvement will, of course, depend on how many tight loops there are in your code. (Yet another reason to be wary of small benchmarks.) Thayne Cooper and some engineers at Sperry ran both the 68020 and the 80386 through some modified EDN benchmarks and published the results in IEEE Micro.<fn2> While a 16-MHz 80386 was able to surpass a 16-MHz 68020 with its cache disabled, enabling the cache more than halved the original 68020 benchmark times. (The cached 68020 beat the 80386 in all tests except the string search benchmark.)
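How much a cache helps comes down to its hit rate, which is another way of asking how much of your program's time is spent in loops that fit. The usual average-access-time estimate is sketched below; the 1-clock hit and 3-clock miss in the example are hypothetical, chosen only to show the shape of the curve.

```python
def avg_fetch_clocks(hit_rate: float, cache_clocks: float, memory_clocks: float) -> float:
    """Average clocks per instruction fetch: hits cost cache_clocks, misses memory_clocks."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_clocks + (1.0 - hit_rate) * memory_clocks

def fetch_speedup(hit_rate: float, cache_clocks: float, memory_clocks: float) -> float:
    """Fetch speedup relative to fetching every instruction from memory."""
    return memory_clocks / avg_fetch_clocks(hit_rate, cache_clocks, memory_clocks)

for h in (0.0, 0.5, 0.9, 0.99):
    print(f"hit rate {h:4.2f}: fetch speedup {fetch_speedup(h, 1, 3):.2f}x")
```

This covers instruction fetch only; execution time and data traffic dilute the overall gain, so a program-level speedup will be smaller than the fetch speedup.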

Figure 1: Simplified view of the memory interface for the 80386 and 68030. The 80386 (a) has a 16-byte prefetch instruction queue. The 68030 (b) features a modified Harvard architecture with separate 256-byte caches for instructions and data.

Now, given those benchmark results, it'd seem pretty clear-cut. The performance improvement from boosting the clock speed to 20 MHz should be pretty much the same for either chip. Score them neck and neck, with the edge to the 68020, and we're done, right?

Alas, as my friend Jerry Pournelle would say, it isn't all that simple. The benchmarks done by Cooper were modified 16-bit EDN benchmarks, performed on special hardware. The hardware was designed to keep things as equal as possible for the various processors (the 32032, 32100, and the 80286 were also tested).

Unfortunately, life isn't always fair: your choice of machines will often not include units comparable in all aspects except CPU; the benchmarks used in a test don't always bear a strong resemblance to your actual application and environment; and the compiler used can affect your results tremendously.

Consider the case of poor Richard Grehan at Byte.<fn3> He took several varieties of 386 computers and accelerator cards, compiled some benchmarks, and ran them. Then he did the same thing for a Mac II and some 68020 accelerator cards in different environments. If you have some experience with benchmarks (or if you read Byte regularly), you can anticipate what he found: the 80386 outperformed the 68020 in the majority of tests.

How to explain this? Well, there are some things to note about the Byte benchmarks. First of all, these tests were designed to measure mathematical performance. The only nonmathematical tests were the infamous Sieve and a quicksort routine. As the commercials say, your actual mileage may vary.
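For reference, here is the Sieve in the form it usually circulates, the Byte version, in which `flags[i]` stands for the odd number 2i + 3. This is a Python port for illustration only; timing it in an interpreter measures the interpreter as much as the CPU, which is the compiler problem in another guise. Note how little it exercises: integer loops, additions, and array stores.

```python
def byte_sieve(size: int = 8190) -> int:
    """Count odd primes with the Byte-style Sieve of Eratosthenes.

    flags[i] represents the odd number 2*i + 3, so size=8190 covers 3..16383.
    """
    flags = [True] * (size + 1)
    count = 0
    for i in range(size + 1):
        if flags[i]:
            prime = i + i + 3
            # Strike out odd multiples of prime, starting at 3*prime (index i + prime).
            for k in range(i + prime, size + 1, prime):
                flags[k] = False
            count += 1
    return count

print(byte_sieve(), "primes")   # → 1899 primes
```

With the traditional size of 8190 it reports 1899 primes, the figure the classic listing prints.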

Second, although Grehan tried to use the same compiler vendor for all machines, this wasn't always feasible. Some of the compilers used for two of the 68020 machines gave substantially better times than the Macintosh compiler used for the other tests, and in fact these times were in the same neighborhood as the best 80386 time (a 16-MHz Compaq 386 with an 80387 coprocessor, in case you were wondering).

The lesson here is unfortunately all too clear. The best benchmark is your target application, ported to the prospective machine. Depending on the optimizations offered by the compiler and individual machine peculiarities, you'll find benchmarks vary widely--there are too many confounding variables for a categorical statement that one chip is better than another. Still.
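In that spirit, a minimal harness for timing your own workload might look like the following. It uses Python's standard `timeit.repeat` and keeps the fastest trial, on the assumption that slower trials reflect interference from other activity rather than the machine itself. `best_time` and `stand_in_workload` are hypothetical names; substitute a call into your real application.

```python
import timeit

def best_time(workload, repeat=5, number=10):
    """Seconds per call of `workload`, taking the fastest of `repeat` trials
    of `number` calls each."""
    trials = timeit.repeat(workload, repeat=repeat, number=number)
    return min(trials) / number

def stand_in_workload():
    # Hypothetical placeholder: replace with a call into your real application.
    return sorted(range(10_000, 0, -1))

print(f"{best_time(stand_in_workload) * 1e3:.3f} ms per call")
```

Run the same harness on each candidate machine and compare ratios; absolute numbers from different setups say little on their own.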

After all that discussion and equivocation on the subject of the 68020 vs. the 80386, you might expect that a clear statement on the relative performance of the 68030 would be out of reach. As of this writing, there aren't many 68030 machines available to test. (Both Apple and NeXT are rumored to be working on 68030 designs; both refuse to comment on unannounced products.) In reading through the literature, though, I came across some things that let us make a pretty good guess.

To start with, the 68030 implements a modified Harvard architecture along with expanded caching. A Harvard architecture machine uses separate address and data buses for instructions and for data; the 68030 employs a modified Harvard scheme, in which separate buses are used internally and then multiplexed for access to the system. Figure 1b shows a simplified schematic of the 68030's memory interface. Notice that there are now two 256-byte caches: one for instructions and one for data. Given the radical improvement a cache made in the 68020's performance, you can see why Motorola is proudly trumpeting the 68030 as "twice the microprocessor." Of course, it didn't hurt that the chip runs at a clock speed of 25 MHz.
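One benefit of separate caches is isolation: an instruction fetch can never evict data, and vice versa. The toy model below shows this with direct-mapped caches of 256 bytes and 16-byte lines (parameters assumed for illustration, not taken from Motorola's specifications) and a deliberately contrived trace in which an instruction and its data operand map to the same cache entry.

```python
class DirectMappedCache:
    """Direct-mapped cache model; counts misses only (no timing)."""

    def __init__(self, size_bytes: int = 256, line_bytes: int = 16):
        self.line_bytes = line_bytes
        self.tags = [None] * (size_bytes // line_bytes)
        self.misses = 0

    def access(self, addr: int) -> None:
        block = addr // self.line_bytes      # which memory line holds addr
        index = block % len(self.tags)       # the one cache entry it may occupy
        if self.tags[index] != block:
            self.tags[index] = block         # miss: load the line, evicting the old one
            self.misses += 1

def misses_unified(trace, size_bytes=256):
    cache = DirectMappedCache(size_bytes)
    for _kind, addr in trace:
        cache.access(addr)
    return cache.misses

def misses_split(trace, size_bytes=256):
    icache, dcache = DirectMappedCache(size_bytes), DirectMappedCache(size_bytes)
    for kind, addr in trace:
        (icache if kind == "I" else dcache).access(addr)
    return icache.misses + dcache.misses

# Hypothetical worst case: code at 0x1000 and data at 0x2000 share a cache index.
trace = [("I", 0x1000), ("D", 0x2000)] * 100
print("unified, 256 bytes:", misses_unified(trace))
print("unified, 512 bytes:", misses_unified(trace, 512))
print("split, 256 + 256:  ", misses_split(trace))
```

In this trace every access to the unified cache misses (200 of 200), even when it is doubled to 512 bytes, while the split caches miss only twice. Real programs collide far less often; the point is that the split design rules out this kind of interference entirely.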

Given that the 80486 is still quite a way off, Intel would probably like you to believe that a 16-MHz 80386 is equal to a 20-MHz 68020 and that its 20-MHz 80386 is equal to (or better than) a 25-MHz 68030. Motorola, as you might expect, has a different view: a fast 68020 is a match for any 80386, and the 68030 blows away an 80386 at any speed.

Aside from the engineer's instinctive distrust of (other people's) benchmarks, and despite the vendor charges and countercharges concerning the benchmarks, there are some clear lessons:

Other things being equal, an 80386 and a 68020 will perform at roughly the same rate: bloody fast.

A 68030 at 25 MHz will probably be faster than any 80386 you find. How much faster, though, will depend a great deal on your software and compiler.

If your application is primarily number crunching, a fast math coprocessor is essential and its presence or absence will probably swamp other aspects.

A weak compiler can mislead you on the performance of a given system. Conversely, a highly optimizing compiler can completely destroy the value of a poorly constructed benchmark.<fn4>

Beware of virtual machines. Today's 5-MHz PC clone is faster than a 50-MHz 80486 box that won't be shipping for another six months.
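The coprocessor point is Amdahl's law at work (my framing, not a vendor's): speeding up part of a job helps only in proportion to the time spent in that part. The 80% floating-point share and the speedup factors below are hypothetical.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the run time is made `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# Hypothetical number-crunching job: 80% of its time in software floating point.
print(f"25% faster clock on everything: {amdahl_speedup(1.0, 1.25):.2f}x")
print(f"20x faster floating point:      {amdahl_speedup(0.8, 20):.2f}x")
```

Under these assumed figures the coprocessor is worth more than three times the clock-speed bump, which is why its presence or absence can swamp the CPU comparison.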
