Tuesday, September 4, 2007

How to Design a Self-Balancing Multicore Processor, Part I

Forget threads. Forget algorithms. Forget programming languages. That stuff is not only old, primitive, ugly and labor-intensive, it is soon to be obsolete. The computer market of this century wants super fast, fine-grain, parallel desktop supercomputers and rock-solid software that does not fail, software that automatically takes advantage of all the available cores of the processor it is running on. The new market also wants an order of magnitude improvement in productivity, at the very least.

There is only one way the market is going to get what it wants. We must abandon the flawed algorithmic/procedural computing model and adopt a non-algorithmic, implicitly parallel, reactive model. For that, we must rethink not only our software model but the computer itself. Yes, the computer must be reinvented! Current processors are optimized for the algorithm, the old paradigm. We must design new processors to support the new paradigm. And what better time is there to make the switch than now, when the industry is taking its first painful steps away from sequential computing toward massive parallelism? This three-part article is aimed at anybody who is interested in the future of parallel computing.

The Real Purpose of the Processor

The most important question a processor designer must ask himself or herself is, what is the purpose of the processor? Most people in the business will immediately answer that the purpose of the processor is to execute sequences of coded instructions. Sorry, this is precisely what got us in the mess that we are in. In order to figure out the answer to this crucial question, we must first grok the true nature of computing. And the sooner we learn that a computer is really a collection of concurrently behaving and communicating entities, the sooner it will dawn on us that the processor is a necessary evil.

Necessary Evil

Why necessary evil? Because, ideally, each and every elementary entity in a computer should be its own self-processor, not unlike a neuron in a biological nervous system that sits around most of the time waiting for a signal to do something. In other words, ideally speaking, addition operators should add by themselves and multiplication operators should multiply by themselves. Unfortunately, we have not yet progressed to the point where we can build super-parallel systems like the brain with its tens of billions of tiny, interconnected, self-executing elementary processors. That will come one day, but not today.

Processor as Simulator

Fortunately, what we lack in numbers, we make up for in speed. We can have a single fast processor core do the work of thousands or even millions of elementary processors (cells). Ideally, adding more processor cores into the mix should improve performance linearly. At least, that is the goal. This, then, is the true purpose of the processor: to simulate the behavior of a large number of inter-communicating cells. So, essentially, the processor is a cell simulator, or cell processor if you wish.

The Key to Proper Multicore Processor Design

The key to designing a multicore processor is to keep in mind that one is designing a cell simulator in hardware. Nothing more, nothing less. Come to think of it, regardless of the number of cores, this is the way it should have been designed in the first place. At any rate, simulating large numbers of small concurrent entities is neither new nor rocket science. Neural network and cellular automaton programmers have been doing it for years, in software. In our case, we are simulating two things:

Behavior
This is what the cells actually do, i.e., their operations (adding, subtracting, comparing, etc.)

Signaling
Most of the time, a cell will alert one or more other cells that it has just performed an operation.

Essentially, a cell simulator consists of a cell processor, two buffers and a repeating loop. In Part II, I will explain how it works traditionally in software, and how it can be performed entirely by the processor. In addition, the processor can be easily designed to take advantage of all the available cores automatically and keep them busy. Scalable, automatic load balancing is what I’m talking about.
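Until Part II fills in the details, here is a minimal software sketch of what "a cell processor, two buffers and a repeating loop" could look like. It is only one plausible reading, not necessarily the design described later: I am assuming that one buffer holds the cells to be processed in the current cycle, the other collects the cells that were signaled for the next cycle, and the two are swapped at the end of each cycle. The names Cell, run, current and upcoming are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(eq=False)  # compare cells by identity, not by field values
class Cell:
    name: str
    operate: Callable[[dict], None]                      # behavior: what the cell does
    targets: List["Cell"] = field(default_factory=list)  # signaling: whom it alerts


def run(start_cells, state, max_cycles=100):
    """Process cells cycle by cycle until no cell is signaled any more."""
    current = list(start_cells)  # buffer 1: cells to process in this cycle
    upcoming = []                # buffer 2: cells signaled for the next cycle
    cycles = 0
    while current and cycles < max_cycles:
        for cell in current:
            cell.operate(state)            # behavior
            for target in cell.targets:    # signaling
                if target not in upcoming:
                    upcoming.append(target)
        current, upcoming = upcoming, []   # swap the buffers
        cycles += 1
    return cycles


# Hypothetical example: an adder cell that signals a doubler cell.
def add(state):
    state["sum"] = state["a"] + state["b"]


def double(state):
    state["product"] = state["sum"] * 2


adder = Cell("add", add)
doubler = Cell("double", double)
adder.targets.append(doubler)

state = {"a": 2, "b": 3, "sum": 0, "product": 0}
cycles = run([adder], state)
print(state["sum"], state["product"], cycles)  # prints: 5 10 2
```

Within a single cycle, the cells in the current buffer are processed without regard to the order in which they sit in the buffer, at least as long as they do not touch the same data. That is what would allow the work to be divided among however many cores are available, although this sketch runs everything on a single core.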