Overcoming the embedded CPU performance wall

The physical limitations of current semiconductor technology have made it increasingly difficult to raise the clock frequency of embedded processors, so designers are turning to parallelism in multicore architectures to achieve the performance that current designs require. This article explains these silicon limitations and how they affect CPU performance, and shows how engineers are overcoming them with multicore design.

Current status of multicore SoC design and use

In the last few years there has been an increase in microprocessor architectures featuring multithreaded or multicore CPUs. They are now the rule for desktop computers and are becoming common even in the high-end embedded market. This increase is driven by processor designers' desire for higher performance. But silicon technology is reaching its limits as a source of performance gains, so meeting the demand for ever-increasing processing power depends on architectural solutions such as replicating processor cores inside microprocessor-based systems-on-chip (SoCs).

Moore's law states that the number of transistors that can be placed on an integrated circuit doubles roughly every two years as transistors shrink. Gordon E. Moore, then Director of R&D at Fairchild Semiconductor and later a co-founder of Intel, first postulated it in 1965; his original projection was a doubling every year, which he revised to every two years in 1975.
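
As a quick illustration of what a two-year doubling period implies, the minimal sketch below projects a transistor count forward in time using N(t) = N0 · 2^(t/2). The function name and inputs are hypothetical; the only thing taken from the text is the two-year doubling period.

```python
def projected_transistors(n0: float, years_elapsed: float,
                          doubling_period: float = 2.0) -> float:
    """Project a transistor count forward, assuming Moore's-law doubling.

    n0: transistor count at the starting point (hypothetical input)
    years_elapsed: time since the starting point, in years
    doubling_period: years per doubling (two years in the usual statement)
    """
    return n0 * 2 ** (years_elapsed / doubling_period)


# Twenty years at one doubling every two years is 10 doublings, a factor of 1024.
growth = projected_transistors(1.0, 20)
print(f"Growth over 20 years: {growth:.0f}x")
```

Ten doublings in twenty years is why transistor counts have to be plotted on a logarithmic axis to fit on one chart.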

Although the word “law” is used to describe his projection, Moore's prediction is not a law of physics but a conjecture based on empirical observation of the technology of the 1960s and 1970s. In the short history of modern computing there have been many predictions, and more than a few have been wrong. That makes Moore's law all the more impressive: it has held from the time it was first postulated up to the present, and it is expected to hold for at least another decade.

Moore's law has continued to hold because designers have kept shrinking the components on a chip, steadily increasing transistor density in processors, memories and other devices. With smaller transistors you can add more functional units to a processor and build more complex architectures in the same area.

Thanks to this higher density, resource-hungry techniques such as branch prediction and out-of-order execution are now common features of modern processors. They improve IPC (instructions per cycle), i.e. instruction throughput per clock, which is one of the two fundamental sources of processor performance; the other is clock rate. Smaller transistors also allow higher clock rates. When the gate length of a transistor is scaled by a factor of 1/k, circuit delay is reduced by the same factor. Because circuit delay is set largely by transistor switching time, the clock rate can then be raised by a factor of k. Operating at higher frequencies, processors achieve higher performance, but at a cost.
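
The two performance sources named above combine multiplicatively: instruction throughput is IPC times clock rate. The minimal sketch below makes that explicit and applies the idealized 1/k delay scaling just described. The function names and the example core (IPC of 2 at 1 GHz) are hypothetical, and real processes do not achieve this ideal scaling exactly.

```python
def instructions_per_second(ipc: float, clock_hz: float) -> float:
    """Instruction throughput = instructions per cycle x cycles per second."""
    return ipc * clock_hz


def scaled_clock(clock_hz: float, k: float) -> float:
    """Idealized scaling: shrinking gate length by 1/k cuts circuit delay by 1/k,
    so the achievable clock rate rises by a factor of k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return clock_hz * k


# Hypothetical core: IPC of 2 at 1 GHz, then an ideal shrink with k = 2.
base = instructions_per_second(2.0, 1e9)
shrunk = instructions_per_second(2.0, scaled_clock(1e9, 2.0))
print(f"{base:.1e} -> {shrunk:.1e} instructions/s")
```

Keeping IPC and clock rate as separate arguments mirrors the text: microarchitectural features raise the first, process shrinks raise the second.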

However, designers now face practical restrictions on continuing this progression. Increasing transistor density and clock frequency has side effects that become more significant as transistor sizes shrink further. The two main barriers to further progress are higher power consumption and longer transmission delays.

Power consumption on a chip

Power consumption on a chip and the associated heat dissipation are becoming a major barrier for hardware designers. With the number of transistors constantly increasing, current processors draw a considerable amount of power in a very small area, which produces a high power density that must be dissipated. The number of transistors is not the only factor: high operating frequencies also have a serious impact on power consumption, as we will see next.

To give an idea of how these parameters have evolved over recent decades, Figure 1 shows transistor count and operating frequency for Intel x86 processors over a period of 20 years, starting with the 80386, the first 32-bit x86 processor.

Figure 1: Transistor count and frequency for the x86 architecture

Note that both parameters are plotted on logarithmic scales, a sign of the enormous growth they have sustained. For power, Figure 2 shows typical power dissipation for the same processors, this time on a linear scale.

Figure 2: Power consumption of succeeding generations of x86 processors

The number of transistors keeps increasing: some of the latest Intel Core i7 processors contain more than 2.2 billion transistors. Power dissipation has also risen slightly, reaching 130 W in some models. However, the clock frequency of these new processors is no longer increasing and remains around 3.5 GHz.

One reason for this stagnation is that current integrated circuits have reached the physical limits of power density: they generate as much heat as the chip package can dissipate, so hardware designers have had to limit frequency increases. Intel has historically favored performance over power efficiency, but physical limits now leave it no option but to look carefully at power consumption.

A few simple equations show how frequency and transistor count affect power consumption on a chip, and make it clear why these parameters are so important in today's designs.

The following equation shows how power dissipation on a chip relates to operating frequency and other factors:

$$P = A\,C\,V^{2} f + t\,A\,V\,I_{sc}\,f + V\,I_{leak}$$

This is the expression for power dissipation in CMOS technology, the dominant semiconductor technology for integrated circuits today. The first term (addend) accounts for dynamic power consumption, i.e. the power consumed in charging and discharging capacitive loads when transistors switch, which represents the useful work performed by the chip. A is the activity factor, the fraction of transistors that switch in each cycle (not all transistors switch every clock cycle); C is the switched capacitive load; V is the supply voltage; and f is the clock frequency.

The second term also accounts for dynamic power, though a smaller amount: it is caused by the transient short-circuit current (Isc) that flows through the transistors from the supply to ground during the finite rise or fall time t. The last term accounts for static power consumption, the power consumed by leakage current (Ileak); it is the only component present in a circuit that is powered but inactive. Because it applies to the whole circuit regardless of transistor state, the activity factor does not appear in this term.
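
To see how the three terms compare, the minimal sketch below evaluates each term of P = A·C·V²·f + t·A·V·Isc·f + V·Ileak separately. The class and function names are hypothetical, and the parameter values in the usage line are arbitrary round numbers chosen to keep the arithmetic easy to follow, not data for any real processor.

```python
from dataclasses import dataclass


@dataclass
class CmosPower:
    dynamic: float         # A*C*V^2*f: charging/discharging capacitive loads (W)
    short_circuit: float   # t*A*V*Isc*f: transient supply-to-ground current (W)
    static: float          # V*Ileak: leakage, present even when idle (W)

    @property
    def total(self) -> float:
        return self.dynamic + self.short_circuit + self.static


def cmos_power(a: float, c: float, v: float, f: float,
               t: float, i_sc: float, i_leak: float) -> CmosPower:
    """Split CMOS power dissipation into its three terms.

    a: activity factor (fraction of transistors switching per cycle)
    c: switched capacitance (F); v: supply voltage (V); f: clock (Hz)
    t: rise/fall time (s); i_sc: short-circuit current (A); i_leak: leakage (A)
    """
    return CmosPower(
        dynamic=a * c * v ** 2 * f,
        short_circuit=t * a * v * i_sc * f,
        static=v * i_leak,
    )


# Illustrative round numbers only.
p = cmos_power(a=0.1, c=1e-7, v=1.0, f=1e9, t=1e-11, i_sc=1.0, i_leak=5.0)
print(f"dynamic={p.dynamic:.3g} W, short-circuit={p.short_circuit:.3g} W, "
      f"static={p.static:.3g} W, total={p.total:.4g} W")
```

Doubling `v` in this sketch quadruples the dynamic term but only doubles the other two, which is the quadratic voltage dependence the next paragraph relies on.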

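Looking at the first term of the equation, we can see why power has increased only roughly linearly while frequency has grown exponentially (a roughly straight line on the logarithmic scale of Figure 1). The reason is the quadratic dependence on voltage: each reduction in supply voltage lowers dynamic power quadratically, offsetting much of the increase in frequency.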

Engineers have been able to reduce this voltage steadily from 5 V to below 1 V, which has helped them control dissipated power without losing performance. Unfortunately, many factors are interdependent and engineers constantly have to make trade-offs. For example, imagine we want to decrease dynamic power consumption on a chip (considering only the first term of the equation) by reducing the supply voltage from an initial 2 V. If we can reduce it to 1.7 V, that is only a 15% decrease in voltage, but because (1.7/2)² ≈ 0.72 it yields a significant 28% decrease in power. However, reducing the supply voltage also lowers the maximum operating frequency of the circuit, which depends on both the supply voltage and the threshold voltage of the transistors (the voltage at which a transistor switches on):

$$f_{max} \propto \frac{(V - V_{th})^{2}}{V}$$

In our example, with a threshold voltage of 0.5 V and a circuit operating at 4 GHz, you would have to reduce the threshold voltage to approximately 0.32 V to maintain the same operating frequency. However, this might not be feasible: threshold voltage depends on technology parameters, and beyond a certain value it cannot be reduced without changes to the semiconductor manufacturing process. Without changing the threshold voltage, the maximum frequency would drop to about 3 GHz, a 25% decrease.
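
The numbers in this example can be checked with a few lines of code. The sketch assumes the first-order proportionality f_max ∝ (V - Vth)²/V for maximum frequency and P ∝ V² for dynamic power at fixed A, C and f; the function names are hypothetical.

```python
import math


def dynamic_power_ratio(v_old: float, v_new: float) -> float:
    """Dynamic power scales with V^2 when A, C and f are unchanged."""
    return (v_new / v_old) ** 2


def fmax_ratio(v_old: float, v_new: float, vth_old: float, vth_new: float) -> float:
    """Ratio of maximum frequencies under f_max ∝ (V - Vth)^2 / V."""
    return ((v_new - vth_new) ** 2 / v_new) / ((v_old - vth_old) ** 2 / v_old)


def vth_for_same_fmax(v_old: float, v_new: float, vth_old: float) -> float:
    """Threshold voltage needed at v_new to keep f_max unchanged."""
    target = (v_old - vth_old) ** 2 / v_old   # (V - Vth)^2 / V before scaling
    # Solve (v_new - Vth)^2 / v_new = target, taking the root with Vth < v_new.
    return v_new - math.sqrt(target * v_new)


# Values from the example: 2 V -> 1.7 V supply, 0.5 V threshold, 4 GHz clock.
power_cut = 1 - dynamic_power_ratio(2.0, 1.7)
f_new_ghz = 4.0 * fmax_ratio(2.0, 1.7, 0.5, 0.5)
vth_needed = vth_for_same_fmax(2.0, 1.7, 0.5)
print(f"power -{power_cut:.1%}, f_max {f_new_ghz:.2f} GHz, Vth needed {vth_needed:.3f} V")
```

With these inputs the script gives a power reduction of about 28%, a maximum frequency of about 3.01 GHz if Vth stays at 0.5 V, and a required threshold of about 0.317 V, in line with the figures in the text.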

Moreover, even if you were able to reduce both supply and threshold voltage without affecting performance, leakage current depends exponentially on threshold voltage:

$$I_{leak} \propto e^{-V_{th}/V_T}, \qquad V_T = \frac{kT}{q}$$

The voltage VT is the thermal voltage, which depends on the absolute temperature T; k is the Boltzmann constant and q is the magnitude of the electron charge. At typical chip operating temperatures the thermal voltage is around 30 mV (about 26 mV at room temperature, 300 K). When the threshold voltage is large compared with the thermal voltage, the exponential keeps leakage current negligible, but for small threshold voltages, around 100 mV, leakage becomes significant.
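
The sketch below evaluates the thermal voltage VT = kT/q and the relative leakage factor e^(-Vth/VT) for a large and a small threshold voltage. It uses the simplified exponential dependence given above, which omits the process-dependent prefactor (and any subthreshold slope factor), so only ratios are meaningful; the function names, temperatures and voltages are illustrative assumptions.

```python
import math

BOLTZMANN_K = 1.380649e-23           # J/K (exact SI value)
ELEMENTARY_CHARGE = 1.602176634e-19  # C (exact SI value)


def thermal_voltage(temp_kelvin: float) -> float:
    """VT = kT/q, in volts."""
    return BOLTZMANN_K * temp_kelvin / ELEMENTARY_CHARGE


def leakage_factor(vth: float, temp_kelvin: float) -> float:
    """Relative leakage, exp(-Vth / VT); the proportionality constant is omitted."""
    return math.exp(-vth / thermal_voltage(temp_kelvin))


for t in (300.0, 350.0):
    vt = thermal_voltage(t)
    ratio = leakage_factor(0.1, t) / leakage_factor(0.5, t)
    print(f"T={t:.0f} K: VT={vt * 1e3:.1f} mV, "
          f"leakage(0.1 V)/leakage(0.5 V)={ratio:.2e}")
```

Under these assumptions VT comes out near 26 mV at 300 K and near 30 mV at 350 K, and lowering Vth from 0.5 V to 0.1 V raises the leakage factor by more than five orders of magnitude at both temperatures.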

Furthermore, the thermal voltage is not the only temperature-dependent quantity: threshold voltage usually varies with temperature as well, and the two effects compound in their influence on leakage current. Higher leakage current means higher static power consumption, which places a practical lower limit on how far supply and threshold voltages can be reduced.
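
To illustrate how the two temperature effects compound, the sketch below lets Vth fall linearly with temperature and compares leakage with and without that threshold drift. The reference threshold of 0.3 V at 300 K and the coefficient of -1 mV/K are hypothetical values chosen only for illustration, not data from any specific process, and the simplified exponential model again means only ratios are meaningful.

```python
import math

BOLTZMANN_K = 1.380649e-23           # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin: float) -> float:
    """VT = kT/q, in volts."""
    return BOLTZMANN_K * temp_kelvin / ELEMENTARY_CHARGE


def vth_at(temp_kelvin: float, vth_ref: float = 0.3, t_ref: float = 300.0,
           coeff_v_per_k: float = -1e-3) -> float:
    """Hypothetical linear model: Vth drifts by coeff_v_per_k volts per kelvin."""
    return vth_ref + coeff_v_per_k * (temp_kelvin - t_ref)


def leakage_factor(vth: float, temp_kelvin: float) -> float:
    """Relative leakage, exp(-Vth / VT)."""
    return math.exp(-vth / thermal_voltage(temp_kelvin))


t_cold, t_hot = 300.0, 360.0
base = leakage_factor(vth_at(t_cold), t_cold)
vt_only = leakage_factor(vth_at(t_cold), t_hot) / base   # Vth held fixed
combined = leakage_factor(vth_at(t_hot), t_hot) / base   # Vth also drifts
print(f"VT effect alone: x{vt_only:.1f}; VT plus Vth drift: x{combined:.1f}")
```

With these made-up parameters, the thermal-voltage effect alone raises leakage roughly 7x between 300 K and 360 K, while adding the threshold drift raises it nearly 50x.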