“There is in fact a "Megahertz Myth" and it exists in the minds of those who think that the only factor that matters is raw chip speed, as defined in megahertz ratings. Especially true in the case of different CPU designs, even among products in the same family. When you start to compare different classes of chips, the mythological 1:1 relationship of MHz to "speed" becomes even more difficult to cling to.”

The megahertz myth is a name for the widely held[2] misconception that the computing power of a CPU is strictly a function of its clock speed. In reality, clock speed is only one of many factors that determine the speed at which a CPU can execute instructions. The myth is largely a creation of computer and hardware manufacturers' marketing departments, who for a while highlighted clock speed as one of the primary features in their advertising,[3] playing up the innate assumption that big numbers = MOAR POWER!!!11
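The point that clock speed is only one factor can be made concrete with the standard CPU performance equation: execution time = instruction count × average cycles per instruction ÷ clock rate. The sketch below is illustrative only; the two chips and all of their numbers are hypothetical assumptions, not measurements of real products.

```python
def execution_time_seconds(instruction_count, cycles_per_instruction, clock_hz):
    """Classic performance equation: time = instructions * CPI / clock rate."""
    return instruction_count * cycles_per_instruction / clock_hz


# Hypothetical workload of one billion instructions (assumed figure).
INSTRUCTIONS = 1_000_000_000

# Hypothetical chip A: higher clock, but more cycles per instruction on average.
time_a = execution_time_seconds(INSTRUCTIONS, cycles_per_instruction=2.0, clock_hz=3.0e9)

# Hypothetical chip B: lower clock, but more work completed per cycle.
time_b = execution_time_seconds(INSTRUCTIONS, cycles_per_instruction=1.0, clock_hz=2.0e9)

print(f"Chip A at 3.0 GHz: {time_a:.3f} s")
print(f"Chip B at 2.0 GHz: {time_b:.3f} s")
```

Under these assumed figures the chip with the lower clock speed finishes first, which is exactly the situation the myth fails to account for.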

“For many years, the number of times a computer's clock, PC's "heart" to its processor's "brain", ticked each second was a direct indication of how many calculations a processor could perform. One clock tick, one instruction, was how the design rule ran.”

Modern CPUs include multiple independent processing units, called cores, within the same chip. Each of these cores has all of the capabilities of a typical CPU, and the cores are able to work together on the same piece of software. This is known as parallel processing. While modern multi-core CPUs may run at a lower clock speed than past processors, their multiple cores allow them to process more workloads simultaneously. Therefore, the same amount of work can be achieved at lower speeds. A lower clock speed also means lower power consumption, so multi-core CPUs are not only more powerful, they are also significantly more energy efficient. The multi-core CPUs in modern smartphones achieve greater performance on fewer than five watts of power than desktop computers from less than a decade earlier achieved on hundreds of watts. Desktop and laptop computers sold today typically have two- or four-core CPUs; enthusiast, academic and professional computers can have CPUs with dozens, if not hundreds, of cores (in the case of corporate mainframes).

To take advantage of parallel processing, developers must write their code accordingly in order to distribute the software's workload among the available cores. Some algorithms, especially those with many interdependencies between intermediate results, can be hard or impossible to parallelize.
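One common way to reason about how much a program gains from extra cores is Amdahl's law, which is not named in the text above but models the same limitation: only the parallelizable fraction of the work speeds up. The sketch below is a minimal illustration; the function name and the example fractions are assumptions chosen for demonstration.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Idealized speedup when only part of a program can run in parallel.

    parallel_fraction: share of the run time that can be spread over cores (0.0 to 1.0).
    cores: number of cores working on the parallel part.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Compare a program that is half serial with one that is almost fully parallel.
for cores in (1, 2, 4, 64):
    half = amdahl_speedup(0.5, cores)
    most = amdahl_speedup(0.95, cores)
    print(f"{cores:>2} cores: 50% parallel -> {half:.2f}x, 95% parallel -> {most:.2f}x")
```

In this model a program whose work is half serial can never run more than twice as fast, no matter how many cores are added.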

In the same way that a single CPU with multiple cores can distribute its workload, multiple CPUs, each with multiple cores, can further parallelize the task at hand. These multiple CPUs may all be housed in the same case on a common motherboard, or they may be spread out across multiple racks of cases that then communicate with each other over high-speed network links. Supercomputers use this technique, having sometimes hundreds of thousands of CPUs (and GPUs) working together on one task to achieve their huge processing power. Similarly, the vast server and render "farms", employed by companies like Google and Industrial Light and Magic, utilize hundreds of thousands of cooperating CPUs.

Because of these techniques, making many CPUs act as one, the only theoretical limit to a supercomputer's processing power is the physical space it can occupy. The practical limits are energy usage and heat dissipation. At its peak, the Oak Ridge National Laboratory Titan's 37,376 CPUs and GPUs require 8.2 megawatts of power (or, enough to power about 8000 homes).[5] The Titan's cooling system has 6600 tons of capacity (a large house has about 3 tons).[6]

Distributed computing projects, like Folding@home and BOINC, allow home PC users to volunteer their unused processing power, through the internet, to globally networked projects. With distributed computing software, a home PC downloads and then calculates a small chunk of a much more massive shared data set. Once work on that small chunk has been completed, the result is uploaded to the project, and a new small chunk is downloaded to be worked on. In this manner, the unused power of millions of home PCs can be leveraged into solving extremely complex problems, like protein folding and the cataloguing of astronomical data. Bitcoin mining works in a similar way.
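The download, compute, upload cycle described above can be sketched as a simple split-and-merge loop. This is a single-machine simulation under stated assumptions: the function names are hypothetical, the "computation" is a stand-in sum of squares, and no real project's client or protocol is being reproduced.

```python
def split_into_work_units(data, unit_size):
    """Cut a large data set into small chunks that can be handed out separately."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]


def process_work_unit(unit):
    """Stand-in for the real computation a volunteer PC would perform on one chunk."""
    return sum(x * x for x in unit)


def run_project(data, unit_size):
    """Simulate the project: hand out every chunk, then merge the uploaded results."""
    partial_results = []
    for unit in split_into_work_units(data, unit_size):
        # In a real project each chunk could be computed on a different volunteer PC.
        partial_results.append(process_work_unit(unit))
    return sum(partial_results)


total = run_project(list(range(10)), unit_size=3)
print(total)
```

Because each chunk is processed independently of the others, the order in which results arrive does not matter, which is what makes this kind of workload suitable for loosely connected home PCs.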

Graphics processing units (GPUs) are specialized processors that can quickly perform a limited set of operations on a large series of values at the same time (for example, pixels on a screen). This differs from the CPU, which is able to perform a much broader array of functions, but at a slower rate. As their name implies, GPUs are usually used for rendering graphic elements (like video games) efficiently, but can also be used for machine learning or other data-driven tasks that are hugely parallelizable, yet mathematically simple. GPUs currently account for the majority of the processing power in nearly all of the world's supercomputers. Two of the fastest supercomputers, China's Tianhe-2 and the US's Titan, both utilize tens of thousands of GPUs, most of which are little more than scaled-up versions of the same add-in cards used in home PCs.[7]

Miniaturization of components makes it possible to have more components in the same chip, which in turn can be used to parallelize more tasks.
There are some physical limits on miniaturization, because as the size of components decreases, quantum effects such as tunneling become more apparent.

Instructions connect computer software to hardware and are information sent to the processor to be interpreted. The types of processors include:

Vector processors: One fixed-length instruction is interpreted per clock cycle, one after another. Unlike a scalar processor, a vector processor excels at manipulating large blocks of data.[8]

Scalar processors: Like a vector processor, a scalar processor interprets one instruction at a time, but it manipulates only one data item at a time.[8]

Superscalar processors: These execute multiple instructions at a time because they have multiple pipelines, and can manipulate multiple data items at a time.[8]

At the same clock speed, a superscalar processor can outperform a scalar one because it can interpret more instructions at a time. Being able to process more instructions per machine cycle means that the same work is completed in fewer cycles.
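The difference between the three processor types can be illustrated with a deliberately idealized cycle count. The sketch assumes that every operation takes exactly one cycle, that all operations are independent, and that there are no stalls; the lane count and issue width are hypothetical parameters, and real processors behave far less neatly.

```python
import math


def scalar_cycles(n_operations):
    """Scalar: one instruction per cycle, one data item per instruction."""
    return n_operations


def vector_cycles(n_operations, lanes):
    """Vector: one instruction per cycle, but each one handles `lanes` data items."""
    return math.ceil(n_operations / lanes)


def superscalar_cycles(n_operations, issue_width):
    """Superscalar: up to `issue_width` independent instructions start each cycle."""
    return math.ceil(n_operations / issue_width)


n = 1000  # element-wise operations to perform (assumed workload)
print("scalar:     ", scalar_cycles(n))
print("vector:     ", vector_cycles(n, lanes=8))
print("superscalar:", superscalar_cycles(n, issue_width=4))
```

All three hypothetical processors could run at the same clock speed and still finish this workload in very different amounts of time.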

RAM is orders of magnitude slower than the CPU, and peripherals like drives and networking adapters are slower still. In modern computers, the CPU spends a good deal of time waiting for data to be read from or written to other components. Faster memories, more and bigger CPU caches, and faster buses are used to try to chip away at this delay.
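The benefit of a cache can be estimated with the standard average memory access time formula: hit time plus miss rate times miss penalty. The sketch below is a rough model; the latencies and miss rate are assumed round numbers for illustration, not figures for any particular machine.

```python
def average_access_time_ns(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns


# Assumed figures: a 100 ns trip to RAM, a 1 ns cache, and 5% of accesses missing it.
no_cache = average_access_time_ns(hit_time_ns=0.0, miss_rate=1.0, miss_penalty_ns=100.0)
with_cache = average_access_time_ns(hit_time_ns=1.0, miss_rate=0.05, miss_penalty_ns=100.0)

print(f"Every access goes to RAM: {no_cache:.1f} ns on average")
print(f"With a small fast cache:  {with_cache:.1f} ns on average")
```

Under these assumptions the processor waits far less per access without any change to its clock speed.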

Other techniques that have been employed to improve computing performance are:

Register renaming, to expose instruction-level parallelism that would otherwise be hidden by the scarcity of registers in some instruction sets (like those of the x86 architecture).

Out-of-order execution, to avoid wasting cycles by executing instructions in a different order while making sure the semantics of the program remain the same.

Addition of specialized instructions that perform some tasks more quickly than an equivalent sequence of general-purpose instructions would. Compilers can recognize these patterns in programs and generate machine code that makes use of these specialized instructions. For example, POPCNT, introduced alongside the SSE4 extensions, counts the number of 1 bits in a numeric value. Other instructions can speed up common cryptographic or media encoding/decoding operations.
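To show what an instruction like POPCNT replaces, the sketch below counts 1 bits with a manual loop, which is the kind of multi-step pattern a single specialized instruction can stand in for. Python does not expose the POPCNT instruction directly, so the second function uses ordinary built-ins only as a cross-check; both function names are hypothetical and non-negative integers are assumed.

```python
def popcount_manual(value):
    """Count 1 bits one at a time; the loop runs once per set bit."""
    count = 0
    while value:
        value &= value - 1  # clears the lowest set bit
        count += 1
    return count


def popcount_builtin(value):
    """Cross-check using Python built-ins (not the hardware instruction itself)."""
    return bin(value).count("1")


for sample in (0, 1, 0b1011, 255, 2**40):
    print(sample, popcount_manual(sample), popcount_builtin(sample))
```

The manual version needs several operations per set bit, whereas dedicated hardware produces the same answer with one instruction.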