
GPU Based Parallel Computing as an Intermediate Paradigm Before The Age of Quantum Computing?

"It can't continue forever. The nature of exponentials is that you push them out and eventually disaster happens." So said Intel co-founder Gordon Moore, who is well known for his prediction that the number of transistors that can be placed on an integrated circuit doubles approximately every two years (Moore, 1965). This law has served as a successful guideline; however, as Moore himself noted, the trend cannot continue forever.

In 2003, energy-consumption and heat-dissipation problems slowed this progress, and virtually all processor vendors changed their strategy to manufacturing chips with multiple cores, which had a strong impact on the software developer community (Sutter and Larus, 2005). Meanwhile, manufacturers have been looking into new technologies that would increase the number of transistors per wafer: the smaller the individual transistors, the faster and more power-efficient the chip. However, reducing these dimensions comes at a price, since current leakage becomes a problem (Butts and Sohi, 2000) and the bizarre effects of quantum mechanics (Griffiths, 1995) have to be taken into account. One such effect, quantum tunneling, becomes a fundamental issue when the technology reaches the level of single-nanometer transistors (Lerner and Trigg, 1991), which are only a few atoms thick. At that scale an electron can tunnel directly through the barrier, disappearing on one side and reappearing on the other, rendering transistors leaky and uncontrollable.

Reaching atomic scales of transistors was once science fiction; not anymore. In 2010, mass-produced transistors moved from the 45nm to the 32nm scale, Wu et al. (2010) developed 20nm transistor technology based on a fin-shaped (FinFET) design, Fuechsle et al. (2010) used a single silicon crystal to create a so-called quantum dot of just seven atoms acting like a transistor, and Obermair et al. (2010) as well as Tan et al. (2010) demonstrated working technology for single-atom transistors.

Meanwhile, hundreds of materials have been researched in order to replace the traditional silicon dioxide (SiO2) gate dielectric with new materials of higher dielectric constant (e.g. hafnium dioxide (HfO2), zirconium dioxide (ZrO2), titanium dioxide (TiO2)), resulting in chips with lower current leakage and temperature (Chau et al., 2005). Konstantin Novoselov and Andre Geim received the Nobel Prize in Physics "for groundbreaking experiments regarding the two-dimensional material graphene" (Nobelprize.org, 2010), a one-atom-thick planar sheet of carbon atoms densely packed in a honeycomb crystal lattice. This extraordinary material, with unique electrical, optical, mechanical and thermal properties, has already been used by IBM to develop transistors operating at 100GHz (100 billion cycles/second) (Lin et al., 2009) with gate lengths of 240 nanometers, which leaves a lot of room for optimisation.

Both IBM and Intel are striving to implement 1THz processors by using optical instead of electrical transmission. Such a system uses tiny optical nanofibers (e.g. Zhang et al. (2010)), which can be about 100 times faster than wires while consuming only one-tenth as much power. IBM, Intel and Sun have set up photonics research laboratories where these technologies are being developed. Recently there has been a lot of progress in this field, followed by some extraordinary announcements by both IBM and Intel claiming that terabit-per-second transmission is within reach, as expressed by IBM Research's vice president of science and technology TC Chen: "With optical communications embedded into the processor chips, the prospect of building power-efficient computer systems with performance at the exaflop level is one step closer to reality."

It is clear that these enormous technological advancements are pushing the limits of current computational power. To keep up with Moore's law we have moved to multi-core processors; however, there are significant drawbacks, since standard CPUs consume too much energy per instruction and are not suitable for highly parallel tasks. New paradigms in computer technology are therefore inevitable.

Quantum computing is arguably the most promising one, and the first experimental systems based on a small number of qubits have already revealed its potential. Recently, Wolters et al. (2010) demonstrated a strategy that could be used for scaling up quantum computer systems. Wolters and colleagues managed to fabricate a rudimentary quantum-computing hybrid system using nano-diamonds as qubits together with optical nanostructures. Wolters stated: "Our results suggest a strategy for scaling up quantum information to large-scale systems, which has yet to be done. We regard our experiment as a milestone on the long road toward on-chip integrated quantum information processing systems, bringing the dream of a quantum computer closer to reality." However, powerful quantum computers remain in the distant future; it is generally believed that it might be another twenty years before we reach the age of quantum computing.

Since 2003, the semiconductor industry has been divided into multicore and manycore design trajectories (Hwu et al., 2008). Multicore design aims to increase processing power by increasing the number of cores in a processor; this number has been doubling with each semiconductor process generation, starting with dual-core chips and reaching hyper-threaded hexa-core systems. Manycore design is fundamentally different in its design philosophy. While CPUs are optimised for processing sequential code and feature sophisticated control logic and large cache memories, the GPU design philosophy emerged from the fast-growing video industry, where massive numbers of floating-point operations are required to render every single frame. As a result, a GPU chip has most of its area dedicated to floating-point units and features only tiny cache memories. Back in the 1980s, 3D graphics processing was only available to the few people who had access to large and expensive computers. The situation changed in the 1990s, when graphics accelerators became part of many PCs, the price of graphics processors decreased from 50,000 to 200 dollars, and the number of pixels that could be calculated per second increased from 50 million to 1 billion. The ever-growing game industry has been demanding better and faster graphics, and graphics card manufacturers such as NVidia and ATI have competed to develop devices with more and more GPU processing cores and more sophisticated architectures.

In 2006, NVidia released the GeForce 8800 GPU, which was capable of mapping separate programmable graphics processes to an array of unified processors, paving the way for the first general-purpose computing on parallel GPU processors. GPGPU was an intermediate step in which graphics card programmers had to use an OpenGL or DirectX API to implement their programs. This required expert knowledge of those APIs, and in addition all calculations needed to represent their inputs as textures, while their outputs were represented as sets of pixels generated through raster operations. This changed during the development of the Tesla GPU architecture, when NVidia researchers realised that the potential could be much higher if one could treat GPUs as individual programmable processors. As a result, in 2007 NVidia released the CUDA (Compute Unified Device Architecture) programming model, designed to support joint CPU/GPU application execution.
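The division of labour in the CUDA model can be illustrated with a minimal sketch (illustrative only; the kernel name, array size and launch configuration are assumptions, not from the source): the host CPU allocates device memory, copies inputs to the GPU, launches a kernel that runs across many threads in parallel, and copies the result back.

```cuda
// Minimal sketch of joint CPU/GPU execution under CUDA.
// The names vecAdd, N, and the 256-thread block size are illustrative.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: each GPU thread computes one element of the output array.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int N = 1 << 20;                  // one million elements
    size_t bytes = N * sizeof(float);

    // Host (CPU) side: allocate and initialise inputs.
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < N; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device (GPU) side: allocate memory and copy inputs over.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all N elements.
    int threads = 256;
    int blocks = (N + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, N);
    cudaDeviceSynchronize();

    // Copy the result back to the host; each element should be 3.0.
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

The contrast with the earlier GPGPU era is the point: none of the above requires textures, raster operations, or any OpenGL/DirectX knowledge.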

One could imagine every application as an apricot, where its kernel represents the problems that cannot be parallelised and its flesh represents the parallelisable ones. GPUs are designed to deal with massive amounts of numerical calculations in parallel and are therefore suited for processing the 'flesh of the apricot', while CPUs perform much better at processing 'the apricot's kernel'; their design is simply not suited for parallel computation over massive amounts of data and is not comparable to GPU devices such as the NVidia GeForce GTX 570 with 512 GPU cores and 1.58 TFLOPS (trillion floating-point operations per second). NVIDIA's Chief Scientist Bill Dally said: "To continue scaling computer performance, it is essential that we build parallel machines using cores optimized for energy efficiency, not serial performance. Building a parallel computer by connecting two to 12 conventional CPUs optimized for serial performance, an approach often called multi-core, will not work. This approach is analogous to trying to build an airplane by putting wings on a train. Conventional serial CPUs are simply too heavy to fly on parallel programs and to continue historic scaling of performance."
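The apricot analogy can be stated quantitatively via Amdahl's law (a standard result, not cited in the source): if a fraction $p$ of a program is parallelisable and runs on $n$ cores, the overall speedup is

\[
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}}
\]

so the serial 'kernel' ultimately bounds performance. For example, a program that is 95% parallelisable reaches $S(512) \approx 19.3$ on 512 GPU cores, and no number of cores can push it past $1/(1 - p) = 20$.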

Parallel computing using CUDA and NVidia cards is being increasingly recognised. Many commercial and research applications have migrated from using standard processors only to collaborative CPU/GPU use, where each architecture does what it is best at. In general, most of these applications have achieved tremendous speed-ups in performance, anywhere between 1.3x and 2,600x (NVidia, 2010). Since quantum computing is still in its infancy and CPUs are approaching processing limits constrained by physical laws, it seems that parallel computing using GPU devices is the next paradigm, yet to be fully recognised and widely used.