GPU clusters change the game in visual computing

Following the game-changing IndeX, a GPU clustering approach that combines compute cycles and rendering cycles in a single interactive system, Nvidia has introduced a new GPU code-named Pascal. Pascal builds on the former by improving its key attributes: bandwidth, capacity, and energy efficiency per bit.

Up until now (literally), you could use a single GPU for parallel processing, and it did a fabulous job, offering speed-ups of tens to hundreds of times for certain classes of problems.

But we always want more, so the idea was developed (by Nvidia's group in Germany, which until 2011 was Mental Images, the ray-tracing company) that you should be able to gang together clusters of servers with GPUs in them and get more power.

That cluster-gathering scheme for visualizing huge volumetric datasets is called IndeX.

But we still want more. So what if you could gang up multiple GPUs within a cluster?

Well, now you can, with a GPU-to-GPU interconnect called NVLink. With NVLink you get scaling of GPUs in a cluster, and scaling of clusters. Just imagine the bitcoin farming you could do; it boggles the mind.

Real-time visualisation of volumetric data is essential for experts in a variety of fields; they use it to gain visual insight. Dense, high-resolution 3D images are used in medical examinations, meteorologists use them to study the weather, and geophysicists use them to find oil deposits. This is big, really big, data.

The Taranaki Basin dataset (Crown Minerals and the New Zealand Ministry of Economic Development). Source: Nvidia

However, the amount of data produced from a high-resolution simulation can be extremely large. It challenges traditional visualisation methods—and the researchers want more.

A typical geological subterranean survey is 80km to 120km wide and long, and goes down another 8km to 10km or more.

The geologists would like to get a resolution of at least 20m, which would yield 60B per data point, and end up with 20GB per shot. They take a lot of samples because one of the analyses they like to do is to make a movie.
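To get a feel for the scale, here is a back-of-the-envelope count of the grid samples in such a survey. It is only a sketch: it assumes the same 20m spacing along all three axes, which the survey description doesn't actually promise, and the function name is made up for illustration. It counts samples only and makes no attempt to derive the per-point or per-shot sizes quoted above.

```python
def grid_samples(width_km, length_km, depth_km, spacing_m=20):
    """Count samples in a regular 3D grid (assumes uniform spacing on every axis)."""
    nx = int(width_km * 1000 // spacing_m)
    ny = int(length_km * 1000 // spacing_m)
    nz = int(depth_km * 1000 // spacing_m)
    return nx * ny * nz


# Small and large ends of the survey sizes mentioned in the article.
low = grid_samples(80, 80, 8)
high = grid_samples(120, 120, 10)
print(f"{low:.2e} to {high:.2e} samples")
```

Under those assumptions a single survey lands somewhere between roughly 6 billion and 18 billion samples.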

If you look at the above image and notice the blue or orange slice, imagine either one of those slices moving back and forth to reveal the underground structure. These "movies" have to run at 30fps.
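The arithmetic behind that movie requirement is worth a quick look. The sketch below takes the 20GB-per-shot and 30fps figures at face value and assumes the worst case, where every frame has to touch the whole shot; a real renderer would presumably read far less per frame. The function names are hypothetical.

```python
def required_throughput_gb_s(shot_gb, fps):
    """Worst-case data rate if every frame reads the full shot (an assumption)."""
    return shot_gb * fps


def frame_budget_ms(fps):
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / fps


rate = required_throughput_gb_s(20, 30)
budget = frame_budget_ms(30)
print(f"{rate} GB/s sustained, {budget:.1f} ms per frame")
```

That works out to 600 GB/s with about 33 ms to produce each frame, which gives a sense of why memory and interconnect bandwidth dominate this kind of workload.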

The usage model in medical diagnostics using CAT or MRI scans has exactly the same issues and data sizes.

And both medical and geophysical applications (to name just two) range from critically important to life-threatening. Now think about weather systems, simulated nuclear explosions, and simulations of cars crashing into walls, and you get a feeling for the enormous amounts of data that need to be processed, and processed fast.

IndeX

To bring this data under control and get the benefit of parallel processing using GPUs, Nvidia developed a scheme to put GPUs in a box it calls a cluster, and then gang up the clusters via a LAN (the GPUs communicate with each other via PCIe or InfiniBand).

This design, which Nvidia calls IndeX, allows for scaling from one to n clusters, and basically makes the solution a function of the researchers' checkbook.

The IndeX software infrastructure contains scalable computing algorithms that run on a separate workstation or, more likely, a dedicated GPU-compute cluster.

Essentially, IndeX brings together compute cycles and rendering cycles in a single interactive system.

This is a big deal, in every sense of the word. Being able to leverage the compute power of a dedicated GPU cluster by means of a GPU rendering cluster is game-changing in interactive visual computing.

That's great, and systems are using it. But we want more, and faster; we always want more and faster. One way to get more, and faster, is to stuff the clusters full of GPUs that can talk to each other more efficiently: more, and bigger, clusters.

At Nvidia's GPU Technology Conference (GTC), the company announced a new GPU code-named Pascal.

Pascal

Pascal (the subject of a separate discussion/article) has many interesting features, not the least of which is built-in, or rather I should say built-on, memory. Pascal will have memory stacked on top of the GPU. That not only makes a tidier package; more importantly, it will give the GPU 4x higher bandwidth (~1 TB/s), 3x larger capacity, and 4x better energy efficiency per bit.

Basically, the already high-speed bandwidth between the GPU and its video memory will go up by a factor of four. That alone will help speed things up, but Nvidia took it one step further and added GPU-to-GPU links that allow multiple GPUs to look like one giant GPU.

Nvidia's NVLink connecting four GPUs and the CPU (Nvidia).

Today a typical system has one or more GPUs connected to a CPU using PCI Express. Even at the fastest PCIe 3.0 speeds (8 giga-transfers per second per lane) and with the widest supported links (16 lanes), the bandwidth provided over this link pales in comparison to the bandwidth available between the CPU and its system memory.

NVLink addresses this problem by providing a more energy-efficient, high-bandwidth path between the GPU and the CPU at data rates 5 to 12 times that of the current PCIe Gen3. NVLink will provide between 80GB/s and 200GB/s of bandwidth.
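Those multiples are easy to sanity-check. The sketch below works out the usable bandwidth of a PCIe 3.0 x16 link from the figures above (8 GT/s per lane, 16 lanes) plus PCIe 3.0's 128b/130b line encoding, and then compares it to the quoted NVLink range. It assumes the comparison is against PCIe bandwidth in one direction, which the announcement doesn't spell out; the function name is made up for illustration.

```python
def pcie3_bandwidth_gb_s(lanes=16, gt_per_s=8.0, encoding=128 / 130):
    """Usable one-direction bandwidth in GB/s.

    Each transfer carries one bit per lane; 128b/130b encoding means
    128 of every 130 bits on the wire are payload.
    """
    return gt_per_s * encoding / 8 * lanes


pcie = pcie3_bandwidth_gb_s()
for nvlink in (80, 200):
    # Ratio of the quoted NVLink figure to a PCIe 3.0 x16 link (one direction).
    print(f"NVLink at {nvlink} GB/s is {nvlink / pcie:.1f}x PCIe 3.0 x16 ({pcie:.2f} GB/s)")
```

A x16 link comes out at a little under 16 GB/s, so 80GB/s and 200GB/s are roughly 5 and 12.7 times that, in line with the "5 to 12 times" claim.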

The numbers are astronomical, and they need to be, because the data sizes and rates are also astronomical and aren't slowing down. And, just to make a pun, this now improves astrophysics and astronomy research too. (Nvidia's GPU-compute systems are being used to tease out the beginning of the Big Bang; now that's truly BIG data.)

And the really good news? The costs and power requirements are not astronomical; in fact, the power requirements are less than a tenth of what they would have been (for an equivalent amount of compute resource) four years ago.

This is the opening phase of a new era in understanding enormously complex systems like weather, geophysics, mechanics, and the human body.

Ten years from now, our lives will be so much better because of the wonders in medical science and the management of multi-faceted systems that we'll look back on 2014 with sympathy and say, how did they ever get along with such primitive tools?