"NVLink" shares up to 80GB of data per second between CPUs and GPUs.

Nvidia and IBM have developed an interconnect that will be integrated into future graphics processing units, letting GPUs and CPUs share data five times faster than they can now, Nvidia announced today. The fatter pipe will let data flow between the CPU and GPU at rates higher than 80GB per second, compared to 16GB per second today.
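
For a rough sense of what that difference means in practice, here is an illustrative back-of-the-envelope comparison; the 1TB working set is a made-up figure, and the link rates are simply the ones quoted above.

    // Illustrative only: time to stream a hypothetical 1TB working set across
    // the CPU-GPU link at the two rates quoted in the article.
    #include <cstdio>

    int main() {
        const double dataset_gb = 1024.0;  // assumed 1TB working set
        const double pcie_gbs   = 16.0;    // today's rate, per the article
        const double nvlink_gbs = 80.0;    // Nvidia's quoted NVLink rate
        printf("PCIe today: %.1f s\n", dataset_gb / pcie_gbs);    // ~64 s
        printf("NVLink:     %.1f s\n", dataset_gb / nvlink_gbs);  // ~12.8 s
        return 0;
    }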

NVLink, the interconnect, will be part of the newly announced Pascal GPU architecture on track for release in 2016.

GPUs have become increasingly common in supercomputing, serving as accelerators or "co-processors" to help CPUs get work done faster. In the most recent list of the world's fastest 500 supercomputers, 53 systems used co-processors and 38 of these used Nvidia chips. The second and sixth most powerful supercomputers used Nvidia chips alongside CPUs. Intel still dominates, providing processors for 82.4 percent of Top 500 systems.

"Today's GPUs are connected to x86-based CPUs through the PCI Express (PCIe) interface, which limits the GPU's ability to access the CPU memory system and is four- to five-times slower than typical CPU memory systems," Nvidia said. "PCIe is an even greater bottleneck between the GPU and IBM Power CPUs, which have more bandwidth than x86 CPUs. As the NVLink interface will match the bandwidth of typical CPU memory systems, it will enable GPUs to access CPU memory at its full bandwidth... Although future Nvidia GPUs will continue to support PCIe, NVLink technology will be used for connecting GPUs to NVLink-enabled CPUs as well as providing high-bandwidth connections directly between multiple GPUs."

Nvidia further explained that "GPUs have fast but small memories, and CPUs have large but slow memories—accelerated computing applications typically move data from the network or disk storage to CPU memory and then copy the data to GPU memory before it can be crunched by the GPU. With NVLink, the data moves between the CPU memory and GPU memory at much faster speeds, making GPU-accelerated applications run much faster."
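
In CUDA terms, the staging pattern Nvidia describes looks roughly like the sketch below; the array size and kernel are placeholders, and the explicit cudaMemcpy calls are the transfers a faster CPU-GPU link would accelerate.

    // Minimal sketch of the copy pipeline described above: data lands in CPU
    // (host) memory, is copied over the interconnect to GPU (device) memory,
    // gets crunched, and the results are copied back the same way.
    #include <cuda_runtime.h>
    #include <cstdio>
    #include <cstdlib>

    __global__ void scale(float *d, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] *= 2.0f;  // stand-in for the real computation
    }

    int main() {
        const int n = 1 << 24;              // ~16M floats (~64MB), arbitrary size
        const size_t bytes = n * sizeof(float);

        float *h = (float *)malloc(bytes);  // CPU memory (large but slower)
        for (int i = 0; i < n; ++i) h[i] = 1.0f;

        float *d;
        cudaMalloc(&d, bytes);              // GPU memory (fast but small)
        cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);  // crosses PCIe/NVLink

        scale<<<(n + 255) / 256, 256>>>(d, n);
        cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);  // results cross back

        printf("h[0] = %.1f\n", h[0]);      // 2.0
        cudaFree(d);
        free(h);
        return 0;
    }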

High-performance computing, data analytics, and machine learning will benefit from the new architecture. Nvidia also said the NVLink interconnect presents a path to "highly energy-efficient and scalable exascale supercomputers," but that milestone is still probably at least a few years away.

Separately from NVLink, a feature in Pascal, called "Unified Memory," will simplify programming by letting coders treat the CPU and GPU memory as a single block. Pascal's design also "stacks DRAM chips into dense modules with wide interfaces and brings them inside the same package as the GPU," Nvidia said. "This lets GPUs get data from memory more quickly—boosting throughput and efficiency." A module that houses Pascal and NVLink is one-third the size of standard boards, which means "they’ll put the power of GPUs into more compact form factors than ever before."
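
As for what the "single block of memory" model looks like to a programmer, CUDA 6 already exposes a software version of Unified Memory today; a minimal sketch follows (the array size and kernel are arbitrary).

    // One allocation, one pointer, usable from both CPU and GPU code; the
    // runtime migrates the data, so no explicit host-to-device copies appear.
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void addOne(int *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += 1;
    }

    int main() {
        const int n = 1024;
        int *data;
        cudaMallocManaged(&data, n * sizeof(int));  // visible to CPU and GPU

        for (int i = 0; i < n; ++i) data[i] = i;    // CPU writes directly
        addOne<<<(n + 255) / 256, 256>>>(data, n);  // GPU uses the same pointer
        cudaDeviceSynchronize();                    // wait before the CPU reads

        printf("data[0] = %d\n", data[0]);          // 1
        cudaFree(data);
        return 0;
    }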

I was thinking this would make a wonderful basis for PCIe 5.0, but if Nvidia is going to keep it proprietary, we will need to wait for an AMD or Intel version to be released to anyone interested in supporting a new standard for high-speed interconnects in consumer and low- to mid-range commercial products.

Whichever company supports the next industry standard first will have a real advantage in the market for a short time. Proprietary, not so much; it is a dead end that will be replaced by the next standard.

The more important takeaway from this for me is that future video cards can be 1/3 the size they are today, making smaller computer cases even more feasible, not to mention letting laptops fit even better GPUs into their available space. It also means that if Nvidia chose to make video cards based on current card lengths, they could have triple the power of those smaller cards.

Something tells me that AMD is going to be left out of the NVLink party.

NVLink implies a new bus type (so no existing PCIe GPU can use it). For consumer PCs you'd need both the CPU and the chipset to support the new interconnect, so unless AMD and Intel adopt it, it's likely going to be limited to some IBM Power Systems and Nvidia ARM-based systems.

I don't think AMD or Intel need this. They already have both CPU and GPU on the same die accessing the same memory pool.

Intel needs it, or more specifically, people wanting to use Intel CPUs along with a useful GPU for supercomputer use need it. Then again, PCIe 4.0 does ~32GB/s with an x16 connector, so for many applications they probably don't.
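
For reference, the ~32GB/s figure falls out of the announced PCIe 4.0 per-lane rate; a quick sanity check (raw line rate only, ignoring protocol overhead):

    // PCIe 4.0: 16 GT/s per lane, 128b/130b encoding, 16 lanes, per direction.
    #include <cstdio>

    int main() {
        const double gt_per_lane = 16.0;          // GT/s, PCIe 4.0 target rate
        const double efficiency  = 128.0 / 130.0; // 128b/130b line coding
        const double lanes       = 16.0;
        const double gbs = gt_per_lane * efficiency / 8.0 * lanes;  // bytes/s
        printf("x16 PCIe 4.0 ~= %.1f GB/s per direction\n", gbs);   // ~31.5
        return 0;
    }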

AMD (well, Global Foundries) is associated with IBM in CPU development, so I don't see why IBM would leave them out of this.

AMD/GF is not associated with IBM on CPU development. Global Foundries (and AMD, somewhat indirectly) is associated with IBM on the chip fabrication process. CPUs are fabricated there, but so are other chips (GPUs, ASICs, FPGAs, etc.). IBM is doing the heavy-lifting R&D on chip fab, and GF, Samsung, and others take that know-how to the larger contract fab-for-hire market. IBM isn't a top-tier contract fab player anymore. [They make chips for themselves and a few others, but not in large volumes.]

I am an ex-IBM Power processor designer. I think this Nvidia announcement is not quite what you may think it is. IBM has designed a memory-coherent accelerator interface for their Power8 chip, which is landing in a few months. This interface is called CAPI - Coherent Accelerator Processor Interface. CAPI allows accelerators to directly address processor memory. I believe all Nvidia did here was use their NVLink marketing name to describe their implementation of the IBM CAPI protocol. CAPI is open to all, so expect to see accelerators from other companies operate with Power8 and its descendants.

CAPI, at least in IBM slideware, still uses PCIe for transport. It mostly adds memory coherence. I don't see how you get 2-4x faster than PCIe while still using PCIe. They'd need to go to something else, or simply bundle several 16GB/s x16 links into a bigger bundle; four or five x16 links add up to 64-80GB/s.

This smells a bit more like "CAPI + PCIe v4.0," which is probably also coming down the road from IBM when PCIe v4 gets folded into a future Power implementation. Remember, Nvidia isn't talking about anything that is available in the next year or so (it is multiple years down the road).

Something tells me that AMD is going to be left out of the NVLink party.

AMD (well, Global Foundries) is associated with IBM in CPU development, so I don't see why IBM would leave them out of this.

The recent rumors for AMD's Excavator-based server chips have a combo PCIe/HyperTransport unit. It is unclear if this would be passing coherency information over PCIe. The more likely setup would enable the pins on the CPU package to be reconfigured dynamically for either signal type. This would allow numerous PCIe devices on a single-socket system or scale to quad socket with ease while maintaining a common package (AMD's AM3+, C32, and G34 processors used the same dies).

I am an ex-IBM Power processor designer. I think this Nvidia announcement is not quite what you may think it is. IBM has designed a memory-coherent accelerator interface for their Power8 chip, which is landing in a few months. This interface is called CAPI - Coherent Accelerator Processor Interface. CAPI allows accelerators to directly address processor memory.

The other major problem is that slide also outlines cache coherency in Version 2. I think CAPI has coherency in version 1.

Quote:

I believe all Nvidia did here was use their NVLink marketing name to describe their implementation of the IBM CAPI protocol. CAPI is open to all, so expect to see accelerators from other companies operate with Power8 and its descendants.

I suspect it borrows from CAPI, but I also suspect it is more a "next-gen SLI" than a solid CAPI interface at the moment. The only CPU probably on track right now is Nvidia's future ARM, not Power8. The primary usage is going to be GPU-to-GPU links, as this slide suggests.

Something tells me that AMD is going to be left out of the NVLink party.

AMD (well, Global Foundries) is associated with IBM in CPU development, so I don't see why IBM would leave them out of this.

The recent rumors for AMD's Excavator-based server chips have a combo PCIe/HyperTransport unit. It is unclear if this would be passing coherency information over PCIe. The more likely setup would enable the pins on the CPU package to be reconfigured dynamically for either signal type. This would allow numerous PCIe devices on a single-socket system or scale to quad socket with ease while maintaining a common package (AMD's AM3+, C32, and G34 processors used the same dies).

The recent rumors for AMD's Excavator-based server chips have a combo PCIe/HyperTransport unit. It is unclear if this would be passing coherency information over PCIe.

The cited article doesn't leave it unclear. It isn't so much a combo as an either/or. This is more that Excavator picked up the general architecture of moving the old "Northbridge" (high-speed I/O and memory) into the CPU package and onto the die.

It is long overdue for AMD's offerings, but it will demand a socket change (which AMD has been kicking down the road for a long while now). The use of the same pins for either HT or PCIe is simply a pin-saving measure, similar to how the Xeon E5 uses a variant of socket 2011 with 40 PCIe v3.0 lanes and up to two QPI links, while the new E7 uses a variant of socket 2011 with fewer PCIe v3.0 lanes and more QPI links.

Looks like the AMD approach is perhaps a bit more flexible within the same model, or it will possibly show up as product variants off the same die. The general issue is the same: if the CPU package provisions PCIe lanes and you are going to have multiple CPU packages in a system, then each package doesn't have to provision as many PCIe lanes. What you need is more inter-package bandwidth. Flip side: with very few packages (e.g., a workstation), you probably need more PCIe lanes and less inter-package interconnect bandwidth. If you can do both with electrical variations of the same socket (and perhaps some firmware config settings, or just die variants with the feature flipped on or off), then that simplifies the physical socket issues across products.

Quote:

The more likely setup would enable the pins on the CPU package to be reconfigured dynamically for either signal type.

Highly doubtful. The pins coming out of the CPU package are not likely to change what they are hooked to through the printed circuit board dynamically. The same part can build various systems, but each of those systems likely has a static configuration.

AMD (well, Global Foundries) is associated with IBM in CPU development, so I don't see why IBM would leave them out of this.

AMD and IBM are no longer involved with each other in chip development. AMD is a design house, while IBM is in the process (sorry) of selling, or trying to sell, its own fab. The fab is where they were connected in the past, with IBM providing foundry support to supplement AMD's own fabs, now Global Foundries'. There is no reason for these competitors to support one another going forward. In the HPC area, much the opposite could well be true.

I don't think AMD or Intel need this. They already have both CPU and GPU on the same die accessing the same memory pool.

Intel needs it, or more specifically, people wanting to use Intel CPUs along with a useful GPU for supercomputer use need it. Then again, PCIe 4.0 does ~32GB/s with an x16 connector, so for many applications they probably don't.

Yeah, and you can double up to get 32GB/s in 3.0, but it's ugly and wastes a slot. But HyperTransport and Intel's desktop memory interface both max out at ~26GB/s now, so much more than that seems a waste, although Intel's Xeon interface climbs up to 85GB/s. With both Nvidia's and AMD's increasing focus on HPC, I can see why they want an interface that can cram as much data to as many cards as possible, particularly if they can bypass main memory and transfer directly from CPU to GPU caches. The traces on the boards are probably going to be monstrosities, though; how thick is the board going to be?
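
For what it's worth, the ~26GB/s desktop figure lines up with a dual-channel DDR3-1600 theoretical peak:

    // Dual-channel DDR3-1600: 2 channels x 64-bit x 1600 MT/s (theoretical peak).
    #include <cstdio>

    int main() {
        const double transfers = 1600e6;  // DDR3-1600 transfers per second
        const double bytes     = 8.0;     // 64-bit channel width
        const double channels  = 2.0;     // typical desktop configuration
        printf("~%.1f GB/s\n", transfers * bytes * channels / 1e9);  // ~25.6
        return 0;
    }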

Something tells me that AMD is going to be left out of the NVLink party.

Is AMD sharing HSA with Nvidia?

HSA works over PCIe; NVLink is a completely separate bus. If you purchased a motherboard with NVLink and you don't get an Nvidia GPU, you just wasted money. At least AMD's design has a fall-back compatibility mode: basic PCIe with no HSA.

I suppose it is about that time in the PC bus lifecycle where the bus is too slow for the graphics and people have to make stopgap solutions until the new technology comes out. NVLink sounds a lot like VLB and AGP.

Something tells me that AMD is going to be left out of the NVLink party.

AMD has HSA and does NOT need this NVLink interconnect. In fact, NVLink can let NV cards mimic HSA in PCs with the OpenCL 2.0 framework, so it is a path to HSA for NV. The difference is that each CPU core still controls its RAM space, as opposed to the HSA switch that controls RAM space in AMD's HSA implementation.

So, 80GB/sec in 2016. Why not HyperTransport? It has been at 51.2 GB/s aggregate since 2008.

HT 3.1 is 25.6GB/s in each direction using a 32-bit path. But yeah, FOUR of these (a 128-bit path) would do over 100GB/s in each direction for a total of over 200GB/s aggregate. But you need a license from the HyperTransport Consortium. If NVLink is simpler, then there might be cost savings on the way. Maybe the complexity or simplicity lies in the receiver/transmitter circuits, which might be a major factor in adoption of an interconnect standard.
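
Those HT 3.1 numbers check out: a 32-bit link at a 3.2GHz clock with double-data-rate signalling works out as below.

    // HT 3.1: 3.2 GHz clock, DDR -> 6.4 GT/s; 32-bit link = 4 bytes per transfer.
    #include <cstdio>

    int main() {
        const double gts     = 3.2e9 * 2.0;  // transfers per second
        const double bytes   = 4.0;          // 32-bit link width
        const double per_dir = gts * bytes / 1e9;
        printf("%.1f GB/s per direction, %.1f GB/s aggregate\n", per_dir, 2 * per_dir);
        // -> 25.6 and 51.2; four such links would top 100 GB/s per direction
        return 0;
    }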

[quote="NVIDIA today announced that it plans to integrate a high-speed interconnect, called NVIDIA® NVLink™, into its future GPUs"

Yeah......Nvidia ain't sharing.[/quote]This is a mutual effort for both Nvidia and IBM (for their Power chips) to leverage GPU processing for HPC markets. Since Intel is not licensing nor sharing any cpu interconnects, NV cannot do anything for Intel as Intel is obviously looking at their own gpu interests. IBM on the other hand had profited from the NV partnership and if this initiative is going to give them a strong lead, then it is mutually beneficial.

On the HSA side, this NVLink can be a step in that direction, although the speed ought to be many times faster; maybe that comes in later iterations of NVLink. Keeping the PCIe programming model is clever, preserving compatibility of the platform while the underlying hardware changes. It is always good to see a new interconnect technology being implemented, as it is rare in an industry that tends to keep improving on what is current rather than innovating forward.

Something tells me that AMD is going to be left out of the NVLink party.

AMD has basically no presence in the sector Nvidia is aiming at with NVLink. It's not like any AMD GPUs are powering any supercomputers alongside IBM's Power CPUs. And I don't think this link will become a standard in the PC/mainstream segment anytime soon. Probably only Apple would dare to build a device with such proprietary hardware, since they are in complete control of their ecosystem, both HW and SW.

These are partnerships with very specific goals and targets in mind, like building a better supercomputer. This involves only the two players: Nvidia for the GPU, IBM for everything else, and a mixed effort for NVLink. AMD has bigger worries right now than a niche market in which they've never had a foothold (CPU or GPU).