This site may earn affiliate commissions from the links on this page. Terms of use.

For nearly a decade, the HPC (high-performance computing) market has been divided into two camps: CUDA and OpenCL. CUDA, of course, is Nvidia’s proprietary standard. It was first out of the gate in 2007 and its competitor, OpenCL, wouldn’t reach 1.0 status until just over two years later, in 2009. Unlike CUDA, OpenCL is supported by a number of companies, including Intel, Imagination Technologies, AMD, Qualcomm, and ARM.

Despite that potential advantage, Nvidia holds the lion’s share of the HPC and supercomputing markets. According to the most recent Top500 list, AMD’s GCN architecture is used in three systems, compared with 66 systems for Nvidia’s Fermi and Kepler architectures and 28 systems that use Xeon Phi. There are also four hybrid systems that use both Nvidia GPUs and Xeon Phi.

AMD’s Boltzmann Initiative is meant to change the status quo by offering developers and researchers a much-needed software stack that should boost the company’s competitiveness in the HPC market. AMD’s competitive weakness in HPC and scientific computing has never been about hardware — GCN’s raw compute performance, at least in certain types of problems, was far better than Nvidia’s Fermi or Kepler cards. (Maxwell has not been positioned as an HPC solution.) Nvidia, however, poured huge amounts of money into developing its CUDA ecosystem, including a great deal of support for HPC developers and scientific research.


Here’s Boltzmann at a high level. The goals are to improve the workloads where AMD can compete effectively, offer better tools for evaluating performance, improve Linux support (including a new 64-bit driver for headless Linux), and implement a new HSA (heterogeneous system architecture) extension, HSA+. This last item won’t be folded into the larger HSA standard; it’s an AMD-specific extension meant to enable a greater range of HSA features on discrete GPUs. It will also allow supported GPUs to “see” GPU and CPU memory as a unified space.

The major announcement today, however, concerns a new HSA compiler and AMD’s heterogeneous-compute interface for portability, or HIP.

The new HSA compiler (HCC) can compile for both CPUs and GPUs, and it leverages the existing ecosystems built around Clang and LLVM as well as HSA itself. The goal is to let developers write CPU and GPU code in a single language and a single source file. OpenCL, even in version 2.0, requires separate source for GPU kernels; HCC eliminates that bottleneck. The aim is to provide an ecosystem that developers can target and use more easily, something Nvidia achieved with CUDA, and to make it simpler to optimize code for parallel execution. The new compiler will also support GCN-specific features, like asynchronous compute and GCN’s cache structure.

These types of features can bring AMD’s capabilities more in line with Nvidia’s, but that’s not sufficient to meaningfully dent the market. That’s where HIP comes in.

Hipify Tools: Translating CUDA source to run on AMD GPUs

As AMD describes it, HIP accomplishes several goals. It allows developers accustomed to Nvidia’s CUDA to develop using similar syntax. It includes a new toolset (Hipify Tools) that can convert CUDA code to HIP code. And once code is written in HIP (whether converted from CUDA or written that way initially), it can be compiled to target either Nvidia or AMD GPUs, using either Nvidia’s CUDA compiler (NVCC) or AMD’s HCC.

Just to be clear, Hipify Tools doesn’t run CUDA applications on AMD chips. Instead, it performs a source-to-source translation meant to make it easy for developers to target either architecture. We asked AMD what the typical performance hit for this translation looked like, and the company told us that in general-use cases it is effectively zero. If a developer has heavily optimized code for a specific Nvidia architecture, it will take more time to optimize the same cases for GCN, but even then the translated code will work out of the box.

In short, developers who are curious about FirePro and GCN performance and want to take their code for a test drive should find it much easier to do so. Out-of-the-box compatibility is useful for testing, even if it takes some additional optimization to bring performance up. Anandtech put together a useful image showcasing what HIP and HCC can do.

Contrary to what some publications have reported, AMD does not execute CUDA on GCN, CUDA applications are not analyzed or reverse-engineered, and AMD is not compiling these applications into OpenCL. The point of HIP is to allow a vendor-neutral approach that targets either NVCC or HCC.

A huge step forward

AMD still has much to do to establish a place for itself in the HPC market, and it’s not clear how quickly Hipify Tools will adapt to newer versions of CUDA. AMD has told us that any interested developer will be able to sign up for the program beginning in Q1 2016, but the tools will require a FirePro card to run. That’s unfortunate, because it limits the audience to developers who either own FirePros or are willing to fork over serious scratch to buy them. AMD might have been better served by opening the software to consumer hardware.

Then again, this is just the beta, and it’s possible that AMD will expand compatibility in the longer term.

Either way, this project has the potential to reinvent AMD’s approach to the HPC space. While that market is small in absolute terms, it’s far more lucrative per sale than the consumer market. It also gives AMD a seat at the table and the option to fight alongside Intel and Nvidia for the supercomputing market over the long term. Overall, if AMD keeps its attention on the product, it could help developers take much better advantage of the company’s hardware in a wide array of software environments.

The Xeon Phi is the future. Intel can do a lot more than what they delivered so far.

Ext3h

This may sound odd, but with the new compiler, the HSA+ features, and the new headless mode, FirePro cards actually don’t differ that much from the Xeon Phi from a programmer’s view.

In fact, they now support the same C++ language extensions, which allow you to write code for the accelerator device in-line, with fully transparent access to the host’s memory and the ability to execute arbitrary C++ code on the GPU/Phi.

Actually, Intel, Nvidia, and AMD all now support the same feature set and interface; you don’t need CUDA or OpenCL for any of these platforms any more. The compiler suites ICC (Intel), HCC (AMD), and NVCC (Nvidia) look mostly compatible. All of them support C++17 with the parallel STL extension and OpenMP 4.0.

Using the HIP library, finer tuning and explicit memory management are possible, but they’re no longer actually required to access the GPU.

For most use cases, you’re better off writing semantically clean, comprehensible C++ code using the classic and parallel STL, and letting the compiler do the optimization.

John Pombrio

Optane will be out next year. I wonder how much impact that will have on, err, everything related to CPU, graphics, and memory?

Chris MacDonald

I thought this too when it first came out: an x86-based compute unit… But apparently it’s not that easy to get (3D rendering) software working on it.

John Pombrio

AMD seems to be doing a lot more splashy media conferences lately than shipping actual products or software. There are no dates for when this “solution” (or the new red Catalyst, or Zen, or new HBM2 chips) will come out, no word on who is on the team, how much of it is already done, or how on earth AMD will pay for all this new stuff it keeps announcing. AMD’s R&D budget is rapidly shrinking every year, and the graphics and APU/CPU divisions are the ones losing money. Why should AMD spend more R&D money on graphics and APUs if they cannot compete, are losing market share, are not profitable, and are a drain on the company? Unless this is all for show, to create FUD and help AMD look less incompetent than it really is.

SonyAD

Now tell us how you really feel about AMD.

Reginald Peebottom

Ok hold on.

First, when you say it’s AMD’s graphics and APU/CPU divisions that are losing money, isn’t that basically the whole company? You seem to be implying there are other divisions that aren’t losing money and that actually mean something to the bottom line.

You point out shrinking R&D budgets, a bad thing, I assume. Then you go on to complain about them spending R&D money on divisions they can’t compete in anyway, so it’s all just for show, to create FUD, and to cover their incompetence.

I’m assuming the bit about R&D was sarcasm.

Sarcasm aside, I take issue with the incompetence jab. The fact that AMD has had a very hard time against Intel, one of the most successful companies in history, tech industry or otherwise, is hardly an indicator of incompetence.

AMD has always been the underdog in the CPU world, and even when it had superior chips like the Athlon series it got ground down by Intel’s dirty tricks and deep pockets.

ATI, as it was prior to the acquisition, had a bit of a checkered past, and AMD made a good tactical move in acquiring it. Unfortunately, Intel’s overwhelming dominance and momentum have again rendered that move largely moot: discrete graphics cards are not the big money makers they once were, and that market was split with Nvidia. Integrated graphics were and are the main graphics systems of choice, and the market was not prepared to go AMD when Intel offered a more competitive package overall, even if Intel graphics were inferior. Of course, Intel has steadily closed the gap in graphics performance and features.

The only area where I think AMD can be damned is its inexplicable failure to go into the mobile arena sooner, either in conjunction with or in competition with ARM. Intel made the same mistake but can afford the error. AMD can’t, and it’s trying its damn best now to generate media buzz and investor value in the hope of wooing a deep-pocketed suitor. That’s good business, if you ask me, and playing the hand you’ve got as best AMD can.

Joel Hruska

“No dates on when this “solution” (or the new red Catalyst or Zen or new HBM2 chips)”

I think they are rebuilding their entire ecosystem… which is good, as the old one was sh1t… it takes a while to do that, you know.

Busybee

It’s just that Nvidia’s CUDA became too well “established” in the HPC arena (ever since Nvidia’s early GPGPU push), so most other APIs are often overlooked and/or ignored…

exjohn

The problem is nobody wants to be locked in, and when presented with a way out, most will take it… look at Intel’s offerings. Intel has had a presence in the accelerator world for only three years or less, yet it already has 30% of the market, and apparently even more new supercomputers are Intel-based, built on KNL. So the market’s apparent reliance on CUDA has already begun to evaporate and continues to do so. Nvidia hasn’t had any good supercomputer cards since Kepler, and that chip is already outclassed by both AMD and Intel…

Busybee

Intel’s quick ascendancy is due to the ease of programming its Xeon Phi series products, which can be treated like many-core CPUs. That will accelerate once Intel’s Knights Landing arrives, as it can act as a CPU by itself rather than just another accelerator card. AMD’s FirePro is mainly a GPGPU accelerator, just like Nvidia’s Tesla, so the main problem is programming for the hardware itself. Most of the early GPGPU coding was done using Nvidia’s CUDA and that trend remains popular today, so it’s not surprising AMD finally had to follow and “adopt” it as well…

exjohn

The fight will be between x86 and GPGPU… a fight that at this point in time only one company can win: Intel. Nvidia will lose adopters faster than it can breathe. I suspect in five years this entire industry will be dominated by Intel. The only other company with a (slight) shot at this is AMD with its tiny Carrizo cores. Those would be about 5mm^2 on 14nm, which means AMD could easily fit close to 100 such cores on a KNL-type build. Preserving their (relatively) high IPC, such a build would be insanely powerful, and AMD could run it at 2GHz with a 3GHz boost on individual cores if need be. And these cores would be true multipurpose cores, not glorified vector units like the KNL cores.

Also, HCC makes it easy to compile for both Nvidia and AMD from C++… I think that is something even Nvidia should be thankful for, as it gives GPGPU programmers the same or very similar tools to those working on native x86. That’s a big plus for them and levels the playing field to some extent.

Busybee

Intel chose low-power (mobile) cores for Knights Landing. From http://www.anandtech.com/show/9436/quick-note-intel-knights-landing-xeon-phi-omnipath-100-isc-2015, quote: “Intel unveiled a number of details about Knights Landing at last year’s ISC, where it was announced that the second-generation Xeon Phi would be based on Intel’s Silvermont cores (replacing the P54C cores in Knights Corner) and built on Intel’s 14nm process”. Likewise, AMD should be looking at its low-power “cat” cores (ie. Bobcat, Jaguar, and Puma), such as those used in Sony’s PlayStation 4, which are much smaller than Bulldozer-derived architectures like the Excavator cores used in Carrizo SoCs. For example, http://www.realworldtech.com/jaguar/, quote: “Jaguar is AMD’s first 28nm processor. It is a compact 3.1mm2 core that targets 2-25W devices, in particular tablets, microservers, and consoles”. Unfortunately, due to management shortsightedness, AMD did not utilize that existing technology and instead dabbled with ARM-based cores, where products like the Seattle-based Opteron A1100 are still in “limbo” (ie. still under “development”, with no real-world products outside of development systems and boards). If AMD had used its experience in console chips to produce an octa-core microserver chip, the Seattle project would have been unnecessary. AMD’s server chips using “cat” cores have been used in real-world products like HP’s Moonshot microserver, http://www8.hp.com/us/en/products/proliant-servers/product-detail.html?oid=6488207#!tab=specs, quote: “AMD Opteron™ X2150 APU, 1.5GHz, (4) x86 cores,”. Furthermore, AMD could have progressively moved to a many-core architecture like Intel’s Xeon Phis as well…

Yash

Source-to-source translation! This just proves Nvidia’s superiority in a market where other companies provide tools to make CUDA work on their hardware. AMD lost the battle with OpenCL and HSA, and HIP will be another disaster in the making. I believe in products that launch with a download link, not just words. AMD is fighting a battle it will soon lose. The departure of Jim Keller and Phil Rogers has left a hole they cannot fill.

What you provide are links for HCC. True, there are constructs for HIP in there. But is it as clean as visiting an Nvidia page for the CUDA SDK, cuBLAS, or cuDNN and getting everything you want, with documentation? This link works for an advanced systems developer, but it’s not what a computational physicist is expecting.

Nvidia is very prompt with support for its libraries and tools, and I do not remember them screaming about products before they were ready. AMD has thrown out another media announcement with no backing. Even though the Radeon product line is great, AMD fails to deliver on software support.

The Watson

They should have done this a few years ago, to be up and ready now. AMD seems scatterbrained, or suffering from ADD. I would think a Hydra-like idea would be better: allow either code to run, and simultaneously if the hardware allows. I guess that would cost, due to IP.
They start many projects, but don’t have Google-style cash flow…

Nv
Nvidia put another nail in its coffin with me. I use a 560 Ti and I can’t participate in the GeForce Experience, yet my 610, a 5xx rebadge, can! AMD could also grease the dev wheel; it worked for Nvidia. Still holding my breath for an AM3+ refresh.

Sean Hughes

It’s funny how everyone loves bashing AMD, but let’s be realistic: they at least keep things competitive. Intel gouges the hell out of you for its products; not everyone can afford Intel’s price tags on its CPUs, and in a lot of ways the price-to-performance comparison is just not worth it. I will admit AMD’s server CPU lineup is lagging, but its desktop CPUs still hang in there. Everyone always talks like they would love to see AMD fail and close its doors, but with no one competing with Intel in the consumer market, I could see Intel’s CPU prices increase. Just some food for thought.

