GPU computing breakthrough? Cloud rendering company claims to run CUDA on non-Nvidia GPUs


One of the major differences between AMD and Nvidia is their share of the professional graphics market. Nvidia dominates this space and its profit margins, and while AMD has had some high-profile wins with Apple, it hasn't cut deeply into Nvidia's market share. Part of the reason Nvidia has a lock on both workstation and high-performance computing is CUDA, its programming language for GPU compute. Now one company, Otoy, is claiming to have broken that lock.

Otoy is the owner and developer of Octane Render, a real-time unbiased rendering engine that supports 3D rendering software suites like 3ds Max, Maya, Cinema4D, and Lightwave. It’s also available as its own standalone software suite. It was the first unbiased rendering suite to support GPU-only rendering and a high-profile early win for Nvidia’s CUDA — which is part of why it’s surprising to see the company branching out to support other architectures in this fashion.

Didn’t AMD just announce this?

There are some timing oddities here that I'm not sure how to explain. Last year, AMD announced its Boltzmann Initiative. Part of that initiative is a compatibility layer that allows AMD GPUs to execute code written for CUDA.

AMD’s Boltzmann Initiative

Here’s how VentureBeat describes Otoy’s new compatibility layer: “In a nutshell, Otoy reverse-engineered Nvidia’s general purpose graphics processing unit (GPGPU) software, known as CUDA, to run on non-Nvidia hardware. That means that programs written in the CUDA language are no longer exclusive to Nvidia graphics chips.”

According to Otoy’s CEO, Jules Urbach, the point of developing this CUDA translation layer is so that the company’s high-end Octane Render software can run as easily on AMD GPUs as on their Nvidia counterparts. “We have been able to do this without changing a line of CUDA code, and it runs on AMD chips,” Urbach said. “You can now program once and take CUDA everywhere. AMD has never really been able to provide an alternative.”
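For a sense of what "not changing a line of CUDA code" would mean in practice, here is a minimal, generic CUDA vector-add; this is our own illustration, not Otoy's or AMD's code. The plain-C runtime calls (cudaMalloc, cudaMemcpy) and the <<<grid, block>>> launch syntax are the two pieces any non-Nvidia toolchain has to reproduce: AMD's HIP approach renames the API calls (cudaMalloc becomes hipMalloc), while Otoy's claim is that even that renaming isn't required.

```cuda
// Minimal, generic CUDA vector add (illustration only, not Otoy's or AMD's code).
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// __global__ and the <<< >>> launch below are nvcc language extensions; the
// cuda* calls are plain C runtime-API functions. A cross-vendor layer has to
// reproduce both halves.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float* h_a = (float*)malloc(bytes);
    float* h_b = (float*)malloc(bytes);
    float* h_c = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;
    cudaMalloc((void**)&d_a, bytes);
    cudaMalloc((void**)&d_b, bytes);
    cudaMalloc((void**)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %.1f\n", h_c[0]);   // expect 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```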

AMD’s Boltzmann Initiative would seem to provide the alternative that Urbach is referencing, and it appears to accomplish the same goal. It’s not clear how the two programs differ from each other, though Otoy does mention wanting to run software on a wider variety of platforms, operating systems, and technologies. AMD’s Boltzmann Initiative, of course, is designed solely for AMD’s own GPUs.

As for performance, Urbach states that “It runs on the other cards at the same speed as it runs on Nvidia cards.” But again, that’s something AMD has implied about its own Boltzmann Initiative — when we asked the company how AMD GPUs compared to NV cards running CUDA, the RTG division implied that unless the CUDA code had been hand-optimized for a specific CUDA architecture, it should run as quickly on AMD hardware as on an Nvidia counterpart GPU.

Urbach claims that the long-term goal is to allow CUDA to target Vulkan, DirectX, and OpenGL (along with Android, PS4, and WebGL 3), and that Otoy wants to be able to run CUDA applications on platforms like iOS, where Apple’s Metal is the dominant low-overhead API.

Supposedly Otoy is working on turning Octane Render into a plugin that the UE4 engine can utilize, but Octane Render isn't used for real-time rendering, and adapting a version of it to work within a game engine would be extremely challenging. It's not at all clear why Otoy would want to translate native CUDA into many of the APIs that Urbach lists — no games that I'm aware of leverage OpenCL or CUDA for any kind of task, and neither AMD nor Nvidia has talked about using either language for this purpose.

We’ve reached out to Octane Render and will update this story if we hear more details. At the very least, it looks like AMD’s push to convert CUDA code into something that can run on more GPUs has caught the attention and imagination of other vendors.

CUDA’s not really a programming language, per se, but an API; something that you can “bolt on” to a language to extend its functionality.
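For what it's worth, the host side of CUDA really is just a C library you link against. Here is a tiny, hypothetical sketch (nothing to do with Octane) that queries devices through the runtime API from ordinary C++, with none of nvcc's language extensions; it typically builds with a regular compiler, e.g. g++ plus -lcudart, with include/library paths varying by install. Only once kernels and the <<< >>> launch syntax enter the picture does nvcc, and the language-extension half of CUDA, become mandatory.

```cuda
// Host-only use of the CUDA runtime API: no kernels, no <<< >>> syntax.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA-capable device / runtime available\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("GPU %d: %s, compute capability %d.%d, %zu MB\n",
               i, prop.name, prop.major, prop.minor,
               prop.totalGlobalMem / (1024 * 1024));
    }
    return 0;
}
```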

Either way, I'm really interested in this. Since Luxrender has essentially died, the only other practical GPU-accelerated render engine available to me is Cycles, which is entirely a CUDA thing on my OS. It's fast on my 750ti, but my 290x would be much faster. Please don't let this be vaporware.

No, we do not know what you mean at all. Do you think Dade has disappeared or something?

He doesn't have time to make posts with marketing nonsense. He and the rest of the team work on Luxrender and post when there's something worthwhile to be said. Luxrender is still in active development and there should be an update soon.

What are you talking about? Luxrender has not died, it’s still being actively developed and has some specific and unmatched features that others do not offer, even for a large sum of money.

'Your OS' is not a valid complaint. Cycles GPU works excellently for me on a mixture of AMD cards. Partition your drive and use the best OS for the particular software/workload mixture or you will always suffer substandard performance. That's how it has always been with Blender in one way or another; it's just reversed currently with GPU rendering.

You seem to be more interested in ease than performance.

http://ibin.co/2a2s5D70Up13
Just messing around, and I rendered a dense particle array in 15 seconds on 1x Tahiti and 2x Hawaii without any difficulty at all.
Compare that to >$2000 USD worth of Nvidia hardware running CUDA for considerably less performance.
Usually, the easy and 'quick' way is the slow and expensive way in the long run. That's never going to change.

Bob Plissken

“It would be extremely challenging”

Octane’s a big guy

Kwuarter

For you.

Bob Plissken

Do AMD even have anything that comes close to Titan? And please don’t say Zen…..

Stallman

Lol, watch the Hitman benchmarks, the R9 390 (non-X) beats the Titan X xD

Joel Hruska

I don’t know why anyone would say Zen. Zen is a CPU, not a GPU.

AMD's Fury X is typically within 90-95% of the GTX 980 Ti and equal to or faster than it in a few titles. This depends on game settings and whether or not you benchmark in 4K (AMD GPUs lose less performance moving from 1080p to 4K than GeForce cards in most scenarios and therefore perform better against them at that resolution).

AMD's Fury X only has 4GB of RAM, which will limit its usefulness in GPGPU applications where Titan X might perform more effectively due to its larger 12GB frame buffer. Neither GPU has much in the way of double-precision floating-point performance. In single-precision floating point, where RAM loadout is not a problem, I would expect Fury X to perform competitively with Titan X, though exactly how competitively will again be benchmark- and optimization-dependent.

e92m3

The VRAM is largely filled by decompressed textures when GPU raytracing. It's a limitation with many workarounds, since VRAM has always been an inherent limitation of GPU rendering. You're going to need many millions of vertices to fill even 2 GB of VRAM with a model alone; VRAM usage is ~80-90% textures in almost all normal rendering situations.

Current standings on the only relatively reliable point of comparison (the latest LuxMark) have Fiji and even 40-CU Hawaii consistently outperforming heavily overclocked Titan X and 980 Ti cards. Note that the highest-performing Nvidia card is running at 1559 MHz on air (clock reporting is broken on some platforms; the user provided the actual clock during submission). That is not a production-stable clock; the driver would crash if you tried to render a frame sequence.

Add in the fact that Nvidia's OCL implementation is currently broken for anything besides the benchmark (see post further down or the LuxRender forum), and the end result is that Hawaii with an 8GB frame buffer (or the Pro version with 16GB) is far and away the best option for reliability, VRAM, and performance. Fiji still seems slightly under-utilized at a driver level, but is the highest-performing card. Might take 4 years before we see what it can really do ;)

All the functions are fp32 and lower (or just fp32). Double-precision is irrelevant in this context and all non-scientific visualization afaik. There’s no need for the additional precision unless data accuracy is critical. You need ECC for that kind of mission-critical workload anyways.

Corey

Fury X should bend the Titan X over (with no time for lube) in GPGPU processing. Taking CUDA out of it, it's just raw processing: 8.6 TFLOPS vs. 6.144 TFLOPS. AMD is still king in raw compute power; trouble is, that doesn't always translate into gaming FPS, at least until DX12.
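Those two figures do check out against the usual peak-FP32 formula (shader count × 2 FLOPs per clock × clock speed), assuming the public specs of 4096 stream processors at 1050 MHz for Fury X and 3072 CUDA cores at 1000 MHz for the Titan X. A quick back-of-the-envelope check, nothing more:

```cuda
// Back-of-the-envelope peak-FP32 check (public specs assumed: Fury X = 4096
// stream processors @ 1050 MHz, Titan X = 3072 CUDA cores @ 1000 MHz; each lane
// does 2 FLOPs per clock via fused multiply-add). Plain host code, no GPU needed.
#include <cstdio>

int main() {
    const double furyX  = 4096.0 * 2.0 * 1.050e9;   // ~8.60e12 FLOPS
    const double titanX = 3072.0 * 2.0 * 1.000e9;   // ~6.14e12 FLOPS
    printf("Fury X:  %.3f TFLOPS\n", furyX / 1e12);   // prints 8.602
    printf("Titan X: %.3f TFLOPS\n", titanX / 1e12);  // prints 6.144
    return 0;
}
```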

Mosab Al-Rawi

Maybe what I'll write now won't get approval from many of you. I won't side with anyone, but this reminds me of something that happened to me four years ago. A company tried to hire me, but I asked for a salary I considered fair and they thought it was too much. I went to work for another company in the same field, where I redesigned parts of the factory to fix some incompatibility problems; the first company lost about $200K on the same parts that I fixed for my employer. Now the first company is trying to go to the factory that made the changes and copy the work without permission from the company that paid me for it.
What is happening now is similar: Nvidia didn't get that money from nowhere. People paid Nvidia for extra features, and that extra money turned into 1.323 billion dollars of R&D (2015). As a moral choice, anyone who thinks what Nvidia offers is worth the extra money can buy Nvidia's products and enjoy what they paid for, and people who think the price is too high can go to competing products, but without expecting that they deserve to take advantage of Nvidia's free products like CUDA.
Now, if Nvidia tried to protect its investment by turning all of CUDA into a cloud service, making access depend on a unique chip-identifying code so that the customer only gets final compiled results, do you think they would have the right to do that or not, and why?

Joel Hruska

“Now, if Nvidia tried to protect its investment by turning all of CUDA into a cloud service, making access depend on a unique chip-identifying code so that the customer only gets final compiled results, do you think they would have the right to do that or not, and why?”

Reverse engineering a product is perfectly legal, provided you do not use proprietary information to do it. Everything would hinge on how this type of translation is provided and whether or not it uses anything NV considers proprietary or copyrighted / patentable. I can’t speculate on that situation without knowing more, and the Google vs. Oracle battle over APIs could also have some impact here.

You are correct that NV developed and positioned CUDA at expense to itself and it could theoretically take legal action to protect that stance. The company has yet to make any announcement that it intends to do so, however.

e92m3

Seems to me they have already played with this idea through their relatively poor OCL support.
Many people (not just me) suspect they intentionally limit their OCL performance relative to CUDA, especially since they have released marketing material with a negative stance on OCL and ‘comparisons’ showing poor relative performance.
Naturally, AMD’s OCL performance was left out of the discussion.

Nvidia’s OCL performance seems to be highly driver dependent and has shown multiple inexplicable regressions. Something funny is going on there imo.

as of Jan 5th:
“Yes, its known that NV compiler is casewise slower that the AMD cl compiler, but now for me 10 min * 3 gpu is unworkable”

Nvidia's OCL compiler itself is now functionally broken even for the LuxRender devs, let alone normal users who can't understand the kernels involved.

The current workaround is disabling Nvidia's OCL optimizations to get a reasonable compile time, but that also results in significantly slower rendering. Pretty broken. Nvidia explicitly told the LuxRender devs to use new Nvidia 'OCL' instructions many months ago, and it has never worked correctly: completely unusable for consistent production rendering, but just enough to give the appearance of improved LuxMark performance if you don't care about artifacts or about using the actual rendering engine for anything besides LuxMark benchmarking.
“….Among other features, it includes some OpenCL optimization suggested by NVIDIA to LuxRender project.”

I'm confused… Is this the same group (really one kernel developer) behind CUDA Octane and the first broken revisions of Blender Cycles?
The same guy that wrote off OCL because he wanted to use giant monolithic kernels, directly against the basic principles of efficient GPGPU? AMD had to break his giant kernel into micro-kernels to fix Cycles because he was too obtuse to acknowledge his basic misunderstanding of GPGPU.
Of course he thinks CUDA is better; he didn't learn anything else and expected the kernel compiler to optimize it all for him with OCL. That's just ignorance.
Until he demonstrates a CUDA-based (or generic recompile) rendering engine using compute shaders that outperforms AMD desktop hardware running OCL for the same functions, he has zero credibility.
There is no inherent disadvantage to OCL compute shaders unless you decide to design software against the basic design principles.
A floating-point spherical light map with basic auto paint shaders doesn’t even begin to validate his position because it is not unique functionality nor reaching unique performance levels at all…
~$300 AMD desktop hardware outperforms >$2000 NVidia hardware on these same functions.
The only claim he can honestly make is that CUDA performance is not reliant on any specific NVidia logic blocks, only software compilers…. What the hell does he think the point of OCL is?
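For readers wondering what "breaking a giant kernel into micro-kernels" looks like in practice, here is a toy, purely hypothetical sketch. The thread is about OpenCL, but the structural idea is the same in CUDA syntax: each rendering stage becomes its own small kernel working on shared buffers instead of one monolithic function doing everything, so each stage can be tuned and scheduled on its own. The stage bodies below are placeholders, not anyone's actual renderer.

```cuda
// Toy "micro-kernel" (wavefront) structure: three small stage kernels instead
// of one monolithic kernel that carries worst-case register usage and
// divergence for every stage. Stage bodies are placeholders only.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void generateRays(float* rayT, int n) {
    // Stage 1: set up one placeholder "ray parameter" per pixel/sample.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) rayT[i] = 1.0f + 0.001f * i;
}

__global__ void intersect(const float* rayT, float* hitT, int n) {
    // Stage 2: placeholder intersection; a real renderer traverses a BVH here.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) hitT[i] = rayT[i] * 2.0f;
}

__global__ void shade(const float* hitT, float* radiance, int n) {
    // Stage 3: placeholder shading from the hit record.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) radiance[i] = 1.0f / hitT[i];
}

int main() {
    const int n = 1 << 16, threads = 256, blocks = (n + threads - 1) / threads;
    float *rayT, *hitT, *radiance;
    cudaMalloc((void**)&rayT, n * sizeof(float));
    cudaMalloc((void**)&hitT, n * sizeof(float));
    cudaMalloc((void**)&radiance, n * sizeof(float));

    // Each small kernel is launched separately and can be occupancy-tuned
    // independently of the others.
    generateRays<<<blocks, threads>>>(rayT, n);
    intersect<<<blocks, threads>>>(rayT, hitT, n);
    shade<<<blocks, threads>>>(hitT, radiance, n);
    cudaDeviceSynchronize();

    float first;
    cudaMemcpy(&first, radiance, sizeof(float), cudaMemcpyDeviceToHost);
    printf("radiance[0] = %f\n", first);   // expect 0.5

    cudaFree(rayT); cudaFree(hitT); cudaFree(radiance);
    return 0;
}
```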

“…can run as easily on AMD GPUs as their Intel counterparts.”
eh?
Edit: article fixed.

Richard Krupski

Typo… They meant nVidia.

AlCarn

Why would anyone want to do this? If you want to run CUDA, why not just get an Nvidia GPU? It seems that anything else would be a compromise at best.

Ext3h

I don’t think you understand.

The announced product (just like the HIP tool included in AMD's Boltzmann Initiative) is not targeting end users who have bought a CUDA-based software product and now want to switch to other hardware. You can't do that so easily.

It's targeting the developers who maneuvered themselves into the CUDA dead end and now wish to reach out to additional customers running OTHER GPGPU-type accelerator devices: AMD or Intel GPUs, AMD APUs, Intel's Xeon Phi, and numerous other platforms, including some outside the x86 world.

The alternative would be to rewrite these applications using a different framework, which is a costly approach. Just recompiling your application with such a compatibility layer linked in is much, much cheaper, and, unless there has been some over-eager tuning towards a specific Nvidia GPU, not much slower either.

tachyonzero

Nvidia’s lawyers are salivating….

Simon

Is that good news for AMD users? : )
