AMD destroys Nvidia at Bitcoin mining, can the gap ever be bridged?

This site may earn affiliate commissions from the links on this page. Terms of use.

If you typically follow GPU performance as it relates to gaming, but have become curious about Bitcoin mining, you’ve probably been surprised to discover that AMD GPUs are the uncontested performance leaders in the market. This is in stark contrast to the PC graphics business, where AMD’s HD 7000 series has been playing a defensive game against Nvidia’s GK104 / GeForce 600 family of products. In Bitcoin mining, the situation is almost completely reversed: the Radeon 7970 is capable of 550MHash/second, while the GTX 680 is roughly one-fifth as fast.

There’s an article at the Bitcoin Wiki that attempts to explain the difference, but the original piece was written in 2010-2011 and hasn’t been updated since. It refers to Fermi and AMD’s VLIW architectures and implies that AMD’s better performance is due to having far more shader cores than the equivalent Nvidia cards. This isn’t quite accurate, and it doesn’t explain why the GTX 680 is actually slower than the GTX 580 at BTC mining, despite having far more cores. This article is going to explain the difference, address whether or not better CUDA miners would dramatically shift the performance delta between AMD and Nvidia, and touch on whether or not Nvidia’s GPGPU performance is generally comparable to AMD’s these days.

Topics not discussed here include:

Bubbles

Investment opportunity

Whether or not ASICs, whenever they arrive (next month, this summer, or further in the future), will destroy the GPU mining market.

These are important questions, but they’re not the focus of this article. We will discuss power efficiency and MHash/watt to an extent, because these factors have an impact on comparing the mining performance of AMD vs. Nvidia.

The mechanics of mining

Bitcoin mining is a specific application of the SHA2-256 algorithm. One of the reasons AMD cards excel at mining is that the company’s GPUs have a number of features that enhance their integer performance. This is actually something of an oddity; GPU workloads have historically been floating-point heavy, because textures are stored in half (FP16) or full (FP32) precision.
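To make the workload concrete, here is a minimal Python sketch of the hash at the heart of mining. What miners actually compute is SHA-256 applied twice over an 80-byte block header; the all-zero header below is an illustrative placeholder, not real block data, and this is nowhere near a working miner.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    """Bitcoin's proof-of-work hash: SHA-256 applied twice over the data."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def hash_meets_target(header: bytes, target: int) -> bool:
    # The digest is read as a little-endian 256-bit integer and must
    # fall below the network's difficulty target to count as a valid block.
    return int.from_bytes(double_sha256(header), "little") < target

# Placeholder 80-byte header (all zeroes); a real miner sweeps the
# 4-byte nonce field until the hash meets the target.
header = bytes(80)
print(double_sha256(header).hex())
```

A miner simply runs this inner loop billions of times per second, which is why raw integer throughput dominates everything else.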

The issue is made more confusing by the fact that when Nvidia started pushing CUDA, it emphasized password cracking as a major strength of its cards. It’s true that GeForce GPUs, starting with G80, offered significantly higher cryptographic performance than CPUs — but AMD’s hardware now blows Nvidia’s out of the water.

The first reason AMD cards outperform their Nvidia counterparts in BTC mining (and the current Bitcoin wiki entry does cover this) is that the SHA-256 algorithm makes heavy use of a 32-bit integer right rotate operation. In a rotate, the integer’s bits are shifted, but the bits that fall off one end are reattached at the other; in a right rotation, bits that fall off the right reappear at the left. AMD GPUs can do this operation in a single step. Prior to the launch of the GTX Titan, Nvidia GPUs required three steps: two shifts and an add.
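The difference is easy to see in code. On hardware without a native rotate, a 32-bit right rotate has to be emulated with two shifts plus a combining operation, as in this sketch:

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits

def rotr32(x: int, n: int) -> int:
    """32-bit right rotate emulated as two shifts plus a combine.
    On GPUs without a native rotate this costs three operations;
    AMD's GCN hardware does the whole thing in a single step."""
    n &= 31
    return ((x >> n) | (x << (32 - n))) & MASK32

# Bits that fall off the right reappear on the left:
assert rotr32(0x00000001, 1) == 0x80000000
```

SHA-256 performs dozens of these rotates per round, so a 3x cost per rotate compounds quickly.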

We say “prior to Titan” because one of the features Nvidia introduced with Compute Capability 3.5 (only supported on the GTX Titan and the Tesla K20/K20X) is a funnel shifter. The funnel shifter can combine operations, significantly shrinking Nvidia’s three-step rotate penalty. We’ll look at how much performance improves momentarily, because this isn’t GK110’s only improvement over GK104. GK110 is also capable of up to 64 32-bit integer shifts per SMX (Titan has 14 SMXes). GK104, in contrast, could only handle 32 integer shifts per SMX, and had just eight SMX blocks.

AMD plays things close to the chest when it comes to Graphics Core Next’s (GCN) 32-bit integer capabilities, but the company has confirmed that GCN executes INT32 code at the same rate as double-precision floating point. This implies a theoretical peak int32 dispatch rate of 64 per clock per CU, double GK104’s base rate. AMD’s other advantage, however, is the sheer number of Compute Units (CUs) that make up one GPU. The Titan, as we’ve said, has 14 SMXes, compared to the HD 7970’s 32 CUs. The Compute Unit / SMX count may be far more important than the total number of cores in these contexts.
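A quick back-of-the-envelope sketch of those figures (theoretical peak dispatch rates only; real kernels won’t reach them):

```python
# Hypothetical peak int32 throughput per clock, using the figures above.
hd7970_cus, hd7970_rate = 32, 64    # 64 int32 ops/clock per CU (implied rate)
titan_smxes, titan_rate = 14, 64    # 64 int32 shifts/clock per SMX
gtx680_smxes, gtx680_rate = 8, 32   # 32 int32 shifts/clock per SMX

print(hd7970_cus * hd7970_rate)     # 2048 int32 ops per clock
print(titan_smxes * titan_rate)     # 896 per clock
print(gtx680_smxes * gtx680_rate)   # 256 per clock
```

Even granting Titan its funnel shifter, the 7970’s unit count alone gives it more than twice GK110’s theoretical integer throughput per clock, and eight times GK104’s.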


Nvidia will not be filling the gap anytime soon and will instead focus on CUDA, while AMD plans to further increase its SHA-256 calculation performance in the future.

DiabloD3

Nvidia is a founding member of the OpenCL working group at Khronos (the other three being Apple, AMD, and Intel). Their compute shader compiler has two front ends but a single backend; it accepts either CUDA or OpenCL, and both perform equally.

Integer performance has nothing to do with the driver stack, and everything to do with the hardware. CUDA has integers just as OpenCL does, and anything you can write in OpenCL you can write in CUDA, and vice versa.

Terence M

If the two compute languages, CUDA and OpenCL, perform “equally,” then why is Nvidia’s line of cards so poor at almost any CL calculation? Would this mean CUDA is a poor performer on Nvidia cards, and that AMD could excel at that function as well, if the CUDA “function” were adopted by AMD?

——–

Please respond if I misunderstood what you were stating.

DiabloD3

That is usually because whoever coded the OpenCL version did not understand how OpenCL worked and tried to just blindly transform the CUDA code into OpenCL code. Optimized versions of both should run equally. If you find a case where this is not true, that is indicative of a driver bug.

DiabloD3

DiabloMiner author here. Nvidia keeps claiming they produce a product for GPGPU compute, yet they keep failing on integer performance. Bitcoin is not the only use of integers out there, and it’s not even limited to crypto research either.

There is zero reason for Nvidia to have made this fundamental mistake generation after generation. This is why I don’t support their product; it’s too slow to be useful, and Nvidia doesn’t seem to care. I repeatedly tried to reach out to that company, and I never got a response.

Although, now, it is too late for Nvidia to ride the Bitcoin train; ASICs are coming in and making all GPUs (including the highest-performance Radeons) obsolete.

http://geek.com/ sal cangeloso

Thanks for stopping by. Many of us have tried Diablo at one point or another.

I’m not shocked that the company didn’t reply to you. This type of thing is something I could see a large company looking at as just a fraction of a fraction of their business, in other words, easy to ignore. The problems arise when a niche’s complaint is indicative of problems in other areas or problems down the road.

Have you tested any ASICs? I keep reading about them, but I haven’t heard of many that are actually in use.

Thanks for dropping by. While I recognize that ASICs are arriving, and will ultimately make BTC mining on the GPU obsolete, I’ve stopped paying attention to them until shipping hardware arrives *in volume.*

I think the bigger problem for anyone considering doing some BTC mining is the current price volatility. Making your money back on any investment is an open question.

With that said: Do you agree that the problem is likely related to Int32 instruction rates per SMX? That’s the explanation that’s “newer” here compared to the funnel shifter in Titan, which is a known quantity.

DiabloD3

Avalon has already shipped 900 68GHash/sec units; ASICMINER privately owns a 65THash/sec farm (helping them fund further operations and get real-world extreme testing of their designs) and is preparing to sell its upcoming 200THash/sec batch publicly.

The current network performance is about 70THash/sec and was about 25 before ASICs came online. I think it’s safe to say they’ve already arrived.

Yes, integer instruction rates on Nvidia are horrendous; they seem to be as slow as or slower than double-precision math. On Radeons, I can issue a single-cycle integer op every clock cycle on every ALU: VLIW5 has 4 plus a limited 5th, VLIW4 has 4, and GCN has 4 quad-width SIMD ALUs plus 4 single-width ALUs, with the driver/hardware managing ALU usage across multiple work items to maintain optimal instruction-level parallelization.

What also gives Radeons the leg up is that they can do certain things SHA-256 requires, which would normally take 2-3 cycles, in a single cycle: bitselect takes a single cycle, as does rotate. Nvidia seems to be slower at these than at simple integer ops (add, xor, etc).
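The “bitselect” operation mentioned here maps directly onto SHA-256’s Ch (choose) function. A sketch of what the operation computes, bit for bit (the single-cycle claim is DiabloD3’s, per the comment above):

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits

def bitselect(a: int, b: int, c: int) -> int:
    """For each bit of the selector c, take the bit from b where c is 1,
    otherwise the bit from a. This is SHA-256's Ch(c, b, a) function;
    without a native instruction it costs an AND, an AND-NOT, and an OR."""
    return ((a & ~c) | (b & c)) & MASK32

# The selector picks b's bits in its low half, a's bits in its high half:
assert bitselect(0xAAAAAAAA, 0x55555555, 0x0000FFFF) == 0xAAAA5555
```

As with rotate, SHA-256 evaluates this function in every round, so collapsing three ops into one cycle pays off repeatedly.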

If Nvidia were serious about Bitcoin mining, they’d make ZR25, ZR16, ZR26, ZR30, ZMa, and ZCh single-cycle instructions, and they’d also make integers as fast as single-precision ops instead of as slow as double-precision ops. If they did this, they could possibly give current-generation ASIC miners a run for their money.

And don’t bother making high integer performance a Quadro/Tesla-only feature; make it part of consumer GeForces too. The reason people buy consumer Radeons over Quadros/Teslas for double-precision math use cases is that consumer Radeons beat Quadros/Teslas both per watt and per dollar.

tl;dr: Nvidia, stop screwing customers and you’ll make more money.

I’m willing to work with Nvidia on this to compete with existing and next generation solutions if they’re interested. They just have to email me.

Joel Hruska

Diablo,

I should clarify. ASICs aren’t something that “regular people” can buy in any reliable quantity at this point. And projects like this one: http://www.progressivebtcmining.com/ have yet to get off the ground. Butterfly Labs advertises a 50GH/s miner for $2499, with a 5GH/s box for $249, but no firm ship date. If I thought they were coming within two months, I’d probably buy in.

Let me ask you this: Given what we know about GK104 / GK110, do you think it’s possible to significantly improve current NV performance through kernel optimization?

DiabloD3

ASICs *are* something regular people can buy; just wait until ASICMINER opens sales for their 200THash batch. I will agree with you that BFL does not look like a reliable vendor, however.

As for NV fixes: nope. This is purely a hardware problem, Nvidia is going to have to fix this themselves, and I wish they would.

Joel Hruska

It’s rather ironic. AMD beats NV in many of the GPGPU benchmarks in an area of computing Nvidia pioneered — but that fact has generally slid by the wayside.

DiabloD3

It’s called marketing. People who actually like getting things done every day have learned to ignore it, but most people still listen to it. It’s why Intel still sells more CPUs than AMD, even though Intel CPUs tend to be slower per dollar.

If Nvidia’s fault is refusing to keep up technologically, AMD’s fault is not enough marketing.

Joel Hruska

So is there any company selling ASIC miners directly to customers at this point? (Meaning — not pre-orders).

DiabloD3

ASICMiner will not be doing pre-orders and will have sales open possibly next month.

Terence M

In my opinion, ASIC miners will be the end of mining for “regular people.” After their release, the difficulty will skyrocket to a point that only benefits ASIC developers and would “monopolize,” if you will, Bitcoin mining. Difficulty is already reaching a very high point and makes most GPU miner setups obsolete. Those who want to mine will have to get some ASIC device just to keep up, and newly launching terahash miners will definitely not help the issue.

Joel Hruska

In the very long run, the incentive for mining BTC was supposed to be transaction fees, not the mining reward itself. But long-term profitability depends on the price of BTC.

At $90 per BTC, a huge host of products are profitable that weren’t at $5. At least, in the short-term.

Joel Hruska

Diablo,

Would you mind dropping me an email? My address is listed above if you click on my name at the top of an article. I’d like to ask you a couple of things.

That URL you linked to is a pass-through security for shares in the company. ASICMiner has not yet announced how it is going to handle sales, although it seems sales will be run through an auction-like format, letting the market set prices directly.

Unrelated question for you. How much optimization work could theoretically be done to squeeze more performance out of AMD cards at this juncture? I ask because it seems to me that performance gains have plateaued. I remember that in 2011, switching from poclbm to phatk gave a huge performance gain of 50-75MHash on my hardware.

Now, the benefits seem fractional, but it also seems like not much has been done in the way of new GPU clients. Poclbm and Diakgcn haven’t been updated in a while (as far as I know; not with new performance capabilities, anyway).

Is there any fruit left on the optimization tree?

DiabloD3

Those kernels aren’t worth using, really. Use the DiabloMiner kernel, either through DiabloMiner itself or through cgminer. That’s about as optimal as you’re going to get; AMD will have to improve its compiler to get any more, and you’re looking at 1-2% at the very most.

Joel Hruska

Diablo,

I’ve been testing the Diablo kernel, as released through 50Miner. Looks like it does deliver a modest speed-up. Very nice.

Henry Young

I lurk in the Folding@home beta IRC channel, and one thing that has been discussed is how NVIDIA has not fixed their OpenCL implementation. It has been broken for multiple generations, which means their cards get worse performance than AMD cards and use a CPU core to run parts of the process.

NVIDIA seems to have no interest in fixing GPGPU on their cards and will likely keep shipping cards with bad performance.

DiabloD3

The problem isn’t just limited to the OpenCL drivers. Nvidia drivers as a whole are very shoddy on both Windows and Linux (it’s more evident on Linux). Nvidia has no interest in the GPU market at all, really, and I find that somewhat ironic, since no one is particularly interested in their mobile/ARM products.

More and more next gen games are picking up OpenCL to offload physics and other non-graphics tasks and Nvidia is going to be left behind if they don’t fix their drivers.

I don’t want to see them go under, but this is why the past 4 generations of cards I’ve bought have been all AMD: AMD at least treats me right as a customer. Benchmarks don’t mean anything to me if their driver stack is a failure.

Joel Hruska

AMD is far more likely to go under than Nvidia, sadly enough.

Car Audio-Outlet

“Nvidia drivers as a whole are very shoddy on both Windows and Linux”
I have to say that, from years of using them, Nvidia’s Windows drivers are better than most other GPU vendors’…

http://www.facebook.com/andrew.s.hodge Andrew Steven Hodge

The problem for AMD isn’t actual performance, but the implementation. I am not a programmer, but I do know a couple, and from what I’ve heard from them and read online, there are several issues. The first, and probably the biggest for the HPC space, is the proliferation of CUDA prior to OpenCL. Most pre-existing software is already written in CUDA, making porting harder than it should be. Second (and this is old, so it may be fixed by now), the OpenCL drivers on Linux are unstable. This is something I have actually seen with LuxRender and Blender: my OpenCL build of LuxRender was, until recently, unstable, and Blender’s Cycles render engine didn’t even work with AMD hardware due to what the developers are calling “driver limitations.” Another issue, at least from what I’ve heard, is that OpenCL is harder and more complex than CUDA, although that has never stopped anyone before.

DiabloD3

CUDA does represent the bulk of legacy code, due to Nvidia directly marketing to the university research market segment. However, very little brand-new code is being written for CUDA, as people don’t want to be locked into any single vendor.

http://www.facebook.com/rolex.wearer Rolex Waro

Adobe recently dropped CUDA support from Nvidia and adopted AMD GPGPU, which they claim will be 20% faster than Nvidia CUDA… see the news from Tom’s Hardware or X-bit Labs, etc.

DiabloD3

Basically they switched from CUDA to OpenCL, and the OpenCL code will run fine on AMD, Nvidia, and Intel. It is an easy switch if you know both.

tgrech

Does this mean an HD 7870 Tahiti LE will do much better than a standard HD 7870 OC, and do these results carry over well to Litecoin?

The 7870 OC has 20 CUs at 1GHz, so I would expect Tahiti LE to be faster, yes. How *much* faster? Not that much. The 7790 currently has the best price/performance ratio.

aufdenschlips

Actually, I get 520MHash/sec with my Club 3D 7870 XT Joker card. So price/performance-wise, I think that’s not too bad ;)

Joel Hruska

Screenshots and configuration details, please.

aufdenschlips

1222/750 @988 mV

case open

two additional case fans blasting right at the GPU

temp 167 F

stable now for days

pool reports sometimes as high as 580/sec

average 0.0325 BTC/day

Joel Hruska

I mean, which miner and what software config details? Vector settings, aggression, etc.

aufdenschlips

just simple guiminer, opencl, -v -w 256

Joel Hruska

Interesting. I haven’t wanted to push my 7950 up that high. That’s a solid rate.

aufdenschlips

I think it would go even higher. The problem now is that power usage is becoming more important, since difficulty increased yesterday. Instead of 0.035, I now only get 0.0265/24h.
I would gladly buy a BFL Single if they were shipping them.

But IMHO it’s not rational to sell golden geese unless you already have your hands on the next gen.

Joel Hruska

At $90 per BTC, power is irrelevant. You can mine on NV hardware and make money.

Granted, you aren’t making much — about $1.82 per day. I wouldn’t recommend buying into GPUs at this stage — but if you were going to buy a GPU anyway, I see this as a way to make back part of the cost.

aufdenschlips

BTW, unless you are constantly monitoring how many blocks get found per shift, stay away from PPLNS.

PPS for hassle-free mining is still better BTC-wise than checking your miner only to discover the pool had bad luck again.

Tried Deepbit first; shady IMHO, or just for more advanced miners.

Went to another pool with Stratum servers because of long poll errors and an idle miner.

Joel Detrow

As far as I know, AMD’s latest cards were physically optimized for OpenCL, whereas Nvidia’s cards, I believe, added it through drivers; it was really more of an afterthought, which came about when AMD’s cards demonstrated such a massive advantage on this front. As for who dominates which front, Nvidia has better release drivers and excellent marketing, but in the end, their cards turn out to have the same power (for gaming) as AMD’s.

All that isn’t quite so relevant anymore, though, because Bitcoin mining has gotten to the point where at least one company has already begun producing custom processors specifically for mining – Butterfly Labs is one (the only?) example.

sukebe

So, I suppose you have to spend money to make money?

Joel Hruska

Always.

Joel Detrow

As the other Joel said, yes, but this was the case anyway. All that has changed is what you have to spend money on to make money. And even then, anyone jumping into mining will have to do it now, because soon mining will only be worth it for those who already own the hardware to mine with.

John Manglaviti

Google Litecoin.

DiabloD3

Litecoin is largely a failure. Although they use scrypt and I applaud that, they do not use it correctly. Litecoin should have been impossible to run on GPUs, yet most Litecoin miners mine on GPUs.

LCharlles

But ASICs are difficult to implement for LTC, which is already about half mined, and its difficulty has held steady while SHA-256 currencies’ difficulty has only increased.
There are also some GPU-resistant coins being created.

Sasori

This article is either complete rubbish or it’s been hacked. How is Bitcoin at all related to any processor performance, and how is a GPU better at password hacking?

Joel Hruska

1) Bitcoin mining is related to processor performance because the hashing algorithms used to generate and validate BTC can be run on a variety of architectures. In the beginning, hashing was done on the CPU. Now, in order to be competitive, you need at least a GPU.

2) A GPU is better than a CPU for password cracking if the relevant algorithm (SHA2-256 in this case) can be effectively parallelized. AMD’s GPUs contain multiple elements that improve SHA-256 hashing on these cards.

How can a GPU be faster than a CPU? That’s easy. The highest-end Intel Xeons can dispatch four int32 instructions per core per clock. With eight cores, that’s 32 instructions per clock.

A top-end Radeon 7970 can execute 64 int32 instructions per CU and carries 32 CUs. That’s 2048 int32 instructions per clock. Yes, the x86 CPU is running at 3x the clock speed of the Radeon 7970, but the Radeon 7970 is executing 64x as many instructions.

That’s a no-brainer win for the graphics card.
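The arithmetic in that comparison can be sketched out directly; the clock speeds below are rough assumptions (a ~3GHz Xeon vs. a ~1GHz Radeon 7970), not measured figures:

```python
# Per-clock int32 dispatch, using the figures above.
xeon_per_clock = 4 * 8        # 4 int32 per core x 8 cores = 32
radeon_per_clock = 64 * 32    # 64 int32 per CU x 32 CUs = 2048

# Rough, assumed clock speeds: ~3GHz Xeon vs. ~1GHz Radeon 7970.
xeon_ops_per_sec = xeon_per_clock * 3.0e9
radeon_ops_per_sec = radeon_per_clock * 1.0e9

print(radeon_per_clock // xeon_per_clock)     # 64x more per clock
print(radeon_ops_per_sec / xeon_ops_per_sec)  # roughly 21x more per second
```

Even after handing the CPU a 3x clock-speed advantage, the GPU still comes out roughly 21x ahead on raw int32 throughput.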

sukebe

Thanks for making me less ignorant. I have a mere GTX 560. I never got into Radeon because they had crap drivers back in the day; don’t know about now.

cvhoon

I sure regret buying a stupid Nvidia card. :C Even three-year-old AMD cards are going for nearly retail value, and stock of old cards is oddly limited; Bitcoin mining has grown demand for AMD cards. It would be nice if the mining software supported Nvidia’s OpenCL libraries, though.

http://www.facebook.com/profile.php?id=1668429833 FireFox Bancroft

LOL, Titan isn’t going to help in the Bitcoin mining department. The price point of the cards vs. the rate at which they mine Bitcoin doesn’t even break even. Nvidia fanboys need to man up and build an AMD rig for mining.

http://profiles.google.com/marineuac Marine Uac

rpcminercuda is a million times slower than the real cudaminer out there…

posilepton

This article is completely redundant. There is absolutely no use of GPU mining because of ASIC miners. The difficulty will skyrocket and GPUs will have no chance of retaining their pool share.

Glum Shni

The 5870 is probably the best card for the job; four of them = 1.5GHash at very good power consumption. I wouldn’t touch the 7000 series for Bitcoin mining, but there’s supposed to be a good one in the 6000 series. It’s too bad about Butterfly Labs… I was really hoping they would deliver…

Pat

This article is wrong. GPUs are used for scrypt mining. Bitcoin is mined using the SHA-256 algorithm, but that has nothing to do with GPUs; that takes place in the CPU. Please do research before posting misleading articles.

VLSI Engineering

Before companies were building ASICs and programming FPGAs, Bitcoin was mined using GPUs, and at the time this article was written, ASICs designed for SHA-256 calculation were not the norm. This is in fact a highly well-written and illuminating article, and Joel Hruska has been one of my favorite technical writers for many years now.

Russell Schiwal

Considering that Bitcoin mining is an exceedingly wasteful way to heat your home while attempting to cash in on a global Ponzi scheme, I do not consider this article a ringing endorsement.
AMD makes incredible cards, but the fact is that the makers of the software I use prefer to program in CUDA, so I’m sticking with Nvidia.

