Earlier this week we took a look at the GeForce GTX Titan X, NVIDIA’s first product to use their new high-end Maxwell GPU, the GM200. Now, just two days later, the company is back with GM200 again, this time launching it in the Titan X’s professional graphics counterpart, the Quadro M6000.

Like Titan on the GeForce side, the 6000 series is NVIDIA’s flagship Quadro line, and today’s launch sees the new GM200-based Quadro M6000 take its place at the top of the Quadro graphics stack. What makes this launch interesting is that NVIDIA has never launched a flagship Quadro card so close to a flagship GeForce card. Quadro cards usually launch months down the line, not days. The end result is that professional users are getting much earlier access to NVIDIA’s best hardware.

NVIDIA Quadro Specification Comparison

| | M6000 | K6000 | K5200 | 6000 |
|---|---|---|---|---|
| CUDA Cores | 3072 | 2880 | 2304 | 448 |
| Texture Units | 192 | 240 | 192 | 56 |
| ROPs | 96 | 48 | 32 | 48 |
| Core Clock | N/A | 900MHz | 650MHz | 574MHz |
| Boost Clock | ~1140MHz | N/A | N/A | N/A |
| Memory Clock | 6.6GHz GDDR5 | 6GHz GDDR5 | 6GHz GDDR5 | 3GHz GDDR5 |
| Memory Bus Width | 384-bit | 384-bit | 256-bit | 384-bit |
| VRAM | 12GB | 12GB | 8GB | 6GB |
| FP64 | 1/32 FP32 | 1/3 FP32 | 1/3 FP32 | 1/2 FP32 |
| TDP | 250W | 225W | 150W | 204W |
| GPU | GM200 | GK110 | GK110 | GF110 |
| Architecture | Maxwell 2 | Kepler | Kepler | Fermi |
| Transistor Count | 8B | 7.1B | 7.1B | 3B |
| Manufacturing Process | TSMC 28nm | TSMC 28nm | TSMC 28nm | TSMC 40nm |

So just what is Quadro M6000? Packing a fully enabled GPU, this is GM200 at its best. All 3072 CUDA cores are enabled, and with a maximum clockspeed of 1.14GHz the card is capable of pushing 7 TFLOPS of single precision performance. Coupled with this are GM200’s double-sized ROP clusters, giving M6000 96 ROPs and better than 2x the pixel throughput of the outgoing K6000.
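To put a rough number on the "better than 2x" pixel throughput claim, the sketch below multiplies ROP count by clockspeed for both cards, using the figures from the specification table. It assumes one pixel per ROP per clock and that K6000 runs at its 900MHz core clock; the function name is ours, and real-world fill rates will vary with the workload.

```python
def peak_pixel_rate_gpix(rops: int, clock_ghz: float) -> float:
    """Theoretical peak fill rate in Gpixels/s, assuming 1 pixel per ROP per clock."""
    return rops * clock_ghz

m6000_rate = peak_pixel_rate_gpix(96, 1.14)  # M6000 at its top boost state
k6000_rate = peak_pixel_rate_gpix(48, 0.90)  # K6000 at its core clock (assumption)

print(f"M6000: {m6000_rate:.1f} Gpix/s")
print(f"K6000: {k6000_rate:.1f} Gpix/s")
print(f"Ratio: {m6000_rate / k6000_rate:.2f}x")  # about 2.5x
```

Under those assumptions the doubling of ROPs accounts for 2x of the gain, with the higher clockspeed contributing the rest.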

Meanwhile it’s interesting to note that NVIDIA’s GPU Boost technology has finally come to the Quadro lineup via the M6000. The M6000 supports 10 different boost states, the fastest of which is the 1.14GHz state that gives the card its 7 TFLOPS of performance. As with GeForce and Tesla cards, GPU Boost allows NVIDIA to raise their shipping clockspeeds for better performance without violating the card’s cooling or power delivery restrictions.
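The 7 TFLOPS figure can be reproduced from the core count and the top boost state. The sketch below is illustrative only: it assumes the usual convention of counting one fused multiply-add per CUDA core per clock as 2 floating point operations, and the function name is ours rather than anything from NVIDIA.

```python
def peak_fp32_tflops(cuda_cores: int, clock_ghz: float,
                     flops_per_core_per_clock: int = 2) -> float:
    """Theoretical peak FP32 throughput in TFLOPS.

    flops_per_core_per_clock=2 assumes one FMA (multiply + add) per core per clock.
    """
    gflops = cuda_cores * flops_per_core_per_clock * clock_ghz
    return gflops / 1000.0

m6000_tflops = peak_fp32_tflops(3072, 1.14)
print(f"{m6000_tflops:.2f} TFLOPS")  # 7.00 TFLOPS
```

Since this is the rate at the fastest of the boost states, sustained throughput will be lower whenever the card settles into a slower state.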

Paired with the GM200 is 12GB of GDDR5 memory, which is as much as the K6000 and still the most one can pack on a memory bus of this size. M6000 clocks its memory at 6.6GHz, which is good for 317GB/sec of memory bandwidth. Furthermore, as with past high-end Quadro cards ECC protection is available for the memory (and only the memory, no cache), which trades off some memory bandwidth for better protection against memory errors.
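The 317GB/sec figure is simply the bus width multiplied by the effective memory data rate. A minimal sketch of that arithmetic, with a function name of our own choosing:

```python
def memory_bandwidth_gb_per_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bits per transfer x transfers per second / 8."""
    return bus_width_bits * data_rate_gbps / 8

m6000_bw = memory_bandwidth_gb_per_s(384, 6.6)  # 384-bit bus, 6.6GHz effective GDDR5
print(f"{m6000_bw:.1f} GB/s")  # 316.8 GB/s
```

This is the peak rate with ECC disabled; enabling ECC gives up part of it.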

On the overall performance front, Quadro M6000 is expected to offer a significant performance boost over K6000, similar to what we’ve seen on the consumer side with GTX Titan X. Along with the greater clockspeed and the slight increase in the number of CUDA cores, M6000 brings with it the Maxwell 2 family architecture and its efficiency improvements. Actual performance will depend on the application, but gains of 50% or more are possible, especially in exotic scenarios that stress the ROPs. To that end NVIDIA gave Lucasfilm some of the first M6000 cards, and they reported a better-than-expected performance increase.

Along with its architectural efficiency improvements, Maxwell 2 also brings with it a series of feature improvements that make their debut in the Quadro family on the M6000. On the display side, M6000 is the first Quadro capable of driving four 4K displays (previous-generation Quadros were limited to two such displays) thanks to the updated display controller. Meanwhile Quadro also gains the latest NVENC video encoder, which, though unlikely to see much use at this early stage, opens the door to real-time HEVC encoding on Quadro.

As for the card’s construction and power requirements, both have changed compared to K6000. M6000’s TDP is 250W, up from 225W on K6000. The increased TDP allows for higher clockspeeds than the Quadro family’s historically conservative clockspeeds, and it puts the card on par with the consumer GTX Titan X’s power requirements. Interestingly, despite this increase, M6000 only requires a single 8-pin PCIe power connector (located on the far side of the card, as in past Quadro designs). This technically puts the M6000 out of spec on PCIe, since 250W is more than what the slot + 8-pin connector can provide (225W). We asked NVIDIA about this, and they told us that the card pulls the extra power from the 8-pin connector. Though this is not officially in spec, the kind of systems expected to house the M6000 should have no problem delivering the extra amperage.
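The out-of-spec situation is easy to see as a power budget. The sketch below assumes the commonly cited limits of 75W for a PCIe x16 slot and 150W for an 8-pin connector, which together give the 225W mentioned above; the constant and function names are illustrative.

```python
PCIE_SLOT_LIMIT_W = 75    # assumed x16 slot limit
EIGHT_PIN_LIMIT_W = 150   # assumed 8-pin PCIe connector limit
M6000_TDP_W = 250

def power_shortfall_w(tdp_w: int, *source_limits_w: int) -> int:
    """Watts the card may draw beyond the in-spec total of its power sources."""
    return max(0, tdp_w - sum(source_limits_w))

in_spec_budget = PCIE_SLOT_LIMIT_W + EIGHT_PIN_LIMIT_W
shortfall = power_shortfall_w(M6000_TDP_W, PCIE_SLOT_LIMIT_W, EIGHT_PIN_LIMIT_W)
print(f"In-spec budget: {in_spec_budget}W, shortfall at TDP: {shortfall}W")
```

Going by NVIDIA’s explanation, that 25W difference is drawn over the 8-pin connector, which would put it at roughly 175W when the card is running at its full TDP.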

Meanwhile the card’s construction has seen the K6000’s plastic shroud and cooling apparatus replaced with the metal GTX Titan shroud and cooler, similar to the GTX Titan X. This change is largely driven by the power increase, as the GTX Titan cooler is already qualified to handle 250W designs. To set it apart from the GTX Titan X, the M6000 gets a black & green paint job rather than the Titan X’s all-black one. Otherwise the change in coolers has no effect on the card’s dimensions, with the card still being a double-slot 10.5” long card, just like the K6000.

Moving on, while M6000 will be a graphics monster, its use of the GM200 GPU means that it also inherits GM200’s compute capabilities, including the GPU’s highly limited double precision (FP64) performance. On the more recent Quadro 6000 cards, NVIDIA has used GPUs with high FP64 throughput (largely an artifact of also using these GPUs in Tesla compute cards) and left FP64 throughput unrestricted on Quadro cards. This made the Quadro K6000 a sort of jack of all trades, offering NVIDIA’s best pro graphics performance along with their full compute performance.

However GM200 and the Quadro M6000 change that. With Quadro M6000 having a native FP64 rate of 1/32 FP32, M6000 will only have minimal FP64 capabilities. In our GTX Titan X article we discuss the development rationale for this, but NVIDIA has essentially opted to build the best graphics and FP32 compute GPU they can, and not waste space on FP64 resources. Consequently this is the first Quadro 6000 series card in some time to have such poor FP64 performance. However as FP64 compute is not widely used in graphics, this is not something NVIDIA believes will be an issue. In the far more common scenario of FP32 compute (e.g. most ray-tracing engines), M6000 will be far more performant than its predecessors.
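To illustrate how large the FP64 gap is, the sketch below applies each card’s FP64 ratio from the specification table to its theoretical FP32 rate. It assumes 2 FLOPs per CUDA core per clock, the M6000 at its 1.14GHz boost state, and the K6000 at its 900MHz core clock; these are paper numbers rather than measurements, and the function name is ours.

```python
def peak_tflops(cuda_cores: int, clock_ghz: float, fp64_ratio: float = 1.0) -> float:
    """Theoretical peak TFLOPS, scaled by the FP64:FP32 ratio when one is given."""
    fp32_tflops = cuda_cores * 2 * clock_ghz / 1000.0  # 2 FLOPs per core per clock (FMA)
    return fp32_tflops * fp64_ratio

m6000_fp64 = peak_tflops(3072, 1.14, fp64_ratio=1 / 32)  # roughly 0.22 TFLOPS
k6000_fp64 = peak_tflops(2880, 0.90, fp64_ratio=1 / 3)   # roughly 1.73 TFLOPS

print(f"M6000 FP64: {m6000_fp64:.2f} TFLOPS")
print(f"K6000 FP64: {k6000_fp64:.2f} TFLOPS")
```

On paper, then, the older K6000 retains several times the double precision throughput of its successor, even though the M6000 is well ahead in FP32.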

Finally, as far as use cases go, NVIDIA is aiming the M6000 at a cross-section of possible markets. There is of course the traditional pro visualization market, the high-end of which is always in need of greater GPU performance, something the M6000 can provide in spades. However the company is also pushing the use of Physically Based Rendering (PBR), a compute-intensive rendering approach that uses far more accurate algorithms to model the physical characteristics of a material, in essence properly capturing how light will interact with that material and reflect off of it rather than using a rough approximation. We’ll have more on PBR a bit later this week when we talk about Quadro developments at GDC.

Wrapping things up, NVIDIA tells us that Quadro M6000 will be available soon in complete systems through the company’s regular OEM partners, and as individual cards via the typical retail channels. As is customary for NVIDIA, they have not announced a launch price for the M6000, but we would expect to see it launch at $5000+, as has been the case with past Quadro 6000 series cards.

Quadro VCA (2015)

Meanwhile with the launch of the Quadro M6000, NVIDIA is also using this opportunity to refresh their Iray Visual Computing Appliance (VCA), the company’s high-end network-attached render server. The VCA specializes in very high performance remote rendering jobs, packing multiple GPUs into a single server box, with further scale-out capabilities to multiple VCA boxes via 10GigE and InfiniBand.

Now dubbed the Quadro VCA, this updated VCA packs in 8 of NVIDIA’s high-end Quadro cards. The cards themselves are GM200 based but are technically not M6000 (NVIDIA is quick to note that they have a different BIOS that has them clocked slightly differently), though they should perform similarly to the aforementioned M6000. These cards have 12GB per GPU and are fully enabled, giving the entire VCA some 96GB of VRAM and 24,576 CUDA cores.
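The aggregate figures follow directly from the per-card specifications. As a quick sanity check (variable names are ours):

```python
GPUS_PER_VCA = 8
VRAM_PER_GPU_GB = 12
CUDA_CORES_PER_GPU = 3072  # fully enabled GM200

total_vram_gb = GPUS_PER_VCA * VRAM_PER_GPU_GB
total_cuda_cores = GPUS_PER_VCA * CUDA_CORES_PER_GPU

print(f"{total_vram_gb}GB of VRAM, {total_cuda_cores:,} CUDA cores")
# 96GB of VRAM, 24,576 CUDA cores
```

Note that the 96GB is a sum across eight separate 12GB pools, one per GPU.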

Driving the Quadro cards will be a pair of 10-core Xeon processors (we don’t have the specific model at this time, but believe it to be from the Xeon E5 V3 family), 256GB of system memory, and 2TB of solid state storage. Other than the change in processors and the updated Quadro cards, the rest of these specs are identical to the previous generation VCA.

On the software side, the new Quadro VCA runs CentOS 6.6. It will also come with Iray 2015 and Chaos’s V-Ray RT pre-installed to make setup easier. However, it should be noted that the VCA does not include licenses for those software packages, which must be purchased separately.

The Quadro VCA will be available soon through NVIDIA's VCA partners for $50,000.

There is a difference in channel width when it comes to desktops, servers and DIMMs: 72-bit vs. 64-bit. There is also an extra memory chip or two that holds the ECC information on each DIMM, though they're ordinary memory chips. Oddly, this extra memory is not marketed as part of memory capacity, as it is not directly addressable for program usage (a 16 GB ECC DIMM actually contains 18 GB of memory). This extra hardware also means that ECC calculations do not significantly impact performance.

nVidia didn't widen the memory bus, so enabling ECC on the M6000 will reduce usable memory as well as decrease performance. That 12 GB of memory drops to 10.5 GB usable with ECC enabled. Due to how wide the bus is and how the ECC algorithm works, expect a 20% to 25% drop in memory-bound tests.
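The capacity figures quoted in these two comments are consistent with each other. The sketch below reproduces them under two assumptions: that an ECC DIMM stores 72 bits for every 64 bits of data, and that the M6000 sets aside one-eighth of its VRAM for ECC data, which is inferred from the 12 GB to 10.5 GB drop rather than confirmed by NVIDIA. Function names are illustrative.

```python
def ecc_dimm_raw_gb(addressable_gb: float, data_bits: int = 64, total_bits: int = 72) -> float:
    """Physical capacity of an ECC DIMM that stores total_bits per data_bits of data."""
    return addressable_gb * total_bits / data_bits

def ecc_usable_vram_gb(total_gb: float, reserved_fraction: float = 1 / 8) -> float:
    """Usable VRAM when a fraction of it is reserved for ECC data (assumed 1/8)."""
    return total_gb * (1 - reserved_fraction)

print(ecc_dimm_raw_gb(16))     # 18.0
print(ecc_usable_vram_gb(12))  # 10.5
```

The difference between the two cases is where the check bits live: on extra chips alongside the data for a DIMM, versus carved out of the same 12 GB on the graphics card.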

I think the cost difference is in the amount of time and testing that goes into the component durability and compatibility to make this a workstation grade product, instead of a consumer grade one.

Three very important differences: ECC support, binning, and driver optimizations and validations for a wide range of content creation and productivity software. Can't overstate the importance of the last point, particularly in my line of work. Drivers can make a huge difference in terms of viewport performance and rendering accuracy in certain 3D packages.