Nvidia Announces Tesla T4 GPUs With Turing Architecture

Nvidia CEO Jensen Huang took to the stage at GTC Japan to announce the company's latest advancements in AI, which include the new Tesla T4 GPU. This new GPU, which Nvidia designed for inference workloads in the data center, leverages the same Turing microarchitecture as Nvidia's forthcoming GeForce RTX 20-series gaming graphics cards.

But the Tesla T4 is a unique graphics card designed specifically for AI inference workloads, like powering the neural networks that process video, speech, search queries, and images. Nvidia's previous-gen Tesla P4 fulfilled this role.

Precision      | Nvidia Tesla T4 | Nvidia Tesla P4
FP16 (TFLOPS)  | 65              | 5.5
INT8 (TOPS)    | 130             | 22
INT4 (TOPS)    | 260             | -

The Tesla T4 GPU comes bristling with 16GB of GDDR6, 320 Turing Tensor Cores, and 2,560 CUDA cores. The GPU supports mixed-precision computing, including FP32, FP16, INT8, and INT4 (performance figures above). The low-profile 75W card slots into a standard PCIe slot in servers and doesn't require an external power connector, such as a 6-pin plug. Nvidia tells us that the die does feature RT Cores, just like the desktop models, but that they will only be useful for ray tracing or VDI (Virtual Desktop Infrastructure), which implies they will sit unused for most inference workloads.
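
To make the precision modes concrete, here is a minimal PyTorch sketch of an FP16 forward pass. It is illustrative only; the model choice (ResNet-50), batch size, and input shape are arbitrary placeholders rather than anything Nvidia specifies.

    import torch
    import torchvision.models as models

    # Illustrative only: cast a stock model and its inputs to FP16 and run a
    # forward pass on the GPU. Reduced-precision matrix math like this is what
    # Turing's Tensor Cores accelerate. The model and batch size are
    # arbitrary placeholders, not anything Nvidia specifies.
    model = models.resnet50(pretrained=True).eval().cuda().half()
    batch = torch.randn(8, 3, 224, 224, device="cuda", dtype=torch.float16)

    with torch.no_grad():
        logits = model(batch)

    print(logits.shape)  # torch.Size([8, 1000])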

The Tesla T4 also features INT4 and (experimental) INT1 precision modes, a notable advancement over its predecessor.

As expected, the card supports all the major deep learning frameworks, such as PyTorch, TensorFlow, MXNet, and Caffe2. Nvidia also offers TensorRT 5, a new version of Nvidia's deep learning inference optimizer and runtime engine that supports Turing Tensor Cores and multi-precision workloads.
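
The integer modes are typically reached by quantizing a trained FP32 model before it is deployed. As a rough sketch of that idea, here is PyTorch's built-in dynamic quantization; note that this runs on the CPU and is not TensorRT's calibration flow, which handles INT8 on the GPU itself.

    import torch
    import torch.nn as nn

    # Toy FP32 network standing in for a real trained model.
    model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10)).eval()

    # Quantize the Linear layers' weights to INT8. This is PyTorch's CPU-side
    # dynamic quantization, shown only to illustrate the INT8 idea; TensorRT
    # performs its own GPU-side calibration and kernel selection instead.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    with torch.no_grad():
        out = quantized(torch.randn(4, 256))

    print(out.shape)  # torch.Size([4, 10])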

bit_user

1920539 said:

This new GPU, which Nvidia designed for inference workloads in hyperscale data centers, leverages the same Turing microarchitecture as Nvidia's forthcoming GeForce RTX 20-series gaming graphics cards.

Indeed, TU104 is the same silicon used in the RTX 2070 and RTX 2080. All they did was down-clock and scale it back to fit a 75 W power envelope. It's then fitted with double the RAM (ECC, too), a passive heatsink, and a price tag of several thousand dollars.

1920539 said:

the Tesla T4 is a unique graphics card designed specifically for AI inference workloads

It's a stretch to call it a graphics card. While it can do desktop virtualization, note the lack of any display outputs.

1920539 said:

Intel claims that most of the world's inference workloads run on Xeon processors

This seems like wishful thinking.

1920539 said:

it will likely be several years before the clear winners become apparent.

Looks like a new 2050 with flashed firmware, acting like a pro GPU and sold at a 3x premium because why not.

jimmysmitty

328798 said:

1920539 said:

Intel claims that most of the world's inference workloads run on Xeon processors

This seems like wishful thinking.

I wouldn't be surprised, actually, since there are not only server CPUs but also the Knights-series Xeon Phi parts, which are pretty much what Tesla competes with, although Intel is said not to be continuing those in the future.

328798 said:

2794804 said:

why does this type of card not have a cooler?

It's only 75W and will live in a server, so the linear airflow will keep it cool. Servers are like tornadoes inside, usually at least 200 LFM. I'll add something to the article to explain that.

Eh, it doesn't really have anything to do with being only 75 W, as their 250 W Tesla V100 PCIe cards are also passively cooled.

The cards have no fans, but they are designed to go into a server rack, which has fans spinning at full speed pushing air through the heatsinks and out the back. Even right now, with the door closed, I can hear my servers spinning in my office.

The design is probably due to the way servers are built. I doubt you could throw a V100 into a mid-tower or full-tower case and run it the way you would in a server chassis without running into thermal problems.

368223 said:

Looks like a new 2050 with flashed firmware, acting like a pro GPU and sold at a 3x premium because why not.

Except it has more VRAM that's also ECC, and the drivers are much more refined for what they do. Most GPUs start off as HPC-based chips that slowly trickle down to the consumer end after being cut down.

It's the same with CPUs. Most CPUs have a server variant that costs quite a bit more than the desktop counterpart does.

Scott_123

The 16GB DDR4 recommendation is terrible. On Newegg you can get better RAM for $30 less!

The cards have no fans, but they are designed to go into a server rack, which has fans spinning at full speed pushing air through the heatsinks and out the back. Even right now, with the door closed, I can hear my servers spinning in my office.
The design is probably due to the way servers are built. I doubt you could throw a V100 into a mid-tower or full-tower case and run it the way you would in a server chassis without running into thermal problems.

Yeah, I think that's what Paul was saying, and I agree. Nvidia specifies how much CFM (or m^3/s) is required for their passively-cooled Tesla cards.

It's not just Nvidia, either. AMD makes passively-cooled server cards, as did Intel when it offered Xeon Phi on a PCIe card.

149725 said:

Most GPUs start off as HPC-based chips that slowly trickle down to the consumer end after being cut down.

That's twisting it, somewhat. I don't think it's really true to say it's a server chip before gaming, or vice versa. The past few generations have had the consumer cards released (or, in this case, simply announced) first. But Nvidia obviously collects requirements for each new chip. Some of those are for server applications, while others are for gaming and workstation uses. Then, all their chips (except for GP100 and GV100) are built to fill niches in all of these markets and sold on the appropriate vehicle (Tesla, for server; Quadro, for workstation; GeForce for consumer).

149725 said:

It's the same with CPUs. Most CPUs have a server variant that costs quite a bit more than the desktop counterpart does.

No, not in the same way as Nvidia is doing with GPUs. Intel's actual server chips are LGA 3647 and use different silicon than their workstation or desktop chips. AMD happened to use the same Zeppelin die in first-gen Ryzen, Threadripper, and Epyc. But that's a first for them, and I'm not sure whether the dies from 7 nm Epyc will trickle down to desktop or whether they are going to bifurcate their silicon.

bit_user

368223 said:

Looks like a new 2050 with flashed firmware, acting like a pro GPU and sold at a 3x premium because why not.

Don't be fooled by the size. As I said above, it uses the same chip as the RTX 2070 and RTX 2080, but with double the RAM.

Nvidia did the same thing with the P4, which was also a passively-cooled, low-profile card with the GP104 chip used on the GTX 1070 and GTX 1080.

Probably one of the ways they squeeze it onto such a small board is that the VRM needed for 75 W is just a lot smaller than what the desktop versions require. Not having a fan should also save a little area, since you don't have the fan header & controller, plus perhaps some accommodations for the airflow, etc.

jimmysmitty

328798 said:

149725 said:

The cards have no fans, but they are designed to go into a server rack, which has fans spinning at full speed pushing air through the heatsinks and out the back. Even right now, with the door closed, I can hear my servers spinning in my office.
The design is probably due to the way servers are built. I doubt you could throw a V100 into a mid-tower or full-tower case and run it the way you would in a server chassis without running into thermal problems.

Yeah, I think that's what Paul was saying, and I agree. Nvidia specifies how much CFM (or m^3/s) is required for their passively-cooled Tesla cards.
It's not just Nvidia, either. AMD makes passively-cooled server cards, as did Intel when it offered Xeon Phi on a PCIe card.

149725 said:

Most GPUs start off as HPC-based chips that slowly trickle down to the consumer end after being cut down.

That's twisting it, somewhat. I don't think it's really true to say it's a server chip before gaming, or vice versa. The past few generations have had the consumer cards released (or, in this case, simply announced) first. But Nvidia obviously collects requirements for each new chip. Some of those are for server applications, while others are for gaming and workstation uses. Then, all their chips (except for GP100 and GV100) are built to fill niches in all of these markets and sold on the appropriate vehicle (Tesla, for server; Quadro, for workstation; GeForce for consumer).

149725 said:

It's the same with CPUs. Most CPUs have a server variant that costs quite a bit more than the desktop counterpart does.

No, not in the same way as Nvidia is doing with GPUs. Intel's actual server chips are LGA 3647 and use different silicon than their workstation or desktop chips. AMD happened to use the same Zeppelin die in first-gen Ryzen, Threadripper, and Epyc. But that's a first for them, and I'm not sure whether the dies from 7 nm Epyc will trickle down to desktop or whether they are going to bifurcate their silicon.

Intel does have server CPU variants; the one you specified is HPC, though. My point is that technologies and ideas typically go to HPC/server first, where the money will be spent, and then come to consumers. Turing, for example, has some ideas from Volta that consumers never saw.