I am new to computer science and I was wondering whether half precision is supported by modern architectures in the same way that single and double precision are. I thought the 2008 revision of the IEEE-754 standard introduced both quadruple and half precision.

It would be worth clarifying whether you just mean CPUs, or both CPUs and GPUs.
– Max Barraclough, May 21 at 9:30

Thanks for your comment, Max Barraclough. I did not know that you could alter the precision in just the graphics processing unit (GPU); I thought it had to be done in both. Your comment is very helpful.
– Asad Mehasi, May 21 at 15:57

4 Answers

Intel support for IEEE float16 storage format

Intel has supported IEEE half precision as a storage format in its processors since Ivy Bridge (2012), via the F16C extension. "Storage format" means you get the memory/cache capacity and bandwidth advantages, but the computation is done in single precision after converting from (and back to) the IEEE half-precision format.
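As a minimal host-side sketch of that storage-only pattern (assuming an F16C-capable CPU; compile with -mf16c on GCC/Clang; the function and array names are just illustrative):

    #include <immintrin.h>
    #include <cstdint>

    // Scale 8 half-precision values: load as FP16, compute in FP32, store as FP16.
    // F16C provides only the conversions; the multiply itself is an ordinary FP32 op.
    void scale_halves(uint16_t *data /* 8 IEEE binary16 values */, float factor) {
        __m128i raw  = _mm_loadu_si128((const __m128i *)data); // 8 x 16-bit halves
        __m256  vals = _mm256_cvtph_ps(raw);                   // VCVTPH2PS: half -> float
        vals = _mm256_mul_ps(vals, _mm256_set1_ps(factor));    // arithmetic in FP32
        raw  = _mm256_cvtps_ph(vals, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
        _mm_storeu_si128((__m128i *)data, raw);                // VCVTPS2PH: float -> half
    }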

The first portion of this answer is a bit misleading. While Ivy Bridge did indeed introduce support for F16C, and that does technically mean that half-precision floats are "supported", the only instructions provided are those to convert to and from half-precision floats. No operations can be done on them whatsoever, so that hardly meets the requirement in the question: "I was wondering whether half precision is supported by modern architecture in the same way as single or double precision is." It is merely a storage format. You're trading ALU conversion cycles for memory bandwidth/footprint.
– Cody Gray, May 21 at 5:25


I literally said it’s a storage type, all the savings comes from narrower data, and compute is done with single precision. It’s not misleading in the slightest. I said what support was present, which is half of the support possible.
– Jeff, May 21 at 13:30

Thank you all. I actually found each of these comments helpful. I originally asked the question after listening to others argue about this exact thing.
– Asad Mehasi, May 21 at 16:02

In my opinion, not very uniformly. Low-precision arithmetic seems to have gained some traction in machine learning, but there are varying definitions of what people mean by "low precision". There is the IEEE-754 half (10-bit mantissa, 5-bit exponent, 1-bit sign), but also bfloat16 (7-bit mantissa, 8-bit exponent, 1-bit sign), which favors dynamic range over precision, and a variety of other formats (NVIDIA's 19-bit TensorFloat, AMD's fp24, maybe more?). Most of this stuff is running on special-purpose GPGPU-type hardware.
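For concreteness, here is a small sketch laying out the field widths of the formats mentioned above, plus a truncation-based float-to-bfloat16 conversion that follows from bfloat16 sharing the top 16 bits of an IEEE binary32 layout (a proper conversion would also round; this is purely illustrative):

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Field widths (sign + exponent + mantissa) of the formats discussed above.
    struct Format { const char *name; int sign, exp, mant; };

    int main() {
        const Format fmts[] = {
            {"IEEE binary16 (half)",  1, 5, 10},
            {"bfloat16",              1, 8,  7},
            {"NVIDIA TF32",           1, 8, 10},  // 19 bits used within a 32-bit register
            {"IEEE binary32 (float)", 1, 8, 23},
        };
        for (const Format &f : fmts)
            std::printf("%-22s %d+%d+%d bits\n", f.name, f.sign, f.exp, f.mant);

        // bfloat16 shares the binary32 sign and exponent fields, so keeping the
        // high 16 bits of a float is already a (truncating) conversion.
        float x = 3.14159f;
        uint32_t bits;
        std::memcpy(&bits, &x, sizeof bits);
        uint16_t bf16 = (uint16_t)(bits >> 16);
        std::printf("bfloat16(%.5f) ~ 0x%04x\n", x, bf16);
        return 0;
    }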

The accepted answer provides an overview. I'll add a few more details about support in NVIDIA processors. The support I'm describing here is 16-bit, IEEE 754 compliant floating-point arithmetic support, including add, multiply, multiply-add, and conversion to/from other formats.

Maxwell (circa 2015)

The earliest IEEE 754 FP16 ("binary16" or "half precision") support came in cc (compute capability) 5.3 devices, which were in the Maxwell generation, but this compute capability was implemented only in the Tegra TX1 processor (an SoC, e.g. Jetson).

Pascal (circa 2016)

Pascal family members have either "full rate" (cc 6.0, 6.2) or "low rate" (cc 6.1) FP16 throughput. cc 6.2 was again a Tegra family product, the TX2. cc 6.0 and 6.1 found use in a variety of processors across product families such as GeForce, Quadro, and Tesla. "Full rate" here refers to a rate equivalent to twice the IEEE 754 FP32 ("binary32" or "single precision") rate for the processor in question, achieved when operations are done using the half2 data type (two half quantities handled in the same register and instruction).
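As a minimal CUDA sketch of the half2 pattern just described (requires cuda_fp16.h and a cc 5.3+ device; the kernel and variable names are just illustrative):

    #include <cuda_fp16.h>

    // y[i] = a*x[i] + y[i] on packed pairs: each __half2 holds two FP16 values,
    // so one __hfma2 instruction performs two fused multiply-adds. This packed
    // form is what the "full rate" (2x FP32) throughput figures assume.
    __global__ void haxpy(int n2, __half2 a, const __half2 *x, __half2 *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n2)                        // n2 = number of half2 pairs
            y[i] = __hfma2(a, x[i], y[i]);
    }

A caller would build the scalar with something like __float2half2_rn(2.0f), which replicates one float into both halves of a __half2.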

Volta, Turing (2017, 2018)

Volta and Turing family members (cc 7.x) support FP16 at "full rate", and in addition use the format in TensorCore operations.
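For the TensorCore path, a hedged sketch using the CUDA WMMA API (mma.h, cc 7.0+), in which one warp multiplies a pair of 16x16 FP16 tiles and accumulates in FP32 (pointers are assumed to address 16x16 row-major matrices; this is a single-tile illustration, not a full GEMM):

    #include <mma.h>
    #include <cuda_fp16.h>
    using namespace nvcuda;

    // Launch with (at least) one full warp of 32 threads; the warp cooperatively
    // owns the fragments and issues the TensorCore multiply-accumulate.
    __global__ void tile_mma(const half *A, const half *B, float *C) {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;
        wmma::fill_fragment(acc, 0.0f);
        wmma::load_matrix_sync(a, A, 16);   // leading dimension = 16
        wmma::load_matrix_sync(b, B, 16);
        wmma::mma_sync(acc, a, b, acc);     // FP16 multiply, FP32 accumulate
        wmma::store_matrix_sync(C, acc, 16, wmma::mem_row_major);
    }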

Ampere (May, 2020)

The recently announced Ampere architecture A100 GPU also supports FP16 in a fashion similar to Volta and Turing, and introduces an additional TF32 datatype: a format whose mantissa is the same size (number of bits) as an FP16 mantissa and whose exponent is the same size as an FP32 exponent. bfloat16 capability was also announced for Ampere.

Apart from the recently announced Ampere architecture processor, support and throughputs for 16-bit floating-point operations (and other operations) across compute capabilities (and therefore architectures) can be found in table 3 of the CUDA programming guide. The throughputs are per clock cycle, per multiprocessor, so they need to be scaled by the multiprocessor count and clock rate of the GPU in question. These throughputs are not for TensorCore operations, and the peak throughputs are generally only achievable when processing half2 datatypes (two half quantities packed together in a single 32-bit word).
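To make that scaling concrete, a back-of-the-envelope sketch (the per-clock figure and GPU specifications below are placeholders; substitute the table 3 entry and the data sheet values for your actual device):

    #include <cstdio>

    // Peak FP16 FLOPS ~= (results/clock/SM) * (2 FLOPs per FMA result)
    //                    * (SM count) * (clock rate). All numbers hypothetical.
    int main() {
        double results_per_clk_sm = 128;    // table 3 entry for the compute capability (example)
        int    sm_count           = 80;     // multiprocessor count (example)
        double clock_hz           = 1.5e9;  // boost clock (example)
        double flops = results_per_clk_sm * 2.0 * sm_count * clock_hz;
        std::printf("peak FP16: %.1f TFLOPS\n", flops / 1e12);  // ~30.7 with these numbers
        return 0;
    }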

Note that the C in F16C stands for Conversion; it only helps with memory bandwidth / cache footprint, not SIMD ALU throughput. You have to convert to float to do any math, so the number of elements per SIMD vector for FMA operations isn't improved.
– Peter Cordes, May 22 at 15:59