Effectively Utilizing 3DNow! in Linux

A description of this new technology and its impact on machine performance.

In 1998, AMD (Advanced Micro Devices)
released a new family of x86 CPUs that included 3DNow! capability.
3DNow! is designed to deliver enhanced performance for certain
multimedia and floating-point operations. Other x86 clone CPU
manufacturers, such as Cyrix and IDT (Integrated Device Technology,
Inc.), also initially pledged to support 3DNow! in forthcoming
CPUs. Currently, 3DNow! support is provided by IDT's most recent
generation of processors (WinChip 2) as well as by AMD's K6-2, K6-3
and Athlon (K7) families of processors.

In this article, we'll describe the 3DNow! technology
(especially how it impacts performance on the popular K6-2 and K6-3
CPUs) and show how to detect and take advantage of 3DNow! under
Linux. 3DNow! is an exciting development; using it effectively can
unlock outstanding performance from AMD and IDT processors.

What is 3DNow!?

3DNow! builds on the Intel MMX (multimedia extensions to x86)
capability. Ariel Ortiz Ramirez described MMX and how to utilize it
with Linux in issue 61 of Linux Journal, so we
won't go into much detail here about MMX. Briefly stated, MMX adds
eight 64-bit “multimedia” registers (MM0 through MM7), and 57
instructions that operate on those registers, to the x86 platform.
Multiple short integers can be stored (packed) into each multimedia
register, and the MMX instructions allow parallel computations on
these packed integers. While MMX is restricted to operations on
integers, 3DNow! extends the use of the multimedia registers by
allowing two single-precision floating-point numbers to be stored
(packed) into each of them. The 3DNow! instruction set includes 21
new operations on the multimedia registers. The majority of these
instructions provide fast, pipelined, packed single-precision
floating-point computation.

3DNow! capability is well-suited for fast calculation of
common graphics operations such as clipping, lighting and 3-D
transformations, as well as special effects involving application
of physical models (e.g., fog, cloud and gravity effects). However,
any application with a fair amount of floating-point computations
can benefit from use of 3DNow! When used effectively, 3DNow! can
increase the floating-point throughput of an application by a
factor of two to four (or even more for some special-purpose
applications). The increased performance results because each
3DNow! operation produces two results (packed into a single
multimedia register), whereas a standard floating-point operation
in the floating-point unit (FPU) produces only one result.

Furthermore, in the AMD K6-2 and K6-3, the MMX and 3DNow!
operations have access to dual pipelined execution units, enabling
up to two 3DNow! operations to execute simultaneously. Thus, up to
four results can be computed per processor clock cycle on the K6-2
and K6-3. By comparison, the Pentium II produces at most one
floating-point result per clock cycle; thus, a PII/450 has a peak
performance of 450 MFLOPS (million floating-point operations per
second), while a K6-2/450 has a peak performance of 1800 MFLOPS.
The standard floating-point computations on the AMD K6-2 and K6-3
are not pipelined, which means there is a delay of two or more
clock cycles between successive standard floating-point results.
Using 3DNow! instructions can therefore greatly increase a
program's floating-point throughput. For computers equipped with an
AMD K6-2, K6-3 or IDT WinChip 2, peak floating-point performance is
possible only for programs that contain 3DNow! instructions.

Getting Started

Unfortunately, few compilers can generate 3DNow! instructions
for compiled code. Thus, to exercise the 3DNow! capability in
programs written in high-level languages such as C/C++, FORTRAN or
Pascal, it's necessary to include explicit assembly code that
contains 3DNow! operations. This is not difficult to do, and we will
demonstrate how to use 3DNow! in C/C++ programs in Linux.

One way to determine whether a given machine supports 3DNow!
is to download and run an application that identifies the processor
and checks for 3DNow! capability. AMD has an application of this
type that can be downloaded from its corporate web site. A more
practical way to determine from within a program whether
the host CPU supports 3DNow! is to use the
CPUID instruction, which returns
information on processor features and is supported by all recent
members of the x86 family. If a program determines that 3DNow!
support is present, it can execute the sections of code that use
3DNow! Specifically, 3DNow! support can be determined by executing
CPUID with the EAX register set to 8000_0001h, which selects an
extended function. The instruction then sets flag bits in the EDX
register according to the CPU's level of multimedia support. Bit 31
of EDX is set to 1 if the CPU supports 3DNow! If bit 30 is also set
to 1, the CPU supports the enhanced extensions to 3DNow! available
in the new AMD Athlon processor.

Some assemblers include support for 3DNow! instructions;
assembly language modules that include 3DNow! instructions will be
assembled without difficulty by such assemblers. However, many
assemblers do not include direct support for 3DNow! In many cases,
it is still possible to use 3DNow! instructions with those
assemblers, although it will be necessary to define the
instructions as pseudo-instructions using data blocks or emits.
Fortunately, AMD's web site has a C++ header file that contains
macro definitions for the 3DNow! instruction set. Including this
header file makes it possible to embed 3DNow! assembly code within
higher-level language programs. These macros specify the
hexadecimal encoding of the 3DNow! instructions using the
emit pseudo-instruction; the
header file may need to be modified for certain compilers, as not
all of them support emit. Under Linux, we used the freely available
Netwide Assembler (NASM) to assemble code. NASM allows
pseudo-instruction macros to be built using the
db pseudo-instruction. We have created a
header file that defines the 3DNow! instructions using
db. This header file is
available for download from
http://merlin.cs.uah.edu/visgig/threednow/.
However, NASM versions 0.98 and later support 3DNow! directly, so
the header file is needed only with older versions. Incidentally, we
found that NASM 0.97 doesn't allow MM2, MM3, MM6 or MM7 to be
result registers for 3DNow! operations. NASM 0.98 has no such
problem.
