New Intrinsic Support in Visual Studio 2008

Hello. This is Dylan Birtolo, a UE writer for Visual C++. This is my first vcblog entry, but hopefully I will become a regular contributor. One of my most recent tasks was to incorporate the documentation for all of the new intrinsic functions that are being added to Visual C++ in Visual Studio 2008. It is very exciting, since support for over 100 intrinsics was added.

Before getting to the intrinsics themselves, it is worth explaining why you should prefer intrinsics even when you could use inline assembly (inline asm) to access the instructions directly. Here are some reasons to consider using the intrinsics:

Inline asm is not supported by Visual C++ when you compile for 64-bit targets. Therefore, if you want your code to be 64-bit compatible, you need to use intrinsics.

Ease of use. The intrinsics do not require you to be aware of registers or manage memory directly. Instead, you have a function that is complete with inputs and return values. This makes the instructions accessible to developers with a wider range of technical expertise.

The intrinsics are updated in the compiler. From a user perspective, this means that if the compiler improves how it handles the intrinsics, you receive the benefit as soon as you recompile. If you are using inline asm, you are responsible for making any improvements yourself.

The optimizer does not work well with inline asm code, so the usual recommendation is to write assembly code as its own function, assemble it separately, and link it in. With the intrinsics, those additional steps are not necessary.

Intrinsics are also more portable than code that uses inline asm.
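To make the ease-of-use point concrete, here is a minimal sketch of what calling an intrinsic looks like. The helper name add4 is hypothetical and chosen only for illustration. It uses the long-standing SSE intrinsics _mm_loadu_ps, _mm_add_ps, and _mm_storeu_ps rather than any of the new ones, so it should build for both 32-bit and 64-bit targets.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps

// Hypothetical helper (not from the Visual Studio documentation):
// adds four floats from a and b element by element and writes the sums to out.
void add4(const float* a, const float* b, float* out) {
    assert(a && b && out);             // precondition: all pointers are valid
    __m128 va = _mm_loadu_ps(a);       // load four floats, no alignment required
    __m128 vb = _mm_loadu_ps(b);
    __m128 sum = _mm_add_ps(va, vb);   // one packed add (the ADDPS instruction)
    _mm_storeu_ps(out, sum);           // store the four sums
}
```

Nothing in the function names a register, and the compiler remains free to inline and schedule the call like any other code, which is the contrast with an inline asm block that the list above describes.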

Now let's get back to the new intrinsics. For the most part, these functions provide support for the Supplemental Streaming SIMD Extensions 3 (SSSE3), Streaming SIMD Extensions 4.1 (SSE4.1), SSE4.2, and SSE4A instruction sets. A handful of intrinsics were also added to support advanced bit manipulation (ABM) instructions not available on earlier chipsets. The SSE4.1 and SSE4.2 instructions first appear in the Penryn and Nehalem architectures from Intel, and the SSE4A and ABM instructions first appear in the Third-Generation AMD Opteron processors. However, regardless of your processor, you should always verify that the instruction behind a given intrinsic is supported before you attempt to use it. Executing an unsupported instruction results in a run-time error (an illegal instruction exception).

To facilitate this verification process, the __cpuid topic has been updated. The latest copy of the documentation for Visual Studio 2008 contains a sample program in the topic __cpuid that you can copy, compile, and use. The sample is up to date and prints in plain text which technologies your processor supports.
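For readers who want a feel for what such a check involves, here is a rough sketch. It is not the sample from the __cpuid topic. The names SimdSupport, has_bit, decode_features, run_cpuid, and detect_simd_support are hypothetical, and the non-Microsoft branch that uses __get_cpuid from <cpuid.h> is my own addition so the sketch also builds with GCC and Clang. The bit positions are the ones published in the Intel and AMD CPUID documentation.

```cpp
#include <cassert>
#if defined(_MSC_VER)
#include <intrin.h>   // __cpuid
#else
#include <cpuid.h>    // __get_cpuid (GCC/Clang; an assumption for portability)
#endif

// Hypothetical summary of the feature flags discussed in this post.
struct SimdSupport {
    bool ssse3, sse41, sse42, popcnt, abm, sse4a;
};

// Returns true if bit number `bit` of `reg` is set.
inline bool has_bit(unsigned int reg, unsigned int bit) {
    assert(bit < 32);
    return ((reg >> bit) & 1u) != 0;
}

// Decodes ECX from CPUID leaf 1 and ECX from extended leaf 0x80000001.
inline SimdSupport decode_features(unsigned int leaf1_ecx, unsigned int ext1_ecx) {
    SimdSupport s;
    s.ssse3  = has_bit(leaf1_ecx, 9);   // SSSE3
    s.sse41  = has_bit(leaf1_ecx, 19);  // SSE4.1
    s.sse42  = has_bit(leaf1_ecx, 20);  // SSE4.2
    s.popcnt = has_bit(leaf1_ecx, 23);  // POPCNT
    s.abm    = has_bit(ext1_ecx, 5);    // ABM (LZCNT)
    s.sse4a  = has_bit(ext1_ecx, 6);    // SSE4A
    return s;
}

// Runs CPUID for `leaf` and stores EAX, EBX, ECX, EDX in regs[0..3].
// Returns false, leaving regs untouched, if the processor does not report the leaf.
inline bool run_cpuid(unsigned int leaf, unsigned int regs[4]) {
#if defined(_MSC_VER)
    int info[4];
    __cpuid(info, static_cast<int>(leaf & 0x80000000u));  // highest leaf in this range
    if (static_cast<unsigned int>(info[0]) < leaf) return false;
    __cpuid(info, static_cast<int>(leaf));
    for (int i = 0; i < 4; ++i) regs[i] = static_cast<unsigned int>(info[i]);
    return true;
#else
    unsigned int a = 0, b = 0, c = 0, d = 0;
    if (!__get_cpuid(leaf, &a, &b, &c, &d)) return false;
    regs[0] = a; regs[1] = b; regs[2] = c; regs[3] = d;
    return true;
#endif
}

// Queries the current processor; unsupported leaves decode as "no features".
inline SimdSupport detect_simd_support() {
    unsigned int std_regs[4] = {0, 0, 0, 0};
    unsigned int ext_regs[4] = {0, 0, 0, 0};
    run_cpuid(1u, std_regs);
    run_cpuid(0x80000001u, ext_regs);
    return decode_features(std_regs[2], ext_regs[2]);
}
```

A caller would typically run detect_simd_support() once at startup and fall back to a plain C++ code path whenever the flag for the needed instruction set is false.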

All of the intrinsics are straightforward and have documentation as well as code samples. Take a look. Tables for the new intrinsics can be found in the following three topics: SSE4A and Advanced Bit Manipulation Intrinsics, Streaming SIMD Extensions 4 Instructions, and Supplemental Streaming SIMD Extensions 3 Instructions.

Here is a list of the new intrinsics, organized by the instruction they support. Several of the instructions are very similar and differ only in the size of their input parameters. To save space, these instructions are listed together. The one unusual case is POPCNT, which is listed under both SSE4.2 and ABM so that the intrinsics are compatible with both the AMD and Intel compilers.

SSE

CVTSI2SS - Converts a 64-bit signed integer to a 32-bit floating point value and inserts it into the lowest element of a 128-bit parameter. Intrinsic: _mm_cvtsi64_ss

CVTSS2SI - Extracts the lowest 32-bit floating point value and rounds it to a 64-bit integer. Intrinsic: _mm_cvtss_si64

CVTTSS2SI - Extracts the lowest 32-bit floating point value and truncates it to a 64-bit integer. Intrinsic: _mm_cvttss_si64
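The difference between rounding and truncating is easiest to see side by side. The following is a small sketch, assuming an x64 build (these 64-bit forms are not available for 32-bit targets) and the default rounding mode, which is round to nearest. The wrapper and demo function names are hypothetical, and the asserts state the results I would expect under those assumptions.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics; the 64-bit conversions need an x64 target

// Hypothetical wrapper: CVTSI2SS, 64-bit signed integer to float.
inline float int64_to_float(long long n) {
    float out;
    _mm_store_ss(&out, _mm_cvtsi64_ss(_mm_setzero_ps(), n));
    return out;
}

// Hypothetical wrapper: CVTSS2SI, rounds using the current rounding mode.
inline long long round_float_to_int64(float x) {
    return _mm_cvtss_si64(_mm_set_ss(x));
}

// Hypothetical wrapper: CVTTSS2SI, always truncates toward zero.
inline long long truncate_float_to_int64(float x) {
    return _mm_cvttss_si64(_mm_set_ss(x));
}

void demo_sse_conversions() {
    assert(int64_to_float(42) == 42.0f);
    assert(round_float_to_int64(2.7f) == 3);      // nearest integer
    assert(truncate_float_to_int64(2.7f) == 2);   // fraction discarded
    assert(round_float_to_int64(-2.7f) == -3);
    assert(truncate_float_to_int64(-2.7f) == -2);
    assert(round_float_to_int64(2.5f) == 2);      // ties go to the even integer
}
```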

SSE2

CVTSD2SI - Extracts the lowest 64-bit floating point value and rounds it to a 64-bit integer. Intrinsic: _mm_cvtsd_si64

CVTSI2SD - Converts a 64-bit signed integer to a 64-bit floating point value and inserts it into the lowest element of a 128-bit parameter. Intrinsic: _mm_cvtsi64_sd

CVTTSD2SI - Extracts the lowest 64-bit floating point value and truncates it to a 64-bit integer. Intrinsic: _mm_cvttsd_si64
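The SSE2 forms behave the same way for double-precision values. The sketch below, again assuming an x64 build and the default round-to-nearest mode, uses a value larger than a 32-bit integer can hold and shows that _mm_cvtsi64_sd replaces only the lowest element of its 128-bit argument. The wrapper and demo function names are hypothetical.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics; the 64-bit conversions need an x64 target

// Hypothetical wrapper: CVTSD2SI, rounds using the current rounding mode.
inline long long round_double_to_int64(double x) {
    return _mm_cvtsd_si64(_mm_set_sd(x));
}

// Hypothetical wrapper: CVTTSD2SI, always truncates toward zero.
inline long long truncate_double_to_int64(double x) {
    return _mm_cvttsd_si64(_mm_set_sd(x));
}

void demo_sse2_conversions() {
    // CVTSI2SD: only the lowest element is replaced; the upper element
    // is copied through from the first argument.
    __m128d v = _mm_set_pd(9.0, 1.0);   // upper = 9.0, lowest = 1.0
    v = _mm_cvtsi64_sd(v, 7);
    double lanes[2];
    _mm_storeu_pd(lanes, v);            // lanes[0] is the lowest element
    assert(lanes[0] == 7.0);
    assert(lanes[1] == 9.0);

    // A value that does not fit in a 32-bit integer.
    const double big = 5000000000.75;
    assert(round_double_to_int64(big) == 5000000001LL);
    assert(truncate_double_to_int64(big) == 5000000000LL);
}
```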