SIMD-Enabled Vector Types with C#

Optimizing code for current SIMD registers and preparing for AVX-based SIMD, all in C#

In the first article in this two-part series on working with SIMD-enabled vector types with C#, I explained how to install and configure the necessary components to make Microsoft's new JIT, RyuJIT, generate SIMD instructions from your C# code with fixed vector types. Here, I explain the different operations that map to SIMD instructions for fixed vector types. I also provide examples of more-advanced scenarios in which you can use hardware-dependent vector types that adjust their number of elements (based on the capabilities of the underlying hardware) and allow you to work with data types other than float.

Methods and Operations that Generate SIMD Instructions

The three fixed-size vectors (Vector2f, Vector3f, and Vector4f), which hold two, three, and four single-precision floating-point elements respectively, define operators and methods that generate SIMD instructions optimized to perform operations on packed floating-point values. If you've worked with SIMD intrinsics in C or C++, you will be able to take advantage of your existing knowledge in C#. But instead of coding with SIMD intrinsics, you can use the operators and methods provided by the fixed-size vectors and have RyuJIT generate optimized SIMD instructions.

The documentation for the fixed vector types included in Microsoft.Bcl.Simd is sparse, so here I summarize the operators and methods of these vector types. I include sample C# code and the main SIMD instructions that each operator or method generates with RyuJIT. I also include the equivalent SIMD intrinsics in case you have experience with them in the Visual C++ compiler or Intel C/C++ Compiler. This way, you will know all the optimized operations you can use with the vectors and can write your algorithms using them. Don't forget that hardware-dependent vector types will allow you to work with a higher number of elements per SIMD instruction on capable hardware. In addition, the examples will be useful when you work with vectors that pack types other than single-precision floating point.

For each code sample, consider that the following lines define two Vector3f instances:
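The listing that defined these instances is not reproduced in this copy of the article. As a minimal sketch, the lines below use System.Numerics.Vector3, the type that replaced Vector3f when the preview package evolved into System.Numerics.Vectors. The component values are arbitrary illustrations. With the preview package you would write Vector3f instead, assuming it offers the same three-float constructor.

```csharp
using System;
using System.Numerics;

// Assumption: Vector3 stands in for the preview-era Vector3f,
// and the component values are arbitrary.
var vector1 = new Vector3(8f, 6f, 4f);
var vector2 = new Vector3(2f, 3f, 4f);

Console.WriteLine($"{vector1} {vector2}");
```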

- operator or Subtract methods: They use the SUBPS instruction (Subtract Packed Single-Precision Floating-Point Values), equivalent to the _mm_sub_ps intrinsic. Sample lines that generate the SUBPS instruction:
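The sample lines are missing from this copy. A hedged sketch of both forms follows, written against the released System.Numerics.Vector3 type rather than the preview Vector3f; the static Vector3.Subtract call is the released API, and whether the preview method had the same shape is an assumption. Which instruction the JIT actually emits depends on the runtime version and the hardware.

```csharp
using System;
using System.Numerics;

// Assumption: Vector3 stands in for Vector3f; values are arbitrary.
var vector1 = new Vector3(8f, 6f, 4f);
var vector2 = new Vector3(2f, 3f, 4f);

// Operator form: element-wise subtraction.
var difference = vector1 - vector2;

// Static-method form; it computes the same element-wise result.
var sameDifference = Vector3.Subtract(vector1, vector2);

// Both vectors hold the components 6, 3, 0.
Console.WriteLine(difference == sameDifference);
```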

* operator or Multiply methods: They use the MULPS instruction (Multiply Packed Single-Precision Floating-Point Values), equivalent to the _mm_mul_ps intrinsic. Sample lines that generate the MULPS instruction:
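As with subtraction, the original sample is not available here. The sketch below shows element-wise multiplication with the released System.Numerics.Vector3 type as a stand-in for Vector3f; treat the mapping to the preview API as an assumption.

```csharp
using System;
using System.Numerics;

// Assumption: Vector3 stands in for Vector3f; values are arbitrary.
var vector1 = new Vector3(8f, 6f, 4f);
var vector2 = new Vector3(2f, 3f, 4f);

// Operator form: each element of vector1 times the matching element of vector2.
var product = vector1 * vector2;

// Static-method form with the same element-wise semantics.
var sameProduct = Vector3.Multiply(vector1, vector2);

// Both vectors hold the components 16, 18, 16.
Console.WriteLine(product == sameProduct);
```

Note that this is a per-element product, not a dot or cross product.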

/ operator or Divide methods: They use the DIVPS instruction (Divide Packed Single-Precision Floating-Point Values), equivalent to the _mm_div_ps intrinsic. Sample code that generates the DIVPS instruction:
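The division sample is also absent from this copy. A minimal sketch, again assuming the released System.Numerics.Vector3 type in place of Vector3f, with divisor values chosen so that every quotient is exact in single precision:

```csharp
using System;
using System.Numerics;

// Assumption: Vector3 stands in for Vector3f; values are arbitrary.
var vector1 = new Vector3(8f, 6f, 4f);
var vector2 = new Vector3(2f, 3f, 4f);

// Operator form: each element of vector1 divided by the matching element of vector2.
var quotient = vector1 / vector2;

// Static-method form with the same element-wise semantics.
var sameQuotient = Vector3.Divide(vector1, vector2);

// Both vectors hold the components 4, 2, 1.
Console.WriteLine(quotient == sameQuotient);
```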

== operator or Equals methods: They use the CMPEQPS instruction (Compare Packed Single-Precision Floating-Point Values), equivalent to the _mm_cmpeq_ps intrinsic. Sample code that generates the CMPEQPS instruction:

var areEqual = (vector1 == vector2);

!= operator: It also uses the CMPEQPS instruction explained for the == operator. Sample code that generates the CMPEQPS instruction:

var areNotEqual = (vector1 != vector2);

CopyTo method: It uses both the MOVAPS (Move Aligned Packed Single-Precision Floating-Point Values) and MOVUPS (Move Unaligned Packed Single-Precision Floating-Point Values) instructions. These instructions are equivalent to the _mm_load_ps and _mm_loadu_ps intrinsics (the same instructions also implement _mm_store_ps and _mm_storeu_ps when the destination is memory). In previous versions of RyuJIT and Microsoft.Bcl.Simd, the CopyTo method didn't take advantage of these SIMD instructions, which heavily distorted any measurement of the performance gains in the SIMD-enabled version of the code. Starting with CTP4, CopyTo uses MOVAPS and MOVUPS. Sample code that generates the MOVAPS and MOVUPS instructions:

var array = new float[3];
vector1.CopyTo(array);

The VectorMath class provides math functions that operate on vectors and generate optimized SIMD instructions. The math functions are useful for vectors with both a fixed size and a hardware-dependent size. I include sample C# code and the main SIMD instructions that each VectorMath method generates with RyuJIT. However, take into account that, in some cases, RyuJIT doesn't pick the best available instructions, so a math operation on the packed types can take more instructions than strictly necessary. Newer versions might optimize better and use more specific SIMD instructions.

Max: It uses the MAXPS instruction (Return Maximum Packed Single Precision Floating Point Values), equivalent to the _mm_max_ps intrinsic. Sample code that generates the MAXPS instruction:

var vector3 = VectorMath.Max(vector1, vector2);

Min: It uses the MINPS instruction (Return Minimum Packed Single Precision Floating Point Values), equivalent to the _mm_min_ps intrinsic. Sample code that generates the MINPS instruction:

var vector3 = VectorMath.Min(vector1, vector2);

SquareRoot: It uses the SQRTPS instruction (Compute Square Roots of Packed Single Precision Floating Point Values), equivalent to the _mm_sqrt_ps intrinsic. Sample code that generates the SQRTPS instruction:

var vector3 = VectorMath.SquareRoot(vector1);

Abs: It uses several SIMD instructions, including MOVSS, SHUFPS, MOVAPS, and ANDPS, to calculate the absolute value of every element of the vector. Sample code that generates these instructions:

var vector3 = VectorMath.Abs(vector1);

DotProduct: It uses several SIMD instructions, including MULPS, MOVAPS, and ADDPS, to calculate the dot product, also known as the scalar product, of two vectors. Sample code that generates these instructions:
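The DotProduct sample is cut off in this copy, and the exact VectorMath.DotProduct signature is not shown, so the sketch below swaps in Vector3.Dot from the released System.Numerics library as an assumed equivalent. It also computes the result by hand with an element-wise multiply followed by a sum, which mirrors the multiply-then-add pattern described above.

```csharp
using System;
using System.Numerics;

// Assumption: Vector3 and Vector3.Dot stand in for Vector3f and
// VectorMath.DotProduct; values are arbitrary.
var vector1 = new Vector3(8f, 6f, 4f);
var vector2 = new Vector3(2f, 3f, 4f);

// Library call: returns a single float, not a vector.
float dot = Vector3.Dot(vector1, vector2);

// Manual equivalent: multiply element-wise, then add the three products.
var products = vector1 * vector2;
float manualDot = products.X + products.Y + products.Z;

// Both values are 8*2 + 6*3 + 4*4 = 50.
Console.WriteLine(dot == manualDot);
```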
