Many software developers need random numbers while developing applications; financial and estimation-based applications in particular rely on them. There are many random number generators today, and some of them are open source and free to use. MT (Mersenne Twister) and its improved version, SFMT (SIMD-oriented Fast Mersenne Twister), are popular and well-known random number generator algorithms.

SSE2 (Streaming SIMD Extensions 2) is one of the IA-32 SIMD (Single Instruction, Multiple Data) instruction sets. Intel first introduced it with the initial version of the Pentium 4, released in late 2000. It extends the earlier SSE instruction set and is intended to fully supplant MMX. SSE2 added 144 new instructions to SSE, which has 70 instructions, and Intel in turn extended SSE2 to create SSE3 in 2004. Rival chip-maker AMD added support for SSE2 with the introduction of its Opteron and Athlon 64 ranges of AMD64 64-bit CPUs in 2003.

When an application is designed to take advantage of SSE2 and runs on a machine that supports it, the vectorized code is usually faster than its scalar equivalent. Today, many CPUs support the SSE2 instruction set. For detailed information about SSE2, please visit this link.

SIMD-oriented Fast Mersenne Twister (SFMT) is a Linear Feedback Shift Register (LFSR) generator that generates a 128-bit pseudorandom integer at each step. It was introduced by Mutsuo Saito and Makoto Matsumoto (of Hiroshima University) in 2006. SFMT is designed to exploit the parallelism of modern CPUs, such as multi-stage pipelining and SIMD (e.g., 128-bit integer) instructions. It supports 32-bit and 64-bit integers, as well as double precision floating point, as output. SFMT is a variant of Mersenne Twister (MT) and is roughly twice as fast. The SFMT DLL built in this article can generate both 32-bit and 64-bit integers.

In this article, I'll build an SFMT DLL from the original SFMT code. The original C implementation (version 1.3.3) can be downloaded from here. Along the way, I'll explain step by step the changes I had to make to the original C implementation and the reasons for them. The guiding principle when building SFMT.dll is to leave the core code untouched and only make it callable and usable from outside the generated DLL.

Note that I'll use Visual Studio 2008 on Windows Vista, both for analyzing the original code and for developing the SFMT DLL.

In Visual Studio, I start a new C++ Win32 project named SFMT:

Now, the Win32 Application Wizard will be shown. In this window, from Application Settings, I choose “DLL” for Application type, and tick “Empty project” for Additional options.

After clicking the Finish button, a new empty project will be created on the Visual Studio screen.

I unzipped the original C implementation of SFMT, which I downloaded from this address, under the Visual Studio 2008\Projects\SFMT directory. After unzipping, you'll see lots of files, but we won't use all of them: some are for test purposes, and some contain test results.

Actually, there are five main code files in the C implementation (version 1.3.3) that I focused on, and they are:

SFMT.c: The main code for SFMT’s generator engine is in this file. It implements the main functions, for example the gen_rand32 function for generating 32-bit integers.

SFMT.h: This header declares the main functions so that they can be called easily. In addition, other useful functions, such as those that generate real numbers, are implemented here.

SFMT-params.h: It contains basic definitions such as MEXP and the parameters used while generating pseudorandom numbers. It also contains the preprocessor rules that include the right parameter file for the current MEXP (Mersenne Exponent), for example:

#elif MEXP == 19937
  #include "SFMT-params19937.h"

SFMT-sse2.h: It provides SSE2 support, that is, the accelerated code that uses the CPU’s SSE2 instructions. It uses emmintrin.h.

SFMT-paramsXXXXXX.h: Other necessary parameters are located in these files. Here, XXXXXX represents the MEXP constant. There are ten parameter files which are configured for different MEXP values. MEXP and the meaning of it are mentioned in the next paragraph.

In the code, you'll see a definition called MEXP, which is the starting point for using the algorithm. MEXP stands for Mersenne Exponent: the period of the generated sequence is a multiple of 2^MEXP-1. The definition is mandatory and must be one of these values: 607, 1279, 2281, 4253, 11213, 19937, 44497, 86243, 132049, 216091.

If you don't specify it, the default value is 19937.

If you examine the original implementation of SFMT, you'll see that it can be compiled for three possible platforms:

1. Standard C without SIMD instructions

2. CPUs with Intel's SSE2 instructions + a C compiler which supports these features

3. CPUs with PowerPC's AltiVec instructions + a C compiler which supports these features

As you can see, number 3 isn't applicable to Microsoft-based platforms, because it uses AltiVec instructions. Number 2 (using the power of SSE2 instructions) is the way to go for me. While building the DLL, my target is to modify the code so that it compiles with the SSE2 instructions. Therefore, first of all, I'll clean out some unnecessary parts of the code. At the end of the development, when I build SFMT.dll, you'll also be able to switch easily between the standard C and SSE2 versions.

In the Solution Explorer, under the SFMT project, I added the existing SFMT.c file to the Source Files directory and opened it to modify.

At the beginning, I removed some preprocessor blocks from the SFMT.c file. They belong to the following macros:

#if defined(HAVE_ALTIVEC): This is optional. If this macro is specified, the optimized code for AltiVec will be generated. This macro automatically turns on the BIG_ENDIAN64 macro.

#if defined(BIG_ENDIAN64): This macro is required when your CPU is BIG ENDIAN and you are using 64-bit output. So, it’s for PowerPC-based computers with a Macintosh Operating System.

#if defined(ONLY64): This macro is optional. If this macro is specified, the optimized code for 64-bit output for BIG ENDIAN CPUs will be generated, and code for 32-bit output won't be generated.

The HAVE_ALTIVEC, BIG_ENDIAN64, or ONLY64 preprocessor commands and their related code aren't applicable or suitable for Windows platforms, and I removed these commands and their related code from the SFMT.c file carefully.

On the other hand, there’s a preprocessor definition called HAVE_SSE2, and it’s a critical one for us. It’s important to keep HAVE_SSE2 and its related code in the file when removing other unnecessary definitions.

#if defined(HAVE_SSE2): If this macro is specified, optimized code for SSE2 will be generated.

Macros | 32-bit output | LITTLE ENDIAN 64-bit output | BIG ENDIAN 64-bit output
required | MEXP | MEXP | MEXP, BIG_ENDIAN64
optional | HAVE_SSE2, HAVE_ALTIVEC | HAVE_SSE2 | HAVE_ALTIVEC, ONLY64

In the SFMT.c file, there are two functions that fill arrays with 32-bit or 64-bit random integers: fill_array32 and fill_array64. I changed some parts of these functions and want to describe the changes here:

Changing the return type of fill_array32 and fill_array64: In the original C implementation of SFMT, both fill_array functions are declared void and return nothing. In my SFMT.dll, I changed their return type to int, so that they can return 0 or 1. If the function returns 0, the array wasn't filled successfully, which almost always indicates that a memory allocation failed. If the function returns 1, the array was filled successfully and is ready to use.

Always using extended size arrays for compatibility and flexibility: In the original C implementation of SFMT, there are two rules for both fill_array32 and fill_array64 functions:

The size of array must be greater than or equal to (MEXP / 128 + 1) * 4 for fill_array32 and must be greater than or equal to (MEXP / 128 + 1) * 2 for fill_array64.

The size of array must be a multiple of 4 for fill_array32 and must be a multiple of 2 for fill_array64.

Because of these rules, I had to use extended-size arrays internally when generating pseudorandom numbers. It's also important, and much more flexible, to be able to request an array of any size. To compute the extended sizes, I wrote two new functions, get_array32_extended_size and get_array64_extended_size, and added them to the SFMT.c file.

These modifications are very important. They eliminate both the rule that the array size must be a multiple of 4 (or of 2) and the rule that the array size must be greater than or equal to (MEXP / 128 + 1) * 4 (or (MEXP / 128 + 1) * 2). For example, if you want to generate 2113 integers, you can do it easily with the modified fill_array32 or fill_array64 functions. With the original versions, you can't generate exactly 2113 integers, because 2113 is a multiple of neither 4 nor 2.
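The rounding described above can be sketched in a few lines. This is only an illustration: the helper names and signatures (extended_size32 and extended_size64, with MEXP passed as a plain argument) are assumptions, and the bodies of the DLL's own get_array32_extended_size and get_array64_extended_size may differ.

```c
/* Hypothetical helper: the smallest size that is at least `minimum`
 * elements, at least `requested` elements, and a multiple of `multiple`. */
static int round_up_size(int requested, int minimum, int multiple)
{
    int size = requested < minimum ? minimum : requested;
    int remainder = size % multiple;
    return remainder == 0 ? size : size + (multiple - remainder);
}

/* Working-buffer size for 32-bit output: at least (MEXP / 128 + 1) * 4
 * elements and a multiple of 4. */
int extended_size32(int requested, int mexp)
{
    int n = mexp / 128 + 1;
    return round_up_size(requested, n * 4, 4);
}

/* Working-buffer size for 64-bit output: at least (MEXP / 128 + 1) * 2
 * elements and a multiple of 2. */
int extended_size64(int requested, int mexp)
{
    int n = mexp / 128 + 1;
    return round_up_size(requested, n * 2, 2);
}
```

With MEXP = 19937, a request for 2113 32-bit integers maps to a working buffer of 2116 elements, of which only the first 2113 would be handed back to the caller.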

Note: The modified fill_array32 and fill_array64 functions work together with the get_array32_extended_size and get_array64_extended_size functions.

Data alignment and using aligned memory blocks: To use the fill_array functions in the SIMD version, the pointer to the array must be "aligned" (namely, its address must be a multiple of 16), since it refers to the address of a 128-bit integer. In the standard C version, the pointer is arbitrary. So if we define the HAVE_SSE2 macro, the array must live in a 16-byte aligned memory block. The reason is that SSE (Streaming SIMD Extensions) defines 128-bit (16-byte) packed data types (for example, four 32-bit floats), and the aligned load and store instructions require addresses that are evenly divisible by 16. Misaligned data also slows down data access. You can visit the songho page and this IBM page for more information about data alignment and 16-byte alignment for SSE2.

In the MSVC CRT, a dynamic array can be allocated with the _aligned_malloc() function and deallocated with _aligned_free(). The modified fill_array32 and fill_array64 functions use this kind of aligned memory allocation.
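As a rough sketch of this pattern, the code below allocates a 16-byte aligned scratch buffer, lets a block generator fill it, copies out the requested number of values, and reports success as 1 or failure as 0. Everything here is hypothetical and not taken from the DLL's source: the names alloc_aligned16, free_aligned16, fill_any_size32 and demo_fill are invented, demo_fill is only a stand-in for the original fill_array32, and the non-Windows branch (C11 aligned_alloc) exists just so the sketch compiles elsewhere. It also assumes a compiler that provides <stdint.h>.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#if defined(_WIN32)
#include <malloc.h>            /* _aligned_malloc, _aligned_free */
#endif

/* Hypothetical helper: allocate `bytes` bytes on a 16-byte boundary. */
void *alloc_aligned16(size_t bytes)
{
#if defined(_WIN32)
    return _aligned_malloc(bytes, 16);
#else
    /* C11 aligned_alloc expects a size that is a multiple of the alignment. */
    size_t rounded = (bytes + 15u) & ~(size_t)15u;
    return aligned_alloc(16, rounded);
#endif
}

/* Hypothetical helper: release a block obtained from alloc_aligned16. */
void free_aligned16(void *block)
{
#if defined(_WIN32)
    _aligned_free(block);
#else
    free(block);
#endif
}

/* Shape of a block generator such as the original void fill_array32. */
typedef void (*block_fill32_fn)(uint32_t *array, int size);

/* Stand-in generator for this sketch only: writes 0, 1, 2, ... */
void demo_fill(uint32_t *array, int size)
{
    int i;
    for (i = 0; i < size; i++)
        array[i] = (uint32_t)i;
}

/* Fills `out` with `count` values through an aligned scratch buffer of
 * `extended_size` elements. Returns 1 on success and 0 on failure. */
int fill_any_size32(uint32_t *out, int count, int extended_size,
                    block_fill32_fn fill)
{
    uint32_t *scratch;

    if (out == NULL || fill == NULL || count <= 0 || extended_size < count)
        return 0;

    scratch = (uint32_t *)alloc_aligned16((size_t)extended_size * sizeof(uint32_t));
    if (scratch == NULL)
        return 0;                          /* the allocation failed */

    fill(scratch, extended_size);          /* generator sees an aligned block */
    memcpy(out, scratch, (size_t)count * sizeof(uint32_t));
    free_aligned16(scratch);
    return 1;
}
```

Because only the scratch buffer has to be aligned and padded, the caller's own array can have any address and any length.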

In the SFMT-params.h file, I removed only the #ifdef __GNUC__ preprocessor block and its related code. Because I'm using the Microsoft Visual C++ compiler to build the DLL, I don't need GNU-specific code.

You can see some basic definitions in this file. Their structure and meanings are like this:

/*-----------------
  BASIC DEFINITIONS
  -----------------*/
/** Mersenne Exponent. The period of the sequence
 *  is a multiple of 2^MEXP-1.
 * #define MEXP 19937 */
/** SFMT generator has an internal state array of 128-bit integers,
 * and N is its size. */
#define N (MEXP / 128 + 1)
/** N32 is the size of internal state array when regarded as an array
 * of 32-bit integers.*/
#define N32 (N * 4)
/** N64 is the size of internal state array when regarded as an array
 * of 64-bit integers.*/
#define N64 (N * 2)
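A worked example may make these sizes concrete. The functions below repeat the arithmetic of the N, N32 and N64 macros with MEXP as an ordinary argument; the function names are hypothetical and exist only for this sketch.

```c
/* Number of 128-bit integers in the state array (the N macro). */
int state_size128(int mexp)
{
    return mexp / 128 + 1;     /* integer division, as in the macro */
}

/* The same state regarded as 32-bit integers (the N32 macro). */
int state_size32(int mexp)
{
    return state_size128(mexp) * 4;
}

/* The same state regarded as 64-bit integers (the N64 macro). */
int state_size64(int mexp)
{
    return state_size128(mexp) * 2;
}
```

For MEXP = 19937, this gives N = 156, N32 = 624 and N64 = 312; for the smallest exponent, 607, it gives 5, 20 and 10.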

The MEXP value is the criterion for including the correct parameter file in the project: developers get the parameter file they need just by changing the value of MEXP. Because of this mechanism, the original SFMT implementation ships ten different SFMT-paramsXXXXXX.h header files.

In my project, I used 19937 for MEXP, which is also the default value in the original C implementation.

After modifying the SFMT-params.h file, it’s time to change the associated SFMT-paramsXXXXXX.h files. There are ten of them, one for each value MEXP can take. Since I use 19937 for MEXP, the first file to change is SFMT-params19937.h.

In the SFMT-params19937.h header file, there are some parameters for AltiVec. They start with an #if defined(__APPLE__) block in the code. This block contains parameters for Mac OS X, and I removed it.

I modified all the other parameter files in the same way. In other words, I cleaned out all the unnecessary OS X specific code in the header files.

Below, you can see other necessary parameters that are defined in the SFMT-params19937.h file:

#define POS1 122 // the pick up position of the array.
#define SL1 18 // the parameter of shift left as four 32-bit registers.
#define SL2 1 // the parameter of shift left as one 128-bit register.
#define SR1 11 // the parameter of shift right as four 32-bit registers.
#define SR2 1 // the parameter of shift right as one 128-bit register.
/* A bitmask, used in the recursion. These parameters are introduced
to break symmetry of SIMD. */
#define MSK1 0xdfffffefU
#define MSK2 0xddfecb7fU
#define MSK3 0xbffaffffU
#define MSK4 0xbffffff6U
// These definitions are part of a 128-bit period certification vector.
#define PARITY1 0x00000001U
#define PARITY2 0x00000000U
#define PARITY3 0x00000000U
#define PARITY4 0x13c9e684U
// String representation of MEXP 19937 parameters.
#define IDSTR "SFMT-19937:122-18-1-11-1:dfffffef-ddfecb7f-bffaffff-bffffff6"

The SFMT.h header file is very important, so I'll add it to my project. Since it’s a header (*.h) file, I add it to the Header Files directory of my project. After making some modifications to it, I'll be able to call the SFMT functions from outside my DLL. Before talking about the changes, let’s look at the functions declared in SFMT.h and what they do:

uint32_t gen_rand32(void): Generates one pseudorandom 32-bit integer per call. This approach is called the sequential call method.

uint64_t gen_rand64(void): Generates one pseudorandom 64-bit integer per call. This approach is also called the sequential call method.

int fill_array32(uint32_t *array, int size): Fills an array with pseudorandom 32-bit integers. The first parameter is the array to be filled. The second parameter is the size of this array, that is, the number of 32-bit integers to generate. This approach is called the block call method. If the function fails, the return value is 0.

int fill_array64(uint64_t *array, int size): Fills an array with pseudorandom 64-bit integers. The first parameter is the array to be filled. The second parameter is the size of this array, that is, the number of 64-bit integers to generate. This approach is called the block call method. If the function fails, the return value is 0.

void init_gen_rand(uint32_t seed): Initializes the internal state array. The seed parameter is a 32-bit integer used as the seed.

To call these SFMT functions outside of my DLL, I need to use a special keyword:

__declspec(dllexport): You can export data, functions, classes, or class member functions from a DLL using the __declspec(dllexport) keyword. __declspec(dllexport) adds the export directive to the object file so you do not need to use a .def file. Many export directives, such as ordinals, NONAME, and PRIVATE, can be made only in a .def file, and there is no way to specify these attributes without a .def file. However, using __declspec(dllexport) in addition to using a .def file does not cause build errors.

To export the SFMT functions, the __declspec(dllexport) keyword must appear to the left of the calling-convention keyword, if one is specified.
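The keyword order can be illustrated with a short sketch. SFMT_API and SFMT_CALL are hypothetical macro names, not something the original sources define, and demo_exported_add is an invented function that is there only so the fragment is complete on its own; the non-Windows branch simply makes the macros expand to nothing. The sketch assumes a compiler that provides <stdint.h>.

```c
#include <stdint.h>

/* Hypothetical macros for this sketch: on Windows they expand to the
 * export directive and the calling convention, elsewhere to nothing. */
#if defined(_WIN32)
#  define SFMT_API  __declspec(dllexport)
#  define SFMT_CALL __cdecl
#else
#  define SFMT_API
#  define SFMT_CALL
#endif

/* Declarations: export directive first, then the return type, then the
 * calling-convention keyword, then the function name. */
SFMT_API uint32_t SFMT_CALL gen_rand32(void);
SFMT_API int      SFMT_CALL fill_array32(uint32_t *array, int size);
SFMT_API void     SFMT_CALL init_gen_rand(uint32_t seed);

/* A complete exported function (hypothetical, not part of SFMT). */
SFMT_API int SFMT_CALL demo_exported_add(int a, int b)
{
    return a + b;
}
```

Keeping the directive behind a macro is a common design choice, because the same header can then be reused by client code with the macro redefined as __declspec(dllimport).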

Real versions of functions: In the SFMT.h file, you can see some real versions of the functions. They're due to Isaku Wada and are used to generate random real numbers. All of them are inline functions, and inline functions can't be exported from a DLL as they stand: an inline function is compiled into each location that calls it (in the main application, for example), so there is no single function in the binary for a caller to link against. If you want a function to live in a separate binary library (*.lib, *.dll, etc.), it has to be an ordinary, non-inline function located in that binary rather than in your executable code. For these reasons, I removed the inline functions from the SFMT.h file and added an rSFMT.cpp file to my project under the Source Files directory. This file contains the real versions of the functions as non-inline functions, which I then marked for export.
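A minimal sketch of what such non-inline, exported conversion functions could look like is given below. It is modeled on the integer-to-real conversions in SFMT.h, but it is an assumption rather than the contents of rSFMT.cpp: which functions the DLL actually exports, and the SFMT_API macro name, are not taken from the source. It assumes a compiler that provides <stdint.h>.

```c
#include <stdint.h>

/* Hypothetical export macro for this sketch. */
#if defined(_WIN32)
#  define SFMT_API __declspec(dllexport)
#else
#  define SFMT_API
#endif

/* Maps a 32-bit integer onto the closed interval [0,1]
 * by dividing by 2^32 - 1. */
SFMT_API double to_real1(uint32_t v)
{
    return v * (1.0 / 4294967295.0);
}

/* Maps a 32-bit integer onto the half-open interval [0,1)
 * by dividing by 2^32. */
SFMT_API double to_real2(uint32_t v)
{
    return v * (1.0 / 4294967296.0);
}
```

Because these are ordinary functions with external linkage, the linker can place them in the DLL's export table, which is not possible for the inline originals.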

Extern C: After these modifications, if you compile the SFMT DLL and call the exported functions, you'll get an error message at runtime.

This problem occurs because the C++ compiler decorates function names in order to support function overloading. We can see the exact names of our functions with the Windows utility dumpbin.exe, using the command dumpbin -exports SFMT.dll.

In the output of this command, the function names are decorated, and when we try to call the functions by their plain names, an unhandled exception always occurs.

There isn't any standard way of decorating function names, so you have to tell the C++ compiler not to decorate them. We'll use an extern "C" block for this:

At the beginning of the SFMT.h file:

#ifdef __cplusplus
extern "C" {
#endif

and at the end of the SFMT.h file:

#ifdef __cplusplus
}
#endif

Now, the functions declared inside this extern "C" block will be exported under their plain names and can be called easily. Running the dumpbin -exports SFMT.dll command again confirms this.

The SFMT-sse2.h file is included only when HAVE_SSE2 is defined. In other words, if you add the HAVE_SSE2 definition to the command line of our project, the project will use the SFMT-sse2.h file. If you examine that file, you'll see that it is written to use the CPU’s SSE2 instructions, which makes our code faster. The only limitation is that the resulting code runs only on CPUs that support SSE2.
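To illustrate how one macro can switch between a scalar and an SSE2 code path, here is a small sketch that XORs two 128-bit blocks. It is not code from SFMT-sse2.h: the block128 type and the xor_block128 function are hypothetical. The intrinsics themselves (_mm_loadu_si128, _mm_xor_si128, _mm_storeu_si128) come from emmintrin.h, and the unaligned load and store variants are used so that the sketch has no alignment requirement. It assumes a compiler that provides <stdint.h>.

```c
#include <stdint.h>
#if defined(HAVE_SSE2)
#include <emmintrin.h>         /* SSE2 intrinsics */
#endif

/* Hypothetical 128-bit block viewed as four 32-bit integers. */
typedef struct {
    uint32_t u[4];
} block128;

/* dst = a XOR b, using one SSE2 instruction when HAVE_SSE2 is defined
 * and a plain loop over the four 32-bit words otherwise. */
void xor_block128(block128 *dst, const block128 *a, const block128 *b)
{
#if defined(HAVE_SSE2)
    __m128i x = _mm_loadu_si128((const __m128i *)a->u);
    __m128i y = _mm_loadu_si128((const __m128i *)b->u);
    _mm_storeu_si128((__m128i *)dst->u, _mm_xor_si128(x, y));
#else
    int i;
    for (i = 0; i < 4; i++)
        dst->u[i] = a->u[i] ^ b->u[i];
#endif
}
```

Both branches produce the same result; only the instructions used to compute it differ, which is the same idea the HAVE_SSE2 switch applies to the generator's recursion.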

How to enable SSE2 support is covered in the next section, “Setting project properties”.

A new window with an “SFMT Property Pages” caption will be visible. In this window, on the left side, under the “Configuration Properties” tab, you can see some property categories (General, Debugging, C/C++ etc.) that we'll use.

First of all, at the top of the project properties window, click the “Configuration Manager” button to open the Configuration Manager. In this window, set the Configuration parameter to Release, and set “Active solution configuration” to Release, too. Building in Release mode means the output doesn't carry debug data and is ready to ship.

The most important properties of our SFMT.dll project are the preprocessor definitions.

Under the “Configuration Properties” --> C/C++ --> Preprocessor tab, there are preprocessor definitions. I'll add two definitions here: MEXP and HAVE_SSE2. MEXP has been mentioned before and represents the Mersenne Exponent. The HAVE_SSE2 definition is used to take advantage of the CPU’s SSE2 support.

This setup makes it easy to change the MEXP value or to drop SSE2 support: you can reconfigure these two preprocessor definitions at any time and then build another version of SFMT.dll.
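These two definitions correspond to passing /D MEXP=19937 and /D HAVE_SSE2 to the compiler. The following sketch shows one way a source file could validate them at compile time and report the result at run time. The function names build_mexp and build_uses_sse2 are hypothetical, and the fallback to 19937 mirrors the default described earlier in this article.

```c
/* Fall back to the same default as the original sources. */
#ifndef MEXP
#define MEXP 19937
#endif

/* Reject exponents for which no parameter file exists. */
#if MEXP != 607 && MEXP != 1279 && MEXP != 2281 && MEXP != 4253 && \
    MEXP != 11213 && MEXP != 19937 && MEXP != 44497 && MEXP != 86243 && \
    MEXP != 132049 && MEXP != 216091
#error "MEXP must be one of the ten supported Mersenne exponents"
#endif

/* Hypothetical helper: the exponent this build was compiled with. */
int build_mexp(void)
{
    return MEXP;
}

/* Hypothetical helper: 1 if the build defined HAVE_SSE2, otherwise 0. */
int build_uses_sse2(void)
{
#if defined(HAVE_SSE2)
    return 1;
#else
    return 0;
#endif
}
```

A check like this turns a mistyped MEXP value into a build error instead of a missing-parameter problem later on.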

Another important property is “Optimization”. Under Configuration Properties --> C/C++ --> Optimization, make sure Optimization is set to “Maximize Speed (/O2)”. With this setting, the compiler optimizes the generated code for speed. This can increase the size of SFMT.dll, but that can be disregarded, because speed matters more than size here. Faster code isn't necessary when we generate two or three random numbers, but when generating 10 million numbers, speed becomes a major factor, as it does in time-critical mathematical or engineering applications.

There is also an option called “Enable Intrinsic Functions (/Oi)”. Programs that use intrinsic functions are faster because they do not have the overhead of function calls, but may be larger because of the additional code created.

In Configuration Properties --> C/C++ --> Code Generation tab, the default value of the Runtime Library option is Multi-threaded DLL (/MD). I'll change this option to Multi-threaded (/MT). This causes your application to use the multithreaded, static version of the run-time library. It defines _MT, and causes the compiler to place the library name LIBCMT.lib into the .obj file so that the linker will use LIBCMT.lib to resolve external symbols.

C/C++ multi-threaded applications on Windows need to be compiled with either the -MT or the -MD option. The -MT option links against the static library LIBCMT.LIB, and -MD links against the import library MSVCRT.LIB. A binary linked with -MD is smaller but depends on the runtime DLL, while a binary linked with -MT is larger but self-contained with respect to the runtime. For Visual Studio 2008 projects, the actual working code is in MSVCR90.DLL, which must be available at runtime to applications linked with MSVCRT.lib.

If I build my project with the -MD option (dynamic linking), my SFMT.dll is approximately 10 KB, which is quite small. If I build the project with the -MT option (static linking), my SFMT.dll is 57 KB.

On the other hand, if I try to use the dynamically linked SFMT.dll on another computer, I may get an error like this:

"This application has failed to start because the application configuration is incorrect. Reinstalling the application may fix this problem"

This error shows that the computer on which you are trying to run SFMT.dll doesn't have the C/C++ Runtime Libraries installed. In this situation, you must distribute the C/C++ Runtime Libraries with your SFMT.dll. You can see the analysis of SFMT.dll running on an operating system without the C/C++ Runtime Libraries below. As you can see, it needs MSVCR90.dll and related libraries. Note that Visual Studio 2008 makes it quite simple to create a setup for the SFMT project that includes the necessary C/C++ Runtime Libraries.

In addition, in the Configuration Properties --> C/C++ --> Code Generation tab, set the Enable Enhanced Instruction Set property to Streaming SIMD Extensions 2 (/arch:SSE2). The arch flag enables the use of instructions found on processors that support enhanced instruction sets, e.g., the SSE and SSE2 extensions of Intel 32-bit processors. Note that code built with this setting won't run on processors that don't support the SSE2 extensions. In this project, however, our target is CPUs that support SSE2 instructions.

On the other tab called “Linker”, it’s important to see the Target Machine property set to MachineX86. This is the default value for our project, but don't forget to check it. The Linker tab’s command will be like this:

Now, it is time to build the SFMT project. To do this, simply press the F6 key, or open the Build menu of Visual Studio and click “Build Solution”. If all is OK, you'll get the message “Build succeeded”. Visual Studio will then create a folder named “Release” under the SFMT project's main directory, and in this folder you'll see SFMT.dll. To analyze SFMT.dll, I use the Dependency Walker tool. You can download it from here. All exported functions in SFMT.dll can be seen easily via this GUI. You can see a screenshot of SFMT.dll below.

After building my project, I renamed SFMT.dll to SFMTsse2.dll. I'll need this naming later, when deciding which DLL is the right one to use on a given machine. We'll talk about that later.

If the machine on which SFMT.dll runs doesn't support SSE2, you'll get an error. To avoid this error, you can easily prepare a standard C version of SFMT.dll and rename it to SFMTc.dll. This SFMTc.dll can generate random numbers without needing SSE2 support. It's very easy to configure the project properties for SFMTc.dll:

Under the “Configuration Properties” --> C/C++ --> Preprocessor tab, there are preprocessor definitions. Delete "HAVE_SSE2" Preprocessor command from this window.

In the Configuration Properties --> C/C++ --> Code Generation tab, set the Enable Enhanced Instruction Set property to Not Set.

Rebuild your project and then in the release directory of your project, rename your SFMT.dll to SFMTc.dll.

That's it. You can use your SFMTc.dll on the machines that don't have SSE2 support.
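One possible way to choose between the two DLLs at run time is sketched below. This is an assumption about how a client could do it, not a method described in this article: the function names cpu_has_sse2 and choose_sfmt_dll are hypothetical. On Windows, the sketch uses the Win32 call IsProcessorFeaturePresent with PF_XMMI64_INSTRUCTIONS_AVAILABLE, which reports SSE2 availability; the GCC/Clang branch exists only so the sketch also compiles elsewhere.

```c
#if defined(_WIN32)
#include <windows.h>           /* IsProcessorFeaturePresent */
#endif

/* Hypothetical helper: returns 1 if the CPU reports SSE2, otherwise 0. */
int cpu_has_sse2(void)
{
#if defined(_WIN32)
    return IsProcessorFeaturePresent(PF_XMMI64_INSTRUCTIONS_AVAILABLE) ? 1 : 0;
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return 0;                  /* unknown platform: assume no SSE2 */
#endif
}

/* Hypothetical helper: name of the DLL that matches the CPU. */
const char *choose_sfmt_dll(int has_sse2)
{
    return has_sse2 ? "SFMTsse2.dll" : "SFMTc.dll";
}
```

A client could pass choose_sfmt_dll(cpu_has_sse2()) to LoadLibrary and fall back to SFMTc.dll whenever SSE2 is missing.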
