The configuration choices primarily affect how multi-threaded applications behave. The behavior is important because the Random class is used to generate SIP CallIds, branch identifiers, and (from/to) tags.

The configuration choices primarily affect how multi-threaded applications behave. The behavior is important because the Random class is used to generate SIP CallIds, branch identifiers, and (from/to) tags.

−

The options are described below. See Random.hxx for most up-to-date descriptions.

+

The options are described below. See Random.hxx for the most up-to-date descriptions.

[WIN32-default]

[WIN32-default]

Line 15:

Line 13:

RESIP_RANDOM_WIN32_RTL

RESIP_RANDOM_WIN32_RTL

−

The Random class uses Window's RtlGenRandom (aka SystemFunction036) on XP and higher system, and falls back to rand() on older Windows platforms.

+

The Random class uses Windows' RtlGenRandom (aka SystemFunction036) on XP and higher system, and falls back to rand() on older Windows platforms. NOTE: This hasn't been tested yet, it is direct merge from CTPC branch.

[POSIX-default]

[POSIX-default]

−

The Random class uses libc's random(). The random state is shared across all threads and all libraries of the application. While Linux's libc implementation has an internal mutex for thread protection, POSIX doesn't require this and other platforms may have corruption. Also, any part of the application may "corrupt" the state by calling srandom with "stupid" values.

+

The Random class uses libc's random(). The random state is shared across all threads and all parts of the application. While Linux's libc implementation has an internal mutex for thread protection, POSIX doesn't require this and other platforms may have corruption. Also, any part of the application may "corrupt" the state by calling srandom() with "stupid" values.

RESIP_RANDOM_THREAD_MUTEX

RESIP_RANDOM_THREAD_MUTEX

Line 26:

Line 24:

The Random class allocate state for each thread and tracks that using ThreadLocalStorage. As above, this requires that the platform support random_r(). This avoids serialization among threads (no mutex), and also prevents corruption by non-resiprocate parts of the application.

The Random class allocate state for each thread and tracks that using ThreadLocalStorage. As above, this requires that the platform support random_r(). This avoids serialization among threads (no mutex), and also prevents corruption by non-resiprocate parts of the application.

+

+

Early performance results (see below) indicate that where random_r() is supported, the THREAD_LOCAL mode should be used as it has higher performance in every case.

== Linux Performance results ==

== Linux Performance results ==

Line 35:

Line 35:

* 0. The test is run in the main process, but there is also a dummy thread hanging around doing nothing. Turns out this makes a difference, because libc's mutex protection of random() optimizes for the single-thread case.

* 0. The test is run in the main process, but there is also a dummy thread hanging around doing nothing. Turns out this makes a difference, because libc's mutex protection of random() optimizes for the single-thread case.

* 1. The test is run in a child process, and the main process does nothing.

* 1. The test is run in a child process, and the main process does nothing.

−

* N. Each child process generates C cycles of random numbers, and main process does nothing.

+

* N. There N child processes, and each child process generates C cycles of random numbers, and the main process does nothing.

+

+

The test results for 64-bit Linux 2.6 kernels on two different platforms are in table below. The columns are:

+

* random --> default, using native random()

+

* mutex --> using THREAD_MUTEX

+

* local --> using THREAD_LOCAL

+

+

(See more text below the picture -- the image has lots of excess white-space).

+

[[File:RandomPerfResult-1.png]]

−

[[File:RandomPerfResult-1.jpg]]

+

Observations:

+

* Thread local storage provides better performance in ALL cases, including all single-thread cases. Platforms that support random_r() should use this configuration.

+

* The mutex'd cases ("random" and "mutex") show expected linear performance on single-core processor. But they show much WORSE than linear performance on multi-core processor. That is, running many threads on multi-core processor is actually slower than running on single-core processor. This result doesn't really make sense, and I hope someone else can replicate/explain it.

+

* Linux random() optimizes the true single-thread (T=-1) case. The multi-threaded case has 3x penalty even without thread collisions (e.g., just the overhead of manipulating the mutex structure).

+

* The resip Mutex class has a ~25% penalty compared to the native mutex in libc's random(). Also, the resip Mutex doesn't special-case the single-thread case.

Latest revision as of 12:48, 28 February 2011

The Random class has compile time options to control how it generates random numbers. It is configured via pre-processor symbols, defined either by changing Random.hxx or in the build environment.

The configuration choices primarily affect how multi-threaded applications behave. The behavior is important because the Random class is used to generate SIP CallIds, branch identifiers, and (from/to) tags.

The options are described below. See Random.hxx for the most up-to-date descriptions.

[WIN32-default]
The Random class uses rand().

RESIP_RANDOM_WIN32_RTL
The Random class uses Windows' RtlGenRandom (aka SystemFunction036) on XP and higher system, and falls back to rand() on older Windows platforms. NOTE: This hasn't been tested yet, it is direct merge from CTPC branch.

[POSIX-default]
The Random class uses libc's random(). The random state is shared across all threads and all parts of the application. While Linux's libc implementation has an internal mutex for thread protection, POSIX doesn't require this and other platforms may have corruption. Also, any part of the application may "corrupt" the state by calling srandom() with "stupid" values.

RESIP_RANDOM_THREAD_MUTEX
The Random class itself keeps a Mutex, and will lock the mutex prior to calling random_r(). Of course, this requires that the platform support random_r(). This provides assured thread-safety. Also, the random state is private to resiprocate, and other parts of the application cannot corrupt it.

RESIP_RANDOM_THREAD_LOCAL
The Random class allocate state for each thread and tracks that using ThreadLocalStorage. As above, this requires that the platform support random_r(). This avoids serialization among threads (no mutex), and also prevents corruption by non-resiprocate parts of the application.

Early performance results (see below) indicate that where random_r() is supported, the THREAD_LOCAL mode should be used as it has higher performance in every case.

The test program starts T threads, and runs C cycles in each thread, where each cycle generates one million random integers (each integer is 32 bits). It then reports the total time from start of all the threads to completion of the last one. With respect to the number of threads, there are some special cases. Values of T (number of threads):

-1. The test is run in the main process, without creating any child threads.

0. The test is run in the main process, but there is also a dummy thread hanging around doing nothing. Turns out this makes a difference, because libc's mutex protection of random() optimizes for the single-thread case.

1. The test is run in a child process, and the main process does nothing.

N. There N child processes, and each child process generates C cycles of random numbers, and the main process does nothing.

The test results for 64-bit Linux 2.6 kernels on two different platforms are in table below. The columns are:

random --> default, using native random()

mutex --> using THREAD_MUTEX

local --> using THREAD_LOCAL

(See more text below the picture -- the image has lots of excess white-space).

Observations:

Thread local storage provides better performance in ALL cases, including all single-thread cases. Platforms that support random_r() should use this configuration.

The mutex'd cases ("random" and "mutex") show expected linear performance on single-core processor. But they show much WORSE than linear performance on multi-core processor. That is, running many threads on multi-core processor is actually slower than running on single-core processor. This result doesn't really make sense, and I hope someone else can replicate/explain it.

Linux random() optimizes the true single-thread (T=-1) case. The multi-threaded case has 3x penalty even without thread collisions (e.g., just the overhead of manipulating the mutex structure).

The resip Mutex class has a ~25% penalty compared to the native mutex in libc's random(). Also, the resip Mutex doesn't special-case the single-thread case.