Why does my compiler give me the error message "Not allowed in current CPU mode" when I try to compile the rdtsc instruction? When I hard-code the opcode using "dw 310Fh" instead of "rdtsc", it works (no compiler error, no errors during execution)...
Does anyone know what Intel Centrino processors do with the "rdtsc" instruction? I wrote a CPU speed test (using rdtsc) that works fine on all processors except my new Centrino!?

When you time code in ring 3, the OS will always interfere with the timing, so you are best off setting up timings of at least half a second to reduce the error to under 1% or so. RDTSC is fine in ring 0, as nothing interferes with it, but unless you want to write a driver to do your testing, just run the test long enough with the priority control you are using and you will get reliable results.
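
The "run the test long enough" advice can be sketched roughly as follows. This is a portable C illustration, not the test program discussed in this thread: it assumes the C11 `timespec_get` call as a stand-in timer, and `now_seconds`, `busy_work`, and `time_for_at_least` are hypothetical names made up for the example.

```c
#include <stdint.h>
#include <time.h>

/* Wall-clock time in seconds. Assumption: C11 timespec_get is used as a
   portable stand-in for whatever timer the real test would use. */
double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical workload: stands in for the code being timed. */
volatile uint32_t lcg_sink;
void busy_work(void)
{
    uint32_t x = lcg_sink;
    for (int i = 0; i < 1000; ++i)
        x = x * 1664525u + 1013904223u;
    lcg_sink = x;
}

/* Call work() repeatedly until at least min_seconds have elapsed.
   Returns the average seconds per call and stores the number of calls
   in *calls_out, so that an interruption by the OS is averaged over
   many calls instead of distorting a single short measurement. */
double time_for_at_least(void (*work)(void), double min_seconds,
                         uint64_t *calls_out)
{
    uint64_t calls = 0;
    double start = now_seconds();
    double elapsed;

    do {
        work();
        ++calls;
        elapsed = now_seconds() - start;
    } while (elapsed < min_seconds);

    *calls_out = calls;
    return elapsed / (double)calls;
}
```

A call such as `time_for_at_least(busy_work, 0.5, &calls)` would match the half-second figure above. Reading the clock after every call adds its own cost, so for a very short workload it may be better to call it in batches between clock reads.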

To Neitsa: I found another "GetCPUSpeed" source that executes "xor eax, eax" and "cpuid" before "rdtsc", but the results are the same...
To hutch: I increased DELAY_TIME and let the test run for over 10 seconds, with the same results...
What about the APIs QueryPerformanceFrequency and QueryPerformanceCounter?

Since Windows is not 'really' multi-threaded but handles processes in a round-robin fashion, the state of a process and its threads is saved each time its time slice ends. The OS then cycles through all the other processes and finally comes back to the process that uses RDTSC.

So, does the time stamp counter continue to increment while the process is saved (and therefore frozen in a known state)?

I time code with GetTickCount() at millisecond resolution, since OS interference in ring 3 removes any advantage that a more precise timer would offer.

That's why you need to set the priority class. I use RDTSC and this trick to get the same numbers every time I run the code. (I also subtract out the loop overhead.) Having timing code that produces the same number on every run is important for accurate timing. I avoid GetTickCount like the plague because of how inaccurate it is.
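
A rough C sketch of the loop-overhead subtraction described above; it is not the poster's actual code. It assumes the `__rdtsc()` compiler intrinsic on x86 (from `<intrin.h>` with MSVC, `<x86intrin.h>` with GCC or Clang) and falls back to C11 `timespec_get` elsewhere. The names `read_ticks`, `empty_body`, `spin_body`, `measure_loop`, and `net_ticks` are hypothetical.

```c
#include <stdint.h>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define HAVE_TSC 1
#elif defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#define HAVE_TSC 1
#else
#include <time.h>
#endif

/* Read a tick counter: the TSC on x86, nanoseconds elsewhere (assumption). */
uint64_t read_ticks(void)
{
#ifdef HAVE_TSC
    return __rdtsc();
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Empty body: used to measure the cost of the loop and the call alone. */
void empty_body(void) {}

/* Hypothetical workload: stands in for the code being timed. */
volatile uint32_t spin_sink;
void spin_body(void)
{
    uint32_t x = spin_sink;
    for (int i = 0; i < 1000; ++i)
        x = x * 1664525u + 1013904223u;
    spin_sink = x;
}

/* Ticks spent running body() `iterations` times, loop cost included.
   The volatile index keeps the compiler from deleting the empty loop. */
uint64_t measure_loop(uint64_t (*ticks)(void), void (*body)(void),
                      uint32_t iterations)
{
    uint64_t start = ticks();
    for (volatile uint32_t i = 0; i < iterations; ++i)
        body();
    return ticks() - start;
}

/* Ticks attributable to body() alone: total minus the empty-loop cost,
   clamped at zero in case noise makes the overhead look larger. */
uint64_t net_ticks(uint64_t (*ticks)(void), void (*body)(void),
                   uint32_t iterations)
{
    uint64_t overhead = measure_loop(ticks, empty_body, iterations);
    uint64_t total = measure_loop(ticks, body, iterations);
    return total > overhead ? total - overhead : 0;
}
```

For example, `net_ticks(read_ticks, spin_body, 10000)` gives the ticks for 10000 calls with the empty-loop cost removed. On Windows, the priority step would be a call such as `SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS)` before measuring; it is left out here to keep the sketch portable. A C compiler gives less control over the generated loop than hand-written assembly, so the subtraction is only approximate.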

From testing over a number of years, ring 3 timing is accurate to about 3% at best on smaller samples, and it gets worse as they get smaller. Once you set a duration of over about half a second, you get results that are down around 1%, which is hard to improve on. You can perform timings in ring 0, but it's a lot more work and it does not properly represent the performance in ring 3, so there is no real gain there.