Re: [Gumstix-users] nanosleep

I'm seeing some strange quantization when I sample OSClock 0 (OSCR0) in a tight loop. (I'm using memory mapping like that used in the gpregs (GPIO) sample code.)
According to the PXA270 documentation, OSCR0 ticks at 3.25 MHz; I've confirmed that via manual timing of busy code running for several minutes.
However, when I sample OSCR0 in a tight loop, over 1000 samples the clock ticks an average of 3.5 times between samples - just over 1 microsecond per sample (removing the very few anomalous samples when Linux was obviously busy doing something else.)
I'm running the test on an XM4; at 400 MHz, I'd expect to loop more than a few times recording the same clock value before seeing the clock change; instead, it appears that over 400 instruction cycles have gone by between samples.
Here's the core of the test (OSCR0 is defined as 0x40A00010):
getmem(OSCR0);
for (i = 0; i < N; i++) {
    times[i] = getmem(OSCR0);
}
for (i = 0; i < N; i++) {
    printf("%u\n", times[i]);
}
Here's the result for N = 10:
# ./clock
2250397945
2250397948
2250397951
2250397955
2250397958
2250397961
2250397964
2250397967
2250397970
2250397973
#
Here's something else that's funny: if you remove the first getmem(OSCR0), the sample values never change, even over 3000 samples.
I've pored over the PXA270 Processor Developer's Manual looking for an explanation. The closest I've come is an isolated remark that OSCR0 ticks at 3.25 MHz "with a resolution of 1 microsecond" - but no explanation of what that means. I'd still expect to see a number of samples go by without changing before seeing a change - even if that change averages 3.25 ticks. Instead, it appears that the act of looking at OSCR0 causes the program to hang for ~1 usec.
I'm seeing almost the same behavior when running a uboot standalone app which directly accesses OSCR0 (instead of via memory mapping).
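
The arithmetic behind the "over 400 instruction cycles" estimate can be sketched as follows. This is only an illustration: the 3.25 MHz tick rate and 400 MHz core clock are the figures quoted in the message, and the helper names are hypothetical. Fed the ten samples posted above, it gives about 3.1 ticks per sample, which is roughly 0.96 microseconds or about 383 core cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OSCR0_HZ 3250000.0    /* OSCR0 tick rate quoted in the message */
#define CPU_HZ   400000000.0  /* 400 MHz core clock quoted in the message */

/* Average number of OSCR0 ticks between consecutive samples.
 * Unsigned 32-bit subtraction stays correct across a counter wrap. */
static double mean_ticks_per_sample(const uint32_t *samples, size_t n)
{
    assert(samples != NULL);
    if (n < 2)
        return 0.0;
    return (double)(uint32_t)(samples[n - 1] - samples[0]) / (double)(n - 1);
}

/* Convert a tick count to microseconds. */
static double ticks_to_usec(double ticks)
{
    return ticks / OSCR0_HZ * 1e6;
}

/* Convert a tick count to core clock cycles. */
static double ticks_to_cpu_cycles(double ticks)
{
    return ticks / OSCR0_HZ * CPU_HZ;
}
```

Calling mean_ticks_per_sample(times, N) on the recorded array and passing the result to the two converters gives the per-sample cost in both units.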
----- Original Message ----
From: pvm <pvmorici@...>
Sent: Saturday, December 22, 2007 2:23:31 PM
Grahame Jordan wrote:
>
> Mmmmm? Where, who? That is just nuts - 20ms. Anyone would think that
> usleep(1) would sleep 1us?
>
Is what you are doing in kernel or user space? If you want very accurate
delay times you should use the udelay function in the kernel. Since the
idea of sleeping means the process will be taken off the run queue for
some amount of time, it makes sense that the sleep times would have to be
in multiples of the scheduling time slice.
If you want accurate delays in user space you could write your own delay
loop in asm using some instruction that has a known number of clock
cycles, etc.
--
_______________________________________________
gumstix-users mailing list
gumstix-users@...
https://lists.sourceforge.net/lists/listinfo/gumstix-users
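
A user-space delay that does not give up the CPU might look roughly like the sketch below. Note that it swaps in a different technique from the one suggested above: instead of a hand-counted asm loop it polls the POSIX monotonic clock, which is easier to show portably. The function names are hypothetical, and the delay is only a lower bound, since the scheduler can still preempt the thread while it spins.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static uint64_t monotonic_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Spin until at least `usec` microseconds have elapsed.
 * Returns the time actually spent, in nanoseconds. */
static uint64_t busy_wait_us(uint32_t usec)
{
    const uint64_t start = monotonic_ns();
    const uint64_t target = (uint64_t)usec * 1000ull;
    uint64_t elapsed;

    do {
        elapsed = monotonic_ns() - start;
    } while (elapsed < target);
    return elapsed;
}
```

Comparing the returned value against the request shows how much extra time each call really cost on a given board.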

Changing
#define OSCR0 0x40A00010
to
#define OSCR0 ((int volatile) 0x40A00010)
makes no difference; sampling OSCR0 returns the same value no matter how many times it's sampled.
Compiling with all optimizations off fixes the secondary issue.
But the primary issue remains: sampling OSCR0 in a tight loop seems to hang each sampling for almost 2 microseconds.
I've got an application that needs at least 1 microsecond resolution on the timer. It's also got lots that it has to get done between sampling OSCR0, so it can't afford to hang for ~700 instruction cycles.
Does anybody at gumstix have a clue about what's going on here?
----- Original Message ----
From: David Cary <d.cary+sourceforge.net@...>
To: General mailing list for gumstix users. <gumstix-users@...>
Sent: Wednesday, December 26, 2007 11:57:29 PM
Subject: Re: [Gumstix-users] nanosleep
Dear Tom Shepard,
> Date: Sat, 22 Dec 2007 13:30:28 -0800 (PST)
> From: Tom Shepard <tom_shepard@...>
...
> Here's the core of the test (OSCR0 is defined as 0x40A00010):
>
> getmem(OSCR0);
>
> for (i = 0; i < N; i++) {
> times[i] = getmem(OSCR0);
> }
>
> for (i = 0; i < N; i++) {
> printf("%u\n", times[i]);
> }
...
> Here's something else that's funny: if you remove the first getmem(OSCR0),
> the sample values never change, even over 3000 samples.
...
Dear Tom Shepard,
I'm mystified by the other things you mention, but "values never
change" is a classic sign of missing "volatile" declarations.
I suspect that OSCR0 should be defined as something like
#define OSCR0 ( (int volatile *)0x40A00010 )
.
If somehow it is being defined as
#define OSCR0 ( (int *)0x40A00010 )
then the compiler may (or may not) "optimize" away repeated reads,
re-using the first read cached in a register, giving the "values never
change" problem you see.
I've heard that some compilers have had a bug, causing them to
improperly ignore the "volatile" qualifier in some specific programs
under some specific optimization levels -- do you still get the
"values never change" problem if you re-compile with all optimizations
turned off?

Hi Tom,
On Dec 27, 2007 6:15 PM, Tom Shepard <tom_shepard@...> wrote:
>
> Changing
>
> #define OSCR0 0x40A00010
> to
> #define OSCR0 ((int volatile) 0x40A00010)
To make a difference it would need to be (*(volatile int *)0x40A00010),
and that would only work from within kernel mode.
In user mode, you also need to use a volatile pointer, but in this
case, it's the pointer into the memory-mapped region. It would also be
good to see the call you used to do the mmap.
--
Dave Hylands
Vancouver, BC, Canada
http://www.DaveHylands.com/
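
For reference, a user-mode mapping of the kind described above might look roughly like this. It is a sketch under assumptions, not the code from the gpregs sample: the struct and function names are hypothetical, and on the target the path would presumably be "/dev/mem" with phys set to 0x40A00010. The point is that the volatile qualifier goes on the pointer into the mapped page, not on the physical address constant.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* One mapped page plus a pointer to the register inside it. */
struct reg_map {
    void *base;              /* start of the mapping, needed for munmap() */
    size_t len;              /* length of the mapping in bytes */
    volatile uint32_t *reg;  /* the 32-bit register itself */
};

/* Map the page of `path` containing byte offset `phys` and point at the
 * 32-bit word there. Returns 0 on success, -1 if open() or mmap() fails. */
static int map_register(const char *path, off_t phys, struct reg_map *out)
{
    assert(path != NULL && out != NULL && (phys & 3) == 0);

    const long page = sysconf(_SC_PAGESIZE);
    const off_t page_base = phys & ~((off_t)page - 1);

    int fd = open(path, O_RDONLY | O_SYNC);
    if (fd < 0)
        return -1;
    void *base = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, page_base);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED)
        return -1;

    out->base = base;
    out->len = (size_t)page;
    out->reg = (volatile uint32_t *)((unsigned char *)base + (phys - page_base));
    return 0;
}

/* Release a mapping created by map_register(). */
static void unmap_register(struct reg_map *m)
{
    munmap(m->base, m->len);
}
```

After a successful call, each evaluation of *m.reg is a fresh read of the register, so times[i] = *m.reg; in the loop cannot be collapsed into a single load.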

Dear Tom Shepard,
> Date: Sat, 22 Dec 2007 13:30:28 -0800 (PST)
> From: Tom Shepard <tom_shepard@...>
...
> Here's the core of the test (OSCR0 is defined as 0x40A00010):
>
> getmem(OSCR0);
>
> for (i = 0; i < N; i++) {
> times[i] = getmem(OSCR0);
> }
>
> for (i = 0; i < N; i++) {
> printf("%u\n", times[i]);
> }
...
> Here's something else that's funny: if you remove the first getmem(OSCR0),
> the sample values never change, even over 3000 samples.
...
Dear Tom Shepard,
I'm mystified by the other things you mention, but "values never
change" is a classic sign of missing "volatile" declarations.
I suspect that OSCR0 should be defined as something like
#define OSCR0 ( (int volatile *)0x40A00010 )
.
If somehow it is being defined as
#define OSCR0 ( (int *)0x40A00010 )
then the compiler may (or may not) "optimize" away repeated reads,
re-using the first read cached in a register, giving the "values never
change" problem you see.
I've heard that some compilers have had a bug, causing them to
improperly ignore the "volatile" qualifier in some specific programs
under some specific optimization levels -- do you still get the
"values never change" problem if you re-compile with all optimizations
turned off?
"volatile"
http://en.wikibooks.org/wiki/Embedded_Systems/C_Programming#volatile
--
David Cary
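
To make the volatile point concrete, here is a minimal sampling loop written against a pointer to a volatile object. The function names are hypothetical; on the target, reg would be whatever pointer the memory mapping yields for OSCR0. Because the pointee is volatile, the compiler has to perform one load per iteration rather than reusing a value held in a register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n successive reads of *reg into out[]. The volatile qualifier on
 * the pointee forces a separate load for every iteration. */
static void sample_register(volatile const uint32_t *reg, uint32_t *out, size_t n)
{
    assert(reg != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = *reg;
}

/* Number of samples that differ from the one before them. */
static size_t count_changes(const uint32_t *samples, size_t n)
{
    size_t changes = 0;

    for (size_t i = 1; i < n; i++)
        if (samples[i] != samples[i - 1])
            changes++;
    return changes;
}
```

A count of zero from count_changes() over a few thousand samples of a running timer would point at the reads being optimized away, or at the wrong address being read.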

I am a little confused about this. In general I want to get a sure
understanding of creating delays/sleeps that are actually what I ask for,
+- some reasonable resolution. 20 msec is not reasonable for
usleep(1-19000) or even for nanosleep(xxx).
This particular program is written in user space, with threads.
One thread is reading a Liquid Flow Meter via I2C.
Another thread is controlling a character LCD.
The LCD requires sub-usec sleeps to operate at a nice speed, which I am
implementing with gettimeofday() * n (~5 us), which is ugly.
The Flow Meter needs to sleep for specific (accurate) amounts of time in
the msec range, < 20 msec in some cases, to enable accurate flow readings.
The sleeps need to be non-blocking so that the other thread can do its bit.
I would like to be sure, at least in the case of the Flow Meter, that the
sleeps are within +- 0.5 msec every time so that I can get accurate
flow/volume calculations. However, usleep should be able to do the
same to within +- 0.5 usec.
Linux sometimes does things that exceed the time of the sleeps. Is there
a method to get the usec sleeps to be what I ask for? An example would
be nice.
Help and guidance will be much appreciated.
Thanks
Grahame
Dan McDonald - Gumstix wrote:
> I don't know what the prototype for getmem is like. It may be discarding
> the volatile cast.
>
> try a direct memory access with
>
> times[i] = *OSCR0;
>
> where OSCR0 is a pointer to a volatile unsigned int.
>
> Later,
> Dan McDonald

Hi Grahame,
On Jan 8, 2008 9:38 PM, Grahame Jordan <gbj@...> wrote:
> I am a little confused about this. In general I want to get a sure
> understanding of creating delays/sleeps that are actually what I ask for,
> +- some reasonable resolution. 20 msec is not reasonable for
> usleep(1-19000) or even for nanosleep(xxx).
>
> This particular program is written in user space, with threads.
>
> One thread is reading a Liquid Flow Meter via I2C.
> Another thread is controlling a character LCD.
>
> The LCD requires sub-usec sleeps to operate at a nice speed, which I am
> implementing with gettimeofday() * n (~5 us), which is ugly.
> The Flow Meter needs to sleep for specific (accurate) amounts of time in
> the msec range, < 20 msec in some cases, to enable accurate flow readings.
Most LCDs only update 30-60 times per second (so 16-33 msec).
Getting accurate fine grained sleeps like that will probably require
writing a kernel driver.
> The sleeps need to be non-blocking so that the other thread can do its bit.
> I would like to be sure, at least in the case of the Flow Meter, that the
> sleeps are within +- 0.5 msec every time so that I can get accurate
> flow/volume calculations. However, usleep should be able to do the
> same to within +- 0.5 usec.
>
> Linux sometimes does things that exceed the time of the sleeps. Is there
> a method to get the usec sleeps to be what I ask for? An example would
> be nice.
I think you meant that you want blocking sleeps (ones that allow other
threads to run). The thread issuing the sleep is the one that blocks.
Anyway, any type of blocking event which employs a timeout uses
jiffies as the metric for the length of the timeout. By default, the
jiffy period is 10 msec. You could rebuild the kernel and decrease the
jiffy time to, say, 1 msec.
The next best thing would be to write a kernel driver which uses one
of the timers to generate interrupts at appropriate intervals, and
have a kernel thread do most of the work.
--
Dave Hylands
Vancouver, BC, Canada
http://www.DaveHylands.com/
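
The effect of jiffy granularity on short sleeps can be estimated with a small model. The model itself is an assumption, not something taken from the Gumstix kernel source: it supposes that a timeout is rounded up to whole jiffies and that one more jiffy is added because the sleep begins partway through the current one. The function names are hypothetical. Under that assumption a 1 usec request with a 10 msec jiffy can take up to 20 msec, and with a 1 msec jiffy up to 2 msec.

```c
#include <assert.h>
#include <stdint.h>

/* Jiffies a sleep of `usec` microseconds is assumed to cost on a tick-based
 * kernel running at `hz` ticks per second: round up to whole jiffies, then
 * add one for the partially elapsed current jiffy (modelling assumption). */
static uint64_t sleep_jiffies(uint64_t usec, uint32_t hz)
{
    assert(hz > 0);
    if (usec == 0)
        return 0;
    uint64_t whole = (usec * hz + 999999u) / 1000000u;
    return whole + 1;
}

/* Upper bound on the resulting delay, in microseconds. */
static uint64_t worst_case_sleep_usec(uint64_t usec, uint32_t hz)
{
    return sleep_jiffies(usec, hz) * 1000000u / hz;
}
```

Evaluating worst_case_sleep_usec() for the intended sleep lengths at 100 Hz and 1000 Hz gives a quick feel for whether rebuilding the kernel with a shorter jiffy would be enough, or whether a timer-driven kernel driver is needed.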