If someone feels like turning gfortran's system_clock into a nanosecond-resolution
wall-clock timer, see this page for some assembly snippets (IA32, IA64):
http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/IA32LinuxCluster/Doc/timing.html
(relevant lines pasted below in case the page disappears)
1.5 Linux Assembly code
The highest resolution but least portable timers are the Linux ASM timers. These
routines provide wall clock time.
IA64
In order to use the Linux ASM timer on the IA64 platform (titan), you will need
to compile the routine using the GNU C compiler. The routine is:
unsigned long long int nanotime_ia64(void)
{
unsigned long long int val;
__asm__ __volatile__("mov %0=ar.itc" : "=r"(val) :: "memory");
return(val);
}
IA32
On the IA32 platform (platinum) you can use either the Intel or GNU compiler.
The routine is:
unsigned long long int nanotime_ia32(void)
{
unsigned long long int val;
__asm__ __volatile__("rdtsc" : "=A" (val) : );
return(val);
}
You can link to the resulting object file with either Intel or GNU compiler from
C or Fortran with the appropriate wrapper if needed. If you extract the object
file get_clockfreq.o from /usr/lib/librt.a then you can call the function
__get_clockfreq() to determine clock frequency. To extract the routine, try:
ar xv /usr/lib/librt.a get_clockfreq.o
To use the routines as timers, you can use the following routine. Call it before
and after the section of code you want to time and the difference will be the
elapsed time. Be sure to include the appropriate routine from above.
static long int CPS;
static double iCPS;
static unsigned start=0;
/* CPU Clock Freq. in Hz from routine in /usr/lib/librt.a */
/* extern unsigned long long int __get_clockfreq(void); */
double second(void) /* Include an '_' if you will be calling from Fortran */
{
double foo;
if (!start)
{
/* CPU Clock Freq. in Hz from routine in /usr/lib/librt.a */
/* CPS=__get_clockfreq(); */
/* CPU Clock Freq. in Hz taken from /proc/cpuinfo */
CPS=800134992;
iCPS=1.0/(double)CPS;
start=1;
}
/* Uncomment one of the following */
/* foo=iCPS*nanotime_ia32(); */ /* If running on IA32 machine */
/* foo=iCPS*nanotime_ia64(); */ /* If running on IA64 machine */
return(foo);
}
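
A minimal sketch of the before/after pattern described above, assuming you keep the two raw counter readings and only convert their difference to seconds. The name elapsed_seconds and its parameters are made up for illustration and are not part of the NCSA page.

```c
#include <stdint.h>

/* Hypothetical helper (not from the NCSA page): seconds elapsed between
   two readings of a free-running tick counter. */
double elapsed_seconds(uint64_t start_ticks, uint64_t end_ticks,
                       uint64_t ticks_per_second)
{
    /* Unsigned subtraction gives the right distance even if the counter
       wrapped around once between the two readings. */
    uint64_t delta = end_ticks - start_ticks;

    /* Convert only the (small) difference to floating point. */
    return (double)delta / (double)ticks_per_second;
}
```

You would read nanotime_ia32() or nanotime_ia64() before and after the timed section and pass both values in, together with the clock frequency in Hz. Subtracting the integer readings first avoids losing precision once the counter itself has grown large.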

Confirmed, and here is the PPC32 one too (but I do not know how to get the timebase frequency).
Note: long really needs to be a 32-bit value here, and I wrote this from memory:
unsigned long long GetTimebase(void)
{
unsigned long low;
unsigned long high;
unsigned long high1;
do
{
asm volatile ("mftbu %0":"=r"(high));
asm volatile ("mftb %0":"=r"(low));
asm volatile ("mftbu %0":"=r"(high1));
} while (high != high1);
return ((unsigned long long)high)<<32ULL|(unsigned long long) low;
}
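
For anyone wondering why the upper half is read twice: if the lower half carries into the upper half between the two reads, the pair would be inconsistent, so the loop retries until the upper half is stable. Below is a portable sketch of the same loop with the register reads replaced by function pointers. The fake_* helpers are hypothetical stand-ins for mftbu/mftb so the logic can be exercised on any machine.

```c
#include <stdint.h>

/* Hypothetical stand-in for the timebase: a 64-bit counter that advances
   by one on every read, so a carry can happen between two reads. */
static uint64_t fake_counter;

static uint32_t fake_read_upper(void)
{
    return (uint32_t)(fake_counter++ >> 32);
}

static uint32_t fake_read_lower(void)
{
    return (uint32_t)(fake_counter++ & 0xFFFFFFFFu);
}

/* Read a 64-bit counter that is only accessible as two 32-bit halves. */
uint64_t read_split_counter(uint32_t (*read_upper)(void),
                            uint32_t (*read_lower)(void))
{
    uint32_t high, low, high_again;

    do {
        high = read_upper();
        low = read_lower();
        high_again = read_upper();   /* changed? then low may belong to the new high */
    } while (high != high_again);

    return ((uint64_t)high << 32) | low;
}
```

Setting fake_counter just below a 32-bit boundary and calling read_split_counter(fake_read_upper, fake_read_lower) forces one retry. Without the loop the result would combine the old upper half with the new lower half.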
PPC64 (easier as mftb gives the full timebase register for 64bit processors):
unsigned long long GetTimebase(void)
{
unsigned long long timebase;
asm volatile ("mftb %0":"=r"(timebase));
return timebase;
}
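
On the open question of the timebase frequency: one possibility, stated as an assumption rather than something verified in this thread, is that Linux on PowerPC reports it as a "timebase" line in /proc/cpuinfo. A rough sketch of picking that value up follows. Both function names are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: if `line` looks like "timebase : <number>",
   store the number in *hz and return 1; otherwise return 0. */
int timebase_from_line(const char *line, unsigned long long *hz)
{
    /* Whitespace in the format matches any run of blanks or tabs. */
    return sscanf(line, "timebase : %llu", hz) == 1;
}

/* Hypothetical helper: scan a cpuinfo-style file for the timebase line.
   Returns the frequency in Hz, or 0 if the file or the line is missing.
   ASSUMPTION: the kernel exposes such a line on this platform. */
unsigned long long read_timebase_hz(const char *path)
{
    char line[256];
    unsigned long long hz = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (timebase_from_line(line, &hz))
            break;
        hz = 0;
    }
    fclose(fp);
    return hz;
}
```

Calling read_timebase_hz("/proc/cpuinfo") would then give the divisor for the value returned by GetTimebase(). A return value of 0 means the line was not found and some other source is needed.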

(In reply to comment #0)
> If you extract the object
> file get_clockfreq.o from /usr/lib/librt.a then you can call the function
> __get_clockfreq() to determine clock frequency. To extract the routine, try:
>
> ar xv /usr/lib/librt.a get_clockfreq.o
>
> To use the routines as timers, you can use the following routine. Call it before
> and after the section of code you want to time and the difference will be the
> elapsed time. Be sure to include the appropriate routine from above.
Is this comment about get_clockfreq.o actually correct? I find that it returns different values depending on the load of the machine (I guess this is frequency rescaling at work), e.g.:
46799775 1596000000 0.029323167293233084
46703250 1596000000 0.029262687969924813
40773807 1596000000 0.02554749812030075
34589439 2394000000 0.014448387218045113
33201315 1596000000 0.020802828947368422
34758144 2394000000 0.014518857142857142
33325110 1596000000 0.020880394736842105
34576236 2394000000 0.014442872180451127
where the first number is the tick count, obtained as the difference of two nanotime_ia32 readings, the second is the number returned by get_clockfreq, and the third is the estimated time in seconds (quite random, given that it is always the same matrix multiply). (An unrelated issue is that it wraps pretty quickly...)
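
To make the arithmetic explicit: the third column is simply the first divided by the second, so the same tick count gives a time estimate that is 1.5 times larger when get_clockfreq reports 1596000000 instead of 2394000000. The sketch below reproduces that, plus the wrap period under the assumption (not stated above) that the tick count is kept in a 32-bit integer. Function names are illustrative only.

```c
#include <stdint.h>

/* Estimated time: tick count divided by the reported clock frequency. */
double ticks_to_seconds(uint64_t ticks, uint64_t reported_hz)
{
    return (double)ticks / (double)reported_hz;
}

/* Seconds until a counter of `counter_bits` bits wraps at `hz` ticks/s.
   ASSUMPTION: the counter is truncated to that many bits somewhere. */
double wrap_period_seconds(unsigned counter_bits, uint64_t hz)
{
    double span = 1.0;
    unsigned i;

    for (i = 0; i < counter_bits; i++)
        span *= 2.0;                 /* span = 2^counter_bits */
    return span / (double)hz;
}
```

For example, ticks_to_seconds(46799775, 1596000000) matches the first row of the table, and wrap_period_seconds(32, 1596000000) comes out at a bit under 2.7 seconds, which would fit "wraps pretty quickly" if a 32-bit variable is involved.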

Subject: Re: assembly snippets for nano second resolution wall clock time
"jv244 at cam dot ac dot uk" <gcc-bugzilla@gcc.gnu.org> writes:
> is this comment about get_clockfreq.o actually correct ? I find it returns
> different values depending on the load of the machine (I guess this is
> frequency rescaling at work, i.e.):
Yup, it is rescaling. It should be turned off if you want reliable
high-res measurements.
Helge