Hi guys,
I have a problem building pfmon-3.9. I know this question is rather low-level, but I have tried many things with no result.
The system I am using is Ubuntu (2.6.38 kernel), and I have already installed libpfm-3.10 successfully.
The information for this failure is:
make[1]: Entering directory
`/home/junjie/tools/PMU/pfmon-3.9/pfmon'
cc -g -ggdb -Wall -Werror -D_REENTRANT
-I/home/junjie/tools/PMU/libpfm3.10/include -DCONFIG_PFMON_X86_64 -DPFMON_DEBUG
-DDATADIR=\"/home/junjie/tools/PMU/pfmon3.9/share/pfmon\" -I.
-I/usr/include/libelf -D_GNU_SOURCE -DPFMON_DEBUG -g -c pfmon_symbols.c
cc1: warnings being treated as errors
pfmon_symbols.c: In function 'pfmon_gather_module_symbols':
pfmon_symbols.c:720:3: error: implicit declaration of function 'stat'
make[1]: *** [pfmon_symbols.o] Error 1
make[1]: Leaving directory
`/home/junjie/tools/PMU/pfmon-3.9/pfmon'
make: *** [all] Error 2
Could anyone give me a hint or a solution for this?
I have modified the config.mk according to the README.
Thanks!
Best
Junjie

On Thu, 1 Sep 2011, Ryan Johnson wrote:
> definite culprit is ctests/overflow_allcounters, but I haven't done a
> bisection search in 2.6.38 to see if there are any others.
That's a bug in the kernel that will be fixed in 3.1 (and has been
backported to the stable trees). The current PAPI CVS has a patch
that removes the trigger (overflow on software counts) from
overflow_allcounters so that people with unpatched kernels won't have
issues running the PAPI tests.
Vince

On 01/09/2011 6:45 AM, stephane eranian wrote:
> On Thu, Sep 1, 2011 at 3:29 PM, stephane eranian<eranian@...> wrote:
>> On Thu, Sep 1, 2011 at 3:06 PM, Ryan Johnson
>> <ryan.johnson@...> wrote:
>>> On 01/09/2011 1:55 AM, stephane eranian wrote:
>>>> On Thu, Sep 1, 2011 at 1:07 AM, Corey Ashford
>>>> <cjashfor@...> wrote:
>>>>> On 08/25/2011 07:19 AM, stephane eranian wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Sorry for late reply.
>>>>>>
>>>>>> The current support for mmaped count is broken on perf_event x86.
>>>>>> It simply does not work. I think it only works on PPC at this point.
>>>>> Just as an aside, you can access the counter registers from user space
>>>>> on Power (aka PPC) machines, but because the kernel is free to schedule
>>>>> the events onto whichever counters meet the resource constraints,
>>>>> it's not at all clear which hardware counter to read from user space.
>>>>> In fact, with event rotation, the counter being used can change from
>>>>> one system tick to the next.
>>>>>
>>>>> If you program a single event, you can be guaranteed that it won't move
>>>>> around, but you still will have to guess or somehow determine which
>>>>> hardware counter is being used by the kernel.
>>>>>
>>>> Yes, and that's why they have this 'lock' field in there. It's not really
>>>> a lock but rather a generation counter. You need to read it before you
>>>> attempt to read, and you need to check it when you're done reading. If the
>>>> two values don't match, the counter changed and you need to retry; a change
>>>> means the event may have moved to a different counter.
>>> This protocol is actually documented pretty well in
>>> <linux/perf_event.h>, too. Read the lock, read the index, read hw
>>> counter[index-1], read lock again to verify.
>>>
>>>> But the key problem here is the time scaling. If you are multiplexing,
>>>> you need to be able to retrieve time_enabled and time_running to scale
>>>> the count. But those are not exposed, so it does not work as soon as you
>>>> have multiplexing. Well, unless you only care about deltas and not the
>>>> absolute values.
>>> Doesn't perf_event_mmap_page expose both those, also protected by the
>>> generation counter? Or are you saying the kernel doesn't actually update
>>> those fields right now?
>>>
>> Yes, it does. I am not sure they're updated correctly, though.
>> I have not tried that in a very long time.
>>
> Did you manage to make libpfm4's self_count program work correctly?
> Even by just looking at the raw count coming out of rdpmc?
>
> I think there are issues with hdr->offset, i.e., the 64-bit sw-maintained
> base for the counter.
I only did limited testing because other things took priority the last
couple of weeks, but I'll be back on it in the next couple of weeks.
Meanwhile, here's what I know:
The machine is a Westmere EX (which is why I can't just use an older
kernel+perfctr) running kernel 2.6.38. I've got the cvs head for papi,
wired up with git version 9fc1bc1e of libpfm4. self_count seg faults by
default because rdpmc is privileged, and papi's unit tests cause the
machine to hard-lock (have to use the hypervisor to reboot). One
definite culprit is ctests/overflow_allcounters, but I haven't done a
bisection search in 2.6.38 to see if there are any others. After upgrading
to kernel 2.6.39, ctests/overflow_allcounters is the only unit-test
failure, but it "only" hard-locks the perf events infrastructure rather
than the whole machine: the unit test's process hangs with 0% CPU
utilization and becomes unkillable, and any later process attempting to
use perf events suffers the same fate. The mmap+rdpmc support is
apparently disabled in 2.6.39, in that index=0 at all times. The
self_count test runs without errors and reports monotonically increasing
values, but I never attempted to verify that the starting count was
meaningful.
For now I've rolled back to 2.6.38, since the later version is a step
backwards for my needs. With the kernel module I mentioned before,
user-level rdpmc seems to stay enabled indefinitely and self_count runs
without errors. I've extended the test slightly to run fib with
n={30,35,40}, to track which counter number it used directly (if any),
and to report the deltas between measurements. Here's the output I get:
> $ ./self_count
> raw=0xcd73 offset=0x0, ena=36278 run=36278 idx=-1 direct=0
> 52595 PERF_COUNT_HW_CPU_CYCLES (delta= cd73)
> raw=0xffff811d738b offset=0x7fffffff, ena=36278 run=36278 idx=0 direct=1
> 281474995417994 PERF_COUNT_HW_CPU_CYCLES (delta= 10000011ca617)
> raw=0xffff8d588633 offset=0x7fffffff, ena=36278 run=36278 idx=0 direct=1
> 281475200615986 PERF_COUNT_HW_CPU_CYCLES (delta= c3b12a8)
> raw=0xffff94aa8789 offset=0xfffffffe, ena=36278 run=36278 idx=0 direct=1
> 281477470914439 PERF_COUNT_HW_CPU_CYCLES (delta= 87520155)
> raw=0xffff95c33ede offset=0xfffffffe, ena=36278 run=36278 idx=0 direct=1
> 281477489311452 PERF_COUNT_HW_CPU_CYCLES (delta= 118b755)
> raw=0xffffa1ea0c92 offset=0xfffffffe, ena=36278 run=36278 idx=0 direct=1
> 281477693181072 PERF_COUNT_HW_CPU_CYCLES (delta= c26cdb4)
> raw=0xffffa8e995ed offset=0x17ffffffd, ena=36278 run=36278 idx=0 direct=1
> 281479958074858 PERF_COUNT_HW_CPU_CYCLES (delta= 86ff895a)
> raw=0xffffaa0262b9 offset=0x17ffffffd, ena=36278 run=36278 idx=0 direct=1
> 281479976477366 PERF_COUNT_HW_CPU_CYCLES (delta= 118cccc)
> raw=0xffffb6284921 offset=0x17ffffffd, ena=36278 run=36278 idx=0 direct=1
> 281480180287774 PERF_COUNT_HW_CPU_CYCLES (delta= c25e668)
Judging from the above, the offset does seem to be broken; truncated to
32 bits, perhaps? If I force it to always call read() instead, the numbers
make more sense:
> $ ./self_count
> raw=0xda66 offset=0x0, ena=39065 run=39065 idx=-1 direct=0
> 55910 PERF_COUNT_HW_CPU_CYCLES (delta= da66)
> raw=0x11dc60e offset=0x0, ena=10052007 run=10052007 idx=-1 direct=0
> 18728462 PERF_COUNT_HW_CPU_CYCLES (delta= 11ceba8)
> raw=0xd590016 offset=0x0, ena=120077612 run=120077612 idx=-1 direct=0
> 223936534 PERF_COUNT_HW_CPU_CYCLES (delta= c3b3a08)
> raw=0x9466a0de offset=0x0, ena=1334882738 run=1334882738 idx=-1 direct=0
> 2489753822 PERF_COUNT_HW_CPU_CYCLES (delta= 870da0c8)
> raw=0x957f95c4 offset=0x0, ena=1344755931 run=1344755931 idx=-1 direct=0
> 2508166596 PERF_COUNT_HW_CPU_CYCLES (delta= 118f4e6)
> raw=0xa1b53a47 offset=0x0, ena=1454582523 run=1454582523 idx=-1 direct=0
> 2713008711 PERF_COUNT_HW_CPU_CYCLES (delta= c35a483)
The counter itself seems to work fine, though, and I'd only be using it
for deltas anyway.
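For anyone reproducing this, the delta bookkeeping is just the difference of successive reads printed in hex. Here's a small self-contained sketch that formats a line the way self_count's output above looks (format_sample is a hypothetical helper of mine, not part of libpfm4; sample values are taken from the second listing):

```c
#include <stdint.h>
#include <stdio.h>

/* Format one self_count-style output line: the absolute count in decimal and
 * the delta from the previous read in hex, matching the listings above.
 * Real code would feed this successive counter reads. */
static void format_sample(char *buf, size_t n, uint64_t cur, uint64_t prev)
{
    snprintf(buf, n, "%llu PERF_COUNT_HW_CPU_CYCLES (delta= %llx)",
             (unsigned long long)cur, (unsigned long long)(cur - prev));
}
```

For example, feeding it the second and first raw values from the read()-based listing, format_sample(buf, sizeof buf, 0x11dc60e, 0xda66) produces "18728462 PERF_COUNT_HW_CPU_CYCLES (delta= 11ceba8)", matching the output above.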
Ryan

On Thu, Sep 1, 2011 at 3:29 PM, stephane eranian <eranian@...> wrote:
> On Thu, Sep 1, 2011 at 3:06 PM, Ryan Johnson
> <ryan.johnson@...> wrote:
>> On 01/09/2011 1:55 AM, stephane eranian wrote:
>>> On Thu, Sep 1, 2011 at 1:07 AM, Corey Ashford
>>> <cjashfor@...> wrote:
>>>> On 08/25/2011 07:19 AM, stephane eranian wrote:
>>>>> Hi,
>>>>>
>>>>> Sorry for late reply.
>>>>>
>>>>> The current support for mmaped count is broken on perf_event x86.
>>>>> It simply does not work. I think it only works on PPC at this point.
>>>> Just as an aside, you can access the counter registers from user space
>>>> on Power (aka PPC) machines, but because the kernel is free to schedule
>>>> the events onto whichever counters meet the resource constraints,
>>>> it's not at all clear which hardware counter to read from user space.
>>>> In fact, with event rotation, the counter being used can change from
>>>> one system tick to the next.
>>>>
>>>> If you program a single event, you can be guaranteed that it won't move
>>>> around, but you still will have to guess or somehow determine which
>>>> hardware counter is being used by the kernel.
>>>>
>>> Yes, and that's why they have this 'lock' field in there. It's not really
>>> a lock but rather a generation counter. You need to read it before you
>>> attempt to read, and you need to check it when you're done reading. If the
>>> two values don't match, the counter changed and you need to retry; a change
>>> means the event may have moved to a different counter.
>> This protocol is actually documented pretty well in
>> <linux/perf_event.h>, too. Read the lock, read the index, read hw
>> counter[index-1], read lock again to verify.
>>
>>> But the key problem here is the time scaling. If you are multiplexing,
>>> you need to be able to retrieve time_enabled and time_running to scale
>>> the count. But those are not exposed, so it does not work as soon as you
>>> have multiplexing. Well, unless you only care about deltas and not the
>>> absolute values.
>> Doesn't perf_event_mmap_page expose both those, also protected by the
>> generation counter? Or are you saying the kernel doesn't actually update
>> those fields right now?
>>
> Yes, it does. I am not sure they're updated correctly, though.
> I have not tried that in a very long time.
>
Did you manage to make libpfm4's self_count program work correctly?
Even by just looking at the raw count coming out of rdpmc?
I think there are issues with hdr->offset, i.e., the 64-bit sw-maintained
base for the counter.
>> Ryan
>>
>>
>> ------------------------------------------------------------------------------
>> Special Offer -- Download ArcSight Logger for FREE!
>> Finally, a world-class log management solution at an even better
>> price-free! And you'll get a free "Love Thy Logs" t-shirt when you
>> download Logger. Secure your free ArcSight Logger TODAY!
>> http://p.sf.net/sfu/arcsisghtdev2dev
>> _______________________________________________
>> perfmon2-devel mailing list
>> perfmon2-devel@...
>> https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
>>
>

On Thu, Sep 1, 2011 at 3:06 PM, Ryan Johnson
<ryan.johnson@...> wrote:
> On 01/09/2011 1:55 AM, stephane eranian wrote:
>> On Thu, Sep 1, 2011 at 1:07 AM, Corey Ashford
>> <cjashfor@...> wrote:
>>> On 08/25/2011 07:19 AM, stephane eranian wrote:
>>>> Hi,
>>>>
>>>> Sorry for late reply.
>>>>
>>>> The current support for mmaped count is broken on perf_event x86.
>>>> It simply does not work. I think it only works on PPC at this point.
>>> Just as an aside, you can access the counter registers from user space
>>> on Power (aka PPC) machines, but because the kernel is free to schedule
>>> the events onto whichever counters meet the resource constraints,
>>> it's not at all clear which hardware counter to read from user space.
>>> In fact, with event rotation, the counter being used can change from
>>> one system tick to the next.
>>>
>>> If you program a single event, you can be guaranteed that it won't move
>>> around, but you still will have to guess or somehow determine which
>>> hardware counter is being used by the kernel.
>>>
>> Yes, and that's why they have this 'lock' field in there. It's not really
>> a lock but rather a generation counter. You need to read it before you
>> attempt to read, and you need to check it when you're done reading. If the
>> two values don't match, the counter changed and you need to retry; a change
>> means the event may have moved to a different counter.
> This protocol is actually documented pretty well in
> <linux/perf_event.h>, too. Read the lock, read the index, read hw
> counter[index-1], read lock again to verify.
>
>> But the key problem here is the time scaling. If you are multiplexing,
>> you need to be able to retrieve time_enabled and time_running to scale
>> the count. But those are not exposed, so it does not work as soon as you
>> have multiplexing. Well, unless you only care about deltas and not the
>> absolute values.
> Doesn't perf_event_mmap_page expose both those, also protected by the
> generation counter? Or are you saying the kernel doesn't actually update
> those fields right now?
>
Yes, it does. I am not sure they're updated correctly, though.
I have not tried that in a very long time.
> Ryan

On 01/09/2011 1:55 AM, stephane eranian wrote:
> On Thu, Sep 1, 2011 at 1:07 AM, Corey Ashford
> <cjashfor@...> wrote:
>> On 08/25/2011 07:19 AM, stephane eranian wrote:
>>> Hi,
>>>
>>> Sorry for late reply.
>>>
>>> The current support for mmaped count is broken on perf_event x86.
>>> It simply does not work. I think it only works on PPC at this point.
>> Just as an aside, you can access the counter registers from user space
>> on Power (aka PPC) machines, but because the kernel is free to schedule
>> the events onto whichever counters meet the resource constraints,
>> it's not at all clear which hardware counter to read from user space.
>> In fact, with event rotation, the counter being used can change from
>> one system tick to the next.
>>
>> If you program a single event, you can be guaranteed that it won't move
>> around, but you still will have to guess or somehow determine which
>> hardware counter is being used by the kernel.
>>
> Yes, and that's why they have this 'lock' field in there. It's not really
> a lock but rather a generation counter. You need to read it before you
> attempt to read, and you need to check it when you're done reading. If the
> two values don't match, the counter changed and you need to retry; a change
> means the event may have moved to a different counter.
This protocol is actually documented pretty well in
<linux/perf_event.h>, too. Read the lock, read the index, read hw
counter[index-1], read lock again to verify.
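For concreteness, here's a runnable sketch of that retry loop. The struct below only mocks the perf_event_mmap_page fields involved (the real layout is in <linux/perf_event.h>), and the counter read is stubbed so the snippet runs without a live perf fd; mock_rdpmc and read_self_count are my names, not libpfm4's:

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in for the relevant perf_event_mmap_page fields, so this sketch
 * compiles and runs without perf_event_open(). */
struct mmap_page {
    uint32_t lock;   /* generation counter (not a real lock) */
    uint32_t index;  /* hw counter number + 1; 0 means rdpmc is unusable */
    int64_t  offset; /* 64-bit sw-maintained base added to the raw count */
};

/* Stub for the rdpmc instruction reading hw counter 'c'; on real x86 this
 * would be inline asm or __builtin_ia32_rdpmc(c). */
static uint64_t mock_rdpmc(uint32_t c) { (void)c; return 0xcd73; }

/* The protocol: read the lock, read the index, read hw counter[index-1],
 * re-read the lock; if the generation changed, the event may have moved to
 * a different counter, so retry.  Returns -1 when index==0 (caller must
 * fall back to read(2)), 0 on success. */
static int read_self_count(volatile struct mmap_page *pg, uint64_t *out)
{
    uint32_t seq, idx;
    uint64_t count;
    do {
        seq = pg->lock;
        __sync_synchronize();          /* order the reads against the lock */
        idx = pg->index;
        if (idx == 0)
            return -1;
        count = (uint64_t)(pg->offset + (int64_t)mock_rdpmc(idx - 1));
        __sync_synchronize();
    } while (pg->lock != seq);         /* generation changed: retry */
    *out = count;
    return 0;
}
```

On a real kernel, pg would point at the page mmap'd over the perf_event_open() fd, and idx - 1 would go to the actual rdpmc instruction.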
> But the key problem here is the time scaling. If you are multiplexing,
> you need to be able to retrieve time_enabled and time_running to scale
> the count. But those are not exposed, so it does not work as soon as you
> have multiplexing. Well, unless you only care about deltas and not the
> absolute values.
Doesn't perf_event_mmap_page expose both those, also protected by the
generation counter? Or are you saying the kernel doesn't actually update
those fields right now?
Ryan

On Thu, Sep 1, 2011 at 1:07 AM, Corey Ashford
<cjashfor@...> wrote:
> On 08/25/2011 07:19 AM, stephane eranian wrote:
>> Hi,
>>
>> Sorry for late reply.
>>
>> The current support for mmaped count is broken on perf_event x86.
>> It simply does not work. I think it only works on PPC at this point.
>
> Just as an aside, you can access the counter registers from user space
> on Power (aka PPC) machines, but because the kernel is free to schedule
> the events onto whichever counters meet the resource constraints,
> it's not at all clear which hardware counter to read from user space.
> In fact, with event rotation, the counter being used can change from
> one system tick to the next.
>
> If you program a single event, you can be guaranteed that it won't move
> around, but you still will have to guess or somehow determine which
> hardware counter is being used by the kernel.
>
Yes, and that's why they have this 'lock' field in there. It's not really
a lock but rather a generation counter. You need to read it before you
attempt to read, and you need to check it when you're done reading. If the
two values don't match, the counter changed and you need to retry; a change
means the event may have moved to a different counter.
But the key problem here is the time scaling. If you are multiplexing,
you need to be able to retrieve time_enabled and time_running to scale
the count. But those are not exposed, so it does not work as soon as you
have multiplexing. Well, unless you only care about deltas and not the
absolute values.
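The scaling rule itself is simple linear extrapolation: the kernel only ran the event for time_running out of time_enabled nanoseconds, so you multiply the raw count by their ratio. A sketch, assuming the two times came back from read(2) on an event opened with PERF_FORMAT_TOTAL_TIME_ENABLED and PERF_FORMAT_TOTAL_TIME_RUNNING (scale_count is my name for it):

```c
#include <stdint.h>

/* Extrapolate a multiplexed event's count: scale raw by enabled/running.
 * When the event was never scheduled (running == 0) there is no meaningful
 * estimate, so return 0. */
static uint64_t scale_count(uint64_t raw, uint64_t enabled, uint64_t running)
{
    if (running == 0)
        return 0;
    /* long double intermediate avoids overflowing raw * enabled in 64 bits */
    return (uint64_t)((long double)raw * (long double)enabled
                      / (long double)running);
}
```

When the event ran the whole time (enabled == running, as in the ena=run lines in Ryan's output below), the scaled value equals the raw count.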
> - Corey