Re: [Discuss-gnuradio] oprofile inband code results

From: Eric Blossom
Subject: Re: [Discuss-gnuradio] oprofile inband code results
Date: Tue, 9 Oct 2007 16:43:43 -0700
User-agent: Mutt/1.5.9i

On Tue, Oct 09, 2007 at 05:13:08PM -0400, Brian Padalino wrote:
> I really don't know much about oprofile and haven't done much
> profiling, but I do have a question or two.
>
> Q. Since the profiler looks at the lowest function that is taking so
> much time, I find it strange that pmt_nthcdr is the second method
> listed there. Intuitively, pmt_nthcdr should just run a tight loop of
> pmt_cdr in which case I would assume pmt_cdr would be higher on the
> list but it is not. Same with pmt_nth. What might be taking so long
> within those functions that is NOT taking as long within
> pmt_cdr/pmt_car? Is something turning into an inline function which
> really yields a false profile?
>
> Q. I am surprised to see a destructor (pmt_pair::~pmt_pair())
> utilizing so much time. Are there that many pmt_pairs that have to
> get destroyed? To answer my own question, I suppose so since every
> call to pmt_cons actually creates a new pmt_pair - which might be a
> good reason why the malloc and frees are high on the list. Any idea
> why so many pmt_cons are used?

Because we use a lot of them to construct argument lists.
It would be possible to move to a pmt_vector based approach, which
would cut this down dramatically. I think we're still a bit early in
the game to start that kind of modification.
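
To make the allocation cost concrete, here is a minimal sketch of a
cons-style argument list. The names (cell, cell_ptr, cons, nthcdr, nth,
make_arg_list) and the int payload are hypothetical stand-ins, not the real
pmt API; the only point is that every argument costs one heap allocation and
one reference-counted pointer, whereas a vector of the same arguments would
need a single allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not the real pmt types.
static std::size_t g_cell_allocs = 0;  // counts cons cells created

struct cell {
    int car;                    // payload (an int here, for brevity)
    std::shared_ptr<cell> cdr;  // rest of the list
    cell(int a, std::shared_ptr<cell> d) : car(a), cdr(std::move(d)) {
        ++g_cell_allocs;
    }
};
using cell_ptr = std::shared_ptr<cell>;

// One heap allocation per call, like a cons.
cell_ptr cons(int a, cell_ptr d) {
    return std::make_shared<cell>(a, std::move(d));
}

// Follow n cdr links. Each step copies a shared pointer, so it also
// touches a reference count.
cell_ptr nthcdr(std::size_t n, cell_ptr l) {
    for (; n > 0 && l; --n) l = l->cdr;
    return l;
}

int nth(std::size_t n, const cell_ptr& l) {
    cell_ptr p = nthcdr(n, l);
    assert(p && "list is shorter than n + 1 elements");
    return p->car;
}

// Build the list back to front so the elements keep their order.
cell_ptr make_arg_list(const std::vector<int>& args) {
    cell_ptr l;
    for (auto it = args.rbegin(); it != args.rend(); ++it) l = cons(*it, l);
    return l;
}
```

Building a four-argument list with make_arg_list performs four cell
allocations, and nth(2, l) walks two links to reach the third element; the
same lookup on a std::vector<int> is a single index operation.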

I think the first thing I would try is moving to the intrusive
implementation of the boost shared pointers for the pmt types.
Then I'd look at a data-type-specific alloc/free as well as see how
the default allocator is working across multiple threads. That is,
does it already use a separate allocation pool per thread? If it
doesn't, we could speed up the allocation/free and reduce the amount
of locking required in the typical case.
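
A rough sketch of how those two ideas could fit together, under several
assumptions: it is hand-rolled modern C++ (thread_local, std::atomic) rather
than boost::intrusive_ptr, and pair_node, block_pool and local_pool are
hypothetical names, not the real pmt classes. With Boost, the add_ref and
release members below would instead be reached through the
intrusive_ptr_add_ref and intrusive_ptr_release hooks. The reference count
lives inside the object, so no separate control block is allocated, and
freed blocks go to a per-thread cache, so the common alloc/free path takes
no lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Per-thread cache of freed blocks (hypothetical helper).
struct block_pool {
    std::vector<void*> blocks;
    std::size_t reuses = 0;  // allocations served from the cache
    ~block_pool() {
        for (void* p : blocks) ::operator delete(p);  // return blocks at thread exit
    }
};

inline block_pool& local_pool() {
    thread_local block_pool pool;  // one pool per thread, no locking needed
    return pool;
}

// Hypothetical pair type with an embedded (intrusive) reference count.
class pair_node final {
public:
    pair_node(int car, pair_node* cdr) : car_(car), cdr_(cdr) {
        if (cdr_) cdr_->add_ref();  // the pair keeps its tail alive
    }
    ~pair_node() {
        if (cdr_) cdr_->release();
    }

    int car() const { return car_; }
    pair_node* cdr() const { return cdr_; }

    void add_ref() { count_.fetch_add(1, std::memory_order_relaxed); }
    void release() {
        // The last owner destroys the object.
        if (count_.fetch_sub(1, std::memory_order_acq_rel) == 1) delete this;
    }

    // Type-specific allocation: reuse a cached block when one is available.
    static void* operator new(std::size_t size) {
        assert(size == sizeof(pair_node));  // class is final, so sizes match
        block_pool& pool = local_pool();
        if (!pool.blocks.empty()) {
            void* p = pool.blocks.back();
            pool.blocks.pop_back();
            ++pool.reuses;
            return p;
        }
        return ::operator new(size);
    }

    // Freed blocks are cached on whichever thread frees them. A real version
    // would cap the cache and handle push_back failing.
    static void operator delete(void* p) {
        if (p) local_pool().blocks.push_back(p);
    }

private:
    int car_;
    pair_node* cdr_;
    std::atomic<int> count_{0};
};
```

A caller would pair each owning pointer with add_ref() and release(), for
example: pair_node* p = new pair_node(1, nullptr); p->add_ref(); ...
p->release();. Whether this beats the default allocator is exactly the kind
of thing that would need to be measured first.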

Before hacking away, I think we need to run the same test cases on
other machines besides the P4 Xeon and gather the oprofile data, as
well as the basic [ $ time <mytest> ] numbers. We may find wildly
different answers as f(microarchitecture). There's a reason Intel
isn't featuring the P4 Xeon anymore ;)

Eric