Vitja Makarov, 26.01.2012 21:19:
> 2012/1/27 Stefan Behnel:
>> Robert Bradshaw, 21.01.2012 23:09:
>>> On Sat, Jan 21, 2012 at 10:50 AM, Stefan Behnel wrote:
>>>> I did some callgrind profiling on Cython's generators and was surprised to
>>>> find that AddTraceback() represents a serious performance penalty for short
>>>> running generators.
>>>>>>>> I profiled a compiled Python implementation of itertools.groupby(), which
>>>> yields (key, group) tuples where the group is an iterator again. I ran this
>>>> code in Python for benchmarking:
>>>>>>>> """
>>>> L = sorted(range(1000)*5)
>>>>>>>> all(list(g) for k,g in groupby(L))
>>>> """
>>>>>>>> Groups tend to be rather short in real code, often just one or a couple of
>>>> items, so unpacking the group iterator into a list will usually be a quick
>>>> loop and then the generator raises StopIteration on termination and builds
>>>> a traceback for it. According to callgrind (which, I should note, tends to
>>>> overestimate the amount of time spent in memory allocation), the iteration
>>>> during the group unpacking takes about 30% of the overall runtime of the
>>>> all() loop, and the AddTraceback() call at the end of each group traversal
>>>> takes up to 25% (!) on my side. That means that more than 80% of the group
>>>> unpacking time goes into raising StopIteration from the generators. I
>>>> attached the call graph with the relative timings.
>>>>>>>> About half of the exception raising time is eaten by PyString_FromFormat()
>>>> that builds the function-name + line-position string (which, I may note, is
>>>> basically a convenience feature). This string is a constant for a
>>>> generator's StopIteration exception, at least for each final return point
>>>> in a generator, but here it is being recreated over and over again, for
>>>> each exception that gets raised.
>>>>>>>> Even if we keep creating a new frame instance each time (which should be ok
>>>> because CPython has a frame instance cache already and we'd only create one
>>>> during the generator lifetime), the whole code object could actually be
>>>> cached after the first creation, preferably bound to the lifetime of the
>>>> generator creator function/method. Or, more generally, one code object per
>>>> generator termination point, which will be a single point in the majority
>>>> of cases. For the specific code above, that should shave off almost 20% of
>>>> the overall runtime of the all() loop.
>> I think that could be easily fixed. CPython doesn't add any traceback
> info for generator's ending.
>>https://github.com/vitek/cython/commit/63620bc2a29f3064bbdf7a49eefffaae4e3c369d
Interesting. It doesn't solve the general problem of slow exceptions, but I
think that's a beautiful way of fixing this particular performance problem
for generators. Here are the timings. Your change made the above code twice
as fast!
Cython compiled Python implementation of itertools.groupby(), *without*
your change:
python2.7 -m timeit -s 'from cytertools import groupby; \
L=sorted(range(1000)*5)' 'all(list(g) for k,g in groupby(L))'
100 loops, best of 3: 3.76 msec per loop
Cython compiled Python implementation of itertools.groupby(), *with* your
change:
python2.7 -m timeit -s 'from cytertools import groupby; \
L=sorted(range(1000)*5)' 'all(list(g) for k,g in groupby(L))'
1000 loops, best of 3: 1.81 msec per loop
The real itertools, for comparison:
python2.7 -m timeit -s 'from itertools import groupby; \
L=sorted(range(1000)*5)' 'all(list(g) for k,g in groupby(L))'
1000 loops, best of 3: 1.31 msec per loop
That's close.
Stefan