If you study this carefully, you'll see that (unless the
optimizer intervenes), Python has turned a perfectly good
O(N) tree traversal into an O(N log N) traversal.

How to Fix It. I'm probably going to write a paper
with the details, and try to get it published, but here's
the crux of the matter.

Python insists on performing the two subiterations manually,
and on returning their output unchanged. But if you stack
enough of these iterations on top of each other, you can
kiss your performance goodbye (both asymptotically and
cycle-wise).

Now the compiler has a fighting change to optimize a deep
stack of iterators into reasonable code.

It turns out that you can formulate such concepts as
self-recursive iterators, tail-called iterators, etc., and
apply optimization techniques analogous to those used by
LISP compilers to optimize function calls. What's worse, you
can actually implement all of this using a
portable C back-end to your compiler. :-)