Wednesday, April 22, 2009

Tail Recursion Elimination

I recently posted an entry in my Python History blog on the origins of Python's functional features. A side remark about not supporting tail recursion elimination (TRE) immediately sparked several comments about what a pity it is that Python doesn't do this, including links to recent blog entries by others trying to "prove" that TRE can be added to Python easily. So let me defend my position (which is that I don't want TRE in the language). If you want a short answer, it's simply unpythonic. Here's the long answer:

First, as one commenter remarked, TRE is incompatible with nice stack traces: when a tail recursion is eliminated, there's no stack frame left to use to print a traceback when something goes wrong later. This will confuse users who inadvertently wrote something recursive (the recursion isn't obvious in the stack trace printed), and makes debugging hard. Providing an option to disable TRE seems wrong to me: Python's default is and should always be to be maximally helpful for debugging. This also brings me to the next issue:

Second, the idea that TRE is merely an optimization, which each Python implementation can choose to implement or not, is wrong. Once tail recursion elimination exists, developers will start writing code that depends on it, and their code won't run on implementations that don't provide it: a typical Python implementation allows 1000 recursions, which is plenty for non-recursively written code and for code that recurses to traverse, for example, a typical parse tree, but not enough for a recursively written loop over a large list.

Third, I don't believe in recursion as the basis of all programming. This is a fundamental belief of certain computer scientists, especially those who love Scheme and like to teach programming by starting with a "cons" cell and recursion. But to me, seeing recursion as the basis of everything else is just a nice theoretical approach to fundamental mathematics (turtles all the way down), not a day-to-day tool.

For practical purposes, Python-style lists (which are flexible arrays, not linked lists), and sequences in general, are much more useful to start exploring the wonderful world of programming than recursion. They are some of the most important tools for experienced Python programmers, too. Using a linked list to represent a sequence of value is distinctly unpythonic, and in most cases very inefficient. Most of Python's library is written with sequences and iterators as fundamental building blocks (and dictionaries, of course), not linked lists, so you'd be locking yourself out of a lot of pre-defined functionality by not using lists or sequences.

Last, let's look at how we could implement tail recursion elimination. The first observation is that you can't do it at compile time. I've seen at least one blog entry that used a bytecode hack to replace a CALL opcode immediately before a RETURN opcode with a jump to the top of the function body. This may be a nice demo, but unfortunately Python's compiler cannot reliably determine whether any particular call is actually reference the current function, even if it appears to have the same name. Consider this simple example:

def f(x): if x > 0: return f(x-1) return 0

It looks like you could replace the body with something like this:

if x > 0: x = x-1 <jump to top>return 0

This seems simple enough, but now add this:

g = fdef f(x): return xg(5)

The call to g(5) invokes the earlier f, but the "recursive" call no longer recurses! At run-time, the name 'f' is rebound to the later non-recursive definition, so the returned value is 4, not 0. While I agree that this particual example is bad style, it is a well-defined part of Python's semantics that has plenty of legitimate uses, and a compiler that made this replacement in the optimistic hope that f's definition will remain unchanged would introduce enough bugs in real-world code to cause an outrage.

Another blog post showed decorators that can be used to implement tail recursion using magical exceptions or return values. These can be written in plain Python (though that post shows an optimized Cython version that is claimed to be "only 10% slower", though it doesn't seem to be thread-safe). If this tickles your fancy I won't try to stop you, but I would strongly object against inclusion of something like this in the core distribution: there are many caveats to the use of such a decorator, since it has to assume that any recursive call (in the decorated function) is tail-recursive and can be eliminated. In the hands of less experienced users this could easily lead to disasters. For example, the common recursive definition of factorial is not tail-recursive:

def fact(n): if n > 1: return n * fact(n-1) return 1

There are also plenty of functions that contain a tail-recursive call and another recursive call that isn't tail-recursive; the decorators don't handle such cases. Another subtlety that those decorators don't handle is tail-recursive calls inside try blocks: these may look like they could be eliminated, but they can't, because TRE could remove the exception handling which is guaranteed by the language. For all these reasons I think the decorator approach is doomed, at least for a general audience.

Still, if someone was determined to add TRE to CPython, they could modify the compiler roughly as follows. First, determine "safe" tail-recursive call sites. This would be something like a CALL opcode immediately followed by a RETURN opcode, and completely outside any try blocks. (Note: I'm ignoring the different CALL_* opcodes, which should be easy enough to handle using the same approach.) Next, replace each such CALL-RETURN opcode pair with a single CALL_RETURN opcode. There's no need for the compiler to try and check if the name of the function being called is the same as the current function: the new opcode can represent savings for all CALL+RETURN combinations merely by saving the time to decode a second opcode. If at run time the determination is made that this particular call is not applicable for TRE, the usual actions for a CALL followed by a RETURN opcode are carried out. (I suppose you could add some kind of caching mechanism indexed by call site to speed up the run-time determination.)

In the determination of wheter TRE can be applied, there are several levels of aggressiveness that you could apply. The least aggressive, "vanilla" approach would only optimize the call if the object being called is the function that is already running in the current stack frame. All we have to do at this point is clear the locals out of the current stack frame (and other hidden state like active loops), set the arguments from the evaluation stack, and jump to the top. (Subtlety: the new arguments are, by definition, in the current stack frame. It's probably just a matter of copying them first. More subtleties are caused by the presence of keyword arguments, variable length argument lists, and default argument values. It's all a simple matter of programming though.)

A more aggressive version would also recognize the situation where a method is tail recursive (i.e. the object being called is a bound method whose underlying function is the same as the one in the current stack frame). This just requires a bit more programming; the CPython interpreter code (ceval.c) already has an optimization for method calls. ( I don't know how useful this would be though: I expect the tail recursive style to be popular with programmers who like to use a functional programming style overall, and would probably not be using classes that much. :-)

In theory, you could even optimize all cases where the object being called is a function or method written in Python, as long as the number of local variables needed for the new call can be accommodated in the current stack frame object. (Frame objects in CPython are allocated on the heap and have a variable allocation size based on the required space for the locals; there is already machinery for reusing frame objects.) This would optimize mutually tail-recursive functions, which otherwise wouldn't be optimized. Alas, it would also disable stack traces in most cases, so it would probably not be a good idea.

A more benign variant would be to create Python-level stack frames objects just like before, but reuse the C stack frame. This would create an approximation of Stackless Python, though it would still be easy enough to run out of C stack by recursing through a built-in function or method.

Of course, none of this does anything to address my first three arguments. Is it really such a big deal to rewrite your function to use a loop? (After all TRE only addresses recursion that can easily be replaced by a loop. :-)

But your last remark in parentheses still strikes me as simplified. TRE does not only help with loop-like recursions, but also with implementing continuation passing style programs.

(And if one thinks a little harder, one may come up with a less outlandish use case than CPS that's different to loops.)

Something different: Lack of TRE does not really hinder Python's FP capabilities, as you can always just encapsulate your loops in combinators. Nobody cares whether filter, map and reduce use recursion or loops in their implementation. And Python's closures work just fine.

The lack of persistent data structures does however put a damper on FP in Python. To be more presice: I wish there was a Pythonic persistent alternative to dict. Like Strings are persistent already and a pleasure to work with.

I totally agree with this post. My perspective is someone in late 20s who learned programming from the SICP / computing-the-length-of-an-array-is-a-good-idea school and switched to Python in mid 2004. I still think SICP is a great way to learn programming, but I've since come to see TRE as an abomination.

Teaching young people to use recursion for algorithms that are not recursive is a bad idea - it encourages the sort of narrow-mindedness and ham-handedness that makes for convoluted, difficult-to-understand programs.

Having said this, I don't have a serious problem with, say, GCC eliminating tail calls in a release compile as an optimization. The problem stems from the elevation of TRE to dogma and the proramming style such dogma encourages.

When you program in non-pure FPs like OCaml, where you have a choice, you get accustomed to avoiding side effects as much as you can. And last time I checked, it was generally accepted that side effects is a good source of errors. So every time you use side-effects you start to think about it as a suboptimal solution.Now when you come to Java or Python you can't safely write side-effect free code anymore because of potential stack overflows. It hurts your sense of beauty because you feel that your code as suboptimal.

Guido, why do you refer to it as TRE instead of the far more common TCO?

I was the commenter on yesterday's post that mentioned the relationship between nice stack traces and TCO — I didn't intend to say that TCO makes neat stack traces impossible, it just frustrates the process (more bookkeeping to do).

Haskell, for instance, has historically had piss-poor stack traces, but it has more to do with erasure / caching / inlining / graph-reduction than it does TCO.

On persistent vs. immutable: there may be some references in literature and even wikipedia, but most people use the term immutable for in-memory data structures that cannot be modified semantically (regardless of how shared versions are implemented), and persistency for techniques of writing stuff to disk and restoring it later. I just read half a book on Haskell and AFAIR it uses these definitions too.

On TCO vs. TRE: Sorry, I just made it up. The blog posts I reacted to were mostly talking about tail recursion, not TCO. Wikipedia (again!) defines both tail call and tail recursion, but the latter section is twice as big. :-) OTOH Google claims 10x more results for "tail call" than for "tail recursion". I stand corrected.

On side effects being bad: Booooooooooring. I did an undergraduate project on side effects in Pascal around 1977, and ABC (Python's predecessor) has copy/restore semantics to revert any side effects after a pure function returns (but it also has subroutines which can freely have side effects, but cannot be used in expressions). I don't think the benefits of enforcing side-effect-free code (even optionally) make up for the many work-arounds you have to use to get anything done in the real world. (Not even in Haskell, FWIW. :-)

Tail Call Optimization has nothing to do with side effects. It's just away to avoid making useless calls. As for the impact on stack traces I don't see why knowing what functions called which is as useful as some people think it is. If you need to know that look at the code.

In my mind, algorithmic complexity (space and time consumption, ie big-O) is critical to the semantics of an algorithm. Having the language implementation alter your semantics based on external details should be considered a side-effect.

Tail calls in most languages are implicit optimizations, so they meet that definition of a side effect. Using TCO to avoid side effects, when it is itself a side effect, is pointless.

Explicit tail calls are just fine under that definition. They're a little more awkward than being implicit, but you can also design them to be less error prone. The simplest I can think of at the moment is to have each function return (nextfunc, args, kwargs), and wrap it all in a single loop that continuously calls the nextfunc.

You will look silly if you do that when a normal loop will do. This isn't a bad thing.

I prefer working with immutable (aka persistent) data structures as much as possible, and find them easier to reason about. And when you're working with such data structures, that's when tail-call optimization and writing things in a more recursive style really shines.

But when working with Python's mutable lists and dicts, while loops are usually just fine.

I've encountered a few times where I had mutually recursive functions in Python that would have benefited from tail-call optimization (TCO isn't merely for a function calling itself), but those situations seem rare.

I believe you are looking at TCO the wrong way. As a theoretical optimization, it has nothing to do with recursion and everything to do with calling semantics. TCO can be applied in any situation where the function call does not have a continuation context surrounding it. It really only has something to do with recursion in-so-much as we have no reason to shorten the call stack in any other situation.

Note, it is possible to rewrite *any* call to pass the continuation of the current expression context rather than create a stack frame, so tail recursion isn't *just* an optimization for those who force their algorithms to use tail recursion -- it can be applied to any recursion (doing so entails overhead that is likely not worth it -- but it's possible)

Check out chapter 7 and 8 of Essentials of Programming Languages for more discussion (warning: scheme -- devolves to math halfway through chapter 8).

Of course, none of this does anything to address my first three arguments. Is it really such a big deal to rewrite your function to use a loop?If ceval.c is considered the premium style for writing programs I wonder why we came up with useful abstractions instead of creating just statemachines that manipulate the stack?

The latter is exactly the kind of rewrite that is necessary for all kinds of recursion and it is ugly, less readable and less compact.

As a daily user of the functional programming paradigm (<3 Erlang), I agree with Guido on this one. Python strives to be simple to read and simple to debug and trying to make it act like a functional language would likely make both more difficult.

TRE is incompatible with nice stack tracesSo are loops. You don't get a stack trace of where all a loop has been - exactly the same information as a tail recursive call.

Once tail recursion elimination exists, developers will start writing code that depends on it,Doesn't that mean developers find it comfortable to think about certain problems using recursion? Why deny them that intellectual tool?

Third, I don't believe in recursion as the basis of all programming.Never heard that one. That would be like claiming that "mathematical induction" is the basis of all mathematics.

'Persistent' is definitely the appropriate term, and it implies very important fundamental characteristics that are far more than simple implementation details (i.e. achieving immutability with amortized performance characteristics via lazy evaluation).

The same word can have different meanings in different contexts and communities.

I personally if I think of a function as a tail recursive function, and I'm on a language without tail call optimization, I just make the optimization myself, rewriting the function just as the optimizer would do. I do this from time to time. That said, some of those optimizations require using goto, I'm thinking of two mutually recursive functions.

I agree with Kumar here not having TCO because people might use them is a very strange argument to me. You are basically saying that you feel that using tail recursive algorithms is wrong so you won't add proper support for it. At least give the user a reasonable chance to find the best way of solving their problem.

It is easy to describe algorithms where not having TCO makes it more difficult to clearly implement an algorithm.

I completely sympathize with the argument "TCO's removal of stack frames will make debugging harder" but it seems then to avoid a possible optimization while also avoiding the real problem. If TCO has value, then also invest in solving the debugging problem. Frame the discussion as "If I have TCO and lose my stack frames, here are some ideas to improve the debugging experience" Don't attempt to explain away TCO's intrinsic value and don't attempt to contrast good and bad programming styles based on whether you want to implement a feature.

Kay, can you clarify your point? Do you mean that implementing a language using a stack machine is bad? To me, that suggests that CPython's interpreter should walk an expression tree instead of executing bytecode. But it would be much slower that way.

Or maybe you mean that a stack machine can be less ugly, and just as fast, in a language that has closures and space-efficient tail calls. There must be a way to test that hypothesis.

Anyway you chose an example that has little to do with typical real-world code in dynamic languages. Most Python programs are not fast Python interpreters!

I'm sympathetic with the idea that a syntactic construction that treats each element of a list or sequence equally is, in most cases, easier to understand than a construct that treats the first element differently from the rest. That is, I think

But I take issue with "Last, let's look at how we could implement tail recursion elimination. The first observation is that you can't do it at compile time."

As Sean said, if you think about it as compiling a call correctly, you come to the understanding that it doesn't matter if the call is recursive or not, you just need to make sure your compiler keeps track of whether the current expression is returning a value or not, and whether there is a surrounding context to return to. EoPL covers this well, and I have a very simple compiler that does this at compile time for a language, scheme, that has the same issues of dynamic rebinding of functions as Python:

http://norvig.com/paip/compile2.lisp

It is not hard at all -- the whole compiler is just a couple pages of code.

Peter, did you read all the way through to the end of my section on implementation? The "you cannot do it at compile time" argument based on dynamic rebinding was mostly aimed at some blog entries that (perhaps naively) tried to optimize self-tail-calls only.

Towards the end of my implementation section I show how you could do it for all tail calls, with a compile-time component (for determining appropriate contexts that are proper tail calls) and a run-time component.

I don't know about Scheme, but in Python you need a significant run-time component, not because of the rebinding problem, but because some things just cannot be called in this way (especially calls into extension modules, including system calls). These things need "real" C stack frames. The dynamic binding issue plays a role only because the compiler doesn't know what kind of thing is being called.

I end by rejecting this approach because it would suppress useful entries from tracebacks in many cases, not because it is unimplementable.

Guido, yes I read all the way through, and saw that your "optimize all cases" was similar to what I was saying. I objected to the statement "you can't do it at compile time" but I guess I missed the subtlety that that was someone else's blog talking, not you.

I think your analysis that you'd rather have informative stack traces than have "optimize all cases" is fine -- it comes down to which one would user's prefer and I'll trust your judgment on that. You said that you need to do some runtime analysis to decide if you can do the tail call, because of extension module calls; I was thinking of handling that case by always doing the tail call, but having extension module functions wrapped in something that would set up the frame the way it needs to be. Again, this is a tradeoff as to whether to do the work in the extension-wrapping object, or do analysis for every call.

Guido, what's a stack trace look like for a recursive function rewritten with a loop? I can get a nice trace by sticking print statements into my loop, but I don't think that's what we're after. ;-)

Anyway, it seems to me that one could provide a very nice stack trace for calls that have been optimized with TRE. Instead of just replacing the tail call with a jump, how about recording the optimized-away call somewhere other than the stack and then jumping?

You'd trade a bit of space and time in exchange for infinite (within memory constraints) recursion and nice stack traces for TRE-optimized functions.

Does any of that make sense? I can elaborate further if need be, as I have a few naive ideas about how to deal with integrating the "real" and "fake" stacks.

Josh, your design has already been implemented, in Stackless Python. However, it is not what the folks asking for TCO need -- you'd use O(N) space even for algorithms that only need O(1) space, and that would be unacceptable. It is also incompatible with my Second point -- not all Python implementations can easily adopt this approach, for example I expect it would not work in Jython and IronPython. (This incidentally explains why I have always resisted incorporating the Stackless Python design in Python.)

The point of TRE is not in allowing one to write verbose loops -- If I want a simple loop in Scheme I just use "do", itself sugar for a TRE'd named let behind the scenes. Rather, it is to make it possible to both write any loops at all in purely functional programming languages, but also to allow one to employ a declarative approach to programming, by decomposing the problem in many verifiable single steps declaring what the program does at each step rather than how.

none of which any useful. Sure, the imperative version has no such problems. I too don't see the point of having efficient recursion in an imperative language. Next people would call for multi-line, more useful lambdas than now! :P

Not having a lot of experience or knowledge of tail recursion, I am not in any position to argue about using it or not using it. I love python the way it is, and I trust you're making the right decisions.

However, a point that does bother me slightly is your general preference for loops in place of recursion. I feel like recursion is closer to the way my mind reasons about problems in many cases. And since programming languages are just a way of translating my thoughts for the cpu, it seems like a handy tool.

In any case, I would like to hear more on your preferences regarding recursion vs. loops and am hoping for another post exploring this.

2- PTC (proper tail calls) are not a mere optimization. They essentially change the space complexity of programs, and they allow code to be decentralized where loops require big spaghetti centralization.

3- lisp-style lists and recursion are completely different concepts. One does not imply the other. Tail-calls allow to build iterators on arbitrary data-structures in a modular and compositional way. (Felleisen has some nice papers on that.)

4- any Turing tar-pit is able to emulate proper tail-call through one extra level of indirection. But then in addition to the run-time cost, you pay the programmer-time cost of having to enforce convention and not being able to freely interoperate with code that doesn't use the convention.

"Is it really such a big deal to rewrite your function to use a loop? "

It is not. But I'd have to say it would definitely affect readability in the case of self recursive functions.

An interesting way to address this would be to use coroutines and instead of calling the function recursively one could send the arguments to the function. But then again, coroutines cant send values to themselves AFAIK. If they did, one could use a pattern like this:

The good thing about this is that it is explicit. It would be a fairly advanced trick and people would know what to expect from it (i.e.: a non-trivial stack trace).

Another way would be through new syntax (something different than return) for making a tail call (I know adding syntax is not to be taken lightly, please don't set me on fire). Someone mentioned Clojure which has similar syntax.

I actually find the best use case to be when you have functions that aren't just self-recursive. If you have two or three functions that recurse between each other, suddenly you have an unbounded stack problem.

I would love a syntax change. I had previously proposed "recurse", but I like send even better. If we do add syntax, I think that we should add an API. Callables should support a __send__ method that does optimized tail recursion.

The advantage of this is that we can write classes that can support __send__ and use them in concert with things like Threading to build much better actor systems.

What would be REALLY cool would be if generators used this API as well (since X.send(blah) is not quite as nice as send X(blah) ). Although, there is the problem that we are clouding the meaning what would otherwise looks like a normal function call, so it probably would be too confusing.

On the plus side, it would be very cool to have a subclass of threading.Thread that wraps a generator and will yield incoming "send" messages into generator, etc.

In the end, I'm pretty sure that to get a really good actor paradigm, we need to go further than just generators, probably with a whole new construct. Lots of things that work great in an actor model are the kind of thing that having this kind of feature is really good for.

Why not make a cleaned-up tail call an explicit variation on the return statement? Then you get the stack trace you asked for. And like what Kumar says, it's also the stack trace you probably would want in that situation.

I thought I read that Python was intended for both beginners and math types. I like Python because both those sides of myself like it.

The application that led me eventually to here, was a generator for a directory tree. It just seems wrong for each [[great...] grand] parent to loop over each [[great...] grand] child's results and re- [re-...] yield them.

I googled about that and found a discussion that said a tail-recursive "yield_all" would be very hard in Python.

Then I started trying to write an iterator subclass that does explicit continuation passing. My first try ended up having its space savings depend on a tail-recursive method call. Hmm, does that work in Python? Googling that question got me here. (No it doesn't work, back to the drawing board.)

Meanwhile, Guido, you wondered what possible use a tail- recursive method call could have. Hope this seems plausible. Imagine trees with >1000- deep chains of only children, but also bushy parts.

There are ways to do TCO in pure stack languages. What it does to the readability of your code and how it integrates with things like external C functions, are different matters.

Not Python-related, I just figured the perfect example of tail call for the layman, and had to share it with somebody:

You know how you have a browser tab whose only interesting content is 5 links, so you middle-click 4 of them to new tabs but open the 5th link in the original tab? That's tail call!

[Strictly speaking, that's imprecise because you have history (=stack) on each tab. So it's more like "tail spawn" for threads or processes: it's like some launcher shell scripts whose last line is "exec" (*). But coding-wise, in the abstraction level of a shell script, it looks exactly like tail call - because "calls" in the the shell are asynchronous.

(*) ironically, the reason launcher scripts use "exec" is precisely that they *want* to drop the "stack frame". Leaving the shell alive would just clatter the process table with no real benefit.]

Anirvan: sorry that I am late here. Its true that python does not have private function naming the way JavaScript does, and therefore you would have to write something like (replace ` with two spaces):

It is also true that unfortunately Python, like JavaScript, doesn't support macros (code that writes code), so you cannot wrap that whole beast in a simple decorator. It's a pity, it would be nice to be able to at the very least say something like:

@!self_scopeddef fact(n): ``return 1 if n < 2 else n * fact(n - 1)

...with `@!` being a sort of "macro decorator" so that you do not have to worry if someone else redefines fact. On the other hand, there is a Python principle that "we're all adults here, if I redefine 'fact' within your code that's only really going to affect me," etc. Still, you can see where it lacks power.

Beni: I am even later in responding to you, but I will make this point anyways. The right way to think about tail calls is 'iterorecursively'. You write an iterative algorithm by declaring the variables as arguments and recursing rather than looping. So you might have written an iterative factorial as:

The point is, GvR's advice here is mostly blustery crap. If I use @!tail_recursive (again with these fake macros that don't exist!), then the very least you can do is assume the @!self_scoped macro, too: whenever I refer to `name` it means the present function. The fact that the macro depends on this doesn't matter, because We Are All Adults Here, and if we say "I'm doing X" then we should be content to say "oh, look, he's doing X." The other problem that GvR highlights -- that sometimes it's syntactically incorrect to describe something as @!tail_recursive -- is probably the domain of throwing a ValueError. However, it is suggestive of the fact that we shouldn't even have to discuss this sort of thing, and tail call optimization can just happen for free on compilation. If you look at the Abelson-Sussman lectures here:

they discuss compiler design in Scheme, and the result is actually absurdly simple: there is a tiny, almost trivial difference where you realize that it doesn't matter which order two lines of compiler output are, so you put them in the opposite order that you might have expected, and then you don't leave behind any stack when the thing actually is tail-recursive, but you still gracefully handle the case where it is not (since you haven't lost generality).

So at least in principle Guido is wrong. Scheme proves that you don't need a special syntax to provide tail-call optimization, and I'm really not sure that things like debuggability fundamentally can't be done in this context.

Definitely seems to be complicated/impossible to determine a function is tail recursion 'compliant' statically in python, however, what if it were an 'opt in' feature that uses a different 'return' keyword?

def f(n): if n > 0: tailcall f(n - 1) return 0

Where tailcall would have special semantics similar to return but with the side effect of 'Tail Recursion'. You could easily use return here, but in the case where n is very large, you could replace the return with tailcall to 'fix the bug' vs rewriting the entire function to be iteration vs recursion. Arguably, the final effect of return and tailcall is the same (a value is returned), but the process in which you get there is significantly different. One results in a deep stack that possibly blows up vs the other results in a single frame that is harder to debug (but I think that could be mitigated, for instance: a tailcall frame/function could be specially marked in the stack trace).

Arguably, it would add yet another 'return-like' keyword (where yield and return are similar), but perhaps its simple and un-intrusive enough to benefit those that would like to use tail recursion.

Obviously, it's not as glamorous as being able to optimize 'return' to tail recursion automagically, but it does allow those familiar with its semantics to opt into its use. Other languages could easily implement it and it would not affect already written code.

Just had the thought, not sure if it has already been discussed. I know python does not attempt to be a functional language, but this would certainly open it up to more of a possibility.

Dan: your proposal has the redeeming quality of clearly being a language feature rather than a possible optimization. I don't really expect there to be enough demand to actually add this to the language though. Maybe you can use macropy to play around with the idea though?

TRE is explicitly enabled by the special return syntax(for example, return from ...).

with the new "return from", current frames is reused, and traceback is replaced by the further call.

using "return from" to explicitly release current frame provide more efficient gc (local variables die as soon as they were useless) and more clearly debug information (traceback of some wrapper function is totally useless, such as socket.socket.__getattr__).

just use a new syntax(for example, return from ...) for explicitly reuse current frame, and everything done.

explicitly release current frame on next function call could provide more efficient gc(local variables die as soon as they were useless) and more clean debug information(traceback of many wrapper function is totally useless, such as socket.socket.__getattr__).

My thoughts are to perhaps make TCO an opt-in feature via -O/-OO or something analogous. In that case, it would be the responsibility of the invoker of the interpreter to ensure that the program plays nice and doesn't try any bizarre tricks, just like right now one needs to make sure under -OO that docstrings aren't being used (I'm looking at you, ply). Also, under optimize mode, the neatness of stack traces is presumably irrelevant because a known side effect of optimization is the removal of debugging features.

I’ve grown to like tail recursion even though I “grew up” with Python (it was the first language I really liked). My reasons are that I found recursive functions easier to debug than loops, because the variables available at the beginning of each “iteration” are always clear. I have an example of this on my site: http://draketo.de/light/english/recursion-wins

Also it is already possible to get nice stack-traces with GCC by requiring compilation with -Og -foptimize-sibling-calls. For details see http://draketo.de/light/english/free-software/tco-debug

In a project we are working on, we are moving toward a more and more functional style, because it is much easier to reason about and this "feature" made use decide to move away from python. In python 3 functional programming style is even more discouraged.

I don't understand this decision. The debugging issue is a non issue, so what you lose some frames? It is not, that they have any interesting information in it. (f called with (3), f called with 2). And ehm, if you are really concerned about loosing information, it could be added as an optimization option to python.