This patch adds f_func to PyFrameObject and sets it for functions that get called (in PyFrame_New). For classes and modules it is set to None. The difference in performance was not noticeable, as far as I could tell. However, I am willing to do more than just time 'make test' a few times if there is any concern.
A couple of weeks ago, a thread on python-ideas centered on the subject matter of PEP 3130[1]. The discussion started with, and mainly involved, the idea of adding __function__ to the frame locals during execution of the function. __function__ would point to the function that was called, i.e. the function whose call created the frame.
I spent quite a bit of time getting this to work using a closure, but the result was overkill. It also made it too easy to use __function__ for recursion, which Guido did not like (and I agree).
At this point it dawned on me that it would be much simpler to just add the called function to the frame object. This patch is the result. In the end it is much more efficient than the locals approach I had been taking.
[1] http://mail.python.org/pipermail/python-ideas/2011-August/011062.html

Thanks for the review, Nick. I'll be uploading a new patch in a couple hours with your recommended fixes.
Regarding the comments on python-ideas, would it be better to use a weakref proxy around the function, to help with the reference cycles? That's what I was doing with my closure-based solution. I didn't do it here just to see what would happen and I didn't see any problems in my very limited testing (basically just 'make test'). I don't mind using weakrefs and, if it matters, I could pre-allocate the weakref proxy in PyFunction_New to save a little overhead at each call.
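For reference, a weakref proxy can stand in for the function almost transparently. A minimal sketch of the idea (illustration only, not the patch itself):

```python
import weakref

def f():
    return 42

# A proxy behaves like the function for attribute access and calls,
# but does not keep the function alive, so storing it on the frame
# would not create a frame -> function reference cycle.
proxy = weakref.proxy(f)
assert proxy.__name__ == "f"
assert proxy() == 42
```

Pre-allocating the proxy once in PyFunction_New would correspond to doing the `weakref.proxy(f)` step a single time per function rather than per call.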
For the moment I left in the code to limit f_func to only functions. I'll respond to that on python-ideas.
Finally, how does this patch relate to the ABI? I'm not too familiar with it (read through PEP 384) and want to make sure I'm okay here.

Finally had a chance to get back to this. Here's a new patch in response to Nick's review.
My only concern is the new _PyEval_EvalFunctionCode function. It is basically the existing PyEval_EvalCodeEx function with an extra parameter. I've turned PyEval_EvalCodeEx() into a very light wrapper around _PyEval_EvalFunctionCode(). This is how I tackled the backwards incompatibility of my previous patch. However, I am nervous about messing around with something like PyEval_EvalCodeEx().
I was toying with the idea of doing a more comprehensive refactor involving call_function, fast_function, and do_call, which seems like it would be a better fix. However, I'm even more reluctant to make changes on that scale, especially to something so critical, even if it seems right to me at the moment.
I figured the safe thing for now would be to name the new function with a leading underscore to mark it as private.
Here are the results from my cursory check before and after my latest patch:
BEFORE

    3 tests failed:
        test_distutils test_packaging test_socket
    [293234 refs]

    real    14m40.578s
    user    11m43.547s
    sys     0m52.096s

AFTER

    3 tests failed:
        test_distutils test_packaging test_socket
    [293171 refs]

    real    14m33.785s
    user    11m55.437s
    sys     0m53.850s

My response to a similar query:
What it enables is the ability to introspect the *actual* function object belonging to the currently executing code, whether in the same execution frame or in a later one on the stack. We don't have that now. The best we can do is guess, and I'm trying to refuse the temptation by building in an alternative that doesn't get in the way (and doesn't provide an attractive nuisance).
My motivation for this patch was to be able to find out, in one of my functions, what function called it and to interact with that function object. The name or file represented in the code object on the stack didn't cut it; and decorators and function factories make the whole thing a mess. Guido sounded rather positive to the idea, which is why I pursued it. From Guido's post to which I linked in the issue[1]:
> For me the use case involves determining what function called my
> function. Currently you can tell in which execution frame a function
> was called, and thereby which code object, but reliably matching that
> to a function is not so simple. I recognize that my case is likely
> not a general one.
But it is a nice one. It addresses some issues that pdb currently works around
by just using a file/line reference.
Not that it necessarily matters, but I could dig up the bit of code that motivated the whole thing. I recall it was something related to writing an enum library (a la flufl.enum). Anyway, I'd love to see this move forward.
[1] http://mail.python.org/pipermail/python-ideas/2011-August/011062.html

This topic came up in a discussion I was having with Yury, and thanks to generators and coroutines, I don't think "f_func" would be the right name for an attribute like this; something more neutral like "f_origin" would potentially be preferable.
The specific runtime debugging use case I have in mind is this:
1. Given a frame reference, navigate to the generator-iterator or coroutine that spawned that frame
2. Add gi_back and cr_back, doubly-linked counterparts to the current gi_yieldfrom and cr_await introspection attributes on generators and coroutines, and use them to expose the relevant generator and/or coroutine stack in addition to the frame stack
These secondary stacks could then potentially be incorporated into traceback output by default.
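The forward half of that chain already exists; walking it with today's `gi_yieldfrom` looks like the sketch below, and the proposed `gi_back` would allow the same traversal in reverse, starting from a frame:

```python
def inner():
    yield 1

def outer():
    yield from inner()

g = outer()
next(g)  # suspend outer inside "yield from inner()"

# Follow the existing forward links from the outermost generator down
# to the one that is actually suspended at a plain yield.
chain = []
gen = g
while gen is not None:
    chain.append(gen.gi_code.co_name)
    gen = gen.gi_yieldfrom

assert chain == ["outer", "inner"]
```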

I'd also like to make use of this in trio, as a way to get safer and less complicated control-C handling without having to implement things in C. (Exhaustive discussion: https://vorpus.org/blog/control-c-handling-in-python-and-trio/)
@Nick: I understand your proposal is to add a field on the frame that for regular function calls points to the function object, and for generators/coroutines points to the generator/coroutine object. Is that right? Two possible concerns there: (a) this creates a reference cycle, because generator/coroutine objects own a reference to the frame object, (b) AFAICT you also currently can't get from a generator/coroutine object back to the function object, so this would break some (all?) of the use cases for this. I guess the solution to (a) is to make it a weak reference, and for (b) either keep f_func as always pointing to the function and make f_generator a separate field, or else make it f_origin and also add gi_func/cr_func fields to generator/coroutine objects. (Which might be useful in any case.)
Also, I'm not quite following the proposed use case. When a generator/coroutine is resumed, then send() steps through the whole yield stack and reconstructs the frame stack so it matches the generator/coroutine stack. (throw() currently doesn't do this, but that's a bug that breaks profilers/debuggers/etc - bpo-29590 - so we should fix it anyway, and I think Yury is planning to do that soon.) So if you get a frame from sys._getframe() or similar, then the stack is correct. And if a coroutine/generator isn't running, then the only way to get to the frame is by starting with the coroutine/generator object, so we don't really need a way to get back.
Tracebacks are a different case again because they continue to hold frame references after the stack is unwound, but generally the traceback already has the correct stack because the exception propagates up the coroutine/generator stack, and also aren't the {gi,cr}_frame references cleared as the exception propagates anyway?
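The frame-linking behaviour described above is easy to check: a suspended generator's frame is not on any stack, but while the generator is running its frame is linked to the caller's frame:

```python
import sys

def gen():
    # While the generator body is executing, its frame is on the
    # thread's frame stack, so f_back points at the frame that
    # called next().
    yield sys._getframe().f_back is not None

g = gen()
assert g.gi_frame.f_back is None  # not started: frame exists but is unlinked
assert next(g) is True            # running: frame is linked into the stack
```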

I'm not too keen on this.
Which frame is being executed is an implementation detail.
For example, we currently push a new frame for list comprehensions, but that is an implementation detail. The language only specifies that list-comps execute in a new scope.
I'd like to fix the Ctrl-C handling that Nathaniel mentions properly, in the compiler/interpreter.

Certainly which frame is being executed is an implementation detail, and I can see an argument from that that we shouldn't have a frame introspection API at all... but we do have one, and it has some pretty important use cases, like traceback printing. Obviously most people shouldn't be touching this API at all, but if you do need it then it IMO should be as useful as possible.
I don't see a better way to handle my (admittedly a bit weird) control-C case within the interpreter; I'd be interested to hear what you have in mind. But even if there is a nice way to fix control-C, then the proposal here would still be useful for e.g. giving better tracebacks.

> I don't see a better way to handle my (admittedly a bit weird) control-C case within the interpreter
I'm not sure I understand how `f_func` would help to better handle Control-C in Trio. Nathaniel, could you please elaborate on that?

> I'm not sure I understand how `f_func` would help to better handle Control-C in Trio. Nathaniel, could you please elaborate on that?
Sure. The issue is that I need to mark certain frames as "protected" from KeyboardInterrupt, in a way that my signal handler can see when walking the frame stack, so it can decide whether to raise a KeyboardInterrupt immediately or to wait until a safer moment. Right now, this is done by injecting a magic local variable into the frame object, so the code is effectively like:
    def a_protected_function():
        _trio_keyboard_interrupt_protection_magic_local_ = True
        ... actual function body ...
But this has two problems: (1) it's not atomic: a signal could arrive in between when the frame is created and when the magic local gets set. (This is especially a problem for __(a)exit__ methods -- if bpo-29988 is fixed then it will guarantee that __(a)exit__ gets called, but this isn't much help if __(a)exit__ can instantly abort without doing any actual cleanup :-).) (2) The decorator code for setting this magic local ends up having to special-case functions/generators/async functions/async generators, so it's very complex and fragile: https://github.com/python-trio/trio/blob/1586818fbdd5c3468e15b64816db9257adae49c1/trio/_core/_ki.py#L109-L149
(This is possibly the function that took the most tries to get right in all of trio, because there are serious subtleties and interactions with multiple CPython bugs.)
I suspect the way I'm currently munging frames at runtime also annoys PyPy's JIT.
OTOH if frames had f_func, then instead of having to munge frame.f_locals, I could set a magic attribute on the *functions* that are supposed to transition between protected/unprotected mode. Then I could delete all that nasty code and replace it with something like:
    def enable_ki_protection(func):
        func._trio_keyboard_interrupt_protection_enabled = True
        return func
and in addition to being simpler, this would fix the race condition and improve performance too. (Note that now this decorator only runs at function definition time; I wouldn't have to do any tricky stuff at all during regular execution.)
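For contrast, the status-quo check that a signal handler has to make today can be sketched roughly like this (heavily simplified from trio's actual logic; the magic-local name follows the snippet above):

```python
import sys

MAGIC = "_trio_keyboard_interrupt_protection_magic_local_"

def ki_protected(frame):
    # Walk outward from the interrupted frame looking for the magic
    # local. With f_func, this loop could instead check an attribute
    # set once on frame.f_func at definition time, which closes the
    # race between frame creation and setting the local.
    while frame is not None:
        if frame.f_locals.get(MAGIC, False):
            return True
        frame = frame.f_back
    return False

def a_protected_function():
    _trio_keyboard_interrupt_protection_magic_local_ = True
    return ki_protected(sys._getframe())

assert a_protected_function() is True
```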

> Sure. The issue is that I need to mark certain frames as "protected" from KeyboardInterrupt, in a way that my signal handler can see when walking the frame stack, so it can decide whether to raise a KeyboardInterrupt immediately or to wait until a safer moment.
OK, I got it, thanks!
> I suspect the way I'm currently munging frames at runtime also annoys PyPy's JIT.
Yes, whenever you touch frames you're disabling the JIT for the call site (and maybe for more call sites up the stack, idk). So it doesn't matter what you use, `f_func` or `f_locals`, the performance will suffer big time. Is that acceptable for Trio?

> Yes, whenever you touch frames you're disabling the JIT for the call site (and maybe for more call sites up the stack, idk). So it doesn't matter what you use, `f_func` or `f_locals`, the performance will suffer big time. Is that acceptable for Trio?
Ah, but that's the trick: with f_func the only time I have to touch frames is from my SIGINT handler, and it's fine if *that* drops us out of the JIT (it probably has to in any case). The rest of the time, the interpreter would be automatically tracking all the state I need, so it's effectively free, and PyPy can do all its normal magic. The problem with using locals() / f_locals for this is that I have to constantly touch them during normal execution.
(Probably we shouldn't make a huge deal out of the PyPy case though – for me the main advantages of f_func are the simplicity and atomicity. But FWIW I also just pinged Armin on #pypy and he said that PyPy wouldn't have a problem supporting f_func.)

Regarding Mark's point about the anonymous scopes created for things like list comprehensions, at least in CPython, those are actually full functions with generated names like "<listcomp>" (at module level) or "f.<locals>.<listcomp>" (at function level).
From the point of view of code consuming the frame stack, they'd look pretty similar to a lambda expression, only with a different generated name.
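As a concrete illustration, here is the same effect shown with a generator expression, which behaves the same way as the comprehensions discussed above and (unlike list comprehensions, which CPython later inlined via PEP 709) still gets its own frame in all versions:

```python
def f():
    return (x for x in range(3))

gen = f()
# The anonymous scope has its own code object with a generated name,
# so a consumer of the frame stack sees it much like a lambda.
assert gen.gi_code.co_name == "<genexpr>"
```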