To keep subinterpreters properly isolated, objects
from one interpreter should not be used in C-API calls in
another interpreter. That is relatively straightforward
except in one case: indicating that the other interpreter
doesn't need the object to exist any more (similar to
PyBuffer_Release() but more general). I consider the
following solutions to be the most viable. All of them use
refcounts to protect cross-interpreter usage (with an incref
before sharing).
1. manually switch interpreters (new private function)
a. acquire the GIL
b. if refcount > 1 then decref and release the GIL
c. switch
d. new thread (or re-use dedicated one)
e. decref
f. kill thread
g. switch back
h. release the GIL
2. change pending call mechanism (see Py_AddPendingCall) to
per-interpreter instead of global (add "interp" arg to
signature of new private C-API function)
a. queue a function that decrefs the object
3. new cross-interpreter-specific private C-API function
a. queue the object for decref (a la Py_AddPendingCall)
in owning interpreter
I favor #2, since it is more generally applicable. #3 would
probably be built on #2 anyway. #1 is relatively inefficient.
With #2, Py_AddPendingCall() would become a simple wrapper
around the new private function.

As a lesser (IMHO) alternative, we could also modify Py_DECREF
to respect a new "shared" marker of some sort (perhaps relative
to #33607), but that would probably still require one of the
refcount-based solutions (and add a branch to Py_DECREF).

"That is relatively straight-forward except in one case: indicating that the other interpreter doesn't need the object to exist any more (similar to PyBuffer_Release() but more general)"
Why would an interpreter access an object from a different interpreter? Each interpreter should have its own object space, no?

Adding a low level callback based mechanism to ask another interpreter to do work seems like a good way to go to me.
As an example of why that might be needed, consider the case of sharing a buffer exporting object with another subinterpreter: when the memoryview in the subinterpreter is destroyed, it needs to request that the buffer view be released in the source interpreter that actually owns the original object.

I would suggest that sharing of objects between interpreters should be stamped out. Could we have some #ifdef debug checking that would warn or assert so this doesn't happen? I know currently we share a lot of objects. However, in the long term, that does not seem like the correct design. Instead, each interpreter should have its own objects and any passing or sharing should be tightly controlled.

@Neil, we're definitely on the same page. In fact, in a world where subinterpreters do not share a GIL, we can't ever use an object in one interpreter that was created in another (due to thread safety on refcounts). The role of "tightly controlling" passing/sharing objects (for a very loose definition of "sharing") falls to the channels described in PEP 554. [1]
However, there are several circumstances where interpreters may collaborate that involves one holding a reference (but not using it) to an object owned by the other. For instance, see PyBuffer_Release(). [2] This issue is about addressing that situation safely. It is definitely not about safely using objects from other interpreters.
[1] The low-level implementation, including channels, already exists in Modules/_xxsubinterpretersmodule.c.
[2] https://docs.python.org/3/c-api/buffer.html#c.PyBuffer_Release

At least, it *might* be a performance regression. Here are the two commits I tried:
after: ef4ac967e2f3a9a18330cc6abe14adb4bc3d0465 (PR #11617, from this issue)
before: 463572c8beb59fd9d6850440af48a5c5f4c0c0c9 (the one right before)
After building each (a normal build, not debug), I ran the micro-benchmark Raymond referred to ("./python Tools/scripts/var_access_benchmark.py") multiple times. There was enough variability between runs from the same commit that I'm uncertain at this point if there really is any performance regression. For the most part the results between the two commits are really close. Also, the results from the "after" commit fall within the variability I've seen between runs of the "before" commit.
I'm going to try with the "performance" suite (https://github.com/python/performance) to see if it shows any regression.

I suspect changes for this issue may be creating test_io failures on my windows builders, most notably my x86 Windows 7 builder where test_io has been failing both attempts on almost all builds. It fails with a lock failure during interpreter shutdown, and commit ef4ac967 appears the most plausible commit out of the set introduced at the first failing build on Feb 24.
See https://buildbot.python.org/all/#/builders/58/builds/1977 for the first failure. test_io has failed both attempts on all but 3 of the subsequent 16 tests of the 3.x branch.
It might be worth noting that this builder is slow, so if there are timeouts involved or a race condition of any sort it might be triggering it more readily than other builders. I do, however, see the same failures - albeit less frequently and not usually on both tries - on the Win8 and Win10 builders.
For what it's worth, one other oddity: while test_multiprocessing* failures are relatively common on the Win7 builder during the first round of tests due to exceeding the timeout, they usually pass on the retry. Over this same time frame, though, they have begun failing even on the second attempt, and not due to the timeout, which is unusual. It might be coincidental, but similar failures are also showing up sporadically on the Win8/Win10 builders, where such failures are not common at all.

I've merged the PR for hoisting eval_breaker before the ceval loop starts. @Raymond, see if that addresses the performance regression you've been seeing.
I'll close this issue after I've sorted out the buildbot issues David mentioned (on his Windows builders).

If I can help with testing or builder access or anything just let me know. It appears that I can pretty reliably trigger the error through just the individual tests (test_daemon_threads_shutdown_std{out,err}_deadlock) in isolation, which is definitely less tedious than a full buildbot run to test a change.
BTW, while not directly related since it was only just merged, and I won't pretend to have any real understanding of the changes here, I do have a question about PR 12024 ... it appears HEAD_UNLOCK is used twice in _PyInterpreterState_LookUpID. Shouldn't one of those be HEAD_LOCK?

The commit ef4ac967e2f3a9a18330cc6abe14adb4bc3d0465 introduced a regression in test.test_multiprocessing_spawn.WithThreadsTestManagerRestart.test_rapid_restart: bpo-36114.
Would it be possible to revert it until the bug is properly understood and fixed?

Hi Eric, I'm sorry but I had to revert your recent work. It introduced regressions in at least test_io and test_multiprocessing_spawn on Windows and FreeBSD. I simply don't have the bandwidth to investigate this regression yet, whereas we really need the CI to remain stable to catch other regressions (the master branch received multiple new features recently, and we have already caught some other regressions that way ;-)).
The test_multiprocessing_spawn failure is *very* hard to reproduce on FreeBSD, but I can reliably reproduce it; it just takes time. The issue is a crash producing a coredump. I consider the bug important enough to justify a revert.
The revert is an opportunity to sit down and track down the root cause rather than rushing to push a "temporary" quick fix. This bug seems to be very deep in the Python internals: thread state during Python shutdown. So it will take time to fully understand it and fix it (or redesign your recent changes, I don't know).
Tell me if you need my help to reproduce the bug. The bug has been seen on FreeBSD but also Windows:
* test_multiprocessing_spawn started to produce coredumps on FreeBSD: https://bugs.python.org/issue36114
* test_multiprocessing_spawn started to fail randomly on Windows: https://bugs.python.org/issue36116
* test_io started to fail randomly on Windows: https://bugs.python.org/issue36177
-- The Night's Watch

Thanks for taking a look Victor! That info is helpful. It will likely be a few days before I can sort this out.
Once I have addressed the problem I'll re-merge. I plan on using the "buildbot-custom" branch to make sure the buildbots are happy with the change before making that PR.
