Now, it's important to stress that this particular code sample may execute
just fine. It's an excerpt from some code at work that illustrates a deeper
problem that you may or may not encounter in real-world applications. We've
seen it crash reproducibly, resulting in core dumps of the Python process,
with potentially disk-filling dump files in /cores.

In this article, I hope to explain what I know about this problem, with links
to information elsewhere on the 'net. Some of those resources include
workarounds, but in my experiments, those are not completely reliable in
eliminating the core dumps. I'll explain why I think that is.

It's important to stress that at the time of this article's publishing, I do
not have a complete solution, and am not even sure one exists. I'll note
further that this is not specifically a Python problem and in fact has been
described within the Ruby community. It is endemic to common idioms around
the use of fork() without exec*() in scripting languages, and is caused
by changes in the Objective-C runtime in macOS 10.13 High Sierra and
beyond. It can also be observed in certain "prefork" servers.

What is forking?

I won't go into much detail on this, since any POSIX programmer should be well
acquainted with the fork(2) system call, and besides, there are tons of
other good resources on the 'net that explain fork(). For our purposes
here, it's enough to know that fork() is a relatively inexpensive way to
make an exact copy of the current process, creating a child process in the,
um, process! The parent process continues to run unchanged, and is isolated
from the child process in every important way.

fork(2) returns 0 to the child, and the process id (pid) of the child
to the parent, so in the code example above, that's why we check the return
code of os.fork() to know whether we're in the child or parent.
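As a minimal sketch of that return-value check (the helper name run_in_child is made up for illustration and is not from the code at work), the pattern looks roughly like this. Note that this is exactly the fork-without-exec idiom under discussion, so on macOS it is subject to the crash described in this article.

```python
import os


def run_in_child(work):
    """Fork, run work() in the child, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: os.fork() returned 0.
        exit_code = 1
        try:
            work()
            exit_code = 0
        finally:
            # Leave immediately, skipping the parent's cleanup handlers.
            os._exit(exit_code)
    # Parent: os.fork() returned the child's pid.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

The child exits through os._exit() so that it never falls back into code that belongs to the parent.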

There are lots of icky semantics when fork() is used with threads, so it's
generally a bad idea to use fork() in multithreaded applications. However, even
in single-threaded applications, fork() can cause problems on macOS.

What about exec?

It is very common to call one of the exec*() family of functions right
after calling fork(). The exec*() functions replace the current
process with a new image, by executing the code in a file given by the first
parameter to exec*(). This is one of the most common pairs of calls to
run new programs on POSIX -- first you fork() to get a child process, then
the child exec's the new file, and you now have two independent processes
running. This common idiom is always safe since the child process after the
fork() is replaced by the new program. macOS fully supports the
exec-after-fork model.
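For comparison, here is a small sketch of the exec-after-fork pair in Python. The function name fork_and_exec and the 127 exit code for a failed exec are choices made for this illustration (127 is a shell convention), not anything mandated by the system calls.

```python
import os
import sys


def fork_and_exec(argv):
    """Run argv in a child via fork() + exec*() and return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image right away, doing no other work.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed (for example, program not found).
            os._exit(127)
    # Parent: wait for the new program to finish.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    print(fork_and_exec([sys.executable, "-c", "print('hello from the child')"]))
```

The important property is that nothing interesting runs in the child between the two calls.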

The problem is that the above Python code sample, the prefork model, and
similar very common idioms all try to do additional work after the
fork() but before -- or even instead of -- exec*(). It's convenient,
especially in scripting languages, since you don't need a separate file to
exec*(). You also don't need to pass data from the parent to the child.
You just keep running code in the child stanza of your conditional statement,
and it inherits a copy of all the state of the parent process. As mentioned,
this can be dangerous in multithreaded applications, but is generally safe
(enough) in single-threaded applications. It is this idiom that is broken on
macOS.

What's the problem?

The basic problem is that the Objective-C runtime can't be both thread safe
and fork safe. In High Sierra, Apple clarified the rules for using the
Objective-C runtime between fork() and exec*(). Code that was
technically incorrect but seemed to work before macOS 10.13 will now fail.
Objective-C +initialize methods may not be called within this interval
due to the implicit acquisition of locks. With macOS 10.13 and beyond, a
process that breaks this rule simply core dumps.

The problem is that your program may implicitly call a +initialize method
without you knowing it. In Python 3 for example, if you call into the popular
requests library, this will end up calling into the _scproxy module to
get the system proxies, and this will end up calling a +initialize
method. So you'd better not use requests between fork() and
exec*()! Setting the environment variable no_proxy='*' will prevent
this and avoid the crash, but it also bypasses the system proxies, which is
probably not what you want.
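A small sketch of the no_proxy workaround, using only the standard library. The helper name bypasses_proxy is made up here, and the assumption is that requests consults the same urllib environment lookup; with no_proxy set to '*', that lookup reports that every host bypasses the proxy.

```python
import os
import urllib.request

# Must be set before anything in the process asks for proxy settings.
os.environ["no_proxy"] = "*"


def bypasses_proxy(host):
    """Return True if the environment says to skip proxies for host."""
    return bool(urllib.request.proxy_bypass_environment(host))
```

The cost is the one already noted: any system-configured proxy is ignored for every host.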

What's the fix?

The real fix is simply to avoid using this idiom in your Python code. Lots of
other projects are struggling with where and how to fix this problem.
E.g. should it be fixed in the prefork servers? Should it be fixed in the
language support (e.g. Ruby and Python)? Or should it be fixed in the
applications that run code between fork() and exec*()? It's a tricky
thing to answer, and some projects are adopting fixes while others are not.

Currently, Python does not implement any kind of fix, so it's up to the
individual applications to ensure that they are safe.

Ignore the problem

Individual users can prevent core dumps from filling up their disk by setting
the core dump size limit to zero. If you're using the bash shell (other
shells have similar commands), type this to set the core dump size limit to 0,
effectively disabling them:

$ ulimit -c 0

Without the 0 argument, you can print the current limit:

$ ulimit -c
unlimited

You can re-enable core dumps with this command:

$ ulimit -c unlimited

You can also disable core dumps system-wide by setting the following kernel
parameter:

$ sudo sysctl kern.coredump=0
kern.coredump: 1 -> 0

Note that neither approach prevents the child processes from crashing; it
just prevents your disk from filling up. So it's not really a solution,
because the code in your child process still won't get executed.
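If you would rather have a Python program lower the limit for itself than rely on the shell, the standard library's resource module exposes the same setting. This is a sketch with a made-up helper name; like ulimit, it only hides the dump files and does nothing about the crash. Resource limits are inherited across fork(), so children forked afterwards get the same limit.

```python
import resource


def disable_core_dumps():
    """Set the soft core-file size limit to 0, keeping the hard limit."""
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (0, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```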

Band-aid the problem

You see lots of recommendations to set the following environment variable:

$ export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

This does prevent the child process from crashing, but it comes with an
important caveat: You must set this outside of the forking process. It is
not good enough to set it from within your Python code, for example by
assigning to os.environ before forking.

I haven't positively confirmed this with outside references, but my
experimentation has shown that this environment variable is only consulted
when the parent process first starts, so if you set it in the parent before
the fork(), it's ignored. Using os.putenv() doesn't help either.
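For concreteness, the in-process attempts look roughly like the sketch below. The exact form is an assumption for illustration. Both calls really do change the environment of the running process and of anything it starts later; what the experiments above suggest is that the Objective-C runtime in the already-running process never looks at the variable again.

```python
import os

VAR = "OBJC_DISABLE_INITIALIZE_FORK_SAFETY"


def set_flag_in_process():
    """Set the variable from inside the process (too late to matter here)."""
    os.environ[VAR] = "YES"  # Updates os.environ and the C-level environment.
    os.putenv(VAR, "YES")    # Explicit putenv(); no more effective.
    return os.environ.get(VAR)
```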

Maybe this is good enough for you. At work, it's not, because we can't change
the environment for every possible process. We use both shiv and pex as
Python zip application formats (transitioning to all-shiv for Python >=
3.6) for command line tools (CLIs). For CLIs with multiple entry points, we
do use a wrapper script that forks-and-execs the zip application, and we can
set the environment variable there to prevent crashes. But for zip
applications with a single entry point, we don't use the wrapper, so there's
no universally good place to set this, and asking all the users to update
their various shell initialization scripts isn't viable.
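A wrapper along the lines described above could be sketched as follows. This is not the actual wrapper from work; the function name and the use of subprocess.run are assumptions. The point is only that the variable is in the environment of the new process from its very first instruction.

```python
import os
import subprocess
import sys

VAR = "OBJC_DISABLE_INITIALIZE_FORK_SAFETY"


def run_wrapped(argv, **kwargs):
    """Start argv as a fresh process with the variable already set."""
    env = dict(os.environ)
    env[VAR] = "YES"
    # Extra keyword arguments are passed straight through to subprocess.run().
    return subprocess.run(argv, env=env, **kwargs)


if __name__ == "__main__":
    # Hypothetical usage: run whatever command follows the wrapper's name.
    if len(sys.argv) > 1:
        sys.exit(run_wrapped(sys.argv[1:]).returncode)
```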

4. Use ObjC classes with no +initialize overrides between fork() and exec().

5. Use pthread_atfork() to force your +initialize methods to run before fork().

6. Define environment variable OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES, or add a __DATA,__objc_fork_ok section, or build using an SDK older than macOS 10.13. Then cross your fingers.

Let's look at each of these in turn.

#1 and #2 are the real solutions because they use safer mechanisms to spawn a
child process, but both have the disadvantage of requiring you to implement
your child process in a separate file. This can be quite a serious drawback.
For us at work, it means designing a protocol to pass the required information
between the parent and the child, whereas with the old idiom, this data was just
kept in local variables that get copied to the child upon
fork()-ing. Still, this is guaranteed to be safe, so it's what we'll
implement.
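To give a feel for what such a protocol might involve, here is a minimal sketch that passes the parent's data to a fresh child as JSON over stdin and reads a JSON reply from stdout. Everything here is hypothetical: the message shape, the run_child name, and the inlined child source, which stands in for the separate file a real child would live in.

```python
import json
import subprocess
import sys

# Hypothetical child program, inlined so the sketch is self-contained;
# in practice it would live in its own file.
CHILD_SOURCE = """
import json, sys
request = json.load(sys.stdin)
json.dump({"total": sum(request["values"])}, sys.stdout)
"""


def run_child(request):
    """Send request to a fresh child as JSON and return its JSON reply."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=json.dumps(request),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)
```

Calling run_child({"values": [1, 2, 3]}) hands the list to the child explicitly instead of relying on it being copied by fork().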

#3 and #4 are not, IMHO, practically effective because as I've mentioned, it
can be surprising what gets called in a scripting language like Python.
Before I debugged the problem, I never expected the calls from requests
into _scproxy into the Objective-C runtime.

#5 sounds promising, and I actually experimented with using the prepare
handler of pthread_atfork(), but I could never actually get it to work.
Another recommendation I found suggests that simply loading a framework
library will invoke its +initialize method. I wrote some code to
implement this. While it was fun to play with ctypes, this never actually
worked for me.

The list of frameworks I load was found by inspecting the core dumps with
lldb but try as I might, I could not find the right combination of loads to
prevent the crashes. It seems as if loading the framework doesn't actually
guarantee that its +initialize method gets called.
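The shape of that experiment can be sketched as below. This is a reconstruction under assumptions, not the code that was actually tried: the framework path is a placeholder, the function names are made up, and os.register_at_fork() (Python 3.7+) stands in for the pthread_atfork() prepare handler. As described above, this approach did not prevent the crashes, so treat it as an illustration of the idea rather than a fix.

```python
import ctypes
import os

# Hypothetical list; the frameworks actually involved are not given here.
FRAMEWORKS = [
    "/System/Library/Frameworks/Foundation.framework/Foundation",
]


def preload_frameworks(paths=FRAMEWORKS):
    """Try to load each framework; return the paths that loaded."""
    loaded = []
    for path in paths:
        try:
            ctypes.CDLL(path)
        except OSError:
            continue  # Not loadable (for example, not running on macOS).
        loaded.append(path)
    return loaded


def install_prefork_hook():
    """Arrange for preload_frameworks() to run just before os.fork()."""
    os.register_at_fork(before=preload_frameworks)
```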

As I thought about it more, I'm not even sure a prepare handler and
calling pthread_atfork() is necessary. If this technique worked, why not
just call it inline before the call to os.fork()? Suggestions are
welcome!

I have not played with the other suggestions in #6, such as adding a
__DATA,__objc_fork_ok section, but I'm skeptical that they will solve
the problem. And using an older SDK is also not a viable option.

Conclusions

I think the only safe thing to do on macOS is to call exec*() after
fork(), use the spawn family of functions, or better yet use the
subprocess module. I am going to rewrite our code at work to use the
latter, even though it's less convenient.
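For the spawn route, Python 3.8 and later expose os.posix_spawn(). A minimal sketch, with a made-up helper name, might look like this. Note that os.posix_spawn() takes a full path to the program and does not search PATH (os.posix_spawnp() does).

```python
import os
import sys


def spawn_and_wait(argv):
    """Start argv with posix_spawn() and return its exit code."""
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    print(spawn_and_wait([sys.executable, "-c", "print('spawned')"]))
```

In most application code, subprocess.run() is the more convenient way to get the same effect.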

If you come up with any other reliably viable solutions, please do add to the
comments below, or on the open Python tracker issue.

Postscript

I have a real love/hate relationship with these types of problems. On the one
hand, they are frustrating and perplexing. All you have as evidence are some
core files, and you may not even notice them unless you have core dumps
enabled and see your disk filling up with them. Then the journey
begins!

On the other hand, they can be really fun to investigate. You have to be
adept at debugging C code, skilled at searching the interwebs, and clever in
your implementations. And even then -- as is the case here -- you may not
come up with a satisfying solution. But at least you learn a lot, and begin
to build a picture of what's going on that hangs together, even if there are
still some head scratching details, and little confirmation from other
sources.

But it can be a fun way to waste a week of work time, and it can provide
useful fodder for a blog that often gets neglected for long stretches of
time.