Unless it could replace the fork+exec code path in its entirety, which I do not believe is possible, I see posix_spawn() as a distraction and additional maintenance burden with no benefit.
http://pubs.opengroup.org/onlinepubs/7999959899/functions/posix_spawn.html
Read the RATIONALE section. The posix_spawn API was not created to make subprocess creation easier (i'd argue that it is the same burden to setup a proper call to posix_spawn as it is to do everything right for fork and exec).
One notable thing posix_spawn() does not support: setsid() (start_new_session=True) of the child process. Obviously it also couldn't handle the arbitrary preexec_fn but preexec_fn is in general considered harmful.

Our project (the Solaris packaging system, IPS), relies on posix_spawn() primarily for the ability to fork without making a large memory reservation (and possibly failing) because the forking process was itself very large. That's the (a?) bug benefit of posix_spawn() -- it's not a benefit for the programmer using it (who might have to fall back to fork/exec), but for the end-user that benefits from its streamlined operation.
You're right that it doesn't handle everything that subprocess.Popen() does -- though at least on Solaris there's a way to change the cwd in the file actions, and I'm sure we'd consider adding a way to do the setsid() as well. The rest should be possible cross-platform.

Danek, you might find https://github.com/dreid/posix_spawn/ useful, it provides bindings and a public API over posix_spawn (it's not complete yet, but if there's stuff missing, feel free to file a ticket!)

Cool. We implemented our own version as a straight-up native module (https://java.net/projects/ips/sources/pkg-gate/content/src/modules/pspawn.c), and our Popen replacement is not at present a complete replacement for the one in the stdlib, but it does what we need it to do. We'd absolutely switch if it came in to the stdlib, though, and I think there would be plenty of programs that would benefit from it (certainly a number of large Python programs in Solaris have run into exactly this problem).

Oh, for what it's worth, Solaris added setsid support to posix_spawn a few years ago, as a result of this conversation. I still think it would be worthwhile supporting this in the stdlib, since we keep running into processes which have a lot of memory reserved and can't fork some small program because they're running in a memory-limited environment.

To prevent subprocess/os.fork() doubling my memory after loading a large numpy array into memory, I now have to start my script with 650 calls to subprocess.Popen(), which just sit their waiting for some stdin to start doing something. This doesn't happen until after the numpy array is loaded, else I quickly run out of memory.
I think this sort of situation results in more unnecessary complexity compared to using posix_spawn, in spite of the concerns about using posix_spawn. Perhaps subprocess.Popen just needs a "posix_spawn=True" parameter?

I think someone wanting this will need to put forward a patch adding it to be reviewed and mulled over. As Alex mentioned in msg22571 - https://github.com/dreid/posix_spawn/ exists as does the code Danek pointed at in the next comment. try those.
I suggest someone who actively cares about a limited available process address space environment contribute this. (ie: not your typical 64-bit system)
fork()+exec() does not cause significant memory allocation, only a brief ~doubling of mapped address space, with backing pages being the originals just marked copy on write, but never written to by the child. The exec undoes that mapping.
It is technically possible to cause the copy on writes to happen if you immediately write to a large amount of memory in the parent process after the fork has happened before the exec has, but that seems like a rare timing problem that could even be worked around by monitoring the forked child to see that the exec has occurred before continuing.

I agree with everything you're saying Gregory, however I don't think the significance of the memory doubling is as inconsequential as you might first think. For example, i have on my 64bit Linux system 128Gb of RAM, and a numpy table that's around 70Gb. Spawning a subprocess, even though memory is doubled for a very short period of time, is enough to raise a MemoryError, despite the subprocess i'm spawning using only 2 or 3Mb after the exec().
I do appreciate that for most Python users however, they will not see much benefit from what I imagine is quite a lot of development work.
FWIW, I did try the posix_spawn module, but i couldn't figure out how to write data to the stdin of a posix_spawn subprocess, and gave up in place of the commonly recommended solution to this problem (via StackExchange) of spawning lots of subprocesses before you put stuff in memory. Fortunately, for my problem, this was a possible solution. For others I think they're going to have to use posix_spawn, or an entirely different programming language if that doesn't work.

All I'm really saying is that someone who wants this should provide a
patch/PR with unittests. :) I can help review and go from there.
It does make sense to me for it to be available as part of the subprocess
API if it is available at all, likely an alternative implementation of or
behavior flag to _posixsubprocess.fork_exec() with appropriate autoconf and
conditional compilation based on availability ifdefs.
On Fri, Mar 17, 2017 at 11:49 AM John Jones <report@bugs.python.org> wrote:
>
> John Jones added the comment:
>
> I agree with everything you're saying Gregory, however I don't think the
> significance of the memory doubling is as inconsequential as you might
> first think. For example, i have on my 64bit Linux system 128Gb of RAM, and
> a numpy table that's around 70Gb. Spawning a subprocess, even though memory
> is doubled for a very short period of time, is enough to raise a
> MemoryError, despite the subprocess i'm spawning using only 2 or 3Mb after
> the exec().
>
> I do appreciate that for most Python users however, they will not see much
> benefit from what I imagine is quite a lot of development work.
>
> FWIW, I did try the posix_spawn module, but i couldn't figure out how to
> write data to the stdin of a posix_spawn subprocess, and gave up in place
> of the commonly recommended solution to this problem (via StackExchange) of
> spawning lots of subprocesses before you put stuff in memory. Fortunately,
> for my problem, this was a possible solution. For others I think they're
> going to have to use posix_spawn, or an entirely different programming
> language if that doesn't work.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue20104>
> _______________________________________
>

Pablo’s code looked unfinished to me. As well as missing documentation, I suspect there may be memory leaks and poor error handling.
The two calls above the “fail:” label look like dead code. The “parse_envlist” result appears to be leaked.
I’m curious why you never call “posix_spawn_file_actions_destroy”. I saw on Open BSD <https://man.openbsd.org/posix_spawn_file_actions_init.3> it reclaims memory, and it seems sensible to call it on other platforms as well.
No error checking on any of the “posix_spawn_file_actions_” methods.
If “posix_spawn” fails, the “pid” variable will be returned uninitialized.

Does the PySequence_Fast result need releasing if the following “for” loop fails? There is a Py_DECREF only in the successful case, which seems inconsistent.
Does Python still support non-UTF-8 locales and bytes filenames? I haven’t been keeping up, but I assumed these things were still supported in many cases. It seems strange to only support UTF-8 in the “addopen” file action.
Pablo’s second PR currently <https://github.com/python/cpython/pull/5418/commits/e50bdb9> calls PyErr_SetString(PyExc_OSError, . . .) with a custom error message depending on which OS call failed. But I wonder if it is more important to set the “errno” attribute (which I think should choose an OSError subclass if appropriate). Perhaps you can do that by assigning the return values to “errno” and then calling PyErr_SetFromErrno. A disadvantage might be less context about which stage went wrong.

TypeError if “posix_spawn_file_actions_init” fails doesn’t seem right. I suggest OSError, MemoryError, or even plain Exception instead.
“File_actionsp” is set to point to a local variable “_file_actions”, but the variable goes out of scope before the pointer is used. This is technically undefined behaviour (even if it happens to work). Simplest solution would be to move the variables into the same outer scope.

Regarding the leak, I was under the assumption that as File_actionsp is pointing to a stack initialized _file_actions and is this last variable the one that is passed to posix_spawn_file_actions_init, it was not needed to explicitly call posix_spawn_file_actions_destroy as the variable will go out of scope and the pointer is pointing to a stack variable. Am I missing something?

Your assumption about calling “file_actions_destroy” would be okay if the posix_spawn_file_actions_t object was a simple object or structure. But I imagine most implementations would allocate memory when you call one of the “add” methods. Especially “addopen”, which is supposed to copy the filename string somewhere. Looking at “uclibc” <https://repo.or.cz/uclibc-ng.git/blob/v1.0.28:/include/spawn.h#l263>, I see it calls “free”, because it allocates an array for all the file actions.

> perhaps it would be better to remove this undocumented feature from 3.7
We need to make a decision about this before the b4 cutoff in 4 weeks. But it would not be fair to ask people to put in the effort to try to fix things if there is not a good chance for success and if people's time could be better spent on currently more important issues. Greg, Pablo, others: opinions?

PR 6332 is overall a good cleanup. I pushed a couple changes to that PR and left comments for some others. It should go in.
There are basic tests for os.posix_spawn in test_posix.py. More would be nice as we don't currently have any for the error paths.
The API is intentionally low level and simple as that is what the os (posix) module is all about.

The current implementation looks half-baked to me. It doesn't implement the following features:
1. posix_spawnp(). It is like posix_spawn(), but searches an executable in PATH.
2. Passing various attributes of the created child process. POSIX_SPAWN_SETSIGMASK, POSIX_SPAWN_SETSCHEDPARAM, etc.
And I have doubts about introducing constants like POSIX_SPAWN_OPEN:
1. There is no need to make them integers. They could be strings (for readability) or opaque values.
2. Their names can conflict with future standard constants related to posix_path().
Implementing new features in 3.8 can conflict with the current design. Removing posix_spawn() in 3.7 and deferring the work to 3.8 still looks a better solution to me.

> The current implementation looks half-baked to me. (...)
I would prefer to see a full implementation before Python 3.7 final, or to remove the feature from Python 3.7 and redo it in Python 3.8.
It seems doable to finish the implementation before Python 3.7 final.

I have open https://github.com/python/cpython/pull/6693 to start iterating over the missing capabilities of posix_spawn (Passing various attributes of the created child process).
Please, review to make sure that the design is OK and I will proceed to add tests.
Thanks!

Since this is a new API, the most important thing is to have decided how we want it to look in 3.7, it is okay for it to be missing features so long as the API doesn't prevent them from being added in the future.
ie: It is fine to ship it without posix_spawnattr_t attrp "flags" support and without the "p" variant supported in 3.7.
Regarding PR 6693, I agree, keyword only arguments for the flags instead of a sequence of tuples is a good idea. Since 3.7b4 has shipped I'm not sure we should even consider adding new _features_ to this new API in 3.7 to appear in rc1 - though being entirely new it wouldn't break anything, it just doesn't give us any change to revise them. I'd be conservative at this point.

To wrap “posix_spawnattr_setschedparam” perhaps you could combine it with the scheduler policy:
# Inherit current policy and parameters:
posix_spawn(..., scheduler=None)
# Set new policy with parameters:
posix_spawn(..., scheduler=(policy, param))
# Inherit current policy but set new parameters:
posix_spawn(..., scheduler=(None, param))

I have doubts about combining setschedpolicy and setschedparam.
On one hand, they are set by independent functions in C and have separate flags. When pass arguments separately we don't need to invent a parameter name: just remove the common "posix_spawnattr_" prefix from corresponding C function names. Passing scheduler=(None, param) looks clumsy. And from implementation side, parsing separate arguments is easier that parsing a single argument as a 2-tuple.
On other hand, these arguments are not completely independent: the setschedparam argument is required if setschedpolicy is specified.
Passing as separate arguments still looks more preferable to me.

I propose the following changes to the file_actions parameter in PR 6725. Currently it is the fourth positional-only parameter with the default value None.
1. Change its default value to an empty tuple. None will no longer be accepted as a special case. This will simplify a mental model.
2. Since the order of parameters already doesn't math the natural order of the corresponding C function, and many other keyword-only parameters will be added in PR 6693, it makes sense to make file_actions a keyword-only parameter too.

For Python 3.7, I discussed with Pablo and he seems to be ok to remove the function from Python 3.7, to get more time to polish the API in the master branch (future Python 3.8).
Can someone please write a PR to remove the function from the 3.7 branch?
I dislike the idea of applying a significant change on 3.7 to change the API between a beta and a rc release. IMHO it's too late, and we might find some API bugs. I would prefer to get more feedback on the API before setting it in stone (once a 3.x.0 version is released, it's really hard to change the API).

I suggested the “scheduler” tuple to bring the two related parameters (scheduling policy and sched_param) together, similar to how they are paired as the second and third parameters to “os.sched_setscheduler”, and because I thought it would imply that a scheduling policy without sched_param is not valid.
But separate keyword parameters would also work if you prefer. I might call them “schedpolicy” and “schedparam”, but if you prefer the naming scheme in the current proposal, you could call them “setscheduler” and “setschedparam”. Although in that case, setting the “setscheduler” parameter on its own would not be sufficient to correctly set the POSIX_SPAWN_SETSCHEDULER flag.

I think that is the biggest argument towards using a tuple: that just setting the priority is not enough (and also is decontextualized as different policies have different priorities). On the other hand one could say that the API is a low level API and therefore the user should be aware of such a problem. Also, using a tuple makes the API a bit asymmetric as Serhiy mention but I think this can compensate for the conceptual map the will be established (as in each priority is associated with a policy) if the tuple API is used.

As for the scheduler interface, yet one option is using two mutually exclusive parameters setschedparam and setscheduler. The first take a sched_param, the second takes a pair: int and sched_param. This will not simplify the implementation, but they directly correspond to functions os.sched_setparam() and os.sched_setscheduler(). I don't say that this interface is better than alternatives, I just mention yet one option.
posix_spawn() can be easily implemented via fork()/exec(). See the reference implementation: http://pubs.opengroup.org/onlinepubs/009695399/xrat/xsh_chap03.html . It can be directly translated to Python. Seems the only value of posix_spawn() is on systems that don't provide fork()/exec(), but provide posix_spawn(). Is posix_spawn() supported on Windows? What are other systems supported by Python that don't provide fork()/exec()?
I'm for removing os.posix_spawn() in 3.7. Even if we will accept the current interface and merge PR 6693 we can find other problems with the interface or the implementation. And os.posix_spawnp() still is not implemented. It is just too dangerous to add such complex feature between the last beta and RC.

posix_spawn can still be found in Python 3.7:
configure:11243: posix_fallocate posix_fadvise posix_spawn pread preadv preadv2 \
configure.ac:3470: posix_fallocate posix_fadvise posix_spawn pread preadv preadv2 \
pyconfig.h.in:710:/* Define to 1 if you have the `posix_spawn' function. */
This check has been added by the commit 6c6ddf97c402709713d668d0ed53836a7749ba99 and should be removed from Python 3.7 as well, no? It's not a release blocker, but it may be good to remove posix_spawn from configure.ac (and related generated files).

> Thank you for your contribution Pablo. This issue have appeared much more
complex and less obvious than it looked initially.
Yeah, thanks Pablo for your tenacy! This function is interesting in term of
performance and correctness ("atomic" function, signal safe?).