Does popen leak file descriptors to the command it executes? For example, I call curl with popen and end the command with an ampersand (&) so that my main process continues immediately without waiting for curl to finish.

How can I programmatically find out the maximum argument length popen can handle? ARG_MAX is not it, because in my tests popen was already failing at an argument length of about ARG_MAX/30.

popen does not execute the command directly. If it did, "&" would have no meaning. The command is interpreted by /bin/sh -c. I wonder whether your command gets killed when the sh process exits unless it is also disowned by sh? I'm not sure.

In glibc, popen ends up calling __execve, which as far as I can tell is just the system call implemented in the Linux kernel. The only check I saw in glibc is that it errors out if the number of arguments exceeds INT_MAX. That's at least 2 billion and change, so unless you have a very exceptional command you're unlikely to hit it.

The only information I could find is that the limit used to be based on ARG_MAX (defined as 128 kB on my Linux system) or on MAX_ARG_PAGES, which is apparently 32; with 4 kB pages that again yields 128 kB. Note that this also includes the size of the environment, so keep that in mind. These days it's based on 1/4 of RLIMIT_STACK, the stack size limit of the process. I don't know what that is on your machine, but I would guess hundreds of MiB or more (depending on your hardware and architecture, probably). Of course, whatever you use comes out of the space available for the program's stack, but again it seems unlikely you'd reach any of these limits. This is all specific to Linux, though. You haven't told us which OS you're using.

The question is: how long is your command (keep in mind that popen adds the length of "/bin/sh -c " on top)? And how do you know it's failing because of the command's length? It would help if you told us exactly what errors or return codes you're getting.

I think the '&' is causing 'sh -c' to halt it immediately. My sh is dash, and its manpage only says that if the shell is not interactive, stdin is set to /dev/null. I don't think that should cause this behavior, so I'm not sure.

Append:

OK, here we go. Now I see. If I list processes with ps, "./sleep 3500" is still running. That makes sense: you told sh to background it, so it did just that. sh was non-interactive, so it exited normally, and ./sleep is left running, doing its thing. If you need control of those curl processes so they can't be left behind like this, then you probably should not be using popen. You may need fork and exec or something like that to do this properly.

I'm developing on Linux Mint, but ideally it should be POSIX compliant. Below is the problematic function. I just added spaces at the end of the cfg parameter before calling str2hex on it. At some point the command stopped executing properly: netcat didn't respond with anything.

Ideally I'd like to programmatically find out the maximum length of a single argument given to popen as the command to execute because then I could implement a fallback to named pipes.

This number looks suspiciously similar to the approximate command length that caused popen to fail. ARG_MAX is 2097152 on my system. When I divide it by 20 I get 104857, which is roughly the length where popen failed in my tests.

Maybe it would help to split the command into a few different steps. I've gathered and hacked up this little program to assist with both writing to and reading from a program as a pipe. It might be helpful with such a large pipeline.

The type argument is a pointer to a null-terminated string which must contain either the letter 'r' for reading or the letter 'w' for writing. Since glibc 2.9, this argument can additionally include the letter 'e', which causes the close-on-exec flag (FD_CLOEXEC) to be set on the underlying file descriptor; see the description of the O_CLOEXEC flag in open(2) for reasons why this may be useful.

I'm uncertain about the importance of FD_CLOEXEC. It sounds like it should almost always be on by default for popen, unless you happened to be popening a special command that expected that file descriptor to be open...

I think it's safe to use here, but it's also a Linux (well, glibc) extension, so it will make the program less portable, which you said was one of your goals. The alternative, I think, would be to wrap the program you want to run (e.g., curl) in a wrapper program or script that closes that file descriptor first. That said, the called program would have to operate on a file descriptor it didn't open to be affected, and it would mostly be dealing with its own stdin/stdout handles anyway, so it's probably pretty harmless...

Sounds like bash and dash both close a file descriptor if you redirect it to "&-". So this wrapper might suffice (but you'd have to ensure all supported shells work this way too, or you'd again limit portability):

That's the theory... In practice, this doesn't seem to work. Writing an equivalent C program to ship alongside your program might be more portable anyway... But then that would probably have to reimplement popen to achieve this portably, so the answer is probably just: don't use popen if portability matters and this leaked file descriptor is significant.

While this could probably let me make very fast HTTP requests straight from my main program in a non-blocking manner without using fork, it certainly has a learning curve. So far I have only used the simple curl request API in practice. The multi interface definitely looks promising, but since I need to get this done soon, I'd rather not spend more time researching completely new approaches. In fact, yesterday I made some pretty good progress with the initial approach. Below is the code that falls back to named pipes if the command line would be too long.

By trial and error I found that the maximum command length is ARG_MAX/16 - 1. Could there be any portable way of determining that limit? I'm afraid ARG_MAX/16 could be either a coincidence or specific to my particular development platform.

I did look at the xargs source. And it was ugly. I mean, there were quite a few checks to determine that length, and in some cases I believe the fallback was just to gradually increase the length of a string and feed it to the command line interpreter until it failed. I did see a reference to a guaranteed minimum buffer size for the command line, which was 4096. So I am now simply using that number in my code, because in my case it is enough 99% of the time. And in the 1% of cases where 4096 is not enough, even 131072 would not be, so I'm using the named pipe fallback anyway.

Here's the new version of my function. I had to get rid of the curly brackets between the printf commands, because it turned out that when curl made requests to remote hosts rather than to localhost, the delay between the printf and the subsequent curl commands was so big that netcat registered an EOF after the first printf, after which curl exited with error 23, "failed writing body". This problem did not manifest when I tested locally. The fixed version uses xargs to turn stdin into a command line parameter of another command.

edit: Damn, I almost forgot that the fallback also has to get rid of semicolons, and printf has to add a prefix and suffix to the string returned by curl before feeding it to netcat. And in the non-fallback case, if the string returned by curl is too long, then xargs will fail as well.

Any ideas how to add a prefix and suffix to the string that we get from stdin before directing it to stdout?

1) You get a string from stdin which you store in string_in
2) You allocate a new empty string newstring
3) Append prefix to newstring
4) Append string_in to newstring
5) Append suffix to newstring
6) cout newstring ?

I was unable to figure out why the shell started from popen did not finish properly. I still don't quite know why the previous code doesn't work, but after adding unbuffer -p in front of the netcat command everything started working. Even stranger, stdbuf -i0 -o0 -e0 in front of netcat made no difference at all. So the unbuffer command does some magic that makes my program work.

This stuff is quite hard to debug, because when I start those commands from my terminal window they both finish nicely. The first version of the command only malfunctions when it is started with popen from the very process that is supposed to receive the curl response via netcat in the end.

The issue then is that somehow buffered IO was causing something to fail?

Side note: string manipulation is going to require "slow" memory allocations that will likely be thrown away immediately afterwards. Avoid needless string memory operations by using a std::stringstream to build the final string, or by writing directly to the output stream instead of concatenating into a string first.

1) You get a string from stdin which you store in string_in
2) cout prefix
3) cout string_in
4) cout suffix ?

FTFY. Or if you need the whole string, use the stringstream as above. I imagine it's more efficient (modern libraries usually have a "StringBuilder" or equivalent that does the same thing to avoid constantly reallocating string objects; at least in C++ the same buffer might just get extended without copying).

While I'm nitpicking, printf '%s' 'x' is the same as echo -n 'x' (but maybe that's not as portable?).

While I'm on the subject of portability, my netcat in Linux doesn't have -I or -O options... Are you sure that's portable?

Well, those -I and -O options didn't do anything useful anyway, so I have already dropped them. Everything would have been perfect if unbuffer had worked for large responses too. This is getting sad: it turns out the unbuffer command gave me only the first n bytes of the curl response, so it's of no use. I am running out of ideas.

edit: This is becoming ridiculous. I want to add just a string prefix and suffix to whatever curl writes to its stdout before feeding it to netcat, and it does not work. I've tried hundreds of combinations, using stdbuf and unbuffer, and some hacks with variables. Unbuffer almost worked, but it started failing on larger curl responses.

About std::strings being slow, I'm not buying it. Say I reserve 1000 bytes for the string right away and then append to it; there is no way that would be slow.

Maybe it would help to write each stage's stdout to a file so that you can inspect the results at each step? Instead of piping, write to a file, and then substitute that file for the stdin of the next process.

Append:

Agreed, if you reserve space for the string ahead of time you can avoid the overhead. That is, if you know a reasonable size to use. If it can vary a lot, you could be wasting lots of memory or still run into reallocation overhead.

Yes, I tested with files and it worked well. I did not test the file-based logic with large curl responses, though. Perhaps I could utilize named pipes somehow, since I'm using them anyway for requests with large bodies.

Append: I just made humongous progress solving this issue. I suspect there is a bug in the popen library function or perhaps in the Linux kernel. It turns out that if I call popen in write mode instead of read mode, everything starts working as expected.