Piping, Forking, and Gathering execvp's Output

This is a discussion on Piping, Forking, and Gathering execvp's Output within the C Programming forums, part of the General Programming Boards category.

Hey guys,
I'm building a minishell in C (for Unix). I'm doing command expansion, so let's say I have:

Code:

echo a $(echo b c)

It works fine. However, if I then do this:

Code:

echo a $(echo b)

It prints:

Code:

a b c

I think I've figured it out: when I'm reading the output of echo (from my pipe), echo does not output any null terminator, so I keep going until I run into one. This results in me picking up a bunch of garbage. Currently my read loop looks like this:

So here it picks up a bit extra. I was thinking of simply turning the last '\n' into a 0, but I can't figure out how to start at the end of the string and work my way back (if I could, I probably wouldn't be having this problem), and if I start at the beginning and work forward, I may remove a newline that was meant to be there.

I was also thinking that I could write a null character right after echo executes; however, I don't know how I'd do this. When echo writes, it looks like this:

All this talk of \n and \0 is way off being the right answer.
The low-level read() and write() functions just deal with a 'count' of the number of bytes processed. They pay no attention at all to the content of the stream.

First of all, you cannot assume a single read() call reads everything. It can return a short count (only some of the data available), or -1, even without errors. In particular, if a signal is delivered while a read() call blocks, the call will return -1 with errno==EINTR. And since you are using pipes and child processes, you should be handling signals anyway.

You'll know you have read all data when read() returns zero. (Or if it returns -1 with errno set to something other than EINTR, EWOULDBLOCK, or EAGAIN.)

Therefore, what you need is a loop that reads more into a (preferably dynamically allocated) buffer, until read() returns zero (indicating end of input) or returns -1 with errno set to indicate a real error (in which case you should probably abort).
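As a minimal sketch of such a loop (using a fixed, caller-supplied buffer rather than a dynamically allocated one, to keep it short), something like the following could work. The name read_until_eof and its parameters are made up for illustration. Note that it also stops when the buffer is full, so a result with *used == size may mean there is more data left to read.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read from fd until end of input or until buf is full.
 * Returns 0 on success with the byte count in *used, or -1 with errno set. */
static int read_until_eof(int fd, char *buf, size_t size, size_t *used)
{
    size_t have = 0;

    while (have < size) {
        ssize_t n = read(fd, buf + have, size - have);
        if (n > 0) {
            have += (size_t)n;      /* short counts are normal; keep going */
        } else if (n == 0) {
            break;                  /* end of input: the writer closed its end */
        } else if (errno != EINTR) {
            *used = have;
            return -1;              /* a real error; errno says which one */
        }
        /* n == -1 with errno == EINTR: interrupted by a signal, so retry */
    }

    *used = have;
    return 0;
}
```

The count stored in *used is the only reliable record of how much data arrived; nothing in buf is terminated for you.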

Second, the data you read is not strings. It is just data. You cannot assume it is a string. You cannot use the string functions to manipulate it; you must treat it as a character array instead. (The difference is that a string is terminated, whereas a character array has a specific length.) For example, consider the case where you convert an image using the Netpbm tools and a pipe. If you assume it is a string, you'll break the data! Instead, you simply need to record the amount of data read.

For command substitution, you will convert the data read into one or more strings (tokens). I don't know what expansion rules you have chosen, but I think you want to split $(command...) at whitespace into separate tokens (ignoring leading and trailing whitespace, and treating consecutive whitespace as a single separator).

So, you read() all data into a buffer, then convert it into one or more strings. The conversion rules are simple: skip all NUL (\0) and whitespace characters (see isspace(), http://www.kernel.org/doc/man-pages/online/pages/man3/isspace.3.html) in the buffer first. Copy the consecutive non-NUL, non-whitespace characters to your command line buffer. When you see a NUL or whitespace, skip them, and if they are followed by a non-NUL, non-whitespace character, start a new token. When you arrive at the end of the buffer (remember, you must keep track of the index so you won't go past the number of chars you read()), you have converted the output to your command line.
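One way to express those rules is a small token finder that works on a length-counted buffer and never looks past the bytes actually read. This is only a sketch; next_token and its parameters are hypothetical names, not part of any library.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: find the next token in data[0..len-1], starting at *pos.
 * A token is a run of bytes that are neither NUL nor whitespace.
 * Returns 1 and sets *start and *tok_len if a token was found, 0 at the end. */
static int next_token(const char *data, size_t len, size_t *pos,
                      size_t *start, size_t *tok_len)
{
    size_t i = *pos;

    /* Skip separators: NUL bytes and whitespace. */
    while (i < len && (data[i] == '\0' || isspace((unsigned char)data[i])))
        i++;
    if (i >= len) {
        *pos = len;
        return 0;
    }

    /* Collect the token: everything up to the next separator or the end. */
    *start = i;
    while (i < len && data[i] != '\0' && !isspace((unsigned char)data[i]))
        i++;
    *tok_len = i - *start;
    *pos = i;
    return 1;
}
```

A caller would start with pos set to 0 and call next_token() in a loop, copying tok_len bytes from data + start into the command line buffer each time (checking the remaining space first) and adding a single space between tokens.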

All this talk of \n and \0 is way off being the right answer.
The low-level read() and write() functions just deal with a 'count' of the number of bytes processed. They pay no attention at all to the content of the stream.

I'm actually writing in the child process (I'm not even writing, I'm just calling dup2(outFD, 1) so that the exec'd program automatically writes for me). So here's my dilemma: how do I determine how many characters were written when I am reading in the parent? As far as I know, execvp doesn't return any info describing how many characters/bytes were written. I need to know this so I know when to stop copying from buf to my expanded string.

Here's another dilemma (one I didn't know I was having until I read your reply). I eventually fixed my program, but the way I did it was by removing the while loop and simply changing it to:

The guy below says that this is wrong, because read won't necessarily read all the characters written to the pipe. So, how do I know when to stop reading? Will it attempt to read BUF_SIZE bytes? Does execvp insert some sort of terminating character? And if read takes a few calls (meaning I would need to implement a loop), how could I get the cumulative number of bytes written? Would the following work?

First of all, you cannot assume a single read() call reads everything. It can return a short count (only some of the data available), or -1, even without errors. In particular, if a signal is delivered while a read() call blocks, the call will return -1 with errno==EINTR. And since you are using pipes and child processes, you should be handling signals anyway.

You'll know you have read all data when read() returns zero. (Or if it returns -1 with errno set to something other than EINTR, EWOULDBLOCK, or EAGAIN.)

Alright, well the next part of this current assignment is to add signal handling, so I can start doing that now. I wasn't aware that read may not read it all at once, and I wasn't aware that it may return -1 without error. So basically what I need to do is the following?

Therefore, what you need is a loop that reads more into a (preferably dynamically allocated) buffer, until read() returns zero (indicating end of input) or returns -1 with errno set to indicate a real error (in which case you should probably abort).

Wait, dynamically allocated? I'm somewhat confused about whether it's being dynamically allocated. I'm not using malloc, but the function is called recursively, so it may allocate any amount of memory. I have to be really careful to prevent stack overflows (according to my teacher), because he will be testing my script with command expansion up to 200,000 characters (with 256-byte buffers).

Second, the data you read is not strings. It is just data. You cannot assume it is a string. You cannot use the string functions to manipulate it; you must treat it as a character array instead. (The difference is that a string is terminated, whereas a character array has a specific length.) For example, consider the case where you convert an image using the Netpbm tools and a pipe. If you assume it is a string, you'll break the data! Instead, you simply need to record the amount of data read.

Alright, I'm starting to get this now. How exactly does read know when to stop reading? How does it know when execvp stopped outputting to my pipe?

For command substitution, you will convert the data read into one or more strings (tokens). I don't know what expansion rules you have chosen, but I think you want to split $(command...) at whitespace into separate tokens (ignoring leading and trailing whitespace, and treating consecutive whitespace as a single separator).

Ya, I am taking a single $(cmd ...) found in a string, and then expanding it. Since the entire cmd expansion system is recursive, it automatically separates it at whitespace (and condenses whitespace to a single space).

So, you read() all data into a buffer, then convert it into one or more strings. The conversion rules are simple: skip all NUL (\0) and whitespace characters (see isspace(), http://www.kernel.org/doc/man-pages/online/pages/man3/isspace.3.html) in the buffer first. Copy the consecutive non-NUL, non-whitespace characters to your command line buffer. When you see a NUL or whitespace, skip them, and if they are followed by a non-NUL, non-whitespace character, start a new token. When you arrive at the end of the buffer (remember, you must keep track of the index so you won't go past the number of chars you read()), you have converted the output to your command line.

I'm having trouble following this part. Read will automatically separate the buffer into tokens for me? How does it determine where to separate them? So if I come across two consecutive NUL or whitespace characters, or if I reach the end of the buffer length, I can stop copying? Currently I'm copying from buf to my expanded string as follows:

This is obviously some crappy code; I don't even check whether I am going beyond the bounds of the buffer. As you can see in my first code tag, I mark what I consider to be the end of the buffer with a NUL terminator. So this copy loop is completely based off of that.

Thanks for the help, you have really saved me several hours of confusion. Once I figure out these additional questions I'll be able to finish cmd expansion entirely. If you need any additional info, just ask.

I wasn't aware that read may not read it all at once, and I wasn't aware that it may return -1 without error. So basically what I need to do is the following?

Here's the actual code I'd use if I were you.

It takes the file descriptor, a pointer to a buffer pointer, a pointer to the buffer size, a pointer to the size of data already in buffer, and the size to reserve at the end of the buffer, as parameters. It reads everything from the descriptor (until end of input) into the buffer, dynamically growing it if necessary.

You can reuse a previous buffer, allocate an initial one, or set the pointed-to values to NULL, 0, 0. Remember to free() the buffer after you no longer need it. See the example at end of this message.

The function ignores signal delivery interrupts, but will return an error if the descriptor is nonblocking and no data is immediately available. (File descriptors are blocking unless you specifically open them as, or set them to, nonblocking.)

The function interface is such that if you deem an error unimportant, you can simply call the function again, to read the rest of the input. You can even read data sequentially from multiple sources into one buffer. You can even pre-fill the buffer with your own data.
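A function with the interface described above might look roughly like the sketch below. It is a reconstruction from the description only, not the original listing: the name read_all() is taken from the later posts, while the parameter names, the 4096-byte growth step, and the choice to return an errno value are assumptions. Overflow of the size arithmetic is not checked.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Sketch: append everything readable from fd to the buffer at *dataptr.
 * *sizeptr is the allocated size, *usedptr the bytes already in the buffer,
 * and reserve is the number of spare bytes kept free after the data.
 * Returns 0 at end of input, otherwise an errno value. */
static int read_all(int fd, char **dataptr, size_t *sizeptr,
                    size_t *usedptr, size_t reserve)
{
    char  *data = *dataptr;
    size_t size = data ? *sizeptr : 0;
    size_t used = data ? *usedptr : 0;
    int    result = 0;

    for (;;) {
        if (used + reserve >= size) {
            /* Grow the buffer; the 4096-byte step is an arbitrary choice. */
            size_t newsize = used + reserve + 4096;
            char  *newdata = realloc(data, newsize);
            if (!newdata) {
                result = ENOMEM;
                break;
            }
            data = newdata;
            size = newsize;
        }

        ssize_t n = read(fd, data + used, size - used - reserve);
        if (n > 0) {
            used += (size_t)n;
        } else if (n == 0) {
            break;                  /* end of input */
        } else if (errno != EINTR) {
            result = errno;         /* includes EAGAIN and EWOULDBLOCK */
            break;
        }
        /* EINTR: a signal interrupted the call, so just try again */
    }

    /* Always hand the (possibly moved) buffer back, even on error. */
    *dataptr = data;
    *sizeptr = size;
    *usedptr = used;
    return result;
}
```

Starting from a NULL pointer with both sizes set to 0 and a reserve of 1 leaves room to append a terminating '\0' afterwards. Because the pointed-to values are updated even when an error is returned, calling the function again continues where it left off.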

I would not normally show such complete code, but this is so often implemented only partially, or downright wrong, that I feel it should be shown completely.

Here is an example main() to explore how the function works. I'll even throw in an example trim(), that converts the data into a string (adding a '\0' at the end), replacing all ASCII control characters and whitespace with a single space, and trimming out leading and trailing control characters and whitespace.
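A trim() along the lines described here could be sketched as follows. Again, this is a reconstruction from the description rather than the original code. It assumes "ASCII control characters and whitespace" means bytes 0 through 32 plus 127, and it assumes the buffer has room for at least len + 1 bytes so that the terminator always fits.

```c
#include <stddef.h>

/* Sketch: turn data[0..len-1] into a C string in place.
 * Runs of ASCII control characters and whitespace (bytes 0..32 and 127)
 * collapse to one space; leading and trailing runs are dropped.
 * The buffer must have room for len + 1 bytes. Returns the string length. */
static size_t trim(char *data, size_t len)
{
    size_t in, out = 0;
    int    pending = 0;     /* a separator run was seen after some token */

    for (in = 0; in < len; in++) {
        unsigned char c = (unsigned char)data[in];
        if (c <= 32 || c == 127) {
            pending = (out > 0);    /* separators before the first token are dropped */
        } else {
            if (pending)
                data[out++] = ' ';  /* one space stands in for the whole run */
            pending = 0;
            data[out++] = (char)c;
        }
    }

    data[out] = '\0';
    return out;
}
```

Working in place is safe because the write position never gets ahead of the read position: a space is only emitted after at least one separator byte has been consumed.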

I hope you get your program working. I can imagine the contortions you have to do to make it work with just local variables (on the stack).

Originally Posted by Scriptonaut

Alright, I'm starting to get this now. How exactly does read know when to stop reading?

Whenever read() returns zero, it means there is no more data to read. Either you are at the end of the file, or the other end of the pipe or socket closed the connection.

(In a terminal, pressing Ctrl+D on an empty line has the same effect, except that it does not close the connection. It only tells the process reading the input that you do not intend to provide any more input.)
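For the pipe case, here is a rough sketch of how the pieces fit together. The helper name capture_output and its fixed-size buffer are assumptions for illustration. execvp() itself reports nothing about how much the command wrote; the parent learns the total by adding up the read() return values, and it only sees the zero return once every copy of the write end is closed, including its own.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with its standard output connected to a
 * pipe, and copy up to size bytes of that output into buf.
 * Returns the number of bytes captured, or -1 on failure. */
static ssize_t capture_output(char *const argv[], char *buf, size_t size)
{
    int    fds[2];
    pid_t  child;
    size_t used = 0;
    int    failed = 0;

    if (pipe(fds) == -1)
        return -1;

    child = fork();
    if (child == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (child == 0) {
        /* Child: make the write end our stdout, then run the command. */
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            if (dup2(fds[1], STDOUT_FILENO) == -1)
                _exit(127);
            close(fds[1]);
        }
        execvp(argv[0], argv);
        _exit(127);                 /* only reached if execvp() failed */
    }

    /* Parent: close our copy of the write end, or read() never returns 0. */
    close(fds[1]);
    while (used < size) {
        ssize_t n = read(fds[0], buf + used, size - used);
        if (n > 0) {
            used += (size_t)n;      /* running total of bytes received */
        } else if (n == 0) {
            break;                  /* child exited or closed its stdout */
        } else if (errno != EINTR) {
            failed = 1;
            break;
        }
    }
    close(fds[0]);

    /* Reap the child so it does not linger as a zombie. */
    while (waitpid(child, NULL, 0) == -1 && errno == EINTR)
        ;

    return failed ? -1 : (ssize_t)used;
}
```

If the buffer fills before end of input, this sketch simply stops reading and closes the pipe, so the rest of the output is discarded; a shell that must handle long expansions would grow the buffer instead.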

Originally Posted by Scriptonaut

Ya, I am taking a single $(cmd ...) found in a string, and then expanding it. Since the entire cmd expansion system is recursive, it automatically separates it at whitespace (and condenses whitespace to a single space).

In that case, you could use the data read by read_all(), remove any embedded '\0', then append an end-of-string '\0' to the data, just like my trim() function does in the example. That converts the data into a string you can supply to your command expansion system.

Originally Posted by Scriptonaut

I'm having trouble following this part. Read will automatically separate the buffer into tokens for me?