Bash and the process tree

The process tree

The processes in UNIX® are – unlike in other systems you may have seen – organized in a tree. Every process has a parent process that started it or is responsible for it. Every process also has its own context memory (not the memory where the process stores its data, but memory holding data that doesn’t directly belong to the process yet is needed to run it): the environment.

To make it really clear I want to repeat it: Every process has its own environment space.

The environment stores, among other things, data that’s useful for us: the environment variables. These are strings in the common NAME=VALUE form, but they are not the same thing as shell variables. A variable named LANG, for example, is used by every program that looks it up in its environment to determine the current locale.

Attention: A variable that is merely set, as with MYVAR=Hello, is not automatically part of the environment. You need to put it into the environment with the export builtin:

export MYVAR

Common system variables like PATH or HOME usually already are part of the environment (as set by login scripts or programs).
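
The difference between a set variable and an exported one can be checked with a small sketch like the following. The names before and after are only illustrative helpers, and the child process is started with bash -c purely for demonstration:

```bash
#!/usr/bin/env bash
# Make sure MYVAR is not already in the environment.
unset MYVAR

# Set a plain shell variable: not (yet) part of the environment.
MYVAR=Hello
before=$(bash -c 'echo "${MYVAR:-unset}"')

# Put it into the environment, then start another child process.
export MYVAR
after=$(bash -c 'echo "${MYVAR:-unset}"')

echo "child before export: $before"
echo "child after export:  $after"
```

The first child does not see MYVAR at all; the second one gets a copy of it.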

Executing programs

All the diagrams of the process tree use names like “xterm” or “bash”, but that’s just for you to understand what’s going on; it doesn’t mean that processes with exactly these names are really running.

Let’s take a short look at what happens when you “execute a program” from the Bash prompt, a program like “ls”:

$ ls

Bash will now perform two steps:

It will make a copy of itself

The copy will replace itself with the “ls” program

The copy of Bash will inherit the environment from the “main Bash” process: All environment variables will also be copied to the new process. This step is called forking.

For a short moment, you have a process tree that might look like this…

xterm-----bash-----bash(copy)

…and after the “second Bash” (the copy) has replaced itself with the ls program (it execs it), it might look like

xterm-----bash-----ls

If everything was okay, the two steps resulted in one program being run. The copy of the environment from the first step (forking) results in the environment for the final running program (ls in this case).
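
The two steps can be imitated by hand, as a rough sketch: parentheses make Bash fork a copy of itself, and the exec builtin makes that copy replace itself with another program. The variable name status is only illustrative:

```bash
#!/usr/bin/env bash
# Step 1: the parentheses fork a copy of the current shell.
# Step 2: exec replaces that copy with the ls program.
( exec ls / >/dev/null )
status=$?

# Only the copy was replaced; the main shell keeps running.
echo "ls exited with status $status, the main shell is still here"
```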

What is so important about it? Well, in our example, whatever the program ls does inside its own environment, it can’t have any effect on the environment of its parent process (bash here). The environment was copied when ls was executed. That’s a one-way street! Nothing will “copy it back” when ls terminates!
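
A minimal sketch of this one-way copy, using a second Bash as the child instead of ls (the name child_view is just an illustrative helper):

```bash
#!/usr/bin/env bash
export MYVAR=parent

# The child gets a copy of the environment and changes only its copy.
child_view=$(bash -c 'MYVAR=child; echo "$MYVAR"')

echo "child saw:  $child_view"
echo "parent has: $MYVAR"
```

The child prints child, but the parent still has parent afterwards.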

Bash playing with pipes

Pipes are a very powerful tool. You can connect the output and input streams of two separate programs, and thus create a new utility – or better: a new functionality. Well, we’re not here to explain piping, we just want to see how pipes look in the process tree. Again, we execute some commands – ls and grep:

$ ls | grep myfile

It results in a tree like this:

                +--ls
xterm-----bash--|
                +--grep

Just to be boring again: ls can’t influence the environment of grep, grep can’t influence the environment of ls, and neither grep nor ls can influence the environment of bash.

How is that related to shell programming?!?

Well, imagine some Bash code that reads data from a pipe. Let’s take the internal command read, which reads data from stdin and puts it into a variable. We run it in a loop here to count input lines:

counter=0
cat /etc/passwd | while read; do ((counter++)); done
echo "Lines: $counter"

What? It’s 0? Yes! The number of lines might not be 0, but the variable $counter still is 0. Why? Remember the diagram from above? I’ll rewrite it a bit:

                +--cat /etc/passwd
xterm-----bash--|
                +--bash (while read; do ((counter++)); done)

See the relation? The forked Bash will count the lines like a charm. It will also set the variable counter just as you wanted. But when everything ends, this extra process is terminated – your variable is gone – R.I.P. You see a 0 because in the main shell it always was 0 and never anything else!

Aha! And now, how to count those lines? Easy: Avoid the subshell. How you do it in detail doesn’t matter, the important thing is that the shell that sets the counter must be the “main shell”. For example, do it like this:

counter=0
while read; do ((counter++)); done < /etc/passwd
echo "Lines: $counter"

It’s nearly self-explanatory. The while loop runs in the current shell, the counter is incremented in the current shell, and everything vital happens in the current shell. The read command also sets the variable REPLY (the default if no variable name is given), though we don’t use it here. This small script should work.
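
When the input comes from a command rather than a file, one possible way to keep the loop in the main shell is Bash’s process substitution. This is only a sketch: the printf call stands in for whatever command produces the lines, and the counter is incremented with plain arithmetic expansion:

```bash
#!/usr/bin/env bash
counter=0

# The loop runs in the main shell; only printf runs in a separate process.
while read -r; do
    counter=$((counter + 1))
done < <(printf 'one\ntwo\nthree\n')

echo "Lines: $counter"
```

Here the redirection feeds the loop, so no pipe is involved and the counter survives the loop.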

Actions that create a subshell

Bash creates subshells or subprocesses on various actions it performs:

Executing commands

Imagine your command is actually a script that sets variables you want to use in your main script. This won’t work, because the script runs in its own process with its own copy of the environment.

For exactly this purpose, there’s the source command (also available as the dot command, .). It doesn’t actually execute the script like it would execute any other program – it’s more like including the other script’s source code in the current shell:

source ./myvariables.sh
# equivalent to:
. ./myvariables.sh
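
The difference between executing and sourcing can be tried out with a throwaway file. In this sketch the temporary file and the variable GREETING are hypothetical stand-ins for a real myvariables.sh and its contents:

```bash
#!/usr/bin/env bash
# Create a tiny script that only sets a variable (hypothetical content).
tmp=$(mktemp)
printf 'GREETING=hello\n' > "$tmp"
unset GREETING

# Executed as a separate process: the variable dies with that process.
bash "$tmp"
echo "after executing: ${GREETING:-unset}"

# Sourced: the assignment happens in the current shell.
source "$tmp"
echo "after sourcing:  ${GREETING:-unset}"

rm -f "$tmp"
```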

Pipes

The last big section was about pipes, so no example here…

Explicit subshell

If you group commands by enclosing them in parentheses, these commands are run inside a subshell: