Trying to understand the behaviour of the environment in Linux (Ubuntu 13.04, specifically), I've found different situations where environment variables are used or defined in different contexts. For example, if I check locale, I get:

The first thing the shell does when executing a program is call execve, which actually runs the program. Its first argument is the program being called, the second is the argv array of the called program, and the third is the list of environment variables.

In that third parameter, for example, neither PS1 nor LC_CTYPE appears.

In general, variables appearing in env or set appear in the list of environment variables sent to execve. Some locale variables appear in env or set but others do not (LC_CTYPE, LC_COLLATE and LC_MESSAGES, as well as LC_ALL, which has an empty value). Lastly, other variables are not listed by env although they have a visible effect (PS1), as reflected by set.

What's going on here? What are the differences between env, set (without arguments), and locale (with respect to locale variables only, obviously)?

3 Answers

The primary issue here -- which accounts for why, e.g., $PS1 is not reported by env -- is that env is reporting from a non-interactive environment. Processes are executed from a fork of your interactive shell, but there's a subtlety involved in how their environment is set: It's actually inherited via a native C level external variable set for all exec()'d processes (see man environ). Here's an illustration:

The only difference is the name of the executable. So where does **environ come from and why doesn't it contain, e.g., $PS1?

The fundamental explanation is that processes are always created as children of other processes and they inherit **environ, but PS1 was never part of it. At start up, a shell may source variables from standard places, and those places differ depending on whether the shell is interactive or not; see INVOCATION in man bash. An aspect of this is that:

PS1 is set [...] if bash is interactive, allowing a shell
script or a startup file to test this state.

Now, notice in /etc/bashrc something like this:

# are we an interactive shell?
if [ "$PS1" ]; then

Which is where your actual (fancy) prompt is set, and neither it nor the initial value of $PS1 was ever exported. The initial value was created by the shell at invocation because it was interactive, and the shell then sourced that file -- but PS1 did not get put into **environ. You can see this if you execute:

#!/bin/sh
echo $PS1

Nothing -- even though if you echo $PS1 in your interactive shell it's defined. This is because the **environ of the executed #!/bin/sh is the same as that of the parent interactive shell, but that does NOT contain PS1. This implies each shell uses an internal table of variables that is separate from **environ, although originally populated from it (this is confusing, since it means **environ does not include many things referred to as environment variables).

The environment a process was started with can be read from /proc/[PID]/environ, and if you check that for your current interactive shell, cat /proc/$BASHPID/environ, you'll see PS1 is not there.
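A quick way to check this on Linux is sketched below. DEMO_LATE is a made-up variable name. One caveat worth labelling: as far as proc(5) documents it, /proc/[PID]/environ holds the environment the process was started with, so a variable exported afterwards is expected to reach children without showing up in that file.

```shell
# DEMO_LATE is a hypothetical name, exported only after this shell started.
export DEMO_LATE=1

# Environment handed to a newly executed child (env): includes DEMO_LATE.
in_child=$(env | grep -c '^DEMO_LATE=' || true)

# Startup environment of the current shell, NUL-separated in procfs.
# $$ still names the current shell inside the command substitution.
in_proc=$(tr '\0' '\n' < "/proc/$$/environ" | grep -c '^DEMO_LATE=' || true)

echo "child sees DEMO_LATE: $in_child"
echo "/proc/$$/environ lists DEMO_LATE: $in_proc"
```

Replacing the pattern with '^PS1=' in an interactive shell repeats the check described above.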

But how does stuff get into "environ"?

The simple answer is, via system calls. For example, if we throw some stuff into the example C program from earlier:

MYFOO=whatbar? will be in the output (see man putenv). Since the shell creates processes by fork()ing (which duplicates the parent's memory, **environ included) and then calling execv() (which passes on the duplicated **environ), we can see a mechanism by which environment variables may be exported to child processes.

If you throw a fork() into that example, you'll see this is the case, and (to reiterate), this process of fork'ing and potentially exec'ing is how child processes are created and inherit **environ from their ancestors. exec calls replace the process image, but as per man execv and man environ (nb. some versions of the former do not refer to this), **environ is passed on by the system.
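The same hand-off can be watched from the shell without writing C. In this sketch, a per-command assignment asks the shell to add MYFOO only to the environment of the env process it executes; the invoking shell's own variables are untouched. It assumes MYFOO was not already set in the shell that runs it.

```shell
# MYFOO is placed in the environment passed to env, and nowhere else.
child_view=$(MYFOO='whatbar?' env | grep '^MYFOO=')

# Back in the invoking shell, MYFOO was never set
# (assumption: it was not set before this snippet ran).
parent_view=${MYFOO-unset}

echo "child:  $child_view"
echo "parent: $parent_view"
```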

Here's a literal fork and exec of /usr/bin/env with MYFOO=whatbar? exported via putenv():

So where's the stuff that's not in "environ"?

It's private data of a particular shell instance. Bash will show you this + the inherited environ stuff via set with no arguments. Note this output also includes sourced functions.
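A small sketch of that difference, using two made-up variable names (DEMO_SHELL_ONLY and DEMO_EXPORTED): set is a builtin and reads the shell's own table, while env is an external program and only sees what it was handed.

```shell
DEMO_SHELL_ONLY=1        # hypothetical name, plain assignment, not exported
export DEMO_EXPORTED=1   # hypothetical name, exported

# Count how many of the two names each command lists.
set_hits=$(set | grep -c -e '^DEMO_SHELL_ONLY=' -e '^DEMO_EXPORTED=' || true)
env_hits=$(env | grep -c -e '^DEMO_SHELL_ONLY=' -e '^DEMO_EXPORTED=' || true)

echo "set lists $set_hits of the two, env lists $env_hits"
```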

But, if I find, for example, LC_CTYPE using env | grep "LC_CTYPE", it sends no output. In general, locale shows me 13 LC_* variables and env only nine:

I get no LC_ variables at all from env (just LANG) but 13 from locale. I would presume these are variables set by a locale call and not exported; the fact that you get any from env perhaps reflects a naive error in some configuration somewhere.

Why do you draw the line around "non-interactive subshells"? Consider PS1="foo "; bash. Is the new bash not an "interactive subshell"?
– Hauke Laging, Apr 6 '14 at 14:26

Right, but that's not my point. My point is that interactive and non-interactive "subshells" (i.e. explicitly called shells) behave the same way.
– Hauke Laging, Apr 6 '14 at 14:30

I think you are looking at the wrong area here and mixing up the two. The difference between exported and non-exported variables ends at the execve(). At that point new processes are all the same, whether it's env or bash. What happens after the execve() is specific to the program. And it is irrelevant whether the program has been started from a shell or from anything else. Shell invocation is IMHO completely irrelevant for understanding the export mechanism.
– Hauke Laging, Apr 6 '14 at 14:41

@Peregring-lk : All apologies for the previous ambiguities. After being pushed by Mr. Laging, I did some deeper research and experimentation to discover where the apparent dependencies in environment come from, and have updated my answer with those, I think in-depth and definitive, findings!
– goldilocks, Apr 6 '14 at 18:36

@TAFKA'goldilocks' Thanks for your complete reply; and finally, where can I see non-exported variables? In other words, where are all variables grouped, including those not collected by env?
– Peregring-lk, Apr 6 '14 at 20:42

"internal" variables which are known to the shell only (and to subshells)

exported variables, the "official" ones which are seen by execve and thus by env. The shell builtin export shows you the exported variables.

If you execute

export PS1

and repeat

env | grep "PS1"

then you see it. Variables can be exported during creation (export foo=bar instead of foo=bar), they can be exported automatically on creation or modification (set -a), they can be exported later (var=foo; ...; export var) and they can be "unexported" (export -n var).
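The export variants can be compared side by side. This is only a sketch: in_env is a hypothetical helper and the DEMO_* names are made up. export -n is left out because it is not available in every sh.

```shell
# in_env NAME: report whether an executed child (env) finds NAME in its environment.
in_env() {
    if env | grep "^$1=" >/dev/null; then echo yes; else echo no; fi
}

DEMO_PLAIN=1                   # plain assignment, never exported
plain=$(in_env DEMO_PLAIN)

export DEMO_AT_CREATION=1      # exported at creation
at_creation=$(in_env DEMO_AT_CREATION)

DEMO_LATER=1                   # assigned first...
before_export=$(in_env DEMO_LATER)
export DEMO_LATER              # ...exported afterwards
after_export=$(in_env DEMO_LATER)

set -a                         # auto-export everything assigned from here on
DEMO_AUTO=1
set +a                         # switch auto-export off again
auto=$(in_env DEMO_AUTO)

echo "plain=$plain at_creation=$at_creation before=$before_export after=$after_export auto=$auto"
```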

If the shell creates "real" subshells (by a|b, (a;b), $(a) and so on) it keeps several non-exported variables in order to avoid chaos.
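To see the contrast between a "real" subshell and a separately executed shell, a sketch along these lines may help. DEMO_INTERNAL is a made-up name, and each result is captured with $( ) only so that it can be printed afterwards.

```shell
DEMO_INTERNAL='kept'    # hypothetical name, not exported

# Subshells are forks of the current shell, so they still have its private variables.
from_parens=$( (printf '%s' "${DEMO_INTERNAL-unset}") )
from_substitution=$(printf '%s' "${DEMO_INTERNAL-unset}")
from_pipeline=$(true | printf '%s' "${DEMO_INTERNAL-unset}")

# A separately executed shell starts from the exported environment only.
from_exec=$(sh -c 'printf "%s" "${DEMO_INTERNAL-unset}"')

echo "(...): $from_parens, \$(...): $from_substitution, a|b: $from_pipeline, sh -c: $from_exec"
```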

+1 The fact that subshells inherit non-exported variables that other child processes do not is an interesting case, I guess accounted for by the fact that a subshell is a fork with no exec. This also means "subshell" does not refer to subprocess shells executed via a shebang. Thanks for arguing some of these points with me!
– goldilocks, Apr 6 '14 at 23:00

The output from the locale command is not a list of environment variables from the current environment. It is a display of that process' effective locale settings (which is influenced in part by certain environment variables) and is presented in the same key=value format that the env command uses.
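One way to observe this, as a sketch: run both commands with a nearly empty environment and count the LC_* lines each prints. clean_env is a hypothetical helper, and the snippet assumes the locale utility is installed.

```shell
# clean_env CMD...: run CMD with an environment containing only PATH and LANG=C.
clean_env() {
    env -i PATH="$PATH" LANG=C "$@"
}

# Number of LC_* entries actually present in that environment.
lc_in_env=$(clean_env env | grep -c '^LC_' || true)

# Number of LC_* lines locale prints when given the very same environment.
lc_from_locale=$(clean_env locale 2>/dev/null | grep -c '^LC_' || true)

echo "LC_* variables in the environment: $lc_in_env"
echo "LC_* lines printed by locale:      $lc_from_locale"
```

If locale reports LC_* categories while the environment contains none, those lines are derived settings rather than variables that were passed in.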