Thursday, March 28, 2013

Linux processes explained

http://mylinuxbook.com/linux-processes-part1

On any operating system, we speak of a number of programs running at
the same time. These running programs introduce the concept of processes.
So what is a process? A process is a program in execution.
Robert Love describes the process in one of his books as follows:

The Process is one of the fundamental abstractions in Unix Operating Systems, the other fundamental abstraction being files

Linux is a multi-user and multi-tasking operating system (or at least
seemingly so, as discussed later in the article). A Linux process is a
program in execution on a Linux system; whenever a program is executed, a
new process is created. A process also consumes resources such as the file
system, memory and CPU time. This gives rise to the need for
process management in Linux.

Identifier for Linux Processes

In Linux, every process has a unique process identifier (PID)
associated with it. The PID is a number which is assigned
as soon as the process is created. PIDs are allocated
sequentially as processes are created. Allocation generally
starts from 2, as PID 1 is reserved for the ‘init’ process. As one would
expect, there is a maximum limit to the PID value. On a given system,
the limit can be read with

$ cat /proc/sys/kernel/pid_max

On my system, I got the following value:

32768

Hence, whenever the sequentially allocated PID reaches the maximum
value, allocation wraps around to a lower limit (generally 300), and the
next PIDs are taken from the free ones starting at that lower limit.
The PID of a process, as the name suggests, is its identifier.
Hence, most operations performed on a process need its PID to
be specified.
In the following sections we shall see how to display all the
processes with their PIDs, and the various operations that can be
performed on a process.

Listing Processes

At any moment, a Linux user can view the list of all the processes
which have been created and not yet terminated. I deliberately do not
say all the processes which are running because, once a process has been
started, it can be in any state, not necessarily running. More about
process states in later sections.
The Linux command used to view the list of processes is ‘ps’, which stands
for ‘process status’ (some authors also interpret it as ‘process snapshot’).
To see in detail everything this command has to offer, the best source is its man page.
Let us try running the command:

$ ps
PID TTY TIME CMD
1779 pts/0 00:00:00 bash
2176 pts/0 00:00:00 ps

We see just two processes, although we are sure that other
processes are running as well. The ‘ps’ command without any
options lists only the processes started from the current
terminal. The first one is ‘bash’, the shell running in the terminal,
and the other is the process created by the ‘ps’ command itself.
Let’s walk through the ‘ps’ output column by column.

PID – The process Identifier which is ‘1779’ for ‘bash’ and ‘2176’ for ‘ps’.

TTY – short for teletypewriter; it is the name of the console/terminal the process is associated with.

Note: To determine the name of your terminal, use command ‘tty’.

TIME – The CPU time the process has consumed since it started. It may
seem confusing that the CPU time for the ‘bash’ process is ‘00:00:00’.
This is because CPU time is only the time for which the process is
actually being executed by the processor. When bash runs a command,
let’s say ‘ls’, a child process ‘ls’ is spawned, and whatever execution
and CPU utilization takes place is accounted to the ‘ls’ process and not
to ‘bash’. The bash process is just the parent process.

CMD – Command run to create the process.

The ‘ps’ command offers many more options for exploring the
processes launched on a system. Refer to the ‘ps’ man page to get
familiar with every available option. Here we shall play around with a few.

List All processes

$ ps -e

According to the ‘ps’ man page, the ‘-e’ option means “Select all
processes”, so the list of displayed processes is no longer
limited to the ones started from the current terminal. Instead, we are
able to see all the current processes. Note that this option is identical
to ‘-A’.
Running the above command, we see a huge list of processes. Hence, to read through them comfortably, we pipe the output to ‘more’.
The columns described below are those of a full-format listing, such as the one printed by ‘ps -ef’.

UID – The user ID, shown as the username of the user who owns the process.

PID – The process ID, already discussed above.

PPID – The parent process ID. All processes except the first one
have a parent process, i.e. the process which created them.
Every process therefore keeps the PID of its parent, called the
parent process ID (PPID), in its process descriptor. We shall learn
about process descriptors in the next part of this article series.
Meanwhile, one can think of the process descriptor as a set of related
parameters describing the process.

We can conclude from this that processes can be represented
as a tree (hierarchical) structure in Linux. To view the complete tree
structure, Linux provides the command ‘pstree’.
It gives an interesting output, showing the first ‘init’ process and the other processes spawned from it.
The process tree snapshot on my Ubuntu system is:

More information about the ‘pstree’ command can be found in its man page.

C – The processor utilization of the process. With the procps
version of ‘ps’ found on most Linux systems, this is the integer
percentage of CPU used over the lifetime of the process. Therefore, a
higher value indicates a more CPU-intensive process.

STIME – The start time of the process.

TTY – The terminal associated with the process. If this
value is ‘?’, the process is not associated with any
terminal. Such processes are typically daemon processes, which we shall
discuss in the next section.

TIME – The cumulative CPU time the process has consumed since it started.

CMD – The command which launched the process.

We shall see more uses of the ‘ps’ command in further sections, as we get to know other dimensions of Linux processes.

Types of Processes

There is no standard classification of process types in
Linux. Processes can be divided into interactive and non-interactive
processes, foreground and background processes, or daemon and batch
processes. They can also be classified by their status,
as with zombie processes. It is good enough if we
understand all these terms as they are used on a Linux system.

Interactive processes

An interactive process is one which needs the user’s interaction while it
is active. For example, when we launch the vi editor, it is an
interactive process. Another example is the telnet command. Hence,
interactive processes have to be associated with a terminal.
Under the umbrella of interactive processes, we have foreground and background processes.
Let’s discuss them one by one:

Foreground Process

A process is a foreground process if it is in focus and can be given
input from the standard input. The shell is blocked until the foreground
process completes. When we run commands on the terminal, they
generally run as foreground processes and block the terminal until they
complete, although most Linux commands finish too quickly for us
to even notice.
Let us create our own program which sleeps for 10 seconds, and
experience for ourselves what waiting for a foreground process feels like.
The C source looks like:

What do you experience? The terminal is blocked by the running
process and does not let the user do anything until the program
completes.
To see another example of foreground blocking, open gedit from the
terminal. The terminal launches gedit and won’t accept any
input until we close gedit.

Background Process

Background processes are ones that are running, but in the
background, not taking any user input from the terminal. A background
process doesn’t block the terminal, and allows us to use the terminal
regardless of whether the background process has completed or not. The
key character that makes a new process run in the background is ‘&’.
We use this character by appending it to the command, as in:

<command> &

It is time to run our ‘wait_process’ program in the background,
so that the terminal does not get blocked while the process is
sleeping.

$ ./wait_process &
[1] 2534
$

Whoa! We get the command line back, and can use it without worrying about the sleeping process.
To make background and foreground processes handy to work with, Linux
provides commands to view what is running and to switch any foreground
process to the background and vice versa.
One can use the command ‘jobs’ to see everything that is running in association with the terminal.

In the above exercise, we started two background processes, gedit
and the wait_process program. Hence, the command ‘jobs’ lists both of
them along with their PIDs.
If we want to bring gedit to the foreground, we use the Linux command ‘fg’:

$ fg

We see the following after running the ‘fg’ command:

gedit wait_process.c

And we lose the command prompt again. No points for guessing: gedit
is now running as a foreground process. To switch it back to a
background process, use the key combination ‘Ctrl + Z’ to suspend the
process and then run the Linux command ‘bg’.

Note: While a process is suspended it is not running, so on resuming it continues from the same state in which it was stopped.

We got the command prompt back. We need to confirm that the gedit
process is still up and running. This can be done by using the
command ‘jobs’ again:

$ jobs
[1]+ Running gedit wait_process.c &

Batch processes

These are processes which are queued and executed one by one
in FIFO (First In First Out) order. Batch processes are not associated with
any terminal; instead they are handed to the system to run, preferably when the system load
is low or at a specific time. Low system load is a relative term, and
hence depends on the system and on the type and requirements of the
batch processes.
Linux provides two commands to create batch processes:

Linux at command

The ‘at’ command is used to schedule a process at a specific
date/time. A time on the current day is given in the format HH:MM.
However, it also accepts words like ‘today’, ‘tomorrow’, ‘teatime’,
etc.
When we execute the ‘at’ command, we reach the ‘at’ prompt, where we
can queue the tasks/commands/programs in order. When we are done
queueing tasks, we press the key combination ‘Ctrl + d’. On
pressing it, we see ‘EOT’ displayed on the
standard output, after which we are back at our command prompt.

Linux batch command

This command launches the process when the system load is low, i.e. when the load average drops below 0.8 or the value set for atd.
Its usage is similar to that of the ‘at’ command explained above.

Daemon Processes

A daemon process in Linux is another kind of process which runs in the
background. What is different here is that daemon processes are not
associated with any terminal in any way. Therefore such processes don’t
interact with the user. A widespread example of a daemon process is a
server service of any kind. For example, a mail server
just has to listen on the relevant ports and respond with its
protocol routines on receiving packets. Such processes can
be run as daemon processes, independent of terminal and user
interaction.
Generally, when we write a program in C on Linux and execute it from a
terminal, the shell running in that terminal becomes its parent process.
Hence, in order to develop a daemon service, the programmer needs to
detach the process from its parent. This is typically done by having the
program fork a child and letting the parent exit, which leaves the child
independent of the terminal and adopted by the init process.
We shall learn more about coding a daemon process in the second part of the article.

Zombie Processes

When a process terminates, the kernel keeps a small entry for it until
its parent collects its exit status (with one of the wait() calls); this
cleanup is the responsibility of the developer of the parent program. If
there is a bug and the cleanup never happens, the terminated process
remains in the process table. It still occupies a little memory, but will
never be scheduled by the scheduler, as its status is ‘terminated’.
Such processes, which are dead but still exist, are called zombie processes.
A few zombie processes are generally harmless.
However, a large number of lingering zombie processes can become a
problem, since the PIDs still held by zombies
are not available for re-allocation to new processes. If zombie
processes keep accumulating, the system eventually runs out of
available PIDs, and no new process can be launched.

The init process

The init process is the one which starts the system processes listed in the file

/etc/inittab

and is assigned PID 1. Its work includes
setting up the user space, mounting file systems and setting up
everything else needed to get the system up and running. It is worth
mentioning that the init process sits at the apex of the complete
process tree. It runs as root, and is an ancestor of every user shell.
It is started as the last step of booting, and it is also the one which
launches and controls the shutdown.

Process States

Linux processes generally go through six major states, which are listed below:

1. Running or Runnable (R) – The running state is a broad concept here. Running does not always mean using the CPU: a process that is ready to run is also in the running state. Hence there are two sub-states: ready, when the process is queued in the run queue, and executing, when the process has been scheduled and is actually being executed.

2. Stopped (T) – If a running process receives a stop signal, it is moved to the stopped state. A process can also be in the stopped state if it has been halted by a tracer while debugging.

3. Uninterruptible sleep (D) – A sleeping state in which the process is blocked and cannot be interrupted. A process mostly enters uninterruptible sleep during an I/O operation.

4. Interruptible sleep (S) – A sleeping, i.e. blocked, state in which the process is waiting for an event to occur.

5. Zombie/Defunct (Z) – The state in which the process has terminated but has not been reaped by its parent process.

6. Dead (X) – This state should never be seen: as soon as a process is dead, it is gone.

Note: For BSD formats and when the stat keyword is used, additional characters may be displayed:

< high-priority (not nice to other users)

N low-priority (nice to other users)

L has pages locked into memory (for real-time and custom IO)

s is a session leader

l is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)

+ is in the foreground process group.

A simplified life cycle of a Linux process is illustrated through following diagram:

In order to check the current status of the active processes at a
given moment on a Linux system, we again use the ‘ps’ command, but with a
different set of options.

$ ps ax

On my system, I again get a huge list of processes. However, we are
interested in the status details of the listed processes, therefore here
is a snapshot of the output

Observe the status of the listed processes under the column ‘STAT’.
Although we trust Linux commands more than anything, it is exciting
to confirm this from the ‘ps’ process itself (at the end of the list).
Its status is ‘R+’, as we can read from the above output snapshot, which
indicates that it is running, and running as a foreground process. So true!

Real time snapshot of processes

The ‘ps’ command discussed in the previous section
shows the active process list at the particular moment when the command
is executed. In many cases, we need a dynamic,
real-time view of the running processes. For such circumstances, Linux
comes with the ‘top’ command.
The ‘top’ usage looks like

top -hv | -bcHisS -d delay -n iterations -p pid [, pid ...]

More details can be found in its man page.
To see what it provides, here is an output snapshot from my Ubuntu system.
The command:

Running the Linux ‘top’ command, we observe that the
output keeps changing from moment to moment, as the process dynamics
change in real time. This is possible because the ‘top’ command keeps
running all this while, monitoring the running tasks. Hence, we need
to terminate the ‘top’ process (by pressing ‘q’) to get back the command prompt.
Let us analyse the ‘top’ output line by line.

The first line ends with the average load on the system, given as three values for the last one minute, the last five minutes and the last fifteen minutes respectively.

Coming to the next row of output, which looks like

Tasks: 136 total, 1 running, 135 sleeping, 0 stopped, 0 zombie

It tells us how many processes there are in total, 136 in
our case, of which one is in the running state, 135 are in the
sleeping state, none is stopped and none is a zombie.
Now we know how many processes have been launched and which states they are in.
Moving on,

0.0%st – The steal time, i.e. the percentage of CPU time stolen from a
virtual machine: time spent in involuntary wait by a virtual CPU
while the hypervisor is servicing another virtual processor.

The next row in the output specifies physical memory usage

Mem: 507536k total, 498516k used, 9020k free, 10488k buffers

The figures give the total physical memory available,
how much of it has been used, how much is free and how much is used for
buffers.
Along similar lines, the next line states the usage of swap space

Swap: 521212k total, 187996k used, 333216k free, 91572k cached

The following rows then list the details of all the launched processes in real time. The meaning of each column is described in the man page of ‘top’.

The Linux ‘kill’ command sends a signal to the specified process. The process to which the signal is sent is specified by its PID.
Each signal is assigned a standard number. We can
find out which number corresponds to which signal
using the same ‘kill’ command with the ‘-l’ option.