Sunday, 16 March 2014

Linux Processes

In its most basic form, a Linux process can be viewed as a running
instance of a program. For example, just open a text editor on your
Linux box and a text editor process is born.

The first command (gedit &) opens a gedit window, while the second command
(ps -aef | grep gedit) checks whether there is an associated process. ps lists the running processes, and its output is piped to grep, which searches for the required token (gedit in our case). In the
result you can see that there is a process associated with gedit.

You can see two entries that match the word gedit in the process list. Each process in Linux has a unique PID (process ID). In the screenshot above, the number in the 2nd column is the PID of the process, so gedit has a PID of 2343. So what is 2039? It is the PPID (parent process ID). We ran the gedit command in a terminal instance, so the shell running in that terminal is the parent of every process we start from it, gedit being one of them. How do we verify that 2039 is indeed the PID of the parent terminal process? To find the PID of the terminal's shell, simply type ps in your terminal.
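The same two numbers can be read from inside a program. The following is a minimal sketch using Python's standard os module; the function name describe_self is only illustrative and not part of any tool mentioned here.

```python
import os


def describe_self():
    """Return the PID of this process and the PID of its parent."""
    return {"pid": os.getpid(), "ppid": os.getppid()}


if __name__ == "__main__":
    info = describe_self()
    # When the script is started from a terminal, ppid is normally
    # the PID of the shell running in that terminal.
    print(f"PID={info['pid']} PPID={info['ppid']}")
```

Running it from a terminal and then typing ps in the same terminal should show the shell's PID matching the PPID printed by the script.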

You can see PID 2039 corresponding to the bash process, which is the shell running in our terminal. I use bash, but you may be using another shell such as ksh or csh. To find out which shell you are using, you can refer to one of my earlier posts.

How PIDs are assigned to process?

Under Unix, process IDs are usually
allocated on a sequential basis,
beginning at 0 and rising to a maximum
value which varies from system to
system. Once this limit is reached,
allocation restarts at zero and again
increases. However, for this and
subsequent passes any PIDs still
assigned to processes are skipped.

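There is a small refinement to the above on Linux. The kernel defines two constants, RESERVED_PIDS and PID_MAX_DEFAULT. The low PIDs up to RESERVED_PIDS are left to the system processes and daemons started at boot, so when allocation wraps around it restarts from RESERVED_PIDS rather than from zero, and PIDs go up to PID_MAX_DEFAULT. The upper limit can be configured (on Linux it is exposed as /proc/sys/kernel/pid_max).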
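To see the upper limit that is currently in effect, it can be read from the proc filesystem. This is a small sketch that assumes a Linux system where /proc/sys/kernel/pid_max is readable; the helper name read_pid_max is made up for this example, and it returns None when the file is not available.

```python
from pathlib import Path

# Linux exposes the current PID limit here (assumes /proc is mounted).
PID_MAX_PATH = Path("/proc/sys/kernel/pid_max")


def read_pid_max(path=PID_MAX_PATH):
    """Return the configured PID limit, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("pid_max:", read_pid_max())
```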

Every process has a priority, which the kernel scheduler uses when
deciding which process to switch to next. A process can be pre-empted if
a process with higher priority is ready to be executed.

For example, if a process is waiting for a system resource, such as
text from a file kept on disk, the kernel can schedule another runnable
process and get back to the waiting process when the data is
available. This keeps the ball rolling for the operating system as a
whole and gives the user the feeling that tasks are being run in parallel.
Processes can talk to other processes using inter-process
communication (IPC) methods and can share data using techniques like
shared memory.
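One part of a process's priority that is visible from user space is its nice value. The sketch below reads it with os.getpriority from Python's standard library. It shows only the nice value, which is one input to the scheduler rather than the full priority the kernel computes; the function name current_nice is just for this example.

```python
import os


def current_nice():
    """Return the nice value of the calling process.

    On Linux the range is -20 (most favourable) to 19 (least favourable).
    """
    # A "who" argument of 0 means "the calling process".
    return os.getpriority(os.PRIO_PROCESS, 0)


if __name__ == "__main__":
    print("nice value:", current_nice())
```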

How processes are created in Linux?

In Linux, fork() is used to create new processes. These new processes
are called child processes, and each child initially shares all of the
parent's segments (text, stack, heap etc.) until one of them tries to
change the stack or heap. When a change is made, a separate copy of the
affected pages is prepared (copy-on-write) so that the change stays
private to the process that made it. The text segment is read-only, so
parent and child keep sharing it. The C fork function article explains more about fork().
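The visible effect of this is that a write made by the child never shows up in the parent. The following sketch, using os.fork from Python's standard library, demonstrates that effect only; it does not expose the page-level copying done by the kernel. The function name child_write_is_private is invented for the illustration.

```python
import os


def child_write_is_private():
    """Fork, let the child modify a list, and return what the parent sees."""
    data = ["original"]
    pid = os.fork()
    if pid == 0:
        # Child: this write lands in the child's own copy of the memory.
        data.append("changed by child")
        os._exit(0)  # leave immediately, skipping normal interpreter cleanup
    # Parent: wait for the child to finish, then look at our own copy.
    os.waitpid(pid, 0)
    return data


if __name__ == "__main__":
    print(child_write_is_private())
```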

Step By Step

The fork() system call does the following in a UNIX system:

Allocates slot in the process table for the new process.

Assigns a unique process id to the new process.

Makes a copy of the process image of the parent, with the exception of shared memory.

Increases counters for any files owned by the parent, to reflect that an additional process now also owns these files.

Places the child process in the ready-to-run state.

Returns the Process ID number (PID) of the child to the parent process and a 0 value to the child process.

Note: All of this work is done in kernel space, in the context of the parent process.
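The last step in the list, the two different return values, is what lets the same code tell parent and child apart. Here is a minimal sketch in Python using os.fork and a pipe; the function name fork_and_report and the message format are assumptions made for this example. The pipe is opened before the fork, so both processes end up holding its descriptors, which also illustrates the step about open files.

```python
import os


def fork_and_report():
    """Fork once and return (fork_result_in_parent, child_pid, child_ppid)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: fork() returned 0 here.
        os.close(read_fd)
        message = f"{os.getpid()} {os.getppid()}"
        os.write(write_fd, message.encode())
        os.close(write_fd)
        os._exit(0)
    # Parent: fork() returned the PID of the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_pid, child_ppid = map(int, reader.read().split())
    os.waitpid(pid, 0)  # collect the child's exit status
    return pid, child_pid, child_ppid


if __name__ == "__main__":
    print(fork_and_report())
```

The first two values returned should be equal (the parent learns the child's PID from fork()), and the third should be the PID of the calling process.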

The diagram above shows the process table and how each entry in it points to a process image.

A Process image consists of

User Data

User program

System stack (kernel space)

Process control block (PCB) containing process attributes.

The PCB looks like the following

It holds the process state (e.g. ready to run, sleeping, preempted), the process number or PID which we talked about earlier, registers, the program counter (PC), file descriptors etc.
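On Linux, some of these attributes can be inspected through /proc/<pid>/status. The file is a text view that the kernel generates and not the PCB itself, so treat the following as a rough sketch; the function name read_status and the chosen set of fields are assumptions made for the example.

```python
import os


def read_status(pid="self", wanted=("Name", "State", "Pid", "PPid", "Threads")):
    """Return selected fields from /proc/<pid>/status as a dict of strings."""
    fields = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            key, _, value = line.partition(":")
            if key in wanted:
                fields[key] = value.strip()
    return fields


if __name__ == "__main__":
    for key, value in read_status().items():
        print(f"{key}: {value}")
```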

Process States

From the forking (birth) of a process to its end (resources being freed up and its entry removed from the process table), a process goes through various states. The diagram below shows the state chart of a process in UNIX.
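One of these transitions can be observed directly on Linux: a child that has exited but has not yet been collected by its parent still has a process table entry, and /proc reports its state as Z (zombie). The sketch below assumes a Linux /proc filesystem and uses os.waitid with WNOWAIT to wait for the exit without collecting the child. The state letters are the ones Linux reports and may not match the labels in the diagram; the function names are illustrative.

```python
import os


def state_of(pid):
    """Return the one-letter state code from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as stat:
        text = stat.read()
    # The state is the first field after the parenthesised command name.
    return text.rsplit(")", 1)[1].split()[0]


def observe_zombie():
    """Fork a child that exits at once and return its state before reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    # Block until the child has exited, but leave it unreaped (WNOWAIT).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    state = state_of(pid)
    # Reap the child so its process table entry can be released.
    os.waitpid(pid, 0)
    return state


if __name__ == "__main__":
    print("own state:", state_of(os.getpid()))
    print("exited child before reaping:", observe_zombie())
```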

Threads in Linux

Threads in Linux are nothing but flows of execution within a process. A
process containing multiple execution flows is known as a multi-threaded
process.

For a non-multi-threaded process there is only one execution flow, the
main execution flow, and hence it is also known as a single-threaded
process. For the Linux kernel, there is no separate concept of a thread.
Each thread is viewed by the kernel as a separate process, but these
processes are somewhat different from other normal processes. I will
explain the difference in the following paragraphs.

Threads are often mixed up with the term Light Weight Processes or LWPs.
The reason dates back to the time when Linux supported threads at
user level only. This means that even a multi-threaded application was
viewed by the kernel as a single process. This posed big challenges for
the library that managed these user-level threads, because it had to
make sure that a blocking call issued by one thread did not stall the
execution of the other threads.

Later on the implementation changed and a process was attached to
each thread so that the kernel could take care of them. But, as discussed
earlier, the Linux kernel does not see them as threads; each thread is
viewed as a process inside the kernel. These processes are known as light
weight processes.

The main difference between a light weight process (LWP) and a normal
process is that LWPs share the same address space and other resources
such as open files. Because some resources are shared, these processes
are considered light weight compared to normal processes, hence the
name.

So, effectively we can say that threads and light weight processes are
the same. It’s just that thread is the term used at user level while
light weight process is the term used at kernel level.

From an implementation point of view, threads are created using functions
exposed by the POSIX-compliant pthread library in Linux. Internally,
clone() is used to create both a normal and a light weight process. To
create a normal process, fork() is used, which in turn calls clone() with
appropriate arguments, while to create a thread or LWP, a function from
the pthread library calls clone() with the relevant flags. So the main
difference comes from the flags that are passed to clone().
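The practical consequence of sharing an address space can be seen with a short sketch. Python's threading module creates native threads on Linux, so a write made by a worker thread is visible to the thread that started it, unlike a write made by a forked child. The function name thread_write_is_shared is only for this example, and the code shows the effect of sharing, not the clone() flags themselves.

```python
import threading


def thread_write_is_shared():
    """Let a worker thread modify a list and return what the caller sees."""
    data = ["original"]
    # The worker runs in the same address space, so it appends to the
    # very same list object that the calling thread holds.
    worker = threading.Thread(target=data.append, args=("changed by thread",))
    worker.start()
    worker.join()
    return data


if __name__ == "__main__":
    print(thread_write_is_shared())
```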

PIDs - User and Kernel View

In the kernel, each thread has its own ID, called a PID
(although it would possibly make more sense to call this a TID), and
each thread also has a TGID (thread group ID), which is the PID of the
thread that started the whole process.

Simplistically, when a new process is created, it appears as a thread where both the PID and TGID are the same (new) number.

When a thread starts another thread, that started thread gets
its own PID (so the scheduler can schedule it independently) but it
inherits its TGID from the thread that created it.

That way, the kernel can happily schedule threads independent of what
process they belong to, while processes (thread group IDs) are reported
to you.

For example, refer to the following diagram

You can see that starting a new process gives you a new PID and a new TGID (both set to the same value), while starting a new thread gives you a new PID while maintaining the same TGID as the thread that started it.
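Both views can be compared from user space. In the sketch below, os.getpid() returns what the kernel calls the TGID, while threading.get_native_id() (available from Python 3.8) returns the kernel's per-thread ID. The function names ids and compare_caller_and_worker are made up for the illustration.

```python
import os
import threading


def ids():
    """Return (process ID as seen by the user, kernel thread ID)."""
    return os.getpid(), threading.get_native_id()


def compare_caller_and_worker():
    """Collect both IDs in the calling thread and in a new thread."""
    result = {}

    def worker():
        result["worker"] = ids()

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    result["caller"] = ids()
    return result


if __name__ == "__main__":
    for name, (pid, tid) in compare_caller_and_worker().items():
        print(f"{name}: getpid()={pid} native thread id={tid}")
```

Both threads should report the same getpid() value (the shared TGID) but different native thread IDs (the kernel-level PIDs).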

PS : I picked up some basic knowledge from thegeekstuff and added some extra points and diagrams to make it more easily understandable.