35 Process Management in Linux

Mary Anitha Rajam

 

38.1 Introduction

 

In this module, we learn how process management is handled in the Linux operating system. We learn various aspects related to processes such as process identity, process environment and process context. We also learn the usage of the fork and clone system calls in Linux.

 

38.2 Process

 

A process is a program in execution. When a program is executed, a process is formed. In addition to the code that we see as a program, a process has many other components. A process comprises text, data and stack regions. The text region holds the instructions. The data region holds initialized and uninitialized global data. The stack is used when there is a function call: the parameters passed to the function and the local variables of the function are kept on the stack. When a process runs, it uses a few registers, among them the program counter (PC) and the stack pointer (SP). The program counter stores the address of the next instruction to be executed for that process. The stack pointer points to the top of the stack for that process. A number of kernel data structures are also used during the execution of a process, such as the process table, file table and inode table. All of these components together make up a process. Linux uses a process model similar to that of other versions of UNIX. We first review the traditional UNIX process model and then learn the aspects of the Linux threading model.

 

38.3 Process Management

 

UNIX process management separates the creation of processes and the running of a new program into two distinct operations. The creation and running are implemented using two different system calls. The fork system call creates a new process. When a new process is created, a new program need not be run. The new process that is created can execute the same program as the parent. The newly created process continues to execute the next instruction from the point where the parent was executing at the time of creation of this new process.
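
The behaviour of fork described above can be sketched in a few lines. This is a minimal illustration, assuming a Linux or other UNIX system and using Python's os.fork, which wraps the fork system call; the function name fork_demo and the variable marker are hypothetical names chosen for this example.

```python
import os

def fork_demo():
    """Fork once and return the child's exit status as seen by the parent."""
    marker = "set before fork"   # present in both processes after the fork
    pid = os.fork()              # both parent and child continue from here
    if pid == 0:
        # Child: still the same program with the same variables;
        # no new executable has been loaded.
        ok = (marker == "set before fork")
        os._exit(0 if ok else 1)
    # Parent: fork returned the process id of the new child.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

if __name__ == "__main__":
    print("child exit status:", fork_demo())
```

The child never runs a different program here; it only takes a different branch of the same code, which is the point of separating process creation from program execution.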

 

The exec system call is used to run a new program. There are different variants of exec such as execvp, execve, execlp and execle. Running a new program does not require a new process to be created first; any process can run a new program at any time. To execute a new program, the name of the object file is given as an argument to the execve call. The binary object file is loaded into the process’s address space, and the new executable starts executing in the context of the existing process.
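
The usual pairing of the two calls, fork followed by exec in the child, might look like the sketch below. It assumes a UNIX-like system; run_program is a hypothetical helper name, and the running Python interpreter (sys.executable) is used only because it is a program that is sure to exist on the machine.

```python
import os
import sys

def run_program(argv):
    """Fork, run argv as a new program in the child, return its exit status."""
    pid = os.fork()
    if pid == 0:
        try:
            # Replaces the child's program image; does not return on success.
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)        # reached only if the exec call failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

if __name__ == "__main__":
    # The new program simply exits with status 7.
    print(run_program([sys.executable, "-c", "raise SystemExit(7)"]))
```

Note that the process id of the child does not change across the exec call: the same process continues, only with a different program loaded.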

 

Under UNIX, a process encompasses all the information that the operating system must maintain to track the context of a single execution of a single program. Under Linux, process properties fall into three groups: the process’s identity, environment, and context. We learn the various aspects of these three groups in the following sections.

 

38.4 Process Identity

Process id:

 

There are different ways in which a process is identified. One of them is the process id. When a process is created, it is assigned a unique identifier called the process id. The process id is used whenever the process needs to be referred to by a user or by a program. When an application makes a system call to signal, modify, or wait for another process, the process id is given as an argument to the system call to identify the process.
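
As a small illustration of a process id being passed to a system call, the sketch below probes whether a given process exists. It assumes a UNIX-like system; process_exists is a hypothetical helper name. It relies on the convention that sending signal number 0 with kill performs the lookup and permission check without delivering any signal.

```python
import os

def process_exists(pid):
    """Probe a process id using signal 0 (nothing is actually delivered)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False             # no process has this id
    except PermissionError:
        return True              # it exists but belongs to another user
    return True

if __name__ == "__main__":
    print("my pid:", os.getpid(), "parent pid:", os.getppid())
    print("do I exist?", process_exists(os.getpid()))
```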

 

Credentials:

 

Each process must have an associated user ID and one or more group IDs that determine the process’s rights to access system resources and files.
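
A process can inspect its own credentials. The sketch below is only an illustration, assuming a UNIX-like system; the function name credentials is hypothetical. The distinction between real and effective IDs shown here goes slightly beyond the text above: the effective IDs are the ones normally used for access checks.

```python
import os

def credentials():
    """Collect the user and group IDs of the calling process."""
    return {
        "uid": os.getuid(),        # real user ID
        "euid": os.geteuid(),      # effective user ID
        "gid": os.getgid(),        # real group ID
        "egid": os.getegid(),      # effective group ID
        "groups": os.getgroups(),  # supplementary group IDs
    }

if __name__ == "__main__":
    for name, value in credentials().items():
        print(name, "=", value)
```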

 

Personality:

 

Personalities are not traditionally found on UNIX systems, but they are present in Linux. The personality sets the process execution domain. Under Linux, each process has an associated personality identifier that can slightly modify the semantics of certain system calls. That is, a process with a particular personality can behave in a particular manner for certain system calls.

 

Namespace:

 

Each process has a specific view of the file system, called its namespace. For example, each process has a specific root directory. That is, the current root of one process can be different from the current root of another process. Similarly, the current directory of one process can be different from the current directory of another process. Each process can view a different set of mounted file systems. Most processes share a common namespace and operate on a shared file-system hierarchy (root directory, set of mounted file systems). When a parent process creates a child process, the child inherits the namespace from the parent process. But the child process can change its namespace. Therefore, processes and their children can have different namespaces.

 

38.5 Process Environment

 

The process’s environment is inherited from its parent when the process is created. The process environment is composed of two null-terminated vectors. One is the argument vector that lists the command-line arguments used to invoke the currently running program. When an object file is executed, the name of the object file followed by the arguments given to the executing program is given in the command line. This list of arguments along with the name of the object file forms one part of the environment of the process.

 

The second vector is the environment vector. The environment vector is a list of “NAME=VALUE” pairs that associates named environment variables with arbitrary textual values. For example, TERM is an environment variable which is used to name the type of terminal connected to a user’s login session.

 

Neither the argument vector nor the environment vector is altered when a new process is created. The created child inherits the environment of the parent. When a new program is invoked, a new environment can be set up. On calling execle() or execve(), a process can supply the environment for the new program as an argument to the system call. The kernel passes these environment variables to the next program, replacing the process’s current environment. The environment-variable mechanism custom-tailors the operating system on a per-process basis.
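
Supplying an environment to a new program through execve can be sketched as follows. This is an illustration under stated assumptions: a UNIX-like system, a hypothetical helper child_sees, and a hypothetical variable name DEMO_VAR. The new program is again the Python interpreter, which reports through its exit status whether the variable is visible to it.

```python
import os
import sys

def child_sees(env, name):
    """Run a new program with `env` as its environment vector and report
    whether the variable `name` is visible to that program."""
    pid = os.fork()
    if pid == 0:
        try:
            code = "import os, sys; sys.exit(0 if %r in os.environ else 1)" % name
            # argv is the argument vector, env the environment vector.
            os.execve(sys.executable, [sys.executable, "-c", code], env)
        finally:
            os._exit(127)        # reached only if the exec call failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0

if __name__ == "__main__":
    base = {k: v for k, v in os.environ.items() if k != "DEMO_VAR"}
    print(child_sees(dict(base, DEMO_VAR="1"), "DEMO_VAR"))
    print(child_sees(base, "DEMO_VAR"))
```

The example copies the parent's environment and adds or omits one variable rather than passing an empty environment, because some installations need existing variables to start the interpreter at all.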

 

38.6 Process Context

 

The state of a running program at any point in time is called the context of a process. This state keeps changing, and hence the context of the process also keeps changing. The context of the process includes the scheduling context, accounting information, file table, file-system context, signal-handler table, virtual-memory context and so on. We now see what each of these refers to.

 

Scheduling context:

 

This is the information that the scheduler needs to suspend and restart a process. We know that when a scheduler chooses a new process to run and a context switch is made, the context of the old process is saved and the context of the new process is loaded. The scheduling context includes the saved copies of all of the process’s registers, information about the scheduling priority, information about any outstanding signals waiting to be delivered to the process, and the kernel stack used by the process while executing kernel code.

 

Accounting information:

 

This is the information about the resources currently being consumed by each process and the total resources consumed by the process in its lifetime so far. Examples are the CPU time used, the amount of time the process spent in kernel mode, the amount of time the process spent in user mode and so on.
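
Part of this accounting information can be read back by the process itself. The sketch below assumes a UNIX-like system and uses Python's resource.getrusage, which wraps the getrusage system call; cpu_times and burn are hypothetical names used only for this example.

```python
import resource

def cpu_times():
    """Return (user-mode seconds, kernel-mode seconds) used so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_utime, usage.ru_stime

def burn(n=200_000):
    """Do some user-mode work so that the counters have something to record."""
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    before = cpu_times()
    burn()
    after = cpu_times()
    print("user time grew by", after[0] - before[0], "seconds")
```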

 

File Table:

 

The file table is an array of pointers to kernel file structures. Whenever a file is created or opened, a file descriptor is returned. When making file I/O system calls (like read, write and so on), processes refer to files using this file descriptor. The kernel uses the file descriptor as an index into the file table. The file table has entries that point to other data structures that help in accessing file contents.
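
The role of the file descriptor as a small integer handle can be seen in the following sketch. It assumes a UNIX-like system and uses the low-level os.open, os.write and os.read calls, which take and return file descriptors directly; roundtrip is a hypothetical function name and the file lives in a temporary directory.

```python
import os
import tempfile

def roundtrip(data):
    """Write bytes through one descriptor, read them back through another.
    Returns the second descriptor's number and the bytes that were read."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.txt")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)  # creating a file yields a descriptor
        os.write(fd, data)
        os.close(fd)
        fd = os.open(path, os.O_RDONLY)                      # so does opening an existing one
        try:
            return fd, os.read(fd, len(data))
        finally:
            os.close(fd)

if __name__ == "__main__":
    print(roundtrip(b"hello"))
```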

 

File-system context:

 

The file table lists the existing open files, whereas the file-system context applies to requests to open new files. The process’s root directory, current working directory and namespace (the default directories to be used for new file searches) are stored in the file-system context.
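
That the current working directory belongs to the file-system context of an individual process can be checked with a short experiment. The sketch assumes a UNIX-like system; cwd_is_per_process is a hypothetical name. A child created with fork changes its directory, and the parent then confirms that its own directory is unchanged.

```python
import os

def cwd_is_per_process():
    """Return True if a chdir in a forked child leaves the parent's cwd alone."""
    before = os.getcwd()
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            os.chdir("/")        # affects only the child's file-system context
            status = 0 if os.getcwd() == "/" else 1
        finally:
            os._exit(status)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0 and os.getcwd() == before

if __name__ == "__main__":
    print(cwd_is_per_process())
```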

 

Signal-handler table:

 

Signals are used to notify processes of events. Signals may be sent from one process to another or from the kernel to a process. When a signal is sent to a process, the process can choose to ignore the signal, invoke a routine in the process’s address space, or let the default action take place. For most signals, the default action is to terminate the process. The signal-handler table defines the action to take in response to a specific signal.
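
Installing a routine for a signal and then receiving that signal can be sketched as below. The example assumes a UNIX-like system and that it runs in the main thread of the program (Python only allows handlers to be installed there); send_and_catch is a hypothetical name, and SIGUSR1 is chosen because it is reserved for application use.

```python
import os
import signal
import time

def send_and_catch(signum=signal.SIGUSR1):
    """Install a handler, send the signal to ourselves, return what arrived."""
    received = []

    def handler(sig, frame):
        received.append(sig)     # the routine invoked in our own address space

    previous = signal.signal(signum, handler)   # update the handler table entry
    try:
        os.kill(os.getpid(), signum)            # the process id names the target
        deadline = time.monotonic() + 1.0
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)                    # give the handler a chance to run
    finally:
        # Restore the earlier action; fall back to the default if it is unknown.
        signal.signal(signum, previous if previous is not None else signal.SIG_DFL)
    return received[0] if received else None

if __name__ == "__main__":
    print("caught signal number:", send_and_catch())
```

Without the handler, the default action for SIGUSR1 would terminate the process.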

 

Virtual-memory context:

 

The virtual-memory context describes the full contents of the process’s private address space. The address space of a process comprises the text, data and stack regions of the process.

 

38.7 Processes and Threads

 

We now look at the difference between processes and threads as seen by Linux. Linux does not distinguish between a thread and a process. Linux uses the term task to refer to a flow of control within a program. The fork call duplicates a process without loading a new executable image. There is another system call called ‘clone’ which behaves similarly to fork, except that it accepts as arguments a set of flags that dictate which resources are shared between the parent and the child. The flags that can be given to clone include CLONE_FS, CLONE_FILES, CLONE_SIGHAND and CLONE_VM.

 

For example, if CLONE_FS is set, file-system information (such as the current working directory) is shared between the parent and the child. If CLONE_VM is set, the same memory space is shared by the parent and the child. If CLONE_SIGHAND is set, signal handlers are shared. If CLONE_FILES is set, the set of open files is shared between the parent and the child. Thus, if no flag is set for the clone system call, it behaves similarly to fork. The lack of distinction between a process and a thread arises because Linux does not hold a process’s entire context within the main process data structure. In the case of UNIX, there is a process data structure that holds all the details related to a process. But in the case of Linux, the operating system holds the context within independent subcontexts. A process’s file-system context, file-descriptor table, signal-handler table and virtual-memory context are all held in separate data structures. There is one process data structure (named struct task_struct) that has a pointer to each of these structures. There is a reference count associated with each subcontext.

 

Any number of processes can easily share a subcontext by pointing to the subcontext and incrementing a reference count. The arguments to the clone() call tell it which subcontexts to copy and which to share. If a flag is set, the corresponding subcontext is shared; if it is not set, the corresponding subcontext is copied. Any new process is always given a new identity and a new scheduling context. The fork() system call is a special case of clone() that copies all subcontexts and shares none.
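
The difference between sharing and copying the virtual-memory subcontext can be observed indirectly. Python does not expose clone itself, so the sketch below is only an approximation under an assumption: on Linux, threads started by the threading module are expected to be created by the C library through clone with sharing flags such as CLONE_VM, whereas os.fork copies the address space. The function names are hypothetical.

```python
import os
import threading

def thread_shares_memory():
    """A thread writes into the creator's memory; the creator sees the change."""
    box = {"value": 0}
    t = threading.Thread(target=lambda: box.update(value=1))
    t.start()
    t.join()
    return box["value"]          # changed by the thread: memory was shared

def fork_copies_memory():
    """A forked child writes into its own copy; the parent sees no change."""
    box = {"value": 0}
    pid = os.fork()
    if pid == 0:
        box["value"] = 1         # modifies only the child's copy
        os._exit(0)
    os.waitpid(pid, 0)
    return box["value"]          # unchanged in the parent: memory was copied

if __name__ == "__main__":
    print("after thread:", thread_shares_memory())
    print("after fork:  ", fork_copies_memory())
```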

 

38.8 Summary

 

In this module, we learnt different properties of a process such as the process’s identity, environment and context. We also learnt the difference between the fork and clone system calls used in Linux.

 

 

References

  1. Abraham Silberschatz, Peter B. Galvin, Greg Gagne, “Operating System Concepts”, Sixth Edition, John Wiley & Sons Inc., 2003.