Linux Process & Process Management

Introduction

A Linux server, like any other computer you may be familiar with, runs applications. To the computer, these are considered “processes”.

While Linux handles the low-level, behind-the-scenes management of a process’s lifecycle, you need a way to interact with the operating system and manage processes from a higher level.

In this post, we will discuss some simple aspects of process management. Linux provides an abundant collection of tools for this purpose.

How To View Running Processes in Linux

The easiest way to find out what processes are running on your server is to run the top command:

# top

top - 07:45:38 up 9:38, 1 user, load average: 0.00, 0.01, 0.05
Tasks: 89 total, 2 running, 87 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 1014976 total, 517484 free, 102656 used, 394836 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 728192 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 128092 6680 3932 S 0.0 0.7 0:02.84 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.03 ksoftirqd/0
6 root 20 0 0 0 0 S 0.0 0.0 0:00.32 kworker/u30:0
7 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
————- output truncated ——————-

The top chunk of information gives system statistics, such as system load and the total number of tasks.

You can easily see that there are 2 running processes and 87 sleeping processes (that is, idle and not using CPU resources).

The bottom portion has the running processes and their usage statistics.

Although top gives you an ncurses-based interface for viewing running processes, it is not always flexible enough to cover all scenarios. A powerful command called ps is often the answer to these problems.

List processes with ps command

When ps is called without arguments, the output can be a bit lackluster:

# ps

PID TTY TIME CMD
3125 pts/0 00:00:00 sudo
3126 pts/0 00:00:00 su
3127 pts/0 00:00:00 bash
3150 pts/0 00:00:00 ps

This output shows all of the processes associated with the current user and terminal session. This makes sense because we are only running sudo, su, bash and ps in this terminal currently.

We can run the ps command with different options to get a more complete picture of the processes on this system.

BSD style – The options in BSD style syntax are not preceded by a dash.

# ps aux

UNIX/LINUX style – The options in Linux style syntax are preceded by a dash as usual.

# ps -ef

It is okay to mix both syntax styles on Linux systems, for example “ps au -x”. In this post, we use both styles.

# ps aux

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2 0.0 0.0 0 0 ? S Jan11 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S Jan11 0:00 [ksoftirqd/0]
root 6 0.0 0.0 0 0 ? S Jan11 0:00 [kworker/u30:0]
root 7 0.0 0.0 0 0 ? S Jan11 0:00 [migration/0]
root 8 0.0 0.0 0 0 ? S Jan11 0:00 [rcu_bh]
root 9 0.0 0.0 0 0 ? R Jan11 0:00 [rcu_sched]
root 10 0.0 0.0 0 0 ? S Jan11 0:00

————- output truncated ——————-

These options tell ps to show processes owned by all users (regardless of their terminal association) in a user-friendly format.

To see a tree view, where hierarchical relationships are illustrated, we can run the command with these options:

# ps axjf

PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
0 2 0 0 ? -1 S 0 0:00 [kthreadd]
2 3 0 0 ? -1 S 0 0:00 \_ [ksoftirqd/0]
2 6 0 0 ? -1 S 0 0:00 \_ [kworker/u30:0]
2 7 0 0 ? -1 S 0 0:00 \_ [migration/0]
2 8 0 0 ? -1 S 0 0:00 \_ [rcu_bh]
2 9 0 0 ? -1 R 0 0:00 \_
1 2024 2024 2024 ? -1 Ss 0 0:00 /usr/sbin/sshd
2024 3100 3100 3100 ? -1 Ss 0 0:00 \_ sshd: ajoy[priv]
3100 3103 3100 3100 ? -1 S 1000 0:00 \_ sshd: ajoy@pts/0
3103 3104 3104 3104 pts/0 3153 Ss 1000 0:00 \_ -bash
3104 3125 3125 3104 pts/0 3153 S 0 0:00 \_ sudo su -
3125 3126 3125 3104 pts/0 3153 S 0 0:00 \_ su -
3126 3127 3127 3104 pts/0 3153 S 0 0:00 \_ -bash
3127 3153 3153 3104 pts/0 3153 R+ 0 0:00 \_ ps axjf
————- output truncated ——————-

As you can see, sshd is shown as an ancestor of processes such as bash, sudo, su, and ps axjf itself.
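
The same parent/child chain can also be followed from a script by asking ps for one PPID at a time. The following is a minimal sketch, assuming a procps-style ps that accepts "-o ppid=" and "-p"; the function name ancestors is hypothetical and not part of any standard tool.

```shell
# ancestors PID: print PID, then its parent, its grandparent, and so on.
# Hypothetical helper; assumes a ps that supports "-o ppid=" and "-p".
ancestors() {
    pid=$1
    # Stop at PPID 0 (no parent) or when ps can no longer find the process.
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        echo "$pid"
        pid=$(ps -o ppid= -p "$pid" 2>/dev/null | tr -d ' ')
    done
}

# Example: list the ancestry of the current shell.
#   ancestors $$
```

The first line printed is the PID you passed in; each following line is the parent of the line before it.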

List Processes by User and Command (ps -u, ps -C)

Use the -u option to display the processes that belong to a specific username. When you have multiple usernames, separate them with commas. The example below displays all the processes owned by the users wwwrun or postfix.

# ps -f -u wwwrun,postfix

UID PID PPID C STIME TTY TIME CMD
postfix 7457 7435 0 Mar09 ? 00:00:00 qmgr -l -t fifo -u
wwwrun 7495 7491 0 Mar09 ? 00:00:00 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
wwwrun 7496 7491 0 Mar09 ? 00:00:00 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
wwwrun 7497 7491 0 Mar09 ? 00:00:00 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
wwwrun 7498 7491 0 Mar09 ? 00:00:00 /usr/sbin/httpd2-prefork -f /etc/apac

The following example uses the -C option to show all the processes whose command name is tatad.pl.

# ps -f -C tatad.pl

UID PID PPID C STIME TTY TIME CMD
root 9576 1 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root 9577 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root 9579 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root 9580 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root 9581 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bi

To list the processes that have a particular parent PID (PPID), use the --ppid option:

# ps -f --ppid 9576

UID PID PPID C STIME TTY TIME CMD
root 9577 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root 9579 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root 9580 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root 9581 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin

List Processes in a Hierarchy (ps --forest)

The example below displays the process ID and command of each process in a hierarchy. --forest is an argument to the ps command that draws the process tree as ASCII art. From this tree, we can identify each parent process and, recursively, the child processes it forked.

# ps -e -o pid,args --forest
468 \_ sshd: root@pts/7
514 | \_ -bash
17484 \_ sshd: root@pts/11
17513 | \_ -bash
24004 | \_ vi ./790310__11117/journal
15513 \_ sshd: root@pts/1
15522 | \_ -bash
4280 \_ sshd: root@pts/5
4302 | \_ -bash

List elapsed wall time for processes (ps -o pid,etime=)

To get the elapsed time of currently running processes, ps provides the etime column, which reports the time elapsed since the process was started, in the form [[dd-]hh:]mm:ss.

The command below displays the elapsed time for process ID 1 (init) and process ID 29675.

For example, “10-22:13:29” in the output means that init has been running for 10 days, 22 hours, 13 minutes and 29 seconds. Since the init process starts during system startup, this time is roughly the same as the uptime reported by the ‘uptime’ command.

# ps -p 1,29675 -o pid,etime=
PID
1 10-22:13:29
29675 1-02:58:46
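
The [[dd-]hh:]mm:ss form is awkward to compare in scripts, so it can help to convert it to seconds. The following is a minimal sketch using awk; the function name etime_to_seconds is hypothetical. If your ps provides an etimes column, that may make the conversion unnecessary.

```shell
# etime_to_seconds ETIME: convert [[dd-]hh:]mm:ss to a number of seconds.
# Hypothetical helper; splits the value on "-" and ":" and weights each field.
etime_to_seconds() {
    echo "$1" | awk -F'[-:]' '{
        n = NF
        secs = $n + 60 * $(n - 1)              # seconds and minutes are always present
        if (n >= 3) secs += 3600 * $(n - 2)    # hours, if present
        if (n >= 4) secs += 86400 * $(n - 3)   # days, if present
        print secs
    }'
}

# etime_to_seconds 10-22:13:29    prints 944009
# etime_to_seconds 1-02:58:46     prints 97126
```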

List all threads for a particular process (ps -L)

You can also list the threads of a process. When a process hangs, we might need to identify the threads running for that particular process, as shown below.

# ps -C java -L -o pid,tid,pcpu,state,nlwp,args

PID TID %CPU S NLWP COMMAND
16992 16992 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16993 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16994 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16995 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16996 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16997 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16998 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16999 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 17000 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_l

Finding a Memory Leak (ps --sort pmem)

A memory leak, technically, is an ever-increasing usage of memory by an application.

With common desktop applications, this may go unnoticed, because a process typically frees any memory it has used when you close the application.

However, in the client/server model, a memory leak is a serious issue, because applications are expected to be available 24×7. Applications must not keep increasing their memory usage indefinitely, because the server can eventually run out of memory. To monitor for such memory leaks, we can use the following commands.

# ps aux --sort pmem

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 1520 508 ? S 2005 1:27 init
inst 1309 0.0 0.4 344308 33048 ? S 2005 1:55 agnt (idle)
inst 2919 0.0 0.4 345580 37368 ? S 2005 20:02 agnt (idle)
inst 24594 0.0 0.4 345068 36960 ? S 2005 15:45 agnt (idle)

In the above ps command, the --sort option puts the highest %MEM at the bottom. Note down the PID with the highest %MEM usage, then use the ps command to view all the details about this process ID and monitor the change over time. You have to repeat this manually or run it from cron and write the output to a file.

The VSZ number is of little use if what you are interested in is memory consumption. VSZ measures how much of the process’s virtual address space has been marked by the process as memory that the operating system should map if the process happens to touch it. It says nothing about whether that memory has actually been touched and used. VSZ is an internal detail of how a process allocates memory, that is, how big a chunk of unused memory it grabs at once. Look at RSS for the amount of memory the process has actually started using.

RSS:

Resident set size = the non-swapped physical memory that a task has used; Resident Set currently in physical memory including Code, Data, Stack

VSZ:

Virtual memory usage of entire process = VmLib + VmExe + VmData + VmStk

In other words,
a) VSZ *includes* RSS
b) “ps aux” alone isn’t enough to tell you if a process is thrashing (although, if your system *is* thrashing, “ps aux” will help you identify the processes experiencing the biggest hits).

# ps ev --pid=27645

PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
27645 ? S 3:01 0 25 1231262 1183976 14.4 /TaskServer/bin/./wrapper-linux-x86-32

Note: In the above output, if RSS (resident set size, in KB) increases over time (so would %MEM), it may indicate a memory leak in the application.
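
Watching RSS over time can be scripted instead of repeated by hand. The following is a minimal sketch, assuming a ps that supports "-o rss=" and "-p"; the function names log_rss and rss_growth and the log format are hypothetical choices for illustration.

```shell
# log_rss PID LOGFILE: append one "epoch-seconds rss-in-KB" sample to LOGFILE.
# Hypothetical helper; assumes a ps that supports "-o rss=" and "-p".
log_rss() {
    rss=$(ps -o rss= -p "$1" 2>/dev/null | tr -d ' ')
    if [ -n "$rss" ]; then
        echo "$(date +%s) $rss" >> "$2"
    fi
}

# rss_growth LOGFILE: print the last RSS sample minus the first one, in KB.
rss_growth() {
    awk 'NR == 1 { first = $2 } { last = $2 } END { if (NR) print last - first }' "$1"
}

# Example: sample PID 27645 once a minute, then check the growth later.
#   while sleep 60; do log_rss 27645 /tmp/rss.log; done
#   rss_growth /tmp/rss.log
```

A value that keeps climbing across many samples is a hint worth investigating; a single positive difference is not proof of a leak.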

The following command displays all the processes owned by the Linux user oracle.

# ps U oracle

PID TTY STAT TIME COMMAND
5014 ? Ss 0:01 /oracle/bin/tnslsnr
7124 ? Ss 0:00 ora_q002_med
8206 ? Ss 0:00 ora_cjq0_med
8852 ? Ss 0:01 ora_pmon_med

The following command displays all the processes owned by the current user.

# ps U $USER

PID TTY STAT TIME COMMAND
10329 ? S 0:00 sshd: ajoy@pts/1,pts/2
10330 pts/1 Ss 0:00 -bash
10354 pts/2 Ss+ 0:00 -bash

The ps command can be configured to show only a selected list of columns. There are a large number of columns available, and the full list is in the man page.

The following command shows only the pid, username, cpu, memory and command columns.

# ps -e -o pid,uname,pcpu,pmem,comm

PID USER %CPU %MEM COMMAND
1 root 0.0 0.6 systemd
2 root 0.0 0.0 kthreadd
3 root 0.0 0.0 ksoftirqd/0
6 root 0.0 0.0 kworker/u30:0
7 root 0.0 0.0 migration/0

The ps command is quite flexible and it is possible to rename the column labels as shown below:

# ps -e -o pid,uname=USERNAME,pcpu=CPU_USAGE,pmem,comm

PID USERNAME CPU_USAGE %MEM COMMAND
1 root 0.0 0.6 systemd
2 root 0.0 0.0 kthreadd
3 root 0.0 0.0 ksoftirqd/0
6 root 0.0 0.0 kworker/u30:0
7 root 0.0 0.0 migration/0

Combined with the watch command, ps can be turned into a real-time process reporter. A simple example looks like this:

# watch -n 1 'ps -e -o pid,uname,cmd,pmem,pcpu --sort=-pmem,-pcpu | head -15'

Every 1.0s: ps -e -o pid,uname,cmd,pmem,pcpu --… Sun Dec 1 18:16:08 2009

PID USER CMD %MEM %CPU
3800 1000 /opt/google/chrome/chrome – 4.6 1.4
7492 1000 /opt/google/chrome/chrome – 2.7 1.4
3150 1000 /opt/google/chrome/chrome 2.7 2.5
3824 1000 /opt/google/chrome/chrome – 2.6 0.6
3936 1000 /opt/google/chrome/chrome – 2.4 1.6
2936 1000 /usr/bin/plasma-desktop 2.3 0.2
9666 1000 /opt/google/chrome/chrome – 2.1 0.8

Process IDs:

In Linux and Unix-like systems, each process is assigned a process ID, or PID. This is how the operating system identifies and keeps track of processes.

# pgrep bash

3104
3127

The first process spawned at boot, traditionally called init (on many modern distributions it is systemd, as in the top output earlier), is given the PID of “1”.

# pgrep init

1

This process is then responsible, directly or indirectly, for spawning every other user-space process on the system. Later processes are generally given larger PID numbers.

A process’s parent is the process that was responsible for spawning it. If a process’s parent exits or is killed, the child processes are not killed automatically: they are orphaned and adopted by init (PID 1) or another designated reaper process. The parent process’s PID is referred to as the PPID.

Process States:

Here are the different values that the s, stat and state output specifiers (header “STAT” or “S”) will display to describe the state of a process:

Running

The process is either running (it is the current process in the system) or it is ready to run (it is waiting to be assigned to one of the system’s CPUs).

Waiting

The process is waiting for an event or for a resource. Linux differentiates between two types of waiting process: interruptible and uninterruptible. Interruptible waiting processes can be interrupted by signals, whereas uninterruptible waiting processes are waiting directly on hardware conditions and cannot be interrupted under any circumstances.

Stopped

The process has been stopped, usually by receiving a signal. A process that is being debugged can be in a stopped state.

Zombie

This is a terminated process which still has a task_struct data structure in the task vector because its parent has not yet collected its exit status. It is what it sounds like, a dead process.

D uninterruptible sleep (usually IO)
R running or runnable (on run queue)
S interruptible sleep (waiting for an event to complete)
T stopped, either by a job control signal or because it is being traced.
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z defunct (“zombie”) process, terminated but not reaped by its parent.

For BSD formats and when the stat keyword is used, additional characters may be displayed:
< high-priority (not nice to other users)
N low-priority (nice to other users)
L has pages locked into memory (for real-time and custom IO)
s is a session leader
l is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)
+ is in the foreground process group
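
A STAT value such as “Ss+” is simply these characters strung together, so it can be decoded one character at a time. The following is a minimal sketch; the function name describe_stat is hypothetical, and the descriptions are copied from the two lists above (the obsolete W code is left out).

```shell
# describe_stat STAT: explain each character of a ps STAT value such as "Ss+".
# Hypothetical helper; the wording mirrors the lists above.
describe_stat() {
    stat=$1
    while [ -n "$stat" ]; do
        rest=${stat#?}        # everything after the first character
        c=${stat%"$rest"}     # the first character itself
        case $c in
            D) echo "D: uninterruptible sleep (usually IO)" ;;
            R) echo "R: running or runnable (on run queue)" ;;
            S) echo "S: interruptible sleep (waiting for an event to complete)" ;;
            T) echo "T: stopped by a job control signal or being traced" ;;
            X) echo "X: dead (should never be seen)" ;;
            Z) echo "Z: defunct (zombie) process, terminated but not reaped by its parent" ;;
            '<') echo "<: high-priority (not nice to other users)" ;;
            N) echo "N: low-priority (nice to other users)" ;;
            L) echo "L: has pages locked into memory" ;;
            s) echo "s: is a session leader" ;;
            l) echo "l: is multi-threaded" ;;
            +) echo "+: is in the foreground process group" ;;
            *) echo "$c: not covered by this sketch" ;;
        esac
        stat=$rest
    done
}

# Example: decode the STAT of the login shell shown in the tree view.
#   describe_stat Ss
```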

Process Signals:

All processes in Linux respond to signals. Signals are an OS-level way of telling programs to terminate or modify their behavior. The most common way of passing signals to a program is with the kill command. The default functionality of this utility is to attempt to kill a process:

# kill <PID of process>

This sends the TERM signal to the process. The TERM signal tells the process to please terminate. This allows the program to perform clean-up operations and exit smoothly.

If the program is misbehaving and does not exit when given the TERM signal, we can escalate the signal by passing the KILL signal:

# kill -KILL <PID of process>

This is a special signal that the program cannot catch, block or ignore.

Instead, it is handled by the operating system kernel, which shuts down the process. This is used to bypass programs that ignore the signals sent to them.
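
The difference between TERM and KILL is easiest to see from the receiving side. The following is a minimal sketch of a shell job that cleans up when it receives TERM; the function name worker and its temporary file are hypothetical, made up for illustration.

```shell
# worker TMPFILE: hypothetical long-running job that removes its temporary
# file when it receives TERM, then exits cleanly.
worker() {
    tmpfile=$1
    trap 'rm -f "$tmpfile"; exit 0' TERM
    while :; do
        # Sleep in the background and wait for it, so the shell can run the
        # TERM handler immediately instead of after sleep returns.
        sleep 1 &
        wait $!
    done
}

# Example: start the job in the background, then ask it to terminate.
#   worker /tmp/worker.tmp &
#   kill $!
```

Sending KILL instead of TERM would end the job without running the handler, so the temporary file would be left behind.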

Each signal has an associated number that can be passed instead of the name. For instance, you can pass “-15” instead of “-TERM”, and “-9” instead of “-KILL”.

Signals are not only used to shut down programs. They can also be used to perform other actions.

For instance, many daemons will reload their configuration or restart when they are given the HUP, or hang-up, signal. Apache is one program that operates like this.

# kill -HUP <PID of httpd>

The above command will cause Apache to reload its configuration file and resume serving content.

You can list all of the signals that are possible to send with the kill command by typing:

# kill -l

1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL 5) SIGTRAP
6) SIGABRT 7) SIGBUS 8) SIGFPE 9) SIGKILL 10) SIGUSR1
11) SIGSEGV 12) SIGUSR2 13) SIGPIPE 14) SIGALRM 15) SIGTERM
16) SIGSTKFLT 17) SIGCHLD 18) SIGCONT 19) SIGSTOP 20) SIGTSTP
21) SIGTTIN 22) SIGTTOU 23) SIGURG 24) SIGXCPU 25) SIGXFSZ
26) SIGVTALRM 27) SIGPROF 28) SIGWINCH 29) SIGIO 30) SIGPWR
31) SIGSYS 34) SIGRTMIN 35) SIGRTMIN+1 36) SIGRTMIN+2 37) SIGRTMIN+3
38) SIGRTMIN+4 39) SIGRTMIN+5 40) SIGRTMIN+6 41) SIGRTMIN+7 42) SIGRTMIN+8
43) SIGRTMIN+9 44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9 56) SIGRTMAX-8 57) SIGRTMAX-7
58) SIGRTMAX-6 59) SIGRTMAX-5 60) SIGRTMAX-4 61) SIGRTMAX-3 62) SIGRTMAX-2
63) SIGRTMAX-1 64) SIGRTMAX
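
To look up a few numbers without scanning the whole table, kill -l also accepts a signal number and prints the matching name without the SIG prefix. The following is a minimal sketch; the function name signal_names is hypothetical.

```shell
# signal_names NUMBER...: print each signal number next to its name.
# Hypothetical helper built on "kill -l NUMBER".
signal_names() {
    for n in "$@"; do
        printf '%s %s\n' "$n" "$(kill -l "$n")"
    done
}

# signal_names 1 9 15    prints:
#   1 HUP
#   9 KILL
#   15 TERM
```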

Although the conventional way of sending signals is through the use of PIDs, there are also methods of doing this with regular process names.

The pkill command works in almost exactly the same way as kill, but it operates on a process name instead:

# pkill -9 ping

is equivalent to

# kill -9 `pgrep ping`

If you would like to send a signal to every instance of a certain process, you can use the killall command:

# killall firefox

Process Priorities

Some processes might be considered mission critical for your situation, while others may be executed whenever there are leftover resources. You will want to adjust which processes are given priority in a server environment. Linux controls priority through a value called niceness.

High priority tasks are considered less nice, because they don’t share resources as well. Low priority processes, on the other hand, are nice because they insist on only taking minimal resources.

When we ran top at the beginning of this blog, there was a column marked “NI”. This is the nice value of the process:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 128092 6680 3932 S 0.0 0.7 0:02.84 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0

On Linux, nice values range from -20 (highest priority) to 19 (lowest priority); the exact range can differ on other Unix-like systems.

To run a program with a certain nice value, we can use the nice command:

# nice -n 15 <command>

This only works while executing a new program.
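
One way to confirm the effect is to let the new program report its own nice value: when nice is run without a command, it prints the niceness it is running at. The following is a minimal sketch, assuming a nice implementation that behaves this way (GNU coreutils does); the function name niceness_after is hypothetical.

```shell
# niceness_after INCREMENT: print the nice value seen by a child that is
# started with "nice -n INCREMENT". Hypothetical helper; the inner "nice",
# run without a command, prints the niceness it inherited.
niceness_after() {
    nice -n "$1" nice
}

# Example: compare the shell's own niceness with that of a nicer child.
#   nice
#   niceness_after 15
```

The value is added to the niceness of the calling shell, so the result depends on where you start.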

To alter the nice value of a program that is already executing, we use a tool called renice:

# renice 0 <PID of process>