Linux Process & Process Management

Introduction

A Linux server, like any other computer you may be familiar with, runs applications. To the computer, these are considered “processes”.

While Linux will handle the low-level, behind-the-scenes management in a process’s lifecycle, you will need a way of interacting with the operating system to manage it from a higher-level.

In this post, we will discuss some simple aspects of process management. Linux provides an abundant collection of tools for this purpose.

How To View Running Processes in Linux

The easiest way to find out what processes are running on your server is to run the top command:

# top

top – 07:45:38 up 9:38, 1 user, load average: 0.00, 0.01, 0.05
Tasks: 89 total, 2 running, 87 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 1014976 total, 517484 free, 102656 used, 394836 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 728192 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 128092 6680 3932 S 0.0 0.7 0:02.84 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.03 ksoftirqd/0
6 root 20 0 0 0 0 S 0.0 0.0 0:00.32 kworker/u30:0
7 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
————- output truncated ——————-

The top chunk of information gives system statistics, such as system load and the total number of tasks.

You can easily see that there is 2 running process, and 87 processes are sleeping (aka idle/not using CPU resources).

The bottom portion has the running processes and their usage statistics.

Though top gives you an interface to view running processes based on ncurses. This tool is not always flexible enough to adequately cover all scenarios. A powerful command called ps is often the answer to these problems.

List processes with ps command

When called without arguments, the output can be a bit lack-luster:

# ps

PID TTY TIME CMD
3125 pts/0 00:00:00 sudo
3126 pts/0 00:00:00 su
3127 pts/0 00:00:00 bash
3150 pts/0 00:00:00 ps

This output shows all of the processes associated with the current user and terminal session. This makes sense because we are only running bash, sudo and ps with this terminal currently.

We can run ps command with different options to get a complete picture of the processes on this system.

BSD style – The options in bsd style syntax are not preceded by a dash.

# ps aux

UNIX/LINUX style – The options in Linux style syntax are preceded by a dash as usual.

# ps -ef

It is okay to mix both the syntax styles on Linux systems. For example “ps au -x”. In this post, we’re using both style syntaxes.

# ps aux

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2 0.0 0.0 0 0 ? S Jan11 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S Jan11 0:00 [ksoftirqd/0]
root 6 0.0 0.0 0 0 ? S Jan11 0:00 [kworker/u30:0]
root 7 0.0 0.0 0 0 ? S Jan11 0:00 [migration/0]
root 8 0.0 0.0 0 0 ? S Jan11 0:00 [rcu_bh]
root 9 0.0 0.0 0 0 ? R Jan11 0:00 [rcu_sched]
root 10 0.0 0.0 0 0 ? S Jan11 0:00

————- output truncated ——————-

These options tell ps to show processes owned by all users (regardless of their terminal association) in a user-friendly format.

To see a tree view, where hierarchal relationships are illustrated, we can run the command with these options:

# ps axjf

PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
0 2 0 0 ? -1 S 0 0:00 [kthreadd]
2 3 0 0 ? -1 S 0 0:00 \_ [ksoftirqd/0]
2 6 0 0 ? -1 S 0 0:00 \_ [kworker/u30:0]
2 7 0 0 ? -1 S 0 0:00 \_ [migration/0]
2 8 0 0 ? -1 S 0 0:00 \_ [rcu_bh]
2 9 0 0 ? -1 R 0 0:00 \_ 1 2024 2024 2024 ? -1 Ss 0 0:00 /usr/sbin/sshd
2024 3100 3100 3100 ? -1 Ss 0 0:00 \_ sshd: ajoy[priv]
3100 3103 3100 3100 ? -1 S 1000 0:00 \_ sshd: ajoy@pts/0
3103 3104 3104 3104 pts/0 3153 Ss 1000 0:00 \_ -bash
3104 3125 3125 3104 pts/0 3153 S 0 0:00 \_ sudo su –
3125 3126 3125 3104 pts/0 3153 S 0 0:00 \_ su –
3126 3127 3127 3104 pts/0 3153 S 0 0:00 \_ -bash
3127 3153 3153 3104 pts/0 3153 R+ 0 0:00 \_ ps axjf
————- output truncated ——————-

As you can see, the process sshd is shown to be a parent of the processes like bash, su, sudo, and ps ajx itself.

List the Process based on the UID and Commands (ps -u, ps -C)

Use -u option to displays the process that belongs to a specific username. When you have multiple usernames, separate them using a comma. The example below displays all the process that are owned by user wwwrun, or postfix.

# ps -f -u wwwrun,postfix

UID PID PPID C STIME TTY TIME CMD
postfix 7457 7435 0 Mar09 ? 00:00:00 qmgr -l -t fifo -u
wwwrun 7495 7491 0 Mar09 ? 00:00:00 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
wwwrun 7496 7491 0 Mar09 ? 00:00:00 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
wwwrun 7497 7491 0 Mar09 ? 00:00:00 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
wwwrun 7498 7491 0 Mar09 ? 00:00:00 /usr/sbin/httpd2-prefork -f /etc/apac

The following example shows that all the processes which have tatad.pl in its command execution.

# ps -f -C tatad.pl

UID PID PPID C STIME TTY TIME CMD
root 9576 1 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root 9577 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root 9579 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root 9580 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root 9581 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bi

The following method is used to get a list of processes with a particular PPID.

#ps -f –ppid 9576

UID PID PPID C STIME TTY TIME CMD
root 9577 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root 9579 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root 9580 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root 9581 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin

List Processes in a Hierarchy (ps –forest)

The example below displays the process Id and commands in a hierarchy. –forest is an argument to ps command which displays ASCII art of process tree. From this tree, we can identify which is the parent process and the child processes it forked in a recursive manner.

#ps -e -o pid,args –forest
468 \_ sshd: root@pts/7
514 | \_ -bash
17484 \_ sshd: root@pts/11
17513 | \_ -bash
24004 | \_ vi ./790310__11117/journal
15513 \_ sshd: root@pts/1
15522 | \_ -bash
4280 \_ sshd: root@pts/5
4302 | \_ -bash

List elapsed wall time for processes (ps -o pid,etime=)

If you want the get the elapsed time for the processes which are currently running ps command provides etime which provides the elapsed time since the process was started, in the form [[dd-]hh:]mm: , ss.

The below command displays the elapsed time for the process IDs 1 (init) and process id 29675.

For example “10-22:13:29? in the output represents the process init is running for 10days, 22hours,13 minutes and 29seconds. Since init process starts during the system startup, this time will be same as the output of the ‘uptime’ command.

# ps -p 1,29675 -o pid,etime=
PID
1 10-22:13:29
29675 1-02:58:46

List all threads for a particular process (ps -L)

You can get a list of threads for the processes. When a process hangs, we might need to identify the list of threads running for a particular process as shown below.

# ps -C java -L -o pid,tid,pcpu,state,nlwp,args

PID TID %CPU S NLWP COMMAND
16992 16992 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16993 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16994 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16995 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16996 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16997 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16998 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16999 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 17000 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_l

Finding memory Leak (ps –sort pmem)

A memory leak, technically, is an ever-increasing usage of memory by an application.

With common desktop applications, this may go unnoticed, because a process typically frees any memory it has used when you close the application.

However, In the client/server model, memory leakage is a serious issue, because applications are expected to be available 24×7. Applications must not continue to increase their memory usage indefinite because this can cause serious issues. To monitor such memory leaks, we can use the following commands.

# ps aux –sort pmem

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 1520 508 ? S 2005 1:27 init
inst 1309 0.0 0.4 344308 33048 ? S 2005 1:55 agnt (idle)
inst 2919 0.0 0.4 345580 37368 ? S 2005 20:02 agnt (idle)
inst 24594 0.0 0.4 345068 36960 ? S 2005 15:45 agnt (idle)

In the above ps command, –sort option outputs the highest %MEM at the bottom. Just note down the PID for the highest %MEM usage. Then use ps command to view all the details about this process id, and monitor the change over time. You had to manually repeat ir or put it as a cron to a file.

The VSZ number is useless if what you are interested in is memory consumption. VSZ measures how much of the process’s virtual memory space has been marked by the process of memory that should be mapped by the operating system if the process happens to touch it. But it has nothing to do with whether that memory has actually been touched and used. VSZ is an internal detail about how a process does memory allocation — how big a chunk of unused memory it grabs at once. Look at RSS for the count of memory pages it has actually started using

RSS:

Resident set size = the non-swapped physical memory that a task has used; Resident Set currently in physical memory including Code, Data, Stack

VSZ:

Virtual memory usage of entire process = VmLib + VmExe + VmData + VmStk

In other words,
a) VSZ *includes* RSS
b) “ps -aux” alone isn’t enough to tell you if a process is thrashing (although, if your system *is* thrashing, “ps -aux” will help you identify the processes experiencing the biggest hits).

# ps ev –pid=27645

PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
27645 ? S 3:01 0 25 1231262 1183976 14.4 /TaskServer/bin/./wrapper-linux-x86-32

# ps ev –pid=27645

PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
27645 ? S 3:01 0 25 1231262 1183976 14.4 /TaskServer/bin/./wrapper-linux-x86-32

Note: In the above output, if RSS (resident set size, in KB) increases over time (so would %MEM), it may indicate a memory leak in the application.

The following command displays all the process owned by Linux username: oracle.

# ps U oracle

PID TTY STAT TIME COMMAND
5014 ? Ss 0:01 /oracle/bin/tnslsnr
7124 ? Ss 0:00 ora_q002_med
8206 ? Ss 0:00 ora_cjq0_med
8852 ? Ss 0:01 ora_pmon_med

Following command displays all the process owned by the current user.

# ps U $USER

PID TTY STAT TIME COMMAND
10329 ? S 0:00 sshd: ajoy@pts/1,pts/2
10330 pts/1 Ss 0:00 -bash
10354 pts/2 Ss+ 0:00 -bash

The ps command can be configured to show a selected list of columns only. There are a large number of columns to to show and the full list is available in the man pages.

The following command shows only the pid, username, cpu, memory and command columns.

# ps -e -o pid,uname,pcpu,pmem,comm

PID USER %CPU %MEM COMMAND
1 root 0.0 0.6 systemd
2 root 0.0 0.0 kthreadd
3 root 0.0 0.0 ksoftirqd/0
6 root 0.0 0.0 kworker/u30:0
7 root 0.0 0.0 migration/0

The ps command is quite flexible and it is possible to rename the column labels as shown below:

# ps -e -o pid,uname=USERNAME,pcpu=CPU_USAGE,pmem,comm

PID USERNAME CPU_USAGE %MEM COMMAND
1 root 0.0 0.6 systemd
2 root 0.0 0.0 kthreadd
3 root 0.0 0.0 ksoftirqd/0
6 root 0.0 0.0 kworker/u30:0
7 root 0.0 0.0 migration/0

Combined with the watch command we can turn ps into a realtime process reporter. Simple example is like this

# watch -n 1 ‘ps -e -o pid,uname,cmd,pmem,pcpu –sort=-pmem,-pcpu | head -15’

Every 1.0s: ps -e -o pid,uname,cmd,pmem,pcpu –… Sun Dec 1 18:16:08 2009

PID USER CMD %MEM %CPU
3800 1000 /opt/google/chrome/chrome – 4.6 1.4
7492 1000 /opt/google/chrome/chrome – 2.7 1.4
3150 1000 /opt/google/chrome/chrome 2.7 2.5
3824 1000 /opt/google/chrome/chrome – 2.6 0.6
3936 1000 /opt/google/chrome/chrome – 2.4 1.6
2936 1000 /usr/bin/plasma-desktop 2.3 0.2
9666 1000 /opt/google/chrome/chrome – 2.1 0.8

Process IDs:

In Linux and Unix-like systems, each process is assigned a process ID, or PID. This is how the operating system identifies and keeps track of processes.

# pgrep bash

3104
3127

The first process spawned at boot, called init, is given the PID of “1”.

# pgrep init

1

This process is then responsible for spawning every other process on the system. The later processes are given larger PID numbers.

A process’s parent is the process that was responsible for spawning it. If a process’s parent is killed, then the child processes also die. The parent process’s PID is referred to as the PPID.

Process States:

Here are the different values that the s, stat and state output specifiers (header “STAT” or “S”) will display to describe the state of a process:

Running

The process is either running (it is the current process in the system) or it is ready to run (it is waiting to be assigned to one of the system’s CPUs).

Waiting

The process is waiting for an event or for a resource. Linux differentiates between two types of waiting process; interruptible and uninterruptible. Interruptible waiting processes can be interrupted by signals whereas uninterruptible waiting processes are waiting directly on hardware conditions and cannot be interrupted under any circumstances.

Stopped

The process has been stopped, usually by receiving a signal. A process that is being debugged can be in a stopped state.
Zombie
This is a halted process which, for some reason, still has a task_struct data structure in the task vector. It is what it sounds like, a dead process.

D uninterruptible sleep (usually IO)
R running or runnable (on run queue)
S interruptible sleep (waiting for an event to complete)
T stopped, either by a job control signal or because it is being traced.
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z defunct (“zombie”) process, terminated but not reaped by its parent.

For BSD formats and when the stat keyword is used, additional characters may be displayed:
< high-priority (not nice to other users)
N low-priority (nice to other users)
L has pages locked into memory (for real-time and custom IO)
s is a session leader
l is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)
+ is in the foreground process group

Processes Signals:

All processes in Linux respond to signals. Signals are an os-level way of telling programs to terminate or modify their behavior. The most common way of passing signals to a program is with the kill command. The default functionality of this utility is to attempt to kill a process:

# kill <PID of process>

This sends the TERM signal to the process. The TERM signal tells the process to please terminate. This allows the program to perform clean-up operations and exit smoothly.

If the program is misbehaving and does not exit when given the TERM signal, we can escalate the signal by passing the KILL signal:

# kill -KILL <PID of process>

This is a special signal that is not sent to the program.

Instead, it is given to the operating system kernel, which shuts down the process. This is used to bypass programs that ignore the signals sent to them.

Each signal has an associated number that can be passed instead of the name. For instance, You can pass “-15” instead of “-TERM”, and “-9” instead of “-KILL”.

Signals are not only used to shut down programs. They can also be used to perform other actions.

For instance, many daemons will restart when they are given the HUP or hang-up signal. Apache is one program that operates like this.

# kill -HUP <PID of httpd>

The above command will cause Apache to reload its configuration file and resume serving content.

You can list all of the signals that are possible to send with the kill by typing:

# kill -l

1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL 5) SIGTRAP
6) SIGABRT 7) SIGBUS 8) SIGFPE 9) SIGKILL 10) SIGUSR1
11) SIGSEGV 12) SIGUSR2 13) SIGPIPE 14) SIGALRM 15) SIGTERM
16) SIGSTKFLT 17) SIGCHLD 18) SIGCONT 19) SIGSTOP 20) SIGTSTP
21) SIGTTIN 22) SIGTTOU 23) SIGURG 24) SIGXCPU 25) SIGXFSZ
26) SIGVTALRM 27) SIGPROF 28) SIGWINCH 29) SIGIO 30) SIGPWR
31) SIGSYS 34) SIGRTMIN 35) SIGRTMIN+1 36) SIGRTMIN+2 37) SIGRTMIN+3
38) SIGRTMIN+4 39) SIGRTMIN+5 40) SIGRTMIN+6 41) SIGRTMIN+7 42) SIGRTMIN+8
43) SIGRTMIN+9 44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9 56) SIGRTMAX-8 57) SIGRTMAX-7
58) SIGRTMAX-6 59) SIGRTMAX-5 60) SIGRTMAX-4 61) SIGRTMAX-3 62) SIGRTMAX-2
63) SIGRTMAX-1 64) SIGRTMAX

The conventional way of sending signals is through the use of PIDs, there are also methods of doing this with regular process names.

The pkill command works in almost exactly the same way as kill, but it operates on a process name instead:

# pkill -9 ping

is equivalent to

# kill -9 `pgrep ping`

If you would like to send a signal to every instance of a certain process, you can use the killall command:

# killall firefox

Process Priorities

Some processes might be considered mission critical for your situation, while others may be executed whenever there might be leftover resources.You will want to adjust which processes are given priority in a server environment. Linux controls priority through a value called niceness.

High priority tasks are considered less nice, because they don’t share resources as well. Low priority processes, on the other hand, are nice because they insist on only taking minimal resources.

When we ran top at the beginning of this blog, there was a column marked “NI”. This is the nice value of the process:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 128092 6680 3932 S 0.0 0.7 0:02.84 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0

Nice values can range between “-19/-20” (highest priority) and “19/20” (lowest priority) depending on the system.

To run a program with a certain nice value, we can use the nice command:

# nice -n 15 <command>

This only works while executing a new program.

To alter the nice value of a program that is already executing, we use a tool called renice:

# renice 0 <PID of process>

Kernel Modules

What is a kernel module?

Modules are pieces of code that can be loaded and unloaded into the kernel upon demand. They extend the functionality of the kernel without the need to reboot the system.

If you want to add code to a Linux kernel, the most basic way to do that is to add some source files to the kernel source tree and recompile the kernel. In fact, the kernel configuration process consists mainly of choosing which files to include in the kernel to be compiled.

But you can also add code to the Linux kernel while it is running. A chunk of code that you add in this way is called a loadable kernel module. These modules can do lots of things, but they typically are one of three things: 1) device drivers; 2) filesystem drivers; 3) system calls. The kernel isolates certain functions, including these, especially well so they don’t have to be intricately wired into the rest of the kernel.

For example, one type of module is the device driver, which allows the kernel to access hardware connected to the system. Without modules, we would have to build monolithic kernels and add new functionality directly into the kernel image. Besides having larger kernels, this has the disadvantage of requiring us to rebuild and reboot the kernel every time we want new functionality.

Loadable Kernel Module [LKM]

Loadable kernel modules are often called just kernel modules or just modules, but those are rather misleading terms because there are lots of kinds of modules in the world and various pieces built into the base kernel can easily be called modules. We use the term loadable kernel module or LKM for the particular kinds of modules this post is about.

Some people think of LKMs as outside of the kernel. They speak of LKMs communicating with the kernel. This is a mistake; LKMs (when loaded) are very much part of the kernel. The correct term for the part of the kernel that is bound into the image that you boot, i.e. all of the kernel except the LKMs, is “base kernel.” LKMs communicate with the base kernel.

In some other operating systems, the equivalent of a Linux LKM is called a “kernel extension.”

What is “Linux”? Well, first of all, the name is used for two entirely different things, and only one of them is really relevant here:

  1. The kernel and related items distributed as a package by Linus Torvalds.
  2. A class of operating systems that traditionally are based on the Linux kernel.

Only the first of these is really useful in discussing LKMs. But even choosing this definition, people are often confused when it comes to LKMs.

LKMs did not exist in Linux in the beginning. Anything we use as LKM for today was built into the base kernel at kernel build time instead. LKMs have been around at least since Linux 1.2 (1995).

Device drivers and such were always quite modular, though. When LKMs were invented, only a small amount of work was needed on these modules to make them buildable as LKMs. However, it had to be done on each and every one, so it took some time. Since about 2000, virtually everything that makes sense as an LKM has at least had the option of being an LKM.

How Do Modules Get Into The Kernel?

You can see what modules are already loaded into the kernel by running lsmod, which gets its information by reading the file /proc/modules.

When the kernel needs a feature that is not resident in the kernel, the kernel module daemon kmod execs modprobe to load the module in. modprobe is passed a string in one of two forms:

A module name like softdog or ppp.

A more generic identifier like char-major-10-30.

If modprobe is handed a generic identifier, it first looks for that string in the file /etc/modprobe.conf If it finds an alias line like:

alias char-major-10-30 softdog

it knows that the generic identifier refers to the module softdog.ko.

Next, modprobe looks through the file /lib/modules/version/modules.dep, to see if other modules must be loaded before the requested module may be loaded. This file is created by depmod -a and contains module dependencies. For example, msdos.ko requires the fat.ko module to be already loaded into the kernel. The requested module has a dependency on another module if the other module defines symbols (variables or functions) that the requested module uses.

Lastly, modprobe uses insmod to first load any prerequisite modules into the kernel, and then the requested module. modprobe directs insmod to /lib/modules/version/, the standard directory for modules. insmod is intended to be fairly dumb about the location of modules, whereas modprobe is aware of the default location of modules, knows how to figure out the dependencies and load the modules in the right order. So for example, if you wanted to load the msdos module, you’d have to either run:

insmod /lib/modules/2.6.11/kernel/fs/fat/fat.ko
insmod /lib/modules/2.6.11/kernel/fs/msdos/msdos.ko

or:

modprobe msdos

What we’ve seen here is: insmod requires you to pass it the full pathname and to insert the modules in the right order, while modprobe just takes the name, without any extension, and figures out all it needs to know by parsing /lib/modules/version/modules.dep.

Linux distros provide modprobe, insmod and depmod as a package called module-init-tools. In previous versions that package was called modutils. Some distros also set up some wrappers that allow both packages to be installed in parallel and do the right thing in order to be able to deal with 2.4 and 2.6 kernels. Users should not need to care about the details, as long as they’re running recent versions of those tools.

What is modprobe?

The modprobe utility is used to add and remove kernel modules to/from linux kernel. Linux kernel modules have .ko as module name extension. ‘modprobe’ is intelligent enough to load the dependency of a kernel module(if any) first and then loads the actual module.

5 modprobe Examples
1. Basic example to load an LKM

In the first example we will see how to load a LKM (kept at any path in system) using modprobe. Use the following steps to achieve this :

sudo ln -s /path/to/your-kernel-module /lib/modules/`uname -r`
sudo depmod -a
sudo modprobe your-kernel-module

for example, what I did was :

$ sudo ln -s lkm.ko /lib/modules/2.6.32-21-generic/
$ sudo depmod -a
$ sudo modprobe lkm

NOTE: If your module outputs some debug messages then confirmation of the loaded module could be achieved by looking at the logs from dmesg utility. Alternatively, the lsmod utility can be used to view the currently loaded modules
2. Unload a loaded using modprobe

The modprobe can also be used to remove the loaded module. Or in other words we can unload a loaded module through modprobe using the -r option.

$ sudo modprobe -r lkm

The above command unloads a currently loaded module ‘lkm’ from kernel.
3. Have a dry run

Sometimes we face problems like module not being loaded properly etc. In that case it becomes very important to debug and know the level at which the problem exists. It becomes crucial to know whether the problem is before loading or after loading. To facilitate this type of debugging, there exists option -n which if used, forces modprobe to do everything else except the final stage of loading the module.

$ sudo modprobe -vn lkm
insmod /lib/modules/2.6.32-21-generic/kernel/arch/x86/kernel/lkm.ko

I used the -v option along with -n so that some debugging info could be spitted out by modprobe.
4. Suppress the error information

Usually in some error condition, the modprobe utility would try to output some error info. If that kind of info is not needed, then the -q option is used to suppress this kind of info.

$ sudo modprobe lk
FATAL: Module lk not found.
$ sudo modprobe -q lk
$

So we see that in the above output, when the command was run without the -q option then an error was thrown while when -q was used the error got suppressed.
5. List the modules

If there is a requirement to list all the modules or modules with specific name then modprobe provides -l option to accomplish this

$ modprobe -l crc*
kernel/arch/x86/crypto/crc32c-intel.ko
kernel/crypto/crc32c.ko
kernel/lib/crc-ccitt.ko
kernel/lib/crc-itu-t.ko
kernel/lib/crc7.ko
$ modprobe -l rds
kernel/net/rds/rds.ko

Syntax and Options

modprobe [ -v ] [ -V ] [ -C config-file ] [ -n ] [ -i ] [ -q ] [ -b ] [ -o modulename ] [ modulename ] [ module parameters… ]

modprobe [ -r ] [ -v ] [ -n ] [ -i ] [ modulename… ]

modprobe [ -l ] [ -t dirname ] [ wildcard ]

modprobe [ -c ]

modprobe [ –dump-modversions ] [ filename… ]

Short Option Long Option Option Description
-v –verbose Print messages about what the program is doing. Usually modprobe only prints messages if something goes wrong.This option is passed through install or remove commands to other modprobe commands in the MODPROBE_OPTIONS environment variable.
-C –config This option overrides the default configuration directory/file (/etc/modprobe.d or /etc/modprobe.conf). This option is passed through install or remove commands to other modprobe commands in the MODPROBE_OPTIONS environment variable.
-c –showconfig Dump out the effective configuration from the config directory and exit.
-n –dry-run This option does everything but actually insert or delete the modules (or run the install or remove commands). Combined with -v, it is use-ful for debugging problems.
-i –ignore-install –ignore-remove This option causes modprobe to ignore install and remove commands in the configuration file (if any) for the module specified on the command line (any dependent modules are still subject to commands set for them in the configuration file).
-q –quiet Normally modprobe will report an error if you try to remove or insert a module it can’t find (and isn’t an alias or install/remove command). With this flag, modprobe will simply ignore any bogus names (the kernel uses this to opportunistically probe for modules which might exist).
-r –remove This option causes modprobe to remove rather than insert a module. If the modules it depends on are also unused, modprobe will try to remove them too. Unlike insertion, more than one module can be specified on the command line (it does not make sense to specify module parameters when removing modules).There is usually no reason to remove modules, but some buggy modules require it. Your kernel may not support removal of modules.
-f –force Try to strip any versioning information from the module which might otherwise stop it from loading: this is the same as using both –force-vermagic and –force-modversion. Naturally, these checks are there for your protection, so using this option is dangerous.
This applies to any modules inserted: both the module (or alias) on the command line and any modules it on which it depends.

What is insmod?

insmod command is used to insert modules to Linux kernel.

3 insmod Examples
1. Specify module name as an argument

The following command insert the module airo to the Linux kernel.

# insmod kernel/drivers/net/wireless/airo.ko

Once you’ve inserted the module, use lsmod command to verify that the module has been inserted successfully as shown below.

# lsmod | grep airo
Module Size Used by
airo 66291 0

Note: It is recommended that you use modprobe command for all your Kernel module manipulation, including inserting a module.

All the modules that are available to be inserted to the kernel are listed in the modules.dep file as shown below.

# more /lib/modules/`uname -r`/modules.dep

(or)

# more /lib/modules/2.6.32-100.28.5.el6.x86_64/modules.dep

2. Insert a module with any arguments

If there are any arguments that needs to be passed for the module, give that as 3rd option as shown below.

The following command insert the module dummy to the Linux kernel. In this example, the dummy module takes two arguments type and debug. You can set the values for type and debug arguments and pass it to dummy module as shown below.

# insmod dummy type=”wpa” debug=1

3. Specify module name interactively

Let us assume that you want to insert a pcmcia module called ‘axnet_cs’.

First, verify to make sure this module is listed in the modules.dep file

# grep axnet_cs /lib/modules/`uname -r`/modules.dep
kernel/drivers/net/pcmcia/axnet_cs.ko:

Following is the full path of this file.

# ls /lib/modules/`uname -r`/kernel/drivers/net/pcmcia/axnet_cs.ko
/lib/modules/2.6.32-100.28.5.el6.x86_64/kernel/drivers/net/pcmcia/axnet_cs.ko

Use insmod – and enter the module name interactively.

# insmod –
kernel/drivers/net/pcmcia/axnet_cs.ko

Or, you can use < to give the module name as shown below.

# insmod – < kernel/drivers/net/pcmcia/axnet_cs.ko

Verify to make sure the module got inserted successfully.

# lsmod | grep axnet
axnet_cs 14627 0

Syntax and Options

Syntax:

insmod [ filename ] [ module options… ]

Real, Effective & Saved UID explained

Each Linux/Unix process has 3 UIDs associated with it. Superuser privilege is UID=0.

Real UID

This is the UID of the user/process that created THIS process. It can be changed only if the running process has EUID=0.

Effective UID

This UID is used to evaluate privileges of the process to perform a particular action. EUID can be changed either to RUID, or SUID if EUID!=0. If EUID=0, it can be changed to anything.

Saved UID

If the binary image file, that was launched has a Set-UID bit on, SUID will be the UID of the owner of the file. Otherwise, SUID will be the RUID.

  • What is the idea behind this?

Normal programs, like “ls”, “cat”, “echo” will be run by a normal user, under that users UID. Special programs that allow the user to have controlled access to protected data, can have Set-UID bit to allow the program to be run under privileged UID.

An example of such program is “passwd”. If you list it in full, you will see that it has a Set-UID bit and the owner is “root”. When a normal user, say “ajoy”, runs “passwd”, passwd starts with:

Real-UID = ajoy
Effective-UID = ajoy
Saved-UID = root

The program calls a system call “seteuid( 0 )” and since SUID=0, the call will succeed and the UIDs will be:

Real-UID = ajoy
Effective-UID = root
Saved-UID = root

After that, “passwd” process will be able to access /etc/passwd and change password for user “ajoy”. Note that user “ajoy” cannot write to /etc/passwd on it’s own. Note one other thing, setting a Set-UID on an executable file is not enough to make it run as a privileged process. The program itself must make a system call.

That is the idea.

CPU affinity an overview

When you are using SMP (Symmetric MultiProcessing) you might want to override the kernel’s process scheduling and bind a certain process to a specific CPU(s).

But what is CPU affinity?

CPU affinity is nothing but a scheduler property that “bonds” a process to a given set of CPUs on the SMP system. The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs. Note that the Linux scheduler also supports natural CPU affinity:

The scheduler attempts to keep processes on the same CPU as long as practical for performance reasons. Therefore, forcing a specific CPU affinity is useful only in certain applications. For example, the application such as Oracle (ERP apps) use # of CPUs per instance licensed. You can bound Oracle to specific CPU to avoid license problem. This is a really useful on large server having 4 or 8 CPUS

Setting processor affinity for a certain task or process using “taskset” command:

The command taskset is used to set or retrieve the CPU affinity of a running process given its PID or to launch a new COMMAND with a given CPU affinity. However taskset is not installed by default. You need to install schedutils (Linux scheduler utilities) package.

Install schedutils

Debian Linux:
# apt-get install schedutils

Red Hat Enterprise Linux:
# up2date schedutils
OR
# rpm –ivh schedutils*

Under latest version of Debian / Ubuntu Linux/RHEL etc taskset is installed by default using util-linux package.

# apt-get install util-linux

# yum install util-linux

The CPU affinity is represented as a bitmask, with the lowest order bit corresponding to the first logical CPU and the highest order bit corresponding to the last logical CPU. For example:

0x00000001 is processor #0 (1st processor)
0x00000003 is processors #0 and #1
0x00000004 is processors #2 (3rd processor)

To set the processor affinity of process 13545 to processor #0 (1st processor) type following command:

# taskset 0x00000001 -p 13545

If you find a bitmask hard to use, then you can specify a numerical list of processors instead of a bitmask using -c flag:
# taskset -c 1 -p 13545
# taskset -c 3,4 -p 13545

Where,

-p : Opera

ps -eo psr,pid

this gives the processor in each the pid is current assigned to.

taskset -p option pid
option 1 means 1st processor
option 2 means 2nd processor
option 0xffffffff means all the processor

List open files lsof command explained

The command lsof stands for list open files, which will list all the open files in the system. The open files include network connections, devices, and directories. The output of the lsof command will have the following columns:

COMMAND process name.
PID process ID
USER Username
FD file descriptor
TYPE node type of the file
DEVICE device number
SIZE file size
NODE node number
NAME full path of the file name.

Simply typing lsof will provide a list of all open files belonging to all active processes.

# lsof

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
init 1 root cwd DIR 8,1 4096 2 /
init 1 root txt REG 8,1 124704 917562 /sbin/init
init 1 root 0u CHR 1,3 0t0 4369 /dev/null
init 1 root 1u CHR 1,3 0t0 4369 /dev/null
init 1 root 2u CHR 1,3 0t0 4369 /dev/null
init 1 root 3r FIFO 0,8 0t0 6323 pipe
—————————————-truncated——————

By default, one file per line is displayed. Most of the columns are self-explanatory. We will explain the details about a couple of cryptic columns (FD and TYPE).

FD – Represents the file descriptor. Some of the values of FDs are,

cwd – Current Working Directory
txt – Text file
mem – Memory mapped file
mmap – Memory mapped device
NUMBER – Represent the actual file descriptor. The character after the number i.e ’1u’, represents the mode in which the file is opened. r for read, w for write, u for read and write.

TYPE – Specifies the type of the file. Some of the values of TYPEs are,

REG – Regular File
DIR – Directory
FIFO – First In First Out
CHR – Character special file

The lsof command by itself without may return lot of records as output, which may not be very meaningful except to give you a rough idea about how many files are open in the system at any given point of view as shown below.

# lsof | wc -l

3093

Use lsof –u option to display all the files opened by a specific user.

# lsof –u ajoy

vi 7190 ajoy txt REG 8,1 47

 

List opened files under a directory

You can list the processes which opened files under a specified directory using ‘+D’ option. +D will recurse the sub directories also. If you don’t want lsof to recurse, then use ‘+d’ option.

# lsof +D /var/log/

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
rsyslogd 488 syslog 1w REG 8,1 1151 268940 /var/log/syslog
rsyslogd 488 syslog 2w REG 8,1 2405 269616 /var/log/auth.log
console-k 144 root 9w REG 8,1 10871 269369 /var/log/ConsoleKit/history

List opened files based on process names starting with

You can list the files opened by process names starting with a string, using ‘-c’ option. -c followed by the process name will list the files opened by the process starting with that processes name. You can give multiple -c switch on a single command line.

# lsof -c ssh -c init

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
init 1 root txt REG 8,1 124704 917562 /sbin/init
init 1 root mem REG 8,1 1434180 1442625 /lib/i386-linux-gnu/libc-2.13.so
init 1 root mem REG 8,1 30684 1442694 /lib/i386-linux-gnu/librt-2.13.so

ssh-agent 1528 user1 1u CHR 1,3 0t0 4369 /dev/null
ssh-agent 1528 user1 2u CHR 1,3 0t0 4369 /dev/null

List processes using a mount point

Sometime when we try to umount a directory, the system will say “Device or Resource Busy” error. So we need to find out what are all the processes using the mount point and kill those processes to umount the directory. By using lsof we can find those processes.

# lsof /home

The following will also work.

# lsof +D /home/

# lsof -u ^ajoy

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
rtkit-dae 1380 rtkit 7u 0000 0,9 0 4360 anon_inode
udisks-da 1584 root cwd DIR 8,1 4096 2 /

The above command listed all the files opened by all users, expect user ‘ajoy’.

List all open files by a specific process

You can list all the files opened by a specific process using ‘-p’ option. It will be helpful sometimes to get more information about a specific process.

# lsof -p 1753

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
bash 1753 user1 cwd DIR 8,1 4096 393571 /home/ajoy/test.txt
bash 1753 user1 rtd DIR 8,1 4096 2 /
bash 1753 user1 255u CHR 136,0 0t0 3 /dev/pts/0

Kill all process that belongs to a particular user

When you want to kill all the processes which has files opened by a specific user, you can use ‘-t’ option to list output only the process id of the process, and pass it to kill as follows

# kill -9 `lsof -t -u ajoy`

The above command will kill all process belonging to user ‘ajoy’, which has files opened.

Similarly you can also use ‘-t’ in many ways. For example, to list process id of a process which opened /var/log/syslog can be done by

# lsof -t /var/log/syslog

489

Combine more list options using OR/AND

By default when you use more than one list option in lsof, they will be treated as OR. For example,

# lsof -u user1-c init

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
init 1 root cwd DIR 8,1 4096 2 /
init 1 root txt REG 8,1 124704 917562 /sbin/init
bash 1995 user1 2u CHR 136,2 0t0 5 /dev/pts/2
bash 1995 user1 255u CHR 136,2 0t0 5 /dev/pts/2

The above command uses two list options, ‘-u’ and ‘-c’. So the command will list process belongs to user ‘lakshmanan’ as well as process name starts with ‘init’.

But when you want to list a process belongs to user ‘lakshmanan’ and the process name starts with ‘init’, you can use ‘-a’ option.

# lsof -u user1 -c init -a

List all network connections

You can list all the network connections opened by using ‘-i’ option.

# lsof -i

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
avahi-dae 515 avahi 13u IPv4 6848 0t0 UDP *:mdns
avahi-dae 515 avahi 16u IPv6 6851 0t0 UDP *:52060
cupsd 1075 root 5u IPv6 22512 0t0 TCP ip6-localhost:ipp (LISTEN)

List all network files in use by a specific process

You can list all the network files which is being used by a process as follows

# lsof -i -a -p 234

You can also use the following

# lsof -i -a -c ssh

List processes which are listening on a particular port

You can list the processes which are listening on a particular port by using ‘-i’ with ‘:’ as follows

# lsofi :25

COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
exim4 2541 Debian-exim 3u IPv4 8677 TCP localhost:smtp (LISTEN)

List all TCP or UDP connections

You can list all the TCP or UDP connections by specifying the protocol using ‘-i’.

# lsof -i tcp; lsof -i udp;

List all Network File System ( NFS ) files

You can list all the NFS files by using ‘-N’ option. The following lsof command will list all NFS files used by user ‘lakshmanan’.

# lsof -N -u user1 -a

4608 475196 /bin/vi

sshd 7163 ajoy 3u IPv6 15088263 TCP dev-db:ssh->abc-12-12-12-12.socal.res.rr.com:2631 (ESTABLISHED)

A system administrator can use this command to get some idea on what users are executing on the system.

List Users of a particular file

If you like to view all the users who are using a particular file, use lsof as shown below. In this example, it displays all users who are currently using vi.

# lsof /bin/vi

COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
vi 7258 root txt REG 8,1 474608 475196 /bin/vi
vi 7300 ajoy txt REG 8,1 474608 475196 /bin/vi

Semaphores in a DB perspective

Semaphores can be described as counters which are used to provide synchronization between processes or between threads within a process for shared resources like shared memories. System V semaphores support semaphore sets where each one is a counting semaphore. So when an application requests semaphores, the kernel releases them in sets. The number of semaphores per set can be defined through the kernel parameter SEMMSL.

To see all semaphore settings, run:

# ipcs -ls

  • The SEMMSL Parameter

This parameter defines the maximum number of semaphores per semaphore set.

Oracle recommends SEMMSL to be at least 250 for 9i R2 and 10g R1/R2 databases except for 9i R2 on x86 platforms where the minimum value is lower. Since these recommendations are minimum settings, it’s best to set it always to at least 250 for 9i and 10g databases on x86 and x86-64 platforms.

NOTE:
If a database gets thousands of concurrent connections where the ora.init parameter PROCESSES is very large, then SEMMSL should be larger as well. Note what Metalink Note:187405.1 and Note:184821.1 have to say regarding SEMMSL: “The SEMMSL setting should be 10 plus the largest PROCESSES parameter of any Oracle database on the system”. Even though these notes talk about 9i databases this SEMMSL rule also applies to 10g databases. I’ve seen low SEMMSL settings to be an issue for 10g RAC databases where Oracle recommended to increase SEMMSL and to calculate it according to the rule mentioned in these notes. An example for setting semaphores for higher PROCESSES settings can be found at Example for Semaphore Settings.

  • The SEMMNI Parameter

This parameter defines the maximum number of semaphore sets for the entire Linux system.

Oracle recommends SEMMNI to be at least 128 for 9i R2 and 10g R1/R2 databases except for 9i R2 on x86 platforms where the minimum value is lower. Since these recommendations are minimum settings, it’s best to set it always to at least 128 for 9i and 10g databases on x86 and x86-64 platforms.

  • The SEMMNS Parameter

This parameter defines the total number of semaphores (not semaphore sets) for the entire Linux system. A semaphore set can have more than one semaphore, and as the semget(2) man page explains, values greater than SEMMSL * SEMMNI makes it irrelevant. The maximum number of semaphores that can be allocated on a Linux system will be the lesser of: SEMMNS or (SEMMSL * SEMMNI).

Oracle recommends SEMMNS to be at least 32000 for 9i R2 and 10g R1/R2 databases except for 9i R2 on x86 platforms where the minimum value is lower. Setting SEMMNS to 32000 ensures that SEMMSL * SEMMNI (250*128=32000) semaphores can be be used. Therefore it’s recommended to set SEMMNS to at least 32000 for 9i and 10g databases on x86 and x86-64 platforms.

  • The SEMOPM Parameter

This parameter defines the maximum number of semaphore operations that can be performed per semop(2) system call (semaphore call). The semop(2) function provides the ability to do operations for multiple semaphores with one semop(2) system call. Since a semaphore set can have the maximum number of SEMMSL semaphores per semaphore set, it is often recommended to set SEMOPM equal to SEMMSL.

Oracle recommends to set SEMOPM to a minimum value of 100 for 9i R2 and 10g R1/R2 databases on x86 and x86-64 platforms.

  • Setting Semaphore Parameters

To determine the values of the four described semaphore parameters, run:

# cat /proc/sys/kernel/sem
250 32000 32 128

These values represent SEMMSL, SEMMNS, SEMOPM, and SEMMNI.

Alternatively, you can run:

# ipcs -ls

All four described semaphore parameters can be changed in the proc file system without reboot:

# echo 250 32000 100 128 > /proc/sys/kernel/sem

Alternatively, you can use sysctl(8) to change it:

# sysctl -w kernel.sem=”250 32000 100 128″

To make the change permanent, add or change the following line in the file /etc/sysctl.conf. This file is used during the boot process.

# echo “kernel.sem=250 32000 100 128” >> /etc/sysctl.conf

  • Example for Semaphore Settings

On systems where the ora.init parameter PROCESSES is very large, the semaphore settings need to be adjusted accordingly.

As shown at The SEMMSL Parameter the SEMMSL setting should be 10 plus the largest PROCESSES parameter of any Oracle database on the system. So if you have one database instance running on a system where PROCESSES is set to 5000, then SEMMSL should be set to 5010.

As shown at The SEMMNS Parameter the maximum number of semaphores that can be allocated on a Linux system will be the lesser of: SEMMNS or (SEMMSL * SEMMNI). Since SEMMNI can stay at 128, we need to increase SEMMNS to 641280 (5010*128).

As shown at The SEMOPM Parameter a semaphore set can have the maximum number of SEMMSL semaphores per semaphore set and it is recommended to set SEMOPM equal to SEMMSL. Since SEMMSL is set to 5010 the SEMOPM parameter should be set to 5010 as well.

Hence, if the ora.init parameter PROCESSES is set to 5000, then the semaphore settings should be as follows:

# sysctl -w kernel.sem=”5010 641280 5010 128″

Init & Initrd

In this post, we’re discussing on two major components of Linux OS, initrd and init process.

Initrd a.k.a initial RAM disk:

The initial RAM disk (initrd) is an initial root file system that is mounted prior to when the real root file system is available. The initrd is bound to the kernel and loaded as part of the kernel boot procedure. The kernel then mounts this initrd as part of the two-stage boot process to load the modules to make the real file systems available and get at the real root file system.

The initrd contains a minimal set of directories and executables to achieve this, such as the insmod tool to install kernel modules into the kernel.

In the case of desktop or server Linux systems, the initrd is a transient file system. Its lifetime is short, only serving as a bridge to the real root file system. In embedded systems with no mutable storage, the initrd is the permanent root file system. We’re going to traverse both of these contexts.

Anatomy of the initrd

The initrd image contains the necessary executables and system files to support the second-stage boot of a Linux system. Depending on which version of Linux you’re running, the method for creating the initial RAM disk can vary. Prior to Fedora Core 3, the initrd is constructed using the loop device. The loop device is a device driver that allows you to mount a file as a block device and then interpret the file system it represents. The loop device may not be present in your kernel, but you can enable it through the kernel’s configuration tool (make menuconfig) by selecting Device Drivers > Block Devices > Loopback Device Support. You can inspect the loop device as follows (your initrd file name will vary):

Inspecting the initrd (prior to FC3)

# mkdir temp ; cd temp
# cp /boot/initrd.img.gz .
# gunzip initrd.img.gz
# mount -t ext -o loop initrd.img /mnt/initrd
# ls -la /mnt/initrd
#

You can now inspect the /mnt/initrd subdirectory for the contents of the initrd. Note that even if your initrd image file does not end with the .gz suffix, it’s a compressed file, and you can add the .gz suffix to gunzip it.
Beginning with Fedora Core 3, the default initrd image is a compressed cpio archive file. Instead of mounting the file as a compressed image using the loop device, you can use a cpio archive. To inspect the contents of a cpio archive, use the following commands:

Inspecting the initrd (FC3 and later)

# mkdir temp ; cd temp
# cp /boot/initrd-2.6.14.2.img initrd-2.6.14.2.img.gz
# gunzip initrd-2.6.14.2.img.gz
# cpio -i –make-directories < initrd-2.6.14.2.img
#

The result is a small root file system, as shown below. The small, but necessary, set of applications are present in the ./bin directory, including nash (not a shell, a script interpreter), insmod for loading kernel modules, and lvm (logical volume manager tools).

Default Linux initrd directory structure

# ls -la
#
drwxr-xr-x 10 root root 4096 May 7 02:48 .
drwxr-x— 15 root root 4096 May 7 00:54 ..
drwxr-xr-x 2 root root 4096 May 7 02:48 bin
drwxr-xr-x 2 root root 4096 May 7 02:48 dev
drwxr-xr-x 4 root root 4096 May 7 02:48 etc
-rwxr-xr-x 1 root root 812 May 7 02:48 init
-rw-r–r– 1 root root 1723392 May 7 02:45 initrd-2.6.14.2.img
drwxr-xr-x 2 root root 4096 May 7 02:48 lib
drwxr-xr-x 2 root root 4096 May 7 02:48 loopfs
drwxr-xr-x 2 root root 4096 May 7 02:48 proc
lrwxrwxrwx 1 root root 3 May 7 02:48 sbin -> bin
drwxr-xr-x 2 root root 4096 May 7 02:48 sys
drwxr-xr-x 2 root root 4096 May 7 02:48 sysroot
#

Of interest in the above command output is the init file at the root. This file, like the traditional Linux boot process, is invoked when the initrd image is decompressed into the RAM disk. We’ll explore about init in detail later in this post.

Tools for creating an initrd

The initial RAM disk was originally created to support bridging the kernel to the ultimate root file system through a transient root file system. The initrd is also useful as a non-persistent root file system mounted in a RAM disk for embedded Linux systems.

 

The mkinitrd utility is ideal for creating initrd images. In addition to creating an initrd image, it also identifies the modules to load for your particular system and populates them into the image.

Find out your kernel version:

# uname -r

2.6.15.4

Make a backup of existing ram disk:

# cp /boot/initrd.$(uname -r).img /root

To create initial ramdisk image type the following command as the root user:

# mkinitrd -o /boot/initrd.$(uname -r).img $(uname -r)
# ls -l /boot/initrd.$(uname -r).img

# mkinitrd -f -v /boot/initrd-$(uname -r).img $(uname -r)

The -v verbose flag causes mkinitrd to display the names of all the modules it is including in the initial ramdisk. The -f option will force an overwrite of any existing initial ramdisk image at the path you have specified. If you are in a kernel version different to the initrd you are building (including if you are in Rescue Mode) you must specify the full kernel version, without architecture

# mkinitrd -f -v /boot/initrd-2.6.18-164.el5.img 2.6.18-164.el5

You may need to modify grub.conf to point out to correct ramdisk image, make sure following line existing in grub.conf file:

# initrd /boot/initrd.img-2.6.15.4.img

 

initrd during Linux booting process:

initrd provides the capability to load a RAM disk by the bootloader. This RAM disk can then be mounted as the root filesystem and programs can be run from it. Afterward, a new root file system can be mounted on a different device. The previous root (from initrd) is then moved to a directory and can be subsequently unmounted. initrd is mainly designed to allow system startup to occur in two phases, where the kernel comes up with a minimum set of compiled-in drivers, and where additional modules are loaded from initrd.

  1. The boot loader loads the kernel and the initial RAM disk
  2. The kernel converts initrd into a “normal” RAM disk and frees the memory used by initrd
  3. initrd is mounted read-write as root
  4. /linuxrc is executed (this can be any valid executable, including shell scripts; it is run with uid 0 and can do basically everything init can do)
  5. linuxrc mounts the “real” root file system
  6. linuxrc places the root file system at the root directory using the pivot_root system call
  7. The usual boot sequence (e.g. invocation of /sbin/init) is performed on the root file system
  8. The initrd file system is removed

What is init?

Init is a user-space process that always has PID=1 and PPID=0. It’s the first user-space program spawned by the kernel once everything is ready (i.e. essential device drivers are initialized and the root filesystem is mounted). As the first process launched, it doesn’t have a meaningful parent.

 

When the kernel is loaded, it immediately initializes and configures the computer’s memory and configures the various hardware attached to the system, including all processors, I/O subsystems, and storage devices. It then looks for the compressed initrd image(s) in a predetermined location in memory, decompresses it directly to /sysroot/, and loads all necessary drivers. Next, it initializes virtual devices related to the file system, such as LVM or software RAID, before completing the initrd processes and freeing up all the memory the disk image once occupied.

The kernel then creates a root device, mounts the root partition read-only, and frees any unused memory.

At this point, the kernel is loaded into memory and operational. However, since there are no user applications that allow meaningful input to the system, not much can be done with the system.

To set up the user environment, the kernel executes the /sbin/init program.

The /sbin/init program (also called init) coordinates the rest of the boot process and configures the environment for the user. When the init command starts, it becomes the parent or grandparent of all of the processes that start up automatically on the system. First, it runs the /etc/rc.d/rc.sysinit script, which sets the environment path, starts swap, checks the file systems, and executes all other steps required for system initialization. For example, most systems use a clock, so rc.sysinit reads the /etc/sysconfig/clock configuration file to initialize the hardware clock. Another example is if there are special serial port processes which must be initialized, rc.sysinit executes the /etc/rc.serial file.

The init command then processes the jobs in the /etc/event.d directory, which describes how the system should be set up in each SysV init runlevel. Runlevels are a state, or mode, defined by the services listed in the SysV /etc/rc.d/rc<x>.d/ directory, where <x> is the number of the runlevel.

Next, the init command sets the source function library, /etc/rc.d/init.d/functions, for the system, which configures how to start, kill, and determine the PID of a program.

The init program starts all of the background processes by looking in the appropriate rc directory for the runlevel specified as the default in /etc/inittab. The rc directories are numbered to correspond to the runlevel they represent. For instance, /etc/rc.d/rc5.d/ is the directory for runlevel 5.
When booting to runlevel 5, the init program looks in the /etc/rc.d/rc5.d/ directory to determine which processes to start and stop.
Below is an example listing of the /etc/rc.d/rc5.d/ directory:

K74ypserv -> ../init.d/ypserv
K74ypxfrd -> ../init.d/ypxfrd
K85mdmpd -> ../init.d/mdmpd
K89netplugd -> ../init.d/netplugd
K99microcode_ctl -> ../init.d/microcode_ctl
S04readahead_early -> ../init.d/readahead_early
S05kudzu -> ../init.d/kudzu
S06cpuspeed -> ../init.d/cpuspeed
S08ip6tables -> ../init.d/ip6tables
S08iptables -> ../init.d/iptables
S09isdn -> ../init.d/isdn

As illustrated in this listing, none of the scripts that actually start and stop the services are located in the /etc/rc.d/rc5.d/ directory. Rather, all of the files in /etc/rc.d/rc5.d/ are symbolic links pointing to scripts located in the /etc/rc.d/init.d/ directory. Symbolic links are used in each of the rc directories so that the runlevels can be reconfigured by creating, modifying, and deleting the symbolic links without affecting the actual scripts they reference.

The name of each symbolic link begins with either a K or an S. The K links are processes that are killed on that runlevel, while those beginning with an S are started.

The init command first stops all of the K symbolic links in the directory by issuing the /etc/rc.d/init.d/<command> stop command, where <command> is the process to be killed. It then starts all of the S symbolic links by issuing /etc/rc.d/init.d/<command> start.

Running Additional Programs at Boot Time:

The /etc/rc.d/rc.local script is executed by the init command at boot time or when changing run levels. Adding commands to the bottom of this script is an easy way to perform necessary tasks like starting special services or initialize devices without writing complex initialization scripts in the /etc/rc.d/init.d/ directory and creating symbolic links.
The /etc/rc.serial script is used if serial ports must be setup at boot time. This script runs setserial commands to configure the system’s serial ports. Refer to the setserial man page for more information.

Shutting Down

To shut down Red Hat Enterprise Linux, the root user may issue the /sbin/shutdown command. The shutdown man page has a complete list of options, but the two most common uses are:

/sbin/shutdown -h now

and

/sbin/shutdown -r now

After shutting everything down, the -h option halts the machine and the -r option reboots.
PAM console users can use the reboot and halt commands to shut down the system while in run levels 1 through 5. For more information about PAM console users, refer to the Red Hat Enterprise Linux Deployment Guide.
If the computer does not power itself down, be careful not to turn off the computer until a message appears indicating that the system is halted.
Failure to wait for this message can mean that not all the hard drive partitions are unmounted, which can lead to file system corruption.

Further to the discussion on init, we have init stages like init s, init 0, init 1, init 2, init 3, init 4, init 5, init 6 which corresponds to run levels. init s and init 1 are single user mode but with a significant difference.

“init 1” is an administrative mode that runs the /etc/inittab script.
“init s” is a repair mode that doesn’t mount the filesystem. So, /etc/inittab doesn’t need to be there

Booting to run level 1 causes RHEL to process the /etc/rc.d/rc.sysinit script followed a single script from rc1.d which after some basic checks and cleanup execs init S.

Booting to runlevel s (S or single) causes RHEL to just process the /etc/rc.d/rc.sysinit script. If /etc/inittab is missing or corrupt, you go directly to a root shell with no scripts processed.

 

 

 

Our Server/OS is 32 Bit or 64 Bit?

To determine whether the running kernel is 32 bit or 64 bit :

3 ways to check,

To determine whether the CPU is 32 or 64 bit:

For checking the arch of the CPU we can grep the flags from the /proc/cpuinfo file


Among the flags identify the words “tm(transparent mode)” or “rm(real mode)” or “lm(long mode)”
rm ==> 16-bit processor
tm ==> 32-bit processor
lm ==> 64-bit processor

Linux memory usage demystified.. !!!!

Have you ever worried about Memory usage in the Linux server?

Ok, let’s take a look into the Linux memory.

Memory Architecture

To execute a process, the Linux kernel allocates a portion of the memory area to the requesting process. The process uses the memory area as workspace and performs the
required work. Assume a desktop or laptop assigned to you in your office cubicle. Take a look at your desktop (read as your Windows/Linux Dekstop) most of us will use this place in our system to scatter text files, documents, pdf’s, ppt’s and other things to perform your work, look how muddled up it looks. We are very poor in managing our desktop area, we are speaking about a maximum area of 1366×768 or more. Think of our Linux kernel managing GB’s of memory how cluttered it will be, But the Linux kernel is perfectly efficient in managing its memory area. The difference is that the kernel has to allocate space in a more dynamic manner. The number of running processes sometimes comes to tens of thousands and amount of memory is usually limited. Therefore, Linux kernel must handle the memory efficiently or you will end up with an unresponsive system.

32 Bit Vs 64 bit

On 32-bit architectures such as the i386, the Linux kernel can directly address only the first gigabyte of physical memory (896 MB when considering the reserved range). Memory above the so-called ZONE_NORMAL must be mapped into the lower 1 GB. This mapping is completely transparent to applications, but allocating a memory page in ZONE_HIGHMEM causes a small performance degradation.

On the other hand, with 64-bit architectures such as x86_64, ZONE_NORMAL extends all the way to 64 GB or to 128 GB in the case of IA-64 systems. As you can see, the overhead of mapping memory pages from ZONE_HIGHMEM into ZONE_NORMAL can be eliminated by using a 64-bit architecture

On 32-bit architectures, the maximum address space that single process can access is 4GB.This is a restriction derived from 32-bit virtual addressing. In a standard implementation, the virtual address space is divided into a 3 GB user space and a 1 GB kernel space. There are some variants like 4 G/4 G addressing layout implementing.

On the other hand, on 64-bit architecture such as x86_64 and IA64, no such restriction exists.Each single process can benefit from the vast and huge address space.

Memory Usage

On a Linux system, many programs run at the same time. These programs support multiple users, and some processes are more used than others. Some of these programs use a portion of memory while the rest are “sleeping.” When an application accesses cache, the performance increases because an in-memory access retrieves data, thereby eliminating the need to access slower disks. The OS uses an algorithm to control which programs will use physical memory and which are paged out. This is transparent to user programs.

Paging and Swapping

In Linux, genrally *NIXes, there are differences between paging and swapping. Paging moves individual pages to swap space on the disk and swapping is a bigger operation that moves the entire address space of a process to swap space in one operation.

If your server is always paging to disk (a high page-out rate), buddy..! its time to increase your physical RAM. However, for systems with a low page-out rate, it might not affect performance. If a process behaves poorly. Paging can be a serious performance problem when the amount of free memory pages falls below the minimum amount specified, because the paging mechanism is not able to handle the requests for physical memory pages and the swap mechanism is called to free more pages. This significantly increases I/O to disk and will quickly degrade a server’s performance.

Memory Available

This indicates how much physical memory is available for use. If, after you start your application, this value has decreased significantly, you might have a memory leak. Check the application that is causing it and make the necessary adjustments.

System Cache

This is the common memory space used by the file system cache.

Page Faults

There are two types of page faults: soft page faults, when the page is found in memory, and hard page faults, when the page is not found in memory and must be fetched from disk. Accessing the disk will slow your application considerably.

Private Memory

This represents the memory used by each process running on the server.

Oh.. ok.. at least some of you seems its too boring by now, So let’s think it real time.

To explain the memory usage I’ll try to illustrate with examples from my virtual test server running  PostgreSQL DB Instance on Centos 6.3.

The above output is a typical output for the free command used for memory monitoring. Just looking at this numbers, we can conclude that the system is having approx 2GB of ram and nearly 95% of it is used. If we look at the Swap: line in the output, we see that the swap space appears to be unused.

In between the Mem: and Swap: lines, we see a line labeled -/+ buffers/cache. This is probably the trickiest part in interpreting free command output. This line shows how much of the physical memory is used by the buffer/cache. In other words, this shows how much memory is being used (a better word would be borrowed) for disk caching. Disk caching makes the system run much faster.

So, while at first glance, this system appears to be running short of memory, it’s actually just making good use of memory that’s currently not needed for anything else. The key number to look at in the output above is, therefore, 1703. This is the amount of memory that would be made available to your applications if they need it.

Now we will see the Swap and Paging activity

NOTE: If your system is busy and you want to watch how memory is changing, you can run free with a -s (seconds) argument that causes the command to give you totals every X seconds. You need to start worrying about memory only and only if you see the swap usage is high.

Why Linux?

Why do I use Linux? Why do I spend much of my time suggesting others use it? Is it just because it’s available for free? These are interesting questions that are not discussed very often.

I post my personal motive for using Linux below. Some are practical, while others are more judicious.

If you’re one of those quivers on the brink of switching to Linux, reading this list might be a good place to start, and you may find some inspiration to make the leap.

  • Control over my system

There’s no “right way” or “wrong way” of doing things (although there are sensible and efficient ways of doing things, of course). I have the freedom to do what I want with Linux. In the Linux community, you’ll never hear somebody say, “Hey! You’re not supposed to do that!” or, “Serves you right for doing it the wrong way!” Instead, what you’re more likely to hear is, “Hey! I didn’t know you could do that! That’s cool!” Innovative solutions are encouraged. There is more than one way to do that a.k.a TIMTOWTDT

This freedom extends to my choice of software applications I use on my system. If I don’t like a particular piece of software, I can use an alternative. This is true even of desktop or system components.

  • Community

All operating systems tend to generate communities around them, but the community around Linux is proactive, rather than passive. What does it mean? Well, a typical posting in a Windows community forum might be something like this: “Hey! Feature X doesn’t work as it should! This sucks! I sure hope Microsoft fixes it soon!” The same posting in a Linux forum is more likely to be like this: “Hey! Feature X doesn’t work as it should! Here’s a solution…” Not only that but, in the reply, there will be several other solutions from other people or the original solution will be improved by others. Not only that but a developer might read the posting and offer a tweak for the original program, or even start his/her alternative project.

People in the Linux community share what they know. This is the whole damn point. Linux is based on the fundamental concept that knowledge wants to be free. Personally, I think this is awe-inspiring, not least because Linux brings out the very best in people.

What this also means that, if you solve a problem and take just a few moments to share it, you’re making Linux stronger. You are both a user and a contributor.

  • Virus-Free

This is a very practical reason for liking Linux, but no less valid: When using Linux, I don’t have to worry about viruses. Viruses are an ever-present threat to Windows–a kind of terrorism for computers.

I won’t brag that there are no viruses for Linux, but I’m fairly certain that there are fewer viruses in active circulation. Any that arise tend to die out quickly, simply because it’s much harder to infect a Linux system due to the way it’s built.

In addition, there’s a secret I read in some of the hacker blogs and sites that are rarely discussed: The kind of people who create viruses respect and like Linux. They don’t want to damage it, either in practical terms or by damaging its reputation. There’s also the fact that Linux is still a minority operating system, and virus writers tend to target the big fishes in the pond.

The lack of viruses means I don’t have to have an annoying antivirus program installed on my computer. There are no irritating pop-ups from the virus checker telling me it’s doing a good job, and no daily/weekly scans rendering my computer almost unusable. (IMHO nearly all antivirus programs on Windows are almost as bad as the viruses they claim to protect the user from.)

  • It’s Free Of Charge

This is another very practical reason, but it really can’t be underestimated. Linux doesn’t cost anything, and everybody in the world, therefore, has access to it. Who can argue with that?

And finally, if I’m to list out my personal liking for Linux I’ll put it in this way,

  • It’s free.
  • I can run on pretty much any hardware.
  • It is highly scalable… I can install it on a 486 or a dual core.
  • Help is readily available and free of charge.
  • Well documented.
  • An inbuilt standard help system that is actually useful (man pages).
  • Powerful CLI.
  • Linux can be configured to run without a GUI for max performance. This is especially useful for servers. Other operating systems don’t have this luxury.
  • Linux will actually give you a reason why something had an error.
  • You can choose which type of desktop environment you want.
  • With Linux, we can compile our own kernel so we don’t have to a wear a one size fits all hat.
  • When you “end a task” in Linux it actually works. (SIGTERM, SIGHUP,SIGKILL).
  • The command line auto complete feature works the way you expect it to.
  • Linux tends not to hide details.
  • You can choose a filesystem that better fits your needs. 
  • When there is a security exploit I can expect a patch immediately on the next day.