Init & Initrd
In this post, we discuss two major components of a Linux system: the initrd and the init process.
Initrd a.k.a initial RAM disk:
The initial RAM disk (initrd) is an initial root file system that is mounted prior to when the real root file system is available. The initrd is bound to the kernel and loaded as part of the kernel boot procedure. The kernel then mounts this initrd as part of the two-stage boot process to load the modules to make the real file systems available and get at the real root file system.
The initrd contains a minimal set of directories and executables to achieve this, such as the insmod tool to install kernel modules into the kernel.
In the case of desktop or server Linux systems, the initrd is a transient file system. Its lifetime is short, only serving as a bridge to the real root file system. In embedded systems with no mutable storage, the initrd is the permanent root file system. This post touches on both of these contexts.
Anatomy of the initrd
The initrd image contains the necessary executables and system files to support the second-stage boot of a Linux system. Depending on which version of Linux you’re running, the method for creating the initial RAM disk can vary. Prior to Fedora Core 3, the initrd was constructed using the loop device. The loop device is a device driver that allows you to mount a file as a block device and then interpret the file system it represents. The loop device may not be present in your kernel, but you can enable it through the kernel’s configuration tool (make menuconfig) by selecting Device Drivers > Block Devices > Loopback Device Support. You can inspect the initrd through the loop device as follows (your initrd file name will vary):
Inspecting the initrd (prior to FC3)
# mkdir temp ; cd temp
# cp /boot/initrd.img.gz .
# gunzip initrd.img.gz
# mkdir -p /mnt/initrd
# mount -t ext2 -o loop initrd.img /mnt/initrd
# ls -la /mnt/initrd
#
You can now inspect the /mnt/initrd subdirectory for the contents of the initrd. Note that even if your initrd image file does not end with the .gz suffix, it’s a compressed file, and you can add the .gz suffix to gunzip it.
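Because the suffix is not a reliable indicator, one way to confirm that an image really is gzip-compressed is to look at its first two bytes, which are 0x1f 0x8b for gzip data. The following is a minimal sketch of that check; the function name is_gzip is made up for illustration and is not a standard tool.

```shell
# is_gzip FILE: succeed if FILE starts with the gzip magic bytes 0x1f 0x8b.
# (is_gzip is a hypothetical helper name, not a standard utility.)
is_gzip() {
    # Dump the first two bytes as hex and strip the whitespace od adds.
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \t\n')
    [ "$magic" = "1f8b" ]
}

# Example (your initrd file name will vary):
#   is_gzip /boot/initrd.img && echo "gzip-compressed"
```

If the check fails, the image is not gzip data and gunzip will refuse it, whatever the file is called.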
Beginning with Fedora Core 3, the default initrd image is a compressed cpio archive file. Instead of mounting the decompressed image through the loop device, you extract the archive with cpio. To inspect the contents of a cpio archive, use the following commands:
Inspecting the initrd (FC3 and later)
# mkdir temp ; cd temp
# cp /boot/initrd-2.6.14.2.img initrd-2.6.14.2.img.gz
# gunzip initrd-2.6.14.2.img.gz
# cpio -i --make-directories < initrd-2.6.14.2.img
#
The result is a small root file system, as shown below. The small, but necessary, set of applications is present in the ./bin directory, including nash (not a shell, a script interpreter), insmod for loading kernel modules, and lvm (logical volume manager tools).
Default Linux initrd directory structure
# ls -la
#
drwxr-xr-x 10 root root 4096 May 7 02:48 .
drwxr-x--- 15 root root 4096 May 7 00:54 ..
drwxr-xr-x 2 root root 4096 May 7 02:48 bin
drwxr-xr-x 2 root root 4096 May 7 02:48 dev
drwxr-xr-x 4 root root 4096 May 7 02:48 etc
-rwxr-xr-x 1 root root 812 May 7 02:48 init
-rw-r--r-- 1 root root 1723392 May 7 02:45 initrd-2.6.14.2.img
drwxr-xr-x 2 root root 4096 May 7 02:48 lib
drwxr-xr-x 2 root root 4096 May 7 02:48 loopfs
drwxr-xr-x 2 root root 4096 May 7 02:48 proc
lrwxrwxrwx 1 root root 3 May 7 02:48 sbin -> bin
drwxr-xr-x 2 root root 4096 May 7 02:48 sys
drwxr-xr-x 2 root root 4096 May 7 02:48 sysroot
#
Of interest in the above command output is the init file at the root. As with init in the traditional Linux boot process, this file is invoked once the initrd image has been decompressed into the RAM disk. We’ll explore init in detail later in this post.
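If you only want to see which files a cpio-based initrd contains, you can list the archive instead of extracting it. The sketch below assumes a gzip-compressed cpio image and that gunzip and cpio are installed; list_initrd is a hypothetical helper name.

```shell
# list_initrd IMAGE: list the files in a gzip-compressed cpio initrd
# without extracting them. (list_initrd is a hypothetical helper name.)
list_initrd() {
    # gunzip -c writes the decompressed archive to standard output;
    # cpio -i -t reads an archive from standard input and prints its
    # table of contents instead of unpacking it.
    gunzip -c "$1" | cpio -i -t
}

# Example (your initrd file name will vary):
#   list_initrd /boot/initrd-2.6.14.2.img
```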
Tools for creating an initrd
The initial RAM disk was originally created to support bridging the kernel to the ultimate root file system through a transient root file system. The initrd is also useful as a non-persistent root file system mounted in a RAM disk for embedded Linux systems.
The mkinitrd utility is ideal for creating initrd images. In addition to creating an initrd image, it also identifies the modules to load for your particular system and populates them into the image.
Find out your kernel version:
# uname -r
2.6.15.4
Make a backup of the existing RAM disk image:
# cp /boot/initrd.$(uname -r).img /root
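To create the initial RAM disk image, run mkinitrd as the root user. The first form below, with -o, is the syntax of the Debian-style mkinitrd; the last form, with the image path followed by the kernel version, is the Red Hat-style syntax. Use the one that matches your distribution: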
# mkinitrd -o /boot/initrd.$(uname -r).img $(uname -r)
# ls -l /boot/initrd.$(uname -r).img
# mkinitrd -f -v /boot/initrd-$(uname -r).img $(uname -r)
The -v (verbose) flag causes mkinitrd to display the names of all the modules it is including in the initial ramdisk. The -f option forces an overwrite of any existing initial ramdisk image at the path you have specified. If the running kernel version differs from that of the initrd you are building (for example, in Rescue Mode), you must specify the full kernel version, without the architecture:
# mkinitrd -f -v /boot/initrd-2.6.18-164.el5.img 2.6.18-164.el5
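Since the kernel version appears twice in the -f -v invocation above, it is easy to mistype one of them. A minimal sketch of assembling the command from a single version string follows, assuming the /boot/initrd-VERSION.img naming used above. The helper mkinitrd_cmd is hypothetical; it only prints the command so you can review it before running it as root.

```shell
# mkinitrd_cmd [VERSION]: print the mkinitrd command for a kernel version.
# Defaults to the running kernel. Hypothetical helper: it prints the
# command and does not execute it.
mkinitrd_cmd() {
    version=${1:-$(uname -r)}
    echo "mkinitrd -f -v /boot/initrd-${version}.img ${version}"
}

mkinitrd_cmd 2.6.18-164.el5
```

Called without an argument, it uses the output of uname -r, which is only appropriate when you are rebuilding the image for the kernel that is currently running.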
You may need to modify grub.conf to point to the correct RAM disk image. Make sure a line like the following exists in grub.conf, with the file name matching the image you created:
initrd /boot/initrd-2.6.15.4.img
initrd during Linux booting process:
initrd provides the capability to load a RAM disk by the bootloader. This RAM disk can then be mounted as the root filesystem and programs can be run from it. Afterward, a new root file system can be mounted on a different device. The previous root (from initrd) is then moved to a directory and can be subsequently unmounted. initrd is mainly designed to allow system startup to occur in two phases, where the kernel comes up with a minimum set of compiled-in drivers, and where additional modules are loaded from initrd.
- The boot loader loads the kernel and the initial RAM disk
- The kernel converts initrd into a “normal” RAM disk and frees the memory used by initrd
- initrd is mounted read-write as root
- /linuxrc is executed (this can be any valid executable, including shell scripts; it is run with uid 0 and can do basically everything init can do)
- linuxrc mounts the “real” root file system
- linuxrc places the root file system at the root directory using the pivot_root system call
- The usual boot sequence (e.g. invocation of /sbin/init) is performed on the root file system
- The initrd file system is removed
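The hand-over from the initrd to the real root file system in the steps above can be sketched as a short command sequence. The function below only prints that sequence; nothing is mounted or executed. The device name /dev/sda1, the /new-root mount point, and the helper name linuxrc_plan are assumptions for illustration, and a real /linuxrc would usually load driver modules first.

```shell
# linuxrc_plan ROOT_DEV: print the commands a /linuxrc script might run to
# hand over to the real root file system. Nothing is executed.
# ROOT_DEV and the /new-root mount point are illustrative assumptions.
linuxrc_plan() {
    root_dev=$1
    # pivot_root needs an existing directory (here "initrd") on the new
    # root to hold the old root; it can be unmounted from there later.
    cat <<EOF
mount -o ro ${root_dev} /new-root
cd /new-root
pivot_root . initrd
exec chroot . /sbin/init
EOF
}

linuxrc_plan /dev/sda1
```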
What is init?
Init is a user-space process that always has PID=1 and PPID=0. It’s the first user-space program spawned by the kernel once everything is ready (i.e. essential device drivers are initialized and the root filesystem is mounted). As the first process launched, it doesn’t have a meaningful parent.
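On a running Linux system you can check this through /proc, where the stat file of each process lists its parent PID after the command name and state. The following is a minimal sketch of extracting that field; ppid_from_stat is a hypothetical helper name.

```shell
# ppid_from_stat LINE: extract the parent PID from a /proc/<pid>/stat line.
# The line looks like "pid (comm) state ppid ...". The command name may
# itself contain spaces or parentheses, so strip everything up to the
# last ") " before splitting on whitespace.
ppid_from_stat() {
    rest=${1##*") "}    # leaves "state ppid pgrp ..."
    set -- $rest        # split the remainder into positional parameters
    echo "$2"
}

# A shortened sample line for a process with PID 1:
ppid_from_stat "1 (init) S 0 1 1 0 -1"

# On a live system:
#   ppid_from_stat "$(cat /proc/1/stat)"
```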
When the kernel is loaded, it immediately initializes and configures the computer’s memory and configures the various hardware attached to the system, including all processors, I/O subsystems, and storage devices. It then looks for the compressed initrd image(s) in a predetermined location in memory, decompresses it directly to /sysroot/, and loads all necessary drivers. Next, it initializes virtual devices related to the file system, such as LVM or software RAID, before completing the initrd processes and freeing up all the memory the disk image once occupied.
The kernel then creates a root device, mounts the root partition read-only, and frees any unused memory.
At this point, the kernel is loaded into memory and operational. However, since there are no user applications that allow meaningful input to the system, not much can be done with the system.
To set up the user environment, the kernel executes the /sbin/init program.
The /sbin/init program (also called init) coordinates the rest of the boot process and configures the environment for the user. When the init command starts, it becomes the parent or grandparent of all of the processes that start up automatically on the system. First, it runs the /etc/rc.d/rc.sysinit script, which sets the environment path, starts swap, checks the file systems, and executes all other steps required for system initialization. For example, most systems use a clock, so rc.sysinit reads the /etc/sysconfig/clock configuration file to initialize the hardware clock. As another example, if there are special serial port processes that must be initialized, rc.sysinit executes the /etc/rc.serial file.
The init command then processes the jobs in the /etc/event.d directory, which describe how the system should be set up in each SysV init runlevel. A runlevel is a state, or mode, defined by the services listed in the SysV /etc/rc.d/rc<x>.d/ directory, where <x> is the number of the runlevel.
Next, the init command sets the source function library, /etc/rc.d/init.d/functions, for the system, which configures how to start, kill, and determine the PID of a program.
The init program starts all of the background processes by looking in the appropriate rc directory for the runlevel specified as the default in /etc/inittab. The rc directories are numbered to correspond to the runlevel they represent. For instance, /etc/rc.d/rc5.d/ is the directory for runlevel 5.
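The default runlevel is recorded in /etc/inittab as the entry whose action field is initdefault, for example id:5:initdefault:. A minimal sketch of reading it follows; default_runlevel is a hypothetical helper name, and the file argument exists only so the function can be tried on a copy.

```shell
# default_runlevel [FILE]: print the default runlevel from an inittab-style
# file. Entries have the form id:runlevels:action:process, and the default
# is the entry whose action is "initdefault", e.g. "id:5:initdefault:".
default_runlevel() {
    # Skip comment lines, then print the runlevels field of the first
    # initdefault entry.
    awk -F: '!/^#/ && $3 == "initdefault" { print $2; exit }' "${1:-/etc/inittab}"
}

# Example:
#   default_runlevel              # reads /etc/inittab
#   default_runlevel ./inittab    # reads a local copy
```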
When booting to runlevel 5, the init program looks in the /etc/rc.d/rc5.d/ directory to determine which processes to start and stop.
Below is an example listing of the /etc/rc.d/rc5.d/ directory:
K74ypserv -> ../init.d/ypserv
K74ypxfrd -> ../init.d/ypxfrd
K85mdmpd -> ../init.d/mdmpd
K89netplugd -> ../init.d/netplugd
K99microcode_ctl -> ../init.d/microcode_ctl
S04readahead_early -> ../init.d/readahead_early
S05kudzu -> ../init.d/kudzu
S06cpuspeed -> ../init.d/cpuspeed
S08ip6tables -> ../init.d/ip6tables
S08iptables -> ../init.d/iptables
S09isdn -> ../init.d/isdn
As illustrated in this listing, none of the scripts that actually start and stop the services are located in the /etc/rc.d/rc5.d/ directory. Rather, all of the files in /etc/rc.d/rc5.d/ are symbolic links pointing to scripts located in the /etc/rc.d/init.d/ directory. Symbolic links are used in each of the rc directories so that the runlevels can be reconfigured by creating, modifying, and deleting the symbolic links without affecting the actual scripts they reference.
The name of each symbolic link begins with either a K or an S. The K links are processes that are killed on that runlevel, while those beginning with an S are started.
The init command first stops all of the K symbolic links in the directory by issuing the /etc/rc.d/init.d/<command> stop command, where <command> is the process to be killed. It then starts all of the S symbolic links by issuing /etc/rc.d/init.d/<command> start.
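The ordering described above can be sketched as a small function that walks a runlevel directory and prints the commands in the order they would be issued: all K links with stop, then all S links with start, each group in the sorted order of its two-digit prefix. This is an illustration rather than the actual rc script; rc_plan is a hypothetical helper and it runs nothing.

```shell
# rc_plan DIR: print the stop/start commands for the links in a runlevel
# directory, K links first, then S links. Hypothetical helper; it only
# prints the commands and does not run them.
rc_plan() {
    dir=$1
    for action in stop start; do
        case $action in
            stop)  prefix=K ;;
            start) prefix=S ;;
        esac
        # The shell expands the pattern in sorted order, so K74 comes
        # before K85 and S04 before S05.
        for link in "$dir"/"$prefix"*; do
            # Skip the unexpanded pattern when nothing matches.
            [ -e "$link" ] || [ -L "$link" ] || continue
            base=${link##*/}
            # Drop the K/S letter and the two-digit priority:
            # K74ypserv becomes ypserv.
            name=${base#[KS][0-9][0-9]}
            echo "/etc/rc.d/init.d/$name $action"
        done
    done
}

# Example:
#   rc_plan /etc/rc.d/rc5.d
```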
Running Additional Programs at Boot Time:
The /etc/rc.d/rc.local script is executed by the init command at boot time or when changing runlevels. Adding commands to the bottom of this script is an easy way to perform necessary tasks, such as starting special services or initializing devices, without writing complex initialization scripts in the /etc/rc.d/init.d/ directory and creating symbolic links.
The /etc/rc.serial script is used if serial ports must be set up at boot time. This script runs setserial commands to configure the system’s serial ports. Refer to the setserial man page for more information.
Shutting Down
To shut down Red Hat Enterprise Linux, the root user may issue the /sbin/shutdown command. The shutdown man page has a complete list of options, but the two most common uses are:
/sbin/shutdown -h now
and
/sbin/shutdown -r now
After shutting everything down, the -h option halts the machine and the -r option reboots.
PAM console users can use the reboot and halt commands to shut down the system while in run levels 1 through 5. For more information about PAM console users, refer to the Red Hat Enterprise Linux Deployment Guide.
If the computer does not power itself down, be careful not to turn off the computer until a message appears indicating that the system is halted.
Failure to wait for this message can mean that not all the hard drive partitions are unmounted, which can lead to file system corruption.
To round off the discussion of init: the commands init s, init 0, init 1, init 2, init 3, init 4, init 5, and init 6 switch the system to the corresponding runlevels. Both init s and init 1 lead to single-user mode, but with a significant difference.
“init 1” is an administrative mode that runs the /etc/inittab script.
“init s” is a repair mode that doesn’t mount the filesystem, so /etc/inittab doesn’t need to be there.
Booting to runlevel 1 causes RHEL to process the /etc/rc.d/rc.sysinit script, followed by a single script from rc1.d which, after some basic checks and cleanup, execs init S.
Booting to runlevel s (S or single) causes RHEL to just process the /etc/rc.d/rc.sysinit script. If /etc/inittab is missing or corrupt, you go directly to a root shell with no scripts processed.