Becoming an SRE

This is a continuation of an earlier post about SRE. In that post we saw what an SRE is and the key skills required to become one. Building on that, in this post we'll look at what it takes to become an SRE.

Cloud

  • AWS (recommended)
  • Azure
  • Google Cloud

Operating Systems

  • Linux (recommended)
  • Windows

Programming

  • Python (recommended)
  • Golang (recommended)
  • NodeJS

IaC – Infrastructure as Code

  • Terraform (recommended)
  • Container Orchestration (recommended)
  • Configuration Management

CI & CD Tools

  • Jenkins (recommended)
  • Git & GitHub (recommended)
  • GitLab
  • Circle CI
  • GoCD (Go Continuous Delivery)
  • Bamboo

Continuous Monitoring

  • Prometheus (recommended)
  • AppDynamics (recommended)
  • Nagios
  • Zabbix
  • New Relic

Networking/Connectivity

  • Protocols
  • Subnet/CIDR
  • Network Components (TGW, VPC, SG etc)
  • APIs (REST, SOAP, XML-RPC)
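
A quick way to sanity-check subnet sizes is plain shell arithmetic: usable hosts in a subnet is 2^(32 - prefix) minus the network and broadcast addresses. A minimal sketch (the /24 is just an example):

```shell
# Usable hosts in a subnet: 2^(32 - prefix) minus network and broadcast.
prefix=24
hosts=$(( (1 << (32 - prefix)) - 2 ))
echo "/${prefix} -> ${hosts} usable hosts"
```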

Fun with Linux CLI

I have been using Linux for the past 20 years and I'm still in love with this operating system. My daily driver for official use is a Windows laptop with WSL on it, but my personal laptops, a 15-year-old Lenovo and a newer HP, run MX Linux and Arch respectively. I believe most computer geeks are enthusiastic about Linux distros and open-source software. Everyone has their own reasons for loving Linux; mine are as follows:

  • Linux Is Free – distributions are available for download free of charge.
  • Linux Is Open – Linux kernel or the heart of the OS & other operating system components, and many user programs are free and open-source, meaning that anyone can look at the source code and make changes. As Richard Stallman says, this software is “free as in speech.”
  • Linux Command Line – command line offers the most control over the computer. Many Linux programs only use the command line, including developer tools. This may repel normal users, but technical users appreciate it.
  • Community Support – support options range from IRC, web forums, wikis and Discord servers to in-person user groups. For almost any issue, someone has already posted a solution somewhere on the web; the community spirit Linux inspires keeps those forums alive.
  • Programming Tools in Abundance – Linux ships with many of the tools developers need to do their jobs: editors, compilers, interpreters, debuggers, you name it. If something isn't in the default system, it's only a package-manager command away.
  • Rapid Prototyping – thanks to its affinity for scripting languages.
  • Linux Is Customizable – to the core, including desktop environments, window managers and apps; you can even run Linux without a GUI.
  • Linux Runs Everywhere – from x86 to ARM, on your network devices and your mobile phone.
  • Interoperability – one of Linux's strengths is its ability to work with other systems' file formats.

Today we're going to discuss the fun side of the Linux command line. If you are bored, you should definitely try these fun commands.

Neofetch: A system utility written in bash to get customizable system info.

Installing neofetch:

sudo apt install neofetch - Debian/Ubuntu & its derivatives
sudo dnf install neofetch - Fedora/RHEL and its derivatives
sudo pacman -S neofetch - Arch/Manjaro and its derivatives

FIGlet: This command-line utility creates beautiful ASCII-art banners. On some remote servers you may have already seen such banners in the login message. It doesn't have a character limit, so you can create ASCII art of any length with this CLI tool.

Installing figlet

sudo apt install figlet - Debian/Ubuntu & its derivatives
sudo dnf install figlet - Fedora/RHEL and its derivatives
sudo pacman -S figlet - Arch/Manjaro and its derivatives

Cowsay: An ASCII-art command-line tool that displays your input as speech from an ASCII cow.

Installing cowsay

sudo apt install cowsay - Debian/Ubuntu & its derivatives
sudo dnf install cowsay - Fedora/RHEL and its derivatives
sudo pacman -S cowsay - Arch/Manjaro and its derivatives
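
These toys also compose nicely through pipes. A small sketch (the fun_banner function name is mine; the command -v guard keeps it safe on systems where the tools aren't installed yet):

```shell
# Pipe a figlet banner into cowsay, falling back gracefully if either
# tool is missing.
fun_banner() {
    if command -v figlet >/dev/null 2>&1 && command -v cowsay >/dev/null 2>&1; then
        figlet "moo" | cowsay -n      # -n: don't re-wrap, keep the banner intact
    else
        echo "install figlet and cowsay first"
    fi
}
fun_banner
```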

sl: This Linux command-line utility brings the good old steam locomotive to your terminal. Funny, right? Do try it out.

Installing sl

sudo apt install sl - Debian/Ubuntu & its derivatives
sudo dnf install sl - Fedora/RHEL and its derivatives
sudo pacman -S sl - Arch/Manjaro and its derivatives

xeyes: This is a kind of stress buster which brings a pair of eyes to your desktop. The eyeballs move depending on your mouse pointer's position.

Installing xeyes

sudo apt install x11-apps - Debian/Ubuntu & its derivatives
sudo dnf install xeyes - Fedora/RHEL and its derivatives
sudo pacman -S xorg-xeyes - Arch/Manjaro and its derivatives

aafire: This utility will light up your terminal. The aafire command starts an ASCII fire inside your terminal.

Installing aafire

sudo apt install libaa-bin - Debian/Ubuntu & its derivatives
sudo dnf install aalib - Fedora/RHEL and its derivatives
sudo pacman -S aalib - Arch/Manjaro and its derivatives

rig: This command-line tool helps you rig up some user info. It will quickly generate a fake identity that is readable by apps and users alike.

Installing rig

sudo apt install rig - Debian/Ubuntu & its derivatives
sudo dnf install rig - Fedora/RHEL and its derivatives
sudo pacman -S rig - Arch/Manjaro and its derivatives

Finally, if you want to watch movies or play music/MP3 files on the Linux command line, give these a try; not many CLI tools will let you do this.

mpg123: command-line player for MP3 files/playlists
cmus: ncurses-based utility for playing music files/playlists
mpv: command-line media player for videos

sudo apt install cmus/mpg123 - Debian/Ubuntu & its derivatives
sudo dnf install cmus/mpg123 - Fedora/RHEL and its derivatives
sudo pacman -S cmus/mpg123 - Arch/Manjaro and its derivatives

sudo apt install mpv - Debian/Ubuntu & its derivatives
sudo dnf install mpv - Fedora/RHEL and its derivatives
sudo pacman -S mpv - Arch/Manjaro and its derivatives

RHEL 8 – What's New?

Red Hat Enterprise Linux 8 was released in beta on November 14, 2018. There are many features and improvements that distinguish it from its predecessor, RHEL 7. In this blog, I'm attempting to provide a quick glance at those improvements, the deprecations and the upgrade path.

Improvements:

  • The yum command is now just a symbolic link to dnf, which replaces it as the package manager. If you've worked on Fedora, DNF has been the default package manager there for a while.
  • chronyd is the default network time protocol daemon instead of ntpd.
  • Repo channel names changed, but their content is mostly the same. A CodeReady Linux Builder repository was added; it is similar to EPEL and supplies additional packages which are not supported for production use.
  • One of the biggest improvements in RHEL 8 system performance is the new upper limit on physical memory capacity: 4 PB, compared to 64 TB of system memory in RHEL 7.
  • The RPM tooling is also upgraded. rpmbuild can now run all build steps directly from a source package, the new --reinstall option lets you reinstall a previously installed package, and the new rpm2archive utility converts an RPM payload into a tar archive.
  • The TCP networking stack is improved. Red Hat claims that kernel 4.18 provides higher performance, better scalability and more stability.
  • RHEL 8 supports OpenSSL 1.1.1 and the TLS 1.3 cryptographic standard by default.
  • BIND version is upgraded to 9.11 by default and introduces new features and feature changes compared to version 9.10.
  • Apache HTTP Server, has been updated from version 2.4.6 to version 2.4.37 between RHEL 7 and RHEL 8. This updated version includes several new features, but maintains backwards compatibility with the RHEL 7 version
  • RHEL 8 introduces nginx 1.14, a web and proxy server supporting HTTP and other protocols
  • OpenSSH was upgraded to version 7.8p1.
  • Vim runs default.vim script, if no ~/.vimrc file is available.
  • The 'nobody' and 'nfsnobody' users and groups are merged into a single 'nobody' ID (65534).
  • In RHEL 8, for some daemons like cups, the logs are no longer stored in dedicated files under /var/log as in RHEL 7. Instead, they are stored only in systemd-journald.
  • You are now forced to switch to chronyd; the old ntpd implementation is not supported in RHEL 8.
  • NFS over UDP (NFSv3) is no longer supported. The NFS configuration file moved to /etc/nfs.conf; when upgrading from RHEL 7 the file is migrated automatically.
  • For desktop users, Wayland is the default display server as a replacement for the X.org server. Yet X.Org is still available. Legacy X11 applications that cannot be ported to Wayland automatically use Xwayland as a proxy between the X11 legacy clients and the Wayland compositor.
  • iptables was replaced by nftables as the default network filtering framework. This update adds the iptables-translate and ip6tables-translate tools to convert existing iptables or ip6tables rules into their nftables equivalents.
  • GCC toolchain is based on the GCC 8.2
  • The Python version installed by default is 3.6, which introduces incompatibilities with scripts written for Python 2.x; however, Python 2.7 is available in the python2 package.
  • Perl 5.26 is distributed with RHEL 8. The current directory . has been removed from the @INC module search path for security reasons. PHP 7.2 is also added.
  • For working with containers, Red hat expects you to use the podman, buildah, skopeo, and runc tools. The podman tool manages pods, container images, and containers on a single node. It is built on the libpod library, which enables management of containers and groups of containers, called pods.
  • The basic installation provides a new version of the ifup and ifdown scripts which call NetworkManager through the nmcli tool. The NetworkManager-config-server package is only installed by default if you select either the Server or Server with GUI base environment during the setup. If you selected a different environment, use the yum install NetworkManager-config-server command to install the package.
  • Node.js, a software development platform in the JavaScript programming language, is provided for the first time in RHEL. It was previously available only as a Software Collection. RHEL 8 provides Node.js 10.
  • DNF modules improve package management.
  • A new tool called Image Builder enables users to create customized RHEL images. Image Builder is available in AppStream in the lorax-composer package. Among other things, it allows creating live ISO disk images and images for Azure, VMware and AWS; see Composing a customized RHEL system image.
  • Some new storage management capabilities were introduced. Stratis is a new local storage manager that provides managed file systems on top of pools of storage, with additional features for the user. It also supports file system snapshots and LUKSv2 disk encryption with Network-Bound Disk Encryption (NBDE).
  • VMs are managed via Cockpit by default; virt-manager can still be installed if required. The Cockpit web console is available by default and provides basic server stats, much like Nagios, plus access to logs. Packages for the RHEL 8 web console, also known as Cockpit, are now part of the default Red Hat Enterprise Linux repositories and can therefore be installed immediately on a registered RHEL 8 system. (You should be using this extensively if you're running KVM-based RHEL 8 virtual machines.)
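
Since chronyd replaces ntpd, time sync now lives in /etc/chrony.conf rather than /etc/ntp.conf. A minimal sketch; the pool name below is the usual RHEL default, so adjust it for your environment:

```
# /etc/chrony.conf (minimal sketch)
pool 2.rhel.pool.ntp.org iburst   # NTP server pool; iburst speeds up initial sync
driftfile /var/lib/chrony/drift   # remember clock drift across restarts
makestep 1.0 3                    # step the clock if offset > 1s during the first 3 updates
```

Enable it with 'systemctl enable --now chronyd' and check the result with 'chronyc tracking'.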

Deprecations:

  • The yum package is deprecated and the yum command is just a symbolic link to dnf.
  • The old ntpd NTP implementation is not supported in RHEL 8.
  • Network scripts are deprecated; ifup and ifdown map to nmcli.
  • The Digital Signature Algorithm (DSA) is considered deprecated. Authentication mechanisms that depend on DSA keys do not work in the default configuration.
  • rdist is removed, as well as rsh and all the r-utilities.
  • The X.Org display server was replaced by Wayland for GNOME.
  • tcp_wrappers were removed. It is not clear what happens to programs previously compiled with tcp_wrappers support, such as Postfix.
  • iptables is deprecated.
  • Limited support for Python 2.
  • KDE support has been deprecated.
  • Upgrading from KDE on RHEL 7 to GNOME on RHEL 8 is unsupported.
  • Btrfs support is removed.
  • Docker is not included in RHEL 8.0.

Upgrade:

The release of RHEL 8 gives those who are still on RHEL 6 an opportunity to skip RHEL 7 completely for new server installations. RHEL 7 has five years before EOL (June 30, 2024), while many servers now last more than five years. Theoretically, an upgrade from RHEL 6 to RHEL 8 is possible via an upgrade to RHEL 7 first, but it is too risky. RHEL 8 is distributed through two main repositories; please follow the RHEL 8 upgrade path.

Base OS

Content in the BaseOS repository is intended to provide the core set of underlying OS functionality that forms the foundation for all installations. This content is available in the RPM format and is subject to support terms similar to those in previous releases of RHEL. For a list of packages distributed through BaseOS, see the RHEL 8 package manifest.

AppStream

Content in the Application Stream repository includes additional userspace applications, runtime languages and databases in support of varied workloads and use cases. Application Streams are available in the familiar RPM format, as an extension to the RPM format called modules, or as Software Collections. For a list of packages available in AppStream, see the RHEL 8 package manifest.

In addition, the CodeReady Linux Builder repository is available with all RHEL subscriptions. It provides additional packages for use by developers. Packages included in the CodeReady Linux Builder repository are unsupported. Please check RHEL 8 Package manifest.

With the idea of the Application Stream, RHEL 8 is following Fedora's modularity lead. Fedora 28, released earlier this year and considered the bleeding-edge community edition of RHEL, introduced the concept of modularity. Userspace components can now be updated more quickly than core operating system packages, without waiting for the next version of the operating system. Application streams also make it possible to install multiple versions of the same package (such as an interpreted language or a database).

Theoretically, RHEL 8 will be able to withstand heavier loads due to the optimized TCP/IP stack and improvements in memory handling.

Installation has not changed much from RHEL 7. RHEL 8 still pushes LVM for the root filesystem in the default installation. Without a subscription you can still install packages from the ISO, either directly or by making it a repo. The default filesystem remains XFS. RHEL 8 also supports installing from a repository on a local hard drive; you only need to specify the directory instead of the ISO image.

For example:

inst.repo=hd::.

Kickstart has also changed, but not much (auth and authconfig are deprecated; you need to use authselect instead).

Source: Red Hat RHEL8 release notes, Red Hat Blogs, Linux Journal etc

Linux Inside Win 10

I am a zealous fan of Linux and FOSS. I have been using Linux and its TUI with the bash shell for more than seventeen years. When I moved to my new role I found it a bit difficult to have a Windows 10 laptop, and I was literally fumbling with PowerShell and the command line when I tried working with tools like terraform, git etc. But luckily I figured out a solution for old-school *NIX users like me who are forced to use a Windows laptop, and that solution is WSL.

Windows Subsystem for Linux, a.k.a. WSL, is an environment in Windows 10 for running unmodified Linux binaries, in a way similar to Linux containers. Please go through my earlier post on Linux containers for more details. When it was first introduced, WSL ran Linux binaries by implementing a Linux API compatibility layer partly in the Windows kernel. The second iteration, WSL 2, uses the Linux kernel itself in a lightweight VM to provide better compatibility with native Linux installations.

To use WSL in Win 10, you have to enable the WSL feature in the Windows optional features. Being an aficionado of the command line rather than the GUI, I'll now list the step-by-step commands to enable WSL, install your favourite Linux distribution and start using it.

  1. Open Powershell as administrator and execute the command below Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux
    Note: Restart is required when prompted
  2. Invoke-WebRequest -Uri https://aka.ms/wsl-ubuntu-1804 -OutFile Ubuntu.appx -UseBasicParsing

In the above command you can substitute any of the Linux distros available for WSL. For example, if you want to use Kali Linux instead of Ubuntu, change the distro URL in the step 2 command to "https://aka.ms/wsl-kali-linux". Please refer to this guide for all available distros.

Once the download is completed, you need to add that package to the Windows 10 application list which can be done using the command below.

  1. Add-AppxPackage .\Ubuntu.appx

Once all these steps are completed, type Ubuntu in the Win 10 search (the magnifying glass in the bottom-left corner, next to the Windows logo) and you will see your Ubuntu entry (search for whichever WSL distro you added). To complete the initialization of your newly installed distro, launch a new instance, Ubuntu in our case, by selecting and running it from the search results.

This will start the installation of your chosen Linux distro's binaries and libraries along with the kernel. It will take some time to complete the installation and configuration (approximately 5 to 10 minutes depending on your laptop/desktop configuration).

Once installation is complete, you will be prompted to create a new user account (and its password).

Note: You can choose any username and password you wish – they have no bearing on your Windows username.

If everything goes well, we’ll have Ubuntu installed as a sub system. When you open a new distro instance, you won’t be prompted for your password,
but if you elevate your privileges using sudo, you will need to enter your password.

The next step is to update your package catalog and upgrade your installed packages with the Ubuntu package manager, apt. To do so, execute the command below at the prompt.

$ sudo apt update && sudo apt upgrade

Now we will proceed to install Git.

$ sudo apt install git

$ git --version

And finally, test your Git installation with the command above. You can also install other tools and packages available in the repository. I have installed git, terraform, aws-cli, azure-cli and ansible. You can install Python, Ruby and Go programming environments as well; Python pip and Ruby gem installations are also supported. You can use this subsystem as an alternative for your day-to-day Linux operations, and as an alternative terminal to PowerShell if you are using Sublime Text, Atom or Visual Studio Code.

Demystifying Arch

There is always a lot of buzz around the Arch Linux distribution in the Linux community. Those who are switching to Linux as a desktop operating system are very reluctant to go the Arch way, and so are many who are experienced with Linux. IMHO it's only because of the mystery prevailing around this distro. In this blog we'll try to address some of those concerns and help everyone embrace the Arch world without any worries.

Arch principles are: Simplicity, Modernity, Pragmatism, User Centrality and Versatility.

Arch Linux is an independently developed, x86-64 general-purpose GNU/Linux distribution that strives to provide the latest stable versions of most software by following a rolling-release model. The default installation is a minimal base system, configured by the user to add only what is purposely required. In other words, Arch is a DIY Linux installation where the user has full control over each and every component that gets installed and configured.

The original creator of Arch Linux, Judd Vinet, was in love with the simplicity and elegance of Slackware, the BSDs, etc. He wanted a distribution that could be DIY: you build it yourself from the ground up, and it has only what you want it to have.

The solution was a shell-based manual installation instead of the automatic ncurses or graphical installers that teach you almost nothing about what exactly was installed on the system. The zsh-based installer of Arch Linux means you do everything using commands, so you know exactly what is happening, since you are the one doing it. You can customize everything, right from the pre-installation steps. You don't "install the OS" in Arch; you download the base system and install it into the root partition/directory. Then you install the other packages you need, either by chrooting into the system or by booting into the new system, which will be just a terminal, since only the base system has been installed so far. To install and configure each package, you must know which packages you require and how they are installed and configured. That is how Arch Linux helps you build your own custom Linux (DIY) and learn by doing, breaking and repeating.

Arch Linux doesn't care about being as easy to set up as Ubuntu/Debian/Red Hat/Fedora. But easy-to-install variants do exist if anyone wants a touch and feel of Arch Linux; Antergos and Manjaro are essentially Arch with a graphical installer.

In somewhat simplified steps, an Arch installation looks like this:

  • Download ISO Image
  • Burn the image to a DVD/USB
  • Boot Arch from the media
  • Create disk partitions
  • Set up the network with/without DHCP, wired or wireless
  • Optimize gcc for the specific CPU
  • Configure/compile the Linux kernel & modules
  • Install the base packages
  • Environment configuration
  • Install necessary software/tools/applications
  • Set up the X server and GUI
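
The list above boils down to a handful of live-ISO commands. A dry-run sketch (pacstrap, genfstab and arch-chroot are the standard Arch install tools; the run wrapper is mine and only prints instead of executing):

```shell
# Dry-run sketch of the core Arch install commands. Drop the 'run'
# wrapper (and run from the live ISO, as root) to execute for real.
run() { echo "+ $*"; }
run pacstrap /mnt base linux linux-firmware     # install the base system
run genfstab -U /mnt '>>' /mnt/etc/fstab        # generate fstab by UUID
run arch-chroot /mnt                            # chroot in to configure it
```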

In short, the installers of distributions like Debian and CentOS do all of the above in a more user-friendly way, or even automatically, whereas in Arch Linux you do it manually, step by step. But it is not at all as hard as it sounds. We'll walk through the installation in the next post.

Magic SysRq key of Linux

The magic SysRq key is a key combination in the Linux kernel which allows the user to perform various low-level commands regardless of the system’s state.

It is often used to recover from freezes or to reboot a computer without corrupting the filesystem. The key combination consists of Alt+SysRq+commandkey. In many systems, the SysRq key is the PrintScreen key.

First, you need to enable the SysRq key, as shown below.

# echo "1" > /proc/sys/kernel/sysrq

List of SysRq Command Keys

Following are the command keys available for Alt+SysRq+commandkey.

'k' – Kill all processes running on the current virtual console.
's' – Attempt to sync all mounted file systems.
'b' – Immediately reboot the system, without unmounting partitions or syncing.
'e' – Send SIGTERM to all processes except init.
'm' – Output current memory information to the console.
'i' – Send SIGKILL to all processes except init.
'r' – Switch the keyboard from raw mode (the mode used by programs such as X11) to XLATE mode.
't' – Output a list of current tasks and their information to the console.
'u' – Remount all mounted filesystems read-only.
'o' – Shut down the system immediately.
'p' – Print the current registers and flags to the console.
'0'-'9' – Set the console log level, controlling which kernel messages will be printed to your console.
'f' – Call oom_kill to kill the process using the most memory.
'h' – Display help; in fact, any key other than those listed above will also print help.

We can also do this by echoing the keys to the /proc/sysrq-trigger file. For example, to reboot a system you can perform the following.

# echo "b" > /proc/sysrq-trigger

Perform a Safe reboot of Linux using Magic SysRq Key

To perform a safe reboot of a Linux computer that has hung, do the following. This will avoid an fsck during the next boot, i.e. press Alt+SysRq+<the highlighted letter> for each step below.

unRaw (take control of the keyboard back from X11),
tErminate (send SIGTERM to all processes, allowing them to terminate gracefully),
kIll (send SIGKILL to all processes, forcing them to terminate immediately),
Sync (flush data to disk),
Unmount (remount all filesystems read-only),
reBoot.
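
The same sequence can be driven from a script via /proc/sysrq-trigger; shown here as a dry run, since the real writes need root:

```shell
# Dry-run of the REISUB sequence. As root, swap the echo for the
# commented write and pause a few seconds between keys.
for key in r e i s u b; do
    echo "would send SysRq '$key'"
    # printf '%s' "$key" > /proc/sysrq-trigger   # the real thing: needs root
    # sleep 3                                    # give each step time to finish
done
```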

Linux Process & Process Management

Introduction

A Linux server, like any other computer you may be familiar with, runs applications. To the computer, these are considered “processes”.

While Linux handles the low-level, behind-the-scenes management of a process's lifecycle, you need a way of interacting with the operating system to manage processes at a higher level.

In this post, we will discuss some simple aspects of process management. Linux provides an abundant collection of tools for this purpose.

How To View Running Processes in Linux

The easiest way to find out what processes are running on your server is to run the top command:

# top

top – 07:45:38 up 9:38, 1 user, load average: 0.00, 0.01, 0.05
Tasks: 89 total, 2 running, 87 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 1014976 total, 517484 free, 102656 used, 394836 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 728192 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 128092 6680 3932 S 0.0 0.7 0:02.84 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.03 ksoftirqd/0
6 root 20 0 0 0 0 S 0.0 0.0 0:00.32 kworker/u30:0
7 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
————- output truncated ——————-

The top chunk of information gives system statistics, such as system load and the total number of tasks.

You can easily see that there are 2 running processes and 87 sleeping processes (i.e. idle, not using CPU resources).

The bottom portion has the running processes and their usage statistics.

top gives you an ncurses-based interface to view running processes, but it is not always flexible enough to adequately cover all scenarios. A powerful command called ps is often the answer to these problems.

List processes with ps command

When called without arguments, ps's output can be a bit lackluster:

# ps

PID TTY TIME CMD
3125 pts/0 00:00:00 sudo
3126 pts/0 00:00:00 su
3127 pts/0 00:00:00 bash
3150 pts/0 00:00:00 ps

This output shows all of the processes associated with the current user and terminal session. This makes sense because we are only running bash, sudo and ps with this terminal currently.

We can run ps command with different options to get a complete picture of the processes on this system.

BSD style – The options in bsd style syntax are not preceded by a dash.

# ps aux

UNIX/LINUX style – The options in Linux style syntax are preceded by a dash as usual.

# ps -ef

It is okay to mix both the syntax styles on Linux systems. For example “ps au -x”. In this post, we’re using both style syntaxes.

# ps aux

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2 0.0 0.0 0 0 ? S Jan11 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S Jan11 0:00 [ksoftirqd/0]
root 6 0.0 0.0 0 0 ? S Jan11 0:00 [kworker/u30:0]
root 7 0.0 0.0 0 0 ? S Jan11 0:00 [migration/0]
root 8 0.0 0.0 0 0 ? S Jan11 0:00 [rcu_bh]
root 9 0.0 0.0 0 0 ? R Jan11 0:00 [rcu_sched]
root 10 0.0 0.0 0 0 ? S Jan11 0:00

————- output truncated ——————-

These options tell ps to show processes owned by all users (regardless of their terminal association) in a user-friendly format.

To see a tree view, where hierarchal relationships are illustrated, we can run the command with these options:

# ps axjf

PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
0 2 0 0 ? -1 S 0 0:00 [kthreadd]
2 3 0 0 ? -1 S 0 0:00 \_ [ksoftirqd/0]
2 6 0 0 ? -1 S 0 0:00 \_ [kworker/u30:0]
2 7 0 0 ? -1 S 0 0:00 \_ [migration/0]
2 8 0 0 ? -1 S 0 0:00 \_ [rcu_bh]
2 9 0 0 ? -1 R 0 0:00 \_ [rcu_sched]
1 2024 2024 2024 ? -1 Ss 0 0:00 /usr/sbin/sshd
2024 3100 3100 3100 ? -1 Ss 0 0:00 \_ sshd: ajoy[priv]
3100 3103 3100 3100 ? -1 S 1000 0:00 \_ sshd: ajoy@pts/0
3103 3104 3104 3104 pts/0 3153 Ss 1000 0:00 \_ -bash
3104 3125 3125 3104 pts/0 3153 S 0 0:00 \_ sudo su –
3125 3126 3125 3104 pts/0 3153 S 0 0:00 \_ su –
3126 3127 3127 3104 pts/0 3153 S 0 0:00 \_ -bash
3127 3153 3153 3104 pts/0 3153 R+ 0 0:00 \_ ps axjf
————- output truncated ——————-

As you can see, the process sshd is shown as a parent of processes like bash, su, sudo, and ps axjf itself.
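
Under the hood, ps is mostly just reading /proc. A rough sketch of what a bare process listing does, using nothing but the filesystem:

```shell
# Roughly what 'ps -e' does: every numeric directory under /proc is a
# PID, and /proc/<pid>/comm holds that process's command name.
for dir in /proc/[0-9]*; do
    pid=${dir#/proc/}
    comm=$(cat "$dir/comm" 2>/dev/null) || continue   # process may have exited
    printf '%s %s\n' "$pid" "$comm"
done | sort -n | head
```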

List the Process based on the UID and Commands (ps -u, ps -C)

Use the -u option to display the processes that belong to a specific username. When you have multiple usernames, separate them using commas. The example below displays all the processes owned by the users wwwrun or postfix.

# ps -f -u wwwrun,postfix

UID PID PPID C STIME TTY TIME CMD
postfix 7457 7435 0 Mar09 ? 00:00:00 qmgr -l -t fifo -u
wwwrun 7495 7491 0 Mar09 ? 00:00:00 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
wwwrun 7496 7491 0 Mar09 ? 00:00:00 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
wwwrun 7497 7491 0 Mar09 ? 00:00:00 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
wwwrun 7498 7491 0 Mar09 ? 00:00:00 /usr/sbin/httpd2-prefork -f /etc/apac

The following example shows all the processes that have tatad.pl in their command line.

# ps -f -C tatad.pl

UID PID PPID C STIME TTY TIME CMD
root 9576 1 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root 9577 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root 9579 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root 9580 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root 9581 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bi

The following method is used to get a list of processes with a particular PPID.

# ps -f --ppid 9576

UID PID PPID C STIME TTY TIME CMD
root 9577 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root 9579 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root 9580 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root 9581 9576 0 Mar09 ? 00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin

List Processes in a Hierarchy (ps --forest)

The example below displays process IDs and commands in a hierarchy. --forest is an argument to the ps command which displays ASCII art of the process tree. From this tree, we can identify the parent process and the child processes it forked, recursively.

# ps -e -o pid,args --forest
468 \_ sshd: root@pts/7
514 | \_ -bash
17484 \_ sshd: root@pts/11
17513 | \_ -bash
24004 | \_ vi ./790310__11117/journal
15513 \_ sshd: root@pts/1
15522 | \_ -bash
4280 \_ sshd: root@pts/5
4302 | \_ -bash

List elapsed wall time for processes (ps -o pid,etime=)

If you want to get the elapsed time for currently running processes, ps provides etime, which reports the elapsed time since the process was started, in the form [[dd-]hh:]mm:ss.

The below command displays the elapsed time for the process IDs 1 (init) and process id 29675.

For example, "10-22:13:29" in the output means the init process has been running for 10 days, 22 hours, 13 minutes and 29 seconds. Since the init process starts during system startup, this time will be the same as the output of the 'uptime' command.

# ps -p 1,29675 -o pid,etime=
PID
1 10-22:13:29
29675 1-02:58:46
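
If you need the etime value as plain seconds (for scripting or alerting), a small awk conversion works; the etime_to_seconds helper below is my own illustration, not part of ps:

```shell
# Convert ps's [[dd-]hh:]mm:ss etime format into plain seconds.
etime_to_seconds() {
    echo "$1" | awk -F'[-:]' '{
        if (NF == 4)      print $1*86400 + $2*3600 + $3*60 + $4  # dd-hh:mm:ss
        else if (NF == 3) print $1*3600  + $2*60  + $3           # hh:mm:ss
        else              print $1*60    + $2                    # mm:ss
    }'
}
etime_to_seconds 10-22:13:29   # the init example above, as seconds
```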

List all threads for a particular process (ps -L)

You can get a list of threads for a process. When a process hangs, we might need to identify the list of threads running for it, as shown below.

# ps -C java -L -o pid,tid,pcpu,state,nlwp,args

PID TID %CPU S NLWP COMMAND
16992 16992 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16993 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16994 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16995 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16996 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16997 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16998 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16999 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 17000 0.0 S 15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_l

Finding a Memory Leak (ps --sort pmem)

A memory leak, technically, is an ever-increasing usage of memory by an application.

With common desktop applications this may go unnoticed, because all of a process’s memory is reclaimed when you close the application.

However, in the client/server model, memory leakage is a serious issue, because applications are expected to be available 24×7. An application must not keep increasing its memory usage indefinitely, as this can cause serious problems. To monitor such memory leaks, we can use the following commands.

# ps aux --sort pmem

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 1520 508 ? S 2005 1:27 init
inst 1309 0.0 0.4 344308 33048 ? S 2005 1:55 agnt (idle)
inst 2919 0.0 0.4 345580 37368 ? S 2005 20:02 agnt (idle)
inst 24594 0.0 0.4 345068 36960 ? S 2005 15:45 agnt (idle)

In the above ps command, the --sort option puts the highest %MEM at the bottom. Note down the PID with the highest %MEM usage, then use ps to view all the details about that process ID and monitor how it changes over time. You would have to repeat this manually, or run it from cron and redirect the output to a file.
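
One way to automate that repetition is a small sampling loop. A minimal sketch, assuming /tmp is writable; the PID, the log path and the loop bounds are placeholders (here we watch the current shell itself, and in practice you would sleep between samples or run it from cron):

```shell
# Sample a process's RSS and elapsed time, appending to a log so that
# growth over time (a possible leak) becomes visible.
PID=$$                       # placeholder: the shell itself
LOG=/tmp/rss-watch.log       # placeholder log path
: > "$LOG"
for i in 1 2 3; do           # in practice: while true; do ...; sleep 300; done
    ps -o pid=,rss=,etime= -p "$PID" >> "$LOG"
done
cat "$LOG"
```

If the RSS column in the log keeps growing while the workload is steady, the process is worth a closer look.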

The VSZ number is of little use if what you are interested in is memory consumption. VSZ measures how much of the process’s virtual address space has been reserved (memory that the operating system would have to map in if the process ever touched it), but it says nothing about whether that memory has actually been touched and used. VSZ is largely an internal detail of how a process allocates memory: how big a chunk of unused address space it grabs at once. Look at RSS for the count of memory pages the process has actually started using.

RSS:

Resident set size = the non-swapped physical memory that a task has used; Resident Set currently in physical memory including Code, Data, Stack

VSZ:

Virtual memory usage of entire process = VmLib + VmExe + VmData + VmStk

In other words,
a) VSZ *includes* RSS
b) “ps -aux” alone isn’t enough to tell you if a process is thrashing (although, if your system *is* thrashing, “ps -aux” will help you identify the processes experiencing the biggest hits).

# ps ev --pid=27645

PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
27645 ? S 3:01 0 25 1231262 1183976 14.4 /TaskServer/bin/./wrapper-linux-x86-32

Note: In the above output, if RSS (resident set size, in KB) increases over time (so would %MEM), it may indicate a memory leak in the application.

The following command displays all the processes owned by the Linux user ‘oracle’.

# ps U oracle

PID TTY STAT TIME COMMAND
5014 ? Ss 0:01 /oracle/bin/tnslsnr
7124 ? Ss 0:00 ora_q002_med
8206 ? Ss 0:00 ora_cjq0_med
8852 ? Ss 0:01 ora_pmon_med

The following command displays all the processes owned by the current user.

# ps U $USER

PID TTY STAT TIME COMMAND
10329 ? S 0:00 sshd: ajoy@pts/1,pts/2
10330 pts/1 Ss 0:00 -bash
10354 pts/2 Ss+ 0:00 -bash

The ps command can be configured to show only a selected list of columns. There are a large number of columns to show, and the full list is available in the man page.

The following command shows only the pid, username, cpu, memory and command columns.

# ps -e -o pid,uname,pcpu,pmem,comm

PID USER %CPU %MEM COMMAND
1 root 0.0 0.6 systemd
2 root 0.0 0.0 kthreadd
3 root 0.0 0.0 ksoftirqd/0
6 root 0.0 0.0 kworker/u30:0
7 root 0.0 0.0 migration/0

The ps command is quite flexible and it is possible to rename the column labels as shown below:

# ps -e -o pid,uname=USERNAME,pcpu=CPU_USAGE,pmem,comm

PID USERNAME CPU_USAGE %MEM COMMAND
1 root 0.0 0.6 systemd
2 root 0.0 0.0 kthreadd
3 root 0.0 0.0 ksoftirqd/0
6 root 0.0 0.0 kworker/u30:0
7 root 0.0 0.0 migration/0

Combined with the watch command, we can turn ps into a real-time process reporter. A simple example:

# watch -n 1 'ps -e -o pid,uname,cmd,pmem,pcpu --sort=-pmem,-pcpu | head -15'

Every 1.0s: ps -e -o pid,uname,cmd,pmem,pcpu --… Sun Dec 1 18:16:08 2009

PID USER CMD %MEM %CPU
3800 1000 /opt/google/chrome/chrome – 4.6 1.4
7492 1000 /opt/google/chrome/chrome – 2.7 1.4
3150 1000 /opt/google/chrome/chrome 2.7 2.5
3824 1000 /opt/google/chrome/chrome – 2.6 0.6
3936 1000 /opt/google/chrome/chrome – 2.4 1.6
2936 1000 /usr/bin/plasma-desktop 2.3 0.2
9666 1000 /opt/google/chrome/chrome – 2.1 0.8

Process IDs:

In Linux and Unix-like systems, each process is assigned a process ID, or PID. This is how the operating system identifies and keeps track of processes.

# pgrep bash

3104
3127

The first process spawned at boot, called init, is given the PID of “1”.

# pgrep init

1

This process is then responsible for spawning every other process on the system; processes spawned later are given larger PID numbers.

A process’s parent is the process that was responsible for spawning it. If a process’s parent is killed, its children are orphaned and re-parented (typically to init). The parent process’s PID is referred to as the PPID.
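
The relationship can be seen directly with ps; a minimal sketch inspecting the current shell:

```shell
# Print the PID, the parent's PID (PPID) and the command name.
# The PPID column identifies the process that spawned this shell.
ps -o pid,ppid,comm -p $$
```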

Process States:

Here are the different values that the s, stat and state output specifiers (header “STAT” or “S”) will display to describe the state of a process:

Running

The process is either running (it is the current process in the system) or it is ready to run (it is waiting to be assigned to one of the system’s CPUs).

Waiting

The process is waiting for an event or for a resource. Linux differentiates between two types of waiting process: interruptible and uninterruptible. Interruptible waiting processes can be interrupted by signals, whereas uninterruptible waiting processes are waiting directly on hardware conditions and cannot be interrupted under any circumstances.

Stopped

The process has been stopped, usually by receiving a signal. A process that is being debugged can be in a stopped state.

Zombie

This is a halted process which, for some reason, still has a task_struct data structure in the task vector. It is what it sounds like: a dead process.

D uninterruptible sleep (usually IO)
R running or runnable (on run queue)
S interruptible sleep (waiting for an event to complete)
T stopped, either by a job control signal or because it is being traced.
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z defunct (“zombie”) process, terminated but not reaped by its parent.

For BSD formats and when the stat keyword is used, additional characters may be displayed:
< high-priority (not nice to other users)
N low-priority (nice to other users)
L has pages locked into memory (for real-time and custom IO)
s is a session leader
l is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)
+ is in the foreground process group

Processes Signals:

All processes in Linux respond to signals. Signals are an OS-level way of telling programs to terminate or modify their behavior. The most common way of passing signals to a program is with the kill command. The default behavior of this utility is to attempt to terminate a process:

# kill <PID of process>

This sends the TERM signal to the process. The TERM signal asks the process to terminate, which allows the program to perform clean-up operations and exit gracefully.

If the program is misbehaving and does not exit when given the TERM signal, we can escalate the signal by passing the KILL signal:

# kill -KILL <PID of process>

This is a special signal that is not sent to the program.

Instead, it is given to the operating system kernel, which shuts down the process. This is used to bypass programs that ignore the signals sent to them.

Each signal has an associated number that can be passed instead of the name. For instance, you can pass “-15” instead of “-TERM”, and “-9” instead of “-KILL”.
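
The mapping can be checked with the shell's kill builtin; a quick sketch (output format may vary slightly between shells):

```shell
# 'kill -l' with a number prints the corresponding signal name,
# confirming that '-15' and '-TERM' are interchangeable.
kill -l 15
```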

Signals are not only used to shut down programs. They can also be used to perform other actions.

For instance, many daemons will restart when they are given the HUP or hang-up signal. Apache is one program that operates like this.

# kill -HUP <PID of httpd>

The above command will cause Apache to reload its configuration file and resume serving content.

You can list all of the signals that can be sent with kill by typing:

# kill -l

1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL 5) SIGTRAP
6) SIGABRT 7) SIGBUS 8) SIGFPE 9) SIGKILL 10) SIGUSR1
11) SIGSEGV 12) SIGUSR2 13) SIGPIPE 14) SIGALRM 15) SIGTERM
16) SIGSTKFLT 17) SIGCHLD 18) SIGCONT 19) SIGSTOP 20) SIGTSTP
21) SIGTTIN 22) SIGTTOU 23) SIGURG 24) SIGXCPU 25) SIGXFSZ
26) SIGVTALRM 27) SIGPROF 28) SIGWINCH 29) SIGIO 30) SIGPWR
31) SIGSYS 34) SIGRTMIN 35) SIGRTMIN+1 36) SIGRTMIN+2 37) SIGRTMIN+3
38) SIGRTMIN+4 39) SIGRTMIN+5 40) SIGRTMIN+6 41) SIGRTMIN+7 42) SIGRTMIN+8
43) SIGRTMIN+9 44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9 56) SIGRTMAX-8 57) SIGRTMAX-7
58) SIGRTMAX-6 59) SIGRTMAX-5 60) SIGRTMAX-4 61) SIGRTMAX-3 62) SIGRTMAX-2
63) SIGRTMAX-1 64) SIGRTMAX

The conventional way of sending signals is through the use of PIDs, but there are also methods of doing this with regular process names.

The pkill command works in almost exactly the same way as kill, but it operates on a process name instead:

# pkill -9 ping

is equivalent to

# kill -9 `pgrep ping`

If you would like to send a signal to every instance of a certain process, you can use the killall command:

# killall firefox

Process Priorities

Some processes might be considered mission-critical for your situation, while others may be executed whenever there are leftover resources. You will want to adjust which processes are given priority in a server environment. Linux controls priority through a value called niceness.

High priority tasks are considered less nice, because they don’t share resources as well. Low priority processes, on the other hand, are nice because they insist on only taking minimal resources.

When we ran top at the beginning of this blog, there was a column marked “NI”. This is the nice value of the process:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 128092 6680 3932 S 0.0 0.7 0:02.84 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0

Nice values can range between -20 (highest priority) and 19 (lowest priority) on Linux.

To run a program with a certain nice value, we can use the nice command:

# nice -n 15 <command>

This only works while executing a new program.

To alter the nice value of a program that is already executing, we use a tool called renice:

# renice 0 <PID of process>

Real, Effective & Saved UID explained

Each Linux/Unix process has 3 UIDs associated with it. Superuser privilege is UID=0.

Real UID

This is the UID of the user/process that created THIS process. It can be changed only if the running process has EUID=0.

Effective UID

This UID is used to evaluate privileges of the process to perform a particular action. EUID can be changed either to RUID, or SUID if EUID!=0. If EUID=0, it can be changed to anything.

Saved UID

If the binary image file that was launched has the Set-UID bit on, SUID will be the UID of the owner of the file. Otherwise, SUID will be the RUID.

  • What is the idea behind this?

Normal programs, like “ls”, “cat” and “echo”, are run by a normal user, under that user’s UID. Special programs that give the user controlled access to protected data can have the Set-UID bit set, allowing the program to run under a privileged UID.

An example of such program is “passwd”. If you list it in full, you will see that it has a Set-UID bit and the owner is “root”. When a normal user, say “ajoy”, runs “passwd”, passwd starts with:

Real-UID = ajoy
Effective-UID = ajoy
Saved-UID = root

The program calls a system call “seteuid( 0 )” and since SUID=0, the call will succeed and the UIDs will be:

Real-UID = ajoy
Effective-UID = root
Saved-UID = root

After that, the “passwd” process will be able to access /etc/passwd and change the password for user “ajoy”. Note that user “ajoy” cannot write to /etc/passwd on their own. Note one other thing: setting the Set-UID bit on an executable file is not enough to make it run as a privileged process; the program itself must make the system call.
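
The three UIDs can be inspected with ps on Linux (procps); a small sketch looking at the current shell, where all three are expected to match for an ordinary, non-Set-UID process:

```shell
# ruid = real UID, euid = effective UID, suid = saved UID.
ps -o pid,ruid,euid,suid,comm -p $$
```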

That is the idea.

List open files lsof command explained

The command lsof stands for list open files, which will list all the open files in the system. The open files include network connections, devices, and directories. The output of the lsof command will have the following columns:

COMMAND process name.
PID process ID
USER Username
FD file descriptor
TYPE node type of the file
DEVICE device number
SIZE/OFF file size or file offset
NODE node number
NAME full path of the file name.

Simply typing lsof will provide a list of all open files belonging to all active processes.

# lsof

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
init 1 root cwd DIR 8,1 4096 2 /
init 1 root txt REG 8,1 124704 917562 /sbin/init
init 1 root 0u CHR 1,3 0t0 4369 /dev/null
init 1 root 1u CHR 1,3 0t0 4369 /dev/null
init 1 root 2u CHR 1,3 0t0 4369 /dev/null
init 1 root 3r FIFO 0,8 0t0 6323 pipe
—————————————-truncated——————

By default, one file per line is displayed. Most of the columns are self-explanatory. We will explain the details about a couple of cryptic columns (FD and TYPE).

FD – Represents the file descriptor. Some of the values of FDs are,

cwd – Current Working Directory
txt – Text file
mem – Memory mapped file
mmap – Memory mapped device
NUMBER – Represents the actual file descriptor. The character after the number, e.g. ‘1u’, represents the mode in which the file is opened: r for read, w for write, u for read and write.

TYPE – Specifies the type of the file. Some of the values of TYPEs are,

REG – Regular File
DIR – Directory
FIFO – First In First Out
CHR – Character special file
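
The FD column can also be used as a selector: lsof's '-d' option restricts output to particular descriptor numbers. A minimal sketch, combined with '-a' and '-p' (assuming lsof is installed):

```shell
# Show only file descriptors 0-2 (stdin/stdout/stderr) of the current
# shell; '-a' ANDs the '-p' and '-d' selections together.
lsof -a -p $$ -d 0,1,2
```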

The lsof command by itself, without any options, may return a lot of records as output, which may not be very meaningful except to give you a rough idea of how many files are open in the system at any given point in time, as shown below.

# lsof | wc -l

3093

Use the lsof -u option to display all the files opened by a specific user.

# lsof -u ajoy

vi 7190 ajoy txt REG 8,1 47
List opened files under a directory

You can list the processes which opened files under a specified directory using the ‘+D’ option. +D will recurse into subdirectories as well. If you don’t want lsof to recurse, use the ‘+d’ option.

# lsof +D /var/log/

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
rsyslogd 488 syslog 1w REG 8,1 1151 268940 /var/log/syslog
rsyslogd 488 syslog 2w REG 8,1 2405 269616 /var/log/auth.log
console-k 144 root 9w REG 8,1 10871 269369 /var/log/ConsoleKit/history

List opened files based on process names starting with

You can list the files opened by processes whose names start with a given string, using the ‘-c’ option. -c followed by a string lists the files opened by processes whose names start with that string. You can give multiple -c switches on a single command line.

# lsof -c ssh -c init

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
init 1 root txt REG 8,1 124704 917562 /sbin/init
init 1 root mem REG 8,1 1434180 1442625 /lib/i386-linux-gnu/libc-2.13.so
init 1 root mem REG 8,1 30684 1442694 /lib/i386-linux-gnu/librt-2.13.so

ssh-agent 1528 user1 1u CHR 1,3 0t0 4369 /dev/null
ssh-agent 1528 user1 2u CHR 1,3 0t0 4369 /dev/null

List processes using a mount point

Sometimes when we try to umount a directory, the system reports a “Device or Resource Busy” error. We then need to find out which processes are using the mount point and kill them in order to umount the directory. lsof can find those processes for us.

# lsof /home

The following will also work.

# lsof +D /home/

You can also exclude a user by prefixing the username with ‘^’:

# lsof -u ^ajoy

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
rtkit-dae 1380 rtkit 7u 0000 0,9 0 4360 anon_inode
udisks-da 1584 root cwd DIR 8,1 4096 2 /

The above command listed the files opened by all users except user ‘ajoy’.

List all open files by a specific process

You can list all the files opened by a specific process using the ‘-p’ option. This is sometimes helpful for getting more information about a specific process.

# lsof -p 1753

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
bash 1753 user1 cwd DIR 8,1 4096 393571 /home/ajoy/test.txt
bash 1753 user1 rtd DIR 8,1 4096 2 /
bash 1753 user1 255u CHR 136,0 0t0 3 /dev/pts/0

Kill all process that belongs to a particular user

When you want to kill all the processes that have files opened by a specific user, you can use the ‘-t’ option to output only the process IDs, and pass that list to kill as follows:

# kill -9 `lsof -t -u ajoy`

The above command will kill all processes belonging to user ‘ajoy’ that have files open.

Similarly, you can use ‘-t’ in many other ways. For example, to list the process ID of the process which opened /var/log/syslog:

# lsof -t /var/log/syslog

489

Combine more list options using OR/AND

By default when you use more than one list option in lsof, they will be treated as OR. For example,

# lsof -u user1 -c init

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
init 1 root cwd DIR 8,1 4096 2 /
init 1 root txt REG 8,1 124704 917562 /sbin/init
bash 1995 user1 2u CHR 136,2 0t0 5 /dev/pts/2
bash 1995 user1 255u CHR 136,2 0t0 5 /dev/pts/2

The above command uses two list options, ‘-u’ and ‘-c’. So the command will list processes belonging to user ‘user1’ as well as processes whose names start with ‘init’.

But when you want to list processes that belong to user ‘user1’ AND whose names start with ‘init’, use the ‘-a’ option.

# lsof -u user1 -c init -a

List all network connections

You can list all the network connections opened by using ‘-i’ option.

# lsof -i

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
avahi-dae 515 avahi 13u IPv4 6848 0t0 UDP *:mdns
avahi-dae 515 avahi 16u IPv6 6851 0t0 UDP *:52060
cupsd 1075 root 5u IPv6 22512 0t0 TCP ip6-localhost:ipp (LISTEN)

List all network files in use by a specific process

You can list all the network files in use by a process as follows:

# lsof -i -a -p 234

You can also use the following

# lsof -i -a -c ssh

List processes which are listening on a particular port

You can list the processes listening on a particular port by using ‘-i’ with ‘:’ as follows:

# lsof -i :25

COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
exim4 2541 Debian-exim 3u IPv4 8677 TCP localhost:smtp (LISTEN)

List all TCP or UDP connections

You can list all the TCP or UDP connections by specifying the protocol using ‘-i’.

# lsof -i tcp; lsof -i udp;

List all Network File System ( NFS ) files

You can list all the NFS files by using the ‘-N’ option. The following lsof command will list all NFS files used by user ‘user1’.

# lsof -N -u user1 -a

lsof can also reveal a user’s active network sessions, for example an established SSH connection:

sshd 7163 ajoy 3u IPv6 15088263 TCP dev-db:ssh->abc-12-12-12-12.socal.res.rr.com:2631 (ESTABLISHED)

A system administrator can use this command to get some idea on what users are executing on the system.

List Users of a particular file

If you like to view all the users who are using a particular file, use lsof as shown below. In this example, it displays all users who are currently using vi.

# lsof /bin/vi

COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
vi 7258 root txt REG 8,1 474608 475196 /bin/vi
vi 7300 ajoy txt REG 8,1 474608 475196 /bin/vi

Semaphores in a DB perspective

Semaphores can be described as counters which are used to provide synchronization between processes, or between threads within a process, for shared resources like shared memory. System V semaphores support semaphore sets, where each member is a counting semaphore. So when an application requests semaphores, the kernel allocates them in sets. The maximum number of semaphores per set can be defined through the kernel parameter SEMMSL.

To see all semaphore settings, run:

# ipcs -ls

  • The SEMMSL Parameter

This parameter defines the maximum number of semaphores per semaphore set.

Oracle recommends SEMMSL to be at least 250 for 9i R2 and 10g R1/R2 databases except for 9i R2 on x86 platforms where the minimum value is lower. Since these recommendations are minimum settings, it’s best to set it always to at least 250 for 9i and 10g databases on x86 and x86-64 platforms.

NOTE:
If a database gets thousands of concurrent connections, where the init.ora parameter PROCESSES is very large, then SEMMSL should be larger as well. Note what Metalink Note:187405.1 and Note:184821.1 say regarding SEMMSL: “The SEMMSL setting should be 10 plus the largest PROCESSES parameter of any Oracle database on the system”. Even though these notes talk about 9i databases, this SEMMSL rule also applies to 10g databases. I’ve seen low SEMMSL settings be an issue for 10g RAC databases, where Oracle recommended increasing SEMMSL and calculating it according to the rule mentioned in these notes. An example of setting semaphores for higher PROCESSES settings can be found at Example for Semaphore Settings.

  • The SEMMNI Parameter

This parameter defines the maximum number of semaphore sets for the entire Linux system.

Oracle recommends SEMMNI to be at least 128 for 9i R2 and 10g R1/R2 databases except for 9i R2 on x86 platforms where the minimum value is lower. Since these recommendations are minimum settings, it’s best to set it always to at least 128 for 9i and 10g databases on x86 and x86-64 platforms.

  • The SEMMNS Parameter

This parameter defines the total number of semaphores (not semaphore sets) for the entire Linux system. A semaphore set can have more than one semaphore, and as the semget(2) man page explains, values of SEMMNS greater than SEMMSL * SEMMNI are irrelevant. The maximum number of semaphores that can be allocated on a Linux system will be the lesser of SEMMNS and (SEMMSL * SEMMNI).

Oracle recommends SEMMNS to be at least 32000 for 9i R2 and 10g R1/R2 databases except for 9i R2 on x86 platforms where the minimum value is lower. Setting SEMMNS to 32000 ensures that SEMMSL * SEMMNI (250*128=32000) semaphores can be used. Therefore it’s recommended to set SEMMNS to at least 32000 for 9i and 10g databases on x86 and x86-64 platforms.

  • The SEMOPM Parameter

This parameter defines the maximum number of semaphore operations that can be performed per semop(2) system call (semaphore call). The semop(2) function provides the ability to do operations for multiple semaphores with one semop(2) system call. Since a semaphore set can have the maximum number of SEMMSL semaphores per semaphore set, it is often recommended to set SEMOPM equal to SEMMSL.

Oracle recommends to set SEMOPM to a minimum value of 100 for 9i R2 and 10g R1/R2 databases on x86 and x86-64 platforms.

  • Setting Semaphore Parameters

To determine the values of the four described semaphore parameters, run:

# cat /proc/sys/kernel/sem
250 32000 32 128

These values represent SEMMSL, SEMMNS, SEMOPM, and SEMMNI.
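
Since the order of the four values is fixed, they can be read into named variables in one step; a tiny sketch:

```shell
# /proc/sys/kernel/sem holds SEMMSL SEMMNS SEMOPM SEMMNI, in that order.
read SEMMSL SEMMNS SEMOPM SEMMNI < /proc/sys/kernel/sem
echo "SEMMSL=$SEMMSL SEMMNS=$SEMMNS SEMOPM=$SEMOPM SEMMNI=$SEMMNI"
```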

Alternatively, you can run:

# ipcs -ls

All four described semaphore parameters can be changed in the proc file system without reboot:

# echo 250 32000 100 128 > /proc/sys/kernel/sem

Alternatively, you can use sysctl(8) to change it:

# sysctl -w kernel.sem="250 32000 100 128"

To make the change permanent, add or change the following line in the file /etc/sysctl.conf. This file is used during the boot process.

# echo "kernel.sem=250 32000 100 128" >> /etc/sysctl.conf

  • Example for Semaphore Settings

On systems where the init.ora parameter PROCESSES is very large, the semaphore settings need to be adjusted accordingly.

As shown at The SEMMSL Parameter the SEMMSL setting should be 10 plus the largest PROCESSES parameter of any Oracle database on the system. So if you have one database instance running on a system where PROCESSES is set to 5000, then SEMMSL should be set to 5010.

As shown at The SEMMNS Parameter the maximum number of semaphores that can be allocated on a Linux system will be the lesser of: SEMMNS or (SEMMSL * SEMMNI). Since SEMMNI can stay at 128, we need to increase SEMMNS to 641280 (5010*128).

As shown at The SEMOPM Parameter a semaphore set can have the maximum number of SEMMSL semaphores per semaphore set and it is recommended to set SEMOPM equal to SEMMSL. Since SEMMSL is set to 5010 the SEMOPM parameter should be set to 5010 as well.

Hence, if the init.ora parameter PROCESSES is set to 5000, then the semaphore settings should be as follows:

# sysctl -w kernel.sem="5010 641280 5010 128"