Monitoring Operating Systems

During performance testing, monitoring the operating system of each server is important because it can uncover bottlenecks. For more on server monitoring in general, refer to our article “A High-Level Insight Into Performance Monitoring”.

Below we explore the most common methods for monitoring the three types of operating system normally found on servers:

  1. Windows
  2. Unix
  3. Others: OpenVMS, Mainframes

 

Windows

When it comes to monitoring Windows servers, Perfmon (Performance Monitor) is by far the easiest way to collect metrics.

You can run Perfmon on any Windows machine by selecting Start -> Run and entering “perfmon”.

Some performance testing tools can collect Perfmon counters remotely from within the tool itself. This removes the need to retrieve and collate Perfmon log files separately when analysing the test results.

The typical counters that you can collect from Perfmon are:

 

[Table: typical Perfmon counters]

 

Unix-like systems: Solaris, Linux, HP-UX, AIX, etc.

Most Unix and Linux systems share the same performance monitoring interfaces, with some minor variations.

In Unix, a daemon is a computer program that runs as a background process, rather than being under the direct control of an interactive user. Typically, daemon names end with the letter ‘d’: for example, syslogd is the daemon that implements the system logging facility and sshd is a daemon that services incoming SSH connections.

Here are some of the commands that are useful for monitoring these servers.

 

  • RSTATD Daemon

Purpose
Returns performance statistics obtained from the kernel.

Syntax
/usr/sbin/rpc.rstatd

Description
The rstatd daemon is a server that returns performance statistics obtained from the kernel. Some of these statistics may be read using the rup(1) command. The rstatd daemon is normally started by the inetd daemon.

Files
/etc/inetd.conf – TCP/IP (Transmission Control Protocol/Internet Protocol) configuration file that starts RPC (Remote Procedure Call) daemons and other TCP/IP daemons.
/etc/services – Contains an entry for each server available through the Internet.

 

In LoadRunner (Micro Focus’s performance testing tool), this is referred to as the “Unix System Monitor”. It works along the same lines as Perfmon: a daemon runs on the remote machine that you need to monitor, and a client connects to it to retrieve the data. The LoadRunner Controller does exactly this.

For more information, see the following link: https://www.ibm.com/support/knowledgecenter/ssw_aix_72/r_commands/rstatd.html
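Before pointing a client such as the LoadRunner Controller at a server, it can be worth confirming that rstatd is actually registered with the RPC portmapper. The sketch below assumes the standard rpcinfo utility is available and that rstatd uses RPC program number 100001, the number conventionally assigned to the rstat service. The function name has_rstatd and the host name app-server-01 are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: exit with status 0 if the rpcinfo listing read from
# stdin contains RPC program 100001 (the rstat service), otherwise 1.
has_rstatd() {
  awk '$1 == "100001" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Usage sketch (app-server-01 is a placeholder for the server under test):
# if rpcinfo -p app-server-01 | has_rstatd; then
#   echo "rstatd is registered"
# else
#   echo "rstatd not found: check the rstatd entry in /etc/inetd.conf"
# fi
```

Matching on the program number rather than the service name avoids depending on how the local /etc/rpc file names the service.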

 

  • IOSTAT

iostat (input/output statistics) is used to monitor the load on system input/output devices (physical and logical) by observing the time the devices are active in relation to their average transfer rates.

iostat produces two reports: one on CPU utilisation and one on disk usage. Some versions also report the combined count of characters read from and written to all terminals on the system.

iostat’s options default to tdc (terminal, disk and CPU). If any other options are specified, this default is overridden; for example, iostat -d will only report statistics about disks.

Running “iostat” on its own, with no interval or count, produces a single report in the terminal window.

For example:

[Screenshot: sample iostat output]

The above report is broken down into two parts.

 

  1. CPU utilisation section, which contains the following:

 

%user – shows the percentage of CPU utilisation that occurred while executing at the user (application) level

%nice – shows the percentage of CPU utilisation that occurred while executing at the user level with nice priority

%system – shows the percentage of CPU utilisation that occurred while executing at the system (kernel) level

%iowait – shows the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request

%steal – shows the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.

%idle – shows the percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request

 

  2. Disk utilisation section, which contains the following:

 

Device – This column gives the device (or partition) name as listed in the /dev directory.

tps – Indicates the number of transfers per second that were issued to the device. A transfer is an I/O request to the device. Multiple logical requests can be combined into a single I/O request to the device. A transfer is of indeterminate size.

Blk_read/s (kB_read/s, MB_read/s) – Indicates the amount of data read from the device expressed in a number of blocks (kilobytes, megabytes) per second. Blocks are equivalent to sectors and therefore have a size of 512 bytes.

Blk_wrtn/s (kB_wrtn/s, MB_wrtn/s) – Indicates the amount of data written to the device expressed in a number of blocks (kilobytes, megabytes) per second.

Blk_read (kB_read, MB_read) – The total number of blocks (kilobytes, megabytes) read.

Blk_wrtn (kB_wrtn, MB_wrtn) – The total number of blocks (kilobytes, megabytes) written.
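Once iostat output has been captured to a file, the two report sections can be checked automatically rather than by eye. The sketch below assumes the Linux (sysstat) layout, in which the CPU section starts with an “avg-cpu:” header line and the device section starts with a “Device” header (“Device:” in older versions) followed by one row per device and a blank line. The function names high_iowait and busy_devices are hypothetical, and the thresholds are whatever you pass in. If your locale prints decimal commas, run iostat with LC_ALL=C so the values parse as numbers.

```shell
#!/bin/sh
# Hypothetical helper: print the %iowait value from each CPU report
# (iostat default or -c output) that exceeds the threshold given as $1.
high_iowait() {
  awk -v limit="$1" '
    # The line after an "avg-cpu:" header holds the values for that report.
    want {
      want = 0
      v = $(col["%iowait"])
      if (v + 0 > limit + 0) print "high iowait: " v "%"
    }
    # Header names are shifted by one field ("avg-cpu:" has no value below it).
    $1 == "avg-cpu:" { for (i = 2; i <= NF; i++) col[$i] = i - 1; want = 1 }
  '
}

# Hypothetical helper: print the device name and tps for each device row
# (iostat default or -d output) whose tps exceeds the threshold given as $1.
busy_devices() {
  awk -v limit="$1" '
    NF == 0 { indev = 0; next }           # a blank line ends a device block
    $1 == "avg-cpu:" { indev = 0; next }  # CPU section, not device rows
    $1 ~ /^Device:?$/ {                   # device header ("Device:" in older sysstat)
      for (i = 1; i <= NF; i++) col[$i] = i
      indev = 1; next
    }
    indev && $(col["tps"]) + 0 > limit + 0 { print $1, $(col["tps"]) }
  '
}

# Usage sketch:
# iostat 5 540 > /tmp/io.txt
# high_iowait 20 < /tmp/io.txt
# busy_devices 50 < /tmp/io.txt
```

Columns are looked up by name from the header rows, so the filters keep working if a sysstat version adds extra columns.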

To produce reports over a specified period of time, add the interval and count arguments.

 

IOSTAT example

$ iostat 5 550 > /tmp/io_0620_1200.txt

Where:  

iostat is the monitoring command

5 (seconds) is the interval at which the stats are captured

550 is the number of reports (count) iostat will produce (5 × 550 = 2,750 seconds, just under 46 minutes)

> redirects the output to a file

/tmp/io_0620_1200.txt is the directory and file where the output will be captured.

This command will produce output for roughly 46 minutes. The first report contains average values since the system was last booted; each subsequent report covers only the interval since the previous report, giving a snapshot of the system’s behaviour at that point in the test.
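Choosing the count for a given test length is simple arithmetic (duration divided by interval), but it is easy to get wrong under pressure. A minimal sketch, assuming a POSIX shell; sample_count is a hypothetical helper name, and the timestamped file name simply follows the io_0620_1200 pattern used in the example above.

```shell
#!/bin/sh
# Hypothetical helper: number of samples (the iostat/vmstat "count" argument)
# needed to cover duration_s seconds at interval_s seconds per sample.
# Integer division rounds down, so add (interval_s - 1) first to round up:
# the capture should never finish before the test does.
sample_count() {
  duration_s=$1
  interval_s=$2
  echo $(( (duration_s + interval_s - 1) / interval_s ))
}

# Usage sketch: 5-second samples for a 45-minute (2,700-second) test.
# One extra sample is added because the first report is the since-boot
# average rather than a measurement taken during the test.
# count=$(( $(sample_count 2700 5) + 1 ))
# outfile=/tmp/io_$(date +%m%d_%H%M).txt
# nohup iostat 5 "$count" > "$outfile" 2>&1 &
```

Running the capture under nohup in the background keeps it going if the terminal session that started it is closed part-way through the test.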

iostat statistics are useful for determining how heavily the system is loaded and how evenly that load is distributed across the physical drives, as well as for tracking CPU utilisation. They are very useful when determining whether a physical volume is becoming a performance bottleneck and whether there is potential to improve the situation.

For more information, see the following link: https://man7.org/linux/man-pages/man1/iostat.1.html

 

  • VMSTAT

Vmstat (Virtual Memory Statistics) is a computer system monitoring tool that collects and displays summary information about processes, memory, paging, block IO, interrupts, traps, and CPU activity. You can specify a sampling interval which permits observing system activity in near real-time.

When you type vmstat into your command line, you will see a screen similar to the one shown below:

[Screenshot: sample vmstat output]

 

The above output from vmstat is organised into six sections: “procs”, “memory”, “swap”, “io”, “system” and “cpu”.

The first section covers “procs” (processes). “r” is the number of processes waiting for run time. “b” is the number of processes in uninterruptible sleep.
The next section covers “memory”. “swpd” is the amount of virtual memory used.
“free” is the amount of idle memory.
“buff” is the amount of memory used as buffers.
“cache” is the amount of memory used as cache.
The next section covers “swap”. “si” is the amount of memory swapped in from disk per second. “so” is the amount of memory swapped out to disk per second.
The next section covers “io”. “bi” is the number of blocks received from a block device (usually a disk) per second. “bo” is the number of blocks sent to a block device per second.
The next section covers “system”. “in” is the number of interrupts per second, including the clock.
“cs” is the number of context switches per second.
The final section covers “cpu”; the numbers are percentages of total CPU time. “us” is the time spent running non-kernel code (user time, including nice time). “sy” is the time spent running kernel code (system time). “id” is the time spent idle. “wa” is the time spent waiting for I/O.
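Because “si” and “so” report memory swapped in from and out to disk, sustained non-zero values during a test usually point to memory pressure. The sketch below scans captured vmstat output for such samples. It assumes the Linux (procps) layout described above, with a column-header row beginning “r b”, and looks columns up by name so it still works if a version adds or drops columns. The function name vmstat_swapping is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read vmstat output on stdin and print every sample
# in which memory was swapped in (si) or out (so).
vmstat_swapping() {
  awk '
    # Column-header row ("r b swpd free ..."): remember each column position.
    $1 == "r" && $2 == "b" {
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    # Data rows start with a number; ignore them until the header has been seen.
    ("si" in col) && $1 ~ /^[0-9]+$/ {
      si = $(col["si"]); so = $(col["so"])
      if (si + so > 0) print "swapping: si=" si " so=" so
    }
  '
}

# Usage sketch: capture during the test, then scan afterwards.
# vmstat 5 540 > /tmp/vm.txt
# vmstat_swapping < /tmp/vm.txt
```

On layouts without si/so columns (Solaris vmstat, for example, reports paging differently) the function simply prints nothing rather than misreading other columns.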

 

VMSTAT example

$ vmstat [options] [delay [count]]

Where:  

vmstat is the monitoring command

Options – let you specify the type of information needed, such as paging (-p), cache (-c) or interrupts (-i). The available options differ between platforms, so check the man page for your system. If no option is specified, information about processes, memory, paging, disk, interrupts and CPU is displayed.

Delay – the time in seconds between two samples. vmstat 2 will give data at 2-second intervals.

Count – the number of samples to take. vmstat 4 5 will give data at 4-second intervals, 5 times.

For more information, see the following link: https://man7.org/linux/man-pages/man8/vmstat.8.html

 

  • PRSTAT

prstat (process statistics) is the most common utility found on Sun Solaris for performance monitoring.

The prstat command examines all active processes on the system and reports statistics based on the selected output mode and sort order.

When you simply type prstat in your command line, you will see a screen similar to the one shown below, refreshing every few seconds and sorting all the information by the CPU column:

 

[Screenshot: sample prstat output]

 

The following list describes each column heading in the above prstat report:

PID – The process ID of the process.

USERNAME – The real user (login) name or real user ID.

SIZE (SWAP) – The total virtual memory size of the process, including all mapped files and devices, in kilobytes (K), megabytes (M), or gigabytes (G).

RSS – The resident set size of the process (RSS), in kilobytes (K), megabytes (M), or gigabytes (G). The RSS value is an estimate provided by proc(4) that might underestimate the actual resident set size. Users who want to get more accurate usage information for capacity planning should use the -x option to pmap(1) instead.

STATE – The state of the process:

  • cpuN – Process is running on CPU N.
  • sleep – Sleeping: process is waiting for an event to complete.
  • wait – Waiting: process is waiting for CPU usage to drop to the CPU-caps enforced limits.
  • run – Runnable: process is on the run queue.
  • zombie – Zombie state: process terminated and parent not waiting.
  • stop – Process is stopped.

PRI – The priority of the process. Larger numbers mean higher priority.

NICE – Nice value used in priority computation. Only processes in certain scheduling classes have a nice value.

TIME – The cumulative execution time for the process.

CPU – The percentage of recent CPU time used by the process. If executing in a non-global zone and the pools facility is active, the percentage will be that of the processors in the processor set in use by the pool to which the zone is bound.

PROCESS – The name of the process (name of the executed file).

NLWP – The number of lwps (lightweight processes) in the process.

If you run prstat with the -a option (prstat -a), you will get output similar to the default, but the lower part of the report is a per-user summary, which is a very useful way to identify the users consuming the most system resources.

 

PRSTAT example

$ prstat [options] [delay [count]]

Where:  

Options – If you do not specify an option, prstat examines all processes and reports statistics sorted by CPU usage.

Delay – the time in seconds between two samples. prstat 2 will give data at 2-second intervals.

Count – the number of samples to take. prstat 2 5 will give data at 2-second intervals, 5 times.

For more information, see the following link: https://docs.oracle.com/cd/E88353_01/html/E72487/prstat-8.html
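When prstat output has been captured to a file during a test, it can help to pull out only the processes that exceeded a CPU threshold. A minimal sketch, assuming the default column layout described above (a header row starting with PID, CPU values with a trailing %, and PROCESS/NLWP as the last column). It skips the per-user summary that prstat -a adds, whose header row starts with NPROC. The function name hot_processes is hypothetical, and on older Solaris releases you may need nawk rather than the legacy default awk.

```shell
#!/bin/sh
# Hypothetical helper: print PID, PROCESS/NLWP and CPU% for every prstat
# process row whose CPU column exceeds the threshold given as $1.
hot_processes() {
  awk -v limit="$1" '
    # Per-process header: record column positions and start reading rows.
    $1 == "PID"   { for (i = 1; i <= NF; i++) col[$i] = i; inproc = 1; next }
    # Per-user summary added by prstat -a: ignore rows until the next PID header.
    $1 == "NPROC" { inproc = 0; next }
    # Process rows start with a numeric PID.
    inproc && $1 ~ /^[0-9]+$/ {
      cpu = $(col["CPU"])
      sub(/%$/, "", cpu)                 # "42%" becomes "42"
      if (cpu + 0 > limit + 0) print $1, $NF, cpu "%"
    }
  '
}

# Usage sketch: -c prints each report below the previous one instead of
# overwriting the screen, which suits capturing to a file.
# prstat -c 5 540 > /tmp/prstat.txt
# hot_processes 10 < /tmp/prstat.txt
```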

 

Others: OpenVMS, Mainframes

Other operating systems you may run into are OpenVMS and IBM mainframes. Mainframes come in two different flavours:

  1. Descendants of System/360 (circa 1964!). IBM Z is now the family name used by IBM for all of its z/Architecture mainframe computers, and their main operating system, z/OS, was previously named OS/390.
  2. Descendants of AS/400 computers, whose operating system is now named IBM i; the version current at the time of writing is IBM i 7.4, released in 2019.

For OpenVMS, HP provide the OpenView Performance Agent and a reference of all performance counters here: http://h30266.www3.hpe.com/odl/axplp/sysman/ovpa4040/openvms_metrics.html

On IBM i, the Collection Services function provides for the collection of system and job level performance data. It is the primary collector of performance data on that platform and should be used when monitoring it.

Whether you are monitoring OpenVMS or a mainframe, you are likely to be working hand-in-hand with the system administrators, who may already have other monitoring solutions in place.

 

In summary, there are numerous methods to monitor operating systems while performance testing.  

To find out how SQA Consulting can assist with your performance monitoring needs, contact us.
