Showing posts with label stress-ng. Show all posts
Showing posts with label stress-ng. Show all posts

Monday, 4 January 2021

Improving kernel test coverage with stress-ng

Over the past year there has been focused work on improving the test coverage of the Linux Kernel with stress-ng.  Increased test coverage exercises more kernel code and hence improves the breadth of testing, allowing us to be more confident that more corner cases are being handled correctly.

The test coverage has been improved in several ways:

  1. testing more system calls; most system calls are being now exercised
  2. adding more ioctl() command tests
  3. exercising system call error handling paths
  4. exercise more system call options and flags
  5. keeping track of new features added to recent kernels and adding stress test cases for these
  6. adding support for new architectures (RISC-V for example)

Each stress-ng release is run with various stressor options against the latest kernel (built with gcov enabled).  The gcov data is processed with lcov to produce human readable kernel source code containing coverage annotations to help inform where to add more test coverage for the next release cycle of stress-ng. 

Linux Foundation sponsored Piyush Goyal for 3 months to add test cases that exercise system call test failure paths and I appreciate this help in improving stress-ng. I finally completed this tedious task at the end of 2020 with the release of stress-ng 0.12.00.

Below is a chart showing how the kernel coverage generated by stress-ng has been increasing since 2015. The amber line shows lines of code exercised and the green line shows kernel functions exercised.

 


..one can see that there was a large increase of kernel test coverage in the latter half of 2020 with stress-ng.  In all, 2020 saw ~20% increase on kernel coverage, most of this was driven using the gcov analysis, however, there is more to do.

What next?  Apart from continuing to add support for new kernel system calls and features I hope to improve the kernel coverage test script to exercise more file systems; it will be interesting to see what kind of bugs get found. I'll also be keeping the stress-ng project page refreshed as this tracks bugs that stress-ng has found in the Linux kernel.

As it stands, release 0.12.00 was a major milestone for stress-ng as it marks the completion of the major work items to improve kernel test coverage.

Saturday, 8 June 2019

Working towards stress-ng 0.10.00

Over the past 9+ months I've been cleaning up stress-ng in preparation for a V0.10.00 release.   Stress-ng is a portable Linux/UNIX Swiss army knife of micro-benchmarking kernel stress tests.

The Ubuntu kernel team uses stress-ng for kernel regression testing in several ways:
  • Checking that the kernel does not crash when being stressed tested
  • Performance (bogo-op throughput) regression checks
  • Power consumption regression checks
  • Core CPU Thermal regression checks
The wide range of micro benchmarks in stress-ng allow us to keep track of a range of metrics so that we can catch regressions.

I've tried to focus on several aspects of stress-ng over the last last development cycle:
  • Improve per-stressor modularization. A lot of code has been moved from the core of stress-ng back into each stress test.
  • Clean up a lot of corner case bugs found when we've been testing stress-ng in production.  We exercise stress-ng on a lot of hardware and in various cloud instances, so we find occasional bugs in stress-ng.
  • Improve usability, for example, adding bash command completion.
  • Improve portability (various kernels, compilers and C libraries). It really builds on runs on a *lot* of Linux/UNIX/POSIX systems.
  • Improve kernel test coverage.  Try to exercise more kernel core functionality and reach parts other tests don't yet reach.
Over the past several days I've been running various versions of stress-ng on a gcov enabled 5.0 kernel to measure kernel test coverage with stress-ng.  As shown below, the tool has been slowly gaining more core kernel coverage over time:

With the use of gcov + lcov, I can observe where stress-ng is not currently exercising the kernel and this allows me to devise stress tests to touch these un-exercised parts.  The tool has a history of tripping kernel bugs, so I'm quite pleased it has helped us to find corners of the kernel that needed improving.

This week I released V0.09.59 of stress-ng.  Apart from the usual sets of clean up changes and bug fixes, this new release now incorporates bash command line completion to make it easier to use.  Once the 5.2 Linux kernel has been released and I'm satisfied that stress-ng covers new 5.2 features I will  probably be releasing V0.10.00. This  will be a major release milestone now that stress-ng has realized most of my original design goals.

Tuesday, 18 July 2017

New features landing stress-ng V0.08.09

The latest release of stress-ng V0.08.09 incorporates new stressors and a handful of bug fixes. So what is new in this release?
  • memrate stressor to exercise and measure memory read/write throughput
  • matrix yx option to swap order of matrix operations
  • matrix stressor size can now be 8192 x 8192 in size
  • radixsort stressor (using the BSD library radixsort) to exercise CPU and memory
  • improved job script parsing and error reporting
  • faster termination of rmap stressor (this was slow inside VMs)
  • icache stressor now calls cacheflush()
  • anonymous memory mappings are now private allowing hugepage madvise
  • fcntl stressor exercises the 4.13 kernel F_GET_FILE_RW_HINT and F_SET_FILE_RW_HINT
  • stream and vm stressors have new mdavise options
The new memrate stressor performs 64/32/16/8 bit reads and writes to a large memory region.  It will attempt to get some statistics on the memory bandwidth for these simple reads and writes.  One can also specify the read/write rates in terms of MB/sec using the --memrate-rd-mbs and --memrate-wr-mbs options, for example:

 stress-ng --memrate 1 --memrate-bytes 1G \  
 --memrate-rd-mbs 1000 --memrate-wr-mbs 2000 -t 60  
 stress-ng: info: [22880] dispatching hogs: 1 memrate  
 stress-ng: info: [22881] stress-ng-memrate: write64: 1998.96 MB/sec  
 stress-ng: info: [22881] stress-ng-memrate: read64:   998.61 MB/sec  
 stress-ng: info: [22881] stress-ng-memrate: write32: 1999.68 MB/sec  
 stress-ng: info: [22881] stress-ng-memrate: read32:   998.80 MB/sec  
 stress-ng: info: [22881] stress-ng-memrate: write16: 1999.39 MB/sec  
 stress-ng: info: [22881] stress-ng-memrate: read16:   999.66 MB/sec  
 stress-ng: info: [22881] stress-ng-memrate: write8:  1841.04 MB/sec  
 stress-ng: info: [22881] stress-ng-memrate:  read8:   999.94 MB/sec  
 stress-ng: info: [22880] successful run completed in 60.00s (1 min, 0.00 secs)  

...the memrate stressor will attempt to limit the memory rates but due to scheduling jitter and other memory activity it may not be 100% accurate.  By careful setting of the size of the memory being exercised with the --memrate-bytes option one can exercise the L1/L2/L3 caches and/or the entire memory.

By default, matrix stressor will perform matrix operations with optimal memory access to memory.  The new --matrix-yx option will instead perform matrix operations in a y, x rather than an x, y matrix order, causing more cache stalls on larger matrices.  This can be useful for exercising cache misses.

To complement the heapsort, mergesort and qsort memory/CPU exercising sort stressors I've added the BSD library radixsort stressor to exercise sorting of hundreds of thousands of small text strings.

Finally, while exercising various hugepage kernel configuration options I was inspired to make stress-ng mmap's to work better with hugepage madvise hints, so where possible all anonymous memory mappings are now private to allow hugepage madvise to work.  The stream and vm stressors also have new madvise options to allow one to chose hugepage, nohugepage or normal hints.

No big changes as per normal, just small incremental improvements to this all purpose stress tool.

Thursday, 22 June 2017

Cyclic latency measurements in stress-ng V0.08.06

The stress-ng logo
The latest release of stress-ng contains a mechanism to measure latencies via a cyclic latency test.  Essentially this is just a loop that cycles around performing high precisions sleeps and measures the (extra overhead) latency taken to perform the sleep compared to expected time.  This loop runs with either one of the Round-Robin (rr) or First-In-First-Out real time scheduling polices.

The cyclic test can be configured to specify the sleep time (in nanoseconds), the scheduling type (rr or fifo),  the scheduling priority (1 to 100) and also the sleep method (explained later).

The first 10,000 latency measurements are used to compute various latency statistics:
  • mean latency (aka the 'average')
  • modal latency (the most 'popular' latency)
  • minimum latency
  • maximum latency
  • standard deviation
  • latency percentiles (25%, 50%, 75%, 90%, 95.40%, 99.0%, 99.5%, 99.9% and 99.99%
  • latency distribution (enabled with the --cyclic-dist option)
The latency percentiles indicate the latency at which a percentage of the samples fall into.  For example, the 99% percentile for the 10,000 samples is the latency at which 9,900 samples are equal to or below.

The latency distribution is shown when the --cyclic-dist option is used; one has to specify the distribution interval in nanoseconds and up to the first 100 values in the distribution are output.

For an idle machine, one can invoke just the cyclic measurements with stress-ng as follows:

 sudo stress-ng --cyclic 1 --cyclic-policy fifo \
--cyclic-prio 100 --cyclic-method --clock_ns \
--cyclic-sleep 20000 --cyclic-dist 1000 -t 5  
 stress-ng: info: [27594] dispatching hogs: 1 cyclic  
 stress-ng: info: [27595] stress-ng-cyclic: sched SCHED_FIFO: 20000 ns delay, 10000 samples  
 stress-ng: info: [27595] stress-ng-cyclic:  mean: 5242.86 ns, mode: 4880 ns  
 stress-ng: info: [27595] stress-ng-cyclic:  min: 3050 ns, max: 44818 ns, std.dev. 1142.92  
 stress-ng: info: [27595] stress-ng-cyclic: latency percentiles:  
 stress-ng: info: [27595] stress-ng-cyclic:  25.00%:    4881 us  
 stress-ng: info: [27595] stress-ng-cyclic:  50.00%:    5191 us  
 stress-ng: info: [27595] stress-ng-cyclic:  75.00%:    5261 us  
 stress-ng: info: [27595] stress-ng-cyclic:  90.00%:    5368 us  
 stress-ng: info: [27595] stress-ng-cyclic:  95.40%:    6857 us  
 stress-ng: info: [27595] stress-ng-cyclic:  99.00%:    8942 us  
 stress-ng: info: [27595] stress-ng-cyclic:  99.50%:    9821 us  
 stress-ng: info: [27595] stress-ng-cyclic:  99.90%:   22210 us  
 stress-ng: info: [27595] stress-ng-cyclic:  99.99%:   36074 us  
 stress-ng: info: [27595] stress-ng-cyclic: latency distribution (1000 us intervals):  
 stress-ng: info: [27595] stress-ng-cyclic: latency (us) frequency  
 stress-ng: info: [27595] stress-ng-cyclic:           0         0  
 stress-ng: info: [27595] stress-ng-cyclic:        1000         0  
 stress-ng: info: [27595] stress-ng-cyclic:        2000         0  
 stress-ng: info: [27595] stress-ng-cyclic:        3000        82  
 stress-ng: info: [27595] stress-ng-cyclic:        4000      3342  
 stress-ng: info: [27595] stress-ng-cyclic:        5000      5974  
 stress-ng: info: [27595] stress-ng-cyclic:        6000       197  
 stress-ng: info: [27595] stress-ng-cyclic:        7000       209  
 stress-ng: info: [27595] stress-ng-cyclic:        8000       100  
 stress-ng: info: [27595] stress-ng-cyclic:        9000        50  
 stress-ng: info: [27595] stress-ng-cyclic:       10000        10  
 stress-ng: info: [27595] stress-ng-cyclic:       11000         9  
 stress-ng: info: [27595] stress-ng-cyclic:       12000         2  
 stress-ng: info: [27595] stress-ng-cyclic:       13000         2  
 stress-ng: info: [27595] stress-ng-cyclic:       14000         1  
 stress-ng: info: [27595] stress-ng-cyclic:       15000         9  
 stress-ng: info: [27595] stress-ng-cyclic:       16000         1  
 stress-ng: info: [27595] stress-ng-cyclic:       17000         1  
 stress-ng: info: [27595] stress-ng-cyclic:       18000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       19000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       20000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       21000         1  
 stress-ng: info: [27595] stress-ng-cyclic:       22000         1  
 stress-ng: info: [27595] stress-ng-cyclic:       23000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       24000         1  
 stress-ng: info: [27595] stress-ng-cyclic:       25000         2  
 stress-ng: info: [27595] stress-ng-cyclic:       26000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       27000         1  
 stress-ng: info: [27595] stress-ng-cyclic:       28000         1  
 stress-ng: info: [27595] stress-ng-cyclic:       29000         2  
 stress-ng: info: [27595] stress-ng-cyclic:       30000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       31000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       32000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       33000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       34000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       35000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       36000         1  
 stress-ng: info: [27595] stress-ng-cyclic:       37000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       38000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       39000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       40000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       41000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       42000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       43000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       44000         1  
 stress-ng: info: [27594] successful run completed in 5.00s  
   

Note that stress-ng needs to be invoked using sudo to enable the Real Time FIFO scheduling for the cyclic measurements.

The above example uses the following options:

  • --cyclic 1
    • starts one instance of the cyclic measurements (1 is always recommended)
  • --cyclic-policy fifo 
    • use the real time First-In-First-Out scheduling for the cyclic measurements
  • --cyclic-prio 100 
    • use the maximum scheduling priority  
  • --cyclic-method clock_ns
    • use the clock_nanoseconds(2) system call to perform the high precision duration sleep
  • --cyclic-sleep 20000 
    • sleep for 20000 nanoseconds per cyclic iteration
  • --cyclic-dist 1000 
    • enable latency distribution statistics with an interval of 1000 nanoseconds between each data point.
  • -t 5
    • run for just 5 seconds
From the run above, we can see that 99.5% of latencies were less than 9821 nanoseconds and most clustered around the 4880 nanosecond model point. The distribution data shows that there is some clustering around the 5000 nanosecond point and the samples tail off with a bit of a long tail.

Now for the interesting part. Since stress-ng is packed with many different stressors we can run these while performing the cyclic measurements, for example, we can tell stress-ng to run *all* the virtual memory related stress tests and see how this affects the latency distribution using the following:

 sudo stress-ng --cyclic 1 --cyclic-policy fifo \  
 --cyclic-prio 100 --cyclic-method clock_ns \  
 --cyclic-sleep 20000 --cyclic-dist 1000 \  
 --class vm --all 1 -t 60s  
   

..the above invokes all the vm class of stressors to run all at the same time (with just one instance of each stressor) for 60 seconds.

The --cyclic-method specifies the delay used on each of the 10,000 cyclic iterations used.  The default (and recommended method) is clock_ns, using the high precision delay.  The available cyclic delay methods are:
  • clock_ns (use the clock_nanosecond() sleep)
  • posix_ns (use the POSIX nanosecond() sleep)
  • itimer (use a high precision clock timer and pause to wait for a signal to measure latency)
  • poll (busy spin-wait on clock_gettime() to eat cycles for a delay.
All the delay mechanisms use the CLOCK_REALTIME system clock for timing.

I hope this is plenty of cyclic measurement functionality to get some useful latency benchmarks against various kernel components when using some or a mix of the stress-ng stressors.  Let me know if I am missing some other cyclic measurement options and I can see if I can add them in.

Keep stressing and measuring those systems!