Monday, 4 January 2021

Improving kernel test coverage with stress-ng

Over the past year there has been focused work on improving the test coverage of the Linux Kernel with stress-ng.  Increased test coverage exercises more kernel code and hence improves the breadth of testing, allowing us to be more confident that more corner cases are being handled correctly.

The test coverage has been improved in several ways:

  1. testing more system calls; most system calls are now being exercised
  2. adding more ioctl() command tests
  3. exercising system call error handling paths
  4. exercising more system call options and flags
  5. keeping track of new features added to recent kernels and adding stress test cases for these
  6. adding support for new architectures (RISC-V for example)

Each stress-ng release is run with various stressor options against the latest kernel (built with gcov enabled).  The gcov data is processed with lcov to produce human-readable kernel source code annotated with coverage data, which helps show where to add more test coverage in the next stress-ng release cycle.
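
The coverage run described above can be sketched roughly as follows. This is not the actual coverage script; it assumes a kernel built with CONFIG_GCOV_KERNEL and CONFIG_GCOV_PROFILE_ALL, debugfs mounted, lcov installed and root privileges. The function name, output paths and the short per-stressor timeout are illustrative only.

```shell
#!/bin/sh
# Sketch only: kernel_coverage_run, kernel.info and coverage-html are
# hypothetical names, not taken from the real stress-ng coverage script.

kernel_coverage_run() {
    info_file="${1:-kernel.info}"      # lcov tracefile to write
    out_dir="${2:-coverage-html}"      # directory for the HTML report

    # Reset the kernel gcov counters so only this run is measured.
    lcov --zerocounters || return 1

    # Run all stressors one after another, 10 seconds each. The real
    # script exercises many more stressor-specific options than this.
    stress-ng --seq 0 --timeout 10

    # Capture the kernel gcov counters into an lcov tracefile.
    lcov --capture --output-file "$info_file" || return 1

    # Render the kernel source with coverage annotations as HTML.
    genhtml "$info_file" --output-directory "$out_dir"
}
```

Calling `kernel_coverage_run` as root on such a kernel should leave an HTML report in coverage-html, where unexercised lines stand out as candidates for new test cases.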

The Linux Foundation sponsored Piyush Goyal for 3 months to add test cases that exercise system call failure paths, and I appreciate this help in improving stress-ng. I finally completed this tedious task at the end of 2020 with the release of stress-ng 0.12.00.

Below is a chart showing how the kernel coverage generated by stress-ng has been increasing since 2015. The amber line shows lines of code exercised and the green line shows kernel functions exercised. One can see that there was a large increase in kernel test coverage in the latter half of 2020.  In all, 2020 saw a ~20% increase in kernel coverage, most of it driven by the gcov analysis. However, there is more to do.

What next?  Apart from continuing to add support for new kernel system calls and features, I hope to improve the kernel coverage test script to exercise more file systems; it will be interesting to see what kind of bugs get found. I'll also be keeping the stress-ng project page up to date, as it tracks the bugs that stress-ng has found in the Linux kernel.

As it stands, release 0.12.00 was a major milestone for stress-ng as it marks the completion of the major work items to improve kernel test coverage.

Friday, 25 September 2020

Kernel janitor work: fixing spelling mistakes in kernel messages

The Linux 5.9-rc6 kernel source contains over 300,000 literal strings used in kernel messages of various sorts (errors, warnings, etc.) and it is no surprise that typos and spelling mistakes slip into these messages from time to time.

To catch spelling mistakes I run a daily automated job that fetches the tip of linux-next, runs a fast spelling checker tool that finds all the spelling mistakes and then diffs these against the results from the previous day.  The diff is emailed to me and I put my kernel janitor hat on, fix the mistakes and send the fixes to the upstream developers and maintainers.
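
The day-to-day comparison step can be sketched as below. This is an assumption about the shape of the job, not the actual script: it presumes the checker writes one "file: word" line per mistake, and it uses comm rather than diff so that only newly introduced mistakes are printed. The function name and the file format are hypothetical.

```shell
#!/bin/sh
# Sketch only: new_spelling_mistakes and the one-line-per-mistake result
# format are assumptions made for illustration.

new_spelling_mistakes() {
    yesterday="$1"                     # results from the previous run
    today="$2"                         # results from the latest run

    tmp_old=$(mktemp) || return 1
    tmp_new=$(mktemp) || return 1

    # comm needs sorted input; use the C locale for a stable ordering.
    LC_ALL=C sort -u "$yesterday" > "$tmp_old"
    LC_ALL=C sort -u "$today" > "$tmp_new"

    # -13 suppresses lines unique to the old file and lines common to
    # both, leaving only the mistakes that appeared since yesterday.
    LC_ALL=C comm -13 "$tmp_old" "$tmp_new"

    rm -f "$tmp_old" "$tmp_new"
}
```

The output of such a function could then be piped to a mail command from a daily cron job.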

The spelling checker tool is a fast-and-dirty C parser that finds literal strings and also variable names and checks these against a US English dictionary containing over 100,000 words. As a fun weekend side project I hand-optimized the checker to be able to parse and spell check several million lines of kernel C code per second.

Every 3 or so months I collate all the fixes I've made and, where appropriate, I add new spelling mistake patterns to the kernel checkpatch spelling dictionary.  Kernel developers should in practice run checkpatch on their patches before submitting them upstream, and hopefully the dictionary will catch a lot of the regular spelling mistakes.
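
As a rough illustration of how the dictionary works: the checkpatch spelling dictionary lives in scripts/spelling.txt in the kernel tree and holds one "mistake||correction" pair per line, with comment lines starting with '#'. The helper below is a hypothetical sketch that looks up a single word in a file of that format; it is not part of checkpatch.

```shell
#!/bin/sh
# Sketch only: lookup_spelling_fix is a hypothetical helper, not a
# kernel script. It reads a file in the "mistake||correction" format.

lookup_spelling_fix() {
    dict="$1"                          # e.g. scripts/spelling.txt
    word="$2"                          # word to look up

    # Split each line on the literal "||" separator, skip comments and
    # print the correction when the first field matches the word exactly.
    awk -F'[|][|]' -v w="$word" '!/^#/ && $1 == w { print $2 }' "$dict"
}
```

From the top of a kernel tree this could be used as `lookup_spelling_fix scripts/spelling.txt occured`, while a whole patch can be checked with `./scripts/checkpatch.pl 0001-my-change.patch` (the patch file name here is made up).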

Over the past couple of years I've seen fewer spelling mistakes creep into the kernel, either because folk are running checkpatch more nowadays or because the dictionary is now able to catch more spelling mistakes.  As it stands, this is good as it means less work to fix these up.

Spelling mistakes may be trivial fixes, but cleaning these up helps make the kernel errors appear more professional and can also help clear up some ambiguous messages.

Thursday, 30 April 2020

Easy capturing of kernel stack traces with virsh

Today I needed to capture a rather large kernel stack dump; this is rather trivial using virsh.  Using virt-manager I created a VM named vm-focal and in the guest ran:

sudo systemctl enable serial-getty@ttyS0.service 

Then on the host running the VM I ran:

virsh console vm-focal

Then all I needed to do was produce the stack dump and the console output was successfully dumped by virsh. Easy.