Tuesday, 10 September 2019

Boot speed improvements for Ubuntu 19.10 Eoan Ermine

Early boot requires loading the kernel and initramfs from the boot storage device and decompressing them. The speed of this step depends on several factors: the speed of loading an image from the boot device, the CPU and memory/cache speed for decompression, and the compression type.

Generally speaking, the compression types that produce the smallest (best) output take longer to decompress because of the extra complexity of the compression algorithm. Thus there is a trade-off between load time and decompression time.

For slow rotational media (such as a 5400 RPM HDD) with a slow CPU, the load time can be the dominant factor. For faster devices (such as an SSD) with a slow CPU, decompression time may be the dominant factor. For fast 7200-10000 RPM HDDs with fast CPUs, the time to seek to the data starts to dominate the load time, so the load times for different compressed kernel sizes differ only slightly.

The Ubuntu kernel team ran several experiments benchmarking several x86 configurations, using the x86 TSC (Time Stamp Counter) to measure kernel load and decompression time for 6 different compression types: BZIP2, GZIP, LZ4, LZMA, LZO and XZ. BZIP2, LZMA and XZ are slow to decompress, so they were quickly ruled out of further tests.

In terms of compressed size, GZIP produces the smallest compressed kernel, followed by LZO (~16% larger) and LZ4 (~25% larger). In terms of decompression time, LZ4 is over 7 times faster than GZIP, and LZO is ~1.25 times faster than GZIP on x86.
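To make the trade-off concrete, here is a toy Python model of total time as load time plus decompression time. It is a sketch, not the team's benchmark: the relative sizes and speed-ups come from the figures above, while the 8 MB GZIP image size, the disk bandwidths and the 0.5s GZIP decompression time are made-up values chosen only for illustration.

```python
# Toy model: total boot image time = load time + decompression time.
# Size ratios and speed-ups are from the text; every absolute number
# (image size, bandwidth, GZIP decompression time) is hypothetical.

RELATIVE = {
    # codec: (compressed size relative to GZIP, decompression speed-up vs GZIP)
    "gzip": (1.00, 1.00),
    "lzo": (1.16, 1.25),
    "lz4": (1.25, 7.0),
}


def total_time(codec, gzip_size_mb, read_mb_per_s, gzip_decompress_s):
    """Estimated seconds to read and then decompress an image with `codec`."""
    size_ratio, speedup = RELATIVE[codec]
    load_s = gzip_size_mb * size_ratio / read_mb_per_s
    decompress_s = gzip_decompress_s / speedup
    return load_s + decompress_s


if __name__ == "__main__":
    # Two hypothetical devices with the same (hypothetical) CPU.
    for label, bandwidth in (("slow HDD", 20.0), ("SSD", 400.0)):
        times = {c: total_time(c, 8.0, bandwidth, 0.5) for c in RELATIVE}
        best = min(times, key=times.get)
        print(label, {c: round(t, 3) for c, t in times.items()}, "best:", best)
```

Under these assumed numbers, LZ4 only loses to GZIP when the read bandwidth drops below roughly 4.7 MB/s, the point where its extra 25% of data (2 MB divided by the bandwidth) costs more than the ~0.43s it saves on decompression.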

In absolute wall-clock times, the following kernel load and decompress results were observed:

Lenovo x220 laptop, 5400 RPM HDD:
  LZ4 best, 0.24s faster than the GZIP total time of 1.57s

Lenovo x220 laptop, SSD:
  LZ4 best, 0.29s faster than the GZIP total time of 0.87s

Xeon 8-thread desktop with 7200 RPM HDD:
  LZ4 best, 0.05s faster than the GZIP total time of 0.32s

VM on a Xeon 8-thread desktop host with SSD RAID ZFS backing store:
  LZ4 best, 0.05s faster than the GZIP total time of 0.24s
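The reported numbers can be turned into LZ4 totals and relative savings with a little arithmetic. The short Python sketch below uses only the GZIP totals and savings listed above; the system labels are abbreviated.

```python
# GZIP total time and LZ4 saving (seconds), as reported above.
RESULTS = {
    "x220, 5400 RPM HDD": (1.57, 0.24),
    "x220, SSD": (0.87, 0.29),
    "Xeon desktop, 7200 RPM HDD": (0.32, 0.05),
    "VM on Xeon desktop, SSD RAID": (0.24, 0.05),
}


def lz4_summary(gzip_total, saving):
    """Return (LZ4 total time, saving as a percentage of the GZIP time)."""
    lz4_total = gzip_total - saving
    return lz4_total, 100.0 * saving / gzip_total


if __name__ == "__main__":
    for system, (gzip_total, saving) in RESULTS.items():
        lz4_total, pct = lz4_summary(gzip_total, saving)
        print(f"{system}: LZ4 {lz4_total:.2f}s vs GZIP {gzip_total:.2f}s "
              f"({pct:.1f}% less time)")
```

This gives LZ4 totals of 1.33s, 0.58s, 0.27s and 0.19s, i.e. savings of roughly 15%, 33%, 16% and 21% respectively.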

Even with slow spinning media and a slow CPU, the longer load time of the larger LZ4 kernel is more than offset by its far faster decompression time. As media gets faster, the load time difference between GZIP, LZ4 and LZO diminishes and decompression time becomes the dominant speed factor, with LZ4 the clear winner.

For Ubuntu 19.10 Eoan Ermine, LZ4 will be the default compression for x86, ppc64el and s390 kernels, and for the initramfs too.

References:
Analysis: https://kernel.ubuntu.com/~cking/boot-speed-eoan-5.3/kernel-compression-method.txt
Data: https://kernel.ubuntu.com/~cking/boot-speed-eoan-5.3/boot-speed-compression-5.3-rc4.ods

Tuesday, 13 August 2019

Monitoring page faults with faultstat

Whenever a process accesses a virtual address that does not currently have a physical page mapped into its address space, a page fault occurs. The CPU raises a page fault exception so that the kernel can handle the fault.

A minor page fault occurs when the kernel can successfully map a physically resident page for the faulted user-space virtual address (for example, when accessing a memory-resident page that is already shared by other processes). Major page faults occur when accessing a page that has been swapped out, or a file-backed memory-mapped page that is not resident in memory.
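A minor fault is easy to provoke: freshly mapped anonymous memory has no physical pages behind it until it is first written. The Python sketch below (Unix-only, using the standard `mmap` and `resource` modules) touches one byte per page of a new anonymous mapping and reports how much the process's `ru_minflt` counter grew. The exact count is not guaranteed to equal the number of pages, since the interpreter itself and features such as transparent huge pages can change it.

```python
import mmap
import resource

PAGE = mmap.PAGESIZE


def minor_faults():
    """Minor page faults incurred so far by this process (getrusage ru_minflt)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def touch_anonymous_pages(npages):
    """Map fresh anonymous memory, write one byte per page, return fault growth.

    A file descriptor of -1 requests an anonymous mapping. No physical page
    backs it until first touch, so each first write is normally satisfied
    by a minor fault (no storage I/O is needed).
    """
    before = minor_faults()
    with mmap.mmap(-1, npages * PAGE) as buf:
        for offset in range(0, npages * PAGE, PAGE):
            buf[offset] = 1
    return minor_faults() - before


if __name__ == "__main__":
    print("minor faults while touching 256 pages:", touch_anonymous_pages(256))
```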

Page faults add latency to a running program, major faults especially so because of the delay of loading pages in from a storage device.

The faultstat tool makes it easy to monitor page fault activity and to find the processes that page fault the most. Running faultstat with no options dumps the page fault statistics of all processes, sorted in major+minor page fault order.
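On Linux, per-process fault counters are exposed in `/proc/<pid>/stat` (field 10 is `minflt` and field 12 is `majflt`, per proc(5)). The sketch below is an assumption about how a faultstat-like dump could be produced, not faultstat's actual source: it reads every visible process's counters and sorts them by major+minor faults.

```python
import os


def parse_stat(line):
    """Return (pid, comm, minflt, majflt) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')'. Per proc(5), minflt is field 10, majflt 12.
    """
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    rest = line[rparen + 1:].split()  # rest[0] is field 3 (state)
    minflt = int(rest[10 - 3])
    majflt = int(rest[12 - 3])
    return pid, comm, minflt, majflt


def snapshot():
    """Read fault counts for every visible process, skipping ones that exit."""
    stats = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                stats.append(parse_stat(f.read()))
        except OSError:
            continue  # process exited or is not readable
    return stats


def by_total_faults(stats):
    """Sort descending by major+minor faults."""
    return sorted(stats, key=lambda s: s[2] + s[3], reverse=True)


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, comm, minflt, majflt in by_total_faults(snapshot())[:10]:
        print(f"{pid:>7} {majflt:>8} {minflt:>10} {comm}")
```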

Faultstat also has a "top"-like mode; invoking it with the -T option displays the top page-faulting processes, again in major+minor page fault order.


The Major and Minor columns show the respective major and minor page faults. The +Major and +Minor columns show the recent increase in page faults. The Swap column shows the swap size of the process in pages.
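The +Major and +Minor columns amount to the difference between two samples of the counters. A minimal Python sketch of that computation, assuming each sample is a dict mapping a pid to `(minflt, majflt)` (a hypothetical layout, not faultstat's internal one):

```python
def fault_deltas(previous, current):
    """Per-process growth in (minor, major) faults between two samples.

    Processes that are new in `current` are compared against zero;
    processes that exited since `previous` are dropped.
    """
    deltas = {}
    for pid, (minflt, majflt) in current.items():
        old_min, old_maj = previous.get(pid, (0, 0))
        deltas[pid] = (minflt - old_min, majflt - old_maj)
    return deltas


def top_by_growth(deltas, n=10):
    """The n processes with the largest +Minor plus +Major growth."""
    return sorted(deltas.items(), key=lambda kv: kv[1][0] + kv[1][1],
                  reverse=True)[:n]
```

A real tool would also need to handle pid reuse between samples, which this sketch ignores.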

Pressing the 's' key cycles through the sort orders. Pressing the 'a' key adds an arrow annotation showing page fault growth change. The 't' key toggles between the cumulative major/minor page fault totals and the current change in major/minor faults.

The faultstat tool has just landed in Ubuntu Eoan and can also be installed as a snap. The source is available on GitHub.

Saturday, 8 June 2019

Working towards stress-ng 0.10.00

Over the past 9+ months I've been cleaning up stress-ng in preparation for a V0.10.00 release.   Stress-ng is a portable Linux/UNIX Swiss army knife of micro-benchmarking kernel stress tests.

The Ubuntu kernel team uses stress-ng for kernel regression testing in several ways:
  • Checking that the kernel does not crash when being stress tested
  • Performance (bogo-op throughput) regression checks
  • Power consumption regression checks
  • Core CPU Thermal regression checks
The wide range of micro-benchmarks in stress-ng allows us to keep track of a range of metrics so that we can catch regressions.

I've tried to focus on several aspects of stress-ng over the last development cycle:
  • Improve per-stressor modularization. A lot of code has been moved from the core of stress-ng back into each stress test.
  • Clean up a lot of corner-case bugs found while testing stress-ng in production. We exercise stress-ng on a lot of hardware and in various cloud instances, so we occasionally find bugs in stress-ng.
  • Improve usability, for example, adding bash command completion.
  • Improve portability (various kernels, compilers and C libraries). It really builds and runs on a *lot* of Linux/UNIX/POSIX systems.
  • Improve kernel test coverage.  Try to exercise more kernel core functionality and reach parts other tests don't yet reach.
Over the past several days I've been running various versions of stress-ng on a gcov-enabled 5.0 kernel to measure kernel test coverage with stress-ng. As shown below, the tool has been slowly gaining more core kernel coverage over time:

With the use of gcov + lcov, I can observe where stress-ng is not currently exercising the kernel and this allows me to devise stress tests to touch these un-exercised parts.  The tool has a history of tripping kernel bugs, so I'm quite pleased it has helped us to find corners of the kernel that needed improving.

This week I released V0.09.59 of stress-ng. Apart from the usual set of clean-up changes and bug fixes, this new release incorporates bash command line completion to make it easier to use. Once the 5.2 Linux kernel has been released and I'm satisfied that stress-ng covers the new 5.2 features, I will probably be releasing V0.10.00. This will be a major release milestone now that stress-ng has realized most of my original design goals.

Saturday, 5 January 2019

Kernel commits with "Fixes" Tag (revisited)

Last year I wrote about kernel commits that are tagged with the "Fixes" tag. Kernel developers use the "Fixes" tag on a bug fix commit to reference an older commit that originally introduced the bug.   The adoption of the tag has been steadily increasing since v3.12 of the kernel:

The red line shows the number of commits per release of the kernel, and the blue line shows the number of commits that contain a "Fixes" tag.

In terms of the percentage of commits that contain the "Fixes" tag, one can see it has been steadily increasing since v3.12, and almost 12.5% of kernel commits in v4.20 are tagged this way.

The "Fixes" tag contains the SHA of the commit that was fixed, so one can look up the dates of the fix and of the commit being fixed, and so determine the time taken to fix a bug.
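A rough Python sketch of that calculation: extract the referenced SHA from each `Fixes:` line and subtract the two commit dates. It assumes the dates are ISO 8601 strings such as those printed by `git log --format=%cI`; which date (author or committer) was used for the graphs is not stated, so that choice is an assumption here.

```python
import re
from datetime import datetime

# Matches lines such as:  Fixes: 1234567890ab ("subsys: summary of buggy commit")
FIXES_RE = re.compile(r"^Fixes:\s+([0-9a-fA-F]{7,40})\b", re.MULTILINE)


def fixed_shas(message):
    """Return the (possibly abbreviated) SHAs named by Fixes: tags in a message."""
    return [sha.lower() for sha in FIXES_RE.findall(message)]


def days_to_fix(buggy_date, fix_date):
    """Whole days between the buggy commit and its fix.

    Both arguments are ISO 8601 strings with UTC offsets (assumed format);
    datetime handles the offsets, so mixed time zones compare correctly.
    """
    delta = datetime.fromisoformat(fix_date) - datetime.fromisoformat(buggy_date)
    return delta.days
```

Resolving each abbreviated SHA to its full commit and date would be done against a kernel git tree, which is outside this sketch.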

As one can see, a lot of issues get fixed in the first few hundred days, and some bugs take years to get fixed. Zooming into the first hundred days of fixes, the distribution looks like:


... the modal point is at day 4. I suspect these are issues that get found quickly when commits land in linux-next, through early testing, integration builds and static analysis.

From the time to fix of the thousands of "Fixes"-tagged commits, one can determine how long it takes for a given percentage of the bugs to be fixed:


In the graph above, 50% of fixes are made within 151 days of the original commit, ~69% of fixes are made within a year of the original commit and ~82% of fixes are made within 2 years. The long tail indicates that some bugs take a while to be found and fixed: the final 10% of bugs take more than 3.5 years to be found and fixed.
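Given a list of time-to-fix values in days, the statistics quoted above (the modal day, the fraction fixed within a year, the number of days by which a given percentage is fixed) reduce to a few lines of Python. This is a sketch using the nearest-rank percentile definition, which may differ from how the original graphs were computed; any input data is up to the caller.

```python
import math
from collections import Counter


def modal_day(days):
    """Most common time-to-fix in days (ties broken by the smaller day)."""
    counts = Counter(days)
    return min(counts, key=lambda d: (-counts[d], d))


def fraction_fixed_within(days, limit):
    """Fraction of fixes made within `limit` days of the buggy commit."""
    return sum(1 for d in days if d <= limit) / len(days)


def days_to_fix_percent(days, percent):
    """Smallest day count by which at least `percent`% of fixes were made
    (nearest-rank percentile)."""
    ordered = sorted(days)
    index = max(0, math.ceil(len(ordered) * percent / 100) - 1)
    return ordered[index]
```

For example, `fraction_fixed_within(days, 365)` corresponds to the "fixed within a year" figure and `days_to_fix_percent(days, 50)` to the median time to fix.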

Comparing the time to fix bugs in kernel versions v4.0, v4.10 and v4.20, for bugs that are fixed in less than 50 days, we have:


... the trends are similar; however, it is worth noting that more bugs are being found and fixed a little faster in v4.10 and v4.20 than in v4.0. It will be interesting to see how these trends develop over the next few years.