Saturday 27 June 2015

Static code analysis on kernel source

Since 2014 I have been running static code analysis tools such as cppcheck and smatch against the Linux kernel source on a regular basis to catch bugs that creep into the kernel.  After each cppcheck run I diff the logs to get a list of deltas on the error and warning messages, and I periodically review these, filter out the false positives and end up with a list of bugs that need some attention.
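
By way of illustration, a typical scan and comparison looks something like this (the kernel versions, log names and job count here are just examples):

# Scan the kernel tree; cppcheck writes its findings to stderr, so
# redirect them to a log.  --force checks all #ifdef configurations.
cppcheck --force --enable=warning,portability -j 8 linux-4.1/ 2> cppcheck-4.1.log
# Diff against the previous run's log to find the new deltas to review.
diff cppcheck-4.0.log cppcheck-4.1.log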

Bugs such as unchecked NULL returns from allocations, memory leaks, duplicate memory frees and uninitialized variables are easy to find with static analyzers and generally just require one or two line fixes.

So what are the overall trends like?

Warnings and error messages from cppcheck have been dropping over time, while "portability warnings" have been steadily increasing.  These come mainly from arithmetic on void * pointers, which GCC handles as byte-sized arithmetic but which is not legal ISO C.  Note that there is some variation in the results as I use the latest versions of cppcheck; occasionally a new version finds a lot of false positives which then get fixed in later releases.
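
As a quick demonstration of that class of warning, cppcheck's portability checks will flag void pointer arithmetic even in a trivial test case (the file name here is just an example):

# GCC accepts arithmetic on void * as an extension (it treats the
# pointee as byte sized), but this is not legal ISO C:
cat << EOF > void-arith.c
void *next(void *p) { return p + 1; }
EOF
cppcheck --enable=portability void-arith.c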

Comparing this to the growth in kernel size, the overall downward trend in cppcheck warnings and errors isn't so bad considering the kernel has grown by nearly 11% over the time I have been running the static analysis.

[Graph: kernel source growth over time]
Since each warning or error reported has to be carefully scrutinized to determine if it is a false positive (and this takes a lot of effort and time), I've not yet been able to determine the exact false positive rates for these stats.  Compared to the actual lines of code, cppcheck is finding ~1 error per 15K lines of source.

It would be interesting to run this analysis with commercial static analyzers such as Coverity and see how the stats compare.  As it stands, cppcheck is doing its bit in detecting errors and helping engineers to improve code quality.

Friday 19 June 2015

Powerstat and thermal zones

Last night I was mulling over an overheating laptop issue reported by a user; it turned out to be fluff and dust clogging up the fan rather than the intel_pstate driver being broken.

While it is a relief that the kernel driver is not at fault, it still bothered me that this kind of issue should have been very simple to diagnose, yet I overlooked the obvious.  When solving these issues it is very easy to suspect the complex part of a system (e.g. a kernel driver) rather than the simpler part (e.g. a fan not working efficiently).  Normally, I try to apply Occam's Razor, which in its simplest form can be phrased as:

"when you have two competing theories that make exactly the same predictions, the simpler one is the better."

..e.g. in this case, the fan is clogged up.

Fortunately, laptops invariably provide thermal zone information that can be monitored, so one can correlate CPU activity with the temperature of various components of a laptop.  So last night I added thermal zone sampling to powerstat 0.02.00, enabled with the new -t option.
 
powerstat -tfR 0.5
Running for 60.0 seconds (120 samples at 0.5 second intervals).
Power measurements will start in 0 seconds time.

  Time    User  Nice   Sys  Idle    IO  Run Ctxt/s  IRQ/s  Watts x86_pk acpitz  CPU Freq
11:13:15   5.1   0.0   2.1  92.8   0.0    1   7902   1152   7.97  62.00  63.00  1.93 GHz  
11:13:16   3.9   0.0   2.5  93.1   0.5    1   7168    960   7.64  63.00  63.00  2.73 GHz  
11:13:16   1.0   0.0   2.0  96.9   0.0    1   7014    950   7.20  63.00  63.00  2.61 GHz  
11:13:17   2.0   0.0   3.0  94.5   0.5    1   6950    960   6.76  64.00  63.00  2.60 GHz  
11:13:17   3.0   0.0   3.0  93.9   0.0    1   6738    994   6.21  63.00  63.00  1.68 GHz  
11:13:18   3.5   0.0   2.5  93.6   0.5    1   6976    948   7.08  64.00  63.00  2.29 GHz  
....  

..the -t option now shows x86_pk (x86 CPU package temperature) and acpitz (ACPI thermal zone) temperature readings in degrees Celsius.
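
These readings come from the kernel's thermal zone interface under /sys/class/thermal, so one can also eyeball the raw values directly (zone names and values will vary from machine to machine):

# Print each zone's name and temperature; sysfs reports the
# temperature in millidegrees Celsius.
for z in /sys/class/thermal/thermal_zone*; do
    echo "$(cat $z/type): $(( $(cat $z/temp) / 1000 )) C"
done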

Now this is where the fun begins.  I ran powerstat for 60 seconds at 2 samples per second and then imported the data into LibreOffice.  To easily show correlations between CPU load, power consumption, temperature and CPU frequency I normalized the data so that the lowest values were 0.0 and the highest were 1.0, and produced the following graph:

One can see that the CPU frequency (green) scales with the CPU load (blue), and so does the CPU power (orange).   CPU temperature (yellow) jumps up quickly when the CPU is loaded and then steadily increases.  Meanwhile, the ACPI thermal zone reading (purple) trails the CPU load because the machine takes time to warm up and cool down (the fan needs time to pump the heat out of the machine).
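
For reference, the normalization is simple min-max scaling; here is a minimal sketch in awk, assuming the samples have been exported to a hypothetical data.csv with the value of interest in column 2:

# Min-max normalise column 2 to the range 0..1: two passes over the
# file, the first finds min/max, the second applies (x - min) / (max - min).
awk -F, 'NR == FNR { if (min == "" || $2 < min) min = $2;
                     if (max == "" || $2 > max) max = $2; next }
         { printf "%.3f\n", ($2 - min) / (max - min) }' data.csv data.csv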

So, next time a laptop runs hot, running powerstat will capture the activity, and correlating temperature with CPU activity should allow one to see whether the overheating is related to a real CPU frequency scaling issue or to a clogged up fan (or a broken heat pipe!).

Thursday 18 June 2015

Snooping on I/O using iosnoop

A while ago I blogged about Brendan Gregg's excellent book on tracking down performance issues, "Systems Performance: Enterprise and the Cloud".   Brendan has also produced a useful I/O diagnostic bash script, iosnoop, that uses ftrace to gather block device I/O events in real time.

The following example snoops on I/O for 1 second:
$ sudo iosnoop 1
Tracing block I/O for 1 seconds (buffered)...
COMM             PID    TYPE DEV      BLOCK        BYTES     LATms
kworker/u16:2    650    W    8,0      441077032    28672      1.46
kworker/u16:2    650    W    8,0      441077024    4096       1.45
kworker/u16:2    650    W    8,0      364810624    462848     1.35
kworker/u16:2    650    W    8,0      364810240    69632      1.34

And the next example snoops and shows start and end time stamps:
$ sudo iosnoop -ts  
Tracing block I/O. Ctrl-C to end.  
STARTs        ENDs          COMM           PID  TYPE DEV   BLOCK    BYTES   LATms  
35253.062020  35253.063148  jbd2/sda1-211  211  WS   8,0   29737200   53248   1.13  
35253.063210  35253.063261  jbd2/sda1-211  211  FWS  8,0   18446744073709551615 0     0.05  
35253.063282  35253.063616  <idle>         0    WS   8,0   29737304   4096    0.33  
35253.063650  35253.063688  gawk           551  FWS  8,0   18446744073709551615 0     0.04  
35253.766711  35253.767158  kworker/u16:0  305  W    8,0   433580264  4096    0.45  
35253.766778  35253.767258  kworker/0:1H   321  FWS  8,0   18446744073709551615 0     0.48  
35253.767289  35253.767635  <idle>         0    WS   8,0   273358464  4096    0.35  
35253.767309  35253.767654  <idle>         0    W    8,0   118371312  4096    0.35  
35253.767648  35253.767741  <idle>         0    FWS  8,0   18446744073709551615 0     0.09  
^C  
Ending tracing...  
One needs to run the tool as root as it uses ftrace. There is a selection of filtering options, such as showing I/O from a specific device, I/O of a specific type, or I/O from a specific PID or process name; iosnoop can also display I/O completion times, start times and queue-insertion start times.
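For example, a few of the filtering options look like this (flag spellings as per the perf-tools version I'm using, so do check the man page):
sudo iosnoop -d 8,0        # only trace device 8,0
sudo iosnoop -p 650        # only trace PID 650
sudo iosnoop -n kworker    # only trace processes whose name matches "kworker"
sudo iosnoop -Q            # use queue insertion as the I/O start time
On Ubuntu, iosnoop can be installed using: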
sudo apt-get install perf-tools-unstable
A useful I/O analysis tool indeed. For more details, install the tool and read the iosnoop man page.

Thursday 11 June 2015

Adding CPU states and CPU frequency stats to powerstat

During some spare moments I've added a couple of minor CPU-related enhancements to powerstat.    The new -c option gathers CPU C-state activity over the run and shows a summary at the end, for example:
 C-State  Resident   Count Latency
 C7-IVB   75.239%   102315      87  
 C6-IVB    0.004%       60      80  
 C3-IVB    0.138%     2892      59  
 C1E-IVB   1.150%     7599      10  
 C1-IVB    0.948%     4611       1  
 POLL      0.000%        3       0  
 C0       22.521%
The above example shows that my Ivy Bridge i5-3210M spent ~75% of the time in the deepest C7 sleep state and ~22.5% of the time in the fully operating C0 state.
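
The underlying counters come from the kernel's cpuidle interface in sysfs, so the raw per-state data can also be inspected directly; for example, for CPU 0 (state names and numbers depend on the processor):

# Show each C-state's name, exit latency (us), entry count and total
# residency (us) for CPU 0.
for s in /sys/devices/system/cpu/cpu0/cpuidle/state*; do
    echo "$(cat $s/name): latency=$(cat $s/latency) usage=$(cat $s/usage) time=$(cat $s/time)"
done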

A new -f option gathers CPU frequency statistics across all the on-line CPUs and displays the average.   This is an "instantaneous" view of the current CPU frequencies rather than an average over the sample interval, so beware that just gathering statistics using powerstat causes some CPU activity, which of course can change the CPU frequency.
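
The per-CPU frequencies are exposed by the cpufreq interface in sysfs (in kHz), which is a quick way to see the instantaneous values behind such an average (whether powerstat reads exactly this file is an implementation detail):

# Current cpufreq frequency (in kHz) of each on-line CPU.
grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq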

For a simple test, I ran powerstat over a 250 second run and normalized the CPU core power, CPU load and CPU frequency stats so that the data ranges are 0..1, allowing all three stats to be plotted and easily compared:

One can easily see the correlation between CPU Frequency, CPU Load and CPU core power consumed just from the powerstat data.

Powerstat tries to be as lightweight and as small as possible to minimize its impact on system behaviour.  My hope is that these extra CPU instrumentation features add useful functionality without significantly increasing that impact.  I've instrumented powerstat with perf and I believe that the overhead is sufficiently small to justify the changes.
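
A simple way to sanity check that kind of overhead claim is to run the tool under perf stat; the powerstat options below are just an example:

# Count the cycles and context switches powerstat itself consumes
# while sampling C-states and frequencies twice a second for 60 seconds.
sudo perf stat -e cycles,context-switches powerstat -cf 0.5 120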

These two new features will be landing in powerstat 0.01.40 in Ubuntu Wily.