Sunday, 31 May 2015

Adding perf stats to stress-ng

Over the last few weeks I have been toying with the idea of adding more performance monitoring to stress-ng so one can see how much a stress test impacts on the CPU. The obvious choice to get such low level data is via Linux perf events using perf_event_open(2).

The man page for perf_event_open() provides plenty of information to get perf working from userspace, however, I was a bit stumped when I used several hardware perf events and then ran out of hardware Perf Monitoring Units (PMUs) resulting in some strange event counter readings. I discovered that when one runs out of PMUs, perf will multiplex event counting and so the perf counters need to be scaled by multiplying by PERF_FORMAT_TOTAL_TIME_ENABLED and divided by PERF_FORMAT_TOTAL_TIME_RUNNING.

Once I had figured this out, it was relatively plain sailing to get perf working in stress-ng.  So stress-ng V0.04.04 now supports the --perf option that just enables perf monitoring on each stress test being run, it is as simple as that. For multiple instances of a stress test, stress-ng will sum all the perf counters of each processes running the stress-test to provide an overall total.

The following example will run the stress-ng cache stress test.  The first run enables cache flushing and so fetches of data will cause cache misses.  The second run has no cache flushing and hence has far lower cache miss rate.


Note how the cache-flushing not only causes a far higher cache miss rate, but also reduces the effective number of instructions per cycle being executed and hence reduces the throughput (as one would expect).  With cache-flushing enabled I was seeing only 17.53 bogo ops per second compared to the 35.97 bogo ops per second with cache-flushing disabled.

The perf stats are enlightening. I still find it incredible that my laptop has so much computing power.  Some of the more compute bound stressors (such as the stress-ng bitops cpu stressor) are hitting over 20 billion instructions per second on my machine, which is rather impressive.  It seems that gcc optimization and the x86 superscaler micro-ops are working efficiently with some of these stress tests.

My hope is that the integrated perf monitoring in stress-ng will be instructive when comparing results on different processor architectures across the range of stress-ng stress tests.

No comments:

Post a Comment