Stress-ng provides a "bogo-ops" mechanism to measure a "unit of operation", normally this is just a count of the number of operations performed in a unit of time, hence allowing us to compare the relative performance of each stress method when compiled with different versions of GCC. Running each stress method for a relatively long time (a few minutes) on an idle machine allows us to get a fairly stable and accurate measurement of bogo-ops per second. Tests were run on a Lenovo x230 with an i5-3210M CPU.

The first chart below shows the relative improvement in bogo-ops per second between the two versions of GCC. A value of

*n*indicates GCC 5.1.1 is

*n*times faster in terms of bogo-ops per second than GCC 4.9.1, hence values less than 1.0 show that GCC 5.1.1 has regressed in performance.

It appears that int64, int32, int16, int8 and rand show some remarkable improvements with GCC 5.1.1; these all perform various integer operations (add, subtract, multiply, divide, xor, and, or, shift).

In contrast, hamming, hanoi, parity and sieve show degraded performance with GCC 5.1.1. Hanoi just exercises recursion of a function with a few arguments and some memory load/stores. Hamming, parity and sieve exercise bit twiddling operations and memory load/stores.

Further to just measuring computation, I used the Intel RAPL CPU package power measurements (using powerstat) to next measure the power consumed and then compute bogo ops per Watt for stress-ng built with GCC 4.9.1 and 5.1.1. I then compared the relative improvement of 5.1.1 compared to 4.9.1:

The chart above shows the same kind of characteristics as the first chart, but in terms of computational improvement per Watt. Note that there are even better improvements in relative terms for the integer and rand CPU stress methods. For example, the rand stress method shows a 1.6 x improvement in terms of computation per second and a 2.1 x improvement in terms of computation per Watt comparing GCC 4.9.1 with 5.1.1.

It seems that benchmarking performance in terms of just compute improvements really should take into consideration the power consumption too to get a better idea of how compiler optimization improvements. Compute-per-watt rather than compute-per-second should perhaps be the preferred benchmark in the modern high-density compute farms.

Of course, these comparisons are just with one specific x86 micro-architecture, so one would expect different results for different x86 CPUs.. I guess that is for another weekend to test if I get time.

## No comments:

## Post a Comment