As a simple experiment, I thought it would be interesting to investigate stress-ng compiled with GCC 4.9.1 and GCC 5.1.1 in terms of computational improvement and power consumption on various CPU stress methods. The stress-ng CPU stress test contains various mixes of integer, floating point, bit and logic operations that can be used to load the processor, so it makes a useful test to see how well the code gets optimized with GCC.
Stress-ng provides a "bogo-ops" mechanism to measure a "unit of operation". Normally this is just a count of the number of operations performed in a unit of time, which allows us to compare the relative performance of each stress method when compiled with different versions of GCC. Running each stress method for a relatively long time (a few minutes) on an idle machine gives a fairly stable and accurate measurement of bogo-ops per second. Tests were run on a Lenovo x230 with an i5-3210M CPU.
The first chart below shows the relative improvement in bogo-ops per second between the two versions of GCC. A value of n indicates GCC 5.1.1 is n times faster in terms of bogo-ops per second than GCC 4.9.1, hence values less than 1.0 show that GCC 5.1.1 has regressed in performance.
In contrast, hamming, hanoi, parity and sieve show degraded performance with GCC 5.1.1. Hanoi just exercises recursion of a function with a few arguments and some memory load/stores. Hamming, parity and sieve exercise bit twiddling operations and memory load/stores.
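The ratio plotted in that first chart is simple to compute from two bogo-ops per second figures. Here is a minimal sketch in Python; the function name, the method names and the numbers are all placeholders made up for illustration, not the measurements from these runs.

```python
def relative_performance(new_rate, old_rate):
    """Return how many times faster the new build is than the old one.

    Both arguments are bogo-ops per second for the same stress method.
    A result below 1.0 means the new build has regressed.
    """
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return new_rate / old_rate


# Placeholder figures (NOT measured data): method -> (old build, new build)
example_rates = {
    "method_a": (1000.0, 1200.0),
    "method_b": (500.0, 450.0),
}

ratios = {
    name: relative_performance(new, old)
    for name, (old, new) in example_rates.items()
}

# Methods where the newer compiler produced slower code
regressed = sorted(name for name, ratio in ratios.items() if ratio < 1.0)

print(ratios)
print("regressed:", regressed)
```

With real data, the placeholder dictionary would be replaced by the bogo-ops per second that stress-ng reports for each CPU stress method under each compiler build.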
Going beyond just measuring computation, I used the Intel RAPL CPU package power measurements (via powerstat) to measure the power consumed, and then computed bogo-ops per Watt for stress-ng built with GCC 4.9.1 and 5.1.1. I then compared the relative improvement of 5.1.1 over 4.9.1:
It seems that benchmarking compiler performance purely in terms of compute improvements should really take power consumption into consideration too, to get a better idea of how much compiler optimizations actually help. Compute-per-watt rather than compute-per-second should perhaps be the preferred benchmark in modern high-density compute farms.
Of course, these comparisons are for just one specific x86 micro-architecture, so one would expect different results on different x86 CPUs. I guess that is for another weekend to test, if I get time.
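To make the compute-per-watt comparison concrete, here is a rough sketch in Python. It assumes "bogo-ops per Watt" means bogo-ops per second divided by the average CPU package power over the run; the function names and the numbers are hypothetical and only there to show how a build can be faster and still less efficient.

```python
def bogo_ops_per_watt(bogo_ops_per_sec, avg_watts):
    """Throughput divided by average package power (assumed definition)."""
    if avg_watts <= 0:
        raise ValueError("avg_watts must be positive")
    return bogo_ops_per_sec / avg_watts


def relative_efficiency(new_build, old_build):
    """Ratio of bogo-ops per Watt, new build over old build.

    Each argument is a (bogo_ops_per_sec, avg_watts) tuple.
    """
    return bogo_ops_per_watt(*new_build) / bogo_ops_per_watt(*old_build)


# Placeholder figures (NOT measured data)
old_build = (1000.0, 20.0)  # 1000 bogo-ops/s at 20 W
new_build = (1100.0, 25.0)  # 1100 bogo-ops/s at 25 W

speed_ratio = new_build[0] / old_build[0]
efficiency_ratio = relative_efficiency(new_build, old_build)

print(f"speed ratio:      {speed_ratio:.2f}")
print(f"efficiency ratio: {efficiency_ratio:.2f}")
```

In this made-up case the newer build does 10% more work per second but draws 25% more power, so its bogo-ops per Watt ratio falls below 1.0 even though its compute-per-second ratio is above it.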