power-calibrate is a simple tool that hacked up to perform some synthetic loading of the processor, gather the RAPL and CPU stats and using simple linear regression to compute some power related metrics.

In the example below, I run power-calibrate on an Intel i5-3210M (2 Cores, 4 threads) with each test run taking 10 seconds (-r 10), using the RAPL interface to measure power and gathering 11 samples on CPU threads 1..4:

power-calibrate -r 10 -R -s 11 CPU load User Sys Idle Run Ctxt/s IRQ/s Ops/s Cycl/s Inst/s Watts 0% x 1 0.1 0.1 99.8 1.0 181.6 61.1 0.0 2.5K 380.2 2.485 0% x 2 0.0 1.0 98.9 1.2 161.8 63.8 0.0 5.7K 0.8K 2.366 0% x 3 0.1 1.3 98.5 1.1 204.2 75.2 0.0 7.6K 1.9K 2.518 0% x 4 0.1 0.1 99.9 1.0 124.7 44.9 0.0 11.4K 2.7K 2.167 10% x 1 2.4 0.2 97.4 1.5 203.8 104.9 21.3M 123.1M 297.8M 2.636 10% x 2 5.1 0.0 94.9 1.3 185.0 137.1 42.0M 243.0M 0.6B 2.754 10% x 3 7.5 0.2 92.3 1.2 275.3 190.3 58.1M 386.9M 0.8B 3.058 10% x 4 10.0 0.1 89.9 1.9 213.5 206.1 64.5M 486.1M 0.9B 2.826 20% x 1 5.0 0.1 94.9 1.0 288.8 170.0 69.6M 403.0M 1.0B 3.283 20% x 2 10.0 0.1 89.9 1.6 310.2 248.7 96.4M 0.8B 1.3B 3.248 20% x 3 14.6 0.4 85.0 1.7 640.8 450.4 238.9M 1.7B 3.3B 5.234 20% x 4 20.0 0.2 79.8 2.1 633.4 514.6 270.5M 2.1B 3.8B 4.736 30% x 1 7.5 0.2 92.3 1.4 444.3 278.7 149.9M 0.9B 2.1B 4.631 30% x 2 14.8 1.2 84.0 1.2 541.5 418.1 200.4M 1.7B 2.8B 4.617 30% x 3 22.6 1.5 75.9 2.2 960.9 694.3 365.8M 2.6B 5.1B 7.080 30% x 4 30.0 0.2 69.8 2.4 959.2 774.8 421.1M 3.4B 5.9B 5.940 40% x 1 9.7 0.3 90.0 1.7 551.6 356.8 201.6M 1.2B 2.8B 5.498 40% x 2 19.9 0.3 79.8 1.4 668.0 539.4 288.0M 2.4B 4.0B 5.604 40% x 3 29.8 0.5 69.7 1.8 1124.5 851.8 481.4M 3.5B 6.7B 7.918 40% x 4 40.3 0.5 59.2 2.3 1186.4 1006.7 0.6B 4.6B 7.7B 6.982 50% x 1 12.1 0.4 87.4 1.7 536.4 378.6 193.1M 1.1B 2.7B 4.793 50% x 2 24.4 0.4 75.2 2.2 816.2 668.2 362.6M 3.0B 5.1B 6.493 50% x 3 35.8 0.5 63.7 3.1 1300.2 1004.6 0.6B 4.2B 8.2B 8.800 50% x 4 49.4 0.7 49.9 3.8 1455.2 1240.0 0.7B 5.7B 9.6B 8.130 60% x 1 14.5 0.4 85.1 1.8 735.0 502.7 295.7M 1.7B 4.1B 6.927 60% x 2 29.4 1.3 69.4 2.0 917.5 759.4 397.2M 3.3B 5.6B 6.791 60% x 3 44.1 1.7 54.2 3.1 1615.4 1243.6 0.7B 5.1B 9.9B 10.056 60% x 4 58.5 0.7 40.8 4.0 1728.1 1456.6 0.8B 6.8B 11.5B 9.226 70% x 1 16.8 0.3 82.9 1.9 841.8 579.5 349.3M 2.0B 4.9B 7.856 70% x 2 34.1 0.8 65.0 2.8 966.0 845.2 439.4M 3.7B 6.2B 6.800 70% x 3 49.7 0.5 49.8 3.5 1834.5 1401.2 0.8B 5.9B 11.8B 11.113 70% x 4 68.1 0.6 31.4 4.7 1771.3 1572.3 0.8B 7.0B 11.8B 8.809 80% x 1 18.9 0.4 80.7 1.9 871.9 613.0 357.1M 2.1B 5.0B 7.276 80% x 2 38.6 0.3 61.0 2.8 1268.6 1029.0 0.6B 4.8B 8.2B 9.253 80% x 3 58.8 0.3 40.8 3.5 2061.7 1623.3 1.0B 6.8B 13.6B 11.967 80% x 4 78.6 0.5 20.9 4.0 2356.3 1983.7 1.1B 9.0B 16.0B 12.047 90% x 1 21.8 0.3 78.0 2.0 1054.5 737.9 459.3M 2.6B 6.4B 9.613 90% x 2 44.2 1.2 54.7 2.7 1439.5 1174.7 0.7B 5.4B 9.2B 10.001 90% x 3 66.2 1.4 32.4 3.9 2326.2 1822.3 1.1B 7.6B 15.0B 12.579 90% x 4 88.5 0.2 11.4 4.8 2627.8 2219.1 1.3B 10.2B 17.8B 12.832 100% x 1 25.1 0.0 74.8 2.0 135.8 314.0 0.5B 3.1B 7.5B 10.278 100% x 2 50.0 0.0 50.0 3.0 91.9 560.4 0.7B 6.2B 10.4B 10.470 100% x 3 75.1 0.1 24.8 4.0 120.2 824.1 1.2B 8.7B 16.8B 13.028 100% x 4 100.0 0.0 0.0 5.0 76.8 1054.8 1.4B 11.6B 19.5B 13.156 For 4 CPUs (of a 4 CPU system): Power (Watts) = (% CPU load * 1.176217e-01) + 3.461561 1% CPU load is about 117.62 mW Coefficient of determination R^2 = 0.809961 (good) Energy (Watt-seconds) = (bogo op * 8.465141e-09) + 3.201355 1 bogo op is about 8.47 nWs Coefficient of determination R^2 = 0.911274 (strong) Energy (Watt-seconds) = (CPU cycle * 1.026249e-09) + 3.542463 1 CPU cycle is about 1.03 nWs Coefficient of determination R^2 = 0.841894 (good) Energy (Watt-seconds) = (CPU instruction * 6.044204e-10) + 3.201433 1 CPU instruction is about 0.60 nWs Coefficient of determination R^2 = 0.911272 (strong)

The results at the end are estimates based on the gathered samples. The samples are compared to the computed linear regression coefficients using the coefficient of determination (R^2); a value of 1 is a perfect linear fit, less than 1 a poorer fit.

For more accurate results, increase the run time (-r option) and also increase the number of samples (-s option).

Power-calibrate is available in Ubuntu Wily 15.10. It is just an academic toy for getting some power estimates and may be useful to compare compute vs power metrics across different x86 CPUs. I've not been able to verify how accurate it really is, so I am interested to see how this works across a range of systems.

Hi Colin,

ReplyDeleteMany thanks for the tool, looks very nice!

I'm studying your code in the hope of learning a bit more about how CPU utilization is tracked by the Linux kernel and how it is calculated by your tool. Here are a couple of questions, I'm for sure missing something about the concepts behind the calculations.

It seems that CPU utilization (user, sys, etc.) is computed as: current - past value of CPU time, where the value is got from the /proc/stat fields. I can see from the man that /proc/stat reports

"the amount of time, measured in units of USER_HZ (1/100ths of a second on most architectures, use sysconf(_SC_CLK_TCK) to obtain the right value), that the system spent in various states".

Questions:

1) In your code, I'm not able to find where you normalize the CPU times from /proc/stat with the sysconf(_SC_CLK_TCK) call. It looks like you simply subtract current and past value, then you print it.

What am I missing here?

2) This is a more general question about the correctness of this approach of calculating CPU utilization, which is AFAIK the standard approach also adopted by sar, vmstat, etc.

What about variable frequency CPUs? Is this CPU utilization metric still correct?

I can't see how it could be right if we don't take into account the actual frequency of the CPU: the amount of work would be very different at say 1.3 GHZ wrt 2 GHZ. Is the kernel doing special things to keep track of accurate CPU time in this case?

That's it! Thanks again for your time and your interesting tools!

Good questions. The clock tick rate is inferred from the total number of ticks counted during the sample interval from the CPU user, nice, sys, idle and iowait ticks. Since this theoretically *should* add up to the total ticks during the sample period we don't have to rely on determining the ticks from sysconf() and the total ticks allow us to compute the % share over that (possibly variable) time period too. If I was to divide by the sysconf() tick rate I need to convert that into the exact number of ticks over the sample duration, which is more work and can have jitter of +/- 1 or so ticks of we quantize the sample at the wrong time.

Delete