power-calibrate is a simple tool that I hacked up to apply a synthetic load to the processor, gather the RAPL and CPU statistics, and use simple linear regression to compute some power-related metrics.
In the example below, I run power-calibrate on an Intel i5-3210M (2 cores, 4 threads), with each test run taking 10 seconds (-r 10), using the RAPL interface to measure power (-R) and gathering 11 samples (-s 11) on CPU threads 1..4:
power-calibrate -r 10 -R -s 11
CPU load  User   Sys  Idle  Run  Ctxt/s   IRQ/s   Ops/s  Cycl/s  Inst/s   Watts
0% x 1     0.1   0.1  99.8  1.0   181.6    61.1     0.0    2.5K   380.2   2.485
0% x 2     0.0   1.0  98.9  1.2   161.8    63.8     0.0    5.7K    0.8K   2.366
0% x 3     0.1   1.3  98.5  1.1   204.2    75.2     0.0    7.6K    1.9K   2.518
0% x 4     0.1   0.1  99.9  1.0   124.7    44.9     0.0   11.4K    2.7K   2.167
10% x 1    2.4   0.2  97.4  1.5   203.8   104.9   21.3M  123.1M  297.8M   2.636
10% x 2    5.1   0.0  94.9  1.3   185.0   137.1   42.0M  243.0M    0.6B   2.754
10% x 3    7.5   0.2  92.3  1.2   275.3   190.3   58.1M  386.9M    0.8B   3.058
10% x 4   10.0   0.1  89.9  1.9   213.5   206.1   64.5M  486.1M    0.9B   2.826
20% x 1    5.0   0.1  94.9  1.0   288.8   170.0   69.6M  403.0M    1.0B   3.283
20% x 2   10.0   0.1  89.9  1.6   310.2   248.7   96.4M    0.8B    1.3B   3.248
20% x 3   14.6   0.4  85.0  1.7   640.8   450.4  238.9M    1.7B    3.3B   5.234
20% x 4   20.0   0.2  79.8  2.1   633.4   514.6  270.5M    2.1B    3.8B   4.736
30% x 1    7.5   0.2  92.3  1.4   444.3   278.7  149.9M    0.9B    2.1B   4.631
30% x 2   14.8   1.2  84.0  1.2   541.5   418.1  200.4M    1.7B    2.8B   4.617
30% x 3   22.6   1.5  75.9  2.2   960.9   694.3  365.8M    2.6B    5.1B   7.080
30% x 4   30.0   0.2  69.8  2.4   959.2   774.8  421.1M    3.4B    5.9B   5.940
40% x 1    9.7   0.3  90.0  1.7   551.6   356.8  201.6M    1.2B    2.8B   5.498
40% x 2   19.9   0.3  79.8  1.4   668.0   539.4  288.0M    2.4B    4.0B   5.604
40% x 3   29.8   0.5  69.7  1.8  1124.5   851.8  481.4M    3.5B    6.7B   7.918
40% x 4   40.3   0.5  59.2  2.3  1186.4  1006.7    0.6B    4.6B    7.7B   6.982
50% x 1   12.1   0.4  87.4  1.7   536.4   378.6  193.1M    1.1B    2.7B   4.793
50% x 2   24.4   0.4  75.2  2.2   816.2   668.2  362.6M    3.0B    5.1B   6.493
50% x 3   35.8   0.5  63.7  3.1  1300.2  1004.6    0.6B    4.2B    8.2B   8.800
50% x 4   49.4   0.7  49.9  3.8  1455.2  1240.0    0.7B    5.7B    9.6B   8.130
60% x 1   14.5   0.4  85.1  1.8   735.0   502.7  295.7M    1.7B    4.1B   6.927
60% x 2   29.4   1.3  69.4  2.0   917.5   759.4  397.2M    3.3B    5.6B   6.791
60% x 3   44.1   1.7  54.2  3.1  1615.4  1243.6    0.7B    5.1B    9.9B  10.056
60% x 4   58.5   0.7  40.8  4.0  1728.1  1456.6    0.8B    6.8B   11.5B   9.226
70% x 1   16.8   0.3  82.9  1.9   841.8   579.5  349.3M    2.0B    4.9B   7.856
70% x 2   34.1   0.8  65.0  2.8   966.0   845.2  439.4M    3.7B    6.2B   6.800
70% x 3   49.7   0.5  49.8  3.5  1834.5  1401.2    0.8B    5.9B   11.8B  11.113
70% x 4   68.1   0.6  31.4  4.7  1771.3  1572.3    0.8B    7.0B   11.8B   8.809
80% x 1   18.9   0.4  80.7  1.9   871.9   613.0  357.1M    2.1B    5.0B   7.276
80% x 2   38.6   0.3  61.0  2.8  1268.6  1029.0    0.6B    4.8B    8.2B   9.253
80% x 3   58.8   0.3  40.8  3.5  2061.7  1623.3    1.0B    6.8B   13.6B  11.967
80% x 4   78.6   0.5  20.9  4.0  2356.3  1983.7    1.1B    9.0B   16.0B  12.047
90% x 1   21.8   0.3  78.0  2.0  1054.5   737.9  459.3M    2.6B    6.4B   9.613
90% x 2   44.2   1.2  54.7  2.7  1439.5  1174.7    0.7B    5.4B    9.2B  10.001
90% x 3   66.2   1.4  32.4  3.9  2326.2  1822.3    1.1B    7.6B   15.0B  12.579
90% x 4   88.5   0.2  11.4  4.8  2627.8  2219.1    1.3B   10.2B   17.8B  12.832
100% x 1  25.1   0.0  74.8  2.0   135.8   314.0    0.5B    3.1B    7.5B  10.278
100% x 2  50.0   0.0  50.0  3.0    91.9   560.4    0.7B    6.2B   10.4B  10.470
100% x 3  75.1   0.1  24.8  4.0   120.2   824.1    1.2B    8.7B   16.8B  13.028
100% x 4 100.0   0.0   0.0  5.0    76.8  1054.8    1.4B   11.6B   19.5B  13.156
For 4 CPUs (of a 4 CPU system):
Power (Watts) = (% CPU load * 1.176217e-01) + 3.461561
1% CPU load is about 117.62 mW
Coefficient of determination R^2 = 0.809961 (good)
Energy (Watt-seconds) = (bogo op * 8.465141e-09) + 3.201355
1 bogo op is about 8.47 nWs
Coefficient of determination R^2 = 0.911274 (strong)
Energy (Watt-seconds) = (CPU cycle * 1.026249e-09) + 3.542463
1 CPU cycle is about 1.03 nWs
Coefficient of determination R^2 = 0.841894 (good)
Energy (Watt-seconds) = (CPU instruction * 6.044204e-10) + 3.201433
1 CPU instruction is about 0.60 nWs
Coefficient of determination R^2 = 0.911272 (strong)
The results at the end are estimates based on the gathered samples. The samples are compared with the fitted regression line using the coefficient of determination (R^2): a value of 1 is a perfect linear fit, and values further below 1 indicate a poorer fit.
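To make the regression step concrete, here is a minimal Python sketch of an ordinary least-squares fit and the R^2 calculation. It is not power-calibrate's actual source: the function names `linear_fit` and `r_squared` are made up for illustration, and the input is an assumption too. It uses only the eleven "x 4" rows from the table above, with the nominal load percentage as x and the Watts column as y, so the coefficients it prints need not match the figures reported by the tool.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Assumed input: nominal load and measured Watts from the "x 4" rows only.
loads = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
watts = [2.167, 2.826, 4.736, 5.940, 6.982, 8.130,
         9.226, 8.809, 12.047, 12.832, 13.156]

slope, intercept = linear_fit(loads, watts)
r2 = r_squared(loads, watts, slope, intercept)

print(f"Power (Watts) = (% CPU load * {slope:.6e}) + {intercept:.6f}")
print(f"Coefficient of determination R^2 = {r2:.6f}")
```

The slope is the estimated cost in Watts of each extra 1% of CPU load, and the intercept is the estimated power draw at zero load.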
For more accurate results, increase the run time (-r option) and also increase the number of samples (-s option).
power-calibrate is available in Ubuntu Wily 15.10. It is just an academic toy for getting some power estimates, and it may be useful for comparing compute vs power metrics across different x86 CPUs. I've not been able to verify how accurate it really is, so I am interested to see how it works across a range of systems.
Hi Colin,
Many thanks for the tool, looks very nice!
I'm studying your code in the hope of learning a bit more about how CPU utilization is tracked by the Linux kernel and how it is calculated by your tool. Here are a couple of questions; I'm surely missing something about the concepts behind the calculations.
It seems that CPU utilization (user, sys, etc.) is computed as the current value minus the past value of CPU time, where the values are read from the /proc/stat fields. I can see from the man page that /proc/stat reports
"the amount of time, measured in units of USER_HZ (1/100ths of a second on most architectures, use sysconf(_SC_CLK_TCK) to obtain the right value), that the system spent in various states".
Questions:
1) In your code, I'm not able to find where you normalize the CPU times from /proc/stat with the sysconf(_SC_CLK_TCK) call. It looks like you simply subtract current and past value, then you print it.
What am I missing here?
2) This is a more general question about the correctness of this approach of calculating CPU utilization, which is AFAIK the standard approach also adopted by sar, vmstat, etc.
What about variable frequency CPUs? Is this CPU utilization metric still correct?
I can't see how it could be right if we don't take into account the actual frequency of the CPU: the amount of work done would be very different at, say, 1.3 GHz compared with 2 GHz. Is the kernel doing special things to keep track of accurate CPU time in this case?
That's it! Thanks again for your time and your interesting tools!
Good questions. The clock tick rate is inferred from the total number of ticks counted during the sample interval, summed over the CPU user, nice, sys, idle and iowait ticks. Since these theoretically *should* add up to the total ticks during the sample period, we don't have to rely on determining the tick rate from sysconf(), and the total ticks allow us to compute the % share over that (possibly variable) time period too. If I were to divide by the sysconf() tick rate, I would need to convert that into the exact number of ticks over the sample duration, which is more work and can have jitter of +/- 1 or so ticks if we quantize the sample at the wrong time.
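The normalization described above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the helper names `parse_cpu_line` and `cpu_percentages` are hypothetical, and the two sample lines are made-up numbers laid out like the first `cpu` line of /proc/stat (user, nice, system, idle, iowait, ...). Each field's tick delta is divided by the sum of all the deltas, so USER_HZ never needs to be known.

```python
FIELDS = ("user", "nice", "sys", "idle", "iowait")


def parse_cpu_line(line):
    """Parse an aggregate 'cpu ...' line from /proc/stat into tick counters."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Keep only the first five counters, in the order /proc/stat lists them.
    return dict(zip(FIELDS, (int(v) for v in parts[1:1 + len(FIELDS)])))


def cpu_percentages(prev, curr):
    """Percentage share of each state between two samples of tick counters."""
    deltas = {name: curr[name] - prev[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        # No ticks elapsed between the samples, so there is nothing to share out.
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Made-up samples; a real run would read /proc/stat twice, some seconds apart.
before = parse_cpu_line("cpu  1000 0 500 8000 100 0 0 0 0 0")
after = parse_cpu_line("cpu  1250 0 550 8650 150 0 0 0 0 0")

for name, pct in cpu_percentages(before, after).items():
    print(f"{name:>6}: {pct:5.1f}%")
```

Because the denominator is whatever number of ticks actually elapsed between the two reads, the shares add up to 100% even when the sample interval is slightly longer or shorter than intended.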