Saturday, 17 October 2015

combining RAPL and perf to do power calibration

A useful feature on modern x86 CPUs is the Running Average Power Limit (RAPL) interface, which allows one to monitor System on Chip (SoC) power consumption.  Combine this data with the ability to accurately measure CPU cycles and instructions via perf and we can get a rough estimate of the energy consumed to perform a single operation on the CPU.
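To make the RAPL side a little more concrete, here is a minimal Python sketch of turning two readings of a cumulative energy counter into an average power figure. It assumes the counters are read through the Linux powercap sysfs files energy_uj and max_energy_range_uj under /sys/class/powercap/intel-rapl:0; that path, and the helper names read_uj, average_watts and sample_package_watts, are my assumptions for illustration and not necessarily how power-calibrate itself gathers the data.

```python
import time
from pathlib import Path

# Assumption: package-level RAPL domain as exposed by Linux powercap sysfs.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(name: str) -> int:
    """Read one integer value (in microjoules) from the assumed RAPL directory."""
    return int((RAPL_DIR / name).read_text())


def average_watts(start_uj: int, end_uj: int, seconds: float,
                  max_range_uj: int) -> float:
    """Average power between two readings of a cumulative energy counter."""
    delta_uj = end_uj - start_uj
    if delta_uj < 0:
        # The counter wrapped around between the two readings; this
        # correction is approximate and assumes it wrapped only once.
        delta_uj += max_range_uj
    # microjoules -> joules, then joules / seconds = watts
    return delta_uj / 1e6 / seconds


def sample_package_watts(seconds: float = 1.0) -> float:
    """Sleep for `seconds` and return the average package power over that time."""
    max_range_uj = read_uj("max_energy_range_uj")
    start_uj = read_uj("energy_uj")
    start_time = time.monotonic()
    time.sleep(seconds)
    end_uj = read_uj("energy_uj")
    elapsed = time.monotonic() - start_time
    return average_watts(start_uj, end_uj, elapsed, max_range_uj)
```

Calling sample_package_watts(10) on a machine that exposes these files should give a figure comparable to the Watts column below; reading energy_uj may need root privileges on some systems.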

power-calibrate is a simple tool that I hacked up to perform some synthetic loading of the processor, gather the RAPL and CPU stats, and use simple linear regression to compute some power-related metrics.

In the example below, I run power-calibrate on an Intel i5-3210M (2 cores, 4 threads) with each test run taking 10 seconds (-r 10), using the RAPL interface to measure power (-R) and gathering 11 samples (-s 11) on 1 to 4 CPU threads:

power-calibrate -r 10 -R  -s 11
  CPU load  User   Sys  Idle  Run  Ctxt/s  IRQ/s  Ops/s Cycl/s Inst/s  Watts
    0% x 1   0.1   0.1  99.8  1.0   181.6   61.1   0.0    2.5K 380.2   2.485
    0% x 2   0.0   1.0  98.9  1.2   161.8   63.8   0.0    5.7K   0.8K  2.366
    0% x 3   0.1   1.3  98.5  1.1   204.2   75.2   0.0    7.6K   1.9K  2.518
    0% x 4   0.1   0.1  99.9  1.0   124.7   44.9   0.0   11.4K   2.7K  2.167
   10% x 1   2.4   0.2  97.4  1.5   203.8  104.9  21.3M 123.1M 297.8M  2.636
   10% x 2   5.1   0.0  94.9  1.3   185.0  137.1  42.0M 243.0M   0.6B  2.754
   10% x 3   7.5   0.2  92.3  1.2   275.3  190.3  58.1M 386.9M   0.8B  3.058
   10% x 4  10.0   0.1  89.9  1.9   213.5  206.1  64.5M 486.1M   0.9B  2.826
   20% x 1   5.0   0.1  94.9  1.0   288.8  170.0  69.6M 403.0M   1.0B  3.283
   20% x 2  10.0   0.1  89.9  1.6   310.2  248.7  96.4M   0.8B   1.3B  3.248
   20% x 3  14.6   0.4  85.0  1.7   640.8  450.4 238.9M   1.7B   3.3B  5.234
   20% x 4  20.0   0.2  79.8  2.1   633.4  514.6 270.5M   2.1B   3.8B  4.736
   30% x 1   7.5   0.2  92.3  1.4   444.3  278.7 149.9M   0.9B   2.1B  4.631
   30% x 2  14.8   1.2  84.0  1.2   541.5  418.1 200.4M   1.7B   2.8B  4.617
   30% x 3  22.6   1.5  75.9  2.2   960.9  694.3 365.8M   2.6B   5.1B  7.080
   30% x 4  30.0   0.2  69.8  2.4   959.2  774.8 421.1M   3.4B   5.9B  5.940
   40% x 1   9.7   0.3  90.0  1.7   551.6  356.8 201.6M   1.2B   2.8B  5.498
   40% x 2  19.9   0.3  79.8  1.4   668.0  539.4 288.0M   2.4B   4.0B  5.604
   40% x 3  29.8   0.5  69.7  1.8  1124.5  851.8 481.4M   3.5B   6.7B  7.918
   40% x 4  40.3   0.5  59.2  2.3  1186.4 1006.7   0.6B   4.6B   7.7B  6.982
   50% x 1  12.1   0.4  87.4  1.7   536.4  378.6 193.1M   1.1B   2.7B  4.793
   50% x 2  24.4   0.4  75.2  2.2   816.2  668.2 362.6M   3.0B   5.1B  6.493
   50% x 3  35.8   0.5  63.7  3.1  1300.2 1004.6   0.6B   4.2B   8.2B  8.800
   50% x 4  49.4   0.7  49.9  3.8  1455.2 1240.0   0.7B   5.7B   9.6B  8.130
   60% x 1  14.5   0.4  85.1  1.8   735.0  502.7 295.7M   1.7B   4.1B  6.927
   60% x 2  29.4   1.3  69.4  2.0   917.5  759.4 397.2M   3.3B   5.6B  6.791
   60% x 3  44.1   1.7  54.2  3.1  1615.4 1243.6   0.7B   5.1B   9.9B 10.056
   60% x 4  58.5   0.7  40.8  4.0  1728.1 1456.6   0.8B   6.8B  11.5B  9.226
   70% x 1  16.8   0.3  82.9  1.9   841.8  579.5 349.3M   2.0B   4.9B  7.856
   70% x 2  34.1   0.8  65.0  2.8   966.0  845.2 439.4M   3.7B   6.2B  6.800
   70% x 3  49.7   0.5  49.8  3.5  1834.5 1401.2   0.8B   5.9B  11.8B 11.113
   70% x 4  68.1   0.6  31.4  4.7  1771.3 1572.3   0.8B   7.0B  11.8B  8.809
   80% x 1  18.9   0.4  80.7  1.9   871.9  613.0 357.1M   2.1B   5.0B  7.276
   80% x 2  38.6   0.3  61.0  2.8  1268.6 1029.0   0.6B   4.8B   8.2B  9.253
   80% x 3  58.8   0.3  40.8  3.5  2061.7 1623.3   1.0B   6.8B  13.6B 11.967
   80% x 4  78.6   0.5  20.9  4.0  2356.3 1983.7   1.1B   9.0B  16.0B 12.047
   90% x 1  21.8   0.3  78.0  2.0  1054.5  737.9 459.3M   2.6B   6.4B  9.613
   90% x 2  44.2   1.2  54.7  2.7  1439.5 1174.7   0.7B   5.4B   9.2B 10.001
   90% x 3  66.2   1.4  32.4  3.9  2326.2 1822.3   1.1B   7.6B  15.0B 12.579
   90% x 4  88.5   0.2  11.4  4.8  2627.8 2219.1   1.3B  10.2B  17.8B 12.832
  100% x 1  25.1   0.0  74.8  2.0   135.8  314.0   0.5B   3.1B   7.5B 10.278
  100% x 2  50.0   0.0  50.0  3.0    91.9  560.4   0.7B   6.2B  10.4B 10.470
  100% x 3  75.1   0.1  24.8  4.0   120.2  824.1   1.2B   8.7B  16.8B 13.028
  100% x 4 100.0   0.0   0.0  5.0    76.8 1054.8   1.4B  11.6B  19.5B 13.156

For 4 CPUs (of a 4 CPU system):
  Power (Watts) = (% CPU load * 1.176217e-01) + 3.461561
  1% CPU load is about 117.62 mW
  Coefficient of determination R^2 = 0.809961 (good)

  Energy (Watt-seconds) = (bogo op * 8.465141e-09) + 3.201355
  1 bogo op is about 8.47 nWs
  Coefficient of determination R^2 = 0.911274 (strong)

  Energy (Watt-seconds) = (CPU cycle * 1.026249e-09) + 3.542463
  1 CPU cycle is about 1.03 nWs
  Coefficient of determination R^2 = 0.841894 (good)

  Energy (Watt-seconds) = (CPU instruction * 6.044204e-10) + 3.201433
  1 CPU instruction is about 0.60 nWs
  Coefficient of determination R^2 = 0.911272 (strong)

The results at the end are estimates based on the gathered samples. The coefficient of determination (R^2) indicates how well the samples fit the computed linear regression: a value of 1 is a perfect linear fit, and the further below 1 it is, the poorer the fit.
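As a rough illustration of the regression step, here is a minimal Python sketch of an ordinary least-squares fit together with R^2. It uses only the eleven "x 4" rows from the table above (nominal load against Watts), whereas the tool's summary appears to be computed over different inputs, so the coefficients it prints are not expected to match the ones reported above. The function name linear_fit is mine, not something taken from power-calibrate.

```python
def linear_fit(xs, ys):
    """Least-squares fit y = slope * x + intercept, returning (slope, intercept, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    return slope, intercept, r2


# Nominal CPU load (%) and measured Watts for the "x 4" rows of the table.
loads = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
watts = [2.167, 2.826, 4.736, 5.940, 6.982, 8.130,
         9.226, 8.809, 12.047, 12.832, 13.156]

slope, intercept, r2 = linear_fit(loads, watts)
print(f"Power (Watts) = (% CPU load * {slope:e}) + {intercept:f}")
print(f"Coefficient of determination R^2 = {r2:f}")
```

The same function can be applied to the energy figures by swapping the x values for bogo ops, CPU cycles or instructions.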

For more accurate results, increase the run time (-r option) and also increase the number of samples (-s option).

Power-calibrate is available in Ubuntu Wily 15.10.  It is just an academic toy for getting some power estimates and may be useful to compare compute vs power metrics across different x86 CPUs.  I've not been able to verify how accurate it really is, so I am interested to see how this works across a range of systems.