Sunday 21 March 2010

Perf tool now in Ubuntu Lucid.

A few months ago I blogged about the perf tool and how useful it is to drill down into kernel and user space applications to analyse performance bottle necks. Well, thanks to Andy Whitcroft, the perf tool is now available in Ubuntu Lucid. Perf needs to be in lock-step with the kernel, which complicated the packaging of this tool.

To install, use:

sudo apt-get install linux-tools

and this installs the perf, perf-stat, perf-top, perf-record, perf-report and perf-list tools.

The author of perf, Ingo Molnar has written some basic instructions on driving perf in the tools/perf/Documentation/example.txt file. They should give one a feel of how to drive these tools.

For example, to examine google-chrome:

perf record google-chrome

... run a test and then exit, and then generate a summary of activity:

perf report

One can examine specific events in the system too. To get a list of available events use:

perf list

For example, to measure CPU cycles, number of instructions, context switches and kmallocs on google-chrome one uses:

perf stat -e cpu-cycles -e instructions -e context-switches -e kmem:kmalloc google-chrome

Performance counter stats for 'google-chrome':

15576472832 cycles # 0.000 M/sec
13466330277 instructions # 0.865 IPC
40791 context-switches # 0.000 M/sec
38602 kmem:kmalloc # 0.000 M/sec

18.062301718 seconds time elapsed

One more quick example, this time we record a call-graph (stack chain/backtrace) of the dd command:

perf record -g dd if=/dev/zero of=/dev/null bs=1M count=4096
and get the call graph using:

# Samples: 12584 # # Overhead Command Shared Object Symbol # ........ ............... ............................... ...... # 95.96% dd [kernel] [k] __clear_user | |--99.69%-- read_zero | vfs_read | sys_read | system_call_fastpath | __read --0.31%-- [...]
1.37% dd [kernel] [k] read_zero 0.87% dd [kernel] [k] _cond_resched | |--50.91%-- read_zero | vfs_read | sys_read | system_call_fastpath | __read

I recommend reading Ingo's example.txt and playing around with this tool. It is very powerful and allows one to drill down and examine system performance right down to the instruction level.


  1. Is perf supposed to be a replacement for oprofile?

  2. perf doesn't have to be in lockstep with the kernel. It is just packaged there.

    Its possible to use one of these targets:

    [acme@felicio linux]$ make help | grep perf
    perf-tar-src-pkg - Build perf-3.4.0-rc4.tar source tarball
    perf-targz-src-pkg - Build perf-3.4.0-rc4.tar.gz source tarball
    perf-tarbz2-src-pkg - Build perf-3.4.0-rc4.tar.bz2 source tarball
    perf-tarxz-src-pkg - Build perf-3.4.0-rc4.tar.xz source tarball
    [acme@felicio linux]$

    then get the resulting tarball, build it anywhere, use the resulting perf tools on any kernel version.

    If something doesn't work, report to lkml as a bug.