Saturday, 24 October 2009

Block I/O Layer Tracing using blktrace

blktrace is a really useful tool to see what I/O operations are going on inside the Linux block I/O layer. So what does blktrace provide:
  • plenty of block layer information on I/O operations
  • very low level (2%) overhead when tracing
  • highly configurable - trace I/O on one or several devices, selectable filter events
  • live and playback tracing
There are many different types of events that can be captured, for example, I/O merges, request re-queues, requests to underlying block device, I/O split/bounces, request completions and more beside.

One the user-space side, two tools are used: blktrace which is the event extraction utility and blkparse which takes the event data and turns it into human readable output.

Typically, one uses the tools as follows:

sudo blktrace -d /dev/sda -o - | blkparse -i -

And this will dump out data as follows:

8,0 0 1 0.000000000 1245 A W 25453095 + 8 <- (8,1) 25453032
8,0 0 2 0.000002374 1245 Q W 25453095 + 8 [postgres]
8,0 0 3 0.000010616 1245 G W 25453095 + 8 [postgres]
8,0 0 4 0.000018228 1245 P N [postgres]
8,0 0 5 0.000023397 1245 I W 25453095 + 8 [postgres]
8,0 0 6 0.000034222 1245 A W 25453103 + 8 <- (8,1) 25453040
8,0 0 7 0.000035968 1245 Q W 25453103 + 8 [postgres]
8,0 0 8 0.000040368 1245 M W 25453103 + 8 [postgres]


The 1st column shows the device major,minor tuple, e.g. (8,0). The 2nd column shows the CPU number. The 3rd column shows the sequence number. 4th column is the time stamp, which as you can see has a fairly high resolution time stamp. The 5th column is the PID of the process issuing the I/O request (in this example, 1245, the PID of postgres). The 6th column shows the event type, e.g. 'A' means a remapping from device (8,1) /dev/sda1 to device (8,0) /dev/sda, refer to the "ACTION IDENTIFIERS" section in the blkparse man page for more details on this field. The 7th column is R for Read, W for Write, D for block, B for Barrier operation. The next field is the block number and a following + number is the number of blocks requested. The final field between the [ ] brackets is the process name of the process issuing the request.

When one wants to stop tracing, hit control-C and a summary of the I/O operations is provided, e.g.:

CPU0 (8,0):
Reads Queued: 0, 0KiB Writes Queued: 128, 504KiB
Read Dispatches: 0, 0KiB Write Dispatches: 22, 508KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 32, 576KiB
Read Merges: 0, 0KiB Write Merges: 105, 420KiB
Read depth: 12 Write depth: 2
PC Reads Queued: 0, 0KiB PC Writes Queued: 0, 0KiB
PC Read Disp.: 9, 0KiB PC Write Disp.: 0, 0KiB
PC Reads Req.: 0 PC Writes Req.: 0
PC Reads Compl.: 0 PC Writes Compl.: 32
IO unplugs: 18 Timer unplugs: 0
CPU1 (8,0):
Reads Queued: 0, 0KiB Writes Queued: 16, 56KiB
Read Dispatches: 0, 0KiB Write Dispatches: 2, 52KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 0, 0KiB
Read Merges: 0, 0KiB Write Merges: 11, 44KiB
Read depth: 12 Write depth: 2
PC Reads Queued: 0, 0KiB PC Writes Queued: 0, 0KiB
PC Read Disp.: 3, 0KiB PC Write Disp.: 0, 0KiB
PC Reads Req.: 0 PC Writes Req.: 0
PC Reads Compl.: 0 PC Writes Compl.: 0
IO unplugs: 6 Timer unplugs: 0

Total (8,0):
Reads Queued: 0, 0KiB Writes Queued: 144, 560KiB
Read Dispatches: 0, 0KiB Write Dispatches: 24, 560KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 32, 576KiB
Read Merges: 0, 0KiB Write Merges: 116, 464KiB
PC Reads Queued: 0, 0KiB PC Writes Queued: 0, 0KiB
PC Read Disp.: 12, 0KiB PC Write Disp.: 0, 0KiB
PC Reads Req.: 0 PC Writes Req.: 0
PC Reads Compl.: 0 PC Writes Compl.: 32
IO unplugs: 24 Timer unplugs: 0

Throughput (R/W): 0KiB/s / 537KiB/s
Events (8,0): 576 entries
Skips: 0 forward (0 - 0.0%)


As tools go, this one is excellent for in-depth understanding of block I/O operations inside the kernel. I am sure it has many different applications and it's well worth playing with this tool to get familiar with all of the features provided. The ability to filter specific events allows one to focus and drill down on specific types of I/O operations without being buried by tracing output overload.

Jens Axboe the block layer maintainer developed and maintains blktrace. Alan D. Brunelle has contributed a lot of extra functionality - I recommend reading Brunelle's user guide to get started and also blktrace paper that contains a lot more indepth instruction on how to use this tool. The blktrace and blkparse manual pages provide more details on how to use the tools, but I'd recommend eyeballing Brunelle's user guide first.

3 comments:

  1. Thanks , good starting point for blktrace.

    ReplyDelete
  2. Cool. Now where can I find a list of these abbreviations? There is no "action identifiers" section in blktrace's man page: http://linux.die.net/man/8/blktrace

    ReplyDelete
    Replies
    1. The action identifiers are explained in the blkparse (1) man page.

      Delete