Monday, 15 February 2010

Kernel Oops page fault error codes

x86 Linux kernel Oops messages normally provide just enough information to help a kernel developer corner and fix critical bugs. The start of a typical Oops message may look like the following:

kernel BUG at kernel/signal.c:1599!
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pc = 84427f6a
*pde = 00000000
Oops: 0001 [#1]

The four-digit value after "Oops:" is the page fault error code in hexadecimal, which in turn can help one deduce what caused the Oops. The page fault error code is encoded as follows:

bit 0 - 0 = no page found, 1 = protection fault
bit 1 - 0 = read access, 1 = write access
bit 2 - 0 = kernel-mode access, 1 = user-mode access
bit 3 - 0 = n/a, 1 = use of reserved bit detected
bit 4 - 0 = n/a, 1 = fault was an instruction fetch

So, in the above example, the Oops error code was 0x0001: bit 0 is set and bits 1 and 2 are clear, which means it was a page protection fault caused by a read access in kernel mode.
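To make the bit decoding concrete, here is a minimal Python sketch that turns an error code into the descriptions from the list above. The constant and function names are made up for illustration and are not the kernel's own identifiers; only the bit positions come from the list.

```python
# Bit positions as given in the list above. The names are illustrative
# assumptions, not identifiers taken from the kernel source.
PROTECTION = 1 << 0   # 0 = no page found, 1 = protection fault
WRITE = 1 << 1        # 0 = read access, 1 = write access
USER = 1 << 2         # 0 = kernel-mode access, 1 = user-mode access
RESERVED = 1 << 3     # 1 = use of reserved bit detected
INSTR_FETCH = 1 << 4  # 1 = fault was an instruction fetch


def decode_page_fault_code(code):
    """Return a human-readable description of a page fault error code."""
    parts = [
        "protection fault" if code & PROTECTION else "no page found",
        "write access" if code & WRITE else "read access",
        "user-mode access" if code & USER else "kernel-mode access",
    ]
    # Bits 3 and 4 only carry information when they are set.
    if code & RESERVED:
        parts.append("use of reserved bit detected")
    if code & INSTR_FETCH:
        parts.append("fault was an instruction fetch")
    return ", ".join(parts)


print(decode_page_fault_code(0x0001))
```

For the example code 0x0001 this prints "protection fault, read access, kernel-mode access", matching the reading above.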

Many Oops error codes are 0x0000, which means a read access in kernel mode found no page.
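When sifting through logs, it can be handy to pull the error code out of the "Oops:" line automatically. The following Python sketch assumes the line looks like the example above ("Oops:" followed by four hex digits); the function name is hypothetical, and other kernel versions or architectures may format the line differently.

```python
import re

# Assumes the format shown in the example: "Oops: 0001 [#1]".
OOPS_RE = re.compile(r"Oops:\s*([0-9a-fA-F]{4})\b")


def extract_oops_code(log_text):
    """Return the first Oops error code in log_text as an int, or None."""
    match = OOPS_RE.search(log_text)
    return int(match.group(1), 16) if match else None


log = """Unable to handle kernel NULL pointer dereference at virtual address 00000000
Oops: 0000 [#1]"""

code = extract_oops_code(log)
if code == 0x0000:
    print("page not found, read access, kernel mode")
```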

For more information, consult arch/x86/mm/fault.c in the kernel source tree.
