Sunday, 5 July 2009

Fixing Linux System Pauses

A couple of weeks ago I was looking at a bug where a netbook would just seem to randomly hang and would only come alive when a key was pressed. After poking around a bit I recalled that my colleague Stefan Bader had seen this issue before, and he told me to try booting with the Linux boot parameter acpi_skip_timer_override. Lo and behold, this workaround worked.

So what was going on? Well, it's a BIOS issue. The BIOS claimed that IRQ0 was routed to another IRQ on the IO-APIC when in fact it was not. Generally speaking, documentation for most chipsets is not disclosed, so it's impossible to know how the chipset is configured and, worse still, how to fix the problem with a quirk. There are a couple of patches in the kernel (e.g. the one for the HP NX6325) where this problem is worked around with a quirk, but for other machines one has to work around it with appropriate boot options.

We see this bug manifest itself because modern kernels use a tickless timer, and we hit a state where all the CPUs have gone into a deep C state and need a timer interrupt to wake them up. However, if the routing of the timer interrupt is misconfigured then the CPU is not woken up, hence the hang until we generate an external interrupt, for example by pressing a key.

One can debug this by booting with the kernel boot parameters "debug apic=debug". This will make Linux dump out the interrupt routing on the IO-APIC, and it's worth using to understand what's going on under the bonnet.
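Alongside the boot-time dump, it can help to look at the timer row in /proc/interrupts on a running system. The following is only a rough sketch: the helper names are made up for illustration, and the sample text is an invented example of the usual /proc/interrupts layout, not output from the affected netbook.

```python
import re

# Illustrative sample only; on a real machine use open("/proc/interrupts").read()
SAMPLE = """\
           CPU0       CPU1
  0:        124          0   IO-APIC-edge      timer
  1:          9          0   IO-APIC-edge      i8042
LOC:      50000      48000   Local timer interrupts
"""

def timer_irq_line(interrupts_text):
    """Return the row for IRQ 0 (the legacy timer), or None if there is none."""
    for line in interrupts_text.splitlines():
        if re.match(r"\s*0:\s", line):
            return line.strip()
    return None

def timer_irq_counts(interrupts_text):
    """Return the per-CPU interrupt counts from the IRQ 0 row."""
    line = timer_irq_line(interrupts_text)
    if line is None:
        return []
    counts = []
    for field in line.split()[1:]:
        if not field.isdigit():
            break  # reached the controller / device name columns
        counts.append(int(field))
    return counts

print(timer_irq_line(SAMPLE))
print(timer_irq_counts(SAMPLE))
```

Taking two readings a few seconds apart and comparing the counts gives a hint as to whether IRQ0 is being delivered at all. Treat that as a hint only, since on many systems the local APIC timer (the LOC row) does most of the work and IRQ0 can legitimately sit still.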

Boot options that are worth trying to work around this problem are:

- acpi_skip_timer_override: this ignores the IRQ0 / pin 2 interrupt override

- hpet=disable: disable HPET and use PIT instead

- idle=poll: forces a polled idle loop (so the CPU won't go into a deep C state), hence uses power and makes the system run hot (not recommended)

- idle=halt: halt is forced to be used for CPU idle, so C2/C3 states won't be used. This may only work on systems without C1E (Enhanced Halt State).
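After rebooting, it's easy to lose track of which workaround is actually in effect. Here is a small sketch that checks a kernel command line for the options discussed here (acpi_skip_timer_override, hpet=disable, idle=poll, idle=halt). The function name and the sample command line are hypothetical; on a real system the text would come from /proc/cmdline.

```python
# Boot options discussed in this post as possible workarounds
WORKAROUNDS = (
    "acpi_skip_timer_override",
    "hpet=disable",
    "idle=poll",
    "idle=halt",
)

def active_workarounds(cmdline):
    """Return the workaround options present on the given kernel command line."""
    tokens = cmdline.split()
    return [opt for opt in WORKAROUNDS if opt in tokens]

# Hypothetical command line; on a real machine use open("/proc/cmdline").read()
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet acpi_skip_timer_override"
print(active_workarounds(sample))
```

Trying one option at a time makes it clearer which one actually cures the hang.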

So, if you ever see Linux hanging around and not waking from its idle state, try one of the above and see if this solves your problem.
