Thursday, 18 August 2011

Debugging S3 suspend/resume using SystemTap and minimodem.

Some problems are a little challenging to debug and sometimes require a bit of lateral thinking to solve.   One particular issue is when suspend/resume locks up and one has no idea where or why, because the console is suspended and any debug messages just don't appear.

In the past I've had to use techniques like flashing keyboard LEDs, making the PC speaker beep or even forcibly rebooting the machine at known points to get some idea of roughly where a hang has occurred.   This works, but it is tedious since we can only emit a few bits of state per iteration.   Saving state is difficult since when a machine locks up one has to reboot it and one loses debug state.   One technique is to squirrel away debug state in the real time clock (RTC), which allows one to store twenty or so bits of state, but that is still quite tough going.

One project I've been working on is to use the power of SystemTap to instrument the entire suspend/resume code paths - every time a function is entered a hash of the name is generated and stored in the RTC.  If the machine hangs, one can then grab this hash out of the RTC and compare it to the hashes of the known function names in /proc/kallsyms, and hopefully this will give some idea of where we got to before the machine hung.
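The lookup step on the host side can be sketched roughly as follows. This is only an illustration, not the code in the pmdebug scripts: the hash used here (a CRC32 truncated to 24 bits), the names name_hash, parse_kallsyms and candidates, and the sample symbol list are all assumptions made for the example. The real hash would have to match whatever the SystemTap script computes.

```python
import zlib

HASH_BITS = 24  # assumption: roughly the "twenty or so bits" the RTC can hold


def name_hash(name, bits=HASH_BITS):
    """Hash a function name: CRC32 truncated to `bits` bits (hypothetical choice)."""
    return zlib.crc32(name.encode("ascii")) & ((1 << bits) - 1)


def parse_kallsyms(text):
    """Extract symbol names from /proc/kallsyms-style lines: 'addr type name [module]'."""
    names = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            names.append(fields[2])
    return names


def candidates(stored, names, bits=HASH_BITS):
    """Return every symbol whose hash matches the value recovered from the RTC."""
    return sorted({n for n in names if name_hash(n, bits) == stored})


if __name__ == "__main__":
    # Made-up sample; on a real machine one would read /proc/kallsyms instead.
    sample = (
        "ffffffff81000000 T pm_suspend\n"
        "ffffffff81000100 T suspend_devices_and_enter\n"
        "ffffffff81000200 t acpi_suspend_enter\n"
    )
    stored = name_hash("pm_suspend")  # stands in for the value read back from the RTC
    print(candidates(stored, parse_kallsyms(sample)))
```

Since only a couple of dozen bits are stored, several kernel symbols may share a hash, which is why the sketch returns a list of candidates rather than a single name.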

However, what would be really useful is the ability to print out more debug state during suspend/resume in real time.   Normally I approach this by using a USB/serial cable and capturing console messages via this mechanism.  However, once USB is suspended, this provides no more information.

One solution I'm now using is Kamal Mostafa's minimodem.  This wonderful tool is an implementation of a software modem and can send and receive data by emulating a Bell-type or RTTY FSK modem.  It allows me to transmit characters at 110 to 300 baud over a standard PC speaker and reliably receive them on a host machine.  If the wind is in the right direction, one can transmit at higher speeds with an audio cable plugged into the headphone jack of the transmitter and into the microphone socket on the receiver, if the hardware allows.

The 8254 Programmable Interval Timer (PIT) on a PC can be used to generate a square wave at a predefined frequency and can be connected to the PC speaker to emit a beep.  Sending data using the speaker to minimodem is a case of sending a 500ms leader tone, then emitting characters.  Each character starts with a space tone lasting one bit period (the start bit), followed by 8 data bits (least significant bit first), with a zero being a space tone and a one being a mark tone, each lasting one bit period, and then a trailing bunch of stop bits.
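To make the framing concrete, here is a small host-side sketch that turns bytes into a schedule of tones and works out the PIT divisor for each tone. It is not Kamal's driver. The mark/space frequencies (1270 Hz and 1070 Hz, the Bell 103 originate pair), the use of a mark tone for the leader, the two stop bits and the function names are assumptions for illustration; only the 500ms leader, the start bit and the LSB-first data bits come from the description above.

```python
PIT_HZ = 1193182  # input clock of the PC's 8254 PIT, in Hz


def pit_divisor(freq_hz):
    """Divisor to load into the PIT so its square wave is close to freq_hz."""
    return round(PIT_HZ / freq_hz)


def frame_bits(byte, stop_bits=2):
    """Start bit (0 = space), 8 data bits LSB first, then stop bits (1 = mark)."""
    bits = [0]
    bits += [(byte >> i) & 1 for i in range(8)]
    bits += [1] * stop_bits  # assumption: two stop bits
    return bits


def tone_schedule(data, baud=300, mark_hz=1270, space_hz=1070, leader_ms=500):
    """Return a list of (frequency_hz, duration_ms) pairs for the speaker."""
    bit_ms = 1000.0 / baud
    schedule = [(mark_hz, leader_ms)]  # assumption: the leader is a mark tone
    for byte in data:
        for bit in frame_bits(byte):
            schedule.append((mark_hz if bit else space_hz, bit_ms))
    return schedule


if __name__ == "__main__":
    for freq, ms in tone_schedule(b"A"):
        print(f"{freq} Hz (PIT divisor {pit_divisor(freq)}) for {ms:.2f} ms")
```

At 300 baud each bit lasts about 3.33 ms, so with this framing one character takes 11 bit periods, or roughly 27 characters per second.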

So using a prototype driver written by Kamal, I tweaked the code and put it into my suspend/resume SystemTap script and now I can dump out messages over the PC speaker and decode them using minimodem.  300 baud may not be speedy, but I am now able to instrument and trace through the entire suspend/resume path.

The SystemTap scripts are "work-in-progress" (i.e. if it breaks you keep the pieces), but can be found in my pmdebug git repo git://kernel.ubuntu.com/cking/pmdebug.git.  The README file gives a quick run down of how to use this script and I have written up a full set of instructions.

The caveat to this is that one requires a PC where one can beep the PC speaker using the PIT.  Lots of modern machines seem to either have this disabled, or have the volume somehow under the control of the Intel HDA audio driver.  Anyhow, kudos to Kamal for providing minimodem and giving me the prototype kernel driver to allow me to plug this into a SystemTap script.

3 comments:

  1. Interesting! However, a lot of BIOS implementations still don't clear the RAM on boot so I generally find it easier to store the messages in RAM and recover them after watchdog reset.

    http://iki.fi/lindi/git/extlog.git/

    has an implementation of this that I've used to debug random lockups on a machine with multiple custom ISA cards. RAM is so fast that I can store every bus access easily in a ringbuffer.

  2. @Timo

    Thanks for the reference to your git repo. I've tried it out on a bunch of newer laptops and netbooks but I cannot extract any debug because it looks like the memory is being cleared. Anyhow, I will try your code on any machine I get to see if it gives me an easier way to debug these issues.

    Colin

  3. After a lot of tinkering with various systems I can't seem to find a single laptop or netbook that doesn't power-cycle DRAM on reset. I suspect the ICH* PM configuration registers may control the reset so that DRAM is power-cycled. For example, on the ICH7 the GEN_PMCON_3 register seems to control this. So I wonder if this can be overridden, and hence ensure we don't get DRAM power-cycling on a reset.
