A Smackerel of Opinion: September 2010

Thursday, 16 September 2010

Emitting faked keys from the PC AT keyboard

The PC keyboard controller can be accessed via I/O ports 0x60 and 0x64. The i8042 keyboard controller has the following registers:

* 8 bit input buffer, read-only, containing data read from the keyboard. Accessed by reading from port 0x60.
* 8 bit output buffer, write-only, for data to be written to the keyboard. Accessed by writing to port 0x60.
* 8 bit status register, read-only, accessed by reading port 0x64.
* 8 bit control register, read/write, accessed by using the "read commands byte" and "write command byte" commands.

A write to port 0x64 sends a command to the i8042 and if the command requires a parameter this parameter is must be written to port 0x60. If the command returns a result then it appears by reading port 0x60.

Keyboard controller command 0xD2 (Write Keyboard Buffer) is fairly interesting - this places values on the keyboard controller output port, making the operating system believe a keyboard key scan code has appeared. This allows us to force key scan codes into the controller to fake keys, by doing the following:

1. Wait for the Input Buffer Full bit (bit 1) of the status register (port 0x64) is clear
2. Write command 0xd2 to the control register (port 0x64)
3. Write the scan code into the data register (port 0x60)

To write a letter 'h', one writes the scan code 0x23 ('h' key down), waits a little while and then writes the scan scan code with the top bit set 0xa3 ('h' key up).

Some example code take fakes the "hello world" keys by this method can be found here.

Adam Chapweske has written an excellent in-depth description of how the AT and PS/2 keyboard controllers work without which I would not have found this useful nugget.

Wednesday, 15 September 2010

What I do

Fixing things is in my DNA - it's one of those passions that keeps me motivated and busy. Maybe it is the hunt - tracking down why something does not work that I find interesting, like a detective gathering evidence and then making a deduction that solves the mystery. Or maybe it is the fun in discovering or learning new features about a technology as I try to solve a bug. Or it could be the warm feeling in knowing that a problem is fixed which helps users to have a less buggy machine. Whatever the underlying reason, I get a kick out of fixing obscure problems and making things work better.

Fortunately for me, my work in Canonical enabling Ubuntu to work on PCs fulfils this passion of mine. My daily work normally involves me looking at obscure firmware issues in PC BIOSs and figuring out what is wrong and how to address and fix the problem.

Without fixing these issues, quite a few machines would just not function correctly for various reasons. My work involves fixing issues such as Suspend/Resume and Hibernate/Resume hangs, or looking at why hardware is not quite configured correctly after boot or resume. Unfortunately buggy firmware does happen quite frequently and leads to all sorts of weird issues. Hotkeys, LCD backlights, wake alarms, fan controls, thermal trip points and even CPU configuration can be affected by BIOS or ACPI bugs. All these need fixing, so it keeps me busy!

Fun in the deep end

Firmware is unlike Open Source code found in Ubuntu - firmware is closed, proprietary and hence figuring out why it's broken can be tricky and hence time consuming.

I spend quite a lot time looking at kernel logs, ACPI tables and disassembling ACPI byte code looking at why kernel/BIOS interactions are misbehaving. Some of the work involves looking at the way the BIOS interacts with the embedded controller and figuring out if the ACPI byte code is written correctly or not.

Because Canonical has a good relationship with many desktop hardware vendors, I have access to the BIOS engineers who are actually writing this code. I can directly report my findings and suggested fixes to them, and then they re-work the BIOS. The end result is a PC that is sold with a BIOS that works correctly with Linux, and not just Ubuntu - all distributions benefit from this work.

You may not see my fixes appear as commits in the Linux kernel because much of my work is silent and behind the scenes - fixing firmware that make Linux work better.

Quality Counts

My other passion is quality. While it's fine to fix bugs, it's even better to detect them in the first place! Hence I was motivated to write the Firmware Test Suite - this is a tool that aims to find and diagnose BIOS and ACPI bugs automatically and even suggest possible workarounds and fixes. This tool is now being actively used by BIOS vendors to catch errors early in the development cycle and is part of the on going work to make sure that Linux compatibility really matters.

I've been developing this tool for the Maverick 10.10 release and hope to add more intelligent firmware tests for Natty 11.04 based on further analysis of current bug reports and lessons learnt while enabling Ubuntu on various PC hardware platforms.

I hope this gives a flavour of the kind of work I do for Ubuntu. It's enjoyable, challenging and satisfying to see one's work make Linux better.

Tuesday, 14 September 2010

Sensors reporting hot CPU in Maverick

My clunky old Lenovo has and Intel(R) Core(TM) Duo CPU (model 15) which coretemp in Lucid 10.04 assumed TjMax was 85 degrees C but now in Maverick 10.10 believes TjMax is 100 degrees C.

Kernel commit a321cedb12904114e2ba5041a3673ca24deb09c9 attempts to get TjMax from msr 0x1a2. If it fails to read this msr it defaults TjMax to 100 degrees C for CPU models 14, 15, 22 and 26, and one will see the following warning message:

[ 9.650025] coretemp coretemp.0: TjMax is assumed as 100 C!
[ 9.650322] coretemp coretemp.1: TjMax is assumed as 100 C!

For CPU models 23 and 28 (Atoms) TjMax will be 90 or 100 depending if it's a nettop or a netbook. Otherwise the patch will default TjMax to 100 degrees C.

One can check the value of TjMax using:

cat devices/platform/coretemp*/temp1_crit

Coretemp calculates the core temperature of the CPU by subtracting the thermal status from TjMax. Since the default has been increased from 85 to 100 degrees between Lucid and Maverick, the apparent core temperature now reads 15 degrees higher.

Now, if my machine really was running 15 degrees hotter between Lucid and Maverick I would see more power consumption. I checked the power consumption for Lucid and Maverick kernels on my Lenovo in idle and fully loaded CPU states with a power meter and observed that Maverick uses less power, so that's encouraging.

As for the correct value, why did the default change? Well, from what I can understand from several forums that discuss the setting of TjMax is that this is not well documented and not disclosed by Intel, hence the values are rule-of-thumb guesswork.

So, the bottom line is that if your CPU appears to run hot from the core temp readings between Lucid 10.04 and Maverick 10.10 first check to see if TjMax has changed on your hardware.

Monday, 13 September 2010

Hacking a custom DSDT into a QEMU BIOS

Today I was trying to reproduce a bunch of weird ACPI errors when executing AML code from a DSDT extracted from a remote machine. For some reason the AML in the 2.6.32 kernel was causing a bunch of warnings, but not in 2.6.35 and I wanted to see when the fix landed. This was a dirty hack so that I could quickly do kernel bisects running the kernel inside QEMU.

Firstly, I got the DSDT from the remote machine and converted into it form that could be included into the seabios BIOS images. I used the following C source:


#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
int data;
int i = 0;

printf("unsigned char AmlCode[] = {");
for (i=0; (data = getchar()) != EOF; i++)
        printf("%s%s0x%2.2x", i>0 ? "," : "",
                (i & 7) ? "" : "\n\t", data);
printf("\n};\n");

exit(EXIT_SUCCESS);
}

And built it with:

gcc -Wall hexdump.c -o hexdump

Next I generated a C compilable hex dump of the DSDT.dat raw image:

./hexdump < DSDT.dat > acpi-dsdt.hex

Then I got the seabios sources:

git clone git://git.linuxtogo.org/home/kevin/seabios.git

and the copied the acpi-dsdt.hex to seabios/src

..and then cd into seabios and built using make. This creates bios.bin in the directory called out.

Then I copied the current QEMU bios, vgabios and seabios images to my working directory:

cp -R /usr/share/qemu /usr/share/vgabios /usr/share/seabios/ .

and then copied the compiled seabios image into the newly copied seabios directory:

cp seabios/out/bios.bin seabios

Finally, I ran QEMU using new BIOS image as follows:

qemu -L qemu ubuntu.img

Bit of a hack, but it helped when I did not have the hardware to hand.

Wednesday, 8 September 2010

Hot Laptop

My Lenovo 3000N200 laptop has been playing me up. When I've been fully loading the processor or driving video hard it's been shutting down because of overheating. I suspect periodic SMIs are detecting an overheated CPU and the BIOS just stops the machine to avoid it turning into toast.

Suspecting that the latest 2.6.35 Maverick kernel was the cause I booted with a 2.6.32 Lucid kernel and that didn't help, so it didn't look like an obvious kernel regression.

Well, perhaps it's getting old and cranky - it's nearly 3 years old. Perhaps the thermal paste between the CPU and the heatsink is not working like it should. Since it was most probably a hardware issue I downloaded the service manual and got out the tr

usty screwdriver and opened it up. Lo and behold 5mm of dust had accumulated over the fan grill which wasn't going to help the poor machine offload all that heat out of the laptop case. I removed the fan, gave it a good clean and removed all the dust from the fan outlet grill.

After reassembly the laptop was good as new. Instead of rebooting at 95+ degrees Celsius the Lenovo now runs happily.

The moral of the story is that I should regularly service the fans on my machines. Cooking the CPU is something I would like to avoid in the future.

Monday, 6 September 2010

Digging into the BIOS CMOS Memory

The BIOS settings of a PC are stored in non-volatile memory (sometimes known as NVRAM) to ensure they are saved when the machine is off. On a PC, this NVRAM is CMOS (Complimentary Metal Oxide Semiconductor) memory and is trickle charged by a small battery to ensure the data is preserved. CMOS memory requires very little power hence it can be kept non-volatile by a battery for several years.

One can access the CMOS memory via ports 0x70 and 0x71. One writes the address of the CMOS memory location you want to read to port 0x70 and then read the contents via a read of port 0x70. I implemented this as follows:


unsigned char cmos_read(int offset)
{
     unsigned char value;

     ioperm(0x70, 0x2, 1);
     ioperm(0x80, 0x1, 1);

     outb(offset, 0x70);
     outb(0, 0x80);  /* Small delay */
     value = inb(0x71);

     ioperm(0x80, 0x1, 0);
     ioperm(0x70, 0x2, 0);

     return value;
}

..I was not 100% sure of a small port delay was required between the write to port 0x70 and the read of data on port 0x71, but I added one in by writing to port 0x80 just in case. The ioperm() calls are required to get access to the I/O ports, and one needs to run this code with root privileges otherwise you will get a segmentation fault.

Then it is a case of reading 128 bytes or so of CMOS memory using this function. The next step is decoding this raw data. Web-pages such as http://www.bioscentral.com/misc/cmosmap.htm contain CMOS memory maps, but it does tend to vary from machine to machine. With data from several memory map descriptions that I found on the Web I have figured out some common across different BIOS implementations and written some code to annotate the contents of the CMOS memory. There are a bunch of fields that need a little more decoding, but that's work in progress...

My aim is to add this into the Firmware Test Suite for 11.04 as part of a diagnostic feature.

Friday, 3 September 2010

Embedded Linux Conference, April 2010 Videos

I had the privilege to attend the Embedded Linux Conference in San Francisco in April this year. Like all conferences with multiple tracks it's impossible to attend all the talks, fortunately Free Electrons are hosting the slides and videos of a lot of the talks, so one can catch up on all the goodness at http://free-electrons.com/blog/elc-2010-videos/. Enjoy!