Wednesday, 19 December 2012

..and now something seasonal

The Christmas holidays are almost here and to celebrate I've mildly obfuscated some C that prints the lyrics to a traditional English carol. The source code can be found here.

#include <stdio.h>
#include <stdbool.h>
#include <malloc.h>
#define H malloc (1237) ;
#define c while
#define W >> /* fallen right tree */
#define w << /* fallen left tree */
#define B(wish,times) ((_ &wish) times 2)
#define L {o Q=q;p(3[Z]),P(false[Z],q),\
N:4[Z]:"a "),P(1[Z],Q),p(Q>1?", ":N),--\
char typedef o;void typedef d

                 return P;}d p(o*f
                 d P(o*s,o o){c( o
                --){c(122-*s)s++; s
              *s++);} o main(void){o*
"dbdxdj!qgqdy}&qgqgjl{p|dbxd!`ytffdy}!`ytffgjl{") ,O (
"cd!"),O("!ej`!e!")};L L L L L L L L L L L L /*Xmas*/}

Saturday, 15 December 2012

Debugging using Visualisation.

There are many different classes of bugs and hence many different debugging techniques that can be used. Sometimes there is a lot of complexity in a problem and it is hard to form a mental model of exactly what is going on between multiple interdependent components, especially when non-deterministic behaviour is occurring.

Unfortunately the human mind is limited; there is only so much debugging state that it can follow.  The problem is compounded when a bug manifests itself after several hours or days of continuous execution time - there just seems to be too much state to track and make sense of.

Looking at thousands of lines of debug trace and trying to spot any lurking evidence that may offer some hint as to why code is failing is not trivial. However, the brain is highly developed at making sense of visual input, so it makes sense to visualise copious amounts of debug data to spot anomalies.

The kernel offers many ways to gather data, be it via different trace mechanisms or just by using plain old printk().   The steps to debug are thus:

1.  Try and find a way to reliably reproduce the problem to be debugged.
2.  Where possible, try to remove complexity from the problem and still end up with a reliable way of reproducing the problem.
3.  Rig up a way of collecting internal state or data on system activity.   Obviously the type of data to be collected is dependent on the type of problem being worked on.   Be careful that any instrumentation does not affect the behaviour of the system.
4.  Start collecting data and try to reproduce the bug.  You may have to do this multiple times to collect enough data to allow one to easily spot trends over different runs.
5.  Visualise the data.
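As a minimal sketch of steps 3 and 4, one can periodically snapshot a kernel counter together with a timestamp and append it to a CSV file for later graphing. This assumes a Linux system; the choice of counter (MemFree from /proc/meminfo), the output file name and the sample count are arbitrary choices for illustration:

```shell
#!/bin/sh
# Data-collection sketch: sample a /proc counter once a second
# and emit timestamped CSV suitable for later graphing.
OUT=memfree.csv
SAMPLES=3

echo "timestamp,memfree_kb" > "$OUT"
i=0
while [ "$i" -lt "$SAMPLES" ]; do
    ts=$(date +%s)
    # pull the MemFree value (in kB) out of /proc/meminfo
    memfree=$(awk '/^MemFree:/ { print $2 }' /proc/meminfo)
    echo "$ts,$memfree" >> "$OUT"
    i=$((i + 1))
    sleep 1
done
```

Reading from /proc rather than adding instrumentation to the kernel keeps the observation lightweight, which matters given the caveat in step 3 about not perturbing the system under test.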

Iterating on steps 2 to 5 allows one to keep on refining a problem down to the bare minimum required to corner a bug.

Now step 5 is the fun part.  Sometimes one has to lightly parse the output to collect specific items of data. I generally use tools such as awk to extract specific fields or to re-munge data into a format that can be easily processed by graphing tools.  It can be useful to also collect time stamps with the data, as some bugs are timing dependent and seeing interactions between components is key to understanding why issues occur.  If one gathers multiple sets of data from different sources, then being able to correlate the data on a timestamp can be helpful.
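The sort of awk munging described above can be sketched as follows. The trace format, driver name and file names here are purely hypothetical - the point is extracting a timestamp and a field of interest into CSV:

```shell
# Hypothetical dmesg-style trace lines of the form:
#   [ 1234.567890] mydrv: irq count 42
# Extract the timestamp and the count into timestamp,count CSV.
cat << 'EOF' > trace.log
[ 1234.567890] mydrv: irq count 42
[ 1235.112233] mydrv: irq count 57
[ 1236.445566] mydrv: irq count 61
EOF

awk -F'[][]' '/mydrv: irq count/ {
    ts = $2                   # text between the [ ] brackets
    gsub(/ /, "", ts)         # strip the padding spaces
    n = split($0, f, " ")     # last whitespace-separated field
    printf "%s,%s\n", ts, f[n]
}' trace.log > irq.csv
```

The resulting irq.csv can be fed straight into a spreadsheet or gnuplot, and because the timestamp column is preserved, data from several such extractions can be correlated against each other.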

If I have just a few tens to hundreds of items of data to visualise I generally just collate my data into tab or comma separated output, import it into LibreOffice Calc and then generate graphs from the raw data.   However, for more demanding graphing I normally resort to using gnuplot.   Gnuplot is an incredibly powerful plotting tool - however, for more complex graphs one often needs to delve deep into the manual and perhaps crib from the very useful worked examples.
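As a trivial gnuplot sketch, a short script is enough to turn a CSV of time,value pairs into a plot. The data file, axis labels and output name below are all assumptions for illustration, and the plot step is skipped if gnuplot is not installed:

```shell
# Sketch of driving gnuplot from a CSV of time,value pairs.
cat << 'EOF' > data.csv
0,10
1,15
2,13
3,22
EOF

cat << 'EOF' > plot.gp
set datafile separator ","
set xlabel "time (s)"
set ylabel "events"
set terminal png
set output "events.png"
plot "data.csv" using 1:2 with linespoints title "events"
EOF

# Only attempt the plot if gnuplot is actually installed.
if command -v gnuplot > /dev/null 2>&1; then
    gnuplot plot.gp
fi
```

Keeping the plotting commands in a script rather than typing them interactively means a graph can be regenerated identically for every test run, which makes comparing runs much easier.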

Graphing data allows one to easily spot trends or correlate between seemingly unrelated parts of a system. What was originally an overwhelmingly huge mass of debug data turns into a powerful resource.   Sometimes I find it useful to run multiple tests over a range of tweaked kernel tuneables to see if bugs change behaviour - often this aids understanding when there are significant amounts of inter-component complexity involved.

Perhaps it is just the way I like to think about issues, but I do recommend experimenting with collecting large data sets and creatively transforming the data into visual representations to allow one to easily spot issues.  It can be surprising just how much one can glean from thousands of seemingly unrelated traces of debug data.

Saturday, 1 December 2012

UNIX 1st Edition restoration project.

Earlier this week I stumbled upon the unix-jun72 project which is a restoration of the 1st edition of UNIX based on a scanned printout of the original source.  Unfortunately not all the source is available, but some userland binaries and the C compiler have been recovered from some tapes and there is enough functionality to be able to boot to a minimal and usable system.

The first step is to download and build the most excellent Simh simulator so that we can simulate a PDP-11.   Fortunately the disk images are already pre-built and downloadable, so one just has to download these and point the PDP-11 simulator at a config file and the system almost instantly boots.  The full instructions are provided in great detail here.

I am rather fond of the PDP-11. It was the first mini computer I worked on back in 1987 when I was a very junior programmer at Prosig.  The machine was running RSX-11M Plus rather than UNIX, but even so it gave me an appreciation of what one can do on a very small multi-user machine. At the time I was maintaining, building and testing DATS signal processing tools written in Fortran 77, and some of the work involved tinkering with task builder scripts to try and cram code into the very limited memory.

So in those days size really mattered.  Small was considered beautiful and this mindset was instilled into me in my formative years.  So I was very pleased to find the unix-jun72 project which allows me to relive those PDP-11 days and also experience UNIX-1.

Since some of the original source code still exists, one can browse the source of the kernel and some of the core utilities.  It is amazing to see the functionality that is available in very small binaries.

Let's  fire up the simulator and see the size of the 'cat' command:

 PDP-11 simulator V3.9-0  
 Disabling CR  
 Disabling XQ  
 RF: buffering file in memory  
 TC0: 16b format, buffering file in memory  
 :login: root  
 # ls -al /bin/cat  
 total  1  
  50 sxrwr- 1 bin   134 Jan 1 00:00:00 cat  

And how does that compare to a GNU cat on a 64 bit Ubuntu laptop?

 ls -al /bin/cat  
 -rwxr-xr-x 1 root root 47912 Nov 28 12:48 /bin/cat  

134 bytes versus 47912 bytes. That is quite a difference!  Admittedly we are comparing apples vs pears here, so obviously this is an unfair comparison, but it does illustrate how we've "progressed" since the early UNIX-1 days.   Honestly I'm glad we can now write userland tools and the kernel in C rather than assembler and not have to worry about size constraints quite so much.

I'm very glad that this project exists to preserve the UNIX heritage.  History has a lot to teach us.  Browsing the early code is an education; it allows us to appreciate the memory constraints that shaped the design and implementation of UNIX.  To be able to run this code in a simulator and tinker with a running system adds to the experience.

We've moved on a long way in 40 or so years, machines are incredibly powerful, and memory and disk is cheap.  But let us not forget the "small is beautiful" ideal.  There is something very appealing about tools that do just enough to be very useful and yet are not bloated with unnecessary memory hogging feature creep.