Friday 1 March 2013

Pragmatic Graphing

Over the past few days I have been analysing various issues and also doing some background research, so I have been collecting some rather large sets of data to process.   Normally I filter, re-format and process the data using a bunch of simple tools such as awk, tr, cut, sort, uniq and grep to get the data into some form where it can be plotted using gnuplot. 
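A typical pipeline of that sort might look like the sketch below. The log format, field position and file names are all made up for illustration; the point is just the filter-extract-count shape that gets data into "x y" pairs gnuplot can read.

```shell
# A made-up example log; the last field holds a latency in milliseconds.
cat > app.log << 'EOF'
Mar 01 10:00:01 host app[123]: GET /index handled latency 12
Mar 01 10:00:02 host app[123]: GET /index handled latency 15
Mar 01 10:00:03 host app[123]: GET /about handled latency 12
EOF

# Filter, extract, sort and count: grep picks the relevant lines, awk pulls
# out the last field, sort -n orders numerically, uniq -c counts duplicates,
# and a final awk swaps the columns into "value count" pairs for gnuplot.
grep 'latency' app.log | awk '{ print $NF }' | sort -n | uniq -c |
    awk '{ print $2, $1 }' > latency-histogram.dat

cat latency-histogram.dat
# 12 2
# 15 1
```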

The UNIX philosophy of piping together a bunch of tools to produce the final output normally works fine; however, graphing the data with gnuplot always ends up with me digging around in the online gnuplot documentation or reading old gnuplot files to remind myself exactly how to plot the data just the way I want.  This is fine for occasions where I gather lots of identical logs and want to compare results from multiple tests; the investment in time to automate this with gnuplot is well worth the hassle.  However, sometimes I just have a handful of samples and want to plot a graph, then quickly re-jig the data and perhaps calculate some statistical information such as trend lines.  In that case, I fall back to shoving the samples into LibreOffice Calc and slamming out some quick graphs.
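The sort of gnuplot boilerplate I keep having to look up boils down to a short script like the one below. The data file and column choices are hypothetical; the script just renders a two-column data file as a boxes plot to a PNG so it can be re-run non-interactively whenever the data changes.

```shell
# Write a minimal gnuplot script: PNG output, labelled axes, and a boxes
# plot of column 2 against column 1 of a whitespace-separated data file.
# File names and labels here are made up for illustration.
cat > plot.gp << 'EOF'
set terminal png size 800,600
set output 'latency.png'
set xlabel 'latency (ms)'
set ylabel 'samples'
set style fill solid
plot 'latency-histogram.dat' using 1:2 with boxes title 'latency distribution'
EOF

# A single non-interactive run then produces the image:
# gnuplot plot.gp
```

Keeping the script in a file like this is what makes the automated case worthwhile: the same plot can be regenerated against fresh data with one command.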

This makes me choke a bit.  Using LibreOffice Calc starts to make me feel like I'm an accountant rather than a software engineer.  However, once I have swallowed my pride, I have come to the conclusion that one has to be pragmatic and use the right tool for the job.  To turn around small amounts of data quickly, LibreOffice Calc does seem to be quite useful.  For processing huge datasets and automated graph plotting, gnuplot does the trick (as long as I can remember how to use it).  I am a command line junkie and really don't like using GUI-based power tools, but there does seem to be a place where I can mix the two quite happily.

4 comments:

  1. Give R a try, http://r-project.org.

    Replies
    1. Ah, you beat me to it :) Yes, absolutely, try R. Very powerful, has many functions that are analogous to *nix commands, e.g. sort(), unique(), grep(). If your data are in columns (e.g. in a csv file) you can work with it as a data.frame. Less structured data can be read in and processed line by line, e.g. using readLines(). R has very good plotting facilities. Using it from the basic terminal/shell is like having a bash shell that is designed around processing data.

    2. +1 for R. It's really useful for this type of stuff.

    3. Thanks for the info about R, I shall give it a spin.
