Saturday 3 November 2012

Counting code size with SLOCCount

David A. Wheeler's SLOCCount is a useful tool for counting lines of code in a software project.  It is simple to use: just provide it with the path to the source code and let it grind through all the source files.  The resulting output is a breakdown of the line count for each programming language found in the source.

SLOCCount also estimates development time in person-years as well as the number of developers and the cost to develop.  One can override the defaults and specify parameters such as cost per person, overhead and effort to match one's own development model.
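
To make the estimate less of a black box, here is a minimal Python sketch of the basic COCOMO calculation that SLOCCount's documentation describes.  The default values below (effort factor 2.4 and exponent 1.05, schedule factor 2.5 and exponent 0.38, a yearly person cost of $56,286 and an overhead multiplier of 2.4) are assumed from that documentation and should be checked against the version in use.  The function and field names are hypothetical and are not part of SLOCCount itself.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    person_months: float    # total development effort
    schedule_months: float  # estimated calendar time
    developers: float       # average number of developers
    cost: float             # estimated cost in dollars


def basic_cocomo(sloc,
                 effort_factor=2.4, effort_exp=1.05,
                 schedule_factor=2.5, schedule_exp=0.38,
                 person_cost=56286.0, overhead=2.4):
    """Basic COCOMO estimate; the defaults are assumed SLOCCount defaults."""
    ksloc = sloc / 1000.0
    # Effort grows slightly faster than linearly with code size.
    person_months = effort_factor * ksloc ** effort_exp
    # Calendar time grows much more slowly than effort.
    schedule_months = schedule_factor * person_months ** schedule_exp
    developers = person_months / schedule_months
    # Cost is person-years times yearly salary times overhead.
    cost = (person_months / 12.0) * person_cost * overhead
    return Estimate(person_months, schedule_months, developers, cost)


if __name__ == "__main__":
    est = basic_cocomo(100_000)
    print(f"effort:     {est.person_months / 12.0:.2f} person-years")
    print(f"schedule:   {est.schedule_months / 12.0:.2f} years")
    print(f"developers: {est.developers:.2f}")
    print(f"cost:       ${est.cost:,.0f}")
```

Because the effort exponent is greater than 1, doubling the line count more than doubles the estimated cost.  Changing `person_cost`, `overhead` or the effort and schedule parameters corresponds to overriding the defaults as described above.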

Of course, like all tools that produce metrics, it can be abused, for example by using it as a meaningless metric of programmer productivity.  Counting lines of code does not really measure project complexity: a vexing bug that took 2 days to figure out and resulted in a 1 line fix is obviously more expensive than a poorly written 500 line function that introduces no new noticeable functionality.  As a rule of thumb, SLOCCount is a useful tool to get an idea of the size of a project and some idea of the cost to develop it.  There are of course more complex ways to examine project source code, such as cyclomatic complexity metrics, and there are specific tools such as Panopticode that do this.

As a small exercise, I gave SLOCCount the task of counting the lines of code in the Linux kernel from version 2.6.12 to 3.6 and used the default settings to produce an estimated cost to develop each version.

It is interesting to see that the rate of code being added seemed to increase around the 2.6.28 release.  So what about the estimated cost to develop?

This is of course pure conjecture.  The total line count does not account for patches that remove code, and the estimate assumes that cost is directly related to lines of code.  Also, code complexity makes some lines of code far more expensive to develop than others.  It is interesting to see that each release adds an average of 184,000 lines of code, which SLOCCount estimates to cost about $8.14 million, or ~$44.24 per line of code; not sure how realistic that really is.
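
The per-line figure is just the average cost per release divided by the average lines added per release.  A quick check in Python, using only the two averages quoted above (the function name is hypothetical):

```python
def cost_per_line(total_cost, lines):
    """Average cost in dollars for each line of code added."""
    return total_cost / lines


if __name__ == "__main__":
    # Averages per kernel release quoted in the text above.
    avg_lines_per_release = 184_000
    avg_cost_per_release = 8.14e6
    per_line = cost_per_line(avg_cost_per_release, avg_lines_per_release)
    print(f"${per_line:.2f} per line")  # prints "$44.24 per line"
```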

Anyhow, SLOCCount is easy to use and provides some very useful rule-of-thumb analysis on project size and costs.

1 comment:

  1. Hi Colin,

    I recently ran a few very simple file copy benchmarks on both the newly released Windows 8 and Ubuntu 12.10 (it's on my website).

    According to them, interestingly, when copying single files, Windows 8 was very impressive, as it outperformed Ubuntu by a big margin. The disk throughput was also very impressive.

    However, while copying a folder filled with thousands of files (especially small ones), the performance of both operating systems was very similar.

    You can find that article on my blog. The so-called 'benchmark' looks a bit lame, but I just thought those tests would resemble real-world examples.

    Anyhow, because my knowledge is limited in this field, I wanted to ask you and get your professional opinion about it. So here it goes.

    According to those tests, when copying single files the disk throughput under NTFS was twice that of Ext4, though it was very similar when copying a folder filled with small files.

    So my question is, is there any way to find out whether this was due to the Ext4 file system's design (performance) or the disk I/O scheduler?

    BTW: I love your tests and the contributions that you have made to Ubuntu in making it a more power efficient OS, though most don't know that. Thank you for that!