I developed cpustat to be compact and efficient, and to provide enough statistics for me to easily identify CPU-hungry processes. To optimise the code, I used tools such as perf to find hotspots, and valgrind's cachegrind to find data structures that made poor use of the cache.
The majority of the savings were in the parsing of data from /proc. Originally I used simple fscanf() style parsing; over several optimisation rounds I ended up with hand-crafted numeric and string scanning code that saved several hundred thousand cycles per iteration.
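To illustrate the kind of hand-crafted scanning described above, here is a minimal sketch in C. It is not cpustat's actual code: the helper names `scan_u64()` and `skip_fields()` are hypothetical, and it assumes a simple line of space-separated fields that has already been read into a buffer.

```c
#include <stdint.h>

/*
 * Hypothetical helper: skip leading spaces, then parse an unsigned
 * decimal number. Returns a pointer to the first unparsed character.
 * No overflow or error checking; the input is assumed to be well formed.
 */
const char *scan_u64(const char *p, uint64_t *val)
{
    uint64_t v = 0;

    while (*p == ' ')
        p++;
    while (*p >= '0' && *p <= '9') {
        v = (v * 10) + (uint64_t)(*p - '0');
        p++;
    }
    *val = v;
    return p;
}

/*
 * Hypothetical helper: skip over n space-separated fields that we
 * do not care about, without converting them.
 */
const char *skip_fields(const char *p, int n)
{
    while (n-- > 0) {
        while (*p == ' ')
            p++;
        while (*p && *p != ' ')
            p++;
    }
    return p;
}
```

The point of this style is that each field is touched once, unwanted fields are skipped without conversion, and there is no format string to interpret on every call.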
I also tuned the hash table sizes to better match the input data. In addition, by carefully re-using heap allocations, I was able to reduce the number of malloc()/free() calls and save some heap management overhead.
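One common way to re-use heap allocations is a free list, sketched below. This is an assumption about the general technique rather than a copy of what cpustat does; the `sample_t` type and the `sample_get()`/`sample_put()` names are made up for illustration.

```c
#include <stdlib.h>

/* Hypothetical per-process sample record */
typedef struct sample {
    struct sample *next;    /* link used while on the free list */
    unsigned long ticks;    /* illustrative payload */
} sample_t;

/* Released records are kept here instead of being free()'d */
static sample_t *free_list;

/*
 * Get a zeroed record: take one from the free list if possible,
 * otherwise fall back to the heap.
 */
sample_t *sample_get(void)
{
    sample_t *s = free_list;

    if (s) {
        free_list = s->next;
        s->next = NULL;
        s->ticks = 0;
        return s;
    }
    return calloc(1, sizeof(*s));
}

/* Return a record to the free list for later re-use */
void sample_put(sample_t *s)
{
    if (!s)
        return;
    s->next = free_list;
    free_list = s;
}
```

After the first few iterations, most requests are satisfied from the free list, so the allocator is rarely called in the steady state.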
Some very frequent string look-ups were replaced with hash look-ups. Frequently accessed data was duplicated rather than referenced indirectly, which keeps the data local, reduces cache stalls and so speeds up comparisons during look-ups.
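The sketch below shows both ideas together, under stated assumptions: a small open-addressing hash table where the name is copied into each entry instead of being reached through a pointer, and where the stored hash is compared before any string comparison. The table size, the `entry_t` layout and the use of an FNV-1a hash are my choices for illustration, not necessarily what cpustat uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE  1024    /* assumed; must be a power of two */
#define NAME_LEN    32      /* assumed maximum stored name length */

typedef struct entry {
    uint32_t hash;          /* cached hash, compared before the name */
    int used;               /* non-zero if this slot is occupied */
    char name[NAME_LEN];    /* name copied inline to keep data local */
    uint64_t ticks;         /* illustrative payload */
} entry_t;

static entry_t table[TABLE_SIZE];

/* FNV-1a string hash */
uint32_t hash_str(const char *s)
{
    uint32_t h = 2166136261u;

    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/*
 * Find the entry for name using linear probing. If it is missing and
 * create is non-zero, claim the first empty slot for it. Returns NULL
 * if the name is not found (or the table is full).
 */
entry_t *entry_lookup(const char *name, int create)
{
    uint32_t h = hash_str(name);

    for (size_t i = 0; i < TABLE_SIZE; i++) {
        entry_t *e = &table[(h + i) & (TABLE_SIZE - 1)];

        if (!e->used) {
            if (!create)
                return NULL;
            e->used = 1;
            e->hash = h;
            strncpy(e->name, name, NAME_LEN - 1);
            e->name[NAME_LEN - 1] = '\0';
            return e;
        }
        /* Cheap integer compare first, string compare only on a match */
        if (e->hash == h && strncmp(e->name, name, NAME_LEN - 1) == 0)
            return e;
    }
    return NULL;
}
```

Because the hash and the name sit in the same entry, a successful look-up usually touches only one or two cache lines rather than chasing a pointer to a separately allocated string.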
The source has been statically analysed with Coverity Scan, cppcheck and clang's scan-build to check for bugs introduced during the optimisation steps.
[Figure: Example of cpustat]