One provides health-check with a list of one or more processes; it monitors all of their threads (and child processes) and then reports back on the resources used. Health-check will report on:
- CPU utilisation
- Wakeup events
- Context Switches
- File I/O operations (Open/Read/Write/Close using fanotify)
- System calls (using ptrace)
- Analysis of polling system calls
- Memory utilisation (including memory growth)
- Network connections
- Wakelock activity
Some applications write out data more frequently than they need to, creating dirty pages and metadata that must be written back to the file system. Health-check will capture file I/O activity and report on the names of the files being opened, read, written and closed.
To help identify excessive or heavy system call usage, health-check uses ptrace to trap and monitor all the system calls that the program makes. For example, it has been observed that some applications call poll() and nanosleep() excessively with poorly chosen timeouts, causing excessive CPU utilisation. For system calls such as these, which can wait until an event or a timeout occurs, health-check has some deeper monitoring: it inspects the given timeout delay and checks to see if the call timed out. For example, health-check can identify CPU sucking repeated polling where zero timeouts are being used, or excessive nanosleeps with zero or negative delays.
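The timeout analysis described above can be sketched roughly as follows. This is only an illustration, not health-check's own code: the `TimedCall` record and the `classify` and `summarise` helpers are hypothetical names, and the sketch assumes the traced calls have already been collected into a list with the requested timeout and whether the call timed out.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TimedCall:
    """One traced call (hypothetical record, not health-check's format)."""
    name: str          # e.g. "poll" or "nanosleep"
    timeout_ns: int    # requested timeout or delay in nanoseconds
    timed_out: bool    # True if the call returned because the timeout expired


def classify(call):
    """Bucket a call by how its timeout was used."""
    if call.timeout_ns < 0:
        # Suspicious for nanosleep; note that for poll() a negative
        # timeout means "wait forever", so interpret per call.
        return "negative"
    if call.timeout_ns == 0:
        return "zero"          # returns immediately: busy polling
    if call.timed_out:
        return "timed out"     # woke up with nothing to do
    return "event"             # woke up because something happened


def summarise(calls):
    """Count the timeout buckets per system call name."""
    summary = {}
    for call in calls:
        summary.setdefault(call.name, Counter())[classify(call)] += 1
    return summary
```

A process whose summary is dominated by "zero" or "timed out" entries is a likely candidate for wasted wakeups.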
The ptrace ability of health-check also allows it to monitor per-process wakelock writes. Abuse of wakelocks can keep the kernel from suspending into deep sleep, so it is useful to keep track of wakelock activity on some processes. This is not enabled by default, as it is expensive to monitor via ptrace and some kernels may not have wakelocks, so one has to use the -W option to enable it.
Health-check also inspects /proc/$pid/smaps and will determine if memory utilisation has grown or shrunk. Unusually high heap growth over time may indicate that an application has a memory leak.
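To illustrate the idea, here is a minimal sketch that compares two snapshots of smaps text and reports which mappings changed size. It is an assumption about how such a comparison could be done, not how health-check does it; `mapping_sizes` and `growth` are hypothetical helpers, and only the `Size:` field of each mapping is considered.

```python
import re
from collections import defaultdict

# Mapping header, e.g.
# "01234000-01255000 rw-p 00000000 00:00 0      [heap]"
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\S+\s*(.*)$")


def mapping_sizes(smaps_text):
    """Return the total Size in kB per mapping name from smaps text."""
    sizes = defaultdict(int)
    current = None
    for line in smaps_text.splitlines():
        match = HEADER.match(line)
        if match:
            # Unnamed mappings have nothing after the inode field.
            current = match.group(1).strip() or "[anonymous]"
        elif current is not None and line.startswith("Size:"):
            sizes[current] += int(line.split()[1])
    return dict(sizes)


def growth(before, after):
    """Return the change in kB for every mapping whose size differs."""
    names = set(before) | set(after)
    deltas = {n: after.get(n, 0) - before.get(n, 0) for n in names}
    return {n: d for n, d in deltas.items() if d != 0}
```

In use, one would read /proc/$pid/smaps at the start and end of a run, pass both strings through `mapping_sizes`, and look for a steadily positive `[heap]` entry in the result of `growth`.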
Finally, health-check will inspect /proc/$pid/fd, determine from this any open sockets, and then try to resolve the host names of the IP addresses. For example, it is entirely possible for an application to be making spurious or unwanted connections to various machines, so it is helpful to check up on this kind of activity.
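As a rough sketch of the first and last steps, the following finds the socket entries under /proc/$pid/fd and resolves an IP address to a host name. It is not taken from health-check itself; the helper names are hypothetical, and the step that maps a socket inode to its remote IP address (for example via /proc/net/tcp) is deliberately left out.

```python
import os
import re
import socket

# Entries in /proc/<pid>/fd that are sockets link to "socket:[<inode>]".
SOCKET_LINK = re.compile(r"^socket:\[(\d+)\]$")


def socket_inode(link_target):
    """Return the socket inode from an fd link target, or None."""
    match = SOCKET_LINK.match(link_target)
    return int(match.group(1)) if match else None


def open_socket_inodes(pid):
    """List the inodes of all sockets the given process has open."""
    fd_dir = f"/proc/{pid}/fd"
    inodes = []
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # the fd was closed while we were looking
        inode = socket_inode(target)
        if inode is not None:
            inodes.append(inode)
    return inodes


def host_name(ip_address):
    """Resolve an IP address, falling back to the address itself."""
    try:
        return socket.gethostbyaddr(ip_address)[0]
    except OSError:
        return ip_address
```

Reading another user's /proc/$pid/fd generally needs root, so `open_socket_inodes` is best run with the same privileges one would use for the process being inspected.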
Health-check is still very early alpha quality, so beware of possible bugs. However, it has been helpful in identifying some misbehaving applications, so it is already proving to be rather useful.
Source code can be found at: git://kernel.ubuntu.com/cking/health-check
Packages are available in my White PPA, ppa:colin-king/white, so to install on Ubuntu systems use:
sudo add-apt-repository ppa:colin-king/white
sudo apt-get update
sudo apt-get install health-check
..and go and track down some resource sucking apps..