Saturday, 21 June 2014

stress-ng: an updated system stress test tool

Recently added to Ubuntu 14.10 is stress-ng, a simple tool designed to stress various components of a Linux system. stress-ng is a re-implementation of the original stress tool written by Amos Waterland, and it adds various new ways to exercise a computer as well as a very simple set of "bogo-operation" metrics for each stress method.

stress-ng currently contains the following methods to exercise the machine:
  • CPU compute - just lots of sqrt() operations on pseudo-random values. One can also specify the % loading of the CPUs
  • Cache thrashing, a naive cache read/write exerciser
  • Drive stress by writing and removing many temporary files
  • Process creation and termination, just lots of fork() + exit() calls
  • I/O syncs, just forcing lots of sync() calls
  • VM stress via mmap(), memory write and munmap()
  • Pipe I/O, large pipe writes and reads that exercise pipe, copying and context switching
  • Socket stressing, much like the pipe I/O test but using sockets
  • Context switching between a pair of producer and consumer processes
Many of the above stress methods have additional configuration options.  Each stress method can be run by one or more child processes.
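
To make the "% loading" idea concrete, here is a minimal Python sketch of one way a CPU stressor could hit a target load: spin on sqrt() of pseudo-random values for a fraction of each time slice, then sleep for the rest. This is an illustration only and not stress-ng's actual implementation; the function name cpu_load_slice, the 0.1 second slice length and the duty-cycle approach are all assumptions made for the example.

```python
import math
import random
import time


def cpu_load_slice(load_percent, slice_seconds=0.1, rng=None):
    """Run one time slice: busy for load_percent of it, idle for the rest.

    Returns the number of sqrt() calls made during the busy phase.
    Hypothetical helper for illustration; not taken from stress-ng.
    """
    if not 0 <= load_percent <= 100:
        raise ValueError("load_percent must be between 0 and 100")
    if rng is None:
        rng = random.Random()

    busy_seconds = slice_seconds * load_percent / 100.0
    start = time.monotonic()
    ops = 0

    # Busy phase: sqrt() on pseudo-random values until the busy budget is spent.
    while time.monotonic() - start < busy_seconds:
        math.sqrt(rng.random())
        ops += 1

    # Idle phase: sleep for whatever remains of the slice.
    remaining = slice_seconds - (time.monotonic() - start)
    if remaining > 0:
        time.sleep(remaining)
    return ops
```

Calling cpu_load_slice(50) in a loop should keep one CPU roughly half busy; running one such loop per child process would spread the load over several CPUs.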

The --metrics option dumps the number of operations performed by each stress method, aka "bogo ops"; they are called bogus because they are a rough and unscientific metric. One can specify how long to run a test either by test duration in seconds or by number of bogo ops.
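
The two ways of bounding a run, by time or by operation count, can be sketched in a few lines of Python. This is a rough illustration of the idea rather than how stress-ng is written; run_stressor and sqrt_op are hypothetical names invented for the example.

```python
import math
import random
import time


def sqrt_op():
    """One bogo op: a single sqrt() on a pseudo-random value."""
    math.sqrt(random.random())


def run_stressor(op, timeout_seconds=None, max_ops=None):
    """Call op() repeatedly until the time limit or the op budget is reached.

    Returns (ops_done, elapsed_seconds, ops_per_second).
    """
    if timeout_seconds is None and max_ops is None:
        raise ValueError("give timeout_seconds, max_ops, or both")

    start = time.monotonic()
    ops = 0
    while True:
        # Stop once the bogo-op budget has been used up.
        if max_ops is not None and ops >= max_ops:
            break
        # Stop once the requested duration has passed.
        if timeout_seconds is not None and time.monotonic() - start >= timeout_seconds:
            break
        op()
        ops += 1

    elapsed = time.monotonic() - start
    rate = ops / elapsed if elapsed > 0 else 0.0
    return ops, elapsed, rate
```

For example, run_stressor(sqrt_op, max_ops=1000) stops after exactly 1000 operations, while run_stressor(sqrt_op, timeout_seconds=1.0) runs for about a second and reports however many operations fitted into it. The resulting rate depends on the machine, the interpreter and whatever else is running, which is why such numbers are only a rough guide.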

I've tried to make stress-ng compatible with the older stress tool, but note that it is not guaranteed to produce identical results, as the test methods common to the two tools have been implemented differently.

Stress-ng has been useful for helping me measure different power-consuming loads. It is also useful with various thermald optimisation tweaks on one of my older machines.

For more information, consult the stress-ng manual page.  Be warned, this tool can make your system get seriously busy and warm!

9 comments:

  1. Note that stress-ng is almost feature complete with version 0.03.x - it now contains a large range of stressors, and some specific stressors such as CPU and VM contain their own subset of stressors too. stress-ng will build on Debian kFreeBSD and Debian Hurd based kernels as well as on OpenBSD and FreeBSD.

  2. I'm running an instance of this tool inside and outside of a Linux container to verify the cgroup CPU Bandwidth or CPU time that each cgroup is getting. Is this an appropriate use-case for this tool?

    Replies
    1. To be quite honest, I've not tested that use case in a container with cgroups, so I'm not sure.

  3. Colin,
    I'm trying to generate maximum heat load in a server farm, for the purpose of benchmarking heat generation/power consumption under a theoretical maximum load, on Xeon processors. I'm trying to get my CPUs to their maximum wattage. I tried "stress -c 32" for my 32 core machines, and it is generating the expected high load, but only a 10% power consumption peak over an idle machine with no bootable filesystem. I can't tell if that is typical, or if stress's sqrt function doesn't do much in the way of driving power consumption.

    What would you suggest as a stress-ng command line/script to achieve my goal? I have dual 16 core Xeons and 128 GB of memory. Disk is solid state, and I'm not really wanting to generate write wear on the SSDs, so the most I would want to do on the drives is reads.

    I have a lab for a big-data software development project, and I'm trying to develop metrics for making decisions on the best way to spend budget. A basic question is "Is more cooling needed for a unit of work, or is more modern hardware going to do more work with less cooling?"
    Obviously the answer is complex, depending on how old the old hardware is, and how much efficiency gain there is with the new hardware. But developing a testing method that can really crank up the heat is a desirable thing. I can't tell if my new stack is being stressed right now, I can't generate enough heat to convince me that "stress -c32" is doing it. My power consumption is high enough, but the delta from the idle state is too low to be believable.

    The best test would be to get the software team to give me their worst case system load, but they don't really know what that is, as every test they give me doesn't seem to take the system out of an idle power level, they spend most of the time with no cpu load and waiting for i/o. Of course that suggests I shouldn't even try to load the machines... But as soon as I make that assumption, someone is going to come up with a test case that breaks my lab. So I want to break it myself, now, so that the worst they can do later is reach my benchmark without surpassing it.

    Replies
    1. The latest version of stress-ng has far more complex cpu stress methods than just sqrt. The default action for the cpu stressor is to iterate over all the cpu stress methods in turn so that one gets a good mix of different types of CPU stressing. However, one can select a specific cpu stress method using the --cpu-method option. Currently the available cpu stress methods are: all ackermann bitops callfunc cdouble cfloat clongdouble correlate crc16 decimal32 decimal64 decimal128 dither djb2a double euler explog fft factorial fibonacci float fnv1a gamma gcd gray hamming hanoi hyperbolic idct int128 int64 int32 int16 int8 int128float int128double int128longdouble int128decimal32 int128decimal64 int128decimal128 int64float int64double int64longdouble int32float int32double int32longdouble jenkin jmp ln2 longdouble loop matrixprod nsqrt omega parity phi pi pjw prime psi queens rand rand48 rgb sdbm sieve sqrt trig union zeta. These are fully documented in the manual: http://kernel.ubuntu.com/~cking/stress-ng/stress-ng.pdf

      Use it as follows: stress-ng --cpu 0 --cpu-method int64double

      The above example will run the cpu stressor on all available CPUs (0 = all CPUs), exercising 64-bit integer and double-precision floating point math operations.

      Probably the best way to get maximum load is actually to exercise CPU and cache/memory at the same time. I found that the matrix stressor does this quite well, so you may find that this works well for your use case. Use it as:

      stress-ng --matrix 0

      stress-ng can nowadays also read the thermal zone data on modern x86 CPUs, so the --tz (thermal zone) option may also be worth using to see how hot the system gets.

      You may want to examine how much power gets consumed while running stress-ng, so perhaps running my powerstat tool at the same time will provide some useful information to also complement the thermal zone data. See http://kernel.ubuntu.com/~cking/powerstat/

      To generate more heat, exercising the storage devices with the hdd stressor with just read stressing may also be useful. One has to ensure I/O is directly from the device rather than from the cache, so use the following to perform random reads:

      stress-ng --hdd 1 --hdd-opts direct,rd-rnd

      Alternatively, use the I/O mix stressor that hammers the drive using a wide mix of I/O patterns:

      stress-ng --iomix 1

      I suggest reading through the extensive manual to stress-ng as I suspect there are features in the latest stress-ng that may also be useful for your test scenarios.

  4. It doesn't work with YAML output. We can't capture all the output for hdd

  5. Hello Mr. King.

    In your post you say that it is possible to "specify the % loading of the CPUs". But according to the user manual, the option for this is missing. I see how to put a limit to the amount of Memory used, the number of CPUs, etc. but not for the % of CPU.

  6. What would be the default run time for each sub-stressor under CPU if I select --cpu option? It says "defaulting to a 86400 second run per stressor" does that mean each sub-stressor will run for 24hrs? Or whole CPU class will complete within 24 hrs?
