It has been a while since my last post about stress-ng so I thought it would be useful to provide an update on the changes since V0.08.09.

I have been focusing on making stress-ng more portable so it can build with various versions of clang and gcc as well as run against a wide range of kernels. The portability shims and config detection added to stress-ng allow it to build and run on many Linux systems, as well as GNU/Hurd, Minix, Debian kFreeBSD, various BSD systems, OpenIndiana and OS X.

Enabling stress-ng to work on a wide range of architectures and kernels with a range of compiler versions has helped me to find and fix various corner case bugs. Also, static analysis with a variety of tools has helped to drive up the code quality. As ever, I thoroughly recommend using static analysis tools on any project to find bugs.

Since V0.08.09 I've added the following stressors:

inode-flags - exercises inode flags using the FS_IOC_GETFLAGS/FS_IOC_SETFLAGS ioctls (see ioctl_iflags(2) for more details).
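As a rough illustration of what this stressor exercises, the following minimal sketch (with a hypothetical file name and most error handling omitted) gets and sets the inode flags on a file:

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

int main(void)
{
	int fd, attr;

	fd = open("testfile", O_RDONLY);	/* hypothetical test file */
	if (fd < 0)
		return 1;
	if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) {
		attr |= FS_NOATIME_FL;		/* e.g. set the no-atime flag */
		(void)ioctl(fd, FS_IOC_SETFLAGS, &attr);
	}
	(void)close(fd);
	return 0;
}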

Stress-ng has nearly 200 stressors and many of these have various stress methods that can be selected to perform specific stress testing. These are all documented in the manual. I've also updated the stress-ng project page with links to academic papers and presentations that have used stress-ng in various ways to stress computer systems. It is useful to find out how stress-ng is being used so that I can shape this tool in the future.

As ever, patches for fixes and improvements are always appreciated. Keep on stressing!

There are a wealth of powerful static analysis tools available nowadays for analyzing C source code. These tools help to find bugs in code by analyzing the source code without actually having to execute it. Over the past year or so I have been running the following static analysis tools on linux-next every weekday to find kernel bugs:

Typically each tool can take 10-25+ hours of compute time to analyze the kernel source; fortunately I have a large server at hand to do this. The automated analysis creates an Ubuntu server VM, installs the required static analysis tools, clones linux-next and then runs the analysis. The VMs are configured to minimize write activity to the host and run with 48 threads and plenty of memory to try to speed up the analysis process.

At the end of each run, the output from the previous run is diff'd against the new output to generate a list of new and fixed issues. I then manually wade through these and try to fix some of the low-hanging fruit when I can find free time to do so.

I've been gathering statistics from the CoverityScan builds for the past 12 months tracking the number of defects found, outstanding issues and number of defects eliminated:

As one can see, there are a lot of defects getting fixed by the Linux developers and the overall trend of outstanding issues is downwards, which is good to see. The defect rate in linux-next is currently 0.46 issues per 1000 lines (out of over 13 million lines that are being scanned). A typical defect rate for a project this size is 0.5 issues per 1000 lines. Some of these issues are false positives or very minor / insignificant issues that will not cause any run time problems at all, so don't be too alarmed by the statistics.

Using a range of static analysis tools is useful because each one has its own strengths and weaknesses. For example, smatch and sparse are designed for sanity checking the kernel source, so they have some smarts that detect kernel specific semantic issues. CoverityScan is a commercial product, however, they allow open source projects the size of the Linux kernel to be built daily; the web based bug tracking tool is very easy to use and CoverityScan does manage to reliably find bugs that other tools can't reach. Cppcheck is useful as it scans all the code paths by forcibly trying all the #ifdef'd variations of the code - which is useful on the more obscure CONFIG mixes.

Finally, I use clang's scan-build and the latest version of gcc to try and find the more typical warnings found by the static analysis built into modern open source compilers.

The more typical issues being found by static analysis are ones that don't generally appear at run time, such as corner cases in error handling code paths, resource leaks, resource failure conditions, uninitialized variables and dead code paths.

My intention is to continue this process of daily checking and I hope to report back next September to review the CoverityScan trends for another year.

The new memrate stressor performs 64/32/16/8 bit reads and writes to a large memory region. It will attempt to get some statistics on the memory bandwidth for these simple reads and writes. One can also specify the read/write rates in terms of MB/sec using the --memrate-rd-mbs and --memrate-wr-mbs options, for example:
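stress-ng --memrate 1 --memrate-bytes 256M --memrate-rd-mbs 1000 --memrate-wr-mbs 1000 -t 60

(the instance count, memory size and rates in this invocation are merely illustrative values)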

...the memrate stressor will attempt to limit the memory rates but due to scheduling jitter and other memory activity it may not be 100% accurate. By carefully setting the size of the memory region with the --memrate-bytes option one can exercise the L1/L2/L3 caches and/or the entire memory.

By default, the matrix stressor will perform matrix operations with optimal, cache-friendly memory access. The new --matrix-yx option will instead perform matrix operations in y, x rather than x, y order, causing more cache stalls on larger matrices. This can be useful for exercising cache misses.
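The effect is easy to see in a minimal C sketch (this is not stress-ng's actual implementation); the y, x traversal jumps a whole row of the matrix on each inner step:

#include <stddef.h>

#define N 1024

static float m[N][N];

/* x, y order: the inner loop walks memory sequentially, cache friendly */
void add_xy(void)
{
	size_t x, y;

	for (x = 0; x < N; x++)
		for (y = 0; y < N; y++)
			m[x][y] += 1.0f;
}

/* y, x order: each inner step strides N floats, causing far more cache misses */
void add_yx(void)
{
	size_t x, y;

	for (y = 0; y < N; y++)
		for (x = 0; x < N; x++)
			m[x][y] += 1.0f;
}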

To complement the heapsort, mergesort and qsort memory/CPU exercising sort stressors I've added the BSD library radixsort stressor to exercise sorting of hundreds of thousands of small text strings.
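For reference, the BSD radixsort(3) interface being exercised looks roughly like this minimal sketch (on Linux the function is provided by libbsd via <bsd/stdlib.h>):

#include <stdio.h>
#include <stdlib.h>	/* radixsort(3); use <bsd/stdlib.h> with libbsd on Linux */

int main(void)
{
	const unsigned char *strs[] = {
		(const unsigned char *)"pear",
		(const unsigned char *)"apple",
		(const unsigned char *)"plum",
	};
	int i;

	/* a NULL table and a '\0' end byte sort NUL-terminated strings */
	if (radixsort(strs, 3, NULL, '\0') == 0)
		for (i = 0; i < 3; i++)
			printf("%s\n", strs[i]);
	return 0;
}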

Finally, while exercising various hugepage kernel configuration options I was inspired to make stress-ng's memory mappings work better with hugepage madvise hints, so where possible all anonymous memory mappings are now private to allow hugepage madvise to work. The stream and vm stressors also have new madvise options to allow one to choose hugepage, nohugepage or normal hints.

No big changes as per normal, just small incremental improvements to this all purpose stress tool.

Forkstat is a tiny utility I wrote a while ago to monitor process activity via the process events connector. Recently I was sent a patch by Philipp Gesang adding a new -l option to switch to line buffered output to reduce the delay on output when redirecting stdout, which is a useful addition to the tool. During some spare time I looked at the original code and noticed that I had overlooked some of the lesser-used process event types:

STAT_PTRC - ptrace attach/detach events

STAT_UID - UID (and GID) change events

STAT_SID - SID change events

...so I've now added support for these events too.

I've also added some extra per-process information on each event. The new -x "extra info" option will now also display the UID of the process and where possible the TTY it is associated with. This allows one to easily detect who is responsible for generating the process events.

The following example shows forkstat being used to detect when a process is being traced using ptrace:

Process 17376 runs strace on process 17350 (top). We can see the ptrace attach event on the process and then, a few seconds later, the detach event. We can see that the strace was being run from pts/15 by root. Using forkstat we can now snoop on users who are snooping on other users' processes.

I use forkstat mainly to capture busy process fork/exec/exit activity that tools such as ps and top cannot see because of the very short duration of some processes or threads. Sometimes processes are created so rapidly that one needs to run forkstat with a high priority to capture all the events, and so the new -r option will run forkstat with a high real time scheduling priority to try and capture all the events.

These new features landed in forkstat V0.02.00 for Ubuntu 17.10 Artful Aardvark.

The latest release of stress-ng contains a mechanism to measure latencies via a cyclic latency test. Essentially this is just a loop that cycles around performing high precision sleeps and measures the extra latency overhead taken to perform each sleep compared to the expected time. This loop runs with either the Round-Robin (rr) or First-In-First-Out (fifo) real time scheduling policies.

The cyclic test can be configured to specify the sleep time (in nanoseconds), the scheduling type (rr or fifo), the scheduling priority (1 to 100) and also the sleep method (explained later).

The first 10,000 latency measurements are used to compute various latency statistics:

The latency percentiles indicate the latency below which a given percentage of the samples fall. For example, the 99th percentile for the 10,000 samples is the latency at or below which 9,900 samples fall.

The latency distribution is shown when the --cyclic-dist option is used; one has to specify the distribution interval in nanoseconds and up to the first 100 values in the distribution are output.

For an idle machine, one can invoke just the cyclic measurements with stress-ng as follows:
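sudo stress-ng --cyclic 1 --cyclic-policy fifo --cyclic-prio 100 --cyclic-method clock_ns --cyclic-sleep 20000 --cyclic-dist 1000 -t 5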

Note that stress-ng needs to be invoked using sudo to enable the Real Time FIFO scheduling for the cyclic measurements.

The above example uses the following options:

--cyclic 1

starts one instance of the cyclic measurements (1 is always recommended)

--cyclic-policy fifo

use the real time First-In-First-Out scheduling for the cyclic measurements

--cyclic-prio 100

use the maximum scheduling priority

--cyclic-method clock_ns

use the clock_nanosleep(2) system call to perform the high precision duration sleep

--cyclic-sleep 20000

sleep for 20000 nanoseconds per cyclic iteration

--cyclic-dist 1000

enable latency distribution statistics with an interval of 1000 nanoseconds between each data point.

-t 5

run for just 5 seconds

From the run above, we can see that 99.5% of latencies were less than 9821 nanoseconds and most clustered around the 4880 nanosecond modal point. The distribution data shows that there is some clustering around the 5000 nanosecond point and the samples tail off with a bit of a long tail.

Now for the interesting part. Since stress-ng is packed with many different stressors we can run these while performing the cyclic measurements, for example, we can tell stress-ng to run *all* the virtual memory related stress tests and see how this affects the latency distribution using the following:
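sudo stress-ng --cyclic 1 --cyclic-policy fifo --cyclic-prio 100 --cyclic-method clock_ns --cyclic-sleep 20000 --cyclic-dist 1000 --class vm --all 1 -t 60

(here the cyclic options are the same as in the previous run)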

...the above invokes all the vm class of stressors running at the same time (with just one instance of each stressor) for 60 seconds.

The --cyclic-method option specifies the delay used on each of the 10,000 cyclic iterations. The default (and recommended) method is clock_ns, using the high precision delay. The available cyclic delay methods are:

clock_ns (use the clock_nanosleep() sleep)

posix_ns (use the POSIX nanosleep() sleep)

itimer (use a high precision clock timer and pause to wait for a signal to measure latency)

poll (busy spin-wait on clock_gettime() to eat cycles for a delay)

All the delay mechanisms use the CLOCK_REALTIME system clock for timing.

I hope this provides enough cyclic measurement functionality to gather some useful latency benchmarks against various kernel components when using some or a mix of the stress-ng stressors. Let me know if I am missing some other cyclic measurement options and I can see if I can add them in.

In this demonstration, the "All Batch Tests" option has been selected:

Tests will be run one by one and a progress bar shows the progress of each test. Some tests run very quickly, others can take several minutes depending on the hardware configuration (such as the number of processors).

Once the tests are all complete, the following dialogue box is displayed:

The test has saved several files into the directory /fwts/15052017/1748/ and by selecting Yes one can view the results log in a scroll-box:

Exiting this, the FWTS frontend dialog is displayed:

Press enter to exit (note that the Poweroff option is just for the fwts Live-CD image version of fwts-frontend).

acpidump.log is a dump of the ACPI tables in a format compatible with the ACPICA acpidump tool. The results.log file is a copy of the results generated by FWTS and results.html is an HTML formatted version of the log.

The latest release of stress-ng, 0.08.00, contains a new job scripting feature. Jobs allow one to bundle up a set of stress options into a script rather than cram them all onto the command line. One can now also run multiple invocations of a stressor, and combined with job scripts this gives us a powerful way of running more complex stress tests.

The job script commands are essentially the stress-ng long options without the need for the '--' option characters. One option per line is allowed.

One can also add comments using the # character prefix. By default the stressors will be run in parallel, but one can use the "run sequential" command in the job script to run the stressors sequentially.

The following script (a sketch with illustrative sizes and run times) runs the mmap stressor multiple times using more memory on each run:
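# run the stressors one after another
run sequential
# run each invocation for 1 minute
timeout 1m
#
# one mmap stressor instance, increasing the memory used on each run
#
mmap 1
mmap-bytes 64M
mmap 1
mmap-bytes 256M
mmap 1
mmap-bytes 1G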

Some of the stress-ng stressors have various "methods" that allow one to modify the way the stressor behaves. The following example (a sketch using a few of the cpu stressor methods from the manual) shows how job scripts can be used to exercise a system using different stressor methods:
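run sequential
timeout 60s
#
# exercise the cpu stressor using several different cpu methods
#
cpu 1
cpu-method fft
cpu 1
cpu-method matrixprod
cpu 1
cpu-method queens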

Various example job scripts can be found in /usr/share/stress-ng/example-job; one can use these as a base for writing more complex job scripts. The example jobs have all the options commented (using the text from the stress-ng manual) to make it easier to see how each stressor can be run.

Version 0.08.00 landed in Ubuntu 17.10 Artful Aardvark and is available as a snap and I've got backports in ppa:colin-king/white for older releases of Ubuntu.

Over the past 6 months I've been running static analysis on linux-next with CoverityScan on a regular basis (to find new issues and fix some of them) as well as keeping a record of the defect count.

Since the beginning of September over 2000 defects have been eliminated by a host of upstream developers and the steady downward trend of outstanding issues is good to see. A proportion of the outstanding defects are false positives or issues where the code is being overly zealous, for example, bounds checking where some conditions can never happen. Considering there are millions of lines of code, the defect rate is about average for such a large project.

I plan to keep the static analysis running long term and I'll try and post stats every 6 months or so to see how things are progressing.

BCC allows one to write BPF programs with front-ends in Python or Lua and with kernel instrumentation written in C. The instrumentation code is compiled into sandboxed eBPF byte code and is executed in the kernel.

The BCC github project README file provides an excellent overview and description of BCC and the various available BCC tools. Building BCC from scratch can be a bit time consuming, however, the good news is that the BCC tools are now available as a snap and so BCC can be quickly and easily installed just using:

sudo snap install --devmode bcc

There are currently over 50 BCC tools in the snap, so let's have a quick look at a few:

cachetop allows one to view the top page cache hit/miss statistics. To run this use:

sudo bcc.cachetop

The funccount tool allows one to count the number of times specific functions get called. For example, to see how many kernel functions with names starting with "do_" get called per second one can use:

sudo bcc.funccount "do_*" -i 1

To see how to use all the options in this tool, use the -h option:

sudo bcc.funccount -h

I've found the funccount tool to be especially useful to check on kernel activity by checking on hits on specific function names.

The slabratetop tool is useful to see the active kernel SLAB/SLUB memory allocation rates:

sudo bcc.slabratetop

If you want to see which process is opening specific files, one can snoop on open system calls using the opensnoop tool:

sudo bcc.opensnoop -T

Hopefully this will give you a taste of the useful tools that are available in BCC (I have barely scratched the surface in this article). I recommend installing the snap and giving it a try.

As it stands, BCC provides a useful mechanism to develop BPF tracing tools and I look forward to regularly updating the BCC snap as more tools are added to BCC. Kudos to Brendan Gregg for BCC!

The kernel contains tens of thousands of statements that may print various errors, warnings and debug/information messages to the kernel log. Unsurprisingly, as the kernel grows in size, so does the quantity of these messages. I've been scraping the kernel source for various kernel printk style statements and macros and scanning these for typos and spelling mistakes. To make this easier I hacked up kernelscan, a quick and dirty parser that helps me find literal strings from the kernel for spell checking.

Using kernelscan, I've gathered some statistics for the number of kernel print statements for various kernel releases:

As one can see, we have over 200,000 messages in the 4.9 kernel(!). Given the kernel growth, this seems to roughly correlate with the kernel source size:

So how many lines of code in the kernel do we have per kernel printk message over time?

...showing that the trend is to have more lines of code per printk statement over time. I didn't differentiate between the different types of printk message, so it is hard to see any deeper trends on what kinds of messages are being logged more or less frequently over each release; for example, perhaps there are fewer debug messages landing in the kernel nowadays.

I find it quite amazing that the kernel contains quite so many printk messages; it would be useful to see just how many of these are actually in a production kernel. I suspect a large number are for driver debugging and may be conditionally omitted at build time.

Another year passes and once more I have another seasonal obfuscated C program. I was caught short on free time this year, so the code is less heavily obfuscated than usual, which is a shame. However, this year I worked a bit harder at animating the output, so hopefully that will make up for the lack of obfuscation.

The source is available on github to eyeball. I've had criticism in previous years that it is hard to figure out the structure of my obfuscated code, so this year I made sure that the if statements were easier to see, and hence the flow of the code easier to understand.

This year I've snapped up all my seasonal obfuscated C programs and put them into the snap store as the christmas-obfuscated-c snap.

Below is a video of the program running; it is all ASCII art and one can re-size the window while it is running.

stress-ng is a tool that I have been developing on-and-off for a few years. It is designed to stress kernels to force out bugs, stress CPU and memory and also contains some performance benchmarking metrics too.

stress-ng is now entering the maturity part of the development phase, however, there is always scope to add new stressors and generally improve the tool. I've just released version 0.07.07 for the Ubuntu Zesty 17.04 release and it contains a few additional features:

SIGUSR2 sent to stress-ng will dump out the current system load and memory statistics

The major change to stress-ng over the past month was an internal re-working of system call and GNU features to abstract these into a shim layer to reduce the number of conditional #ifdef build paths around the code. This simplifies portability, so the code now builds more easily across a range of systems and with various versions of gcc and clang, and fixes some issues on older kernels too. This also makes the code faster to statically analyze with cppcheck.
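To illustrate the shim idiom (a minimal sketch, not necessarily how stress-ng implements it), a system call that may be missing on some targets can be wrapped once so the rest of the code stays free of #ifdefs:

#include <errno.h>
#include <sys/types.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/syscall.h>
#endif

/* shim_gettid: use the real system call where available, else fail with ENOSYS */
static inline pid_t shim_gettid(void)
{
#if defined(__linux__) && defined(__NR_gettid)
	return (pid_t)syscall(__NR_gettid);
#else
	errno = ENOSYS;
	return -1;
#endif
}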

Over the past month I've been hitting excessive thermal heating on my laptop and kidle_inject has been kicking in to try and stop the CPU from overheating (melting!). A quick double-check with older kernels showed me that this issue was not a thermal/performance regression caused by software - instead it was time to clean my laptop and renew the thermal paste.

After some quick research, I found that Arctic MX-4 Thermal Compound provided an excellent thermal conductivity rating of 8.5W/mK, so I ordered a 4g sample as well as a can of pressurized gas cleaner to clean out dust.

The X230 has an excellent hardware maintenance manual, and following the instructions I stripped the laptop right down so I could pop the heat pipe contacts off the CPU and GPU. I carefully cleaned off the old dry and cracked thermal paste and applied about 0.2g of MX-4 thermal compound to the CPU and GPU and re-seated the heat pipe. With the pressurized gas I cleaned out the fan and airways to maximize airflow over the heatpipe. The entire procedure took about an hour to complete and for once I didn't have any screws left over after re-assembly!

I normally take photos of the position of components during the strip down of a laptop for reference in case I cannot figure out exactly how parts are meant to fit in the re-assembly phase. In this case, the X230 maintenance manual is sufficiently detailed that I didn't take any photos this time.

I'm glad to report that my X230 is now no-longer overheating. Heat is being effectively pumped away from the CPU and GPU and one can feel the additional heat being pushed out of the laptop. Once again I can fully max out the CPU and GPU without passive thermal cooling mechanisms being kicked into action, so I've now got 100% of my CPU performance back again; as good as new!

Now and again I see laptop overheating bugs being filed in Launchpad. While some are legitimate issues with broken software, I do wonder if the majority of issues with older laptops are simply due to an accumulation of dust and/or old and damaged thermal paste.

The Linux kernel contains lots of error/warning/information messages; over 130,000 in the current 4.7 kernel. One of the tests in the Firmware Test Suite (FWTS) is to find BIOS/ACPI/UEFI related kernel error messages in the kernel log and try to provide some helpful advice on each error message since some can be very cryptic to the untrained eye.

The FWTS kernel error log database is currently approaching 800 entries and I have been slowly working through another 800 or so more relevant and recently added messages. Needless to say, this is taking a while to complete. The hardest part was finding relevant error messages in the kernel as they appear in different forms (e.g. printk(), dev_err(), ACPI_ERROR() etc).

In order to scrape the Linux kernel source for relevant error messages I hacked up the kernelscan parser to find error messages and dump these to stdout. kernelscan can scan 43,000 source files (17,900,000 lines of source) in under 10 seconds on my Lenovo X230 laptop, so it is relatively fast.

I also have been using kernelscan to find spelling mistakes in kernel messages and I've been punting trivial fixes upstream to fix these. These mistakes are small and petty, but I find it a little irksome when I see the kernel emit a message that contains a typo or spelling mistake - it just looks a bit unprofessional.

The code is designed to only parse kernel source, and it is a very rough and ready parser designed for speed; fundamentally, it is a big quick hack. When I get a few spare minutes I will try and see if there is any correlation between the number of error messages with the size of the kernel over the various releases.

Since my last blog post about stress-ng, I've pushed out several more small releases that incorporate new features and (as ever) a bunch more bug fixes. I've been eyeballing gcov kernel coverage stats to find more regions in the kernel where stress-ng needs to exercise. Also, testing on a range of hardware (arm64, s390x, etc.) and a range of kernels has eked out some bugs and helped me to improve stress-ng. So what's new?

stackmmap - allocates a 2MB stack that is memory mapped onto a temporary file. A recursive function works down the stack and flushes dirty stack pages back to the memory mapped file using msync(2) until the end of the stack is reached (stack overflow). This exercises dirty page and stack exception handling.
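The underlying mechanism can be sketched minimally as follows (this is not the stressor's actual code and the file name is hypothetical) - dirty the pages of a file-backed mapping, then flush them back with msync(2):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	const size_t sz = 2 * 1024 * 1024;	/* 2MB, as used by stackmmap */
	char *map;
	int fd;

	fd = open("mapfile", O_RDWR | O_CREAT, 0600);	/* hypothetical temporary file */
	if (fd < 0 || ftruncate(fd, sz) < 0)
		return 1;
	map = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return 1;
	memset(map, 0xa5, sz);		/* dirty the mapped pages */
	(void)msync(map, sz, MS_SYNC);	/* flush the dirty pages back to the file */
	(void)munmap(map, sz);
	(void)close(fd);
	return 0;
}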

Recently I've been adding a few more features into stress-ng to improve kernel code coverage. I'm currently using a kernel built with gcov enabled and using the most excellent lcov tool to collate the coverage data and produce some rather useful coverage charts.

With a gcov enabled kernel, gathering coverage stats is a trivial process with lcov:
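# a sketch of the typical steps (gathering kernel coverage needs root;
# lcov defaults to operating on the kernel gcov data)
lcov --zerocounters

# ... run the stress tests ...

lcov -c -o kernel.info
genhtml -o html kernel.info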

I got bitten this week with the clone() system call returning -EINVAL on aarch64 in code that worked fine on x86. After re-reading the manual several times and looking at my code, I resorted to shoving debug into the kernel to track down where the -EINVAL was occurring.

The answer to my issue is in arch/arm64/kernel/process.c, copy_thread():
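	if (stack_start) {
		if (is_compat_thread(task_thread_info(p)))
			childregs->compat_sp = stack_start;
		/* 16-byte aligned stack mandatory on AArch64 */
		else if (stack_start & 15)
			return -EINVAL;
		else
			childregs->sp = stack_start;
	}

AArch64 mandates a 16-byte aligned child stack (the snippet above reflects the arm64 code of that era), so clone() fails with -EINVAL if the stack pointer passed to it is misaligned - something x86 happily tolerates.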

While running static analysis on a lot of C source code, I keep on finding a common incorrect programming idiom used with realloc() allocation failures where a NULL is returned and the code returns with some kind of exit failure status, something like the following:
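/* the problematic idiom - a minimal sketch */
ptr = realloc(ptr, new_size);
if (!ptr) {
	fprintf(stderr, "out of memory\n");
	return -1;	/* the original allocation is lost at this point */
}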

However, when realloc() fails it returns NULL and the original object remains unchanged and thus it is not freed. So the above code leaks the memory pointed to by ptr if realloc() returns NULL.

This may be a moot point, since the error handling paths normally abort the program because we can't proceed any further when out of memory. However, there are occasions in code where ENOMEM may not be fatal, for example the program may reallocate smaller buffers and retry or free up space on the heap and retry.
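In such cases, a sketch of the safer idiom is to assign the result of realloc() to a temporary pointer, so the original allocation can still be freed (or reused) on failure:

char *tmp = realloc(ptr, new_size);

if (!tmp) {
	free(ptr);	/* ptr is still valid here; free it, or retry with a smaller size */
	return -1;
}
ptr = tmp;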

Dustin Kirkland recently announced ZFS being officially supported by Canonical for Ubuntu Xenial 16.04. We've written a short reference guide to help with getting started and understanding the ZFS terminology. It touches on the basics of setting up ZFS pools as well as creating and using ZFS file systems.

If you are new to ZFS, I recommend having a look at the reference guide to get you started.