Archive

One of the bits of feedback that I got from the useR conference was that my assertive package, for run-time testing of code, was too big. Since then I’ve been working to modularise it, and the results are now on CRAN.

Now, rather than one big package, there are fifteen assertive.* packages for specific pieces of functionality. For example, assertive.numbers contains functionality for checking numeric vectors, assertive.datetimes contains functionality for checking dates and times, and assertive.properties contains functionality for checking names, attributes, and lengths.

This finer grained approach means that if you want to develop a package with assertions, you can choose only the parts that you need, allowing for a much smaller footprint.

The pathological package, which depends upon assertive, gives you an example of how to do this.

The assertive package itself now contains no functions of its own – it merely imports and re-exports the functions from the other 15 packages. This means that if you are working with assertive interactively, you can still simply type library(assertive) and have access to all the functionality.

I had an interesting email today saying that developers at the writer’s company wanted to use one of my packages, but weren’t allowed because it was under an unlimited license.

I release quite a few of my packages under an unlimited licence, at least for toy projects and immature packages. In those cases, letting users do what they want is more important to me than the fairness of, for example, the GPL.

(assertive is a notable exception, because it’s taken a lot of work, and also because it contains some RStudio code.)

Anyway, the lady who wrote to me requested that I release my package under a dual license to enable her staff to use it.

My alternate solution is more elegant: since the license is unlimited, you can simply download the package source, edit the DESCRIPTION file to change the license to whatever you want, and use it as you see fit.

pathological (for working with file paths), runittotestthat (for converting RUnit tests to testthat tests), and rebus (formerly regex, for building regular expressions in a human readable way) all make their CRAN debuts.

assertive, for run-time testing your code has more checks for the state of your R setup (is_r_devel, is_rstudio, r_has_png_capability, and many more), checks for the state of your variables (are_same_length, etc.), and utilities (dont_stop).

sig (for checking that your function signatures are sensible) now works with primitive functions too.

learningr (to accompany the book) has a reference URL fix but is otherwise the same.

I encourage you to take a look at some or all of them, and give me feedback.

There’s a new version of my assertive package, for sanity-checking code, on its way to CRAN. The release has been delayed a while, since my previous attempt at an upload met with an error that was only generated on the CRAN machine, but not on my own. The problem lay with some code designed to autorun the RUnit tests for the package. After fiddling for a while and getting nowhere, I decided it was time to make the switch to testthat.

I’ve been a long-time RUnit user, since the syntax is near-identical to every other xUnit variant in every programming language. So switching between RUnit and MATLAB xUnit or NUnit requires no thinking. testthat has a couple of important advantages over RUnit though.

test_package makes it easy to, ahem, test your package. A page of code for finding and nicely displaying bad tests has been reduced to test_package("assertive").

Secondly, testing for warnings is much cleaner. In RUnit, you have to use convoluted mechanisms like:

Thirdly, testthat caches tests, so you spend less time waiting for tests that you know are fine to rerun.

These benefits mean that I’ve been meaning to switch packages for a while. The big problem was that the assertive package contains over 300 unit tests. At about a minute or so to update each tests, that was five hours of tedious work that I couldn’t be bothered to do. Instead, I spent two days making a package that automatically converts RUnit tests to testthat tests. Not exactly a time saving, but it was more fun.

It isn’t on CRAN yet, but you can get it from github.

library(devtools)
install_github("richierocks/runittotestthat")

The package contains functions to convert tests on an individual/file/package basis.

convert_test takes an RUnit test function and returns a call to test_that. Here’s an example for the sqrt function.

Of course, the main use for this is converting whole files or packages at at time, so runittotestthat contains convert_test_file and convert_package_tests for this purpose. By default (so you don’t overwrite your RUnit tests by mistake), they write their output to the console, but you can also write the resulting testthat tests to a file. converting all 300 of assertive’s tests was as easy as

In that line of code, the runit_files variable is a special name that refers to the names of the files that contain you RUnit tests. It means that the output testthat file names can be be based upon original input names.

Although runittotestthat works fine on all my tests, automatic code editing is a tricky task, so there may be some weird edge cases that I’ve missed. Please download the package and play with it, and let me know if you find any bugs.

I was recently hunting for a function that will strip the extension from a file – changing foo.png to foo, and so forth. I was knitting a report, and wanted to replace the file extension of the input with the extension of the the output file. (knitr handles this automatically in most cases but I had some custom logic in there that meant I had to work things manually.)

Finding file extensions is such a common task that I figured that someone must have written a function to solve the problem already. A quick search using findFn("file extension") from the sos package revealed a few thousand hits. There’s a lot of noise in there, but I found a few promising candidates.

There’s removeExt in the limma package (you can find it on Bioconductor), strip_extension in Kmisc, remove_file_extension which has identical copies in both spatial.tools and gdalUtils, and extension in the raster.

To save you the time and effort, I’ve tried them all, and unfortunately they all suck.

At a bare minimum, a file extension stripper needs to be vectorized, deal with different file extensions within that vector, deal with multiple levels of extension (for things like “tar.gz” files), and with filenames with dots in the name other than the extension, and with missing values, and with directories. OK, that’s quite a few things but I’m picky.

Since all the existing options failed, I’ve made my own function. In fact, I went overboard and created a package of path manipulation utilities, the pathological package. It isn’t on CRAN yet, but you can install it via:

library(devtools)
install_github("richierocks/pathological")

It’s been a while since I’ve used MATLAB, but I have fond recollections of its fileparts function that splits a path up into the directory, filename and extension.

The pathological equivalent is to decompose a path, which returns a character matrixdata.frame with three columns.

The package also contains a few other path utilities. The standardisation I mentioned comes from standardise_path (standardize_path also available for Americans), and there’s a dir_copy function for copying directories.

It’s brand new, so after I’ve complained about other people’s code, I’m sure karma will ensure that you’ll find a bug or two, but I hope you find it useful.

assertive, my new package for writing robust code, is now on CRAN. It consists of lots of is functions for checking variables, and corresponding assert functions that throw an error if the condition doesn’t hold. For example, is_a_number checks that the input is numeric and scalar.

is_a_number(1) #TRUE
is_a_number("a") #FALSE
is_a_number(1:10) #FALSE

In the last two cases, the return value of FALSE has an attribute “cause” that indicates the cause of failure. When “a” is the input, the cause is “"a" is not of type 'numeric'.“, whereas for 1:10, the cause is “1:10 does not have length one.“. You can get or set the cause attribute with the cause function.

m <- lm(uptake ~ 1, CO2)
ok <- is_empty_model(m)
if(!ok) cause(ok)

The assert functions call an is function, and if the result is FALSE, they throw an error; otherwise they do nothing.

assert_is_a_number(1) #OK
assert_is_a_number("a") #Throws an error

There are also some has functions, primarily for checking the presence of attributes.

“Why would you want to use these functions?”, you may be asking. The dynamic typing and extreme flexibility of R means that it is very easy to have variables that are the wrong format. This is particularly true when you are dealing with user input. So while you know that the sales totals passed to your function should be a vector of non-negative numbers, or that the regular expression should be a single string rather than a character vector, your user may not. You need to check for these invalid conditions, and return an error message that the user can understand. assertive makes it easy to do all this.

Since this is the first public release of assertive, it hasn’t been widely tested. I’ve written a moderately comprehensive unit-test suite, but there are likely to be a few minor bugs here and there. In particular, I suspect there may be one or two typos in the documentation. Please give the package a try, and let me know if you find any errors, or if you want any other functions adding.