frequently applied recoding and variable transformation tasks, also with support for labelled data

In this post, I want to introduce the topic of labelled data and give some examples of what the sjmisc-package can do, with a special focus on tagged NA values.

Introduction to Labelled Data

Labelled data (or labelled vectors) is a common data structure in other statistical environments for storing meta-information about variables, such as variable names, value labels or multiple defined missing values. Labelled data not only extends R’s capabilities to deal with proper value and variable labels, but also makes it possible to represent different types of missing values, as other statistical software packages do. With regular missing values alone, R cannot represent multiple declared missings the way ‘SPSS’ or ‘SAS’ can. However, Hadley Wickham’s haven package introduced tagged_na values, which can do this. Tagged NAs work exactly like regular R missing values, except that they store one additional byte of information: a tag, which is usually a letter (“a” to “z”) but may also be a digit character (“0” to “9”). This makes it possible to distinguish different kinds of missing values.
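To make the idea of tagged NAs more concrete, here is a minimal sketch that uses the haven package directly. The example vector and the tags "a" and "z" are made up for illustration.

```r
library(haven)

# Two regular values, two tagged NAs and one ordinary NA.
# The tags "a" and "z" are arbitrary and only serve as an illustration.
x <- c(1, 2, tagged_na("a"), tagged_na("z"), NA)

# Tagged NAs behave like ordinary missing values ...
is.na(x)
# [1] FALSE FALSE  TRUE  TRUE  TRUE

# ... but na_tag() recovers the tag (NA for everything that is not tagged)
na_tag(x)
# [1] NA  NA  "a" "z" NA

# is_tagged_na() tells tagged NAs apart from regular NAs
is_tagged_na(x)
# [1] FALSE FALSE  TRUE  TRUE FALSE
```

Because is.na() is TRUE for every kind of missing value, existing code that handles NA keeps working, while the tag remains available when the reason for the missing value matters.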

get_labels() also returns “labels” of factors, even if the factor has no label attributes. This is useful if you need a generic way for your functions to get value labels, whether the input is labelled data or a factor.

x
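A small sketch of the factor case might look like this. The factor f is a made-up example, and the call is prefixed with sjlabelled:: on the assumption that this is the package that provides get_labels() in current releases; with the sjmisc version described in this post, the unprefixed call should behave the same way.

```r
# A plain factor without any label attributes (illustrative data)
f <- factor(c("low", "high", "mid", "high", "low"))

# get_labels() falls back to the factor levels
sjlabelled::get_labels(f)
# [1] "high" "low"  "mid"
```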

Tagged missing values can also be included in the output, using the drop.na argument.

# get labels, including tagged NA values
x
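The following sketch shows the effect of drop.na on a labelled vector with tagged NAs. The vector and its labels are assumptions chosen for illustration (values 1 and 4 are labelled, plus three tagged NAs), and the sjlabelled:: prefix again assumes that get_labels() is provided by that package in current releases.

```r
library(haven)

# Hypothetical example: labels for the values 1 and 4,
# and three tagged NAs that each carry their own label
x <- labelled(
  c(1:3, tagged_na("a", "c", "z"), 4:1),
  c("Agreement" = 1, "Disagreement" = 4, "First" = tagged_na("c"),
    "Refused" = tagged_na("a"), "Not home" = tagged_na("z"))
)

# Default: labels that belong to tagged NA values are dropped
labels_default <- sjlabelled::get_labels(x)
labels_default

# drop.na = FALSE keeps the labels of the tagged NA values as well
labels_all <- sjlabelled::get_labels(x, drop.na = FALSE)
labels_all
```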

Getting labelled values

The get_values() method returns the values for labelled values (i.e. values that have an associated label). We still use the vector x from the above examples.

With the drop.na argument you can omit values that are defined as missing from the result.

get_values(x, drop.na = TRUE)
# [1] 1 4
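For a version that can be run on its own, here is a sketch that builds a suitable vector first. The definition of x is an assumption that is merely consistent with the output above (labels on the values 1 and 4, plus tagged NAs), and the sjlabelled:: prefix assumes that get_values() lives in that package in current releases.

```r
library(haven)

# Hypothetical vector: values 1 and 4 are labelled, the tagged NAs too
x <- labelled(
  c(1:3, tagged_na("a", "c", "z"), 4:1),
  c("Agreement" = 1, "Disagreement" = 4, "First" = tagged_na("c"),
    "Refused" = tagged_na("a"), "Not home" = tagged_na("z"))
)

# All values that have a label, including the tagged NAs
sjlabelled::get_values(x)

# Only the non-missing labelled values
vals <- sjlabelled::get_values(x, drop.na = TRUE)
vals
```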

Setting value labels

With set_labels() you can add label attributes to any vector. You can either return a new labelled vector, or label an existing vector.

x
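As a rough sketch, labelling a plain numeric vector could look like the following. The data and label texts are invented, and the call assumes the set_labels() signature with a labels argument as found in the sjlabelled package.

```r
# Plain numeric vector with the values 1 to 4 (illustrative data)
v <- c(1, 2, 3, 4, 2, 1)

# Unnamed labels are assigned to the sorted values in order
v <- sjlabelled::set_labels(v, labels = c("very low", "low", "mid", "hi"))

# The labels are stored as an attribute and can be read back
sjlabelled::get_labels(v)
# [1] "very low" "low"      "mid"      "hi"
```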

To add explicit labels for values, use a named vector of labels as argument.

x
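A sketch with explicit value-label pairs, again with invented data and under the assumption that set_labels() is called from the sjlabelled package: the names of the vector are the labels, and the elements are the values they belong to.

```r
# Illustrative data with the values 1 to 3
w <- c(1, 2, 3, 2, 1)

# Named vector: name = label, element = value the label is attached to
w <- sjlabelled::set_labels(w, labels = c("yes" = 1, "maybe" = 2, "no" = 3))

# Inspect the resulting labels
sjlabelled::get_labels(w)
# [1] "yes"   "maybe" "no"
```

Naming the values explicitly avoids relying on the order of the labels, which helps when not every value in the data should get a label.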

Missing Values

Defining missing values

set_na() converts values of a vector, or of multiple vectors in a data frame, into tagged NAs. These missing values get an information tag and a value label (by default, the former value that was converted to NA). You can either return a new vector or data frame, or set NAs in an existing one.
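The following is only a sketch of how this might look. It assumes the set_na() signature from the sjlabelled package, where the values to convert are passed as na and tagging is requested with as.tag = TRUE; the argument names in the sjmisc version described here may differ. The data are invented.

```r
library(haven)

# Illustrative data in which the value 4 should be treated as missing
z <- c(1, 2, 3, 4, 4, 2)

# Convert all 4s into tagged NAs (assumed arguments: na and as.tag)
z_na <- sjlabelled::set_na(z, na = 4, as.tag = TRUE)

# The converted values are now missing ...
is.na(z_na)

# ... and haven's na_tag() shows which tag they carry
na_tag(z_na)
```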

Conclusions

Labelled data vastly extends R’s capabilities to deal with value and variable labels. The sjmisc-package offers a collection of convenient functions to work with labelled data, which might be of particular interest to users of other statistical packages like SPSS who want to switch to R. Packages like sjPlot make use of labelled data, making it easy to produce well-annotated plots (see these vignettes for various examples). A slightly more comprehensive introduction to the sjmisc-package can be found here.