Sebastian Sauer

January 06, 2017

Reading time ~12 minutes

Thanks to my student Marie Halbich who took the pains to collect the data!

At times, your data set will be in “wide” format, i.e., it has many columns in comparison to rows. For some analyses, however, it is more suitable to have the data in “long” format, that is, many rows in comparison to columns.

This is the data from a study tapping into the effect of computerized “beautification” of some faces on subjective “like”.

In short, two strawpersons were presented either in “natural form” or “beautified” with some computer help. Note that the design is a between-group design with the factor “strawpersons” (A and B) and the factor “presentation” (natural vs. beautified).

On closer inspection, we recognize a great number of missing values. In fact, the data frame is structured like this:

library(knitr)
include_graphics("facial_beauty.png")

Each blue rectangle represents the “core” data set for one of the four conditions (mentioned above). All the grey area represents vast deserts of NAs.

That said, the data set would be much nicer if the four “core data sets” were aligned beneath each other like this:

library(knitr)
include_graphics("beauty_aligned.png")

Of course, this would demand that the columns be the same in each blue square. And we will need a column indicating the value of the factor “strawperson” and a column for “presentation”. We don’t need the column headers four times, only once, as shown in the diagram.

Cutting out the sub data frames

So we will “cut out” each sub data frame (blue rectangle) and stick them together one beneath another.

Note that the | operator means OR (logical OR). So we say that we want any row where there is no NA at girl_C_unbearbeitet_Likes or no NA at girl_C_unbearbeitet_Dislikes or no NA at girl_C_unbearbeitet_Superlikes.
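As a rough illustration of that filter, here is a minimal sketch in base R. The toy data frame `d`, its `id` column, its values, and the result name `girl_C_natural` are made up for this example; only the three `girl_C_unbearbeitet_*` column names come from the text, and the original post may well have used a different function for the filtering.

```r
# Toy data: hypothetical values; only the three girl_C_* column names are from the text
d <- data.frame(
  id = 1:5,
  girl_C_unbearbeitet_Likes      = c(NA, 3, NA, 1, NA),
  girl_C_unbearbeitet_Dislikes   = c(NA, 0, NA, NA, NA),
  girl_C_unbearbeitet_Superlikes = c(NA, 1, 2, NA, NA)
)

# TRUE for every row where at least one of the three columns is not NA
keep <- !is.na(d$girl_C_unbearbeitet_Likes) |
  !is.na(d$girl_C_unbearbeitet_Dislikes) |
  !is.na(d$girl_C_unbearbeitet_Superlikes)

# Cut out the rows that belong to this condition
girl_C_natural <- d[keep, ]
```

Rows in which all three columns are NA belong to some other condition and are dropped.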

Now we have our first “data cubicle”. Let’s repeat that for the other 3 data cubicles.
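One way to avoid copying the same filter four times is a small helper, sketched here under some assumptions: `cut_out()` is a hypothetical function, the prefix `girl_A_bearbeitet_` is a guessed column prefix for another condition, and the toy data are invented. The sketch also assumes that each cubicle consists only of the columns sharing its prefix.

```r
# Hypothetical helper: keep the rows with at least one non-NA value in the
# columns whose names start with `prefix`, and keep only those columns
cut_out <- function(data, prefix) {
  cols <- grep(paste0("^", prefix), names(data), value = TRUE)
  keep <- rowSums(!is.na(data[, cols, drop = FALSE])) > 0
  data[keep, cols, drop = FALSE]
}

# Toy data with two of the four conditions (values are made up)
d <- data.frame(
  girl_C_unbearbeitet_Likes    = c(2, 5, NA, NA),
  girl_C_unbearbeitet_Dislikes = c(1, NA, NA, NA),
  girl_A_bearbeitet_Likes      = c(NA, NA, 4, 7),
  girl_A_bearbeitet_Dislikes   = c(NA, NA, 0, 3)
)

# Apply the same cut to every condition; the second prefix is an assumption
prefixes <- c("girl_C_unbearbeitet_", "girl_A_bearbeitet_")
cubicles <- lapply(prefixes, function(p) cut_out(d, p))
names(cubicles) <- prefixes
```

Each element of `cubicles` is then one sub data frame, free of the rows that are NA throughout for that condition.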

Adjust names of sub data frames

Before we can bind the sub data frames together, we have to make sure the names of the columns are identical. So let’s do that now. In addition, we need to save the information about the study factors (natural vs. processed; girl A vs. girl C).
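The renaming and binding might look roughly like the following base R sketch. Everything here is illustrative: the two toy sub data frames, the `girl_A_bearbeitet_*` column names, the common names `likes` and `dislikes`, and the result name `beauty_long` are assumptions, not taken from the original data.

```r
# Two toy sub data frames (values and the girl_A_* names are hypothetical)
girl_C_natural <- data.frame(
  girl_C_unbearbeitet_Likes    = c(2, 5),
  girl_C_unbearbeitet_Dislikes = c(1, 0)
)
girl_A_processed <- data.frame(
  girl_A_bearbeitet_Likes    = c(4, 7),
  girl_A_bearbeitet_Dislikes = c(0, 3)
)

# Give both frames identical column names so that they can be stacked
common_names <- c("likes", "dislikes")
names(girl_C_natural) <- common_names
names(girl_A_processed) <- common_names

# Save the study factors as columns before the frames lose their identity
girl_C_natural$strawperson    <- "C"
girl_C_natural$presentation   <- "natural"
girl_A_processed$strawperson  <- "A"
girl_A_processed$presentation <- "processed"

# Stick the sub data frames together, one beneath another
beauty_long <- rbind(girl_C_natural, girl_A_processed)
```

Because `rbind()` on data frames matches columns by name, the renaming step has to come before the binding.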