Monday, November 29, 2010

This post documents an example of using Sweave
to generate individualised personality reports based on
responses to a personality test.
Each report provides information on both the responses of the general
sample and responses of the specific respondent.
All source code is provided, and selected aspects are discussed,
including makefiles, use of \Sexpr, figures, and LaTeX tables using Sweave.

When importing the data, I have adopted the useful convention
(which I observed in John Myles White's ProjectTemplate package)
of giving each R object the same name as the data file it is read from.
The file extension also clearly indicates the file format (i.e., tab-separated-values).
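As a rough sketch of this naming convention (the temporary folder, the file name ipip.tsv and its contents are assumptions for illustration, not the import code used for the reports), each tab-separated file can be read into an object named after the file:

```r
# Hypothetical set-up: write a small tab-separated file to a temporary
# "data" folder so that the example is self-contained.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
write.table(data.frame(id = 1:3, item1 = c(4, 2, 5)),
            file = file.path(data_dir, "ipip.tsv"),
            sep = "\t", row.names = FALSE, quote = FALSE)

# Read every .tsv file into an object named after the file
# (file name minus its extension), e.g. ipip.tsv becomes ipip.
tsv_files <- list.files(data_dir, pattern = "\\.tsv$", full.names = TRUE)
for (f in tsv_files) {
  object_name <- tools::file_path_sans_ext(basename(f))
  assign(object_name, read.delim(f, stringsAsFactors = FALSE))
}
```

With this pattern, adding a new data file to the folder is enough to make a correspondingly named data.frame available.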

I often have separate data and meta folders.
Importing metadata often makes for more manageable code than when
incorporating metadata by hard coding it into the R script.

The psych package has a number of useful functions for psychological research.
score.items is particularly good.
It enables the creation of means and totals for multiple scales.
It handles item reversal.
It also returns information related to the reliability of the scales.
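To make the idea of a scoring key concrete, here is a base R sketch of item reversal and scale means. It is not the psych implementation and it does not compute reliabilities; the item names, responses, key and the helper score_scales are all made up for illustration.

```r
# Hypothetical responses: 4 respondents, 4 items on a 1-5 scale.
items <- data.frame(
  e1 = c(5, 2, 4, 3),
  e2 = c(1, 4, 2, 3),  # negatively worded item, to be reversed
  a1 = c(4, 4, 2, 5),
  a2 = c(5, 3, 1, 4)
)

# Scoring key: 1 = item belongs to the scale, -1 = reversed item, 0 = unused.
keys <- cbind(
  extraversion  = c(1, -1, 0, 0),
  agreeableness = c(0, 0, 1, 1)
)
rownames(keys) <- names(items)

# Hypothetical helper: returns one column of scale means per scale in the key.
score_scales <- function(items, keys, min_resp = 1, max_resp = 5) {
  sapply(colnames(keys), function(scale) {
    used <- keys[, scale] != 0
    x <- as.matrix(items[, used, drop = FALSE])
    reversed <- keys[used, scale] < 0
    if (any(reversed)) {
      # Reverse-score: on a 1-5 scale, 1 becomes 5, 2 becomes 4, and so on.
      x[, reversed] <- (min_resp + max_resp) - x[, reversed]
    }
    rowMeans(x)
  })
}

scale_means <- score_scales(items, keys)
```

Here scale_means is a matrix with one row per respondent and one column per scale.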

The above code provides the function to run Sweave on each
individualised report.

The code is a little messy, contains a few hacks,
and is not especially robust.

The exportReport function takes an id value as its argument x.
Note the use of the alternative assignment operator, <<-
(see ?assignOps).
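For readers unfamiliar with <<-, the following small illustration shows how it differs from <- inside a function. The variable and function names are hypothetical and are not taken from exportReport.

```r
current_id <- NA  # variable defined outside the functions

set_id_local <- function(x) {
  current_id <- x   # creates a local variable; the outer one is unchanged
  invisible(NULL)
}

set_id_outer <- function(x) {
  current_id <<- x  # assigns to the existing variable in the enclosing environment
  invisible(NULL)
}

set_id_local(10)
after_local <- current_id   # still NA

set_id_outer(10)
after_outer <- current_id   # now 10
```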

The code is designed to keep derived files away from source files
by copying files into the .output folder and
even changing the working directory to that directory.

The code creates an individualised copy of the Rnw file,
runs Sweave on the report to produce a tex file,
and then runs texi2dvi with pdf=TRUE to produce the final pdf.
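The steps just described can be sketched roughly as follows. This is not the original exportReport function: the function name, argument names, the output folder name and the report_id variable are assumptions, and running it requires the template file and a working LaTeX installation.

```r
# Hypothetical sketch of the copy / Sweave / texi2dvi sequence.
export_report_sketch <- function(id,
                                 template = "Report_Template.Rnw",
                                 output_dir = "output") {
  dir.create(output_dir, showWarnings = FALSE)

  # 1. Create an individualised copy of the Rnw file in the output folder,
  #    e.g. Report_Template_ID10.Rnw for id = 10.
  report_base <- paste0(tools::file_path_sans_ext(basename(template)), "_ID", id)
  rnw_copy <- paste0(report_base, ".Rnw")
  file.copy(template, file.path(output_dir, rnw_copy), overwrite = TRUE)

  # Work inside the output folder so derived files stay away from the sources,
  # and restore the previous working directory when the function exits.
  old_wd <- setwd(output_dir)
  on.exit(setwd(old_wd), add = TRUE)

  # Store the id where the code chunks can find it (assumed variable name).
  report_id <<- id

  # 2. Run Sweave on the copy to produce the tex file.
  utils::Sweave(rnw_copy)

  # 3. Run texi2dvi with pdf = TRUE to produce the final pdf.
  tools::texi2dvi(paste0(report_base, ".tex"), pdf = TRUE)

  invisible(file.path(output_dir, paste0(report_base, ".pdf")))
}
```

A set of reports could then be produced with something like `for (id in ids) export_report_sketch(id)`, where ids is a vector of respondent identifiers.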

Report_Template.Rnw

The Rnw file contains interspersed chunks of LaTeX and R code.

Because the Rnw file is called from within R,
the R objects already exist and the data processing code does not need to be run
at the start of the Rnw file.
This approach is one way of reducing the time it takes to run
a set of Sweave reports all based on a common data source.

The \Sexpr{} command is used to incorporate in-line text.
(... sample of \Sexpr{nrow(ipip)} students ...).
In the example above, it prints the actual number of cases
in the ipip data.frame (i.e., the sample size).

The code above takes a while to run (perhaps around 10 seconds on my machine).
But the resulting plot is more attractive than what I could easily get with
base graphics.

<<plot_scale_distributions, fig=true>>= indicates the start of
an R code chunk.
fig=true lets Sweave know that it has to produce code to include a figure.

The R code chunk is substituted with
\includegraphics{Report_Template_ID10-plot_scale_distributions}
in the tex file, and the pdf and eps versions of the figure are created.
Thus, if you want a float with captions and labels,
you have to add them around the R code chunk.

The plotScale function is used to generate a ggplot2 figure
of the distribution of scores on each personality scale along with
a marking of the respondent's score on each scale.
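A minimal sketch of this kind of plot, assuming the ggplot2 package is installed, might look like the following. It is not the original plotScale function; the function name, arguments and the simulated scores are made up for illustration.

```r
library(ggplot2)

# Hypothetical helper: density of sample scores on one scale, with a dashed
# vertical line marking the respondent's own score.
plot_scale_sketch <- function(scores, respondent_score, scale_name) {
  ggplot(data.frame(score = scores), aes(x = score)) +
    geom_density(fill = "grey80") +
    geom_vline(xintercept = respondent_score, linetype = "dashed") +
    labs(title = scale_name, x = "Scale score", y = "Density")
}

# Simulated scores, used only so that the example runs on its own.
set.seed(1)
sample_scores <- rnorm(200, mean = 3.5, sd = 0.6)
p <- plot_scale_sketch(sample_scores, respondent_score = 4.2,
                       scale_name = "Extraversion")
```

Printing p inside a fig=true chunk would place the figure in the report.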

The arrange function is used to lay out multiple ggplot2 figures
on a single plot.
The source code is in lib/vp.layout.R and was taken from
a post by Stephen Turner
(http://gettinggeneticsdone.blogspot.com/2010/03/arrange-multiple-ggplot2-plots-in-same.html).

I often find it useful to split R code chunks for
table preparation and table presentation.
In general, this allows any text that appears before the table to include
\Sexpr{} commands that report values from the analyses which
generate the table.
In the present case, it was useful because the table was split over two pages.

The code shows some of the general logic I use for customised table creation.
In hindsight, I could probably refactor it into a function so that I don't have to type ipiptable repeatedly, which would make things a little more concise.

The general process of table creation involves:
(a) extracting information on cells, with cells often grouped into types which will receive common formatting treatment;
(b) formatting cells (e.g., rounding, decimals, and so on);
(c) assembling the cells, typically using a combination of the functions
rbind and cbind;
(d) inserting tex column and end-of-row separators with something like:
paste(apply(x, 1, function(X) paste(X, collapse = " & ")), "\\\\")
where x is the matrix of table cells.
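As a small worked illustration of steps (b) to (d), the following sketch uses made-up scale names and values; only the final paste/apply line is taken from the text above.

```r
# Hypothetical cells: scale names, sample means and one respondent's scores.
scale_names <- c("Extraversion", "Agreeableness")
sample_mean <- c(3.4567, 3.9012)
respondent  <- c(4.2, 3.1)

# (b) Format the numeric cells to two decimal places.
fmt <- function(v) formatC(v, format = "f", digits = 2)

# (c) Assemble the cells into a character matrix, one row per table row.
x <- cbind(scale_names, fmt(sample_mean), fmt(respondent))

# (d) Insert column separators (&) and end-of-row markers (\\).
tex_rows <- paste(apply(x, 1, function(X) paste(X, collapse = " & ")), "\\\\")

cat(tex_rows, sep = "\n")
# Extraversion & 3.46 & 4.20 \\
# Agreeableness & 3.90 & 3.10 \\
```

The resulting lines can be written into a tabular environment from a code chunk that uses results=tex.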

The same caption text was used in both tables,
so it is stored once and called using \Sexpr{ipiptable[["caption"]]}.
This follows the DRY principle (Don't Repeat Yourself):
if the caption needs to be modified, it only needs to be modified in one place.

1 comment:

Thank you for this example Jeromy. I am using parts of this code for generating some dynamic reports of my own and will attribute it as you see fit.

I had a question I will be glad if you can answer: I want to use some of the variables of my database besides the ID in the final reports. I am not sure about declaring the other variables as global variables since the <<- notation is not working as expected with them - the first element of my vector gets printed in each of my .pdf files instead of the one corresponding to the ID. I am a Sweave and R newbie, so please excuse me if I am overlooking something trivial.

I am a lecturer at Deakin University bridging I/O psychology and statistics. My blog contains 100+ posts focused on data analysis in the social sciences.
