
loops, matrices, and apply functions

Loops are great. They save us a lot of work and they solve all sorts of problems. Sometimes, however, there are better ways of going about things. In the first place, we often use loops to implement what are really matrix operations, which is worth keeping in mind when working in a language such as R that lets you handle matrices directly. Loops can also be slow and memory-intensive, which is why R gurus tend to encourage the use of apply-style functions whenever possible. These points can be illustrated by working through the following:

This example is drawn from Ward and Gleditsch (2008: 47), who use the code above to calculate the equilibrium effects associated with a spatial lag model depicting the relationship between democracy and GDP per capita. While there are pedagogical reasons to prefer this code, there are also a number of places in which it can be cleaned up.

First, the code for creating the identity matrix eye can be reduced to a single line:

eye <- diag(dim(sldv)[1])

The trick here is that the diag command can (a) be used to create diagonal matrices and (b), when given a single integer n, returns the n × n identity matrix. So you only need to supply a value for n which, in this case, is equal to the number of rows in sldv, as given by dim(sldv)[1].
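To see both behaviors of diag at work, here is a toy illustration (the values are made up; only the last line assumes the sldv data frame from the original example):

```r
## diag() with a vector builds a diagonal matrix from that vector
diag(c(2, 5))          # 2x2 matrix with 2 and 5 on the diagonal, 0 elsewhere

## diag() with a single integer n returns the n x n identity matrix
diag(3)                # 3x3 identity matrix

## so, for a matrix conformable with sldv:
eye <- diag(dim(sldv)[1])
```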

Second, as far as the loop at the bottom goes, there is no need to create a vector of change scores for each observation and, hence, no need for a loop. You can just use the same identity matrix from before:

ee.est <- solve(eye - 0.56315 * wmat) %*% eye * 0.99877

Using this approach, it is no longer necessary to define ee.est ahead of time, though you may still want to label the cases.
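Putting the pieces together, the whole calculation collapses to a few lines. This is a sketch: it assumes, as in the original, that wmat is the spatial weights matrix and that the row names of sldv carry the case labels.

```r
## identity matrix sized to the data
eye <- diag(dim(sldv)[1])

## equilibrium effects: (I - rho*W)^{-1} %*% I, rescaled by the coefficient
ee.est <- solve(eye - 0.56315 * wmat) %*% eye * 0.99877

## label the cases (assumes sldv has meaningful row names)
rownames(ee.est) <- rownames(sldv)
```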

Note that the original example refers to the equilibrium effects of a single covariate, namely GDP per capita. What if we wanted to estimate the equilibrium effects for all of the covariates in the model? For example, what if we wanted to estimate the equilibrium effects associated with a model depicting the number of residential burglaries and vehicle thefts in neighborhoods in Columbus, Ohio as a function of housing value and household income? My initial inclination was to loop over the set of coefficients (minus the intercept) and populate a NULL object by using cbind to join each additional set of estimates to the object in question. This, however, is precisely what the sapply function is designed to do. The code below generalizes the Ward and Gleditsch code to the case of multiple coefficients using sapply instead of a loop:
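A minimal sketch of that sapply approach might look like the following, where w is the spatial weights matrix, rho is the spatial lag coefficient, and b is the vector of coefficients with the intercept dropped; these object names are my assumptions, not Ward and Gleditsch's.

```r
## spatial multiplier: (I - rho*W)^{-1}, with the identity built on the fly
m <- solve(diag(nrow(w)) - rho * w)

## equilibrium effects: rescale the diagonal of the multiplier by each
## coefficient in turn; sapply binds the results into a matrix, one
## column per covariate
ee <- sapply(b, function(beta) diag(m) * beta)
colnames(ee) <- names(b)
```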

There are a couple of other differences between this code and the Ward and Gleditsch example. First, I’m generating the identity matrix on the fly using diag(nrow(w)), where nrow(w) is used to calculate the value of n (i.e. I’m pulling the sample size from the weights matrix as opposed to the data frame). Second, I’m only dealing with the diagonal of the spatial multiplier. As I’ll discuss at some point in the future, this throws out a lot of really important information. That said, this information isn’t used in the Ward and Gleditsch code shown above, and I like the idea that estimating equilibrium effects is equivalent to simply rescaling the vector defined by the diagonal of the spatial multiplier.

Nice post. R really likes apply-like functions, and for people who want to start using them more, I’ve had good luck with the plyr package (http://plyr.had.co.nz/). It was written by Hadley Wickham, and it makes the apply-like functions much easier to work with. Plyr also has nice built-in support for parallelization.
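For instance, plyr’s laply is roughly an analogue of sapply, and the same call can be parallelized by flipping a switch once a backend is registered (the function names here are real plyr API; the parallel lines are commented out since they need a backend like doParallel installed):

```r
library(plyr)

## laply(): split a list/vector, apply a function, return an array
laply(1:3, function(i) i^2)   # 1 4 9

## with a registered parallel backend, the same call runs across cores:
## library(doParallel); registerDoParallel(cores = 2)
## laply(1:3, function(i) i^2, .parallel = TRUE)
```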