Data for Life :)

Finding Patterns Amongst Binary Variables with the homals Package

It’s survey analysis season for me at work! One kind of analysis I’ve realized I’m not used to doing with survey data is finding patterns in binary data. In other words, if I have a question to which multiple, non-mutually exclusive (checkbox) answers apply, how do I find the patterns in people’s responses to this question?

I tried applying PCA and factor analysis in turn, but neither seems well suited to data consisting only of binary columns (1s and 0s). In searching for something that works, I came across the homals package. While its main function is described as a “homogeneity analysis”, the one ability that interests me is called “non-linear PCA”. This is supposed to be able to reduce the dimensionality of your dataset even when the variables are all binary.

Well, here’s an example using some real survey data (with masked variable names). First we start off with the purpose of the data and some simple summary stats:

It’s a group of 6 variables (answer choices) showing people’s check-box responses to a question asking them why they donated to a particular charity. Following are the numbers of responses to each answer choice:
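Counts like these are just column sums over the 0/1 response matrix. As a minimal sketch of that step, here it is in Python on invented data; the rows below and the V1 to V6 labels are placeholders I made up for illustration, not the real survey responses:

```python
# Hypothetical check-box responses: one row per respondent, one 0/1 column
# per answer choice. These values are invented for illustration only.
responses = [
    [1, 0, 0, 1, 1, 1],
    [0, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 1],
]
names = ["V1", "V2", "V3", "V4", "V5", "V6"]  # assumed masked names


def count_checked(rows):
    """Return how many respondents ticked each answer choice."""
    # zip(*rows) transposes the rows into columns; summing 0/1 values counts the 1s.
    return [sum(column) for column in zip(*rows)]


counts = count_checked(responses)
for name, n in zip(names, counts):
    print(f"{name}: {n}")
```

Because every cell is 0 or 1, dividing each count by the number of respondents would give the share of people who ticked that choice.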

As you can see, homals extracts 2 dimensions by default (this can be changed using the “ndim” argument in the function), and it gives you what looks very much like a regular PCA loadings table.

Reading it naively, the pattern I see in the first dimension goes something like this: People tended to answer affirmatively to answer choices 1, 4, 5, and 6 as a group (obviously not all the time and not all together, though!), but those answers didn’t tend to be used alongside choices 2 and 3.

In the second dimension I see: People tended to answer affirmatively to answer choices 3 and 4 as a group. Okay, now as a simple check, let’s look at the correlation matrix for these binary variables:
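For 0/1 columns, the ordinary Pearson correlation is the same thing as the phi coefficient, so a plain correlation matrix is a reasonable sanity check. A minimal sketch of that kind of check, again in Python on invented data (the rows and the helper names `pearson` and `correlation_matrix` are my own placeholders, not the survey data or any package's API):

```python
from math import sqrt

# Hypothetical 0/1 responses, invented for illustration only.
responses = [
    [1, 0, 0, 1, 1, 1],
    [0, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 1],
]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (phi for 0/1 data)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / sqrt(var_x * var_y)


def correlation_matrix(rows):
    """Correlate every column with every other column."""
    columns = list(zip(*rows))
    return [[pearson(a, b) for b in columns] for a in columns]


matrix = correlation_matrix(responses)
for row in matrix:
    print("  ".join(f"{value:6.2f}" for value in row))
```

Reading down one column of the printed matrix shows which answer choices tend to be ticked together (positive values) and which tend to exclude each other (negative values).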

The first dimension is easy to spot in the “V1” column above. Also, we can see the second dimension in the “V3” column above, so both check out! I find that neat and easy. Does anyone use anything else to find patterns in binary data like this? Feel free to tell me in the comments!