Analyzing Multiple Response Variables

I am seeking help with different approaches to analyzing multiple-response variables. I have a dataset from a survey with many questions whose responses are checkboxes ("Check all that apply"). A sample dataset is attached.

As a first approach, I am using PROC TABULATE and trying to follow these instructions, but I am running into a problem: all my counts are the same, and every percentage is 100%.

Also, I am not very familiar with PROC TABULATE. Are there other procedures that would be better suited to this task? Ultimately, I would like to investigate statistically significant differences in these multiple-response variables across groups, such as gender.

Re: Analyzing Multiple Response Variables

I see that a DESCENDING option can be used in the CLASS statement, but do I need to use PROC SORT if I want the VAR variables listed in descending order of frequency when there is no CLASS variable?

Also, when I put a dichotomous variable in the CLASS statement, the output is unchanged and there are no errors in the log. Any clues?

I would expect using your dichotomous variables as CLASS variables to create two rows of values, one for the 0 values and another for the 1 values. Then the N and PCTN statistics may work, but SUM and MEAN shouldn't. The ORDER=FREQ option on the CLASS statement sorts the levels in descending order of N within each class variable. That just means that the order of the 0 and 1 rows may change from one variable to the next. Show the code with the class variable that doesn't change the appearance.

There are different approaches depending on exactly what you want the table to look like.

They pretty much all involve summarizing the existing data in some way, sorting that result on the N or SUM, and then using the ORDER=DATA option.

If you are doing that for a large number of these variables, then that might be worth it. If there is only the one set, look at the result, put the variables in the TABLE statement in the order you would like, and rerun PROC TABULATE.
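A minimal sketch of the summarize-then-sort idea follows. It is written in Python only to show the logic, not as SAS code, and the variable names opt_a, opt_b, opt_c and the data are hypothetical. It sums each 0/1 variable, sorts the sums from largest to smallest, and appends a Total row; in SAS, a sorted summary like this is what you would pass to PROC TABULATE with ORDER=DATA. Note that the Total row counts checked boxes (responses), not respondents.

```python
# Hypothetical 0/1 checkbox data: one dict per respondent (1 = checked).
respondents = [
    {"opt_a": 1, "opt_b": 0, "opt_c": 1},
    {"opt_a": 0, "opt_b": 0, "opt_c": 1},
    {"opt_a": 1, "opt_b": 1, "opt_c": 1},
    {"opt_a": 0, "opt_b": 0, "opt_c": 0},
]

def summarize_sorted(rows, variables):
    """Sum each 0/1 variable, sort by that sum (largest first), add a Total row."""
    counts = {v: sum(r[v] for r in rows) for v in variables}
    # sorted() is stable, so tied variables keep their original order.
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return ordered + [("Total", sum(counts.values()))]

table = summarize_sorted(respondents, ["opt_a", "opt_b", "opt_c"])
for label, n in table:
    print(f"{label:<8}{n:>4}")
```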

Re: Analyzing Multiple Response Variables

While the counts in the table are correct, the percentages are not. For example, in row 1 (see embedded image), 31 is 12.8% of the total of 242, not 24.2%. Is this because I am using the MEAN statistic?

Also, I think I figured out the situation with the CLASS variable, but I would still love for the table to be sorted in descending order and to have a Total as the final row in each column (see uploaded image). Yes, I have a lot of these variables!

As far as the sorting goes, I see that you say I could sort "on the N or Sum", but those are not variables in the dataset, correct? Would I have to create them first before I can sort on them?

Thank you!!

The approach I showed calculates a row percentage. You did not specify which percentage you were interested in: row, column, page, or table. It was not possible to tell from your original code which one you were attempting. Given the example data, I picked what made the most sense to me.
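To make the row/column/table distinction concrete, here is a small Python sketch (an illustration of the arithmetic, not SAS code; the counts and item names are hypothetical) that computes all three percentages for the same cells. In PROC TABULATE these roughly correspond to the ROWPCTN, COLPCTN, and REPPCTN statistics (PAGEPCTN for page percentages), or the ROWPCTSUM, COLPCTSUM, and REPPCTSUM variants when the counts come from summing 0/1 variables.

```python
# Hypothetical counts: rows are checkbox items, columns are gender.
counts = {
    "opt_a": {"F": 10, "M": 20},
    "opt_b": {"F": 30, "M": 40},
}

def percentages(tbl):
    """Return row, column, and whole-table percentages for each cell."""
    grand = sum(sum(cols.values()) for cols in tbl.values())
    col_totals = {}
    for cols in tbl.values():
        for col, n in cols.items():
            col_totals[col] = col_totals.get(col, 0) + n
    result = {}
    for row, cols in tbl.items():
        row_total = sum(cols.values())
        for col, n in cols.items():
            result[(row, col)] = {
                "row_pct": 100 * n / row_total,          # share of this item's total
                "col_pct": 100 * n / col_totals[col],    # share of this gender's total
                "table_pct": 100 * n / grand,            # share of the grand total
            }
    return result

pcts = percentages(counts)
print(pcts[("opt_a", "F")])
```

The same cell (10) is 33.3% of its row, 25% of its column, and 10% of the table, which is why the denominator has to be chosen explicitly.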

Also, since your initial post indicated 170 records (the N value), I am not sure where you are getting 242 as a denominator. The example CSV you posted also had only 172.

Is gender missing for any of your records? PROC TABULATE removes any record with a missing value of a class variable unless the MISSING option is used on the CLASS statement (CLASS gender / MISSING;). And if the actual response is missing (does not have a 0 or 1 value), the calculation excludes it as well, just as PROC MEANS does when calculating a mean.
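The two exclusion rules above can be imitated in a few lines of Python. This is a simplified sketch, not a reimplementation of PROC TABULATE: the records are hypothetical, and the keep_missing_class flag only stands in for the MISSING option. It also shows why the MEAN of a 0/1 variable is a proportion of respondents with a non-missing answer, not a share of all responses.

```python
# Hypothetical records: gender may be None (missing); item is 0, 1, or None.
records = [
    {"gender": "F", "item": 1},
    {"gender": "F", "item": 0},
    {"gender": "M", "item": 1},
    {"gender": None, "item": 1},
    {"gender": "M", "item": None},
]

def mean_by_class(rows, keep_missing_class=False):
    """Mean of a 0/1 item per class level. Rows with a missing class value are
    dropped unless keep_missing_class is True (a stand-in for CLASS ... / MISSING),
    and a missing item value is excluded from the mean."""
    groups = {}
    for r in rows:
        if r["gender"] is None and not keep_missing_class:
            continue
        if r["item"] is None:
            continue
        groups.setdefault(r["gender"], []).append(r["item"])
    return {g: sum(v) / len(v) for g, v in groups.items()}

print(mean_by_class(records))                           # {'F': 0.5, 'M': 1.0}
print(mean_by_class(records, keep_missing_class=True))  # {'F': 0.5, 'M': 1.0, None: 1.0}
```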

Perhaps you need to provide some example data in the form of a DATA step so we have a common data set to run code against. It does not need to be very large: maybe about 15 rows, with 2 or 3 of the analysis variables. Something like:

Getting that sort order means you need to summarize your data and have a single variable that contains the row headers. And if you are creating columns based on gender and a total, which column should come first: All, female, or male?
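One way to get that single row-header variable is to reshape the data from one column per checkbox to one row per checked box. The Python sketch below is hypothetical (made-up data, with pref_1 and pref_2 standing in for the pref_which___ variables) and only illustrates the layout: counts per item in an All, female, male column order, with the rows sorted by the All column. In SAS the same reshaping is typically done in a DATA step, for example with an ARRAY and one OUTPUT per checked box.

```python
# Hypothetical wide data: one row per respondent, one 0/1 column per checkbox.
wide = [
    {"gender": "female", "pref_1": 1, "pref_2": 1},
    {"gender": "male",   "pref_1": 0, "pref_2": 1},
    {"gender": "female", "pref_1": 0, "pref_2": 1},
    {"gender": "female", "pref_1": 1, "pref_2": 1},
]
COLUMNS = ("All", "female", "male")  # desired column order

def to_long(rows, item_vars):
    """One output row per checked box; the item name goes into a single
    row-header variable ("item") instead of one column per item."""
    return [
        {"gender": r["gender"], "item": v}
        for r in rows
        for v in item_vars
        if r[v] == 1
    ]

def count_table(long_rows):
    """Count checked boxes per item for each column in COLUMNS."""
    table = {}
    for r in long_rows:
        cells = table.setdefault(r["item"], {c: 0 for c in COLUMNS})
        cells["All"] += 1
        # Sketch choice: other/missing genders count only toward All
        # (PROC TABULATE would drop them unless MISSING is specified).
        if r["gender"] in cells:
            cells[r["gender"]] += 1
    return table

long_rows = to_long(wide, ["pref_1", "pref_2"])
table = count_table(long_rows)
# Sort the row headers by the All column, largest first.
row_order = sorted(table, key=lambda item: table[item]["All"], reverse=True)
for item in row_order:
    print(item, [table[item][c] for c in COLUMNS])
```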

<The approach I showed calculates a row percentage. You did not specify which percentage you might be interested>

I see. Apologies for not being clearer. Is there a way to specify which percentages to display? I may want to be flexible down the road.

<The example CSV you posted only had 172 also.>

Yes, 172 observations, but each variable is a response (answer) to a question. The questions use checkboxes ("Check all that apply"; see image below). The percentages I am seeking are percentages of responses, as opposed to percentages of respondents. The number of responses will be greater than the number of respondents.

I would like to be able to make statements about the frequency of responses, such as: The most preferred method of administration was "Oral - Sublingual or intra-oral (mucosal) absorption", with 60 responses out of a total of 249 responses (242 in my last post, sorry!), or 24.1%.
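The difference between the two denominators can be checked with a short Python sketch (hypothetical data and option names; an illustration of the arithmetic, not SAS code). Each option's count is divided once by the total number of checked boxes and once by the number of respondents; only the first kind of percentage matches the 60 / 249 = 24.1% figure above.

```python
# Hypothetical check-all-that-apply answers, one dict per respondent (1 = checked).
answers = [
    {"oral": 1, "smoked": 1, "topical": 0},
    {"oral": 1, "smoked": 0, "topical": 0},
    {"oral": 0, "smoked": 1, "topical": 1},
    {"oral": 1, "smoked": 1, "topical": 1},
]

def response_shares(rows, options):
    """For each option, return (% of all responses, % of respondents)."""
    counts = {o: sum(r[o] for r in rows) for o in options}
    total_responses = sum(counts.values())  # can exceed the number of respondents
    n_respondents = len(rows)               # includes anyone who checked nothing
    return {
        o: (100 * c / total_responses, 100 * c / n_respondents)
        for o, c in counts.items()
    }

shares = response_shares(answers, ["oral", "smoked", "topical"])
print(shares["oral"])            # (37.5, 75.0)
print(round(100 * 60 / 249, 1))  # 24.1
```

Percentages of respondents can sum to more than 100% across options, while percentages of responses always sum to 100%.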

I would then like to be able to compare these frequencies across different CLASS variables, such as gender, and ultimately perform hypothesis tests.
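Because one respondent can check several boxes, the pooled response counts are not independent observations, so a single chi-square test on them is generally not appropriate. One simple alternative, suggested here rather than taken from this thread, is to test each item on its own as a 2x2 table of gender by checked/not checked. The Python sketch below computes a Pearson chi-square for one such table with hypothetical counts; in SAS, PROC FREQ with TABLES gender*item / CHISQ reports this kind of test. Testing many items one at a time also raises the chance of false positives, so a multiple-comparison adjustment may be worth considering.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2x2 table

                 checked   not checked
        group 1     a           b
        group 2     c           d

    Returns (statistic, p-value) using 1 degree of freedom."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts for one item: 30 of 50 women vs 15 of 50 men checked it.
stat, p = chi_square_2x2(30, 20, 15, 35)
print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```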

<Is gender missing for any of your records?>

Yes.

<Perhaps you need to provide some example data>

I have uploaded a new dataset with gender.

<which order do you want?>

I want the order to apply to the rows. In other words, I would like pref_which___1 through pref_which___7 to be sorted by count/sum (i.e., pref_which___4, then pref_which___3, etc.) and, ideally, to have a Total row, as shown in the image of the table above.