I saw Joe N.'s tweet asking me about a study of how professors spend their time, reported by Lisa Wade at Sociological Images. This is an anthropological study, a type of research I am not at all familiar with, although its practitioners seem to believe that they can make statistically valid observations.

I'm glad the author of the study, John Ziker, wrote a (really) long article describing what he was trying to accomplish. The key point is that the study is a preliminary exploration with important limitations; a follow-up study is planned, which may yield generalizable conclusions.

Here are some issues with the first study that make a statistician nervous:

- the sample was tiny, somewhere between 14 and 30 professors: Wade reported 16, while Ziker definitely started with 30.

- the selection was non-random, based on the first 30 people who responded to a school-wide announcement

- about half the initial respondents did not complete the study and provided only partial data (one to six days)

- despite the tiny sample, some analyses sliced the data further into four segments by grade level! I wonder how many department chairs were in that sample. (See chart on right.)

- each professor was followed over a two-week period, but only every other day, so each professor contributed at most one observation per day of the week

- the interviews took place every other day "so the time taken for the interview did not appear on the previous day’s report." This is a horrible problem to deal with! Because time allocation is the subject of the study, the measurement method (in-depth interviewing) interferes with the measured outcome. I find it impossible to believe that the time spent answering questions every other day did not affect time allocation on the non-interview days.

- Ziker reasoned: "While we cannot make a claim that all faculty have the same work patterns as our initial subject pool — they do not comprise a random sample — the results are highly suggestive because of the consistency across our subjects who did represent." To avoid falling prey to the law of small numbers, a better way to say this is: we assume that the small sample is representative in both mean and dispersion, which in turn leads to the assumption that all faculty have consistent work patterns similar to those observed.

- "With our initial 30 Homo academicus subjects, we ended up with a 166-day sample with each day of the week well represented." I am assuming that Ziker did not drop the 16 professors with partial data, and that he made charts like the one on the right by ignoring the identity of the professor and aggregating by day of the week. Let's review what lies behind this chart. Each respondent contributed at most one observation per day of the week; about half of the respondents did not even contribute data for all seven days. So the time allocation on any particular day of the week is averaged over anywhere from 14 to 30 professors. These professors span a variety of ranks, departments, tenure, backgrounds, etc., and were not randomly selected. It's hard for me to trust this chart at all.
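To make the thinness of these data concrete, here is a back-of-the-envelope sketch in Python. It uses only the numbers quoted above (two weeks, every other day, 166 days, 30 starting professors, four grade-level segments) plus one assumption of mine, flagged in the code: that 14 professors completed all seven diary days and the other 16 gave partial data. The function names are hypothetical helpers, not anything from Ziker's article.

```python
import math


def weekdays_covered(start, span_days=14, step=2):
    """Weekday index (0-6) of each diary day when a professor is
    interviewed every `step` days over `span_days` days."""
    return [(start + offset) % 7 for offset in range(0, span_days, step)]


def se_inflation(k):
    """Factor by which the standard error of a mean grows when a sample is
    split into k equal-sized groups with the same spread (n/k per group)."""
    return math.sqrt(k)


# 1. Every other day for two weeks yields 7 diary days; because 7 is odd,
#    they fall on 7 distinct weekdays: at most one observation per weekday.
for start in range(7):
    assert sorted(weekdays_covered(start)) == list(range(7))

# 2. The 166 professor-days spread over the 7 weekdays.
TOTAL_DAYS = 166
print(f"professors behind each weekday, on average: {TOTAL_DAYS / 7:.1f}")

# 3. ASSUMPTION (not stated in the source): 14 professors completed all
#    7 days and the remaining 16 supplied the rest.
complete, partial = 14, 16
partial_days = TOTAL_DAYS - complete * 7
print(f"days from partial respondents: {partial_days} "
      f"({partial_days / partial:.2f} each on average)")

# 4. Slicing 30 professors into four equal segments doubles the standard
#    error within each segment, before even splitting by weekday.
print(f"30 professors / 4 segments = {30 / 4:.1f} per segment; "
      f"standard error x{se_inflation(4):.0f}")
```

Under that assumption, the partial respondents averaged about four days each, which at least sits inside the one-to-six-day range reported. And each weekday-by-segment cell in a chart would rest on only a handful of professors.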

***

In general, I am a big fan of shoe-leather research, in which researchers go out and gather the data they need to address their specific research question, rather than picking up whatever data they can find and then tailoring the research question to work around the imperfections in the data. So I don't want to sound too negative. It's a difficult research problem they are dealing with. What they learned from this first study is useful for informing future explorations, but drawing conclusions at this stage is premature.

At the end of his article, Ziker described the "experience sampling" method that will form the next phase of this study. I am very excited about this methodology.

Roughly speaking, they will ask participants to install a mobile app that pops up questions from time to time, asking what they are doing at that moment. Instead of exhaustively tracking a small number of participants over time, they will collect little bits of data, incomplete schedules, from a large number of professors. If the sample is big enough and randomized appropriately, they can analyze the data ignoring the professor's identity and report results for the "average professor". This method also retains the other benefit of the original design: respondents report their activities close to the time they occurred.
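A small simulation in Python illustrates why pooling incomplete schedules can work. Everything in it is hypothetical: the four activity categories, the "true" average time shares, the number of professors and pings, and the modeling assumption that a ping at a random moment returns a draw from that professor's own time shares. Each simulated professor has different shares, answers only three pings, and the answers are pooled without identity.

```python
import random
from collections import Counter

# Hypothetical activity categories and population-average time shares.
ACTIVITIES = ["teaching", "research", "service", "admin"]
AVERAGE_SHARES = [0.40, 0.30, 0.15, 0.15]


def professor_shares(rng, concentration=20.0):
    """Draw one professor's own time shares, scattered around the average
    (a Dirichlet draw built from normalized gamma variates)."""
    draws = [rng.gammavariate(concentration * s, 1.0) for s in AVERAGE_SHARES]
    total = sum(draws)
    return [d / total for d in draws]


def collect_pings(n_professors, pings_per_professor, seed=0):
    """Each professor answers only a few pings at random moments, so nobody
    contributes a complete schedule. Answers are pooled without identity."""
    rng = random.Random(seed)
    answers = []
    for _ in range(n_professors):
        shares = professor_shares(rng)
        answers.extend(rng.choices(ACTIVITIES, weights=shares, k=pings_per_professor))
    return answers


def estimate_shares(answers):
    """Fraction of all pings falling in each activity: the 'average professor'."""
    counts = Counter(answers)
    return {a: counts[a] / len(answers) for a in ACTIVITIES}


if __name__ == "__main__":
    estimates = estimate_shares(collect_pings(n_professors=3000, pings_per_professor=3))
    for activity, true_share in zip(ACTIVITIES, AVERAGE_SHARES):
        print(f"{activity:9s} estimated {estimates[activity]:.3f}  true {true_share:.2f}")
```

With 3,000 simulated professors answering three pings each, the pooled fractions should land close to the average shares, even though no individual schedule is anywhere near complete. Giving every professor the same number of pings matters in this sketch: the pooled fraction is then an average over professors. If some participants answered far more pings than others, their answers would need down-weighting to keep estimating the "average professor".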

Data scientists, pay attention! You don't have to collect complete data at the user level to do proper research. Designs like the "experience sampling" approach produce statistically valid findings without the need for complete data. In fact, trying to collect complete data can be counterproductive, leading to shaky conclusions as shown above.

Comments


Agreed. I think that almost everything that has relied on diary entries has been shown to be unreliable, because people don't fill them in as they go, and when they do fill them in, they adjust the figures to reflect what they see as correct. It probably surprises people that a good, reasonably sized sample beats a larger, poor one.

A lot of academic time is difficult to categorise. Is chasing up the TA to see when the assignments will be marked teaching or admin? Is reading your blog work or recreation? While it interests me, it is also quite useful to see your thoughts on graphics for when I teach. At the moment, though, it is keeping me away from the R package I'm developing.