Abstract: Carl (2017) recently published a report claiming that individuals with left-wing and liberal views are overrepresented in British academia. One weakness of this report was that it relied almost exclusively on party support data. Using data from the 2015 wave of the British Election Study Panel, the present study confirms that the political attitudes of British academics are somewhat more economically left-wing (0.38sd), and are substantially more socially liberal (0.84sd), than those of the general population. It also documents that British academics are substantially more likely to read The Guardian newspaper (the UK’s most left-liberal newspaper) than members of the general population (31 ppts). Adjusting for demographic characteristics, education and openness to experience reduces the difference on social liberalism by 0.20sd, and reduces the difference on Guardian readership by 5 ppts, but increases the difference on economic leftism by 0.07sd.

I read your study with interest. In general I think it is promising but needs some work. Find my comments below.

Data
The data are not available. I take it that this is due to them being secret/apply-only. Please note that if so. If not, please upload them to OSF (just the relevant variables if necessary due to size limitations). The paper notes that the data are available. This is not true.

---

Quote:This report was subjected to several criticisms

Subject?

--

Quote:Using data from the 2015 wave of the British Election Study Panel, the present study confirms that the political attitudes of British academics are indeed both more left-wing and more liberal than those of the general population

I note that this uses the economist tradition of stating the findings in the introduction. Seems redundant as they were already mentioned in the abstract above, meaning they get repeated 3 times: abstract, intro, conclusion.

--

Quote:(n = ~30,000)

Use ≈

--

Quote:the reference category for this variable can be considered to be the general population

It would be more proper to call this "non-academics". For some analyses, it will matter whether one excludes 107 persons or not. This happens when academics are highly discrepant from the non-academics in some variable, especially if it concerns a discrete variable.

The general population is, well, generally taken to refer to everybody.

--

Quote:(respondents who said that they did not read a daily newspaper were coded as missing)

Careful. Now you are contrasting Guardian readers vs. newspaper readers who are not Guardian readers. This is quite different from Guardian readers vs. non-Guardian readers.

Was O a single item, or based on multiple OCEAN/Big Five items? Presumably single-item ratings have strong measurement error, and quite likely substantial systematic error related to self-concept.

--

Quote:An important caveat is that two of the results in Table 1 (specifically, those in the first and second columns) were not robust to applying sampling weights; indeed, they were rendered non-significant by doing so. (Full weighted results are given in Appendix B.)

Judging from the table data, it seems likely that the p values are just calculated incorrectly. Perhaps it uses the sampling weights as df.

E.g. a predictor with beta = .38 in the original model had a p value of .05 when its beta was .28 in the weighted model.

Under the most pessimistic situation, the p value of the first is .00099, which gives a z of -3.093. This implies the SE is 0.123. The cutoff for p = .05 is z = 1.96, which corresponds to a beta of about .24. Thus, barring changes in predictor relations, the beta of .28 should have p somewhat below .05, even under the worst possible assumption.
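
The arithmetic in this comment can be retraced with a minimal sketch. It assumes a normal approximation and reads .00099 as a one-tailed p value, since that is the reading that yields z = -3.093; the variable names are illustrative and do not come from the paper or its Stata code.

```python
from statistics import NormalDist

beta_unweighted = 0.38   # estimate in the original (unweighted) model
beta_weighted = 0.28     # estimate in the weighted model
p_worst_case = 0.00099   # assumed worst case compatible with "p < .001"

std_normal = NormalDist()

# z score implied by the worst-case p value (one-tailed reading, an assumption).
z = std_normal.inv_cdf(p_worst_case)

# Standard error implied by beta / |z|.
se = beta_unweighted / abs(z)

# Smallest beta that would still reach p = .05 (two-tailed) with that SE.
z_crit = std_normal.inv_cdf(0.975)
beta_cutoff = z_crit * se

print(round(z, 3), round(se, 3), round(beta_cutoff, 2))
print("weighted beta above cutoff:", beta_weighted > beta_cutoff)
```

Reading .00099 as a two-tailed p value would imply a larger |z| and hence a smaller SE, so the cutoff would be lower still. None of this accounts for weighting changing the standard error itself.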

--

Quote:This is somewhat surprising, since one would have expected that, if the difference observed in the first column were attributable to non-random sampling, then it would have disappeared after controlling for demographic characteristics such as age, gender, ethnicity and region (as in the second column).

Yes. It has to result from non-random sampling with some variable not among these or strongly related to them. Candidates?

--

Quote:Appendix C shows that the distribution of party support among academics in the BES is more similar to the distribution of party support among academics in Understanding Society

Please quantify this, e.g. Pearson correlation.
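
As a rough illustration of what such a quantification could look like, the sketch below computes a Pearson correlation between two party-support distributions. The shares are HYPOTHETICAL placeholders, not values from the BES or Understanding Society, and `pearson_r` is a helper written for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# HYPOTHETICAL party-support shares (%), listed in the same party order
# for both surveys. Replace with the real distributions.
shares_survey_a = [10.0, 50.0, 15.0, 5.0, 20.0]
shares_survey_b = [12.0, 46.0, 17.0, 6.0, 19.0]

r = pearson_r(shares_survey_a, shares_survey_b)
print(round(r, 2))
```

With only a handful of parties the correlation rests on very few data points, so it is best read as a descriptive summary rather than a test.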

--

Quote:Table 1. Estimates from OLS models of economic leftism

Please report correlation matrix of all primary variables. Remember that some readers may be more interested in other predictors, e.g. for meta-analysis.

Did you do the coding correctly this time? Dummy variables cannot be entered as numeric variables. You must set them to dummy status, otherwise the model treats them as numeric variables and underestimates the effect sizes.

It would be useful to show all the betas (in appendix if you want). Readers cannot compare the relative importance of variables when they are grouped and betas not reported, aside from indirectly via delta R2.

--

Quote:Age, age squared,

Age^2 is not a good way to control for non-linear effects. Please use a spline or similar flexible method. You have sufficient sample size for this.

--

Quote:Education

Note that including education as a co-predictor is problematic. It is likely that there is political preference-based self-selection into higher education. Thus, including it as a predictor controls for a mediator to some degree.

The same is true for gender and region.

If you used a path/sem model you could model this appropriately.

--

Quote:Openness to experience

What about the measurement-overlap criticism? Some O items concern political stuff very similar or identical to your social items.

Quote:I note that this uses the economist tradition of stating the findings in the introduction. Seems redundant as they were already mentioned in the abstract above, meaning they get repeated 3 times: abstract, intro, conclusion.

I have altered the relevant sentence so that it now says:

"the present study explores whether the political attitudes of British academics are indeed both more left-wing and more liberal than those of the general population."

Quote:Use ≈

This has been changed.

Quote:It would be more proper to call this "non-academics".

I have altered the relevant sentence so that it now says:

"Insofar as academics comprise such a small share of the sample (0.3%), the reference category for this variable can be considered to be the general population, although strictly speaking it represents all nonacademics (99.7%)."

Quote:Careful. Now you are contrasting Guardian readers vs. newspaper readers who are not Guardian readers.

I have added the following sentence:

"The reference category for this variable is therefore the population of individuals who read some other daily newspaper."

Quote:Was O a single item, or based on multiple OCEAN/Big five items?

I have added the following sentence:

"The latter measure is based on the Ten Item Personality Inventory (TIPI; Gosling et al., 2003), and is included in the dataset as a single variable scaled from 0–10."

Quote:Judging from the table data, it seems likely that the p values are just calculated incorrectly.

The p-values for weighted estimates were computed by Stata, and I believe they are correct. Weighting affects the standard errors, as well as the point estimates.

Quote:Yes. It has to result from non-random sampling with some variable not among these or strongly related to them. Candidates?

Not sure, unfortunately.

Quote:Please quantify this, e.g. Pearson correlation.

I have added a footnote on p. 5, which states the following:

"The correlation between the unweighted distribution from the BES and the average of the two distributions from Understanding Society is r = .93 for both the broad and narrow definitions of party identity. By contrast, the correlation between the weighted distribution from the BES and the average of the two distributions from Understanding Society is r = .65 for the broad definition of party identity and r = .64 for the narrow definition."

Quote:Please report correlation matrix of all primary variables. Remember that some readers may be more interested in other predictors, e.g. for meta-analysis.

I would prefer not to report this. If readers want to find out the bivariate correlations, they can download the data, and run my Stata code.

Quote:Age^2 is not a good way to control for non-linear effects.

I have now included dummies for age quintiles in the models instead, but it made essentially no difference.

Quote:Note that including education as a co-predictor is problematic... What about the overlap of measurement criticism? Some O items concern political stuff very similar or identical to your social items.

I have added the following statements on p. 4:

"Note that the reason for utilizing education and openness to experience is that each has been posited to at least partially account for the left-liberal skew of academia (see Gross, 2013; Duarte et al., 2014; Carl, 2017). I.e., it has been asserted that academics tend to have more left-liberal attitudes due to their higher education and greater openness to experience. Including these variables as covariates in a multiple regression analysis allows one to estimate how much of the skew they do in fact account for."

Quote:Are these R2 adjusted or not?

Non-adjusted, but it makes very little difference.

Quote:You cannot use OLS for a binary outcome! Please use a logistic model.

I got this criticism from another reviewer recently, and I disagree. So I will repeat what I said to that reviewer:

The linear probability model (LPM; i.e., OLS with a binary dependent variable) is widely used in the economics literature, and is now preferred to logit and probit by many econometricians. The two main reasons are: greater interpretability, and lack of small sample bias that afflicts maximum likelihood estimation when specifying fixed effects.

The conventional criticism of the LPM, namely that predicted probabilities may fall outside the interval 0–1, is not relevant if one’s purpose is simply to estimate the marginal effect of an independent variable. As Wooldridge (2002) notes in his seminal textbook on econometrics (Econometric Analysis of Cross-Section and Panel Data):

“If the main purpose is to estimate the partial effect of [the independent variable] on the response probability, averaged across the distribution of [the independent variable], then the fact that some predicted values are outside the unit interval may not be very important.”

Similarly, in the blog for their own econometrics textbook (Mostly Harmless Econometrics), Angrist and Pischke (2012) write:

“If the conditional expectation function (CEF) is linear, as it is for a saturated model, regression gives the CEF – even for LPM. If the CEF is non-linear, regression approximates the CEF. Usually it does it pretty well. Obviously, the LPM won’t give the true marginal effects from the right nonlinear model. But then, the same is true for the “wrong” nonlinear model! The fact that we have a probit, a logit, and the LPM is just a statement to the fact that we don’t know what the “right” model is. Hence, there is a lot to be said for sticking to a linear regression function as compared to a fairly arbitrary choice of a non-linear one! Nonlinearity per se is a red herring.”

Moreover, as the economist Marc Bellemare (2013) notes on his blog (please see also Allison, 2012; Smart, 2013): “The probit and logit are not well-suited to the use of fixed effects because of the incidental parameters problem.”

1. Consider including the text of the two O questions from TIPI, so that readers can judge the issue of construct overlap.
2. Consider examining the fitted values for your OLS model with the binary outcome. Are they outside the possible range? If not, I guess you don't have problems. If they are, then it may be a problem.
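
The check in point 2 could be sketched roughly as follows. This is a one-predictor OLS written by hand on HYPOTHETICAL data, not the paper's model or its Stata code; `fit_ols` and `out_of_range_share` are names invented for this example.

```python
def fit_ols(xs, ys):
    """Intercept and slope of a one-predictor OLS regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def out_of_range_share(xs, ys):
    """Share of fitted values from the linear probability model outside [0, 1]."""
    intercept, slope = fit_ols(xs, ys)
    fitted = [intercept + slope * x for x in xs]
    outside = [f for f in fitted if f < 0.0 or f > 1.0]
    return len(outside) / len(fitted)

# HYPOTHETICAL data: a numeric predictor and a 0/1 outcome
# (e.g. 1 = reads the Guardian). Replace with the real variables.
predictor = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
reads_guardian = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

share = out_of_range_share(predictor, reads_guardian)
print(share)
```

A share near zero would suggest the out-of-range concern has little practical weight for the model in question; a sizeable share would be a reason to compare the estimates against a logistic model.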