The story has attracted some media attention, so I am using this blog article to correct a few errors and provide more information.

Corrections

1) Some stories are reporting that my paper was published by MIT Technology Review. That is not correct. It was posted on arXiv, "a repository of electronic preprints of scientific papers in the fields of mathematics, physics, astronomy, computer science, quantitative biology, statistics, and quantitative finance, which can be accessed online." arXiv is not a peer-reviewed venue, and my paper has not been peer reviewed.

2) I am a Professor of Computer Science at Olin College of Engineering in Needham, Massachusetts.

3) My results suggest that Internet use might account for about 20% of the decrease in religious affiliation between 1990 and 2010, or about 5 million out of 25 million people.

Note that this result doesn't mean that 5 million people who used to be affiliated are now disaffiliated because of the Internet. Rather, my study estimates that if the Internet had no effect on affiliation, there would be an additional 5 million affiliated people.
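Here is a minimal sketch of that counterfactual comparison, assuming a fitted logistic model with a single Internet term: predict each respondent's probability of affiliation with the Internet effect switched on and then off, weight by the number of people each respondent represents, and take the difference. The coefficients, the 0/1 coding of Internet use, and the weights are all hypothetical, and a real version would also carry along the other control variables.

```python
import math

def logistic(x):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical fitted coefficients (log-odds of reporting an affiliation).
# Illustrative values only, not estimates from the paper.
INTERCEPT = 2.0
COEF_INTERNET = -0.3  # hypothetical coding: 0 = no Internet use, 1 = some use

def expected_affiliated(respondents, internet_effect=True):
    """Expected number of affiliated people in the population.

    respondents: list of (internet_use, weight) pairs, where weight is the
    (hypothetical) number of people each respondent represents.
    internet_effect=False simulates a world where Internet use has no
    effect on affiliation.
    """
    total = 0.0
    for internet_use, weight in respondents:
        log_odds = INTERCEPT
        if internet_effect:
            log_odds += COEF_INTERNET * internet_use
        total += weight * logistic(log_odds)
    return total

sample = [(0, 1000.0), (1, 3000.0)]
actual = expected_affiliated(sample)
counterfactual = expected_affiliated(sample, internet_effect=False)
print(f"additional affiliated people without the Internet effect: {counterfactual - actual:.0f}")
```

The difference is a count of people who would be affiliated in the simulated world, not a count of specific individuals who left because of the Internet.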

Q&A

The statistical analysis in this paper is not just a correlation between two time series. That would tell us very little about causation.

The analysis I report is based on more than 9000 individual respondents to the General Social Survey (GSS). Each respondent answered questions about their Internet use and religious affiliation, and provided demographic information like religious upbringing, region, income, socioeconomic status, etc.

These variables allowed me to control for most of the usual confounding variables and isolate the association between Internet use and religious affiliation.
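To make "controlling for" concrete, here is a minimal sketch on synthetic data rather than GSS data. A hypothetical confounder (religious upbringing) makes Internet use less likely and affiliation more likely, so a logistic regression of affiliation on Internet use alone exaggerates the Internet effect; adding the confounder as a control recovers the effect built into the simulation. The Newton-Raphson fitting routine and every number here are illustrative assumptions, not the code or estimates behind the paper.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_logistic(rows, ys, iterations=10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    rows: feature lists, each starting with 1.0 for the intercept.
    ys: 0/1 outcomes. Returns the list of coefficients.
    """
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, y in zip(rows, ys):
            p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (y - p) * row[i]
                for j in range(k):
                    hess[i][j] += w * row[i] * row[j]
        # Newton step: beta += (X'WX)^-1 X'(y - p)
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Synthetic data: the hypothetical confounder lowers Internet use and raises affiliation.
random.seed(1)
naive_rows, adjusted_rows, outcomes = [], [], []
for _ in range(10_000):
    raised = 1 if random.random() < 0.5 else 0
    internet = 1 if random.random() < (0.3 if raised else 0.7) else 0
    true_log_odds = -1.0 + 2.0 * raised - 0.5 * internet  # true Internet effect: -0.5
    affiliated = 1 if random.random() < 1.0 / (1.0 + math.exp(-true_log_odds)) else 0
    naive_rows.append([1.0, internet])
    adjusted_rows.append([1.0, internet, raised])
    outcomes.append(affiliated)

naive = fit_logistic(naive_rows, outcomes)
adjusted = fit_logistic(adjusted_rows, outcomes)
print(f"Internet coefficient, no controls:   {naive[1]:.2f}")
print(f"Internet coefficient, with controls: {adjusted[1]:.2f}")
```

In practice you would use a statistics library rather than a hand-written solver; the point is only that the coefficient on Internet use changes once the confounder is in the model.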

Now, there are still two alternative explanations. Religious disaffiliation could cause Internet use. That seems less plausible to me, but it is still possible. The other possibility is that a third factor could cause both Internet use and disaffiliation. But that factor would have to be new or increasing since 1990, and it would have to be uncorrelated or only weakly correlated with the control variables I included.

I can't think of any good candidates for a third factor like that, and I have not heard any candidates that meet the criteria. So I think it is reasonable to conclude that Internet use causes disaffiliation.

Tech Review was one of the first places to pick up the story. Some sites are reporting, incorrectly, that my research was published in MIT Technology Review.

Correlation and causation

By far the most common response to my paper is something like what Matt McFarland wrote:

This is an interesting read, but the problem with the theory is that correlation isn’t causation. With so much changing since 1990, it’s difficult to conclude what variables were factors, and to what degree.

Let me address that in two parts.

1) My study is not based on simple correlations. I used statistical methods (specifically logistic regression) that are designed to answer the question Matt asks: which variables were factors and to what degree? I controlled for all the usual confounding factors, including income, education, religious upbringing, and a few others.

My results show that there is an association between Internet use and disaffiliation, after controlling for these other factors.

2) The question that remains is which direction the causation goes. There are three possibilities:

a) Internet use causes disaffiliation
b) disaffiliation causes Internet use
c) some other third factor causes both

In the paper I argue that (a) is more likely than (b) because it is easy to imagine several ways Internet use might cause disaffiliation, and harder to imagine how disaffiliation causes Internet use.

Similarly, it is hard to think of a third factor that causes both Internet use and disaffiliation. In order to explain the data, this factor would have to be new, or changing substantially between 1990 and 2010, and it would have to have a strong effect on both Internet use and disaffiliation.

There are certainly many factors that contribute to disaffiliation. According to my analysis, Internet use might account for 20% of the observed change. Another 25% is due to changes in religious upbringing, and 5% is due to increases in college education. That leaves 50% of the observed changes unexplained by the factors I was able to include in the study.

If you think that Internet use does not cause disaffiliation, it is not enough to list other causes of disaffiliation, or other things that have changed since 1990. To make your argument, you have to find a third factor that

a) Is not already controlled for in my analysis,
b) Is new in 1990, or began to change substantially around that time, and
c) Actually causes (rather than merely being associated with) both Internet use and disaffiliation.

So far I have not heard a candidate that meets these criteria. For example, some people have suggested personality traits that might cause both Internet use and disaffiliation. That's certainly possible. But unless those traits are new, or started becoming more prevalent, in 1990, they don't explain the recent changes.
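Criterion (b) can be checked with a little arithmetic. The sketch below, with made-up rates, treats affiliation in a given year as a mix of people with and without some trait: if the trait's prevalence and its association with affiliation stay fixed, the overall rate does not move, however strong the association. Only a change in prevalence (or in the trait's effect) produces a trend.

```python
def affiliation_rate(trait_share, rate_with_trait, rate_without_trait):
    """Population affiliation rate as a mix of people with and without a trait."""
    return trait_share * rate_with_trait + (1 - trait_share) * rate_without_trait

# Hypothetical affiliation rates; the trait lowers affiliation in both years.
RATE_WITH, RATE_WITHOUT = 0.80, 0.95

# Case 1: the trait is equally common in 1990 and 2010.
stable_change = (affiliation_rate(0.30, RATE_WITH, RATE_WITHOUT)
                 - affiliation_rate(0.30, RATE_WITH, RATE_WITHOUT))

# Case 2: the trait becomes more common, from 30% to 50% of the population.
rising_change = (affiliation_rate(0.50, RATE_WITH, RATE_WITHOUT)
                 - affiliation_rate(0.30, RATE_WITH, RATE_WITHOUT))

print(f"change with a stable trait: {stable_change:+.3f}")
print(f"change with a rising trait: {rising_change:+.3f}")
```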

Questions from the media

1. Why did you initiate this study?

As a college teacher, I have been following the CIRP Freshman Survey for several years. It is a survey of incoming college students that asks, among other things, about their religious preference. Since 1985 the fraction of students reporting no religious preference has more than tripled, from 8% to 25%. I think this is an underreported story.

About two years ago I started working with data from the General Social Survey (GSS), and realized there was an opportunity to investigate factors associated with disaffiliation, and to predict future trends.

2. How did you gather your data?

I am not involved in running either the Freshman Survey or the GSS. I use data made available by the Higher Education Research Institute (HERI) and National Opinion Research Center (NORC). Obviously, their work is a great benefit to researchers like me. On the other hand, there are always challenges working with “found data.” Even with surveys like these that are well designed and executed, you never have exactly the data you want; it takes some creativity to find the data that answer your questions and the questions your data can answer.

3. Could you elaborate on your overall findings?

I think there are two important results from this study. One is that I identified several factors associated with religious disaffiliation and measured the strength of each association. By controlling for things like income, education, and religious upbringing, I was able to isolate the effect of Internet use. I found something like a dose-response curve. People who reported some Internet use (a few hours a week) were less likely to report a religious preference, by about 2 percentage points. People who use the Internet more than 7 hours per week were even less likely to be religious, by an additional 3 percentage points. That effect turns out to be stronger than a four-year college education, which reduces religious affiliation by about 2 percentage points.
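Logistic regression estimates effects on the log-odds scale, so expressing them in percentage points requires choosing a baseline. One common approach, sketched below, compares predicted probabilities for an otherwise typical respondent with and without the predictor. The baseline probability and coefficients were picked by hand to land in the same range as the figures above; they are not the paper's estimates, and this conversion is an assumption, not necessarily how the paper computed its numbers.

```python
import math

def logit(p):
    """Probability to log-odds."""
    return math.log(p / (1 - p))

def expit(x):
    """Log-odds to probability."""
    return 1 / (1 + math.exp(-x))

def percentage_point_change(baseline_prob, coef):
    """Change in predicted probability, in percentage points, when a predictor
    with log-odds coefficient `coef` switches from 0 to 1."""
    return 100 * (expit(logit(baseline_prob) + coef) - baseline_prob)

# Hypothetical baseline: share of comparable non-users reporting an affiliation.
BASELINE = 0.85
# Hypothetical coefficients for the two usage levels, relative to no use.
COEF_SOME, COEF_HEAVY = -0.15, -0.35

for label, coef in [("some use", COEF_SOME), ("heavy use", COEF_HEAVY)]:
    print(f"{label}: {percentage_point_change(BASELINE, coef):+.1f} points")
```

The same coefficient produces a smaller percentage-point change when the baseline is close to 0 or 1, so effects stated in percentage points depend on the comparison group.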

With this kind of data, we can’t know for sure that Internet use causes religious disaffiliation. It is always possible that disaffiliation causes Internet use, or that a third factor causes both. In the paper I explain why I think these alternatives are less plausible, and provide some additional analysis. Based on these results, I conclude, tentatively, that Internet use causes religious disaffiliation, but a reasonable person could disagree.

In the second part of the paper I use parameters from the regression models to run simulations of counterfactual worlds, which allows me to estimate the number of people in the U.S. whose disaffiliation is due to education, Internet use, and other factors. Between 1990 and 2010, the total decrease in religious affiliation was about 25 million people. About 25% of that decrease is because fewer people are being raised with a religious affiliation. Another 20% might be due to increases in Internet use. And another 5% might be due to increases in college education.

That leaves 50% of the decrease unexplained by the factors I was able to include in the study, which raises interesting questions for future research.
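The shares above translate into rough head counts with simple arithmetic, sketched below. This treats the shares as if they add up, which is only approximately true for estimates from a nonlinear model, and the 25 million total is the rounded figure quoted above.

```python
TOTAL_DECREASE = 25_000_000  # approximate decrease in affiliation, as quoted above

# Approximate share of the decrease attributed to each factor, as quoted above.
shares = {
    "religious upbringing": 0.25,
    "Internet use": 0.20,
    "college education": 0.05,
}
shares["unexplained"] = 1 - sum(shares.values())

for factor, share in shares.items():
    millions = share * TOTAL_DECREASE / 1e6
    print(f"{factor:<22} {share:>4.0%}  about {millions:.2f} million people")
```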

4. Are you using this research for anything specific?

I do work in this area because a lot of people find it interesting; I also use it as an example in my classes. I teach Data Science at Olin College of Engineering. I want my students to be prepared to work with real data and use it to answer real questions. I use examples like this to demonstrate the tools and to show what’s possible as data like this becomes more readily available.

5. Based on your research, are you able to hypothesize what might happen to religion in America in 50, 100, or more years?

The most likely changes between now and 2040 are: the fraction of people with no religious preference will increase to about 25%, overtaking the fraction of Catholics, which will decline slowly. The fraction of Protestants will drop more quickly, to about 45%.

These predictions are based on generational replacement: the people in the surveyed population will get older; some will die and be replaced by the next generation. Most adults don’t change religious affiliation, so these predictions should be reliable.
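Here is a minimal sketch of a projection based only on generational replacement, under the simplifying assumption stated above that adults keep their affiliation for life. At each step the oldest cohort leaves the adult population and a new one enters. The cohort sizes and rates are hypothetical, and a real projection would use survey-based rates by birth year and mortality rather than whole cohorts dropping out at once.

```python
def none_fraction(cohorts):
    """Population-wide fraction with no religious preference."""
    total = sum(size for size, _ in cohorts)
    return sum(size * rate for size, rate in cohorts) / total

def project(cohorts, incoming):
    """Project the 'no preference' fraction under pure generational replacement.

    cohorts: list of (size, none_rate) pairs, oldest first.
    incoming: (size, none_rate) pairs for cohorts entering adulthood, one per step.
    Nobody changes affiliation; the oldest cohort simply leaves at each step.
    """
    cohorts = list(cohorts)
    history = [none_fraction(cohorts)]
    for newcomer in incoming:
        cohorts.pop(0)            # oldest cohort leaves the population
        cohorts.append(newcomer)  # youngest cohort enters
        history.append(none_fraction(cohorts))
    return history

# Hypothetical cohorts: equal sizes, rates rising with each generation.
current = [(20, 0.05), (20, 0.10), (20, 0.15), (20, 0.20)]
entering = [(20, 0.25), (20, 0.30)]
for step, fraction in enumerate(project(current, entering)):
    print(f"step {step}: {fraction:.3f}")
```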

But they are based on generational replacement only, not on any other factors that might speed up or slow down the trends. Going farther into the future, those other factors become more important.

6. Is there anything in particular you think people should know/understand about your research and findings?

Again, it’s important to remember that my results are based on observational studies. With that kind of data, we don’t know for sure whether the statistical relationships we see are due to causation. In this case I think we can make a strong argument that Internet use causes religious disaffiliation, but a reasonable person could disagree.

My paper includes some analysis that is pretty standard stuff, like logistic regression. But I also used methods that are less common; for example, using parameters from the regression models, I ran simulations of counterfactual worlds, which allowed me to estimate the number of people in the U.S. whose disaffiliation might be due to education, Internet use, and other factors.

7. Although your research cannot determine for sure that the Internet causes less religious affiliation, what about it might you speculate could be decreasing religion?

In the paper I wrote “it is easy to imagine at least two ways Internet use could contribute to disaffiliation. For people living in homogeneous communities, the Internet provides opportunities to find information about people of other religions (and none), and to interact with them personally. Also, for people with religious doubt, the Internet provides access to people in similar circumstances all over the world.”

These are speculations based on anecdotal evidence, not the kind of data I used in the statistical analysis. One place I see people from different religious backgrounds engaging on the Internet is in online forums like Reddit. Here's an example from just a few hours ago.

8. Do you think that the Internet will advance this secularization process?

There are two parts of secularization: disaffiliation from organized religion and a decrease in religious faith. The data I reported in my paper provide some evidence that the Internet is contributing to disaffiliation in the U.S. I haven’t had a chance to look into the related issue of religious faith, but I am interested in that question, too.

9. Would access to a wider range of information be one of the explanations for the decrease in religious affiliation?

Again, I don’t have data to support that, but it seems likely to be at least part of the explanation.

More questions

There is a difference between those who are religiously affiliated (belonging to or active in a church, for example) and those who consider themselves spiritual or religious. Can you clarify what you’re talking about?

Yes, good point! My paper is only about religious affiliation, or religious preference. The GSS also asks about religious faith and spirituality, but I have not had a chance to do the same analysis with those variables.

I have seen other studies that suggest that belief in God, other forms of religious faith, and spirituality are not changing as quickly as religious affiliation. But I don't have a good reference handy.

Comments

I think you've found an internet effect, but I find the dose response unconvincing. I'm not familiar with self-information for model selection, but it appears to be based on information theory, which is also the basis of the more widely used information criteria (AIC, BIC, etc.). The general idea appears to be the same: compare the improvement in the likelihood and penalize for additional parameters.

If you were to use AIC here, you would probably find that your dose-response model had a lower AIC, and so more support, but I bet it would be within 2-3 AIC units of the model without a dose response. When you add just one parameter, a difference of less than 2 AIC units amounts to no support and 2-4 units amounts to weak support; in either case it is still difficult to read much into the effect. Some form of multimodel inference, or at least an explicit criterion for model comparison, would help show what is going on (relative fit, not just choosing the top model).
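The comparison described here can be sketched in a few lines: compute AIC = 2k - 2 ln L for each model, express each as a difference from the best one, and convert the differences into Akaike weights, a common way to report relative support rather than just picking the top model. The log-likelihoods and parameter counts below are hypothetical, chosen to give a difference of 3 AIC units.

```python
import math

def aic(log_likelihood, num_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * num_params - 2 * log_likelihood

def akaike_weights(aic_values):
    """Relative support for each model, normalized to sum to 1."""
    best = min(aic_values)
    relative = [math.exp(-(a - best) / 2) for a in aic_values]
    total = sum(relative)
    return [r / total for r in relative]

# Hypothetical maximized log-likelihoods and parameter counts.
models = {
    "one Internet level": (-4100.0, 10),
    "two Internet levels (dose-response)": (-4097.5, 11),
}
aics = {name: aic(ll, k) for name, (ll, k) in models.items()}
weights = akaike_weights(list(aics.values()))
best = min(aics.values())
for (name, value), weight in zip(aics.items(), weights):
    print(f"{name}: AIC={value:.1f}, delta={value - best:.1f}, weight={weight:.2f}")
```

Because one extra parameter adds 2 to AIC, the richer model has to raise the log-likelihood by more than 1 just to break even.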

From a frequentist perspective, your CIs for the two levels of internet use overlap widely, so I don't see much evidence of a dose response. Further, the odds ratio of interest is probably www7 vs. www2. I wouldn't be surprised if its CI includes 1 (or, on the log-odds scale, 0).
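Comparing www7 with www2 directly means computing a confidence interval for the difference between the two coefficients, which needs their covariance as well as their standard errors. A minimal sketch of a Wald interval with hypothetical estimates: because both coefficients are usually measured against the same reference group (non-users), their covariance is often positive, and then the interval for the difference can be narrower than the overlap of the two separate intervals suggests.

```python
import math

def contrast_ci(b_a, b_b, var_a, var_b, cov_ab, z=1.96):
    """Approximate 95% Wald CI for b_a - b_b on the log-odds scale,
    plus the corresponding CI for the odds ratio exp(b_a - b_b)."""
    diff = b_a - b_b
    se = math.sqrt(var_a + var_b - 2 * cov_ab)
    low, high = diff - z * se, diff + z * se
    return (low, high), (math.exp(low), math.exp(high))

# Hypothetical coefficients (vs. no Internet use), variances, and covariance,
# as they might be read from a fitted model's covariance matrix.
B_WWW7, B_WWW2 = -0.35, -0.15
VAR_WWW7, VAR_WWW2, COV = 0.08 ** 2, 0.07 ** 2, 0.003

log_ci, or_ci = contrast_ci(B_WWW7, B_WWW2, VAR_WWW7, VAR_WWW2, COV)
print(f"log odds ratio CI: ({log_ci[0]:.3f}, {log_ci[1]:.3f})")
print(f"odds ratio CI:     ({or_ci[0]:.3f}, {or_ci[1]:.3f})")
print("includes 1 (no difference)" if or_ci[0] <= 1 <= or_ci[1] else "excludes 1")
```

With these made-up numbers the two separate 95% intervals overlap, yet the interval for the difference excludes 0; with other estimates it could just as easily include it.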

That's a tough one. Is this for the models of those raised religiously only, or the whole population? Rules of thumb from Burnham and Anderson (2002): models within 2 AIC units have about equal support. Models that are 4-7 AIC units above the next best model have moderate support, and models more than 10 AIC units from the top model have essentially no support.

So you're right on the boundary of moderate support, admittedly more than I was expecting. Given this further information, I think what you've done is reasonable. My intuition is that you are finding a signal of a real effect, it's just a sample size problem that limits the confidence that you've found one in this data set.

Internet usage is an ordinal variable. What happens if you treat it as continuous? Code the categories 0, 1, and 2. You assume the difference between 0 and 1 is the same as the difference between 1 and 2. This is restrictive, but it appears reasonable based on your point estimates and can often lead to a more parsimonious model. Of course, this may just give you more model uncertainty.
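The two codings differ only in how many coefficients the model gets for Internet use. A small sketch, with hypothetical function names and coefficients, of the design-matrix columns each coding produces and the effects each implies: the 0, 1, 2 coding spends one parameter instead of two, which saves 2 AIC units but forces equal spacing between the levels.

```python
def dummy_coding(level):
    """Two indicators (some use, heavy use) with no use as the reference:
    each level gets its own coefficient."""
    return [1 if level == 1 else 0, 1 if level == 2 else 0]

def linear_coding(level):
    """A single numeric column: one slope shared across steps, so heavy use (2)
    is constrained to twice the log-odds effect of some use (1)."""
    return [level]

def implied_effects(coefs, coding):
    """Log-odds effect of each level (0, 1, 2) relative to no use."""
    return [sum(c * x for c, x in zip(coefs, coding(level))) for level in (0, 1, 2)]

# Hypothetical coefficients for each coding.
print(implied_effects([-0.15, -0.35], dummy_coding))  # free: 0, b1, b2
print(implied_effects([-0.17], linear_coding))        # constrained: 0, b, 2b
```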

Thanks for this interpretation. I also ran models with Internet usage as a continuous variable. I don't remember the details, but the results were consistent with the models I reported. I chose two discrete levels for parsimony, as you suggested, and also because there were some differences in how the question was asked from year to year; the two levels allowed me to recode the responses consistently.

Interesting question. It would be hard to demonstrate an effect using the data I have. Looking at the time series data, I don't see an obvious effect after 2001, and even if there were, it would be hard to demonstrate causation. And looking at individual responses, I can't distinguish people who did and did not hear about the 9/11 attacks (if there is anyone in the second group).

It would be nice to believe that people educated after 9/11 are learning more about world religions, but Stephen Prothero might have some cold water to throw on that belief.