Of shoes -- and ships -- and sealing wax --
Of cabbages -- and kings --
And why the sea is boiling hot --
And whether pigs have wings.

Tuesday, October 31, 2006

Don't buy a computer, but if you do

Over at India Uncut, Amit Varma points to a study by Todd Kendall which claims to demonstrate that the spread of the Internet causes a decline in the incidence of rape. Kendall's argument is that the Internet makes pornography more widely available, especially to men in the 15-19 age group, and that pornography serves as a substitute for sexual violence.

It's an interesting paper, but reading it I had several concerns.

My biggest problem with Kendall's empirical results is that he runs a regression with both % of households accessing the Internet and % of households with computers as independent variables, and while the coefficient on Internet access is significant and negative, the coefficient on households with computers is significant and positive (it's the most significant variable in his regression) - a fact that Kendall conveniently neglects to mention in his paper, let alone provide an explanation for. Kendall's justification for including % of households with computers is to separate the effect of the Internet from other technological influences, but for that story to hold we would expect a non-significant effect of owning computers alongside a significant effect of Internet access. As it is, we have two variables on the right hand side of the equation that we would expect to be highly correlated (Kendall does not bother to provide us with a correlation table, but computer ownership and Internet access pretty much have to be positively correlated) and they enter the regression with signs that are opposite and significant. Personally, I'd love to see what happens to the coefficient on Internet access if Kendall runs his regression without % of households with computers in there. I'm unconvinced that it would continue to be significant.

Think about it this way. Kendall tells us that, on average, a 10% increase in Internet penetration causes a 7.3% decline in the incidence of rape [1]. But if you believe his results in Table 4, a 10% increase in the % of households owning a computer causes a 6.4% increase in the incidence of rape. So the net effect of buying a computer and using it to access the Internet is a mere 0.9% decline in the incidence of rape (yes, yes, I know it doesn't work that way - which is exactly my point). But who are all these households who are buying a computer but not using it to connect to the Net? And what, according to Kendall, is the reason that they're more likely to commit rape? Frustration about poor connectivity? Isn't it more likely that what we're seeing is just multicollinearity making the regression estimates unstable and unrealistically large?
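For anyone who wants to see the mechanics, here is a minimal sketch in Python on simulated data (nothing here comes from Kendall's data set; the variable names, sample size and true coefficients are all my own assumptions). Two predictors are built to be almost perfectly correlated, only one of them actually affects the outcome, and the regression is run with both predictors and then with the Internet variable alone.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and standard errors; X must already contain an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 500
computers = rng.normal(size=n)                    # simulated, standardized
internet = computers + 0.1 * rng.normal(size=n)   # nearly collinear with computers
y = -0.5 * internet + rng.normal(size=n)          # assumed truth: computers have no effect

ones = np.ones(n)
beta_both, se_both = ols(np.column_stack([ones, internet, computers]), y)
beta_one, se_one = ols(np.column_stack([ones, internet]), y)

# Variance inflation factor for the two-predictor case
r = np.corrcoef(internet, computers)[0, 1]
vif = 1.0 / (1.0 - r ** 2)

print("VIF:", vif)
print("SE on internet, both predictors:", se_both[1])
print("SE on internet, alone:", se_one[1])
```

The point of the comparison is the standard error on the Internet coefficient: with its near-twin in the model it is several times larger than when it enters alone, which is why individual coefficients from a regression like this are hard to take at face value.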

My second concern with Kendall's study is that he assumes implicitly that the spread of the Internet has no effect on the reporting of rape, so that changes in the number of rapes reported are a valid measure of changes in the number of rapes actually taking place. Kendall acknowledges that there is a measurement problem here, but sees no reason to believe that the spread of the Internet may be causing a systematic bias in his measurement. He even makes some arguments for why his measures may be underestimating the effect.

Yet the study itself suggests one potential reason why the results might be biased. Kendall tells us that rapes that don't get reported tend to be those committed by people known to the victim - date rapes, for instance. Kendall also cites previous literature that tells us that the Internet facilitates more dating and other face-to-face interactions and that this may increase the opportunities for rape. Put those two together and it suggests that the spread of the Internet increases the opportunities for the kind of rapes that tend to go unreported. Is it possible, therefore, that the effect Kendall is capturing is really a reflection of the fact that the Internet is shifting the incidence of rape from assaults on strangers (which have high reporting rates) to date rape (where reporting rates are low), causing reporting rates to go down? I'm not saying this is necessarily happening - I'm simply saying that it's an interpretation of Kendall's results (such as they are) that would be consistent with the literature that Kendall himself cites, and that he doesn't consider.
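A back-of-the-envelope sketch in Python shows how this could play out. Every number below is made up for illustration (the counts and the two reporting rates are hypothetical, not taken from Kendall's paper or from any crime statistics): the actual number of rapes is held fixed at 1,000, and only the mix between stranger and acquaintance assaults changes.

```python
def reported(stranger, acquaintance, rate_stranger=0.6, rate_acquaintance=0.2):
    """Reported rapes given actual counts and hypothetical reporting rates."""
    return stranger * rate_stranger + acquaintance * rate_acquaintance

# Same actual total (1,000) before and after; only the composition shifts.
before = reported(stranger=500, acquaintance=500)
after = reported(stranger=400, acquaintance=600)

pct_change_reported = 100 * (after - before) / before
print(before, after, pct_change_reported)
```

Under these assumed rates the reported figure drops by about a tenth even though not a single rape has been prevented, which is all the argument above needs.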

Finally, it's interesting that though Kendall has a panel data set, he doesn't actually account for lagged effects in his model. So what we're seeing is the absolute level of rape in the state (and the absolute level of internet usage) not the change in rape incidence. Personally, I would love to see a regression where Kendall includes the previous year's rape incidence for the state on the right hand side or, even better, takes first differences for his variables of interest. That would tell us whether changes in the spread of the Internet were really driving changes in the incidence of rape.

None of this is to say that what Kendall is saying is necessarily wrong, though personally I'm sceptical about the argument that access to porn is a substitute for rape (in Kendall's terms, I'm firmly in the camp of those who believe that rape is about power rather than about lust). It's simply to suggest that Kendall's results and the interpretation he puts on them are extremely questionable, and we should be careful before drawing any real conclusions from them.

[1] I'm assuming that Kendall's reported coefficients are adjusted for the fact that his dependent variable is in logs.
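For what it's worth, the adjustment matters little at this magnitude. A quick sketch in Python, assuming a hypothetical coefficient of -0.73 on a 0-to-1 share (picked only so that the naive reading gives 7.3%; it is not a number taken from Kendall's tables) and reading "10% increase" as 10 percentage points, which is also my assumption:

```python
import math

def pct_change(coef, delta_x):
    """Exact % change in y implied by a log-linear model ln(y) = ... + coef * x."""
    return 100 * (math.exp(coef * delta_x) - 1)

coef = -0.73     # hypothetical coefficient, not from the paper
delta_x = 0.10   # assumed: a 10 percentage point rise in the share

naive = 100 * coef * delta_x        # reading the coefficient directly as a % change
exact = pct_change(coef, delta_x)   # applying the exponential adjustment
print(round(naive, 2), round(exact, 2))
```

The naive and adjusted figures differ by only a fraction of a percentage point here, so nothing in the post hinges on which one Kendall reports.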

27 comments:

> I'm firmly in the camp of those who believe that rape is about power rather than about lust.

Exactly! It might be the usual correlation-causation issue here too. It might be more important to examine the social and cultural factors that could explain the incidence of rape than just internet access and use.

i haven't read the paper, but it's amazing that he gets away with these two independent variables. neela's right, once you get this kind of multicollinearity there's no point in interpreting the estimates.

The correlation coefficient assumes that computer access roughly means internet access -- which is true in the sample space considered. The internet penetration broadly translates to internet access in the U.S. -- though not strictly true.

Further, the additional factor of pornography that is not through the internet but is still accessed through a computer will add to this.

He has also stated his assumptions clearly. Your objections can at best be qualitative since the math does justice to his data.

Patrix: To be fair, Kendall does a reasonable job of trying to include a number of other variables that you would think would be relevant. If his regressions work without the confounding effect of the computer access variable, and with lagged variables, I'd be inclined to believe him, my discomfort with his theory notwithstanding.

neela: :-). What? It's true.

tr: yes, exactly. I'm not sure he does get away with it, though - to the best of my knowledge it's only a working paper - hopefully at some point reviewers will step in and sanity will return.

I certainly think multicollinearity is the major issue here, but assuming that Kendall can fix that (it's just a question of running one more regression) and still get significant results (which I have my doubts about) I'd still like to see the effect of including a lagged variable and perhaps some measure of self-reported rape rather than crime statistics, to ensure that we're not just seeing a decline in reported rape.

bangalore bytes: Yes, I saw that.

ravi: thanks. See, I liked QM3

Nilu: No, actually, my objection is entirely empirical. Anytime you put two variables that are highly correlated into a regression equation together you need to worry about multicollinearity, and when they end up being highly significant and having opposite signs there's a strong probability that the results you're getting are spurious. That's basic statistics - nothing to do with Kendall's assumptions.

The fact that computer access means internet access in his sample space only means that the correlation is extremely high, which makes the multicollinearity problem worse. And if the interpretation we're supposed to take from Kendall's regression is that the effect of computer access controlling for internet access is the effect of pornography accessed on the computer but not on the net (though I seriously doubt Kendall meant anything so silly) then what's the theoretical reason for believing that pornography accessed on the computer but not on the Internet causes a sharp INCREASE in the incidence of rape, while Internet porn causes rape to go down?

Frankly, as TR says above, with the multicollinearity in there, interpreting his results is fairly pointless.

Finally, stating your assumptions doesn't make them true. Kendall certainly acknowledges that there's a distinction between reported rape and actual incidence, and assumes that there is no systematic change in reporting rates created by internet access - my (second) point in the post is that I don't know why that's an assumption we should believe.

i would also be *very* surprised if he gets the same results after dropping the households with computers predictor. (can't believe that he would.) however, if he does, then i'd say hats off for an extremely interesting result. it's about as good as you can get without resorting to self-reports -- economists hate those for some reason.

also, if it's just a working paper, then DANG he's getting a lot of publicity. imagine retracting one's findings after a wp gets this much press.

Nilu: What regression are you talking about? In the version of the paper Amit linked to the only regression model is in Table 4 where he includes computer ownership and internet access together. The regressions in Table 5 use the same model. I haven't seen any regressions that include internet access without household computer ownership - that's what I'm asking for. Perhaps you could point out the Table / Page you're talking about.

As for significance, well, errr... if the coefficient for internet access in Table 4 turns out to be insignificant once he takes computer ownership out (which is what I suspect may happen), then he has no result. Period. I think that's fairly significant, don't you? And even if he still gets a significant effect it's almost certain to be of a different magnitude (i.e. much smaller). From a policy perspective, it isn't just the statistical significance of these coefficients that matters, it's their economic significance.

TR: What do you think I did?

Yes, it is phenomenal publicity, isn't it? Retracting these results will be hard - especially if the issue is something as basic as multicollinearity.

Very interesting point about the Internet and date rapes. That alone is an interesting point, actually, since the Internet also dramatically lowered the search costs relating to matching. Presumably, people eventually become much more savvy and screen potential dates, just like they do in "real life," but maybe early on you did see an increase in date rapes. I like that hypothesis. But, for your explanation to explain away Todd's result, it would have to be that date rape and anonymous rapes are substitutes for one another. That is, I can buy that Internet access causes date rapes to rise via those matching services and increased dating (and therefore increased activities associated with dating, including date rapes). But I don't normally think that the date rapist and the stranger rapist are the same people. Date raping deals with communication issues that stranger raping never does. All the "No means No" campaigning was mainly addressed at date rape, and the imperfect communicating that was taking place between men and women on dates. The "stab your rapist in the ear with a pencil" campaigning was directed more at public, anonymous rapes. So, I can't imagine that even an increase in date rapes would decrease the other kinds of rapes, since I normally think of those people as different people altogether.

If I remember correctly from my psychology classes ages ago, some rapes are about lust, but most aren't. There was something about three main reasons for a rapist to rape someone, but I can't remember what they were. Just that they weren't about sex.

Dr. Kendall is too young to remember this, but the same reassuring theory that pornography-cuts-rape was popular during the late 1960s and early 1970s when a much more dramatic increase in the availability of pornography happened. Unfortunately, the rape rate shot up as well.

This is not to say that pornography causes rape, either, just that aspiring freakonomists would benefit from a better knowledge of recent American social and crime history, which would allow them to subject their theories to simple reality checks like this.

By the way, a 1970s article by America’s greatest social observer, Tom Wolfe, called “The Boiler Room and the Computer,” explained the old Freudian fallacy in Dr. Kendall’s underlying assumption that male libido is like steam that must be periodically released to prevent damaging explosions.

scott: Fair point. I guess I just find it easier to believe that stranger rapists might be substituting date rape for stranger rape, rather than Internet porn for stranger rape. Date rape today may be primarily about communication issues as you say (though I wonder) but there's no reason why it couldn't shift to being driven by more mala fide intent. If the Internet makes it easier for stranger rapists to find victims online, then tomorrow's 'date rapist' could look very different from today's 'date rapist'.

All of this is unabashed speculation, obviously, but it's merely to suggest that we need to think more carefully about what Kendall is actually measuring.

anon: Always assuming the 'Urge' exists. Personally, I find the notion that all men have some sort of hard-wired drive to sexual violence questionable and insulting. And a convenient way for men to dodge responsibility for their actions.

choochoo: Yes, well, clearly Kendall doesn't agree.

ash: Yes, I know. Thanks.

steve: Thanks for your comment. To be honest, I'm too young to remember the events you talk about as well, but entirely agree that the 'boiler room libido' theory is fallacious.

Falstaff - The question is what the net effect of pornography on rapes is. If it reduces anonymous rapes - assume for the sake of argument the measurement error is small, which it is not - but increases date rapes - where the measurement error is believed to be significant - then the problem still remains that we can't say what the net effect is. It depends on the size of the increase in date rapes. Guys viewing pornography may want to push the relationship further than their date is ready for, due to porn's influence on their preferences and desires. So I think your point is still a critical one, and something the paper cannot address unless Todd can somehow address the problem of date rapes. I suppose that part of his strategy is to do just this by moving from rapes to other sexual behaviors, like prostitution arrest rates and HIV transmission. He may want to try gonorrhea incidence instead of HIV incidence, though. HIV is transmitted through intravenous drug use as well as sexual behavior, whereas gonorrhea is not. Plus gonorrhea is highly correlated with contemporaneous sexual behavior (incubation duration is 14 days at the median, and 30 days at the absolute max), and it has the advantage of being usually symptomatic, unlike HIV. Thus, he could use gonorrhea as a better proxy. But STDs are complicated to deal with since they partly depend on incidence within the sexual network, and not merely the sexual behavior itself. Still, that might be more informative. I'm not sure how you can actually get at date rapes, though. Date rapists may not necessarily be people who are likely carriers of an STD, and so the proxy may be a poor one.

You say: "Finally, it's interesting that though Kendall has a panel data set, he doesn't actually account for lagged effects in his model. So what we're seeing is the absolute level of rape in the state (and the absolute level of internet usage) not the change in rape incidence. Personally, I would love to see a regression where Kendall includes the previous year's rape incidence for the state on the right hand side or, even better, takes first differences for his variables of interest. That would tell us whether changes in the spread of the Internet were really driving changes in the incidence of rape."

But he uses a fixed effects model with state effects, so that is equivalent to subtracting the mean value within each state. Therefore the analysis is looking at differences from the state-specific mean value for all variables in the model-- not the absolute value.

However, I fully agree with your concerns about collinearity -- that's one key issue I'd like to see addressed.
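To make the fixed-effects point concrete, here is a minimal sketch in Python on a simulated panel (the number of states and years, the true slope of -0.7 and the size of the state effects are all assumptions for illustration; none of it comes from Kendall's data). Each state gets its own fixed level that is correlated with the regressor, and the slope is then estimated three ways: pooled levels, the within (state-demeaned) transformation, and first differences.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_years = 50, 6

# State-specific level, deliberately correlated with x (simulated)
state_effect = rng.normal(scale=3.0, size=(n_states, 1))
x = state_effect + rng.normal(size=(n_states, n_years))

true_slope = -0.7  # assumed for the simulation
noise = 0.1 * rng.normal(size=(n_states, n_years))
y = true_slope * x + 2.0 * state_effect + noise

def slope(x, y):
    """Slope of a no-intercept regression of y on x (inputs already centered or differenced)."""
    x, y = x.ravel(), y.ravel()
    return float(x @ y / (x @ x))

# Pooled levels: ignores the state effects entirely
pooled = slope(x - x.mean(), y - y.mean())

# Fixed effects: subtract each state's own mean from every variable
within = slope(x - x.mean(axis=1, keepdims=True),
               y - y.mean(axis=1, keepdims=True))

# First differences: year-on-year changes within each state
first_diff = slope(np.diff(x, axis=1), np.diff(y, axis=1))

print(pooled, within, first_diff)
```

In this setup the pooled estimate is badly off because it mixes in cross-state differences in levels, while the within and first-difference estimates both land close to the assumed slope. Both transformations remove the state-specific level; neither one, on its own, adds the lagged dependent variable asked for in the post.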

> I'm firmly in the camp of those who believe that rape is about power rather than about lust.

I question the whole idea that rape is 'about' something. It's a complex human behaviour.

Even if the underlying psychological motivation is to express power, that expression wouldn't take the form of sexual assault without lust also being present in some form.

The fact is, if someone is alone at home whacking off to internet porn they aren't out looking for women to rape. If the correlation is significant, it could mean no more than that - porn keeps them off the streets and away from potential victims.