Thursday, December 31, 2015

Citation is one thing. Discussion is another. You can drop a reference without really engaging someone's work (e.g., Snerdfoot 2011). As Helen de Cruz has emphasized in discussing a possible Bechdel test for philosophy papers, citation analysis is insufficient as a measure of serious engagement with someone's work. I propose two rough measures of "discussion".

"Discussion" itself I operationalize as follows: A person is discussed if that person's name appears in the abstract of an article. Looking at the Philosopher's Index database, I have examined discussion arcs over time for various well-known philosophers in a series of blog posts (e.g., here, here, here, here).

"Extended discussion" I operationalize as follows: A person receives extended discussion if that person is referred to at least twice in the abstract of the article, by either name or pronoun. The nominative pronoun might be especially telling, since its presence suggests that the person is being referred to repeatedly in independent clauses. For example:

Later, Nussbaum gradually reconsidered the notion of patriotism in texts that remained largely unknown and rarely discussed. This article begins with a brief account of her shift from cosmopolitanism to what she terms 'a globally sensitive patriotism,' and the task assigned to education within this framework....

This suggests a possible rough and simple measure of the relative rates at which women receive extended discussion in philosophy articles compared to men: Compare the ratio of "he" to "she" in philosophy abstracts, then remove cases in which those words are used with generic intent (e.g., "If the agent wouldn't have done otherwise whether or not she could have....") or otherwise not referring to an individual philosopher whose work is being discussed (e.g., reference to historical leaders, or third-person references to the author herself for abstracts written in the third person).

Method:

I searched Philosopher's Index for all appearances of "he" or "she" in abstracts from 1970 to the present in a sample of ten ethics journals and ten general philosophy journals. [See Note 1 for journal details.] This yielded a total of 2321 abstracts. I then skimmed each abstract to remove all cases in which the pronoun was not used to refer to a specific philosopher whose work was being discussed. [Yes, I looked at over 2000 abstracts! Obviously, my determinations had to be quick, but in almost every case the determination could be made confidently within just a few seconds.] To examine temporal trends, I grouped results by decade. I also separated citations of pre-20th-century historical figures from 20th and 21st century figures.
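For readers who want to try something similar, here is a minimal sketch in Python of the tallying step described above. The sample records and the names ABSTRACTS, decade, and tally are hypothetical, made up for illustration; this is not the Philosopher's Index search or its data. The pattern only finds candidate abstracts. Screening out generic uses and pronouns that do not refer to a discussed philosopher was done by hand and is not automated here.

```python
import re
from collections import Counter

# Hypothetical records: (publication year, abstract text). Not real data.
ABSTRACTS = [
    (1984, "Rawls argues that justice is fairness. He then defends the difference principle."),
    (2009, "Nussbaum reconsidered patriotism. She now defends a globally sensitive patriotism."),
    (2012, "If the agent could not have done otherwise, is she responsible?"),
]

# Match "he" or "she" as whole words only, so "the" and "then" are ignored.
PRONOUN = re.compile(r"\b(he|she)\b", re.IGNORECASE)

def decade(year):
    """Label a year by its decade, e.g. 1984 -> '1980s'."""
    return f"{year // 10 * 10}s"

def tally(abstracts):
    """Count abstracts per (decade, pronoun); each abstract counts once per pronoun."""
    counts = Counter()
    for year, text in abstracts:
        for pronoun in {m.lower() for m in PRONOUN.findall(text)}:
            counts[(decade(year), pronoun)] += 1
    return counts

for key, n in sorted(tally(ABSTRACTS).items()):
    print(key, n)
```

Note that the third sample abstract uses "she" generically. An automatic count picks it up anyway, which is exactly why each hit needs to be read before it is counted as extended discussion.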

Results:

Percentage of recipients of extended discussion (as measured by nominative pronoun use in abstracts) who are women:

For the discipline as a whole, percentages of women among faculty in the 21st century are typically in the low 20%'s (U.S. data here).

The outlier analysis here is my analysis of American Philosophical Association meetings, where women were 35% (144/413) of the invited symposium speakers on the main program, and 32% of main program participants overall.

----------------------------------------------------------

Note 1: Ethics and non-ethics were analyzed separately because previous analyses have found differences by area, and because journals divide fairly naturally into those specializing in ethics/political, "general" journals that publish proportionately little ethics, and other types of specialty journals (like philosophy of science).

Ethics journals were the top ranked journals in surveys by Brian Weatherson and Brian Leiter (excluding the non-ethics journals appearing in the latter) and include Ethics, Philosophy & Public Affairs, Journal of Political Philosophy, Utilitas, Social Philosophy and Policy, Journal of Ethics, Ethical Theory & Moral Practice, Journal of Social Philosophy, Journal of Value Inquiry, and Journal of Moral Philosophy.

The comparison list was a stratified sample of "general" philosophy journals drawn from Leiter's surveys here and here and included Nous, Midwest Studies in Philosophy, Synthese, Mind, Philosophical Studies, Proceedings of the Aristotelian Society, European Journal of Philosophy, Dialectica, Philosophical Topics, and Theoria. The sample was stratified so that the selected journals would not differ too much in overall prestige from the ethics journals.
----------------------------------------------------------

Wednesday, December 23, 2015

The order in which moral dilemmas are presented matters to people's judgments and can substantially influence later judgments about abstract moral principles. This is true even among professional ethicists with PhD's in philosophy. In 2012 and 2015, Fiery Cushman and I published empirical evidence supporting these claims. We invite a metaphilosophical conclusion: If even professional philosophers' expert judgments are easily swayed by order of presentation, then such judgments might not be stable enough to serve as secure grounds for philosophical theorizing.

Synthese has recently published two critiques of the literature on order effects in philosophy, which address Fiery's and my work (HT Wesley Buckwalter). Both critiques make valuable points. However, both also admit of some clear replies.

Push: A runaway boxcar is headed toward five people it will kill if nothing is done. Jane can stop the boxcar by pushing a hiker with a heavy backpack in front of the boxcar, killing him but saving the five.

Switch: A runaway boxcar is headed toward five people it will kill if nothing is done. Vicki can stop the boxcar by flipping a switch to divert it to a sidetrack where it will kill one person instead of the five.

Fiery and I presented Push-type and Switch-type scenarios (fleshed out with a bit more detail) to professional philosophers and two comparison groups of non-philosophers. We found that when professional philosophers saw a Push-type scenario before a Switch-type scenario, 73% rated the two scenarios equivalently on a 7-point scale. Then later in the questionnaire when asked about the Doctrine of the Double Effect -- a moral principle often interpreted as implying that Push-type cases are morally worse than Switch-type cases -- only a minority, 46%, endorsed that principle. In contrast, among philosophers who saw Switch before Push only 54% rated the two scenarios equivalently, and then later a majority, 62%, endorsed the Doctrine of the Double Effect. Endorsement of the principle thus seemed to shift, post-hoc, to rationalize philosophers' order-manipulated judgments about the scenarios.

We found similar effects for Action-Omission, Moral Luck, and "Asian disease" type cases (though not consistently for every measure across the board).
Philosophers with PhDs and self-reported competence or specialization in ethics showed no smaller effects than other philosophers or than comparison groups of non-philosophers -- and in fact trended slightly (non-significantly) toward showing larger order effects.

In general, we found pretty substantial effect sizes, suggesting substantial instability of judgment even in philosophical respondents' areas of expertise. Hence the metaphilosophical worry.

Horne and Livengood make three main points about the literature on order effects in philosophy:

(A.) First, they helpfully distinguish between what they call "updating effects" and "genuine ordering effects". Genuine ordering effects, in their terminology, are effects measured only after all the stimuli have been presented. "Updating effects" are measures taken along the way, and might well reflect participants' learning. There is of course nothing irrational in judging Scenario B differently as a result of seeing Scenario A because one learned something by seeing Scenario A. Most philosophical research on order effects, they note, takes the measures along the way -- and thus might be measuring learning rather than true order effects.

(B.) Second, they point out that perceptual judgments also show order effects. Thus, if we are to reject any type of evidence that shows order effects, then we must reject perceptual evidence too, which would lead to radical skepticism.

(C.) Third, they point out that order can sometimes reasonably make a difference to the evaluation of evidence. For example, a smile followed by a frown, on the same person's face, is a different type of evidence than a frown followed by a smile.

On (A): I find the labels tendentious (since if we know there isn't learning-type updating going on, what we might want to call "genuine order effects" can plausibly be measured mid-stream). However, it is probably correct that most studies do not sufficiently rule out the possibility of learning or updating in the course of the experiment, if they have novice participants and take the measurements after each scenario rather than after both scenarios. Still, since our participants were experts, we think it unlikely that a significant number learned anything in the process of our brief experiment that would rationally justify shifting their judgment about the equivalency or non-equivalency of Push and Switch. And as Horne and Livengood note, our measure of endorsement of the Doctrine of the Double Effect is a measurement of a "genuine ordering effect" even by their own lights.

On (B): Yes, of course it would be silly to reject all means of learning that are subject to any order effects! The epistemic sting, as they note, depends not on the mere existence of an order effect in one case, but on how large and how prevalent the order effects are. This is an open empirical question. But the limited empirical evidence that exists suggests that order effects are substantial and prevalent in moral dilemma cases. So far, we have found order effects in all of the scenario types we've tried, with about a 10-20% shift in opinion on the moral equivalency of our scenario pairs and in preference for the risky option in the "Asian disease" cases.

On (C): It's interesting to consider cases in which earlier evidence rightly colors our reaction to later evidence, but trolley problems presented to disciplinary experts seem a different kind of case.

Finally, Horne and Livengood suggest that exposure to a pair of dilemmas in our study is unlikely to have a long-lasting impact on professional philosophers' beliefs. I agree. They continue, "But if there is no long-lasting impact, then we think the effect is unlikely to matter to actual philosophical practice outside of the laboratory" (p. 17). I don't think this follows. Fiery's and my view is not that philosophers' opinions are permanently influenced by the order in which the scenarios are presented on any single occasion, but rather that their opinions are unstable -- possibly influenced one direction on one occasion, in another direction on another occasion. This instability is what drives the metaphilosophical worry.

Rini -- a recent guest blogger here at the Splintered Mind -- looks only at our 2012 study. (Our 2015 study wasn't published until after her paper was in press.) She finds it plausible that if professional philosophers were already familiar with these cases they would not exhibit order effects of the sort Fiery and I find. She suggests that perhaps respondents were not previously familiar with the cases -- or at least not familiar in the right sort of way. She calls this the "familiarity problem" and offers four possible explanations:

(1.) The respondents were not really experts. She wonders if our participants, recruited through the internet, really had the degrees they claimed to have.

(2.) The respondents didn't carefully attend to our scenarios. Maybe they breezed through them so quickly that they failed to notice relevant features.

(3.) The respondents might not have familiar responses to these types of scenarios. Perhaps they have so far refrained from forming judgments on such cases and principles.

(4.) The respondents might not have diachronically stable familiar responses. This is the explanation Fiery and I favor. However, Rini helpfully points out that as long as philosophers are aware that their responses are not diachronically stable, the metaphilosophical threat is reduced: Presumably philosophers who are aware that their responses are not stable would be reluctant to ground their theorizing on those responses.

On (1): I am not aware of a general problem in the survey literature of respondents' frequently misreporting their educational status -- though certainly a bit of misreporting is possible. One specific piece of evidence against this possibility in our own study is that we recruited philosophers mostly by asking department chairs to forward a recruitment email to faculty and graduate students in their departments. Most of our "philosopher" participants took the survey within just a few days of these emails.

On (2): The median response time on the first scenario was 40 seconds, on the second scenario was 34 seconds. While these are not huge response times, if you stop to count out 34 seconds now, you'll probably notice that it's a reasonable amount of time for a thoughtful response to a brief scenario.

On (3) and (4): These are potentially quite serious issues, and in fact our follow-up study in 2015 was designed specifically to address them, after we saw an early version of Rini's critique. In our 2015 study we specifically asked participants if they were previously familiar with the scenarios. We also asked whether they regarded themselves as "having had a stable opinion" about the issues before participating in the experiment, and whether they regarded themselves as experts on those very issues. We also added a "reflection" condition to help address concern (2). In the reflection condition we asked participants to reflect carefully before responding and enforced a minimum 15-second delay between when participants reported having finished reading the scenario and when their response options appeared.

We did not find that self-reported familiarity or stability reduced the size of order effects in two different types of scenario pairs (trolley problems and risky-choice "Asian disease"-type problems), nor did we find reduced order effects in the reflection condition compared to a normal control condition without special instructions to reflect.

For example, percentage rating the Push and Switch scenarios equivalently:

Thus, I am inclined to think that Rini's fourth suggestion is the most plausible -- that participants do not have diachronically stable familiar responses, despite high levels of expertise. But since those who report having stable responses were no less subject to order effects than were those who reported not having stable responses, self-knowledge of stability appears to be largely absent. Despite Rini's interesting suggestion that instability is metaphilosophically non-threatening if people are aware of it, Fiery's and my results suggest that we should not hasten to that comfort.

----------------------------------------------

Both Horne and Livengood and Rini emphasize that we only have very limited evidence about order effects on professional philosophers' judgments. I agree! Fiery's and my two studies are hardly decisive. Convergent evidence from several different labs would be necessary before drawing any confident conclusions, especially if those conclusions are at variance with what one feels one knows from personal experience. Rini also makes positive suggestions for follow-up experimental work that might be done, which I am inclined to support. Both critiques raise important methodological concerns that ought to help shape and direct future work on this topic.

Tuesday, December 15, 2015

I was all ready for some happy news, or at least neutral news. Although the percentage of women in North American and British philosophy departments is low by humanities standards, maybe in the low 20%s, I found some evidence a few weeks ago of a sharp increase in the percentage of women on the program at meetings of the American Philosophical Association, from 6% in 1955 to 32% in 2015. In ethics, APA program participation might even be approaching gender parity, with 41% women (though non-ethics is still quite far from parity at 26% women).

In the past week, I thought I'd confirm that trend by looking at five philosophy journals: Mind, Philosophical Review, Journal of Philosophy, Ethics, and Philosophy & Public Affairs. I chose the first three because they are the traditional "big three" philosophy journals, which have been viewed as the leading general philosophical journals for many decades. Since they publish proportionately little ethics, however, I added what are arguably the two leading ethics journals.

Method:

I looked at authorship of the main articles and also commentaries and responses (but not book reviews, editorial remarks, or the recent anniversary retrospects that Ethics has been publishing). All articles in Ethics and PPA were coded as ethics. Articles in the other three were coded either as ethics or non-ethics based on title and sometimes (for less clear cases) a skim of the article. Gender was coded by first name and by personal knowledge, and in cases of ambiguity I looked for disambiguating information on the internet, such as gender-typical photos or references to the person as "him" or "her" in discussions of the person's work. In only 11 cases out of 1202 was I unable to make a determination. I looked at two-year chunks from four periods: 1954-1955, 1974-1975, 1994-1995, and 2014-2015 (though since Phil Review and J Phil have not yet made all 2015 available, I examined back into 2013 to gather exactly two years' worth of data). Only 53 of 1143 (5%) articles were multiply-authored.

As you can see from the CIs, the numbers are small enough to be consistent with considerable chance variation. Still, to me, three things are immediately striking:

(1.) women publishing more frequently in ethics than in other areas of philosophy;
(2.) low percentages of women overall;
(3.) little progress in the numbers since the 1970s.

Merging together the ethics and non-ethics (which probably somewhat overrepresents ethics relative to the profession as a whole), women are 32/246 (13%) of authors in these five journals in 2014-2015, with a 95% CI of 9% to 18%. If we assume that the proportion of women in the profession as a whole is at least 20%, then female authors are statistically significantly underrepresented in these journals relative to their population in the profession.
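For anyone who wants to check the arithmetic, here is a minimal sketch in Python. One loud assumption: the post does not say which interval method was used. The Wilson score interval is chosen here only because it reproduces the reported figures after rounding. The function name wilson_interval is mine.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# 32 women among 246 authors in 2014-2015 (figures from the paragraph above).
low, high = wilson_interval(32, 246)
print(f"{32 / 246:.0%} (95% CI {low:.0%} to {high:.0%})")
# prints: 13% (95% CI 9% to 18%)
```

Since the upper bound falls below 20%, the assumed population proportion lies outside the interval, which is the sense in which the underrepresentation is statistically significant.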

Especially notable is the huge difference between women's participation in APA ethics sessions and their rate of publishing ethics in these elite journals: in the most recent data, women were 41% of ethics session participants but only 15% of ethics authors (p << .001 of course).

Post-hoc analysis is always a little tricky, but the data suggest almost no increase in the percentage of women publishing in these journals since the mid-1970s, with merged percentages of 11% (1974-1975), 13% (1994-1995), and 13% (2014-2015). Sally Haslanger's data from 2002-2007 provide further corroboration of this flat trendline, with 12% female authors in a selection of elite philosophy journals, and 13% [corrected 11-Feb-16] in the five journals I've analyzed.

Monday, December 07, 2015

A new op-ed by me, in the Los Angeles Times (with the awesome illustration above, by Wes Bausmith, of car-as-consequentialist-philosopher).

I argue that programming the collision-avoidance software of an autonomous vehicle is an act of applied ethics, which we should bring into the open for the public to assess and for passengers to see and possibly modify within ethical limits.

--------------------------------------

It's 2025. You and your daughter are riding in a driverless car along Pacific Coast Highway. The autonomous vehicle rounds a corner and detects a crosswalk full of children. It brakes, but your lane is unexpectedly full of sand from a recent rock slide. It can't get traction. Your car does some calculations: If it continues braking, there's a 90% chance that it will kill at least three children. Should it save them by steering you and your daughter off the cliff?

This isn't an idle thought experiment. Driverless cars will be programmed to avoid collisions with pedestrians and other vehicles. They will also be programmed to protect the safety of their passengers. What happens in an emergency when these two aims come into conflict?

Should your autonomous vehicle risk your safety, perhaps even your life, because a reckless motorcyclist chose to speed around a sharp curve?

The California Department of Motor Vehicles is now trying to draw up safety regulations for autonomous vehicles. These regulations might or might not specify when it is acceptable for collision-avoidance programs to expose passengers to risk to avoid harming others — for example, by crossing the double-yellow line or attempting an uncertain maneuver on ice.

Google, which operates most of the driverless cars being street-tested in California, prefers that the DMV not insist on specific functional safety standards. Instead, Google proposes that manufacturers “self-certify” the safety of their vehicles, with substantial freedom to develop collision-avoidance algorithms as they see fit.

Friday, December 04, 2015

The U.C. Santa Cruz philosopher Jon Ellis and I are collaborating on a paper on rationalization in the pejorative sense of the term. I'm trying to convince Jon to accept the following four-clause definition of rationalization:

A person -- whom, following long philosophical tradition, we dub S -- rationalizes some claim or proposition P if and only if all of the following four conditions hold:

1. S believes that P.

2. S attempts to explicitly justify her belief that P, in order to make her belief appear rational, either to herself or others.

3. In doing 2, S comes to accept one or more justifications for P as the rational grounds of her belief.

4. The causes of S's belief that P are very different from the rational grounds offered in 3.

Some cases:

Newspaper. At the newsstand, the man selling papers accidentally gives Estefania [see here for my name choice decision procedure] a $20 bill in change instead of a $1 bill. Estefania notices the error right away. Her first reaction is to think she got lucky and doesn't need to point out the error. She thinks to herself, "What a fool! If he can't hand out correct change, he shouldn't be selling newspapers." Walking away, she thinks, "And anyway, a couple of times last week when I got a newspaper from him it was wet. I've been overpaying for his product, so this turnabout is fair. Plus, I'm sure almost everyone just keeps incorrect change when it's in their favor. That's just the way the game works." If Estefania had seen someone else receive incorrect change, she would not have reasoned in this way. She would have thought it plainly wrong for the person to keep it.

Wedding Toast. Adrian gives a wedding toast where she tells an embarrassing story about her friend Bryan. Adrian doesn’t think she crossed the line. Yes, the story was embarrassing, but not impermissible as a wedding toast. Shortly afterward, Bryan pulls Adrian aside and says he can't believe Adrian told that story. A couple of months before, Bryan had specifically asked her not to bring that story up, and Adrian had promised not to mention it. Adrian had forgotten that promise when preparing her toast, but she remembers it now that she has been reminded. She reacts defensively, thinking: "Embarrassing the groom is what you're supposed to do at wedding toasts. Bryan is just being too uptight. Although the story was embarrassing, it also shows a good side of Bryan. And being embarrassed like this in front of family and friends is just the kind of thing Bryan needs to help him be more relaxed and comfortable in the future." It is only because Adrian doesn't want to see herself as having done something wrong that she finds this line of reasoning attractive.

The Kant-Hater. Kant's Groundwork for the Metaphysics of Morals -- a famously difficult text -- has been assigned for a graduate seminar in philosophy. Ainsley, a student in that seminar, hates Kant's opaque writing style and the authoritarian tone he thinks he detects in Kant. He doesn't fully understand the text -- who does? -- or the critical literature on it. But the first critical treatment that he happens upon is harsh, condemning most of the central arguments in the text. Because he loathes Kant's writing style, Ainsley immediately embraces that critical treatment and now deploys it to justify his rejection of Kant's views. More sympathetic treatments of Kant, which he later encounters, leave him cold and unwilling to modify his position.

The Racist Philosopher. A 19th century slave-owner, Philip, goes to university and eventually becomes a philosophy professor. Throughout his education, Philip is exposed to ethical arguments against slave-ownership, but he is never convinced by them. He always has a ready defense. That defense changes over time as his education proceeds and his thinking becomes more sophisticated. What remains constant is not any particular justification Philip offers for the ethical permissibility of slave-ownership but rather only his commitment to its permissibility.

These cases might be fleshed out with further plausible details, but on a natural understanding of them the primary causes of the protagonists' beliefs are not the justifications that they (sincerely) endorse for those beliefs -- rather, it's that they want to keep the $20, want not to have wronged a close friend at his wedding, dislike Kant's writing style, have a selfish or culturally-ingrained sense of the permissibility of slave-ownership. It is this disconnection between the epistemic grounds that S employs to defend the rationality of believing P and the psychological grounds that actually drive S's belief that P that is the essence of rationalization in the intended sense of the term.

The condition about which Jon has expressed the most concern is Condition 4: "The causes of S's belief that P are very different from the rational grounds offered in 3." I admit there's something that seems kind of fuzzy or slippery about this condition as currently formulated.

One concern: The causal story behind most beliefs is going to be very complicated, so talk about "the" causes risks sweeping in too much (all the causal history) or too little (just one or two things that we might choose because salient in the context). I'm not sure how to avoid this problem. Alternatives like "the explanation of S's belief" or "the real reason S believes" seem to have the same problems and possibly to invite other problems as well.

Another concern: It's not clear what it is for the causes to be "very different" from the rational grounds that S offers. I hope that it's clear enough in the cases above. Here are some reasons to avoid saying, more simply, that the justifications S offers for P are not among the causes of S's belief that P. First, it seems typical of rationalization that once one finds some putative rational grounds for one's belief, those putative grounds have some causal power in sustaining the belief in the future. Second, if one simply couldn't find anything even vaguely plausible in support of P, one might have given up on P -- so the availability of some superficially plausible justifications probably often plays some secondary causal role in sustaining beliefs that primarily arise from other causes. Third, sometimes one's grounds aren't exactly what one says they are, but close enough -- for example, your putative grounds might be your memory that Isaura said it yesterday, while really it was her husband Jeffrey who said it and what's really effective is your memory that somebody trustworthy said it. When the grounds are approximately what you say they are, it's not rationalization.

So the phrase "the causes... are very different" is meant to capture the idea that if you looked at the whole causal picture, you'd say that neither the putative justifications nor close neighbors of them are playing a major role, or the role you might normatively hope for or expect, in causing or causally sustaining S's belief, even as she is citing them as her justifications.

What do you think? Is this a useful way to conceptualize "rationalization"? Although I don't think we need to hew precisely to pre-theoretical folk intuition, would this account imply any particularly jarring violations of intuition about cases of "rationalization"?

Our ultimate aim is to think about the role of rationalization in moral self-evaluation and in the adoption of philosophical positions. If rationalization is common in such cases, what are the epistemic consequences for moral self-knowledge and for metaphilosophy?