Researchers showed subjects two female faces for a few seconds and asked which face was more attractive. Researchers then placed the photos face down and handed subjects the face they had chosen, asking them to explain the motives behind their choice. But sometimes, researchers used a sleight-of-hand trick to switch the photos, showing viewers the face they had not chosen. Very few subjects noticed the face they were given was not the one they had chosen. Moreover, they happily explained why they preferred the face they had actually rejected, inventing reasons like "I like her smile" even though they had actually chosen the solemn-faced picture.1

The idea that we lack good introspective access to our own desires - that we often have no idea what we want2 - is a key lemma in naturalistic metaethics, so it seems worth a post to collect the science by which we know that.

Early warnings came from split-brain research, which identified an 'interpreter' in the left hemisphere that invents reasons for beliefs and actions. When the command 'walk' was flashed to split-brain subjects' right hemispheres, they got up from their chairs and started walking away. When asked why they suddenly started walking away, they replied (for example) that they got up because they wanted a Coke.3

The overjustification effect

Common sense suggests that we infer others' feelings from their appearance and actions, but we have a different, more direct route to our own feelings: direct perception or introspection.4 In contrast, self-perception theory5 suggests that our knowledge of ourselves is exactly like our knowledge of others.6 One famous result explained by self-perception theory is the overjustification effect.

In a famous 1973 study,

nursery school children drew pictures with a magic marker, a presumably intrinsically interesting activity, under one of three reward conditions. In the first condition the children expected to receive a reward (a fancy 'good player' award) for drawing, in the second they received the reward unexpectedly, and children in a third group received no reward. Only the expected reward produced a decrement in performance, during a later 'free play' period, as compared with the other two groups. [This] overjustification effect seemed to be due not to the reward itself but to the implication that the reward was the reason for the behavior. Only if the participants knew a reward was coming when they performed the behavior would it undermine their intrinsic interest in the task.7

It seems that subjects initially drew pictures because of intrinsic motivation in that activity, but the payment led them to unconsciously 'conclude' that their behavior did not represent their actual desires. Thus, they performed more poorly in the subsequent 'free play' period. This is known as the overjustification effect.

After dozens of similar studies, two meta-analyses confirmed that the overjustification effect occurs when (1) subjects are led to expect rewards before performing the behavior, (2) the rewards are tangible, and (3) the rewards are independent of the subjects’ level of performance.8

Implicit motivation

If we can be wrong about our own desires, then presumably many of our desires are activated unconsciously and operate unconsciously. Such implicit motivations have been amply confirmed.9

In one study, subjects were primed with achievement-related words ('strive', 'win', 'attain') during a word-finding task. During a second word-finding task, subjects were interrupted by an intercom announcement asking them to stop working. Those who had been primed with achievement-related words kept working more often than those who had not been so primed. Subjects were unable to identify the effect of this priming on their own motivation.10

This demonstrates that priming unconsciously affects the accessibility or strength of existing goals.11 Do we unconsciously form goals, too?

We do, as shown by decades of research on operant conditioning. When a neutral potential goal is associated with a stimulus of positive affect, we acquire new goals, and we can be unaware that this has happened:

Watching someone smile while eating blueberry muffins may, for instance, link that activity to positive affect, which creates a goal representation. Indeed, such observational or social learning is thought to be a basic way in which infants learn which behavioral states are desired and which ones are not.12

Notes

1 Johansson et al. (2005).

2 Several experiments have established that we infer rather than perceive the moment we decided to act: Rigoni et al. (2010); Banks & Isham (2009, 2010); Moore & Haggard (2008); Sarrazin et al. (2008); Gomes (1998, 2002). But do not infer that conscious thoughts do not affect behavior. As one recent review put it: "The evidence for conscious causation of behavior is profound, extensive, adaptive, multifaceted, and empirically strong. However, conscious causation is often indirect and delayed, and it depends on interplay with unconscious processes. Consciousness seems especially useful for enabling behavior to be shaped by nonpresent factors and by social and cultural information, as well as for dealing with multiple competing options or impulses" (Baumeister et al. 2011). We can even be wrong about whether we intended to act at all: Lynn et al. (2010); Morsella et al. (2010). If we don't have direct introspective access even to our decisions to act, why think we have introspective access to our desires?

3 Gazzaniga (1992), pp. 124-126.

4 But widespread findings of self-ignorance challenge this view. See, for example, Wilson (2004).

5 Zanna & Cooper (1974) seemed to have disproved self-perception theory in favor of cognitive dissonance theory, but Fazio et al. (1977) showed that the two co-exist. This remains the modern view.

As such, we'd be unlikely to get what we really want if the world was re-engineered in accordance with a description of what we want that came from verbal introspective access to our motivations.

Interesting as these experimental results are, it sounds to me like you're saying that there's a license to be human (or a license to be yourself, or a license to be your current self).

Suppose I found out that many of my actions that seemed random were actually subtly aimed at invading Moldova, perhaps because aliens with weird preferences placed some functional equivalent of mind control lasers in my brain, and suppose that this fact was not introspectively accessible to me; e.g., a future where Moldova is invaded does not feel more utopian to imagine than the alternatives. Isn't there an important sense in which, in that hypothetical, I don't care about invading Moldova? What if the mind control laser was outside my brain, perhaps in orbit? At what point do I get to say, "I won't let my so-called preferences stop me from doing what's right?"

My impression is that this mindset, where you determine what to do by looking closely at the world to see what you're already doing, and then giving that precedence over what seems right, would be seen as an alien mindset by anyone not affected by certain subtle misunderstandings of the exact sense in which value is subjective. My impression is that once these misunderstandings go away and people ask themselves what considerations they're really moved by, they'll find out that where their utility function (or preferences or whatever) disagrees with what, on reflection, seems right, they genuinely don't care (at least in any straightforward way) what their preferences are, paradoxical as that sounds.

My impression is that once these misunderstandings go away and people ask themselves what considerations they're really moved by, they'll find out that where their utility function (or preferences or whatever) disagrees with what, on reflection, seems right, they genuinely don't care (at least in any straightforward way) what their preferences are, paradoxical as that sounds.

I think you would have a strong point if the arguments that really move us form a coherent ethical system, but what if, when people find out what they're really moved by, it turns out not to be anything coherent, but just a semi-random set of "considerations" that happen to move a hodgepodge of neural circuits?

That certainly seems to be to some extent true of real humans, but the point is that even if I'm to some extent a random hodgepodge, this does not obviously create in me an impulse to consult a brain scan readout or a table of my counterfactual behaviors and then follow those at the expense of whatever my other semi-random considerations are causing me to feel is right.

this does not obviously create in me an impulse to consult a brain scan readout or a table of my counterfactual behaviors

Sure, unless one of the semi-random considerations that moves you is "Crap, my EV is not coherent. Well I don't want to lie down and wait to die, so let's just make an AI that will serve my current desires." :)

Incoherent considerations aren't all that bad. Even if someone prefers A to B, B to C, and C to A, they'll just spend a lot of time switching rather than waiting to die. I guess that people probably prefer changing their considerations in general, so your example of a semi-random consideration is sufficient but not at all unique or uncommon.

Agreed. But depending on exactly what's meant I think lukeprog is still correct in the statement that "we'd be unlikely to get what we really want if the world was re-engineered in accordance with a description of what we want that came from verbal introspective access to our motivations", simply because the descriptions that people actually produce from this are so incomplete. We'd have to compile something from asking "Would you prefer Moldova to be invaded or not? Would you prefer...", etc., since people wouldn't even think of that question themselves. (And we'd probably need specific scenarios, not just "Moldova is invaded vs. not".)

And since verbal introspection is so unreliable, a better check might be somehow actually simulating you in a world where Moldova is invaded vs. not, and seeing which you prefer. That may be getting a little too close to "license to be human" territory, since that obviously would be revealed preference, but due to human inconsistency - specifically, the fact that our preferences over actions don't seem to always follow from preferences over consequences like they should - I'm not certain it's necessarily the sort that gives us problems. It's when you go by our preferences over actions that you get the real problems...

I agree with you, but I think there are a lot of LW people who didn't really like the meta-ethics sequence or liked it but got something odd out of it and who basically think that most of what they value comes from genetic-evolutionary pressures (the aliens in your scenario). Luke's post is very important for them if not for the rest of us who are more interested in where we're getting our notion of 'right' from if not entirely from the aliens.

Suppose I found out that many of my actions that seemed random were actually subtly aimed at invading Moldova, perhaps because aliens with weird preferences placed some functional equivalent of mind control lasers in my brain

I suspect you'd prefer the aliens turn off their mind-control lasers, and if you had a choice you would have preferred they did not turn on the lasers in the first place.

Once you're corrupted, you're corrupted. At that point we have a mind-controlled Steven wandering around and there's not much point in trying to learn about human motivation from the behavior of humans who are mind-controlled by aliens.

Well, then it's unlikely that your random unconscious actions have any ulterior motive as sophisticated as invading Moldova. Your true desires are probably just some combination of increasing your status, activities prone to make babies, and your conscious desires, assuming the conscious desires haven't been subverted by bad philosophy.

I don't see much harm in activities prone to make babies, so the real question here is "If my unconscious desires lead me to have poor relationships because I'm gaming them for status, and I don't consciously value status, would I want to fix that by changing the unconscious desires?" I think I would, if I could be sure my income wouldn't be affected much, and the fix was well tested, preferably on other people.

But in any case, human volition is going to look like a clump of mud. It has a more-or-less well defined position, but not exactly, and the boundaries are unclear.

Thank you for pointing out the recurrent threat of empiric stupidity in these sorts of matters, namely the guiding empiricist assumption in this case that an empirical determination of our desires by outside instrumentation is going to result in an improvement to human affairs. We cannot overlook the way scientific empiricism can sometimes make people stupider at affairs requiring developed skill at reflective judgment.

Personally I find having an inconsistent mind so intolerable that as far as I know, I'd face a choice between

A: blocking the aliens out of my head completely

B: Assimilating with them completely.

Correspondingly I have endeavoured to establish a rapport with evolution's design deep enough that I can either

A: Consciously adapt it to the epoch of intelligent agency, for example, instilling within it a fear of solar collapse, a sense of the kinship linking all life on earth, and a cognizance of extra-solar hunting grounds for it to aspire towards. These might sound like rationalizations of noble goals we'd communally established post hoc... well yes, they would either way. I think those goals were only able to be ennobled upon the favour of evolution's old intents of surviving and spreading.

B: Truly accept as not horrible and perfectly normal, the subjectively horrible unacceptable things that would drive most people away from forging this kind of self-rapport. I'd give examples but these are by their nature hard to index, as if they're communicated tactfully, they don't seem horrible at all.

But then, I was drawn to this thread for a reason. I wonder if all my progress under A is just a mat of rationalizations and if the reality of Her Design is too ugly for me to publicly embrace, and if that very design has been built to anticipate that, and that is why our vocal selves are blanketed with confusion as to our intents.

It strikes me that, in addition to the face-value interpretations given by the researchers, the subjects of some of these experiments could also be seen as rationally responding to incentives not to reveal their desires. The face attractiveness subjects might be afraid of embarrassing an authority figure or "messing up" the experiment. The split-brain patient might (rightly) think a truthful "I don't know" would be interpreted as evasive or hostile. The children might reason that being seen doing a rewarded activity "for free" would remove the basis for any future rewards.

I've got the impression that these findings are mostly based on studies done on college students and children. I'm not sure how much I should trust them to generalize (and apply to me specifically), especially with regards to hypocrisy and self-justification theory.

Luke, are you aware of similar studies done on not-neurotypical populations, like say schizophrenics, experienced meditators, autists or experts in analytical fields?

When you're bored, you entertain yourself by generating arbitrary complex behavior to no particular purpose. Sometimes, while you're doing this, other people notice something you did and give you resources for it.

When this happens, you re-categorize the particular activity that got the resources, from "arbitrary complex behavior I generated for no particular purpose" to "things that get resources from people".

In other words, from "mapping out the space of possible activities" to "generating value in an economy".

Or, from "explore" to "exploit".

Or, from "play" to "work".

Once an activity is successfully classed as "work" — that is, something that gets other people to give you resources — you don't need to do it unless you want more resources. If you don't want more resources right now, you can safely spend time exploring other possible activities. But if you get hungry/poor/etc., you can go back to the best "work" you've found so far, to get resources you need.

(Similarly, some things may not get you resources from others, but may get you attention, affection, status, etc. — which probably gets a distinct classification such as "social activity".)

We do, as shown by decades of research on operant conditioning. When a neutral potential goal is associated with a stimulus of positive affect, we acquire new goals, and we can be unaware that this has happened:

Nice post. I would like more on this and terminal vs instrumental distinctions.

Common sense suggests that we infer others' feelings from their appearance and actions, but we have a different, more direct route to our own feelings: direct perception or introspection. In contrast, self-perception theory suggests that our knowledge of ourselves is exactly like our knowledge of others.

It's unclear to me how this is related to the overjustification effect. Could you make the connection more explicit for me? As it is it feels like a non sequitur.

My impression is that lukeprog is interweaving material on the overjustification effect and the introspection illusion. The introspection illusion helps to explain why we're not aware of the overjustification effect in ourselves.

Thanks, that sounds right. I want to say that that was my impression as well, but if I try to be honest with myself I really don't know if that's true.

It still seems like a big leap, and from what I understand Luke may be misrepresenting self-perception theory. Luke claims that "our knowledge of ourselves is exactly like our knowledge of others" while your link says that in the introspection illusion "people wrongly think they have direct insight into the origins of their mental states, while treating others' introspections as unreliable" (my emphasis). These sound like different claims and Luke's is more extraordinary. And for that matter it doesn't seem necessary or helpful for the ensuing discussion of the overjustification effect.

It occurs to me, though, that I'm just arguing because I'm confused about the material, so I'm going to go read some more.

See the footnotes here for work on motivation as related to akrasia - in particular, Steel's 'The Nature of Procrastination' article.

I'm not sure what you mean by 'desires you have but aren't pursuing much'. Which concept of desire are you using? The motivational one? I suspect we don't understand the neuroscience of motivation well enough to say much about your question, but I'm not sure I understood your question.

I would prefer that the weather be sunny and roughly 70 degrees Fahrenheit with a slight breeze tomorrow, but am doing nothing to try and make that happen.

That could just be a desire that I don't expect to be able to fulfill (expectation roughly equal to 0), but I intuitively feel that desires are separate from motivation to pursue them (this might be wrong though).

For another example:

Alice wants to make a living writing, and gets happy and misty-eyed at the thought. However, she always says "I can't do it now, I have X Y Z". Meanwhile, she occasionally comments on and reads a group blog.

Bob thinks being a writer would be cool, but doesn't intend to do anything about it. He occasionally comments on a group blog.

Most people would say that Alice wants to be a writer more than Bob does, but they do roughly the same amount of tangible work towards it.

I was mostly asking with respect to akrasia vs. hypocrisy, but realized that you can distinguish between the two by making it easier for the person in question to accomplish their goal.

If they choose to fulfill the desire, then they actually want it, and if they don't choose to, then they don't care as much.

Alice wants to make a living writing, and gets happy and misty-eyed at the thought. However, she always says "I can't do it now, I have X Y Z". Meanwhile, she occasionally comments on and reads a group blog.

As such, we'd be unlikely to get what we really want if the world was re-engineered in accordance with a description of what we want that came from verbal introspective access to our motivations. Less naive proposals would involve probing the neuroscience of motivation at the algorithmic level. (Footnote: Inferring desires from behavior alone probably won't work, either.)

There is something a bit bizarre about proposing to extract preferences by scanning brains (because raw behavior and reported introspection are not authentic and primitive enough), and then to insist that these fundamental preferences be extrapolated through a process of reflective equilibrium - thereby becoming more refined.

Is there some argument justifying the claim that what I really want is not what I say I want, and not what I do, but rather what the technician running the scanner says I want? By what definition of "really" is this what I really want? By what definition of "want"?

Note: In some ways this echoes steven0461, but I think it makes some additional points.

I was thinking that the brain scan approach could be tested on a small scale with people living in an environment designed according to their brain scans, but then I realized that the damned thing doesn't ground out. If you don't trust what people say, you can't judge the success of the project by questionnaires or interviews. If you can't trust what people do, then you can't use whether or not they are willing to stay in the project.

I think that if the rates of depression and/or suicide go up, the brain scan project is a failure, but that's a pretty crude measure.

You could use brain scans, of course, but that's no way to find out whether brain scans improve whatever you're trying to improve.

It may not follow from the article, but I think that if people's actions are so much shaped by unconscious effects and miscalculations about happiness and other goals, then actions aren't a very reliable guide. See also the many discussions here about akrasia: should akrasia be used to deduce that people generally would rather spend large amounts of their time doing things they don't like all that much and that don't contribute to their goals?

OK, so what people do, and what they say are the #1 and #2 best available resources on what they actually want. Sample from multiple individuals, and I figure some pretty successful reconstructions of their goals will be possible.

Interestingly, I think there used to be a group of people who nominally were dedicated to doing the kind of desire-inferring with an aim at concrete progress and conceptual understanding that I can cheer for, though they didn't have all that fancy neuroscience knowledge back then. Fun quiz: can you guess which field I'm referring to? I gave you some hints. And... here is the answer I had in mind. (Check this wiki article for the context, though.) Was that your guess? If not, what was?

(I looked at the article, and at another more specific WP article, without finding anything that looked much like what Luke was saying. Whether you're aiming to raise the status of the field in question, to discredit Luke or what he's saying by association with something widely disapproved of, to point out an illuminating parallel, or whatever, I think you need to be much more explicit.)

Mostly it just seemed to me like an interesting connection, especially if the notion of eugenics is generalized to be more explicitly reflective on memetics and multi-level selection (instead of the focus at the individual-biological/organismic level, and to a weird extent the racial level), at which point it becomes reflexive, even. It has various abstract connections to FAI/CEV. Specifically what seemed cool about the vision of eugenics outlined in the diagram I linked to is that it is reflective, empirical, naturalistic meta-ethics / applied ethics, which I'm not sure went on before that and hasn't come up again except in some very primitive complex systems and dual inheritance studies as far as I know. In hindsight I should not have expected these thoughts to automatically enter people's brains when they saw the diagram I linked to.

I was also hoping that other people could notice similar connections to other fields that might also be non-obviously related to this theme of refining and more effectively applying our models of morality and meta-ethics.

I think I am consistently up against Hofstadter's law of inferential distances, or something.

Dear Will_Newsome's brain,

Please update on the above information, or explain more clearly why you do not want to, and in any case please explain why various parts or coalitions of you do not want to change your strategy for communication or do not want to acknowledge that the lack of a changed strategy is indicative of not updating. Once such concerns are out in the open I promise to reflect carefully and explicitly on how best to reach something like a Pareto improvement, obviously with your guidance and partnership at each step of the way.

Sincerely, Will_Newsome's executive function algorithm that likes to use public commitments as self-bargaining tactics because it read a Less Wrong post that said that was a good idea.

Yeah, they were actually the second group that came to mind in roughly that memespace, but it seems to me what they didn't have was very clear reflection on how they got their goals and how that is relevant. I think that might have been hard to explain or examine without the idea of evolution.

Ah, praise be to ya. This is a damn good start. Ideally the post would give scenarios (imagined or abstracted from cases on Less Wrong or wherever) showing how people do this kind of just-so story introspection and the various places at which you can get a map/territory confusion, or flinch and avoid your beliefs' real weak points, or just not think about a tricky problem for 5 minutes. But we can always do that in another more specialized post at some later point, like Alicorn did with luminosity.

(I tentatively think that people have this intuitive process for evaluating the expected benefits of questioning or thinking up (alternative) plausible causal chains connecting subgoals to goals they think are more justified, but because values sort of feel like they're determined by the map it's easy to think that the feeling of it being hard is an unresolvable property and not something that can be fixed by spending more time looking at the territory. I wildly speculate that the intuitive calculation people use involves a lot of looking for concrete expected benefits for exploring valuespace, which is something that makes Less Wrong cool: the existence of a tribe that can support you and help you think through things makes your brain think it's an okay use of resources to ponder things like cryonics, existential risks, et cetera.)

Less naive proposals would involve probing the neuroscience of motivation at the algorithmic level.

I think that seems more naive - once you consider the timescales. Machine intelligence seems likely to come before we have the required brain-scanning/brain-understanding technology. Maybe once we have intelligent machines, then we can figure out the brain - but we likely can't use brain scans to let the intelligent machine know what we want in the first place - because the timing will be wrong.

This theory seems to debunk the classical "people need an economic incentive to do their jobs": it seems to imply that imposing an economic reward on the task detracts from the intrinsic enjoyment of the task by making the task performers think the task is for the sake of the remuneration rather than for its own sake. It also seems to suggest that, were this reward system to be removed (but what would it be replaced with, practically speaking?), people might be happier by enjoying their own work.

This theory seems to debunk the classical "people need an economic incentive to do their jobs"

This suggests that if you pay someone to do X, they will be less likely to do X as a hobby, and enjoy X less while they're doing it. That does not imply that if you didn't pay them to do X, they would do it enough to satisfy the job requirements.

There are cases where that's true - open source programming comes to mind - but they seem to be the exception, rather than the norm.