Can You Condition Yourself?

A friend recently told me about a self-help tactic that has become popular in the circles I move in: the idea of applying behaviorism to yourself (sometimes called “training your inner pigeon”). The idea is you give yourself rewards when you do things you want to do more of, and your brain works its magic and reinforces the activity.

When I first heard about this, my thought was “No way that is ever going to work”. I have always been under the impression that conditioning is kind of like tickling. You can’t tickle yourself. You’d be expecting it.

Let’s start by distinguishing a couple of possibilities:

1) This process doesn’t work at all

2) This process works by making you want the reward. Suppose you promise yourself a candy bar each time you do homework. You are hungry and want the candy bar, but you would feel bad if you ate it without doing homework. Therefore, you grudgingly do homework to get at the candy bar.

3) This process works by changing your urges and desires. After eating a candy bar each time you do homework, your brain associates homework with a nice, delicious-feeling, and you enjoy doing homework more from now on.

Let’s start with 3, the most encouraging possibility. This gains a little support from the Little Albert experiment. Here, a baby who had no particular fear of rats was exposed several times to rats plus loud, terrifying noises. Eventually the baby came to fear rats, even without the noise, presumably because the fear of the noise had generalized onto the rat through association. It’s easy to see how this could mean something like the happiness of candy-bar-eating generalizing to homework. Nevertheless, I believe this argument proves too much.

Every evening, I sit down at the table, get a plate and some silverware, and eat dinner. It’s usually something I really like, and it usually includes dessert, which I like even more. If eating good food isn’t rewarding, I don’t know what is, and sure enough I rarely skip dinnertime.

However, if for some reason I don’t have dinner – maybe I’ve promised my friends I’ll go out for a late dinner with them and so I can’t stuff myself first – I do not feel the slightest urge to sit down at the dinner table with a plate and sort of move my silverware around in the air making little eating motions, and when I tried it (empiricism!) I did not find it at all pleasant.

Take a second to think about how weird that is (the result, not me trying the experiment). Sitting at the table and moving my silverware, in conditions exactly like these, has been quickly associated with reward every single time I’ve done it in the past, for decades, ever since I learned to feed myself. But I don’t feel even a little bit of urge to do this. None at all. You may generate additional examples at your leisure, but the point is that just being consistently associated with a positive reinforcer in a low time-delay way does not make a neutral activity (let alone an actively unpleasant activity) become desirable.

What happened with Little Albert, then? First of all, his was classical conditioning and not operant conditioning. Second of all, Albert had no understanding of or control over what was going on. Each time he heard the noise, he was very surprised – he was receiving a new fact from the Universe. But it wasn’t information he understood; he had no idea what the connection between the rat and the noise was and whether it would recur. He just knew that there was some mysterious rat -> noise connection.

Compare this to me eating dinner. The connection between sitting down and eating dinner is not at all a new fact fed me by the Universe; it’s something I plan myself. And it is not mysterious whether any given sitting and silverware-waving will reward me; I know it will reward me if and only if I am planning to eat dinner. Therefore the brain does not think of silverware-waving as an activity that might, who knows, lead to reward in the future.

(one might object that my inner pigeon – or lizard brain, to mix animal metaphors – doesn’t share my complex explicit knowledge of the reward structure of dinner-eating. But the little I know of the brain’s reinforcement mechanism suggests that reinforcement learning is based on surprise – technically the difference between predicted and observed values of some complicated Bayesian equation encoded in dopaminergic neurons or something – and that this system is actually quite good at predicting expected reward from an action, within certain limits)
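To make the “surprise” idea concrete, here is a minimal sketch of a prediction-error update. It assumes a simple delta rule in the style of Rescorla-Wagner, which is only a textbook stand-in and not a claim about what dopaminergic neurons actually compute. The function name, the learning rate, and the reward values are all made up for illustration.

```python
def prediction_errors(rewards, learning_rate=0.3, initial_prediction=0.0):
    """Return the per-trial prediction error (observed minus predicted reward).

    Hypothetical helper: a plain delta-rule learner, assumed for illustration.
    """
    prediction = initial_prediction
    errors = []
    for reward in rewards:
        error = reward - prediction          # the "surprise" on this trial
        errors.append(error)
        prediction += learning_rate * error  # nudge the prediction toward what happened
    return errors


# An unexpected reward of 1.0 on every trial: a large surprise at first,
# shrinking toward zero as the learner comes to expect it.
surprised = prediction_errors([1.0] * 30)

# The same reward when it is already fully predicted (like a planned dinner):
# there is no surprise on any trial, so nothing further is learned.
expected = prediction_errors([1.0] * 30, initial_prediction=1.0)

print(surprised[0], surprised[-1])
print(expected[0], expected[-1])
```

Under these assumptions, the first run starts with a large error that decays toward zero, while the second run never produces any error at all. That is the sense in which a reward you planned and fully expected leaves little for the system to learn from.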

So (3), the hypothesis that the reward will cause me to start enjoying homework, seems wrong. What about (2) – “I don’t like homework much, but at least I get some candy out of it”?

Here there’s a ceiling on how much the candy can reinforce your homework-doing behavior, and that ceiling is how much you like candy.

Suppose you have a big box of candy in the fridge. If you haven’t eaten it all already, that suggests your desire for candy isn’t even enough to reinforce the action of going to the fridge, getting a candy bar, and eating it, let alone the much more complicated task of doing homework. Yes, maybe there are good reasons why you don’t eat the candy – for example, you’re afraid of getting fat. But these issues don’t go away when you use the candy as a reward for homework completion. However little you want the candy bar you were barely even willing to take out of the fridge, that’s how much it’s motivating your homework.

Maybe you say “I will allow myself exactly one candy bar a day, but only if I finish my homework”. Even if you can stick to this rule, here the candy bar becomes an extrinsic reward motivating the homework. We all know what happens with extrinsic rewards – overjustification effect! You gradually start interpreting the task at hand as an annoying impediment to getting the reward, lose your intrinsic motivation, and as soon as the reward is removed, you’re even less willing to do the task than before.

So both (2) and (3) are pretty unlikely. That leaves us with (1) – don’t even bother.

Luckily, my friend helpfully clarified that this wasn’t what her class taught at all (I think maybe they originally tried this, but considerations like the ones I mentioned convinced them to change?). Their new policy is that you should reinforce yourself with a “victory gesture” – for example, pumping your fist and shouting “YEAH!” and visualizing an image corresponding to your success and trying to feel really good about yourself.

So for example, as soon as you sit down to start your homework, you make the victory gesture and imagine yourself graduating summa cum laude from school, and then you feel really good and have reinforced the behavior of sitting down to do your homework. And maybe you do it again when you finish, because of the peak-end rule.

She claims a few benefits of this method. First, it’s very fast, so you can reinforce things right as they happen instead of with a time delay, which gives your brain enough time to lose the connection. Second, it’s intrinsic, so it’s not going to sap your natural motivation the same way the candy bar might.

I understand the claim that rewards delivered very immediately after a stimulus can work better for conditioning – I was referred to a couple of papers proving this, though I don’t remember them. But I notice I am confused. When we have good examples of real conditioning, immediate reward isn’t especially important. For example, people often use the language of behaviorism to talk about addiction, say alcoholism. But the chemical rewards of getting drunk don’t manifest until a little while after you’ve had your first beer – certainly not within a split second – and certainly alcoholism can reinforce even longer term behaviors, like leaving home and going to the bar. Pornography is another good example of effective behaviorism, but going to a porn site gives only delayed rewards – first you have to find a video you like, then you have to wait for it to buffer, then you have to sit through the boring part where the nice lady and the plumber are discussing the best ways to fix her faulty pipes, and so on. It seems that when we have a real effect that definitely works, immediacy is not required (indeed, if it were humans would have a lot of trouble learning anything but the most basic reflexes).

But okay. Ignore that. It would be really, really, really, really bad mind design to allow your own consciously generate-able emotions to feed back into the reinforcement mechanism.

Start with one obvious point. I said the candy bar couldn’t be much of a reinforcer if you otherwise left it in the fridge without eating it. The same seems broadly true of a victory gesture. I don’t feel the slightest urge to perform a victory gesture, and having tried it empirically I don’t feel the slightest urge to repeat it. This bodes poorly for its ability to be a strong reinforcer.

And over several billion years of evolution, the brain has every incentive to get rid of that behavior if indeed it was ever possible. Imagine a world in which our own thoughts and feelings can be strongly reinforcing. You’re a caveman, encountering a saber-toothed tiger. You have two choices. You can either feel fear, which is an unpleasant emotion. Or you can feel happiness, which is a pleasant emotion. First you try feeling fear, but that’s unpleasant! You don’t like fear! The feeling of fear is negatively reinforced and your brain learns to stop feeling it. Then you try happiness! You like happiness! The decision to feel happiness is positively reinforced. Yes, you decide, saber-toothed tigers are wonderful things and you are overjoyed there is one in front of you getting into a pouncing position and licking its lips and…well, this caveman isn’t going to live very long.

From the little I know about the reward system, it seems to operate on a basis of predicting pleasure level, then upregulating actions that result in world-states that seem more pleasurable than predicted and downregulating actions that result in world-states that seem less pleasurable than predicted. I don’t think you can prevent the “I’m going to do my victory gesture!” part of you and the “I’m going to predict my pleasure at time t+1” part of you from talking to each other, I don’t think internal pleasure is as reinforcing as external world-state results, and I don’t think the pleasure of making a victory gesture is strong enough to do much anyway.

…there were a lot of “I thinks” in that paragraph. Do we have any evidence here?

The literature on this is hiding under the obscure term “self-consequation”, and unfortunately it is all from Scientific Prehistory, i.e. the 1970s and 1980s, before journal articles were uploaded to the Internet. I am able to find this full study, which does pretty much exactly the experiment listed at the beginning of this post – feed people candy in return for studying – and finds that it helps only if other people are there keeping them honest. But I am also able to find this abstract, which appears to be from a study showing the opposite – some kind of benefit – but is totally unavailable on the Internet. Both studies seem to refer to a long literature supporting their result and (sigh) neither seems aware of the other’s existence. However, I am more skeptical of the second, both because I can’t see it and because I worry that experimental protocols aren’t real self-reinforcement. That is, if an experimenter gives you their bag of candy and tells you to reinforce yourself by eating some when you do something good, that’s still different from using your own bag of candy and coming up with the idea on your own, even if the experimenter is out of the room when you’re working.

I will still try the technique, because it seems low cost and potentially high value. Really high value, actually. So high value that I would have expected the first person to get it right to take over the world. This is turning into another argument against it, isn’t it?

But yeah, as I was saying, I still intend to try the technique, even though it won’t be a very well-controlled experiment. And I’m glad I heard the idea for reminding me how little I know about behaviorism.

44 Responses to Can You Condition Yourself?

At our house we have a ‘chocolate box’ with a padlock. We split custody of a) the box of delicious candies b) the key. It works pretty well in generating extrinsic rewards, and in causing the rewarded activities to happen.

I think the idea behind the fist-pump is that you can work your way around the rewarding-enough-to-have-eaten-it-already problem if you identify a reward with higher value if you fulfill the victory conditions than if you don’t. Presumably for some people, a fist pump may genuinely accentuate the feeling of accomplishment after completing a task (while being pointless and stupid otherwise, of course), while perhaps for you the silliness of it is too much of a hurdle.

I wrote an article in LessWrong’s discussion section about trying this sort of thing with technology. So far, it’s worked, but not amazingly well; I’m fairly sure that’s got something to do with a bad reward structure, though, and I’ll probably report back in a few weeks once I’ve got the next iteration implemented.

Okay, I edited that out because you’re right that it’s an unfairly high bar for comparison. Where I would have gone with that is to say that regarding food, whenever the conditions for the reward are met (I’m hungry) I will choose to eat unless there’s some very strong reason not to.

In fact, sometimes when I’m hungry, I do feel the urge to chew, to sit down and stand up, to walk into the kitchen even if I am waiting on food/ not planning on eating for another reason. Generalizing from One Example and all that.

And I daydream about imagined success all the time. It sounds straight up impossible to use the emotions from a daydream to fuel activity though: if I craved the feeling that much my brain would sabotage any attempts I made to control it.

I tried self-conditioning for diet and productivity and tidiness.
Didn’t work. Didn’t keep it up at all.
I have come to the conclusion that a.) losing weight happens only when I’m not looking; b.) I will always be as messy as I can get away with; c.) while I will never slack off to the point of putting myself in danger, I will always slack off more than some people, because I damn well like it. And behaviorism can’t do anything about that.

I’ve never made it a structured experiment, but something with elements of #2 and #3 has worked for me. One factor you haven’t included is what you would otherwise be doing that the reward displaces.

In the past, when I’ve got ‘on a roll’ getting one chore done, I’ve followed it by trying to do another and another, and/or ‘beating myself up’ for not doing this many chores every day. Any sort of reward on the order of a candy, a cigarette, etc also provides a few minutes of leisure, relaxation — preventing that negative result.

I think you did a pretty good job ripping apart the behaviorism aspects of this, but I think there still might be something to it.
To me, the fist-pump thing isn’t so much a reward creating positive association as a trick to get your brain into gear. Most of what causes procrastination is the initial flinch away from starting, and if you can overcome that by creating an artificial good feeling then you ought to be able to ride that brief feeling past the initial negative flinch.
As for minds not being able to generate their own good feelings, I think there is ample literature on deliberate facial expressions being able to generate corresponding emotions in oneself. Purely anecdotally, when I smile it makes me feel slightly better even if it’s forced. I remember similar findings from the guys who study microexpressions. A fist pump and mimed “yes” could likewise plausibly create artificial feelings of success which you can use as part of a success spiral and to overcome the negative flinch that starting work entails.

“And over several billion years of evolution, the brain has every incentive to get rid of that behavior if indeed it was ever possible.”
And therefore people can’t possibly be foolhardy, or feel a need to do something dangerous if it’s expected of them (a good hunter, to preserve his standing, being less likely to run etc).

That said, I knew there was a reason that this didn’t register in my mind as worthy of investigation, and that reason was overjustification effect.

But isn’t this the same basic principle we were all reared on: “eat your vegetables first, then you can have dessert”?

I certainly don’t think going “Woo-hoo! I am going to vacuum the heck out of these carpets! Yeah! Go me!” is going to work any way at all in getting me to overcome my procrastination about housework and other necessary tasks. But I have done the “Okay, get this job finished now and then you can engage in that pleasant activity you want to do” self-bribery bit.

To take your analogy about the candy bars in the fridge, you must want to eat them because you wouldn’t have bought them otherwise. So rationing them out as the ‘carrot’ after doing some boring/unpleasant task works better to get you to do that dull job faster, can be a motivation to do it now rather than put it off (I want that reward, so the sooner I start, the sooner I’ll finish and can have it) and is probably better for your health than sitting down and eating an entire bag of fun-size Mars bars in one go (even if the sweet, caramelly, chocolaty goodness is calling like a siren “Eat me, eat meeeeee…”).

A thing I have observed in myself is that habits are hard to gain, and once gained hard to lose. Which means (to me) that if I want to gain or lose a habit I have to work very hard for a short while, and thereafter need not work at all hard.

I’ve never managed to condition myself into habits by offering myself rewards; indeed even externally controlled rewards have never helped. During the “working hard” phase however I have had success with reminding myself regularly *why* I am making this change; although I think more success with simply setting regular alarms to remind me what it is I’m supposed to be doing. Sometimes simply framing a desired-habit in a way that lets me set alarms is very helpful – “change sheets first Saturday of the month” is a LOT easier to manage (and harder to weasel out on) than “change sheets more often” (for me).

A note regarding immediacy of reward: once you get something, you do get an instant let-up on the neurological side of things. The craving is satisfied, and though the actual reward from the activity may still lie in the future, you are getting “something” out of it almost instantly.

> And over several billion years of evolution, the brain has every incentive to get rid of that behavior if indeed it was ever possible. Imagine a world in which our own thoughts and feelings can be strongly reinforcing. You’re a caveman, encountering a saber-toothed tiger. You have two choices. You can either feel fear, which is an unpleasant emotion. Or you can feel happiness, which is a pleasant emotion. First you try feeling fear, but that’s unpleasant! You don’t like fear! The feeling of fear is negatively reinforced and your brain learns to stop feeling it. Then you try happiness! You like happiness! The decision to feel happiness is positively reinforced. Yes, you decide, saber-toothed tigers are wonderful things and you are overjoyed there is one in front of you getting into a pouncing position and licking its lips and…well, this caveman isn’t going to live very long.

I look forward to your post debunking the widespread claims that activities like ‘masturbation’ exist in the animal kingdom or _homo sapiens_.

– People do not masturbate when being attacked by a saber-tooth tiger, unless they have a really weird fetish

– Masturbation does have this failure mode of being so tempting that lots of people engage in it all the time. There are no subreddits for people trying to overcome their addiction to victory gestures.

– Most important, this seems likely to be some sort of spandrel – it’s evolutionarily necessary for people to get pleasure out of sex, which means the genitals have to be sensitive, which means once we got really dextrous hands we could stimulate them ourselves. Because this is evolutionarily novel and doesn’t seem to prevent people from having sex too much it hasn’t been selected against too strongly, but it does seem to be a bug that snuck through rather than a feature of the way reinforcement works.

Neither are pumping your arm, going to the refrigerator, feeding yourself an M&M, recoiling from a mouse…

> – People do not masturbate when being attacked by a saber-tooth tiger, unless they have a really weird fetish

> – Masturbation does have this failure mode of being so tempting that lots of people engage in it all the time. There are no subreddits for people trying to overcome their addiction to victory gestures.

Indeed. Perhaps you see why I brought it up.

> – Most important, this seems likely to be some sort of spandrel – it’s evolutionarily necessary for people to get pleasure out of sex, which means the genitals have to be sensitive, which means once we got really dextrous hands we could stimulate them ourselves. Because this is evolutionarily novel and doesn’t seem to prevent people from having sex too much it hasn’t been selected against too strongly, but it does seem to be a bug that snuck through rather than a feature of the way reinforcement works.

That’s a good point to raise. However, I would at least consider the possibility—given the ubiquity of masturbation in the animal kingdom—that there are significant selective advantages as well as disadvantages to this behaviour. A quick google search turned up this paper, with a rather amusing title, whose abstract suggests that masturbation could serve a grooming function and reduce the probability of one’s getting an STI. I also vaguely recall reading that male masturbation clears out dead sperm, which is advantageous to one’s genes in the event of subsequent coitus.

Scott refers to this behaviour as a spandrel, but that reminds me of Daniel Dennett’s argument in Darwin’s Dangerous Idea that architectural and evolutionary spandrels are in fact optimised, only within the fairly tight constraints of a given structure’s design-space. To call a distinct biological feature a “spandrel” is always a matter of degree; identifiable features are always somewhat constrained by other aspects of an organism’s design, but rarely so constrained as to have been shaped purely by these constraints, without even small modifications due to evolutionary selection.

Exactly where masturbation lies along this continuum is, I suggest, not easy to say, although I’m sure it is relatively more shaped by constraints than other behaviours.

Still, I think Scott’s point stands that the existence or non-existence of human traits in areas of design-space that we do not expect to be particularly constrained, probably including our ability to condition ourselves, can be deduced by simple evolutionary arguments.

I don’t know about big complex tasks, but I tried this once for a bad habit, and it worked impressively well. I used to pick my nose as a kid… and then I didn’t stop when I grew up. When I heard about the inner pigeon idea, I thought I’d give it a shot. Every time I noticed an inclination to reach my hand up, or that I was anywhere in the process of nose-picking, I would pump my fist and go “YES!” (this is my happy gesture-button).

The idea was to reinforce my own noticing. Since this action is generally considered gross and/or shameful, my brain generally tried to avoid thinking about it, which meant I definitely would never notice I’d done it until it was too late. By rewarding myself for noticing (whenever it happened) I taught my brain that it was a good thing to think about.

I think perhaps this functions in a slightly different way than Skinner’s pigeons though. It’s almost like I’ve made a game out of noticing my brain’s urge, and I get an (uncounted) point every time I successfully do so. Regardless, this made me more aware of these urges, which meant I started noticing more and more when it was just my nose feeling itchy or my hand moving up. I would then reward myself and not bother actually doing it.

Within the first day, this almost completely eliminated the habit, although I forgot to go back for vaccines 2 and 3 so I confess that it’s not quite gone. However, during this comment I noticed once during, and YES’d, then not a minute later I noticed beforehand and YES’d again.

I believe this could work for a number of these sorts of impulses, although I haven’t yet tried (upon reflection, this would be really valuable; adding near top of queue)
• the urge to open facebook/twitter/hackernews/reddit/etc in a new tab
• the urge to go on the internet if not already on
• the urge to switch away from this tab where I’m writing a blog comment and go check my email (has happened twice, thrice, four times so far)
• the urge to grab a snack when I’m already full
• the urge to click on a link that is linkbaity but that I don’t anticipate actually being valuable.

In addition to helping notice these urges, it could also be valuable for noticing thoughts to the effect of “this may not be a valuable use of my time” or “I’m doing something I don’t want to be doing”. Normally, my brain shies away from those, because if that’s true, it means I’ve been wasting my time. However, like being wrong, the only way to fix that is to admit it (to yourself, at least). This ranges from:
• “Why am I still reading this site?”
• “I’m not really sure what I’m getting out of this video…”
• “This conversation/person-I’m-talking-to isn’t really very interesting…”
• “Gah, I was going to submit that form this morning and I forgot…”
… and of course any ugh field or thought about a belief that’s generally aversive.

Since there can be a fair bit of cognitive overhead to this at the start, I would recommend starting by focusing on only one type of thought or urge at a time, but it’s fun to do. I had to leave for a few hours in the middle of typing this and my attention to it while writing made it really easy to apply it to both the original unhygienic habit mentioned and also to the email-checking impulse when I came back.

Writing this post has made me realize that there’s a lot of really low-hanging fruit for me here, and so I’m going to try adding a new noticing every few days for the next while. I’ve made this a blog post for myself as well, and will post results there.

“You’re a caveman, encountering a saber-toothed tiger. You have two choices. You can either feel fear, which is an unpleasant emotion. Or you can feel happiness, which is a pleasant emotion. First you try feeling fear, but that’s unpleasant! You don’t like fear! The feeling of fear is negatively reinforced and your brain learns to stop feeling it. Then you try happiness! You like happiness! The decision to feel happiness is positively reinforced. Yes, you decide, saber-toothed tigers are wonderful things and you are overjoyed there is one in front of you getting into a pouncing position and licking its lips and…well, this caveman isn’t going to live very long.”

Gah. This is not how it works! Behaviors, not emotions are conditioned. Attacking tigers are a (likely unconditioned?) cue to feel fear. When in a fearful state, the person is likely to try running away. The behavior of running away from them is negatively reinforced because then the scary tiger is further away. The person will run next time.

To change emotional responses to things, you need classical, not operant conditioning.

It seems obvious to me that our own internal processes can and do reinforce behaviors all the time! For example, I think happy thoughts about how I’ll thank myself later to reinforce plugging in my phone. I don’t use victory gestures or candy, but I definitely use the feeling of “yay, I’m doing the thing I thought would work!”

In the case of the dinner plate, I would say that setting the table for dinner is something System 2 has under stimulus control. You know you’ll be rewarded iff you’ve been given the “set the table” cue, so you don’t do the behavior otherwise. I don’t see a similar issue arising with the self-reinforcement plan.

I’m also a bit of a skeptic about the whole “extrinsic rewards reduce motivation to do the thing when they’re removed” business. I think that can happen, but a different thing can also happen where you’ve created a habit that’s easier to maintain even after the rewards are gone. I would expect internal framings to matter quite a bit here.
