DeSoto, K. A. (2013, October). Confidence and accuracy in memory: Methods, analyses, and theory. Talk given at the Washington University Department of Psychology Behavior, Brain, and Cognition Colloquium, St. Louis, Missouri.

When an individual is very confident in recalling a particular memory, is that person also likely to be accurate? I research this topic in the Memory Lab in the Department of Psychology at Washington University in St. Louis, MO, where I received my master's degree and am currently a Ph.D. candidate. The relationship between confidence and accuracy is complex, and my colleagues and I are attempting to untangle these issues.

I am also interested in applying principles of cognitive psychology to education. Most recently, I have worked to examine the degree to which retrieval practice (e.g., taking tests) supports long-term learning as measured by tests of recognition (e.g., multiple choice tests).

I am also currently researching how individuals choose to conduct their own learning in a series of studies investigating self-allocated study time in the learning of natural concepts.

I have taught at the undergraduate and graduate level. My goal as an instructor is to fuse the groundbreaking research happening at Washington University with the fundamental concepts emphasized by the course to make topics of learning both relevant and timely.

I received my undergraduate degree in Psychology and Computer Science from the College of William & Mary in Williamsburg, VA, where I did my senior thesis on eye movements while zoning out during reading. At William & Mary, I was involved in the university's wind ensemble, started up the first student podcast on campus, and blogged about technology and social media.

Nowadays, when I am not in the lab, I can usually be found exploring the city of St. Louis or engaging in one of my hobbies -- technology and digital media, music (piano, tuba), or being an avid Yelper. I have enjoyed biking around town and learning how to keep green things alive in my plot in the community garden. I am also a member of the Compton Heights Community Band, which plays over a dozen free outdoor concerts during the summer and a large pops spectacular late December.

I've been using a Jawbone UP 24 to record total steps, total sleep, and total workout time since January 2014. I use it to track other stuff, too, but I'm most confident about these estimates. Here are my numbers from 2014.

My numbers are pretty decent! Just under seven hours of sleep is totally reasonable (these estimates exclude time I'm in bed but am awake -- e.g., if I can't fall asleep). I break the 10,000 steps per day recommendation with a little room to spare. And I manage to work out for 30 minutes a day, on average, mostly thanks to the first half of the year.

August is a clear outlier. What was I doing in August? Not much, apparently. Actually, I spent about half the month on a trip to Puerto Rico and moving Becky back to Virginia, so there was a lot going on -- just none of it was all too active.

The #memorylab is quieting down before the winter holidays, and I'm no exception. I met with Roddy today to discuss some of our projects as we transition into the Spring 2015 semester. There's dissertation progress, as usual -- four experiments collected and basically analyzed -- but the presidents paper has got us interested in a few ideas involving collective memory. One project is a followup to the presidents paper, and another is something different exploring recent events in St. Louis. Adam and I are collaborating on this front. The project with Steve is developing, too.

Inspired by some of the highly superior memory folks we've had come visit (e.g., Nelson Dellis), I've been keeping a paper diary for about a month and a half now to see if it seems to improve my autobiographical memory. My unscientific intuition is that the thing isn't making a bit of difference. I do have some recollection of specific entries (e.g., one in which I was extremely angry), but nothing else feels qualitatively different. Is it a failed experiment so far? I don't know. One nice aspect is that a lot of exciting things have happened for me and others over the last 90 days or so. In that respect, it is nice to have an as-it-happens look at these different events. Will I ever go back and rehearse my memories of these events? Unlikely.

Meanwhile, it's been about three weeks since I switched to my Android phone (a Galaxy Note 4). So far I really like it. I love the size of the screen, the attached stylus, and the flexibility (e.g., if you want to modify the LED colors, you just download an application that lets you do this). I do miss the overall ease of sending text messages with the iPhone, but when Android Lollipop gets released for the phone in a month or two, hopefully on-screen notifications will ease this burden. On the phone-to-tablet-to-computer spectrum, the device certainly sits closer to the tablet/computer end. Overall, this is OK. I have been having a strange bug involving receipt of MMS messages, though, which I'd like to figure out at some point.

It was a good week in the #memorylab. I made my first public radio appearance on Tuesday, on St. Louis on the Air. You can listen to the 30-minute segment by clicking here. I had a great time, and the folks in the St. Louis Public Radio office were very professional and friendly. Hopefully I'll do something worth inviting me back for someday.

We also said farewell to Steven Smith, who had been visiting WUSTL for the semester and is now headed back to his post at Texas A&M University. He was in St. Louis just long enough for us to put our heads together for the beginnings of a project investigating high confidence false memories for face materials. See a recent paper by Deffler, Brown, and Marsh (2014) for related research.

Deniz and I have been working hard on interpreting the data we collected for the fourth study of my dissertation. Whether it goes into the actual thesis remains to be seen, though, since it is a pretty thick dataset.

And tonight is the first Psychology Department Holiday Party in my six years here that's not being held in Room 216. If you're in the department and have RSVP'd, I hope to see you Upstairs at the Cheshire this evening. The festivities won't be too prolonged, though, considering there's a Frostbite Series 12K at 8:30 the next morning!

It's been a hectic week, with a paper release imminent and Psychonomics on its way. Not to mention that the Thanksgiving holiday is a little over a week out, too. Well, the good news is that once the November holidays are over, I'll be able to focus full-time on my dissertation again and get that ball rolling for what should be the final time. Once we hit December it should be a good kind of freefall from there.

Another project I've been thinking about is something that's in the works with Dr. Steve Smith, who's visiting from Texas A&M this year. We're interested in some forensic questions that are a bit more applied than the other things I've tackled in graduate school. Hopefully there will be some interesting things to report in this space a little later.

But, as the title of the post says, with everything that's going on, it's a bad time to have doubled (tripled?) my daily coffee intake. I wonder if it's affecting my sleep, too, although a recent night of sleep (slept in a bit, don't judge) looks pretty unconcerning:

Most of the lab is spending this week gearing up for Psychonomics, the big experimental psychology convention that's being held this year in Long Beach, CA. Perhaps we will see you there! Here's where you can see #memorylab members in the wild (members in bold):

Submitted what's probably the last version of our page proofs for our paper today. This means the article will likely be published sometime before the end of the year, and quite possibly before the end of November -- it's hard to say. Continue to stay tuned; there's probably nothing more to say about the manuscript until it's published.

Which reminds me that I need to work on an IRB modification to get the task up and online. This needs to happen pretty quickly, so I should stop writing updates and get on that immediately.

Looked over Adam's paper draft that he'll be resubmitting in a few hours. It's looking pretty good. Will try and link to it once it's published.

The majority of the day was spent working on proofreading and editing the Science paper, and other rigamarole associated with getting the publication out the door. The page proofs are looking really nice and it'll be exciting to share it with readers here before the end of the year.

Also spent some time working on a few professional development issues.

Deniz is running seven (or so) subjects as we speak.

And lastly, the day was capped off by a pleasant surprise visit by Andrew Butler, Assistant Professor at the University of Texas at Austin. He's a fabulous researcher who is bound to continue a great career in the land of beef brisket.

15-E4 was only set up to take 80 different subject numbers, but this experiment's going to require at least 80 folks -- probably more like 96-128. So I went into the source code and increased the range of allowable subject numbers.

I took that as an opportunity to create a new GitHub repository for the source code of the experiment. You can get to it by clicking here. Since I do most of my coding on a tight timeline, things don't work as well as they ought to, but you can still download and explore the software yourself if you're interested.

Before long, it won't be the storage space that's the selling point, but rather who can offer the best search, organization, filtering, etc. In my opinion Google's on top here for the time being, partly because of the auto-enhance features.

The more places that can host your photos for free, the more likely they are to be backed up in more than one location. But that also means there's one more place your photo is floating around for someone to access against your will. Say you take a photo you didn't mean to take and it's automatically beamed to an unlimited number of backup systems. How do you ever make sure you've deleted it? Security's also going to become more important in this area.

For lab meeting this week we're reading a testing effect review by Rowland (2014). It is a heroic effort. The main ideas are summed up well on p. 21, in the Conclusions section, where it is noted that a retrieval difficulty hypothesis is supported, whereas a transfer-appropriate processing account is not. Additional support was found for semantic elaboration accounts, like Mary's mediator hypothesis, but the bottom line is that currently, no one unifying account can explain all testing data.

Another interesting finding was that benefits of testing emerged immediately as well as at a delay, which dovetails with some things Karpicke has been saying lately about the "crossover" in testing being an artifact of test difficulty and item selection effects.

The somewhat unsatisfying, but honest, conclusion is on p. 22, where it is stated that "the underlying mechanisms that produce the effect remain elusive," and that "the testing effect is likely to reflect multiple memory mechanisms." In other words, there's lots more work to be done.

The IRB for my main research line was expiring soon, so this morning I submitted a continuing review form to keep it open, noting that 63 additional subjects had been consented since the last time I did the review. I also added Lena to the IRB so that she can help with data collection in the case that I am incapacitated (or, more likely, am unable to make it into the lab for whatever reason).

Up to 61 subjects for the fourth dissertation experiment. This is great as it means we have 15-16 subjects in each of the four conditions. There was a no-show in one of the sessions earlier today, which means we need to go 58, 63, 64, ... when data collection resumes on Monday afternoon. Things are coming along, which is great.

And I'm wiped! This puts us at a total of 48. That's probably half of what we want at a minimum -- 24 per condition, or 96 total -- although you could possibly go down to 20 or 16 per condition if you really wanted to. We'll see what time and resources allow.

Yesterday I also installed air fresheners and put Lysol wipes in the labs to keep things clean. With so many subjects, the testing computers do actually start to get pretty gross. These safeguards will hopefully keep the germs at bay, at least to a certain extent.

That's all for today. Not much to update. We had an interesting lab meeting where we read Storm, Friedman, Murayama, and Bjork (2014) and Peterson and Mulligan (2014). Interesting papers related to the testing effect. I'll save my thoughts.

I proposed an additional dissertation experiment and am amid data collection for it. My research assistant Deniz and I have consented 40 subjects. Two subjects (#13 and #14) were unable to complete the study in the allotted time. We re-ran a new #14, but still need to re-run a #13. Somehow we also skipped #10. Thus, when data collection resumes tomorrow, we'll run 10, 13, and 41. The reason subject numbers are important in my study is that they're tied in the programming to the counterbalancing groups and experimental conditions. Thus, if one number is no good, we just throw it out and get another one.
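The counterbalancing scheme itself isn't described here, but a minimal, purely hypothetical sketch shows why a deterministic subject-number-to-condition mapping forces you to re-run a bad number rather than skip it (the four-condition modulo mapping below is an assumption for illustration, not the experiment's actual code):

```python
# Hypothetical sketch: conditions rotate through subject numbers, so
# each number is locked to one cell of the counterbalancing design.
# The four-condition modulo scheme is an assumption for illustration.

NUM_CONDITIONS = 4

def condition_for(subject_number: int) -> int:
    """Deterministically map a subject number to a condition (0-3)."""
    return (subject_number - 1) % NUM_CONDITIONS

# If #13's data are no good, the design stays balanced only if a
# replacement fills that same cell -- hence re-running #13 itself
# rather than simply moving on to the next fresh number.
```

With a mapping like this, consecutive subject numbers rotate evenly through the conditions, which is also why a gap like the skipped #10 needs to be filled.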

Of course, the reason to go in and look at the data mid-data collection is just to ensure that everyone's completing the study as expected and that there are no bugs or glitches in the program. This process alerted me to one potential issue -- currently, if someone doesn't make a source decision (i.e., they selected NEW on the source test), I grade their source accuracy as INCORRECT. Obviously this is wrong -- when computing overall source accuracy, we should only look at OLD responses.
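The scoring fix can be sketched in a few lines; the trial fields below are hypothetical stand-ins for however the program actually logs responses:

```python
# Sketch of the scoring fix described above (field names are
# hypothetical). Source accuracy is computed only over trials the
# subject called OLD; a NEW response carries no source decision, so
# those trials are excluded rather than graded as incorrect.

def source_accuracy(trials):
    """Mean source accuracy over trials with an OLD recognition response."""
    old_trials = [t for t in trials if t["recognition"] == "OLD"]
    if not old_trials:
        return None  # no OLD responses, so source accuracy is undefined
    correct = sum(1 for t in old_trials if t["source"] == t["true_source"])
    return correct / len(old_trials)

trials = [
    {"recognition": "OLD", "source": "A", "true_source": "A"},
    {"recognition": "OLD", "source": "B", "true_source": "A"},
    {"recognition": "NEW", "source": None, "true_source": "A"},  # excluded
]
source_accuracy(trials)  # 0.5, not 1/3
```

The key design choice is returning None rather than 0 when there are no OLD responses, since source accuracy is simply undefined in that case.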

This has the potential to get pretty sticky and confusing as things progress, so I'll have to think carefully about these issues while working on the writeup.

It's a good day today -- we found out that our paper was accepted for publication in Science. I've been vague when discussing it for a while now (heck, maybe for even a year or so) as to not spoil the punchline, and it turns out that the contents are under embargo for at least another month or two anyway, so I can't share many details. What I can share, though, is that it should be a neat paper of general interest. Hopefully, too, it'll start to raise some interesting questions for collective memory researchers, as well as suggest some new ways of considering an interesting set of questions in a quantitative way. Keep an eye on this space and I'll start sharing things as soon as I have the opportunity.

It's been about three months without any updates here -- my apologies about that. The reason is that I stopped using Dropbox, and without Dropbox, there's no way to update this blog. There have been a few times when I've wanted to write a post, though, and for that reason I decided to reactivate Dropbox just for blogging purposes. So, hopefully, there'll be more of this to come.

Why did I stop using Dropbox, you might ask? Through a pretty roundabout set of developments. I've fallen in love with Google+ Photos. I love how the pictures I take on my iPhone are uploaded to the cloud, auto-enhanced, and organized in a meaningful way. I ran out of space on my Google Drive because I uploaded some photos. Because it didn't make sense to maintain some files in Dropbox and some in Google Drive, I eliminated Dropbox -- after all, their photo tools are inferior, even considering Carousel (which I haven't looked into in a while; I ought to).

So bottom line -- Google+ Photos are awesome. They're so good, in fact, I think I'm going to get a Nexus 6 when they're announced instead of an iPhone.

The last few weeks I've been spending a lot of time on a top-secret project and professional development-type things, so there hasn't been too much to discuss in research notes. The good news, though, is that (I believe) data collection is completed for the third experiment of my dissertation. We have 64 subjects, evenly balanced in terms of counterbalancing condition.

I've not done much in the way of analysis on these data, but one thing I'm looking at is whether the general pattern of results replicates what I've discussed in earlier places. So far it looks as if this is the case, which is good to see.

Data collection is going well. There's no longer a need for 50-59 year olds, as you can see from the figure below:

Now, the slow wait for those 60 and up begins. We are getting about seven a day, assuming the rate between the 21st and today remains constant, so I expect this step should take a little over a week.

Meanwhile, I also ought to be considering those indicating they cheated on the test. We have 23 self-identified cheaters out of 480 subjects, which is roughly a rate of one in 20 -- not great. I'll have to decide whether to re-run an equivalent number of subjects or just leave it be (likely, I'll just leave it be).

We need 31 50-59 year olds and 72 60-69 year olds. I opened new MTurk slots for folks 50+ and hope to get those rolling in soon. Chances are good the 70-89 age range will go relatively unfilled; currently, that group has been less than half of one percent of our participants.

A few days ago I posted some calibration curves for remembering, knowing, and guessing. (It turns out they were accidentally constructed from pilot data instead of the full data set, but the same pattern bears out.) In the third experiment of my dissertation -- I have 32 of 64 subjects' worth of data collected so far -- we get both recognition (old/new) accuracy and source accuracy and thus are able to plot two calibration plots, one for recognition and one for source. Take a look.

They look pretty similar, which is both good and bad news. First note that, at least by eyeball, the remember-versus-know difference at high confidence (that's the x-axis, which I'm noticing I've left out) is bigger for source memory than for old/new accuracy. Some prior literature, however, predicts that the know curve for source accuracy should not really exceed any point falling on the remember curve. Clearly, though, 80-100 confidence knows are more accurate, in terms of source, than 0-79 confidence remember ratings.

So this is going to be something I have to think about over the next few days. I think it might come down to the instructions. In my study, subjects make a recognition + source decision, then confidence, then remember/know/guess. This order -- as well as the specific prompt for the confidence rating -- might end up being very important in this line of research. Hmm...

Quiet April here on the blog -- I spent most of the month running subjects and working on another project I can't talk too much about yet. In the meantime, though, I'm preparing for the Show Me Mini Mental Life Conference, a small biennial (that's the once-every-two-years one; "biannual" is twice a year) conference jointly hosted by Washington University and Mizzou a bit down I-64. I'm giving a talk on some of my dissertation data, which I've written about in other places (Project 15).

As you might know, my dissertation research is interested in confidence-accuracy differences as a function of remembering, knowing, and guessing. What we can do is plot calibration curves for the accuracy of responses given while in each qualitative state of remembering. Here's the figure:

As the figure shows, one type of response seems to pull apart from the rest. Namely, we see that remember responses given with high (e.g., 80-100) confidence appear to be significantly more accurate, on average, than remember responses given with lower confidence as well as any know or guess responses. Thus, the real-world implication is that if I'm sure I remember something, and can remember specific details about that memory (rather than a gut feeling or hunch that it happened), it's likely that the memory is an accurate one.
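For the curious, here's a rough sketch of how calibration points like these can be computed: bin trials by confidence within each qualitative state and take mean accuracy per bin. The 20-point bins are an assumption (the post only mentions the 80-100 bin), and the trial layout is hypothetical:

```python
from collections import defaultdict

# Sketch of computing calibration points: for each qualitative state
# (remember/know/guess), bin trials by confidence and take the mean
# accuracy per bin. The 20-point bin width is an assumption; ratings
# of 100 fall into the top (80-100) bin.

def calibration_points(trials, bin_width=20):
    """trials: (state, confidence 0-100, correct bool) tuples.
    Returns {state: {bin_lower_edge: mean_accuracy}}."""
    sums = defaultdict(lambda: [0, 0])  # (state, bin) -> [n_correct, n]
    for state, confidence, correct in trials:
        b = min(confidence // bin_width * bin_width, 100 - bin_width)
        cell = sums[(state, b)]
        cell[0] += int(correct)
        cell[1] += 1
    curves = defaultdict(dict)
    for (state, b), (n_correct, n) in sums.items():
        curves[state][b] = n_correct / n
    return dict(curves)

trials = [("remember", 95, True), ("remember", 90, True),
          ("remember", 40, False), ("know", 85, True), ("know", 85, False)]
curves = calibration_points(trials)
# curves["remember"][80] == 1.0; curves["know"][80] == 0.5
```

Plotting each state's bins against its mean accuracy then yields one calibration curve per state, as in the figure described above.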

Thanks to data collection efforts last week I have seven subjects' worth of pilot data for Project 15 Experiment 3. The temporary Excel file is saved as "E2 Pilot Data Check.xlsx" in the Dropbox folder. The first step was to ensure that no data has been lost as a result of programming errors. The good news is that all subjects have 100 observations in the study phase and 300 in the recognition test phase, which means nothing's getting swallowed up. This is great.

Next is taking a look at recognition memory performance and comparing it with prior research. Right now "old" rates for targets, related lures, and unrelated lures are .81, .47, and .17, respectively. Data from Project 15 Experiment 2 are in the same ballpark (.71, .29, .17). So initial data collection suggests folks are more liberal in Experiment 3 than in Experiment 2, which is certainly a possibility.

So now some time needs to be spent thinking about which analyses are the crucial ones in Experiment 3. Let's recopy what I wrote as a summary of the Experiment 2 data:

Currently, there are not significant differences in calibration (between-subjects and between-events) between remember and know responses. There are, however, significant differences in resolution (within-subjects and within-items).

Therefore, Experiment 2 of my dissertation suggests that qualitative state of remembering (remembering versus knowing) affects the strength of the confidence-accuracy correlation.

The main question is calibration and resolution for the source accuracy-confidence relationship, I think. Everything else is icing on the cake.
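To make the calibration/resolution distinction concrete, here's a small sketch under common operationalizations -- calibration as over/underconfidence (mean confidence minus mean accuracy) and resolution as a Goodman-Kruskal gamma between confidence and correctness within a subject. These are standard choices, but whether they match the dissertation's exact analyses is an assumption:

```python
# Hedged sketch: calibration as over/underconfidence and resolution as
# Goodman-Kruskal gamma. These operationalizations are common in the
# confidence-accuracy literature but are assumptions here.

def calibration(conf, correct):
    """Over/underconfidence: mean confidence (0-1) minus mean accuracy."""
    return sum(conf) / len(conf) - sum(correct) / len(correct)

def gamma(conf, correct):
    """Goodman-Kruskal gamma: (concordant - discordant) pairs ratio."""
    concordant = discordant = 0
    n = len(conf)
    for i in range(n):
        for j in range(i + 1, n):
            d = (conf[i] - conf[j]) * (correct[i] - correct[j])
            if d > 0:
                concordant += 1
            elif d < 0:
                discordant += 1
    if concordant + discordant == 0:
        return 0.0
    return (concordant - discordant) / (concordant + discordant)

conf = [0.9, 0.8, 0.6, 0.3]
correct = [1, 1, 0, 0]
gamma(conf, correct)        # 1.0: confidence perfectly orders accuracy
calibration(conf, correct)  # 0.15: slight overconfidence
```

The two measures can dissociate, as in the Experiment 2 summary: confidence can track accuracy perfectly within a subject (high gamma) while the absolute confidence level is still too high or too low (poor calibration).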

So far -- and this is a super quick analysis -- there's an interesting difference in confidence-accuracy relationship depending on whether we're looking at recognition or source accuracy.

I have been taking digital photographs since 2003. My collection is a mess, and I have no idea what to do about it. Everything from 2003-2009 is on an external hard drive, basically, and 2009 on is on the hard drive of the machine I'm updating currently. In 2003 it made sense to organize photos by albums, but in the modern era of cellphone cameras and one or two pictures per day forever, it makes more sense to organize in other ways (e.g., day/month/year, faces, locations). Much of my stuff is in Aperture right now, which I purchased thinking it would be a helpful upgrade over iPhoto, but it was a waste of money -- I feel it works even worse.

So what do I do? Here's what I'm thinking.

(1) Get all the files in one location, on my computer. File structure doesn't matter as much as having everything in one place does.

(2) Back those up on multiple drives in multiple locations, including the cloud (not sure where to go here -- Flickr offers the free terabyte, but I'd rather pay for Dropbox or Google. Meanwhile, most of my devices play the nicest with iCloud).

(3) Find some photo organizer/viewer that works the best. Whether this is iPhoto, Aperture, Picasa, or what I don't know and can't say. I guess this will take some research. (It seems as if photo organization has really taken a hit in the current era of the closed web, doesn't it?)

(4) Start to organize in a way that doesn't make me want to pull my hair out.