People are still bad at gauging their own interview performance. Here’s the data.

interviewing.io is a platform where people can practice technical interviewing anonymously, and if things go well, get jobs at top companies in the process. We started it because resumes suck and because we believe that anyone, regardless of how they look on paper, should have the opportunity to prove their mettle.

At the end of 2015, we published a post about how people are terrible at gauging their own interview performance. At the time, we just had a few hundred interviews to draw on, so as you can imagine, we were quite eager to rerun the numbers with the advent of more data. After drawing on roughly one thousand interviews, we were surprised to find that the numbers have really held up, and that people continue to be terrible at gauging their own interview performance.

The setup

When an interviewer and an interviewee match on interviewing.io, they meet in a collaborative coding environment with voice, text chat, and a whiteboard and jump right into a technical question (feel free to watch this process in action on our recordings page). After each interview, people leave one another feedback, and each party can see what the other person said about them once they both submit their reviews.

If you’re curious, you can see what the feedback forms look like below — in addition to one direct yes/no question, we also ask about a few different aspects of interview performance using a 1-4 scale. We also ask interviewees some extra questions that we don’t share with their interviewers, and one of those questions is about how well they think they did. For context, a technical score of 3 or above seems to be the rough cut-off for hirability.

Feedback form for interviewers

Feedback form for interviewees

Perceived versus actual performance… revisited

Below are two heatmaps of perceived vs. actual performance per interview (for interviews where we had both pieces of data). In each heatmap, the darker areas represent higher interview concentration. For instance, the darkest square represents interviews where both perceived and actual performance was rated as a 3. You can hover over each square to see the exact interview count (denoted by “z”).

The first heatmap is our old data:

And the second heatmap is our data as of August 2016:

As you can see, even with the advent of a lot more interviews, the heatmaps look remarkably similar. The R-squared for a linear regression on the first data set is 0.24. And for the more recent data set, it’s dropped to 0.18. In both cases, even though some small positive relationship between actual and perceived performance does exist, it is not a strong, predictable correspondence.

You can also see there’s a non-trivial amount of impostor syndrome going on in the graph above, which probably comes as no surprise to anyone who’s been an engineer. Take a look at the graph below to see what I mean.

The x-axis is the difference between actual and perceived performance, i.e. actual minus perceived. In other words, a negative value means that you overestimated your performance, and a positive one means that you underestimated it. Therefore, every bar above 0 is impostor syndrome country, and every bar below zero belongs to its foulsome, overconfident cousin, the Dunning-Kruger effect.1

On interviewing.io (though I wouldn’t be surprised if this finding extrapolated to the qualified engineering population at large), impostor syndrome plagues interviewees roughly twice as often as Dunning-Kruger. Which, I guess, is better than the alternative.

Why people underestimate their performance

With all this data, I couldn’t resist digging into interviews where interviewees gave themselves 1’s and 2’s but where interviewers gave them 4’s to try to figure out if there were any common threads. And, indeed, a few trends emerged. The interviews that tended to yield the most interviewee impostor syndrome were ones where question complexity was layered. In other words, the interviewer would start with a fairly simple question and then, when the interviewee completed it successfully, they would change things up to make it harder. Lather, rinse, repeat. In some cases, an interviewer could get through up to 4 layered tiers in about an hour. Inevitably, even a good interviewee will hit a wall eventually, even if the place where it happens is way further out than the boundary for most people who attempt the same question.

Another trend I observed had to do with interviewees beating themselves up for issues that mattered a lot to them but fundamentally didn’t matter much to their interviewer: off-by-one errors, small syntax errors that made it impossible to compile their code (even though everything was semantically correct), getting big-O wrong the first time and then correcting themselves, and so on.

Interestingly enough, how far off people were in gauging their own performance was independent of how highly rated (overall) their interviewer was or how strict their interviewer was.

With that in mind, if I learned anything from watching these interviews, it was this. Interviewing is a flawed, human process. Both sides want to do a good job, but sometimes the things that matter to each side are vastly different. And sometimes the standards that both sides hold themselves to are vastly different as well.

Why this (still) matters for hiring, and what you can do to make it better

Techniques like layered questions are important to sussing out just how good a potential candidate is and can make for a really engaging positive experience, so removing them isn’t a good solution. And there probably isn’t that much you can do directly to stop an engineer from beating themselves up over a small syntax error (especially if it’s one the interviewer didn’t care about). However, all is not lost!

As you recall, during the feedback step that happens after each interview, we ask interviewees if they’d want to work with their interviewer. As it turns out, there’s a very statistically significant relationship between whether people think they did well and whether they’d want to work with the interviewer. This means that when people think they did poorly, they may be a lot less likely to want to work with you. And by extension, it means that in every interview cycle, some portion of interviewees are losing interest in joining your company just because they didn’t think they did well, despite the fact that they actually did.

How can one mitigate these losses? Give positive, actionable feedback immediately (or as soon as possible)! This way people don’t have time to go through the self-flagellation gauntlet that happens after a perceived poor performance, followed by the inevitable rationalization that they totally didn’t want to work there anyway.

1I’m always terrified of misspelling “Dunning-Kruger” and not double-checking it because of overconfidence in my own spelling abilities.↩

Share this:

Aline Lerner

Posted on 3:37 am November 3, 2016.

Isn’t “People Can’t Gauge Their Own Interview Performance” exactly the opposite of what that graph shows? You’ve got 47% of people being bang on in their estimate of how they did, with another 45% of people being only one point off.

Posted on 11:03 am November 3, 2016.

To us, the fact that less than half of people can predict their own performance counts as not really being able to do it. And on a 4 point scale, 1 point can make a huge difference. Sure, they’re not catastrophically bad, but I’d say this would fall under the umbrella of “can’t do it.”

Posted on 5:47 am September 9, 2016.

I’m not sure why you assert that “How well do you think you did?” and “How were their technical skills?” would match up in the first place. Technical skills are hardly the only thing employers hire for. For example, suppose someone has terrific technical skills, which come across in the interview, but is a real jerk. The interviewer might recognize those technical skills, and mark them highly for them, but still recommend a no-hire. If the interviewee accurately assesses the interview, they’d correctly give a low score to “How well do you think you did?”.

In other words, even if all interviewees assessed their performance accurately, there still might not be that high a correlation between those two questions.

Posted on 6:16 pm September 8, 2016.

Posted on 2:00 pm September 8, 2016.

What’s really interesting to me is that, in addition to linking Dunning-Kruger and Impostor Syndrome around zero, the last paragraph applies well beyond the interview stage. That is, in my experience, the single most consistent management failure across all types and sizes of companies is a lack of timely and actionable feedback. In short, “didn’t want to work there” can occur after the hire as well as before, and a management culture constructed around tacit praise (or condemnation) isn’t doing anyone a favor.