Engineers can’t gauge their own interview performance. And that makes them harder to hire.

interviewing.io is an anonymous technical interviewing platform. We started it because resumes suck and because we believe that anyone, regardless of how they look on paper, should have the opportunity to prove their mettle. In the past few months, we’ve amassed over 600 technical interviews along with their associated data and metadata. Interview questions tend to fall into the category of what you’d encounter at a phone screen for a back-end software engineering role at a top company, and interviewers typically come from a mix of larger companies like Google, Facebook, and Twitter, as well as engineering-focused startups like Asana, Mattermark, KeepSafe, and more.

Over the course of the next few posts, we’ll be sharing some { unexpected, horrifying, amusing, ultimately encouraging } things we’ve learned. In this blog’s heroic maiden voyage, we’ll be tackling people’s surprising inability to gauge their own interview performance and the very real implications this finding has for hiring.

First, a bit about setup

When an interviewer and an interviewee match on our platform, they meet in a collaborative coding environment with voice, text chat, and a whiteboard and jump right into a technical question. After each interview, people leave one another feedback, and each party can see what the other person said about them once they both submit their reviews. If both people find each other competent and pleasant, they have the option to unmask. Overall, interviewees tend to do quite well on the platform, with just under half of interviews resulting in a “yes” from the interviewer.

If you’re curious, we have a few public recordings of interviews done on the platform, so you can watch and see what an interview is really like. In addition to these, our feedback forms are attached below. There is one direct yes/no question, and we also ask about a few different aspects of interview performance using a 1-4 scale. We also ask interviewees some extra questions that we don’t share with their interviewers, and one of those questions is about how well they think they did. In this post, we’ll be focusing on the technical score an interviewer gives an interviewee and the interviewee’s self-assessment (both are circled below). For context, a technical score of 3 or above seems to be the rough cut-off for hirability.

Feedback form for interviewers

Feedback form for interviewees

Perceived versus actual performance

Below, you can see the distribution of people’s actual technical performance (as rated by their interviewers) and the distribution of their perceived performance (how they rated themselves) for the same set of interviews. [1]

You might notice right away that there is a little bit of disparity, but things get interesting when you plot perceived vs. actual performance for each interview. Below is a heatmap of the data where the darker areas represent higher interview concentration. For instance, the darkest square represents interviews where both perceived and actual performance were rated as a 3. You can hover over each square to see the exact interview count (denoted by “z”).
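For the curious, here is a minimal sketch of how the counts behind a heatmap like this could be tallied. The function name and the sample pairs are made up for illustration; they are not our data or our actual pipeline.

```python
from collections import Counter

def score_grid(pairs, low=1, high=4):
    """Count interviews for each (perceived, actual) score pair.

    Rows are perceived scores, columns are actual scores,
    both running from `low` to `high` inclusive.
    """
    counts = Counter(pairs)
    return [[counts[(p, a)] for a in range(low, high + 1)]
            for p in range(low, high + 1)]

# Hypothetical (perceived, actual) pairs, purely illustrative.
sample_pairs = [(3, 3), (3, 3), (2, 3), (1, 3), (4, 4), (2, 2), (3, 4)]
grid = score_grid(sample_pairs)

# Print one row per perceived score; each cell is an interview count.
for perceived, row in enumerate(grid, start=1):
    print(perceived, row)
```

Each cell of the grid plays the role of “z” in the heatmap: the number of interviews with that particular combination of self-rating and interviewer rating.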

If you run a regression on this data [2], you get an R-squared of only 0.24, and once you take away the worst interviews, it drops even further to 0.16. For context, R-squared is a measurement of how well you can fit empirical data to some mathematical model. It’s on a scale from 0 to 1, with 0 meaning that everything is noise and 1 meaning that everything fits perfectly. In other words, even though some small positive relationship between actual and perceived performance does exist, it is not a strong, predictable correspondence.
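To make R-squared a bit more concrete, here is a rough sketch of how it can be computed for a simple linear regression of one score on the other. The function and the sample scores are assumptions for illustration only; we did our actual analysis in Statwing, not with this code.

```python
def r_squared(xs, ys):
    """R-squared of an ordinary least-squares line fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Share of the variance in ys that the fitted line explains.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical actual vs. perceived scores on the 1-4 scale.
actual = [1, 2, 3, 4]
perceived = [1, 3, 2, 4]
print(round(r_squared(actual, perceived), 2))
```

A value near 1 would mean that knowing the interviewer’s score tells you almost exactly how the interviewee rated themselves; a value like 0.24 means most of the variation is left unexplained.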

You can also see there’s a non-trivial amount of impostor syndrome going on in the graph above, which probably comes as no surprise to anyone who’s been an engineer.

Gayle Laakmann McDowell of Cracking the Coding Interview fame has written quite a bit about how bad people are at gauging their own interview performance, and it’s something that I had noticed anecdotally when I was doing recruiting, so it was nice to see some empirical data on that front. In her writing, Gayle mentions that it’s the job of a good interviewer to make you feel like you did OK even if you bombed. I was curious about whether that’s what was going on here, but when I ran the numbers, there wasn’t any relationship between how highly an interviewer was rated overall and how off their interviewees’ self-assessments were, in one direction or the other.

Ultimately, this isn’t a big data set, and we will continue to monitor the relationship between perceived and actual performance as we host more interviews. That said, this relationship emerged very early on and has persisted as the number of interviews has grown: R-squared has never exceeded 0.26 to date.

Why this matters for hiring

Now here’s the actionable and kind of messed up part. As you recall, during the feedback step that happens after each interview, we ask interviewees if they’d want to work with their interviewer. As it turns out, there’s a highly statistically significant relationship (p < 0.0008) between whether people think they did well and whether they’d want to work with the interviewer. This means that when people think they did poorly, they may be a lot less likely to want to work with you [3]. And by extension, it means that in every interview cycle, some portion of interviewees are losing interest in joining your company just because they didn’t think they did well, despite the fact that they actually did.
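If you want to check for this kind of relationship in your own hiring data, one common approach is a chi-square test of independence on a 2x2 table. To be clear, this is an assumption about method, not a description of the exact test behind the p-value above, and the counts below are invented for illustration.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    Returns (chi2, p). No continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]

    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            # Count expected in this cell if the two answers were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical counts. Rows: thought they did well / thought they did poorly.
# Columns: would work with the interviewer / would not.
made_up_counts = [[30, 10], [10, 30]]
chi2, p = chi_square_2x2(made_up_counts)
print(chi2, p)
```

A small p-value says the two answers are unlikely to be independent. It does not tell you how big the effect is, which is why it helps to look at the raw proportions as well.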

How can one mitigate these losses? Give positive, actionable feedback immediately (or as soon as possible)! This way people don’t have time to go through the self-flagellation gauntlet that happens after a perceived poor performance, followed by the inevitable rationalization that they totally didn’t want to work there anyway.

Lastly, a quick shout-out to Statwing and Plotly for making terrific data analysis and graphing tools respectively.

[1] There are only 254 interviews represented here because not all interviews in our data set had comprehensive, mutual feedback. Moreover, we realize that raw scores don’t tell the whole story and will be focusing on standardization of these scores and the resulting rat’s nest in our next post. That said, though interviewer strictness does vary, we gate interviewers pretty heavily based on their background and experience, so the overall bar is high and comparable to what you’d find at a good company in the wild.

[2] Here we are referring to linear regression, and though we tried fitting a number of different curves to the data, they all sucked.

[3] In our data, people were 3 times less likely to want to work with their interviewers when they thought they did poorly.

Aline Lerner

Posted on 11:53 am December 17, 2015.

This is a bunch of nonsense. I disagree with this author at a fundamental level. If resumes “suck,” solving algorithms also doesn’t reflect how well a candidate can perform in the job. I am a VP of Engineering at a startup in the Bay Area, and I have interviewed a lot of candidates who are very good at algorithms but fail to succeed in my company.

Posted on 8:43 pm December 15, 2015.

Could you break out the correlation between candidate self-assessment and wanting to work with the interviewer by whether or not the interviewer would want to work with them, the interviewer’s rating of the candidate, or both?

Posted on 11:34 pm December 15, 2015.

Interesting article! I was wondering about that one really actionable component…

At some companies (especially larger ones, I imagine), the interviewer doesn’t have the final say even for a phone screen on whether someone will continue with the interview process. I’ve recommended to the committee that someone continue on, only to have them turned down anyway. So what is your advice for this situation, when giving genuine positive feedback immediately could result in a *really* awkward situation?

Posted on 12:07 am December 16, 2015.

I know that some hiring processes can be complicated and involve a lot of moving parts. In those cases, probably the best thing to do is to try to give feedback as quickly as possible. Even if it’s not immediate, I expect that if it’s done within a day or two, it’ll reach the candidate before they have a chance to do the rationalization song and dance.

I’d be interested to figure out exactly where the line of demarcation is for something like this, i.e. exactly when the rationalization window closes.

That aside, I’d be curious about why the interviewer doesn’t have the final say for a phone screen. It makes sense when there are multiple interviewers involved, as there might be for an onsite… but in the case where it’s just one interviewer, it seems like turnaround should be pretty quick.

Posted on 5:23 pm December 15, 2015.

As far as final say, there are two things. The first is that someone may be extremely personable and, people not being entirely rational, an interviewer might give the benefit of the doubt when the evidence of technical skills isn’t there. A more impartial person on a committee wouldn’t have that bias. The other is to maintain a standard, which can also swing the other way: someone may say no to a candidate and the committee may think the interviewer was too harsh.