This paper contrasts measures of teacher effectiveness with students' evaluations of the same teachers, using administrative data from Bocconi University (Italy). The effectiveness measures are estimated by comparing the subsequent performance in follow-on coursework of students who are randomly assigned to teachers in each of their compulsory courses. We find that, even in a setting where the syllabuses are fixed and all teachers in the same course present exactly the same material, teachers still matter substantially. The average difference in subsequent performance between students who were assigned to the best and the worst teachers (on the effectiveness scale) is approximately 43% of a standard deviation in the distribution of exam grades, corresponding to about 5.6% of the average grade. Additionally, we find that our measure of teacher effectiveness is negatively correlated with the students' evaluations: in other words, teachers who are associated with better subsequent performance receive worse evaluations from their students. We rationalize these results with a simple model in which teachers can engage either in real teaching or in teaching-to-the-test, the former requiring greater student effort than the latter. Teaching-to-the-test guarantees high grades in the current course but does not improve future outcomes. Hence, if students are myopic and give better evaluations to teachers from whom they derive higher utility in a static framework, the model predicts our empirical finding that good teachers receive bad evaluations, especially when teaching-to-the-test is very effective (for example, with multiple-choice tests). Consistent with the predictions of the model, we also find that classes in which high-skill students are over-represented produce evaluations that are less at odds with estimated teacher effectiveness.