Student evaluations

I remember, quite a few years ago, giving the same introductory logic course two years running, as far as I could tell doing as good a job each time. But my student evaluations plummeted between one year and the next. Why? I could only put it down to the fact that the first year I gave the course in relaxed casual dress; the next year (because a committee was scheduled the same afternoons) I wore a rather serious suit. So I supposedly came across as remote, unhelpful, and harder to understand.

I was reminded of that experience — which made me permanently a tad sceptical about the worth of student evaluations — when I read these two scepticism-reinforcing pieces*, by the philosophers Michael Huemer and Clark Glymour. I was particularly amused (in a world-weary sort of way) by this excerpt from the former:

[There was a] study, in which students were asked to rate instructors on a number of personality traits (e.g., “confident,” “dominant,” “optimistic,” etc.), on the basis of 30-second video clips, without audio, of the instructors lecturing. These ratings were found to be very good predictors of end-of-semester evaluations given by the instructors’ actual students. A composite of the personality trait ratings correlated .76 with end-of-term course evaluations; ratings of instructors’ “optimism” showed an impressive .84 correlation with end-of-term course evaluations. Thus, in order to predict with fair accuracy the ratings an instructor would get, it was not necessary to know anything of what the instructor said in class, the material the course covered, the readings, the assignments, the tests, etc.

Williams and Ceci conducted a related experiment. Professor Ceci, a veteran teacher of the Developmental Psychology course at Cornell, gave the course consecutively in both fall and spring semesters one year. In between the two semesters, he visited a media consultant for lessons on improving presentation style. Specifically, Professor Ceci was trained to modulate his tone of voice more and to use more hand gestures while speaking. He then proceeded, in the spring semester, to give almost the identical course (verified by checking recordings of his lectures from the fall), with the sole significant difference being the addition of hand gestures and variations in tone of voice (grading policy, textbook, office hours, tests, and even the basic demographic profile of the class remained the same). The result: student ratings for the spring semester were far higher, usually by more than one standard deviation, on all aspects of the course and the instructor. Even the textbook was rated higher by almost a full point on a scale from 1 to 5. Students in the spring semester believed they had learned far more (this rating increased from 2.93 to 4.05), even though, according to Ceci, they had not in fact learned any more, as measured by their test scores. Again, the conclusion seems to be that student ratings are heavily influenced by cosmetic factors that have no effect on student learning.

So now you know: bounce in optimistically, wave your hands around confidently, and you can sell the kids anything …

And I should say that these days I always wear a suit to lecture (so I’ve a cast-iron excuse for any poor evaluations, of course).

Added: For a bit of judicious balance, do read Richard Zach’s second contribution (Comment 12 below), and the linked paper.

*Links from Twitter, thanks to John Basl and Allen Stairs.

13 thoughts on “Student evaluations”

  1. That's interesting. I thought there was work showing that student evals are supposed to be higher if you dress better.

    We have the very weird practice (no doubt due to a desire to keep class sizes down) of offering the same course with the same instructor twice in the same term. So right now I teach intro logic twice in one day. In the past when I did this, evaluations were lower for the second course, even though both courses had exactly the same material, pacing, assignments, and tests. I chalked it up to being more engaged in the first section, while in the second section it might all have been a little too rehearsed. Maybe I just was tired, or got bored with lecturing since I had just done the same stuff. Maybe it was just that the students were tired (the second section was in the mid-afternoon).

  2. Yep, I bet smart casual trumps turning up in your beach wear: but maybe looking too much like an undertaker dampens the student spirits …

    More on sartorial impact. My then colleague Jenny Saul gave a first year intro to pol. phil. called "Justice and Gender", and got some pretty tough comments from students about being preached to by an American feminist. Next year, same lectures, she wore a skirt. Evaluations rocketed up.

  3. I like how Glymour's essay reveals that even when students learn *more* they may give lower evaluations.

    But I don't think these ideas show that a teacher should not for example care about gestures, tone of voice, or dress. It may be independent of learning but it is clearly not independent of ratings, which indicates to me that something is going on there with students' attitudes. Maybe they are happier or more entertained?

    It's true that the most important point is learning, but even if students learn exactly the same, and even if they mistakenly believe they have learned more, it is better that they enjoyed the course more, find the professor friendlier, or whatever the case may be.

  4. Alex: I agree, of course. Evaluations probably quite reliably reveal how enjoyable a course is felt to be. And that's just fine, so long as we are clear about what the evaluations tell us. But the rhetoric from university management that surrounds student evaluation of courses isn't usually about measuring enjoyment!

  5. "Specifically, Professor Ceci was trained to modulate his tone of voice more and to use more hand gestures while speaking. […] Students in the spring semester believed they had learned far more (this rating increased from 2.93 to 4.05), even though, according to Ceci, they had not in fact learned any more, as measured by their test scores."

    Interestingly, teacher gesturing can help learning, but the content/strategy of the gesture has to differ from the one present in the speech. So I'm guessing the professor's gesturing then either repeated the same information as the verbal lecture or was extraneous.

    Singer, M. A., & Goldin-Meadow, S. Children learn when their teachers’ gestures and speech differ.

    available here;

  6. Not to be overlooked either: the characteristics of the room. In a big class (225 +/-) my sense is that evaluations are lower if I'm on a platform above the students than if it's amphitheater-style and they are mostly above. Other colleagues have reported this as well.

  7. Interesting point, Allen. I usually have to give first year logic lectures in a room with a raised platform. I don't use it, though: I talk from the floor, which I feel works better, esp. as I make heavy use of the data-projector. That way there's more of an impression that we are following along together instead of me pontificating from on high.

  8. Only a philosopher would think that things like a modulated tone of voice (e.g., indicating importance and not boring your students to death) are irrelevant to learning…

  9. Ah, Elizabeth, but the interesting thing about the Williams/Ceci experiment is that the media-trained modulation and gesturing didn't make any difference to what had been learnt as far as output tests could determine, only differences to students' impressions of how much had been learnt.

  10. Peter: That's one interpretation. Another is that the students picked up on something in their own learning that wasn't reflected in their exams.

    My own experiences as a student make me suspect that both are at play.

  11. The best work done on the worth of student course evaluations can be found in Valen Johnson's book, Grade Inflation: A Crisis in College Education. He was a statistician at Duke when he wrote the book, and was concerned with grading patterns in the arts vs. the sciences, as it were, and that led to course evaluations.

    For humor, see my "How To Duck Out Of Teaching" in The Chronicle of Higher Education a few years back. Or my op ed "The Best Ever" in the Wilmington News Journal. Those on campus who take a serious interest in this topic will find no end of resistance, ignorance, and scorn awaiting them. I did. I have since retired from academia, early I might add.

    Note: you need to know statistics and be able to think critically about empirical and numerical claims if you are to speak at an adult level about this topic; that is, get past simple tables of percents and anecdotes. If you doubt this, try suggesting, to humanities faculty, that they put confidence intervals around their course evaluation ratings, for one thing. You will be met with resistance galore.

  12. I just found this interesting paper:

    Kulik, Student Ratings: Validity, Utility, and Controversy

    In the last part, he discusses the Williams and Ceci study. Some of the things pointed out there are also things that I thought to myself when I read that study. a) It's a study about one prof and one course. Not very representative. b) By the looks of it what happened is that first the students hated the course, and after Ceci received consultation on lecturing style, they no longer hated it. That seems like a good thing. c) Ceci had given that course for 20 years. It might just be that the course was optimal in all respects other than his lecturing style, so you wouldn't expect an increase in student grades on the exam. d) Grades on the final exam are not necessarily a reliable measure of student learning. And even after the consultation, the students didn't rate the course very highly on "test quality" (just below average; before it was "poor", and the evaluation item on which the course received the poorest grade). If it was a poorly designed exam (say, one that only requires rote memorization) you wouldn't expect a change in the exam results almost no matter what else you changed about the course. One thing pointed out in the study: it's a total outlier among similar studies, both in terms of the effect of receiving consultation on lecturing style on student evals, and (see the discussion of the Cohen meta-analysis) it is overall agreed that there is a moderate positive correlation between teaching evaluation results and exam results.

    I found the Williams & Ceci study here.

  13. Of possible interest:

    Bruce Weinberg, Masanori Hashimoto and Belton Fleisher, “Evaluating Teaching In Higher Education,” Journal of Economic Education (Summer 2009) 227-261.
    Abstract: The authors develop an original measure of learning in higher education based on grades in subsequent courses. Using this measure of learning they show that student evaluations are positively related to current grades but unrelated to learning once current grades are controlled. They offer evidence that the weak relationship between learning and student evaluations arises in part because students are unaware of how much they have learned in a course. They conclude with a discussion of easily implemented optimal methods for evaluating teaching.

    From the article:

    "This article departs from the previous work in three ways. First we use actual course grades rather than expected grades …. Second unlike most of the literature we measure grades using the average grade in a section rather than individual level grades for an individual level … Third … we are also the first to study how grades and learning are jointly related to evaluations. We use our analysis of the determinants of student evaluations to suggest improved methods for evaluating instructors."

    "The data show a strong positive relationship between student evaluations and both current grades and learning when these variables are included separately. But when these variables are included in the same model the current grade is related to student evaluations but learning is not. There are many potential explanations for these results including a variety of selection arguments. We devote considerable effort to five of them concluding that students tend to be unaware of how much they have learned in a class. There are no reasons to believe that the focus on current grades and uncertainty about learning is specific to economics or the institution studied and therefore we expect that our results are generalizable at least qualitatively."

    Earlier versions of the article are available at


    See also

    P. Isely, H. Singh, "Do higher grades lead to favorable student evaluations?" Journal of Economic Education (Winter 2005) 29-42.

    Martin Davies, Joe Hirschberg, Jenny Lye, Carol Johnston and Ian McDonald, "Systematic influences on teaching evaluations: The case for caution," Australian Economic Papers v46 n1 (March 2007) 18-38.

    Robert J. Youmans, Benjamin D. Jee, "Fudging the numbers: Distributing chocolate influences student evaluations of an undergraduate course." Teaching of Psychology v34 n4 (Fall 2007) 245-247.
    Abstract: Student evaluations provide important information about teaching effectiveness. Research has shown that student evaluations can be mediated by unintended aspects of a course. In this study, we examined whether an event unrelated to a course would increase student evaluations. Six discussion sections completed course evaluations administered by an independent experimenter. The experimenter offered chocolate to 3 sections before they completed the evaluations. Overall, students offered chocolate gave more positive evaluations than students not offered chocolate. This result highlights the need to standardize evaluation procedures to control for the influence of external factors on student evaluations.

    Dennis E. Clayson, “Student Evaluations of Teaching: Are They Related to What Students Learn? A Meta-Analysis and Review of the Literature,” Journal of Marketing Education, Vol. 31, No. 1, 16-30 (2009) DOI: 10.1177/0273475308324086
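
Comment 11 suggests putting confidence intervals around course evaluation ratings. As a rough illustration of what that might involve, here is a minimal Python sketch. It assumes a normal approximation to the sampling distribution of the mean, which is only a crude choice for small sections and bounded 1-to-5 scales. The function name `rating_confidence_interval` and the sample ratings are hypothetical, invented for this example, and do not come from any of the studies cited above.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def rating_confidence_interval(ratings, confidence=0.95):
    """Return (low, high) bounds for the mean rating.

    Uses a normal approximation (an assumption of this sketch); for a
    small section a t-based or bootstrap interval would be wider.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings to estimate spread")
    centre = mean(ratings)
    # Standard error of the mean: sample standard deviation over sqrt(n).
    standard_error = stdev(ratings) / sqrt(n)
    # Two-sided critical value, roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre - z * standard_error, centre + z * standard_error


# Hypothetical ratings on a 1-to-5 scale, invented for illustration only.
section_ratings = [4, 5, 3, 4, 4, 2, 5, 4, 3, 4]
low, high = rating_confidence_interval(section_ratings)
print(f"mean = {mean(section_ratings):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

With only ten responses the interval spans more than a full point on the scale, which is the sort of uncertainty a bare table of averages hides.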
