Regular readers of this blog know that I sometimes ponder the clarity of my assignment and exam prompts (some past posts on this subject are here, here, and here). Students sometimes don’t hit what, in my mind, the question targets, so I revise in the hopes of creating a prompt that is more transparent. But I don’t want prompts to be answerable with a Jeopardy-like regurgitation of facts. I want students to exert some cognitive effort to figure out how to apply concepts that are relevant to the question at hand.
Usually this situation occurs with my undergraduates, but I’m noticing it more frequently with master’s degree students. A recent example is an assignment from my graduate-level introduction to comparative politics course:
What do grades actually mean? I began pondering this question while designing a course for the fall semester. Theoretically a grade indicates the amount of knowledge or skill that a student possesses. But really? Those of us working in the USA are quite familiar with grade inflation. A final grade of C today probably doesn’t indicate the same level of knowledge or skill proficiency as the C from fifty years ago. There is also the persistent problem of knowing whether our assessment tools are measuring the kinds of learning that we think, or want, them to measure. And it is probably safe to assume that, both in and out of the classroom, there is a lot of learning happening that we just aren’t interested in trying to measure. The situation gets even more complex given that — again, in the USA — a “learning activity” often won’t function as intended if students believe that it has no discernible effect on their course grades.
I structure my syllabi so that the sum total of points available from all assessed work is greater than what is needed for any particular final grade. For example, a student might need to accumulate at least 950 points over the semester for an A, but there could be 1,040 points available. I do this to deliberately create wiggle room for students — with so many assignments, students don’t need to get perfect scores on, or complete, all of them. While this leads to higher grades in my courses than if I graded strictly on a bell curve, I want to give students plenty of opportunities to practice, fail, and improve. And I firmly believe that sloppy writing indicates sloppy thinking, while good writing indicates the converse. So in reality what I’m doing with most of my assignments is evaluating the writing abilities of my students.
This system often produces a bimodal grade distribution that is skewed to the right. Expend a lot of effort and demonstrate a certain level of proficiency, and you will get a grade somewhere between an A and a B-. Choose not to expend the effort, or consistently demonstrate an inability to perform at a minimum level, and you will get a D or an F. I’m comfortable with this result, in part because I know from the cognitive science research on learning that repeated exposure and frequent testing build long-term memory.
This leads me to the reason for doubting that grades in my courses mean the same thing as they do in courses where the only assessment is done through mid-term and final exams composed of multiple-choice questions. Yes, the proportion of A’s in the latter might be lower than in the former, but I bet that on average my students are retaining more. At least I like to think that’s the case. There is no way for me to be sure.
The final exam for this course last year asked each student to write an economic rationale in support of one of two policy options, using information from course readings as evidence. Generally, students did not do well on the exam, mainly because they did not discuss applicable concepts like moral hazard and discounting the future. These concepts were found in several course readings and discussed in class. While I didn’t explicitly mention these concepts in the exam prompt, the benefits of including them in the rationale should have been obvious given course content.
Now I’m thinking of a question like this for the final exam:
What has a greater influence on economic development in Egypt: law (institutions) or geography (luck)? Why?
In your answer, reference the items below and relevant course readings listed in the syllabus:
Now that I’m done with hours upon hours of post-semester meetings and painting my house’s front door, I can comment on Simon’s recent post about open-book exams.
One’s choice of exam format reflects two questions that are often in conflict. Will the exam give students a reasonable chance of demonstrating whether they have acquired the knowledge that they were supposed to have acquired? Can the instructor accurately, impartially, and practically assess the exam results? For example . . .
Oral exam: great for testing exegetical ability on the fly, but extremely tiresome and unfeasible if the instructor is teaching more than a couple dozen students in a course.
Multiple choice questions: very easy for the instructor to grade, minimizes student complaints, but encourages binge and purge memorization.
The timed essay exam, whether open- or closed-book: also tiresome to grade, often susceptible to instructor bias, and, perhaps most importantly, reinforces the unproductive notion that writing (and thus thinking) does not need to be a careful, deliberative process.
How does all this affect me? Over the years I have moved away from formal exams and toward a final culminating assignment — such as a take-home essay question that I reveal in the last week of the semester — intended to test how well students are able to apply concepts to an unfamiliar situation. But lately I’ve become disenchanted with this format, too.
Simon’s post prompted me to think back to my own days as a student. Exams in physics, mathematics, and engineering consisted of, essentially, solving a variety of puzzles — full marks required both supplying the correct solution and documenting how one arrived at it. The primary ability being tested was concept application. One prepared for these exams by working on practice puzzles involving the same concepts. Courses in political science, history, and whatnot had timed essay exams. To prepare for these, I would guess at likely questions and create outlines of essays that answered these questions. I would repeatedly hand-write an outline to memorize it, then turn it into prose during the exam. Even if my guesses weren’t entirely accurate, they were often close enough for the outlines to be very useful.
I’m now wondering if there is a way to turn the outline creation process into the equivalent of an exam. Something that gets graded, but not as part of a scaffolded research paper assignment.
We’re just about getting to the end of semester’s block of teaching weeks, so my attention is turning to final assessment once again.
With my first-years I’ve inherited a module on the EU that used to be mine some time ago and for which I’ve stuck to the assessment regime through curiosity as much as anything else.
As I’ve discussed elsewhere here, we’re piloting our new computer-based assessment system on the module, so I was keen to see how that changed things. Much of my attention in that regard has been to do with the coursework, but we’re also doing the final exam on it too.
It turns out that this is an excellent opportunity for me to get into open-book exams.
My student memory of these is of watching law students cart a dozen or more lever-arch files (ask your parents) into an exam hall, usually with at least one person having the entire thing spill out across the corridor outside or (on one tremendous occasion) across a busy street and towards a nearby canal.
Happy days. But not very enticing.
But because so much of the work has moved online, not least the exam itself, this seems like a good moment to revisit the format.
For those who’ve not encountered it before, an open-book exam is simply one where you can bring and use any materials you like during the exam period. The idea is that it’s much more like a situation you might encounter in real-life than sitting in a bare room, answering questions you’ve hopefully prepared for, but using only what you can haul from the back of your mind.
The reason it’s not been so popular has been a mix of the aforementioned mess, the fear that students will just copy out other people’s work, and the vague air that it’s not ‘right’.
Of course, I’m a big believer in changing what you do when situations change, so why not try an open-book format?
It helps that the system can still detect plagiarism (final submissions are run through the usual software), and that it can flag when a student suddenly dumps several hundred words at once.
Moreover, giving an open-book exam removes any feeling that students’ factual errors need accommodating: sympathy will be left to one side should I meet anyone who tries to tell me about the Council of Europe leading the EU.
Of course, an open-book exam – while superficially attractive to students – is a big bear-trap. The temptation to ‘go check something’ will be very high, taking time away from actually writing an answer to the question asked. As those law students used to discover (when we talked to them on our way to the bar), it’s one thing to have access to lots of information, but quite another if you don’t know how to find the right information.
So, we’ll see. My impression so far has been that a lot of my students haven’t really clocked the different issues involved. If nothing else, if they’re relying on my flipped lectures as much as I think they are, then they’ll discover rather quickly that those are in possibly the least-helpful format for an exam.
I’m still enough of a kid to be excited to see the place I work at mentioned in the news, especially if it’s in an outlet my mum might see.
Of course, it’d be better if the context of this particular mention were different, but I guess you can’t have it all.
This all comes off the back of the on-going debate in government about grade inflation.
I wrote about all this last summer, and I’m not sure my thinking has moved on much since, except to note the shift in framing towards combating ‘artificial’ grade inflation.
While this might seem to start to take account of the other factors at play, what it singularly doesn’t do is set out a means of calculating this in practice.
Obviously, there are changes in student characteristics that have a direct bearing and these are relatively simple to capture: socio-economic status; entry grades; progressive performance in each year of study.
However, there are also obviously changes in the teaching environment: staffing changes; changes in pedagogic approach; changing curricula (we’ve made our final year dissertation optional this year, for example); changing provision of learning resources outside the degree programme, at the library or in welfare; changes in programme regulations.
Nikita Minin of Masaryk University is motivated by a goal we can all appreciate: ensuring that his students achieve the learning outcomes of his course. In his case, the course is a graduate seminar on theories of IR and energy security and the learning outcomes include improving student skills in critical thinking and writing. He noticed that students in his class did not seem to really improve on these skills during the class, and introduced three teaching interventions in an attempt to fix this.
First, Minin provided more intense instruction on the writing assignments at the start of the course, providing a grading rubric and examples of successful student work. Second, he gave students audio rather than written feedback on their papers. Finally, using a sequential assessment system, the instructor gave formative feedback first and grades much later in the course. Minin assessed the impact of these three interventions, comparing course sections with and without them, and concluded that the first two interventions achieved the objective of improving student achievement of the learning outcomes.
The interventions described in the chapter are in line with current thinking on in-course assessment. While Minin does not use the language of transparent teaching, his first intervention falls exactly in line with the Transparency in Learning and Teaching (TILT) project’s approach. Transparency calls on instructors to openly communicate the purpose of an assignment, the tasks students are to complete, and the criteria for success, and Minin does exactly that in this first intervention. Given the data so far on the TILT project, it is not surprising that Minin saw some success by taking this approach. Likewise, now-ubiquitous learning management systems allow for giving feedback in multiple formats, including audio and video. For years now, advocates of audio-based feedback have claimed that it can be a more effective tool than written feedback. Minin’s observations, therefore, also fit nicely with existing work.
Where the chapter falls short, then, is not in the design of its interventions, but in the claims made based on the available data. The sample sizes are tiny, with just five students receiving the interventions. With final grades used as the primary dependent variable, it is difficult to tease out the independent impact of each of the three changes. Using final grades is also an issue when the experimenter is also the person who assigns grades, as it is more difficult to avoid bias than when more objective or blind measures are used. Lang’s (2016) book Small Teaching: Everyday Lessons from the Science of Learning tells us that engaging in self-reflection is itself an intervention, and Minin’s use of minute-paper-style self-reflections to assess the impact of feedback, while itself an interesting and potentially useful idea, means that a fourth intervention was used in the course. While I do not doubt Minin’s observations that his interventions had a positive impact, as they are backed by existing research, the evidence in the chapter does not strongly advance our confidence in those findings.
However, I have never been one to dismiss good teaching ideas simply because of a lack of strong evidence from a particular instructor. Minin highlights a crucial concern—that we should never assume that our courses are teaching what we intend them to teach, and that ‘time and effort’ do not necessarily achieve the desired results, even for graduate students. Reflecting on this, seeking out innovative solutions, and then assessing the impact is a process we should all be following, and Minin sets a great example.
My first-year module this semester has been a real training ground for me. Not only am I going all-in on flipping, but I’m also trialling the new assessment software that the University is thinking of using.
By extension, that also means it’s a training ground for my students, something that I’ve been very open about with them.
The flipping seems to be working and I’ll be writing up my thoughts on that later in the semester, but having come through the first use of the software I need to make some decisions now.
In part, my situation arises from wanting to push how we used the software past a conventional approach. Not only did students submit a literature review to it, but they then had to review someone else’s using the system, all in aid of a final piece of self-reflection (which we’re marking now).
Using the marking function is a bit more involved than just submitting work and a couple of people did get a bit lost on that. But the bigger problem was that not everyone submitted work.
In the good old days (i.e. last year and before) we did all this in-class, so it was much simpler to cover (the exceptionally few) missing pieces. However, because we’d pre-selected peer reviewers, we ended up with some students having nothing to review and others not getting their work reviewed.
That’s a failing on my part: next time, I’d leave allocation until after the first submission was in, so everyone who submitted got allocated and reviewed.
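That ‘allocate after the deadline’ fix is straightforward to automate. As a minimal sketch (in Python, purely illustrative — the function name and details are mine, not anything from the actual assessment software): shuffle the students who actually submitted, then have each one review the next person in the cycle, so nobody reviews their own work and everyone who submitted both gives and receives a review.

```python
import random

def allocate_reviewers(submitters, seed=None):
    """Assign each submitting student another submitter's work to review.

    Shuffling the pool and then pairing each student with the next one
    in the shuffled order (a cyclic shift) guarantees that no one is
    assigned their own submission, and that every submitter both gives
    and receives exactly one review.
    """
    if len(submitters) < 2:
        raise ValueError("Peer review needs at least two submissions")
    pool = list(submitters)
    random.Random(seed).shuffle(pool)  # seed only for reproducibility
    # Map: student -> the peer whose work they will review
    return {pool[i]: pool[(i + 1) % len(pool)] for i in range(len(pool))}
```

Because the allocation runs only over students who have actually submitted, late or missing work simply shrinks the pool rather than leaving anyone unpaired.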
But that’s next time. What about now?
Already, I’ve indicated to everyone that not getting peer feedback won’t count against them in marking, but a couple of students have felt that absent such comments they’re not in a position to complete the self-reflection.
To that, I’ve had to underline that it’s self-reflection, so peer feedback was only ever one component of that: indeed, the whole purpose of the somewhat-convoluted exercise is to get students becoming more independent and critical about their learning.
All that said, peer review was added in here to help prompt everyone to think more about what they’ve done and what they could do.
As we sit down to mark, the question will be how much we can, and should, take the circumstances into account. Until we’ve seen the full range of work, that’s going to be a tricky call to make.
However, it all highlights an important point in such situations: do we have fall-backs?
Trying new things is inherently risky – that’s why many colleagues stick with what they know – but with some risk management, that need not be a barrier to moving practice forward.
Annoying though our situation here is, it’s not fatally compromising to the endeavour: we know who’s affected and how; they’re still able to submit work; and the assessment is relatively small in the overall scheme of things.
Yes, we’ll be using the system again for the final exam, but without the aspects that have proved problematic. Indeed, the exam has already been trialled elsewhere in the University, so that’s well-understood.
So, on balance, I feel comfortable that we can manage the situation and implement the necessary changes next time around to remove the problems identified.
Which is, of course, a big part of the reason for trying it out in the first place.
For reasons best known to others, it’s the end of our first semester here, so that means coursework grades are going back to students.

I was even more interested than usual in this event this time around because something unusual happened with my class: they came to talk with me about their assessment.

I know that might seem mundane, but despite my best efforts my office hours have often resembled one of the remoter oases in a desert: potentially of use, but rarely visited by anyone.

I’d love to tell you what was different this semester, but I genuinely have no idea: I did the things I usually did, so maybe it was a cohort effect. Or not.

In any case, I reckon I sat down for discussions with most of the students and emailed with several others. In those exchanges we typically covered both generic guidance on what was required and specific discussion of students’ plans.
Of course, the big question is whether that helped the students to do better.

At this point, I’ll note that my class had about 35 students and it’s a one-off event so far, so I’m alive to not over-reading the outcomes. Against that, the marking has been confirmed by the second marker.

That said, the main positive outcome was that the bottom half of the class moved up quite markedly. In previous years, I’ve always had a cluster of students who simply didn’t ‘get’ the assessment – a reflective essay – and thus came out with poor marks. This time, I had only a couple of students in that situation, and they appeared (from my records) to have not attended most of the classes, and hadn’t come to talk.

Put differently, the tail was severely trimmed and the large bulk of students secured a decent grade.
What didn’t appear to happen was an overall shift upwards, though: the top end remained where it had been previously.

Again, I’m not sure why this might be. Without another cohort, I’m not even sure if my guidance actually did anything for anyone.

Quite aside from the specific instance, it does underline for me how little we know about the ways in which our teaching practice does and doesn’t impact on student learning.

In this case, I don’t really know how one could ethically test the impact of formative feedback and support, given the multiple variables at play. If you have an idea, I’d love to hear it.
One last post about teaching my redesigned course on development last semester:
Is the ability to follow directions what distinguishes the excellent from the average student?
Writing assignments in my courses require students to synthesize information from a variety of source material into a single, cohesive argument. Exams are no different. My instructions for the final exam included “refer to relevant course readings” and “see the rubric below for guidance on how your work will be evaluated.” The rubric contained the criterion “use of a variety of relevant course readings.”
I assumed that these statements would translate in students’ minds as “my exam grade will suffer tremendously if I don’t reference any of the course readings.” Yet nine of the fifteen students who took the exam did not use any readings, despite having written about them earlier in the semester. Four others only referred to a single reading. Only two students incorporated information from several different readings.
Maybe I’m wrong, but I don’t think I’m at fault here.