Just What Is Your Best Exam Format?

Now that I’m done with hours upon hours of post-semester meetings and painting my house’s front door, I can comment on Simon’s recent post about open-book exams.

Abandon all hope, ye who enter here.

One’s choice of exam format reflects two questions that are often in conflict. Will the exam give students a reasonable chance of demonstrating whether they have acquired the knowledge that they were supposed to have acquired? Can the instructor accurately, impartially, and practically assess the exam results? For example . . .

  • Oral exam: great for testing exegetical ability on the fly, but extremely tiresome and unfeasible if the instructor is teaching more than a couple dozen students in a course.
  • Multiple choice questions: very easy for the instructor to grade, minimizes student complaints, but encourages binge and purge memorization.
  • The timed essay exam, whether open- or closed-book: also tiresome to grade, often susceptible to instructor bias, and, perhaps most importantly, reinforces the unproductive notion that writing (and thus thinking) does not need to be a careful, deliberative process.

How does all this affect me? Over the years I have moved away from formal exams and toward a final culminating assignment — such as a take-home essay question that I reveal in the last week of the semester — intended to test how well students are able to apply concepts to an unfamiliar situation. But lately I’ve become disenchanted with this format, too.

Simon’s post prompted me to think back to my own days as a student. Exams in physics, mathematics, and engineering consisted of, essentially, solving a variety of puzzles — full marks required both supplying the correct solution and documenting how one arrived at it. The primary ability being tested was concept application. One prepared for these exams by working on practice puzzles involving the same concepts. Courses in political science, history, and whatnot had timed essay exams. To prepare for these, I would guess at likely questions and create outlines of essays that answered these questions. I would repeatedly hand-write an outline to memorize it, then turn it into prose during the exam. Even if my guesses weren’t entirely accurate, they were often close enough for the outlines to be very useful.

I’m now wondering if there is a way to turn the outline creation process into the equivalent of an exam. Something that gets graded, but not as part of a scaffolded research paper assignment.

Opening the book on exams

We’re just about at the end of the semester’s block of teaching weeks, so my attention is turning to final assessment once again.

Let’s take it back, let’s take it back, let’s take it back to the Law School…

With my first-years I’ve inherited a module on the EU that used to be mine some time ago, and I’ve stuck with its assessment regime out of curiosity as much as anything else.

As I’ve discussed elsewhere here, we’re piloting our new computer-based assessment system on the module, so I was keen to see how that changed things. Much of my attention in that regard has been to do with the coursework, but we’re also doing the final exam on it too.

It turns out that this is an excellent opportunity for me to get into open-book exams.

My student memory of these is of watching law students cart a dozen or more lever-arch files (ask your parents) into an exam hall, usually with at least one person having the entire lot spill out across the corridor outside or (on one tremendous occasion) across a busy street and towards a nearby canal.

Happy days. But not very enticing.

But because so much of the work has moved online, not least the exam itself, this seems like a good moment to revisit the format.

For those who’ve not encountered it before, an open-book exam is simply one where you can bring and use any materials you like during the exam period. The idea is that it’s much more like a situation you might encounter in real life than sitting in a bare room, answering questions you’ve hopefully prepared for, using only what you can haul from the back of your mind.

The reason it’s not been so popular has been a mix of the aforementioned mess, the fear that students will just copy out other people’s work, and the vague air that it’s not ‘right’.

Of course, I’m a big believer in changing what you do when situations change, so why not try an open-book format?

It helps that the system can still detect plagiarism (final submissions are run through the usual software), and that it can flag when a student suddenly dumps several hundred words at once.

Moreover, an open-book exam removes any inclination to cut students slack on factual errors: my lovely mnemonics will be set to one side should I meet anyone who tries to tell me about the Council of Europe’s role in leading the EU.

Of course, an open-book exam – while superficially attractive to students – is a big bear-trap. The temptation to ‘go check something’ will be very high, taking time away from actually writing an answer to the question asked. As those law students used to discover (when we talked to them on our way to the bar), it’s one thing to have access to lots of information, but quite another if you don’t know how to find the right information.

So, we’ll see. My impression so far has been that a lot of my students haven’t really clocked the different issues involved. If nothing else, if they’re relying on my flipped lectures as much as I think they are, then they’ll discover rather quickly that those are in possibly the least-helpful format for an exam.

Let’s hope those lecture notes are in good order.

It’s not me, it’s you: framing grade-inflation

I’m still enough of a kid to be excited to see the place I work at mentioned in the news, especially if it’s in an outlet my mum might see.

Of course, it’d be better if the context of this particular mention were different, but I guess you can’t have it all.

This all comes off the back of the on-going debate in government about grade inflation.

I wrote about all this last summer, and I’m not sure I’ve gotten much further in my thinking about this, except to note the shift in framing to combating ‘artificial’ grade inflation.

While this might seem to start to take account of the other factors at play, what it singularly doesn’t do is set out a means of calculating this in practice.

Obviously, there are changes in student characteristics that have a direct bearing and these are relatively simple to capture: socio-economic status; entry grades; progressive performance in each year of study.

However, there are also obviously changes in the teaching environment: staffing changes; changes in pedagogic approach; changing curricula (we’ve made our final-year dissertation optional this year, for example); changing provision of learning resources outside the degree programme, at the library or in welfare; changes in programme regulations.
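To illustrate the sort of calculation that’s missing, here’s a minimal and entirely hypothetical sketch: regress the share of ‘good’ degrees on a year index plus the observable cohort characteristics mentioned above, and read whatever year-on-year drift is left over as the ‘unexplained’, possibly ‘artificial’, component. The figures and variables below are invented for illustration, and the sketch says nothing about the teaching-environment changes just listed, which is rather the point.

    # Hypothetical sketch only: invented numbers, not real cohort data.
    import numpy as np

    # One row per cohort: [year index, mean entry tariff, share of low-SES intake].
    X = np.array([
        [0, 118.0, 0.22],
        [1, 120.5, 0.23],
        [2, 121.0, 0.24],
        [3, 123.5, 0.25],
        [4, 124.0, 0.27],
        [5, 126.0, 0.27],
    ])
    # Outcome: share of 'good' degrees awarded to that cohort.
    y = np.array([0.61, 0.64, 0.66, 0.70, 0.71, 0.74])

    # Ordinary least squares with an intercept.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)

    # coef[1] is the year-on-year drift in 'good' degrees left over after
    # controlling for entry grades and intake mix: one crude candidate for
    # the 'artificial' component.
    print(f"Residual annual drift in share of good degrees: {coef[1]:+.3f}")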


Audio Feedback and Transparency as Teaching Interventions

This is a review of “Enhancing formative assessment as the way of boosting students’ performance and achieving learning outcomes.” Chapter 8 of Early Career Academics’ Reflections on Learning to Teach in Central Europe, by Nikita Minin, Masaryk University.

Nikita Minin of Masaryk University is motivated by a goal we can all appreciate: ensuring that his students achieve the learning outcomes of his course.  In his case, the course is a graduate seminar on theories of IR and energy security and the learning outcomes include improving student skills in critical thinking and writing.  He noticed that students in his class did not seem to really improve on these skills during the class, and introduced three teaching interventions in an attempt to fix this. 

First, Minin provided more intense instruction on the writing assignments at the start of the course, providing a grading rubric and examples of successful student work. Second, he gave students audio rather than written feedback on their papers.  Finally, using a sequential assessment system, the instructor gave formative feedback first and grades much later in the course. Minin assessed the impact of these three interventions, comparing course sections with and without them, and concluded that the first two interventions achieved the objective of improving student achievement of the learning outcomes.

The interventions described in the chapter are in line with current thinking regarding in-course assessment. While Minin does not use the language of transparent teaching, his first intervention falls exactly in line with the approach of the Transparency in Learning and Teaching (TILT) project. Transparency calls on instructors to openly communicate the purpose of an assignment, the tasks students are to complete, and the criteria for success, and Minin does exactly that in this first intervention. Given the data so far on the TILT project, it is not surprising that Minin saw some success by taking this approach. Likewise, now-ubiquitous learning management systems allow for giving feedback in multiple formats, including audio and video. For years now, advocates of audio-based feedback have claimed that it can be a more effective tool than written feedback. Minin’s observations, therefore, also fit nicely with existing work.

Where the chapter falls short, then, is not in the design of its interventions, but in the claims made based on the available data. The sample sizes are tiny, with just five students receiving the interventions. With final grades used as the primary dependent variable, it is difficult to tease out the independent impact of each of the three changes. Using final grades is also an issue when the experimenter is also the person who assigns them, as bias is harder to avoid than when more objective or blinded measures are used. Lang’s (2016) book Small Teaching: Everyday Lessons from the Science of Learning tells us that engaging in self-reflection is itself an intervention, and Minin’s use of minute-paper-style self-reflections to assess the impact of feedback, while an interesting and potentially useful idea in its own right, means that a fourth intervention was used in the course. While I do not doubt Minin’s observations that his interventions had a positive impact, as they are backed by existing research, the evidence in the chapter does not strongly advance our confidence in those findings.

However, I have never been one to dismiss good teaching ideas simply because of a lack of strong evidence from a particular instructor.  Minin highlights a crucial concern—that we should never assume that our courses are teaching what we intend them to teach, and that ‘time and effort’ do not necessarily achieve the desired results, even for graduate students. Reflecting on this, seeking out innovative solutions, and then assessing the impact is a process we should all be following, and Minin sets a great example.

Do Guinea Pigs need slack?

My first-year module this semester has been a real training ground for me. Not only am I going all-in on flipping, but I’m also trialing the new assessment software that the University is thinking of using.

Something like this

By extension, that also means it’s a training ground for my students, something that I’ve been very open about with them.

The flipping seems to be working and I’ll be writing up my thoughts on that later in the semester, but having come through the first use of the software, I need to make some decisions now.

In part, my situation arises from wanting to push our use of the software beyond a conventional approach. Not only did students submit a literature review to it, but they then had to review someone else’s through the system, all in aid of a final piece of self-reflection (which we’re marking now).

Using the marking function is a bit more involved than just submitting work and a couple of people did get a bit lost on that. But the bigger problem was that not everyone submitted work.

In the good old days (i.e. last year and before) we did all this in-class, so it was much simpler to cover (the exceptionally few) missing pieces. However, because we’d pre-selected peer reviewers, we ended up with some students having nothing to review and others not getting their work reviewed.

That’s a failing on my part: next time, I’d leave allocation until after the first submission was in, so everyone who submitted got allocated and reviewed.
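To make that concrete, here’s a minimal sketch of how post-submission allocation could work. The names are invented and any real assessment platform presumably has its own allocation tools; the logic is just a shuffled ring of the students who actually submitted.

    # Hypothetical sketch: allocate peer reviewers only among students who
    # have actually submitted, so nobody reviews a missing piece and nobody
    # who submitted goes unreviewed. Names are invented.
    import random

    submitters = ["Aiyana", "Ben", "Chiara", "Dev", "Elif"]  # work already in the system

    random.shuffle(submitters)
    # Circular allocation: each submitter reviews the next person in the
    # shuffled ring, so everyone gives and receives exactly one review.
    allocation = {
        reviewer: submitters[(i + 1) % len(submitters)]
        for i, reviewer in enumerate(submitters)
    }

    for reviewer, author in allocation.items():
        print(f"{reviewer} reviews {author}'s literature review")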

But that’s next time. What about now?

Already, I’ve indicated to everyone that not getting peer feedback won’t count against them in marking, but a couple of students have felt that absent such comments they’re not in a position to complete the self-reflection.

To that, I’ve had to underline that it’s self-reflection, so peer feedback was only ever one component: indeed, the whole purpose of the somewhat-convoluted exercise is to get students to become more independent and critical about their learning.

All that said, peer review was added in here to help prompt everyone to think more about what they’ve done and what they could do.

As we sit down to mark, the question will be how much we can, and should, take the circumstances into account. Until we’ve seen the full range of work, that’s going to be a tricky call to make.

However, it all highlights an important point in such situations: do we have fall-backs?

Trying new things is inherently risky – that’s why many colleagues stick with what they know – but with some risk management, that need not be a barrier to moving practice forward.

Annoying though our situation here is, it’s not fatally compromising to the endeavour: we know who’s affected and how; they’re still able to submit work; and the assessment is relatively small in the overall scheme of things.

Yes, we’ll be using the system again for the final exam, but without the aspects that have proved problematic. Indeed, the exam has already been trialled elsewhere in the University, so that’s well-understood.

So, on balance, I feel comfortable that we can manage the situation and implement the necessary changes next time around to remove the problems identified.

Which is, of course, a big part of the reason for trying it out in the first place.

From formative feedback to assessment outcomes

For reasons best known to others, it’s the end of our first semester here, so that means coursework grades are going back to students.

I was even more interested than usual in this event this time around because something unusual happened with my class: they came to talk with me about their assessment.

I know that might seem mundane, but despite my best efforts my office hours have often resembled one of the remoter oases in a desert: potentially of use, but rarely visited by anyone.

I’d love to tell you what was different this semester, but I genuinely have no idea: I did the things I usually did, so maybe it was a cohort effect. Or not.

In any case, I reckon I sat down for discussions with most of the students and emailed with several others. In those exchanges we typically covered both generic guidance on what was required and specific discussion on students’ plans.

Of course, the big question is whether that helped the students to do better.

At this point, I’ll note that my class had about 35 students and it’s a one-off event so far, so I’m alive to not over-reading the outcomes. Against that, the marking has been confirmed by the second marker.

That said, the main positive outcome was that the bottom half of the class moved up quite markedly. In previous years, I’ve always had a cluster of students who simply didn’t ‘get’ the assessment – a reflective essay – and thus came out with poor marks. This time, I had only a couple of students in that situation, and they appeared (from my records) to have not attended most of the classes, and hadn’t come to talk.

Put differently, the tail was severely trimmed and the large bulk of students secured a decent grade.

What didn’t appear to happen was an overall shift upwards, though: the top end remained where it had been previously.

Again, I’m not sure why this might be. Without another cohort, I’m not even sure whether my guidance actually made any difference at all.

Quite aside from the specific instance, it does underline for me how little we know about the ways in which our teaching practice does and doesn’t impact on student learning.

In this case, I don’t really know how one could ethically test the impact of formative feedback and support, given the multiple variables at play. If you have an idea, I’d love to hear it.

The Difference Between Good and Bad?

One last post about teaching my redesigned course on development last semester:

Is the ability to follow directions what distinguishes the excellent from the average student?

Writing assignments in my courses require students to synthesize information from a variety of source material into a single, cohesive argument. Exams are no different. My instructions for the final exam included “refer to relevant course readings” and “see the rubric below for guidance on how your work will be evaluated.” The rubric contained the criterion “use of a variety of relevant course readings.”

I assumed that these statements would translate in students’ minds as “my exam grade will suffer tremendously if I don’t reference any of the course readings.” Yet nine of the fifteen students who took the exam did not use any readings, despite having written about them earlier in the semester. Four others only referred to a single reading. Only two students incorporated information from several different readings.  

Maybe I’m wrong, but I don’t think I’m at fault here.



To Quiz or Not to Quiz, Part 3

Some final thoughts on adding in-class quizzes to my course on economic development:

For six of the nine quizzes administered so far, students answered only half of the questions correctly. Given the results of my survey on students’ study habits, I am increasingly convinced that the problem of transfer is contributing to their poor performance. Perhaps I should create a series of real world-based practice exercises for next year’s iteration of this course. These exercises could be an additional connection to the reading assignments.

Even though each quiz has a maximum of four questions, the quiz-taking eats up a significant amount of classroom time. Perhaps I should impose a time limit. If I put the quizzes online for completion outside of class, students will be able to search for correct answers, which defeats my purpose of testing recall to strengthen memory.

The quizzes have helped me identify what students still don’t know. Reviewing questions in class after grading each quiz might have helped students better understand the concepts that they had been tested on. But the final exam that I created for the course (Part 8 below) will allow me to only indirectly infer whether this occurred. Maybe next year I should repeat some of the same questions across multiple quizzes, or introduce summative exams, to get a better idea of whether students are in fact learning what they are being quizzed about.


(Trans)formative Assessment in Teaching

Today I’m attending ISA’s inaugural Innovative Pedagogy Conference in St. Louis. Victor and I are doing a workshop on using games and simulations to teach political violence, showcasing activities like Survive or Die!, Prisoner’s Dilemma, Model Diplomacy, an identity exercise, and others. But I’m most interested in reflecting on the session offered by Matthew Krain and Kent Kille of the College of Wooster on Assessment and Effectiveness in Active Teaching in International Studies. Their framework for formative assessment (which can, in fact, be transformative) is very helpful as an overall approach to teaching.


The price of failure


After last week’s class discussion about participation, I decided to run an exercise that made it really easy to show the marginal benefit of preparation.

I told students to prepare for a meeting about putting together an agenda for another negotiation, and gave them all specific roles, plus some rules of procedure.

(For those who are looking for Brexit sims, this was a Council working group, putting together an agenda for the Commission to take to the UK to discuss the Political Declaration).

Because it was about formulating an agenda, I hoped that students would see they didn’t need to get too deeply into substantive positions, as long as they could frame the general areas to be covered.

Plus, by giving clear roles and rules, I incentivised everyone to push out their own draft agendas prior to the meeting. In so doing, I hoped they’d see that even a small amount of preparation could have big effects.

Um

Obviously, it didn’t turn out that way.
