ChatGPT and Specifications Grading

Unsophisticated use of ChatGPT tends to produce generically poor essays: repetitive structure, little analysis, and stilted prose. Whether or not it's identifiable as AI, the reality is that an essay written that way is likely to earn a poor grade. When you receive a poorly written essay in which you suspect AI use, there are two typical paths:

  1. Pursue it as a case of suspected misconduct. You might run the essay through a detector to check for AI use, or ask the student to submit evidence of the work as it progressed through revisions. Detectors are notorious for producing false positives, though, and students who were acting in good faith (but simply have poor writing skills) can be caught up in this.
  2. Ignore the suspected use and simply grade the essay on its merits. It is likely to get a C, as Devon Cantwell-Chavez pointed out in a recent tweet, so how much energy do you want to spend trying to catch users out when the results are already poor?
Devon Cantwell-Chavez tweets on February 13, 2024 about her approach to grading assignments where ChatGPT use is suspected.

To this I wish to add a third path: use specifications grading. 

Specifications grading, first developed by Linda Nilson in her 2015 book, is an alternative grading system in which letter grades are awarded not for the percentage of correct responses but for achieving bundles of learning outcomes. Assignments are written as evaluations of one or more outcomes and graded pass/fail, with a 'pass' representing the equivalent of a B: not D-level work, but work of relatively high quality. In the task description, the instructor writes clear specifications of what students must do to earn that passing grade. This makes grading itself easier, since you are simply checking off whether each specification was met, and it ensures that students are judged on whether or not they achieve the desired outcome. An A in the class would then indicate achievement of all learning outcomes, or achievement of a certain level of depth within each outcome.
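To make the mechanics concrete, here is a minimal sketch (in Python) of how a bundle of pass/fail outcomes might map to a letter grade under such a system. The outcome names, bundle size, and grade thresholds are hypothetical illustrations, not taken from Nilson's book or any actual syllabus.

```python
# Minimal sketch of a specifications-grading bundle. The outcome names
# and grade thresholds below are hypothetical, for illustration only.

# Each assignment is evaluated pass/fail against its written specifications.
outcomes_passed = {
    "constructs_a_clear_argument": True,
    "supports_claims_with_credible_evidence": True,
    "writes_in_polished_prose": False,  # e.g., stilted AI-generated text fails here
}


def letter_grade(passed: dict) -> str:
    """Map a bundle of pass/fail outcomes to a letter grade."""
    achieved = sum(passed.values())
    total = len(passed)
    if achieved == total:
        return "A"  # all learning outcomes achieved
    if achieved >= total - 1:
        return "B"  # all but one outcome achieved
    if achieved >= total - 2:
        return "C"
    return "F"  # too few outcomes met; the work does not pass the bar


print(letter_grade(outcomes_passed))  # -> "B"
```

The point of the sketch is that each specification is a binary check rather than a partial-credit score, so the grader's job reduces to ticking boxes against the published criteria.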

It occurred to me after reading Devon's tweet that specifications grading has a built-in way of handling suspected ChatGPT use. It offers the benefits of Option 2, but results in a failing grade rather than a C. If you set the specifications for an essay assignment fairly high (B level), with writing quality and argumentation among the criteria, then most essays relying on generative AI will fail to pass. This is an efficient way to deal with suspected AI use: it avoids the misconduct process while still ensuring that the poor quality of the work is recognized. It essentially names the problematic outcome without worrying too much about how the student came to produce it. It also teaches students that while generative AI can get them something to turn in, the result won't be of high enough quality to pass, and so isn't worth their time.

Of course, if there are multiple reasons to suspect misconduct, there is every ethical reason to consider pursuing an academic misconduct case. But poor writing quality or a positive detector report alone are, in my view, insufficient grounds to pursue such a case. Adopting the specs grading approach is one way to avoid having to make that call.

For more on specifications grading, see Chad’s posts on the subject.

For more on ChatGPT responses, see Simon's original post from January 2023, as well as the many other posts we've made on the topic.