Algorithms and Predicting Academic Success

I’ve been reading Thinking, Fast and Slow, by Daniel Kahneman, who won the 2002 Nobel Prize in Economic Sciences for his work on the psychology of decision making.

Based on what he writes about the accuracy of clinical versus statistical predictions, I’m wondering if my university should employ an algorithm to determine which incoming students are most likely to suffer severe academic problems, and direct resources only at the students the algorithm identifies as most at risk.

Like other universities in the USA, mine is worried about student retention, academic progress, and graduation rates, and increasing amounts of staff and faculty time are being devoted to making sure that Jeremy doesn’t fall through the institutional cracks. The result is a combination of blanket intervention strategies (every first-year student takes a course on college-related life management skills) and individual ones (a professor or staff member has a hunch that a particular student might not return next semester and decides to warn others).

From a statistical point of view, there are serious problems with this approach. Requiring every student to take an orientation course amounts to saying, “we don’t know which students are most likely to drop out, so all of them have to be treated.” And faculty and staff, however well trained in advising, must decide in isolation whether to raise the alarm about a particular student, unaware of the information contained in the overall data set. That is an unreliable way to make decisions, because people put more faith in their own judgment than they should. According to Kahneman,

“Those who know more forecast very slightly better than those who know less. But those with the most knowledge are often less reliable. The reason is that the person who acquires more knowledge develops an enhanced illusion of her skill and becomes unrealistically overconfident . . . To maximize predictive accuracy, final decisions should be left to formulas, especially in low-validity environments” (pages 219 and 225).

Something similar to the Apgar test for newborns might be a more accurate and efficient means of predicting which students will run into academic problems — students who score above a certain threshold on the test would be targeted for intervention. The intervention could take the form of mandatory periodic meetings with an advisor, recommending that the student take courses taught by certain professors, etc.

Kahneman recommends that

  • the instrument measure no more than six characteristics or dimensions,
  • the dimensions be as independent of each other as possible,
  • the questions be factual in nature (i.e., not an affective or associative test).
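As a rough sketch of what such an Apgar-style instrument could look like in practice (the six dimensions, cut points, and field names below are invented for illustration, not drawn from any validated instrument):

```python
# Apgar-style risk score: six factual, roughly independent dimensions,
# each contributing 0-2 points; a higher total means higher predicted risk.
# All dimensions and cut points here are invented for illustration.

def risk_score(student):
    """Return a 0-12 academic-risk score from six factual attributes."""
    score = 0
    # 1. High-school GPA (lower GPA -> more points)
    gpa = student["hs_gpa"]
    score += 2 if gpa < 2.5 else 1 if gpa < 3.0 else 0
    # 2. First-generation status (no immediate family college graduates)
    score += 2 if student["family_college_grads"] == 0 else 0
    # 3. Hours per week at a paid job
    hours = student["job_hours_per_week"]
    score += 2 if hours > 20 else 1 if hours > 10 else 0
    # 4. Commuting distance in miles
    miles = student["commute_miles"]
    score += 2 if miles > 30 else 1 if miles > 10 else 0
    # 5. Credit hours attempted in the first semester
    credits = student["first_semester_credits"]
    score += 2 if credits > 18 else 1 if credits > 15 else 0
    # 6. Unmet financial need, in thousands of dollars
    need = student["unmet_need_k"]
    score += 2 if need > 10 else 1 if need > 5 else 0
    return score

THRESHOLD = 6  # students at or above this score are flagged for intervention

student = {"hs_gpa": 2.4, "family_college_grads": 0, "job_hours_per_week": 25,
           "commute_miles": 5, "first_semester_credits": 12, "unmet_need_k": 8}
print(risk_score(student))  # 7 -> flagged
```

Like the Apgar test, each dimension contributes 0–2 points and the dimensions are simply summed with equal weights, which is exactly the kind of simple, robust formula Kahneman argues for.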

Obviously such a procedure would require some coordination among different units of the university to gain access to the data, but the sample size would be large. The scores would also need to be tracked against student outcomes over time to see how predictive the algorithm is, and that longitudinal tracking would enable the instrument to be refined.
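To make the refinement step concrete, here is a minimal sketch, with entirely made-up historical data, of how the flagging threshold could be re-chosen each year from tracked outcomes by maximizing Youden’s J (sensitivity + specificity − 1):

```python
# Re-tuning the flagging cutoff from longitudinal data: given past students'
# risk scores and whether each actually ran into severe academic trouble,
# pick the threshold that maximizes Youden's J = sensitivity + specificity - 1.
# The history below is made up for illustration.

def best_threshold(scored_outcomes):
    """scored_outcomes: list of (risk_score, had_trouble) pairs."""
    positives = sum(1 for _, had in scored_outcomes if had)
    negatives = len(scored_outcomes) - positives
    best_t, best_j = None, -1.0
    for t in sorted({s for s, _ in scored_outcomes}):
        flagged = [(s, had) for s, had in scored_outcomes if s >= t]
        sensitivity = sum(1 for _, had in flagged if had) / positives
        specificity = 1 - sum(1 for _, had in flagged if not had) / negatives
        j = sensitivity + specificity - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

history = [(1, False), (2, False), (3, False), (4, True), (5, False),
           (6, True), (7, True), (8, True), (9, False), (10, True)]
threshold, j = best_threshold(history)
print(threshold)  # 4
```

In practice the same tracked data would also reveal which dimensions carry no predictive weight and should be swapped out.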

Currently this kind of data-driven method of decision making is probably too radical an idea to be considered by my university. Meanwhile I’m going to think about how I can generate some kind of algorithm to use on the students that I teach, and see if I can find some non-academic dimensions that predict grades. Any suggestions are welcome.

Just what do you think you’re doing, Dave? Dave, I really think I’m entitled to an answer to that question.

5 thoughts on “Algorithms and Predicting Academic Success”

    1. I was unaware that this test was being administered to incoming students. I doubt most faculty know of it, either. We certainly don’t have access to the scores of the students who populate our classrooms. Unless, I suppose, a student in a professor’s classroom is that professor’s advisee. However, I’ve never been provided with LASSI scores for any of the (few) students who have been assigned to me as advisees, including the ones I’ve taught in class. So maybe this applies only to students in their first year, for faculty who are teaching New Student Seminar?

      For most faculty members — those teaching/advising students beyond New Student Seminar — this is the situation across the board: we are forced to rely on subjective, and often incorrect, impressions about which students might be at serious academic risk. We don’t see the data.

      LASSI may be somewhat predictive of academic performance (an extremely quick glance at the literature shows some favorable reports), but you got me curious and I answered the questions on the free demo for students. According to the results, I’m extremely unmotivated, find it difficult to process information, and don’t use the resources I need to be an efficient learner. But this is an N=1 result from a truncated version of the test, so it’s not a fair critique.

      As someone who is always willing to put words in Nobel Prize winners’ mouths, I’ll write that Kahneman would probably say the following about the LASSI instrument:

      – it is too long and complex (10 scales, 80 items)
      – its questions elicit self-impressions rather than facts, and everyone likes to think they are above average

      I bet that a few factual questions asked of incoming students, equally weighted, could be just as effective a diagnostic tool as LASSI. For example, “How many of your immediate family members — parents and siblings — have graduated from college?” Or “How many days per week did you work at a job during your last year of high school?”

      A more fundamental issue is what happens once the data is collected. Are students’ scores analyzed against student success to determine the predictive validity of the test? Do students who score low on LASSI or any other instrument fail or transfer if they do not receive certain types of interventions? Which interventions are associated with student retention and persistence? Certainly some must be more effective and cost efficient than others.
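      One simple way to approach the first of those questions, sketched here with invented counts: cross-tabulate flagged status against actual retention and compute the phi coefficient, the 2×2 analogue of a correlation coefficient (a value near 0 would mean the instrument has no predictive validity).

```python
import math

# Predictive-validity check: cross-tabulate "flagged as at-risk" against
# "actually left the university" and compute the phi coefficient.
# The counts below are invented for illustration.

def phi_coefficient(records):
    """records: list of (flagged, retained) boolean pairs."""
    a = sum(1 for f, r in records if f and not r)      # flagged, left
    b = sum(1 for f, r in records if f and r)          # flagged, stayed
    c = sum(1 for f, r in records if not f and not r)  # not flagged, left
    d = sum(1 for f, r in records if not f and r)      # not flagged, stayed
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0

records = ([(True, False)] * 8 + [(True, True)] * 2 +
           [(False, False)] * 3 + [(False, True)] * 17)
phi = phi_coefficient(records)
print(f"{phi:.2f}")  # 0.64
```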

      I would not be surprised if analysis of such data showed a significant retention problem among the very good students, not just the very weak ones. I also suspect there is an association between these students, their grades, and the courses and instructors they encounter in the first year.

  2. Students who are determined by whatever algorithms we employ to be at risk academically (which might mean students in both the bottom AND top 10% of the distribution), financially, socially, and/or emotionally can be given advisers who are most likely to be useful to them, advisers who will be encouraged and reinforced for helping their advisees manage their problems as well as possible. Particular courses and professors might also be matched with students who are at risk for one reason or another. Of course, the algorithms we use will matter, and we probably want to over-predict rather than under-predict student problems, but given the importance of raising the retention rate, I think the investment is worth making.
