
R M Media Ltd, CC BY-SA license
An important component of both statistical and information literacy is the ability to recognize the difference between correlation and causation. Teaching this skill is made more difficult by cognitive biases that lead to errors in probabilistic thinking.* So I decided to hit my students over the head with Chapter 4 from Charles Wheelan’s Naked Statistics and, from Tyler Vigen’s Spurious Correlations website, an image of the 99.26% correlation between the divorce rate in Maine and margarine consumption.
The assignment asked students to submit a written response to this question:
Why are these two variables so highly correlated? Does divorce cause margarine consumption or does margarine consumption cause divorce? Why?
All the students who completed the assignment answered the question correctly: neither one causes the other. In class, students identified several possible intervening variables, including:
- People eat margarine and margarine-laced products as an emotional comfort food when relationships end.
- Divorce leads to a greater number of households, with each household purchasing its own tub of margarine.
Students’ ideas led in turn to a discussion of how to appropriately measure these variables and construct new hypotheses.
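For readers who want to see numerically how two unrelated quantities can end up so highly correlated, here is a minimal Python sketch. The numbers are invented for illustration and are not Vigen's divorce or margarine data; the names `series_a`, `series_b`, `pearson_r`, and `detrend` are hypothetical. The sketch assumes only that both series happen to decline over the same ten periods, each with its own independent noise.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def detrend(values):
    """Subtract the least-squares straight line fitted against the time index."""
    n = len(values)
    ts = list(range(n))
    mean_t = sum(ts) / n
    mean_v = sum(values) / n
    slope = (
        sum((t - mean_t) * (v - mean_v) for t, v in zip(ts, values))
        / sum((t - mean_t) ** 2 for t in ts)
    )
    return [v - (mean_v + slope * (t - mean_t)) for t, v in zip(ts, values)]


# Made-up data: two series generated independently of each other.
# The only thing they share is a downward drift over time.
rng = random.Random(0)
periods = range(10)
series_a = [5.0 - 0.1 * t + rng.gauss(0, 0.03) for t in periods]
series_b = [8.0 - 0.5 * t + rng.gauss(0, 0.15) for t in periods]

r_raw = pearson_r(series_a, series_b)
r_detrended = pearson_r(detrend(series_a), detrend(series_b))

print(f"correlation of raw series:       {r_raw:.3f}")
print(f"correlation after detrending:    {r_detrended:.3f}")
```

The raw correlation comes out close to 1 simply because both series drift in the same direction. Once the straight-line trend is removed from each, what remains is the independent noise, and the second number will generally sit much further from 1. Neither series was built to depend on the other, which is one way to frame the measurement question the students raised.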
*An excellent overview of this topic is Jack A. Hope and Ivan W. Kelly, “Common Difficulties with Probabilistic Reasoning,” The Mathematics Teacher 76, no. 8 (November 1983): 565–570.
Links to all posts in this series about information literacy: