As an instructor, I have made it my business to enact a unique kind of assessment policy for my students. Specifically, all tests are short-essay style, and revisions are allowed after a grade has been received. This ensures that students always have some motivation to figure out what they got wrong and improve on it. In other words, I design my assessments to incentivize learning. From the standpoint of the value of education in the abstract, this seems like a reasonable approach (at least to me, and I haven’t heard any of my colleagues argue against it). It’s also, for lack of a better word, a stupid thing for me to do professionally. What I mean here is that – on the job market – my ability to get students to learn successfully is not exactly incentivized, or at least that’s the impression that others with more insight have passed on to me. Not only are people on hiring committees not particularly interested in how much time I’m willing to devote to my students’ learning (it’s not the first thing they look at, or even in the top 3, I think), but the time I do invest in this method of assessment is time I’m not spending on other things they value, like seeking out grants or trying to publish as many papers as I can in the most prestigious outlets available.
“If you’re so smart, how come you aren’t rich?”
And my method of assessment does take quite a bit of time. When each test takes about 5-10 minutes to grade and comment on, and you’re staring down a class of about 100 students, some quick math tells you that each round of grading will take up about 8 to 16 hours. By contrast, I could offer my students a multiple-choice test that could be graded almost automatically, cutting my time investment down to mere minutes. Over the course of a semester, then, I could either devote 24 to 48 hours to helping students learn (across three tests) or provide them with grades in about 15 minutes using other methods. As far as anyone on a hiring committee will be able to tell, those two options are effectively equivalent. Sure, one helps students learn better, but being good at getting students to learn isn’t exactly incentivized on a professional level. Those 24 to 48 hours could instead have been spent seeking out grant funding or writing papers and – importantly – that’s per 100 students; if you happen to be teaching three or more classes a semester, that number goes up.
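To make the grading arithmetic above concrete, here is a minimal Python sketch. It only multiplies the figures given in the text (5-10 minutes per test, about 100 students, three tests per semester); `grading_hours` is a hypothetical helper name, not part of any real grading tool.

```python
# Rough grading-time arithmetic, using only the figures stated in the text:
# 5-10 minutes per test, ~100 students, three tests per semester.
# `grading_hours` is a hypothetical helper written for illustration.

def grading_hours(minutes_per_test: float, students: int, tests: int = 1) -> float:
    """Total hours spent grading `tests` rounds of tests for `students` students."""
    return minutes_per_test * students * tests / 60

if __name__ == "__main__":
    for minutes in (5, 10):
        per_round = grading_hours(minutes, 100)
        per_semester = grading_hours(minutes, 100, tests=3)
        print(f"{minutes} min/test: {per_round:.1f} h per round, "
              f"{per_semester:.1f} h per semester")
```

Run as-is, it reports about 8.3 and 16.7 hours per round and 25.0 and 50.0 hours per semester for the 5- and 10-minute cases. The figures in the text (about 8 to 16 and 24 to 48 hours) round each round down before multiplying, so if anything the real cost is slightly higher.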
These incentives don’t just extend to tests and grading, mind you. If hiring committees aren’t all that concerned with my students’ learning outcomes, that has implications for how much time I should spend designing my lecture material as well. Let’s say I was faced with the task of teaching my students about information I was not terribly familiar with, be that the topic of the class as a whole or a particular novel piece of information within an otherwise familiar topic. I could take the time-consuming route and familiarize myself with the information first: tracking down relevant primary sources, reading them in depth, assessing their strengths and weaknesses, and searching out follow-up research on the matter. I could also take the quick route and simply read the abstract and discussion section of a paper, or just report the summary of the research provided by textbook writers or publishers’ materials.
If your goal is to prep about 12 weeks’ worth of lecture material, it’s quite clear which method saves the most time. If having well-researched courses full of information you’re an expert on isn’t properly incentivized, then why would we expect professors to take the former path? Pride, perhaps – many professors want to be good at their job and helpful to their students – but other incentives seem to push against devoting time to quality education if one is looking to become an attractive hire*. I’ve heard teaching referred to as a distraction by more than one instructor, which strongly hints at where they perceive the incentives to lie.
The implications of these concerns about incentives extend beyond any personal frustrations I might have, and they’re beginning to get a larger share of the spotlight. One of the more recent events highlighting the issue has been dubbed the replication crisis, in which many published findings did not show up again when independent research teams sought them out. This wasn’t some small minority, either; in psychology, it was well over 50% of them. There’s little doubt that a healthy part of this state of affairs owes its existence to researchers purposefully using questionable methods to find publishable results, but why would they do so in the first place? Why are they so motivated to find these results? Again, pride factors into the equation but, as is usually the case, another part of the answer revolves around the incentive structure of academia: if academics are judged, hired, promoted, and funded on their ability to publish results, then they are incentivized to publish as many of those results as they can, even if the results themselves aren’t particularly trustworthy (in many instances, they’re also disincentivized from trying to publish negative results, which causes other problems).
Incentives so perverse I’m sure they’re someone’s fetish
A new paper has been making the rounds discussing these incentives in academia (Edwards & Roy, 2017), and it begins with a simple premise: academic researchers are humans. Like other humans, we tend to respond to particular incentives. While the incentive structures within academia might have been created with good intentions, there is always a looming threat from the law of unintended consequences. In this case, those unintended consequences are referred to as Goodhart’s Law, which can be expressed as follows: “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes,” or, “when a measure becomes a target, it ceases to be a good measure.” In essence, the idea is that people will follow the letter of the law rather than its spirit.
To put that into an academic example, a university might want to hire intelligent and insightful professors. However, assessing intelligence and insight is difficult, so, rather than assess those traits, the university assesses proxy measures of them: something that tends to be associated with intelligence and insight, but is not itself either of those things. In this instance, it might be noticed that intelligent, insightful professors tend to publish more papers than their peers. Because the number of papers someone publishes is much easier to measure, the university simply measures that variable instead when deciding whom to hire and promote. While publication records are initially good predictors of performance, once they become the target of assessment, that correlation begins to decline. As publishing papers per se becomes the behavior people are assessed on, they begin to maximize that variable rather than the thing it was intended to measure in the first place. Instead of publishing fewer, higher-quality papers full of insight, they publish many papers that do a worse job of helping us understand the world.
In much the same vein, student scores on a standardized test might be a good measure of a teacher’s effectiveness; more effective teachers tend to produce students who learn more and subsequently do better on the test. However, if the poor teachers are then penalized and told to improve their performance or find a new job, the teachers might try to game the system. Now, instead of teaching their students about a subject in a holistic fashion that results in real learning, they just start teaching to the test. Rather than being taught, say, chemistry, students begin to get taught how to take a chemistry test, and the two are decidedly not the same thing. So long as teachers are only assessed on how their students score on those tests, this is the incentive structure that ends up getting created.
Pictured: Not actual chemistry
Beyond the number of papers that academics might publish, the paper discusses a number of other potential unintended consequences of incentive structures. One involves measures of the quality of published work. We might expect theoretically and empirically meaningful papers to receive more citations than weaker work. However, because the meaningfulness of a paper can’t be assessed directly, we look at proxy measures, like citation count (how often a paper is cited by other papers or authors). The consequence? Authors cite their own work more often, and peer reviewers request that people seeking to publish in the field cite the reviewers’ work. The number of pointless citations is inflated. There are also incentives for publishing in “good” or prestigious journals: those thought to preferentially publish meaningful work. Again, we can’t directly assess how “good” a journal is, so we use other metrics, like how often papers from that journal are cited. The net result here is much the same: journals would prefer to publish papers that cite papers they have previously published. Going a step further, when universities are ranked on certain metrics, they are incentivized to game those metrics or simply misreport them. Apparently a number of colleges have been caught outright lying on that front to boost their rankings, while others manage to improve their rankings without really improving their institution.
There are many such examples we might run through (and I recommend you check out the paper itself for just that reason), but the larger point I wanted to discuss is what all this means on a broader scale. To the extent that those who are more willing to cheat the system are rewarded for their behavior, those who are less willing to cheat will be crowded out, and then we have a real problem on our hands. For perspective, Fanelli (2009) reports that, on average, about 2% of scientists admit to fabricating or falsifying data and about 10% admit to engaging in less overt but still questionable practices; he also reports that when scientists are asked whether they know of peers doing such things, those numbers rise to around 14% and 30%, respectively. While those numbers aren’t straightforward to interpret (it’s possible that some people cheat a lot, that several people know of the same cases, or that some would be willing to cheat if the opportunity presented itself even if it hasn’t yet, for instance), they should be taken very seriously as a cause for concern.
(It’s also worth noting that Edwards & Roy misreport the Fanelli findings by citing his upper bounds as if they were averages, making the problem of academic misconduct seem as bad as possible. This is likely just a mistake, but it highlights the possibility that mistakes, and not just cheating, follow the incentive structure as well. Just as researchers have incentives to overstate their own findings, they also have incentives to overstate the findings of others to make their points more convincingly.)
Which is ironic for a paper complaining about incentives to overstate results
When it’s not just a handful of bad apples within academia contributing to a problem of, say, cheating with their data, but rather an appreciable minority, this can have at least two major consequences. First, it can encourage more non-cheaters to become cheaters. If I were to observe my colleagues cheating the system and getting rewarded for it, I might be encouraged to cheat myself just to keep up when faced with (very) limited opportunities for jobs or funding. Parallels can be drawn to steroid use in sports, where athletes who initially do not want to use steroids might be pushed into using them if enough of their competitors do.
The second consequence is that, as more people take part in that kind of culture, public faith in universities – and perhaps scientific research more generally – erodes. With eroding public faith comes reduced funding and increased skepticism towards research findings; both responses are justified (why would you fund researchers you can’t trust?) and worrying, as there are important problems that research can help solve, but only if people are willing to listen.
*To be fair, it’s not that my ability as a teacher is entirely irrelevant to hiring committees; it’s that not only is this ability secondary to other concerns (i.e., my teaching ability might be looked at only after they narrow the search down by grant funding and publications), but my teaching ability itself isn’t actually assessed. What gets assessed are my student evaluations, and those are decidedly not the same thing.
References:
Edwards, M. & Roy, S. (2017). Academic research in the 21st century: Maintaining scientific integrity in a climate of perverse incentives and hypercompetition. Environmental Engineering Science, 34, 51-61.
Fanelli, D. (2009). How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLoS One, 4, e5738.