Academic Perversion

As an instructor, I have made it my business to enact a unique kind of assessment policy for my students. Specifically, all tests are short-essay style and revisions are allowed after a grade has been received. This ensures that students always have some motivation to figure out what they got wrong and improve on it. In other words, I design my assessment to incentivize learning. From some abstract standpoint on the value of education, this seems like a reasonable policy to adopt (at least to me, though I haven’t heard any of my colleagues argue with the method). It’s also, for lack of a better word, a stupid thing for me to do professionally. What I mean here is that – on the job market – my ability to get students to learn successfully is not exactly incentivized, or at least that’s the impression that others with more insight have passed on to me. Not only are people on hiring committees not particularly interested in how much time I’m willing to devote to my students’ learning (it’s not the first thing they look at, or even in the top 3, I think), but the time I do invest in this method of assessment is time I’m not spending doing other things they value, like seeking out grants or trying to publish as many papers as I can in the most prestigious outlets available.

“If you’re so smart, how come you aren’t rich?”

And my method of assessment does involve quite a bit of time. When each test takes about 5-10 minutes to grade and comment on and you’re staring down a class of about 100 students, some quick math tells you that each round of grading will take up roughly 8 to 16 hours. By contrast, I could instead offer my students a multiple-choice test that could be graded almost automatically, cutting my time investment down to mere minutes. Over the course of a semester, then, I could devote 24 to 48 hours to helping students learn (across three tests) or I could instead provide grades for them in about 15 minutes using other methods. As far as anyone on a hiring committee will be able to tell, those two options are effectively equivalent. Sure, one helps students learn better, but being good at getting students to learn isn’t exactly incentivized on a professional level. Those 24 to 48 hours could instead have been spent seeking out grant funding or writing papers and – importantly – that’s per 100 students; if you happen to be teaching three or more classes a semester, that number goes up.

These incentives don’t just extend to tests and grading, mind you. If hiring committees aren’t all that concerned with my students’ learning outcomes, that has implications for how much time I should spend designing my lecture material as well. Let’s say I was faced with the task of having to teach my students about information I was not terribly familiar with, be that the topic of the class as a whole or a particular novel piece of information within an otherwise-familiar topic. I could take the time-consuming route and familiarize myself with the information first: tracking down relevant primary sources, reading them in depth, assessing their strengths and weaknesses, and seeking out follow-up research on the matter. I could also take the quick route and simply read the abstract and discussion section of a paper, or just report on the summary of the research provided by textbook writers or publishers’ materials.

If your goal is to prep about 12 weeks’ worth of lecture material, it’s quite clear which method saves more time. If having well-researched courses full of information you’re an expert on isn’t properly incentivized, then why would we expect professors to take the more thorough path? Pride, perhaps – many professors want to be good at their job and helpful to their students – but it seems other incentives push against devoting time to quality education if one is looking to make oneself an attractive hire*. I’ve heard teaching referred to as a distraction by more than one instructor, hinting strongly at where they perceive the incentives to lie.

The implications of these concerns about incentives extend beyond any personal frustrations I might have, and they’re beginning to get a larger share of the spotlight. One of the more recent events highlighting this issue was dubbed the replication crisis, in which many published findings did not show up again when independent research teams sought them out. This wasn’t some trivial minority of findings, either; in psychology, well over 50% of them failed to replicate. There’s little doubt that a healthy part of this state of affairs owes its existence to researchers purposefully using questionable methods to find publishable results, but why would they do so in the first place? Why are they so motivated to find these results? Again, pride factors into the equation but, as is usually the case, another part of that answer revolves around the incentive structure of academia: if academics are judged, hired, promoted, and funded on their ability to publish results, then they are incentivized to publish as many of those results as they can, even if the results themselves aren’t particularly trustworthy (they’re also disincentivized from trying to publish negative results, in many instances, which causes other problems).

Incentives so perverse I’m sure they’re someone’s fetish

A new paper has been making the rounds discussing these incentives in academia (Edwards & Roy, 2017), which begins with a simple premise: academic researchers are humans. Like other humans, we tend to respond to particular incentives. While the incentive structures within academia might have been created with good intentions in mind, there is always a looming threat from the law of unintended consequences. In this case, those unintended consequences are captured by Goodhart’s Law, which can be expressed as follows: “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes,” or, “when a measure becomes a target, it ceases to be a good measure.” In essence, this idea means that people will follow the letter of the law rather than the spirit.

Putting that into an academic example, a university might want to hire intelligent and insightful professors. However, assessing intelligence and insight is difficult to do, so, rather than assess those traits, the university assesses proxy measures of them: things that tend to be associated with intelligence and insight but are not themselves either of those things. In this instance, it might be noticed that intelligent, insightful professors tend to publish more papers than their peers. Because the number of papers someone publishes is much easier to measure, the university simply measures that variable instead in determining whom to hire and promote. While publication records are initially good predictors of performance, once they become the target of assessment, that correlation begins to decline. As publishing papers per se becomes the target behavior people are assessed on, they begin to maximize that variable rather than the thing it was intended to measure in the first place. Instead of publishing a few high-quality papers full of insight, they publish many papers that do a worse job of helping us understand the world.

In much the same vein, student grades on a standardized test might be a good measure of a teacher’s effectiveness; more effective teachers tend to produce students who learn more and subsequently do better on the test. However, if the poor teachers are then penalized and told to improve their performance or find a new job, the teachers might try to game the system. Now, instead of teaching their students about a subject in a holistic fashion that results in real learning, they just start teaching to the test. Rather than being taught, say, chemistry, students begin to get taught how to take a chemistry test, and the two are decidedly not the same thing. So long as teachers are assessed only on the grades their students earn on those tests, this is the incentive structure that ends up getting created.

Pictured: Not actual chemistry

Beyond inflating the number of papers that academics might publish, the paper discusses a number of other potential unintended consequences of these incentive structures. One involves measures of the quality of published work. We might expect that theoretically and empirically meaningful papers will receive more citations than weaker work. However, because the meaningfulness of a paper can’t be assessed directly, we look at proxy measures, like citation count (how often a paper is cited by other papers or authors). The consequence? People cite their own work more often, and peer reviewers request that authors seeking to publish in the field cite the reviewers’ work. The number of pointless citations is inflated. There are also incentives for publishing in “good” or prestigious journals; those which are thought to preferentially publish meaningful work. Again, we can’t just assess how “good” a journal is, so we use other metrics, like how often papers from that journal are cited. The net result here is much the same, where journals would prefer to publish papers that cite papers they have previously published. Going a step further, when universities are ranked on certain metrics, they are incentivized to game those metrics or simply misreport them. Apparently a number of colleges have been caught just lying on that front to get their rankings up, while others can improve their rankings without really improving their institutions.

There are many such examples we might run through (and I recommend you check out the paper itself for just that reason), but the larger point I wanted to discuss was what all this means on a broader scale. To the extent that those who are more willing to cheat the system are rewarded for their behavior, those who are less willing to cheat will be crowded out, and then we have a real problem on our hands. For perspective, Fanelli (2009) reports that, on average, 2% of scientists admit to fabricating data and 10% report engaging in less overt, but still questionable, practices; he also reports that when asked whether they know of a case of their peers doing such things, those numbers are around 14% and 30%, respectively. While those numbers aren’t straightforward to interpret (it’s possible that some people cheat a lot, that several people know of the same cases, or that one might be willing to cheat if the opportunity presented itself even if it hasn’t yet, for instance), they should be taken very seriously as a cause for concern.

(It’s also worth noting that Edwards & Roy misreport the Fanelli findings by citing his upper bounds as if they were the averages, making the problem of academic misconduct seem as bad as possible. This is likely just a mistake, but it highlights that mistakes, and not just cheating, likely follow the incentive structure as well. Just as researchers have incentives to overstate their own findings, they also have incentives to overstate the findings of others to help make their points convincingly.)

Which is ironic for a paper complaining about incentives to overstate results

When it’s not just a handful of bad apples within academia contributing to a problem of, say, cheating with their data, but rather an appreciable minority of them, there are at least two major potential consequences. First, it can encourage more non-cheaters to become cheaters. If I were to observe my colleagues cheating the system and getting rewarded for it, I might be encouraged to cheat myself just to keep up when faced with (very) limited opportunities for jobs or funding. Parallels can be drawn to steroid use in sports, where those who do not initially want to use steroids might be encouraged to do so if enough of their competitors did.

The second consequence is that, as more people take part in that kind of culture, public faith in universities – and perhaps scientific research more generally – erodes. With eroding public faith comes reduced funding and increased skepticism towards research findings; both responses are justified (why would you fund researchers you can’t trust?) and worrying, as there are important problems that research can help solve, but only if people are willing to listen.    

*To be fair, it’s not that my ability as a teacher is entirely irrelevant to hiring committees; it’s that not only is this ability secondary to other concerns (i.e., my teaching ability might be looked at only after they narrow the search down by grant funding and publications), but my teaching ability itself isn’t actually assessed. What is assessed are my student evaluations, and that is decidedly not the same thing.

References: Edwards, M. & Roy, S. (2017). Academic research in the 21st century: Maintaining scientific integrity in a climate of perverse incentives and hypercompetition. Environmental Engineering Science, 34, 51-61.

Fanelli, D. (2009). How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLoS ONE, 4, e5738.

Courting Controversy

“He says true but unpopular things. If you can’t talk about problems, you can’t fix them.”

The above quote comes to us from an interview with Trump supporters. Regardless of what one thinks about Trump and the truth of what he says, that idea holds a powerful truth itself: the world we live in can be a complicated one, and if we want to figure out how to best solve the problems we face, we need to be able to talk about them openly, even if the topics are unpleasant or the ideas incorrect. That said, there are some topics that people tend to purposefully avoid talking about. Not because the topics themselves are in some way unimportant or uninteresting, but rather because the mere mention of them is not unlike the prodding of a landmine. They are taboo thoughts: things that are made difficult to even think without risking moral condemnation and social ostracism. As I’m no fan of taboos, I’m going to cross one of them today myself, but in order to talk about those topics with some degree of safety, one needs to begin by talking about other topics which are safe. I want to first talk about something that is not dangerous, and slowly ramp up the danger. As a fair warning, this does require that this post be a bit longer than usual, but I think it’s a necessary precaution.

“You have my attention…and it’s gone”

Let’s start by talking about driving. Driving is a potentially dangerous task, as drivers are controlling heavy machinery traveling at speeds that regularly break 65 mph. The scope of that danger can be highlighted by estimates that put the odds of pedestrian death – were they to be struck by a moving vehicle – at around 85% at only 40 mph. Because driving can have adverse consequences for both the driver and those around them, we impose restrictions on who is allowed to drive what, where, when, and how. The goal we are trying to accomplish with these restrictions is to minimize harm while balancing benefits. After all, driving isn’t only risky; it’s also useful and something people want to do. So, how are we going to – ideally – determine who is allowed to drive and who is not? The most common solution, I would think, is to determine what risks we are trying to minimize and then ensure that people are able to surpass some minimum threshold of demonstrated ability. Simply put, we want to know people are good drivers.

Let’s make that concrete. In order to safely operate a vehicle you need to be able to: (a) see out of the windows, (b) know how to operate the car mechanically, (c) have the physical strength and size to operate the car, (d) understand the “rules of the road” and all associated traffic signals, (e) have adequate visual acuity to see the world you’ll be driving through, (f) possess adequate reaction time to be able to respond to the ever-changing road environment, and (g) possess the psychological restraint to not take excessive risks, such as traveling at unreasonably high speeds or cutting people off. This list is non-exhaustive, but it’s a reasonable place to start.

If you want to drive, then, you need to demonstrate that you can see out of the car while still being able to operate it. This would mean that those who are too small to accomplish both tasks at once – like young children or very short adults – shouldn’t be allowed to drive. Those who are physically large enough to see out of the windows but possess exceptionally poor eyesight should similarly be barred from driving, as we cannot trust they will respond appropriately. If they can see but not react in time, we don’t want them on the road either. If they can operate the car, can see, and know the rules but refuse to obey them and drive recklessly, we either don’t grant them a license or revoke it if they already have one.

In the service of assessing these skills we subject people to a number of tests: there are written tests that must be completed to determine knowledge of the rules of the road; there are visual tests; there are tests of driving ability. Even once these tests are passed, drivers are still reviewed from time to time, and a buildup of infractions can lead to a revocation of driving privileges.

However, we do not test everyone for these abilities. All of these things that we want a driver’s license to reflect – like every human trait – need to develop over time. In other words, they tend to fall within some particular distribution – often a normal one – with respect to age. As such, younger drivers are thought to pose more risk than adult drivers along a number of these desired traits. For instance, while not every person who is 10 years old is too small to operate a vehicle, the large majority of them are. Similarly, your average 15-year-old might not appropriately understand the risks of reckless driving and avoid it as we would hope. Moreover, the benefits that these young individuals can obtain from driving are lower as well; it’s not common for 12-year-olds to need a car to commute to work.

Accordingly, we also set minimum age laws regarding when people can begin to be considered for driving privileges. These laws are not set because it is impossible that anyone below that age might have need of a car and be able to operate it safely and responsibly, but rather reflect a recognition that a small enough percentage of them can that it’s not really worth thinking about (in the case of two-year-olds, for instance, that percentage is 0, as none could physically operate the vehicle; in the case of 14-year-olds it’s non-zero, but judged to be sufficiently low all the same). There are even proposals floating around concerning something like a maximum driving age, as driving abilities appear to deteriorate appreciably in older populations. As such, it’s not that we’re concerned about the age per se of the drivers – we don’t just want anyone over the age of 18 on the road – but age is still a good correlate of other abilities and allows us to save a lot of time in not having to assess every single individual for driving abilities from birth to death under every possible circumstance.

Don’t worry; he’s watched plenty of Fast & Furious movies

This brings us to the first point of ramping up the controversy. Let’s talk a bit about drunk driving. We have laws against operating vehicles while drunk because of the effects that drinking has: reduced attention and reaction time, reduced inhibitions resulting in more reckless driving, and impaired ability to see or stay awake, all of which amount to a reduction in driving skill and an increased potential for harmful accidents. Reasonable as these laws sound, imagine, if you would, two hypothetical drivers: the worst driver legally allowed to get behind a wheel, as well as the best driver. Sober, we should expect the former to pose a much greater risk to himself and others than the latter but, because they both pass the minimum threshold of ability, both are allowed to drive. It is possible, however, that the best driver’s abilities while he is drunk still exceed those of the worst driver while he is sober.

Can we recognize that exception to the spirit of the law against drunk driving without saying it is morally or legally acceptable for the best driver to drive drunk? I think we can. There are two reasons we might do so. The first is that we might say that even if the spirit of the rule seems to be violated in this particular instance, the rule is still one that holds true more generally and should be enforced for everyone regardless. That is, sometimes the rule will make a mistake (in a manner of speaking), but it is right often enough that we tolerate the mistake. This seems perfectly reasonable, and is something we accept in other areas of life, like medicine. When we receive a diagnosis from a doctor, we accept that it might not be right 100% of the time, but (usually) believe it to be right often enough that we act as if it were true. Further, the law is efficient: it saves us the time and effort of testing every driver for their abilities under varying levels of intoxication. Since the costs of making an error in this domain might outweigh the benefits of a correct hit, we work on maximizing the extent to which we avoid those errors. If such methods of testing driving ability were instantaneous and accurate, however, we might not need a law against drunk driving per se, because we could just be looking at people’s ability rather than blood alcohol content.

The second argument you might make to uphold the drunk driving rule is to say that even if the best drunk driver is still better than the worst sober one, the best drunk driver is nevertheless a worse driver than he is while sober. As such, he would be imposing more risk on himself and others than he reasonably needs to, and should not be allowed to engage in the behavior because of that. This argument is a little weaker – as it sets up a double standard – but it could be defensible in the right context. So long as you’re explicit about it, driving laws could be set such that people need to pass a certain threshold of ability and need to be able to perform within a certain range of their maximum ability. This might do things like make driving while tired illegal, just like drunk driving. 

The larger point I hope to hit on here is the following, which I hope we all accept: there are sometimes exceptions (in spirit) to rules that generally hold true and are useful. It is usually the case that people below a certain minimum driving age shouldn’t be trusted with the privilege, but it’s not like something magical happens at that age where an ability appears fully-formed in their brain. People don’t entirely lack the ability to drive at 17.99 years old and possess it fully at 18.01 years. That’s just not how development works for any trait in any species. We can recognize that some young individuals possess exceptional driving abilities (at least for their age, if not in the absolute sense, like this 14-year-old NASCAR driver) without suggesting that we change the minimum age driving law or even grant those younger people the ability to drive yet. It’s also not the case (in principle) that every drunk driver is incapable of operating their vehicle at or above the prescribed threshold of minimum safety and competency. We can recognize those exceptional individuals as being unusual in ability while still believing that the rule against drunk driving should be enforced (even for them) and be fully supportive of it.

That said, 14-year-old drunk drivers are a recipe for disaster

Now let’s crank up the controversy meter further and talk about sex. Rather than talking about when we allow people to drive cars and under what circumstances, let’s talk about when we accept their ability to consent to have sex. Much like driving, sex can carry potential costs, including pregnancy, emotional harm, and the spread of STIs. Also like driving, sex tends to carry benefits, like physical pleasure, emotional satisfaction and, depending on your perspective, pregnancy. Further, much like driving, there are laws set for the minimum age at which someone can be said to legally consent to sex. These laws seem to be set by balancing the costs and benefits of the act; we do not trust that individuals below certain ages are capable of making responsible decisions about when to engage in the act, with whom, in what contexts, and so on. There is a real risk that younger individuals can be exploited by older ones in this realm. In other words, we want to ensure that people are at least at a reasonable point in their physical and psychological development that can allow them to make an informed choice. Much like driving (or signing contracts), we want people to possess a requisite level of skills before they are allowed to give consent for sex.

This is where the matter begins to get complicated because, as far as I have seen throughout discussions on the matter, people are less than clear about what skills or bodies of knowledge people should possess before they are allowed to engage in the act. While just about everyone appears to believe that people should possess a certain degree of psychological maturity, what that precisely means is not outlined. In this regard, consent is quite unlike driving: people do not need to obtain licenses to have sex (excepting some areas in which sex outside of marriage is not permitted) and do not need to demonstrate particular skills or knowledge. They simply need to reach a certain age. This is (sort of) like giving everyone over the age of, say, 16 a license to drive regardless of their abilities. This lack of clarity regarding what skills we want people to have is no doubt at least partially responsible for the greater variation in age of consent laws, relative to age of driving laws, across the globe.

The matter of sex is complicated by a host of other factors, but the main issue is this: it is difficult for people to outline what psychological traits we need to have in order to be deemed capable of engaging in the behavior. For driving, this is less of a problem: pretty much everyone can agree on what skills and knowledge they want other drivers to have; for sex, concerns are much more strategic. Here’s a good example: one potential consequence of sex (intended, for some) is pregnancy and children. Because sex can result in children and those children need to be cared for, some might suggest that people who cannot reasonably be expected to provide well enough for said children should be barred from consenting to sex. This proposal is frequently invoked to justify the position that non-adults shouldn’t be able to consent to sex because they often do not have access to child-rearing resources. It’s an argument that has intuitive appeal, but it’s not applied consistently. That is, I don’t see many people suggesting that the age of consent should be lowered for rich individuals who could care for children, nor that people who fall below a certain poverty line be barred from having sex because they might not be able to care for any children it produced.

There are other arguments one might consider on that front as well: because the biological consequences of sex fall on men and women differently, might we actually hold different standards for men and women when considering whether they are allowed to engage in the behavior? That is, would it be OK for a 12-year-old boy to consent to sex with a 34-year-old woman because she can bear the costs of pregnancy, but not allow the same relationship when the sexes were reversed? Legally we have the answer: no, it’s not acceptable in either case. However, there are some who would suggest that the former relationship is actually acceptable. Even in the realm of law, it would seem, a sex-dependent standard has been upheld in the past.

Sure hope that’s his mother…

This is clearly not an exhaustive list of questions regarding how age of consent laws might be set, but the point should be clear enough: without a clear standard about what capabilities one needs to possess to be able to engage in sex, we end up with rather unproductive discussions. Making things even trickier, sex is more of a strategic act than driving, yielding greater disagreements over the matter and inflamed passions. For just this reason, it is very difficult to make explicit what abilities we want people to demonstrate in order to be able to consent to sex, and to reach consensus on them. Toss in the prospect of adults taking advantage of teenagers and you have all the makings of a subject people really don’t want to talk about. As such, we are sometimes left in a bit of an awkward spot when thinking about whether exceptions to the spirit of age of consent laws exist. Much like driving, we know that nothing magical happens to someone’s body and brain when they hit a certain age: development is a gradual process that, while exhibiting regularities, does not occur identically for all people. Some people will possess the abilities we’d like them to have before the age of consent; some people won’t possess those abilities even after it.

Importantly – and this is the main point I’ve been hoping to make – this does not mean we need to change or discard these laws. We can recognize that these laws do not fit every case like a glove while still behaving as if they do and intuitively judging them as being about right. Some 14-year-olds do possess the ability to drive, but they are not allowed to legally; some 14-year-olds possess whatever requisite abilities we hope those who consent to sex will have, but we still treat them as if they do not. At least in the US: in Canada, the age of consent is currently 16, up from 14 a few years ago; in some areas of Europe it is still 14; and in some areas of Mexico it can be lower than that.

“Don’t let that distract from their lovely architecture or beaches, though”

Understanding the variation in these intuitions between countries, between individuals, and over time is an interesting matter in its own right. However, there are some who worry about the consequences of even discussing the issue. That is, if we acknowledge that even a single individual is an exception to the general rule, we would be threatening the validity of the rule itself. Now I don’t think this is the case, as I have outlined above, but it is worth adding the following point to that concern: recognizing that possible exceptions to the rule exist is an entirely different matter from the consequences of doing so. Even if there are negative consequences to discussing the matter, that doesn’t change the reality of the situation. If your argument requires that you fail to recognize parts of reality because it might upset people – or that you decree, from the get-go, that certain topics cannot be discussed – then your argument should be refined.

There is a fair bit of danger in accepting these taboos: while it might seem all well and good when the taboo is directed against a topic you feel shouldn’t be discussed, you need to realize that your group is not always going to be in charge of what topics fall under that umbrella, and to accept the taboo as legitimate when it benefits you is to accept it as legitimate when it hurts you as well. For instance, not wanting to talk to children about sex out of fear it would cause younger teens to become sexually active yielded the widely ineffective abstinence-only sex education (and, as far as I can tell, teaching comprehensive sex education does not result in worse outcomes, but I’m always open to evidence that it does). There is a real hunger in people to understand the world and to be able to voice what is on their mind; denying that comes with very real perils.