Classic Research In Evolutionary Psychology: Learning

Let’s say I were to give you a problem to solve: I want you to design a tool that is good at cutting. Despite the apparent generality of the function, this is actually a pretty vague request. For instance, one might want to know more about the material to be cut: a sword might work if your job is cutting some kind of human flesh, but it might also be unwieldy to keep around the kitchen for preparing dinner (I’m also not entirely sure they’re dishwasher-safe, provided you managed to fit a katana into your machine in the first place). So let’s narrow the request down to some kind of kitchen utensil. Even that request, however, is a bit vague, as evidenced by Wikipedia naming about a dozen different kinds of utensil-style knives (and about 51 different kinds of knives overall). That list doesn’t even manage to capture other kinds of cutting-related kitchen utensils, like egg-slicers, mandolines, peelers, and graters. Why do we see so much variety, even in the kitchen, and why can’t one simple knife be good enough? Simple: when different tasks have non-overlapping sets of best design solutions, functional specificity tends to yield efficiency in one realm, but not in another.

“You have my bow! And my axe! And my sword-themed skillet!”.

The same basic logic has been applied to the design features of living organisms as well, including aspects of our cognition as I argued in the last post: the part of the mind that functions to logically reason about cheaters in the social environment does not appear to be able to logically reason with similar ease about other, even closely-related topics. Today, we’re going to expand on that idea, but shift our focus towards the realm of learning. Generally speaking, learning can be conceived of as some change to an organism’s preexisting cognitive structure due to some experience (typically unrelated to physical trauma). As with most things related to biological changes, however, random alterations are unlikely to result in improvement; to modify a Richard Dawkins quote ever so slightly, “However many ways there may be of [learning something useful], it is certain that there are vastly more ways of [learning something that isn’t]”. For this reason, along with some personal experience, no sane academic has ever suggested that our learning occurs randomly. Learning needs to be a highly-structured process in order to be of any use.

Precisely how much structure “highly-structured” entails is a bit of a sticky issue, though. There are undoubtedly still some who would suggest that some general type of reinforcement-style learning might be good enough for learning all sorts of neat and useful things. It’s a simple rule: if [action] is followed by [reward], then increase the probability of [action]; if [action] is followed by [punishment], then decrease the probability of [action]. There are a number of problems with such a simple rule, and they return to our knife example: the learning rule itself is under-specified for the demands of the various learning problems organisms face. Let’s begin with an analysis of what is known as conditioned taste aversion. Organisms, especially omnivorous ones, often need to learn about what things in their environment are safe to eat and which are toxic and to be avoided. One problem in learning about which potential foods are toxic is that the action (eating) is often divorced from the outcome (sickness) by a span of minutes to hours, and plenty of intervening actions take place in the interim. On top of that, this is not the type of learning you want to need repeated exposures to in order to learn, as, and this should go without saying, eating poisonous foods is bad for you. In order to learn the connection between the food and the sickness, then, a learning mechanism would seem to need to “know” that the sickness is related to the food and not other, intervening variables, as well as being related in some specific temporal fashion. Events that conform more closely to this anticipated pattern should be more readily learnable.
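To make the simple reinforcement rule concrete, here is a minimal sketch of it in Python. The function name, the step size, and the particular update formula are my own assumptions for illustration; the rule as described above only says “increase” or “decrease” the probability.

```python
def update_probability(p_action, outcome, step=0.1):
    """Nudge the probability of an action after an outcome.

    The step size and the proportional update are illustrative
    assumptions, not taken from any particular learning model.
    """
    if outcome == "reward":
        # Move the probability a fraction of the way towards 1.
        p_action += step * (1.0 - p_action)
    elif outcome == "punishment":
        # Move the probability a fraction of the way towards 0.
        p_action -= step * p_action
    return p_action


p = 0.5  # starting probability of performing the action
for outcome in ["reward", "reward", "punishment"]:
    p = update_probability(p, outcome)
```

Notice what the sketch leaves out: nothing in it says which of the many actions preceding a bout of sickness should receive the update, or how long after the action the outcome may arrive. That missing information is the under-specification at issue.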

The first study we’ll consider, then, is by Garcia & Koelling (1966) who were examining taste conditioning in rats. The experimenters created conditions in which rats were exposed to “bright, noisy” water and “tasty” water. The former condition was created by hooking a drinking apparatus up to a circuit that connected to a lamp and a clicking mechanism, so when the rats drank, they were provided with visual and auditory stimuli. The tasty condition was created by flavoring the water. Garcia & Koelling (1966) then attempted to pair the waters with either nausea or electric shocks, and subsequently measure how the rats responded in their preference for the beverage. After the conditioning phase, during the post-test period, a rather interesting set of results emerged: while rats readily learned to pair nausea with taste, they did not draw the connection between nausea and audiovisual cues. When it came to the shocks, however, the reverse pattern emerged: rats could pair shocks with audiovisual cues well, but could not manage to pair taste and shock. This result makes a good deal of sense in light of a more domain-specific learning mechanism: things which produce certain kinds of audiovisual cues (like predators) might also have the habit of inflicting certain kinds of shock-like harms (such as with teeth or claws). On the other hand, predators don’t tend to cause nausea; toxins in food tend to do so, and these toxins also tend to come paired with distinct tastes. An all-purpose learning mechanism, by contrast, should be able to pair all these kinds of stimuli and outcomes equally well; it shouldn’t matter whether the conditioning comes in the form of nausea or shocks.
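The contrast between the two accounts can be laid out as a small table in Python. This is only a sketch: the True/False values are a coarse qualitative summary of the pattern described above (readily learned or not), not the study’s actual data, and the variable names are hypothetical.

```python
# Qualitative summary of which cue-consequence pairings the rats learned.
observed = {
    ("taste", "nausea"): True,
    ("audiovisual", "nausea"): False,
    ("audiovisual", "shock"): True,
    ("taste", "shock"): False,
}

# An all-purpose learning mechanism predicts every pairing is learnable.
general_purpose_prediction = {pair: True for pair in observed}

# Pairings where the all-purpose prediction disagrees with the results.
mismatches = [
    pair for pair in observed
    if observed[pair] != general_purpose_prediction[pair]
]
```

Here `mismatches` ends up holding the two pairings the rats failed to learn, which are the cases an all-purpose account has trouble with.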

Turns out that shocks are useful for extracting information, as well as communicating it.

The second experiment to consider on the subject of learning, like the previous one, also involves rats, and actually pre-dates it. This paper, by Petrinovich & Bolles (1954), examined whether different deprivation states have qualitatively different effects on behavior. In this case, the two deprivation states under consideration were hunger and thirst. Two samples of rats were either deprived of food or water, then placed in a standard T-maze (which looks precisely how you might imagine it would). The relevant reward – food for the hungry rats and water for the thirsty ones – was placed in one arm of the T-maze. The first trial was always rewarded, no matter which side the rat chose. Following that initial choice, the reward was placed on the side of the maze the rat did not choose on the previous trial. For instance, if the rat went ‘right’ on the first trial, the reward was placed in the ‘left’ arm on the second trial. Whether the rat chose correctly or incorrectly didn’t matter; the reward was always placed on the opposite side as its previous choice. Did it matter whether the reward was food or water?

Yes; it mattered a great deal. The hungry rats averaged substantially fewer errors in reaching the reward than the thirsty ones (approximately 13 errors over 34 trials, relative to 28 errors, respectively). The rats were further tested until they managed to perform 10 out of 12 trials correctly. The hungry rats managed to meet the criterion value substantially sooner, requiring a median of 23 total trials before reaching that mark. By contrast, 7 of the 10 thirsty rats failed to reach the criterion at all, and the three that did required approximately 30 trials on average to manage that achievement. Petrinovich & Bolles (1954) suggested that these results can be understood in the following light: hunger makes the rat’s behavior more variable, while thirst makes its behavior more stereotyped. Why? The most likely candidate explanation is the nature of the stimuli themselves, as they tend to appear in the world. Food sources tend to be distributed semi-unpredictably throughout the environment, and where there is food today, there might not be food tomorrow. By contrast, the location of water tends to be substantially more fixed (where there was a river today, there is probably a river tomorrow), so returning to the last place you found water would be the more-secure bet. To continue to drive this point home: a domain-general learning mechanism should do both tasks equally well, and a more general account would seem to struggle to explain these findings.
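To see why a tendency to return to the same spot is so costly in this particular maze, here is a small simulation of two idealized strategies. It is a sketch under strong assumptions: real rats fall somewhere between these extremes, and the function and strategy names are hypothetical rather than anything from the paper.

```python
def run_t_maze(strategy, n_trials=34):
    """Count errors for a rat using one fixed strategy in the T-maze.

    strategy: "shift" (go to the arm not chosen last time) or
              "stay"  (return to the arm chosen last time).
    Both strategies are idealizations used only for illustration.
    """
    other = {"left": "right", "right": "left"}
    choice = "right"  # arbitrary first choice; trial one is always rewarded
    errors = 0
    for _ in range(n_trials - 1):
        # The reward is placed opposite the rat's previous choice.
        rewarded_arm = other[choice]
        if strategy == "shift":
            choice = other[choice]
        if choice != rewarded_arm:
            errors += 1
    return errors


shift_errors = run_t_maze("shift")
stay_errors = run_t_maze("stay")
```

A rat that always shifts never errs after the first trial, while one that always returns to its previous arm errs on every remaining trial (33 of 34). The thirsty rats’ roughly 28 errors sit much closer to the “stay” end of that range than the hungry rats’ 13.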

Shifting gears away from rats, the final study for consideration is one I’ve touched on before, and it involves the fear responses of monkeys. As I’ve already discussed the experiment (Cook & Mineka, 1989), I’ll offer only a brief recap of the paper. Lab-reared monkeys show no intrinsic fear responses to snakes or flowers. However, social creatures that they are, these lab-reared monkeys can readily develop fear responses to snakes after observing another conspecific reacting fearfully to them. This is, quite literally, a case of monkey see, monkey do. Does this same reaction hold in response to observations of conspecifics reacting fearfully to a flower? Not at all. Despite the lab-reared monkeys being exposed to stimuli they have never seen before in their lives (snakes and flowers) paired with a fear reaction in both cases, it seems that the monkeys are prepared to learn to fear snakes, but not similarly prepared to learn a fear of flowers. Of note is that this isn’t just a fear reaction in response to living organisms in general: while monkeys can learn a fear of crocodiles, they do not learn to fear rabbits under the same conditions.

An effect noted by Python (1975)

When it comes to learning, it does not appear that we are dealing with some kind of domain-general learning mechanism, equally capable of learning all types of contingencies. This shouldn’t be entirely surprising, as organisms don’t face all kinds of contingencies with equivalent frequencies: predators that cause nausea are substantially less common than toxic compounds which do. Don’t misunderstand this argument: humans and nonhumans alike are certainly capable of learning many phylogenetically novel things. That said, this learning is constrained and directed in ways we are often wholly unaware of. The specific content area of the learning is of prime importance in determining how quickly some things can be learned, how lasting the learning is likely to be, and which things are learned (or learnable) at all. The take-home message of all this research, then, can be phrased as such: Learning is not the end point of an explanation; it’s a phenomenon which itself requires an explanation. We want to know why an organism learns what it does; not simply that it learns.

References: Cook M, & Mineka S (1989). Observational conditioning of fear to fear-relevant versus fear-irrelevant stimuli in rhesus monkeys. Journal of abnormal psychology, 98 (4), 448-59 PMID: 2592680

Garcia, J. & Koelling, R. (1966). Relation of cue to consequence in avoidance learning. Psychonomic Science, 4, 123-124.

Petrinovich, L. & Bolles, R. (1954). Deprivation states and behavioral attributes. Journal of Comparative and Physiological Psychology, 47, 450-453.

Classic Research In Evolutionary Psychology: Reasoning

I’ve consistently argued that evolutionary psychology, as a framework, is a substantial, and, in many ways, vital remedy to some wide-spread problems: it allows us to connect seemingly disparate findings under a common understanding, and, while the framework is by itself no guarantee of good research, it forces researchers to be more precise in their hypotheses, allowing for conceptual problems with hypotheses and theories to be more transparently observed and addressed. In some regards the framework is quite a bit like the practice of explaining something in writing: while you may intuitively feel as if you understand a subject, it is often not until you try to express your thoughts in actual words that you find your estimation of your understanding has been a bit overstated. Evolutionary psychology forces our intuitive assumptions about the world to be made explicit, often to our own embarrassment.

“Now that you mention it, I’m surprised I didn’t notice that sooner…”

As I’ve recently been discussing one of the criticisms of evolutionary psychology – that the field is overly focused on domain-specific cognitive mechanisms – I feel that now would be a good time to review some classic research that speaks directly to the topic. Though the research to be discussed itself is of recent vintage (Cosmides, Barrett, & Tooby, 2010), the topic has been examined for some time, which is whether our logical reasoning abilities are best conceived of as domain-general or domain-specific (whether they work equally well, regardless of content, or whether content area is important to their proper functioning). We ought to expect domain specificity in our cognitive functioning for two primary reasons (though these are not the only reasons): the first is that specialization yields efficiency. The demands of solving a specific task are often different from the demands of solving a different one, and to the extent that those demands do not overlap, it becomes difficult to design a tool that solves both problems readily. Imagining a tool that can both open wine bottles and cut tomatoes is hard enough; now imagine adding on the requirement that it also needs to function as a credit card and the problem becomes exceedingly clear. The second reason is outlined well by Cosmides, Barrett, & Tooby (2010) and, as usual, they express it more eloquently than I would:

The computational problems our ancestors faced were not drawn randomly from the universe of all possible problems; instead, they were densely clustered in particular recurrent families.

Putting the two together, we end up with the following: humans tend to face a non-random set of adaptive problems in which the solution to any particular one tends to differ from the solution to any other. As domain-specific mechanisms solve problems more efficiently than domain-general ones, we ought to expect the mind to contain a large number of cognitive mechanisms designed to solve these specific and consistently-faced problems, rather than only a few general-purpose mechanisms more capable of solving many problems we do not face, but poorly-suited to the specific problems we do. While such theorizing sounds entirely plausible and, indeed, quite reasonable, without empirical support for the notion of domain-specificity, it’s all so much bark and no bite.

Thankfully, empirical research abounds in the realm of logical reasoning. The classic tool used to assess people’s ability to reason logically is the Wason selection task. In this task, people are presented with a logical rule taking the form of “if P, then Q”, and a number of cards representing P, Q, ~P, and ~Q (e.g. “If a card has a vowel on one side, then it has an even number on the other”, with cards showing A, B, 1 & 2). They are asked to point out the minimum set of cards that would need to be checked to test the initial “if P, then Q” statement. People’s performance on the task is generally poor, with only around 5-30% of people getting it right on their first attempt. That said, performance on the task can become remarkably good – up to around 65-80% of subjects getting the correct answer – when the task is phrased as a social contract (“If someone [gets a benefit], then they need to [pay a cost]”, the most well known being “If someone is drinking, then they need to be at least 21”). Despite the underlying logical form not being altered, the content of the Wason task matters greatly in terms of performance. This is a difficult finding to account for if one holds to the idea of a domain-general logical reasoning mechanism that functions the same way in all tasks involving formal logic. Noting that content matters is one thing, though; figuring out how and why content matters becomes something of a more difficult task.
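For readers who haven’t met the task before, the logic of the correct answer can be sketched in a few lines of Python. The only cards that can falsify “if P, then Q” are those showing P (the hidden side might be ~Q) and those showing ~Q (the hidden side might be P). The function names below are my own, chosen for the vowel/even-number version of the rule.

```python
VOWELS = set("AEIOU")


def is_p(face):
    """P: the visible face is a vowel."""
    return face.isalpha() and face.upper() in VOWELS


def is_not_q(face):
    """~Q: the visible face is an odd number."""
    return face.isdigit() and int(face) % 2 == 1


def cards_to_check(faces):
    """Return the cards whose hidden side could falsify 'if P, then Q'."""
    return [face for face in faces if is_p(face) or is_not_q(face)]


selection = cards_to_check(["A", "B", "1", "2"])
```

With the four cards from the example, `selection` is the “A” and the “1”. The “2” is the tempting wrong choice: a consonant on its back would not break the rule, so turning it over tells you nothing.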

While some might suggest that content simply matters as a function of familiarity – as people clearly have more experience with age restrictions on drinking and other social situations than vaguer stimuli – familiarity doesn’t help: people will fail the task when it is framed in terms of familiar stimuli and people will succeed at the task for unfamiliar social contracts. Accordingly, criticisms of the domain-specific social contract (or cheater-detection) mechanism shifted to suggest that the mechanism at work is indeed content-specific, but perhaps not specific to social contracts. Instead, the contention was that people are good at reasoning about social contracts, but only because they’re good at reasoning about deontic categories – like permissions and obligations – more generally. Assuming such an account were accurate, it remains debatable as to whether that mechanism would be counted as a domain-general or domain-specific one. Such a debate need not be had yet, though, as the more general account turns out to be unsupported by the empirical evidence.

We’re just waiting for critics to look down and figure it out.

While all social contracts involve deontic logic, not all deontic logic involves social contracts. If the more general account of deontic reasoning were true, we ought not to expect performance differences between the former and latter types of problems. In order to test whether such differences exist, Cosmides, Barrett, & Tooby’s (2010) first experiment involved presenting subjects with a permission rule – “If you do P, you must do Q first” – varying whether P was a benefit (going out at night), neutral (staying in), or a chore (taking out the trash; Q, in this case, involved tying a rock around your ankle). When the rule was a social contract (the benefit), performance was high on the Wason task, with 80% of subjects answering correctly. However, when the rule involved staying in, only 52% of subjects got it right; that number was even lower in the garbage condition, with only 44% accuracy among subjects. Further, this same pattern of results was subsequently replicated in a new context involving filing/signing forms as well. These results are quite difficult to account for with a more-general permission schema, as all the conditions involve reasoning about permissions; they are, however, consistent with the predictions from social contract theory, as only the contexts involving some form of social contract ended up eliciting the highest levels of performance.

Permission schemas, in their general form, also appear unconcerned with whether one violates a rule intentionally or accidentally. By contrast, social contract theory is concerned with the intentionality of the violation, as accidental violations do not imply the presence of a cheater the way intentional violations do. To continue to test the distinction between the two models, subjects were presented with the Wason task in contexts where the violations of the rule were likely intentional (with or without a benefit for the actor) or accidental. When the violation was intentional and benefited the actor, subjects performed accurately 68% of the time; when it was intentional but did not benefit that actor, that percentage dropped to 45%; when the violation was likely unintentional, performance bottomed-out at 27%. These results make good sense if one is trying to find evidence of a cheater; they do not if one is trying to find evidence of a rule violation more generally.

In a final experiment, the Wason task was again presented to subjects, this time varying three factors: whether one was intending to violate a rule or not; whether it would benefit the actor or not; and whether the ability to violate was present or absent. The pattern of results mimicked those above: when benefit, intention, and ability were all present, 64% of subjects determined the correct answer to the task; when only 2 factors were present, 46% of subjects got the correct answer; and when only 1 factor was present, subjects did worse still, with only 26% getting the correct answer, which is approximately the same performance level as when there were no factors present. Taken together, these three experiments provide powerful evidence that people aren’t just good at reasoning about the behavior of other people in general, but rather that they are good at reasoning about social contracts in particular. In the now-immortal words of Bill O’Reilly, “[domain-general accounts] can’t explain that”.

“Now cut their mic and let’s call it a day!”

Now, of course, logical reasoning is just one possible example for demonstrating domain specificity, and these experiments certainly don’t prove that the entire structure of the mind is domain specific; there are other realms of life – such as, say, mate selection, or learning – where domain general mechanisms might work. The possibility of domain-general mechanisms remains just that – possible; perhaps not often well-reasoned on a theoretical level or well-demonstrated at an empirical one, but possible all the same. The problem in differentiating between these different accounts may not always be easy in practice, as they are often thought to generate some, or even many, of the same predictions, but in principle it remains simple: we need to place the two accounts in experimental contexts in which they generate opposing predictions. In the next post, we’ll examine some experiments in which we pit a more domain-general account of learning against some more domain-specific ones.

References: Cosmides L, Barrett HC, & Tooby J (2010). Adaptive specializations, social exchange, and the evolution of human intelligence. Proceedings of the National Academy of Sciences of the United States of America, 107 Suppl 2, 9007-14 PMID: 20445099

Evolutionary Psychology: Tying Psychology Together

Every now and again – perhaps more frequently than many would prefer – someone who apparently fails to understand one or more aspects of the evolutionary perspective in psychology goes on to make rather public proclamations about what it is and what it can and cannot do for us. Notable instances are not particularly difficult to find. The most recent of these to cross my desk comes from Gregg Henriques, and it takes a substantially less-nasty tone than I have come to expect. In it, he claims that evolutionary psychology does not provide us with a viable metatheory for understanding psychology, and he bases his argument on three main points: (1) evolutionary psychology is overly committed to the domain-specificity concept, (2) the theory fails to have the correct map of complexity, and (3) it hasn’t done much for people in a clinical setting. In the course of making these arguments, I feel he stumbles badly on several points, so I’d like to take a little time to point out these errors. Thankfully, given the relative consistency of these errors, doing so is becoming more a routine than anything else.

So feel free to change the channel if you’ve seen this before.

Gregg begins with the natural starting point for many people in criticizing EP: while we have been focusing on how organisms solve specific adaptive problems, there might be more general adaptive problems out there. As Gregg put it:

The EP founders also overlooked the fact that there really is a domain general behavioral problem, which can be characterized as the problem of behavioral investment

There are a number of things to say about such a suggestion. Thankfully, I have said them before, so this is a relatively easy task. To start off, these ostensibly domain-general problems are, in fact, not all that general. To use a simple example, consider one raised by Gregg in his discussion of behavioral investment theory: organisms need to solve the problem of obtaining more energy than they spend to keep on doing things like being alive and mating. That seems like an awfully general problem, but, stated in such a manner, the means by which that general problem is, or can be, solved are massively unspecified. How does an organism calculate its current caloric state? How does an organism decide which things to eat to obtain energy? How does an organism decide when to stop foraging for food in one area and pursue a new one? How is the return on energy calculated and compared against the expenditure? As one can quickly appreciate, the larger, domain-general problem (obtain more energy than one expends) is actually composed of very many smaller problems, and things can get complicated quickly. Pursuing mating rather than food, for instance, is unlikely to result in an organism obtaining more energy than it expends. This leaves the behavioral investment problem – broadly phrased – wanting in terms of any predictive power: why do organisms pursue goals other than gaining energy, and under what conditions do they do so? The issue here, then, is not so much that domain-general problems aren’t being accounted for by evolutionary psychology, but rather that the problems themselves are being poorly formulated by the critics.

The next area in this criticism that Gregg stumbles on is the level of analysis that evolutionary psychology tends to work with. Gregg considers associative learning a domain general system but, again, it’s trivial to demonstrate it is not all that general. There are many things that associative learning systems do not do: regulate homeostatic processes, like breathing and heart rate, perceive anything, like light, sound, pleasure, or pain, generate emotions, store memory, and so on. In terms of their function, associative learning systems only really seem to do one thing: make behavior followed by reward more likely than behavior followed by discomfort, and that’s only after other systems have decided what is rewarding and what is not. That this system can apply the same function to many different inputs doesn’t make it a domain-general one. The distinction that Gregg appears to miss, then, is that functional specificity is not the same as input specificity. Calling learning a domain-general system is a bit like calling a knife a domain-general tool because it can be used to cut many different objects. Try to use a knife to weld metal, and you’ll quickly appreciate how domain-specific the function of a knife is.

On top of that, there is also the issue that some associations are learned far more readily than others. To quote Dawkins, “However many ways there may be of being alive, it is certain that there are vastly more ways of being dead”. A similar logic applies to learning: there are many more potentially incorrect and useless things to learn than there are useful ones. This is why learning ends up being a rather constrained process: rats can learn to associate light and sound with shocks, but do not tend to make the association between taste and shock, despite the unpleasantness of the shock itself. Conversely, associations between taste and nausea can be readily learned, but not between light and nausea. To continue beating this point to death, a domain-general account of associative learning has a rather difficult time explaining why some connections are readily learned and others are not. In order to generate more textured predictions, you need to start focusing on the more-specific sub-problems that make up the more general one.

And if doing so is not enough of a pain-in-the-ass, you’re probably doing it wrong.

On a topic somewhat-related to learning, the helpful link provided by Gregg concerning behavioral investment theory has several passages that, I think, are rather diagnostic of the perspective he has about evolutionary psychology:

Finally, because [behavioral investment/shutdown theory] is an evolutionary model, it also readily accounts for the fact that there is a substantial genetic component associated with depression (p.61)…there is much debate on the relative amount of genetic constraint versus experiential plasticity in various domains of mental functioning (p.70).

The problem here is that evolutionary psychology concerns itself with far more than genetic components. In the primer on evolutionary psychology, the focus on genetic components in particular is deemed to be nonsensical in the first place, as the dichotomy between genetic and environmental itself is a false one. Gregg appears to be conflating “evolutionary” with “genetic” for whatever reason, and possibly both with “fixed” when he writes:

In contrast to the static model suggested by evolutionary psychologists, The Origin of Minds describes a mind that is dynamic and ever-changing, redesigning itself with each life experience

As far as I know, no evolutionary psychologist has ever suggested a static model of the mind; not one. Given that evolutionary psychologists is pluralized in that sentence, I can only assume that the error is made by at least several of them, but to whom “them” refers is a mystery to me. Indeed, this passage by Gregg appears to play by the rules articulated in the pop anti-evolutionary psychology game nearly perfectly:

The second part of the game should be obvious. Once you’ve baldly asserted what evolutionary psychologists believe – and you lose points if, breaking tradition, you provide some evidence for what evolutionary psychologists have actually claimed in print and accurately portray their view – point out the blindingly obvious opposite of the view you’ve hung on evolutionary psychology. Here, anything vacuous but true works. Development matters. People learn. Behavior is flexible. Brains change over time. Not all traits are adaptations. The world has changed. People differ across cultures. Two plus two equals four. Whatever.

The example is so by-the-book that little more really needs to be said about it. Somewhat ironically, Gregg suggests that the evolutionary perspective creates a straw man of other perspectives, like learning and cultural ones. I’ll leave that suggestion without further comment.

The next point Gregg raises concerning complexity I have a difficult time understanding. If I’m parsing his meaning correctly, he’s saying that culture adds a level of complexity to analyses of human behavior. Indeed, local environmental conditions can certainly shape how adaptations develop and are activated, whether due to culture or not, but I’m not sure precisely how that is supposed to be a criticism of evolutionary psychology. As I mentioned before, I’m not sure a single contemporary evolutionary psychologist has ever been caught seriously suggesting something to the contrary. Gregg also makes some criticism of evolutionary psychology not defining psychology as he would prefer. Again, I’m not quite sure I catch his intended meaning here, but I fail to see how that is a criticism of the perspective. Gregg suggests that we need a psychology that can apply to non-humans as well, but I don’t see how an evolutionary framework fails that test. No examples are given for further consideration, so there’s not much more to say on that front.

Gregg’s final criticism amounts to a single line, suggesting that an evolutionary perspective has yet to unify every approach people take in psychotherapy. Not being an expert on psychotherapy myself, I’ll plead ignorance to the success that an evolutionary framework has had in that realm, and no evidence of any kind is provided for assessment. I fail to see why such a claim has any bearing on whether an evolutionary perspective could do so; I just wanted to make note that the criticism has been heard, but perhaps not formulated into a more appreciable fashion.

Final verdict: the prosecution seems confused.

Criticisms of an evolutionary perspective like these are unfortunately common and consistently misguided. Why they continue to abound despite their being answered time and again from the field’s origins is curious. Now in all fairness, Gregg doesn’t appear hostile to the field, and deems it “essential” for understanding psychology. Thankfully, the pop anti-evolutionary psychology game captures this sentiment as well, so I’ll leave it on that note:

The third part of the game is not always followed perfectly, and it is the hardest part. Now that you’ve shown how you are in full command of the way science is conducted or some truth about human behavior that evolutionary psychologists have missed, it’s important to assert that you absolutely acknowledge that of course humans are the product of evolution, and of course humans aren’t exempt from the principles of biology.

Look, you have to say, I’m not opposed to applying evolutionary ideas to humans in principle. This is key, as it gives you a kind of ecumenical gravitas. Yes, you continue, I’m all for the unity of science and cross-pollination and making the social sciences better, and so on. But, you have to add – and writing plaintively, if you can, helps here – I just want things to be done properly. If only evolutionary psychologists would (police themselves, consider development, acknowledge learning, study neuroscience, run experiments, etc…), then I would be just perfectly happy with the discipline.

Sound The Alarm: Sexist Citations

First things first: I would like to wish Popsych.org a happy two-year anniversary. Here’s looking at many more. That’s enough celebration for now; back to the regularly scheduled events.

When it comes to reading and writing, academics are fairly busy people. Despite these constraints on time, some of us (especially the male sections) still make sure to take the extra time to examine the articles we’re reading to ascertain the gender of the authors so as to systematically avoid citing women, irrespective of the quality of their work. OK; maybe that sounds just a bit silly. Provided people like that actually exist in any appreciable sense of the word, their representation among academics must surely be a small minority, else their presence would be well known. So what are we to make of the recently-reported finding that, among some political science journals, female academics tend to have their work cited less often than might be expected, given a host of variables (Maliniak, Powers, & Walter, 2013)? Perhaps there might exist some covert bias against female authors, such that the people doing the citing aren’t even aware that they favor the work of men, relative to women. If the conclusions of the current paper are to be believed, this is precisely what we’re seeing (among other things). Sexism – even the unconscious kind – is a bit of a politically hot topic to handle so, naturally, I suggest we jump right into the debate with complete disregard for the potential consequences; you know, for the fun of it all.

Don’t worry; I’m, like, 70% sure I know what I’m doing.

I would like to begin the review of this paper by noting a rather interesting facet of the tone of the introduction: what it does and does not label as “problematic”. What is labeled as problematic is the fact that women do not appear to be earning tenured positions in equal proportion to the number of women earning PhDs. Though they discuss this fact in the light of the political science field, I assume they intend their conclusion to span many fields. This is the well-known leaky pipeline issue about which much has been written. What is not labeled as problematic are the facts in the next two sentences: women make up 57% of the undergraduate population, 52% of the graduate population, and these percentages are only expected to rise in the future. Admittedly, not every gender gap needs to be discussed in every paper that mentions them and, indeed, this gap might not actually mean much to us. I just want to note that women outnumbering men on campus by 1.3-to-1 and growing is mentioned without so much as batting an eye. The focus of the paper is unmistakably on considering the troubles that women will face. Well, sort of; a more accurate way of putting it is that the focus is on the assumed troubles that women will face: difficulty getting cited. As we will see, this citation issue is far from a problem exclusive to women.

On to the main finding of interest: in the field of international relations, over 3000 articles across 12 influential journals spanning about 3 decades were coded for various descriptors about the article and the authors. Articles that were authored by men only were cited about 5 more times, on average, than articles authored by women only. Since the average number of citations for all articles was about 25 citations per paper, this difference of 5 citations is labeled as “quite a significant” one, and understandably so; citation count appears to be becoming a more important part of the job process in academia. Importantly, the gap persisted at statistically significant levels even after controlling for factors like the age of the publication, the topic of study, whether it came from an R1 school, the methodological and theoretical approach taken in the paper, and the author’s tenure status. Statistically, being a woman seemed to be bad for citation count.

The authors suggest that this gap might be due to a few factors, though they appear to concede that a majority of the gap remains unexplained. The first explanation on offer is that women might be citing themselves less than men tend to (which they were: men averaged 0.4 self-citations per paper and women 0.25). However, subtracting out self-citation count and the average number of additional citations self-citation was thought to add does not entirely remove the gap either. The other possibility that the authors float involves what are called “citation cartels”, where authors or journals agree to cite each other, formally or informally, in order to artificially inflate citation counts.  While they have no evidence concerning the extent to which this occurs, nor whether it occurs across any gendered lines, they at least report that anecdotes suggest this practice exists. Would that factor help us explain the gender gap? No clue; there’s no evidence. In any case, from these findings, the authors conclude:

“A research article written by a woman and published in any of the top journals will still receive significantly fewer citations than if the same article had been written by a man” (p.29, emphasis mine).

I find the emphasized section rather interesting, as nothing that the authors researched would allow them to reach that conclusion. They were certainly not controlling for the quality of the papers themselves, nor their conclusions. It seems that because they controlled for a number of variables, the authors might have gotten a bit overconfident in assuming they had controlled for all or most of the relevant ones.

“Well, I’m out of ideas. I guess we’re done here”

Like other gender gaps, however, this one may not be entirely what it seems. Means are only one measure of central tendency, and not always preferable for describing one’s sample. For instance, the mean income of 10 people might be a million dollars provided nine have none and one is rather wealthy. A similar example might concern the “average” number of mates your typical male elephant seal has; while some have large harems, others are left out entirely from the mating game. In other words, a skewed distribution can result in means that are not entirely reflective of what many might consider the “true” average of the population. Another possible measure of central tendency we might consider, then, is the median: the value that falls in the middle of all the observed values, which is a bit more robust against outliers. Doing just that, we see that the gender gap in citation count vanishes entirely: not only does it not favor the men anymore, but it slightly favors the women in 2 of the 3 decades considered (the medians for men from the 80s, 90s, and 00s are 5, 14, and 13; for women, 6, 14, and 15, respectively). Further, in two of the decades considered, mixed-gender articles appear to be favored by about 2-to-1 over papers with a single gender of author (medians equal 10, 22, and 16, respectively). Overall, the mean citation count looks to be about two-to-three times as high as the median, and the standard deviations of the citation count are huge. For instance, in the 1980s, articles authored by men averaged 17.6 citations per paper (substantially larger than the median of 5), and the SD of that count was 51.63. Yikes. Why is this rather interesting facet of the data not considered in much, if any, depth by the authors? I have no idea.
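For anyone who would like to see the income example worked out, here is a minimal sketch in Python. The numbers are made up for illustration (nine people with nothing and one with ten million dollars); they are not data from the paper, and the variable names are my own.

```python
from statistics import mean, median

# Hypothetical incomes, invented for illustration (not data from the paper):
# nine people with no income and one person with ten million dollars.
incomes = [0] * 9 + [10_000_000]

mean_income = mean(incomes)      # pulled upward by the single wealthy person
median_income = median(incomes)  # middle of the sorted values

print(f"mean:   {mean_income:,.0f}")    # mean:   1,000,000
print(f"median: {median_income:,.0f}")  # median: 0
```

The "average" person in this group is a millionaire by one measure and penniless by the other, which is the same kind of disagreement the citation means and medians show above.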

Now this is not to say that the mean or the median is necessarily the “correct” measure to consider here, but the fact that they return such different values ought to give us some pause for consideration. Mean values that are over twice as large as the median values with huge standard deviations suggest that we’re dealing with a rather skewed distribution, where some papers garner citation counts which are remarkably higher than others (a trend I wrote about recently with respect to cultural products). Now the authors do state that their results remain even if any outliers above 3 standard deviations are removed from the analysis, but I think that upper limit probably fails to fully capture what’s going on here. This handy graphical representation of citation count provided in the paper can help shed some light on the issue.

This is what science looks like.

What we see is not a terribly-noticeable trend for men to be cited more than women in general, as much as we see a trend for the papers with the largest citation counts to come disproportionately from men. The work of most of the men, like most of the women, would seem to linger in relative obscurity. Even the mixed-sex papers fail to reach the heights that male-only papers tend to. In other words, the prototypical paper by women doesn’t seem to differ too much from the prototypical male paper; the “rockstar” papers (of which I’d estimate there are about 20 to 30 in that picture), however, do differ substantially along gendered lines. Gendered lines are not the only way in which they might differ, however. A more accurate way of phrasing the questionable conclusion I quoted earlier would be to say “A research article written by anyone other than the initial author, if published in any of the top journals, might still receive significantly fewer citations even if it was the same article”. Cultural products can be capricious in their popularity, and even minor variations in initial conditions can set the stage for later popularity, or lack thereof.

Except for black; black is always fashionable.

This would naturally raise the question as to precisely why the papers with the largest impact come from men, relative to women. Unfortunately, I don’t have a good answer for that question. There is undoubtedly some cultural inertia to account for; were I to publish the same book as Steven Pinker in a parallel set of universes, I doubt mine would sell nearly as many copies (Steven has over 94,000 Twitter followers, whereas I have more fingers and toes than fans). There is also a good deal of noise to consider: an article might not end up being popular because it was printed in the wrong place at the wrong time, rather than because of its quality. On the subject of quality, however, some papers are better than others, by whatever metric we’re using to determine such things (typically, that standard is “I know it when I wish I had thought of it first”). Though none of these factors lends itself to analysis in any straightforward way, the important point is to not jump to overstated conclusions about sexism being the culprit, or to suggest that reviewers “…monitor the ratio of male to female citations in articles they publish” so as to point it out to the authors in the hopes of “remedying” any potential “imbalances”. One might also, I suppose, have reviewers suggest that authors make a conscious effort to cite articles with lower citation counts more broadly, so as to ensure a greater parity among citation counts in all articles. I don’t know why that state of affairs would be preferable, but one could suggest it.

References: Maliniak, D., Powers, R., & Walter, B. (2013). The gender citation gap in international relations. International Organization. DOI: 10.1017/S0020818313000209