Replicating Failures To Replicate

There are moments from my education that have stuck with me over time. One such moment involved a professor teaching his class about what might be considered a “classic” paper in social psychology. I happened to have been aware of this particular paper for two reasons: first, because it was a consistent feature in many of my previous psychology classes and, second, because the news had recently broken that when people tried to replicate the effect they had failed to find it. Now a failure to replicate does not necessarily mean that the findings of the original study were a fluke or the result of experimental demand characteristics (I happen to think they are), but that’s not even why this moment in my education stood out to me. What made this moment stand out is that when I emailed the professor after class to let him know the finding had recently failed to replicate, his response was that he was already aware of the failure. This seemed somewhat peculiar to me: if he knew the study had failed to replicate, why didn’t he at least mention that to his students? It seems like rather important information for the students to have and, frankly, a responsibility of the person teaching the material, since ignorance was no excuse in this case.

“It was true when I was an undergrad, and that’s how it will remain in my class”

Stories of failures to replicate have been making the rounds again lately, thanks to a massive effort on the part of hundreds of researchers to try to replicate 100 published effects from three psychology journals. These researchers worked with the original authors, used the original materials, were open about their methods, pre-registered their analyses, and archived all their data. Of these 100 published papers, 97 reported their effect as being statistically significant, with the other 3 being right on the borderline of significance and interpreted as positive effects. Now there is debate over the value of using these kinds of statistical tests in the first place but, when the researchers tried to replicate these 100 effects using the statistical-significance criterion, only 37 managed to cross that bar (given that 89 were expected to replicate if the effects were real, 37 falls quite short of that goal).

There are other ways to assess these replications, though. One method is to examine the differences in effect size. The 100 original papers reported an average effect size of about 0.4; the attempted replications saw this average drop to about 0.2. A full 82% of the original papers showed a stronger effect size than the attempted replications. While there was a positive correlation (about r = 0.5) between the two – the stronger the original effect, the stronger the replication effect tended to be – this still represents an important decrease in the estimated size of these effects, in addition to their statistical existence. Another method of measuring replication success – unreliable as it might be – is to get the researchers’ subjective opinions about whether the results seemed to replicate. On that front, the researchers felt about 39 of the original 100 findings replicated; quite in line with the above statistical data. Finally, and perhaps worth noting, social psychology research tended to replicate less often than cognitive research (25% and 50%, respectively), and interaction effects replicated less often than simple effects (22% and 47%, respectively).
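
To make the logic behind that shrinkage a bit more concrete, here is a minimal simulation of my own (none of these numbers come from the replication project itself; the study count, sample size, and assumed true effect are arbitrary choices for illustration). The point is simply that a field which only publishes statistically significant results from modestly powered studies will tend to report inflated effect sizes that shrink when the studies are run again:

```python
# Illustrative simulation of publication bias plus modest power. All parameter
# values below are made up; nothing here reproduces the actual replication project.
import numpy as np

rng = np.random.default_rng(0)
n_studies, n_per_group = 10_000, 40   # hypothetical numbers of studies and participants
true_d = 0.2                          # assumed true standardized effect size

def run_study(true_d, n, rng):
    """Run one two-group study; return observed Cohen's d and whether p < .05."""
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(true_d, 1.0, n)
    d = (b.mean() - a.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    se = np.sqrt(2.0 / n)             # rough standard error of d
    return d, abs(d / se) > 1.96      # approximate two-tailed test at alpha = .05

results = np.array([run_study(true_d, n_per_group, rng) for _ in range(n_studies)])
published = results[results[:, 1] == 1, 0]   # only the significant results get written up
replications = np.array([run_study(true_d, n_per_group, rng)[0] for _ in published])

print(f"mean published effect size:    {published.mean():.2f}")
print(f"mean replication effect size:  {replications.mean():.2f}")
print(f"replications reaching p < .05: "
      f"{(np.abs(replications) / np.sqrt(2.0 / n_per_group) > 1.96).mean():.0%}")
```

Under these made-up settings, the published effects come out substantially larger than their replications, and only a minority of the replications reach significance; roughly the pattern described above.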

The scope of the problem may be a bit larger than that, however. In this case, the 100 papers upon which replication efforts were undertaken were drawn from three of the top journals in psychology. Assuming a positive correlation exists between journal quality (as measured by impact factor) and the quality of the research they publish, the failures to replicate here should, in fact, be an underestimate of the actual replication issue across the whole field. If over 60% of papers failing to replicate is putting the problem mildly, there’s likely quite a bit to be concerned about when it comes to psychology research. Noting the problem is only one step in the process towards correction, though; if we want to do something about it, we’re going to need to know why it happens.

So come join me in my armchair for some speculation

There are some problems people already suspect as being important culprits. First, there are biases in the publication process itself. One such problem is that journals seem to overwhelmingly prefer to publish positive findings; very few people want to read about an experiment which didn’t work out. A related problem, however, is that many journals like to publish surprising, counter-intuitive findings. Again, this can be attributed to the idea that people don’t want to read about things they already believe are true: most people perceive the sky as blue, and research confirming this intuition won’t make many waves. However, I would also reckon that counter-intuitive findings are surprising to people precisely because they are also more likely to be inaccurate descriptions of reality. If that’s the case, then a preference on the part of journal editors for publishing positive, counter-intuitive findings might set them up to publish a lot of statistical flukes.

There’s also the problem I’ve written about before, concerning what are known as “researcher degrees of freedom”; more colloquially, we might consider this a form of data manipulation. In cases like these, researchers are looking for positive effects, so they test 20 people in each group and peek at the data. If they find an effect, they stop and publish it; if they don’t, they add a few more people and peek again, continuing until they find what they want or run out of resources. They might also split the data up into various groups and permutations until they find a set of data that “works”, so to speak (break it down by male/female, or high/medium/low, etc.). While they are not directly faking the data (though some researchers do that as well), they are being rather selective about how they analyze it. Such methods inflate the possibility of finding an effect through statistical brute force, even if the effect doesn’t actually exist.
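
To see how much statistical brute force that peek-and-continue strategy actually buys, here is a toy simulation of my own (the starting sample of 20, the batch size, and the stopping point are arbitrary; nothing here comes from any paper cited in this post). The two groups are drawn from identical distributions, so there is no effect to find, yet the “significant” results pile up well beyond the nominal 5%:

```python
# Toy illustration of optional stopping ("peeking") inflating false positives.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def peeking_study(rng, start_n=20, step=10, max_n=100, alpha=0.05):
    """Test after each new batch of participants; stop as soon as p < alpha."""
    a = list(rng.normal(0, 1, start_n))
    b = list(rng.normal(0, 1, start_n))   # identical distributions: no real effect exists
    while True:
        if stats.ttest_ind(a, b).pvalue < alpha:
            return True                    # "found" an effect; stop and write it up
        if len(a) >= max_n:
            return False                   # ran out of resources
        a.extend(rng.normal(0, 1, step))   # add a few more people and peek again
        b.extend(rng.normal(0, 1, step))

false_positives = sum(peeking_study(rng) for _ in range(5_000))
print(f"false positive rate with peeking: {false_positives / 5_000:.1%}")  # well above 5%
```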

This problem is not unique to psychology, either. A recent paper by Kaplan & Irvin (2015) examined research from 1970-2012 on the effectiveness of various drugs and dietary supplements for preventing or treating cardiovascular disease. There were 55 trials that met the authors’ inclusion criteria. What’s important to note about these trials is that, prior to the year 2000, none of the papers were pre-registered with respect to which variables they were interested in assessing; after 2000, every such study was pre-registered. Registering this research matters, as it prevents the researchers from later conducting a selective set of analyses on their data. Sure enough, prior to 2000, 57% of trials reported statistically significant effects; after 2000, that number dropped to 8%. Indeed, about half the papers published after 2000 did report some statistically significant effects, but only for variables other than the primary outcomes they registered. While this finding is not necessarily a failure to replicate per se, it certainly does make one wonder about the reliability of those non-registered findings.

And some of those trials were studying death as an outcome, so that’s not good…

There is one last problem I would like to mention; one whose drum I’ve been beating for the past several years. Assuming that pre-registering research in psychology would help weed out false positives (it likely would), we would still be faced with the problem that most psychology research would not find anything of value, if the above data are any indication. In the most polite way possible, this would lead me to ask a question along the lines of, “why are so many psychology researchers bad at generating good hypotheses?” Pre-registering a bad idea does not suddenly make it a good one, even if it makes data analysis a little less problematic. This leads me to my suggestion for improving research in psychology: the requirement of actual theory for guiding research. In psychology, most theories are not theories, but rather restatements of a finding. However, when psychologists begin to take an evolutionary approach to their work, the quality of research (in my obviously-biased mind) tends to improve dramatically. Even if the theory is wrong, making it explicit allows problems to be more easily discussed, discovered, and corrected (provided, of course, that one understands how to evaluate and test such theories, which many people unfortunately do not). Without guiding/foundational theories, the only things you’re left with when it comes to generating hypotheses are the existing data and your intuitions which, again, don’t seem to be good guides for conducting quality research.

References: Kaplan, R. & Irvin, V. (2015). Likelihood of null effects of large NHLBI clinical trials has increased over time. PLoS One, 10: e0132382. doi:10.1371/journal.pone.0132382

Why Do We Torture Ourselves With Spicy Foods?

As I write this, my mouth is currently a bit aflame, owing to a side of beans which had been spiced with a hot pepper (a serrano, to be precise). Across the world (and across YouTube), people partake in the consumption of spicy – and spiced – foods. On the surface, this behavior seems rather strange, owing to the pain and other unpleasant feelings induced by such foods. To get a real quick picture of how unpleasant these food additives can be, you could always try to eat a whole raw onion or spicy pepper, though just imagining the experience is likely enough (just in case it isn’t, YouTube will again be helpful). While this taste for spices might be taken for granted – it just seems normal that some people like different amounts of spicy foods – it warrants a deeper analysis to understand this ostensibly strange taste. Why do people love/hate the experience of eating spicy foods?

   Word of caution: don’t touch your genitals afterwards. Trust me.

Food preferences do not just exist in a vacuum; the cognitive mechanisms which generate such preferences need to have evolved owing to some adaptive benefits inherent in seeking out or avoiding certain potential food sources. Some of these preferences are easier to understand than others: for example, our taste for certain foods we perceive as sweet – sugars – likely owes its existence to the high caloric density that such foods historically provided us (which used to be quite valuable when they were relatively rare. As they exist in much higher concentrations in the first world – largely due to our preferences leading us to cultivate and refine them – these benefits can now dip over into costs associated with overconsumption and obesity). By contrast, our aversion to foods which appear spoiled or rotten helps us avoid potentially harmful pathogens which might reside in them; pathogens which we would rather not purposefully introduce into our bodies. Similar arguments can be made for avoiding foods which contain toxic compounds and taste correspondingly unpleasant. When such toxins are introduced into our bodies, the typical physiological response is nausea and vomiting; behaviors which help remove the offending material as best we can.

So where do spicy foods fall with respect to what costs they avoid or benefits they provide? As many such foods do indeed taste unpleasant, it is unlikely that they are providing us with direct nutritional benefits the way that more pleasant-tasting foods do. That is to say we don’t like spicy foods because they are rich sources of calories or vital nutrients. Indeed, the spiciness that is associated with such foods represents chemical weaponry evolved on the part of the plants. As it turns out, these plants have their own set of adaptive best interests, which often include not being eaten at certain times or by certain species. Accordingly, they develop certain chemical weapons that dissuade would-be predators from chowing down (this is the reason that the selective breeding of plants for natural insect resistance ends up making them more toxic for humans to eat as well. Just because pesticides aren’t being used, that doesn’t mean you’re avoiding toxic compounds). Provided this analysis is correct, then, the natural question arises of why people would have a taste for plants that possess certain types and amounts of chemical weaponry designed to prevent their being eaten. On a hedonic level, growing crops of jalapenos seems as peculiar as growing a crop of edible razor blades.

The most likely answer to this mystery comes from understanding not what these chemical weapons do to humans, but rather what they do to the pathogens that tend to accompany our other foods. If these chemical weapons are damaging to our bodies – as evidenced by the painful or unpleasant tastes that accompany them – it stands to reason they are also damaging to some of the pathogens which might reside in our food as well. Provided our bodies are better able to withstand certain doses of these harmful chemicals, relative to the microbes in our food, then eating spicy foods could represent a trade-off between killing food-borne pathogens and the risk of poisoning ourselves. Provided the harm done to our bodies by the chemicals is less than the expected damage done by the pathogens, a certain perverse taste for spicy foods could evolve.

As before, you should still be wary of genital contact with such perverse tastes

A healthy body of empirical evidence from around the world is consistent with such an adaptive hypothesis. One of the most extensive data sets focuses on recipes found in 93 traditional cookbooks from 36 different countries (Sherman & Billing, 1999). The recipes in these cookbooks were examined for which of 43 spices were added to meat dishes. Of the approximately 4,500 different meat dishes present in these books, the average number of spices called for by the recipes was 4, with 93% of recipes calling for at least one. Importantly, the distribution of these spices was anything but random. Recipes coming from warmer climates tended to call for a much greater use of spices. The probable reason this finding emerged relates to the fact that, in warmer climates, food – especially meat – which would have been unrefrigerated for most of human history (alien as that idea sounds currently) will tend to spoil quicker, relative to cooler climates. Accordingly, as the degree and speed of spoilage increased in warmer climates, a greater use of anti-microbial spices could be introduced to dishes to help combat food-borne illness. To use one of their examples, the typical Norwegian recipe called for 1.6 spices per dish and the recipes only mentioned 10 different spices; in Hungary, the average number of spices per dish was 3, and up to 21 different spices were referenced. It is not too far-fetched to go one step further and suggest that people indigenous to such regions might also have evolved slightly different tolerances for spices in their meals.

Even more interestingly, those spices with the strongest anti-microbial effects (such as garlic and onions) also tended to be the ones used more often in warmer climates, relative to cooler ones. Among the spices which had weaker effects, the correlation between temperature and spice use ceased to exist. Nevertheless, the most inhibitory spices were also the ones that people tended to use most regularly across the globe. Further, the authors also discuss the trade-off between balancing the fighting of pathogens against the possible toxicity of such spices when consumed in large quantities. A very interesting point bearing on that matter concerns the dietary preferences of pregnant women. While an adult female’s body might be able to tolerate the toxicity inherent in such compounds fairly well, the developing fetus might be poorly equipped for the task. Accordingly, women in their first trimester tend to show a shift in food preferences towards avoiding a variety of spices, just as they also tend to avoid meat dishes. This shift in taste preferences could well reflect the new variable of the fetus being introduced to the usual cost/benefit analysis of adding spices to foods.

An interesting question related to this analysis was also posed by Sherman & Billing (1999): do carnivorous animals ingest similar kinds of spices? After all, if these chemical compounds are effective at fighting food-borne pathogens, carnivores – especially scavengers – might have an interest in using such dietary tricks as well (provided they did not stumble upon a different adaptive solution). While animals do not appear to spice their foods the way humans do, the authors do note that vegetation makes up a small portion of many carnivores’ diets. Having owned cats my whole life, I confess I have always found their behavior of eating the grass outside to be quite a bit odd: not only does the grass not seem to be a major part of a cat’s diet, but it often seems to make them vomit with some regularity. While they present no data bearing on this point, Sherman & Billing (1999) do float the possibility that a supplement of vegetation to their diet might be a variant of that same kind of spicing behavior: carnivores eat vegetation not necessarily for its nutritional value, but rather for its possible anti-microbial benefits. It’s certainly an idea worth examining further, though I know of no research at present to have tackled the matter. (As a follow-up, it seems that ants engage in this kind of behavior as well.)

It’s a point I’ll bear in mind next time she’s vomiting outside my window.

I find this kind of analysis fascinating, frankly, and would like to take this moment to mention that these fascinating ideas would be quite unlikely to have been stumbled upon without the use of evolutionary theory as a guide. The explanation you might get when asking people why we spice food would typically sound like “because we like the taste the spice adds”; a response as uninformative as it is incorrect, which is to say “mostly” (and if you don’t believe that last part, go ahead and enjoy your mouthfuls of raw onion and garlic). The proximate taste explanation would fail to predict the regional differences in spice use, the aversion to eating large quantities of them (though this is a comparative “large”, as a slice of jalapeno can be more than some people can handle), and the maternal data concerning aversions to spices during critical fetal developmental windows. Taste preferences – like any psychological preferences – are things which require deeper explanations. There’s a big difference between knowing that people tend to add spices to food and knowing why people tend to do so. I would think that findings like these would help psychology researchers understand the importance of adaptive thinking. At the very least, I hope they serve as food for thought.

References: Sherman, P. & Billing, J. (1999). Darwinian gastronomy: Why we use spices. Bioscience, 49, 453–463.

The Altruism Of The Rich And The Poor

Altruistic behavior is a fascinating topic. On the one hand, it’s something of an evolutionary puzzle as to why an organism would provide benefits to others at an expense to itself. A healthy portion of this giving has already been explained via kin selection (providing resources to those who share an appreciable portion of your genes) and reciprocal altruism (giving to you today increases the odds of you giving to me in the future). As these phenomena have, in a manner of speaking, been studied to death, they’re a bit less interesting; all the academic glory goes to people who tackle new and exciting ideas. One such new and exciting realm of inquiry (new at least as far as I’m aware, anyway) concerns the social regulations and sanctions surrounding altruism. A particularly interesting case I came across some time ago concerned people actually condemning Kim Kardashian for giving to charity; specifically, for not giving enough. Another case involved the turning away of a sizable charitable donation from Tucker Max so as to avoid a social association with him.

*Unless I disagree with your personality; in that case, I’ll just starve

Just as it’s curious that people are altruistic towards others at all, then, it is, perhaps, more curious that people would ever turn down altruism or condemn others for giving it. To examine one more example that crossed my screen today, I wanted to consider two related articles. The first of the articles concerns charitable giving in the US. The point I wanted to highlight from that piece is that, as a percentage of their income, the richest section of the population tends to give the largest portion to charity. While one could argue that this is obviously the case because the rich have more available money which they don’t need to survive, that idea would fail to explain the point that charitable giving appears to follow a U-shaped distribution, in which the richest and poorest sections of the population contribute a greater percentage of their income than those in the middle (though how to categorize the taxes paid by each group is another matter). The second article I wanted to bring up condemned the richer section of the population for giving less than they used to, compared to the poor, who had apparently increased the percentage they give. What’s notable about their analysis of the issue is that the former fact – that the rich still tended to donate a higher percentage of their income overall – is not mentioned at all. I imagine that such an omission was intentional.

Taken together, all these pieces of information are consistent with the idea that there’s a relatively opaque strategic element which surrounds altruistic behavior. While it’s one people might unconsciously navigate with relative automaticity, it’s worthwhile to take a step back and consider just how strange this behavior is. After all, if we saw this behavior in any other species, we would be very curious indeed as to what led them to do what they did; perhaps we would even forgo the usual moralization that accompanies and clouds these issues while we examined them. So, on the subject of rich people and strategic altruism, I wanted to review a unique data set from Smeets, Bauer, & Gneezy (2015) concerning the behavior of millionaires in two standard economic games: the dictator and ultimatum games. In the former, the participant unilaterally decides how €100 will be divided between themselves and another participant; in the latter, the participant proposes how €100 will be split between themselves and a receiver. If the receiver accepts the offer, both players get paid according to that division; if the receiver rejects it, both players get nothing.
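
For readers unfamiliar with these games, the payoff rules can be spelled out in a few lines of code (just an illustrative sketch of the rules as described above; the example amounts are the averages reported below, and the receiver’s accept-or-reject choice is mine to pick):

```python
# Minimal sketch of the dictator and ultimatum game payoffs described above.
POT = 100  # the €100 to be divided in both games

def dictator_game(amount_given):
    """The dictator keeps the remainder; the receiver has no say."""
    return POT - amount_given, amount_given

def ultimatum_game(amount_offered, receiver_accepts):
    """If the receiver rejects the proposed split, both players walk away with nothing."""
    if receiver_accepts:
        return POT - amount_offered, amount_offered
    return 0, 0

print(dictator_game(71))           # (29, 71): the average millionaire-to-poor split reported below
print(ultimatum_game(64, True))    # (36, 64): an accepted offer pays out as proposed
print(ultimatum_game(64, False))   # (0, 0): a rejection leaves both players empty-handed
```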

In the dictator game, approximately 200 Dutch millionaires (those with over €1,000,000 in their bank accounts) were told they were either playing the game with another millionaire or with a low-income receiver. According to data from the existing literature on these games, the average amount given to the receiver in a dictator game is a little shy of 30%, with only about 5% of dictators allocating all the money to the recipient. In stark contrast, when paired with a low-income individual, millionaire dictators tended to give an average of 71% of the money to the other player, with 45% of dictators giving the full €100. When paired with another millionaire recipient, however, the millionaire dictators only gave away approximately 50% of the €100 sum which, while still substantially more generous than the literature average, is less generous than their giving towards the poor.

The rich; maybe not as evil and cold as they’re imagined to be

Turning to the data from the ultimatum games, we often find that people are more generous in their offers to receivers in such circumstances, owing to the real possibility that a rejected offer can leave the proposer without anything. Indeed, the reported percentage of the offers in ultimatum games from the wider literature is close to 45% of the total sum (as compared with 30% in dictator games). In the ultimatum game, the millionaires were actually less generous towards the low-income recipients than in the dictator game – bucking the overall trend – but were still quite generous overall, giving an average of 64% of the total sum, with 30% of proposers giving away the full €100 to the other person (as compared with 71% and 45% from above). Interestingly, when paired with other millionaires in the ultimatum game, millionaire proposers gave precisely the same amounts they tended to in the dictator games. In that case, the strategic context had no effect on their giving.

In sum, millionaires tended to evidence quite a bit more generosity in giving contexts than previous, lower-income samples had. However, this generosity was largely confined to instances of giving to those in greater need, relative to a more general kind of altruism. In fact, if one were in need and interested in receiving donations from rich targets, it would seem to serve one’s goal better not to frame the request as some kind of exchange relationship through which the rich person will eventually receive some monetary benefits, as that kind of strategic element appears to result in less giving.

Why should this be the case, though? One possible explanation that comes to mind builds upon the ostensibly obvious explanation for rich people giving more that I mentioned initially: the rich already possess a great number of resources they don’t require. In economic terms, the marginal value of additional money for them is lower than it is for the poor. When the giving is economically strategic, then, the benefit to be received is more money, which, as I just suggested, has a relatively low marginal value to the rich recipient. By contrast, when the giving is driven more by altruism, the benefits to be received are predominantly social in nature: the gratitude of the recipients, possible social status from observers, esteem from peers, and so on. The other side of this giving coin, as I also mentioned at the beginning, is that there can also be social costs associated with not giving enough for the rich. As building social alliances and avoiding condemnation might have different marginal values than additional units of money, the rich could perceive greater benefits from giving in certain contexts, relative to exchange relationships.
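
That marginal-value argument is easy to put rough numbers on. Here is a quick illustration using a logarithmic utility of wealth, which is a standard textbook assumption rather than anything Smeets, Bauer, & Gneezy commit to; the wealth figures are made up for the example:

```python
# Illustrative diminishing-marginal-utility calculation; all figures are invented.
import math

def marginal_utility_of_100(wealth):
    """Utility gained from €100 more, under u(w) = log(w)."""
    return math.log(wealth + 100) - math.log(wealth)

for label, wealth in [("low-income", 1_000), ("middle class", 50_000), ("millionaire", 2_000_000)]:
    print(f"{label:>12}: {marginal_utility_of_100(wealth):.5f}")
```

Under this toy assumption, the same €100 is worth orders of magnitude more, in utility terms, to the poorest recipient than to the millionaire, which is why the social returns on giving could plausibly outweigh the monetary returns on keeping or exchanging it for the rich.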

Threats – implicit or explicit – do tend to be effective motivators for giving

Such an explanation could also, at least in principle, help explain why the poorest section of the population tends to be relatively charitable, compared to the middle: the poorest individuals face a greater need for social alliances, owing to the relatively volatile nature of their position in life. As economic resources might not be stable, poorer individuals might be better served by using more of them to build stronger social networks when money is available. Such spending would allow the poor to hedge and defend against the possibility of future bad luck; that friend you helped out today might be able to give you a place to sleep next month if you lose your job and can’t make rent. By contrast, those in the middle of the economic world are not facing the same degree of social need as the lower classes, while, at the same time, not having as much disposable income as the upper classes (and, accordingly, might also be facing less social pressure to be generous with what they do have), leading to them giving less. Considerations of social need guiding altruism also fit nicely with the moral aspect of altruism, which is just one more reason for me to like it.

References: Smeets, P., Bauer, R., & Gneezy, U. (2015). Giving behavior of millionaires. Proceedings of the National Academy of Sciences, DOI: 10.1073/pnas.1507949112

Examining The Performance-Gender Link In Video Games

Like many people around my age or younger, I’m a big fan of video games. I’ve been interested in these kinds of games for as long as I can remember, and they’ve been the most consistent form of entertainment in my life, often winning out over the company of other people and, occasionally, food. As I – or pretty much anyone who has spent time within the gaming community – can attest, the experience of playing these games with others can frequently lead to, shall we say, less-than-pleasant interactions with those who are upset by losses. Whether being derided for your own poor performance, good performance, good luck, or tactics of choice, negative comments are a frequent occurrence in the competitive online gaming environment. There are some people, however, who believe that simply being a woman in such environments yields a negative reception from a predominately-male community. Indeed, some evidence consistent with this possibility was recently published by Kasumovic & Kuznekoff (2015) but, as you will soon see, the picture of hostile behavior towards women that emerges is much more nuanced than it is often credited as being.

Aggression, video games, and gender relations; what more could you want to read about?

As an aside, it is worth mentioning that some topics – sexism being among them – tend to evade clear thinking because people have some kind of vested social interest in what they have to say about the association value of particular groups. If, for instance, people who play video games are perceived negatively, I would likely suffer socially by extension, since I enjoy video games myself (so there’s my bias). Accordingly, people might report or interpret evidence in ways that aren’t quite accurate so as to paint certain pictures. This issue seems to rear its head in the current paper on more than one occasion. For example, one claim made by Kasumovic & Kuznekoff (2015) is that “…men and women are equally likely to play competitive video games”. The citation for this claim is listed as “Essential facts about the computer and video game industry (2014)”. However, in that document, the word “competitive” does not appear at all, let alone a gender breakdown of competitive game play. Confusingly, the authors subsequently claim that competitive games are frequently dominated by males in terms of who plays them, directly contradicting the former idea. Another claim made by Kasumovic & Kuznekoff (2015) is that women are “more often depicted as damsels in distress”, though the paper they cite to support that claim does not appear to contain any breakdown of women’s actual representation as characters in video games, instead measuring people’s perceptions of women’s representation. While such a claim may indeed be true – women may be depicted as in need of rescue more often than they’re depicted in other roles and/or relative to men’s depictions – it’s worth noting that the citation they use does not contain the data they imply it does.

Despite these inaccuracies, Kasumovic & Kuznekoff (2015) take a step in the right direction by considering how the reproductive benefits of competition have shaped male and female psychologies when approaching the women-in-competitive-video-games question. For men, one’s place in a dominance hierarchy was quite relevant for determining eventual reproductive success, leading to more overt strategies of social hierarchy navigation. These overt strategies include the development of larger, more muscular upper bodies in men, suited for direct physical contests. By contrast, women’s reproductive fitness was often less affected by their status within the social hierarchy, especially with respect to direct physical competitions. As men and women begin to compete in the same venues where differences in physical strength no longer determine the winner – as is the case in online video games – this could lead to some unpleasant situations for particular men who have the most to lose by having their status threatened by female competition.

In the interest of being more explicit about why female involvement in typically male-style competitions might be a problem for some men, let’s employ some Bayesian reasoning. In terms of physical contests, larger men tend to dominate smaller ones; this is why most fighting sports are separated into different classes based on the weight of the combatants. So what are we to infer when a smaller fighter consistently beats a larger one? Though these aren’t mutually exclusive, we could infer either that the smaller fighter is very skilled or that the larger fighter is particularly unskilled. Indeed, if the larger fighter is losing both to people of his own weight class and of a weight class below him, the latter interpretation becomes more likely. It doesn’t take much of a jump to replace size with sex in this example: because men tend to be stronger than women, our Bayesian priors should lead us to expect that men will win in direct physical competition over women, on average. A man who performs poorly against both men and women in physical competition is going to suffer a major blow to his social status and reputation as a fighter.
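
For the sake of being explicit about that Bayesian step, here is a back-of-the-envelope version of the update. Every prior and likelihood in it is invented purely for illustration; the only point is that a loss to an opponent you were expected to beat shifts belief toward “unskilled” faster than a loss to someone of your own weight class does:

```python
# Toy Bayesian update for the fighter example; all probabilities are made up.
prior_unskilled = 0.5                         # start agnostic about the larger fighter

# Assumed probabilities of losing a given bout, by skill level (purely illustrative):
p_lose_to_smaller = {"skilled": 0.10, "unskilled": 0.40}
p_lose_to_same    = {"skilled": 0.40, "unskilled": 0.60}

def update(prior_unskilled, likelihoods):
    """Posterior probability of 'unskilled' after observing one loss (Bayes' rule)."""
    numerator = likelihoods["unskilled"] * prior_unskilled
    denominator = numerator + likelihoods["skilled"] * (1 - prior_unskilled)
    return numerator / denominator

p = update(prior_unskilled, p_lose_to_same)      # he loses to someone his own size
print(f"P(unskilled) after losing to a same-size fighter: {p:.2f}")   # 0.60
p = update(p, p_lose_to_smaller)                 # then he also loses to a smaller fighter
print(f"P(unskilled) after also losing to a smaller one:  {p:.2f}")   # 0.86
```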

It’ll be embarrassing for him to see that replayed five times from three angles.

While winning in competitive video games does not rely on physical strength, a similar type of logic applies there as well: if men tend to be the ones overwhelmingly dominating a video game in terms of their performance, then a man who performs poorly has the most to lose from women becoming involved in the game, as he now might compare poorly both to the standard reference group and to the disfavored minority group. By contrast, men who are high performers in these games would not be bothered by women joining in, as they aren’t terribly concerned about losing to them and having their status threatened. This yields some interesting predictions about what kind of men are going to become hostile towards women. By comparison, other social and lay theories (which are often hard to separate) do not tend to yield such predictions, instead suggesting that both high- and low-performing men might be hostile towards women in order to remove them from a type of male-only space; what one might consider a more general sexist discrimination.

To test these hypotheses, Kasumovic & Kuznekoff (2015) reported on some data collected while they were playing Halo 3, during which time all matches and conversations within the game were recorded. During these games, the authors had approximately a dozen neutral phrases pre-recorded with either a male or female voice that they would play at appropriate times in the match. These phrases served to cue the other players as to the ostensible gender of the researcher. The matches themselves were 4 vs 4 games in which the objective for each team is to kill more members of the enemy team than they kill of yours. All in-game conversations were transcribed, with two coders examining the transcripts for comments directed towards the researcher playing the game and classifying them as positive, negative, or neutral. The performance of the players making these comments was also recorded with respect to whether the game was won or lost, that player’s overall skill level, and the number of their kills and deaths in the match, so as to get a sense of the type of player making them.

The data represented 163 games of Halo, during which 189 players directed comments towards the researcher across 102 of the games. Of those 189 players who made comments, all were male. Only the 147 commenters who were teammates of the researcher were retained for analysis. In total, then, 82 players directed comments towards the female-voiced player, whereas 65 directed comments towards the male-voiced player.

A few interesting findings emerged with respect to the gender manipulation. While I won’t mention all of them, I wanted to highlight a few. First, when the researcher used the female voice, higher-skill male players tended to direct significantly more positive comments towards them, relative to low-skill players (β = -.31); no such trend was observed for the male-voiced character. Additionally, as the difference between the female-voiced researcher and the commenting player grew larger (specifically, as the person making the comment was of progressively higher rank than the female-voiced player), the number of positive comments tended to increase. Similarly, high-skill male players tended to direct fewer negative comments towards the female-voiced researcher as well (β = -.18). Finally, in terms of their kills during the match, poorly performing males directed more negative comments towards female-voiced characters, relative to high-performing men (β = .35); no such trend was evident for the male-voiced condition.

“I’m bad at this game and it’s your fault people know it!”

Taken together, the results seem to point in a pretty consistent direction: low-performing men tended to be less welcoming of women in their competitive game of choice, perhaps because it highlighted their poor performance to a greater degree. By contrast, high-performing males were relatively less troubled by the ostensible presence of women, dipping over into being quite welcoming of them. After all, a man being good at the game might well be an attractive quality to women who also enjoy the world of Esports, and what better way to kick off a potential relationship than with a shared hobby? As a final point, it is worth noting that the truly sexist types might present a different pattern of data, relative to people who were just making positive or negative comments: only 11 of the players (out of 83 who made negative comments and 189 who made any comments) were classified as making comments considered to be “hostile sexism”, which did not yield a large enough sample for a proper analysis. The good news, then, seems to be that such comments are at least relatively rare.

References: Kasumovic, M. & Kuznekoff, J. (2015). Insights into sexism: Male status and performance moderates female-directed hostile and amicable behavior. PLoS One, 10: e0131613. doi:10.1371/journal.pone.0131613