Online Games, Harassment, and Sexism

Gamers are no strangers to the anger that can accompany competition. As a timely for-instance, before I sat down to start writing this post I was playing my usual online game to relax after work. As I began my first game of the afternoon, I saw a message pop up from someone who had sent me a friend request a few days back after I had won a match (you need to accept these friend requests before messages can be sent). Despite the lag between when that request was sent and when I accepted it, the message I was greeted with called me a cunt and informed me that I have no life before the person removed themselves from my friend list to avoid any kind of response. However accurately they may have described me, that is the most typical reason friend requests get sent in that game: to insult. Many people – myself included – usually don’t accept them from strangers for that reason and, if you do, it is advisable to wait a few days for the sender to cool off a bit and hopefully forget they added you. Even then, that’s no guarantee of a friendly response.

Now my game happens to be more of a single-player experience. In team-based player-vs-player games, communication between strangers can be vital for winning, meaning there is usually less of a buffer between players and the nasty comments of their teammates. This might not draw much social attention on its own, but the players being insulted are sometimes women, which brings us nicely to some research on sexism.

Gone are the simpler days of yelling at your friends in person

A 2015 paper by Kasumovic & Kuznekoff examined how players in the online first-person shooter Halo 3 responded to the presence of a male or female voice in the team voice chat, specifically in terms of the positive and negative comments directed at the speaker. What drew me to this paper was twofold: first, I’m a gamer myself but, more importantly, the authors constructed their hypotheses based on evolutionary theory, which is unusual for papers on sexism. The heart of the paper revolves around the following idea: common theories of sexist behavior towards women suggest that men behave aggressively towards them to try and remove them from male-dominated arenas. Women get nasty comments because men want them gone from male spaces. The researchers in this case took a different perspective, predicting instead that male performance within the game would be a key variable in understanding the responses players have.

As men heavily rely on their social status for access to mating opportunities, the authors predicted that men should respond more aggressively to newcomers who displace them in a status hierarchy. Put into practice, this means that a low-performing male should be threatened by the entry of a higher-performing woman into his game, as it pushes him down the status hierarchy, resulting in aggression directed at the newcomer. By contrast, males who perform better should be less concerned by women in the game, as their status is not undercut. Instead of being aggressive, then, higher-performing men might give female players more positive comments in the interests of attracting them as possible mates. Putting that together, we end up with the predictions that women should receive more negative comments than men from men who are performing worse, while women should receive more positive comments from men who are performing better.

To test this idea, the researchers played the game with 7 other random players (two teams of 4 players) while playing either male or female voice lines at various intervals during the game (all of which were pretty neutral-to-positive in terms of content, such as, “I like this map” played at the beginning of a game). The recordings of what the other players (who did not know they were being monitored in this way, making their behavior more natural) said were then transcribed and coded for whether they were saying something positive, negative, or neutral directed at the experimenter playing the game. The coders also checked to see whether the comments contained hostile sexist language to look for something specifically anti-woman, rather than just negativity or anger in general.

Nothing like some wholesome, gender-blind rage

Across 163 games, other players spoke at all in 102 of them. In those 102 games, 189 players spoke in total, 100% of whom were male. This suggests that Halo 3, unsurprisingly, is a game that women aren’t playing as much as men. Only those players who said something and were on the experimenter’s team (147 of them) were retained for analysis. About 57% of those comments came in the female-voiced condition, while about 44% came in the male condition. In general, then, the presence of a female voice led to more comments from other male players.

In terms of positive comments, the predicted difference appeared: the higher the skill level of the player talking to the experimenter, the more positive comments they made when a woman’s voice was heard; the worse the player, the fewer positive comments they made. This interaction was almost significant when considering the relative difference rather than the absolute skill rating (i.e., whether the speaking player did worse or better than the experimenter). By contrast, the number of positive comments directed at the male-voiced player was unrelated to the skill of the speaker.

Turning to the negative comments, they were negatively correlated with player skill in general: the higher the skill of the player, the fewer negative comments they made (and the lower the skill, the more negative they got. As the old saying goes, “Mad because bad”). The interaction with gender was less clear, however. In general, the teammates of the female-voiced experimenter made more negative comments than those in the male condition. When considering the impact of how many deaths a speaking player had, players were more negative towards the woman when dying less, but they were also more negative towards the man when dying extremely often (which seems to run counter to the initial predictions). Players were also more negative towards a woman when they weren’t getting very many kills (with negativity towards the woman declining as their personal kills increased), but that relationship was not observed when they had heard a male voice (which is in line with the initial predictions).

Finally, only a few players (13%) made sexist statements, so the results couldn’t be analyzed particularly well. Statistically, these comments were unrelated to any performance metrics. Not much more to say about that beyond small sample size.  

Team red is much more supportive of women in gaming

Overall, the response that speaking players had to the gender of their teammate depended, to some extent, on their personal performance. Those men who were doing better at the game were more positive towards the women, while those who were doing worse were more negative towards them, generally speaking.

While there are a number of details and statements within the paper I could nitpick, I suspect that Kasumovic & Kuznekoff (2015) are on the right track with their thinking. I would add some additional points, though. The first of these is rather core to their hypothesis: if men are threatened by status losses brought on by their relatively poor performance, those threats should occur regardless of the sex of the person they’re playing with: whether a man performs poorly relative to a woman or another man, he is still losing relative status. So why is there less negativity directed at men (sometimes), relative to women? The authors mention one possibility that I wish they had expanded upon more, which is that men might be responding not to the women per se as much as to the pitch of the speaker’s voice. As the authors write, voice pitch tends to track dominance, with deeper voices signaling greater dominance.

What I wish they had added more explicitly is that aggression should not be deployed indiscriminately. Being aggressive towards people who are liable to beat you in a physical contest isn’t a brilliant strategy. Since men tend to be stronger than women, behaving aggressively towards other men – especially those outperforming you – should be expected to have carried different sets of immediate consequences, historically-speaking (though there aren’t many costs in modern online environments, which is why people behave more aggressively there than in person). It might not be that the men are any less upset about losing when other men are on their team, but that they might not be equally aggressive (in all cases) to them due to potential physical retribution (again, historically).

There are other points I would consider beyond that. The first of these is the nature of insults in general. Recall the interaction I had with an angry opponent at the start of this post: the goal of their message was to insult me. They were trying to make me feel bad or in some way drag me down. If you want to make someone feel bad, you would do well to focus on their flaws and on things about them which make you look better by comparison. In that respect, insulting someone by calling attention to something you share in common, like your gender, is a very weak insult. On those grounds we might expect more gendered insults against women, given that men are by far the majority in these games. Because few hostile sexist insults were observed in the present work, the point might not be terribly applicable here. It does, however, bring me to my next point: you don’t insult people by bringing attention to things that reflect positively on them.

“Ha! That loser can only afford cars much more expensive than I can!”

As women do not play games like Halo nearly as much as men, that corresponds to lower skill in those games at the population level; not because women are inherently worse at the game, but simply because they don’t practice it as much (and people who play these games more tend to become better at them). If you look at top-level competitive online games, you’ll notice the rosters are largely, if not exclusively, male (not unlike all the people who spoke in the current paper). Regardless of the causes of that sex difference in performance, the difference exists all the same.

If you knew nothing else about a person beyond their gender, you would predict that a man would perform better at Halo than a woman (at least if you wanted your predictions to be accurate). As such, if some players have just under-performed at the game and are feeling pretty angry about it, they might be looking to direct blame at the teammates who clearly caused the issue (as it could never be the speaker’s own skill, of course; at least not if you’re talking about the people yelling at strangers).

If you wanted to find out who was to blame, you might consult the match scores: factors like kills and deaths. But those aren’t perfect representations of player skill (that nebulous variable which is hard to get at), and they aren’t the only thing you might consult. After all, scores in a single game are not necessarily indicative of what would happen over a larger number of games. Because of that, the players on these teams still have limited information about the relative skill of their teammates. Given this lack of information, some people may fall back on generally-accurate stereotypes in trying to find a plausible scapegoat for their loss, assigning relatively more blame to the people who might be expected to be more responsible for it. The result? More blame assigned to women, at least initially, given the population-level knowledge.
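
To make that reasoning concrete, here’s a toy Bayesian reading of it: a minimal sketch in Python, with entirely invented numbers (nothing here comes from the paper). If a single game’s score is a noisy signal of skill, the estimate of a teammate’s skill gets pulled towards whatever group-level prior the observer holds, so the same performance can be read differently depending on that prior.

```python
# Toy model of blame-by-prior: a normal-normal Bayesian update.
# All numbers are invented for illustration; none come from the paper.
def posterior_skill(prior_mean, prior_sd, game_score, score_noise_sd):
    """Posterior mean of skill after observing one noisy game score."""
    weight = prior_sd**2 / (prior_sd**2 + score_noise_sd**2)
    return prior_mean + weight * (game_score - prior_mean)

score = 45  # the same mediocre single-game performance for two teammates
print(posterior_skill(50, 10, score, 20))  # higher group prior -> 49.0
print(posterior_skill(40, 10, score, 20))  # lower group prior  -> 41.0
```

The sketch only illustrates that, when evidence is noisy, priors do a lot of work; it isn’t a claim about what any player consciously computes.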

“I wouldn’t blame you if I knew you better, so how about we get to know each other over coffee?”

That’s where the final point I would add also comes in. If women perform worse at a population level than men, low-performing men suffer something of a double status hit when they are outperformed by a woman: not only is another player doing better than them, but one might have expected that player to do worse, knowing only their gender. As such, being outperformed by such a player makes it more difficult to blame external causes for the outcome. In a sentence, being beaten by someone who isn’t expected to perform well is a more honest signal of poor skill. The result, then, is more anger: either in an attempt to persuade others that they’re better than they actually performed, or in an attempt to drive out the people who are making them look even worse. This would fit within the authors’ initial hypothesis as well, and would probably have been worth mentioning.

References: Kasumovic, M. & Kuznekoff, J. (2015). Insights into sexism: Male status and performance moderates female-directed hostile and amicable behavior. PLoS ONE 10(7). doi:10.1371/journal.pone.0131613

No Sexism In SCRABBLE

My last couple of posts have focused primarily on the topic of group differences and on understanding how they might come to exist. Some of the most commonly-advanced explanations for these differences concern discrimination – explicit or implicit – that serves to keep otherwise interested and qualified people out of arenas they would like to compete in. For example, perhaps few men become nurses because a social stigma against men in that area means qualified male applicants aren’t considered for positions. If that were the explanation for these group differences, it would represent a wealth of untapped social value achievable by reducing or removing those discriminatory boundaries. On the other hand, if discrimination is not the cause of those differences, a lot of time and energy could be invested in chasing down a boogeyman without yielding much in the way of value for anyone.

Unfortunately, as we saw last time (and other times), research seeking to test these explanations can be designed or interpreted in ways that make them resilient to falsification. If the hypothesized effect attributable to discrimination is observed, it is counted as evidence consistent with the explanation; when the effect isn’t observed, however, it is not counted as evidence against the proposal. They are sure the discrimination is there; they just didn’t dig deep enough to find it. This practice can be maintained effectively in many domains because of the fuzzy nature of performance within them. That is, it’s not always clear which person would make a better manager or professor when it comes time to make a hiring decision or assess performance, so different rates of hiring or promotions cannot be clearly related to different behavior.

And if the quality of your work can’t be assessed, it also means you can never be said to fail at your job

One way of working that fuzziness out of the equation is to turn towards domains where more objective measures of performance can be obtained. While it might be difficult to say for certain that one person would make a superior manager to another – especially when they are closely matched in skills – it is quite a lot easier to see if they can complete a task with objective performance criteria, such as winning in a video game or performing pull-ups. In realms of objective performance, it doesn’t matter if people like you or not; your abilities are being tested against reality. Accordingly, domains with more objective performance criteria make for appealing research tools when it comes to assessing and understanding group differences.

On that note, Moxley, Ericsson, & Tuffiash (2017) report some interesting information concerning the board game SCRABBLE. For the handful of you who might not know what SCRABBLE is, it’s a game where each player randomly selects a number of tiles with letters on them, then uses those tiles to spell words: the longer the word or the harder the letters are to utilize, the more points the player receives. The player with the most points after the tiles have been used up wins. As it turns out, men tend to be over-represented in the upper tiers of SCRABBLE performance. Within the highest-performing competitive SCRABBLE divisions, 86% of the players are male, while only 31% of the players in the lowest-performing divisions are. This pattern holds even though most competitive SCRABBLE players are women. Indeed, when regular people are asked whether they would expect more male or female SCRABBLE champions, the intuition seems to be that women should be more common (despite, for context, all 10 of the last world champions having been male).

How is that sex difference in performance to be explained? In this instance, discrimination looks to be an odd explanation: competitive SCRABBLE tournaments do not present clear barriers to entry and women appear to be at least as interested – if not more so – in SCRABBLE than men are, as inferred from participation rates. Moreover, people even seem to expect women would do better than men in that field, so an explanation along the lines of stereotype threat doesn’t work well either. According to the research of Moxley, Ericsson, & Tuffiash (2017), the explanation for most of that sex difference in performance does, in fact, relate to varying male and female interests, but perhaps not those directed at playing SCRABBLE itself. While I won’t discuss every part of the studies they undertook, I wanted to highlight some general points of this research because of how well it can highlight the difficulty and nuance in understanding sex differences and their relation to performance within a given field.

Even this vicious field of battle

The general methodology employed by the researchers involved surveying participants at National SCRABBLE competitions in 2004 and 2008 about their overall level of practice each year, both in terms of time spent studying alone and time spent practicing seriously with others. These responses were then examined in the context of each player’s competitive SCRABBLE rating. The first study turned up several noteworthy relationships. As expected, women tended to have lower ratings than men (d = -0.74). However, it was also found that different types of SCRABBLE practice had varying impacts on player ratings: studying vocabulary had a negative impact on performance, while time spent analyzing past games and doing anagrams had a positive impact. This means that simply asking people how much they practiced SCRABBLE is not a fine-grained enough question to predict performance well. In this case, the practice questions asked about were unable to account for the entirety of the gender difference in performance, but they did reduce it somewhat.
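
For anyone unfamiliar with the d statistic reported here, that’s Cohen’s d: the difference between two group means expressed in units of their pooled standard deviation. Below is a minimal sketch of the calculation in Python; the rating numbers are invented purely to show what a d of roughly -0.74 looks like and are not from the paper.

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d: (mean of x - mean of y) over the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) +
                  (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

# Invented ratings: one group's mean sits 148 points below the other's,
# against a common spread of 200 points, giving d of about -0.74.
rng = np.random.default_rng(0)
women = rng.normal(1300, 200, size=1000)
men = rng.normal(1448, 200, size=1000)
print(round(cohens_d(women, men), 2))
```

A difference of that size still leaves the two rating distributions overlapping heavily, which is part of why average differences and top-division representation can tell such different stories.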

This led the researchers to ask more detailed questions about SCRABBLE players’ practice in their second study. As before, women tended to have lower ratings than men (d = -0.69), but once the more refined questions about practice and experience were accounted for, there was no longer a direct effect of gender on rating. This would suggest that the performance advantage men had in SCRABBLE can be largely attributed to their spending more time engaged in solitary practice that benefits performance, while women tended to spend more time playing SCRABBLE with others; a behavior which did not yield comparable performance benefits.

The final step in this analysis was to figure out why men and women spent different amounts of time engaged in the types of practice they did. To do so, the players’ responses about how relevant, enjoyable, and effortful various types of practice felt were assessed. In order, the players felt tournament experience was the most important for improving their skills, then playing SCRABBLE itself, followed last by other types of word games. On that front, perceptions weren’t quite accurate. A similar pattern emerged in terms of which activities were rated as most enjoyable. However, there was a sex difference in that women rated playing SCRABBLE outside of tournaments as more enjoyable than men, and men rated SCRABBLE-specific practice (like anagrams) as more enjoyable than women.

Taken together, men tended to find the most-effective practice methods more enjoyable than women, and so engaged in them more. This differential involvement in effective practice in turn explained the sex difference in player rankings. Nothing too shocking, but reality often isn’t.

Published in the journal of, “I’m sorry; did you say something?”

What we see in this research is an appreciable sex difference in performance resulting from varying male and female interests, but those interests themselves are not necessarily the most obvious targets for investigation. If you were to just ask men and women whether they were interested in SCRABBLE, you might find that women had a higher average interest. If you were to just ask how much time they spent practicing, you might not observe a sex difference capable of explaining the differences in performance. It wouldn’t be until you asked specifically about their interests in particular types of practice, and understood how those related to eventual performance, that you would end up with a better picture of that performance gap. In this case, the sex difference seems to be largely the product of men being more interested in specific types of practice that are ultimately more productive when it comes to improving performance. The corollary is that if your explanation for the male-female performance gap in SCRABBLE was that women are being discriminated against, and you accordingly sought to reduce discrimination in the field, you’d probably do nothing to help even out the scores (though you might achieve some social maligning).

Thankfully this kind of analysis can be reasonably undertaken in a realm where performance can be objectively assessed. If you were to think about trying this same analysis with respect to, say, the relative distribution of men and women in STEM fields, you’re in for a much rockier experience where it’s not clear how certain interests relate to ultimate performance.

References: Moxley, J., Ericsson, A., & Tuffiash, M. (2017). Gender differences in SCRABBLE performance and associated engagement in purposeful practice activities. Psychological Research. doi:10.1007/s00426-017-0905-3

Imagine If The Results Went The Other Way

One day, three young children are talking about what they want to be when they get older. The first friend says, “I love animals, so I want to become a veterinarian.” The second says, “I love computers, so I want to become a programmer.” The third says, “I love making people laugh, so I want to become a psychology researcher.” Luckily for all these children, they all end up living a life that affords them the opportunity to pursue their desires, and each ends up working happily in the career of their choice for their entire adult life.

The first question I’d like to consider is whether any of those children made choices that were problematic. For instance, should the first child have decided to help animals, or perhaps should they have put their own interests aside and pursued another line of work because of their sex and the current sex-ratio of men and women in that field? Would your answer change if you found out the sex of each of the children in question? Answer as if the second child was a boy, then think about whether your answer would change if you found out she was a girl.

Well if you wanted to be a vet, you should have been born a boy

This hypothetical example should, hopefully, highlight a fact that some people seem to lose track of from time to time: broad demographic groups are not entities in themselves; they are only made up of their individual members. Once one starts talking about how gender inequality in professions ought to be reduced – such that you see representation closer to 50/50 between men and women across a greater number of fields – you are, by default, talking about how some people need to start making choices less in line with their interests, skills, and desires to reach that parity. This can end up yielding strange outcomes, such as a gender studies major telling a literature major she should have gone into math instead.

Speaking of which, a paper I wanted to examine today (Riegle-Crumb, King, & Moore, 2016) begins laying on the idea of gender inequality across majors rather thick. Unless I misread their meaning, they seem to think that gender segregation in college majors ought to be disrupted and, accordingly, sought to understand what happens to men and women who make non-normative choices in selecting a college major, relative to their more normative peers. Specifically, they set out to examine what happens to men who major both in male- and female-dominated fields: are they likely to persist in their chosen field of study in the same or different percentages? The same question was asked of women as well. Putting that into a quick example, you might consider how likely a man who initially majors in nursing is to switch or stay in his program, relative to one who majors in computer science. Similarly, you might think about the fate of a woman who majors in physics, compared to one who majors in psychology.

The authors expected that women would be more likely to drop out of male-dominated fields because they encounter a “chilly” social climate there and face stereotype threat, compared to their peers in female-dominated fields. By contrast, men were expected to drop out of female-dominated fields more often as they begin to confront the prospect of earning less money in the future and/or lose social status on account of emasculation brought on by their major (whether perceived or real).

To test these predictions, Riegle-Crumb, King, & Moore (2016) examined a nationally-representative sample of approximately 3,700 college students who had completed their degree. These students had been studied longitudinally, interviewed at the end of their first year of college in 2004, then again in 2006 and 2009. A gender-atypical major was coded as one in which the opposite sex comprised 70% or more of the major. In the sample being examined, 14% of the men selected a gender-atypical field, while 4% of women did likewise. While this isn’t noted explicitly, I suspect some of that difference might have to do with the relative size of certain majors. For instance, psychology is one of the most popular majors in the US, but also happened to fall under the female-dominated category. If that pattern continued into other fields, it would naturally yield more men than women choosing a gender-atypical major.

Can’t beat that kind of ratio in the dating pool, though

Moving on to what was found, the researchers were trying to predict whether people would switch majors or not. The initial analysis found that men in male-typical majors switched about 39% of the time, compared to 63% of the men in atypical majors. So the men in atypical fields were more likely to switch. There was a different story for the women, however: those in female-typical majors switched 46% of the time, compared to 41% of those in atypical fields. The latter difference was neither statistically nor practically significant. Unsurprisingly, for both men and women, those most likely to switch had lower GPAs than those who stayed, suggesting switching was due, in part, to performance.

When formally examined with a number of control variables (for social background and academic performance) included in the model, men in gender atypical fields were about 2.6 times as likely to switch majors, relative to those in male-dominated ones. The same analysis run for women found that those in atypical majors were about 0.8 times as likely to switch majors as those in female-dominated ones. Again, this difference wasn’t statistically significant. Nominally, however, women in atypical fields were more likely to stay put.
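
Figures like “2.6 times as likely,” estimated with control variables in the model, typically come from a logistic regression, whose exponentiated coefficients are odds ratios. Here’s a minimal, self-contained sketch of that kind of analysis in Python; the data are simulated, the variable names are mine, and nothing here reproduces the paper’s actual model.

```python
# Sketch: recovering an odds ratio for switching majors from simulated data.
# Everything below is invented for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 3700
atypical = rng.binomial(1, 0.14, size=n)   # 1 = gender-atypical major
gpa = rng.normal(3.0, 0.5, size=n)         # a stand-in control variable
# Simulated truth: atypical majors raise the log-odds of switching by log(2.6)
log_odds = -0.4 + np.log(2.6) * atypical - 0.5 * (gpa - 3.0)
switched = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

X = sm.add_constant(np.column_stack([atypical, gpa]))
fit = sm.Logit(switched, X).fit(disp=0)
print(np.exp(fit.params[1]))  # odds ratio for the atypical-major term, near 2.6
```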

What do the authors make of this finding? Though they note correctly that their analysis says nothing of the reasons for the switch, they view the greater male-atypical pattern of switching as consistent with their expectations. I think this is probably close to the truth: as a greater proportion of a man’s future success is determined by his ability to provision mates and his social status, we might expect that men tend to migrate from majors with a lower future financial payoff to those that have a larger one. Placing that into a personal example, I might have wanted to be a musician, but the odds of landing a job as a respected rockstar seemed slim indeed. Better that I got a degree in something capable of paying the bills consistently if I care about money.

By contrast, the authors also correctly note that they don’t find evidence consistent with their prediction that women in gender-atypical fields would switch more often. This does not, however, cause them to abandon the justifications for their prediction. As far as I can tell, they still believe that factors like a chilly climate and stereotype threat are pushing women out of those majors; they just supplement that expectation by adding that a number of factors (like the aforementioned financial ones) might be keeping them in, and that the latter factors are either more common or more influential (though that certainly makes you wonder why women tend to choose lower-paying fields in greater numbers in the first place).

Certainly worth a 20-year career in a field you hate

This strikes me as kind of a fool-proof strategy for maintaining a belief in the prospect of nefarious social forces doing women harm. To demonstrate why, I’d like to take this moment to think about what people’s reactions to these findings might have been if the patterns for men and women were reversed. If it turned out that women in male-dominated majors were more likely to switch than their peers in female-dominated majors, would there have been calls to address the clear sexism causally responsible for that pattern? I suspect the answer is yes, judging from reactions I’ve seen in the past. So, if that result had been found, the authors could point a finger at the assumed culprits. However, even when that result was not found, they can just tack on other assumptions (women remain in this major for the money) that allow the initial hypothesis of discrimination to be maintained in full force. Indeed, they end their paper by claiming, “Gender segregation in fields of study and related occupations severely constrains the life choices and chances of both women and men,” demonstrating a full commitment to being unfazed by their results.

In other words, there doesn’t seem to be a pattern of data that could have been observed capable of falsifying the initial reasons these expectations were formed. Even nominally contradictory data appears to have been assimilated into their view immediately. Now I’m not going to say it’s impossible that there are large, sexist forces at work trying to push women out of gender atypical fields that are being outweighed by other forces pulling in the opposite direction; that is something that could, in theory, be happening. What I will say is that granting that possibility makes the current work a poor test of the original hypotheses, since no data could prove it wrong. If you aren’t conducting research capable of falsifying your ideas – asking yourself, “what data could prove me wrong?” – then you aren’t engaged in rigorous science. 

References: Riegle-Crumb, C., King, B., & Moore, C. (2016). Do they stay or do they go? The switching decisions of individuals who enter gender atypical college majors. Sex Roles, 74, 436-449.

Not-So-Leaky Pipelines

There’s an interesting perspective many people take when trying to understand the distribution of jobs in the world, specifically with respect to men and women: they look at the percentage of men and women in a population (usually in terms of country-wide percentages, but sometimes more localized), make note of any deviations from those percentages in terms of representation in a job, and then use those deviations to suggest that certain desirable fields (but not usually undesirable ones) are biased against women. So, for instance, if women make up 50% of the population but only represent 30% of lawyers, there are some who would conclude this means the profession (and associated organizations) is likely biased against women, usually because of some implicit sexism (as evidence of explicit and systematic sexism in training or hiring practices is exceptionally hard to come by). Similar methods have been used when substituting race for gender as well.

Just another gap, no doubt caused by sexism

Most of the ostensible demonstrations of this sexism issue are wanting, and I’ve covered a number of these examples before (see here, here, here, and here). Simply put, there are a lot of factors in the world that determine where people ultimately end up working (or whether they’re working at all). Finding a consistent gap between groups tells you something is different, just not what. As such, you don’t just get to assume that the cause of the difference is sexism and call it a day. My go-to example in that regard has long been plumbing. As a profession, it is almost entirely male dominated: something like 99% of the plumbers in the US are men. That’s as large of a gender gap as you could ask for, yet I have never once seen a campaign to get more women into plumbing or complaints about sexism in the profession keeping otherwise-interested women out. Similarly, men make up about 96% of the people shot by police, but the focus on police violence has never been on getting officers to shoot fewer men per se. In those cases, most people seem to recognize that factors other than sex are the primary determinants of the observed sex differences. Correlation isn’t causation, and maybe women aren’t as interested in digging around through human waste or committing violent felonies as men are. Not to say that many men are interested, just that more of those who are end up being men.

If that was the case and these sex differences aren’t caused by sexism, any efforts that sought to “fix” the gap by focusing on sexism would ultimately be unsuccessful. At the risk of saying something too obvious, you change outcomes by changing their causes; not unrelated issues. If we have the wrong idea as to what is causing an outcome, we end up wasting time and money (which often does not belong to us) trying to change it and accomplishing very little in the process (outside of getting people annoyed at us for wasting their time and money).

Today I wanted to add to that pile of questionable claims of sexism concerning an academic neighbor to psychology: philosophy. Though I was unaware of this debate, there is apparently some contention within the field concerning the perceived under-representation of women. As is typical, the apparent under-representation of women in this field has been chalked up to sexist biases keeping women discouraged and out of a job. To be clear about things, some people are looking at the percentage of men and women in the field of philosophy, noting that it differs from their expectations (whatever those are and however they were derived), calling it under-representation because of those expectations, and then further assuming a culprit in the form of sexism. As it turns out, the data has something to say about that.

It also has some great jokes about Polish people if you’re a racist.

The data in question come from a paper by Allen-Hermanson (2017), which examined sex differences in tenure-track hiring and academic publishing in philosophy departments. The reasoning behind this line of research was that if insidious forces are at work against women in philosophy departments, we ought to expect something of a leaky pipeline: women should not be as successful as men at landing desirable, tenure-track jobs, relative to the rates at which each sex earns philosophy degrees. So, if women earned, say, 40% of the philosophy PhDs during the last year, we might expect them to get 40% of the tenure-track jobs in the next, all else being equal. Across the 10-year period examined (2005-2014), there were three years in which women were hired very slightly below their relative percentage into the tenure-track jobs (and by “very slightly” I’m talking in the range of about 1-2%), one year in which it was dead even, and during the remaining six years women were hired above the expected rate by much more substantial margins (in the range of 5-10%).

Putting some rough numbers to that, women earned about 28% of the PhDs and received about 36% of the jobs in the most recent hiring seasons. It seems, then, women tended to be over-represented in those positions, on average. Other data discussed in the paper corresponds to those findings, again suggesting that women had about a 25% advantage over men in finding desirable positions (in terms of less desirable positions, men and women were hired in about equal numbers).
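
As a rough sketch of the baseline logic at work here, a simple binomial test can ask how surprising an observed hiring share would be if hires simply tracked the share of PhDs earned. In the Python snippet below, the hiring counts are invented (chosen to give the ~36% share mentioned above); only the 28% PhD share comes from the discussion of the paper.

```python
# Sketch: does an observed hiring share depart from the PhD-share baseline?
# Hiring counts are hypothetical; only the 28% baseline is from the paper.
from scipy.stats import binomtest

phd_share = 0.28                    # share of philosophy PhDs earned by women
women_hired, total_hired = 72, 200  # invented counts giving a 36% hiring share
result = binomtest(women_hired, total_hired, p=phd_share)
print(women_hired / total_hired, result.pvalue)
```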

This finding is made all the stranger by Allen-Hermanson (2017) noting that male and female degree holders differed with respect to how often they published. On average, the new tenure-track female candidates who had never held such a position before had 0.77 publications; the comparable male number was 1.37. Of those who secured a job in 2012-2013, men averaged 2.4 publications to women’s 1.17. Not only were the men publishing about twice as much, then, but they were also modestly less successful at landing a job (and this effect did not appear to be driven by particularly prolific publishers). While one could possibly make the case that female publications are in some sense higher quality, that remains to be seen. One could more easily make the case that female candidates were held to lower standards than male ones.

As the data currently stand, I can’t imagine many people will be making a fuss about them and crying sexism. Perhaps the men with the degrees went out to seek work elsewhere and that explains why women are over-represented. Perhaps there are other causes. The world is a complicated place, after all. The point here is that there won’t be talk about how philosophy departments are biased against men, just like there wasn’t much talk I saw last time research found a much larger academic bias in favor of women, holding candidate quality constant. I think that is largely because the data apparently favor women with respect to hiring. If the results had run in the opposite direction, I can imagine that a lot more noise would have been made about them and many people would be getting scolded right now about their tolerance of sexism. But that’s just an intuition.

“Now, if you’ll excuse me, I’m off to find bias against my group somewhere else”

When asking a question of under-representation, the most pressing matter should always be, “under-represented with respect to what expectation?” In order to say that a group is under-represented, you need to make it clear what the expected degree of representation is as well as why. We shouldn’t expect that men and women be killed by police in equal numbers unless we also expect that both groups behave more-or-less identically. We similarly shouldn’t expect that men and women enter into certain fields in the same proportion unless they have identical sets of interests. On the other hand, if the two groups are different with respect to some key factor that determines an outcome, such as interests, using sex itself is just a poor variable choice. Compared to interest in fixing toilets (and other such relevant factors), I imagine sex itself uniquely predicts very little about who ultimately ends up becoming a plumber. If we can use those better, more directly-relevant factors, we should. You don’t build your predictive model with irrelevant factors; not if accuracy is your goal, in any case.

References: Allen-Hermanson, S. (2017). Leaking pipeline myths: In search of gender effects on the job market and early career publishing in philosophy. Frontiers in Psychology, 8. doi:10.3389/fpsyg.2017.00953

Spinning Sexism Research On Accuracy

When it comes to research on sexism, there appear to be many parties interested in the notion that sexism ought to be reduced. This is a laudable goal, and one that I would support; I am very much in favor of treating people as individuals rather than as representatives of their race, sex, or any other demographic characteristic. It is unfortunate, however, that this goal often gets side-tracked by an entirely different one: trying to reduce the extent to which people view men and women as different. What I mean by this is that I have seen many attempts to combat sexism by trying to reduce the perception that men and women differ in terms of their psychology, personality, intelligence, and so on; it’s much more seldom that those same voices appear to convince people who inaccurately perceive sex differences as unusually small to adjust their estimates upwards. In other words, rather than championing accuracy in perceptions, there appears to be a more targeted effort at minimizing particular differences; while those are sometimes the same thing (sometimes people are wrong because they overestimate), they are often not (sometimes people are wrong because they underestimate), and when those goals conflict, the minimization side tends to win out.

Just toss your perceptions in with the rest of the laundry; they’ll shrink

In my last post, I discussed some research by Zell et al (2016) primarily in the service of examining measures of sexism and the interpretation of the data they produce (which I recommend reading first). Today I wanted to give that paper a more in-depth look to illustrate this (perhaps unconscious) goal of trying to get people to view the sexes as more similar than they actually are. Zell et al (2016) begin their introduction by suggesting that most psychological differences between men and women are small, and the cases in which medium to large differences exist – like mating preferences and aggression – tend to be rare. David Schmitt has already put remarks like that into some context, and I highly recommend you read his post on the subject. In the event you can’t be bothered to do so at the moment, one of the most important takeaway points from his post is that even if the differences in any one domain tend to be small on average, when considered across all those domains simultaneously, those small differences can aggregate into much larger ones.

Moreover, the significance of a gender difference is not necessarily determined by its absolute size, either. This was a point Steven Pinker mentioned in a somewhat-recent debate with Elizabeth Spelke (and it was touched on again in a recent talk by Jon Haidt at SUNY New Paltz). To summarize the point briefly: if you’re looking at a trait in two normally-distributed populations that are, on average, quite similar, the further from that average value you get, the more extreme the difference between the populations becomes. Pinker makes the point clear in this example:

“…it’s obvious that distributions of height for men and women overlap: it’s not the case that all men are taller than all women. But while at five foot ten there are thirty men for every woman, at six feet there are two thousand men for every woman. Now, sex differences in cognition tend not to be so extreme, but the statistical phenomenon is the same.”
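
That tail behavior is easy to verify numerically. Below is a minimal sketch in Python; the height parameters are rough illustrative values in inches (not the exact figures behind Pinker’s numbers), but they reproduce the qualitative pattern: a modest mean difference produces tail ratios that explode as the cutoff rises.

```python
# Tail ratios of two overlapping normal distributions.
# The height parameters are rough illustrative assumptions, in inches.
from scipy.stats import norm

men = norm(loc=70.0, scale=3.0)    # assumed male height distribution
women = norm(loc=64.5, scale=2.5)  # assumed female height distribution

for cutoff in (70, 72, 74, 76):    # 5'10", 6'0", 6'2", 6'4"
    ratio = men.sf(cutoff) / women.sf(cutoff)  # ratio of P(height > cutoff)
    print(f"above {cutoff} inches: roughly {ratio:,.0f} men per woman")
```

The exact ratios depend heavily on the assumed means and spreads; the point is only that small average differences do not imply small differences at the extremes.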

Not only are small sex differences sometimes important, then (such as when you’re trying to hire people who are in the top 1% of the distribution for a trait like intelligence, speed, or conscientiousness; you name it), but a large number of small effects (as well as some medium and large ones) can all add up to collectively represent some rather large differences (and that assumes you’re accounting for all relevant sex differences; not just a non-representative sample of them). With all this considered, the declaration at the beginning of Zell et al’s paper that most sex differences tend to be small strikes me less as a statement of empirical concern and more as one that serves to set up the premise for the rest of their project: specifically, the researchers wanted to test whether people’s scores on the ambivalent sexism inventory predicted (a) the extent to which they perceive sex differences as being large and (b) the extent to which they are inaccurate in their perceptions. The prediction in this case was that people who scored high on these ostensible measures of sexism would be more likely to exaggerate sex differences and more likely to be wrong about their size overall (as an aside, I don’t think those sexism questions measure what the authors hope they do; see my last post).

Pictured: Something not even close to what was being assessed in this study

In their first study, Zell et al (2016) asked about 320 participants to estimate how large they thought sex differences between men and women were (from 1-99) for 48 traits, and to answer 6 questions intended to measure their hostile and benevolent sexism (as another aside, I have no idea why those 48 traits in particular were selected). These answers were then averaged for each participant to create overall scores for how large they viewed the sex differences to be and how high they scored on hostile and benevolent sexism. When the relevant factors were plugged into their regression, the results showed that those higher in hostile (β = .19) and benevolent (β = .29) sexism tended to perceive sex differences as larger, on average. When examined by gender, it was found that women (β = .41) who were higher in benevolent sexism were more likely to perceive sex differences as large (but this was not true for men: β = .11) and – though it was not significant – the reverse pattern held for hostile sexism, such that women high in hostile sexism were nominally less likely to perceive sex differences as large (β = -.32).
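
For readers unfamiliar with the notation, those β values are standardized regression coefficients: slopes from a regression in which every variable has been z-scored, so each β reads as standard deviations of change in the outcome per standard deviation of the predictor, holding the others constant. Here’s a minimal sketch of that computation on simulated data (the variable names and numbers are mine, not Zell et al’s):

```python
# Standardized betas: z-score everything, then take ordinary least-squares slopes.
# The data below are simulated; nothing here reproduces Zell et al's dataset.
import numpy as np

def standardized_betas(X, y):
    """OLS slopes after z-scoring predictors and outcome (intercept drops out)."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    return np.linalg.lstsq(Xz, yz, rcond=None)[0]

rng = np.random.default_rng(2)
hostile = rng.normal(size=320)                     # hypothetical scale scores
benevolent = 0.4 * hostile + rng.normal(size=320)  # the two scales correlate
perceived = 0.2 * hostile + 0.3 * benevolent + rng.normal(size=320)
print(standardized_betas(np.column_stack([hostile, benevolent]), perceived))
```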

The more interesting finding, at least as far as I’m concerned, is that in spite of those scoring higher on the sexism measures perceiving sex differences to be larger, they were not really more likely to be wrong about them. Specifically, those who scored higher on benevolent sexism were slightly less accurate (β = -.20), just as women tended to be less accurate than men (β = -.19); however, hostile sexism scores were unrelated to accuracy altogether (β = .003), and no interactions between gender and sexism emerged. To put that in terms of the simple correlations, hostile and benevolent sexism correlated much better with the perceived size of sex differences (rs = .26 and .43, respectively) than they did with accuracy (rs = -.12 and -.22, with the former not being significant and the latter being rather small). Now, since we’re dealing with two genders, two sexism scales, and relatively small effects, it is possible that some of these findings are statistical flukes; that does tend to happen as you keep slicing data up. Nevertheless, these results are discussed repeatedly within the paper as representing exaggerations: those scoring higher on these sexism measures are said to exaggerate sex differences, which is odd on account of them not consistently getting those differences all that wrong.

This interpretation extends to their second study as well. In that experiment, about 230 participants were presented with two mock abstracts and told that only one of them represented an accurate summary of psychological research on sex differences. The accurate version, of course, was the one that said sex differences are small on average and therefore concluded that men and women are very similar to each other, whereas the bogus abstract concluded that gender differences are often large and therefore men and women are very different from one another. As I reviewed at the beginning of this post, small differences can often have meaningful impacts both individually and collectively, so the line about how men and women are very similar to each other might not reflect an entirely accurate reading of the literature even if the part about small average sex differences did. This setup already conflates the two statements (“the average effect size across all these traits is small” and “men and women are very similar across the board”).

“Most of the components aren’t that different from modern cars, so they’re basically the same”

As before, those higher in hostile and benevolent sexism tended to say that the larger sex-difference abstract more closely reflected their personal views (women tended to select the large-difference abstract 50.4% of the time, compared to men’s 44.2%). Now, because the authors view the large sex-difference abstract as being the fabricated one, they conclude that those higher in these sexism measures are less accurate and more likely to exaggerate (they also remark that their sexism measures indicate which people “endorse sexist ideologies”; a determination those measures are not at all cut out for making). In other words, the authors interpret this finding as those selecting the large-differences abstract holding “empirically unsupported” views (which in a sort-of ironic sense means that, as the late George Carlin put it, “Men are better at it” when it comes to recognizing sex differences).

This is an interesting methodological trick they employ: since they failed to find much in the way of a correlation between sexism scores and accuracy in their first study (it existed sometimes, but was quite small across the board, and certainly much smaller than the perceived-size correlation), they created a coarser and altogether worse measure of accuracy in the second study and used that instead to support the view that believing men and women are rather different is wrong. As the old saying goes, if at first you don’t succeed, change your measures until you do.

References: Zell, E., Strickhouser, J., Lane, T., & Teeter, S. (2016). Mars, Venus, or Earth? Sexism and the exaggeration of psychological gender differences. Sex Roles, 75, 287-300.

Research Tip: Ask About What You Want To Measure

Recently I served as a reviewer for a research article that had been submitted to a journal for publication. Without going into too much detail as to why, the authors of this paper wanted to control for people’s attitudes towards casual sex when conducting their analysis. They thought it possible that people who were more sexually permissive when it comes to infidelity might respond to certain scenarios differently than those who were less permissive. If you were the sensible type of researcher, you might simply ask your participants to indicate on some scale how acceptable or unacceptable they think sexual infidelity is. The authors of this particular paper opted for a different, altogether stranger route: they noted that people’s attitudes towards infidelity correlate (imperfectly) with their political ideology (i.e., whether they consider themselves liberals or conservatives). So, rather than ask participants directly how acceptable they find infidelity (what they actually wanted to know), they asked participants about their political ideology and used that as a control instead.

“People who exercise get tired, so we measured how much people napped to assess physical fitness”

This example is by no means unique; psychology researchers frequently try to ask questions about topic X in the hopes of understanding something about topic Y. This can be acceptable at times, specifically when topic Y is unusually difficult – but not impossible – to study directly. After all, if topic Y were impossible to study directly, then one obviously could not say that studying topic X tells you something about Y with much confidence, as you would have no way of assessing the relationship between X and Y in the first place. If the relationship between X and Y has been established, that relationship is sufficiently strong, and Y is unusually difficult to study directly, then there’s a good, practical case to be made for using X instead. When that is done, however, it should always be remembered that you aren’t actually studying what you’d like to study, so it’s important not to get carried away with the interpretation of your results.

This brings us nicely to the topic of research on sexism. When people hear the word “sexism” a couple of things come to mind: someone who believes one sex is (or should be) – socially, morally, legally, psychologically, etc. – inferior to the other, or worth less; someone who wouldn’t want to hire a member of one sex for a job (or intentionally pays them less if they did) strictly because of that variable, regardless of their qualifications; someone who inherently dislikes members of one sex. While this list is by no means exhaustive, I suspect things like these are the prototypical examples of sexism: some kind of explicit, negative attitude about people because of their sex per se that directly translates into behavior. Despite this, people who research sexism don’t usually ask about such matters directly, as far as I’ve seen. To be clear, they could easily ask questions assessing such attitudes in a straightforward manner (in fact, they used to do just that with measures like the “Attitudes Towards Women Scale” in the 1970s), but they do not. As I understand it, the justification for not asking about such matters directly is that it has become more difficult to find people who actually express such views (Loo & Thorpe, 1998). As attitudes had already become markedly less sexist from 1972 to 1998, one can only guess at how much more change occurred from then to now. In short, it’s becoming rare to find blatant sexists anymore, especially if you’re asking college students.

Many researchers interpret that difficulty as the result of people still holding sexist attitudes but either (a) not being willing to express them publicly for fear of condemnation, or (b) not being consciously aware that they hold such views. As such, researchers like to ask questions about “Modern Sexism” or “Ambivalent Sexism”: they maintain the word “sexism” in their scales, but they begin to ask about things which are not what people first think of when they hear the term. They no longer ask about explicitly sexist attitudes. Therein lies something of a problem, though: if what you really want to know is whether people hold particular sexist beliefs or attitudes, you need some way of assessing those attitudes directly in order to determine whether other questions that don’t ask about that sexism directly will accurately reflect it. However, if such a method of assessing those beliefs accurately, directly, and easily does exist, then it seems altogether preferable to use that method instead. In short, just ask about the things you want to ask about.

“We wanted to measure sugar content, so we assessed how much fruit the recipe called for”

If you continue on with using an alternate measure – like using the Ambivalent Sexism Inventory (ASI), rather than the Attitudes towards Women Scale – then you really should restrict your interpretations to things you’re actually asking about. As a quick example, let’s consider the ASI, which is made up of a hostile and benevolent sexism component. Zell et al (2016) summarize the scale as follows:

“Hostile sexism is an adversarial view of gender relations in which women are perceived as seeking control over men. Benevolent sexism is a subjectively positive view of gender relations in which women are perceived as pure creatures who ought to be protected, supported, and adored; as necessary companions to make a man complete; but as weak and therefore best relegated to traditional gender roles (e.g., homemaker).”

In other words, the benevolent scale measures the extent to which women are viewed as children: incapable of making their own decisions and, as such, in need of protection and provisioning by men. The hostile scale measures the extent to which men don’t trust women and view them as enemies. Glick & Fiske (1996) claim that “…hostile and benevolent sexism…combine notions of the exploited group’s lack of competence to exercise structural power with self-serving ‘benevolent’ justifications.” However, not a single item on either the hostile or benevolent sexism inventory actually asks about female competencies or whether women ought to be restricted socially.

To make this explicit, let’s consider the questions Zell et al (2016) used to assess both components. In terms of hostile sexism, participants were asked to indicate their agreement with the following three statements:

  • Women seek power by gaining control over men
  • Women seek special favors under the guise of equality
  • Women exaggerate their problems at work

There are a few points to make about these questions: first, they are all clearly true to some extent. I say that because these are behaviors that all kinds of people engage in. If these behaviors are not specific to one sex – if both men and women exaggerate their problems at work – then agreeing that women do so does not stop me from believing men do it as well and, accordingly, does not necessarily track any kind of sexist belief (the alternative, I suppose, is to believe that women never exaggerate problems, which seems unlikely). If the questions are meant to be interpreted as relative statements (e.g., “women exaggerate their problems at work more than men do”), then those statements need to first be assessed empirically as true or false before you can say that endorsing one represents sexism. If women actually do tend to exaggerate problems at work more (a matter that is quite difficult to objectively determine because of what the term exaggerate means), then agreement with the statement just means you accurately perceive reality, not that you’re a sexist.

More to the point, however, none of the measures ask about what the researchers interpret them to mean: women seeking special favors does not imply they are incompetent or unfit to hold positions outside of the home, nor does it imply that one views gender relations primarily as adversarial. If those views are really what a researcher is trying to get at, then they ought to just ask about them directly. A similar story emerges for the benevolent questions:

  • Women have a quality of purity few men possess
  • Men should sacrifice to provide for women
  • Despite accomplishment, men are incomplete without women

Again, I see no mention of women’s competency, ability, intelligence, or someone’s endorsement of strict gender roles. Saying that men ought to behave altruistically towards women in no way implies that women can’t manage without men’s help. When a man offers to pay for an anniversary dinner (a behavior which I have seen labeled sexist before), he is usually not doing so because he feels his partner is incapable of paying, any more than my helping a friend move suggests I view them as a helpless child.

“Our saving you from this fire implies you’re unfit to hold public office”

The argument can, of course, be made that scores on the ASI are related to the things these researchers actually want to measure. Indeed, Glick & Fiske (1996) made that very argument: they report that the hostile sexism scores (controlling for the benevolent scores) did correlate with “Old-Fashioned Sexism” and “Attitudes Towards Women” scores (rs = .43 and .60, respectively, bearing in mind that was almost 20 years ago and these attitudes are changing). However, the correlations between benevolent sexism scores and these sexist attitudes were effectively zero (rs = -.03 and .04, respectively). In other words, it appears that people endorse these statements for reasons that have nothing at all to do with whether they view women as weak, or stupid, or any other pejorative you might throw out there, and their responses may tell you nothing at all about their opinion concerning gender roles. If you want to know about those matters, then ask about them. In general, it’s fine to speculate about what your results might mean – how they can best be interpreted – but an altogether easier path is to simply ask about such matters directly and reduce the need for pointless speculation.
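For readers unfamiliar with what “controlling for” means in that context, it refers to a partial correlation: the association between two measures after the linear influence of a third has been removed from both. Here is a minimal sketch of the computation; the data, variable names, and effect sizes are simulated purely for illustration and come from nowhere in the actual studies:

```python
# A minimal sketch of a partial correlation: the "controlling for"
# reported above. Data, names, and effect sizes here are simulated
# for illustration, not taken from Glick & Fiske (1996).
import numpy as np

def partial_corr(x, y, control):
    """Correlate x and y after removing the linear influence of control."""
    def residuals(v, c):
        design = np.column_stack([np.ones_like(c), c])
        beta, *_ = np.linalg.lstsq(design, v, rcond=None)
        return v - design @ beta
    return np.corrcoef(residuals(x, control), residuals(y, control))[0, 1]

rng = np.random.default_rng(0)
hostile = rng.normal(size=500)
benevolent = 0.4 * hostile + rng.normal(size=500)     # subscales correlate
old_fashioned = 0.5 * hostile + rng.normal(size=500)  # tracks hostile only

print(partial_corr(hostile, old_fashioned, benevolent))  # clearly positive
print(partial_corr(benevolent, old_fashioned, hostile))  # near zero
```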

References: Glick, P. & Fiske, S. (1996). The ambivalent sexism inventory: Differentiating hostile and benevolent sexism. Journal of Personality & Social Psychology, 70, 491-512.

Loo, R. & Thorpe, K. (1998). Attitudes towards women’s roles in society: A replication after 20 years. Sex Roles, 39, 903-912.

Zell, E., Strickhouser, J., Lane, T., & Teeter, S. (2016). Mars, Venus, or Earth? Sexism and the exaggeration of psychological gender differences. Sex Roles, 75, 287-300.

Chivalry Isn’t Dead, But Men Are

In the somewhat-recent past, the US Senate voted on whether women should be required to sign up for the selective service – the military draft – when they turn 18. Already accepted, of course, was the idea that men should be required to sign up, which appears to be a relatively less controversial idea. This represents yet another erosion of male privilege in modern society; in this case, the privilege of being expected to fight and die in armed combat, should the need arise. Now whether any conscription is likely to happen in the foreseeable future (hopefully not) is a somewhat different matter from whether women would be among the first drafted if it did (probably not), but the question remains as to how to explain this state of affairs. The issue, it seems, is not simply one of whether men or women are better able to shoulder the physical demands of combat; it extends beyond military service into intuitions about real and hypothetical harm befalling men and women in everyday life. When it comes to harm, people seem to generally care less about it happening to men.

Meh

One anecdotal example of these intuitions I’ve encountered in my own writing is when an editor at Psychology Today removed an image in one of my posts of a woman undergoing bodyguard training in China by having a bottle smashed over her head (which can be seen here; it’s by no means graphic). There was a concern expressed that the image was in some way inappropriate, despite my posting of other pictures of men being assaulted or otherwise harmed. As a research-minded individual, however, I want to go beyond simple anecdotes from my own life that confirm my intuitions into the empirical world where other people publish results that confirm my intuitions. While I’ve already written about this issue a number of times, it never hurts to pile on a little more. Recently, I came upon a paper by FeldmanHall et al (2016) that examined these intuitions about harm directed towards men and women across a number of studies, which can help me do just that.

The first of the studies in the paper was a straightforward task: fifty participants were recruited from MTurk to respond to a classic morality problem called the footbridge dilemma. Here, the lives of five people can be saved from a train by pushing one person in front of it. When these participants were asked whether they would push a man or a woman to their death (assuming, I think, that they were going to push one of them), 88% of participants opted for killing the man. Their second study expanded a bit on that finding using the same dilemma, but asking instead how willing participants would be (on a 1-10 scale) to push either a man, a woman, or a person of unspecified gender when no other options existed. The findings here with regard to gender were a bit less dramatic and clear-cut: participants were slightly more likely to indicate that they would push a man (M = 3.3) than a woman (M = 3.0), though female participants were nominally less likely to push a woman (roughly M = 2.3) than male participants were (roughly M = 3.8), perhaps counter to what might be predicted. That said, the sample size for this second study was fairly small (only about 25 per group), so that difference might not be worth making much over until more data are collected.
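To see why that caution is warranted, consider how much two group means can differ by chance alone at that sample size. The sketch below assumes a standard deviation of 2.5 on the 1-10 scale – an assumption of mine, not a figure from the paper:

```python
# How much do two group means (n = 25 each) drawn from the SAME
# population differ just by chance? The SD of 2.5 is an assumption
# for illustration; the paper's actual spread may differ.
import numpy as np

rng = np.random.default_rng(1)
n, sd, mean = 25, 2.5, 3.0  # hypothetical 1-10 willingness ratings

diffs = [abs(rng.normal(mean, sd, n).mean() - rng.normal(mean, sd, n).mean())
         for _ in range(10_000)]
print(np.percentile(diffs, 95))  # ~1.4: near the 1.5-point gap reported above
```

Under those assumptions, a 1.5-point gap sits right at the edge of what sampling noise alone can produce, which is why it shouldn’t be made much of yet.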

When faced with a direct and unavoidable trade-off between the welfare of men and women, then, the results overwhelmingly showed that women were favored; however, when it came to cases where a man or a woman could be harmed alone, there didn’t seem to be a marked difference between the two. That said, that moral dilemma alone can only take us so far in understanding people’s interest in the welfare of others, in no small part because its life-and-death nature potentially introduces ceiling effects (man or woman, very few people are willing to throw someone else in front of a train). In other instances where the degree of harm is lowered – such as, say, male vs female genital cutting – differences might begin to emerge. Thankfully, FeldmanHall et al (2016) included an additional experiment that brought these intuitions out of the hypothetical and into reality while lowering the degree of harm. You can’t kill people to conduct psychological research, after all.

Yet…

In the next experiment, 57 participants were recruited and given £20. At the end of the experiment, any money they had would be multiplied by ten, meaning participants could leave with a total of £200 (which is awfully generous as far as these things go). As with most psychology research, however, there was a catch: the participants would be taking part in 20 trials where £1 was at stake. A target individual – either a man or a woman – would be receiving a painful electric shock, and the participants could give up some of that £1 to reduce its intensity, with the full £1 removing the shock entirely. To make the task a little less abstract, the participants were also forced to view videos of the target receiving the shocks (which, I think, were prerecorded videos of real shocks – rather than shocks in real time – but I’m not sure from my reading of the paper if that’s a completely accurate description).

In this study, another large difference emerged: as expected, participants interacting with female targets ended up keeping less money by the end (M = £8.76) than those interacting with male targets (M = £12.54; d = .82). In other words, the main finding of interest was that participants were willing to give up substantially more money to prevent women from receiving painful shocks than they were to help men. Interestingly, this was the case despite the facts that (a) the male target in the videos was rated more positively overall than the female target, and (b) in a follow-up study where participants reported their emotional reactions to imagining themselves as participants in the former study, the amount of reported aversion to letting the target suffer shocks was similar regardless of the target’s gender. As the authors conclude:

While it is equally emotionally aversive to hurt any individual—regardless of their gender—that society perceives harming women as more morally unacceptable, suggests that gender bias and harm considerations play a large role in shaping moral action.

So, even though people find harming others – or letting them suffer harm for a personal gain – to generally be an uncomfortable experience regardless of their gender, they are more willing to help/avoid harming women than they are men, sometimes by a rather substantial margin.

Now onto the fun part: explaining these findings. It doesn’t go nearly far enough as an explanation to note that “society condones harming men more than women,” as that just restates the finding; likewise, we only get so far by mentioning that people perceive men to have a higher pain tolerance than women (because they do), as that only pushes the question back a step to the matter of why men tolerate more pain than women. As for my thoughts, first, I think these findings highlight the importance of a modular understanding of psychological systems: our altruistic and moral systems are made up of a number of component pieces, each with a distinct function, and the piece that is calculating how much harm is generated is, it would seem, not the same piece deciding whether or not to do something about it. The obvious reason for this distinction is that alleviating harm to others isn’t always adaptive to the same extent: it does me more adaptive good to help kin relative to non-kin, friends relative to strangers, and allies relative to enemies, all else being equal. 

“Just stay out of it; he’s bigger than you”

Second, it might well be the case that helping men, on average, tends to pay off less than helping women. Part of the reason for that state of affairs is that female reproductive potential cannot be replaced quite as easily as male potential; male reproductive success is constrained by the number of available women much more than female potential is by male availability (as Chris Rock put it, “any money spent on dick is a bad investment“). As such, men might become particularly inclined to invest in alleviating women’s pain as a form of mating effort. The story clearly doesn’t end there, however, or else we would predict men being uniquely likely to benefit women, rather than both sexes doing so similarly. This raises two additional possibilities for me: one is that, if men value women highly as a form of mating effort, that increased social value could also make women more valuable to other women in turn. To place that in a Game of Thrones example, if a powerful house values their own children highly, non-relatives may come to value those same children highly as well in the hopes of ingratiating themselves to – or avoiding the wrath of – the child’s family.

The other idea that comes to mind is that men are less willing to reciprocate aid that alleviated their pain because to do so would be an admission of a degree of weakness; a signal that they honestly needed the help (and might in the future as well), which could lower their relative status. If men are less willing to reciprocate aid, that would make men worse investments for both sexes, all else being equal; better to help out the person who would experience more gratitude for your assistance and repay you in turn. While these explanations might or might not adequately explain these preferential altruistic behaviors directed towards women, I feel they’re worthwhile starting points.

References: FeldmanHall, O., Dalgleish, T., Evans, D., Navrady, L., Tedeschi, E., & Mobbs, D. (2016). Moral chivalry: Gender and harm sensitivity predict costly altruism. Social Psychological & Personality Science, DOI: 10.1177/1948550616647448

Sexism, Testing, And “Academic Ability”

When I was teaching my undergraduate course on evolutionary psychology, my approach to testing and assessment was unique. You can read about that philosophy in more detail here, but the gist of my method was specifically avoiding multiple-choice formats in favor of short-essay questions with unlimited revision ability on the part of the students. I favored this exam format for a number of reasons, chief among which were that (a) I didn’t feel multiple-choice tests were very good at assessing how well students understood the material (memorization and good guessing do not equal understanding), and (b) I didn’t really care about grading my students as much as I cared about getting them to learn the material. If they didn’t grasp it properly on their first try (and very few students do), I wanted them to have the ability and motivation to continue engaging with it until they did get it right (which most eventually did; the class average for each exam began around a 70 and rose to a 90). For the purposes of today’s discussion, the important point here is that my exams were a bit more cognitively challenging than is usual and, according to a new paper, that means I had unintentionally biased my exams in ways that disfavor “historically underserved groups” like women and the poor.

Oops…

What caught my eye about this particular paper, however, was the initial press release that accompanied it. Specifically, the authors were quoted as saying something I found, well, a bit queer:

“At first glance, one might assume the differences in exam performance are based on academic ability. However, we controlled for this in our study by including the students’ incoming grade point averages in our analysis,”

So the authors appear to believe that a gap in performance on academic tests arises independent of academic abilities (whatever those entail). This raised the immediate question in my mind of how one knows that abilities are the same unless one has a method of testing them. It seems a bit strange to say that abilities are the same on the basis of one set of tests (those that provided incoming GPAs), but then to continue to suggest that abilities are the same when a different set of tests provides a contrary result. In the interests of settling my curiosity, I tracked the paper down to see what was actually reported; after all, these little news blurbs frequently get the details wrong. Unfortunately, this one appeared to capture the authors’ views accurately.

So let’s start by briefly reviewing what the authors were looking at. The paper, by Wright et al (2016), is based on data collected from three years’ worth of three introductory biology courses, spanning 26 different instructors, approximately 5,000 students, and 87 different exams. Without going into too much unnecessary detail, the tests were assessed by independent raters for how cognitively challenging they were and for their format, and the students were classified according to their gender and socioeconomic status (SES, as measured by whether they qualified for a financial aid program). To attempt to control for academic ability, Wright et al (2016) also looked at the freshman-year GPA of the students coming into the biology classes (based on approximately 45 credits, we are told). Because the authors controlled for incoming GPA, they hope to persuade the reader of the following:

This implies that, by at least one measure, these students have equal academic ability, and if they have differential outcomes on exams, then factors other than ability are likely influencing their performance.

Now one could argue that there’s more to academic ability than is captured by a GPA – which is precisely why I will do so in a minute – but let’s continue on with what the authors found first.

Cognitively challenging tests were indeed, well, more challenging. A statistically average male student, for instance, would be expected to do about 12% worse on the most challenging test in the sample relative to the easiest one. This effect was not the same between genders, however. Again, using statistically average men and women, when the tests were the least cognitively challenging, there was effectively no performance gap (about a 1.7% expected difference favoring men); however, when the tests were the most cognitively challenging, that gap rose to an astonishing expected…3.2% difference. So, while the gender difference nominally just about doubled, its size was such that it likely wouldn’t be noticed unless one was really looking for it; it hardly matters in any practical sense of the word. A similar pattern was discovered for SES: when the tests were easy, there was effectively no difference between those low or high in SES (1.3% favoring those higher); however, when the tests were about maximally challenging, this expected difference rose to about 3.5%.

Useful for both spotting statistical blips and burning insects
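For concreteness, the kind of analysis that produces these “expected gap” estimates is a regression with a gender-by-difficulty interaction term, with incoming GPA entered as a covariate. Below is a minimal OLS stand-in on simulated data; the paper’s actual models are presumably more elaborate (multilevel, at minimum, given students nested within courses), and every number here is invented to roughly mimic the reported gaps:

```python
# A minimal OLS stand-in for the paper's analysis: exam score modeled
# from incoming GPA plus a gender x test-difficulty interaction.
# Simulated data; coefficients are invented to mimic the ~1.7% vs
# ~3.2% expected gaps described above.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 5000
df = pd.DataFrame({
    "gpa": rng.normal(3.0, 0.5, n),
    "female": rng.integers(0, 2, n),
    "difficulty": rng.uniform(0, 1, n),  # rated cognitive challenge, rescaled
})
df["score"] = (70 + 8 * df.gpa - 12 * df.difficulty   # difficulty hurts everyone
               - 1.7 * df.female                      # small gap on easy tests
               - 1.5 * df.female * df.difficulty      # gap widens on hard tests
               + rng.normal(0, 8, n))

model = smf.ols("score ~ gpa + female * difficulty", data=df).fit()
print(model.params)  # female:difficulty estimates the gap-widening effect
```

The point of the sketch is only that “controlling for GPA” amounts to adding it as a covariate in such a model; it does not transform GPA into a complete measure of ability.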

There’s a lot to say about these results and how they’re framed within the paper. First, as I mentioned, they truly are minor differences; there are very few cases where a 1-3% difference in test scores is going to make or break a student, so I don’t think there’s any real reason to be concerned or to adjust the tests; not practically, anyway.

However, there are larger, theoretical issues looming in the paper. One of these is that the authors use the phrase “controlled for academic ability” so often that a reader might actually come to believe that’s what they did through simple repetition. The problem here, of course, is that the authors did not control for that; they controlled for GPA. Unfortunately for Wright et al’s (2016) presentation, those two things are not synonyms. As I said before, it is strange to say that academic ability is the same because one set of tests (incoming GPA) says it is while another set does not. The former set of tests appears to be privileged for no sound reason. Because of that unwarranted interpretation, the authors lose (or rather, purposefully remove) the ability to talk about how these gaps might be due to some performance difference. This is a useful rhetorical move if one is interested in doing advocacy – as it implies the gap is unfair and ought to be fixed somehow – but not if one is seeking the truth of the matter.

Another rather large issue in the paper is that, as far as I could tell, the authors predicted they would find these effects without ever really providing an explanation as to how or why that prediction arose. That is, what drove their expectation that men would outperform women and the rich outperform the poor? This ends up being something of a problem because, at the end of the paper, the authors do float a few possible (untested) explanations for their findings. The first of these is stereotype threat: the idea that certain groups of people will do poorly on tests because of some negative stereotype about their performance. This is a poor fit for the data for two reasons: first, while Wright et al (2016) claim that stereotype threat is “well-documented,” it actually fails to replicate (on top of not making much theoretical sense). Second, even if it were a real thing, stereotype threat, as it is typically studied, requires that one’s sex be made salient prior to the test. As I encountered a total of zero tests during my entire college experience that made my gender salient, much less my SES, I can only assume that the tests in question didn’t do it either. For stereotype threat to work as an explanation, then, women and the poor would need to be under relatively constant stereotype threat. In turn, this would make documenting and studying stereotype threat in the first place rather difficult, as you could never have a condition in which your subjects were not experiencing it. In short, then, stereotype threat seems like a bad fit.

The other explanations that are put forth for this gender difference are the possibility that women and poor students have more fixed views of intelligence instead of growth mindsets, so they withdraw from the material when challenged rather than improve (i.e., “we need to change their mindsets to close this daunting 2% gap”), or the possibility that the test questions themselves are written in ways that subtly bias people’s ability to think about them (the example the authors raise is that a question written about applying some concept to sports might favor men, relative to women, as men tend to enjoy sports more). Given that the authors did have access to the test questions, it seems that they could have examined that latter possibility in at least some detail (minimally, perhaps, by looking at whether tests written by female instructors resulted in different outcomes than those written by male ones, or by examining the content of the questions themselves to see if women did worse on gendered ones). Why they didn’t conduct such analyses, I can’t say.

Maybe it was too much work and they lacked a growth mindset

In summary, these very minor average differences could easily be chalked up – very simply – to GPA not being a full measure of a student’s academic ability. In fact, if the tests determining freshman GPA aren’t the most cognitively challenging (as one might well expect, given that students would have been taking mostly general introductory courses with large class sizes), then this might make the students appear more similar in ability than they actually were. The matter can be thought of using this stereotypically-male example (that will assuredly hinder women’s ability to think about it): imagine I tested people in a room with weights ranging from 1-15 pounds and asked them to curl each one time. This would give me a poor sense of any underlying differences in strength because the range of ability tested was restricted. If I then asked them to do the same with weights ranging from 1-100 pounds the next week, I might conclude that it’s something about the weights – and not people’s abilities – that explains why differences suddenly emerged (since I mistakenly believe I already controlled for their abilities the first time).

Now I don’t know if something like that is actually responsible, but if the tests determining freshman GPA were tapping the same kinds of abilities to the same degrees as those in the biology courses studied, then controlling for GPA should have taken care of that potential issue. Since controlling for GPA did not, I feel safe assuming there is some difference between the tests in terms of what abilities they’re measuring.
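The weights analogy is easy to simulate. In the sketch below, two groups differ modestly in underlying ability, but an easy test (items capped at a low difficulty) is passed by nearly everyone, hiding the gap that a harder test reveals; all the numbers are invented purely for illustration:

```python
# A sketch of the weights analogy: an easy test (capped difficulty)
# hides ability differences that a harder test reveals. All numbers
# are invented purely for illustration.
import numpy as np

rng = np.random.default_rng(3)
ability = np.concatenate([rng.normal(50, 10, 1000),   # group A
                          rng.normal(55, 10, 1000)])  # group B, slightly stronger
group_b = np.repeat([0, 1], 1000)

def score(ability, max_difficulty):
    # Fraction of items passed when item difficulty runs up to
    # max_difficulty; easy tests are passed by nearly everyone.
    items = np.linspace(1, max_difficulty, 50)
    return (ability[:, None] >= items[None, :]).mean(axis=1)

easy, hard = score(ability, 15), score(ability, 100)
for name, s in [("easy", easy), ("hard", hard)]:
    gap = s[group_b == 1].mean() - s[group_b == 0].mean()
    print(f"{name} test: group gap = {gap:.3f}")
# The same underlying 5-point ability gap is nearly invisible on the
# easy test but plain on the hard one.
```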

References: Wright, C., Eddy, S., Wenderoth, M., Abshire, E., Blankenbiller, M., & Brownell, S. (2016). Cognitive difficulty and format of exams predicts gender and socioeconomic gaps in exam performance of students in introductory biology courses. CBE Life Sciences Education, 15.

Psychology Research And Advocacy

I get the sense that many people get a degree in psychology because they’re looking to help others (since most clearly aren’t doing it for the pay). For those who get a degree in the clinical side of the field, this observation seems easy to make; at the very least, I don’t know of any counselors or therapists who seek to make their clients feel worse about the state their life is in and keep them there. For those who become involved in the research end of psychology, I believe this desire to help others is still a major motivator. Rather than trying to help specific clients, however, many psychological researchers are driven by a motivation to help particular groups in society: women, certain racial groups, the sexually promiscuous, the outliers, the politically liberal, or any group that the researcher believes to be unfairly marginalized, undervalued, or maligned. Their work is driven by a desire to show that the particular group in question has been misjudged by others, with those doing the misjudging being biased and, importantly, wrong. In other words, their role as a researcher is often driven by their role as an advocate, and the quality of their work and thinking can often take a back seat to their social goals.

When megaphones fail, try using research to make yourself louder

Two such examples are highlighted in a recent paper by Eagly (2016), both of which can broadly be considered to focus on the topic of diversity in the workplace. I want to summarize them quickly before turning to some of the other facets of the paper I find noteworthy. The first case concerns the prospect that having more women on corporate boards tends to increase their profitability, a point driven by a finding that Fortune 500 companies in the top quarter of female representation on boards of directors performed better than those in the bottom quarter of representation. Eagly (2016) rightly notes that such a basic data set would be all but unpublishable in academia, given all the important factors it fails to account for. Indeed, when more sophisticated research was considered in a meta-analysis of 140 studies, the gender diversity of the board of directors had about as close to no effect as possible on financial outcomes: the average correlations across all the studies ranged from about r = .01 all the way up to r = .05, depending on what measures were considered. Gender diversity per se seemed to have no meaningful effect, despite a variety of advocacy sources claiming that increasing female representation would provide financial benefits. Rather than considering the full scope of the research, the advocates tended to cite only the most simplistic analyses that provided the conclusion they wanted (others) to hear.

The second area of research concerned how demographic diversity in work groups can affect performance. The general assumption that is often made about diversity is that it is a positive force for improving outcomes, given that a more cognitively-varied group of people can bring a greater number of skills and perspectives to bear on solving tasks than more homogeneous groups can. As it turns out, however, another meta-analysis of 146 studies concluded that demographic diversity (both in terms of gender and racial makeup) had effectively no impact on performance outcomes: the correlation for gender was r = -.01 and was r = -.05 for racial diversity. By contrast, differences in skill sets and knowledge had a positive, but still very small effect (r = .05). In summary, findings like these would suggest that groups don’t get better at solving problems just because they’re made up of enough [men/women/Blacks/Whites/Asians/etc]. Diversity in demographics per se, unsurprisingly, doesn’t help to magically solve complex problems.

While Eagly (2016) appears to generally be condemning the role of advocacy in research when it comes to getting things right (a laudable position), there were some passages in the paper that caught my eye. The first of these concerns what advocates for causes should do when the research, taken as a whole, doesn’t exactly agree with their preferred stance. In this case, Eagly (2016) focuses on the diversity research that did not show good evidence for diverse groups leading to positive outcomes. The first route one might take is to simply misrepresent the state of the research, which is obviously a bad idea. Instead, Eagly suggests advocates take one of two alternative routes: first, she recommends that researchers might conduct research into more specific conditions under which diversity (or whatever one’s preferred topic is) might be a good thing. This is an interesting suggestion to evaluate: on the one hand, people would often be inclined to say it’s a good idea; in some particular contexts diversity might be a good thing, even if it’s not always, or even generally, useful. This wouldn’t be the first time effects in psychology are found to be context-dependent. On the other hand, this suggestion also runs some serious risks of inflating type 1 errors. Specifically, if you keep slicing up data and looking at the issue in a number of different contexts, you will eventually uncover positive results even if they’re just due to chance. Repeated subgroup or subcontext analysis doesn’t sound much different from the questionable statistical practices currently being blamed for psychology’s replication problem: just keep conducting research and only report the parts of it that happened to work, or keep massaging the data until the right conclusion falls out.    
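That risk is easy to demonstrate. In the simulation below, diverse and homogeneous groups perform identically in every subgroup – the data are pure noise – yet slicing the sample into enough contexts reliably turns up a “significant” effect somewhere (the subgroup and sample counts are arbitrary choices of mine):

```python
# Why "keep slicing until something works" inflates false positives:
# purely null data, tested across many subgroups, reliably yields a
# "significant" diversity effect somewhere.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n_studies, n_subgroups, n_per_cell = 1000, 10, 30
false_positive_somewhere = 0

for _ in range(n_studies):
    # In every subgroup, diverse and homogeneous teams perform identically
    p_values = [stats.ttest_ind(rng.normal(0, 1, n_per_cell),
                                rng.normal(0, 1, n_per_cell)).pvalue
                for _ in range(n_subgroups)]
    if min(p_values) < .05:
        false_positive_somewhere += 1

print(false_positive_somewhere / n_studies)  # ~0.40, not the nominal 0.05
```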

“…the rest goes in the dumpster out back”

Eagly’s second suggestion I find a bit more worrisome: arguing that relevant factors – like increases in profits, productivity, or finding better solutions – aren’t actually all that relevant when it comes to justifying why companies should increase diversity. What I find odd about this is that it seems to suggest that the advocates begin with their conclusion (in this case, that diversity in the work force ought to be increased) and then just keep looking for ways to justify it in spite of previous failures to do so. Again, while it is possible that there are benefits to diversity which aren’t yet being considered in the literature, bad research would likely result from a process where someone starts their analysis with the conclusion and keeps going until they can justify it to others, no matter how often that requires shifting the goalposts. A major problem with that suggestion mirrors other aspects of the questionable research practices I mentioned before: when researchers find the conclusion they’re looking for, they stop looking. They only collect data up until the point it is useful, which rigs the system in favor of finding positive results where there are none. That could well mean, then, that there will be negative consequences to these diversity policies which are not being considered.

What I think is a good example of this justification problem leading to shoddy research practices and interpretations follows shortly thereafter. In talking about some of the alternative benefits that more female hires might have, Eagly (2016) notes that women tend to be more compassionate and egalitarian than men; as such, hiring more women should be expected to bring less-considered benefits, such as a reduction in the laying-off of employees during economic downturns (referred to as labor hoarding) or more favorable policies towards time off for family care. Now something like this should be expected: if you have different people making the decisions, different decisions will be made. Forgoing for the moment the question of whether those different policies are better, in some objective sense of the word, if one is interested in encouraging those outcomes (that is, they’re preferred by the advocate), then one might wish to address those issues directly, rather than by proxy. That is to say, if you are looking to make the leadership of some company more compassionate, it makes sense to test for and hire more compassionate people, not to hire more women under the assumption that you will be increasing compassion.

This is an important matter because people are not perfect statistical representations of the groups to which they belong. On average, women may be more compassionate than men; the type of woman who is interested in actively pursuing a CEO position in a Fortune 500 company might not be as compassionate as your average woman, however, and might even be less compassionate than a particular male candidate. What Eagly (2016) has ended up reaching, then, is not a justification for hiring more women; it’s a justification for hiring compassionate or egalitarian people. What is conspicuously absent from this section is a call for more research on contexts in which men might be more compassionate than women; once the conclusion that hiring women is a good thing has been justified (in the advocate’s mind, anyway), the concern for more information seems to sputter out. It should go without saying, but such a course of action wouldn’t be expected to lead to the most accurate scientific understanding of our world.

The solution to that problem being more diversity, of course…

To place this point in another quick example, if you’re looking to assemble a group of tall people, it would be better to use people’s height when making that decision rather than their sex, even if men do tend to be taller than women. Some advocates might suggest that being male is a good enough proxy for height, so you should favor male candidates; others would suggest that you shouldn’t be trying to assemble a group of tall people in the first place, as short people offer benefits that tall ones don’t; others still will argue that it doesn’t matter if short people don’t offer benefits, as they should be preferentially selected to combat negative attitudes towards the short regardless (at the expense of selecting tall candidates). For what it’s worth, I find the attitude of “keep doing research until you justify your predetermined conclusion” to be unproductive and indicative of why the relationship between advocates and researchers ought not be a close one. Advocacy can only serve as a cognitive constraint that decreases research quality, as the goal of advocacy is decidedly not truth. Advocates should update their conclusions in light of the research; not vice versa.

References: Eagly, A. (2016). When passionate advocates meet research on diversity, does the honest broker stand a chance? Journal of Social Issues, 72, 199-222.

Men Are Better At Selling Things On eBay

When it comes to gender politics, never take the title of the piece at face value; or the conclusions for that matter.

In my last post, I mentioned how I find that some phrases and topics act as red flags for the quality of research one is liable to encounter. Today, the topic is gender equality – specifically some perceived (and, indeed, rather peculiar) discrimination against women – which is an area not renowned for its clear thinking or reasonable conclusions. As usual, the news articles circulating about this piece of research made some outlandish claims that lack even remote face validity. In this case, the research in question concludes that people, collectively, try to figure out the gender of the people selling things on eBay so as to pay women substantially less than men for similar goods. Those who found such a conclusion agreeable to their personal biases spread it to others across social media as yet another example of how the world is an evil, unfair place. So here I am again, taking a couple of recreational shots at some nonsense story of sexism.

Just two more of these posts and I get a free smoothie

The piece in question today is an article from Kricheli-Katz & Regev (2016) that examined data from about 1.1 million eBay auctions. The stated goals of the authors involve examining gender inequality in online product markets, so at least we can be sure they’re going into this without an agenda. Kricheli-Katz & Regev (2016) open their piece by talking about how gender inequality is a big problem, launching their discussion almost immediately with a rehashing of that misleading 20% pay gap statistic that’s been floating around forever. As that claim has been dissected so many times at this point, there’s not much more to say about it other than (a) when controlling for important factors, it drops to single digits, and (b) when you see it, it’s time to buckle in for what will surely be an unpleasant ideological experience. Thankfully, the paper does not disappoint in that regard, promptly suggesting that women are discriminated against in online markets like eBay.

So let’s start by considering what the authors did and what they found. First, Kricheli-Katz & Regev (2016) present us with their analysis of eBay data. They restricted their research to auctions only, where sellers post an item and any subsequent interaction occurs between bidders alone, rather than between bidders and sellers. On average, they found that the women had about 10 fewer months of experience than the men, though the accounts of both sexes had existed for over nine years, and women also had very slightly better reputations, as measured by customer feedback. Women also tended to set slightly higher initial prices than men for their auctions, controlling for the product being sold. As such, women also tended to receive slightly fewer bids on their items and ultimately less money per sale when their auctions ended.

However, when the interaction between sex and product type (new or used) was examined, the headline-grabbing result appeared: while women netted a mere 3% less on average than men for used products, they netted a more impressive 20% less for new products (where, naturally, one expects the products to be the same). Kricheli-Katz & Regev (2016) claim that the discrepancy in the new-product case is due to beliefs about gender. Whatever these unspecified beliefs are, they cause people to pay women about 20% less for the same item. Taking that idea at face value for a moment, why does that gap all but evaporate in the used category of sales? The authors attribute that lack of a real difference to an increased trust people have in women’s descriptions of the condition of their products. So men trust women more when it comes to used goods, but pay them less for new ones when trust is less relevant. Both of these conclusions, as far as I can see from the paper, have been pulled directly out of thin air. There is literally no evidence presented to support them: no data; no citations; no anything.

I might have found the source of their interpretations

By this point, anyone familiar with how eBay works is likely a bit confused. After all, the sex of the seller is at no point readily apparent in almost any listing. Without that crucial piece of information, people would have a very difficult time discriminating on the basis of it. Never fear, though; Kricheli-Katz & Regev (2016) report the results of a second study in which they pulled 100 random sellers from their sample and asked about 400 participants to try to determine the sex of the sellers in question. Each participant offered guesses about five profiles, for a total of 2,000 attempts. About 55% of the time participants got the sex right, 9% of the time they got it wrong, and the remaining 36% of the time they said they didn’t know (which, since they don’t know, also means they got it wrong). In short, people couldn’t reliably determine the seller’s sex about half the time. The authors do mention that the guesses got better as participants viewed more items that the seller had posted, however.

So here’s the story they’re trying to sell: When people log onto eBay, they seek out a product they’re looking to buy. When they find a seller listing that product, they examine the seller’s username, the listing in question, and the other listings in the seller’s store to try to discern the seller’s sex. Buyers subsequently lower their willingness to pay for an item by quite a bit if they see it is being sold by a woman, but only if it’s new. In fact, since women made 20% less, the actual reduction in willingness to pay must be larger than that, as sex can only be reliably determined about half of the time even when people are trying. Buyers do all this despite trusting female sellers more. Also, I do want to emphasize the word they, as this would need to be a fairly collective action. If it wasn’t a near-universal response among buyers, the prices of female-sold items would eventually even out with the male price, as those who discriminated less against women would be drawn towards the cheaper prices and bump them back up.
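That last point is worth making concrete. In an auction, the final price is set by competition among the top bids, so a seller-sex discount only survives in the sale price if nearly every bidder applies it. Here is a toy second-price-auction model – the bidder counts, valuations, and the 20% discount are all stipulations of mine, not the paper’s – showing how quickly non-discriminating bidders erase the gap:

```python
# Toy model: second-price auctions where some fraction of bidders
# discount their bids by 20% when the seller appears female. All
# parameters (6 bidders, N(100, 10) valuations) are invented.
import numpy as np

rng = np.random.default_rng(5)

def mean_price(frac_discriminating, n_auctions=20_000, bidders=6):
    vals = rng.normal(100, 10, (n_auctions, bidders))
    # Discriminating bidders shade their bids down by 20%
    discount = np.where(rng.random((n_auctions, bidders)) < frac_discriminating,
                        0.8, 1.0)
    bids = np.sort(vals * discount, axis=1)
    return bids[:, -2].mean()  # second-highest bid sets the sale price

baseline = mean_price(0.0)  # price when no one discounts (male sellers)
for frac in (0.25, 0.5, 0.9, 1.0):
    gap = 1 - mean_price(frac) / baseline
    print(f"{frac:.0%} of bidders discriminate -> price gap ~{gap:.1%}")
# The full 20% gap appears only when essentially every bidder discounts;
# with fewer discriminators, competing full-price bids pull the sale
# price back toward the baseline.
```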

Not only do I not buy this story – not even a little – but I wouldn’t pay the authors less for it because they happen to be women if I were looking to make a purchase. While people might sometimes be able to determine the sex of an eBay seller when they’re specifically asked to do so, that does not mean buyers naturally engage in this sort of detective work.

Finally, Kricheli-Katz & Regev (2016) report the results of a third study, asking 100 participants how much they value a $100 gift card being sold by either an Alison or a Brad. Sure enough, people were willing to pay Alison less for the card: she got a mere $83 to Brad’s $87; a 5% difference. I’d say someone should call the presses, but it looks like they already did, judging from the coverage this piece has received. Now this looks like discrimination – because it is – but I don’t think it’s based on sex per se. I say that because, earlier in the paper, Kricheli-Katz & Regev (2016) also report that women, as buyers on eBay, tended to pay about 3% more than men for comparable goods. To the extent that the $4 difference in valuation is meaningful here, there are two things to say about it. First, it may well reflect the fact that women aren’t as willing to negotiate prices in their favor. Indeed, while women were 23% of the sellers on eBay, they represented only 16% of the auctions with a negotiation component. If that’s the case, people are likely willing to pay less to women because they perceive (correctly) some population differences in their ability to get a good deal. I suspect that if you gave buyers individuating information about the seller’s abilities, sex would stop mattering even that 5%. Second, that slight 5% difference would by no means account for the 20% gap the authors report finding with respect to new product sales; not even close.

But maybe your next big idea will work out better…

Instead, my guess is that, in spite of the authors’ use of the phrase “equally qualified” when referring to the men and women in their seller sample, there were some important differences in the listings that buyers noticed; the type of differences you can’t account for when you’re looking at over a million listings and your control measures are rough. Kricheli-Katz & Regev (2016) never seemed to consider – and I mean really consider – the possibility that something about these listings, something they didn’t control for, might have been driving the sale price differences. While they do control for factors like the seller’s reputation, experience, number of pictures, year of the sale, and some of the sentiments expressed by words in the listing (how positive or negative it is), there’s more to making a good listing than that. A more likely story is that differences in sale prices reflect different behaviors on the part of male and female sellers (as we already know other differences exist in the sample), as the alternative story being championed would require a level of obsession with gender-based discrimination in the population so wide and deep that we wouldn’t need to research it; it would be plainly obvious to everyone already.

Then again, perhaps it’s time I make my way over to eBay to pick up a new tinfoil hat.

References: Kricheli-Katz, T. & Regev, T. (2016). How many cents on the dollar? Women and men in product markets. Science Advances, 2, DOI: 10.1126/sciadv.1500599