When it comes to research on sexism, there appear to be many parties interested in the notion that sexism ought to be reduced. This is a laudable goal, and one that I would support; I am very much in favor of treating people as individuals rather than representatives of their race, sex, or any other demographic characteristics. It is unfortunate, however, that this goal often gets side-tracked by an entirely different one: trying to reduce the extent to which people view men and women as different. What I mean by this is that I have seen many attempts to combat sexism by trying to reduce the perception that men and women differ in terms of their psychology, personality, intelligence, and so on; it’s much rarer for those same voices to try to convince people who inaccurately perceive sex differences as unusually small to adjust their estimates upwards. In other words, rather than championing accuracy in perceptions, there appears to be a more targeted effort to minimize particular differences; while those are sometimes the same thing (sometimes people are wrong because they overestimate), they are often not (sometimes people are wrong because they underestimate), and when those goals conflict, the minimization side tends to win out.

In my last post, I discussed some research by Zell et al (2016) primarily in the service of examining measures of sexism and the interpretation of the data they produce (which I recommend reading first). Today I wanted to give that paper a more in-depth look to illustrate this (perhaps unconscious) goal of trying to get people to view the sexes as more similar than they actually are. Zell et al (2016) begin their introduction by suggesting that most psychological differences between men and women are small, and the cases in which medium to large differences exist – like mating preferences and aggression – tend to be rare. David Schmitt has already put remarks like that into some context, and I highly recommend you read his post on the subject. In the event you can’t be bothered to do so at the moment, one of the most important takeaway points from his post is that even if the differences in any one domain tend to be small on average, when considered across all those domains simultaneously, those small differences can aggregate into much larger ones.
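To make the aggregation point concrete, here is a minimal sketch of how several small differences can combine into a larger one. It is not Schmitt's own calculation: the ten effect sizes of d = 0.2 are hypothetical, and the sketch assumes the traits are uncorrelated and normally distributed with equal variances. Under those assumptions the combined distance is the square root of the sum of squared effect sizes. Real traits are correlated, which would require the full covariance matrix and would change the total.

```python
import math
from statistics import NormalDist

def combined_distance(effect_sizes):
    """Multivariate distance between two groups.

    Assumes uncorrelated, standardized traits (an illustrative
    simplification), so the distance is sqrt(sum of d_i squared).
    """
    return math.sqrt(sum(d * d for d in effect_sizes))

def overlap(distance):
    """Overlapping coefficient of two equal-variance normal
    distributions whose means are `distance` SDs apart."""
    return 2.0 * NormalDist().cdf(-distance / 2.0)

# Hypothetical example: ten traits, each with a "small" difference.
small_effects = [0.2] * 10
total = combined_distance(small_effects)

print(f"single trait: d = 0.20, overlap = {overlap(0.2):.2f}")
print(f"ten traits:   D = {total:.2f}, overlap = {overlap(total):.2f}")
```

The combined distance here is about three times the size of any single difference, and the overlap between the two groups shrinks accordingly, even though no individual trait differs by more than a "small" amount.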

Moreover, the significance of a gender difference is not necessarily determined by its absolute size, either. This was a point Steven Pinker mentioned in a somewhat-recent debate with Elizabeth Spelke (and was touched on again in a recent talk by Jon Haidt at SUNY New Paltz). To summarize this point briefly, if you’re looking at a trait in two normally-distributed populations that are, on average, quite similar, the further from that average value you get, the more extreme the difference between the populations becomes. Pinker makes the point clear in this example:

“…it’s obvious that distributions of height for men and women overlap: it’s not the case that all men are taller than all women. But while at five foot ten there are thirty men for every woman, at six feet there are two thousand men for every woman. Now, sex differences in cognition tend not to be so extreme, but the statistical phenomenon is the same.”
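The statistical phenomenon Pinker describes can be sketched in a few lines. This is an illustration under stated assumptions rather than a reconstruction of his height figures: both groups are assumed to be normal with the same standard deviation, the mean difference of 0.3 SD is a hypothetical value, and the function compares the share of each group above a cutoff (not the ratio at one exact value).

```python
from statistics import NormalDist

def tail_ratio(d, cutoff):
    """Ratio of group A to group B among those above `cutoff`.

    Assumes both groups are normal with SD 1; group B has mean 0
    and group A has mean `d`. The cutoff is in SD units.
    """
    group_a = NormalDist(mu=d, sigma=1.0)
    group_b = NormalDist(mu=0.0, sigma=1.0)
    share_a = 1.0 - group_a.cdf(cutoff)
    share_b = 1.0 - group_b.cdf(cutoff)
    return share_a / share_b

# Hypothetical mean difference of 0.3 SD, checked at several cutoffs.
for cutoff in (0.0, 1.0, 2.0, 3.0):
    print(f"above {cutoff:.0f} SD: {tail_ratio(0.3, cutoff):.2f} to 1")
```

The ratio grows steadily as the cutoff moves away from the mean, so the same modest average difference produces a much more lopsided split among the people at the extremes.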

Not only are small sex differences sometimes important, then, (such as when you’re trying to hire people for a job who are in the top 1% of the distribution for a trait like intelligence, speed, conscientiousness; you name it) but a large number of small effects (as well as some medium and large ones) can all add up to collectively represent some rather large differences (and that assumes you’re accounting for all relevant sex differences; not just a non-representative sample of them). With all this considered, the declaration at the beginning of Zell et al’s paper that most sex differences tend to be small strikes me less as a statement of empirical concern than as one that serves to set up the premise for the rest of their project: specifically, the researchers wanted to test whether people’s scores on the ambivalent sexism inventory predicted (a) the extent to which they perceive sex differences as being large and (b) the extent to which they are inaccurate in their perceptions. The prediction in this case was that people who scored high on their ostensible measures of sexism would be more likely to exaggerate sex differences and more likely to be wrong about their size overall (as an aside, I don’t think those sexism questions measure what the authors hope they do; see my last post).

In their first study, Zell et al (2016) asked about 320 participants to estimate how large they thought the differences between men and women were (from 1-99) for 48 traits and to answer 6 questions intended to measure their hostile and benevolent sexism (as another aside, I have no idea why those 48 traits in particular were selected). These answers were then averaged for each participant to create an overall score for how large they viewed the sex differences to be, and how high they scored on hostile and benevolent sexism. When the relevant factors were plugged into their regression, the results showed that those higher in hostile (ß = .19) and benevolent (ß = .29) sexism tended to perceive sex differences as larger, on average. When examined by gender, it was found that women (ß = .41) who were higher in benevolent sexism were more likely to perceive sex differences as large (but this was not true for men: ß = .11) and – though it was not significant – the reverse pattern held for hostile sexism, such that women high in hostile sexism were nominally less likely to perceive sex differences as large (ß = -.32).

The more interesting finding, at least as far as I’m concerned, is that in spite of those scoring higher on their sexism scores perceiving sex differences to be larger, they were not really more likely to be wrong about them. Specifically, those who scored higher on benevolent sexism were slightly less accurate (ß = -.20), just as women tended to be less accurate than men (ß = -.19); however, hostile sexism scores were unrelated to accuracy altogether (ß = .003), and no interactions with gender and sexism emerged. To put that in terms of the simple correlations, hostile and benevolent sexism correlated much better with the perceived size of sex differences (rs = .26 and .43, respectively) than they did with accuracy (rs = -.12 and -.22, with the former not being significant and the latter being rather small). Now since we’re dealing with two genders, two sexism scales, and relatively small effects, it is possible that some of these findings are a bit more likely to be statistical flukes; that does tend to happen as you keep slicing data up. Nevertheless, these results are discussed repeatedly within the context of their paper as representing exaggerations: those scoring higher on these sexism measures are said to exaggerate sex differences, which is odd on account of them not consistently getting them all that wrong.
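The worry about slicing data up can be put in rough numbers. The sketch below computes the chance of at least one false positive across several tests, assuming the tests are independent and each uses a .05 threshold. Both are simplifying assumptions, and the test counts shown are hypothetical rather than a tally of the analyses actually run in the paper.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across `num_tests`
    independent tests, each run at significance level `alpha`."""
    return 1.0 - (1.0 - alpha) ** num_tests

# Hypothetical counts, e.g. 2 genders x 2 sexism scales x 2 outcomes = 8.
for num_tests in (1, 4, 8, 12):
    rate = familywise_error_rate(num_tests)
    print(f"{num_tests:2d} tests: {rate:.0%} chance of a fluke")
```

The chance of a spurious "finding" climbs quickly with the number of comparisons, which is one reason to be cautious about any single small effect that shows up in only one subgroup.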

This interpretation extends to their second study as well. In that experiment, about 230 participants were presented with two mock abstracts and told that only one of them represented an accurate summary of psychological research on sex differences. The accurate version, of course, was the one that said sex differences were small on average and therefore concluded that men and women are very similar to each other, whereas the bogus abstract concluded that gender differences are often large and therefore men and women are very different from one another. As I reviewed in the beginning of the post, small differences can often have meaningful impacts both individually and collectively, so the lines about how men and women are very similar to each other might not reflect an entirely accurate reading of the literature even if the part about small average sex differences did. This setup is already conflating the two statements (“average effect sizes on all these traits is small” and “men and women are very similar across the board”).

As before, those higher in hostile and benevolent sexism tended to say that the larger sex difference abstract more closely reflected their personal views (women tended to select the large-difference abstract 50.4% of the time compared to men’s 44.2% as well). Now because the authors view the large sex difference abstract as being the fabricated one, they conclude that those higher in those sexism measures are less accurate and more likely to exaggerate these views (they also make a remark that their sexism measures indicate which people “endorse sexist ideologies”; a determination those measures are not at all cut out for making). In other words, the authors interpret this finding as showing that those selecting the large-differences abstract hold “empirically unsupported” views (which in a sort-of ironic sense means that, as the late George Carlin put it, “Men are better at it” when it comes to recognizing sex differences).

This is an interesting methodological trick they employ: since they failed to find much in the way of a correlation between sexism scores and accuracy in their first study (it existed sometimes, but was quite small across the board and certainly much smaller than the perception of size correlation), they created a coarser and altogether worse measure of accuracy in the second study and used that instead to support their view that believing men and women tend to be rather different is wrong. As the old saying goes, if at first you don’t succeed, change your measures until you do.

References: Zell, E., Strickhouser, J., Lane, T., & Teeter, S. (2016). Mars, Venus, or Earth? Sexism and the exaggeration of psychological gender differences. Sex Roles, 75, 287-300.

