Skepticism Surrounding Sex

Posted on July 25, 2016 by Jesse Marczyk

“It’s a basic truth of the human condition that everybody lies; the only variable is about what”

One of my favorite shows from years ago was House, a show centered around a brilliant but troubled doctor who frequently discovers the causes of his patients’ ailments through discerning what they – or others – are lying about. This outlook on people appears to be correct, at least in spirit. Because it is sometimes beneficial for us that other people are made to believe things that are false, communication is often less than honest. This dishonesty entails things like outright lies, lies by omission, or stretching the truth in various directions and placing it in different lights. Of course, people don’t just lie because deceiving others is usually beneficial. Deception – much like honesty – is only adaptive to the extent that people do reproductively-relevant things with it. Convincing your spouse that you had an affair when you didn’t is dishonest for sure, but probably not a very useful thing to do; deceiving someone about what you had for breakfast is probably fairly neutral (minus the costs you might incur from coming to be known as a liar). As such, we wouldn’t expect selection to have shaped our psychology to lie about all topics with equal frequency. Instead, we should expect that people tend to preferentially lie about particular topics in predictable ways.

Lies like, “This college degree will open so many doors for you in life”

The corollary idea to that point concerns skepticism. Distrusting the honesty of communications can protect against harmful deceptions, but it also runs the risk of failing to act on accurate and beneficial information. There are costs and benefits to skepticism as there are to deception. Just as we shouldn’t expect people to be dishonest about all topics equally often, then, we shouldn’t expect people to be equally skeptical of all the information they receive either. This is a point I’ve talked about before with regard to our reasoning abilities, whereby information agreeable to our particular interests tends to be accepted less critically, while disagreeable information is scrutinized much more intensely.

This line of thought was recently applied to the mating domain in a paper by Walsh, Millar, & Westfall (2016). Humans face a number of challenges when it comes to attracting sexual partners, typically centered around obtaining the highest quality of partner(s) one can (metaphorically) afford, relative to what one offers to others. What determines the quality of partners, however, is frequently context specific: what makes a good short-term partner might differ from what makes a good long-term partner and – critically, as far as the current research is concerned – the traits that make good male partners for women are not the same as those that make good female partners for men. Because women and men face some different adaptive challenges when it comes to mating, we should expect that they would also preferentially lie (or exaggerate) to the opposite sex about those traits that the other sex values the most. In turn, we should also expect that each sex is skeptical of different claims, as this skepticism should reflect the costs associated with making poor reproductive decisions on the basis of bad information.

In case that sounds too abstract, consider a simple example: women face a greater obligate cost when it comes to pregnancy than men do. As far as men are concerned, their role in reproduction could end at ejaculation (which it does, for many species). By contrast, women would be burdened with months of gestation (during which they cannot get pregnant again), as well as years of breastfeeding prior to modern advancements (during which they also usually can’t get pregnant). Each child could take years of a woman’s already limited reproductive lifespan, whereas the man has lost a few minutes. In order to ease those burdens, women often seek male partners who will stick around and invest in them and their children. Men who are willing to invest in children should thus prove to be more attractive long-term partners for women than those who are unwilling. However, a man’s willingness to stick around needs to be assessed by a woman in advance of knowing what his behavior will actually be. This might lead to men exaggerating or lying about their willingness to invest, so as to encourage women to mate with them. Women, in turn, should be preferentially skeptical of such claims, as being wrong about a man’s willingness to invest is costly indeed. The situation should be reversed for traits that men value in their partners more than women.

Figure 1: What men most often value in a woman

Three such traits for both men and women were examined by Walsh et al (2016). In their study, eight scenarios depicting a hypothetical email exchange between a man and woman who had never met were displayed to approximately 230 (mostly female; 165) heterosexual undergraduate students. For the women, these emails depicted a man messaging a woman; for men, it was a woman messaging a man. The purpose of these emails was described as the person sending them looking to begin a long-term intimate relationship with the recipient. Each of these emails described various facets of the sender, which could be broadly classified as either relevant primarily to female mating interests, relevant to male interests, or neutral. In terms of female interests, the sender described their luxurious lifestyle (cuing wealth), their desire to settle down (commitment), or how much they enjoy interacting with children (child investment). In terms of male interests, the sender talked about having a toned body (cuing physical attractiveness), their openness sexually (availability/receptivity), or their youth (fertility and mate value). In the two neutral scenarios, the sender either described their interest in stargazing or board games.

Finally, the participants were asked to rate (on a 1-5 scale) how deceitful they thought the sender was, whether they believed the sender or not, and how skeptical they were of the claims in the message. These three scores were summed for each participant to create a composite score of believability for each of the messages (the lower the score, the less believable it was rated as being). Those scores were then averaged across the female-relevant items (wealth, commitment, and childcare), the male-relevant items (attractiveness, youth, and availability), and the control conditions. (Participants also answered questions about whether the recipient should respond and how much they personally liked the sender. No statistical analyses are reported on those measures, however, so I’m going to assume nothing of note turned up.)
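For readers who like to see the scoring spelled out, here is a minimal Python sketch of the composite described above. The ratings are made-up numbers for a single hypothetical participant, not data from the paper, and I'm assuming all three ratings are already coded so that a higher number means "more believable" (the exact coding used in the paper is an assumption on my part).

```python
from statistics import mean

# Hypothetical ratings from one participant (NOT data from the paper).
# Each message gets three 1-5 ratings, assumed here to be coded so that
# a higher number means "more believable".
ratings = {
    "wealth": (3, 3, 2),
    "commitment": (3, 4, 3),
    "childcare": (4, 3, 3),
    "attractiveness": (4, 4, 3),
    "youth": (4, 3, 4),
    "availability": (3, 3, 3),
    "stargazing": (4, 4, 4),
    "board_games": (4, 5, 4),
}

# The three groupings of messages described in the post.
categories = {
    "female_relevant": ["wealth", "commitment", "childcare"],
    "male_relevant": ["attractiveness", "youth", "availability"],
    "control": ["stargazing", "board_games"],
}


def composite(message):
    """Sum the three ratings for one message (possible range: 3-15)."""
    return sum(ratings[message])


def category_scores():
    """Average the composite scores within each category of message."""
    return {
        name: mean(composite(m) for m in messages)
        for name, messages in categories.items()
    }


scores = category_scores()
print(scores)
```

With three 1-5 ratings per message, each composite can range from 3 to 15, which is the scale the means reported below sit on.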

The results showed that, as expected, the control items were believed more readily (M = 11.20) than the male (M = 9.85) or female (M = 9.6) relevant items. This makes sense, inasmuch as believing lies about stargazing or interests in board games isn’t particularly costly for either sex in most cases, so there’s little reason to lie about them (and thus little reason to doubt them); by contrast, messages about one’s desirability as a partner have real payoffs, and so are treated more cautiously. However, an important interaction with the sex of the participant was uncovered as well: female participants were more skeptical on the female-relevant items (M = about 9.2) than males were (M = 10.6); similarly, males were more likely to be skeptical in male-relevant conditions (M = 9.5) than females were (M = 10). Further, the scores for the individual items all showed evidence of the same kinds of sex differences in skepticism. No sex difference emerged for the control condition, also as expected.
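To see what that interaction amounts to, here is a small Python sketch that works only from the approximate group means quoted above. It simply compares the sex gap on each type of item; it is back-of-the-envelope arithmetic on the reported means, not a reproduction of the authors' statistical test.

```python
# Approximate group means as reported in the post (composite believability;
# lower = more skeptical). Keys are (participant sex, item type).
means = {
    ("women", "female_relevant"): 9.2,
    ("men", "female_relevant"): 10.6,
    ("women", "male_relevant"): 10.0,
    ("men", "male_relevant"): 9.5,
}


def sex_gap(item_type):
    """Men's mean minus women's mean for one item type."""
    return means[("men", item_type)] - means[("women", item_type)]


gap_female_items = sex_gap("female_relevant")  # positive: women more skeptical
gap_male_items = sex_gap("male_relevant")      # negative: men more skeptical

# The interaction is the difference between the two gaps: if sex did not
# matter differently across item types, this would be close to zero.
interaction = gap_female_items - gap_male_items
print(round(gap_female_items, 2), round(gap_male_items, 2), round(interaction, 2))
```

The two gaps point in opposite directions, which is the crossover pattern the interaction describes.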

In sum, then – while these differences were relatively small in magnitude – men tended to be more skeptical of claims that, if falsely believed, were costlier for them than women, and women tended to be more skeptical of claims that, if falsely believed, were costlier for them than men. This is a similar pattern to that found in the reasoning domain, where evidence that agrees with one’s position is accepted more readily than evidence that disagrees with it.

“How could it possibly be true if it disagrees with my opinion?”

The authors make a very interesting point towards the end of their paper about how their results could be viewed as inconsistent with the hypothesis that men have a bias to over-perceive women’s sexual interest. After all, if men are over-perceiving such interest in the first place, why would they be skeptical about claims of sexual receptivity? It is possible, of course, that men tend to over-perceive such availability in general and are also skeptical of claims about its degree (e.g., they could still be manipulated by signals intentionally sent by females and so are skeptical, but still over-perceive ambiguous or less-overt cues), but another explanation jumps out at me that is consistent with the theme of this research: perhaps when asked to self-report about their own sexual interest, women aren’t being entirely accurate (consciously or otherwise). This explanation would fit well with the fact that men and women tend to perceive a similar level of sexual interest in other women. Then again, perhaps I only see that evidence as consistent because I don’t think men, as a group, should be expected to have such a bias, and that’s biasing my skepticism in turn.

References: Walsh, M., Millar, M., & Westfall, S. (2016). The effects of gender and cost on suspicion in initial courtship communications. Evolutionary Psychological Science, DOI 10.1007/s40806-016-0062-8

Why Women Are More Depressed Than Men

Posted on July 5, 2016 by Jesse Marczyk

Women are more likely to be depressed than men; about twice as likely here in the US, as I have been told. It’s an interesting finding, to be sure, and making sense of it poses a fun little mystery (as making sense of many things tends to). We don’t just want to know that women are more depressed than men; we also want to know why women are more depressed. So what are the causes of this difference? The Mayo Clinic floats a few explanations, noting that this sex difference appears to emerge around puberty. As such, many of the explanations they put forth center around the problems that women (but not men) might face when undergoing that transitional period in their life. These include things like increased pressure to achieve in school, conflict with parents, gender confusion, PMS, and pregnancy-related factors. They also include ever-popular suggestions such as societal biases that harm women. Now I suspect these are quite consistent with the answers you would get if you queried your average Joe or Jane on the street as to why they think women are more depressed. People recognize that depression often appears to follow negative life events and stressors, and so they look for proximate conditions that they believe (accurately or not) disproportionately affect women.

Boys don’t have to figure out how to use tampons; therefore less depression

While that seems to be a reasonable strategy, it produces results that aren’t entirely satisfying. First, it seems unlikely that women face that much more stress and negative life events than men do (twice as much?) and, secondly, it doesn’t do much to help us understand individual variation. Lots of people face negative life events, but lots of them also don’t end up spiraling into depression. As I noted above, our understanding of the facts related to depression can be bolstered by answering the why questions. In this case, the focus many people have is on answering the proximate whys rather than the ultimate ones. Specifically, we want to know why people respond to these negative life events with depression in the first place; what adaptive function depression might have. Though depression reactions appear completely normal to most, perhaps owing to their regularity, we need to make that normality strange. If, for example, you imagine a new mouse mother facing the stresses of caring for her young in a hostile world, a postpartum depression on her part might seem counterproductive: faced with the challenges of surviving and caring for her offspring, what adaptive value would depressive symptoms have? How would low energy, a lack of interest in important everyday activities, and perhaps even suicidal ideation help make her situation better? If anything, they would seem to disincline her from taking care of these important tasks, leaving her and her dependent offspring worse off. This strangeness, of course, wouldn’t just exist in mice; it should be just as strange when we see it in humans.

The most compelling adaptive account of depression I’ve read (Hagen, 2003) suggests that the ultimate why of depression focuses on social bargaining. I’ve written about it before, but the gist of the idea is as follows: if I’m facing adversity that I am unlikely to be able to solve alone, one strategy for overcoming that problem is to recruit others in the world to help me. However, those other people aren’t always forthcoming with the investment I desire. If others aren’t responding to my needs adequately, it would behoove me to try and alter their behavior so as to encourage them to increase their investment in me. Depression, in this view, is adapted to do just that. The psychological mechanisms governing depression work to, essentially, place the depressed individual on a social strike. When workers are unable to effectively encourage an increased investment from their employers (perhaps in the form of pay or benefits), they will occasionally refuse to work at all until their conditions improve. While this is indeed costly for the workers, it is also costly for the employer, and it might be beneficial for the employer to cave to the demands rather than continue to face the costs of not having people work. Depression shows a number of parallels to this kind of behavior, where people withdraw from the social world – taking with them the benefits they provided to others – until other people increase their investment in the depressed individual to help see them through a tough period.

Going on strike (or, more generally, withdrawing from cooperative relationships), of course, is only one means of getting other people to increase their investment in you; another potential strategy is violence. If someone is enacting behaviors that show they don’t value me enough, I might respond with aggressive behaviors to get them to alter that valuation. Two classic examples of this could be shooting someone in self-defense or a loan-shark breaking a delinquent client’s legs. Indeed, this is precisely the type of function that Sell et al (2009) proposed that anger has: if others aren’t giving me my due, anger motivates me to take actions that could recalibrate their concern for my welfare. This leaves us with two strategies – depression and anger – that can both solve the same type of problem. The question arises, then, as to which strategy will be the most effective for a given individual and their particular circumstances. This raises a rather interesting possibility: it is possible that the sex difference in depression exists because the anger strategy is more effective for men, whereas the depression strategy is more effective for women (rather than, say, because women face more adversity than men). This would be consistent with the sex difference in depression arising around puberty as well, since this is when sex differences in strength also begin to emerge. In other words, both men and women have to solve similar social problems; they just go about it in different ways.

“An answer that doesn’t depend on wide-spread sexism? How boring…”

Crucially, this explanation should also be able to account for within-sex differences as well: while men are more able to successfully enact physical aggression than women, not all men will be successful in that regard since not all men possess the necessary formidability. The male who is 5’5″ and 130 pounds soaking wet likely won’t win against his taller, heavier, and stronger counterparts in a fight. As such, men who are relatively weak might preferentially make use of the depression strategy, since picking fights they probably won’t win is a bad idea, while those who are on the stronger side might instead make use of anger more readily. Thankfully, a new paper by Hagen & Rosenstrom (2016) examines this very issue; at least part of it. The researchers sought to test whether upper-body strength would negatively predict depression scores, controlling for a number of other, related variables.

To do so, they accessed data from the National Health and Nutrition Examination Survey (NHANES), netting a little over 4,000 subjects ranging in age from 18-60. As a proxy for upper-body strength, the authors made use of the measures subjects had provided of their hand-grip strength. The participants had also filled out questions concerning their depression, height and weight, socioeconomic status, white blood cell count (to proxy health), and physical disabilities. The researchers predicted that: (1) depression should negatively correlate with grip-strength, controlling for age and sex, (2) that relationship should be stronger for men than women, and (3) that the relationship would persist after controlling for physical health. About 9% of the sample qualified as depressed and, as expected, women were more likely to report depression than men by about 1.7 times. Sex, on its own, was a good predictor of depression (in their regression, β = 0.74).

When grip-strength was added into the statistical model, however, the effect of sex dropped into the non-significant range (β = 0.03), while strength possessed good predictive value (β = -1.04). In support of the first hypothesis, then, increased upper-body strength did indeed negatively correlate with depression scores, removing the effect of sex almost entirely. In fact, once grip strength was controlled for, men were actually slightly more likely to report depression than women (though this didn’t appear to be significant). Prediction 2 was not supported, however, with there being no significant interaction between sex and grip-strength on measures of depression. This effect persisted even when controlling for socioeconomic status, age, anthropometric, and hormonal variables. However, physical disability did attenuate the relationship between strength and depression quite a bit, which is understandable in light of the fact that physically-disabled individuals likely have their formidability compromised, even if they have stronger upper bodies (an example being a man in a wheelchair having good grip strength, but still not being much use in a fight). It is worth mentioning that the relationship between strength and depression appeared to grow larger over time; the authors suggest this might have something to do with older individuals having more opportunities to test their strength against others, which sounds plausible enough.

Also worth noting is that when depression scores were replaced with suicidal ideation, the predicted sex-by-strength interaction did emerge, such that men with greater strength reported being less suicidal, while women with greater strength reported being more suicidal (the latter portion of which is curious and not predicted). Given that men succeed at committing suicide more often than women, this relationship is probably worth further examination.

“Not today, crippling existential dread”

Taken together with findings from Sell et al (2009) – where men, but not women, who possessed greater strength reported being quicker to anger and more successful in physical conflicts – the emerging picture is one in which women tend to (not consciously) “use” depression as a means of social bargaining because it tends to work better for them than anger, whereas the reverse holds true for men. To be clear, both anger and depression are triggered by adversity, but those events interact with an individual’s condition and their social environment in determining the precise response. As the authors note, the picture is likely to be a dynamic one; not one that’s as simple as “more strength = less depression” across the board. Of course, other factors that co-vary with physical strength and health – like attractiveness – could also be playing a role in the relationship with depression, but since such matters aren’t spoken to directly by the data, the extent and nature of those other factors is speculative.

What I find very persuasive about this adaptive hypothesis, however – in addition to the reported data – is that many existing theories of depression would not make the predictions tested by Hagen & Rosenstrom (2016) in the first place. For example, those who claim something like, “depressed people perceive the world more accurately” would be at a bit of a loss to explain why those who perceive the world more accurately also seem to have lower upper-body strength (they might also want to explain why depressed people don’t perceive the world more accurately, either). A plausible adaptive hypothesis, on the other hand, is useful for guiding our search for, and understanding of, the proximate causes of depression.

References: Hagen, E.H. (2003). The bargaining model of depression. In: Genetic and Cultural Evolution of Cooperation, P. Hammerstein (ed.). MIT Press, 95-123

Hagen, E. & Rosenstrom, T. (2016). Explaining the sex difference in depression with a unified bargaining model of anger and depression. Evolution, Medicine, & Public Health, 117-132

Sell, A., Tooby, J., & Cosmides, L. (2009). Formidability and the logic of human anger. Proceedings of the National Academy of Sciences, 106, 15073-78.

Chivalry Isn’t Dead, But Men Are

Posted on June 27, 2016 by Jesse Marczyk

In the somewhat-recent past, there was a vote in the Senate held on the matter of whether women in the US should be required to sign up for the selective service – the military draft – when they turn 18. Already accepted, of course, was the idea that men should be required to sign up; what appears to be a relatively less controversial idea. This represents yet another erosion of male privilege in modern society; in this case, the privilege of being expected to fight and die in armed combat, should the need arise. Now whether any conscription is likely to happen in the foreseeable future (hopefully not) is a somewhat different matter than whether women would be among the first drafted if that happened (probably not), but the question remains as to how to explain this state of affairs. The issue, it seems, is not simply one of whether men or women are better able to shoulder the physical demands of combat, however; it extends beyond military service into intuitions about real and hypothetical harm befalling men and women in everyday life. When it comes to harm, people seem to generally care less about it happening to men.

Meh

One anecdotal example of these intuitions I’ve encountered during my own writing is when an editor at Psychology Today removed an image in one of my posts of a woman undergoing bodyguard training in China by having a bottle smashed over her head (which can be seen here; it’s by no means graphic). There was a concern expressed that the image was in some way inappropriate, despite my posting of other pictures of men being assaulted or otherwise harmed. As a research-minded individual, however, I want to go beyond simple anecdotes from my own life that confirm my intuitions into the empirical world where other people publish results that confirm my intuitions. While I’ve already written about this issue a number of times, it never hurts to pile on a little more. Recently, I came upon a paper by FeldmanHall et al (2016) that examined these intuitions about harm directed towards men and women across a number of studies that can help me do just that.

The first of the studies in the paper was a straightforward task: fifty participants were recruited from Mturk to respond to a classic morality problem called the footbridge dilemma. Here, the lives of five people can be saved from a train by pushing one person in front of it. When these participants were asked whether they would push a man or woman to their death (assuming, I think, that they were going to push one of them), 88% of participants opted for killing the man. Their second study expanded a bit on that finding using the same dilemma, but asking instead how willing they would be (on a 1-10 scale) to push either a man, woman, or a person of unspecified gender without other options existing. The findings here with regard to gender were a bit less dramatic and clear-cut: participants were slightly more likely to indicate that they would push a man (M = 3.3) than a woman (M = 3.0), though female participants were nominally less likely to push a woman (roughly M = 2.3) than men were (roughly M = 3.8), perhaps counter to what might be predicted. That said, the sample size for this second study was fairly small (only about 25 per group), so that difference might not be worth making much of until more data is collected.

When faced with a direct and unavoidable trade-off between the welfare of men and women, then, the results overwhelmingly showed that the women were being favored; however, when it came to cases where men or women could be harmed alone, there didn’t seem to be a marked difference between the two. That said, that moral dilemma alone can only take us so far in understanding people’s interest in the welfare of others, in no small part because of its life-and-death nature potentially introducing ceiling effects (man or woman, very few people are willing to throw someone else in front of a train). In other instances where the degree of harm is lowered – such as, say, male vs female genital cutting – differences might begin to emerge. Thankfully, FeldmanHall et al (2016) included an additional experiment that brought these intuitions out of the hypothetical and into reality while lowering the degree of harm. You can’t kill people to conduct psychological research, after all.

Yet…

In the next experiment, 57 participants were recruited and given £20. At the end of the experiment, any money they had would be multiplied by ten, meaning participants could leave with a total of £200 (which is awfully generous as far as these things go). As with most psychology research, however, there was a catch: the participants would be taking part in 20 trials where £1 was at stake. A target individual – either a man or a woman – would be receiving a painful electric shock, and the participants could give up some of that £1 to reduce its intensity, with the full £1 removing the shock entirely. To make the task a little less abstract, the participants were also forced to view videos of the target receiving the shocks (which, I think, were prerecorded videos of real shocks – rather than shocks in real time – but I’m not sure from my reading of the paper if that’s a completely accurate description).
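The payoff structure is easy to lose track of, so here is a minimal Python sketch of it as I've described it above: a £20 endowment, 20 trials with £1 at stake on each, and a tenfold multiplier on whatever is left. The function name and the input checks are my own illustration, not anything taken from the paper.

```python
ENDOWMENT = 20.0   # pounds given to each participant at the start
TRIALS = 20        # number of trials
STAKE = 1.0        # pounds at stake on each trial
MULTIPLIER = 10    # applied to whatever money remains at the end


def final_payout(spent_per_trial):
    """Return the take-home amount, given how much was given up on each trial.

    `spent_per_trial` is a list with one entry per trial, each between
    0 (keep the full pound, full shock) and STAKE (give up the pound, no shock).
    """
    if len(spent_per_trial) != TRIALS:
        raise ValueError("expected one amount per trial")
    if any(not 0 <= amount <= STAKE for amount in spent_per_trial):
        raise ValueError("each amount must be between 0 and the stake")
    kept = ENDOWMENT - sum(spent_per_trial)
    return kept * MULTIPLIER


# A participant who never pays to reduce the shock keeps everything.
print(final_payout([0.0] * TRIALS))
# A participant who always removes the shock entirely leaves with nothing.
print(final_payout([1.0] * TRIALS))
```

The two extremes bracket every participant's outcome: each pound spent on reducing a shock costs ten pounds of take-home pay.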

In this study, another large difference emerged: as expected, participants interacting with female targets ended up keeping less money by the end (M = £8.76) than those interacting with male targets (M = £12.54; d = .82). In other words, the main finding of interest was that participants were willing to give up substantially more money to prevent women from receiving painful shocks than they were to help men. Interestingly, this was the case in spite of the facts that (a) the male target in the videos was rated more positively overall than the female target, and (b) in a follow-up study where participants provided emotional reactions to thinking about being a participant in the former study, the amount of reported aversion to letting the target suffer shocks was similar regardless of the target’s gender. As the authors conclude:

While it is equally emotionally aversive to hurt any individual—regardless of their gender—that society perceives harming women as more morally unacceptable, suggests that gender bias and harm considerations play a large role in shaping moral action.

So, even though people find harming others – or letting them suffer harm for a personal gain – to generally be an uncomfortable experience regardless of their gender, they are more willing to help/avoid harming women than they are men, sometimes by a rather substantial margin.
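To put the reported effect size (d = .82) in context, here is a short Python sketch that backs out the spread it implies from the two means above. It assumes d was computed in the standard way, as the difference between the group means divided by a pooled standard deviation; the paper's exact variant is an assumption on my part, so treat the result as a rough figure.

```python
def cohens_d(mean_a, mean_b, pooled_sd):
    """Standardized mean difference: (mean_a - mean_b) / pooled SD."""
    return (mean_a - mean_b) / pooled_sd


def implied_pooled_sd(mean_a, mean_b, d):
    """Rearrange the formula above to recover the pooled SD from a reported d."""
    return (mean_a - mean_b) / d


# Values reported in the post (pounds kept by the end of the task).
kept_male_target = 12.54
kept_female_target = 8.76
reported_d = 0.82

sd = implied_pooled_sd(kept_male_target, kept_female_target, reported_d)
print(round(sd, 2))
```

Under that assumption, the £3.78 gap between the groups sits against a pooled standard deviation of roughly £4.6, so there is still plenty of overlap between how individual participants treated male and female targets.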

Now onto the fun part: explaining these findings. It doesn’t go nearly far enough as an explanation to note that “society condones harming men more than women,” as that just restates the finding; likewise, we only get so far by mentioning that people perceive men to have a higher pain tolerance than women (because they do), as that only pushes the question back a step to the matter of why men tolerate more pain than women. As for my thoughts, first, I think these findings highlight the importance of a modular understanding of psychological systems: our altruistic and moral systems are made up of a number of component pieces, each with a distinct function, and the piece that is calculating how much harm is generated is, it would seem, not the same piece deciding whether or not to do something about it. The obvious reason for this distinction is that alleviating harm to others isn’t always adaptive to the same extent: it does me more adaptive good to help kin relative to non-kin, friends relative to strangers, and allies relative to enemies, all else being equal.

“Just stay out of it; he’s bigger than you”

Second, it might well be the case that helping men, on average, tends to pay off less than helping women. Part of the reason for that state of affairs is that female reproductive potential cannot be replaced quite as easily as male potential; male reproductive success is constrained by the number of available women much more than female potential is by male availability (as Chris Rock put it, “any money spent on dick is a bad investment”). As such, men might become particularly inclined to invest in alleviating women’s pain as a form of mating effort. The story clearly doesn’t end there, however, or else we would predict men being uniquely likely to benefit women, rather than both sexes doing similarly. This raises two additional possibilities to me: one of these is that, if men value women highly as a form of mating effort, that increased social value could also make women more valuable to other women in turn. To place that in a Game of Thrones example, if a powerful house values their own children highly, non-relatives may come to value those same children highly as well in the hopes of ingratiating themselves to – or avoiding the wrath of – the child’s family.

The other idea that comes to mind is that men are less willing to reciprocate aid that alleviated their pain because to do so would be an admission of a degree of weakness; a signal that they honestly needed the help (and might in the future as well), which could lower their relative status. If men are less willing to reciprocate aid, that would make men worse investments for both sexes, all else being equal; better to help out the person who would experience more gratitude for your assistance and repay you in turn. While these explanations might or might not adequately explain these preferential altruistic behaviors directed towards women, I feel they’re worthwhile starting points.

References: FeldmanHall, O., Dalgleish, T., Evans, D., Navrady, L., Tedeschi, E., & Mobbs, D. (2016). Moral chivalry: Gender and harm sensitivity predict costly altruism. Social Psychological & Personality Science, DOI: 10.1177/1948550616647448

Sexism, Testing, And “Academic Ability”

Posted on June 14, 2016 by Jesse Marczyk

When I was teaching my undergraduate course on evolutionary psychology, my approach to testing and assessment was unique. You can read about that philosophy in more detail here, but the gist of my method was specifically avoiding multiple-choice formats in favor of short-essay questions with unlimited revision ability on the part of the students. I favored this exam format for a number of reasons, chief among which was that (a) I didn’t feel multiple choice tests were very good at assessing how well students understood the material (memorization and good guessing do not equal understanding), and (b) I didn’t really care about grading my students as much as I cared about getting them to learn the material. If they didn’t grasp it properly on their first try (and very few students do), I wanted them to have the ability and motivation to continue engaging with it until they did get it right (which most eventually did; the class average for each exam began around a 70 and rose to a 90). For the purposes of today’s discussion, the important point here is that my exams were a bit more cognitively challenging than is usual and, according to a new paper, that means I had unintentionally biased my exams in ways that disfavor “historically underserved groups” like women and the poor.

Oops…

What caught my eye about this particular paper, however, was the initial press release that accompanied it. Specifically, the authors were quoted as saying something I found, well, a bit queer:

“At first glance, one might assume the differences in exam performance are based on academic ability. However, we controlled for this in our study by including the students’ incoming grade point averages in our analysis,”

So the authors appear to believe that a gap in performance on academic tests arises independent of academic abilities (whatever those entail). This raised the immediate question in my mind of how one knows that abilities are the same unless one has a method of testing them. It seems a bit strange to say that abilities are the same on the basis of one set of tests (those that provided incoming GPAs), but then to continue to suggest that abilities are the same when a different set of tests provides a contrary result. In the interests of settling my curiosity, I tracked the paper down to see what was actually reported; after all, these little news blurbs frequently get the details wrong. Unfortunately, this one appeared to capture the authors’ views accurately.

So let’s start by briefly reviewing what the authors were looking at. The paper, by Wright et al (2016), is based on data collected from three years’ worth of three introductory biology courses spanning 26 different instructors, approximately 5,000 students, and 87 different exams. Without going into too much unnecessary detail, the tests were assessed by independent raters for how cognitively challenging they were and for their format, and the students were classified according to their gender and socio-economic status (SES; as measured by whether they qualified for a financial aid program). In an attempt to control for academic ability, Wright et al (2016) also looked at the freshman-year GPA of the students coming into the biology classes (based on approximately 45 credits, we are told). Because the authors controlled for incoming GPA, they hope to persuade the reader of the following:

This implies that, by at least one measure, these students have equal academic ability, and if they have differential outcomes on exams, then factors other than ability are likely influencing their performance.

Now one could argue that there’s more to academic ability than is captured by a GPA – which is precisely why I will do so in a minute – but let’s continue on with what the authors found first.

Cognitively challenging tests were indeed, well, more challenging. A statistically-average male student, for instance, would be expected to do about 12% worse on the most challenging test in their sample, relative to the easiest one. This effect was not the same between genders, however. Again, using statistically-average men and women, when the tests were the least cognitively challenging, there was effectively no performance gap (about a 1.7% expected difference favoring men); however, when the tests were the most cognitively challenging, that expected gap rose to an astonishing expected…3.2% difference. So, while the gender difference just about nominally doubled, in terms of really mattering in any practical sense of the word, its size was such that it likely wouldn’t be noticed unless one was really looking for it. A similar pattern was discovered for SES: when the tests were easy, there was effectively no difference between those low or high in SES (1.3% favoring those higher); however, when the tests were about maximally challenging, this expected difference rose to about 3.5%.

Useful for both spotting statistical blips and burning insects

There’s a lot to say about these results and how they’re framed within the paper. First, as I mentioned, they truly are minor differences; there are very few cases where a 1-3% difference in test scores is going to make-or-break a student, so I don’t think there’s any real reason to be concerned or to adjust the tests; not practically, anyway.

However, there are larger, theoretical issues looming in the paper. One of these is that the authors use the phrase “controlled for academic ability” so often that a reader might actually come to believe that’s what they did from simple repetition. The problem here, of course, is that the authors did not control for that; they controlled for GPA. Unfortunately for Wright et al’s (2016) presentation, those two things are not synonyms. As I said before, it is strange to say that academic ability is the same because one set of tests (incoming GPA) says it is while another set does not. The former set of tests appears to be privileged for no sound reason. Because of that unwarranted interpretation, the authors lose (or rather, purposefully remove) the ability to talk about how these gaps might be due to some performance difference. This is a useful rhetorical move if one is interested in doing advocacy – as it implies the gap is unfair and ought to be fixed somehow – but not if one is seeking the truth of the matter.

Another rather large issue in the paper is that, as far as I could tell, the authors predicted they would find these effects without ever really providing an explanation as to how or why that prediction arose. That is, what drove their expectation that men would outperform women and the rich outperform the poor? This ends up being something of a problem because, at the end of the paper, the authors do float a few possible (untested) explanations for their findings. The first of these is stereotype threat: the idea that certain groups of people will do poorly on tests because of some negative stereotype about their performance. This is a poor fit for the data for two reasons: first, while Wright et al (2016) claim that stereotype threat is “well-documented”, it actually fails to replicate (on top of not making much theoretical sense). Second, even if it were a real thing, stereotype threat, as it is typically studied, requires that one’s sex be made salient prior to the test. As I encountered a total of zero tests during my entire college experience that made my gender salient, much less my SES, I can only assume that the tests in question didn’t do it either. In order for stereotype threat to work as an explanation, then, women and the poor would need to be under relatively constant stereotype threat. In turn, this would make documenting and studying stereotype threat in the first place rather difficult, as you could never have a condition where your subjects were not experiencing it. In short, then, stereotype threat seems like a bad fit.

The other explanations that are put forth for this gender difference are the possibility that women and poor students have more fixed views of intelligence instead of growth mindsets, so they withdraw from the material when challenged rather than improve (i.e., “we need to change their mindsets to close this daunting 2% gap”), or the possibility that the test questions themselves are written in ways that subtly bias people’s ability to think about them (the example the authors raise is that a question written about applying some concept to sports might favor men, relative to women, as men tend to enjoy sports more). Given that the authors did have access to the test questions, it seems that they could have examined that latter possibility in at least some detail (minimally, perhaps, by looking at whether tests written by female instructors resulted in different outcomes than those written by male ones, or by examining the content of the questions themselves to see if women did worse on gendered ones). Why they didn’t conduct such analyses, I can’t say.

Maybe it was too much work and they lacked a growth mindset

In summary, these very minor average differences that were uncovered could easily be chalked up – very simply – to GPA not being a full measure of a student’s academic ability. In fact, if the tests determining freshman GPA aren’t the most cognitively challenging (as one might well expect, given that students would have been taking mostly general introductory courses with large class sizes), then this might make the students appear to be more similar in ability than they actually were. The matter can be thought of using this stereotypically-male example (that will assuredly hinder women’s ability to think about it): imagine I tested people in a room with weights ranging from 1-15 pounds and asked them to curl each one time. This would give me a poor sense for any underlying differences in strength because the range of ability tested was restricted. If I were to ask them to do the same with weights ranging from 1-100 pounds the next week, I might conclude that it’s something about the weights – and not people’s abilities – that explains why differences suddenly emerged (since I mistakenly believe I already controlled for their abilities the first time).

Now I don’t know if something like that is actually responsible, but if the tests determining freshman GPA were tapping the same kinds of abilities to the same degrees as those in the biology courses studied, then controlling for GPA should have taken care of that potential issue. Since controlling for GPA did not, I feel safe assuming there is some difference in the tests in terms of what abilities they’re measuring.
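The weights example can be made concrete with a small Python sketch. The two lifters and their strength values are invented purely for illustration (they are not data from Wright et al), and the scoring rule, counting how many whole-pound weights a person can curl, is my own simplification.

```python
def weights_lifted(max_strength, heaviest_weight):
    """Count how many whole-pound weights from 1..heaviest_weight a person can curl once."""
    return sum(1 for w in range(1, heaviest_weight + 1) if w <= max_strength)

# Hypothetical lifters: the strength values (in pounds) are made up for illustration.
lifters = {"A": 20, "B": 60}

# Week one: weights only go up to 15 pounds, so both lifters hit the ceiling.
easy = {name: weights_lifted(strength, 15) for name, strength in lifters.items()}

# Week two: weights go up to 100 pounds, so the underlying difference shows.
hard = {name: weights_lifted(strength, 100) for name, strength in lifters.items()}

print("easy test:", easy)
print("hard test:", hard)
```

On the easy test the two lifters look identical; on the hard test they do not, even though nothing about either person changed between weeks. "Controlling" for the week-one score would tell you nothing about who is stronger.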

References: Wright, C., Eddy, S., Wenderoth, M., Abshire, E., Blankenbiller, M., & Brownell, S. (2016). Cognitive difficulty and format of exams predicts gender and socioeconomic gaps in exam performance of students in introductory biology courses. Life Science Education, 15.

Psychology Research And Advocacy

Posted on May 1, 2016 by Jesse Marczyk

I get the sense that many people get a degree in psychology because they’re looking to help others (since most clearly aren’t doing it for the pay). For those who get a degree in the clinical side of the field, this observation seems easy to make; at the very least, I don’t know of any counselors or therapists who seek to make their clients feel worse about the state their life is in and keep them there. For those who become involved in the research end of psychology, I believe this desire to help others is still a major motivator. Rather than trying to help specific clients, however, many psychological researchers are driven by a motivation to help particular groups in society: women, certain racial groups, the sexually promiscuous, the outliers, the politically liberal, or any group that the researcher believes to be unfairly marginalized, undervalued, or maligned. Their work is driven by a desire to show that the particular group in question has been misjudged by others, with those doing the misjudging being biased and, importantly, wrong. In other words, their role as a researcher is often driven by their role as an advocate, and the quality of their work and thinking can often take a back seat to their social goals.

When megaphones fail, try using research to make yourself louder

Two such examples are highlighted in a recent paper by Eagly (2016), both of which can broadly be considered to focus on the topic of diversity in the workplace. I want to summarize them quickly before turning to some of the other facets of the paper I find noteworthy. The first case concerns the prospect that having more women on corporate boards tends to increase their profitability, a point driven by a finding that Fortune 500 companies in the top quarter of female representation on boards of directors performed better than those in the bottom quarter of representation. Eagly (2016) rightly notes that such a basic data set would be all but unpublishable in academia for failing to do a lot of important things. Indeed, when more sophisticated research was considered in a meta-analysis of 140 studies, the gender diversity of the board of directors had about as close to no effect as possible on financial outcomes: the average correlations across all the studies ranged from about r = .01 all the way up to r = .05 depending on what measures were considered. Gender diversity per se seemed to have no meaningful effect despite a variety of advocacy sources claiming that increasing female representation would provide financial benefits. Rather than considering the full scope of the research, the advocates tended to cite only the most simplistic analyses that provided the conclusion they wanted (others) to hear.

The second area of research concerned how demographic diversity in work groups can affect performance. The general assumption that is often made about diversity is that it is a positive force for improving outcomes, given that a more cognitively-varied group of people can bring a greater number of skills and perspectives to bear on solving tasks than more homogeneous groups can. As it turns out, however, another meta-analysis of 146 studies concluded that demographic diversity (both in terms of gender and racial makeup) had effectively no impact on performance outcomes: the correlation for gender was r = -.01 and was r = -.05 for racial diversity. By contrast, differences in skill sets and knowledge had a positive, but still very small effect (r = .05). In summary, findings like these would suggest that groups don’t get better at solving problems just because they’re made up of enough [men/women/Blacks/Whites/Asians/etc]. Diversity in demographics per se, unsurprisingly, doesn’t help to magically solve complex problems.
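To put correlations of this size in perspective, one common (though imperfect) yardstick is the squared correlation, which gives the share of variance in one measure that is linearly associated with the other. The short Python sketch below simply squares the r values quoted above; the function name and labels are mine, and nothing else is taken from the meta-analyses themselves.

```python
def variance_explained_pct(r):
    """Return the squared correlation expressed as a percentage of variance."""
    return (r ** 2) * 100

# Correlations quoted in the text above (as summarized from Eagly, 2016).
reported = {
    "board gender diversity (low estimate)": 0.01,
    "board gender diversity (high estimate)": 0.05,
    "work group gender diversity": -0.01,
    "work group racial diversity": -0.05,
    "skill and knowledge diversity": 0.05,
}

for label, r in reported.items():
    pct = variance_explained_pct(r)
    print(f"{label}: r = {r:+.2f}, variance explained = {pct:.2f}%")
```

Even the largest of these values corresponds to about a quarter of one percent of the variance in outcomes.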

While Eagly (2016) appears to generally be condemning the role of advocacy in research when it comes to getting things right (a laudable position), there were some passages in the paper that caught my eye. The first of these concerns what advocates for causes should do when the research, taken as a whole, doesn’t exactly agree with their preferred stance. In this case, Eagly (2016) focuses on the diversity research that did not show good evidence for diverse groups leading to positive outcomes. The first route one might take is to simply misrepresent the state of the research, which is obviously a bad idea. Instead, Eagly suggests advocates take one of two alternative routes: first, she recommends that researchers might conduct research into more specific conditions under which diversity (or whatever one’s preferred topic is) might be a good thing. This is an interesting suggestion to evaluate: on the one hand, people would often be inclined to say it’s a good idea; in some particular contexts diversity might be a good thing, even if it’s not always, or even generally, useful. This wouldn’t be the first time effects in psychology are found to be context-dependent. On the other hand, this suggestion also runs some serious risks of inflating type 1 errors. Specifically, if you keep slicing up data and looking at the issue in a number of different contexts, you will eventually uncover positive results even if they’re just due to chance. Repeated subgroup or subcontext analysis doesn’t sound much different from the questionable statistical practices currently being blamed for psychology’s replication problem: just keep conducting research and only report the parts of it that happened to work, or keep massaging the data until the right conclusion falls out.
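The type 1 error concern can be sketched with a bit of arithmetic. The Python snippet below assumes, for simplicity, that each subgroup or context is tested independently at the conventional .05 level and that no real effect exists anywhere; real subgroup analyses are rarely fully independent, so treat the numbers as a rough guide rather than an exact rate.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests comes out
    'significant' at level alpha when there is no real effect in any of them."""
    return 1 - (1 - alpha) ** num_tests

# How quickly the odds of a spurious 'hit' grow as more contexts are examined.
for k in (1, 5, 10, 20):
    print(f"{k} tests: {chance_of_false_positive(k):.3f}")
```

Under these assumptions, ten looks at the data already give roughly a 40% chance of at least one spurious positive result, and by the fourteenth look the odds are better than even.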

“…the rest goes in the dumpster out back”

Eagly’s second suggestion I find a bit more worrisome: arguing that relevant factors – like increases in profits, productivity, or finding better solutions – aren’t actually all that relevant when it comes to justifying why companies should increase diversity. What I find odd about this is that it seems to suggest that the advocates begin with their conclusion (in this case, that diversity in the work force ought to be increased) and then just keep looking for ways to justify it in spite of previous failures to do so. Again, while it is possible that there are benefits to diversity which aren’t yet being considered in the literature, bad research would likely result from a process where someone starts their analysis with the conclusion and keeps going until they justify it to others, no matter how often it requires shifting the goal posts. A major problematic implication with that suggestion mirrors other aspects of the questionable psychology research practices I mentioned before: when a researcher finds the conclusion they’re looking for, they stop looking. They only collect data up until the point it is useful, which rigs the system in favor of finding positive results where there are none. That could well mean, then, that there will be negative consequences to these diversity policies which are not being considered.

What I think is a good example of this justification problem leading to shoddy research practices/interpretation follows shortly thereafter. In talking about some of these alternative benefits that more female hires might have, Eagly (2016) notes that women tend to be more compassionate and egalitarian than men; as such, hiring more women should be expected to increase less-considered benefits, such as a reduction in the laying-off of employees during economic downturns (referred to as labor hoarding), or more favorable policies towards time off for family care. Now something like this should be expected: if you have different people making the decisions, different decisions will be made. Forgoing for the moment the question of whether those different policies are better, in some objective sense of the word, if one is interested in encouraging those outcomes (that is, they’re preferred by the advocate) then one might wish to address those issues directly, rather than by proxy. That is to say if you are looking to make the leadership of some company more compassionate, then it makes sense to test for and hire more compassionate people, not to hire more women under the assumption you will be increasing compassion.

This is an important matter because people are not perfect statistical representations of the groups to which they belong. On average, women may be more compassionate than men; the type of woman who is interested in actively pursuing a CEO position in a Fortune 500 company might not be as compassionate as your average woman, however, and, in fact, might even be less compassionate than a particular male candidate. What Eagly (2016) has ended up reaching, then, is not a justification for hiring more women; it’s a justification for hiring compassionate or egalitarian people. What is conspicuously absent from this section is a call for more research to be conducted on contexts in which men might be more compassionate than women; once the conclusion that hiring women is a good thing has been justified (in the advocate’s mind, anyway), the concerns for more information seem to sputter out. It should go without saying, but such a course of action wouldn’t be expected to lead to the most accurate scientific understanding of our world.

The solution to that problem being more diversity, of course..

To place this point in another quick example, if you’re looking to assemble a group of tall people, it would be better to use people’s height when making that decision rather than their sex, even if men do tend to be taller than women. Some advocates might suggest that being male is a good enough proxy for height, so you should favor male candidates; others would suggest that you shouldn’t be trying to assemble a group of tall people in the first place, as short people offer benefits that tall ones don’t; others still will argue that it doesn’t matter if short people don’t offer benefits as they should be preferentially selected to combat negative attitudes towards the short regardless (at the expense of selecting tall candidates). For what it’s worth, I find the attitude of “keep doing research until you justify your predetermined conclusion” to be unproductive and indicative of why the relationship between advocates and researchers ought not be a close one. Advocacy can only serve as a cognitive constraint that decreases research quality as the goal of advocacy is decidedly not truth. Advocates should update their conclusions in light of the research; not vice versa.

References: Eagly, A. (2016). When passionate advocates meet research on diversity, does the honest broker stand a chance? Journal of Social Issues, 72, 199-222.

Men Are Better At Selling Things On eBay

Posted on February 26, 2016 by Jesse Marczyk

When it comes to gender politics, never take the title of the piece at face value; or the conclusions for that matter.

In my last post, I mentioned how I find some phrases and topics act as red flags regarding the quality of research one is liable to encounter. Today, the topic is gender equality – specifically some perceived (and, indeed, some rather peculiar) discrimination against women – which is an area not renowned for its clear-thinking or reasonable conclusions. As usual, the news articles circulating this piece of research made some outlandish claim that lacks even remote face validity. In this case, the research in question concludes that people, collectively, try to figure out the gender of the people selling things on eBay so as to pay women substantially less than men for similar goods. Those who found such a conclusion agreeable to their personal biases spread it to others across social media as yet another example of how the world is an evil, unfair place. So here I am again, taking a couple recreational shots at some nonsense story of sexism.

Just two more of these posts and I get a free smoothie

The piece in question today is an article from Kricheli-Katz & Regev (2016) that examined data from about 1.1 million eBay auctions. The stated goals of the authors involve examining gender inequality in online product markets, so at least we can be sure they’re going into this without an agenda. Kricheli-Katz & Regev (2016) open their piece by talking about how gender inequality is a big problem, launching their discussion almost immediately with a rehashing of that misleading 20% pay gap statistic that’s been floating around forever. As that claim has been dissected so many times at this point, there’s not much more to say about it other than (a) when controlling for important factors, it drops to single digits and (b) when you see it, it’s time to buckle in for what will surely be an unpleasant ideological experience. Thankfully, the paper does not disappoint in that regard, promptly suggesting that women are discriminated against in online markets like eBay.

So let’s start by considering what the authors did, and what they found. First, Kricheli-Katz & Regev (2016) present us with their analysis of eBay data. They restricted their research to auctions only, where sellers will post an item and any subsequent interaction occurs between bidders alone, rather than between bidders and sellers. On average, they found that the women had about 10 fewer months of experience than men, though the accounts of both sexes had existed for over nine years, and women also had very-slightly better reputations, as measured by customer feedback. Women also tended to set slightly higher initial prices than men for their auctions, controlling for the product being sold. As such, women also tended to receive slightly fewer bids on their items, and ultimately less money per sale when they ended.

However, when the interaction between sex and product type (new or used) was examined, the headline-grabbing result appeared: while women netted a mere 3% less on average for used products than men, they netted a more-impressive 20% less for new products (where, naturally, one expects products to be the same). Kricheli-Katz & Regev (2016) claim that the discrepancy in the new-product case is due to beliefs about gender. Whatever these unspecified beliefs are, they cause people to pay women about 20% less for the same item. Taking that idea at face value for a moment, why does that gap all but evaporate in the used category of sales? The authors attribute that lack of a real difference to an increased trust people have in women’s descriptions of the condition of their products. So buyers trust women more when it comes to used goods, but pay them less for new ones when trust is less relevant. Both these conclusions, as far as I can see from the paper, have been pulled directly out of thin air. There is literally no evidence presented to support them: no data; no citations; no anything.

I might have found the source of their interpretations

By this point, anyone familiar with how eBay works is likely a bit confused. After all, the sex of the seller is at no point readily apparent in almost any listings. Without that crucial piece of information, people would have a very difficult time discriminating on the basis of it. Never fear, though; Kricheli-Katz & Regev (2016) report the results of a second study where they pulled 100 random sellers from their sample and asked about 400 participants to try to determine the sex of the sellers in question. Each participant offered their guesses about five profiles, for a total of 2000 attempts. About 55% of the time, participants got the sex right, 9% of the time they got it wrong, and the remaining 36% of the time, they said they didn’t know (which, since they don’t know, also means they got it wrong). In short, people couldn’t determine the sex reliably about half the time. The authors do mention that the guesses got better as participants viewed more items that the seller had posted, however.

So here’s the story they’re trying to sell: When people log onto eBay, they seek out a product they’re looking to buy. When they find a seller listing the product, they examine the seller’s username, the listing in question, and the other listings in the seller’s store in an attempt to discern the sex of the seller. Buyers subsequently lower their willingness to pay for an item by quite a bit if they see it is being sold by a woman, but only if it’s new. In fact, since women made 20% less, the actual reduction in willingness to pay must be larger than that, as sex can only be determined about half of the time reliably when people are trying. Buyers do all this despite even trusting female sellers more. Also, I do want to emphasize the word they, as this would need to be a pretty collective action. If it wasn’t a fairly universal response among buyers, the prices of female-sold items would eventually even out with the male price, as those who discriminated less against women would be drawn towards the cheaper prices and bump them back up.
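The claim that the real reduction "must be larger" than 20% can be roughed out with back-of-the-envelope arithmetic. The Python sketch below makes a deliberately crude assumption of my own, not one found in the paper: buyers who correctly identify a female seller apply some fixed discount, and everyone else (the wrong guesses and the don't-knows) applies none. The function name is likewise hypothetical.

```python
def implied_discount(average_gap, share_identified):
    """Discount that identified female sellers would need to face in order to
    produce average_gap overall, if unidentified sellers face no discount."""
    return average_gap / share_identified

# Figures quoted in the text: about 400 participants guessing on 5 profiles each,
# correct about 55% of the time, with a 20% average gap on new products.
total_guesses = 400 * 5
correct_share = 0.55
average_gap = 0.20

print("total guesses:", total_guesses)
print("implied discount when sex is identified:",
      round(implied_discount(average_gap, correct_share), 3))
```

Under those assumptions, buyers who do work out that a seller is a woman would need to be knocking something like 36% off what they are willing to pay in order for the average gap to come out at 20%.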

Not only do I not buy this story – not even a little – but I wouldn’t pay the authors less for it because they happen to be women if I was looking to make a purchase. While people might be able to determine the sex of the seller on eBay sometimes, when they’re specifically asked to do so, that does not mean people engage in this sort of behavior naturally.

Finally, Kricheli-Katz & Regev (2016) report the results of a third study, asking 100 participants how much they value a $100 gift card being sold by either an Alison or a Brad. Sure enough, people were willing to pay Alison less for the card: she got a mere $83 to Brad’s $87; a 5% difference. I’d say someone should call the presses, but it looks like they already did, judging from the coverage this piece has received. Now this looks like discrimination – because it is – but I don’t think it’s based on sex per se. I say that because, earlier in the paper, Kricheli-Katz & Regev (2016) also report that women, as buyers on eBay, tended to pay about 3% more than men for comparable goods. To the extent that the $4 difference in valuation is meaningful here, there are two things to say about it. First, it may well represent the fact that women aren’t as willing to negotiate prices in their favor. Indeed, while women were 23% of the sellers on eBay, they only represented 16% of the auctions with a negotiation component. If that’s the case, people are likely willing to pay less to women because they perceive (correctly) some population differences in their ability to get a good deal. I suspect if you gave them individuating information about the seller’s abilities, sex would stop mattering even 5%. Second, that slight, 5% difference would by no means account for the 20% gap the authors report finding with respect to new product sales; not even close.

But maybe your next big idea will work out better…

Instead, my guess is that in spite of the authors’ use of the phrase “equally qualified” when referring to the men and women in their seller sample, there were some important differences in listings the buyers noticed; the type of differences that you can’t account for when you’re looking at over a million of them and rough control measures aren’t effective. Kricheli-Katz & Regev (2016) never seemed to consider – and I mean really consider – the possibility that something about these listings, something they didn’t control for, might have been driving sale price differences. While they do control for factors like the seller’s reputation, experience, number of pictures, year of the sale, and some of the sentiments expressed by words in the listing (how positive or negative it is), there’s more to making a good listing than that. A more likely story is that differences in sale prices reflect different behaviors on the part of male and female sellers (as we already know other differences exist in the sample), as the alternative story attempting to be championed would require a level of obsession with gender-based discrimination in the population so wide and deep that we wouldn’t need to research it; it would be plainly obvious to everyone already.

Then again, perhaps it’s time I make my way over to eBay to pick up a new tinfoil hat.

References: Kricheli-Katz, T. & Regev, T. (2016). How many cents on the dollar? Women and men in product markets. Science Advances, 2, DOI: 10.1126/sciadv.1500599

Thoughtful Suggestions For Communicating Sex Differences

Posted on February 13, 2016 by Jesse Marczyk

Having spent quite a bit of time around the psychological literature – both academic and lay pieces alike – there are some words or phrases I can no longer read without an immediate, knee-jerk sense of skepticism arising in me, as if they taint everything that follows and precedes them. Included in this list are terms like bias, stereotype, discrimination, and, for the present purposes, fallacy. The reason these words elicit such skepticism on my end is due to the repeated failure of people using them to consistently produce high-quality work or convincing lines of reasoning. This is almost surely due to the perceived social stakes when such terms are being used: if you can make members of a particular group appear uniquely talented, victimized, or otherwise valuable, you can subsequently direct social support towards and away from various ends. When the goal of argumentation becomes persuasion, truth is not a necessary component and can be pushed aside. Importantly, the people engaged in such persuasive endeavors do not usually recognize they are treating information or arguments differently, contingent on how it suits their ends.

“Of course I’m being fair about this”

There are few areas of research that seem to engender as much conflict – philosophically and socially – as sex differences, and it is here those words appear regularly. As there are social reasons people might wish to emphasize or downplay sex differences, it has steadily become impossible for me to approach most of the writing I see on the topic with the assumption it is at least sort of unbiased. That’s not to say every paper is hopelessly mired in a particular worldview, rejecting all contrary data, mind you; just that I don’t expect them to reflect earnest examinations of the capital-T truth. Speaking of which, a new paper by Maney (2016) recently crossed my desk; a paper that concerns itself with how sex differences get reported and how they ought to be discussed. Maney (2016) appears to take a dim view of the research on sex differences in general and attempts to highlight some perceived fallacies of people’s understandings of them. Unfortunately, for someone trying to educate people about issues surrounding the sex difference literature, the paper does not come off as one written by someone possessing a uniquely deep knowledge of the topic.

The first fallacy Maney (2016) seeks to highlight is the idea that sexes form discrete groups. Her logic for explaining why this is not the case revolves around the idea that while the sexes do indeed differ to some degree on a number of traits, they also often overlap a great deal on them. Instead, Maney (2016) argues that we ought to not be asking whether the sexes differ on a given trait, but rather by how much they do. Indeed, she even puts the word ‘differences’ in quotes, suggesting that these ‘differences’ between sexes aren’t, in many cases, real. I like this brief section, as it highlights well why I have grown to distrust words like fallacy. Taking her points in reverse order, if one is interested in how much groups (in this case, sexes) differ, then one must have, at least implicitly, already answered the question as to whether or not they do. After all, if the sexes did not differ, it would be pointless to talk about the extent of those non-differences; there simply wouldn’t be variation. Second, I know of zero researchers whose primary interest resides in answering the question of whether the sexes differ to the exclusion of the extent of those differences. As far as I’m aware, Maney (2016) seems to be condemning a strange class of imaginary researchers who are content to find that a difference exists and then never look into it further or provide more details. Finally, I see little value in noting that the sexes often overlap a great deal when it comes to explaining the areas in which they do not. In much the same way, if you were interested in understanding the differences between humans and chimpanzees, you are unlikely to get very far by noting that we share a great deal of genes in common. Simply put, you can’t explain differences with similarities. If one’s goal is to minimize the perception of differences, though, this would be a helpful move.

The second fallacy that Maney (2016) seeks to tackle is the idea that the cause of a sex difference in behavior can be attributed to differing brain structures. Her argument on this front is that it is logically invalid to do the following: (1) note that some brain structure differs between men and women, (2) note that this brain structure is related to a given behavior on which they also differ, and so (3) conclude that a sex difference in brain structure between men and women is responsible for that different behavior. Now while this argument is true within the rules of formal logic, it is clear that differences in brain structure will result in differences in behavior; the only way that idea could be false would be if brain structure was not connected to behavior, and I don’t know of anyone crazy enough to try and make that argument. The researchers engaging in the fallacy thus might not get the specifics right all the time, but their underlying approach is fine: if a difference exists in behavior (between sexes, species, or individuals), there will exist some corresponding structural differences in the brain. The tools we have for studying the matter are a far cry from perfect, making inquiry difficult, but that’s a different issue. Relatedly, then, noting that some formal bit of logic is invalid is assuredly not the same thing as demonstrating that a conclusion is incorrect or the general approach misguided. (Also worth noting is that the above validity issue stops being a problem when conclusions are probabilistic, rather than definitive.)

“Sorry, but it’s not logical to conclude his muscles might determine his strength”

The third fallacy Maney (2016) addresses is the idea that sex differences in the brain must be preprogrammed or fixed, attempting to dispel the notion that sex differences are rooted in biology and thus impervious to experience. In short, she is arguing against the idea of hard genetic determinism. Oddly enough, I have never met a single genetic determinist in person; in fact, I’ve never even read an article that advanced such an argument (though maybe I’ve just been unusually lucky…). As every writer on the subject I have come across has emphasized – often in great detail – the interactive nature of genes and environments in determining the direction of development, it again seems like Maney (2016) is attacking philosophical enemies that are more imagined than real. She could have, for instance, quoted researchers who made claims along the lines of, “trait X is biologically-determined and impervious to environmental inputs during development”; instead, it looks like everyone she cites for this fallacy is making a similar criticism of others, rather than anyone making the claims being criticized (though I did not check those references myself, so I’m not 100% there). Curiously, Maney (2016) doesn’t seem to be at all concerned about the people who, more-or-less, disregard the role of genetics or biology in understanding human behavior; at the very least she doesn’t devote any portion of her paper to addressing that particular fallacy. That rather glaring omission – coupled with what she does present – could leave one with the impression that she isn’t really trying to present a balanced view of the issue.

With those ostensible fallacies out of the way, there are a few other claims worth mentioning in the paper. The first is that Maney (2016) seems to have a hard time reconciling the idea of sexual dimorphisms – traits that occur in one form typical of males and one typical of females – with the idea that the sexes overlap to varying degrees on many of them, such as height. While it’s true enough that you can’t tell someone’s sex for certain if you only know their height, that doesn’t mean you can’t make some good guesses that are liable to be right a lot more often than they’re wrong. Indeed, the only dimorphisms she mentions are the presence of sex chromosomes, external genitalia, and gonads and then continues to write as if these were of little to no consequence. Much like height, however, there couldn’t be selection for any physical sex differences if the sexes did not behave differently. Since behavior is controlled by the brain, physical differences between the sexes, like height and genitalia, are usually also indicative of some structural differences in the brain. This is the case whether the dimorphism is one of degree (like height) or kind (like chromosomes).
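To put a rough number on the height point, here is a minimal sketch of how often a simple "taller than the midpoint means male" guess would be right. The means and standard deviations below are hypothetical, picked only for illustration; they are not taken from Maney (2016) or any dataset. The sketch also assumes normally distributed heights and equal numbers of men and women.

```python
from statistics import NormalDist

# Hypothetical height distributions in centimeters (illustrative values only).
MALE = NormalDist(mu=175.0, sigma=7.0)
FEMALE = NormalDist(mu=162.0, sigma=7.0)


def guess_accuracy(male: NormalDist, female: NormalDist) -> float:
    """Share of correct guesses when anyone above the midpoint of the two
    means is guessed to be male, assuming equal numbers of each sex."""
    cutoff = (male.mean + female.mean) / 2
    p_male_correct = 1 - male.cdf(cutoff)   # men who fall above the cutoff
    p_female_correct = female.cdf(cutoff)   # women who fall below the cutoff
    return (p_male_correct + p_female_correct) / 2


accuracy = guess_accuracy(MALE, FEMALE)
overlap = MALE.overlap(FEMALE)  # shared area under the two curves, 0 to 1

print(f"Distributions overlap: {overlap:.0%}")
print(f"Guesses that are right: {accuracy:.0%}")
```

Under these made-up numbers the two distributions share roughly a third of their area, yet the guess comes out right a bit more than four times in five. Overlap and predictability sit together comfortably.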

Returning to the main point, outside of these all-or-none traits, it is unclear what Maney (2016) would consider a genuine difference, much less any clear justification for that standard. For example, she notes some research that found a 90% overlap in interhemispheric connectivity between the male and female distributions, but then seems to imply that the corresponding 10% non-overlap does not reflect a ‘real’ sex difference. We would surely notice a 10% difference in other traits, like height, IQ, or number of fingers but, I suppose in the realm of the brain, 10% just doesn’t cut it.

Maney (2016) also seems to take an odd stance when it comes to explanations for these differences. In one instance, she writes about a study on multitasking that found a sex difference favoring men; a difference which, we are told, was explained by a ‘much larger difference in video game experience,’ rather than sex per se. Great, but what are we to make of that ‘much larger’ sex difference in video game experience? It would seem that that finding too requires an explanation, and one is not present. Perhaps video game experience is explained more by, I don’t know, competitiveness than sex, but then what are we to explain competitiveness with? These kinds of explanations usually end up going nowhere in a hurry unless they eventually land on some kind of adaptive endpoint, as once a trait’s reproductive value is explained, you don’t need to go any further. Unfortunately, Maney (2016) seems to oppose evolutionary explanations for sex differences, scolding those who propose ‘questionable’ functional or evolutionary explanations for sex differences for being genetic determinists who see no role for sociocultural influences. In her rush to condemn those genetic determinists (who, again, I have never met or read, apparently), Maney’s (2016) piece appears to fall victim to the warning laid out by Tinbergen (1963) several decades ago: rather than seeking to improve the shape and direction of evolutionary, functional analyses, Maney (2016) instead recommends that people simply avoid them altogether.

“Don’t ask people to think about these things; you’ll only hurt their unisex brains”

This is a real shame, as evolutionary theory is the only tool available for providing a deeper understanding of these sex differences (as well as our physical and psychological form more generally). Just as species will differ in morphology and behavior to the extent they have faced different adaptive problems, so too will the sexes within a species. By understanding the different challenges faced by the sexes historically, one can get a much clearer sense as to where psychological and physical differences will – and will not – be expected to exist, as well as why (this extra level of ‘why’ is important, as it allows you to better figure out where an analysis has gone wrong if the predictions don’t work). Maney (2016), it would seem, even missed a golden opportunity within her paper to explain to her readers that evolutionary explanations complement, rather than supplant, more proximate explanations when quoting an abstract that seemed to contrast the two. I suspect this opportunity was missed because she is either legitimately unaware of that point, or does not understand it (judging from the tone of her paper), believing (incorrectly) instead that evolutionary means genetic, and therefore immutable. If that is the case, it would be rather ironic for someone who does not seem to have much understanding of the evolutionary literature to be lecturing others on how it ought to be reported.

References: Maney, D. (2016). Perils and pitfalls of reporting sex differences. Philosophical Transactions B, 371, 1-11.

Tinbergen, N. (1963). On aims and methods of ethology. Zeitschrift für Tierpsychologie, 20, 410-433.

Stereotyping Stereotypes

Posted on July 20, 2015 by Jesse Marczyk

I’ve attended a number of talks on stereotypes; I’ve read many more papers in which the word was used; I’ve seen still more instances where the term has been used outside of academic settings in discussions or articles. Though I have no data on hand, I would wager that the weight of this academic and non-academic literature leans heavily towards the idea that stereotypes are, by and large, inaccurate. In fact, I would go a bit further than that: the notion that stereotypes are inaccurate seems to be so common that people often see little need in ensuring any checks were put into place to test for their accuracy in the first place. Indeed, one of my major complaints about the talks on stereotypes I’ve attended is just that: speakers never mentioning the possibility that people’s beliefs about other groups happen to, on the whole, match up to reality fairly well in many cases (sometimes they have mentioned this point as an afterthought but, from what I’ve seen, that rarely translates into later going out and testing for accuracy). To use a non-controversial example, I expect that many people believe men are taller than women, on average, because men do, in fact, happen to be taller.

Pictured above: not a perceptual bias or an illusory correlation

This naturally raises the question of how accurate stereotypes – when defined as beliefs about social groups – tend to be. It should go without saying that there will not be a single answer to that question: accuracy is not an either/or type of matter. If I happen to think it’s about 75 degrees out when the temperature is actually 80, I’m more accurate in my belief than if the temperature was 90. Similarly, the degree of that accuracy should be expected to vary on the intended nature of the stereotype in question; a matter to which I’ll return later. That said, as I mentioned before, quite a bit of the exposure I’ve had to the subject of stereotypes suggests rather strongly and frequently that they’re inaccurate. Much of the writing about stereotypes I’ve encountered focuses on notions like “tearing them down”, “busting myths”, or about how people are unfairly discriminated against because of them; comparatively little of that work has focused on instances in which they’re accurate which, one would think, would represent the first step in attempting to understand them.

According to some research reviewed by Jussim et al (2009), however, that latter point is rather unfortunate, as stereotypes often seem to be quite accurate, at least by the standards set by other research in psychology. In order to test for the accuracy of stereotypes, Jussim et al (2009) report on some empirical studies that met two key criteria: first, the research had to compare people’s beliefs about a group to what that group was actually like; that much is a fairly basic requirement. Second, the research had to use an appropriate sample to determine what that group was actually like. For example, if someone was interested in people’s beliefs about some difference between men and women in general, but only tested these beliefs against data from a convenience sample (like men and women attending the local college), this could pose something of a problem to the extent that the convenience sample differs from the group the stereotype is actually about. If people, by and large, have accurate stereotypes, researchers would never know it if they make use of a non-representative reference group.

Within the realm of racial stereotypes, Jussim et al (2009) summarized the results of 4 papers that met these criteria. The majority of the results fell within what the authors consider the “accurate” range (as defined by being 0-10% off from the criteria values) or were near-misses (those between 10-20% off). Indeed, the average correlations between the stereotypes and criteria measures ranged from .53 to .93, which are very high, relative to the average correlation uncovered by psychological research. Even the personal stereotypes, while not as high, were appreciably accurate, ranging from .36 to .69. Further, while people weren’t perfectly accurate in their beliefs, those who overestimated differences between racial groups tended to be balanced out by those who underestimated those differences in most instances. Interestingly enough, people’s stereotypes about group differences tended to be a bit more accurate than their within group stereotypes.

“Ha! Look at all that inaccurate shooting. Didn’t even come close”

The same procedure was used to review research on gender stereotypes as well, yielding 7 papers with larger sample sizes. A similar set of results emerged: the average stereotype was rather accurate, with correlations ranging from .34 to .98, most of which hovered in the range of .7. Individual stereotypes were again less accurate, but most were still heading in the right direction. To put those numbers in perspective, Jussim et al (2009) summarized a meta-analysis examining the average correlation found in psychological research. According to that data, only 24% of social psychology effects represent correlations larger than .3 and a mere 5% exceeded a correlation of .5; the corresponding numbers for averaged stereotypes were 100% of the reviewed work meeting the .3 threshold, and about 89% of the correlations exceeding the .5 threshold (personal stereotypes at 81% and 36%, respectively).

Now neither Jussim et al (2009) nor I would claim that all stereotypes are accurate (or at least reasonably close); no one I’m aware of has. This brings us to the matter of when we should expect stereotypes to be accurate and when we should expect them to fall shorter of that point. As an initial note, we should always expect some degree of inaccuracy in stereotypes – indeed, in all beliefs about the world – to the extent that gathering information takes time and improving accuracy is not always worth that investment in the adaptive sense. To use a non-biological example, spending an extra three hours studying to improve one’s grade on a test from a 70 to a 90 might seem worth it, but the same amount of time used to improve from a 90 to a 92 might not. Similarly, if one lacks access to reliable information about the behavior of others in the first place, stereotypes should also tend to be relatively inaccurate. For this reason, Jussim et al (2009) note that cross-cultural stereotypes in national personalities tend to be among the most inaccurate, as people from, say, India, might have relatively little exposure to information about people from South Africa, and vice versa.

The second point to make on accuracy is that, to the extent that beliefs guide behavior and that behavior carries costs or benefits, we should expect beliefs to tend towards accuracy (again, regardless of whether they’re about social groups or the world more generally). If you believe, incorrectly, that group A is as likely to assault you as group B (the example that Jussim et al (2009) use involves biker gang members and ballerinas), you’ll either end up avoiding one group more than you need to, not being wary enough around one, or miss in both directions, all of which involve social and physical costs. One of the only cases in which being wrong might reliably carry benefits is a context in which one’s inaccurate beliefs modify the behavior of other people. In other words, stereotypes can be expected to be inaccurate in the realm of persuasion. Jussim et al (2009) make nods toward this possibility, noting that political stereotypes are among the least accurate ones out there, and that certain stereotypes might have been crafted specifically with the intent of maligning a particular group.

For instance…

While I do suspect that some stereotypes exist specifically to malign a particular group, that possibility does raise another interesting question: namely, why would anyone, let alone large groups of people, be persuaded to accept inaccurate stereotypes? For the same reason that people should prefer accurate information over inaccurate information when guiding their own behaviors, they should also be relatively resistant to adopting stereotypes which are inaccurate, just as they should be when it comes to applying them to individuals when they don’t fit. To the extent that a stereotype is of this sort (inaccurate), then, we should expect that it not be widely held, except in a few particular contexts.

Indeed, Jussim et al (2009) also review evidence that suggests people do not inflexibly make use of stereotypes, preferring individuating information when it’s available: according to the meta-analyses reviewed, the average influence of stereotypes on judgments hangs around r = .1 (which does not, in many instances, have anything to say about the accuracy of the stereotype; just the extent of its effect); by contrast, individuating information had an average effect of about .7 which, again, is much larger than the average psychology effect. Once individuating information is controlled for, stereotypes tend to have next to zero impact on people’s judgments of others. People appear to rely on personal information to a much higher degree than stereotypes, and often jettison ill-fitting stereotypes in favor of personal information. In other words, the knowledge that men tend to be taller than women does not have much of an influence on whether I think a particular woman is taller than a particular man.

When should we expect that people will make the greatest use of stereotypes, then? Likely when they have access to the least amount of individuating information. This has been the case in a lot of the previous research on gender bias where very little information is provided about the target individual beyond their sex (see here for an example). In these cases, stereotypes represent an individual doing the best they can with limited information. In some cases, however, people express moral opposition to making use of that limited information, contingent on the group(s) it benefits or disadvantages. It is in such cases that, ironically, stereotypes might be stereotyped as inaccurate (or at least insufficiently accurate) to the greatest degree.

References: Jussim, L., Cain, T., Crawford, J., Harber, K., & Cohen, F. (2009). The unbearable accuracy of stereotypes. In Nelson, T. The Handbook of Prejudice, Stereotyping, and Discrimination (199-227). NY: Psychology Press.

Should We Expect Cross-Cultural Perceptual Errors?

Posted on May 5, 2015 by Jesse Marczyk

There was a rather interesting paper that crossed my social media feeds recently concerning stereotypes about women in science fields; a topic about which I have been writing lately. I’m going to do something I don’t usually do and talk about it briefly despite having just read the abstract and discussion section. The paper, by Miller, Eagly, and Linn (2014), reported on people’s implicit gender stereotypes about science, which associated science more readily with men, relative to women. As it turns out, across a number of different cultures, people’s implicit stereotypes corresponded fairly well to the actual representation of men and women in those fields. In other words, people’s perceptions, or at least their responses, tended to be accurate: if more men were associated with science psychologically, it seemed to be because more men also happened to work in science fields. In general, this is how we should expect the mind to work. While our minds might imperfectly gather information about the world, they should do their best to be accurate. The reasons for this accuracy, I suspect, have a lot to do with being right resulting in useful modifications of behaviors.

Being wrong about skateboarding skill, for instance, has some consequences

Whenever people propose psychological hypotheses that have to do with people being wrong, then, we should be a bit skeptical. A psychology designed in such a way so as to be wrong about the world consistently will, on the whole, tend to direct behavior in more maladaptive ways than a more accurate mind would. If one is positing that people are wrong about the world in some regard, it would require either that (a) there are no consequences for being wrong in that particular way or (b) there are some consequences, but the negative consequences are outweighed by the benefits. Most hypotheses for holding incorrect beliefs I have encountered tend towards the latter route, suggesting that some incorrect beliefs might outperform true beliefs in some fitness-relevant way(s).

One such hypothesis that I’ve written about before concerns error management theory. To recap, error management theory recognizes that some errors are costlier to make than others. To use an example in the context of the current paper I’m about to discuss, consider a case in which a man desires to have sex with a woman. The woman in question might or might not be interested in the prospect; the man might also perceive that she is interested or not interested. If the woman is interested and the man makes the mistake of thinking she isn’t, he has missed out on a potentially important opportunity to increase his reproductive output. On the other hand, if the woman isn’t interested and the man makes the mistake of thinking she is, he might waste some time and energy pursuing her unsuccessfully. These two mistakes do not carry equivalent costs: one could make the argument that a missed encounter is costlier on average, from a fitness standpoint, than an unsuccessful pursuit (depending, of course, on how much time and energy is invested in the pursuit).
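The asymmetry between the two mistakes can be turned into a simple decision rule. What follows is a minimal sketch of that cost logic, not a model taken from the error management literature. The function name and the cost values are hypothetical, and it assumes the only costs in play are the missed opportunity (not pursuing someone who was interested) and the wasted effort (pursuing someone who was not). Pursuit has the lower expected cost whenever (1 - p) × wasted < p × missed, which rearranges to p > wasted / (wasted + missed).

```python
def pursuit_threshold(cost_missed: float, cost_wasted: float) -> float:
    """Probability of interest above which pursuing carries a lower
    expected cost than not pursuing.

    Expected cost of pursuing:     (1 - p) * cost_wasted
    Expected cost of not pursuing: p * cost_missed
    Setting the two equal and solving for p gives the threshold below.
    """
    return cost_wasted / (cost_wasted + cost_missed)


# Hypothetical costs in arbitrary fitness units (illustrative only).
equal_costs = pursuit_threshold(cost_missed=1.0, cost_wasted=1.0)
costly_miss = pursuit_threshold(cost_missed=9.0, cost_wasted=1.0)

print(f"Equal costs: pursue above {equal_costs:.0%}")
print(f"Missing out is 9x costlier: pursue above {costly_miss:.0%}")
```

The costlier a missed opportunity is relative to a wasted pursuit, the lower the chance of success needed to make pursuit worthwhile. Nothing in the rule requires the estimate of p itself to be wrong.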

Accordingly, it has been hypothesized that male psychology might be designed in such a way so as to over-perceive women’s sexual interest in them, minimizing the costs associated with making mistakes, multiplied by their frequency, rather than minimizing the number of mistakes one makes in total. While that sounds plausible at first glance, there is a rather important point worth bearing in mind when evaluating it: incorrect beliefs are not the only way to go about solving this problem: a man could believe, correctly, that a woman is not all that interested in him, but simply use a lower threshold for acceptable pursuits. Putting that into numbers, let’s say a woman has a 5% chance of having sex with the man in question: the man might not pursue any chance below 10%, and so could bias his belief upward to think he actually has a 10% chance; alternatively, he might believe she has about a 5% chance of having sex with him and decide to go after her anyway. It seems that the second route solves this problem more effectively, as a biased probability of success with a woman might have downstream effects on other pursuits.
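The two routes in that example can be laid side by side. This is only a sketch of the numbers above (a true 5% chance and a 10% pursuit threshold); the function and variable names are hypothetical.

```python
TRUE_CHANCE = 0.05  # the woman's actual chance of having sex with the man


def pursues(believed_chance: float, threshold: float) -> bool:
    """Pursue whenever the believed chance meets the acceptable threshold."""
    return believed_chance >= threshold


# Route 1: keep the 10% threshold and inflate the belief to meet it.
biased_belief = 0.10
route_biased = pursues(biased_belief, threshold=0.10)

# Route 2: keep the belief accurate and lower the threshold instead.
accurate_belief = TRUE_CHANCE
route_accurate = pursues(accurate_belief, threshold=0.05)

# How far each route's belief sits from the truth.
biased_error = abs(biased_belief - TRUE_CHANCE)
accurate_error = abs(accurate_belief - TRUE_CHANCE)

print(f"Biased belief pursues: {route_biased}, belief error: {biased_error:.2f}")
print(f"Accurate belief pursues: {route_accurate}, belief error: {accurate_error:.2f}")
```

Both routes end in the same behavior, but only the first leaves the man carrying a wrong number around for any other decision that happens to draw on it.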

Like on the important task of watching the road

Now in that last post I mentioned, it seems that the evidence that men over-perceive women’s sexual interest might instead be better explained by the hypothesis that women are underreporting their intentions. After all, we have no data on the probability of a woman having sex with someone given she did something like held his hand or bought him a present, so concluding that men over-perceive requires assuming that women report accurately (the previous evidence would also require that pretty much everyone else but the woman is wrong about her behavior, male or female). Some new evidence puts the hypothesis of male over-perception into even hotter water. A recent paper by Perilloux et al (2015) sought to test this over-perception bias cross-culturally, as most of the data bearing on it happens to have been derived from American samples. If men possess some adaptation designed for over-perception of sexual interest, we should expect to see it cross-culturally; it ought to be a human universal (as I’ve noted before, this doesn’t mean we should expect invariance in its expression, but we should at least find its presence).

Perilloux et al (2015) collected data from participants in Spain, Chile, and France, representing a total sample size of approximately 400 subjects. Men and women were given a list of 15 behaviors. They were asked to imagine they had been out on a few dates with a member of the opposite sex, and then asked for their estimates of the likelihood of having sex with them, given that this opposite sex individual engaged in those behaviors (from -3 being “extremely unlikely” to 3 being “extremely likely”). The results showed an overall sex difference in each country, with men tending to perceive more sexual interest than women. While this might appear to support the idea that over-perception is a universal feature of male psychology, a closer examination of the data casts some doubt on that idea.

In the US sample, men perceived more sexual interest than women in 12 of the 15 items; in Spain, that number was 5, in Chile it was 2, and in France it was 1. It seemed that the question concerning whether someone bought jewelry was enough to drive this sex difference in both the French and Chilean samples. Rather than men over-perceiving women’s reported interests in general across a wide range of behaviors, it seemed that the cross-cultural sample’s differences were being driven by only a few behaviors; behaviors which are, apparently, also rather atypical for relationships in those countries (inasmuch as women don’t usually buy men jewelry). As for why there’s a greater correspondence between French and Chilean men and women’s reported likelihoods, I can’t say. However, that men from France and Chile seem to be rather accurate in their perceptions of female sexual intent would cast doubt on the idea that male psychology contains some mechanisms for sexual over-perception.

I’ll bet US men still lead in shooting accuracy, though

This paper helps make two very good points that, at first, might seem like they oppose each other, despite their complementary nature. The first point is the obvious importance of cross-cultural research; one cannot simply take it for granted that a given effect will appear in other cultures. Many sex differences – like height and willingness to engage in casual sex – do, but some will not. The second point, however, is that hypotheses about function can be developed and even tested (albeit incompletely) in the absence of data about their universality. Hypotheses about function are distinct from hypotheses about proximate form or development, though these different levels of analysis can often be used to inform others. Indeed, that’s what happened in the current paper, with Perilloux et al (2015) drawing the implicit hypothesis about universality from the hypothesis about ultimate functioning, using data about the former to inform their posterior beliefs about the latter. While different levels of analysis inform each other, they are nonetheless distinct, and that’s always worth repeating.

References: Perilloux, C., Munoz-Reyes, J., Turiegano, E., Kurzban, R., & Pita, M. (2015). Do (non-American) men overestimate women’s sexual intentions? Evolutionary Psychological Science, DOI 10.1007/s40806-015-0017-5

Miller, D., Eagly, A., & Linn, M., (2014). Women’s representation in science predicts national gender-science stereotypes: Evidence from 66 nations. Journal of Educational Psychology, http://dx.doi.org/10.1037/edu0000005

I Reject Your Fantasy And Substitute My Own

Posted on November 21, 2014 by Jesse Marczyk

I don’t think it’s a stretch to make the following generalization: people want to feel good about themselves. Unfortunately for all of us, our value to other people tends to be based on what we offer them and, since our happiness as a social species tends to be tethered to how valuable we are perceived to be by others, being happy can be more of a chore than we would prefer. These valuable things need not be material; we could offer things like friendship or physical attractiveness, pretty much anything that helps fill a preference or need others have. Adding to the list of misfortunes we must suffer in the pursuit of happiness, other people in the world also offer valuable things to the people we hope to impress. This means that, in order to be valuable to others, we need to be particularly good at offering things to other people: either through being better at providing something than many people are, or through being able to provide something relatively unique that others typically don’t. If we cannot match the contributions of others, then people will not like to spend time with us and we will become sad; a terrible fate indeed. One way to avoid that undesirable outcome, then, is to increase your level of competition to become more valuable to other people; make yourself into the type of person others find valuable. Another popular route, which is compatible with the first, is to condemn other people who are successful or promote the images of successful people. If there’s less competition around, then our relative ability becomes more valuable. On that note, Barbie is back in the news again.

“Finally; a new doll for my old one to tease for not meeting her standards!”

The Lammily doll has been making the rounds on various social media sites, marketed as the average Barbie, with the tag line: “average is beautiful”. Lammily is supposed to be proportioned so as to represent the average body of a 19-year-old woman. She also comes complete with stickers for young girls to attach to her body in order to give her acne, scars, cellulite, and stretch marks. The idea here seems to be that if young girls see a more average-looking doll, they will compare themselves less negatively to it and, hopefully, end up feeling better about their body. Future incarnations of the doll are hoped to include diverse body types, races, and I presume other features upon which people vary (just in case the average doll ends up being too alienating or high-achieving, I think). If this doll is preferred by girls to Barbie, then by all means I’m not going to tell them they shouldn’t enjoy it. I certainly don’t discourage the making of this doll or others like it. I just get the sense that the doll will end up primarily making parents feel better by giving them the sense they’re accomplishing something they aren’t, rather than affecting their children’s perceptions.

As an initial note, I will say that I find it rather strange that the creator of the doll stated: “By making a doll real I feel attention is taken away from the body and to what the doll actually does.” The reason I find that strange is because the doll does not, as far as I can see, come with a number of different accessories that make it do different things. In fact, if Lammily does anything, I’m not sure what that anything is, as it’s never mentioned. The only accessories I see are the aforementioned stickers to make her look different. Indeed, the whole marketing of the doll focuses on how it looks; not what it does. For a doll ostensibly attempting to take attention away from the body, its body seems to be its only selling point.

The main idea, rather, as far as I can tell, is to try and remove the possible intrasexual competition over appearance that women might feel when confronted with a skinny, attractive, makeup-clad figure. So, by making the doll less attractive with scar stickers, girls will feel less competition to look better. There are a number of facets of the marketing of the doll that would support this interpretation: one such point is the tag line. Saying that “average is beautiful” is, from a statistical standpoint, kind of strange; it’s a bit like saying “average is tall” or “average is smart”. These descriptors are all relative terms – typically ones that apply to upper ends of some distribution – so applying them to more people would imply that people don’t differ as much on the trait in question. The second point to make about the tagline is that I’m fairly certain, if you asked him, the creator of the Lammily doll – Nickolay Lamm – would not tell you he meant to imply that women who are above or below some average are not beautiful; instead, you’d probably get some sentiment to the effect that everyone is attractive and unique in their own special way, further obscuring the usefulness of the label. Finally, if the idea is to “take attention away from the body”, then selling the doll under the label of its natural beauty is kind of strange.

So does Barbie have a lot to answer for culturally, and is Lammily that answer? Let’s consider some evidence examining whether Barbie dolls are actually doing harm to young girls in the first place and, if they are, whether that harm might be mitigated via the introduction of more-proportionate figures.

“If only she wasn’t as thin, this never would have happened”

One 2006 paper (Dittmar, Halliwell, & Ive, 2006) concludes that the answer is “yes” to both those questions, though I have my doubts. In their paper, the researchers exposed 162 girls between the ages of 5 and 8 to one of three picture books. These books contained a few images of Barbie (who would be a US dress size 2) or Emme (a size 16) dolls engaged in some clothing shopping; there was also a control book that did not draw attention to bodies. The girls were then asked questions about how they looked, how they wanted to look, and how they hoped to look when they grew up. After 15 minutes of exposure to these books, there were some changes in these girls’ apparent satisfaction with their bodies. In general, the girls exposed to the Barbies tended to want to be thinner than those exposed to the Emme dolls. By contrast, those exposed to Emme didn’t want to be thinner than those exposed to no body images at all. In order to get a sense for what was going on, however, those effects require some qualifications.

For starters, when measuring the difference between one’s perception of her current body and her current ideal body, exposure to Barbie only made the younger children want to be thinner. This includes the girls in the 5 – 7.5 age range, but not the girls in the 7.5 – 8.5 range. Further, when examining what the girls’ ideal adult bodies would be, Barbie had no effect on the youngest girls (5 – 6.5) or the oldest ones (7.5 – 8.5). In fact, for the older girls, exposure to the Emme doll seemed to make them want to be thinner as adults (the authors suggesting this to be the case as Emme might represent a real, potential outcome the girls are seeking to avoid). So these effects are kind of all over the place, and it is worth noting that they, like many effects in psychology, are modest in size. Barbie exposure, for instance, reduced the girls’ “body esteem” (a summed measure of six questions about how the girls felt about their bodies that got a 1 to 3 response, with 1 being bad, 2 neutral, and 3 being good) from a mean of 14.96 in the control condition to 14.45. To put that in perspective, exposure to Barbie led to girls, on average, moving one response out of six half a point on a small scale, compared to the control group.
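For anyone who wants to check that bit of arithmetic, here is a rough sketch in Python. It uses only the figures quoted above (six items scored 1 to 3, means of 14.96 and 14.45); the function and variable names are my own inventions for illustration, not anything from the paper. Since I haven't quoted a standard deviation here, this doesn't compute a standardized effect size; it only expresses the raw drop relative to the scale.

```python
# Figures quoted from the post's summary of Dittmar, Halliwell, & Ive (2006).
N_ITEMS = 6                 # six body-esteem questions
ITEM_MIN, ITEM_MAX = 1, 3   # each answered 1 (bad), 2 (neutral), or 3 (good)
CONTROL_MEAN = 14.96        # mean summed score, control picture book
BARBIE_MEAN = 14.45         # mean summed score, Barbie picture book


def scale_bounds(n_items, item_min, item_max):
    """Lowest and highest possible summed scores on the scale."""
    return n_items * item_min, n_items * item_max


def summarize_difference(control_mean, treated_mean, n_items, item_min, item_max):
    """Express a raw difference in means relative to the size of the scale.

    Hypothetical helper for illustration only; it is not an analysis
    reported in the paper.
    """
    low, high = scale_bounds(n_items, item_min, item_max)
    raw_drop = control_mean - treated_mean
    return {
        "scale_range": (low, high),
        "raw_drop": round(raw_drop, 2),
        "drop_per_item": round(raw_drop / n_items, 3),
        "drop_as_share_of_range": round(raw_drop / (high - low), 4),
    }


if __name__ == "__main__":
    summary = summarize_difference(
        CONTROL_MEAN, BARBIE_MEAN, N_ITEMS, ITEM_MIN, ITEM_MAX
    )
    for name, value in summary.items():
        print(f"{name}: {value}")
```

The summed score can run from 6 to 18, so a drop of about half a point works out to a little over 4% of the possible range, or less than a tenth of a point per question.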

Taking these effects at face value, though, my larger concerns with the paper involve a number of things it does not do. First, it doesn’t show that these effects are Barbie-specific. By that I don’t mean that they didn’t compare Barbie against another doll – they did – but rather that they didn’t compare Barbie against, say, attractive (or thin) adult human women. The authors credit Barbie with some kind of iconic status that is likely playing an important role in determining girls’ later ideals of beauty (as opposed to Barbie temporarily, but not lastingly, modifying their satisfaction), but they don’t demonstrate it. On that point, it’s important to note what the authors are suggesting about Barbie’s effects: that Barbies lead to lasting changes in perceptions and ideals, and that the older girls weren’t being affected by exposures to Barbies because they have already “…internalized [a thin body ideal] as part of their developing self-concept” by that point.

At least you got all that self-deprecation out of the way early

An interesting idea, to be sure. However, it should make the following prediction: adult women exposed to thin or attractive members of the same sex shouldn’t have their body satisfaction affected, as they have already “internalized a thin ideal”. Yet this is not what one of the meta-analysis papers cited by the authors themselves finds (Groesz, Levine, & Murnen, 2002). Instead, adult women faced with thin models feel less satisfied with their bodies relative to when they view average or above-average weight models. This is inconsistent with the idea that some thin beauty standard has been internalized by age 8. Both sets of data, however, are consistent with the idea that exposure to an attractive competitor might reduce body satisfaction temporarily, as the competitor will be perceived to be more attractive by other people. In much the same way, I might feel bad about my skill at playing music when I see someone much better at the task than I am. I would be dissatisfied because, as I mentioned initially, my value to others depends on who else happens to offer what I do: if they’re better at it, my relative value decreases. A little dissatisfaction, then, either pushes me to improve my skill or to find a new domain in which I can compete more effectively. The disappointment might be painful to experience, but it is useful for guiding behavior. If the older girls just stopped viewing Barbie as competition, perhaps, because they have moved onto new stages in their development, this would explain why Barbie had no effect on them as well. The older girls might simply have grown out of competing with Barbie.

Another issue with the paper is that the experiment used line drawings of body shapes, rather than pictures of actual human bodies, to determine which body girls think they have and which body they want, both now and in the future. This could be an issue, as previous research (Tovee & Cornelissen, 2001) failed to replicate the “girls want to be skinnier than men would prefer” effects – which were found using line drawings – when using actual pictures of human bodies. One potential reason for that difference in findings is that a number of features besides thinness might unintentionally co-vary in these line drawings. So some of the desire to be skinny that the girls were expressing in the 2006 experiment might have just been an artifact of the stimulus materials being used.

Additionally, Dittmar, Halliwell, & Ive (2006), somewhat confusingly, didn’t ask the girls about whether or not they owned Barbies or how much exposure they had to them (though they do note that it probably would have been a useful bit of information to have). There are a number of predictions we might make about such a variable. For instance, girls exposed to Barbie more often should be expected to have a greater desire for thinness, if the authors’ account is true. Further still, we might also predict that, among girls who have lots of experience with Barbies, a temporary exposure to pictures of Barbie shouldn’t be expected to affect their perception of their ideal body much, if at all. After all, if they’re constantly around the doll, they should have, as the authors put it, already “…internalized [a thin body ideal] as part of their developing self-concept”, meaning that additional exposure might be redundant (as it was with the older girls). Since there’s no data on the matter, I can’t say much more about it.

A match made in unrealistic heaven.

So would a parent have a lasting impact on their daughter’s perception of beauty by buying her a Barbie? Probably not. The current research doesn’t demonstrate any particularly unique, important, or lasting role for Barbie in the development of children’s feelings about their bodies (though it does assume them). You probably won’t do any damage to your child by buying them an Emme or a Lammily either. It is unlikely that these dolls are the ones socializing children and building their expectations of the world; that’s a job larger than one doll could ever hope to accomplish. It’s more probable that features of these dolls reflect (in some cases exaggerated) aspects of our psychology concerning what is attractive, rather than creating them.

A point of greater interest I wanted to end with, though, is why people felt that the problem which needed to be addressed when it came to Barbie was that she was disproportionate. What I have in mind is that Barbie has a long history of prestigious careers; over 150 of them, most of them decidedly above average. If you want a doll that focuses on what the character does, Barbie seems to be doing fine in that regard. If we want Barbie to be an average girl, sure, she won’t be as thin, but then chances are that she doesn’t even have her Bachelor’s degree either, which would preclude her from a number of the professions she has held. She’s also unlikely to be a world class athlete or performer. Now, yes, it is possible for people to hold those professions while it is impossible for anyone to be proportioned as Barbie is, but it’s certainly not the average. Why is the concern over what Barbie looks like, rather than what unrealistic career expectations she generates? My speculation is that the focus arises because, in the real world, women compete with each other more over their looks than their careers in the mating market, but I don’t have time to expand on that much more here.

It just seems peculiar to focus on one particular non-average facet of reality obsessively only to state that it doesn’t matter. If the debate over Barbie can teach us anything, it’s that physical appearance does matter; quite a bit, in fact. To try and teach people – girls or boys – otherwise might help them avoid some temporary discomfort (“Looks don’t matter; hooray!”), but it won’t give them an accurate impression of how the wider world will react to them (“Yeah, about that whole looks thing…”); a rather dangerous consequence, if you ask me.

References: Dittmar, H., Halliwell, E., & Ive, S. (2006). Does Barbie make girls want to be thin? The effect of experimental exposure to images of dolls on the body image of 5- to 8-year-old girls. Developmental Psychology, 42, 283-292.

Groesz, L., Levine, M., & Murnen, S. (2002). The effect of experimental presentation of thin media images on body satisfaction: A metaanalytic review. International Journal of Eating Disorders, 31, 1–16.

Tovee, M. & Cornelissen, P. (2001). Female and male perceptions of physical attractiveness in front-view and profile. British Journal of Psychology, 92, 391-402.
