Skepticism Surrounding Sex

It’s a basic truth of the human condition that everybody lies; the only variable is what they lie about.

One of my favorite shows from years ago was House, a show centered on a brilliant but troubled doctor who frequently discovers the causes of his patients’ ailments by discerning what they – or others – are lying about. This outlook on people appears to be correct, at least in spirit. Because it is sometimes beneficial for us that other people are made to believe things that are false, communication is often less than honest. This dishonesty takes forms like outright lies, lies by omission, or stretching the truth in various directions and placing it in different lights. Of course, people don’t lie just because deceiving others is usually beneficial. Deception – much like honesty – is only adaptive to the extent that people do reproductively-relevant things with it. Convincing your spouse that you had an affair when you didn’t is dishonest for sure, but probably not a very useful thing to do; deceiving someone about what you had for breakfast is probably fairly neutral (minus the costs you might incur from coming to be known as a liar). As such, we shouldn’t expect selection to have shaped our psychology to lie about all topics with equal frequency. Instead, we should expect that people preferentially lie about particular topics in predictable ways.

Lies like, “This college degree will open so many doors for you in life”

The corollary idea to that point concerns skepticism. Distrusting the honesty of communications can protect against harmful deceptions, but it also runs the risk of failing to act on accurate and beneficial information. There are costs and benefits to skepticism just as there are to deception. Just as we shouldn’t expect people to be dishonest about all topics equally often, then, we shouldn’t expect people to be equally skeptical of all the information they receive either. This is a point I’ve talked about before with regards to our reasoning abilities, whereby information agreeable to our particular interests tends to be accepted less critically, while disagreeable information is scrutinized much more intensely.

This line of thought was recently applied to the mating domain in a paper by Walsh, Millar, & Westfall (2016). Humans face a number of challenges when it comes to attracting sexual partners, typically centered around obtaining the highest quality of partner(s) one can (metaphorically) afford, relative to what one offers to others. What determines the quality of partners, however, is frequently context specific: what makes a good short-term partner might differ from what makes a good long-term partner and – critically, as far as the current research is concerned – the traits that make good male partners for women are not the same as those that make good female partners for men. Because women and men face some different adaptive challenges when it comes to mating, we should expect that they would also preferentially lie (or exaggerate) to the opposite sex about those traits that the other sex values the most. In turn, we should also expect that each sex is skeptical of different claims, as this skepticism should reflect the costs associated with making poor reproductive decisions on the basis of bad information.

In case that sounds too abstract, consider a simple example: women face a greater obligate cost when it comes to pregnancy than men do. As far as men are concerned, their role in reproduction could end at ejaculation (which it does, for many species). By contrast, women would be burdened with months of gestation (during which they cannot get pregnant again), as well as years of breastfeeding prior to modern advancements (during which they also usually can’t get pregnant). Each child could take years of a woman’s already limited reproductive lifespan, whereas the man has lost a few minutes. In order to ease those burdens, women often seek male partners who will stick around and invest in them and their children. Men who are willing to invest in children should thus prove to be more attractive long-term partners for women than those who are unwilling. However, a man’s willingness to stick around needs to be assessed by a woman in advance of knowing what his behavior will actually be. This might lead to men exaggerating or lying about their willingness to invest, so as to encourage women to mate with them. Women, in turn, should be preferentially skeptical of such claims, as being wrong about a man’s willingness to invest is costly indeed. The situation should be reversed for traits that men value in their partners more than women.

Figure 1: What men most often value in a woman

Three such traits for both men and women were examined by Walsh et al (2016). In their study, eight scenarios depicting a hypothetical email exchange between a man and woman who had never met were displayed to approximately 230 heterosexual undergraduate students (165 of whom were female). For the women, these emails depicted a man messaging a woman; for men, it was a woman messaging a man. The purpose of these emails was described as the person sending them looking to begin a long-term intimate relationship with the recipient. Each of these emails described various facets of the sender, which could be broadly classified as either relevant primarily to female mating interests, relevant to male interests, or neutral. In terms of female interests, the sender described their luxurious lifestyle (cuing wealth), their desire to settle down (commitment), or how much they enjoy interacting with children (child investment). In terms of male interests, the sender talked about having a toned body (cuing physical attractiveness), their sexual openness (availability/receptivity), or their youth (fertility and mate value). In the two neutral scenarios, the sender either described their interest in stargazing or board games.

Finally, the participants were asked to rate (on a 1-5 scale) how deceitful they thought the sender was, whether they believed the sender or not, and how skeptical they were of the claims in the message. These three scores were summed for each participant to create a composite score of believability for each of the messages (the lower the score, the less believable it was rated as being). Those scores were then averaged across the female-relevant items (wealth, commitment, and childcare), the male-relevant items (attractiveness, youth, and availability), and the control conditions. (Participants also answered questions about whether the recipient should respond and how much they personally liked the sender. No statistical analyses are reported on those measures, however, so I’m going to assume nothing of note turned up.)
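To make that scoring procedure concrete, here is a small sketch in Python. The ratings and item names below are invented for illustration, and I’m assuming the deceit and skepticism items were reverse-coded so that higher numbers always mean more believable; this is not the authors’ actual data or code.

```python
# Hypothetical illustration of the composite believability scoring
# described above; the ratings are made up, not the study's data.

def composite_score(deceit, believe, skeptical):
    """Sum the three 1-5 ratings into one believability score.

    Assumes 'deceit' and 'skeptical' are already reverse-coded,
    so higher values on every item mean more believable.
    """
    return deceit + believe + skeptical

# One hypothetical participant's ratings for each message type
ratings = {
    "wealth":     (3, 3, 3),  # female-relevant items
    "commitment": (3, 3, 3),
    "childcare":  (3, 4, 3),
    "attractive": (4, 3, 3),  # male-relevant items
    "youth":      (4, 4, 3),
    "available":  (3, 3, 4),
    "stargazing": (4, 4, 4),  # control items
    "boardgames": (4, 4, 3),
}

categories = {
    "female-relevant": ["wealth", "commitment", "childcare"],
    "male-relevant":   ["attractive", "youth", "available"],
    "control":         ["stargazing", "boardgames"],
}

for label, items in categories.items():
    mean = sum(composite_score(*ratings[i]) for i in items) / len(items)
    print(f"{label}: {mean:.2f}")
```

The averaging over categories is what lets the authors compare, say, a participant’s believability score for female-relevant claims against their score for the controls.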

The results showed that, as expected, the control items were believed more readily (M = 11.20) than the male (M = 9.85) or female (M = 9.6) relevant items. This makes sense, inasmuch as believing lies about stargazing or interests in board games isn’t particularly costly for either sex in most cases, so there’s little reason to lie about them (and thus little reason to doubt them); by contrast, messages about one’s desirability as a partner have real payoffs, and so are treated more cautiously. However, an important interaction with the sex of the participant was uncovered as well: female participants were more skeptical on the female-relevant items (M = about 9.2) than males were (M = 10.6); similarly, males were more likely to be skeptical in male-relevant conditions (M = 9.5) than females were (M = 10). Further, the scores for the individual items all showed evidence of the same kinds of sex differences in skepticism. No sex difference emerged for the control condition, also as expected.

In sum, then – while these differences were relatively small in magnitude – men tended to be more skeptical of claims that, if falsely believed, were costlier for them than for women, and women tended to be more skeptical of claims that, if falsely believed, were costlier for them than for men. This is a similar pattern to that found in the reasoning domain, where evidence that agrees with one’s position is accepted more readily than evidence that disagrees with it.

“How could it possibly be true if it disagrees with my opinion?”

The authors make a very interesting point towards the end of their paper about how their results could be viewed as inconsistent with the hypothesis that men have a bias to over-perceive women’s sexual interest. After all, if men are over-perceiving such interest in the first place, why would they be skeptical about claims of sexual receptivity? It is possible, of course, that men tend to over-perceive such availability in general and are also skeptical of claims about its degree (e.g., they could still be manipulated by signals intentionally sent by females and so are skeptical, but still over-perceive ambiguous or less-overt cues), but another explanation jumps out at me that is consistent with the theme of this research: perhaps when asked to self-report about their own sexual interest, women aren’t being entirely accurate (consciously or otherwise). This explanation would fit well with the fact that men and women tend to perceive a similar level of sexual interest in other women. Then again, perhaps I only see that evidence as consistent because I don’t think men, as a group, should be expected to have such a bias, and that’s biasing my skepticism in turn.

References: Walsh, M., Millar, M., & Westfall, S. (2016). The effects of gender and cost on suspicion in initial courtship communications. Evolutionary Psychological Science, DOI 10.1007/s40806-016-0062-8

Musings About Police Violence

I was going to write about something else today (the finding from a meta-analysis that artificial surveillance cues do not appear to appreciably increase generosity; the effects fail to reliably replicate), but I decided to switch topics up to something more topical: police violence. My goal today is not to provide answers to this on-going public debate – I certainly don’t know enough about the topic to consider myself an expert – but rather to try and add some clarity to certain features of the discussions surrounding the matter, and hopefully help people think about it in somewhat unusual ways. If you expect me to take a specific stance on the issue, be that one that agrees or disagrees with your own, I’m going to disappoint you. That alone may upset some people who take anything other than definite agreement as a sign of aggression against them, but there isn’t much to do about that. That said, the discussion about police violence itself is a large and complex one, the scope of which far exceeds the length constraints of my usual posts. Accordingly, I wanted to limit my thoughts on the matter to two main domains: important questions worth answering, and addressing the matter of why many people find the “Black Lives Matter” hashtag needlessly divisive.

Which I’m sure will receive a warm, measured response

First, let’s jump into the matter of important questions. One of the questions I’ve never seen explicitly raised in the context of these discussions – let alone answered – is the following: How many people should we expect to get killed by police each year? There is a gut response that many would no doubt have to that question: zero. Surely someone getting killed is a tragedy that we should seek to avoid at all times, regardless of the situation; at best, it’s a regrettable state of affairs that sometimes occurs because the alternative is worse. While zero might be the ideal world outcome, this question is asking more about the world that we find ourselves in now. Even if you don’t particularly like the expectation that police will kill people from time to time, we need to have some expectation of just how often it will happen to put the violence in context. These killings, of course, include a variety of scenarios: there are those in which the police justifiably kill someone (usually in defense of themselves or others), those cases where the police mistakenly kill someone (usually when an error of judgment occurs regarding the need for defense, such as when someone has a toy gun), and those cases where police maliciously kill someone (the killing is aggressive, rather than defensive, in nature). How are we to go about generating these expectations? One popular method seems to be comparisons of police shootings cross-nationally. The picture that results from such analyses appears to suggest that US police shoot people much more frequently than police from other modern countries. For instance, The Guardian claims that Canadian police shoot and kill about 25 people a year, as compared with approximately 1,000 such shootings in the US in 2015. Assuming those numbers are correct, once we correct for population size (the US is about ten times more populated than Canada), we can see that US police shoot and kill about four times as many people.
That sure seems like a lot, probably because it is a lot. We want to do more than note that there is a difference, however; we want to see whether that difference violates our expectations, and to do that, we need to be clear about why our expectations were generated. If, for example, police in the US face threatening situations more often than Canadian police, this is a relevant piece of information. To begin engaging with that idea, we might consider how many police die each year in the line of duty, cross-nationally as well. In Canada, the number for 2015 looks to be three; adjusting for population size again, we would generate an expectation of 30 US police officer deaths if all else were equal. All else is apparently not equal, however, as the actual number for 2015 in the US is about 130. Not only are US police killing four times as often as their Canadian counterparts, then, but they’re also dying at roughly four times the expected rate as well. That said, those numbers include factors other than homicides, and so that too should be taken into account when generating our expectations (in Canada, the number of police shot was 2 in 2015, compared to 40 in the US, which is still twice as high as one would expect from population size. There are also other methods of killing police, such as the 50 US police killed by bombs or cars; 0 for Canada). Given the prevalence of firearm ownership in the US, it might not be too surprising that the rates of violence between police and citizens – as well as between citizens and other citizens – look substantially different than in other countries. There are other facts which might adjust our expectations up or down. For instance, while the US has 10 times the population of Canada, the number of police per 100,000 people (376) is different from that of Canada (202).
How we should adjust the numbers to make a comparison based on population differences, then, is a matter worth thinking about (should we expect the ratio of police officers to citizens per se to increase the number of them that are shot, or is population the better metric?). Also worth mentioning is that the general homicide rate per 100,000 people is quite a bit higher in the US (3.9) than in Canada (1.4). While this list of considerations is very clearly not exhaustive, I hope it generates some thoughts regarding the importance of figuring out what our expectations are, as well as why. The numbers of shootings alone are going to be useless without good context.
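For anyone who wants to follow the arithmetic, here is a quick back-of-the-envelope sketch using the approximate 2015 figures cited above. The population multiple is a rough assumption, so treat this as illustration rather than analysis.

```python
# Population-adjusted rate comparisons using the approximate 2015
# figures cited above. These numbers are rough and illustrative.

US_POP_MULTIPLE = 10  # US population is roughly 10x Canada's

canada = {"killed_by_police": 25, "police_shot": 2}
us     = {"killed_by_police": 1000, "police_shot": 40}

for key in canada:
    # What we'd expect in the US if all else were equal
    expected_us = canada[key] * US_POP_MULTIPLE
    ratio = us[key] / expected_us
    print(f"{key}: US actual = {us[key]}, "
          f"expected from population alone = {expected_us}, "
          f"ratio = {ratio:.1f}x")
```

Running this reproduces the multiples from the text: US police kill at about four times the population-adjusted Canadian rate, and are shot at about twice that adjusted rate. The same template could be extended to adjust for officers per capita or homicide rates instead of raw population, which is exactly the judgment call discussed above.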

Factor 10: Perceived silliness of uniforms

The second question concerns bias within these shootings in the US. In addition to our expectations for the number of people killed each year by police, we also want to generate some expectations for the demographics of those who are shot. Before we can claim there is a bias in the shooting data, we need to have a sense of what our expectations in that regard are and why they are what they are; only then can we look at how those expectations are violated. The obvious benchmark that many people would begin with is the demographics of the US as a whole. We might expect, for instance, that the victims of police violence in the US are 63% white, 12% black, about 50% male, and so on, mirroring the population of the country. Some data I’ve come across suggests that this is not the case, however, with approximately 50% of the victims being white and 26% being black. Now that we know the demographics don’t match up as we’d expect from population alone, we want to know why. One tempting answer that many people fall back on is that police are racially motivated: after all, if black people make up 12% of the population but represent 26% of police killings, this might mean police specifically target black suspects. Then again, males make up about 50% of the population but represent about 96% of police killings. While one could similarly posit that police have a widespread hatred of men and seek to harm them, that seems unlikely. A better explanation for more of the variation is that men are behaving differently than women: less compliant, more aggressive, or something along those lines. After all, the only reasons you’d expect police shootings to match population demographics perfectly would be either if police shot people at random (they don’t) or if police shot people based on some nonrandom factors that did not differ between groups of people (which also seems unlikely).
One such factor that we might use to adjust our expectations would be crime rates in general; perhaps violent crime in particular, as that class likely generates a greater need for officers to defend themselves. In that respect, men tend to commit much more crime than women, which likely begins to explain why men are also shot by police more often. Along those lines, there are also rather stark differences between racial groups when it comes to involvement in criminal activity: while 12% of the US population is black, approximately 40% of the prison population is, suggesting differences in patterns of offending. While some might claim that prison percentage too is due to racial discrimination against blacks, the arrest records tend to agree with victim reports, suggesting a real differential involvement in criminal activity. That said, criminal activity per se shouldn’t get one shot by police. When generating our expectations, we also might want to consider factors such as whether people resist arrest or otherwise threaten the officers in some way. In testing theories of racial biases, we would want to consider whether officers of different races are more or less likely to shoot citizens of various demographics (that is to ask whether, say, black officers are any more or less likely to shoot black civilians than white officers are. I could have sworn I’ve seen data on that before but cannot appear to locate it at this time. What I did find, however, was a case-matched study of NYPD officers, reporting that black officers were about three times as likely to discharge their weapon as white officers at the scene, spanning 106 shootings and about 300 officers; Ridgeway, 2016). Again, while this is not a comprehensive list of things to think about, factors like these should help us generate our expectations about what the demographics of police shooting victims should look like, and it is only from there that we can begin to make claims about racial biases in the data.
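The same kind of baseline comparison from earlier can be sketched for the demographic figures mentioned above. A ratio above 1 only tells you a group is over-represented relative to its population share; as the text stresses, it does not by itself tell you why.

```python
# Comparing observed demographics of police-shooting victims against
# a naive population baseline, using the rough percentages quoted
# above. A ratio above 1 means over-representation relative to
# population share alone -- it carries no explanation on its own.

population_share = {"white": 0.63, "black": 0.12, "male": 0.50}
victim_share     = {"white": 0.50, "black": 0.26, "male": 0.96}

for group in population_share:
    ratio = victim_share[group] / population_share[group]
    print(f"{group}: victim share {victim_share[group]:.0%}, "
          f"population share {population_share[group]:.0%}, "
          f"ratio {ratio:.2f}")
```

Note that the male ratio (about 1.9) is close to the black ratio (about 2.2) on this naive baseline, which is the point of the comparison in the text: the same over-representation logic applied to men leads to a conclusion almost nobody endorses, so the baseline itself needs adjusting before bias claims can be made.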

It’s hard to be surprised at the outcomes sometimes

Regardless of where you settled on your answer to the above expectations, I suspect that many people would nonetheless want to reduce those numbers, if possible. Fewer people getting killed by police is a good thing most of the time. So how do we want to go about seeing that outcome achieved? Some have harnessed the “Black Lives Matter” (BLM) hashtag and suggest that police (and other) violence should be addressed via a focus on, and reductions in, explicit, and presumably implicit, racism (I think; finding an outline of the goals of the movement proves a bit difficult). One common response to this hashtag has been the notion that BLM is needlessly divisive, suggesting instead that “All Lives Matter” (ALM) be used as a more appropriate description. In turn, the reply to ALM by BLM is that the lack of focus on black people is an attempt to turn a blind eye to problems viewed as disproportionately affecting black populations. The ALM idea was recently criticized by the writer Maddox, who compared the ALM expression to a person who, when confronted with the idea of “supporting the troops,” suggests that we should support all people (the latter being a notion that receives quite a bit of support, in fact). This line of argument is not unique to Maddox, of course, and I wanted to address that thought briefly to show why I don’t think it works particularly well here. First, I would agree that the “support the troops” slogan is met with a much lower degree of resistance than “black lives matter,” at least as far as I’ve seen. So why this differential response? As I see it, the reason this comparison breaks down involves the zero-sum nature of each issue: if you spend $5 to buy a “support the troops” ribbon magnet to attach to your car, that money is usually intended to be designated towards military-related causes. Now, importantly, money that is spent relieving the problems in the military domain cannot be spent elsewhere.
That $5 cannot be given to both military causes and also given to cancer research and also given to teachers and also used to repave roads, and so on. There need to be trade-offs in whom you support in that case. However, if you want to address the problem of police violence against civilians, it seems that tactics which effectively reduce violence against black populations should also be able to reduce violence against non-black populations, such as use-of-force training or body cameras. The problems, essentially, have a very high degree of overlap and, in terms of the raw numbers, many more non-black people are killed by police than black ones. If we can alleviate both at the same time with the same methods, focusing on one group seems needless. It is only those killings of civilians that affect black populations (26% of the shootings) and are also driven predominantly or wholly by racism (an unknown percent of that 26%) that could be effectively addressed by a myopic focus on the race of the person being killed per se. I suspect that many people have independently figured that out – consciously or otherwise – and so dislike the specific attention drawn to race. While a focus on race might be useful for virtue signaling, I don’t think it will be very productive in actually reducing police violence.

“Look at how high my horse is!”

To summarize, to meaningfully talk about police violence, we need to articulate our expectations about how much of it we should see, as well as its shape. It makes no sense to talk about how violence is biased against one group or another until those benchmarks have been established (this logic applies to all discussions of bias in data, regardless of topic). None of this is intended to be me telling you how much or what kind of violence to expect; I’m by no means in possession of the necessary expertise. Regardless, if one wants to reduce police violence, inclusive solutions are likely going to be superior to exclusive ones, as a large degree of overlap in causes likely exists between cases, and solving the problems of one group will help solve the problems of another. There is merit to addressing specific problems as well – as that overlap is certainly less than 100% – but in doing so, it is important not to lose sight of the commonalities or distance those who might otherwise be your allies.

References: Ridgeway, G. (2016). Officer risk factors associated with police shootings: a matched case-control study. Statistics & Public Policy, 3, 1-6.

Why Women Are More Depressed Than Men

Women are more likely to be depressed than men; about twice as likely here in the US, as I have been told. It’s an interesting finding, to be sure, and making sense of it poses a fun little mystery (as making sense of many things tends to). We don’t just want to know that women are more depressed than men; we also want to know why women are more depressed. So what are the causes of this difference? The Mayo Clinic floats a few explanations, noting that this sex difference appears to emerge around puberty. As such, many of the explanations they put forth center around the problems that women (but not men) might face when undergoing that transitional period in their life. These include things like increased pressure to achieve in school, conflict with parents, gender confusion, PMS, and pregnancy-related factors. They also include ever-popular suggestions such as societal biases that harm women. Now I suspect these are quite consistent with the answers you would get if you queried your average Joe or Jane on the street as to why they think women are more depressed. People recognize that depression often appears to follow negative life events and stressors, and so they look for proximate conditions that they believe (accurately or not) disproportionately affect women.

Boys don’t have to figure out how to use tampons; therefore less depression

While that seems to be a reasonable strategy, it produces results that aren’t entirely satisfying. First, it seems unlikely that women face that much more stress and negative life events than men do (twice as much?) and, secondly, it doesn’t do much to help us understand individual variation. Lots of people face negative life events, but lots of them also don’t end up spiraling into depression. As I noted above, our understanding of the facts related to depression can be bolstered by answering the why questions. In this case, the focus many people have is on answering the proximate whys rather than the ultimate ones. Specifically, we want to know why people respond to these negative life events with depression in the first place; what adaptive function depression might have. Though depression reactions appear completely normal to most, perhaps owing to their regularity, we need to make that normality strange. If, for example, you imagine a new mouse mother facing the stresses of caring for her young in a hostile world, a postpartum depression on her part might seem counterproductive: faced with the challenges of surviving and caring for her offspring, what adaptive value would depressive symptoms have? How would low energy, a lack of interest in important everyday activities, and perhaps even suicidal ideation help make her situation better? If anything, they would seem to disincline her from taking care of these important tasks, leaving her and her dependent offspring worse off. This strangeness, of course, wouldn’t just exist in mice; it should be just as strange when we see it in humans.

The most compelling adaptive account of depression I’ve read (Hagen, 2003) suggests that the ultimate why of depression focuses on social bargaining. I’ve written about it before, but the gist of the idea is as follows: if I’m facing adversity that I am unlikely to be able to solve alone, one strategy for overcoming that problem is to recruit others in the world to help me. However, those other people aren’t always forthcoming with the investment I desire. If others aren’t responding to my needs adequately, it would behoove me to try and alter their behavior so as to encourage them to increase their investment in me. Depression, in this view, is adapted to do just that. The psychological mechanisms governing depression work to, essentially, place the depressed individual on a social strike. When workers are unable to effectively encourage an increased investment from their employers (perhaps in the form of pay or benefits), they will occasionally refuse to work at all until their conditions improve. While this is indeed costly for the workers, it is also costly for the employer, and it might be beneficial for the employer to cave to the demands rather than continue to face the costs of not having people work. Depression shows a number of parallels to this kind of behavior, where people withdraw from the social world – taking with them the benefits they provided to others – until other people increase their investment in the depressed individual to help see them through a tough period.

Going on strike (or, more generally, withdrawing from cooperative relationships), of course, is only one means of getting other people to increase their investment in you; another potential strategy is violence. If someone is enacting behaviors that show they don’t value me enough, I might respond with aggressive behaviors to get them to alter that valuation. Two classic examples of this could be shooting someone in self-defense or a loan-shark breaking a delinquent client’s legs. Indeed, this is precisely the type of function that Sell et al (2009) proposed that anger has: if others aren’t giving me my due, anger motivates me to take actions that could recalibrate their concern for my welfare. This leaves us with two strategies – depression and anger – that can both solve the same type of problem. The question arises, then, as to which strategy will be the most effective for a given individual and their particular circumstances. This raises a rather interesting possibility: it is possible that the sex difference in depression exists because the anger strategy is more effective for men, whereas the depression strategy is more effective for women (rather than, say, because women face more adversity than men). This would be consistent with the sex difference in depression arising around puberty as well, since this is when sex differences in strength also begin to emerge. In other words, both men and women have to solve similar social problems; they just go about it in different ways. 

“An answer that doesn’t depend on wide-spread sexism? How boring…”

Crucially, this explanation should also be able to account for within-sex differences as well: while men are more able to successfully enact physical aggression than women, not all men will be successful in that regard since not all men possess the necessary formidability. The male who is 5’5″ and 130 pounds soaking wet likely won’t win against his taller, heavier, and stronger counterparts in a fight. As such, men who are relatively weak might preferentially make use of the depression strategy, since picking fights they probably won’t win is a bad idea, while those who are on the stronger side might instead make use of anger more readily. Thankfully, a new paper by Hagen & Rosenstrom (2016) examines this very issue; at least part of it. The researchers sought to test whether upper-body strength would negatively predict depression scores, controlling for a number of other, related variables.

To do so, they accessed data from the National Health and Nutrition Examination Survey (NHANES), netting a little over 4,000 subjects ranging in age from 18-60. As a proxy for upper-body strength, the authors made use of subjects’ hand-grip strength measurements. The participants had also filled out questions concerning their depression, height and weight, socioeconomic status, white blood cell count (to proxy health), and physical disabilities. The researchers predicted that: (1) depression should negatively correlate with grip strength, controlling for age and sex, (2) that relationship should be stronger for men than women, and (3) that the relationship would persist after controlling for physical health. About 9% of the sample qualified as depressed and, as expected, women were more likely to report depression than men by about 1.7 times. Sex, on its own, was a good predictor of depression (in their regression, β = 0.74).

When grip-strength was added into the statistical model, however, the effect of sex dropped into the non-significant range (β = 0.03), while strength possessed good predictive value (β = -1.04). In support of the first hypothesis, then, increased upper-body strength did indeed negatively correlate with depression scores, removing the effect of sex almost entirely. In fact, once grip strength was controlled for, men were actually slightly more likely to report depression than women (though this didn’t appear to be significant). Prediction 2 was not supported, however, with there being no significant interaction between sex and grip-strength on measures of depression. This effect persisted even when controlling for socioeconomic status, age, anthropometric, and hormonal variables. However, physical disability did attenuate the relationship between strength and depression quite a bit, which is understandable in light of the fact that physically-disabled individuals likely have their formidability compromised, even if they have stronger upper bodies (an example being a man in a wheelchair having good grip strength, but still not being much use in a fight). It is worth mentioning that the relationship between strength and depression appeared to grow larger over time; the authors suggest this might have something to do with older individuals having more opportunities to test their strength against others, which sounds plausible enough.
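The statistical logic here – a sex gap that vanishes once a covariate is added – can be made concrete with a toy simulation. To be clear, this is not the authors' data or model; every number below is arbitrary and purely illustrative:

```python
import random
import statistics

# Toy setup: depression risk depends only on strength, but strength differs
# by sex, so a raw sex gap in depression appears and then shrinks toward zero
# once strength is held roughly constant. All numbers are arbitrary.
random.seed(1)

def simulate(n=20000):
    rows = []
    for _ in range(n):
        sex = random.choice(["M", "F"])
        strength = random.gauss(60 if sex == "M" else 40, 10)
        p_depressed = max(0.0, min(1.0, 0.5 - 0.005 * strength))
        rows.append((sex, strength, random.random() < p_depressed))
    return rows

rows = simulate()

def rate(subset):
    # Proportion of the subset "reporting" depression
    return statistics.mean(1.0 if depressed else 0.0 for _, _, depressed in subset)

men = [r for r in rows if r[0] == "M"]
women = [r for r in rows if r[0] == "F"]
print("raw sex gap:", round(rate(women) - rate(men), 3))  # women look more depressed

# Restrict to a narrow strength band: within it, the sex gap mostly disappears
band = [r for r in rows if 45 <= r[1] < 55]
band_m = [r for r in band if r[0] == "M"]
band_f = [r for r in band if r[0] == "F"]
print("within-band gap:", round(rate(band_f) - rate(band_m), 3))
```

The real analysis used regression rather than crude stratification, but the principle is the same: if strength carries all the predictive weight, sex has little left to explain.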

Also worth noting is that when depression scores were replaced with suicidal ideation, the predicted sex-by-strength interaction did emerge, such that men with greater strength reported being less suicidal, while women with greater strength reported being more suicidal (the latter portion of which is curious and not predicted). Given that men succeed at committing suicide more often than women, this relationship is probably worth further examination.  

“Not today, crippling existential dread”

Taken together with findings from Sell et al (2009) – where men, but not women, who possessed greater strength reported being quicker to anger and more successful in physical conflicts – the emerging picture is one in which women tend to (not consciously) “use” depression as a means of social bargaining because it tends to work better for them than anger, whereas the reverse holds true for men. To be clear, both anger and depression are triggered by adversity, but those events interact with an individual’s condition and their social environment in determining the precise response. As the authors note, the picture is likely to be a dynamic one; not one that’s as simple as “more strength = less depression” across the board. Of course, other factors that co-vary with physical strength and health – like attractiveness – could also be playing a role in the relationship with depression, but since such matters aren’t spoken to directly by the data, the extent and nature of those other factors is speculative.

What I find very persuasive about this adaptive hypothesis, however – in addition to the reported data – is that many existing theories of depression would not make the predictions tested by Hagen & Rosenstrom (2016) in the first place. For example, those who claim something like, “depressed people perceive the world more accurately” would be at a bit of a loss to explain why those who perceive the world more accurately also seem to have lower upper-body strength (they might also want to contend with the evidence that depressed people don’t actually perceive the world more accurately in the first place). A plausible adaptive hypothesis, on the other hand, is useful for guiding our search for, and understanding of, the proximate causes of depression.

References: Hagen, E.H. (2003). The bargaining model of depression. In: Genetic and Cultural Evolution of Cooperation, P. Hammerstein (ed.). MIT Press, 95-123

Hagen, E. & Rosenstrom, T. (2016). Explaining the sex difference in depression with a unified bargaining model of anger and depression. Evolution, Medicine, & Public Health, 117-132

Sell, A., Tooby, J., & Cosmides, L. (2009). Formidability and the logic of human anger. Proceedings of the National Academy of Sciences, 106, 15073-78.

Chivalry Isn’t Dead, But Men Are

In the somewhat-recent past, there was a vote in the Senate held on the matter of whether women in the US should be required to sign up for the selective service – the military draft – when they turn 18. Already accepted, of course, was the idea that men should be required to sign up; what appears to be a relatively less controversial idea. This represents yet another erosion of male privilege in modern society; in this case, the privilege of being expected to fight and die in armed combat, should the need arise. Now whether any conscription is likely to happen in the foreseeable future (hopefully not) is a somewhat different matter than whether women would be among the first drafted if that happened (probably not), but the question remains as to how to explain this state of affairs. The issue, it seems, is not simply one of whether men or women are better able to shoulder the physical demands of combat, however; it extends beyond military service into intuitions about real and hypothetical harm befalling men and women in everyday life. When it comes to harm, people seem to generally care less about it happening to men.

Meh

One anecdotal example of these intuitions I’ve encountered during my own writing is when an editor at Psychology Today removed an image in one of my posts of a woman undergoing bodyguard training in China by having a bottle smashed over her head (which can be seen here; it’s by no means graphic). There was a concern expressed that the image was in some way inappropriate, despite my posting of other pictures of men being assaulted or otherwise harmed. As a research-minded individual, however, I want to go beyond simple anecdotes from my own life that confirm my intuitions into the empirical world where other people publish results that confirm my intuitions. While I’ve already written about this issue a number of times, it never hurts to pile on a little more. Recently, I came upon a paper by FeldmanHall et al (2016) that examined these intuitions about harm directed towards men and women across a number of studies that can help me do just that.

The first of the studies in the paper was a straightforward task: fifty participants were recruited from MTurk to respond to a classic morality problem called the footbridge dilemma. Here, the life of five people can be saved from a train by pushing one person in front of it. When these participants were asked whether they would push a man or woman to their death (assuming, I think, that they were going to push one of them), 88% of participants opted for killing the man. Their second study expanded a bit on that finding using the same dilemma, but asking instead how willing they would be (on a 1-10 scale) to push either a man, woman, or a person of unspecified gender without other options existing. The findings here with regard to gender were a bit less dramatic and clear-cut: participants were slightly more likely to indicate that they would push a man (M = 3.3) than a woman (M = 3.0), though female participants were nominally less likely to push a woman (roughly M = 2.3) than men were (roughly M = 3.8), perhaps counter to what might be predicted. That said, the sample size for this second study was fairly small (only about 25 per group), so that difference might not be worth making much of until more data is collected.

When faced with a direct and unavoidable trade-off between the welfare of men and women, then, the results overwhelmingly showed that the women were being favored; however, when it came to cases where men or women could be harmed alone, there didn’t seem to be a marked difference between the two. That said, that moral dilemma alone can only take us so far in understanding people’s concern for the welfare of others, in no small part because its life-and-death nature potentially introduces ceiling effects (man or woman, very few people are willing to throw someone else in front of a train). In other instances where the degree of harm is lowered – such as, say, male vs female genital cutting – differences might begin to emerge. Thankfully, FeldmanHall et al (2016) included an additional experiment that brought these intuitions out of the hypothetical and into reality while lowering the degree of harm. You can’t kill people to conduct psychological research, after all.

Yet…

In the next experiment, 57 participants were recruited and given £20. At the end of the experiment, any money they had would be multiplied by ten, meaning participants could leave with a total of £200 (which is awfully generous as far as these things go). As with most psychology research, however, there was a catch: the participants would be taking part in 20 trials where £1 was at stake. A target individual – either a man or a woman – would be receiving a painful electric shock, and the participants could give up some of that £1 to reduce its intensity, with the full £1 removing the shock entirely. To make the task a little less abstract, the participants were also forced to view videos of the target receiving the shocks (which, I think, were prerecorded videos of real shocks – rather than shocks in real time – but I’m not sure from my reading of the paper if that’s a completely accurate description).

In this study, another large difference emerged: as expected, participants interacting with female targets ended up keeping less money by the end (M = £8.76) than those interacting with male targets (M = £12.54; d = .82). In other words, the main finding of interest was that participants were willing to give up substantially more money to prevent women from receiving painful shocks than they were to help men. Interestingly, this was the case in spite of the facts that (a) the male target in the videos was rated more positively overall than the female target, and (b) in a follow-up study where participants provided emotional reactions to thinking about being a participant in the former study, the amount of reported aversion to letting the target suffer shocks was similar regardless of the target’s gender. As the authors conclude:

While it is equally emotionally aversive to hurt any individual—regardless of their gender—that society perceives harming women as more morally unacceptable, suggests that gender bias and harm considerations play a large role in shaping moral action.

So, even though people find harming others – or letting them suffer harm for a personal gain – to generally be an uncomfortable experience regardless of their gender, they are more willing to help/avoid harming women than they are men, sometimes by a rather substantial margin.
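For readers unfamiliar with the d = .82 reported above, Cohen's d is just the difference between two group means scaled by their pooled standard deviation. The means below are the paper's (£12.54 vs. £8.76 kept), but the pooled SD of 4.6 is back-solved from the reported d purely for illustration and is not a figure from the study:

```python
# Minimal sketch of the reported effect size. The pooled SD here is an
# assumption chosen to reproduce the paper's d, not a value from the paper.
def cohens_d(mean_a, mean_b, pooled_sd):
    """Standardized mean difference between two groups."""
    return (mean_a - mean_b) / pooled_sd

d = cohens_d(12.54, 8.76, 4.6)
print(round(d, 2))  # 0.82 -- conventionally a "large" effect
```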

Now onto the fun part: explaining these findings. It doesn’t go nearly far enough as an explanation to note that “society condones harming men more than women,” as that just restates the finding; likewise, we only get so far by mentioning that people perceive men to have a higher pain tolerance than women (because they do), as that only pushes the question back a step to the matter of why men tolerate more pain than women. As for my thoughts, first, I think these findings highlight the importance of a modular understanding of psychological systems: our altruistic and moral systems are made up of a number of component pieces, each with a distinct function, and the piece that is calculating how much harm is generated is, it would seem, not the same piece deciding whether or not to do something about it. The obvious reason for this distinction is that alleviating harm to others isn’t always adaptive to the same extent: it does me more adaptive good to help kin relative to non-kin, friends relative to strangers, and allies relative to enemies, all else being equal. 

“Just stay out of it; he’s bigger than you”

Second, it might well be the case that helping men, on average, tends to pay off less than helping women. Part of the reason for that state of affairs is that female reproductive potential cannot be replaced quite as easily as male potential; male reproductive success is constrained by the number of available women much more than female potential is by male availability (as Chris Rock put it, “any money spent on dick is a bad investment“). As such, men might become particularly inclined to invest in alleviating women’s pain as a form of mating effort. The story clearly doesn’t end there, however, or else we would predict men being uniquely likely to benefit women, rather than both sexes doing similarly. This raises two additional possibilities to me: one of these is that, if men value women highly as a form of mating effort, that increased social value could also make women more valuable to other women in turn. To place that in a Game of Thrones example, if a powerful house values their own children highly, non-relatives may come to value those same children highly as well in the hopes of ingratiating themselves to – or avoiding the wrath of – the child’s family.

The other idea that comes to mind is that men are less willing to reciprocate aid that alleviated their pain because to do so would be an admission of a degree of weakness; a signal that they honestly needed the help (and might in the future as well), which could lower their relative status. If men are less willing to reciprocate aid, that would make men worse investments for both sexes, all else being equal; better to help out the person who would experience more gratitude for your assistance and repay you in turn. While these explanations might or might not adequately explain these preferential altruistic behaviors directed towards women, I feel they’re worthwhile starting points.

References: FeldmanHall, O., Dalgleish, T., Evans, D., Navrady, L., Tedeschi, E., & Mobbs, D. (2016). Moral chivalry: Gender and harm sensitivity predict costly altruism. Social Psychological & Personality Science, DOI: 10.1177/1948550616647448

Sexism, Testing, And “Academic Ability”

When I was teaching my undergraduate course on evolutionary psychology, my approach to testing and assessment was unique. You can read about that philosophy in more detail here, but the gist of my method was specifically avoiding multiple-choice formats in favor of short-essay questions with unlimited revision ability on the part of the students. I favored this exam format for a number of reasons, chief among which was that (a) I didn’t feel multiple choice tests were very good at assessing how well students understood the material (memorization and good guessing do not equal understanding), and (b) I didn’t really care about grading my students as much as I cared about getting them to learn the material. If they didn’t grasp it properly on their first try (and very few students do), I wanted them to have the ability and motivation to continue engaging with it until they did get it right (which most eventually did; the class average for each exam began around a 70 and rose to a 90). For the purposes of today’s discussion, the important point here is that my exams were a bit more cognitively challenging than is usual and, according to a new paper, that means I had unintentionally biased my exams in ways that disfavor “historically underserved groups” like women and the poor.

Oops…

What caught my eye about this particular paper, however, was the initial press release that accompanied it. Specifically, the authors were quoted as saying something I found, well, a bit queer:

“At first glance, one might assume the differences in exam performance are based on academic ability. However, we controlled for this in our study by including the students’ incoming grade point averages in our analysis,”

So the authors appear to believe that a gap in performance on academic tests arises independent of academic abilities (whatever those entail). This raised the immediate question in my mind of how one knows that abilities are the same unless one has a method of testing them. It seems a bit strange to say that abilities are the same on the basis of one set of tests (those that provided incoming GPAs), but then to continue to suggest that abilities are the same when a different set of tests provides a contrary result. In the interests of settling my curiosity, I tracked the paper down to see what was actually reported; after all, these little news blurbs frequently get the details wrong. Unfortunately, this one appeared to capture the authors’ views accurately.

So let’s start by briefly reviewing what the authors were looking at. The paper, by Wright et al (2016), is based on data collected from three years’ worth of three introductory biology courses spanning 26 different instructors, approximately 5,000 students, and 87 different exams. Without going into too much unnecessary detail, the tests were assessed by independent raters for how cognitively challenging they were and for their format, and the students were classified according to their gender and socio-economic status (SES; as measured by whether they qualified for a financial aid program). In order to attempt to control for academic ability, Wright et al (2016) also looked at the freshman-year GPA of the students coming into the biology classes (based on approximately 45 credits, we are told). Because the authors controlled for incoming GPA, they hope to persuade the reader of the following:

This implies that, by at least one measure, these students have equal academic ability, and if they have differential outcomes on exams, then factors other than ability are likely influencing their performance.

Now one could argue that there’s more to academic ability than is captured by a GPA – which is precisely why I will do so in a minute – but let’s continue on with what the authors found first.

Cognitively challenging tests were indeed, well, more challenging. A statistically-average male student, for instance, would be expected to do about 12% worse on the most challenging test in their sample, relative to the easiest one. This effect was not the same between genders, however. Again, using statistically-average men and women, when the tests were the least cognitively challenging, there was effectively no performance gap (about a 1.7% expected difference favoring men); however, when the tests were the most cognitively challenging, that expected gap rose to an astonishing expected…3.2% difference. So, while the gender difference just about nominally doubled, in terms of really mattering in any practical sense of the word, its size was such that it likely wouldn’t be noticed unless one was really looking for it. A similar pattern was discovered for SES: when the tests were easy, there was effectively no difference between those low or high in SES (1.3% favoring those higher); however, when the tests were about maximally challenging, this expected difference rose to about 3.5%.
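To make the "nominally doubled but practically tiny" point concrete, one can linearly interpolate between the two reported endpoints (1.7 points on the easiest exams, 3.2 on the hardest). This is illustrative arithmetic only, not the paper's fitted model:

```python
# Illustrative interpolation of the reported gender gaps; not the paper's
# actual regression coefficients.
def expected_gap(difficulty, easiest_gap=1.7, hardest_gap=3.2):
    """Male-favoring score gap in percentage points; difficulty in [0, 1]."""
    return easiest_gap + (hardest_gap - easiest_gap) * difficulty

print(round(expected_gap(0.0), 2))  # 1.7
print(round(expected_gap(1.0), 2))  # 3.2
print(round(expected_gap(0.5), 2))  # 2.45 -- still small in practical terms
```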

Useful for both spotting statistical blips and burning insects

There’s a lot to say about these results and how they’re framed within the paper. First, as I mentioned, they truly are minor differences; there are very few cases where a 1-3% difference in test scores is going to make-or-break a student, so I don’t think there’s any real reason to be concerned or to adjust the tests; not practically, anyway.

However, there are larger, theoretical issues looming in the paper. One of these is that the authors use the phrase “controlled for academic ability” so often that a reader might actually come to believe that’s what they did from simple repetition. The problem here, of course, is that the authors did not control for that; they controlled for GPA. Unfortunately for Wright et al’s (2016) presentation, those two things are not synonyms. As I said before, it is strange to say that academic ability is the same because one set of tests (incoming GPA) says they are while another set does not. The former set of tests appear to be privileged for no sound reason. Because of that unwarranted interpretation, the authors lose (or rather, purposefully remove) the ability to talk about how these gaps might be due to some performance difference. This is a useful rhetorical move if one is interested in doing advocacy – as it implies the gap is unfair and ought to be fixed somehow – but not if one is seeking the truth of the matter.

Another rather large issue in the paper is that, as far as I could tell, the authors predicted they would find these effects without ever really providing an explanation of how or why that prediction arose. That is, what drove their expectation that men would outperform women and the rich outperform the poor? This ends up being something of a problem because, at the end of the paper, the authors do float a few possible (untested) explanations for their findings. The first of these is stereotype threat: the idea that certain groups of people will do poorly on tests because of some negative stereotype about their performance. This is a poor fit for the data for two reasons: first, while Wright et al (2016) claim that stereotype threat is “well-documented”, it actually fails to replicate (on top of not making much theoretical sense). Second, even if it were a real thing, stereotype threat, as it is typically studied, requires that one’s sex be made salient prior to the test. As I encountered a total of zero tests during my entire college experience that made my gender salient, much less my SES, I can only assume that the tests in question didn’t do it either. In order for stereotype threat to work as an explanation, then, women and the poor would need to be under relatively constant stereotype threat. In turn, this would make documenting and studying stereotype threat in the first place rather difficult, as you could never have a condition where your subjects were not experiencing it. In short, then, stereotype threat seems like a bad fit.

The other explanations that are put forth for this gender difference are the possibility that women and poor students have more fixed views of intelligence instead of growth mindsets, so they withdraw from the material when challenged rather than improve (i.e., “we need to change their mindsets to close this daunting 2% gap”), or the possibility that the test questions themselves are written in ways that subtly bias people’s ability to think about them (the example the authors raise is that a question written about applying some concept to sports might favor men, relative to women, as men tend to enjoy sports more). Given that the authors did have access to the test questions, it seems that they could have examined that latter possibility in at least some detail (minimally, perhaps, by looking at whether tests written by female instructors resulted in different outcomes than those written by male ones, or by examining the content of the questions themselves to see if women did worse on gendered ones). Why they didn’t conduct such analyses, I can’t say.

 Maybe it was too much work and they lacked a growth mindset

In summary, these very minor average differences that were uncovered could easily be chalked up – very simply – to GPA not being a full measure of a student’s academic ability. In fact, if the tests determining freshman GPA aren’t the most cognitively challenging (as one might well expect, given that students would have been taking mostly general introductory courses with large class sizes), then this might make the students appear to be more similar in ability than they actually were. The matter can be thought of using this stereotypically-male example (that will assuredly hinder women’s ability to think about it): imagine I tested people in a room with weights ranging from 1-15 pounds and asked them to curl each of them once. This would give me a poor sense for any underlying differences in strength because the range of ability tested was restricted. If I were to then ask them to do the same with weights ranging from 1-100 pounds the next week, I might conclude that it’s something about the weights – and not people’s abilities – when it came to figuring out why differences suddenly emerged (since I mistakenly believe I already controlled for their abilities the first time).

Now I don’t know if something like that is actually responsible, but if the tests determining freshman GPA were tapping the same kinds of abilities to the same degrees as those in the biology courses studied, then controlling for GPA should have taken care of that potential issue. Since controlling for GPA did not, I feel safe assuming there is some difference in the tests in terms of what abilities they’re measuring.

References: Wright, C., Eddy, S., Wenderoth, M., Abshire, E., Blankenbiller, M., & Brownell, S. (2016). Cognitive difficulty and format of exams predict gender and socioeconomic gaps in exam performance of students in introductory biology courses. CBE Life Sciences Education, 15.

Smoking Hot

If the view counts on previous posts have been any indication, people really do enjoy reading about, understanding, and – perhaps more importantly – overcoming the obstacles found on the dating terrain; understandably so, given its greater personal relevance to their lives. In the interests of adding some value to the lives of others, then, today I wanted to discuss some research examining the connection between recreational drug use and sexual behavior in order to see if any practical behavioral advice can be derived from it. The first order of business will be to try and understand the relationship between recreational drugs and mating from an evolutionary perspective; the second will be to take a more direct look at whether drug use has positive and negative effects when it comes to attracting a partner, and in what contexts those effects might exist. In short, will things like drinking and smoking make you smoking hot to others?

So far selling out has been unsuccessful, so let’s try talking sex

We can begin by considering why people care so much about recreational drug use in general: from historical prohibitions on alcohol to modern laws prohibiting the possession, use, and sale of drugs, many people express a deep concern over who gets to put what into their body at what times and for what reasons. The ostensibly obvious reason for this concern that most people will raise immediately is that such laws are designed to save people from themselves: drugs can cause a great degree of harm to users and people are, essentially, too stupid to figure out what’s really good for them. While perceptions of harm to drug users themselves no doubt play a role in these intuitions, they are unlikely to actually be the whole story for a number of reasons, chief among which is that they would have a hard time explaining the connection between sexual strategies and drug use (and that putting people in jail probably isn’t all that good for them either, but that’s another matter). Sexual strategies, in this case, refer roughly to an individual’s degree of promiscuity: some people preferentially enjoy engaging in one or more short-term sexual relationships (where investment is often funneled to mating efforts), while others are more inclined towards single, long-term ones (where investment is funneled to parental efforts). While people do engage in varying degrees of both at times, the distinction captures the general idea well enough. Now, if one is the type who prefers long-term relationships, it might benefit you to condemn behaviors that encourage promiscuity; it doesn’t help your relationship stability to have lots of people around who might try to lure your mate away or reduce a man’s confidence in the paternity of his children. To the extent that recreational drug use does that (e.g., those who go out drinking in the hopes of hooking up with others owing to their reduced inhibitions), it will be condemned by the more long-term maters in turn. Conversely, those who favor promiscuity should be more permissive towards drug use as it makes enacting their preferred strategy easier.

This is precisely the pattern of results that Quintelier et al (2013) report: in a cross-cultural sample of Belgians (N = 476), Dutch (N = 298), and Japanese (N = 296) college students who did not have children, even after controlling for age, sex, personality variables, political ideology, and religiosity, attitudes towards drug use were still reliably predicted by participants’ sexual attitudes: the more sexually permissive one was, the more they tended to approve of drug use. In fact, sexual attitudes were the best predictors of people’s feelings about recreational drugs both before and after the controls were added (findings which replicated a previous US sample). By contrast, while the non-sexual variables were sometimes significant predictors of drug views after controlling for sexual attitudes, they were not as reliable and their effects were not as large. This pattern of results, then, should yield some useful predictions about how drug use affects your attractiveness to other people: those who are looking for short-term sexual encounters might find drug use more appealing (or at least less off-putting), relative to those looking for long-term relationships.

“I pronounce you man and wife. Now it’s time to all get high”

Thankfully, I happen to have a paper on hand that speaks to the matter somewhat more directly. Vincke (2016) sought to examine how attractive women rated brief behavioral descriptions of men for either short- or long-term relationships. Of interest, these descriptions included the fact that the man in question either (a) did not, (b) occasionally, or (c) frequently smoked cigarettes or drank alcohol. A sample of 240 Dutch women were recruited and asked to rate these profiles with respect to how attractive the men in question would be for either a casual or committed relationship and whether they thought the men themselves were more likely to be interested in short/long-term relationships.

Taking these in reverse order, the women rated the men who never smoked as somewhat less sexually permissive (M = 4.31, scale from 1 to 7) than those who either occasionally or frequently did (Ms = 4.83 and 4.98, respectively; these two values did not significantly differ). By contrast, those who never drank or occasionally did were rated as being comparably less permissive (Ms = 4.04) than the men who drank frequently (M = 5.17). Drug use, then, did affect women’s perceptions of men’s sexual interests (and those perceptions happen to match reality, as a second study with men confirmed). If you’re interested in managing what other people think your relationship intentions are, then, managing your drug use accordingly can make something of a difference. Whether that ended up making the men more attractive is a different matter, however.

As it turns out, smoking and drinking appear to look distinct in that regard: in general, smoking tended to make men look less attractive, regardless of whether the mating context was short- or long-term, and frequent smoking was worse than occasional smoking. However, the decline in attractiveness from smoking was not as large in short-term contexts. (Oddly, Vincke (2016) frames smoking as being an attractiveness benefit in short-term contexts within her discussion when it’s really just less of a cost. The slight bump seen in the data is neither statistically nor practically significant.) This pattern can be seen in the left half of the author’s graph. By contrast – on the right side – occasional drinkers were generally rated as more attractive than men who never or frequently drank across both short- and long-term relationships. However, in the context of short-term mating, frequent drinking was rated as being more attractive than never drinking, whereas this pattern reversed itself for long-term relationships. As such, if you’re looking to attract someone for a serious relationship, you probably won’t be impressing them much with your ability to do keg stands of liquor, but if you’re looking for someone to hook up with that night it might be better to show that off than sip on water all evening.

Cigarettes and alcohol look different from one another in the attractiveness domain even though both might be considered recreational drug use. What probably differentiates them here is their effects on encouraging promiscuity, as previously discussed. While people are often motivated to go out drinking in order to get intoxicated, lose their inhibitions, and have sex, the same cannot usually be said about smoking cigarettes. Singles don’t usually congregate at smoking bars to meet people and start relationships, short-term or otherwise (setting aside for the moment that smoking bars aren’t usually things, unless you count the rare hookah lounge). Smoking might thus make men appear more interested in casual encounters because it cues a more general interest in short-term rewards, rather than anything specifically sexual: if a man is willing to risk adverse health effects in the future for the pleasure cigarettes provide today, he is unlikely to be risk averse in other areas of his life.

If you want to examine sex specifically, you might have picked the wrong smoke

There are some limitations here, namely that this study did not separate women by what they were personally seeking in a relationship, nor by their own interests and behaviors when it comes to recreational drug use. Perhaps these results would look different if you were to account for women’s own smoking/drinking habits. Even if frequent drinking is a bad thing for long-term attractiveness in general, a mismatch with the particular person you’re looking to date might be worse. It is also possible that a different pattern might emerge if men were assessing women’s attractiveness, but what those differences would be is speculative. It is unfortunate that the intuitions of the other gender don’t appear to have been assessed. I think this is a function of Vincke (2016) looking for confirmatory evidence for her hypothesis that recreational drug use is attractive to women in short-term contexts because it entails risk, and women value risk-taking more in short-term male partners than long-term ones. (There is a point to make about that theory as well: while some risky activities might indeed be more attractive to women in short-term contexts, I suspect those activities are not preferred because they’re risky per se, but rather because the risks send some important cue about the mate quality of the risk-taker. I also suspect the risks need to have some kind of payoff; I don’t think women prefer men who take risks and fail. Anyone can smoke, and smoking itself doesn’t seem to send any honest signal of quality on the part of the smoker.)

In sum, the usefulness of these results for making any decisions in the dating world is probably at its peak when you don’t really know much about the person you’re about to meet. If you’re a man and you’re meeting a woman who you know almost nothing about, this information might come in handy; on the other hand, if you have information about that woman’s preferences as an individual, it’s probably better to use that instead of the overall trends. 

References: Quintelier, K., Ishii, K., Weeden, J., Kurzban, R., & Braeckman, J. (2013). Individual differences in reproductive strategy are related to views about recreational drug use in Belgium, the Netherlands, and Japan. Human Nature, 24, 196-217.

Vincke, E. (2016). The young male cigarette and alcohol syndrome: Smoking and drinking as a short-term mating strategy. Evolutionary Psychology, 1-13.

Count The Hits; Not The Misses

At various points in our lives, we have all read or been told anecdotes about how someone turned a bit of their life around. Some of these (or at least variations of them) likely sound familiar: “I cut out bread from my diet and all of a sudden felt so much better”; “Amy made a fortune working from home selling diet pills online”; “After the doctors couldn’t figure out what was wrong with me, I started drinking this tea and my infection suddenly cleared up”. The whole point of such stories is to try to draw a causal link, in these cases: (1) eating bread makes you feel sick, (2) selling diet pills is a good way to make money, and (3) tea is useful for combating infections. Some or all of these statements may well be true, but the real problem with these stories is the paucity of data upon which they are based. If you wanted to be more certain about those statements, you would want more information. Sure, you might have felt better after drinking that tea, but what about the other 10 people who drank similar tea and saw no results? How about all the other people selling diet pills who were in the financial hole from day one and never crawled out of it because it’s actually a scam? If you want to get closer to understanding the truth value of those statements, you need to consider the data as a whole: both stories of success and stories of failure. However, stories of someone not getting rich from selling diet pills aren’t quite as moving, and so don’t see the light of day; at least not initially. This facet of anecdotes was made light of by The Onion several years ago (and Clickhole had their own take more recently).

“At first he failed, but with some positive thinking he continued to fail over and over again”

These anecdotes often try to throw the spotlight on successful cases (hits) while ignoring the unsuccessful ones (misses), resulting in a biased picture of how things will work out. They don’t get us much closer to the truth. Most people who create and consume psychology research would like to think that psychologists go beyond these kinds of anecdotes and generate useful insights into how the mind works, but there have been a lot of concerns raised lately about precisely how much further they go on average, largely owing to the results of the reproducibility project. There have been numerous issues raised about the way psychology research is conducted: either in the form of advocacy for particular political and social positions (which distorts experimental designs and statistical interpretations) or the selective ways in which data is manipulated or reported to draw attention to successful results without acknowledging failed predictions. The result has been quite a number of false positives and overstated real effects cropping up in the literature.

While these concerns are warranted, it is difficult to quantify the extent of the problems. After all, very few researchers are going to come out and say they manipulated their experiments or data to find the results they wanted, because (a) it would only hurt their careers and (b) in some cases, they aren’t even aware that they’re doing it, or that what they’re doing is wrong. Further, because most psychological research isn’t preregistered and null findings aren’t usually published, figuring out what researchers hoped to find (but did not) becomes a difficult undertaking just by reading the literature. Thankfully, a new paper from Franco et al. (2016) brings some data to bear on the matter of how much underreporting is going on. While this data will not be the final word on the subject by any means (largely owing to its small sample size), it does provide some of the first steps in the right direction.

Franco et al (2016) report on a group of psychology experiments whose questionnaires and data were made publicly available. Specifically, these come from the Time-sharing Experiments for the Social Sciences (TESS), an NSF program in which online experiments are embedded in nationally-representative population surveys. Those researchers making use of TESS face strict limits on the number of questions they can ask, we are told, meaning that we ought to expect they would restrict their questions to the most theoretically-meaningful ones. In other words, we can be fairly confident that the researchers had some specific predictions they hoped to test for each experimental condition and outcome measure, and that these predictions were made in advance of actually getting the data. Franco et al (2016) were then able to track the TESS studies through to the eventual published versions of the papers to see what experimental manipulations and results were and were not reported. This provided the authors with a set of 32 semi-preregistered psychology experiments to examine for reporting biases.

A small sample I will recklessly generalize to all of psychology research

The first step was to compare the number of experimental conditions and outcome variables present in the TESS studies to the number that ultimately turned up in published manuscripts (i.e., are the authors reporting what they did and what they measured?). Overall, 41% of the TESS studies failed to report at least one of their experimental conditions; while there were an average of 2.5 experimental conditions in the studies, the published papers only mentioned an average of 1.8. In addition, 72% of the papers failed to report all of their outcome variables; while there were an average of 15.4 outcome variables in the questionnaires, the published reports only mentioned 10.4. Taken together, only about 1 in 4 of the experiments reported all of what they did and what they measured. Unsurprisingly, this pattern extended to the size of the reported effects as well. In terms of statistical significance, the median reported p-value was significant (.02), while the median unreported p-value was not (.32); two-thirds of the reported tests were significant, while only one-fourth of the unreported tests were. Finally, published effect sizes were approximately twice as large as unreported ones.
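The kind of comparison described above – medians and significance rates computed separately for reported and unreported tests – can be sketched in a few lines. The numbers below are illustrative stand-ins, not the study’s actual data:

```python
import statistics

# Hypothetical p-values, tagged by whether the test made it into the paper
tests = [
    {"p": 0.01, "reported": True},
    {"p": 0.02, "reported": True},
    {"p": 0.04, "reported": True},
    {"p": 0.25, "reported": False},
    {"p": 0.32, "reported": False},
    {"p": 0.60, "reported": False},
]

reported = [t["p"] for t in tests if t["reported"]]
unreported = [t["p"] for t in tests if not t["reported"]]

# Median p-value in each group
print(statistics.median(reported))    # → 0.02
print(statistics.median(unreported))  # → 0.32

# Share of significant (p < .05) results in each group
def sig_share(ps):
    return sum(p < 0.05 for p in ps) / len(ps)

print(sig_share(reported), sig_share(unreported))  # → 1.0 0.0
```

With real registry data, the same two summaries (median p-value and significant share, split by reporting status) are all that is needed to reproduce the comparison the authors describe.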

Taken together, the pattern that emerged is that psychology research tends to underreport failed experimental manipulations, measures that didn’t pan out, and smaller effects. This should come as no surprise to anyone who has spent much time around psychology researchers, or to the researchers themselves who have tried to publish null findings (or, in fact, have tried to publish almost anything). Data is often messy and uncooperative, and people are less interested in reading about the things that didn’t work out (unless they’re placed in the proper contexts, where failures to find effects can actually be considered meaningful, such as when you’re trying to provide evidence against a theory). Nevertheless, the result of such selective reporting on what appears to be a fairly large scale is that the overall trustworthiness of reported psychology research dips ever lower, one false positive at a time.

So what can be done about this issue? One suggestion that is often tossed around is that researchers should register their work in advance, making clear what analyses they will conduct and what predictions they have made. This was (sort of) the case in the present data, and Franco et al. (2016) endorse this option. It allows people to assess research as more of a whole, rather than relying only on the published accounts of it. While that’s a fine suggestion, it only goes so far toward improving the state of the literature. Specifically, it doesn’t help the problem of journals not publishing null findings in the first place, nor does it prevent researchers from running post-hoc analyses of their data and turning up additional false positives. A more ambitious way of alleviating these problems would be to collectively change the way journals accept papers for publication. In this alternate system, researchers would submit an outline of their article to a journal before the research is conducted, making clear (a) what their manipulations will be, (b) what their outcome measures will be, and (c) what statistical analyses they will undertake. Then – and this is important – before either the researchers or the journals know what the results will be, the decision would be made to publish the paper or not. This would allow null results to make their way into mainstream journals while also allowing researchers to build up their own resumes if things don’t work out well. In essence, it removes some of the incentives for researchers to cheat statistically. Journals would then be assessed not on whether interesting results emerged, but on whether a sufficiently important research question had been asked.

Which is good, considering how often real, strong results seem to show up

There are some downsides to that suggestion, however. For one, the plan would take some time to enact even if everyone were on board. Journals would need to accept a paper for publication weeks or months in advance of the paper itself actually being completed. This would pose some additional complications for journals, inasmuch as researchers will occasionally fail to complete the research at all or in a timely manner, or will submit sub-par papers not yet worthy of print, leaving possible publication gaps. Further, it will sometimes mean that an issue of a journal goes out without containing any major advancements to the field of psychological research (no one happened to find anything this time), which might negatively affect the impact factor of the journals in question. Indeed, that last part is probably the biggest impediment to making major overhauls to the publication system currently in place: most psychology research probably won’t work out all that well, and that will probably mean fewer people ultimately interested in reading about and citing it. While it is possible, I suppose, that null findings would actually be cited at similar rates to positive ones, that remains to be seen, and in the absence of that information I don’t foresee journals being terribly interested in changing their policies and taking that risk.

References: Franco, A., Malhotra, N., & Simonovits, G. (2016). Underreporting in psychology experiments: Evidence from a study registry. Social Psychological & Personality Science, 7, 8-12.

Who Deserves Healthcare And Unemployment Benefits?

As I find myself currently recovering from a cold, it’s a happy coincidence that I had planned to write about people’s intuitions about healthcare this week. In particular, a new paper by Jensen & Petersen (2016) attempted to demonstrate a fairly automatic cognitive link between the mental representation of someone as “sick” and of that same target as “deserving of help.” Sickness is fairly unique in this respect, it is argued, because of our evolutionary history with it: as compared with what many refer to as diseases of modern lifestyle (including those resulting from obesity and smoking), infections tended to strike people randomly; not randomly in the sense that everyone is equally likely to get sick, but more in the sense that people often had little control over when they did. Infections were rarely the result of people intentionally seeking them out or behaving in certain ways. In essence, then, people view those who are sick as unlucky, and unlucky individuals are correspondingly viewed as more deserving of help than those who are responsible for their own situation.

…and more deserving of delicious, delicious pills

This cognitive link between luck and deservingness can be partially explained by examining expected returns on investment in the social world (Tooby & Cosmides, 1996). In brief, helping others takes time and energy, and it would only be adaptive for an organism to sacrifice resources to help another if doing so was beneficial to the helper in the long term. This is often achieved by me helping you at a time when you need it (when my investment is more valuable to you than it is to me), and then you helping me in the future when I need it (when your investment is more valuable to me than it is to you). This is reciprocal altruism, known by the phrase, “I scratch your back and you scratch mine.” Crucially, the probability of receiving reciprocation from the target you help should depend on why that target needed help in the first place: if the person you’re helping is needy because of their own behavior (i.e., they’re lazy), their need today is indicative of their need tomorrow. They won’t be able to help you later for the same reasons they need help now. By contrast, if someone is needy because they’re unlucky, their current need is not as diagnostic of their future need, and so it is more likely they will repay you later. Because the latter type is more likely to repay than the former, our intuitions about who deserves help shift accordingly.

As previously mentioned, infections tend to be distributed more randomly; my being sick today (generally) doesn’t tell you much about the probability of my future ability to help you once I recover. Because of that, the need generated by infections tends to make sick individuals look like valuable targets of investment: their need state suggests they value your help and will be grateful for it, both of which likely translate into their helping you in the future. Moreover, the needs generated by illnesses can frequently be harmful, even to the point of death if assistance isn’t provided. The greater the need state to be filled, the greater the potential for alliances to be formed, both with and against you. To place that point in a quick, yet extreme, example, pulling someone from a burning building is more likely to ingratiate them to you than just helping them move; conversely, failing to save someone’s life when it’s well within your capabilities can set their existing allies against you.

The sum total of this reasoning is that people should intuitively perceive the sick as more deserving of help than those suffering from other problems that cause need. The particular other problem that Jensen & Petersen (2016) contrast sickness with is unemployment, which they suggest is a fairly modern problem. The conclusion drawn by the authors from these points is that the human mind – given its extensive history with infections and their random nature – should automatically tag sick individuals as deserving of assistance (i.e., broad support for government healthcare programs), while our intuitions about whether the unemployed deserve assistance should be much more varied, contingent on the extent to which unemployment is viewed as being more luck- or character-based. This fits well with the initial data that Jensen & Petersen (2016) present about the relative, cross-national support for government spending on healthcare and unemployment: not only is healthcare much more broadly supported than unemployment benefits (in the US, 90% vs 52% of the population support government assistance), but support for healthcare is also quite a bit less variable across countries.

Probably because the unemployed don’t have enough bake sales or ribbons

Some additional predictions drawn by the authors were examined across a number of studies in the paper, only two of which I would like to focus on for length constraints. The first of these studies presented 228 Danish participants with one of four scenarios: two in which the target was sick and two in which the target was unemployed. In each of these conditions, the target was also said to be lazy (hasn’t done much in life and only enjoys playing video games) or hardworking (is active and does volunteer work; of note, the authors label the lazy/hardworking conditions as high/low control, respectively, but I’m not sure that really captures the nature of the frame well). Participants were asked how much an individual like that deserved aid from the government when sick/unemployed on a 7-point scale (which was converted to a 0-1 scale for ease of interpretation).
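The 7-point-to-0-1 conversion mentioned above is a simple linear rescaling; a minimal sketch, with a function name of my own choosing:

```python
def rescale_to_unit(rating, lo=1, hi=7):
    """Linearly map a rating on a lo-to-hi scale onto the 0-1 interval."""
    return (rating - lo) / (hi - lo)

# The scale endpoints map to 0 and 1; the midpoint response of 4 lands at 0.5
print(rescale_to_unit(1))  # → 0.0
print(rescale_to_unit(4))  # → 0.5
print(rescale_to_unit(7))  # → 1.0
```

On this transformed scale, a reported support level of 0.75 corresponds to an original rating of 5.5 out of 7.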

Overall, support for government aid was lower in both conditions when the target was framed as being lazy, but this effect was much larger in the case of unemployment. When it came to the sick individual, support for healthcare for the hardworking target was about a 0.9, while support for the lazy one dipped to about 0.75; by contrast, the hardworking unemployed individual was supported with benefits at about 0.8, while the lazy one only received support around the 0.5 point. As the authors put it, the effect of the deservingness information was about 200% less influential when it came to sickness.

There is an obvious shortcoming in that study, however: being lazy has quite a bit less to do with getting sick than with getting a job. This issue was addressed better in the third study, where the stimuli were more tailored to the problems. In the case of unemployment, the individual was described as an unskilled worker who was told to get further training by his union, with the union even offering to help. The individual either takes or does not take the additional training, but either way eventually ends up unemployed. In the case of healthcare, the individual was described as a long-term smoker who was repeatedly told by his doctor to quit. The person either eventually quits smoking or does not, but either way ends up getting lung cancer. The general pattern of results from study two replicated again: for the smoker, support for government aid hovered around 0.8 when he quit and 0.7 when he did not; for the unemployed person, support was about 0.75 when he took the training and around 0.55 when he did not.

“He deserves all that healthcare for looking so cool while smoking”

While there does seem to be evidence for sicknesses being cognitively tagged as more deserving of assistance than unemployment (there were also some association studies I won’t cover in detail), there is a recurrent point in the paper that I am hesitant about endorsing fully. The first mention of this point is found early on in the manuscript, and reads:

“Citizens appear to reason as if exposure to health problems is randomly distributed across social strata, not noting or caring that this is not, in fact, the case…we argue that the deservingness heuristic is built to automatically tag sickness-based needs as random events…”

A similar theme is mentioned later in the paper as well:

“Even using extremely well-tailored stimuli, we find that subjects are reluctant to accept explicit information that suggests that sick people are undeserving.”

In general I find the data they present to be fairly supportive of this idea, but I feel it could do with some additional precision. First and foremost, participants did utilize this information when determining deservingness. The dips might not have been as large as they were for unemployment (more on that later), but they were present. Second, participants were asked about helping one individual in particular. If, however, sickness is truly being automatically tagged as randomly distributed, then deservingness factors should not be expected to come into play when decisions involve making trade-offs between the welfare of two individuals. In a simple case, a hospital could be faced with a dilemma in which two patients need a lung transplant, but only a single lung is available. These two patients are otherwise identical except one has lung cancer due to a long history of smoking, while the other has lung cancer due to a rare infection. If you were to ask people which patient should get the organ, a psychological system that was treating all illness as approximately random should be indifferent between giving it to the smoker or the non-smoker. A similar analysis could be undertaken when it comes to trading-off spending on healthcare and non-healthcare items as well (such as making budget cuts to education or infrastructure in favor of healthcare). 

Finally, there are two additional factors I would like to see explored by future research in this area. First, the costs of sickness and unemployment tend to be rather asymmetric in a number of ways: not only might sickness be more often life-threatening than unemployment (thus generating more need, which can swamp the effects of deservingness to some degree), but unemployment benefits might well need to be paid out over longer periods of time than medical ones (assuming sickness tends to be more transitory than unemployment). In fact, unemployment benefits might actively encourage people to remain unemployed, whereas medical benefits do not encourage people to remain sick. If these factors could somehow be held constant or removed, a different picture might begin to emerge. I could imagine deservingness information mattering more when a drug is required to alleviate discomfort rather than save a life. Second – though I don’t know to what extent this is likely to be relevant – the stimulus materials in this research all ask about whether the government ought to be providing aid to sick/unemployed people. It is possible that somewhat different responses might have been obtained if some measures were taken of the participants’ own willingness to provide that aid. After all, it is much less of a burden on me to insist that someone else ought to be taking care of a problem than to take care of it myself.

References: Jensen, C. & Petersen, M. (2016). The deservingness heuristic and the politics of health care. American Journal of Political Science, DOI: 10.1111/ajps.12251

Tooby, J. & Cosmides, L. (1996). Friendship and the banker’s paradox: Other pathways to the evolution of adaptations for altruism. Proceedings of the British Academy, 88, 119-143.

Absolute Vs Relative Mate Preferences

As the comedian Louis CK quipped some time ago, “Everything is amazing right now and nobody is happy.” In that instance he was referring to the massive technological improvements of the fairly-recent past that have made our lives easier and more comfortable. Reflecting on the level of benefit this technology has added to our lives (e.g., advanced medical treatments, the ability to communicate with people globally in an instant, or to travel globally in a matter of hours, etc.), it might feel kind of silly that we aren’t content with the world; this kind of lifestyle sure beats living in the wilderness in a constant contest to find food, ward off predators and parasites, and endure the elements. So why aren’t we happy all the time? There are many ways to answer this question, but I wanted to focus on one in particular: specifically, given our nature as a social species, much of our happiness is determined by relative factors. If everyone is fairly well off in the absolute sense, you being well off doesn’t help you when it comes to being selected as a friend, cooperative partner, or mate, because it doesn’t signal anything special about your value to others. What you are looking for in that context is not to be doing well on an absolute level, but to be doing better than others.

 If everyone has an iPhone, no one has an iPhone

To place this in a simple example, if you want to get picked for the basketball team, you’re looking to be taller than other people; increasing everyone’s height by 3 inches doesn’t uniquely benefit you, as your relative position and desirability have remained the same. On a related note, if you are doing well on some absolute metric but could be doing better, remaining content with one’s lot in life and forgoing those additional benefits is not the type of psychology one would predict to have proven adaptive. All else being equal, the male satisfied with a single mate who forgoes an additional one will be out-reproduced by the male who takes the second as well. Examples like these help to highlight the positional aspects of human satisfaction: even though our day-to-day lives are no doubt generally happier to some degree because people aren’t dying from smallpox and we have cell phones, people are often less happy than we might expect because so much of that happiness is not determined by one’s absolute state. Instead, our happiness is determined by our relative state: how well we could be doing relative to our current status, and how much we offer socially, relative to others.

A similar logic was applied in a recent paper by Conroy-Beam, Goetz, & Buss (2016) that examined people’s relationship satisfaction. The researchers were interested in testing the hypothesis that it’s not about how well one’s partner matches their ideal preferences on some absolute threshold when it comes to relationship satisfaction; instead, partner satisfaction is more likely to be a product of (a) whether more attractive alternative partners are available and (b) whether one is desirable enough to attract one of them. One might say that people are less concerned with how much they like their spouse and more concerned with whether they could get a better possible spouse: if one can move up in the dating world, then their satisfaction with their current partner should be relatively low; if one can’t move up, they ought to be satisfied with what they already have. After all, it makes little sense to abandon your mate for not meeting your preferences if your other options are worse.

These hypotheses were tested in a rather elegant and unique way across three studies, all of which utilized a broadly-similar methodology (though I’ll only be discussing two). The core of each involved participants who were currently in relationships completing four measures: one concerning how important 27 traits would be in an ideal mate (on a 7-point scale), another concerning how well those same traits described their current partner, a third regarding how those traits described themselves, and finally rating their relationship satisfaction.

To determine how well a participant’s current partner fulfilled their preferences, the squared difference between the participant’s ideal and actual partner was summed across all 27 traits and then the square root of that value was taken. This process generated a single number providing a sense of how far off from the ideal an actual partner was across a large number of traits: the larger this number, the worse the fit of the actual partner. A similar transformation was then carried out with respect to how all the other participants rated their partners on those traits. In other words, the authors calculated what percentage of other people’s actual mates fit the preferences of each participant better than their current partner. Finally, the authors calculated the discrepancy in mate value between the participant and their partner. This was done in a three-step process, the gist of which is that they calculated how well the participant and their partner each met the average ideals of the opposite sex. If you are closer to the average ideal partner of the opposite sex than your partner is, you have the higher mate value (i.e., are more desirable to others); if you are further away, you have the lower mate value.
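The preference-fulfillment score described above is just a Euclidean distance across the trait ratings, and the second value is the share of other partners closer to one’s ideal. A minimal sketch under those assumptions (the function names and toy numbers are mine, not the paper’s):

```python
import math

def preference_distance(ideal, actual):
    """Euclidean distance between ideal-partner ratings and actual-partner
    ratings: the square root of the summed squared differences."""
    return math.sqrt(sum((i - a) ** 2 for i, a in zip(ideal, actual)))

def percent_better_fits(ideal, own_partner, other_partners):
    """Share of other people's actual partners that fit this participant's
    ideal more closely than their own partner does."""
    own = preference_distance(ideal, own_partner)
    better = sum(preference_distance(ideal, p) < own for p in other_partners)
    return better / len(other_partners)

# Toy example with 3 traits instead of the study's 27
ideal = [7, 6, 5]
partner = [5, 6, 4]
print(preference_distance(ideal, partner))  # → sqrt(4 + 0 + 1) ≈ 2.236
```

The same distance function, applied to the average opposite-sex ideal rather than one’s own, would give the mate-value comparison: whoever in the couple sits closer to that average ideal has the higher mate value.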

 It’s just that simple!

In the interest of cutting through the mathematical complexity, there were three values calculated. Assuming you were taking the survey, they would correspond to (1) how well your actual partner matched your ideal, (2) what percent of possible real mates out in the world are better overall fits, and (3) how much more or less desirable you are to others, relative to your partner. These values were then plugged into a regression predicting relationship satisfaction. As it turned out, in the first study (N = 260), the first value – how well one’s partner matched their ideal – barely predicted relationship satisfaction at all (β = .06); by contrast, the number of other potential people who might make better fits was a much stronger predictor (β = -.53), as was the difference in relative mate value between the participant and their partner (β = .11). There was also an interaction between these latter two values (β = .21). As the authors summarized these results:

“Participants lower in mate value than their partners were generally satisfied regardless of the pool of potential mates; participants higher in mate value than their partners became increasingly dissatisfied with their relationships as better alternative partners became available”

So, if your partner is already more attractive than you, then you probably consider yourself pretty lucky. Even if there are a great number of better possible partners out there for you, you’re not likely to be able to attract them (you got lucky once dating up; better to not try your luck a second time). By contrast, if you are more attractive than your partner, then it might make sense to start looking around for better options. If few alternatives exist, you might want to stick around; if many do, then switching might be beneficial.

The second study addressed the point that partners in these relationships are not passive bystanders when it comes to being dumped; they’re wary about the possibility of their partner seeking greener pastures. For instance, if you understand that your partner is more attractive than you, you likely also understand (at least intuitively) that they might try to find someone who suits them better than you do (because they have that option). If you view being dumped as a bad thing (perhaps because you can’t do better than your current partner), you might try to do more to keep them around. Translating that into a survey, Conroy-Beam et al (2016) asked participants to indicate how often they had engaged in 38 mate retention tactics over the course of the past year. These included a broad range of behaviors: calling to check up on one’s partner, asking to deepen commitment, derogating potential alternative mates, buying gifts, and performing sexual favors, among others. Participants also filled out the mate preference measures as before.

The results from the first study regarding satisfaction were replicated. Additionally, as expected, there was a positive relationship between these retention behaviors and relationship satisfaction (ß = .20): the more satisfied one was with their partner, the more they behaved in ways that might help keep them around. There was also a negative relationship between trust and these mate retention behaviors (ß = -.38): the less one trusted their partner, the more they behaved in ways that might discourage them from leaving. While that might sound strange at first – why encourage someone you don’t trust to stick around? – it is fairly easy to understand to the extent that perceptions of partner trust intuitively track the probability that your partner can do better than you: it’s easier to trust someone who doesn’t have alternatives than it is to trust someone who might be tempted.

It’s much easier to avoid sinning when you don’t live around an orchard

Overall, I found this research an ingenious way to examine relationship satisfaction and partner fit across a wide range of different traits. There are, of course, some shortcomings to the paper which the authors do mention, including the fact that all the traits were given equal weighting (meaning that the fit for “intelligent” would be rated as being as important as the fit for “dominant” when determining how well your partner suited you) and the pool of potential mates was not considered in the context of a local sample (that is, it matters less if people across the country fit your ideal better than your current mate, relative to if people in your immediate vicinity do). However, given the fairly universal features of human mating psychology and the strength of the obtained results, these do not strike me as fatal to the design in any way; if anything, they raise the prospect that the predictive strength of this approach could actually be improved by tailoring it to specific populations.

References: Conroy-Beam, D., Goetz, C., & Buss, D. (2016). What predicts romantic relationship satisfaction and mate retention intensity: mate preference fulfillment or mate value discrepancies? Evolution & Human Behavior, DOI: http://dx.doi.org/10.1016/j.evolhumbehav.2016.04.003

Psychology Research And Advocacy

I get the sense that many people get a degree in psychology because they’re looking to help others (since most clearly aren’t doing it for the pay). For those who get a degree in the clinical side of the field, this observation seems easy to make; at the very least, I don’t know of any counselors or therapists who seek to make their clients feel worse about the state their life is in and keep them there. For those who become involved in the research end of psychology, I believe this desire to help others is still a major motivator. Rather than trying to help specific clients, however, many psychological researchers are driven by a motivation to help particular groups in society: women, certain racial groups, the sexually promiscuous, the outliers, the politically liberal, or any group that the researcher believes to be unfairly marginalized, undervalued, or maligned. Their work is driven by a desire to show that the particular group in question has been misjudged by others, with those doing the misjudging being biased and, importantly, wrong. In other words, their role as a researcher is often driven by their role as an advocate, and the quality of their work and thinking can often take a back seat to their social goals.

When megaphones fail, try using research to make yourself louder

Two such examples are highlighted in a recent paper by Eagly (2016), both of which can broadly be considered to focus on the topic of diversity in the workplace. I want to summarize them quickly before turning to some of the other facets of the paper I find noteworthy. The first case concerns the prospect that having more women on corporate boards tends to increase their profitability, a point driven by a finding that Fortune 500 companies in the top quarter of female representation on boards of directors performed better than those in the bottom quarter of representation. Eagly (2016) rightly notes that such a basic data set would be all but unpublishable in academia for failing to control for a number of important factors. Indeed, when more sophisticated research was considered in a meta-analysis of 140 studies, the gender diversity of the board of directors had about as close to no effect as possible on financial outcomes: the average correlations across all the studies ranged from about r = .01 up to r = .05, depending on which measures were considered. Gender diversity per se seemed to have no meaningful effect, despite a variety of advocacy sources claiming that increasing female representation would provide financial benefits. Rather than considering the full scope of the research, the advocates tended to cite only the most simplistic analyses that provided the conclusion they wanted (others) to hear.

The second area of research concerned how demographic diversity in work groups can affect performance. The general assumption that is often made about diversity is that it is a positive force for improving outcomes, given that a more cognitively-varied group of people can bring a greater number of skills and perspectives to bear on solving tasks than more homogeneous groups can. As it turns out, however, another meta-analysis of 146 studies concluded that demographic diversity (both in terms of gender and racial makeup) had effectively no impact on performance outcomes: the correlation for gender was r = -.01 and was r = -.05 for racial diversity. By contrast, differences in skill sets and knowledge had a positive, but still very small effect (r = .05). In summary, findings like these would suggest that groups don’t get better at solving problems just because they’re made up of enough [men/women/Blacks/Whites/Asians/etc]. Diversity in demographics per se, unsurprisingly, doesn’t help to magically solve complex problems.

While Eagly (2016) appears to generally be condemning the role of advocacy in research when it comes to getting things right (a laudable position), there were some passages in the paper that caught my eye. The first of these concerns what advocates for causes should do when the research, taken as a whole, doesn’t exactly agree with their preferred stance. In this case, Eagly (2016) focuses on the diversity research that did not show good evidence for diverse groups leading to positive outcomes. The first route one might take is to simply misrepresent the state of the research, which is obviously a bad idea. Instead, Eagly suggests advocates take one of two alternative routes: first, she recommends that researchers conduct research into more specific conditions under which diversity (or whatever one’s preferred topic is) might be a good thing. This is an interesting suggestion to evaluate: on the one hand, many people would be inclined to say it’s a good idea; in some particular contexts diversity might be a good thing, even if it’s not always, or even generally, useful. This wouldn’t be the first time effects in psychology were found to be context-dependent. On the other hand, this suggestion also runs a serious risk of inflating Type I errors. Specifically, if you keep slicing up data and looking at the issue in a number of different contexts, you will eventually uncover positive results even if they’re due purely to chance. Repeated subgroup or subcontext analysis doesn’t sound much different from the questionable statistical practices currently being blamed for psychology’s replication problem: just keep conducting research and only report the parts of it that happened to work, or keep massaging the data until the right conclusion falls out.

“…the rest goes in the dumpster out back”

Eagly’s second suggestion I find a bit more worrisome: arguing that relevant factors – like increases in profits, productivity, or finding better solutions – aren’t actually all that relevant when it comes to justifying why companies should increase diversity. What I find odd about this is that it seems to suggest that the advocates begin with their conclusion (in this case, that diversity in the workforce ought to be increased) and then just keep looking for ways to justify it in spite of previous failures to do so. Again, while it is possible that there are benefits to diversity which aren’t yet being considered in the literature, bad research would likely result from a process where someone starts their analysis with the conclusion and keeps going until it is justified to others, no matter how often that requires shifting the goalposts. A major problem with that suggestion mirrors the other questionable psychology research practices I mentioned before: when researchers find the conclusion they’re looking for, they stop looking. They only collect data up until the point it is useful, which rigs the system in favor of finding positive results where there are none. That could well mean, then, that there will be negative consequences to these diversity policies which are not being considered.

What I think is a good example of this justification problem leading to shoddy research practices and interpretation follows shortly thereafter. In talking about some of these alternative benefits that more female hires might have, Eagly (2016) notes that women tend to be more compassionate and egalitarian than men; as such, hiring more women should be expected to increase less-considered benefits, such as a reduction in the laying-off of employees during economic downturns (referred to as labor hoarding), or more favorable policies toward time off for family care. Now something like this should be expected: if you have different people making the decisions, different decisions will be made. Setting aside for the moment the question of whether those different policies are better, in some objective sense of the word, if one is interested in encouraging those outcomes (that is, they’re preferred by the advocate), then one might wish to address those issues directly, rather than by proxy. That is to say, if you are looking to make the leadership of some company more compassionate, then it makes sense to test for and hire more compassionate people, not to hire more women under the assumption that you will thereby be increasing compassion.

This is an important matter because people are not perfect statistical representations of the groups to which they belong. On average, women may be more compassionate than men; the type of woman who is interested in actively pursuing a CEO position in a Fortune 500 company might not be as compassionate as your average woman, however, and, in fact, might even be less compassionate than a particular male candidate. What Eagly (2016) has ended up reaching, then, is not a justification for hiring more women; it’s a justification for hiring compassionate or egalitarian people. What is conspicuously absent from this section is a call for more research to be conducted on contexts in which men might be more compassionate than women; once the conclusion that hiring women is a good thing has been justified (in the advocate’s mind, anyway), the concerns for more information seem to sputter out. It should go without saying, but such a course of action wouldn’t be expected to lead to the most accurate scientific understanding of our world.

The solution to that problem being more diversity, of course…

To place this point in another quick example, if you’re looking to assemble a group of tall people, it would be better to use people’s height when making that decision rather than their sex, even if men do tend to be taller than women. Some advocates might suggest that being male is a good enough proxy for height, so you should favor male candidates; others would suggest that you shouldn’t be trying to assemble a group of tall people in the first place, as short people offer benefits that tall ones don’t; others still will argue that it doesn’t matter whether short people offer benefits, as they should be preferentially selected to combat negative attitudes toward the short regardless (at the expense of selecting tall candidates). For what it’s worth, I find the attitude of “keep doing research until you justify your predetermined conclusion” to be unproductive and indicative of why the relationship between advocates and researchers ought not be a close one. Advocacy can only serve as a cognitive constraint that decreases research quality, as the goal of advocacy is decidedly not truth. Advocates should update their conclusions in light of the research; not vice versa.

References: Eagly, A. (2016). When passionate advocates meet research on diversity, does the honest broker stand a chance? Journal of Social Issues, 72, 199-222.