Psychology Research And Advocacy

I get the sense that many people get a degree in psychology because they’re looking to help others (since most clearly aren’t doing it for the pay). For those who get a degree in the clinical side of the field, this observation seems easy to make; at the very least, I don’t know of any counselors or therapists who seek to make their clients feel worse about the state their life is in and keep them there. For those who become involved in the research end of psychology, I believe this desire to help others is still a major motivator. Rather than trying to help specific clients, however, many psychological researchers are driven by a motivation to help particular groups in society: women, certain racial groups, the sexually promiscuous, the outliers, the politically liberal, or any group that the researcher believes to be unfairly marginalized, undervalued, or maligned. Their work is driven by a desire to show that the particular group in question has been misjudged by others, with those doing the misjudging being biased and, importantly, wrong. In other words, their role as a researcher is often driven by their role as an advocate, and the quality of their work and thinking can often take a back seat to their social goals.

When megaphones fail, try using research to make yourself louder

Two such examples are highlighted in a recent paper by Eagly (2016), both of which can broadly be considered to focus on the topic of diversity in the workplace. I want to summarize them quickly before turning to some of the other facets of the paper I find noteworthy. The first case concerns the prospect that having more women on corporate boards tends to increase their profitability, a point driven by a finding that Fortune 500 companies in the top quarter of female representation on boards of directors performed better than those in the bottom quarter of representation. Eagly (2016) rightly notes that such a basic data set would be all but unpublishable in academia for failing to do a lot of important things. Indeed, when more sophisticated research was considered in a meta-analysis of 140 studies, the gender diversity of the board of directors had about as close to no effect as possible on financial outcomes: the average correlations across all the studies ranged from about r = .01 all the way up to r = .05 depending on what measures were considered. Gender diversity per se seemed to have no meaningful effect despite a variety of advocacy sources claiming that increasing female representation would provide financial benefits. Rather than considering the full scope of the research, the advocates tended to cite only the most simplistic analyses that provided the conclusion they wanted (others) to hear.

The second area of research concerned how demographic diversity in work groups can affect performance. The general assumption that is often made about diversity is that it is a positive force for improving outcomes, given that a more cognitively-varied group of people can bring a greater number of skills and perspectives to bear on solving tasks than more homogeneous groups can. As it turns out, however, another meta-analysis of 146 studies concluded that demographic diversity (both in terms of gender and racial makeup) had effectively no impact on performance outcomes: the correlation for gender was r = -.01 and was r = -.05 for racial diversity. By contrast, differences in skill sets and knowledge had a positive, but still very small effect (r = .05). In summary, findings like these would suggest that groups don’t get better at solving problems just because they’re made up of enough [men/women/Blacks/Whites/Asians/etc]. Diversity in demographics per se, unsurprisingly, doesn’t help to magically solve complex problems.
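To put correlations of that size in perspective, squaring r gives the share of variance in outcomes associated with the predictor. A quick sketch (purely illustrative) using the values reported above:

```python
# How little variance the reported diversity correlations account for:
# an r of .05 explains a quarter of one percent of performance outcomes.
for r in (0.01, 0.05, -0.01, -0.05):
    print(f"r = {r:+.2f}  ->  variance explained = {r**2:.4%}")
```

Whatever sign the correlation carries, effects this small leave essentially all of the variation in group performance unaccounted for.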

While Eagly (2016) appears to generally be condemning the role of advocacy in research when it comes to getting things right (a laudable position), there were some passages in the paper that caught my eye. The first of these concerns what advocates for causes should do when the research, taken as a whole, doesn’t exactly agree with their preferred stance. In this case, Eagly (2016) focuses on the diversity research that did not show good evidence for diverse groups leading to positive outcomes. The first route one might take is to simply misrepresent the state of the research, which is obviously a bad idea. Instead, Eagly suggests advocates take one of two alternative routes: first, she recommends that researchers might conduct research into more specific conditions under which diversity (or whatever one’s preferred topic is) might be a good thing. This is an interesting suggestion to evaluate: on the one hand, people would often be inclined to say it’s a good idea; in some particular contexts diversity might be a good thing, even if it’s not always, or even generally, useful. This wouldn’t be the first time effects in psychology are found to be context-dependent. On the other hand, this suggestion also runs some serious risks of inflating type 1 errors. Specifically, if you keep slicing up data and looking at the issue in a number of different contexts, you will eventually uncover positive results even if they’re just due to chance. Repeated subgroup or subcontext analysis doesn’t sound much different from the questionable statistical practices currently being blamed for psychology’s replication problem: just keep conducting research and only report the parts of it that happened to work, or keep massaging the data until the right conclusion falls out.    
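To see how quickly that slicing inflates Type 1 errors, here is a minimal simulation; the figure of 20 subgroups is an arbitrary assumption, chosen only to illustrate the arithmetic. Under a true null, each independent test still has a 5% chance of coming up "significant," and those chances accumulate fast:

```python
import random

random.seed(42)

ALPHA = 0.05          # nominal significance threshold
N_SUBGROUPS = 20      # hypothetical number of contexts sliced from the data
N_SIMULATIONS = 10_000

# Under the null, a subgroup test is "significant" with probability ALPHA.
false_positive_runs = 0
for _ in range(N_SIMULATIONS):
    if any(random.random() < ALPHA for _ in range(N_SUBGROUPS)):
        false_positive_runs += 1

analytic = 1 - (1 - ALPHA) ** N_SUBGROUPS
print(f"Analytic chance of >=1 spurious 'effect': {analytic:.2f}")
print(f"Simulated chance: {false_positive_runs / N_SIMULATIONS:.2f}")
```

With 20 contexts examined, roughly two runs in three will turn up at least one "significant" subgroup effect even when nothing is there, which is the replication problem in miniature.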

“…the rest goes in the dumpster out back”

Eagly’s second suggestion I find a bit more worrisome: arguing that relevant factors – like increases in profits, productivity, or finding better solutions – aren’t actually all that relevant when it comes to justifying why companies should increase diversity. What I find odd about this is that it seems to suggest that the advocates begin with their conclusion (in this case, that diversity in the work force ought to be increased) and then just keep looking for ways to justify it in spite of previous failures to do so. Again, while it is possible that there are benefits to diversity which aren’t yet being considered in the literature, bad research would likely result from a process where someone starts their analysis with the conclusion and keeps going until they justify it to others, no matter how often it requires shifting the goal posts. A major problem with that suggestion mirrors the other questionable psychology research practices I mentioned before: when researchers find the conclusion they’re looking for, they stop looking. They only collect data up until the point it is useful, which rigs the system in favor of finding positive results where there are none. That could well mean, then, that there will be negative consequences to these diversity policies which are not being considered.

What I think is a good example of this justification problem leading to shoddy research practices/interpretation follows shortly thereafter. In talking about some of these alternative benefits that more female hires might have, Eagly (2016) notes that women tend to be more compassionate and egalitarian than men; as such, hiring more women should be expected to increase less-considered benefits, such as a reduction in the laying-off of employees during economic downturns (referred to as labor hoarding), or more favorable policies towards time off for family care. Now something like this should be expected: if you have different people making the decisions, different decisions will be made. Forgoing for the moment the question of whether those different policies are better, in some objective sense of the word, if one is interested in encouraging those outcomes (that is, they’re preferred by the advocate), then one might wish to address those issues directly, rather than by proxy. That is to say, if you are looking to make the leadership of some company more compassionate, then it makes sense to test for and hire more compassionate people, not to hire more women under the assumption that you will thereby be increasing compassion.

This is an important matter because people are not perfect statistical representations of the groups to which they belong. On average, women may be more compassionate than men; the type of woman who is interested in actively pursuing a CEO position in a Fortune 500 company might not be as compassionate as your average woman, however, and, in fact, might even be less compassionate than a particular male candidate. What Eagly (2016) has ended up reaching, then, is not a justification for hiring more women; it’s a justification for hiring compassionate or egalitarian people. What is conspicuously absent from this section is a call for more research to be conducted on contexts in which men might be more compassionate than women; once the conclusion that hiring women is a good thing has been justified (in the advocate’s mind, anyway), the concerns for more information seem to sputter out. It should go without saying, but such a course of action wouldn’t be expected to lead to the most accurate scientific understanding of our world.

The solution to that problem being more diversity, of course…

To place this point in another quick example, if you’re looking to assemble a group of tall people, it would be better to use people’s height when making that decision rather than their sex, even if men do tend to be taller than women. Some advocates might suggest that being male is a good enough proxy for height, so you should favor male candidates; others would suggest that you shouldn’t be trying to assemble a group of tall people in the first place, as short people offer benefits that tall ones don’t; others still will argue that it doesn’t matter if short people don’t offer benefits, as they should be preferentially selected to combat negative attitudes towards the short regardless (at the expense of selecting tall candidates). For what it’s worth, I find the attitude of “keep doing research until you justify your predetermined conclusion” to be unproductive and indicative of why the relationship between advocates and researchers ought not be a close one. Advocacy can only serve as a cognitive constraint that decreases research quality, as the goal of advocacy is decidedly not truth. Advocates should update their conclusions in light of the research; not vice versa.

References: Eagly, A. (2016). When passionate advocates meet research on diversity, does the honest broker stand a chance? Journal of Social Issues, 72, 199-222.

More About Psychology Research Replicating

By now, many of you have no doubt heard about the reproducibility project, where 100 psychological findings were subjected to replication attempts. In case you’re not familiar with it, the results of this project were less than a ringing endorsement of research in the field: of the expected 89 replications, only 37 were obtained and the average size of the effects fell dramatically; social psychology research in particular seemed uniquely bad in this regard. This suggests that, in many cases, one would be well served by taking many psychological findings with a couple grains of salt. Naturally, this leads many people to wonder whether there’s any way they might be more confident that an effect is real, so to speak. One possible means through which your confidence might be bolstered is whether or not the research in question contains conceptual replications. What this refers to are cases where the authors of a manuscript report the results of several different studies purporting to measure the same underlying thing with varying methods; that is, they are studying topic A with methods X, Y, and Z. If all of these turn up positive, you ought to be more confident that an effect is real. Indeed, I have had a paper rejected more than once for only containing a single experiment. Journals often want to see several studies in one paper, and that is likely part of the reason why: a single experiment is surely less reliable than multiple ones.

It doesn’t go anywhere, but at least it does so reliably

According to the unknown moderator account of replication failure, psychological research findings are, in essence, often fickle. Some findings might depend on the time of day that measurements were taken, the country of the sample, some particular detail of the stimulus material, whether the experimenter is a man or a woman; you name it. In other words, it is possible that these published effects are real, but only occur in some rather specific contexts of which we are not adequately aware; that is to say they are moderated by unknown variables. If that’s the case, it is unlikely that some replication efforts will be successful, as it is quite unlikely that all of the unique, unknown, and unappreciated moderators will be replicated as well. This is where conceptual replications come in: if a paper contains two, three, or more different attempts at studying the same topic, we should expect that the effect they turn up is more likely to extend beyond a very limited set of contexts and should replicate more readily.

That’s a flattering hypothesis for explaining these replication failures; there’s just not enough replication going on prepublication, so limited findings are getting published as if they were more generalizable. The less-flattering hypothesis is that many researchers are, for lack of a better word, cheating by employing dishonest research tactics. These tactics can include hypothesizing after data is collected, only collecting participants until the data says what the researchers want and then stopping, splitting samples up into different groups until differences are discovered, and so on. There’s also the notorious issue of journals only publishing positive results rather than negative ones (creating a large incentive to cheat, as punishment for doing so is all but non-existent so long as you aren’t just making up the data). It is for these reasons that requiring the pre-registering of research – explicitly stating what you’re going to look at ahead of time – drops positive findings markedly. If research is failing to replicate because the system is being cheated, more internal replications (those from the same authors) don’t really help that much when it comes to predicting external replications (those conducted by outside parties). Internal replications just provide researchers the ability to report multiple attempts at cheating.
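One of those tactics – collecting participants only until the data cooperate – can be simulated directly. The sketch below is a toy model under assumptions of my own choosing (batches of ten participants, up to ten "peeks" at the data, a true null effect); it tests after every batch and stops at the first nominally significant result:

```python
import random
import statistics

random.seed(0)

def peeking_study(max_batches=10, batch_size=10, crit_z=1.96):
    """Collect data from a true null (mean 0), testing after each batch
    and stopping as soon as the result looks 'significant' at p < .05."""
    data = []
    for _ in range(max_batches):
        data.extend(random.gauss(0, 1) for _ in range(batch_size))
        n = len(data)
        mean = statistics.fmean(data)
        se = statistics.stdev(data) / n ** 0.5
        if abs(mean / se) > crit_z:
            return True   # "significant" -- stop collecting and publish
    return False          # never reached significance; file-drawered

runs = 2000
hits = sum(peeking_study() for _ in range(runs))
print(f"False positive rate with optional stopping: {hits / runs:.2f}")
```

Even though every individual test uses the conventional 5% threshold, repeatedly peeking and stopping on success pushes the overall false positive rate several times higher than advertised.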

These two hypotheses make different predictions concerning the data from the aforementioned reproducibility project: specifically, research containing internal replications ought to be more likely to successfully replicate if the unknown moderator hypothesis is accurate. It certainly would be a strange state of affairs from a “this finding is true” perspective if multiple conceptual replications were no more likely to prove reproducible than single-study papers. It would be similar to saying that effects which have been replicated are no more likely to subsequently replicate than effects which have not. By contrast, the cheating hypothesis (or, more politely, questionable research practices hypothesis) has no problem at all with the idea that internal replications might prove to be as externally replicable as single-study papers; cheating a finding out three times doesn’t mean it’s more likely to be true than cheating it out once.

It’s not cheating; it’s just a “questionable testing strategy”

This brings me to a new paper by Kunert (2016) who reexamined some of the data from the reproducibility project. Of the 100 original papers, 44 contained internal replications: 20 contained just one replication, 10 were replicated twice, 9 were replicated 3 times, and 5 contained more than three. These were compared against the 56 papers which did not contain internal replications to see which would subsequently replicate better (as measured by achieving statistical significance). As it turned out, papers with internal replications externally replicated about 30% of the time, whereas papers without internal replications externally replicated about 40% of the time. Not only were the internally-replicated papers not substantially better, they were actually slightly worse in that regard. A similar conclusion was reached regarding the average effect size: papers with internal replications were no more likely to subsequently contain a larger effect size, relative to papers without such replications.
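Plugging the reported figures into a quick two-proportion comparison makes the point concrete. The counts below are reconstructed from the approximate percentages above (roughly 13 of 44 vs. 22 of 56), so they are illustrative assumptions rather than Kunert’s raw data:

```python
from math import erf, sqrt

# Approximate counts back-calculated from the ~30% and ~40% figures;
# these exact numbers are an assumption for illustration only.
n1, x1 = 44, 13   # papers with internal replications that replicated externally
n2, x2 = 56, 22   # papers without internal replications that replicated

p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)
se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
# two-tailed p from the normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
p_two_tailed = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

print(f"rates: {p1:.2f} vs {p2:.2f}, z = {z:.2f}, p = {p_two_tailed:.2f}")
```

The difference doesn’t approach significance in either direction; the important part is what internal replication failed to do, namely buy any detectable improvement in external replicability.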

It is possible, of course, that papers containing internal replications are different than papers which do not contain such replications. This means it might be possible that internal replications are actually a good thing, but their positive effects are being outweighed by other, negative factors. For example, someone proposing a particularly novel hypothesis might be inclined to include more internal replications in their paper than someone studying an established one; the latter researcher doesn’t need more replications in his paper to get it published because the effect has already been replicated in other work. Towards examining this point, Kunert (2016) made use of the 7 identified reproducibility predictors from the Open Science Collaboration – field of study, effect type, original P-value, original effect size, replication power, surprisingness of original effect, and the challenge of conducting the replication – to assess whether internally-replicated work differed in any notable ways from the non-internally-replicated sample. As it turns out, the two samples were pretty similar overall on all the factors except one: field of study. Internally-replicated effects were more likely to come from social psychology (70% of them) than non-internally-replicated effects were (54%). As I mentioned before, social psychology papers did tend to replicate less often. However, the unknown moderator effect was not particularly well supported for either field when examined individually.

In summary, then, papers containing internal replications were no more likely to do well when it came to external replications which, in my mind, suggests that something is going very wrong in the process somewhere. Perhaps researchers are making use of their freedom to analyze and collect data as they see fit in order to deliver the conclusions they want to see; perhaps journals are preferentially publishing the findings of people who got lucky, relative to those who got it right. These possibilities, of course, are not mutually exclusive. Now I suppose one could continue to make an argument that goes something like, “papers that contain conceptual replications are more likely to be doing something else different, relative to papers with only a single study,” which could potentially explain the lack of strength provided by internal replications, and whatever that “something” is might not be directly tapped by the variables considered in the current paper. In essence, such an argument would suggest that there are unknown moderators all the way down.

“…and that turtle stands on the shell of an even larger turtle…”

While it’s true enough that such an explanation is not ruled out by the current results, it should not be taken as any kind of default stance on why this research is failing to replicate. The “researchers are cheating” explanation strikes me as a bit more plausible at this stage, given that there aren’t many other obvious explanations for why ostensibly replicated papers are no better at replicating. As Kunert (2016) plainly puts it:

This report suggests that, without widespread changes to psychological science, it will become difficult to distinguish it from informal observations, anecdotes and guess work.

This brings us to the matter of what might be done about the issue. There are procedural ways of attempting to address the problem – such as Kunert’s (2016) recommendation for getting journals to publish papers independent of their results – but my focus has been, and continues to be, on the theoretical aspects of publication. Too many papers in psychology get published without any apparent need for the researchers to explain their findings in any meaningful sense; instead, they usually just restate and label their findings, or they posit some biologically-implausible function for what they found. Without the serious and consistent application of evolutionary theory to psychological research, implausible effects will continue to be published and subsequently fail to replicate because there’s otherwise little way to tell whether a finding makes sense. By contrast, I find it plausible that unlikely effects can be more plainly spotted – by reviewers, readers, and replicators – if they are all couched within the same theoretical framework; even better, the problems in design can be more easily identified and rectified by considering the underlying functional logic, leading to productive future research.

References: Kunert, R. (2016). Internal conceptual replications do not increase independent replication success. Psychonomic Bulletin & Review, doi: 10.3758/s13423-016-1030-9

Morality, Alliances, And Altruism

Having one’s research ideas scooped is part of academic life. Today, for instance, I’d like to talk about some research quite similar in spirit to work I intended to do as part of my dissertation (but did not, as it didn’t end up making the cut in the final approved package). Even if my name isn’t on it, it is still pleasing to see the results I had anticipated. The idea itself arose about four years ago, when I was discussing the curious case of Tucker Max’s donation to Planned Parenthood being (eventually) rejected by the organization. To quickly recap, Tucker was attempting to donate half-a-million dollars to the organization, essentially receiving little more than a plaque in return. However, the donation was rejected, it would seem, out of fear of building an association between the organization and Tucker, as some people perceived Tucker to be a less-than-desirable social asset. This, of course, is rather strange behavior, and we would recognize it as such if it were observed in any other species (e.g., “this cheetah refused a free meal for her and her cubs because the wrong cheetah was offering it”); refusing free benefits is just peculiar.

“Too rich for my blood…”

As it turns out, this pattern of behavior is not unique to the Tucker Max case (or the Kim Kardashian one…); it has recently been empirically demonstrated by Tasimi & Wynn (2016), who examined how children respond to altruistic offers from others, contingent on the moral character of said others. In their first experiment, 160 children between the ages of 5 and 8 were recruited to make an easy decision; they were shown two pictures of people and told that the people in the pictures wanted to give them stickers, and they had to pick which one they wanted to receive the stickers from. In the baseline conditions, one person was offering 1 sticker, while the other was offering either 2, 4, 8, or 16 stickers. As such, it should come as no surprise that the person offering more stickers was almost universally preferred (71 of the 80 children wanted the person offering more, regardless of how many more).

Now that we’ve established that more is better, we can consider what happened in the second condition where the children received character information about their benefactors. One of the individuals was said to always be mean, having hit someone the other day while playing; the other was said to always be nice, having hugged someone the other day instead. The mean person was always offering more stickers than the nice one. In this condition, the children tended to shun the larger quantity of stickers in most cases: when the sticker ratio was 2:1, less than 25% of children accepted the larger offer from the mean person; the 4:1 and 8:1 ratios were accepted about 40% of the time, and the 16:1 ratio 65% of the time. While more is better in general, it is apparently not better enough for children to overlook the character information at times. People appear willing to forgo receiving altruism when it’s coming from the wrong type of person. Fascinating stuff, especially when one considers that such refusals end up leaving the wrongdoers with more resources than they would otherwise have (if you think someone is mean, wouldn’t you be better off taking those resources from them, rather than letting them keep them?).

This finding was replicated with 64 very young children (approximately one year old). In this experiment, the children observed a puppet show in which two puppets offered them crackers, with one offering a single cracker and the other offering either 2 or 8. Again, unsurprisingly, the majority of children accepted the larger offer, regardless of how much larger it was (24 of 32 children). In the character information condition, one puppet was shown to be a helper, assisting another puppet in retrieving a toy from a chest, whereas the other puppet was a hinderer, preventing another from retrieving a toy. The hindering puppet, as before, now offered the greater number of crackers, whereas the helper only offered one cracker. When the hindering puppet was offering 8 crackers, his offer was accepted about 70% of the time, which did not differ from the baseline group. However, when the hindering puppet was only offering 2, the acceptance rate was a mere 19%. Even young children, it would seem, are willing to avoid accepting altruism from wrongdoers, assuming the difference in offers isn’t too large.

“He’s not such a bad guy once you get $10 from him”

While neat, these results beg for a deeper explanation as to why we should expect such altruism to be rejected. I believe hints of this explanation are provided by the way Tasimi & Wynn (2016) write about their results:

Taken together, these findings indicate that when the stakes are modest, children show a strong tendency to go against their baseline desire to optimize gain to avoid ‘‘doing business” with a wrongdoer; however, when the stakes are high, children show more willingness to ‘‘deal with the devil…”

What I find strange about that passage is that children in the current experiments were not “doing business” or “making deals” with the altruists; there was no quid pro quo going on. The children were no more doing business with the others than they are doing business with a breastfeeding mother. Nevertheless, there appears to be an implicit assumption being made here: an individual who accepts altruism from another is expected to pay that altruism back in the future. In other words, merely receiving altruism from another generates the perception of a social association between the donor and recipient.

This creates an uncomfortable situation for the recipient in cases where the donor has enemies. Those enemies are often interested in inflicting costs on the donor or, at the very least, withholding benefits from him. In the latter case, this makes that social association with the donor less beneficial than it otherwise might be, since the donor will have fewer expected future resources to invest in others if others don’t help him; in the former case, not only does the previous logic hold, but the enemies of your donor might begin to inflict costs on you as well, so as to dissuade you from helping him. Putting this into a quick example: Jon – your friend – goes out and hurts Bob, say, by sleeping with Bob’s wife. Bob and his friends, in response, both withhold altruism from Jon (as punishment) and might even be inclined to attack him for his transgression. If they perceive you as helping Jon – either by providing him with benefits or by preventing them from hurting Jon – they might be inclined to withhold benefits from or punish you as well until you stop helping Jon, as a means of indirect punishment. To turn the classic phrase, the friend of my enemy is also my enemy (just as the enemy of my enemy is my friend).

What cues might they use to determine if you’re Jon’s ally? Well, one likely useful cue is whether Bob directs altruism towards you. If you are accepting his altruism, this is probably a good indication that you will be inclined to reciprocate it later (else risk being labeled a social cheater or free rider). If you wish to avoid condemnation and punishment by proxy, then, one route to take is to refuse benefits from questionable sources. This risk can be overcome, however, in cases where the morally-questionable donor is providing you a large enough benefit – which, indeed, was precisely the pattern of results observed here. What counts as “large enough” should be expected to vary as a function of a few things, most notably the size and nature of the transgressions, as well as the degree of expected reciprocity. For example, receiving large donations from morally-questionable donors should be expected to be more acceptable to the extent the donation is made anonymously vs publicly, as anonymity might reduce the perceived social associations between donor and recipient.

You might also try only using “morally clean” money

Importantly (as far as I’m concerned), this data fits well within my theory of morality – where morality is hypothesized to function as an association-management mechanism – but not particularly well with other accounts: altruistic accounts of morality should predict that more altruism is still better; dynamic coordination says nothing about accepting altruism, as giving isn’t morally condemned; and self-interest/mutualistic accounts would, I think, also suggest that taking more money would still be preferable, since you’re not trying to dissuade others from giving. While I can’t help but feel some disappointment that I didn’t carry this research out myself, I am both happy with the results that came of it and satisfied with the methods utilized by the authors. Getting research ideas scooped isn’t so bad when they turn out well anyway; I’m just happy enough to see my main theory supported.

References: Tasimi, A. & Wynn, K. (2016). Costly rejection of wrongdoers by infants and children. Cognition, 151, 76-79.

Benefiting Others: Motives Or Ends?

The world is full of needy people; they need places to live, food to eat, medical care to combat biological threats, and, if you ask certain populations in the first world, a college education. Plenty of ink has been spilled over the matter of how to best meet the needs of others, typically with a focus on uniquely needy populations, such as the homeless, poverty-stricken, sick, and those otherwise severely disadvantaged. In order to make meaningful progress in such discussions, there arises the matter of precisely why – in the functional sense of the word – people are interested in helping others, as I believe the answer(s) to that question will be greatly informative when it comes to determining the most effective strategies for doing so. What is very interesting about these discussions is that the focus is frequently placed on helping others altruistically; delivering benefits to others in ways that are costly for the person doing the helping. The typical example of this involves charitable donations, where I would give up some of my money so that someone else can benefit. What is interesting about this focus is that our altruistic systems often seem to face quite a bit of pushback from other parts of our psychology when it comes to helping others, resulting in fairly poor deliveries of benefits. It represents a focus on the means by which we help others, rather than really serving to improve the ends of effective helping.

For instance, this sign isn’t asking for donations

As a matter of fact, the most common ways of improving the lives of others don’t involve any altruism at all. For an alternative focus, we might consider the classic Adam Smith quote pertaining to butchers and bakers:

But man has almost constant occasion for the help of his brethren, and it is in vain for him to expect it from their benevolence only. He will be more likely to prevail if he can interest their self-love in his favour, and show them that it is for their own advantage to do for him what he requires of them. Whoever offers to another a bargain of any kind, proposes to do this. Give me that which I want, and you shall have this which you want, is the meaning of every such offer; and it is in this manner that we obtain from one another the far greater part of those good offices which we stand in need of. It is not from the benevolence of the butcher, the brewer, or the baker that we expect our dinner, but from their regard to their own interest.

In short, Smith appears to recommend that, if we wish to effectively meet the needs of others (or have them meet our needs), we must properly incentivize that other-benefiting behavior instead of just hoping people will be willing to continuously suffer costs. Smith’s system, then, is more mutualistic or reciprocal in nature. There are a lot of benefits to trying to use these mutualistic and reciprocally-altruistic cognitive mechanisms, rather than altruistic ones, some of which I outlined last week. Specifically, altruistic systems typically direct benefits preferentially towards kin and social allies, and such a provincial focus is unlikely to deliver benefits to needy individuals in the wider world particularly well (e.g., people who aren’t kin or allies). If, however, you get people to behave in ways that benefit themselves and just so happen to benefit others as a result, you’ll often end up with some pretty good benefit delivery. This is because you don’t need to coerce people into helping themselves.

So let’s say we’re faced with a very real-world problem: there is a general shortage of organs available for people in need of transplants. What cognitive systems do we want to engage to solve that problem? We could, as some might suggest, make people more empathetic to the plight of those suffering in hospitals, dying from organ failure; we might also try to convince people that signing up as an organ donor is the morally-virtuous thing to do. Both of these plans might increase the number of people willing to posthumously donate their organs, but perhaps there are much easier and more effective ways to get people to become organ donors, even if they have no particular interest in helping others. I wanted to review two such candidate methods today, neither of which requires that people’s altruistic cognitive systems be particularly engaged.

The first method comes to us from Johnson & Goldstein (2003), who examine some cross-national data on rates of organ donor status. Specifically, they note an oddity in the data: very large and stable differences exist between nations in organ donor status, even after controlling for a number of potentially-relevant variables. Might these different rates exist because people’s preferences for being an organ donor vary markedly between countries? It seems unlikely, unless being an organ donor happens to be exceedingly unpopular in Germany (14% are donors, from the figures cited) while being particularly popular in Sweden (86%). In fact, in the US, support for organ donation is at near-ceiling levels, yet a large gap persists between those who support it (95%) and those who indicated on a driver’s license they were donors (51% in 2005; 60% in 2015) or who had signed a donor card (30%). If it’s not people’s lack of support for such a policy, what is explaining the difference?

A poor national sense for graphic design?

Johnson & Goldstein (2003) float a simple explanation for most of the national differences: whether donor programs were opt-in or opt-out. What that refers to is the matter of, assuming someone has made no explicit decision as to what happens to their organs after they die, what decision would be treated as the default. In opt-in countries (like Germany and the US), non-donor status is assumed unless someone signs up to be a donor; in opt-out countries, like Sweden, people are assumed to be donors unless they indicate that they do not wish to be one. As the authors report, the opt-in countries have much lower effective consent rates (on average, 60% lower), and the two groups represent non-overlapping populations. That cross-national data is supplemented by experimental findings from Johnson & Goldstein (2003) as well. The authors had 161 participants take part in an experiment where they were asked to imagine they had moved to a new state. This state either treated organ donation as the default option or non-donation as the default, and participants were asked whether they would like to confirm or change their status. There was also a third condition where no default answer was provided. When no default answer was given, 79% of participants said they would be willing to be an organ donor; a percentage which did not differ from those who confirmed their donor status when it was the default (82%). However, when non-donor status was the default, only 42% of the participants changed their status to donor.

So defaults seem to matter quite a bit, but let’s assume that a nation isn’t going to change its policy from opt-in to opt-out anytime soon. What else might we do if we wanted to improve the rates of people signing up to be an organ donor in the short term? Eyting et al (2016) tested a rather simple method: paying people €10. The researchers recruited 320 German university students who did not currently have an organ donor card and provided them the opportunity to fill one out. These participants were split into three groups: one in which there was no compensation offered for filling out the card, one in which they would personally receive €10 for filling out a card (regardless of which choice they picked: donor or non-donor), and a final condition in which €10 would be donated to a charitable organization (the Red Cross) if they filled out a card. No differences were observed between the percentage of participants who filled out the card in the control (35%) and charity (36%) conditions. However, in the personal benefit group, there was a spike in the number of people filling out the card (72%). Not all those who filled out the cards opted for donor status, though. Between conditions, the percentages of people who both (a) filled out the card and (b) indicated they wanted to be a donor were about 44% in the personal payment condition, 28% in the control condition, and only 19% in the charity group. Not only did the charity appeal not seem particularly effective, it was even nominally counterproductive.

“I already donated $10 to charity and now they want my organs too?!”

Now, admittedly, helping others because there’s something in it for you isn’t quite as sexy (figuratively speaking) as helping because you’re driven by an overwhelming sense of empathy, conscience, or simply helping for no benefit at all. This is because there’s a lower signal value in that kind of self-beneficial helping; it doesn’t predict future behavior in the absence of those benefits. As such, it’s unlikely to be particularly effective at building meaningful social connections between helpers and others. However, if the current data is any indication, such helping is also likely to be consistently effective. If one’s goal is to increase the benefits being delivered to others (rather than building social connections), that will often involve providing valued incentives for the people doing the helping.

On one final note, it’s worth mentioning that these papers only deal with people becoming a donor after death; not the prospect of donating organs while alive. If one wanted to, say, incentivize someone to donate a kidney while alive, a good way to do so might be to offer them money; that is, allow people to buy and sell organs they are already capable of donating. If people were allowed to engage in mutually-beneficial interactions when it came to selling organs, it is likely we would see certain organ shortages decrease as well. Unfortunately for those in need of organs and/or money, our moral systems often oppose this course of action (Tetlock, 2000), likely contingent on perceptions about which groups would be benefiting the most. I think this serves as yet another demonstration that our moral sense might not be well-suited for maximizing the welfare of people in the wider social world, much as our empathetic systems are not.

References: Eyting, M., Hosemann, A., & Johannesson, M. (2016). Can monetary incentives increase organ donations? Economics Letters, 142, 56-58.

Johnson, E. & Goldstein, D. (2003). Do defaults save lives? Science, 302, 1338-1339.

Tetlock, P. (2000). Coping with trade-offs: Psychological constraints and political implications. In Elements of Reason: Cognition, Choice, & the Bounds of Rationality. Ed. Lupia, A., McCubbins, M., & Popkin, S. 239-322.  

Morality, Empathy, And The Value Of Theory

Let’s solve a problem together: I have some raw ingredients that I would like to transform into my dinner. I’ve already managed to prepare and combine the ingredients, so all I have left to do is cook them. How am I to solve this problem of cooking my food? Well, I need a good source of heat. Right now, my best plan is to get in my car and drive around for a bit, as I have noticed that, after I have been driving for some time, the engine in my car gets quite hot. I figure I can use the heat generated by driving to cook my food. It would come as no surprise to anyone if you had a couple of objections to my suggestion, mostly focused on the point that cars were never designed to solve the problems posed by cooking. Sure, they do generate heat, but that’s really more of a byproduct of their intended function. Further, the heat they do produce isn’t particularly well-controlled or evenly-distributed. Depending on how I position my ingredients or the temperature they require, I might end up with a partially-burnt, partially-raw dinner that is likely also full of oil, gravel, and other debris that has been kicked up into the engine. Not only is the car engine not very efficient at cooking, then, it’s also not very sanitary. You’d probably recommend that I try using a stove or oven instead.

“I’m not convinced. Get me another pound of bacon; I’m going to try again”

Admittedly, this example is egregious in its silliness, but it does make its point well: while I noted that my car produces heat, I misunderstood the function of the device more generally and tried to use it to solve a problem inappropriately as a result. The same logic also holds in cases where you’re dealing with evolved cognitive mechanisms. I examined such an issue recently, noting that punishment doesn’t seem to do a good job as a mechanism for inspiring trust, at least not relative to its alternatives. Today I wanted to take another run at the underlying issue of matching proximate problem to adaptive function, this time examining a different context: directing aid to the great number of people around the world who need altruism to stave off death and non-lethal, but still quite severe, suffering (issues like alleviating malnutrition and infectious diseases). If you want to inspire people to increase the amount of altruism directed towards these needy populations, you will need to appeal to some component parts of our psychology, so what parts should those be?

The first step in solving this problem is to think about what cognitive systems might increase the amount of altruism directed towards others, and then examine the adaptive function of each to determine whether they will solve the problem particularly efficiently. Paul Bloom attempted a similar analysis (about three years ago, but I’m just reading it now), arguing that empathetic cognitive systems seem like a poor fit for the global altruism problem. Specifically, Bloom makes the case that empathy seems more suited to dealing with single-target instances of altruism, rather than large-scale projects. Empathy, he writes, requires an identifiable victim, as people are giving (at least proximately) because they identify with the particular target and feel their pain. This becomes a problem, however, when you are talking about a population of 100 or 1000 people, since we simply can’t identify with that many targets at the same time. Our empathetic systems weren’t designed to work that way and, as such, augmenting their outputs somehow is unlikely to lead to a productive solution to the resource problems plaguing certain populations. Rather than cause us to give more effectively to those in need, these systems might instead lead us to over-invest further in a single target. Though Bloom isn’t explicit on this point, I feel he would likely agree that this has something to do with empathetic systems not having evolved because they solved the problems of others per se, but rather because they did things like help the empathetic person build relationships with specific targets, or signal their qualities as an associate to those observing the altruistic behavior.

Nothing about that analysis strikes me as distinctly wrong. However, provided I have understood his meaning properly, Bloom goes on to suggest that the matter of helping others involves the engagement of our moral systems instead (as he explains in this video, he believes empathy “fundamentally…makes the world worse,” in the moral sense of the term, and he also writes that there’s more to morality – in this case, helping others – than empathy). The real problem with this idea is that our moral systems are not altruistic systems, even if they do contain altruistic components (in much the same way that my car is not a cooking mechanism even if it does generate heat). This can be summed up in a number of ways, but simplest is in a study by Kurzban, DeScioli, & Fein (2012) in which participants were presented with the footbridge dilemma (“Would you push one person in front of a train – killing them – to save five people from getting killed by it in turn?”). If one was interested in being an effective altruist in the sense of delivering the greatest number of benefits to others, pushing is definitely the way to go under the simple logic that five lives saved is better than one life spared (assuming all lives have equal value). Our moral systems typically oppose this conclusion, however, suggesting that saving the lives of the five is impermissible if it means we need to kill the one. What is noteworthy about the Kurzban et al (2012) paper is that you can increase people’s willingness to push the one if the people in the dilemma (both being pushed and saved) are kin.

Family always has your back in that way…

The reason for this increase in pushing when dealing with kin, rather than strangers, seems to have something to do with our altruistic systems that evolved for delivering benefits to close genetic relatives; what we call kin-selected mechanisms (mammary glands being a prime example). This pattern of results from the footbridge dilemma suggests there is a distinction between our altruistic systems (that benefit others) and our moral ones; they function to do different things and, as it seems, our moral systems are not much better suited to dealing with the global altruism problem than empathetic ones. Indeed, one of the main features of our moral systems is nonconsequentialism: the idea that the moral value of an act depends on more than just the net consequences to others. If one is seeking to be an effective altruist, then, using the moral system to guide behavior seems to be a poor way to solve that problem because our moral system frequently focuses on behavior per se at the expense of its consequences. 

That’s not the only reason to be wary of the power of morality to solve effective altruism problems either. As I have argued elsewhere, our moral systems function to manage associations with others, most typically by strategically manipulating our side-taking behavior in conflicts (Marczyk, 2015). Provided this description of morality’s adaptive function is close to accurate, the metaphorical goal of the moral system is to generate and maintain partial social relationships. These partial relationships, by their very nature, oppose the goals of effective altruism, which are decidedly impartial in scope. The reasoning of effective altruism might, for instance, suggest that it would be better for parents to spend their money not on their child’s college tuition, but rather on relieving dehydration in a population across the world. Such a conclusion would conflict not only with the outputs of our kin-selected altruistic systems, but can also conflict with other aspects of our moral systems. As some of my own forthcoming research finds, people do not appear to perceive much of a moral obligation for strangers to direct altruism towards other strangers, but they do perceive something of an obligation for friends and family to help each other (specifically when threatened by outside harm). Our moral obligations towards existing associates make us worse effective altruists (and, in Bloom’s sense of the word, morally worse people in turn).

While Bloom does mention that no one wants to live in that kind of strictly utilitarian world – one in which the welfare of strangers is treated equally to the welfare of friends and kin – he does seem to be advocating we attempt something close to it when he writes:

Our best hope for the future is not to get people to think of all humanity as family—that’s impossible. It lies, instead, in an appreciation of the fact that, even if we don’t empathize with distant strangers, their lives have the same value as the lives of those we love.

Appreciation of the fact that the lives of others have value is decidedly not the same thing as behaving as if they have the same value as the ones we love. Like most everyone else in the world, I want my friends and family to value my welfare above the welfare of others; substantially so, in fact. There are obvious adaptive benefits to such relationships, such as knowing that I will be taken care of in times of need. By contrast, if others showed no particular care for my welfare, but rather just sought to relieve as much suffering as they could wherever it existed in the world, there would be no benefit to my retaining them as associates; they would provide me with assistance or they wouldn’t, regardless of the energy I spent (or didn’t) maintaining a social relationship with them. Asking the moral system to be a general-purpose altruism device is unlikely to be much more successful than asking my car to be an efficient oven, asking people to treat others the world over as if they were kin, or asking you to empathize with 1,000 people. It represents an incomplete view as to the functions of our moral psychology. While morality might be impartial with respect to behavior, it is unlikely to be impartial with regard to the social value of others (which is why, also in my forthcoming research, I find that stealing to defend against an outside agent of harm is rated as more morally acceptable than doing so to buy recreational drugs).

“You have just as much value to me as anyone else; even people who aren’t alive yet”

To top this discussion off, it is also worth mentioning those pesky, unintended consequences that sometimes accompany even the best of intentions. By relieving deaths from dehydration, malaria, and starvation today, you might be ensuring greater harm in future generations in the form of increasing the rate of climate change, species extinction, and habitat destruction brought about by sustaining larger global human populations. Assuming for the moment that was true, would that mean that feeding starving people and keeping them alive today would be morally wrong? Both options – withholding altruism when it could be provided and ensuring harm for future generations – might get the moral stamp of disapproval, depending on the reference group (from the perspective of future generations dealing with global warming, it’s bad to feed; from the perspective of the starving people, it’s bad to not feed). This is why the slight majority of participants in Kurzban et al (2012) reported that pushing and not pushing can both be morally unacceptable courses of action. If we are relying on our moral sense to guide our behavior in this instance, then, we would be unlikely to be very successful in our altruistic endeavors.

References: Kurzban, R., DeScioli, P., & Fein, D. (2012). Hamilton vs. Kant: Pitting adaptations for altruism against adaptation for moral judgment. Evolution & Human Behavior, 33, 323-333.

Marczyk, J. (2015). Moral alliance strategies theory. Evolutionary Psychological Science, 1, 77-90.

Examining Some Limited Data On Open Relationships

Thanks to Facebook, the topic of non-monogamous relationships has been crossing my screen with some regularity lately. One of the first instances involved the topic of cuckoldry: cases in which a man’s committed female partner will have sex with, and become pregnant by, another man, often while the man in the relationship is fully aware of the situation; perhaps he’s even watching. The article discussing the matter came from Playboy which, at one point, suggested that cuckoldry porn is the second most common type of porn sought out in online searches; a statement that struck me as rather strange. While I was debating discussing that point – specifically because it doesn’t seem to be true (not only does cuckold porn, or related terms, not hold the number 2 slot in PornHub’s data searches, it doesn’t even crack the top 10 or 20 searches in any area of the world) – I decided it wasn’t worth a full-length feature, in no small part because I have no way of figuring out how such data was collected barring purchasing a book.

“To put our findings in context, please light $30 on fire”

The topic for today is not cuckoldry per se, but it is somewhat adjacent to the matter: open relationships and polyamory. Though the specifics of these relationships vary from couple to couple, the general arrangements being considered are relationships that are consensually non-monogamous, permitting one or more of the members to engage in sexual relationships with individuals outside of the usual dyad pair, at least in some contexts. Such relationships are indeed curious, as a quick framing of the issue in a nonhuman example would show. Imagine, for instance, that a researcher in the field observed a pair-bonded dyad of penguins. Every now and again, the resident male would allow – perhaps even encourage – his partner to go out and mate with another male. While such an arrangement might have its benefits for the female – such as securing paternity from a male of higher status than her mate – it would seem to be a behavior that is quite costly from the male’s perspective. The example can just as easily be flipped with regard to sex: a female that permitted her partner to go off and mate with/invest in the offspring of another female would seem to be suffering a cost, relative to a female that retained such benefits for herself. Within this nonhuman example, I suspect no one would be proposing that the penguins benefit from such an arrangement by removing pressure from themselves to spend time with their partners, or by allowing the other to do things they don’t want to do, like go out dancing. While humans are not penguins, discussing the behavior in the context of other animals can remove some of the less-useful explanations for it that are floated by people (in this case, people might quickly understand that couples can spend time apart doing different things without needing to have sex with other partners).

The very real costs of such non-monogamous behavior can be seen in the form of psychological mechanisms governing sexual jealousy in men and women. If such behavior did not reliably carry costs for the other partner, mechanisms for sexual jealousy would not be expected to exist (and, in fact, they may well not exist for other species where associations between parents end following copulation). The expectation of monogamy seems to be the key factor separating pair-bonds from other social associations – such as friendship and kinship – and when that expectation is broken in the form of infidelity, it often leads to the dissolution of the bond. Given that theoretical foundation, what are we to make of open relationships? Why do they exist? How stable are they, compared to monogamous relationships? Is it a lifestyle that just anyone might adopt successfully? At the outset, it’s worth noting that there doesn’t seem to exist a wealth of good empirical data on the matter, making it hard to answer such questions definitively. There are, however, two papers that discuss the topic I wanted to examine today to start making some progress on those fronts.

The first study (Rubin & Adams, 1986) examined marital stability between monogamous and open relationships over a five-year period from 1978-1983 (though precisely how open these relationships were is unknown). Their total sample was unfortunately small, beginning with 41 demographically-matched couples per group and ending with 34 sexually-open couples and 39 monogamous ones (the authors refer to this as an “embarrassingly small” number). As for the attrition: two of the non-monogamous couples couldn’t be located and five of those couples had suffered a death, compared with one missing and one death in the monogamous group. Why so many deaths appeared to be concentrated in the open group is not mentioned, but as the average age of the sample at follow-up was about 46 and the ages of the participants ranged from 20-80, it is possible that age-related factors were responsible.

Concerning the stability of these relationships over those five years, the monogamous group reported a separation rate of 18%, while 32% of those in the open relationships reported no longer being together with their primary partner. Though this difference was not statistically significant, those in open relationships were nominally almost twice as likely to have broken up with their primary partner. Again, the sample size here is small, so interpreting those numbers is not a straightforward task. That said, Rubin & Adams (1986) also mention that both monogamous and open couples reported similar levels of jealousy and happiness in those relationships, regardless of whether they broke up or stayed together.

However, there’s the matter of representativeness….

It’s difficult to determine how many couples we ought to have expected to have broken up during that time period, however. This study was conducted during the early 80s, and that time period apparently marked a high-point in US divorce frequency. That might put the separation figures in some different context, though it’s not easy to say what that context is: perhaps the monogamous/open couples were unusually likely to have stayed together/broken up, relative to the population they were drawn from. On top of being small, then, the sample might also fail to represent the general population. The authors insinuate as much, noting that they were using an opportunity sample for their research. Worth noting, for instance, is that about 90% of their subjects held a college degree, which is exceedingly high even by today’s standards (about 35% of contemporary US citizens do); a full half of them even had MAs, and 20% had PhDs (11% and 2% today). As such, getting a sense for the demographics of the broader polyamorous community – and how well they match the general population – might provide some hints (but not strong conclusions) as to whether such a lifestyle would work well for just anyone. 

Thankfully, a larger data set containing some demographics from polyamorous individuals does exist. Approximately 1,100 polyamorous people from English-speaking countries were recruited by Mitchell et al (2014) via hundreds of online sources. For inclusion, the participants needed to be at least 19 years old, currently involved in two or more relationships, and have partners that did not participate in the survey (so as to make the results independent of each other). Again, roughly 70% of their sample held an undergraduate degree or higher, suggesting that the more sexually-open lifestyle appears to disproportionately attract the well-educated (that, or their recruitment procedure was only capturing individuals very selectively). However, another piece of the demographic information from that study sticks out: reported sexual orientations. The males in Mitchell et al (2014) reported a heterosexual orientation about 60% of the time, whereas the females reported a heterosexual orientation a mere 20% of the time. The numbers for other orientations (male/female) were similarly striking: bisexual or pansexual (28%/68%), homosexual (3%/4%), or other (7%/9%).

There are two very remarkable things about that finding: first, the demographics from the polyamorous group are divergent – wildly so – from the general population. In terms of heterosexuality, general populations tend to report such an orientation about 97-99% of the time. To find, then, that heterosexual orientations dropped to about 60% in men and 20% in women represents a rather enormous gulf. Now it is possible that those reporting their orientation in the polyamorous sample were not being entirely truthful – perhaps by exaggerating – but I have no good reason to assume that is the case, nor would I be able to accurately estimate by how much those reports might be driven by social desirability concerns, assuming they are at all. That point aside, however, the second remarkable thing about this finding is that Mitchell et al (2014) don’t seem to even notice how strange it is, failing to make mention of that difference at all. Perhaps that’s a factor of it not really being the main thrust of their analysis, but I certainly find that piece of information worthy of deeper consideration. If your sample has a much greater degree of education and incidence of non-heterosexuality than is usual, that fact shouldn’t be overlooked.

Their most common major was in gettin’ down

In general, from this limited peek into the less-monogamous relationships and individuals in the world, the soundest conclusion one might be able to draw is that those who engage in such relationships are likely different than those who do not in some important regards; we can see that in the form of educational attainment and sexual orientation in the present data set, and it’s likely that other, unaccounted for differences exist as well. What those differences might or might not be, I can’t rightly say at the moment. Nevertheless, this non-representativeness could well explain why the polyamorists and monogamists have such difficulty seeing eye-to-eye on the issue of exclusivity. However, sexual topics tend to receive quite a bit of moralization in all directions, and this can impede good scientific progress in understanding the issue. If, for instance, one is seeking to make polyamory appear to be more normative, important psychological differences between groups might be overlooked (or not asked about/reported in the first place) in the interests of building acceptance; if one views them as something to be discouraged, one’s interpretation of the results will likely follow suit as well.

References: Mitchell, M., Bartholomew, K., & Cobb, R. (2014). Need fulfillment in polyamorous relationships. Journal of Sex Research, 51, 329-339.

Rubin, A. & Adams, J. (1986). Outcomes of sexually open marriages. The Journal of Sex Research, 22, 311-319.

Punishment Might Signal Trustworthiness, But Maybe…

As one well-known saying attributed to Maslow goes, “when all you have is a hammer, everything looks like a nail.” If you can only do one thing, you will often apply that thing as a solution to a problem it doesn’t fit particularly well. For example, while a hammer might make for a poor cooking utensil in many cases, if you are tasked with cooking a meal and given only a hammer, you might try to make the best of a bad situation, using the hammer as an inefficient, makeshift knife, spoon, and spatula. That you might meet with some degree of success in doing so does not tell you that hammers function as cooking implements. Relatedly, if I then gave you a hammer and a knife, and tasked you with the same cooking jobs, I would likely observe that hammer use drops precipitously while knife use increases quite a bit. It is also worth bearing in mind that if the only task you have to do is cooking, the only conclusion I’m realistically capable of drawing concerns whether a tool is designed for cooking. That is, if I give you a hammer and a knife and tell you to cook something, I won’t be able to draw the inference that hammers are designed for dealing with nails, because nails just aren’t present in the task.

Unless one eats nails for breakfast, that is

While all that probably sounds pretty obvious in the cooking context, a very similar setup appears to have been used recently to study whether third-party punishment (the punishment of actors by people not directly affected by their behavior; hereafter TPP) functions to signal the trustworthiness of the punisher. In their study, Jordan et al (2016) had participants play a two-stage economic game. The first stage was a TPP game with three players: player A, the helper, is given 30 cents; player B, the recipient, is given nothing; and player C, the punisher, is given 20 cents. The helper can choose to either give the recipient 15 cents or nothing. If the helper decides to give nothing, the punisher then has the option to pay 5 cents to reduce the helper’s pay by 15 cents, or not do so. In this first stage, the first participant would either play one round as a helper or a punisher, or play two rounds: one in the role of the helper and another in the role of the punisher.

The second stage of this game involved a second participant. This participant observed the behavior of the people playing the first game, and then played a trust game with the first participant. In this trust game, the second participant is given 30 cents and decides how much, if any, to send to the first participant. Any amount sent is tripled, and then the first participant decides how much of that amount, if any, to send back. The working hypothesis of Jordan et al (2016) is that TPP will be used as a signal of trustworthiness, but only when it is the only possible signal; when participants have the option to send better signals of trustworthiness – such as when they are in the role of the helper, rather than the punisher – punishment will lose its value as a signal of trust. By contrast, helping should always serve as a good signal of trustworthiness, regardless of whether punishment is an option.
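For readers who like things concrete, the payoff structure of the two stages can be sketched in a few lines of code (this is just my paraphrase of the design as described above; the function names and structure are my own simplification, not the authors’ implementation):

```python
# Stage 1: the third-party punishment (TPP) game. All payoffs are in cents.
def tpp_game(helper_gives: bool, punisher_punishes: bool):
    helper, recipient, punisher = 30, 0, 20   # starting endowments
    if helper_gives:
        helper -= 15                          # the helper gives 15 cents...
        recipient += 15                       # ...to the recipient
    elif punisher_punishes:                   # punishing only arises if the helper gave nothing
        punisher -= 5                         # punishing costs the punisher 5 cents...
        helper -= 15                          # ...and reduces the helper's pay by 15
    return helper, recipient, punisher

# Stage 2: the trust game. The sender starts with 30 cents; anything sent is tripled.
def trust_game(amount_sent: int, fraction_returned: float):
    tripled = amount_sent * 3
    returned = round(tripled * fraction_returned)
    sender = 30 - amount_sent + returned
    trustee = tripled - returned
    return sender, trustee
```

So, for example, a punisher who punishes a stingy helper ends up with 15 cents (`tpp_game(False, True)`), and a sender who sends 10 cents to a trustee who returns half of the tripled amount ends up with 35 (`trust_game(10, 0.5)`).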

Indeed, this is precisely what they found. When the first participant was only able to punish, the second participant tended to trust punishers more, sending them 16% more in the trust game than non-punishers; in turn, the punishers also tended to be slightly more trustworthy, sending back 8% more than non-punishers. So, the punishers were slightly, though not substantially, more trustworthy than the non-punishers when punishing was all they could do. However, when participants were in the helper role (and not the punisher role), those who transferred money to the recipient were in turn trusted more – being sent an average of 39% more in the trust game than non-helpers – and were, in fact, more trustworthy – returning an average of 25% more than non-helpers. Finally, when the first participant was in the role of both the punisher and the helper, punishment was less common (30% of participants in both roles punished, whereas 41% of participants who were only punishers did) and, controlling for helping, punishers were only trusted with 4% more in the second stage and actually returned 0.3% less.

The final task was less about trust and more about upper-body strength

To sum up, then, when people only had the option to punish others, punishment behavior was used by observers as a cue to trustworthiness. However, when helping was possible as well, punishment ceased to predict trustworthiness. From this set of findings, the authors draw the rather strange conclusion that “clear support” was found for their model of punishment as signaling trustworthiness. My enthusiasm for that interpretation is a bit more tepid. To understand why, we can return to my initial example: you have given people a tool (a hammer/punishment) and a task (cooking/a trust game). When they use this tool in the task, you see some results, but they aren’t terribly efficient (16% more trusted and 8% more returned). Then, you give them a second tool (a knife/helping) to solve the same task. Now the results are much better (39% more trusted, 25% more returned). In fact, when they have both tools, they don’t seem to use the first one to accomplish the task as much (punishment falls 11%) and, when they do, they don’t end up with better outcomes (4% more trusted, 0.3% less returned). From that data alone, I would say that the evidence does not support the inference that punishment is a mechanism for signaling trustworthiness. People might try using it in a pinch, but its value seems greatly diminished compared to other behaviors.

Further, the only tasks people were doing involved playing a dictator and trust game. If punishment serves some other purpose beyond signaling trustworthiness, you wouldn’t be able to observe it there because people aren’t in the right contexts for it to be observed. To make that point clear, we could consider other examples. First, let’s consider murder. If I condemn murder morally and, as a third party, punish someone for engaging in murder, does this tell you that I am more trustworthy than someone else who doesn’t punish it themselves? Probably not; almost everyone condemns murder, at least in the abstract, but the costs of engaging in punishment aren’t the same for all people. Someone who is just as trustworthy might not be willing or able to suffer the associated costs. What about something a bit more controversial: let’s say that, as a third party, I punish people for obtaining or providing abortions. Does hearing about my punishment make me seem like a more trustworthy person? That probably depends on what side of the abortion issue you fall on.

To put this in more precise detail, here’s what I think is going on: the second participant – the one sending money in the trust game, so let’s call him the sender – primarily wants to get as much money back as possible in this context. Accordingly, they are looking for cues that the first participant – the one being trusted, so let’s call him the trustee – is an altruist. One good cue for altruism is, well, altruism. If the sender sees that the trustee has behaved altruistically by giving someone else money, this is a pretty good cue for future altruism. Punishment, however, is not the same thing as altruism. From the point of view of the person benefiting from the punishment, TPP is indeed altruistic; from the point of view of the target of that TPP, the punishment is spiteful. While punishment can contain this altruistic component, it is more about trading off the welfare of others than providing benefits to people per se. While that altruistic component of punishment can be used as a cue for trustworthiness in a pinch when no other information is available, that does not suggest to me that sending such a signal is its only, or even its primary, function.

Sure, they can clean the floors, but that’s not really why I hired them

In the real world, people’s behaviors are never limited to just the punishment of perpetrators. If there are almost always better ways to signal one’s trustworthiness, then TPP’s role in that regard is likely quite small. For what it’s worth, I happen to think that the role of TPP has more to do with using transient states of need to manage associations (friendships) with others, as such an explanation works well outside the narrow boundaries of the present paper, when things other than unfairness are being punished and people are seeking to do more than make as much money as possible. Finding a good friend is not the same thing as finding a good altruist, and friendships do not usually resemble trust games. However, when all you are observing is unfairness and cooperation, TPP might end up looking a little bit like a mechanism for building trust. Sometimes. If you sort of squint a bit.

References: Jordan, J., Hoffman, M., Bloom, P., & Rand, D. (2016). Third-party punishment as a costly signal of trustworthiness. Nature, 530, 473-476.

Smart People Are Good At Being Dumb In Politics

While I do my best to keep politics out of my life – usually by selectively blocking people who engage in too much proselytizing via link spamming on social media – I will never truly be rid of it. I do my best to cull my exposure to politics, not because I am lazy and looking to stay uninformed about the issues, but rather because I don’t particularly trust most of the sources of information I receive to leave me better informed than when I began. Putting this idea in a simple phrase: people are biased. In these socially-contentious domains, we tend to look for evidence that supports our favored conclusions first, and only stop to evaluate it later, if we do at all. If I can’t trust the conclusions of such pieces to be accurate, I would rather not waste my time with them at all, as I’m not looking to impress a particular partisan group with my agreeable beliefs. Naturally, since I find myself uninterested in politics – perhaps even going so far as to say I’m biased against such matters – this should mean I am more likely to approve of research concluding that people engaged with political issues aren’t very good at reaching empirically-correct conclusions. Speaking of which…

“Holy coincidences, Batman; let’s hit them with some knowledge!”

A recent paper by Kahan et al (2013) examined how people’s political beliefs affected their ability to reach empirically-sound conclusions in the face of relevant evidence. Specifically, the authors were testing two competing theories for explaining why people tended to get certain issues wrong. The first of these is referred to as the Science Comprehension Thesis (SCT), which proposes that people tend to get different answers to questions like, “Is global warming affected by human behavior?” or “Are GMOs safe to eat?” simply because they lack sufficient education on such topics or possess poor reasoning skills. Put in more blunt terms, we might (and frequently do) say that people get the answers to such questions wrong because they’re stupid or ignorant. The competing theory the authors propose is called the Identity-Protective Cognition Thesis (ICT) which suggests that these debates are driven more by people’s desire to not be ostracized by their in-group, effectively shutting off their ability to reach accurate conclusions. Again, putting this in more blunt terms, we might (and I did) say that people get the answers to such questions wrong because they’re biased. They have a conclusion they want to support first, and evidence is only useful inasmuch as it helps them do that.

Before getting to the matter of politics, though, let’s first consider skin cream. Sometimes people develop unpleasant rashes on their skin and, when that happens, people will create a variety of creams and lotions designed to help heal the rash and remove its associated discomfort. However, we want to know if these treatments actually work; after all, some rashes will go away on their own, and some rashes might even get worse following the treatment. So we do what any good scientist does: we conduct an experiment. Some people will use the cream while others will not, and we track who gets better and who gets worse. Imagine, then, that you are faced with the following results from your research: of the people who did use the skin cream, 223 of them got better, while 75 got worse; of the people who did not use the cream, 107 got better, while 21 got worse. From this, can we conclude that the skin cream works?

A little bit of division tells us that, among those who used the cream, about 3 people got better for each 1 who got worse; among those not using the cream, roughly 5 people got better for each 1 who got worse. Comparing the two ratios, we can conclude that the skin cream is not effective; if anything, it’s having precisely the opposite effect. If you haven’t guessed by now, this is precisely the problem that Kahan et al (2013) posed to 1,111 US adults (though they also flipped the numbers between conditions so that sometimes the treatment was effective). As it turns out, this problem is by no means easy for a lot of people to solve: only about half the sample was able to reach the correct conclusion. As one might expect, though, participants’ numeracy – their ability to use quantitative skills – did predict their ability to get the right answer: the highly-numerate participants got the answer right about 75% of the time; those in the low-to-moderate range of numeracy got it right only about 50% of the time.
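That division is easy enough to check for yourself; here is the arithmetic from the paragraph above as a quick sketch (the counts are the ones given in the problem):

```python
# Skin-cream version of the problem from Kahan et al (2013):
# compare the ratio of improved-to-worsened patients in each condition.
better_with, worse_with = 223, 75        # used the cream
better_without, worse_without = 107, 21  # did not use the cream

ratio_with = better_with / worse_with            # about 3 improved per 1 worsened
ratio_without = better_without / worse_without   # about 5 improved per 1 worsened

# The cream group fared worse in relative terms, so "the cream works" is wrong.
print(round(ratio_with, 2), round(ratio_without, 2))  # 2.97 5.1
```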

“I need it for a rash. That’s my story and I’m sticking to it”

Kahan et al (2013) then switched up the story. Instead of participants reading about a skin cream, they instead read about gun legislation that banned citizens from carrying handguns concealed in public; instead of looking at whether a rash went away, they examined whether crime in the cities that enacted such bans went up or down, relative to those cities that did not. Beyond the change in variables, all the numbers remained exactly the same. Participants were asked whether the gun ban was effective at reducing crime.  Again, people were not particularly good at solving this problem either – as we would expect – but an interesting result emerged: the most numerate subjects were now only solving the problem correctly 57% of the time, as compared with 75% in the skin-cream group. The change of topic seemed to make people’s ability to reason about these numbers quite a bit worse.

Breaking the data down by political affiliations made it clear what was going on. The more numerate subjects were, again, more likely to get the answer to the question correct, but only when it accorded with their political views. The most numerate liberal democrats, for instance, got the answer right when the data showed that concealed carry bans resulted in decreased crime; when crime increased, however, they were not appreciably better at reaching that conclusion relative to the less-numerate democrats. This pattern was reversed in the case of conservative republicans: when the concealed carry bans resulted in increased crime, the more numerate ones got the question right more often; when the ban resulted in decreased crime, performance plummeted.

More interestingly still, the gap in performance was greatest for the more-numerate subjects. The average difference in getting the right answer among the highly-numerate individuals was about 45% between cases in which the conclusion of the experiment did or did not support their view, while it was only 20% in the case of the less-numerate ones. Worth noting is that these differences did not appear when people were thinking about the non-partisan skin-cream issue. In essence, smart people were either not using their numeracy skills reliably in cases where it meant drawing unpalatable political conclusions, or they were using them and subsequently discarding the “bad” results. This is an empirical validation of my complaints about people ignoring base rates when discussing Islamic terrorism. Highly-intelligent people will often get the answers to these questions wrong because of their partisan biases, not because of their lack of education. They ought to know better – indeed, they do know better – but that knowledge isn’t doing them much good when it comes to being right in cases where that means alienating members of their social group.

That future generations will appreciate your accuracy is only a cold comfort

At the risk of repeating this point, numeracy seemed to increase political polarization, not reduce it. These abilities are being used more to metaphorically high-five in-group members than to be accurate. Kahan et al (2013) try to explain this effect in two ways, one of which I think is more plausible than the other. On the implausible front, the authors suggest that using these numeracy abilities is a taxing, high-effort activity that people try to avoid whenever possible. As such, people with numeracy ability would only engage in effortful reasoning when their initial beliefs were threatened by some portion of the data. I find this idea strange because I don’t think that – metabolically – these kinds of tasks are particularly costly or effortful. On the more plausible front, Kahan et al (2013) suggest that these conclusions have a certain kind of rationality behind them: if drawing an unpalatable conclusion would alienate important social relations that one depends on for one’s own well-being, then an immediate cost/benefit analysis can favor being wrong. If you are wrong about whether GMOs are harmful, the immediate effects on you are likely quite small (unless you’re starving); on the other hand, if your opinion about them puts off your friends, the immediate social effects are quite large.

In other words, I think people sometimes interpret data in incorrect ways to suit their social goals, but I don’t think they avoid interpreting it properly because doing so is difficult.

References: Kahan, D., Peters, E., Dawson, E., & Slovic, P. (2013). Motivated numeracy and enlightened self-government. Yale Law School, Public Law Working Paper No. 307.

Men Are Better At Selling Things On eBay

When it comes to gender politics, never take the title of the piece at face value; or the conclusions for that matter.

In my last post, I mentioned how I find some phrases and topics act as red flags regarding the quality of research one is liable to encounter. Today, the topic is gender equality – specifically some perceived (and, indeed, rather peculiar) discrimination against women – which is an area not renowned for its clear thinking or reasonable conclusions. As usual, the news articles covering this piece of research made some outlandish claim that lacks even remote face validity. In this case, the research in question concludes that people, collectively, try to figure out the gender of the people selling things on eBay so as to pay women substantially less than men for similar goods. Those who found such a conclusion agreeable to their personal biases spread it to others across social media as yet another example of how the world is an evil, unfair place. So here I am again, taking a couple recreational shots at some nonsense story of sexism.

Just two more of these posts and I get a free smoothie

The piece in question today is an article from Kricheli-Katz & Regev (2016) that examined data from about 1.1 million eBay auctions. The stated goals of the authors involve examining gender inequality in online product markets, so at least we can be sure they’re going into this without an agenda. Kricheli-Katz & Regev (2016) open their piece by talking about how gender inequality is a big problem, launching their discussion almost immediately with a rehashing of that misleading 20% pay gap statistic that’s been floating around forever. As that claim has been dissected so many times at this point, there’s not much more to say about it other than (a) when controlling for important factors, it drops to single digits, and (b) when you see it, it’s time to buckle in for what will surely be an unpleasant ideological experience. Thankfully, the paper does not disappoint in that regard, promptly suggesting that women are discriminated against in online markets like eBay.

So let’s start by considering what the authors did, and what they found. First, Kricheli-Katz & Regev (2016) present us with their analysis of eBay data. They restricted their research to auctions only, where sellers post an item and any subsequent interaction occurs between bidders alone, rather than between bidders and sellers. On average, they found that the women had about 10 fewer months of experience than men, though the accounts of both sexes had existed for over nine years, and women also had very slightly better reputations, as measured by customer feedback. Women also tended to set slightly higher initial prices than men for their auctions, controlling for the product being sold. Perhaps as a result, women also tended to receive slightly fewer bids on their items, and ultimately less money per sale when their auctions ended.

However, when the interaction between sex and product type (new or used) was examined, the headline-grabbing result appeared: while women netted a mere 3% less on average for used products than men, they netted a more-impressive 20% less for new products (where, naturally, one expects products to be identical). Kricheli-Katz & Regev (2016) claim that the discrepancy in the new-product case is due to beliefs about gender. Whatever these unspecified beliefs are, they cause people to pay women about 20% less for the same item. Taking that idea at face value for a moment, why does that gap all but evaporate in the used category of sales? The authors attribute that lack of a real difference to an increased trust people have in women’s descriptions of the condition of their products. So buyers trust women more when it comes to used goods, but pay them less for new ones, when trust is less relevant. Both these conclusions, as far as I can see from the paper, have been pulled directly out of thin air. There is literally no evidence presented to support them: no data, no citations, no anything.

I might have found the source of their interpretations

By this point, anyone familiar with how eBay works is likely a bit confused. After all, the sex of the seller is at no point readily apparent in almost any listing. Without that crucial piece of information, people would have a very difficult time discriminating on the basis of it. Never fear, though; Kricheli-Katz & Regev (2016) report the results of a second study in which they pulled 100 random sellers from their sample and asked about 400 participants to try to determine the sex of the sellers in question. Each participant offered guesses about five profiles, for a total of 2,000 attempts. About 55% of the time, participants got the sex right; 9% of the time, they got it wrong; and the remaining 36% of the time, they said they didn’t know (which, since they don’t know, also means they got it wrong). In short, people couldn’t reliably determine the sex about half the time. The authors do mention that the guesses got better as participants viewed more items that the seller had posted, however.

So here’s the story they’re trying to sell: when people log onto eBay, they seek out a product they’re looking to buy. When they find a seller listing the product, they examine the seller’s username, the listing in question, and the other listings in the seller’s store to attempt to discern the sex of the seller. Buyers subsequently lower their willingness to pay for an item by quite a bit if they see it is being sold by a woman, but only if it’s new. In fact, since women made 20% less, the actual reduction in willingness to pay must be larger than that, as sex can only be reliably determined about half the time even when people are trying. Buyers do all this despite trusting female sellers more. Also, I do want to emphasize the word they, as this would need to be a pretty collective action. If it weren’t a fairly universal response among buyers, the prices of female-sold items would eventually even out with the male prices, as those who discriminated less against women would be drawn towards the cheaper prices and bid them back up.
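To put a rough number on that “must be larger” point: if buyers can only correctly identify a seller as female about 55% of the time, producing a 20% average gap would require a considerably steeper discount from the buyers who do make the identification. Here’s the back-of-envelope version (my arithmetic, not the authors’, and it assumes the other ~45% of buyers simply pay the male price):

```python
# If only the ~55% of buyers who correctly identify a female seller discriminate,
# how big a discount would they need to apply to produce a 20% average gap?
identified = 0.55    # correct identifications in the authors' second study
observed_gap = 0.20  # average new-product shortfall reported in the paper

# observed_gap = identified * discount + (1 - identified) * 0
implied_discount = observed_gap / identified
print(round(implied_discount, 2))  # 0.36, i.e. a ~36% discount from those buyers
```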

Not only do I not buy this story – not even a little – but I wouldn’t pay the authors less for it because they happen to be women if I were looking to make a purchase. While people might be able to determine the sex of a seller on eBay sometimes, when they’re specifically asked to do so, that does not mean people engage in this sort of behavior naturally.

Finally, Kricheli-Katz & Regev (2016) report the results of a third study, asking 100 participants how much they value a $100 gift card being sold by either an Alison or a Brad. Sure enough, people were willing to pay Alison less for the card: she got a mere $83 to Brad’s $87; a 5% difference. I’d say someone should call the presses, but it looks like they already did, judging from the coverage this piece has received. Now this looks like discrimination – because it is – but I don’t think it’s based on sex per se. I say that because, earlier in the paper, Kricheli-Katz & Regev (2016) also report that women, as buyers on eBay, tended to pay about 3% more than men for comparable goods. To the extent that the $4 difference in valuation is meaningful here, there are two things to say about it. First, it may well represent the fact that women aren’t as willing to negotiate prices in their favor. Indeed, while women were 23% of the sellers on eBay, they represented only 16% of the auctions with a negotiation component. If that’s the case, people are likely willing to pay less to women because they perceive (correctly) some population differences in their ability to get a good deal. I suspect that if you gave buyers individuating information about the seller’s abilities, sex would stop mattering even that 5%. Second, that slight 5% difference would by no means account for the 20% gap the authors report finding with respect to new-product sales; not even close.

But maybe your next big idea will work out better…

Instead, my guess is that, in spite of the authors’ use of the phrase “equally qualified” when referring to the men and women in their seller sample, there were some important differences in the listings that buyers noticed; the type of differences you can’t account for when you’re looking at over a million of them and rough control measures aren’t effective. Kricheli-Katz & Regev (2016) never seemed to consider – and I mean really consider – the possibility that something about these listings, something they didn’t control for, might have been driving the sale price differences. While they do control for factors like the seller’s reputation, experience, number of pictures, year of the sale, and some of the sentiments expressed by words in the listing (how positive or negative it is), there’s more to making a good listing than that. A more likely story is that differences in sale prices reflect different behaviors on the part of male and female sellers (as we already know other differences exist in the sample); the alternative story being championed would require a level of obsession with gender-based discrimination in the population so wide and deep that we wouldn’t need to research it; it would be plainly obvious to everyone already.

Then again, perhaps it’s time I make my way over to eBay to pick up a new tinfoil hat.

References: Kricheli-Katz, T. & Regev, T. (2016). How many cents on the dollar? Women and men in product markets. Science Advances, 2, DOI: 10.1126/sciadv.1500599

Thoughtful Suggestions For Communicating Sex Differences

Having spent quite a bit of time around the psychological literature – both academic and lay pieces alike – there are some words or phrases I can no longer read without an immediate, knee-jerk sense of skepticism arising in me, as if they taint everything that follows and precedes them. Included in this list are terms like bias, stereotype, discrimination, and, for the present purposes, fallacy. The reason these words elicit such skepticism on my end is the repeated failure of people using them to consistently produce high-quality work or convincing lines of reasoning. This is almost surely due to the perceived social stakes when such terms are being used: if you can make members of a particular group appear uniquely talented, victimized, or otherwise valuable, you can subsequently direct social support towards and away from various ends. When the goal of argumentation becomes persuasion, truth is not a necessary component and can be pushed aside. Importantly, the people engaged in such persuasive endeavors do not usually recognize they are treating information or arguments differently, contingent on how it suits their ends.

“Of course I’m being fair about this”

There are few areas of research that seem to engender as much conflict – philosophically and socially – as sex differences, and it is here those words appear regularly. As there are social reasons people might wish to emphasize or downplay sex differences, it has steadily become impossible for me to approach most of the writing I see on the topic with the assumption that it is at least sort of unbiased. That’s not to say every paper is hopelessly mired in a particular worldview, rejecting all contrary data, mind you; just that I don’t expect them to reflect earnest examinations of the capital-T Truth. Speaking of which, a new paper by Maney (2016) recently crossed my desk; a paper that concerns itself with how sex differences get reported and how they ought to be discussed. Maney (2016) appears to take a dim view of the research on sex differences in general and attempts to highlight some perceived fallacies in people’s understandings of them. Unfortunately, for someone trying to educate people about issues surrounding the sex difference literature, the paper does not come off as one written by someone possessing a uniquely deep knowledge of the topic.

The first fallacy Maney (2016) seeks to highlight is the idea that the sexes form discrete groups. Her logic for explaining why this is not the case revolves around the idea that, while the sexes do indeed differ to some degree on a number of traits, they also often overlap a great deal on them. Instead, Maney (2016) argues that we ought not to be asking whether the sexes differ on a given trait, but rather by how much they do. Indeed, she even puts the word ‘differences’ in quotes, suggesting that these ‘differences’ between the sexes aren’t, in many cases, real. I like this brief section, as it highlights well why I have grown to distrust words like fallacy. Taking her points in reverse order: if one is interested in how much groups (in this case, sexes) differ, then one must have, at least implicitly, already answered the question of whether or not they do. After all, if the sexes did not differ, it would be pointless to talk about the extent of those non-differences; there simply wouldn’t be variation. Second, I know of zero researchers whose primary interest resides in answering the question of whether the sexes differ to the exclusion of the extent of those differences. As far as I’m aware, Maney (2016) seems to be condemning a strange class of imaginary researchers who are content to find that a difference exists and then never look into it further or provide more details. Finally, I see little value in noting that the sexes often overlap a great deal when it comes to explaining the areas in which they do not. In much the same way, if you were interested in understanding the differences between humans and chimpanzees, you are unlikely to get very far by noting that we share a great deal of genes in common. Simply put, you can’t explain differences with similarities. If one’s goal is to minimize the perception of differences, though, this would be a helpful move.

The second fallacy that Maney (2016) seeks to tackle is the idea that a sex difference in behavior can be attributed to differing brain structures. Her argument on this front is that it is logically invalid to do the following: (1) note that some brain structure differs between men and women, (2) note that this brain structure is related to a given behavior on which they also differ, and so (3) conclude that the sex difference in brain structure is responsible for the sex difference in behavior. Now while her point is true within the rules of formal logic, it is clear that differences in brain structure will result in differences in behavior; the only way that idea could be false would be if brain structure were not connected to behavior, and I don’t know of anyone crazy enough to try and make that argument. The researchers engaging in the fallacy thus might not get the specifics right all the time, but their underlying approach is fine: if a difference exists in behavior (between sexes, species, or individuals), there will exist some corresponding structural differences in the brain. The tools we have for studying the matter are a far cry from perfect, making inquiry difficult, but that’s a different issue. Relatedly, then, noting that some formal bit of logic is invalid is assuredly not the same thing as demonstrating that a conclusion is incorrect or the general approach misguided. (Also worth noting is that the above validity issue stops being a problem when conclusions are probabilistic, rather than definitive.)

“Sorry, but it’s not logical to conclude his muscles might determine his strength”

The third fallacy Maney (2016) addresses is the idea that sex differences in the brain must be preprogrammed or fixed, attempting to dispel the notion that sex differences are rooted in biology and thus impervious to experience. In short, she is arguing against the idea of hard genetic determinism. Oddly enough, I have never met a single genetic determinist in person; in fact, I’ve never even read an article that advanced such an argument (though maybe I’ve just been unusually lucky…). As every writer on the subject I have come across has emphasized – often in great detail – the interactive nature of genes and environments in determining the direction of development, it again seems like Maney (2016) is attacking philosophical enemies that are more imagined than real. She could have, for instance, quoted researchers who made claims along the lines of, “trait X is biologically-determined and impervious to environmental inputs during development”; instead, it looks like everyone she cites for this fallacy is making a similar criticism of others, rather than anyone making the claims being criticized (though I did not check those references myself, so I’m not 100% certain on that point). Curiously, Maney (2016) doesn’t seem to be at all concerned about the people who, more-or-less, disregard the role of genetics or biology in understanding human behavior; at the very least she doesn’t devote any portion of her paper to addressing that particular fallacy. That rather glaring omission – coupled with what she does present – could leave one with the impression that she isn’t really trying to present a balanced view of the issue.

With those ostensible fallacies out of the way, there are a few other claims worth mentioning in the paper. The first is that Maney (2016) seems to have a hard time reconciling the idea of sexual dimorphisms – traits that occur in one form typical of males and one typical of females – with the idea that the sexes overlap to varying degrees on many of them, such as height. While it’s true enough that you can’t tell someone’s sex for certain if you only know their height, that doesn’t mean you can’t make some good guesses that are liable to be right a lot more often than they’re wrong. Indeed, the only dimorphisms she mentions are the presence of sex chromosomes, external genitalia, and gonads, and she then continues to write as if these were of little to no consequence. Much like with height, however, there couldn’t have been selection for any physical sex differences if the sexes did not behave differently. Since behavior is controlled by the brain, physical differences between the sexes, like height and genitalia, are usually also indicative of some structural differences in the brain. This is the case whether the dimorphism is one of degree (like height) or kind (like chromosomes).
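The statistical point here – that heavily overlapping distributions can still support guesses that are right far more often than wrong – is easy to demonstrate with a quick simulation. The height figures below (means of roughly 175 cm for men and 162 cm for women, with a common standard deviation of 7 cm) are illustrative assumptions of mine, not numbers taken from Maney (2016); a minimal sketch under those assumptions:

```python
# Sketch: guessing sex from height alone, despite substantial overlap.
# The means/SDs here are rough illustrative assumptions, not data
# from the paper under discussion.
import random

random.seed(42)
N = 100_000

male_heights = [random.gauss(175.0, 7.0) for _ in range(N)]
female_heights = [random.gauss(162.0, 7.0) for _ in range(N)]

# Simple decision rule: guess 'male' when height exceeds the midpoint
# of the two assumed means.
cutoff = (175.0 + 162.0) / 2  # 168.5 cm

correct = sum(h > cutoff for h in male_heights) + \
          sum(h <= cutoff for h in female_heights)
accuracy = correct / (2 * N)

# The distributions overlap a great deal, yet this crude rule lands
# well above the 50% you would get from guessing at random.
print(f"accuracy from height alone: {accuracy:.1%}")
```

Under these assumptions the one-trait guess is correct around four times out of five – far from certainty, but also far from the ‘no real difference’ reading that large overlap might seem to invite.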

Returning to the main point, outside of these all-or-none traits, it is unclear what Maney (2016) would consider a genuine difference, much less any clear justification for that standard. For example, she notes some research that found a 90% overlap in interhemispheric connectivity between the male and female distributions, but then seems to imply that the corresponding 10% non-overlap does not reflect a ‘real’ sex difference. We would surely notice a 10% difference in other traits – like height, IQ, or number of fingers – but I suppose, in the realm of the brain, 10% just doesn’t cut it.

Maney (2016) also seems to take an odd stance when it comes to explanations for these differences. In one instance, she writes about a study on multitasking that found a sex difference favoring men; a difference which, we are told, was explained by a ‘much larger difference in video game experience,’ rather than sex per se. Great, but what are we to make of that ‘much larger’ sex difference in video game experience? It would seem that that finding too requires an explanation, and one is not present. Perhaps video game experience is explained more by, I don’t know, competitiveness than sex, but then what are we to explain competitiveness with? These kinds of explanations usually end up going nowhere in a hurry unless they eventually land on some kind of adaptive endpoint, as once a trait’s reproductive value is explained, you don’t need to go any further. Unfortunately, Maney (2016) seems to oppose evolutionary explanations for sex differences, scolding those who propose ‘questionable’ functional or evolutionary explanations for sex differences for being genetic determinists who see no role for sociocultural influences. In her rush to condemn those genetic determinists (who, again, I have never met or read, apparently), Maney’s (2016) piece appears to fall victim to the warning laid out by Tinbergen (1963) several decades ago: rather than seeking to improve the shape and direction of evolutionary, functional analyses, Maney (2016) instead recommends that people simply avoid them altogether.

“Don’t ask people to think about these things; you’ll only hurt their unisex brains”

This is a real shame, as evolutionary theory is the only tool available for providing a deeper understanding of these sex differences (as well as our physical and psychological form more generally). Just as species will differ in morphology and behavior to the extent they have faced different adaptive problems, so too will the sexes within a species. By understanding the different challenges faced by the sexes historically, one can get a much clearer sense as to where psychological and physical differences will – and will not – be expected to exist, as well as why (this extra level of ‘why’ is important, as it allows you to better figure out where an analysis has gone wrong if its predictions don’t pan out). Maney (2016), it would seem, even missed a golden opportunity within her paper to explain to her readers that evolutionary explanations complement, rather than supplant, more proximate explanations when quoting an abstract that seemed to contrast the two. I suspect this opportunity was missed because, judging from the tone of her paper, she is either legitimately unaware of that point or does not understand it, believing (incorrectly) that evolutionary means genetic, and therefore immutable. If that is the case, it would be rather ironic for someone who does not seem to have much understanding of the evolutionary literature to be lecturing others on how it ought to be reported.

References: Maney, D. (2016). Perils and pitfalls of reporting sex differences. Philosophical Transactions B, 371, 1-11.

Tinbergen, N. (1963). On aims and methods of ethology. Zeitschrift für Tierpsychologie, 20, 410-433.