Count The Hits; Not The Misses

At various points in our lives, we have all read or been told anecdotes about how someone turned a bit of their life around. Some of these (or at least variations of them) likely sound familiar: “I cut bread out of my diet and all of a sudden felt so much better”; “Amy made a fortune working from home selling diet pills online”; “After the doctors couldn’t figure out what was wrong with me, I started drinking this tea and my infection suddenly cleared up”. The whole point of such stories is to draw a causal link; in these cases, that (1) eating bread makes you feel sick, (2) selling diet pills is a good way to make money, and (3) tea is useful for combating infections. Some or all of these statements may well be true, but the real problem with these stories is the paucity of data upon which they are based. If you wanted to be more certain about those statements, you would need more information. Sure, you might have felt better after drinking that tea, but what about the other 10 people who drank similar tea and saw no results? How about all the other people selling diet pills who were in the financial hole from day one and never crawled out of it because it’s actually a scam? If you want to get closer to understanding the truth value of those statements, you need to consider the data as a whole: both stories of success and stories of failure. However, stories of someone not getting rich from selling diet pills aren’t quite as moving, and so they don’t see the light of day, at least not initially. This facet of anecdotes was made light of by The Onion several years ago (and Clickhole had their own take more recently).

“At first he failed, but with some positive thinking he continued to fail over and over again”

These anecdotes often try to throw the spotlight on successful cases (hits) while ignoring the unsuccessful ones (misses), resulting in a biased picture of how things will work out. They don’t get us much closer to the truth. Most people who create and consume psychology research would like to think that psychologists go beyond these kinds of anecdotes and generate useful insights into how the mind works, but a lot of concerns have been raised lately about precisely how much further they go on average, largely owing to the results of the Reproducibility Project. Numerous issues have been raised about the way psychology research is conducted, either in the form of advocacy for particular political and social positions (which distorts experimental designs and statistical interpretations) or in the selective ways data is manipulated or reported to draw attention to successful results without acknowledging failed predictions. The result has been quite a number of false positives, and of overstated real effects, cropping up in the literature.

While these concerns are warranted, it is difficult to quantify the extent of the problems. After all, very few researchers are going to come out and say they manipulated their experiments or data to find the results they wanted, because (a) it would only hurt their careers and (b) in some cases they aren’t even aware that they’re doing it, or that what they’re doing is wrong. Further, because most psychological research isn’t preregistered and null findings aren’t usually published, figuring out what researchers hoped to find (but did not) is difficult just from reading the literature. Thankfully, a new paper from Franco et al (2016) brings some data to bear on the matter of how much underreporting is going on. While this data will not be the final word on the subject by any means (largely owing to the small sample size), it does provide some of the first steps in the right direction.

Franco et al (2016) report on a group of psychology experiments whose questionnaires and data were made publicly available. Specifically, these come from the Time-sharing Experiments for the Social Sciences (TESS), an NSF program in which online experiments are embedded in nationally-representative population surveys. Those researchers making use of TESS face strict limits on the number of questions they can ask, we are told, meaning that we ought to expect they would restrict their questions to the most theoretically-meaningful ones. In other words, we can be fairly confident that the researchers had some specific predictions they hoped to test for each experimental condition and outcome measure, and that these predictions were made in advance of actually getting the data. Franco et al (2016) were then able to track the TESS studies through to the eventual published versions of the papers to see what experimental manipulations and results were and were not reported. This provided the authors with a set of 32 semi-preregistered psychology experiments to examine for reporting biases.

A small sample I will recklessly generalize to all of psychology research

The first step was to compare the number of experimental conditions and outcome variables that were present in the TESS studies to the number that ultimately turned up in published manuscripts (i.e., are the authors reporting what they did and what they measured?). Overall, 41% of the TESS studies failed to report at least one of their experimental conditions; while there were an average of 2.5 experimental conditions in the studies, the published papers only mentioned an average of 1.8. In addition, 72% of the papers failed to report all their outcome variables; while there were an average of 15.4 outcome variables in the questionnaires, the published reports only mentioned 10.4. Taken together, only about 1-in-4 of the experiments reported everything they did and everything they measured. Unsurprisingly, this pattern extended to the statistical results as well. In terms of statistical significance, the median reported p-value was significant (.02), while the median unreported p-value was not (.32); two-thirds of the reported tests were significant, while only one-fourth of the unreported tests were. Finally, published effect sizes were approximately twice as large as unreported ones.
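The arithmetic behind those averages can be made explicit in a few lines of Python. This is a minimal sketch using only the averages quoted above; note that a ratio of averages is not the same statistic as the per-study percentages (41% and 72%) that Franco et al (2016) report, so the two sets of numbers should not be read against each other.

```python
# Study-level averages as summarized above (Franco et al, 2016).
CONDITIONS_RUN = 2.5
CONDITIONS_REPORTED = 1.8
OUTCOMES_MEASURED = 15.4
OUTCOMES_REPORTED = 10.4


def share_reported(reported: float, total: float) -> float:
    """Fraction of what was run or measured that made it into print."""
    if total <= 0:
        raise ValueError("total must be positive")
    return reported / total


cond_share = share_reported(CONDITIONS_REPORTED, CONDITIONS_RUN)
outcome_share = share_reported(OUTCOMES_REPORTED, OUTCOMES_MEASURED)

print(f"conditions reported: {cond_share:.0%}")   # 72%
print(f"outcomes reported:   {outcome_share:.0%}")  # 68%
```

On these averages, roughly 72% of experimental conditions and 68% of outcome variables made it into the published papers.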

Taken together, the pattern that emerges is that psychology research tends to underreport failed experimental manipulations, measures that didn’t pan out, and smaller effects. This should come as little surprise to anyone who has spent much time around psychology researchers, or to researchers who have themselves tried to publish null findings (or, in fact, tried to publish almost anything). Data is often messy and uncooperative, and people are less interested in reading about the things that didn’t work out (unless they’re placed in the proper context, where failures to find effects can actually be considered meaningful, such as when you’re trying to provide evidence against a theory). Nevertheless, the result of such selective reporting on what appears to be a fairly large scale is that the overall trustworthiness of reported psychology research dips ever lower, one false positive at a time.

So what can be done about this issue? One suggestion that is often tossed around is that researchers should register their work in advance, making clear what analyses they will be conducting and what predictions they have made. This was (sort of) the case in the present data, and Franco et al (2016) endorse this option. It allows people to assess research more as a whole, rather than relying only on the published accounts of it. While that’s a fine suggestion, it only goes so far toward improving the state of the literature. Specifically, it doesn’t really address the problem of journals not publishing null findings in the first place, nor does it necessarily stop researchers from running post-hoc analyses of their data and turning up additional false positives. A more ambitious way of alleviating these problems would be to collectively change the way journals accept papers for publication. In this alternate system, researchers would submit an outline of their article to a journal before the research is conducted, making clear (a) what their manipulations will be, (b) what their outcome measures will be, and (c) what statistical analyses they will undertake. Then, and this is important, the decision whether to publish the paper would be made before either the researchers or the journal know what the results will be. This would allow null results to make their way into mainstream journals while also allowing researchers to build up their resumes even if things don’t work out well. In essence, it removes some of the incentives for researchers to cheat statistically. Journals would then assess submissions not on whether interesting results emerged, but rather on whether a sufficiently important research question had been asked.

Which is good, considering how often real, strong results seem to show up

There are some downsides to that suggestion, however. For one, the plan would take some time to enact even if everyone were on board. Journals would need to accept a paper for publication weeks or months before the paper itself was actually completed. This would pose additional complications for journals inasmuch as researchers will occasionally fail to complete the research at all or in a timely manner, or will submit sub-par papers not yet worthy of print, leaving possible publication gaps. Further, it will sometimes mean that an issue of a journal goes out without containing any major advancements to the field of psychological research (no one happened to find anything this time), which might negatively affect the impact factor of the journals in question. Indeed, that last part is probably the biggest impediment to making major overhauls to the publication system currently in place: most psychology research probably won’t work out all that well, and that will probably mean fewer people ultimately interested in reading about and citing it. While it is possible, I suppose, that null findings would actually be cited at rates similar to positive ones, that remains to be seen, and in the absence of that information I don’t foresee journals being terribly interested in changing their policies and taking that risk.

References: Franco, A., Malhotra, N., & Simonovits, G. (2016). Underreporting in psychology experiments: Evidence from a study registry. Social Psychological & Personality Science, 7, 8-12.

Who Deserves Healthcare And Unemployment Benefits?

As I find myself currently recovering from a cold, it’s a happy coincidence that I had planned to write about people’s intuitions about healthcare this week. In particular, a new paper by Jensen & Petersen (2016) attempted to demonstrate a fairly automatic cognitive link between the mental representation of someone as “sick” and of that same target as “deserving of help.” Sickness is fairly unique in this respect, it is argued, because of our evolutionary history with it: as compared with what many refer to as diseases of modern lifestyle (including those resulting from obesity and smoking), infections tended to strike people randomly; not randomly in the sense that everyone was equally likely to get sick, but in the sense that people often had little control over when they did. Infections were rarely the result of people intentionally seeking them out or behaving in certain ways. In essence, then, people view those who are sick as unlucky, and unlucky individuals are correspondingly viewed as more deserving of help than those who are responsible for their own situation.

…and more deserving of delicious, delicious pills

This cognitive link between luck and deservingness can be partially explained by examining expected returns on investment in the social world (Tooby & Cosmides, 1996). In brief, helping others takes time and energy, and it would only be adaptive for an organism to sacrifice resources to help another if doing so were beneficial to the helper in the long term. This is often achieved by my helping you at a time when you need it (when my investment is more valuable to you than it is to me), and then your helping me in the future when I need it (when your investment is more valuable to me than it is to you). This is reciprocal altruism, often summed up by the phrase, “I scratch your back and you scratch mine.” Crucially, the probability of receiving reciprocation from the target you help should depend on why that target needed help in the first place: if the person you’re helping is needy because of their own behavior (e.g., they’re lazy), their need today is indicative of their need tomorrow. They won’t be able to help you later for the same reasons they need help now. By contrast, if someone is needy because they’re unlucky, their current need is not as diagnostic of their future need, and so it is more likely they will repay you later. Because the latter type is more likely to repay than the former, our intuitions about who deserves help shift accordingly.

As previously mentioned, infections tend to be distributed more randomly; my being sick today (generally) doesn’t tell you much about my ability to help you in the future once I recover. Because of that, the need generated by infections tends to make sick individuals look like valuable targets of investment: their need state suggests they value your help and will be grateful for it, both of which likely translate into their helping you in the future. Moreover, illnesses can frequently be severe, even life-threatening if assistance isn’t provided. The greater the need to be filled, the greater the potential for alliances to be formed, both with and against you. To put that point in a quick, if extreme, example, pulling someone from a burning building is more likely to endear you to them than just helping them move; conversely, failing to save someone’s life when it’s well within your capabilities can set their existing allies against you.

The sum total of this reasoning is that people should intuitively perceive the sick as more deserving of help than those suffering from other problems that cause need. The particular problem that Jensen & Petersen (2016) contrast sickness with is unemployment, which they suggest is a fairly modern problem. The conclusion the authors draw from these points is that the human mind, given its extensive history with infections and their random nature, should automatically tag sick individuals as deserving of assistance (which would show up as broad support for government healthcare programs), while our intuitions about whether the unemployed deserve assistance should be much more varied, contingent on the extent to which unemployment is viewed as luck-based or character-based. This fits well with the initial data Jensen & Petersen (2016) present on relative cross-national support for government spending on healthcare and unemployment: not only is healthcare much more broadly supported than unemployment benefits (in the US, 90% vs 52% of the population support government assistance), but support for healthcare is also quite a bit less variable across countries.

Probably because the unemployed don’t have enough bake sales or ribbons

Some additional predictions drawn by the authors were examined across a number of studies in the paper, only two of which I will focus on here for reasons of length. The first of these studies presented 228 Danish participants with one of four scenarios: two in which the target was sick and two in which the target was unemployed. In each of these conditions, the target was also said to be either lazy (hasn’t done much in life and only enjoys playing video games) or hardworking (is active and does volunteer work; of note, the authors label the lazy/hardworking conditions as high/low control, respectively, but I’m not sure that really captures the nature of the frame well). Participants were asked, on a 7-point scale, how much an individual like that deserved aid from the government when sick/unemployed (responses were converted to a 0-1 scale for ease of interpretation).

Overall, support for government aid was lower in both domains when the target was framed as lazy, but this effect was much larger in the case of unemployment. When it came to the sick individual, support for healthcare for the hardworking target was about 0.9, while support for the lazy one dipped to about 0.75; by contrast, the hardworking unemployed individual was supported with benefits at about 0.8, while the lazy one only received support around the 0.5 point. Put another way, the deservingness information was only about half as influential when it came to sickness (a drop of roughly 0.15, versus roughly 0.3 for unemployment).

There is an obvious shortcoming in that study, however: being lazy has quite a bit less to do with getting sick than with getting a job. This issue was addressed better in the third study, where the stimuli were more tailored to the problems. In the unemployment case, the individual was described as an unskilled worker who was told by his union to get further training, with the union even offering to help. The individual either takes or does not take the additional training, but either way eventually ends up unemployed. In the healthcare case, the individual is described as a long-term smoker who was repeatedly told by his doctor to quit. He either eventually quits smoking or does not, but either way ends up getting lung cancer. The general pattern of results from the previous study replicated: for the smoker, support for government aid hovered around 0.8 when he quit and 0.7 when he did not; for the unemployed person, support was about 0.75 when he took the training and around 0.55 when he did not.
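To make the size comparison across the two studies concrete, the sketch below rescales a 7-point rating onto 0-1 and computes how far support drops when the target is framed as undeserving. The linear rescaling (1 maps to 0, 7 maps to 1) is an assumption about how the conversion was done, and the cell means are the approximate values read off the text, not the paper's exact estimates; the names `rescale_7pt` and `deservingness_gap` are hypothetical.

```python
def rescale_7pt(response: int) -> float:
    """Map a 1-7 rating onto 0-1 (assumed linear rescaling: 1 -> 0, 7 -> 1)."""
    if not 1 <= response <= 7:
        raise ValueError("response must be between 1 and 7")
    return (response - 1) / 6


def deservingness_gap(cells: dict) -> dict:
    """Drop in support (deserving minus undeserving framing) for each domain."""
    return {domain: deserving - undeserving
            for domain, (deserving, undeserving) in cells.items()}


# Approximate cell means on the 0-1 scale, as described in the text:
# (deserving framing, undeserving framing).
study_means = {
    "lazy vs hardworking": {"sick": (0.90, 0.75), "unemployed": (0.80, 0.50)},
    "smoking vs training": {"sick": (0.80, 0.70), "unemployed": (0.75, 0.55)},
}

for study, cells in study_means.items():
    gaps = deservingness_gap(cells)
    ratio = gaps["sick"] / gaps["unemployed"]
    print(f"{study}: sick gap {gaps['sick']:.2f}, "
          f"unemployed gap {gaps['unemployed']:.2f}, ratio {ratio:.2f}")
```

On these rough numbers, the drop for sickness comes out at about half the drop for unemployment in both studies.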

“He deserves all that healthcare for looking so cool while smoking”

While there does seem to be evidence for sicknesses being cognitively tagged as more deserving of assistance than unemployment (there were also some association studies I won’t cover in detail), there is a recurrent point in the paper that I am hesitant about endorsing fully. The first mention of this point is found early on in the manuscript, and reads:

“Citizens appear to reason as if exposure to health problems is randomly distributed across social strata, not noting or caring that this is not, in fact, the case…we argue that the deservingness heuristic is built to automatically tag sickness-based needs as random events…”

A similar theme is mentioned later in the paper as well:

“Even using extremely well-tailored stimuli, we find that subjects are reluctant to accept explicit information that suggests that sick people are undeserving.”

In general I find the data they present to be fairly supportive of this idea, but I feel it could do with some additional precision. First and foremost, participants did use this information when determining deservingness. The dips might not have been as large as they were for unemployment (more on that later), but they were present. Second, participants were asked about helping one individual in particular. If sickness is truly being automatically tagged as randomly distributed, then deservingness factors should not be expected to come into play even when decisions involve trade-offs between the welfare of two individuals. In a simple case, a hospital could be faced with a dilemma in which two patients need a lung transplant, but only a single lung is available. These two patients are otherwise identical, except that one has lung cancer due to a long history of smoking, while the other has lung cancer due to a rare infection. If you were to ask people which patient should get the organ, a psychological system that treated all illness as approximately random should be indifferent between giving it to the smoker or the non-smoker. A similar analysis could be undertaken for trade-offs between spending on healthcare and non-healthcare items as well (such as making budget cuts to education or infrastructure in favor of healthcare).

Finally, there are two additional factors I would like to see explored by future research in this area. First, the costs of sickness and unemployment tend to be rather asymmetric in a number of ways: not only might sickness more often be life-threatening than unemployment (thus generating more need, which can swamp the effects of deservingness to some degree), but unemployment benefits might well need to be paid out over longer periods of time than medical ones (assuming sickness tends to be more transitory than unemployment). In fact, unemployment benefits might actively encourage people to remain unemployed, whereas medical benefits do not encourage people to remain sick. If these factors could somehow be held constant or removed, a different picture might begin to emerge. I could imagine deservingness information mattering more when a drug is required to alleviate discomfort, rather than to save a life. Second (though I don’t know to what extent this is likely to be relevant), the stimulus materials in this research all ask about whether the government ought to be providing aid to sick/unemployed people. It is possible that somewhat different responses would have been obtained if some measures had been taken of participants’ own willingness to provide that aid. After all, it is much less of a burden on me to insist that someone else ought to take care of a problem than to take care of it myself.

References: Jensen, C. & Petersen, M. (2016). The deservingness heuristic and the politics of health care. American Journal of Political Science, DOI: 10.1111/ajps.12251

Tooby, J. & Cosmides, L. (1996). Friendship and the banker’s paradox: Other pathways to the evolution of adaptations for altruism. Proceedings of the British Academy, 88, 119-143.

Absolute Vs Relative Mate Preferences

As the comedian Louis CK quipped some time ago, “Everything is amazing right now and nobody is happy.” In that instance he was referring to the massive technological improvements that have arisen in the fairly recent past and made our lives easier and more comfortable. Reflecting on the level of benefit that this technology has added to our lives (e.g., advanced medical treatments, the ability to communicate with people around the world in an instant, or to travel around the world in a matter of hours), it might feel kind of silly that we aren’t content with the world; this kind of lifestyle sure beats living in the wilderness in a constant contest to find food, ward off predators and parasites, and endure the elements. So why aren’t we happy all the time? There are many ways to answer this question, but I want to focus on one in particular: given our nature as a social species, much of our happiness is determined by relative factors. If everyone is fairly well off in the absolute sense, being well off yourself doesn’t help you when it comes to being selected as a friend, cooperative partner, or mate, because it doesn’t signal anything special about your value to others. What you are looking for in that context is not to be doing well on an absolute level, but to be doing better than others.

 If everyone has an iPhone, no one has an iPhone

To put this in a simple example, if you want to get picked for the basketball team, you’re looking to be taller than other people; increasing everyone’s height by 3 inches doesn’t uniquely benefit you, as your relative position and desirability have remained the same. On a related note, if you are doing well on some absolute metric but could be doing better, remaining content with your lot in life and forgoing those additional benefits is not the type of psychology one would predict to have proven adaptive. All else being equal, the male who is satisfied with a single mate and forgoes an additional one will be out-reproduced by the male who takes the second as well. Examples like these help to highlight the positional aspects of human satisfaction: even though our day-to-day lives are no doubt somewhat happier because people aren’t dying from smallpox and we have cell phones, people are often less happy than we might expect because so much of that happiness is not determined by one’s absolute state. Instead, our happiness is determined by our relative state: how well we could be doing relative to how we are currently doing, and how much we offer socially relative to others.

A similar logic was applied in a recent paper by Conroy-Beam, Goetz, & Buss (2016) that examined people’s relationship satisfaction. The researchers were interested in testing the hypothesis that relationship satisfaction is not primarily a matter of whether one’s partner meets some absolute threshold of matching one’s ideal preferences; instead, satisfaction is more likely to be a product of (a) whether more attractive alternative partners are available and (b) whether one is desirable enough to attract one of them. One might say that people are less concerned with how much they like their spouse and more concerned with whether they could get a better one: if people can move up in the dating world, their satisfaction with their current partner should be relatively low; if they can’t move up, they ought to be satisfied with what they already have. After all, it makes little sense to abandon your mate for not meeting your preferences if your other options are worse.

These hypotheses were tested in a rather elegant and unique way across three studies, all of which utilized a broadly-similar methodology (though I’ll only be discussing two). The core of each involved participants who were currently in relationships completing four measures: one concerning how important 27 traits would be in an ideal mate (on a 7-point scale), another concerning how well those same traits described their current partner, a third regarding how those traits described themselves, and finally rating their relationship satisfaction.

To determine how well a participant’s current partner fulfilled their preferences, the squared differences between the participant’s ideal and actual partner were summed across all 27 traits, and the square root of that sum was taken. This process generated a single number that gave a sense of how far an actual partner was from the ideal across a large number of traits: the larger this number, the worse the fit. The same calculation was then carried out using how all the other participants rated their partners on those traits. In other words, the authors calculated what percentage of other people’s actual mates fit each participant’s preferences better than that participant’s current partner. Finally, the authors calculated the discrepancy in mate value between the participant and their partner. This was done in a three-step process, the gist of which is that they calculated how well the participant and their partner each met the average ideals of the opposite sex. If you are closer to the average ideal partner of the opposite sex than your partner is, you have the higher mate value (i.e., are more desirable to others); if you are further away, you have the lower mate value.
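The three quantities can be sketched in Python to show how they fit together. This is a minimal illustration, not the authors' code: it uses three traits instead of 27, invented toy ratings on a 1-7 scale, equal trait weights, and a single-distance stand-in for the paper's three-step mate value procedure (mate value is taken here as the negated distance to the opposite sex's average ideal). All function and variable names are hypothetical, and the sign convention for the discrepancy (self minus partner) may differ from the paper's.

```python
import math
from typing import Sequence

Profile = Sequence[float]  # one rating per trait, every trait weighted equally


def distance(a: Profile, b: Profile) -> float:
    """Square root of the summed squared trait differences (Euclidean distance)."""
    if len(a) != len(b):
        raise ValueError("profiles must rate the same traits")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def share_of_better_fits(ideal: Profile, partner: Profile,
                         others: Sequence[Profile]) -> float:
    """Fraction of other people's partners who sit closer to this participant's ideal."""
    own_gap = distance(ideal, partner)
    better = sum(1 for other in others if distance(ideal, other) < own_gap)
    return better / len(others)


def mate_value(profile: Profile, opposite_sex_avg_ideal: Profile) -> float:
    """Simplified stand-in: the closer to the opposite sex's average ideal, the higher."""
    return -distance(profile, opposite_sex_avg_ideal)


def mate_value_discrepancy(me: Profile, partner: Profile,
                           ideal_held_by_partners_sex: Profile,
                           ideal_held_by_my_sex: Profile) -> float:
    """Positive when the participant is more desirable than their partner."""
    return (mate_value(me, ideal_held_by_partners_sex)
            - mate_value(partner, ideal_held_by_my_sex))


# Hypothetical toy ratings on three traits (1-7 scale).
my_ideal = [7, 6, 5]
my_partner = [5, 6, 4]
my_self_rating = [6, 5, 6]
other_partners = [[7, 6, 5], [6, 6, 5], [2, 3, 1], [4, 5, 3]]
ideal_held_by_partners_sex = [6, 6, 6]
ideal_held_by_my_sex = [7, 6, 6]

fit = distance(my_ideal, my_partner)
alternatives = share_of_better_fits(my_ideal, my_partner, other_partners)
discrepancy = mate_value_discrepancy(my_self_rating, my_partner,
                                     ideal_held_by_partners_sex,
                                     ideal_held_by_my_sex)

print(f"distance from ideal: {fit:.3f}")             # 2.236
print(f"share of better fits: {alternatives:.2f}")    # 0.50
print(f"mate value discrepancy: {discrepancy:.3f}")   # 1.828
```

In this toy case, half of the other partners fit the participant's ideal better than their own partner does, and the participant is the more desirable member of the couple: the combination the authors found associated with lower satisfaction.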

 It’s just that simple!

Setting the mathematical complexity aside, three values were calculated. Assuming you were taking the survey, they would correspond to (1) how well your actual partner matched your ideal, (2) what percentage of possible real mates out in the world are better overall fits, and (3) how much more or less desirable you are to others, relative to your partner. These values were then plugged into a regression predicting relationship satisfaction. As it turned out, in the first study (N = 260), the first value (how well one’s partner matched their ideal) barely predicted relationship satisfaction at all (β = .06); by contrast, the share of other potential partners who might make better fits was a much stronger predictor (β = -.53), and the difference in relative mate value between the participant and their partner also predicted satisfaction (β = .11). There was also an interaction between these latter two values (β = .21). As the authors summarized these results:

“Participants lower in mate value than their partners were generally satisfied regardless of the pool of potential mates; participants higher in mate value than their partners became increasingly dissatisfied with their relationships as better alternative partners became available”

So, if your partner is already more attractive than you, then you probably consider yourself pretty lucky. Even if there are a great number of better possible partners out there for you, you’re not likely to be able to attract them (you got lucky once dating up; better to not try your luck a second time). By contrast, if you are more attractive than your partner, then it might make sense to start looking around for better options. If few alternatives exist, you might want to stick around; if many do, then switching might be beneficial.

The second study addressed the point that partners in these relationships are not passive bystanders when it comes to being dumped; they’re wary of the possibility of their partner seeking greener pastures. For instance, if you understand that your partner is more attractive than you, you likely also understand (at least intuitively) that they might try to find someone who suits them better than you do (because they have that option). If you view being dumped as a bad thing (perhaps because you can’t do better than your current partner), you might try to do more to keep them around. Translating that into a survey, Conroy-Beam et al (2016) asked participants to indicate how often they had engaged in 38 mate retention tactics over the course of the past year. These cover a broad range of behaviors, such as calling to check up on one’s partner, asking to deepen commitment, derogating potential alternative mates, buying gifts, or performing sexual favors, among others. Participants also filled out the mate preference measures as before.

The results from the first study regarding satisfaction were replicated. Additionally, as expected, there was a positive relationship between these retention behaviors and relationship satisfaction (β = .20): the more satisfied one was with their partner, the more they behaved in ways that might help keep that partner around. There was also a negative relationship between trust and these mate retention behaviors (β = -.38): the less one trusted their partner, the more they behaved in ways that might discourage that partner from leaving. While that might sound strange at first (why encourage someone you don’t trust to stick around?), it is fairly easy to understand to the extent that perceptions of partner trust are intuitively tracking the probability that your partner can do better than you: it’s easier to trust someone who doesn’t have alternatives than it is to trust someone who might be tempted.

It’s much easier to avoid sinning when you don’t live around an orchard

Overall, I found this research an ingenious way to examine relationship satisfaction and partner fit across a wide range of different traits. There are, of course, some shortcomings to the paper which the authors do mention, including the fact that all the traits were given equal weighting (meaning that the fit for “intelligent” would be rated as being as important as the fit for “dominant” when determining how well your partner suited you) and the pool of potential mates was not considered in the context of a local sample (that is, it matters less if people across the country fit your ideal better than your current mate, relative to if people in your immediate vicinity do). However, given the fairly universal features of human mating psychology and the strength of the obtained results, these do not strike me as fatal to the design in any way; if anything, they raise the prospect that the predictive strength of this approach could actually be improved by tailoring it to specific populations.

References: Conroy-Beam, D., Goetz, C., & Buss, D. (2016). What predicts romantic relationship satisfaction and mate retention intensity: mate preference fulfillment or mate value discrepancies? Evolution & Human Behavior, DOI:

Psychology Research And Advocacy

I get the sense that many people get a degree in psychology because they’re looking to help others (since most clearly aren’t doing it for the pay). For those who get a degree in the clinical side of the field, this observation seems easy to make; at the very least, I don’t know of any counselors or therapists who seek to make their clients feel worse about the state their life is in and keep them there. For those who become involved in the research end of psychology, I believe this desire to help others is still a major motivator. Rather than trying to help specific clients, however, many psychological researchers are driven by a motivation to help particular groups in society: women, certain racial groups, the sexually promiscuous, the outliers, the politically liberal, or any group that the researcher believes to be unfairly marginalized, undervalued, or maligned. Their work is driven by a desire to show that the particular group in question has been misjudged by others, with those doing the misjudging being biased and, importantly, wrong. In other words, their role as a researcher is often driven by their role as an advocate, and the quality of their work and thinking can often take a back seat to their social goals.

When megaphones fail, try using research to make yourself louder

Two such examples are highlighted in a recent paper by Eagly (2016), both of which can broadly be considered to focus on the topic of diversity in the workplace. I want to summarize them quickly before turning to some of the other facets of the paper I find noteworthy. The first case concerns the prospect that having more women on corporate boards tends to increase their profitability, a point driven by a finding that Fortune 500 companies in the top quarter of female representation on boards of directors performed better than those in the bottom quarter of representation. Eagly (2016) rightly notes that such a basic data set would be all but unpublishable in academia for failing to do a lot of important things. Indeed, when more sophisticated research was considered in a meta-analysis of 140 studies, the gender diversity of the board of directors had about as close to no effect as possible on financial outcomes: the average correlations across all the studies ranged from about r = .01 all the way up to r = .05 depending on what measures were considered. Gender diversity per se seemed to have no meaningful effect despite a variety of advocacy sources claiming that increasing female representation would provide financial benefits. Rather than considering the full scope of the research, the advocates tended to cite only the most simplistic analyses that provided the conclusion they wanted (others) to hear.

The second area of research concerned how demographic diversity in work groups can affect performance. Diversity is often assumed to be a positive force for improving outcomes, since a more cognitively varied group of people can bring a greater number of skills and perspectives to bear on solving tasks than a more homogeneous group can. As it turns out, however, another meta-analysis of 146 studies concluded that demographic diversity (both in terms of gender and racial makeup) had effectively no impact on performance outcomes: the correlation for gender was r = -.01 and was r = -.05 for racial diversity. By contrast, differences in skill sets and knowledge had a positive, but still very small, effect (r = .05). In summary, findings like these suggest that groups don’t get better at solving problems just because they’re made up of enough [men/women/Blacks/Whites/Asians/etc]. Diversity in demographics per se, unsurprisingly, doesn’t help to magically solve complex problems.
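To put these correlations in perspective, squaring r gives the proportion of variance in one measure that is linearly accounted for by the other. The arithmetic below simply applies that standard conversion to the values reported in the two meta-analyses; the sign of r does not matter once it is squared:

```latex
\begin{aligned}
|r| = .01 &\;\Rightarrow\; r^2 = .0001 \;\;(0.01\%\ \text{of variance}) \\
|r| = .05 &\;\Rightarrow\; r^2 = .0025 \;\;(0.25\%\ \text{of variance})
\end{aligned}
```

Even the largest of these correlations leaves more than 99% of the variance in outcomes to be explained by something else.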

While Eagly (2016) appears to be generally condemning the role of advocacy in research when it comes to getting things right (a laudable position), some passages in the paper caught my eye. The first concerns what advocates for causes should do when the research, taken as a whole, doesn’t exactly agree with their preferred stance. Here, Eagly (2016) focuses on the diversity research that did not show good evidence for diverse groups leading to positive outcomes. The first route one might take is simply to misrepresent the state of the research, which is obviously a bad idea. Instead, Eagly suggests advocates take one of two alternative routes. The first is to conduct research into the more specific conditions under which diversity (or whatever one’s preferred topic is) might be a good thing. This is an interesting suggestion to evaluate. On the one hand, many people would be inclined to call it a good idea: in some particular contexts diversity might be a good thing, even if it’s not always, or even generally, useful, and this wouldn’t be the first time effects in psychology have been found to be context-dependent. On the other hand, this suggestion also runs a serious risk of inflating type 1 errors (false positives). Specifically, if you keep slicing up data and looking at the issue in a number of different contexts, you will eventually uncover positive results even if they’re just due to chance. Repeated subgroup or subcontext analysis doesn’t sound much different from the questionable statistical practices currently being blamed for psychology’s replication problem: just keep conducting research and only report the parts of it that happened to work, or keep massaging the data until the right conclusion falls out.
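The size of that risk is easy to see with a little arithmetic. If every slice of the data is tested at α = .05 and there is truly no effect anywhere, the chance that at least one of k slices comes up “significant” is 1 - (1 - α)^k. The sketch below computes that figure and checks it with a simulation. It assumes the k tests are independent, which real subgroup analyses usually are not, so treat it as an illustration of the trend rather than an exact rate; the function names are purely illustrative.

```python
import random


def familywise_error(alpha: float, k: int) -> float:
    """Chance of at least one false positive across k independent tests
    when every null hypothesis is true: 1 - (1 - alpha) ** k."""
    return 1 - (1 - alpha) ** k


def simulate_familywise(alpha: float, k: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo check. Under a true null, a valid p-value is uniform on [0, 1],
    so each of the k (assumed independent) slices is 'significant' with probability alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # The whole study counts as a "finding" if any single slice crosses the threshold.
        if any(rng.random() < alpha for _ in range(k)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(f"{k:>2} slices: exact {familywise_error(0.05, k):.3f}, "
              f"simulated {simulate_familywise(0.05, k, 20_000):.3f}")
```

With 20 independent slices the exact figure is about 64%, so a researcher who keeps slicing is more likely than not to find something even when there is nothing to find.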

“…the rest goes in the dumpster out back”

Eagly’s second suggestion I find a bit more worrisome: arguing that relevant factors – like increases in profits, productivity, or finding better solutions – aren’t actually all that relevant when it comes to justifying why companies should increase diversity. What I find odd about this is that it seems to suggest advocates begin with their conclusion (in this case, that diversity in the workforce ought to be increased) and then just keep looking for ways to justify it in spite of previous failures to do so. Again, while it is possible that there are benefits to diversity which aren’t yet being considered in the literature, bad research would likely result from a process where someone starts with the conclusion and keeps going until they can justify it to others, no matter how often that requires shifting the goal posts. A major problem with that suggestion mirrors other aspects of the questionable research practices I mentioned before: when researchers find the conclusion they’re looking for, they stop looking. They only collect data up until the point it is useful, which rigs the system in favor of finding positive results where there are none. That could well mean, then, that there will be negative consequences to these diversity policies which are not being considered.
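The “stop once it works” problem (often called optional stopping) can be simulated directly. In the sketch below there is no true effect at all: every observation is drawn from a normal distribution with mean 0 and a known standard deviation of 1, and a hypothetical researcher runs a z-test after every batch of observations, stopping as soon as p < .05. The batch size, sample sizes, and function names are arbitrary choices made for illustration.

```python
import math
import random


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(peek_every: int, max_n: int, trials: int,
                        alpha: float = 0.05, seed: int = 1) -> float:
    """Share of simulated studies declared 'significant' when the true effect is zero.

    Data are N(0, 1) with known SD, tested with a one-sample z-test. The hypothetical
    researcher checks after every `peek_every` observations and stops as soon as
    p < alpha; otherwise data collection ends at `max_n`.
    """
    if max_n % peek_every != 0:
        raise ValueError("max_n must be a multiple of peek_every so the final look happens")
    rng = random.Random(seed)
    significant = 0
    for _ in range(trials):
        total = 0.0
        for n in range(1, max_n + 1):
            total += rng.gauss(0.0, 1.0)
            if n % peek_every == 0:
                z = total / math.sqrt(n)  # sample mean divided by its standard error, 1 / sqrt(n)
                if two_sided_p(z) < alpha:
                    significant += 1
                    break  # stop collecting: the "right" answer has arrived
    return significant / trials


if __name__ == "__main__":
    print("one look at n = 100:   ", false_positive_rate(peek_every=100, max_n=100, trials=5000))
    print("a look every 10 cases: ", false_positive_rate(peek_every=10, max_n=100, trials=5000))
```

Looking once at the end should give a false positive rate near the nominal 5%; looking after every 10 observations gives a noticeably higher rate, even though nothing real is there to find.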

What I think is a good example of this justification problem leading to shoddy research practices and interpretation follows shortly thereafter. In talking about some of the alternative benefits that more female hires might have, Eagly (2016) notes that women tend to be more compassionate and egalitarian than men; as such, hiring more women should be expected to bring less-considered benefits, such as fewer layoffs of employees during economic downturns (referred to as labor hoarding) or more favorable policies towards time off for family care. Now, something like this should be expected: if you have different people making the decisions, different decisions will be made. Setting aside for the moment the question of whether those different policies are better in some objective sense of the word, if one is interested in encouraging those outcomes (that is, they’re preferred by the advocate), then one might wish to address those issues directly rather than by proxy. That is to say, if you are looking to make the leadership of some company more compassionate, then it makes sense to test for and hire more compassionate people, rather than hiring more women under the assumption that doing so will increase compassion.

This is an important matter because people are not perfect statistical representations of the groups to which they belong. On average, women may be more compassionate than men; however, the type of woman who is interested in actively pursuing a CEO position in a Fortune 500 company might not be as compassionate as the average woman and might, in fact, be less compassionate than a particular male candidate. What Eagly (2016) has ended up with, then, is not a justification for hiring more women; it’s a justification for hiring compassionate or egalitarian people. What is conspicuously absent from this section is a call for more research on contexts in which men might be more compassionate than women; once the conclusion that hiring women is a good thing has been justified (in the advocate’s mind, anyway), the concern for more information seems to sputter out. It should go without saying, but such a course of action wouldn’t be expected to lead to the most accurate scientific understanding of our world.

The solution to that problem being more diversity, of course.

To put this point in another quick example, if you’re looking to assemble a group of tall people, it would be better to use people’s height when making that decision rather than their sex, even if men do tend to be taller than women. Some advocates might suggest that being male is a good enough proxy for height, so you should favor male candidates; others would suggest that you shouldn’t be trying to assemble a group of tall people in the first place, as short people offer benefits that tall ones don’t; others still will argue that it doesn’t matter whether short people offer benefits, as they should be preferentially selected (at the expense of tall candidates) to combat negative attitudes towards the short regardless. For what it’s worth, I find the attitude of “keep doing research until you justify your predetermined conclusion” to be unproductive and indicative of why the relationship between advocates and researchers ought not be a close one. Advocacy can only serve as a cognitive constraint that decreases research quality, as the goal of advocacy is decidedly not truth. Advocates should update their conclusions in light of the research, not vice versa.
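The height example can be made concrete with a small simulation. Two hypothetical groups differ in average height by one standard deviation (an arbitrary value, not an estimate of the real sex difference), and we compare picking the k tallest people directly against picking k people at random from the taller-on-average group. Selecting on the trait itself can never produce a lower average than any other way of choosing k people, and it does not exclude tall members of the shorter-on-average group. All names and parameters here are illustrative assumptions.

```python
import random
import statistics


def make_population(n_per_group, gap, seed=3):
    """Two hypothetical groups ("A" and "B") with normally distributed heights
    in SD units (SD 1); group A's mean is `gap` standard deviations above group B's."""
    rng = random.Random(seed)
    people = [("A", rng.gauss(gap, 1.0)) for _ in range(n_per_group)]
    people += [("B", rng.gauss(0.0, 1.0)) for _ in range(n_per_group)]
    return people


def select_by_trait(people, k):
    """Pick the k tallest people, ignoring group membership."""
    return sorted(people, key=lambda person: person[1], reverse=True)[:k]


def select_by_proxy(people, k, group, seed=4):
    """Pick k people at random from one group, using membership as a proxy for height."""
    rng = random.Random(seed)
    members = [person for person in people if person[0] == group]
    return rng.sample(members, k)


def mean_height(selected):
    return statistics.mean(person[1] for person in selected)


if __name__ == "__main__":
    people = make_population(n_per_group=1000, gap=1.0)
    direct = select_by_trait(people, k=50)
    proxy = select_by_proxy(people, k=50, group="A")
    print(f"selected on height:     mean {mean_height(direct):.2f}")
    print(f"selected on group only: mean {mean_height(proxy):.2f}")
    print("group B members picked on height:",
          sum(1 for person in direct if person[0] == "B"))
```

The same logic carries over to compassion: if compassion is what you want, measuring it directly beats inferring it from group membership.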

References: Eagly, A. (2016). When passionate advocates meet research on diversity, does the honest broker stand a chance? Journal of Social Issues, 72, 199-222.