Intergenerational Epigenetics And You

Today I wanted to cover a theoretical matter I’ve discussed before but apparently not on this site: the idea of epigenetic intergenerational transmission. In brief, epigenetics refers to chemical markers attached to your DNA that regulate how it is expressed without changing the DNA itself. You could imagine your DNA as a book full of information, a copy of which is contained in each cell of your body. However, not every cell expresses the full genome; each cell only expresses part of it (which is why skin cells are different from muscle cells, for instance). The epigenetic portion, then, could be thought of as black tape placed over certain passages in the book so they are not read. As this tape is added or removed by environmental influences, different portions of the DNA become active. From what I understand about how this works (which is admittedly very little at this juncture), these markers are usually not passed on to offspring from parents. The life experiences of your parents, in other words, will not be passed on to you via epigenetics. However, some people have lately been hypothesizing not only that these changes are occasionally (perhaps regularly?) passed on from parents to offspring, but also that they might be passed on in an adaptive fashion. In short, organisms might adapt to their environment not just through genetic factors, but also through epigenetic ones.

Who would have guessed Lamarckian evolution was still alive?

One of the examples given in the target article on the subject concerns periods of feast and famine. While rare in most first-world nations these days, these events were probably recurrent features of our evolutionary history. The example there involves the following context: during some years in early 1900s Sweden food was abundant, while during other years it was scarce. Boys who were hitting puberty just at the time of a feast season tended to have grandchildren who died six years earlier than the grandchildren of boys who had experienced a famine season during the same developmental window; the causes of death, we are told, often involved diabetes. Another case involves the children of smokers: men who started smoking right before puberty tended to have children who were fatter, on average, than the children of men who smoked habitually but didn’t start until after puberty. The speculation, in this case, is that development was in some way permanently affected by food availability (or smoking) during a critical window, and those developmental changes were passed on to their sons and the sons of their sons.

As I read about these examples, a few things stuck out to me as rather strange. First, it seems odd that no mention was made of daughters or granddaughters in the smoking case, whereas in the food example there wasn’t any mention of the in-between male generation (they only mentioned grandfathers and grandsons there, not fathers). Perhaps there’s more to the data than is let on but – in the event that no effects were found for fathers or daughters of any kind – it is also possible that a single data set was sliced up into a number of different pieces until the researchers found something worth talking about (e.g., didn’t find an effect in general? Try breaking the data down by gender and testing again). That might or might not be the case here, but as we’ve learned from the replication troubles in psychology, one way of increasing your false-positive rate is to divide your sample into a number of different subgroups. For the sake of this post, I’m going to assume that is not the case and treat the data as representing something real, rather than a statistical fluke.
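To make the subgroup-slicing concern concrete, here is a minimal sketch in Python of how testing one dataset several ways inflates the false-positive rate. The `familywise_rate` function is my own hypothetical helper, and it assumes the subgroup tests are independent, which is only approximately true in practice:

```python
# How slicing one dataset into subgroups inflates false positives.
# If each test uses alpha = 0.05 and the null is true everywhere,
# the chance of at least one "significant" result grows with the
# number of tests run. (Assumes independent tests - a simplification.)

def familywise_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n_tests tests."""
    return 1 - (1 - alpha) ** n_tests

for k in (1, 2, 4, 8):
    print(f"{k} test(s): {familywise_rate(0.05, k):.1%} chance of a false positive")
```

Under those assumptions, four subgroup tests already push the false-positive rate from 5% to roughly 19%, which is why finding an effect only after splitting by sex and generation should lower our confidence in it.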

Assuming this isn’t just a false positive, there are two issues with the examples as I see them. I’m going to focus predominantly on the food example to highlight these issues: first, passing on such epigenetic changes seems maladaptive and, second, the story behind it seems implausible. Let’s take the issues in turn.

To understand why this kind of intergenerational epigenetic transmission seems maladaptive, consider two hypothetical children born one year apart (in, say, 1900 and 1901). When the first child’s father was hitting puberty, a temporary famine was taking place and food was scarce; by the time of the second child, the famine had passed and food was abundant. According to the logic laid out, we should expect that (a) both children will have their genetic expression altered by the epigenetic markers passed down by their parents, affecting their long-term development, and (b) the children will, in turn, pass those markers on to their own children, and their children’s children (and so on).

The big Thanksgiving dinner that gave your grandson diabetes

The problems here should become apparent quickly enough. First, let’s begin by assuming these epigenetic changes are adaptive: they are passed on because they are reproductively useful in helping a child develop appropriately. Specifically, a famine or feast at or around the time of puberty would need to be a reliable cue to the type of environment one’s children could expect to encounter. If a child is going to face shortages of food, they might want to develop in a different manner than if food is expected to be abundant.

Now that sounds well and good, but in our example these two children were born just a year apart and, as such, should be expected to face (broadly) the same environment, at least with respect to food availability (since feasts and famines tend to be more global). Clearly, if the children were adopting different developmental plans in response to that feast or famine, both plans (plan A affected by the famine and plan B not so affected) cannot be adaptive. Specifically, if this epigenetic inheritance is trying to anticipate children’s future conditions from those present around the time of their father’s puberty, at least one of the children’s developmental plans will be anticipating the wrong set of conditions. Indeed, both developmental plans could be wrong, and conditions could look different than either anticipated. Trying to anticipate the conditions one will encounter over a lifespan (and over one’s children’s and grandchildren’s lifespans) using only information from the brief window of time around puberty seems like a plan doomed to failure, or at least to suboptimal results.

A second problem arises because these changes are hypothesized to be intergenerational: capable of transmission across multiple generations. If that is the case, why on Earth would the researchers in this study pay any mind to the conditions the grandparents were facing around the time of puberty per se? Shouldn’t we be more concerned with the conditions faced a number of generations back, rather than the more immediate ones? To phrase this as a chicken-and-egg problem: shouldn’t the grandparents in question have inherited epigenetic markers of their own from their grandparents, and so on down the line? If that were the case, the conditions they faced around their puberty would either be irrelevant (because they had already inherited such markers from their own parents) or would have altered the epigenetic markers as well.

If we opt for the former possibility, then studying the grandparents’ puberty conditions shouldn’t be too impactful. However, if we opt for the latter possibility, we are again left in a bit of a theoretical bind: if the conditions faced by the grandparents altered their epigenetic markers, shouldn’t those same markers also have been altered by the parents’ experiences, and by their grandsons’ experiences as well? If they are being altered by the environment each generation, then they are poor candidates for intergenerational transmission (just as DNA that was constantly mutating would be). There is our dilemma, then: if epigenetic markers change across one’s lifespan, they are unlikely candidates for transmission between generations; if epigenetic changes can be passed down across generations stably, why look at the specific period around the grandparents’ puberty? Shouldn’t we be concerned with their grandparents, and so on down the line?

“Oh no you don’t; you’re not pinning this one all on me”

Now, to be clear, a famine around the time of conception could affect development in other, more mundane ways. If a child isn’t receiving adequate nutrition while they are growing, then it is likely certain parts of their developing body will not grow as they otherwise would. When you don’t have enough calories to support your full development, trade-offs need to be made, just as when you don’t have enough money to buy everything you want at the store, you have to pass up some items to afford others. Those kinds of developmental outcomes can certainly have downstream effects on future generations through behavior, but they don’t seem like the kind of changes that could be passed on the way genetic material is. The same can be said of the smoking example: people who smoked during critical developmental windows could damage their own development, which in turn impacts the quality of the offspring they produce, but that’s not like genetic transmission at all. It would be no more surprising than finding out that parents exposed to radioactive waste tend to have children of a different quality than those not so exposed.

To the extent that these intergenerational changes are real and not just statistical oddities, it doesn’t seem likely that they could be adaptive; they would instead likely reflect developmental errors. Basically, the matter comes down to the following question: are the environmental conditions surrounding a particular developmental window good indicators of future conditions, to the point that you’d want to not only focus your own development around them, but also the development of your children and their children in turn? To me, the answer seems like a resounding, “No, and that seems like a prime example of developmental rigidity, rather than plasticity.” Such a plan would not allow offspring to meet the demands of their unique environments particularly well. I’m not hopeful that this kind of thinking will lead to any revolutions in evolutionary theory, but I’m always willing to be proven wrong if the right data come up.

What Might Research Ethics Teach Us About Effect Size?

Imagine for a moment that you’re in charge of overseeing medical research approval for ethical concerns. One day, a researcher approaches you with the following proposal: they are interested in testing whether a foodstuff that some portion of the population occasionally consumes for fun – spicy chilies, say – is actually quite toxic. They think that eating even small doses of this compound will cause mental disturbances in the short term – like paranoia and suicidal thoughts – and might even cause those negative changes permanently in the long term. As such, they intend to test their hypothesis by bringing otherwise-healthy participants into the lab, providing them with a dose of the possibly-toxic compound (either just once or several times over the course of a few days), and then seeing whether they observe any negative effects. What would your verdict on the ethical acceptability of this research be? If I had to guess, I suspect that many people would not allow the research to be conducted, because one of the major tenets of research ethics is that harm should not befall your participants, except when absolutely necessary. In fact, I suspect that were you the researcher – rather than the person overseeing the research – you probably wouldn’t even propose the project in the first place, because you might have some reservations about possibly poisoning people, either harming them directly and/or those around them indirectly.

“We’re curious if they make you a danger to yourself and others. Try some”

With that in mind, I want to examine a few other research hypotheses I have heard about over the years. The first of these is the idea that exposing men to pornography will cause a number of harmful consequences, such as increasing how appealing rape fantasies are, bolstering the belief that women would enjoy being raped, and decreasing the perceived seriousness of violence against women (as reviewed by Fisher et al., 2013). Presumably, the effect on those beliefs over time is serious, as it might lead to real-life behavior on the part of men to rape women or approve of such acts on the parts of others. Other, less serious harms have also been proposed, such as the possibility that exposure to pornography might have harmful effects on the viewer’s relationship, reducing their commitment and making it more likely that they would do things like cheat on or abandon their partner. Now, if a researcher earnestly believed they would find such effects, that the effects would be appreciable in size to the point of being meaningful (i.e., large enough to be reliably detected by statistical tests in relatively small samples), and that their implications could be long-term in nature, could this researcher even ethically test such issues? Would it be ethically acceptable to bring people into the lab, randomly expose them to this kind of (in a manner of speaking) psychologically-toxic material, observe the negative effects, and then just let them go?

Let’s move on to another hypothesis that I’ve been talking a lot about lately: the effects of violent media on real-life aggression. Now, I’ve been specifically talking about video game violence, but people have worried about violent themes in the context of TV, movies, comic books, and even music. Specifically, there are many researchers who believe that exposure to media violence will cause people to become more aggressive by making them perceive more hostility in the world, view violence as a more acceptable means of solving problems, or see violence as more rewarding. Again, presumably, changing these perceptions is thought to cause the harm of eventual, meaningful increases in real-life violence. Now, if a researcher earnestly believed they would find such effects, that the effects would be appreciable in size to the point of being meaningful, and that their implications could be long-term in nature, could this researcher even ethically test such issues? Would it be ethically acceptable to bring people into the lab, randomly expose them to this kind of (in a manner of speaking) psychologically-toxic material, observe the negative effects, and then just let them go?

Though I didn’t think much of it at first, the criticisms I read about the classic Bobo doll experiment are actually kind of interesting in this regard. In particular, researchers were purposefully exposing young children to models of aggression, the hope being that the children would come to view violence as acceptable and engage in it themselves. The reason I didn’t pay it much mind is that I didn’t view the experiment as causing any kind of meaningful, real-world, or lasting effects on the children’s aggression; I don’t think mere exposure to such behavior will have meaningful impacts. But if one truly believed that it would, I can see why that might cause some degree of ethical concern.

Since I’ve been talking about brief exposure, one might also worry about what would happen if researchers were to expose participants to such material – pornographic or violent – for weeks, months, or even years on end. Imagine a study that asked people to smoke for 20 years to test the negative effects in humans; that’s probably not getting past the IRB. As a worthy aside on that point, though, it’s worth noting that as pornography has become more widely available, rates of sexual offending have gone down (Fisher et al., 2013); as violent video games have become more available, rates of youth violent crime have gone down too (Ferguson & Kilburn, 2010). Admittedly, it is possible that such declines would be even steeper if such media weren’t in the picture, but the effects of this media – if they cause violence at all – are clearly not large enough to reverse those trends.

I would have been violent, but then this art convinced me otherwise

So what are we to make of the fact that this research was proposed, approved, and conducted? There are a few possibilities to kick around. The first is that the research was proposed because the researchers themselves don’t give much thought to the ethical concerns, happy enough if it means they get a publication out of it regardless of the consequences – but that wouldn’t explain why it got approved by bodies like IRBs. It is also possible that the researchers and those who approve the work believe it to be harmful, but view the benefits of such research as outstripping the costs, working under the assumption that once the harmful effects are established, further regulation of such products might follow, ultimately reducing the prevalence or use of such media (not unlike the warnings and restrictions placed on the sale of cigarettes). Since any declines in availability or censorship of such media have yet to manifest – especially given how access to the internet provides means of circumventing bans on the circulation of information – whatever practical benefits might have arisen from this research are hard to see (again, assuming that things like censorship would yield benefits at all).

There is another aspect to consider as well: during discussions of this research outside of academia – such as on social media – I have not noted a great deal of outrage expressed by consumers of these findings. Anecdotal as this is, when people discuss such research, they do not appear to be raising the concern that the research itself was unethical to conduct because it will do harm to people’s relationships or to women more generally (in the case of pornography), or because it will result in making people more violent and accepting of violence (in the video game studies). Perhaps those concerns exist en masse and I just haven’t seen them yet (always possible), but I see another possibility: people don’t really believe that the participants are being harmed in this case. People generally aren’t afraid that the participants in those experiments will dissolve their relationships or come to think rape is acceptable because they were exposed to pornography, or will get into fights because they played 20 minutes of a video game. In other words, they don’t think those negative effects are particularly large, if they even really believe they exist at all. While this point is a rather implicit one, the lack of consistent moral outrage expressed over the ethics of this kind of research does speak to the matter of how serious these effects are perceived to be: at least in the short term, not very.

What I find very curious about these ideas – pornography causes rape, video games cause violence, and their ilk – is that they all seem to share a certain assumption: that people are effectively acted upon by information, placing human psychology in a distinctly passive role while information takes the active one. Indeed, in many respects, this kind of research strikes me as remarkably similar in its underlying assumptions to the research on stereotype threat: the idea that you can, say, make women worse at math by telling them men tend to do better at it. All of these theories seem to posit a very exploitable human psychology capable of being readily manipulated by information, rather than a psychology which interacts with, evaluates, and transforms the information it receives.

For instance, a psychology capable of distinguishing between reality and fantasy can play a video game without thinking it is being threatened physically, just like it can watch pornography (or, indeed, any videos) without actually believing the people depicted are present in the room with them. Now clearly some part of our psychology does treat pornography as an opportunity to mate (else there would be no sexual arousal generated in response to it), but that part does not necessarily govern other behaviors (generating arousal is biologically cheap; aggressing against someone else is not). The adaptive nature of a behavior depends on context.

Early hypotheses of the visual-arousal link were less successful empirically

As such, expecting something like a depiction of violence to translate consistently into some general perception that violence is acceptable and useful in all sorts of interactions throughout life is inappropriate. Learning that you can beat up someone weaker than you doesn’t mean it’s suddenly advisable to challenge someone stronger than you; relatedly, seeing a depiction of people who are not you (or your future opponent) fighting shouldn’t make it advisable for you to change your behavior either. Whatever the effects of this media, they will ultimately be assessed and manipulated internally by psychological mechanisms and tested against reality, rather than simply accepted as useful and universally applied.

I have seen similar thinking about information manipulating people another time as well: during discussions of memes. Memes are posited to be similar to infectious agents that will reproduce themselves at the expense of their host’s fitness; information that literally hijacks people’s minds for its own reproductive benefits. I haven’t seen much in the way of productive and successful research flowing from that school of thought quite yet – which might be a sign of its effectiveness and accuracy – but maybe I’m just still in the dark there. 

References: Ferguson, C. & Kilburn, J. (2010). Much ado about nothing: The misestimation and overinterpretation of violent video game effects in eastern and western nations: Comment on Anderson et al. (2010). Psychological Bulletin, 136, 174-178.

Fisher, W., Kohut, T., Di Gioacchino, L., & Fedoroff, P. (2013). Pornography, sex crime, and paraphilia. Current Psychiatry Reports, 15, 362.

Getting To Know Your Outliers: More About Video Games

As I mentioned in my last post, I’m a big fan of games. For the last couple of years, the game which has held the majority of my attention has been a digital card game. In this game, people have the ability to design decks with different strategies, and the success of your strategy will depend on the strategy of your opponent; you can think of it as a more complicated rock-paper-scissors dynamic. The players are often interested in understanding how well certain strategies match up against others, so, for the sake of figuring that out, some have taken it upon themselves to collect data from players to answer those questions. You don’t need to know much about the game to understand the example I’m about to discuss, but let’s just consider two decks: deck A and deck B. Those collecting the data managed to aggregate the outcomes of approximately 2,200 matches between the two and found that, overall, deck A was favored to win the match 55% of the time. This should be some pretty convincing data when it comes to getting a sense for how things generally work out, given the large sample size.

Only about 466 more games to Legend with that win rate
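As a rough illustration of why that sample size matters, here is a normal-approximation confidence interval for a binomial proportion (a standard textbook formula, sketched in Python; the 55% and 2,200 figures are the ones from the example above):

```python
import math

# 95% confidence interval for a win rate of 55% over ~2,200 games,
# using the normal approximation to the binomial distribution.
p, n = 0.55, 2200
se = math.sqrt(p * (1 - p) / n)            # standard error of the proportion
low, high = p - 1.96 * se, p + 1.96 * se   # 95% interval
print(f"95% CI: {low:.3f} to {high:.3f}")  # roughly 0.529 to 0.571
```

With that much data, the overall 55% figure is unlikely to be off by more than a couple of percentage points, so a single player sitting at 70-90% over large samples of his own games really is an outlier rather than noise.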

However, this data will only be as useful to us as our ability to correctly interpret it. A 55% success rate captures the average performance, but there is at least one well-known outlier player in that matchup. This individual manages to consistently perform at a substantially higher level than average, winning that same matchup around 70-90% of the time across large sample sizes. What are we to make of that particular data point? How should it affect our interpretation of the matchup? One possible interpretation is that his massively positive success rate is simply due to variance and, given enough games, his win rate should be expected to drop. It hasn’t yet, as far as I know. Another possible explanation is that this player is particularly good relative to his opponents, and that factor of general skill explains the difference. In much the same way, an absolutely weak 15-year-old might look pretty strong if you put him in a boxing match against a young child. However, the way the game is set up, you can be assured that he will be matched against people of (relatively) equal skill, and that difference shouldn’t account for such a large disparity.

A third interpretation – one which I find more appealing, given my deep experience with the game – is that skill matters, but in a different way. Specifically, deck A is more difficult to play correctly than deck B: it’s easier to make meaningful mistakes, and you usually have a greater number of options available to you. As such, if you give two players of average skill decks A and B, you might observe the 55% win rate initially cited. On the other hand, if you give an expert player both decks (one who understands the matchup as well as possible), you might see something closer to the 80% figure. Expertise matters for one deck a lot more than for the other. Depending on how you want to interpret the data, then, you’ll end up with two conclusions that are quite different: either the match is almost even, or the match is heavily lopsided. I bring this example up because it can tell us something very important about outliers: data points that are, in some way, quite unusual. Sometimes these data points can be flukes, worth disregarding if we want to learn about how relationships in the world tend to work; other times, however, these outliers can provide valuable and novel insights that re-contextualize the way we look at vast swaths of other data points. It all hinges on why that data point is an outlier.

This point bears on some reactions I received to my last post, about a fairly new study which found no relationship between violent content in video games and subsequent measures of aggression once you account for the difficulty of a game (or, perhaps more precisely, the ability of a game to impede people’s feelings of competence). Glossing the results into a single sentence, the general finding is that the frustration induced by a game, but not violent content per se, predicts short-term changes in aggression (the gaming community tends to agree with such a conclusion, for whatever that’s worth). In conducting this research, the authors hoped to address what they perceived to be a shortcoming in the literature: many previous studies had participants play either violent or non-violent games, but they usually achieved this by having them play entirely different games. This means that while violent content did vary between conditions, so too could a number of other factors, and the presence of those other factors poses some confounds in interpreting the data. Since more than violence varied, any subsequent changes in aggression are not necessarily attributable to violent content per se.

Other causes include being out $60 for a new controller

The study I wrote about, which found no effect of violence, stands in contrast to a somewhat older meta-analysis of the relationship between violent games and aggression. A meta-analysis – for those not in the know – is when a large number of studies are examined jointly to better estimate the size of some effect. As any individual study only provides us with a snapshot of information and could be unreliable, it should be expected that a greater number of studies will provide a more accurate view of the world, just as running 50 participants through an experiment should give us a better sense than asking one or two people. The results of some of those meta-analyses seem to settle on a pretty small relationship between violent video games and aggression/violence (approximately r = .15 to .20 for non-serious aggression, and about r = .04 for serious aggression, depending on who you ask and what you look at; Anderson et al., 2010; Ferguson & Kilburn, 2010; Bushman et al., 2010), but there have been concerns raised about publication bias and the use of non-standardized measures of aggression.
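For those curious what “examining studies jointly” actually involves, below is a minimal sketch of the standard fixed-effect approach for pooling correlations via Fisher’s z-transform. The study values here are invented for illustration; they are not taken from the cited meta-analyses:

```python
import math

# Fixed-effect meta-analysis of correlation coefficients.
# Each study contributes its Fisher z-transformed r, weighted by
# inverse variance (var(z) is approximately 1 / (n - 3) for sample size n).
studies = [(0.22, 80), (0.10, 150), (0.18, 60), (0.12, 200)]  # (r, n) - invented

weighted_sum = total_weight = 0.0
for r, n in studies:
    z = math.atanh(r)        # Fisher z-transform of r
    w = n - 3                # inverse-variance weight
    weighted_sum += w * z
    total_weight += w

pooled_r = math.tanh(weighted_sum / total_weight)  # back-transform to r
print(f"pooled r = {pooled_r:.3f}")
```

The important caveat from the text applies here too: pooling only averages out sampling error. If every study shares the same design confound, the pooled estimate inherits that confound with greater precision, not less.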

Further, even were there no publication bias to worry about, that does not mean the topic itself is being researched by people without biases, which can affect how data gets analyzed, research gets conducted, measures get created and interpreted, and so on. If r = .2 is about the best one can do with those degrees of freedom (in other words, assuming the people conducting such research are looking for the largest possible effect and develop their research accordingly), then it seems unlikely that this kind of effect is worth worrying too much about. As Ferguson & Kilburn (2010) note, youth violent crime rates have been steadily decreasing as the sales of violent games have been increasing (r = -.95; the quality of that violence has improved over time as well, not just the quantity – look at the violence in Doom over the years to get a sense of that improvement). Now it’s true enough that the relationship between youth violent crime and violent video game sales is by no means a great examination of the relationship in question, but I do not doubt that if the relationship ran in the opposite direction (especially if it were as large), many of the same people who disregard it as unimportant would never leave it alone.

Again, however, we run into that issue where our data is only as good as our ability to interpret it. We want to know why the meta-analysis turned up a positive (albeit small) relationship whereas the single paper did not, despite multiple chances to find it. Perhaps the paper I wrote about was simply a statistical fluke: for whatever reason, the samples recruited for those studies didn’t end up showing the effect of violent content, but the effect is still real in general (perhaps it’s just too small to be reliably detected). That seems to be the conclusion some responses I received contained. In fact, I had one commenter who cited the results of three different studies suggesting there was a causal link between violent content and aggression. However, when I dug up those studies and looked at the methods sections, what I found was that, as I mentioned before, all of them had participants play entirely different games between violent and non-violent conditions. This messes with your ability to interpret the data only in light of violent content, because you are varying more than just violence (even if unintentionally). On the other hand, the paper I mentioned in my last post had participants playing the same game between conditions, just with content (like difficulty or violence levels) manipulated. As far as I can tell, then, the methods of the paper I discussed last week were superior, since they were able to control more, apparently important factors.

This returns us to the card game example I raised initially: when people play a particular deck incorrectly, it appears only slightly favored to win; when someone plays it correctly, it appears massively favored. To turn that point to this analysis: when you conduct research that lacks the proper controls, you might find an effect; when you add those controls in, the effect vanishes. If one data point is an outlier because it reflects research done better than the others, you want to pay more attention to it. Now I’m not about to go digging through over 130 studies for the sake of a single post – I do have other things on my plate – but I wanted to make this point clear: if a meta-analysis contains 130 papers which all reflect the same basic confound, then looking at them together makes me no more convinced of their conclusion than looking at any of them alone (and given that the specific studies cited in response to my post all did contain that confound, I’ve seen no evidence inconsistent with that proposal yet). Repeating the same mistake many times does not make it cease to be a mistake, and it doesn’t impress me concerning the weight of the evidence. Evidence acquired through weak methodologies is light indeed.

Research: Making the same mistakes over and over again for similar results

So, in summation, you want to really get to know your data and understand why it looks the way it does before you draw much in the way of meaningful conclusions from it. A single outlier can potentially tell you more about what you want to know than lots of worse data points (in fact, poorly-interpreted data might not even be recognized as such until contrary evidence rears its head). This isn’t always the case, but writing off any particular data point because it doesn’t conform to the rest of the pattern – or assuming its value is equal to that of all the other points – isn’t always right either. Getting to know your data, your methods, and your measures is quite important for getting a sense of how to interpret it all.

For instance, it has been proposed that – sure – the relationship between violent game content and aggression is small at best (there seems to be some heated debate over whether it’s closer to r = .1 or .2), but it could still be important because lots of small effects can add up over time into a big one. In other words, maybe you ought to be really wary of that guy who has been playing a violent game for an hour each night for the last three years. He could be about to snap at the slightest hint of a threat and harm you…at least to the extent that you’re afraid he might suggest you listen to loud noises or eat slightly more of something spicy – two methods used to assess “physical” aggression in this literature due to ethical limitations, despite the fact that, “Naturally, children (and adults) wishing to be aggressive do not chase after their targets with jars of hot sauce or headphones with which to administer bursts of white noise” (Elson & Ferguson, 2013). That small, r = .2 correlation I referenced before concerns behavior like that in a lab setting, where experimental demand characteristics are almost surely present, suggesting the effect on aggressive behavior in naturalistic settings is likely overstated.

Then again, in terms of meaningful impact, perhaps all those small effects don’t really amount to much. Indeed, the longitudinal research in this area seems to find the smallest effects (Anderson et al., 2010). To put that into what I think is a good example, imagine going to the gym. Listening to music helps many people work out, and the choice of music is relevant there. The type of music I would listen to at the gym is not always the same kind I would listen to if I wanted to relax, or dance, or set a romantic mood. In fact, the music I listen to at the gym might even make me somewhat more aggressive, in a manner of speaking (e.g., for an hour, aggressive thoughts might be more accessible to me while I listen than if I had no music, but that doesn’t actually lead to any observable, meaningful changes in my violent behavior while at the gym or once I leave). In that case, repeated exposure to this kind of aggressive music would not really make me any more aggressive in my day-to-day life than you’d otherwise expect over time.

Thankfully, these warnings managed to save people from dangerous music

That’s not to say that media has no impact on people whatsoever: I fully suspect that people watching a horror movie probably feel more afraid than they otherwise would; I also suspect someone who just watched an action movie might have some violent fantasies in their head. However, I also suspect such changes are rather specific and of short duration: watching that horror movie might increase someone’s fear of being eaten by zombies or make them easier to startle, but not their fear of dying from the flu or their probability of being scared next week; that action movie might make someone think about attacking an enemy military base in the jungle with two machine guns, but it probably won’t increase their interest in kicking a puppy for fun, or lead to them fighting with their boss next month. These effects might push some feelings around in the very short term, but they’re not going to have lasting and general effects. As I said at the beginning of last week’s post, things like violence are strategic acts, and it doesn’t seem plausible that violent media (like, say, comic books) will make them any more advisable.

References: Anderson, C. et al. (2010). Violent video game effects on aggression, empathy, and prosocial behavior in Eastern and Western countries: A meta-analytic review. Psychological Bulletin, 136, 151-173.

Bushman, B., Rothstein, H., & Anderson, C. (2010). Much ado about something: Violent video game effects and a school of red herring: Reply to Ferguson and Kilburn (2010). Psychological Bulletin, 136, 182-187.

Elson, M. & Ferguson, C. (2013). Twenty-five years of research on violence in digital games and aggression: Empirical evidence, perspectives, and a debate gone astray. European Psychologist, 19, 33-46.

Ferguson, C. & Kilburn, J. (2010). Much ado about nothing: The misestimation and overinterpretation of violent video game effects in Eastern and Western nations: Comment on Anderson et al. (2010). Psychological Bulletin, 136, 174-178.

The Fight Against Self-Improvement

In the abstract, most everyone wants to be the best version of themselves they can be. A more attractive body, useful skills, a good education, career success; who doesn’t want those things? In practice, lots of people, apparently. While people might like the idea of improving various parts of their life, self-improvement takes time, energy, dedication, and restraint; it involves doing things that might not be pleasant in the short term with the hope that long-term rewards will follow. Those rewards are by no means guaranteed, though, either in terms of whether they happen at all or the degree to which they do. While people can usually improve various parts of their life, not everyone can achieve the levels of success they might prefer, no matter how much time they devote to their craft. All of those are common reasons people will sometimes avoid improving themselves (it’s difficult and carries opportunity costs), but they do not straightforwardly explain why people sometimes fight against others improving.

“How dare they try to make a better life for themselves!”

I was recently reading an article about the appeal of Trump and came across this passage concerning this fight against the self-improvement of others:

“Nearly everyone in my family who has achieved some financial success for themselves, from Mamaw to me, has been told that they’ve become “too big for their britches.”  I don’t think this value is all bad.  It forces us to stay grounded, reminds us that money and education are no substitute for common sense and humility. But, it does create a lot of pressure not to make a better life for yourself…”

At first blush, this seems like a rather strange idea: if people in your community – your friends and family – are struggling (or have yet to build a future for themselves), why would anyone object to the prospect of their achieving success and bettering their lot in life? Part of the answer is found a little further down:

“A lot of these [poor, struggling] people know nothing but judgment and condescension from those with financial and political power, and the thought of their children acquiring that same hostility is noxious.”

I wanted to explore this idea in a bit more depth to help explain why these feelings might rear their head when faced with the social or financial success of others, be they close or distant relations.

Understanding these feelings requires drawing on a concept my theory of morality leaned heavily on: association value. Association value refers to the abstract value that others in the social world have for each other; essentially, it asks the question, “how desirable of a friend would this person make for me (and vice versa)?” This value comes in two parts: first, there is the matter of how much value someone could add to your life. As an easy example, someone with a lot of money is more capable of adding value to your life than someone with less money; someone who is physically stronger tends to be able to provide benefits a weaker individual could not; the same goes for individuals who are more physically attractive or intelligent. It is for this reason that most people wish they could improve on some or all of these dimensions if doing so were possible and easy: you end up as a more desirable social asset to others.

The second part of that association value is a bit trickier, however, reflecting the crux of the problem: how willing someone is to add value to your life. Those who are unwilling to help me have a lower value than those willing to make the investment. Reliable friends are better than flaky ones, and charitable friends are better than stingy ones. As such, even if someone has a great potential value they could add to my life, they still might be unattractive as associates if they are not going to turn that potential into reality. An unachieved potential is effectively the same thing as having no potential value at all. Conversely, those who are very willing to add to my life but cannot actually do so in meaningful ways don’t make attractive options either. Simply put, eager but incompetent individuals wouldn’t make good hires for a job, but neither would competent yet absent ones.

“I could help you pay down your crippling debt. Won’t do it, though”

With this understanding of association value, there is only one piece left to add to the equation: the zero-sum nature of friendship. Friendship is a relative term; it means that someone values me more than they value others. If someone is a better friend to me, it means they are a worse friend to others; they would value my welfare over the welfare of others and, if a choice had to be made, would aid me rather than someone else. Having friends is also useful in the adaptive sense of the word: they help provide access to desirable mates, protection, provisioning, and can even help you exploit others if you’re on the aggressive side of things. Putting all these pieces together, we end up with the following idea: people generally want access to the best friends possible. What makes a good friend is a combination of their ability and willingness to invest in you over others. However, their willingness to do so depends in turn on your association value to them: how willing and able you are to add things to their lives. If you aren’t able to help them out – now or in the future – why would they want to invest resources into benefiting you when they could instead put those resources into others who could?

Now we can finally return to the matter of self-improvement. By increasing your association value through various forms of self-improvement (e.g., making yourself more physically attractive and stronger through exercise, improving your income by moving forward in your career, learning new things, etc.) you make yourself a more appealing friend to others. Crucially, this includes both existing friends and higher-status individuals who might not have been willing to invest in you prior to your ability to add value to their life materializing. In other words, as your value as an associate rises, unless the value of your existing associates rises in turn, it is quite possible that you can now do better than them socially, so to speak. If you have more appealing social prospects, then, you might begin to neglect or break off existing contacts in favor of newer, more-profitable friendships or mates. It is likely that your existing contacts understand this – implicitly or otherwise – and might seek to discourage you from improving your life, or preemptively break off contact with you if you do, under the assumption that you will do likewise to them in the future. After all, if you’re moving on eventually, they would be better off building new connections sooner rather than later. They don’t want to invest in failing relationships any more than you do.

In turn, those who are thinking about self-improvement might actually decide against pursuing their goals not necessarily because they wouldn’t be able to achieve them, but because they’re afraid that their existing friends might abandon them, or even that they themselves might be the ones who do the abandoning. Ironically, improving yourself can sometimes make you look like a worse social prospect.

To put that in a simple example, we could consider the world of fitness. The classic trope of the weak high-schooler being bullied by the strong jock type has been ingrained in many stories in our culture. For those doing the bullying, their targets don’t offer them much socially (their association value to others is low, while the bully’s is high) and they are unable to effectively defend themselves, making exploitation appear an attractive option. In turn, the targets of this bullying are, in some sense, wary of adopting some of the self-improvement behaviors the jocks engage in, such as working out. Either they don’t feel they can effectively compete against the jocks in that realm (e.g., they wouldn’t be able to get as strong, so why bother getting stronger), or they worry that improving their association value by working out will lead them to adopt a similar pattern of behavior to those they already dislike, resulting in their losing value to their current friends (usually those of similar, but relatively-low, association value). The movie Mean Girls is an example of this dynamic struggle in a different domain.

So many years later, and “Fetch” still never happened…

This line of thought has, as far as I can tell, also been leveraged (again, consciously or otherwise) by one brand within the fitness community: Planet Fitness. Last I heard an advertisement for their company on the radio, their slogan appeared to be, “we’re not a gym; we’re planet fitness.” An odd statement to be sure, because they are a gym, so what are we to make of it? Presumably that they are in some important respects different from their competition. How are they different from other gyms? The “About” section on their website lays their differences out in true, ironic form:

“Make yourself comfy. Because we’re Judgement Free…you deserve a little cred just for being here. We believe no one should ever feel Gymtimidated by Lunky behavior and that everyone should feel at ease in our gyms, no matter what his or her workout goals are…We’re fiercely protective of our Planet and the rights of our members to feel like they belong. So we create an environment where you can relax, go at your own pace and just do your own thing without ever having to worry about being judged.”

This marketing is fairly transparent pandering to those who currently do not feel they can compete with the very fit, or who are worried about becoming a “lunk” themselves (they even have an alarm in the gym designed to be set off if someone is making too much noise while lifting, or wearing the wrong outfit). In doing so, however, they devalue those who are successful or passionate in their pursuit of self-improvement. I have never seen a gym more obsessed with judging its would-be members than Planet Fitness; but so long as that judgment is pointed at the right targets, it lets them appeal (presumably effectively) to certain portions of the population untapped by other gyms. Planet Fitness wants to be your friend; not the friend of those jerks who make you feel bad.

There is value in not letting success go to one’s head; no one wants a fair-weather friend who will leave the moment it’s expedient. Such an attitude undermines loyalty. The converse, however, is that using that as an excuse to avoid (or condemn) self-improvement will make you and others worse off in the long term. A better solution to this dilemma is to improve yourself so you can improve the lives of those who matter most to you, hoping they reciprocate in turn (or improve together for even greater success).

Musings About Police Violence

I was going to write about something else today (the finding from a meta-analysis that artificial surveillance cues do not appear to appreciably increase generosity; the effects fail to reliably replicate), but I decided to switch topics up to something more topical: police violence. My goal today is not to provide answers to this on-going public debate – I certainly don’t know enough about the topic to consider myself an expert – but rather to try and add some clarity to certain features of the discussions surrounding the matter, and hopefully help people think about it in somewhat unusual ways. If you expect me to take a specific stance on the issue, be that one that agrees or disagrees with your own, I’m going to disappoint you. That alone may upset some people who take anything other than definite agreement as a sign of aggression against them, but there isn’t much to do about that. That said, the discussion about police violence itself is a large and complex one, the scope of which far exceeds the length constraints of my usual posts. Accordingly, I wanted to limit my thoughts on the matter to two main domains: important questions worth answering, and addressing the matter of why many people find the “Black Lives Matter” hashtag needlessly divisive.

Which I’m sure will receive a warm, measured response

First, let’s jump into the matter of important questions. One of the questions I’ve never seen explicitly raised in the context of these discussions – let alone answered – is the following: How many people should we expect to get killed by police each year? There is a gut response that many would no doubt have to that question: zero. Surely someone getting killed is a tragedy that we should seek to avoid at all times, regardless of the situation; at best, it’s a regrettable state of affairs that sometimes occurs because the alternative is worse. While zero might be the ideal world outcome, this question is asking more about the world that we find ourselves in now. Even if you don’t particularly like the expectation that police will kill people from time to time, we need to have some expectation of just how often it will happen to put the violence in context. These killings, of course, include a variety of scenarios: there are those in which the police justifiably kill someone (usually in defense of themselves or others), those cases where the police mistakenly kill someone (usually when an error of judgment occurs regarding the need for defense, such as when someone has a toy gun), and those cases where police maliciously kill someone (the killing is aggressive, rather than defensive, in nature). How are we to go about generating these expectations?

One popular method seems to be comparisons of police shootings cross-nationally. The picture that results from such analyses appears to suggest that US police shoot people much more frequently than police from other modern countries. For instance, The Guardian claims that Canadian police shoot and kill about 25 people a year, compared with approximately 1,000 such shootings in the US in 2015. Assuming those numbers are correct, once we correct for population size (the US is about ten times more populous than Canada), we can see that US police shoot and kill about four times as many people. That sure seems like a lot, probably because it is a lot. We want to do more than note that there is a difference, however; we want to see whether that difference violates our expectations, and to do that, we need to be clear about how our expectations were generated. If, for example, police in the US face threatening situations more often than Canadian police, this is a relevant piece of information.

To begin engaging with that idea, we might consider how many police die each year in the line of duty, cross-nationally as well. In Canada, the number for 2015 looks to be three; adjusting for population size again, we would expect 30 US police officer deaths if all else were equal. All else is apparently not equal, however, as the actual number for 2015 in the US is about 130. Not only are US police killing four times as often as their Canadian counterparts, then, but they’re also dying at roughly four times the expected rate as well. That said, those numbers include factors other than homicides, and so that too should be taken into account when generating our expectations (in Canada, the number of police shot was 2 in 2015, compared to 40 in the US – still twice as high as one would expect from population size alone. There are also other methods of killing police, such as the 50 US police killed by bombs or cars; 0 for Canada). Given the prevalence of firearm ownership in the US, it might not be too surprising that the rates of violence between police and citizens – as well as between citizens and other citizens – look substantially different than in other countries. There are other facts which might adjust our expectations up or down. For instance, while the US has 10 times the population of Canada, the number of police per 100,000 people (376) is different than that of Canada (202). How we should adjust the numbers to make a comparison based on population differences, then, is a matter worth thinking about (should we expect the ratio of police officers to citizens per se to affect the number of them that are shot, or is population the better metric?). Also worth mentioning is that the general homicide rate per 100,000 people is quite a bit higher in the US (3.9) than in Canada (1.4). While this list of considerations is very clearly not exhaustive, I hope it demonstrates the importance of figuring out what our expectations are, as well as why. The numbers of shootings alone are useless without good context.
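To make the population adjustment explicit, here is a minimal sketch of the arithmetic, using the approximate 2015 figures cited above (the helper function and its name are mine, just for illustration):

```python
# A minimal sketch of the population-adjusted comparisons above.
# Figures are the approximate 2015 numbers cited in the text.

def rate_ratio(us_count, ca_count, pop_ratio=10):
    """How many times larger the US count is than the figure we'd
    expect from scaling Canada's count by the population ratio."""
    expected_us = ca_count * pop_ratio
    return us_count / expected_us

# People shot and killed by police: 1,000 (US) vs. 25 (Canada)
print(rate_ratio(1000, 25))  # 4.0 -> US police kill ~4x as often

# Police officers killed in the line of duty (all causes): 130 vs. 3
print(rate_ratio(130, 3))    # ~4.3 -> US officers also die ~4x as often

# Police officers shot: 40 vs. 2
print(rate_ratio(40, 2))     # 2.0 -> twice the population-adjusted rate
```

The point of spelling it out this way is that every comparison depends on the chosen denominator; swapping population for, say, officer counts per 100,000 people would change every ratio.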

Factor 10: Perceived silliness of uniforms

The second question concerns bias within these shootings in the US. In addition to our expectations for the number of people killed each year by police, we also want to generate some expectations for the demographics of those who are shot: what should we expect the demographics of those being killed by police to be? Before we can claim there is a bias in the shooting data, we need to have a sense of what our expectations in that regard are and why; only then can we look at how those expectations are violated.

The obvious benchmark many people would begin with is the demographics of the US as a whole. We might expect, for instance, that the victims of police violence in the US are 63% white, 12% black, about 50% male, and so on, mirroring the population of the country. Some data I’ve come across suggests that this is not the case, however, with approximately 50% of the victims being white and 26% being black. Now that we know the demographics don’t match up as we’d expect from population alone, we want to know why. One tempting answer that many people fall back on is that police are racially motivated: after all, if black people make up 12% of the population but represent 26% of police killings, this might mean police specifically target black suspects. Then again, males make up about 50% of the population but represent about 96% of police killings. While one could similarly posit that police have a widespread hatred of men and seek to harm them, that seems unlikely. A better explanation for more of the variation is that men behave differently than women: less compliant, more aggressive, or something along those lines. After all, the only reasons you’d expect police shootings to match population demographics perfectly would be either if police shot people at random (they don’t) or if police shot people based on some nonrandom factors that did not differ between groups of people (which also seems unlikely).
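That benchmark logic can be sketched as a quick calculation: how far does each group’s share of shooting victims depart from its share of the population? This is an illustrative sketch using the approximate percentages above, not an analysis; the function name is mine.

```python
# Approximate percentages from the text.
population_share = {"white": 63, "black": 12, "male": 50}
victim_share     = {"white": 50, "black": 26, "male": 96}

def representation_ratios(victims, population):
    """Ratio > 1 means a group is overrepresented among victims
    relative to its population share; < 1 means underrepresented."""
    return {g: victims[g] / population[g] for g in population}

for group, ratio in representation_ratios(victim_share, population_share).items():
    print(f"{group}: {ratio:.2f}x population share")
# white comes out around 0.79x, black around 2.17x, male around 1.92x
```

The male ratio of nearly 2x is instructive precisely because few would attribute it to anti-male animus, which is why the raw ratios alone cannot establish motive.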

One such factor that we might use to adjust our expectations would be crime rates in general; perhaps violent crime in particular, as that class likely generates a greater need for officers to defend themselves. In that respect, men tend to commit much more crime than women, which likely begins to explain why men are also shot by police more often. Along those lines, there are also rather stark differences between racial groups when it comes to involvement in criminal activity: while 12% of the US population is black, approximately 40% of the prison population is, suggesting differences in patterns of offending. While some might claim that prison percentage too is due to racial discrimination against blacks, the arrest records tend to agree with victim reports, suggesting a real differential involvement in criminal activity.

That said, criminal activity per se shouldn’t get one shot by police. When generating our expectations, we also might want to consider factors such as whether people resist arrest or otherwise threaten the officers in some way. In testing theories of racial biases, we would want to consider whether officers of different races are more or less likely to shoot citizens of various demographics (that is to ask whether, say, black officers are any more or less likely to shoot black civilians than white officers are. I could have sworn I’ve seen data on that before, but cannot seem to locate it at this time. What I did find, however, was a case-matched study of NYPD officers reporting that black officers at the scene were about three times as likely to discharge their weapon as white officers, spanning 106 shootings and about 300 officers; Ridgeway, 2016). Again, while this is not a comprehensive list of things to think about, factors like these should help us generate our expectations about what the demographics of police shooting victims should look like, and it is only from there that we can begin to make claims about racial biases in the data.

It’s hard to be surprised at the outcomes sometimes

Regardless of where you settled on your answer to the above expectations, I suspect that many people would nonetheless want to reduce those numbers, if possible. Fewer people getting killed by police is a good thing most of the time. So how do we want to go about seeing that outcome achieved? Some have harnessed the “Black Lives Matter” (BLM) hashtag and suggest that police (and other) violence should be addressed via a focus on, and reductions in, explicit, and presumably implicit, racism (I think; finding an outline of the goals of the movement proves a bit difficult).

One common response to this hashtag has been the notion that BLM is needlessly divisive, suggesting instead that “All Lives Matter” (ALM) be used as a more appropriate description. In turn, the reply to ALM from BLM is that the lack of focus on black people is an attempt to turn a blind eye to problems viewed as disproportionately affecting black populations. The ALM idea was recently criticized by the writer Maddox, who compared the ALM expression to a person who, when confronted with the idea of “supporting the troops,” suggests that we should support all people (the latter being a notion that receives quite a bit of support, in fact). This line of argument is not unique to Maddox, of course, and I wanted to address that thought briefly to show why I don’t think it works particularly well here.

First, I would agree that the “support the troops” slogan is met with a much lower degree of resistance than “black lives matter,” at least as far as I’ve seen. So why the differential response? As I see it, the reason this comparison breaks down involves the zero-sum nature of each issue: if you spend $5 to buy a “support the troops” ribbon magnet to attach to your car, that money is usually intended to be designated towards military-related causes. Importantly, money that is spent relieving the problems in the military domain cannot be spent elsewhere. That $5 cannot be given to both military causes and also given to cancer research and also given to teachers and also used to repave roads, and so on. There need to be trade-offs in whom you support in that case. However, if you want to address the problem of police violence against civilians, it seems that tactics which effectively reduce violence against black populations should also be able to reduce violence against non-black populations, such as use-of-force training or body cameras.

The problems, essentially, have a very high degree of overlap and, in terms of raw numbers, many more non-black people are killed by police than black ones. If we can alleviate both at the same time with the same methods, focusing on one group seems needless. It is only those killings of civilians that affect black populations (26% of the shootings) and are also driven predominantly or wholly by racism (an unknown percentage of that 26%) that could be effectively addressed by a myopic focus on the race of the person being killed per se. I suspect that many people have independently figured that out – consciously or otherwise – and so dislike the specific attention drawn to race. While a focus on race might be useful for virtue signaling, I don’t think it will be very productive in actually reducing police violence.

“Look at how high my horse is!”

To summarize, to meaningfully talk about police violence, we need to articulate our expectations about how much of it we should see, as well as its shape. It makes no sense to talk about how violence is biased against one group or another until those benchmarks have been established (this logic applies to all discussions of bias in data, regardless of topic). None of this is intended to be me telling you how much or what kind of violence to expect; I’m by no means in possession of the necessary expertise. Regardless, if one wants to reduce police violence, inclusive solutions are likely going to be superior to exclusive ones, as a large degree of overlap in causes likely exists between cases, and solving the problems of one group will help solve the problems of another. There is merit to addressing specific problems as well – as that overlap is certainly less than 100% – but in doing so, it is important not to lose sight of the commonalities or alienate those who might otherwise be your allies.

References: Ridgeway, G. (2016). Officer risk factors associated with police shootings: a matched case-control study. Statistics & Public Policy, 3, 1-6.

Psychology Research And Advocacy

I get the sense that many people get a degree in psychology because they’re looking to help others (since most clearly aren’t doing it for the pay). For those who get a degree in the clinical side of the field, this observation seems easy to make; at the very least, I don’t know of any counselors or therapists who seek to make their clients feel worse about the state their life is in and keep them there. For those who become involved in the research end of psychology, I believe this desire to help others is still a major motivator. Rather than trying to help specific clients, however, many psychological researchers are driven by a motivation to help particular groups in society: women, certain racial groups, the sexually promiscuous, the outliers, the politically liberal, or any group that the researcher believes to be unfairly marginalized, undervalued, or maligned. Their work is driven by a desire to show that the particular group in question has been misjudged by others, with those doing the misjudging being biased and, importantly, wrong. In other words, their role as a researcher is often driven by their role as an advocate, and the quality of their work and thinking can often take a back seat to their social goals.

When megaphones fail, try using research to make yourself louder

Two such examples are highlighted in a recent paper by Eagly (2016), both of which can broadly be considered to focus on the topic of diversity in the workplace. I want to summarize them quickly before turning to some of the other facets of the paper I find noteworthy. The first case concerns the prospect that having more women on corporate boards tends to increase their profitability, a point driven by a finding that Fortune 500 companies in the top quarter of female representation on boards of directors performed better than those in the bottom quarter of representation. Eagly (2016) rightly notes that such a basic data set would be all but unpublishable in academia, as it fails to account for a number of important factors. Indeed, when more sophisticated research was considered in a meta-analysis of 140 studies, the gender diversity of the board of directors had about as close to no effect as possible on financial outcomes: the average correlations across all the studies ranged from about r = .01 all the way up to r = .05, depending on what measures were considered. Gender diversity per se seemed to have no meaningful effect, despite a variety of advocacy sources claiming that increasing female representation would provide financial benefits. Rather than considering the full scope of the research, the advocates tended to cite only the most simplistic analyses that provided the conclusion they wanted (others) to hear.

The second area of research concerned how demographic diversity in work groups can affect performance. The general assumption that is often made about diversity is that it is a positive force for improving outcomes, given that a more cognitively-varied group of people can bring a greater number of skills and perspectives to bear on solving tasks than more homogeneous groups can. As it turns out, however, another meta-analysis of 146 studies concluded that demographic diversity (both in terms of gender and racial makeup) had effectively no impact on performance outcomes: the correlation for gender was r = -.01 and was r = -.05 for racial diversity. By contrast, differences in skill sets and knowledge had a positive, but still very small effect (r = .05). In summary, findings like these would suggest that groups don’t get better at solving problems just because they’re made up of enough [men/women/Blacks/Whites/Asians/etc]. Diversity in demographics per se, unsurprisingly, doesn’t help to magically solve complex problems.

While Eagly (2016) appears to generally be condemning the role of advocacy in research when it comes to getting things right (a laudable position), there were some passages in the paper that caught my eye. The first of these concerns what advocates for causes should do when the research, taken as a whole, doesn’t exactly agree with their preferred stance. In this case, Eagly (2016) focuses on the diversity research that did not show good evidence for diverse groups leading to positive outcomes. The first route one might take is to simply misrepresent the state of the research, which is obviously a bad idea. Instead, Eagly suggests advocates take one of two alternative routes: first, she recommends that researchers might conduct research into more specific conditions under which diversity (or whatever one’s preferred topic is) might be a good thing. This is an interesting suggestion to evaluate: on the one hand, people would often be inclined to say it’s a good idea; in some particular contexts diversity might be a good thing, even if it’s not always, or even generally, useful. This wouldn’t be the first time effects in psychology are found to be context-dependent. On the other hand, this suggestion also runs some serious risks of inflating type 1 errors. Specifically, if you keep slicing up data and looking at the issue in a number of different contexts, you will eventually uncover positive results even if they’re just due to chance. Repeated subgroup or subcontext analysis doesn’t sound much different from the questionable statistical practices currently being blamed for psychology’s replication problem: just keep conducting research and only report the parts of it that happened to work, or keep massaging the data until the right conclusion falls out.    
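That Type 1 inflation is easy to demonstrate with a quick simulation (my own illustrative sketch, not anything from Eagly's paper): even when no effect exists at all, testing enough subgroups practically guarantees that something will come up "significant" by chance.

```python
import random
import math
from statistics import NormalDist, mean

random.seed(42)

def null_experiment(n=30):
    """Two groups drawn from the SAME distribution, so any 'effect' is pure noise.
    Returns a two-sided p-value from a z-test (sd is known to be 1 here)."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    z = (mean(a) - mean(b)) / math.sqrt(2 / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def any_hit(n_subgroups, alpha=0.05):
    """Did at least one of the subgroup analyses come up 'significant'?"""
    return any(null_experiment() < alpha for _ in range(n_subgroups))

trials = 1000
for k in (1, 5, 10, 20):
    rate = sum(any_hit(k) for _ in range(trials)) / trials
    # The theoretical rate for k independent tests is 1 - 0.95**k,
    # so it climbs from .05 toward certainty as contexts multiply.
    print(f"{k:2d} subgroup tests -> false-positive rate ~ {rate:.2f}")
```

With a single test the false-positive rate sits near the nominal 5%; with twenty subcontexts it climbs past 60%, which is why "keep slicing until something works" is indistinguishable from chance-chasing.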

“…the rest goes in the dumpster out back”

Eagly’s second suggestion I find a bit more worrisome: arguing that relevant factors – like increases in profits, productivity, or finding better solutions – aren’t actually all that relevant when it comes to justifying why companies should increase diversity. What I find odd about this is that it seems to suggest that the advocates begin with their conclusion (in this case, that diversity in the work force ought to be increased) and then just keep looking for ways to justify it in spite of previous failures to do so. Again, while it is possible that there are benefits to diversity which aren’t yet being considered in the literature, bad research would likely result from a process where someone starts their analysis with the conclusion and keeps going until they justify it to others, no matter how often it requires shifting the goal posts. A major problem with that suggestion mirrors the questionable research practices in psychology I mentioned before: when a researcher finds the conclusion they’re looking for, they stop looking. They only collect data up until the point it is useful, which rigs the system in favor of finding positive results where there are none. That could well mean, then, that there will be negative consequences to these diversity policies which are not being considered.

What I think is a good example of this justification problem leading to shoddy research practices/interpretation follows shortly thereafter. In talking about some of these alternative benefits that more female hires might have, Eagly (2016) notes that women tend to be more compassionate and egalitarian than men; as such, hiring more women should be expected to increase less-considered benefits, such as a reduction in the laying-off of employees during economic downturns (referred to as labor hoarding), or more favorable policies towards time off for family care. Now something like this should be expected: if you have different people making the decisions, different decisions will be made. Setting aside for the moment the question of whether those different policies are better in some objective sense of the word, if one is interested in encouraging those outcomes (that is, if they’re preferred by the advocate), then one might wish to address those issues directly, rather than by proxy. That is to say, if you are looking to make the leadership of some company more compassionate, then it makes sense to test for and hire more compassionate people, not to hire more women under the assumption that you will thereby be increasing compassion.

This is an important matter because people are not perfect statistical representations of the groups to which they belong. On average, women may be more compassionate than men; the type of woman who is interested in actively pursuing a CEO position in a Fortune 500 company might not be as compassionate as your average woman, however, and, in fact, might even be less compassionate than a particular male candidate. What Eagly (2016) has ended up reaching, then, is not a justification for hiring more women; it’s a justification for hiring compassionate or egalitarian people. What is conspicuously absent from this section is a call for more research to be conducted on contexts in which men might be more compassionate than women; once the conclusion that hiring women is a good thing has been justified (in the advocate’s mind, anyway), the concerns for more information seem to sputter out. It should go without saying, but such a course of action wouldn’t be expected to lead to the most accurate scientific understanding of our world.

The solution to that problem being more diversity, of course.

To place this point in another quick example, if you’re looking to assemble a group of tall people, it would be better to use people’s height when making that decision rather than their sex, even if men do tend to be taller than women. Some advocates might suggest that being male is a good enough proxy for height, so you should favor male candidates; others would suggest that you shouldn’t be trying to assemble a group of tall people in the first place, as short people offer benefits that tall ones don’t; others still will argue that it doesn’t matter if short people don’t offer benefits, as they should be preferentially selected to combat negative attitudes towards the short regardless (at the expense of selecting tall candidates). For what it’s worth, I find the attitude of “keep doing research until you justify your predetermined conclusion” to be unproductive and indicative of why the relationship between advocates and researchers ought not be a close one. Advocacy can only serve as a cognitive constraint that decreases research quality, as the goal of advocacy is decidedly not truth. Advocates should update their conclusions in light of the research; not vice versa.

References: Eagly, A. (2016). When passionate advocates meet research on diversity, does the honest broker stand a chance? Journal of Social Issues, 72, 199-222.

More About Psychology Research Replicating

By now, many of you have no doubt heard about the reproducibility project, where 100 psychological findings were subjected to replication attempts. In case you’re not familiar with it, the results of this project were less than a ringing endorsement of research in the field: of the expected 89 replications, only 37 were obtained and the average size of the effects fell dramatically; social psychology research in particular seemed uniquely bad in this regard. This suggests that, in many cases, one would be well served by taking many psychological findings with a couple of grains of salt. Naturally, this leads many people to wonder whether there’s any way they might be more confident that an effect is real, so to speak. One possible means through which your confidence might be bolstered is whether or not the research in question contains conceptual replications. What this refers to are cases where the authors of a manuscript report the results of several different studies purporting to measure the same underlying thing with varying methods; that is, they are studying topic A with methods X, Y, and Z. If all of these turn up positive, you ought to be more confident that an effect is real. Indeed, I have had a paper rejected more than once for only containing a single experiment. Journals often want to see several studies in one paper, and that is likely part of the reason why: a single experiment is surely less reliable than multiple ones.

It doesn’t go anywhere, but at least it does so reliably

According to the unknown moderator account of replication failure, psychological research findings are, in essence, often fickle. Some findings might depend on the time of day that measurements were taken, the country of the sample, some particular detail of the stimulus material, whether the experimenter is a man or a woman; you name it. In other words, it is possible that these published effects are real, but only occur in some rather specific contexts of which we are not adequately aware; that is to say they are moderated by unknown variables. If that’s the case, many replication attempts are bound to fail, as it is quite unlikely that all of the unique, unknown, and unappreciated moderators will be reproduced as well. This is where conceptual replications come in: if a paper contains two, three, or more different attempts at studying the same topic, we should expect that the effect they turn up is more likely to extend beyond a very limited set of contexts and should replicate more readily.

That’s a flattering hypothesis for explaining these replication failures; there’s just not enough replication going on prepublication, so limited findings are getting published as if they were more generalizable. The less-flattering hypothesis is that many researchers are, for lack of a better word, cheating by employing dishonest research tactics. These tactics can include hypothesizing after data is collected, only collecting participants until the data says what the researchers want and then stopping, splitting samples up into different groups until differences are discovered, and so on. There’s also the notorious issue of journals only publishing positive results rather than negative ones (creating a large incentive to cheat, as punishment for doing so is all but non-existent so long as you aren’t just making up the data). It is for these reasons that requiring the pre-registering of research – explicitly stating what you’re going to look at ahead of time – drops positive findings markedly. If research is failing to replicate because the system is being cheated, more internal replications (those from the same authors) don’t really help that much when it comes to predicting external replications (those conducted by outside parties). Internal replications just provide researchers the ability to report multiple attempts at cheating.
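To see why "collect participants until the data says what you want and then stop" is not an innocent practice, here is a hypothetical simulation of it (mine, not from any of the papers discussed): both groups are drawn from the same distribution, so an honest test should only cross p < .05 about 5% of the time; peeking after every batch and stopping at the first "hit" reliably produces more.

```python
import random
import math
from statistics import NormalDist, mean

random.seed(7)

def p_value(a, b):
    """Two-sided z-test on two equal-sized samples with known sd = 1."""
    n = len(a)
    z = (mean(a) - mean(b)) / math.sqrt(2 / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def optional_stopping(start=20, step=10, cap=100, alpha=0.05):
    """Keep adding participants and re-testing until p < alpha or the cap is hit.
    Both groups come from the same distribution, so every 'hit' is spurious."""
    a = [random.gauss(0, 1) for _ in range(start)]
    b = [random.gauss(0, 1) for _ in range(start)]
    while True:
        if p_value(a, b) < alpha:
            return True           # 'significant' -> stop collecting and report
        if len(a) >= cap:
            return False          # give up; the honest null result
        a += [random.gauss(0, 1) for _ in range(step)]
        b += [random.gauss(0, 1) for _ in range(step)]

def fixed_n(n=100, alpha=0.05):
    """The honest version: decide the sample size up front, test once."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    return p_value(a, b) < alpha

trials = 1000
fixed = sum(fixed_n() for _ in range(trials)) / trials
peeking = sum(optional_stopping() for _ in range(trials)) / trials
print(f"fixed-N false-positive rate:       {fixed:.2f}")
print(f"peek-and-stop false-positive rate: {peeking:.2f}")
```

The fixed-N rate hovers near the nominal 5%, while the peek-and-stop rate is roughly tripled, and it only grows the more often the researcher is allowed to peek.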

These two hypotheses make different predictions concerning the data from the aforementioned reproducibility project: specifically, research containing internal replications ought to be more likely to successfully replicate if the unknown moderator hypothesis is accurate. It certainly would be a strange state of affairs from a “this finding is true” perspective if multiple conceptual replications were no more likely to prove reproducible than single-study papers. It would be similar to saying that effects which have been replicated are no more likely to subsequently replicate than effects which have not. By contrast, the cheating hypothesis (or, more politely, questionable research practices hypothesis) has no problem at all with the idea that internal replications might prove to be as externally replicable as single-study papers; cheating a finding out three times doesn’t mean it’s more likely to be true than cheating it out once.

It’s not cheating; it’s just a “questionable testing strategy”

This brings me to a new paper by Kunert (2016) who reexamined some of the data from the reproducibility project. Of the 100 original papers, 44 contained internal replications: 20 contained just one replication, 10 were replicated twice, 9 were replicated 3 times, and 5 contained more than three. These were compared against the 56 papers which did not contain internal replications to see which would subsequently replicate better (as measured by achieving statistical significance). As it turned out, papers with internal replications externally replicated about 30% of the time, whereas papers without internal replications externally replicated about 40% of the time. Not only were the internally-replicated papers not substantially better, they were actually slightly worse in that regard. A similar conclusion was reached regarding the average effect size: papers with internal replications were no more likely to subsequently contain a larger effect size, relative to papers without such replications.

It is possible, of course, that papers containing internal replications are different than papers which do not contain such replications. This means it might be possible that internal replications are actually a good thing, but their positive effects are being outweighed by other, negative factors. For example, someone proposing a particularly novel hypothesis might be inclined to include more internal replications in their paper than someone studying an established one; the latter researcher doesn’t need more replications in his paper to get it published because the effect has already been replicated in other work. Towards examining this point, Kunert (2016) made use of the 7 identified reproducibility predictors from the Open Science Collaboration – field of study, effect type, original P-value, original effect size, replication power, surprisingness of original effect, and the challenge of conducting the replication – to assess whether internally-replicated work differed in any notable ways from the non-internally-replicated sample. As it turns out, the two samples were pretty similar overall on all the factors except one: field of study. Internally-replicated effects were more likely to come from social psychology (70% of them did) than were non-internally-replicated effects (54%). As I mentioned before, social psychology papers did tend to replicate less often. However, the unknown moderator account was not particularly well supported for either field when examined individually.

In summary, then, papers containing internal replications were no more likely to do well when it came to external replications which, in my mind, suggests that something is going very wrong in the process somewhere. Perhaps researchers are making use of their freedom to analyze and collect data as they see fit in order to deliver the conclusions they want to see; perhaps journals are preferentially publishing the findings of people who got lucky, relative to those who got it right. These possibilities, of course, are not mutually exclusive. Now I suppose one could continue to make an argument that goes something like, “papers that contain conceptual replications are more likely to be doing something else different, relative to papers with only a single study,” which could potentially explain the lack of strength provided by internal replications, and whatever that “something” is might not be directly tapped by the variables considered in the current paper. In essence, such an argument would suggest that there are unknown moderators all the way down.

“…and that turtle stands on the shell of an even larger turtle…”

While it’s true enough that such an explanation is not ruled out by the current results, it should not be taken as any kind of default stance on why this research is failing to replicate. The “researchers are cheating” explanation strikes me as a bit more plausible at this stage, given that there aren’t many other obvious explanations for why ostensibly replicated papers are no better at replicating. As Kunert (2016) plainly puts it:

This report suggests that, without widespread changes to psychological science, it will become difficult to distinguish it from informal observations, anecdotes and guess work.

This brings us to the matter of what might be done about the issue. There are procedural ways of attempting to address the problem – such as Kunert’s (2016) recommendation for getting journals to publish papers independent of their results – but my focus has been, and continues to be, on the theoretical aspects of publication. Too many papers in psychology get published without any apparent need for the researchers to explain their findings in any meaningful sense; instead, they usually just restate and label their findings, or they posit some biologically-implausible function for what they found. Without the serious and consistent application of evolutionary theory to psychological research, implausible effects will continue to be published and subsequently fail to replicate because there’s otherwise little way to tell whether a finding makes sense. By contrast, I find it plausible that unlikely effects can be more plainly spotted – by reviewers, readers, and replicators – if they are all couched within the same theoretical framework; even better, the problems in design can be more easily identified and rectified by considering the underlying functional logic, leading to productive future research.

References: Kunert, R. (2016). Internal conceptual replications do not increase independent replication success. Psychonomic Bulletin & Review. DOI: 10.3758/s13423-016-1030-9

When Intuitions Meet Reality

Let’s talk research ethics for a moment.

Would you rather have someone actually take $20 from your payment for taking part in a research project, or would you rather be told – incorrectly – that someone had taken $20, only to later (almost immediately, in fact) find out that your money is safely intact and that the other person who supposedly took it doesn’t actually exist? I have no data on that question, but I suspect most people would prefer the second option; after all, not losing money tends to be preferable to losing money, and the lie is relatively benign. To use a pop culture example, Jimmy Kimmel has aired a segment where parents lie to their children about having eaten all their Halloween candy. The children are naturally upset for a moment and their reactions are captured so people can laugh at them, only to later have their candy returned and the lie exposed (I would hope). Would it be more ethical, then, for parents to actually eat their children’s candy so as to avoid lying to their children? Would children prefer that outcome?

“I wasn’t actually going to eat your candy, but I wanted to be ethical”

I happen to think the answer is, “no; it’s better to lie about eating the candy than to actually do it” if you are primarily looking out for the children’s welfare (there is obviously the argument to be made that it’s neither OK to eat the candy nor to lie about it, but that’s a separate discussion). That sounds simple enough, but according to some arguments I have heard, it is unethical to design research that, basically, mimics the lying outcome. The costs being suffered by participants need to be real in order for research on suffering costs to be ethically acceptable. Well, sort of; more precisely, what I’ve been told is that it’s OK to lie to my subjects (deceive them) about little matters, but only in the context of using participants drawn from undergraduate research pools. By contrast, it’s wrong for me to deceive participants I’ve recruited from online crowd-sourcing sites, like Mturk. Why is that the case? Because, as the logic continues, many researchers rely on MTurk for their participants, and my deception is bad for those researchers because it means participants may not take future research seriously. If I lied to them, perhaps other researchers would too, and I have poisoned the well, so to speak. In comparison, lying to undergraduates is acceptable because, once I’m done with them, they probably won’t be taking part in many future experiments, so their trust in future research is less relevant (at least they won’t take part in many research projects once they get out of the introductory courses that require them to do so. Forcing undergraduates to take part in research for the sake of their grade is, of course, perfectly ethical).

This scenario, it seems, creates a rather interesting ethical tension. What I think is happening here is that a conflict has been created between looking out for the welfare of research participants (in common research pools; not undergraduates) and looking out for the welfare of researchers. On the one hand, it’s probably better for participants’ welfare to briefly think they lost money, rather than to let them actually lose money; at least I’m fairly confident that is the option subjects would select if given the choice. On the other hand, it’s better for researchers if those participants actually lose money, rather than briefly hold the false belief that they did, so participants continue to take their other projects seriously. An ethical dilemma indeed, balancing the interests of the participants against those of the researchers.

I am sympathetic to the concerns here; don’t get me wrong. I find it plausible to suggest that if, say, 80% of researchers outright deceived their participants about something important, people taking this kind of research over and over again would likely come to assume some parts of it were unlikely to be true. Would this affect the answers participants provide to these surveys in any consistent manner? Possibly, but I can’t say with any confidence if or how it would. There also seem to be workarounds for this poisoning-the-well problem; perhaps honest researchers could write in big, bold letters, “the following research does not contain the use of deception” and research that did use deception would be prohibited from attaching that bit by the various institutional review boards that need to approve these projects. Barring the use of deception across the board would, of course, create its own set of problems too. For instance, many participants taking part in research are likely curious as to what the goals of the project are. If researchers were required to be honest and transparent about their purposes upfront so as to allow their participants to make informed decisions regarding their desire to participate (e.g., “I am studying X…”), this could lead to all sorts of interesting results being due to demand characteristics – where participants behave in unusual manners as a result of their knowledge about the purpose of the experiment – rather than the natural responses of the subjects to the experimental materials. One could argue (and many have) that not telling participants about the real purpose of the study is fine, since it’s not a lie as much as an omission.
Other consequences of explicitly barring deception exist as well, though, including the lack of control over experimental stimuli during interactions between participants and the inability to feasibly test some hypotheses at all (such as whether people prefer the tastes of identical foods, contingent on whether they’re labeled in non-identical ways).

Something tells me this one might be a knock off

Now this debate is all well and good to have in the abstract sense, but it’s important to bring some evidence to the matter if you want to move the discussion forward. After all, it’s not terribly difficult for people to come up with plausible-sounding, but ultimately incorrect, lines of reasoning as for why some research practice is possibly (un)ethical. For example, some review boards have raised concerns about psychologists asking people to take surveys on “sensitive topics”, under the fear that answering questions about things like sexual histories might send students into an abyss of anxiety. As it turns out, such concerns were ultimately empirically unfounded, but that does not always prevent them from holding up otherwise interesting or valuable research. So let’s take a quick break from thinking about how deception might be harmful in the abstract to see what effects it has (or doesn’t have) empirically.

Drawn into the debate between economists (who tend to think deception is bad) and social scientists (who tend to think it’s fine), Barrera & Simpson (2012) conducted two experiments to examine how deceiving participants affected their future behavior. The first of these studies tested the direct effects of deception: did deceiving a participant make them behave differently in a subsequent experiment? In this study, participants were recruited as part of a two-phase experiment from introductory undergraduate courses (so as to minimize their previous exposure to research deception, the story goes; it just so happens they’re likely also the easiest sample to get). In the first phase of this experiment, 150 participants played a prisoner’s dilemma game which involved cooperating with or defecting on another player; a decision which would affect both players’ payments. Once the decisions had been made, half the participants were told (correctly) that they had been interacting with another real person in the other room; the other half were told they had been deceived, and that no other player was actually present. Everyone was paid and sent home.

Two to three weeks later, 140 of these participants returned for phase two. Here, they played 4 rounds of similar economic games: two rounds of dictator games and two rounds of trust games. In the dictator games, subjects could divide $20 between themselves and their partner; in the trust games, subjects could send some amount of $10 to the other player, this amount would be multiplied by three, and that player could then keep it all or send some of it back. The question of interest, then, is whether the previously-deceived subjects would behave any differently, contingent on their doubts as to whether they were being deceived again. The thinking here is that if you don’t believe you’re interacting with another real person, then you might as well be more selfish than you otherwise would. The results showed that, while the previously-deceived participants believed that social science researchers used deception somewhat more regularly than the non-deceived participants did, their behavior was actually no different. Not only were the amounts of money sent to others no different (participants gave $5.75 on average in the dictator games and trusted $3.29 when they had not previously been deceived, and gave $5.52 and trusted $3.92 when they had been), but the behavior was no more erratic either. The deceived participants behaved just like the non-deceived ones.
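For readers unfamiliar with these games, the payoff structure of the trust game described above is simple enough to sketch (the function and parameter names here are mine, purely for illustration):

```python
def trust_game(sent, returned, endowment=10, multiplier=3):
    """Trust game as described above: the first player sends some portion of a
    $10 endowment, the amount is tripled in transit, and the second player
    chooses how much of the tripled pot to send back."""
    assert 0 <= sent <= endowment, "can't send more than the endowment"
    pot = sent * multiplier
    assert 0 <= returned <= pot, "can't return more than the pot"
    first_player = endowment - sent + returned
    second_player = pot - returned
    return first_player, second_player

# Full trust with an even split leaves both players better off than no trust:
print(trust_game(10, 15))   # (15, 15)
# Full trust met with total betrayal leaves the truster with nothing:
print(trust_game(10, 0))    # (0, 30)
```

The point of the design is that sending money is only rational if you expect reciprocation, which is exactly why the amount sent serves as a behavioral measure of trust in the other player being real and cooperative.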

In the second study, the indirect effects of deception were examined. One hundred and six participants first completed the same dictator and trust games as above. They were then assigned to read about an experiment that either did or did not make use of deception; a deception which included the simulation of non-existent participants. They then played another round of dictator and trust games immediately afterwards to see if their behavior would differ, contingent on knowing how researchers might deceive them. As in the first study, no behavioral differences emerged. Neither directly deceiving participants about the presence of others in the experiment nor providing them with information that deception does take place in such research seemed to have any noticeable effect on subsequent behavior.

“Fool me once, shame on me; Fool me twice? Sure, go ahead”

Now it is possible that the lack of any effect in the present research had to do with the fact that participants were only deceived once. It is certainly possible that repeated exposure to deception, if frequent enough, will begin to have a lasting effect, and that the effect will not be limited to the researcher employing the deception. In essence, it is possible that some spillover between experimenters might occur over time. However, this is something that needs to be demonstrated; not just assumed. Ironically, as Barrera & Simpson (2012) note, demonstrating such a spillover effect can be difficult in some instances, as designing non-deceptive control conditions to test against the deceptive ones is not always a straightforward task. In other words, as I mentioned before, some research is quite difficult – if not impossible – to conduct without being able to use deception. Accordingly, some control conditions might require that you deceive participants about deceiving them, which is awfully meta. Barrera & Simpson (2012) also mention some research findings reporting that, even when no deception is used, participants who repeatedly take part in these kinds of economic experiments tend to get less cooperative over time. If that finding holds true, then the effects of repeated deception need to be filtered out from the effects of repeated participation in general. In any case, there does not appear to be any good evidence that minor deceptions are doing harm to participants or other researchers. They might still be doing harm, but I’d like to see it demonstrated before I accept that they do.

References: Barrera, D. & Simpson, B. (2012). Much ado about deception: Consequences of deceiving research participants in the social sciences. Sociological Methods & Research, 41, 383-413.

Health Food Nazis

“Hitler was a vegetarian. Just goes to show, vegetarianism, not always a good thing. Can, in some extreme cases, lead to genocide.” – Bill Bailey

There’s a burgeoning new field of research in psychology known as health licensing*. Health licensing is the idea that once people do something health-promoting, they subsequently give themselves psychological license to do other, unhealthy things. A classic example of this kind of research might go something like this: an experimenter will give participants a chance to do something healthy, like go on a jog or eat a nutritious lunch. After participants engage in this healthy behavior, they are then given a chance to do something unhealthy, like break their own legs. Typical results show that once people have engaged in these otherwise healthy behaviors, they are significantly more likely to engage in self-destructive ones, like leg-breaking, in order to achieve a balance between their healthy and unhealthy behaviors. This is just one more cognitive quirk to add to the ever-lengthening list of human psychological foibles.

Now that you engaged in hospital-visiting behavior, feel free to burn yourself to even it out.

Now many of you are probably thinking one or both of two things: “that sounds strange” and “that’s not true”. If you are thinking those things, I’m happy that we’re on the same page so far. The problems with the above hypothetical area of research are clear. First, it seems strange that people would go do something unhealthy and harmful because they had previously done something which was good for them; it’s not like healthy and unhealthy behaviors need to be intrinsically balanced out for any reason, at least not one that readily comes to mind. Second, it seems strange that people would want to engage in the harmful behaviors at all. Just because an option to do something unhealthy is presented, it doesn’t mean people are going to want to take it, as it might have little appeal to them. When people typically engage in behaviors which are deemed harmful in the long-term – such as smoking, overeating junk food, or other such acts which are said to be psychologically ‘licensed’ by healthy behaviors – they do so because of the perceived short-term benefits of such things. People certainly don’t drink for the hangover; they drink for the pleasant feelings induced by the booze.

So, with that in mind, what are we to make of a study that suggests doing something healthy can give people a psychological license to adopt immoral political stances? In case that sounds too abstract, the research on the table today examines whether drinking sauerkraut juice makes people more likely to endorse Nazi-like politics, and no, I’m not kidding (as much as I wish I was). The paper (Messner & Brugger, 2015) itself leans heavily on moral licensing: the idea that engaging in moral behaviors activates compensating psychological mechanisms that encourage the actor to engage in immoral ones. So, if you told the truth today, you get to lie tomorrow to balance things out. Before moving further into the details of the paper, it’s worth mentioning that the authors have already bumped up against one of the problems from my initial example: I cannot think of a reason that ‘moral’ and ‘immoral’ behaviors need to be “balanced out” psychologically (whatever that even means), and none is provided. Indeed, as some people continuously refrain from immoral (or unhealthy) behaviors, whereas others continuously indulge in them, compensation or balance doesn’t seem to factor into the equation in the same way (or at all) for everyone.

Messner & Brugger (2015) try to draw on a banking analogy, whereby moral behavior gives one “credit” in their account that can be “spent” on immoral behavior. However, this analogy is largely unhelpful, as you cannot spend money you do not have, but you can engage in immoral behaviors even if you have no morally-good “credit”. It’s also unhelpful in that it presumes immoral behavior is something one wants to spend their moral credit on; the type of immoral behavior seems to be beside the point, as we will soon see. Much like my leg-breaking example, this too seems to make little sense: people don’t seem to want to engage in immoral behavior because it is immoral. As the bank account analogy is not at all helpful for understanding the phenomenon in question, it seems better to drop it altogether, since it’s only likely to sow confusion in the minds of anyone trying to really figure out what’s going on here. Then again, perhaps the confusion is only present in the paper to compensate for all the useful understanding the researchers are going to provide us later.

“We broke half the lights to compensate for the fact that the other half work”

Moving forward, the authors argue that, because health-relevant behavior is moralized, engaging in some kind of health-promoting behavior – in this case, drinking sauerkraut juice (high in fiber and vitamin C, we are told) – ought to give people good moral “credit” which they will subsequently spend on immoral behavior (in much the same way buying eco-friendly products leads to people giving themselves a moral license to steal, we are also told). Accordingly, the authors first asked 128 Swiss students to indicate who was more moral: someone who drinks sauerkraut juice or someone who drinks Nestea. As predicted, 78% agreed that the sauerkraut-juice drinker was more moral, though whether a “neither, and this question is silly” option existed is not mentioned. The students also indicated how morally acceptable and right-wing a number of attitudes were; statements which related to, according to the authors, a number of nasty topics like devaluing the culture of others (i.e., feeling uncomfortable at the sight of a woman wearing a burka), devaluing other nations (viewing foreign nationals as a burden on the state), affirming antisemitism (disliking some aspects of Israeli politics), devaluing the humanity of others (not agreeing that all public buildings ought to be modified for handicapped access), and a few others. All of these statements were rated as immoral by the students, but whether they represent what the authors think they do (Nazi-like politics) is up for interpretation.

In any case, another 111 participants were then collected and assigned to drink sauerkraut juice, Nestea, or nothing. Those who drank the sauerkraut juice rated it as healthier than those who drank the Nestea and, correspondingly, were also more likely to endorse the Nazi-like statements (M = 4.46 on a 10-point scale) than those who drank Nestea (M = 3.82) or nothing (M = 3.73). Neat. There are, however, a few other major issues to address. The first of these is that, depending on who you sample, you’re going to get different answers to the “are these attitudes morally acceptable?” questions. Since it’s Swiss students being assessed in both cases, I’ll let that issue slide for the more pressing, theoretical one: the authors’ interpretation of the results would imply that the students who indicated that such attitudes are immoral also wished to express them. That is to say, because they just did something healthy (drank sauerkraut juice) they now want to engage in immoral behavior. They don’t seem too picky about which immoral behavior they engage in either, as they’re apparently more willing to adopt political stances they would otherwise oppose, were it not for the disgusting, yet healthy, sauerkraut juice.

This strikes me very much as the kind of metaphorical leg-breaking I mentioned earlier. When people engage in immoral (or unhealthy) behaviors, they typically do so because of some associated benefit: stealing grants you access to resources you otherwise wouldn’t obtain; eating that Twinkie gives you the pleasant taste and the quick burst of calories, even if it makes you fat when eaten too often. What benefits are being obtained by the Swiss students who are now (slightly) more likely to endorse right-wing, Nazi-like politics? None are made clear in the paper, and I’m having a hard time thinking up any myself. This seems to be a case of immoral behavior for the sake of it, which could only arise from a rather strange psychology. Perhaps there is something worth noting going on here that isn’t being highlighted well; perhaps the authors just stumbled on a statistical fluke (which does happen regularly). In either case, the idea of moral licensing doesn’t seem to help us understand what’s happening at all, and the banking metaphors and references to “balancing” and “compensation” seem similarly impotent to move us forward.

“Just give him the money; he eats well, so it’s OK”

The moral licensing idea is even worse than all that, though, as it doesn’t engage with the main adaptive reason people avoid self-beneficial but immoral behaviors: other people will punish you for them. If I steal from someone else, they or their allies might well take revenge on me; that I assure them of my healthy diet will likely create little to no effective deterrence against the punishment I would soon receive. If that is the case – and I suspect it is – then this self-granted “moral license” would be about as useful as my simply believing that stealing from others isn’t wrong and won’t be punished (which is to say, “not at all”). Any type of moral license needs to be granted by potential condemners in order to be of any practical use in that regard, and the current research does not assess whether that is the case. This limited focus on conscience – rather than condemnation – combined with the suggestion that people are likely to adopt social politics they would otherwise oppose for the sake of achieving some kind of moral balance after drinking 100 ml of gross sauerkraut juice, makes for a very strange paper indeed.

References: Messner, C. & Brugger, A. (2015). Nazis by Kraut: A playful application of moral self-licensing. Psychology, 6. http://dx.doi.org/10.4236/psych.2015.69112

*This statement has not been evaluated by the FDA or any such governmental body; the field doesn’t actually exist to the best of my knowledge, but I’ll tell you it does anyway.

 

Real Diversity Means Disagreement

Diversity is one of the big buzzwords of recent decades. Institutions, both public and private, often take great pains to emphasize their inclusive stances and colorful cast of a staff. I have long found these displays of diversity to be rather queer in one major respect, however: they almost always focus on diversity in the realms of race and gender. The underlying message behind such displays would seem to suggest that men and women, or members of different ethnic groups, are, in some relevant psychological respects, different from one another. What’s strange about that idea is that, as many of the same people might also like to point out, there’s less diversity between those groups than within them, while others are entirely uncomfortable with the claim of sex or racial differences from the start. The ambivalent feelings many people have surrounding such a message were captured well by Principal Skinner on The Simpsons:

It’s the differences…of which…there are none…that make the sameness… exceptional

Regardless of how one feels about such a premise, the fact remains that diversity in race or gender per se is not what people are seeking to maximize in many cases; they’re trying to increase diversity of thought (or, as Maddox put it many years ago: “people who look different must think different because of it; otherwise, why the hell embrace anything? Why not just assume that diversity comes from within, regardless of their skin color, sex, age or religion?”).

Renting that wheel chair was a nice touch, but it’s time to get up and return it before we lose the deposit

If diversity in perspective is what most people are after when they talk about seeking diversity, it seems like it would be a reasonable step to assess people’s perspectives directly, rather than trying to use proxies for it, like race and gender (or clothing, or hair styles, or musical tastes, or…). If, for instance, one was hiring a number of people for a job involving problem solving, it’s quite possible for the person doing the hiring to select a group of men and women from different races who all end up thinking about things in pretty much the same way: not only would the hires likely have the same kinds of educational background, but they’d probably also have comparable interests since they applied for the same job. On top of that initial similarity the person doing the hiring might be partial towards those who hold agreeable points of view. After all, why would you hire someone who holds a perspective you don’t agree with? It sounds as if that decision would make work that much more unpleasant during the day-to-day operations of the company, even if it was irrelevant to the work they do.

Speaking of areas in which diversity of thought seems to be lacking in certain respects, an interesting new paper from Duarte et al (2015) puts forth the proposition that social psychology – as a field – isn’t all that politically diverse, and that’s probably something of a problem for research quality. For example, if social psychologists can be said to be a rather politically homogeneous bunch, this could result in particular (and important) questions not being asked as a result of how the answers might pan out for the images of liberals and their political rivals. After all, if the conclusions of psychology research, by some happy coincidence, tend to demonstrate that liberals (and, by extension, the liberal researchers conducting it) happen to have a firm grasp on reality, whereas their more conservative counterparts are hopelessly biased and delusional, all the better for the liberal group’s public image; all the worse for the truth value of psychological research, however, if those results are obtained by only asking about scenarios in which conservatives, but not liberals, are likely to look biased. If some liberal assumptions about what is right or good are shaping their research to point in certain directions, we’re going to end up making a number of unwarranted interpretative conclusions.

The problems could mount further if the research purporting to deliver conclusions counter to certain liberal interests is reviewed with disproportionate amounts of scrutiny, whereas research supporting those interests is given a pass when their methods are equivalent or worse. Indeed, Duarte et al (2015) discuss some good reasons to think this might be the state of affairs in psychology, not least of which is that quite a number of social psychologists will explicitly admit they would discriminate against those who do not share their beliefs. When surveyed about their self-assessed probability of voting either for or against a known conservative job applicant (when both alternatives are equally qualified for the job), about 82% of social psychologists indicated they would be at least a little more likely to vote against the conservative hire, with about 43% indicating a fairly high degree of certainty they would (above the midpoint of the scale). These kinds of attitudes might well dissuade more conservatives from wanting to enter the field, especially given that the liberals likely to discriminate against them outnumber the conservatives by about 10-to-1.

“Don’t worry, buddy; you can take ‘em”

Not to put too fine a point on it, but if these ratios were discovered elsewhere – say, a 10:1 ratio of men to women in a field, and about half of the men explicitly saying they would vote against hiring women – I imagine that many social psychologists would be tripping over themselves to try and inject some justice and moral outrage into the mix. Compared with some other explicitly prejudiced tendencies (4% of respondents wouldn’t vote for a black presidential candidate), or sexist ones (5% wouldn’t vote for a woman), there’s a bit of a gulf in discrimination. While the way the question is asked is not quite the same, social psychologists might be about as likely to want to vote for the conservative job candidate as Americans are to vote for a Muslim or an atheist if we assumed equivalence (which is to say “not very”).

It is at least promising, then, to see that the reactions to this paper were fairly universal in at least recognizing that there might be something of a political diversity problem in psychology, both in terms of its existence and possible consequences. There was more disagreement with respect to the cause of this diversity problem and whether including more conservative minds would increase research quality, but that’s to be expected. I – like the authors – am happy enough that even social psychologists, by and large, seem to accept that social psychology is not all that politically diverse and that such a state of affairs is likely – or at least potentially – harmful to research in some respects (yet another example where stereotypes seem to track reality well).

That said, there is another point to which I want to draw attention. As I mentioned initially, seeking diversity for diversity’s sake is a pointless endeavor, and one that is certainly not guaranteed to improve the quality of work produced. This is the case regardless of the criteria on which candidates are selected, be they physical, political, or something else. For example, psychology departments could strive to hire people from a variety of different cultural or ethnic groups, but unless those new hires are better at doing psychology, this diversity won’t improve their products. Similarly, psychology departments could strive to hire people with degrees in other fields, like computer science, chemistry, and fine arts; that would likely increase the diversity of thought in psychology, but since there are many more ways of doing poor psychology than there are of doing good psychology, this diversity in backgrounds wouldn’t necessarily be desirable.

Say “Hello” to your new collaborators

Put bluntly, I wouldn’t want people to begin hiring those from non-liberal groups in greater numbers and believe this will, de facto, improve the quality of their research. More specifically, while greater political diversity might, to some extent, reduce the number of bad research projects by diluting or checking existing liberal biases, I don’t know that it would increase the number of good papers substantially; the relative numbers might change, but I’m more concerned with the absolutes, as a field which fails to produce quality research in sufficient quantities is not demonstrating much value (just like how the guy without a particular failing doesn’t necessarily offer much as a dating prospect). In my humble (and no doubt biased, but not necessarily incorrect) view, there is an important dimension of thought along which I do not wish psychologists to differ, and that is in their application of evolutionary theory as a guiding foundation for their work. Evolutionary theory not only allows one to find previously unappreciated aspects of psychological functioning through considerations of adaptive value, but also allows for building on previous research in a meaningful way and for the effective rooting out of problematic underlying assumptions. In that sense, even failed research projects can contribute in a more meaningful way when framed in an evolutionary perspective, relative to failed projects lacking one.

Evolutionary theory is by no means a cure-all for the bias problem; people will still sometimes get caught up trying to rationalize behaviors or preferences they morally approve of – like homosexuality – as adaptive, for example. In spite of that, I do not particularly hope to see a diversity of perspectives in psychology regarding the theoretical language we all ought to speak by this point. There are many more ways to think about psychology unproductively than there are of doing it well, and more diversity in those respects will make for a much weaker science.

References: Duarte, J., Crawford, J., Stern, C., Haidt, J., Jussim, L., & Tetlock, P. (2015). Political diversity will improve social psychological science. Behavioral & Brain Sciences, 38, 1-58.