More About Race And Police Violence

A couple months back, I offered some thoughts on police violence. The most important take-home message from that piece was that you need to be clear about what your expectations about the world are – as well as why they are that way – before you make claims of discrimination about population-level data. If, for instance, you believe that men and women should be approximately equally likely to be killed by police – as both groups are approximately equal in the US population – then the information that approximately 95% of civilians killed by police are male might look odd to you. It means that some factors beyond simple representation in the population are responsible for determining who is likely to get shot and killed. Crucially, that gap cannot be automatically chalked up to any other particular factor by default. Just because men are overwhelmingly more likely to be killed by police, that assuredly does not mean police are biased against men and have an interest in killing them simply because of their sex.

“You can tell they just hate men; it’s so obvious”

Today, I wanted to continue on the theme from my last post and ask about what patterns of data we ought to expect with respect to police killing civilians and race. If we wanted to test the hypothesis that police killings tend to be racially-motivated (i.e., driven by anti-black prejudice), I would think we should expect a different pattern of data from the hypothesis that such killings are driven by race-neutral practices (e.g., cases in which the police are defending against perceived lethal threats, regardless of race). In this case, if police killings are driven by anti-black prejudice, we might propose the following hypothesis: all else being equal, we ought to expect white officers to kill black civilians in greater numbers than black officers. This expectation could be reasonably driven by the prospect that members of a group are less likely to be biased against their in-group than out-group members, on average (in other words, the non-fictional Clayton Bigsbys and Uncle Ruckus’s of the world ought to be rare).

If there was good evidence in favor of the racially-motivated hypothesis for police killings, there would be real implications for the trust people – especially minority groups – should put in the police, as well as for particular social reforms. By contrast, if the evidence is more consistent with the race-neutrality hypothesis, then a continued emphasis on the importance of race could prove a red herring, distracting people from the other causes of police violence and preventing more effective interventions from being discussed. The issue is basically analogous to a doctor trying to treat an infection with a correct or incorrect diagnosis. It is unfortunate (and rather strange, frankly), then, that good data on police killings is apparently difficult to come by. One would think this is the kind of thing that people would have collected more information on, but apparently that’s not exactly the case. Thankfully, we now have some fresh data on the topic that was just published by Lott & Moody (2016).

The authors collected their own data set of police killings from 2013 to 2015 by digging through Lexis/Nexis, Google, Google Alerts, and a number of other online databases, as well as directly contacting police departments. In total, they were able to compile information on 2,700 police killings. Compared with the FBI’s information, the authors found about 1,300 more, about 741 more than the CDC, and 18 more than the Washington Post. Importantly, the authors were also able to collect a number of other pieces of information not consistently included in the other sources, including the number of officers on the scene and their age, sex, and race, among a number of other factors. In demonstrating the importance of having good data, whereas the FBI had been reporting a 6% decrease in police killings over that period, the current data actually found a 29% increase. For those curious – and this is a preview of what’s to come – the largest increase was attributed to white citizens being killed (312 in 2013 up to 509 in 2015; the comparable numbers for black citizens were 198 and 257).

“Good data is important, you say?”

In general, black civilians represented 25% of those killed by police, but only 12% of the overall population. Many people take this fact to reflect racial bias, but there are other things to consider, perhaps chief among which is that crime rates were substantially higher in black neighborhoods. The reported violent crime rate was 758 per 100,000 in cities where black citizens were killed, compared with 480 in cities where white citizens were killed (the murder rates were 11.2 and 4.6, respectively). Thus, to the extent that police are only responding to criminal activity and not race, we should expect a greater representation of the black population relative to the overall population (just like we should expect more males than females to be shot, and more young people than older ones).
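As a rough sketch of that logic (illustrative only – the 12% population share comes from the paragraph above, but the rate ratio is a free parameter I’ve chosen for illustration, not a figure from the paper), a group’s expected share of police killings under a purely crime-responsive model can be computed like so:

```python
def expected_share(pop_share, rate_ratio):
    """Share of incidents expected for a group making up `pop_share` of the
    population when its per-capita rate of the relevant encounters is
    `rate_ratio` times that of everyone else (incidents assumed to track
    encounter rates, not race)."""
    weighted = pop_share * rate_ratio
    return weighted / (weighted + (1 - pop_share))

# With an equal per-capita rate, share of incidents = share of population
print(round(expected_share(0.12, 1.0), 3))   # → 0.12

# A roughly 2.4x per-capita rate difference turns a 12% population
# share into about a 25% share of incidents
print(round(expected_share(0.12, 2.44), 3))  # → 0.25
```

Under these made-up assumptions, a modest per-capita difference in the relevant encounters is arithmetically sufficient to produce the observed 25% figure – which is why the population-share comparison alone can’t settle the question of bias either way.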

Turning to the matter of whether the race of the officer mattered, data was available for 904 cases (whereas the race of all those who were killed was known). When that information was entered into a number of regressions predicting the odds of the officer killing a black suspect, it was actually the case that black officers were quite a bit more likely to have killed a black suspect than white officers were in all cases (consistent with other data I’ve talked about before). It should be noted at this point, however, that for 67% of the cases the race of the officers was unknown, whereas only 2% of the shootings for which race is known involve a black officer. As the CDC data I mentioned earlier highlighted, this unknown factor can be a big deal; perhaps black officers are actually less likely to have shot black suspects but we just can’t see it here. Since the killings of black citizens by officers of unknown race did not differ from those by white officers, however, it seems unlikely that white officers would end up being unusually likely to shoot black suspects. Moreover, the racial composition of the police force was unrelated to those killings.

A number of other interesting findings cropped up as well. First, there was no effect of body cameras on police killings. This might suggest that when officers do kill someone – given the extremity and possible consequences of the action – it is something they tend to undertake earnestly out of fear for their life. Consistent with that idea, the greater the number of officers on the scene, the lower the odds of police killing anyone (about a 14-18% decline per additional officer present). Further, white female officers (though their numbers were low in the data) were also quite a bit more likely to shoot unarmed citizens (79% more), likely as a byproduct of their reduced capability to prevail in a physical conflict during which their weapon might be taken or they could get killed. To the extent these shootings are being driven by legitimate fears on the parts of the officers, all this data would appear to fit together consistently.

“Unarmed” does not always equal “Not Dangerous”

In sum, there doesn’t appear to be particularly strong empirical evidence that white officers are killing black citizens at higher rates than black officers; quite the opposite, in fact. While such information might be viewed as a welcome relief, those who have wed themselves to the idea that black populations are being targeted for lethal violence by police will likely shrug this data off. It will almost always be possible for someone seeking to find racism to manipulate their expectations into the world of empirical unfalsifiability. For example, given the current data showing a lack of bias against black civilians by white officers, the racism hypothesis could be pushed one step back to some population-level bias whereby all officers, even black ones, are impacted by anti-black prejudice in their judgments (regardless of the department’s racial makeup, the presence of cameras, or any other such factor). It is also entirely possible that any racial biases don’t show up in the patterns of police killings, but might well show up in other patterns of less-lethal aggression or harassment. After all, there are very real consequences for killing a person – even when the killings are deemed justified and lawful – and many people would rather not subject themselves to such complications. Whatever the case, white officers do not appear unusually likely to shoot black suspects.

References: Lott, J. & Moody, C. (2016). Do white officers unfairly target black suspects? (November 15, 2016). Available at SSRN: https://ssrn.com/abstract=2870189

When It’s Not About Race Per Se

We can use facts about human evolutionary history to understand the shape of our minds; using it to understand people’s reactions to race is no exception. As I have discussed before, it is unlikely that ancestral human populations ever traveled far enough, consistently enough throughout our history as a species to have encountered members of other races with any regularity. Different races, in other words, were unlikely to be a persistent feature of our evolutionary history. As such, it seems correspondingly unlikely that human minds contain any modules that function to attend to race per se. Yet we do seem to automatically attend to race on a cognitive level (just as we do with sex and age), so what’s going on here? The best hypothesis I’ve seen so far is that people aren’t paying attention to race itself as much as they are using it as a proxy for something else that likely was recurrently relevant during our history: group membership and social coalitions (Kurzban, Tooby, & Cosmides, 2001). Indeed, when people are provided with alternate visual cues to group membership – such as different color shirts – the automaticity of race being attended to appears to be diminished, even to the point of being erased entirely at times.

Bright colors; more relevant than race at times

If people attend to race as a byproduct of our interest in social coalitions, then there are implications here for understanding racial biases as well. Specifically, it would seem unlikely for widespread racial biases to exist simply because of superficial differences like skin color or facial features; instead, it seems more likely that racial biases are a product of other considerations, such as the possibility that different groups – racial or otherwise – simply hold different values as social associates to others. For instance, if the best interests of group X are opposed to those of group Y, then we might expect those groups to hold negative opinions of each other on the whole, since the success of one appears to handicap the success of the other (for an easy example of this, think about how more monogamous individuals tend to come into conflict with promiscuous ones). Importantly, to the extent that those best interests just so happen to correlate with race, people might mistake a negative bias due to varying social values or best interests for one due to race.

In case that sounds a bit too abstract, here’s an example to make it immediately understandable: imagine an insurance company that is trying to set its premiums only in accordance with risk. If someone lives in an area at a high risk of some negative outcome (like flooding or robbery), it makes sense for the insurance company to set a higher premium for them, as there’s a greater chance it will need to pay out; conversely, those in low-risk areas can pay reduced premiums for the same reason. In general, people have no problem with this idea of discrimination: it is morally acceptable to charge different rates for insurance based on risk factors. However, if that high-risk area just so happens to be one in which a particular racial group lives, then people might mistake a risk-based policy for a race-based one. In fact, in previous research, certain groups (specifically liberal ones) generally say it is unacceptable for insurance companies to require that those living in high-risk areas pay higher premiums if they happen to be predominately black (Tetlock et al, 2000).

Returning to the main idea at hand, previous research in psychology has tended to associate conservatives – but not liberals – with prejudice. However, there has been something of a confounding factor in that literature (which might be expected, given that academics in psychology are overwhelmingly liberal): specifically, much of that literature on prejudice asks about attitudes towards groups whose values tend to lean more towards the liberal side of the political spectrum, like homosexual, immigrant, and black populations (groups that might tend to support things like affirmative action, which conservative groups would tend to oppose). When that confound is present, it’s not terribly surprising that conservatives would look more prejudiced, but that prejudice might ultimately have little to do with the target’s race or sexual orientation per se. More specifically, if animosity between different racial groups is due primarily to a factor like race itself, then you might expect those negative feelings to persist even in the face of compatible values. That is, if a white person happens to not like black people because they are black, then the views of a particular black person shouldn’t be liable to change those racist sentiments too much. However, if those negative attitudes are instead more of a product of a perceived conflict of values, then altering those political or social values should dampen or remove the effects of race altogether.

Shaving the mustache is probably a good place to start

This idea was tested by Chambers et al (2012) over the course of three studies. The first of these involved 170 Mturk participants who indicated their own ideological position (strongly liberal to strongly conservative, 5-point scale), their impressions of 34 different groups (in terms of whether they’re usually liberal or conservative on the same scale, as well as how much they liked the target group), as well as a few other measures related to the prejudice construct, like system justification and modern racism. As it turns out, both liberals and conservatives tended to agree with one another about how liberal or conservative the target groups tended to be (r = .97), so their ratings were averaged. Importantly, when the target group in question tended to be liberal (such as feminists or atheists), liberals tended to have higher favorability ratings of them (M = 3.48) than did conservatives (M = 2.57; d = 1.23); conversely, when the target group was perceived as conservative (such as business people or the elderly), liberals now tended to have lower favorability ratings (M = 2.99) of them than conservatives (M = 3.86; d = 1.22). In short, liberals tended to feel positive about liberals, and conservatives tended to feel positive about conservatives. The more extreme the perceived political differences of the target were, the larger these biases were (r = .84). Further, when group memberships needed to be chosen, the biases were larger than when they were involuntary (e.g., as a group, “feminists” generated more bias from liberals and conservatives than “women” did).

Since that was all correlational, studies 2 and 3 took a more experimental approach. Here, participants were exposed to a target whose race (white/black) and positions (conservative or liberal) were manipulated on six different issues (welfare, affirmative action, wealth redistribution, abortion, gun control, and the Iraq war). In study 2 this was done on a within-subjects basis with 67 participants, and in study 3 it was done between-subjects with 152 participants. In both cases, however, the results were similar: in general, while the target’s attitudes mattered when it came to how much the participants liked them, the target’s race did not. Liberals didn’t like black targets who disagreed with them any more than conservatives did. Conservatives liked the targets who expressed conservative views more, whereas liberals tended to like targets who expressed liberal views more. The participants had also provided scores on measures of system justification, modern racism, and attitudes towards blacks. Even when these factors were controlled for, however, the pattern of results remained: people tended to react favorably towards those who shared their views and unfavorably to those who did not. The race of the person with those views seemed beside the point for both liberals and conservatives. Not to hammer the point home too much, but perceiving ideological agreement – not race – was doing the metaphorical lifting here.

Now perhaps these results would have looked different if the samples in question were composed of people who held more or less extreme and explicit racist views; the type of people who wouldn’t want to live next to someone of a different race. While that’s possible, there are a few points to make about that suggestion: first, it’s becoming increasingly difficult to find people who hold such racist or sexist views, despite certain rhetoric to the contrary; that’s the reason researchers ask about “symbolic” or “modern” or “implicit” racism, rather than just racism. Such openly-racist individuals are clearly the exceptions, rather than the rule. This brings me to the second point, which is that, even if biases did look different among the hardcore racists (we don’t know if they do), for more average people, like the kind in these studies, there doesn’t appear to be a widespread problem with race per se; at least not if the current data have any bearing on the matter. Instead, it seems possible that people might be inferring a racial motivation where it doesn’t exist because of correlations with race (just like in our insurance example).

Pictured: unusual people; not everyone you disagree with

For some, the reaction to this finding might be to say that it doesn’t matter. After all, we want to reduce racism, so being incredibly vigilant for it should ensure that we catch it where it exists, rather than miss it or make it seem permissible. Now that’s likely true enough, but there are other considerations to add into that equation. One of them is that by reducing your type-two errors (failing to see racism where it exists) you increase your type-one errors (seeing racism where there is none). As long as accusations of being a racist are tied to social condemnation (not praise; a fact which alone ought to tell you something), you will be harming people by overperceiving the issue. Moreover, if you perceive racism where it doesn’t exist too often, you will end up with people who don’t take your claims of racism seriously anymore. Another point to make is that if you’re actually serious about addressing a social problem you see, accurately understanding its causes will go a long way. That is to say, time and energy invested in interventions to reduce racism is time not spent trying to address other problems. If you have misdiagnosed the issue you seek to tackle as being grounded in race, then your efforts to address it will be less successful than they otherwise could be, not unlike a doctor prescribing the wrong medication to treat an infection.
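That type-one/type-two trade-off can be sketched with a toy detector. Every number below is invented purely for illustration – nothing here comes from the studies discussed – but the mechanics are general: any fixed detection threshold trades misses against false alarms.

```python
# Toy illustration: a "signal" of bias between 0 and 1, observed both in
# cases where bias really exists and in innocuous cases. The detector
# flags anything above a threshold as racism.
real_cases = [0.8, 0.6, 0.9, 0.7]   # signal when bias actually exists
innocuous = [0.3, 0.5, 0.2, 0.65]   # signal when it does not

def error_rates(threshold):
    """Return (type-two rate, type-one rate) for a detection threshold:
    misses = real cases not flagged; false_alarms = innocuous cases flagged."""
    misses = sum(x <= threshold for x in real_cases) / len(real_cases)
    false_alarms = sum(x > threshold for x in innocuous) / len(innocuous)
    return misses, false_alarms

# Lowering the threshold (being more vigilant) trades one error for the other:
print(error_rates(0.75))  # → (0.5, 0.0): half the real cases missed, none falsely flagged
print(error_rates(0.25))  # → (0.0, 0.75): nothing missed, but most innocuous cases flagged
```

The point of the sketch is just that vigilance isn’t free: pushing the miss rate down necessarily pushes the false-alarm rate up, which is exactly the cost structure the paragraph above describes.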

References: Chambers, J., Schlenker, B., & Collisson, B. (2012). Ideology and prejudice: The role of value conflicts. Psychological Science, 24, 140-149.

Kurzban, R., Tooby, J., & Cosmides, L. (2001). Can race be erased? Coalitional computation and social categorization. PNAS, 98, 15387-15392.

Tetlock, P., Kristel, O., Elson, S., Green, M., & Lerner, J. (2000). The psychology of the unthinkable: Taboo trade-offs, forbidden base rates, and heretical counterfactuals. Journal of Personality and Social Psychology, 78, 853-870.

What Might Research Ethics Teach Us About Effect Size?

Imagine for a moment that you’re in charge of overseeing medical research approval for ethical concerns. One day, a researcher approaches you with the following proposal: they are interested in testing whether a foodstuff that some portion of the population occasionally consumes for fun is actually quite toxic, like spicy chilies. They think that eating even small doses of this compound will cause mental disturbances in the short term – like paranoia and suicidal thoughts – and might even cause those negative changes permanently in the long term. As such, they intend to test their hypothesis by bringing otherwise-healthy participants into the lab, providing them with a dose of the possibly-toxic compound (either just once or several times over the course of a few days), and then seeing if they observe any negative effects. What would your verdict on the ethical acceptability of this research be? If I had to guess, I suspect that many people would not allow the research to be conducted, because one of the major tenets of research ethics is that harm should not befall your participants, except when absolutely necessary. In fact, I suspect that were you the researcher – rather than the person overseeing the research – you probably wouldn’t even propose the project in the first place, because you might have some reservations about possibly poisoning people, either harming them directly and/or those around them indirectly.

“We’re curious if they make you a danger to yourself and others. Try some”

With that in mind, I want to examine a few other research hypotheses I have heard about over the years. The first of these is the idea that exposing men to pornography will cause a number of harmful consequences, such as increasing how appealing rape fantasies are, bolstering the belief that women would enjoy being raped, and decreasing the perceived seriousness of violence against women (as reviewed by Fisher et al, 2013). Presumably, the effect on those beliefs over time is serious, as it might lead to real-life behavior on the part of men to rape women or approve of such acts on the parts of others. Other, less-serious harms have also been proposed, such as the possibility that exposure to pornography might have harmful effects on the viewer’s relationship, reducing their commitment and making it more likely that they would do things like cheat on or abandon their partner. Now, if a researcher earnestly believed they would find such effects, that the effects would be appreciable in size to the point of being meaningful (i.e., large enough to be reliably detected by statistical tests in relatively small samples), and that their implications could be long-term in nature, could this researcher even ethically test such issues? Would it be ethically acceptable to bring people into the lab, randomly expose them to this kind of (in a manner of speaking) psychologically-toxic material, observe the negative effects, and then just let them go?

Let’s move onto another hypothesis that I’ve been talking a lot about lately: the effects of violent media on real life aggression. Now I’ve been specifically talking about video game violence, but people have worried about violent themes in the context of TV, movies, comic books, and even music. Specifically, there are many researchers who believe that exposure to media violence will cause people to become more aggressive through making them perceive more hostility in the world, view violence as a more acceptable means of solving problems, or by making violence seem more rewarding. Again, presumably, changing these perceptions is thought to cause the harm of eventual, meaningful increases in real-life violence. Now, if a researcher earnestly believed they would find such effects, that the effects would be appreciable in size to the point of being meaningful, and that their implications could be long-term in nature, could this researcher even ethically test such issues? Would it be ethically acceptable to bring people into the lab, randomly expose them to this kind of (in a manner of speaking) psychologically-toxic material, observe the negative effects, and then just let them go?

Though I didn’t think much of it at first, the criticisms I read about the classic Bobo doll experiment are actually kind of interesting in this regard. In particular, researchers were purposefully exposing young children to models of aggression, the hope being that the children would come to view violence as acceptable and engage in it themselves. The reason I didn’t pay it much mind is that I didn’t view the experiment as causing any kind of meaningful, real-world, or lasting effects on the children’s aggression; I don’t think mere exposure to such behavior will have meaningful impacts. But if one truly believed that it would, I can see why that might cause some degree of ethical concern.

Since I’ve been talking about brief exposure, one might also worry about what would happen were researchers to expose participants to such material – pornographic or violent – for weeks, months, or even years on end. Imagine a study that asked people to smoke for 20 years to test the negative effects in humans; that’s probably not getting past the IRB. As a worthy aside on that point, though, it’s worth noting that as pornography has become more widely available, rates of sexual offending have gone down (Fisher et al, 2013); as violent video games have become more available, rates of youth violent crime have gone down too (Ferguson & Kilburn, 2010). Admittedly, it is possible that such declines would be even steeper if such media wasn’t in the picture, but the effects of this media – if they cause violence at all – are clearly not large enough to reverse those trends.

I would have been violent, but then this art convinced me otherwise

So what are we to make of the fact that this research was proposed, approved, and conducted? There are a few possibilities to kick around. The first is that the research was proposed because the researchers themselves don’t give much thought to the ethical concerns, happy enough if it means they get a publication out of it regardless of the consequences, but that wouldn’t explain why it got approved by other bodies like IRBs. It is also possible that the researchers and those who approve the work believe it to be harmful, but view the benefits of such research as outstripping the costs, working under the assumption that once the harmful effects are established, further regulation of such products might follow, ultimately reducing the prevalence or use of such media (not unlike the warnings and restrictions placed on the sale of cigarettes). Since any declines in availability or censorship of such media have yet to manifest – especially given how access to the internet provides means for circumventing bans on the circulation of information – whatever practical benefits might have arisen from this research are hard to see (again, assuming that things like censorship would yield benefits at all).

There is another aspect to consider as well: during discussions of this research outside of academia – such as on social media – I have not noted a great deal of outrage expressed by consumers of these findings. Anecdotal as this is, when people discuss such research, they do not appear to be raising the concern that the research itself was unethical to conduct because it will do harm to people’s relationships or to women more generally (in the case of pornography), or because it will result in making people more violent and accepting of violence (in the video game studies). Perhaps those concerns exist en masse and I just haven’t seen them yet (always possible), but I see another possibility: people don’t really believe that the participants are being harmed in this case. People generally aren’t afraid that the participants in those experiments will dissolve their relationships or come to think rape is acceptable because they were exposed to pornography, or will get into fights because they played 20 minutes of a video game. In other words, they don’t think those negative effects are particularly large, if they even really believe they exist at all. While this point is a rather implicit one, the lack of consistent moral outrage expressed over the ethics of this kind of research does speak to the matter of how serious these effects are perceived to be: at least in the short term, not very.

What I find very curious about these ideas – pornography causes rape, video games cause violence, and their ilk – is that they all seem to share a certain assumption: that people are effectively acted upon by information, placing human psychology in a distinctly passive role while information takes the active one. Indeed, in many respects, this kind of research strikes me as remarkably similar to the underlying assumptions of the research on stereotype threat: the idea that you can, say, make women worse at math by telling them men tend to do better at it. All of these theories seem to posit a very exploitable human psychology capable of being readily manipulated by information, rather than a psychology which interacts with, evaluates, and transforms the information it receives.

For instance, a psychology capable of distinguishing between reality and fantasy can play a video game without thinking it is being threatened physically, just like it can watch pornography (or, indeed, any videos) without actually believing the people depicted are present in the room with them. Now clearly some part of our psychology does treat pornography as an opportunity to mate (else there would be no sexual arousal generated in response to it), but that part does not necessarily govern other behaviors (generating arousal is biologically cheap; aggressing against someone else is not). The adaptive nature of a behavior depends on context.

Early hypotheses of the visual-arousal link were less successful empirically

As such, expecting something like a depiction of violence to translate consistently into some general perception that violence is acceptable and useful in all sorts of interactions throughout life is inappropriate. Learning that you can beat up someone weaker than you doesn’t mean it’s suddenly advisable to challenge someone stronger than you; relatedly, seeing a depiction of people who are not you (or your future opponent) fighting shouldn’t make it advisable for you to change your behavior either. Whatever the effects of this media, they will ultimately be assessed and manipulated internally by psychological mechanisms and tested against reality, rather than just accepted as useful and universally applied.

I have seen similar thinking about information manipulating people another time as well: during discussions of memes. Memes are posited to be similar to infectious agents that will reproduce themselves at the expense of their host’s fitness; information that literally hijacks people’s minds for its own reproductive benefits. I haven’t seen much in the way of productive and successful research flowing from that school of thought quite yet – which might itself say something about its effectiveness and accuracy – but maybe I’m just still in the dark there.

References: Ferguson, C. & Kilburn, J. (2010). Much ado about nothing: The misestimation and overinterpretation of violent video game effects in eastern and western nations: Comment on Anderson et al (2010). Psychological Bulletin, 136, 174-178.

Fisher, W., Kohut, T., Di Gioacchino, L., & Fedoroff , P. (2013). Pornography, sex crime, and paraphilia. Current Psychiatry Reports, 15, 362.

Getting To Know Your Outliers: More About Video Games

As I mentioned in my last post, I’m a big fan of games. For the last couple of years, the game which has held the majority of my attention has been a digital card game. In this game, people have the ability to design decks with different strategies, and the success of your strategy will depend on your opponent’s strategy; you can think of it as a more complicated version of rock-paper-scissors. The players in this game are often interested in understanding how well certain strategies match up against others, so, for the sake of figuring that out, some have taken it upon themselves to collect data from the players to answer those questions. You don’t need to know much about the game to understand the example I’m about to discuss, but let’s just consider two decks: deck A and deck B. Those collecting the data managed to aggregate the outcome of approximately 2,200 matches between the two and found that, overall, deck A was favored to win the match 55% of the time. This should be some pretty convincing data when it comes to getting a sense for how things generally worked out, given the large sample size.

Only about 466 more games to Legend with that win rate

However, this data will only be as useful to us as our ability to correctly interpret it. A 55% success rate captures the average performance, but there is at least one well-known outlier player in that match-up. This individual manages to consistently perform at a substantially higher level than average, winning that same match-up around 70-90% of the time across large samples of games. What are we to make of that particular data point? How should it affect our interpretation of the match? One possible interpretation is that his massively higher success rate is simply due to variance and, given enough games, his win rate should be expected to drop. It hasn’t yet, as far as I know. Another possible explanation is that this player is particularly good relative to his opponents, and that factor of general skill explains the difference. In much the same way, an absolutely weak 15-year-old might look pretty strong if you put him in a boxing match against a young child. However, the way the game is set up, you can be assured that he will be matched against people of (relatively) equal skill, and that difference shouldn’t account for such a large disparity.
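To see why “just variance” is a hard sell, consider a hypothetical sketch. The 200-game sample and the 160-win tally below are my assumptions for illustration, not figures from the post; the point is simply that sustaining an 80% win rate over a large sample is astronomically unlikely if the true rate were 55%:

```python
import math

n, p, k = 200, 0.55, 160   # 200 games, true win rate 55%, 160 wins = 80%

# Exact binomial upper-tail probability: P(X >= k wins in n games)
tail = sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
print(f"P(at least 80% wins in {n} games | true rate 55%) = {tail:.2e}")
```

The tail probability comes out many orders of magnitude below any conventional threshold, which is why the skill-based interpretations deserve a closer look.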

A third interpretation – one I find more appealing, given my deep experience with the game – is that skill matters, but in a different way. Specifically, deck A is more difficult to play correctly than deck B: it’s easier to make meaningful mistakes with it, as you usually have a greater number of options available to you. As such, if you give two players of average skill decks A and B, you might observe the 55% win rate initially cited. On the other hand, if you give an expert player both decks (one who understands the match as well as possible), you might see something closer to the 80% figure. Expertise matters for one deck a lot more than the other. Depending on how you want to interpret the data, then, you’ll end up with two conclusions that are quite different: either the match is almost even, or the match is heavily lopsided. I bring this example up because it can tell us something very important about outliers: data points that are, in some way, quite unusual. Sometimes these data points are flukes, worth disregarding if we want to learn how relationships in the world tend to work; other times, however, these outliers can provide valuable and novel insights that re-contextualize the way we look at vast swaths of other data points. It all hinges on why that outlier is an outlier.

This point bears on some reactions I received to my last post, which concerned a fairly-new study finding no relationship between violent content in video games and subsequent measures of aggression once you account for the difficulty of a game (or, perhaps more precisely, the ability of a game to impede people’s feelings of competence). Glossing the results into a single sentence, the general finding is that the frustration induced by a game, but not violent content per se, predicts short-term changes in aggression (the gaming community tends to agree with that conclusion, for whatever it’s worth). In conducting this research, the authors hoped to address what they perceived to be a shortcoming in the literature: many previous studies had participants play either violent or non-violent games, but they usually achieved this manipulation by having participants play entirely different games. This means that while violent content did vary between conditions, so too could a number of other factors, and the presence of those other factors poses confounds for interpreting the data. Since more than violence varied, any subsequent changes in aggression are not necessarily attributable to violent content per se.

Other causes include being out $60 for a new controller

The study I wrote about, which found no effect of violence, stands in contrast to a somewhat older meta-analysis of the relationship between violent games and aggression. A meta-analysis – for those not in the know – is when a large number of studies are examined jointly to better estimate the size of some effect. As any individual study only provides a snapshot of information and could be unreliable, a greater number of studies should be expected to provide a more accurate view of the world, just as running 50 participants through an experiment should give us a better sense than asking a single person or two. The results of those meta-analyses seem to settle on a pretty small relationship between violent video games and aggression/violence (approximately r = .15 to .20 for non-serious aggression, and about r = .04 for serious aggression, depending on who you ask and what you look at; Anderson et al., 2010; Ferguson & Kilburn, 2010; Bushman et al., 2010), but concerns have been raised about publication bias and the use of non-standardized measures of aggression.
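To put those effect sizes in perspective, squaring a correlation coefficient gives the proportion of variance shared between the two variables – a standard back-of-the-envelope conversion, sketched here with the r values quoted above:

```python
# Convert each reported correlation into variance explained (r squared)
for r in (0.04, 0.15, 0.20):
    print(f"r = {r:.2f} -> r^2 = {r * r:.2%} of variance explained")
```

Even the high end of that range (r = .20) amounts to about 4% of the variance in aggression measures, which is worth keeping in mind when weighing the practical significance of these findings.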

Further, even if there were no publication bias to worry about, that does not mean the topic itself is being researched by people without biases, which can affect how data gets analyzed, how research gets conducted, how measures get created and interpreted, and so on. If r = .2 is about the best one can do with those degrees of freedom (in other words, assuming the people conducting such research are looking for the largest possible effect and develop their research accordingly), then it seems unlikely that this kind of effect is worth worrying much about. As Ferguson & Kilburn (2010) note, youth violent crime rates have been steadily decreasing as sales of violent games have been increasing (r = -.95); moreover, the quality of that violence has improved over time, not just the quantity (look at the violence in Doom over the years to get a sense for that improvement). Now, it’s true enough that the relationship between youth violent crime and violent video game sales is by no means a great examination of the relationship in question, but I do not doubt that if the relationship ran in the opposite direction (especially if it were as large), many of the same people who now disregard it as unimportant would never leave it alone.

Again, however, we run into that issue where our data is only as good as our ability to interpret it. We want to know why the meta-analysis turned up a positive (albeit small) relationship whereas the single paper did not, despite multiple chances to find it. Perhaps the paper I wrote about was simply a statistical fluke; for whatever reason, the samples recruited for those studies didn’t show the effect of violent content, but the effect is still real in general (perhaps it’s just too small to be reliably detected). That seems to be the conclusion some responses I received contained. In fact, one commenter cited the results of three different studies suggesting there was a causal link between violent content and aggression. However, when I dug up those studies and looked at their methods sections, what I found was that, as I mentioned before, all of them had participants play entirely different games in the violent and non-violent conditions. This undermines your ability to interpret the data solely in light of violent content, because you are varying more than just violence (even if unintentionally). On the other hand, the paper I mentioned in my last post had participants playing the same game between conditions, with only the content (like difficulty or violence levels) manipulated. As far as I can tell, then, the methods of the paper I discussed last week were superior, since they controlled for more, apparently-important factors.

This returns us to the card game example I raised initially: when people play a particular deck incorrectly, they find it is slightly favored to win; when someone plays it correctly they find it is massively favored. To turn that point to this analysis, when you conduct research that lacks the proper controls, you might find an effect; when you add those controls in, the effect vanishes. If one data point is an outlier because it reflects research done better than the others, you want to pay more attention to it. Now I’m not about to go digging through over 130 studies for the sake of a single post – I do have other things on my plate – but I wanted to make this point clear: if a meta-analysis contains 130 papers which all reflect the same basic confound, then looking at them together makes me no more convinced of their conclusion than looking at any of them alone (and given that the specific studies that were cited in response to my post all did contain that confound, I’ve seen no evidence inconsistent with that proposal yet). Repeating the same mistake a lot does not make it cease to be a mistake, and it doesn’t impress me concerning the weight of the evidence. The evidence acquired through weak methodologies is light indeed.  

Research: Making the same mistakes over and over again for similar results

So, in summation, you want to really get to know your data and understand why it looks the way it does before you draw much in the way of meaningful conclusions from it. A single outlier can potentially tell you more about what you want to know than lots of worse data points (in fact, poorly-interpreted data might not even be recognized as such until contrary evidence rears its head). That isn’t always the case, but writing off a particular data point because it doesn’t conform to the rest of the average pattern – or assuming its value is equal to that of other points – isn’t always right either. Getting to know your data, methods, and measures is quite important for getting a sense of how to interpret it all.

For instance, it has been proposed that – sure – the relationship between violent game content and aggression is small at best (there seems to be some heated debate over whether it’s closer to r = .1 or .2), but it could still be important because lots of small effects can add up over time into a big one. In other words, maybe you ought to be really wary of that guy who has been playing a violent game for an hour each night for the last three years. He could be about to snap at the slightest hint of a threat and harm you…at least to the extent that you’re afraid he might suggest you listen to loud noises or eat slightly more of something spicy – two methods used to assess “physical” aggression in this literature owing to ethical limitations (despite the fact that, “Naturally, children (and adults) wishing to be aggressive do not chase after their targets with jars of hot sauce or headphones with which to administer bursts of white noise”). That small, r = .2 correlation I referenced before concerns behavior like that in a lab setting where experimental demand characteristics are almost surely present, suggesting the effect on aggressive behavior in naturalistic settings is likely overstated.

Then again, in terms of meaningful impact, perhaps all those small effects weren’t really amounting to much. Indeed, the longitudinal research in this area seems to find the smallest effects (Anderson et al., 2010). To put that into what I think is a good example, imagine going to the gym. Listening to music helps many people work out, and the choice of music is relevant there. The type of music I would listen to at the gym is not always the same kind I would listen to if I wanted to relax, or dance, or set a romantic mood. In fact, the music I listen to at the gym might even make me somewhat more aggressive, in a manner of speaking (e.g., for an hour, aggressive thoughts might be more accessible to me while I listen than if I had no music), but that doesn’t lead to any meaningful changes in my violent behavior that anyone can observe, either while at the gym or once I leave. In that case, repeated exposure to this kind of aggressive music would not really make me any more aggressive in my day-to-day life over time.

Thankfully, these warnings managed to save people from dangerous music

That’s not to say that media has no impact on people whatsoever: I fully suspect that people watching a horror movie probably feel more afraid than they otherwise would; I also suspect someone who just watched an action movie might have some violent fantasies in their head. However, I also suspect such changes are rather specific and of a short duration: watching that horror movie might increase someone’s fear of being eaten by zombies or their propensity to be startled, but not their fear of dying from the flu or their probability of being scared next week; that action movie might make someone think about attacking an enemy military base in the jungle with two machine guns, but it probably won’t increase their interest in kicking a puppy for fun, or lead to them fighting with their boss next month. These effects might push some feelings around in the very short term, but they’re not going to have lasting and general effects. As I said at the beginning of last week’s post, things like violence are strategic acts, and it doesn’t seem plausible that violent media (like, say, comic books) will make such acts any more advisable.

References: Anderson, C. et al. (2010). Violent video game effects on aggression, empathy, and prosocial behavior in eastern and western counties: A meta-analytic review. Psychological Bulletin, 136, 151-173.

Bushman, B., Rothstein, H., & Anderson, C. (2010). Much ado about something: Violent video game effects and a school of red herring: Reply to Ferguson & Kilburn (2010). Psychological Bulletin, 136, 182-187.

Elson, M. & Ferguson, C. (2013). Twenty-five years of research on violence in digital games and aggression: Empirical evidence, perspectives, and a debate gone astray. European Psychologist, 19, 33-46.

Ferguson, C. & Kilburn, J. (2010). Much ado about nothing: The misestimation and overinterpretation of violent video game effects in eastern and western nations: Comment on Anderson et al (2010). Psychological Bulletin, 136, 174-178.

Violence In Games Does Not Cause Real-Life Violence

Violence is a strategic act. What I mean by this is that a threat to employ physical aggression against someone else unless they do what you want needs to be credible to be useful. If a 5-year-old child threatened to beat up her parents unless they stopped for ice cream, the parents understand that the child does not actually pose a real physical risk and, if push came to shove, the parents would win a physical contest; by contrast, if you happen to be hanging out with a heavy-weight MMA fighter and he demands you pull over for ice cream, you should be more inclined to take his request seriously. If you cannot realistically threaten others with credible claims of violence – if you are not likely to be able to inflict harmful costs on others physically – then posturing aggressively shouldn’t be expected to do you any favors; if anything, adopting aggressive stances you cannot back up will result in your suffering costs inflicted by others, and that’s generally an outcome to be avoided. It’s for this reason that – on a theoretical level – we should have expected research on power poses to fail to replicate: simply acting more dominant will not make you more able to actually back up those boasts, and people shouldn’t be expected to take such posturing seriously. If you apply that same logic to nonhumans – say, rams – a male who behaves dominantly will occasionally encourage another male to challenge that dominance. If neither backs down, the result is a physical conflict, and the subsequent realization that writing metaphorical checks you cannot cash is a bad idea.

“You have his attention; sure hope you also have a thick skull, too”

This cursory analysis already suggests there might be a theoretical problem with the idea that people who are exposed to violent content in media will subsequently become more aggressive in real life. Yes, watching Rambo or playing Dark Souls might inspire some penchant for spilling fantasy blood (at least in the short term), but seeing violence doesn’t suddenly increase the advisability of your pursuing such a strategy, as you are no more likely to be able to effectively employ it than you were before your exposure. Again, to place that in a nonhuman example (always a good idea when you’re dealing with psychology research, to see if an idea still makes sense; if it only makes sense for humans, odds are the interpretation is lacking something), if you exposed a male ram to media depicting males aggressively slamming their horns into other males, that doesn’t suddenly mean your subject ram will be inspired to run out and challenge a rival. His chances of winning that contest haven’t changed, so why should his behavior?

Now the matter is more complex than this analysis lets on, admittedly, but it does give us something of a starting point for understanding why violent content in media – video games in particular – should not be expected to have uniform or lasting impacts on the player’s subsequent behavior. Before I get into the empirical side of this issue, however, I think it’s important I lay my potential bias on the table: I’m a gamer; have been my entire life, at least as far as I can remember. I’ve played games in all sorts of mediums – video, card, board, and sometimes combinations of those – and across a variety of genres, including violent ones. As such, when I see someone leveling accusations against one of my more cherished hobbies, my first response is probably defensive. That is, I don’t perceive people who research the link between violent games and aggression to be doing so for no particular reason; I assume they have some specific goals in mind (consciously understood or not) that center around telling other people what they shouldn’t do or enjoy, perhaps even ranging as far as trying to build a case for the censorship of such materials. As such, I’m by no means an unbiased observer in this matter, but I am also something of an expert in the subject matter, which can provide me with insights that others might not possess.

That disclaimer out of the way, I wanted to examine some research today which tests the possibility that the relationship people have sometimes spotted between violent video game content and aggression isn’t causal (Przybylski et al., 2014; I say sometimes because apparently this link is inconsistently present, possibly only short-term in nature, and the subject of some debate). The thrust of this paper focuses on the idea that human aggression (proximately) is a response to having one’s psychological needs thwarted. I think there are better ways to think about what aggression is, but this general idea is probably close enough to the truth to do us just fine. In brief, the idea motivating this paper is that people play video games (again, proximately), in part, because they provide feelings of competency and skill growth. Something about the challenges games offer to be overcome proves sufficiently motivating for players to get pleasure out of the experience. Importantly, this should hold true across gaming content: people don’t find content appealing because it is violent per se, but rather because it provides opportunities to test, display, and strengthen certain competencies. As such, manipulating the content of a game (from violent to non-violent) should be much less effective at causing subsequent aggression than manipulating its difficulty (from easy/intuitive to difficult/confusing).

“I’ll teach him a lesson about beating me in Halo”

This is a rather important factor to consider because the content of a game (whether it is violent or not, for instance) might be related to how difficult the game is to learn or master. As such, if researchers have been trying to vary the content without paying much mind to the other factors that correlate with it, that could handicap the usefulness of subsequent interpretations. Toward that end, Przybylski et al. (2014) report the results of seven studies designed to examine just that issue. I won’t be able to go over all of them in depth, but I will try to provide an adequate general summary of their methods and findings. In their first study, they examined how 99 participants reacted to playing either a simple but non-violent game (about flying a paper airplane through rings) or a complex but violent one (a shooter with extensive controls). The players were then asked about their change in aggressive feelings (pre- and post-test difference) and their mastery of the controls. The quick summary of the results is that violent content did not predict changes in aggression scores above and beyond the effects of frustration with the controls, while the control-mastery scores did predict aggression.

Their second study directly manipulated the content and complexity factors (N = 101). Two versions of the same game (Half-Life 2) were created, such that one contained violent content and the other did not, while the overall environment and difficulty were held constant. Again, there was no effect of content on aggression, but there was an effect of perceived mastery. In other words, people felt angry when they were frustrated with the game, not because of its content. Their third study (N = 104) examined what happened when a non-violent puzzle game (Tetris) was modified to contain either a simple or a complex control interface. As before, those who had to deal with the frustrating controls were quicker to access aggressive thoughts and terms than those in the intuitive-control condition. Study 4 basically repeated that design with some additional variables and found the same type of results: perceived competency in the game correlated negatively with aggression, and people became more aggressive the less they enjoyed the game, among a few other things. The fifth study had 112 participants all play a complex game that was either (a) violent or non-violent and gave them (b) either 10 minutes of practice time with the game or no experience with it. As expected, there was an effect of being able to practice on subsequent aggression, but no effect of violent content.

Study 6 asked participants to first submerge their arm in ice water for 25 seconds (a time period ostensibly determined by the previous participant), then play a game of Tetris for a few minutes that was modified to be either easy or difficult (but not because of the controls this time). Those assigned to play the more difficult version of Tetris reported more aggressive feelings, and assigned the next subject to submerge their arm for about 30 seconds in the ice water (relative to the 22-second average assignment in the easy group). The final study surveyed regular players about their experiences gaming over the last month and their aggressive feelings, again finding that ratings of content did not predict aggressive self-reported reactions to gaming, but frustrations with playing the game did.

“I’m going to find the developer of this game and kill him for it!”

In summation, then, violent content per se does not appear to make players more aggressive; instead, frustration and losing seem to play a much larger role. It is at this point that my experience as a gamer comes in handy, because such an insight should be readily apparent to anyone who has watched many other people play games. As an ever-expanding library of YouTube rage-quit videos documents, a gamer can become immediately enraged by losing at almost any game, regardless of the content (for those of you not in the know, rage-quitting refers to aggressively quitting out of a game following a loss, often accompanied by yelling, frustration, and broken controllers). I’ve seen people lose their minds and storm off shouting over shooters, sports games, card games, and board games. Usually such outbursts are short-term affairs – you don’t see that person the next day and notice they’re visibly more aggressive towards others indiscriminately – but the important part is that they almost always occur in response to losses (and usually losses deemed to be unfair, in some sense).

As a final thought, in addition to the introductory analysis and empirical evidence presented here, there are other reasons one might not predict that violent content per se would be related to subsequent aggression even if one wants to hold onto the idea that mere exposure to content is enough to alter future behavior. In this case, most of the themes found within games that have violent content are not violence and aggression as usually envisioned (like The Onion‘s piece on Close Range: the video game about shooting people point blank in the face). Instead, those themes usually focus on the context in which that violence is used: defeating monsters or enemies that threaten the safety of you or others, killing corrupt and dangerous individuals in positions of power, or getting revenge for past wrongs. Those themes are centered more around heroism and self-defense than aggression for the sake of violence. Despite that, I haven’t heard of many research projects examining whether playing such violent games could lead to increased psychological desires to be helpful, or encourage people to take risks to save others from suffering costs.

References: Przybylski, A., Rigby, C., Deci, E., & Ryan, R. (2014). Competence-impeding electronic games and players’ aggressive feelings, thoughts, and behaviors. Journal of Personality & Social Psychology, 106, 441-457.

Sensitive Topics: Not All That Sensitive

Standards and Practices are a vital link in keeping good and funny ideas away from you, the television viewer

If you’ve ever been involved in getting an academic research project off the ground, you likely share some form of frustration with the Institutional Review Boards (or IRBs) that you had to go through before you could begin. For those of you not in the know, the IRB is an independent committee set up by universities and tasked with assessing and monitoring research proposals associated with the university for possible ethical violations. Its main goal is protecting subjects – usually humans, but also nonhumans – from researchers who might otherwise cause them harm during the course of research. For instance, let’s say a researcher is testing an experimental drug for effectiveness in treating a harmful illness. The researcher begins by creating two groups of participants: one that receives the real drug and one that receives a placebo. Over the course of the study, if it becomes apparent that the experimental drug is working, it would be considered unethical for the researcher to withhold the effective treatment from the placebo group. Unfortunately, ethical breaches like that have happened historically and (probably) continue to happen today. It’s the IRB’s job to help reduce the prevalence of such issues.

Because the research ethics penguin just wasn’t cutting it

Well-intentioned as the idea is, the introduction of required IRB approval to conduct any research involving humans – including giving them simple surveys to fill out – places some important roadblocks in the way of research efficiency, in much the same way that airport security became much more of a headache to get through after the 9/11 attacks. First and foremost, the IRB usually requires a lot of paperwork and time for a proposal to be processed and examined. It’s not at all unusual for what should be a straightforward and perfectly ethical research project to sit in the waiting room of the IRB for six to eight weeks just to get green-lit. That approval is not always forthcoming, though, with the IRB regularly sending back revisions or concerns about projects; revisions which, in turn, can hold the process up for additional days or weeks. For any motivated researcher, these kinds of delays can be productivity poison, as one’s motivation to conduct a project might have waned somewhat over the course of the two or three months since its inception. If you’re on a tight deadline, things can get even worse.

On the subject of concerns the IRB might express over research, today I wanted to talk about a matter referred to as sensitive-topics research. Specifically, there are some topics – such as those related to sexual behavior, trauma, and victimization – that are deemed to pose greater-than-minimal risk to participants being asked about them. The fear in this case stems from the assumption that merely asking people (usually undergraduates) about these topics could be enough to re-traumatize them and cause them psychological distress above and beyond what they would experience in daily life. In that sense, then, research on certain topics can be deemed above minimal risk, resulting in such projects being put under greater scrutiny and ultimately subjected to additional delays or modifications (relative to more “low-risk” topics like visual search tasks or personality measures, anyway).

That said, IRBs are not necessarily composed of experts on the matter of ethics, nor do their concerns need empirical grounding to be raised; the mere possibility that harm might be caused can be considered grounds enough for not taking any chances and risking reputational or financial damage to the institution (or the participants, of course). That these concerns were raised frequently (but not supported empirically) led Yeater et al. (2012) to examine the matter directly. The authors subjected their participants to a battery of questions and measures designated as either (a) minimal risk, which were predominately cognitive tasks, or (b) above minimal risk, which were measures asking about matters like sexual behavior and trauma. Before and after each set of measures, the participants had their emotional states measured to see whether any negative or positive changes resulted from taking part in the research.

The usual emotional response to lengthy surveys is always positive

The sample for this research involved approximately 500 undergraduates assigned to either the trauma-sex condition (n = 263) or the cognitive condition (n = 241). All of the participants first completed some demographic and affect measures designed to assess their positive and negative emotions. After that, those in the trauma-sex condition filled out surveys concerning their dating behavior, sexual histories, the rape myth acceptance scale, questions concerning their interest in short-term sex, sexual confidence, trauma and post-traumatic checklists, and childhood sexual and trauma histories. Additionally, females answered questions about their body, menstrual cycle, and sexual victimization histories; males completed similar surveys asking about their bodies, masturbation schedules, and whether they had sexually victimized women. Those in the cognitive condition filled out a similarly-long battery of tests measuring things like their verbal and abstract reasoning abilities.

Once these measures were completed, the emotional state of all the participants was again assessed, along with other post-test reaction questions, including whether they perceived any costs and benefits from engaging in the study, how mentally taxing their participation felt, and how their participation measured up to other life stressors like losing $20, getting a paper cut, receiving a bad grade on a test, or waiting in line at the bank for 20 minutes.

The results from the study cut against the idea that undergraduate participants were particularly psychologically vulnerable to these sensitive topics. In both conditions, participants reported a decrease in negative affect over the course of the study. There was even an increase in positive affect, but only for the trauma-sex group. While those in the trauma-sex condition did report greater post-test negative emotions, the absolute value of those negative emotions were close to floor levels for both groups (both means were below a 2 on a scale of 1-7). That said, those in the trauma-sex condition also reported lower mental costs to taking part in the research and perceived greater benefits overall. Both groups reported equivalent positive emotions.

Some outliers were then considered. In terms of those reporting negative emotions, 2.1% of those in the cognitive condition (5 participants) and 3.4% of those in the trauma-sex condition (9 participants) reported negative emotions above the midpoint of the scale. However, the maximum values for that handful of participants were 4.15 and 5.52 (respectively) out of 7, falling well short of the ceiling. Looking specifically at women who had reported histories of victimization, there was no apparent difference between conditions on almost any of the post-test affect measures; the one exception was that women with a history of victimization reported the trauma-sex measures to be slightly more mentally taxing, but that could be a function of their having to spend additional time filling out the large number of extensive questionnaires rather than any kind of serious emotional harm. Even those who had been harmed in the past didn't seem terribly bothered by answering some questions.

“While we have you here, would you like to answer a quick survey about your experience?”

The good news is that it would seem undergraduates are more resilient than they are often given credit for, and not so easily triggered by topics like sex or abuse (which are frequently discussed on social platforms like Facebook and in news sources). The sensitive topics didn't seem to be all that sensitive, even for those with histories of victimization; certainly not substantially more so than the standard types of minimal-risk questions asked on other psychological measures. The question remains as to whether such a finding would be enough to convince those making the decisions about the risks inherent in this kind of research. I'd like to be optimistic on that front, but it would rely on the researchers being aware of the present paper (as you can't rely on the IRB to follow the literature on that front, or indeed any front) and the IRB being open to hearing evidence to the contrary. As I have encountered reviewers who seemed uninterested in hearing contrary evidence concerning deception, it's a distinct possibility that the present research might not have the intended effect of mollifying IRB concerns. I certainly wouldn't rule out its potential effectiveness, though, and this is definitely a good resource for researchers to have in their pocket if they encounter such issues.

References: Yeater, E., Miller, G., Rinehart, J., & Nason, E. (2012). Trauma and sex surveys meet minimal risk standards: Implications for institutional review boards. Psychological Science, 23, 780-787.

 

Spinning Sexism Research On Accuracy

When it comes to research on sexism, there appear to be many parties interested in the notion that sexism ought to be reduced. This is a laudable goal, and one that I would support; I am very much in favor of treating people as individuals rather than as representatives of their race, sex, or any other demographic characteristic. It is unfortunate, however, that this goal often gets sidetracked by an entirely different one: trying to reduce the extent to which people view men and women as different. What I mean by this is that I have seen many attempts to combat sexism by trying to reduce the perception that men and women differ in terms of their psychology, personality, intelligence, and so on; it is much more seldom that those same voices try to convince people who inaccurately perceive sex differences as unusually small to adjust their estimates upwards. In other words, rather than championing accuracy in perceptions, there appears to be a more targeted effort at minimizing particular differences. While those are sometimes the same thing (sometimes people are wrong because they overestimate), they are often not (sometimes people are wrong because they underestimate), and when those goals do overlap, the minimization side tends to win out.

Just toss your perceptions in with the rest of the laundry; they’ll shrink

In my last post, I discussed some research by Zell et al (2016) primarily in the service of examining measures of sexism and the interpretation of the data they produce (which I recommend reading first). Today I wanted to give that paper a more in-depth look to illustrate this (perhaps unconscious) goal of trying to get people to view the sexes as more similar than they actually are. Zell et al (2016) begin their introduction by suggesting that most psychological differences between men and women are small, and the cases in which medium to large differences exist – like mating preferences and aggression – tend to be rare. David Schmitt has already put remarks like that into some context, and I highly recommend you read his post on the subject. In the event you can’t be bothered to do so at the moment, one of the most important takeaway points from his post is that even if the differences in any one domain tend to be small on average, when considered across all those domains simultaneously, those small differences can aggregate into much larger ones.
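Schmitt's aggregation point can be made concrete with a quick back-of-the-envelope calculation. As a minimal sketch, assuming uncorrelated traits (an assumption of mine for simplicity; correlations would change the exact figure), the multivariate distance between two group centroids is the root sum of squares of the individual standardized differences:

```python
import numpy as np

# Toy numbers for illustration (assumed, not drawn from any paper):
# 20 independent traits, each showing a "small" difference of d = 0.2.
ds = np.full(20, 0.2)

# With uncorrelated traits, the Mahalanobis distance between the two
# group centroids reduces to the root sum of squares of the d values.
D = np.sqrt(np.sum(ds ** 2))
print(round(D, 2))  # 0.89, "large" by Cohen's conventional benchmarks
```

Twenty differences that each look trivial on their own add up to an overall difference several times larger than any single one, which is the sense in which small average effects need not mean small overall differences.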

Moreover, the significance of a gender difference is not necessarily determined by its absolute size, either. This was a point Steven Pinker mentioned in a somewhat-recent debate with Elizabeth Spelke (and one touched on again in a recent talk by Jon Haidt at SUNY New Paltz). To summarize it briefly: if you're looking at a trait in two normally-distributed populations that are, on average, quite similar, the further from that average value you get, the more extreme the differences between the populations become. Pinker makes the point clear in this example:

“…it’s obvious that distributions of height for men and women overlap: it’s not the case that all men are taller than all women. But while at five foot ten there are thirty men for every woman, at six feet there are two thousand men for every woman. Now, sex differences in cognition tend not to be so extreme, but the statistical phenomenon is the same.”
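Pinker's height example is easy to reproduce. Here is a sketch assuming roughly normal height distributions; the parameters are approximations I'm supplying for illustration (the quote doesn't give any), and the exact figures don't matter for the qualitative point:

```python
from scipy.stats import norm

# Approximate US adult height distributions, in inches (my assumption
# for the sketch; only the shape of the result matters).
men = norm(loc=69.2, scale=2.9)
women = norm(loc=63.7, scale=2.7)

def tail_ratio(threshold):
    """Men per woman above a given height, via the survival function."""
    return men.sf(threshold) / women.sf(threshold)

for t in (66, 70, 72, 74):
    print(f"above {t} in: ~{tail_ratio(t):,.0f} men per woman")
```

The means differ by well under two standard deviations, yet the ratio of men to women above a cutoff grows by orders of magnitude as the cutoff moves into the tail, which is exactly the statistical phenomenon Pinker describes.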

Not only are small sex differences sometimes important, then (such as when you're trying to hire people for a job who are in the top 1% of the distribution for a trait like intelligence, speed, or conscientiousness), but a large number of small effects (along with some medium and large ones) can collectively add up to some rather large differences (and that assumes you're accounting for all relevant sex differences, not just a non-representative sample of them). With all this considered, the declaration at the beginning of Zell et al's paper that most sex differences tend to be small strikes me less as a statement of empirical concern than as one that serves to set up the premise for the rest of their project: specifically, the researchers wanted to test whether people's scores on the ambivalent sexism inventory predicted (a) the extent to which they perceive sex differences as being large and (b) the extent to which they are inaccurate in their perceptions. The prediction in this case was that people who scored high on their ostensible measures of sexism would be more likely to exaggerate sex differences and more likely to be wrong about their size overall (as an aside, I don't think those sexism questions measure what the authors hope they do; see my last post).

Pictured: Something not even close to what was being assessed in this study

In their first study, Zell et al (2016) asked about 320 participants to estimate how large they thought the differences between men and women were (from 1-99) for 48 traits, and to answer 6 questions intended to measure their hostile and benevolent sexism (as another aside, I have no idea why those 48 traits in particular were selected). These answers were then averaged for each participant to create overall scores for how large they viewed the sex differences to be and how high they scored on hostile and benevolent sexism. When the relevant factors were plugged into their regression, the results showed that those higher in hostile (ß = .19) and benevolent (ß = .29) sexism tended to perceive sex differences as larger, on average. When examined by gender, it was found that women higher in benevolent sexism were more likely to perceive sex differences as large (ß = .41; this was not true for men: ß = .11) and – though it was not significant – the reverse pattern held for hostile sexism, such that women high in hostile sexism were nominally less likely to perceive sex differences as large (ß = -.32).

The more interesting finding, at least as far as I'm concerned, is that in spite of those scoring higher on the sexism measures perceiving sex differences to be larger, they were not really more likely to be wrong about them. Specifically, those who scored higher on benevolent sexism were slightly less accurate (ß = -.20), just as women tended to be less accurate than men (ß = -.19); however, hostile sexism scores were unrelated to accuracy altogether (ß = .003), and no interactions between gender and sexism emerged. To put that in terms of the simple correlations, hostile and benevolent sexism correlated much better with the perceived size of sex differences (rs = .26 and .43, respectively) than they did with accuracy (rs = -.12 and -.22, with the former not being significant and the latter being rather small). Now, since we're dealing with two genders, two sexism scales, and relatively small effects, it is possible that some of these findings are statistical flukes; that does tend to happen as you keep slicing data up. Nevertheless, these results are discussed repeatedly within the context of the paper as representing exaggerations: those scoring higher on these sexism measures are said to exaggerate sex differences, which is odd on account of their not consistently getting those differences all that wrong.

This interpretation extends to their second study as well. In that experiment, about 230 participants were presented with two mock abstracts and told that only one of them represented an accurate summary of psychological research on sex differences. The accurate version, of course, was the one that said sex differences were small on average and therefore concluded that men and women are very similar to each other, whereas the bogus abstract concluded that gender differences are often large and therefore men and women are very different from one another. As I reviewed in the beginning of the post, small differences can often have meaningful impacts both individually and collectively, so the lines about how men and women are very similar to each other might not reflect an entirely accurate reading of the literature even if the part about small average sex differences did. This setup is already conflating the two statements (“average effect sizes on all these traits is small” and “men and women are very similar across the board”).

“Most of the components aren’t that different from modern cars, so they’re basically the same”

As before, those higher in hostile and benevolent sexism tended to say that the larger-sex-difference abstract more closely reflected their personal views (women also tended to select the large-difference abstract more often than men: 50.4% versus 44.2%). Now, because the authors view the large-sex-difference abstract as the fabricated one, they conclude that those higher in these sexism measures are less accurate and more likely to exaggerate (they also remark that their sexism measures indicate which people “endorse sexist ideologies”; a determination the measures are not at all cut out for making). In other words, the authors interpret this finding as showing that those selecting the large-differences abstract hold “empirically unsupported” views (which, in a sort-of ironic sense, means that, as the late George Carlin put it, “Men are better at it” when it comes to recognizing sex differences).

This is an interesting methodological trick they employ: having failed to find much in the way of a correlation between sexism scores and accuracy in their first study (it existed sometimes, but was quite small across the board and certainly much smaller than the perception-of-size correlation), they created a coarser and altogether worse measure of accuracy in the second study and used that to support their view that believing men and women tend to be rather different is wrong. As the old saying goes, if at first you don't succeed, change your measures until you do.

References: Zell, E., Strickhouser, J., Lane, T., & Teeter, S. (2016). Mars, Venus, or Earth? Sexism and the exaggeration of psychological gender differences. Sex Roles, 75, 287-300.

Research Tip: Ask About What You Want To Measure

Recently I served as a reviewer for a research article that had been submitted to a journal for publication. Without going into too much detail as to why, the authors of this paper wanted to control for people's attitudes towards casual sex when conducting their analysis. They thought it possible that people who were more sexually permissive when it comes to infidelity might respond to certain scenarios differently than those who were less permissive. If you were the sensible type of researcher, then, you might do something like ask your participants to indicate on some scale how acceptable or unacceptable they think sexual infidelity is. The authors of this particular paper opted for a different, altogether stranger route: they noted that people's attitudes towards infidelity correlate (imperfectly) with their political ideology (i.e., whether they consider themselves liberals or conservatives). So, rather than ask participants directly how acceptable infidelity is (what they actually wanted to know), they asked participants about their political ideology and used that as a control instead.

“People who exercise get tired, so we measured how much people napped to assess physical fitness”

This example is by no means unique; psychology researchers frequently try to ask questions about topic X in the hopes of understanding something about topic Y. This can be acceptable at times, specifically when topic Y is unusually difficult – but not impossible – to study directly. After all, if topic Y is impossible to directly study, then one obviously cannot say that studying topic X tells you something about Y with much confidence, as you would have no way of assessing the relationship between X and Y to begin with. Assuming that the relationship between X and Y has been established and it is sufficiently strong and Y is unusually difficult to study directly, then there’s a good, practical case to be made for using X instead. When that is done, however, it should always be remembered that you aren’t actually studying what you’d like to study, so it’s important to not get carried away with the interpretation of your results.

This brings us nicely to the topic of research on sexism. When people hear the word “sexism,” a couple of things come to mind: someone who believes one sex is (or should be) – socially, morally, legally, psychologically, etc – inferior to the other, or worth less; someone who wouldn't want to hire a member of one sex for a job (or intentionally pays them less if they did) strictly because of that variable, regardless of their qualifications; someone who inherently dislikes members of one sex. While this list is by no means exhaustive, I suspect things like these are probably the prototypical examples of sexism: some kind of explicit, negative attitude about people because of their sex per se that directly translates into behavior. Despite this, people who research sexism don't usually ask about such matters directly, as far as I've seen. To be clear, they could easily ask questions assessing such attitudes in a straightforward manner (in fact, they used to do just that with measures like the “Attitudes Towards Women Scale” in the 1970s), but they do not. As I understand it, the justification for not asking about such matters directly is that it has become more difficult to find people who actually express such views (Loo & Thorpe, 1998). As attitudes had already become markedly less sexist from 1972 to 1998, one can only guess at how much more change has occurred from then to now. In short, it's becoming rare to find blatant sexists anymore, especially if you're asking college students.

Many researchers interpret that difficulty as the result of people still holding sexist attitudes but either (a) not being willing to express them publicly for fear of condemnation, or (b) not being consciously aware that they hold such views. As such, researchers like to ask questions about “Modern Sexism” or “Ambivalent Sexism”; they maintain the word “sexism” in their scales, but they begin to ask about things which are not what people first think of when they hear the term. They no longer ask about explicitly sexist attitudes. Therein lies something of a problem, though: if what you really want to know is whether people hold particular sexist beliefs or attitudes, you need some way of assessing those attitudes directly in order to determine that other questions (ones which don't directly ask about that sexism) will accurately reflect them. However, if such a method of assessing those beliefs accurately, directly, and easily does exist, then it seems altogether preferable to use that method instead. In short, just ask about the things you want to ask about.

“We wanted to measure sugar content, so we assessed how much fruit the recipe called for”

If you continue on with using an alternate measure – like using the Ambivalent Sexism Inventory (ASI), rather than the Attitudes towards Women Scale – then you really should restrict your interpretations to things you’re actually asking about. As a quick example, let’s consider the ASI, which is made up of a hostile and benevolent sexism component. Zell et al (2016) summarize the scale as follows:

“Hostile sexism is an adversarial view of gender relations in which women are perceived as seeking control over men. Benevolent sexism is a subjectively positive view of gender relations in which women are perceived as pure creatures who ought to be protected, supported, and adored; as necessary companions to make a man complete; but as weak and therefore best relegated to traditional gender roles (e.g., homemaker).”

In other words, the benevolent scale measures the extent to which women are viewed as children: incapable of making their own decisions and, as such, in need of protection and provisioning by men. The hostile scale measures the extent to which men don't trust women and view them as enemies. Glick & Fiske (1996) claim that “...hostile and benevolent sexism…combine notions of the exploited group's lack of competence to exercise structural power with self-serving “benevolent” justifications.” However, not a single item on either the hostile or benevolent sexism inventory actually asks about female competencies or whether women ought to be restricted socially.

To make this explicit, let’s consider the questions Zell et al (2016) used to assess both components. In terms of hostile sexism, participants were asked to indicate their agreement with the following three statements:

  • Women seek power by gaining control over men
  • Women seek special favors under the guise of equality
  • Women exaggerate their problems at work

There are a few points to make about these questions: first, they are all clearly true to some extent. I say that because these are behaviors that all kinds of people engage in. If these behaviors are not specific to one sex – if both men and women exaggerate their problems at work – then agreement with the idea that women do does not stop me from believing men do this as well and, accordingly, does not necessarily track any kind of sexist belief (the alternative, I suppose, is to believe that women never exaggerate problems, which seems unlikely). If the questions are meant to be interpreted as a relative statement (e.g., “women exaggerate their problems at work more than men do”), then that statement needs to first be assessed empirically as true or false before you can say that endorsement of it represents sexism. If women actually do tend to exaggerate problems at work more (a matter that is quite difficult to objectively determine because of what the term exaggerate means), then agreement with the statement just means you accurately perceive reality; not that you’re a sexist.

More to the point, however, none of the measures ask about what the researchers interpret them to mean: women seeking special favors does not imply they are incompetent or unfit to hold positions outside of the home, nor does it imply that one views gender relations primarily as adversarial. If those views are really what a researcher is trying to get at, then they ought to just ask about them directly. A similar story emerges for the benevolent questions:

  • Women have a quality of purity few men possess
  • Men should sacrifice to provide for women
  • Despite accomplishment, men are incomplete without women

Again, I see no mention of women's competency, ability, intelligence, or someone's endorsement of strict gender roles. Saying that men ought to behave altruistically towards women in no way implies that women can't manage without men's help. When a man offers to pay for an anniversary dinner (a behavior which I have seen labeled sexist before), he is usually not doing so because he feels his partner is incapable of paying, any more than my helping a friend move suggests I view them as a helpless child.

“Our saving you from this fire implies you’re unfit to hold public office”

The argument can, of course, be made that scores on the ASI are related to the things these researchers actually want to measure. Indeed, Glick & Fiske (1996) made that very argument: they report that hostile sexism scores (controlling for the benevolent scores) did correlate with “Old-Fashioned Sexism” and “Attitudes Towards Women” scores (rs = .43 and .60, respectively; bearing in mind that was almost 20 years ago and these attitudes are changing). However, the correlations between benevolent sexism scores and these sexist attitudes were effectively zero (rs = -.03 and .04, respectively). In other words, it appears that people endorse these statements for reasons that have nothing at all to do with whether they view women as weak, or stupid, or any other pejorative you might throw out there, and their responses may tell you nothing at all about their opinion concerning gender roles. If you want to know about those matters, then ask about them. In general, it's fine to speculate about what your results might mean – how they can best be interpreted – but an altogether easier path is to simply ask about such matters directly and reduce the need for pointless speculation.

References: Glick, P. & Fiske, S. (1996). The ambivalent sexism inventory: Differentiating hostile and benevolent sexism. Journal of Personality & Social Psychology, 70, 491-512.

Loo, R. & Thorpe, K. (1998). Attitudes towards women’s roles in society: A replication after 20 years. Sex Roles, 39, 903-912.

Zell, E., Strickhouser, J., Lane, T., & Teeter, S. (2016). Mars, Venus, or Earth? Sexism and the exaggeration of psychological gender differences. Sex Roles, 75, 287-300.

The Value Of Association Value

Some time ago I was invited to give a radio interview regarding a post I had written: The Politics of Fear. Having never been exposed to this kind of format before, I found myself having to make some adjustments to my planned presentation on the fly, as it quickly became apparent that the interviewer was looking for quick and overly-simplified answers rather than anything with real depth (and who can blame him? It's not as if many people are tuning into the radio expecting to receive anything resembling a college education). At one point I was posed a question along the lines of, “How can people avoid letting their political biases get the better of them?”, which was a matter I was not adequately prepared to answer. In the interests of compromise and giving the poor host at least something he could work with (rather than the real answer: “I have no idea; give me a day or two and I'll see what I can find”), I came up with a plausible-sounding guess: try to avoid social isolation of your viewpoints. In other words, don't remove people from your friend groups or social media just because you disagree with what they say, and actively seek out opposing views. I also suggested that one try to expand one's legitimate interest in the welfare of other groups in order to take their views more seriously. Without real and constant challenges to your views, you can end up stuck in a political and social echo chamber, and that will often hinder your ability to see the world as it actually is.

“Can you believe those nuts who think flooding poses real risks?”

As luck would have it, a new paper (Almaatouq et al, 2016) fell into my lap recently that – at least to some indirect extent – helps speak to the quality of the answer I had provided at the time (spoiler: as expected, my answer was pointing in the right direction but was incomplete and overly-simplified). The first part of the paper examines the shape of friendships themselves: specifically, whether they tend to be reciprocal or unrequited in one direction or the other. The second part leverages those factors to try to explain what kinds of friendships can be useful for generating behavioral change (in this case, getting people to be more active). Put simply, if you want to change someone's behavior (or, presumably, their opinions), does it matter whether (a) you think they're your friend, but they disagree, (b) they think you're their friend, but you disagree, or (c) you both agree, and (d) does how close you are as friends matter?

The first set of data reports on some general friendship demographics. Surveys were provided to 84 students in a single undergraduate course, asking them to indicate, from 0-5, whether they considered each of the other students to be a stranger (0), a friend (3), or one of their best friends (5). The students were also asked to predict how each other student in the class would rate them. In other words, you would be asked, “How close do you rate your relationship with X?” and “How close does X rate their relationship to you?” A friendship was considered mutual if both parties rated each other at 3 or greater. There was indeed a positive correlation between the two ratings (r = .36), as we should expect: if I rate you highly as a friend, there should be a good chance you also rate me highly. However, that reality diverged significantly from what the students predicted. If a student had nominated someone as a friend, their prediction as to how that person would rate them showed substantially more correspondence (r = .95). Expressed in percentages, if I nominated someone as a friend, I would expect them to nominate me back about 95% of the time. In reality, however, they would only do so about 53% of the time.
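The reciprocity measure described above is simple to compute from a matrix of ratings. Here is a minimal sketch with invented ratings for four students (the threshold of 3 comes from the study; the particular numbers are mine):

```python
import numpy as np

# Hypothetical closeness ratings (0-5): ratings[i, j] is how close
# person i rates person j. These particular values are invented.
ratings = np.array([
    [0, 4, 3, 1],
    [2, 0, 3, 0],
    [3, 5, 0, 4],
    [0, 1, 2, 0],
])

friend = ratings >= 3        # a rating of 3+ counts as a nomination
mutual = friend & friend.T   # reciprocal only if both parties rate 3+

nominations = int(friend.sum())
reciprocated = int(mutual.sum())
print(f"{reciprocated} of {nominations} nominations are reciprocated")
```

On this toy matrix, four of six nominations are reciprocated; in the actual data, only about half were, despite nominators expecting near-universal reciprocity.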

The matter of why this inaccuracy exists is curious. Almaatouq et al (2016) put forward two explanations, one of which is terrible and one of which is quite plausible. The former explanation (which isn't really examined in any detail, and so might just have been tossed in) is that people are inaccurate at predicting these friendships because non-reciprocal friendships “challenge one's self-image.” This is a bad explanation because (a) the idea of a “self” isn't consistent with what we know about how the brain works, (b) maintaining a positive attitude about oneself does nothing adaptive per se, and (c) it would need to posit a mind that is troubled by unflattering information and so chooses to ignore it, rather than the simpler solution of a mind that is simply not troubled by such information in the first place. The second, plausible explanation is that some of these ratings of friendships actually reflect some degree of aspiration rather than just current reality: because people want friendships with particular others, they behave in ways likely to help them obtain such friendships (such as by nominating their relationship as mutual). If these ratings are partially reflective of one's intent to develop them over time, that could explain some of the inaccuracy.

Though not discussed in the paper, it is also possible that perceivers aren’t entirely accurate because people intentionally conceal friendship information from others. Imagine, for instance, what consequences might arise for someone who finally works up the nerve to go tell their co-workers how they really feel about them. By disguising the strength of our friendships publicly, we can leverage social advantages from that information asymmetry. Better to have people think you like them than know you don’t in many cases.

“Of course I wasn’t thinking of murdering you to finally get some quiet”

With this understanding of how and why relationships can be reciprocal or asymmetrical, we can turn to the matter of how they might influence our behavior and, in turn, how satisfactory my answer was. The authors utilized a data set from the Friends and Family study, which had asked a group of 108 people to rate each other as friends on a 0-7 scale, as well as collected information about their physical activity level (passively, via a device in their smartphones). In this study, participants could earn money by becoming more physically active. In the control condition, participants could only see their own information; in the two social conditions (that were combined for analysis) they could see both their own activity levels and those of two other peers: in one case, participants earned a reward based only on their own behavior, and in the other the reward was based on the behavior of their peers (it was intended to be a peer-pressure condition). The relationship variables and conditions were entered into a regression to predict the participant’s change in physical activity.

In general, having information about the activity levels of peers tended to increase the activity of the participants, but the nature of those relationships mattered. Information about the behavior of peers in reciprocal friendships had the largest effect (b = 0.44) on change. In other words, information about people you liked who also liked you appeared to be most relevant. The other type of relationship that significantly predicted change was one in which someone else valued you as a friend, even if you might not value them as much (b = 0.31). By contrast, if you valued someone else who did not share that feeling, information about their activity didn't seem to predict behavioral change well (b = 0.15) and, moreover, the strength of friendships seemed to be rather beside the point (b = -0.04), which was rather interesting. Whether people were friends seemed to matter more than the depth of that friendship.

So what do these results tell us about my initial answer regarding how to avoid perceptual biases in the social world? This requires a bit of speculation, but I was heading in the right direction: if you want to affect some kind of behavioral change (in this case, reducing one’s biases rather than increasing physical activity), information from or about other people is likely a tool that could be effectively leveraged for that end. Learning that other people hold different views than your own could cause you to think about the matter a little more deeply, or in a new light. However, it’s often not going to be good enough to simply see these dissenting opinions in your everyday life if you want to end up with a meaningful change. If you don’t value someone else as an associate, they don’t value you, or neither of you value the other, then their opinions are going to be less effective at changing yours than they otherwise might be, relative to when you both value each other.

At least if mutual friendship doesn’t work, there’s always violence

The really tricky part of that equation is how one goes about generating those bonds with others who hold divergent opinions. It’s certainly not the easiest thing in the world to form meaningful, mutual friendships with people who disagree (sometimes vehemently) with your outlook on life. Moreover, achieving an outcome like “reducing cognitive biases” isn’t even always an adaptive thing to do; if it were, it would be remarkable that those biases existed in the first place. When people are biased in their assessment of research evidence, for instance, they’re usually biased because something is on the line, as far as they’re concerned. It does an academic who has built his career on his personal theory no favors to proudly proclaim, “I’ve spent the last 20 years of my life being wrong and achieving nothing of lasting importance, but thanks for the salary and grant funding.” As such, the motivation to form meaningful friendships with those who disagree with you is probably rather low (unless your hope is that, through this friendship, you can persuade the other person to adopt your views, rather than vice versa, because – surely – the bias lies with other people, not you). For that reason, I’m not hopeful that my recommendation would play out well in practice, but at least it sounds plausible enough in theory.

References: Almaatouq, A., Radaelli, L., Pentland, A., & Shmueli, E. (2016). Are you your friends’ friends? Poor perception of friendship ties limits the ability to promote behavioral change. PLOS One, 11, e0151588. doi:10.1371/journal.pone.0151588

More Evidence Regarding The Causes Of Homosexuality

Many years ago, the initial inspiration for beginning my blog was a critique I had written of the logic underlying a Lady Gaga song, “Born This Way,” which I felt committed itself firmly to the naturalistic fallacy (it’s also where the namesake of the site came from: Pop Psychology, or the psychological theory found within a pop song). Specifically, I felt that many aspects of the development of homosexuality (both the male and female varieties) were not understood well enough to justify some of the claims that many people felt confident in expressing. Today, however, I’m pleased to report on some new – and very interesting – research that might pave the way for furthering that understanding. Many important questions still remain regarding how to interpret the results of this research, but I believe the researchers are certainly looking in the right places for useful leads.

“Ur-u-guay, huh? Sounds like as good a place to start as any…”

There’s a lot to discuss regarding the results of the paper (Skorska et al, 2016), so I wanted to jump right into it. The researchers were examining the possibility that a maternal immune response might play a key role in the development of a homosexual orientation in males. This effect is said to be the result of the mother’s immune system having a maladaptive reaction to the male-specific proteins associated with the Y-chromosome during pregnancy. Effectively, then, the mother’s immune system would (sometimes) treat certain male proteins produced by the fetus as a foreign pathogen and attempt to attack them, resulting in a few quirks of development, such as a homosexual orientation or – if the reaction was strong enough – fetal loss (i.e., miscarriage). Already there is a lot to like about this hypothesis on a theoretical level, as it doesn’t posit any hidden adaptive benefits for a homosexual orientation (as such proposed benefits have not received sound empirical support historically). The question remains as to how to test for this kind of an effect, however. The method the authors use is a rather simple one: examining maternal reports of fetal loss and birth weights. The logic here is that higher rates of fetal loss and lower birth weights both index perturbations in development. As such, they could provide indirect evidence for some kind of maternal immune response being the cause.

The researchers recruited approximately 130 mothers and classified them on the basis of what kind of children they had: those who had at least 1 gay son (n = 54), and those who only had heterosexual sons (n = 72). These mothers were asked about their age, pregnancy history (numbers of miscarriages, stillbirths, and live births), the duration of their pregnancies, and the sex and sexual orientation of their offspring. These mothers were then classified further into one of five groups: those with gay male only-children (n = 8), those with gay male offspring who had no older brothers (n = 23), those with gay male offspring who had older brothers (n = 23), those with heterosexual male only-children (n = 11), and those with heterosexual male offspring with siblings (n = 61).

First, the authors compared the history of fetal loss between these groups of mothers. In total, 62 instances of fetal loss were reported (60 miscarriages, 1 stillbirth, and 1 unreported). As predicted, the average number of fetal losses was higher in the first group (mothers of gay male only-children; M = 1.25), relative to all the other groups (d = 0.76), which did not significantly differ from each other (respective Ms = 0.43, 0.74, 0.09, and 0.39). When considered in terms of the ratio of miscarriages to live births, a similar picture emerged: mothers of gay male only-children reported a higher ratio of miscarriages to live births (M = 1.25) than the other groups (d = 1.55), which did not differ from each other (respective Ms = 0.14, 0.24, 0.09, and 0.17).
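Since effect sizes like these keep coming up, it may help to make the statistic explicit: a d value of this kind (Cohen’s d) is the difference between two group means divided by their pooled standard deviation. Here is a minimal Python sketch of that computation, using made-up numbers rather than the study’s raw data:

```python
import statistics

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    m_a, m_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b)
                 / (n_a + n_b - 2)) ** 0.5
    return (m_a - m_b) / pooled_sd

# Illustrative only: fetal-loss counts for a hypothetical focal group of
# 8 mothers versus a hypothetical comparison group.
focal = [2, 1, 1, 0, 2, 1, 1, 2]
others = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
print(round(cohens_d(focal, others), 2))
```

A d around 0.8 is conventionally considered a large effect, which puts the reported values of 0.76 and 1.55 in context, though with group sizes this small the estimates come with wide uncertainty.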

Next, the authors sought to compare birth weights between these groups. As birth weight tends to increase over successive pregnancies, the comparisons were limited to first live-born sons only (n = 63); this left 4 gay male only-children, 7 gay males with no older brothers, 14 heterosexual males with gay younger brothers, 10 heterosexual male only-children, and 28 heterosexual males with siblings. The results mirrored those of the fetal-loss data: mothers of gay male only-children tended to give birth to infants that weighed significantly less (M = 2,970 grams) than those of all other groups (d = 1.21), which did not differ (respective Ms = 3,713, 3,489, 3,506, and 3,633). This was the case despite the duration of pregnancies not differing between any of the groups.

“Please just get out of me”

In sum, then, mothers of gay male only-children tended to have a greater number of miscarriages and to give birth to significantly lighter offspring than mothers in the other groups. While it’s important not to get carried away with this finding given the relatively small sample size (I wouldn’t put too much stock in an N of 8), there is some suggestive evidence here worth pursuing further that something might be going awry with fetal development in the case of gay male offspring. That said, I’m going to assume for a moment that these results are indicative of more general patterns in order to speculate about what they could mean.

In general, these results present us with more questions than answers concerning both what might be going on and why it is happening. The first question that comes to mind is why this effect seemed to be specific to gay male only-children, rather than gay male children with siblings. Skorska et al (2016) posit that this might have something to do with some mothers mounting a greater immune response against male offspring, resulting in more fetal loss; the net result would be that such mothers end up with fewer children overall (hence the only-child status) and are more likely to have gay male children in particular. While that might have some degree of plausibility to it, such an effect should be male-specific, and so not expected to impact the number of live female births a mother has. In other words, mothers with gay male offspring should be expected to have proportionately more female children owing to greater male fetal loss. I don’t know of any data bearing on that point, but it seems easy enough to obtain. If mothers of gay men do not tend to have a greater ratio of female-to-male offspring, this would cast some doubt on the explanation (and, since the only data I’ve heard reports that gay men tend to have more older brothers, it seems someone would have noticed an excess of sisters by now if it existed). On the other hand, if this is a more general immune reaction against fetal bodies, regardless of their sex, we would not expect such a pattern (it might also predict that mothers taking immunosuppressants would be less likely to have gay offspring or to miscarry, but things are unlikely to be that simple, as such drugs would have many other effects as well).

Another piece worth considering is the twin data on homosexuality. Identical male twins – those who share both their genetics and maternal fetal environment – only show a concordance rate of homosexuality of approximately 30%. The extent to which this complicates the maternal immune hypothesis is hard to say: it could be possible that one twin tends to get exposed to the brunt of these maternal antibodies despite both being approximately as vulnerable to them, but that remains to be seen.  

On a broader, theoretical level, however, the maternal immune response hypothesis raises an important question. As far as I’m aware, homosexual preferences (not the occasional behavior) do not appear to be well documented in nonhuman species, with the possible exception of rams. If it is truly the case that maternal immune responses are the drivers of homosexual development in humans, it would be very curious that similar outcomes don’t appear to obtain across at least some other mammalian species. I suppose it’s possible that these outcomes do occur in other species and no one has really noticed them yet, but that seems unlikely. So the matter of why humans seem rather unique in that regard is a question that needs answering. Has evolution managed to “figure out” a solution to this problem in other species (metaphorically speaking)? If it has, why hasn’t a similar solution arisen in humans and sheep?

It just ran out of square-shaped blocks?

This brings me to the final idea, one that I’ve discussed before. It is indeed possible that looking for something immune-related is in the right ballpark, but maybe in the wrong area. Perhaps what we’re seeing isn’t necessarily the result of a maternal immune response against male fetuses, but rather the result of an immune response against an actual infectious agent (or the result of that agent’s behavior itself). Admittedly, I’m no expert in the realm of immune system functioning or infectious agents, but two possibilities come to mind. First, perhaps mothers infected with a particular pathogen during fetal development ramp up their immune response temporarily, a byproduct of which is that fetal bodies either get fewer resources from the mother or get caught up in the immune response themselves, both of which could plausibly affect development. Mothers who are more chronically infected might have fewer children in general and more gay male children in particular, potentially explaining the current pattern of results. Alternatively, it is possible that some infectious agent itself affects the development of the fetus (much as some pathogens can render people blind or deaf). As a byproduct of that infection, if it is acquired during a particular critical developmental window, the child comes to develop a homosexual orientation (or is miscarried by the mother). At present, I am not aware of any evidence that speaks directly to this possibility, but it certainly accords with the known data.

References: Skorska, M., Blanchard, R., VanderLaan, D., Zucker, K., & Bogaert, A. (2016). Gay male only-children: Evidence for low birth weight and high maternal miscarriage rates. Archives of Sexual Behavior. doi:10.1007/s10508-016-0829-9