Some Thoughts On Gender Bias In Academia

Gender bias can be something of a “sexy” topic for many; the kind of issue that can easily get large groups of people worked up and in the mood to express opinions full of anger, mockery, and the word “duh”. On a related note, there’s been an article going around by Moss-Racusin et al (2012) concerning whether some science faculty members tend to be slightly biased in favor of men, relative to women, and whether said subtle biases might be responsible for some portion of some gender gaps. This paper, and some associated commentary, has brought to mind a few thoughts that I happen to find quite interesting; thoughts about bias that I would like to at least give some consideration amidst all the rest of the coverage this article has been getting.

First one to encounter life-altering discrimination… wins, I guess…

First, about the study itself: Moss-Racusin et al (2012) sent out fake undergraduate materials (which included a brief statement about future goals and a little bit of background information regarding things like letters of recommendation and GRE scores) to 127 faculty members in either biology, chemistry, or physics departments. These materials differed only in terms of the name of the applicant (either John or Jennifer), and the faculty members were asked to evaluate the student, falsely believing that these evaluations would be used to help the student’s career development. The results of this experiment showed that the faculty members tended to rate the student’s competence and hireability lower when it was Jennifer, relative to John. Further, these faculty members offered more mentoring advice to John and recommended an annual salary about $4000 lower for Jennifer, on average (though that salary was still around $25,000, which isn’t too bad…). Also, the faculty members tended to report that they liked Jennifer more.

What we have here looks like a straightforward case of sex-based discrimination. While people liked the woman more, they also saw her as less competent, at least in these particular fields, given identical credentials (even if these credentials were rather meager in scope). Off the top of my head, I see nothing glaringly wrong with this study, so I’m fine with accepting the results; there most certainly did seem to be a bias in this context, albeit not an overtly hostile one. There are, however, a few notes worthy of consideration: first, the authors don’t really examine why this bias exists. The authors suggest (i.e. say it’s reasonable…) that this bias is due to pervasive cultural stereotypes, but as far as I can see, that’s just an assertion; they really didn’t do anything to test whether that’s the case here or not. Sure, they administered the “Modern Sexism Scale*”, but I have my reservations about precisely what that scale is supposed to be measuring. Like many studies in psychology, this paper is big on presenting and restating findings (people discriminate by sex because they’re sexist) but light on explanatory power.

Another interesting piece of information worthy of consideration that comes to mind relates to a previous paper, published in the same journal one year prior. Ceci & Williams (2011) documented an impressive amount of evidence that ran counter to claims of women being discriminated against in science fields in terms of having their manuscripts reviewed, being awarded grant funding, and also being interviewed and subsequently hired (at least in regards to PhDs applying for tenure-track positions at R1 institutions in the natural sciences). When discrimination was found in their analysis, it was typically fleeting in size, inconsistent in which gender it favored, and, further, it often wasn’t found at all. So, potentially, the results of the current paper, which are themselves rather modest in size, could just be a fluke, resulting from how little information about these applicants was provided (in other words, faculty members might have been falling back on sex as an important source of information, given that they lacked much else in the way of other useful information). While Moss-Racusin et al (2012) suggest that the subtle biases they found might translate into later discrimination resulting in gender gaps, it would require a fairly odd pattern of discrimination, where, on the one hand, women are discriminated against in some contexts because they’re viewed as less competent, but then are subsequently published, awarded grants, and hired at the same rate as men anyway, despite those perceptions (which could potentially be interpreted as suggesting that the standards are subsequently set lower for women).

“Our hiring committee has deemed you incompetent as a researcher; welcome aboard!”

Peculiar patterns of how and when discrimination would need to work aside, there’s another point that I found to be the most interesting of all, and it’s the one I was hoping to focus on. This point comes in the form of a comment made by Jerry Coyne over at Why Evolution Is True. Coyne apparently finds it very surprising that this bias against women in the Moss-Racusin et al (2012) paper was displayed in equal force by both male and female faculty members. Coyne later repeats his surprise in a subsequent post on the topic, so this doesn’t just appear to be a slip on the keyboard; he really was surprised. What I find so interesting about this surprise is what it would seem to imply: the default assumption is that when a woman is being discriminated against, a man ought to be the culprit.

Granted, that interpretation takes a little bit of reading between the lines, but there’s something to it, I feel. There must have been some expectation that was violated in order for there to be surprise, so if that wasn’t Coyne’s default assumption, I would be curious as to what his assumption was. I get the sense that this assumption would not be limited to Coyne, however; it seems to have come up in other areas as well, perhaps most notably in the case of the abortion issue. Abortion debates often get framed as part of “The War on Women”, with opposition to abortion being seen as the male side and support for abortion being seen as the female side. This is fairly interesting considering the fact that men and women tend to hold very similar views on abortion, with both groups opposing it roughly as often as they support it.

If I had to guess at the underlying psychology behind that read-in assumption (assuming my assessment is correct), it would go something like this: when people perceive a victim, they’re more or less required to perceive a perpetrator as well; it’s a requirement of the cognitive moral template. Whether that perpetrator actually exists or not can be beside the point, but some people are going to look like better perpetrators than others. In this specific instance, when women, as a group, are supposed to be the victims, that really only leaves non-women as potential perpetrators. This is due to two major reasons: first, men may make better perpetrators in general for a variety of reasons and, second, the parties represented in this moral template for perpetrator and victim can’t be the same party; if you want an effective moral claim, you can’t be a victim of yourself. A tendency to assume men are the culprits when women are supposed to be the victims could be further exacerbated in the event that women are also more likely to be seen as victims generally.

An observation made by Alice Cooper (1975) when he penned the line, “Only women bleed…”

The larger point is, assuming that all the effects reported in the Moss-Racusin et al (2012) study were accurately detected and consistently replicated, there are two gender biases reported here: Jennifer is rated as less competent and John is rated as less likable, both strictly on gendered grounds. However, I get the impression that only one of those biases will likely be paid much mind, as has been the case in pretty much all the reporting about the study. While people may talk about the need to remedy the bias against women, I doubt that those same people will be concerned about bridging the “likability gap” between men and women as well. It would seem that ostensible concerns for sexism can be, ironically, inadvertently sexist themselves.

*[EDIT] As an aside, it’s rather odd that the Modern Sexism Scale only concerns itself with (what it assumes is) sexism against women specifically; nothing in that scale would in any way appear to assess sexism against men.

References: Ceci SJ, & Williams WM (2011). Understanding current causes of women’s underrepresentation in science. Proceedings of the National Academy of Sciences of the United States of America, 108 (8), 3157-62 PMID: 21300892

Moss-Racusin CA, Dovidio JF, Brescoll VL, Graham MJ, & Handelsman J (2012). Science faculty’s subtle gender biases favor male students. Proceedings of the National Academy of Sciences of the United States of America PMID: 22988126

Dinner, With A Side Of Moral Stances

One night, let’s say you’re out to dinner with your friends (assuming, of course, that you’re the type with friends). One of these friends decides to order a delightful medium-rare steak with a side of steamed carrots. By the time that the orders arrive, however, some mistake in the kitchen has led said friend to receive the salmon special instead. Now, in the event you’ve ever been out to dinner and this has happened, one of these two things probably followed: (1) your friend doesn’t react, eats the new dish as if they had ordered it, and then goes on about how they made such a good decision to order the salmon, or (2) they grab the waiter and yell a string of profanities at him until he breaks down in tears.

OK; maybe a bit of an exaggeration, but the pattern of behavior that we see in the event of a mixed-up order at a restaurant typically more closely resembles the latter pattern. Given that most people can recognize that they didn’t receive the order they actually made, what are we to make of the proposition that people seem to have trouble recognizing some moral principles they just endorsed?

“I’ll endorse what she’s endorsing…”

A new study by Hall et al (2012) examined what they’re calling “choice blindness”, which is, apparently, quite a lot like “change blindness”, except with decisions instead of people. In this experiment, a researcher with a survey about general moral principles or moral stances on certain specific issues approached 160 strangers who happened to be walking through the park. Once the subjects had filled out the first page of the survey and flipped the piece of paper over the clipboard to move onto the second, an adhesive on the back of the clipboard held on to and removed the lightly-attached portion of the survey to reveal a new set of questions. The twist is that the new set of questions expressed the opposite moral stances, so if a subject said they agreed that the government shouldn’t be monitoring emails, the new question would imply that the subject felt the government should be monitoring emails.

Overall, only about a third to a half of the subjects appeared to catch that the questions had been altered, a number which is very similar to the results found for the change blindness research. Further, many of the subjects that missed the deception also went on to give verbal justifications for their ‘decisions’ that appeared to be in opposition to their initial choice on the survey. That said, only about a third of the subjects who expressed extremely polarized scores (a 1 or a 9) failed to catch the manipulation, and the authors also found that those who rated themselves as more politically involved were similarly more likely to detect the change.

So what are we to make of these findings? The authors suggest there is no straightforward interpretation, but also suggest that choice blindness disqualifies vast swaths of research from being useful, as the results suggest that people don’t have “real” opinions. Though they say they are hesitant to suggest such an interpretation, Hall et al (2012) feel those interpretations need to be taken seriously as well, so perhaps they aren’t so hesitant after all. It might almost seem ironic that Hall et al (2012) seem “blind” to the opinion they had just expressed (don’t want to suggest such alternatives, but also do want to suggest such alternatives), despite that opinion being in print, and both opinions residing within the same sentence.

“Alright, alright; I’ll get the coin…”

It would seem plausible that the authors have no solid explanation of their results because they seemed to have gone into the study without any clearly stated theory. Such is the unfortunate state of much of the research in psychology; a dead-horse issue I will continue to beat. Describing an effect as a psychological “blindness” alone does not tell us anything; it merely restates the finding, and restatements of findings without additional explanations are not terribly useful for understanding what we’re seeing.

There are a number of points to consider regarding these results, so let’s start with the obvious: these subjects were not seeking to express their opinions so much as they were approached by a stranger with a survey. It seems plausible that at least some of these subjects really weren’t paying much attention to what they were doing or not really engaged in the task at hand. I can’t say to what extent this would be a problem, but it’s at least worth keeping in mind. One possible way of remedying this might be to have subjects first not only mark their agreement with an issue on the scale, but also briefly justify that opinion. If you got subjects to then try and argue against their previously stated justifications moments later, that might be a touch more interesting.

Given that there’s no strategic context under which these moral stances are being made in this experiment, some random fluctuation in answers might be expected. In fact, lack of context might be the reason that some subjects may not have been particularly engaged in the task in the first place, as evidenced by people who had more extreme scores or who were more involved in politics being more attentive to these changes. Accordingly, another potential issue here concerns the mere expectation of consistency in responses: research has already shown that people don’t hold universally to one set of moral principles or moral stances (i.e. the results from various versions of the trolley and footbridge dilemmas, among others). Indeed, we should expect moral judgments (and justifications for those judgments) to be made strategically, not universally, for the very simple reason that universal behaviors will not always lead to useful outcomes. For instance, eating when you’re hungry is a good idea; continuing to eat at all points, even when you aren’t hungry, is generally not. What that’s all getting at is that the justification of a moral stance is a different task than the generation of a moral stance, and if memory fails to retain information about what you wrote on a survey some strange researcher just handed you when you’re trying to get through the park, you’re perfectly capable of reasoning about why some other moral stance is acceptable.

“I could have sworn I was against gay marriage. Ah well”

Phrased in those terms (“when people don’t remember what stance they just endorsed – after being approached by a stranger that was asking them to endorse some stance they might not have given any thought to until moments prior – they’re capable of articulating supportive arguments for an opposing stance”), the results of this study are not terribly strange. People often have to reason differently about whether a moral act is acceptable or not, contingent on where they currently stand in any moral interaction. For example, deciding whether an instance of murder was morally acceptable or not will probably depend, in large part, on which side of that murder you happen to stand on: did you just kill someone you don’t like, or did someone else just kill someone you did like? An individual that stated murder is always wrong in all contexts might be at something of a disadvantage, relative to one with a bit more flexibility in their moral justifications (to the extent that those justifications will persuade others about whether to punish the act or not, of course).

One could worry about what people’s “real” opinions are, then, but it would seem that doing so fundamentally misstates the question. Saying that something bad is wrong when it happens to you, and that the same something bad is right when it happens to someone you dislike, both represent real opinions, but they’re not universal opinions; they’re context-specific. Asking about “real” universal moral opinions would be like asking about “real” universal emotions or states (“Ah, but how happy is he really? He might be happy now, but he won’t be tomorrow, so he’s not actually happy, is he?”). Now, of course, some opinions might be more stable than others, but that will likely be the case only insomuch as the contexts surrounding those judgments don’t tend to change.

References: Hall, L., Johansson, P., & Strandberg, T. (2012). Lifting the veil of morality: Choice blindness and attitude reversals on a self-transforming survey. PLoS ONE

Your Mama’s So Fat…

Recently, there’s been a (free) paper going around the various psychology blogging sites by Swami & Tovee (2012) that deals with how stress appears to affect men’s ratings of women’s attractiveness by body type. The study purports to find that men, when placed in an apparently stressful situation, subsequently report finding heavier women more attractive. My take on the issue, for what it’s worth, is that the authors (and a few bloggers who have picked up the study) might have, in the excitement of talking about this result, overlooked the fact that their explanation for it does not appear to make much sense.

On the plus side, at least they tried; “A” for…affort, I guess…

Swami & Tovee (2012) referenced what they call the “Environmental Security Hypothesis”. This hypothesis suggests that when an individual is facing some environmental stress, they will tend to prefer mates that can more successfully navigate those stressful life events. In certain contexts, then, the authors further suggest that physical attractiveness ideals should change. So, in the case of body size, their general argument would seem to go something like this: since fat stores are a measure of caloric security and physical maturity, when men’s own caloric security is low, they should subsequently find women with more fat more attractive because those women hold a higher mate value in those contexts. This argument strikes me as distinctly bad.

Presumably there are a number of modules inside our brain that function to assess the mate value of others. We should expect these modules to be paying attention, so to speak, to traits that correlate with the reproductive potential of those potential mates. Given that the current caloric state of women is one of those traits, we certainly should expect some of men’s mating modules to assess it. That’s all well and good, but here’s where the authors lose me: when a man is assessing a woman’s reproductive potential, how does information about that man’s current state help in that assessment? My being hungry or stressed should, in principle, have little or nothing to do with whether any individual woman is fertile or capable of successfully dealing with stressful life events, or anything, really.

Now maybe if I was chronically hungry or stressed, there might be some value in selecting a mate with more fat, but only insomuch as my levels of hunger and stress are predictive of theirs. This argument would hinge on the notion that stress and hunger are shared, more or less, communally. However, even granting that chronic levels of hunger or stress for me might be predictive of the risks that others will encounter these things as well, this study was not examining chronic levels of these variables; it was examining acute levels of stress or hunger. This makes the argument seem even weaker. The mate value of others should not really change because I have a stressful day (or, in the case of this experiment, a stressful few minutes competing for a fake job and counting backwards in intervals of 13 in front of a few people).

They should only change after I make it to happy hour.

Because of that, the question then becomes: what value would information about my current state have when it comes to assessing another individual’s state? As far as I can tell, this answer amounts to “not much”. If I want to assess someone else’s state my best bet would probably be to, well, assess it directly, rather than assessing mine and assuming mine reflects theirs. Despite this, the research did show that men were assessing heavier figures as more attractive after they had been stressed, so how should we explain this?

We can start by noting that neither men’s BMI nor their current hunger levels correlated with their ratings of attractiveness. Since adipose tissue is supposed to be signaling caloric security, this casts some doubt on at least part of the Environmental Security Hypothesis put forth by Swami & Tovee (2012). It would also appear to contradict some previous research they present in the introduction about how men’s preferences for female body size shift with their hunger levels. Nevertheless, men in the stressed group did tend to find the heavier figures more attractive. Those same men also happened to find the figures in the normal weight category more attractive, and, even though the preference was slightly shifted, also still found women in the underweight category to be the average ideal. In other words, their ratings of attractiveness shifted up in overall magnitude about as much as they shifted towards the heavier end. While the authors focus on the latter shift, they don’t seem to pay any mind to the former, which is a rather severe oversight.

Let’s consider that finding in light of a hunger analogy. There’s no denying that preferences can shift on the basis of one’s current caloric state. How appealing I find the idea of eating an unpalatable food will change on the basis of how recently I ate and how long I’ll likely have to wait before being able to eat something else. When I’m hungry, normally unpalatable food might appear more acceptable whereas food that was initially appetizing will now be highly appetizing. How attractive food seems, in general, would shift upwards. You might also find, provided that not all food is equally attainable, that I shift my standards towards food that I can more easily acquire and away from food that appears more difficult to obtain. When you’re hungry, a meal of lower quality now might seem more appealing than a meal of higher quality later, provided that meal of higher quality would even be available at all. Finally, you might find that no matter how hungry I get, my preference for eating things like bark or sand remains relatively unchanged, no matter how easy or difficult they are to obtain.

It’s not so bad once you ketchup it up…

Returning to the attractiveness ratings in the current study, this is basically what the paper showed: there was little variance in whether or not men found starving or obese women attractive (they didn’t). Stressed men also shifted their ratings to the right (perhaps towards more attainable mates) and similarly shifted them up (women were generally more attractive). Taking both of these effects into account gives us a better grasp for what’s really going on.

Now maybe the title of “stressed men lower their standards” has a bit less of a positive ring to it than the authors and bloggers intended, but it’s certainly consistent with the pattern of data observed here. It would at least appear to be more consistent than the authors’ explanation for the pattern of results, which hinges on ecological variation in access to resources, since the overall ecology for men wasn’t changing in this study: acute stress levels were. Whether your stress level is more useful for predicting useful things about other people, or whether it’s more useful for predicting which course of action you yourself should pursue, I feel, should be clear.

References: Swami V, & Tovée MJ (2012). The Impact of Psychological Stress on Men’s Judgements of Female Body Size. PloS one, 7 (8) PMID: 22905153

Mate Choices Can Be Complex, But Are They Oedipal Complex?

Theory is arguably the most important part of research. A good theory helps researchers formulate better research questions as well as understand the results that their research projects end up producing. I’ve said this so often that expressing the idea is closer to a reflex than a thought at this point. Unfortunately, “theories” in psychology – if we can even call them theories – are frequently of poor quality, if not altogether absent from research, leading to similarly poorly formulated projects and explanations. Evolutionary theory offers an escape from this theoretical shallowness, and it’s the major reason the field appeals to me. I find myself somewhat disappointed, then, to see a new paper published in Evolutionary Psychology that appears to be, well, atheoretical.

No, I’m not mad; I’m just disappointed…

The paper was ostensibly looking at whether or not human children sexually imprint on the facial traits of their opposite sex parent, or, more specifically (for those of you that don’t know about imprinting):

Positive sexual imprinting has been defined as a sexual preference for individuals possessing the characteristics of one’s parents… It is said to be a result of acquiring sexual preferences via exposure to the parental phenotype during a sensitive period in early childhood.

The first sentence of that definition seems to me to be unnecessary. One could have preferences for characteristics that one’s parents also happen to possess without those preferences being the result of any developmental mechanism that uses parental phenotype as its input. So I’d recommend using the second part of the definition, which seems fine, as far as describing sexual imprinting on parents goes. As the definition suggests, such a mechanism would require (1) a specified developmental window during which the imprinting takes place (i.e. the preferences would not be acquired prior to or after that time, and would be relatively resistant to change afterwards) and (2) that mechanism to be specifically focused on parental features.

So how did Marcinkowska & Rantala (2012) go about testing this hypothesis? Seventy subjects, their sexual partners, and their opposite sex parents (totaling 210 people) were each photographed from straight ahead and in profile. These subjects were also asked to report about their upbringing as a child. Next, a new group of subjects was presented with an array of pictures: on one side of the array was a picture of one of the opposite sex parents; on the other side there were four pictures, one of which was the partner of that parent’s child and three of which were controls. The new subjects were asked to rate how similar the picture of the parent was to the pictures of the people on the other side of the display.

The results showed that the group of independent raters felt that a man’s mother resembled his later partner slightly more closely than she resembled the controls. The results also showed that the same raters did not feel that a woman’s father resembled her later partner any more closely than he resembled the controls. Neither of these findings was in any way related to the self-reports that subjects had delivered about their upbringing, either. If you’ve been following along so far, you might be curious as to what these results have to do with a sexual imprinting hypothesis. As far as I can tell, the answer is a resounding, “nothing”.

Discussion: Never mind

Let’s consider what these results don’t tell us: they certainly don’t speak to the matter of preferences. As Marcinkowska & Rantala (2012) note, actual mating preferences can be constrained by other factors. Everyone in the population might wish to monopolize the matings of a series of beautiful others, but if those beautiful others have different plans, that desire will not be fulfilled. Since the initial definition of imprinting specifically referenced preferences – not actual choices – the findings would have very little relevance to the matter of imprinting no matter how the data fell out. It’s worse than that, however: this study didn’t even attempt to look for any developmental window either. The authors seemed to just assume it existed without any demonstration that it actually does.

What’s particularly peculiar about this oversight is that, in the discussion, the authors note they did not look at any adoptive families. This suggests that the authors at least realized there were ways of testing to see if this developmental window even exists, but didn’t seem to bother running the required tests. A better test – one that might suggest such a developmental window exists – would be to test preferences of adoptive or step-children towards the features of their biological and adoptive/step-parents. If the imprinting hypothesis was true, you would expect that adoptive/step-children would prefer the characteristics of their adoptive/step-parents, not their biological ones. Further, this research could be run with respect to the time at which the new parent came into the picture (and the old one left). If there is a critical developmental window, you should only expect to see this effect when the new parent entered into the equation at a certain age; not before or beyond that point.

The problems don’t even end there, however. As I mentioned previously, this paper appears atheoretical in nature, in that the authors give absolutely no reason as to why one would expect to find a sexual imprinting mechanism in the first place, why it would operate in early childhood, let alone why that mechanism would be inclined to imprint on one’s close, biological kin. What the precise fitness benefits to such a mechanism would be are entirely unclear to me, though, at the very least, I could see it carrying fitness costs in that it might heighten the probability of incest taking place. Further, if this mechanism is presumably active in all members of our species, and each person is looking to mate with someone who resembles their opposite sex parent, it would seem that such a preference might actively disincline people from having what would be otherwise adaptive matings. Lacking any theoretical explanation for any of this, the purpose of the research seems very confusing.

On the plus side, you can still add it to your resume, and we all know how important publications are.

All that said, even if research did find that people tended to be attracted to the traits of their opposite sex parent, such a finding could, in principle, be explained by sexual selection. Offspring inherit from their parents both genes that contributed to their parents’ phenotypes and genes that contributed to their parents’ psychological preferences. If preferences were not similarly inherited, sexual selection would be impossible and ornaments like the peacock’s tail could never have come into existence. So, presuming your parents found each other at least attractive enough to get together and mate, you could expect their offspring to resemble them both physically and psychologically to some extent. When those offspring are then making their own mate choices, you might then expect them to make a similar set of choices (all else being equal, of course).

What can be said for the study is that it’s a great example of how not to do research. Don’t just assume the effect you’re looking to study exists; demonstrate that it does. Don’t assume that it works in a particular way in the event that it actually exists either. Most importantly, don’t formulate your research project in absence of a clearly stated theory that explains why such an effect would exist and, further, why it would work the way you expect it might. You should also try and rule out alternative explanations for whatever findings you’re expecting. Without good theory, the quality of your research will likely suffer, and suffer badly.

References: Marcinkowska, U.M., & Rantala, M.J. (2012). Sexual Imprinting on Facial Traits of Opposite-Sex Parents in Humans. Evolutionary Psychology, 10, 621-630.

The Salience Of Cute Experiments

In the course of proposing new research and getting input from others, I have had multiple researchers raise the same basic concern to me: the project I’m proposing might be unlikely to eventually get published because, given that I find the results I predict that I will, reviewers might feel the results are not interesting or attention-grabbing enough. While I don’t doubt that the concern is, to some degree, legitimate*, it has me wondering about whether there exists an effect that is essentially the reverse of that issue. That is, how often does bad research get published simply on the grounds that it appears to be interesting, and are reviewers willing to overlook some or all of the flaws of a research project because it is, in a word, cute?

Which is why I always make sure my kitten is an author on all my papers.

The cute experiment of the day is Simons & Levin (1998). If you would like to see a firsthand example of the phenomenon this experiment is looking at before I start discussing it, I’d recommend this video of the color changing card trick. For those of you who just want to skip right to the ending, or have already seen the video, the Simons & Levin (1998) paper sought to examine “change blindness”: the frequent inability of people to detect changes in their visual field from one moment to the next. While the color changing card trick only replaced the colors of people’s shirts, tablecloths, or backdrops, the experiment conducted by Simons & Levin (1998) replaced actual people in the middle of a conversation to see if anyone would notice. The premise of this study would appear to be interesting on the grounds that many people might assume that they would notice something like the fact that they were suddenly talking to a different person than they were a moment prior, and the results of this study would seem to suggest otherwise. Sure sounds interesting when you phrase it like that.

So how did the researchers manage to pull off this stunt? The experiment began when a confederate holding a map approached a subject on campus. After approximately 10 or 15 seconds of talking, two men holding a door would pass in between the confederate and the subject. Behind this door was a second confederate who changed places with the first. The second confederate would, in turn, carry on the conversation as if nothing had happened. Of the 15 subjects approached in such a manner, only 7 reported noticing the change of confederate in the following interview. The authors mention that out of the 7 subjects that did notice the change, there seemed to be a bias in age: specifically, the subjects in the 20-30 age range (which was similar to that of the confederates) seemed to notice the change, whereas the older subjects (in the 35-65 range) did not. To explain this effect, Simons & Levin (1998) suggested that younger subjects might have been treating the confederates as their “in-group” because of their age (and accordingly paying more attention to their individual features) whereas the older subjects were treating the confederates as their “out-group”, also because of their age (and accordingly paying less attention to their features).

In order to ostensibly test their explanation, the authors ran a follow-up study. This time the same two confederates were dressed as construction workers (i.e. they wore slightly different construction hats, different outfits, and different tool belts) in order to make them appear to be more of an “out-group” to the younger subjects. The confederates then exclusively approached people in the younger age group. Lo and behold, when the door trick was pulled, this time only 4 of the 12 subjects caught on. So here we have a cute study with a counter-intuitive set of results and possible implications for all sorts of terms that end in -ism.

And the psychology community goes wild!

It seems to have gone unnoticed, however, that the interpretation of the study wasn’t particularly good. The first issue, though perhaps the smallest, is the sample size. Since these studies only ran 13.5 subjects each, on average, the extent to which this difference in change blindness across groups (7 of 15 noticing versus 4 of 12, or roughly 13 percentage points) is just due to chance is unknown. Let’s say, however, that we give the results the benefit of the doubt and assume that they would remain stable if the sample size were scaled up. Even given that consideration, there are still some very serious problems remaining.
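For anyone who wants a rough sense of how little those counts can tell us, here is a minimal sketch in Python. To be clear about what is assumed: the only numbers taken from the studies are the counts reported above (7 of 15 noticing in the first study, 4 of 12 in the second). The choice of a two-sided Fisher's exact test, the decision to treat the two studies as two independent groups, and the function name fisher_exact_two_sided are all mine, purely for illustration; none of this is an analysis the authors ran.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Illustrative helper (not from the paper): sums the probability of every
    table, with the same margins, that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c

    def weight(k):
        # Unnormalised hypergeometric weight for k "noticers" in row 1.
        # Kept as an integer so the <= comparison below is exact.
        return comb(row1, k) * comb(row2, col1 - k)

    lowest, highest = max(0, col1 - row2), min(col1, row1)
    observed = weight(a)
    extreme = sum(
        weight(k) for k in range(lowest, highest + 1) if weight(k) <= observed
    )
    return extreme / comb(row1 + row2, col1)


# Counts as reported in the post: subjects who noticed / subjects approached.
noticed_1, total_1 = 7, 15   # first study
noticed_2, total_2 = 4, 12   # second study (construction workers)

rate_1 = noticed_1 / total_1
rate_2 = noticed_2 / total_2
p_value = fisher_exact_two_sided(
    noticed_1, total_1 - noticed_1, noticed_2, total_2 - noticed_2
)

print(f"Noticed in study 1: {rate_1:.1%}")
print(f"Noticed in study 2: {rate_2:.1%}")
print(f"Difference: {rate_1 - rate_2:.1%}")
print(f"Two-sided Fisher exact p-value: {p_value:.3f}")
```

Under those assumptions the p-value comes out at about 0.70, which is to say that a gap of this size between two groups this small is entirely unremarkable if noticing rates were actually identical.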

The larger problem is that the authors did not actually test their explanation. This issue comes in two parts. First, Simons and Levin (1998) proposed that subjects were using cues of group membership in determining whether or not to pay attention to an individual’s features. In their first study, this cue was assumed to be age; in the second study, this cue was assumed to now be occupation (construction worker). Of note, however, is that the same two confederates took part in both experiments, and I doubt their age changed much between the two trials. This means that if Simons and Levin (1998) were right, age only served as an indicator of group membership in the first context; in the second, that cue was overridden by another: construction worker. Why that might be the case is left completely untouched by the authors, and that seems like a major oversight. The second part is that the authors didn’t test whether the assumed “in-group” would be less change blind. In order to do that they would have had to, presumably, pull the same door trick using construction workers as their subjects. Since Simons and Levin (1998) only tested an assumed out-group, they are unable to make a solid case for differences in group membership being responsible for the effect they’re talking about.

Finally, the authors seem to just assume that the subjects were paying attention in the first place. Without that assumption these results are not as counter-intuitive as they might initially seem, just as people might not be terribly impressed by a magician who insisted everyone just turned around while he did his tricks. The subjects had only known the confederates for a matter of seconds before the change took place, and during those seconds they were also focused on another task: giving directions. Further, the confederate (who is still a complete stranger at this point) is swapped out for another very similar one (both are male, both are approximately the same age, race, and height, as well as being dressed very similarly). If the same door trick were pulled with a male and female confederate, or a friend and a stranger, or people of different races, or people of different ages, and so on, one would predict you’d see much less change blindness.

My only change blindness involves being so rich I can’t see bills smaller than $20s

The really interesting questions would then seem to be what cues do people attend to, why do they attend to them, and in what order are they attended to? None of these questions are really dealt with by the paper. If the results they present are to be taken at face value, we can say the important variables are often not the color of one’s shirt, the sound of one’s voice (within reason), very slight differences in height, and modestly different hairstyles (when one isn’t wearing a hat) when dealing with complete strangers of similar gender and age, while also involved in another task.

So maybe that’s not a terribly surprising result, when phrased in such a manner. Perhaps the surprising part might even be that so many people noticed the apparently not so obvious change. Returning to the initial point, however, I don’t think many researchers would say that an experiment designed to demonstrate that people aren’t always paying attention to and remembering every single facet of their environment would be a publishable paper. Make it cute enough, however, and it can become a classic.

*Note: whether the concerns are legitimate or not, I’m going to do the project anyway.

References: Simons, D.J., & Levin, D.T. (1998). Failure to detect changes to people during a real-world interaction. Psychonomic Bulletin & Review, 5, 644-649. DOI: 10.3758/BF03208840