Is The Exotic Erotic? Probably Not…

Last time I wrote about the likely determinants of homosexuality, I ended up favoring the pathogen hypothesis put forth by Cochran, Ewald, and Cochran (2000) as the theory with the most currently going for it. What is particularly interesting about my conclusion is how much empirical evidence directly confirms the theory: none. Don’t get me wrong; the pathogen hypothesis is certainly consistent with the known findings about homosexuality – such as the widely-varying reported concordance rates and the large fitness costs associated with the orientation – but being consistent with certain findings is not the same as being demonstrated by that evidence. If the currently most plausible theory for explaining homosexuality has, in essence, no direct evidence in its favor, that does not say much for the alternative prospects. The two theories I covered last time – kin selection and sexually antagonistic selection – can’t even seem to account well for the existing evidence, so a neutral standing with regard to the evidence is actually preferable. There was one theory that I neglected to mention last time, however, and it purports to be able to explain how both heterosexual and homosexual orientations come to develop, and in both sexes, no less. If such a theory proved to have anything to it, then, it would be a highly valuable perspective indeed, so it deserves careful inspection.

“Nope; still not finding any indication of plausibility yet. Get the bigger microscope”

The theory, known as “Exotic Becomes Erotic” (EBE), was proposed by Daryl Bem (1996). If that name sounds familiar, it’s because this is the same Daryl Bem who also thought he found evidence for “extra-sensory porn-ception” in 2011, so we’re already not off to a good start. Pressing ahead despite that association, EBE puts the causal emphasis of developing a preferential attraction towards one sex or another on an individual’s perceptions of feeling different from the members of one sex: for instance, if a boy happens to not like sports, he will feel different from the majority of the other boys who do like sports; if he does like sports, he’ll feel different from the girls who do not. Following this perception of one sex as exotic, EBE posits that individuals will come to experience “non-specific, autonomic arousal” towards the exotic group in question and, subsequently, that arousal will be transformed into an erotic preference for members of the initially exotic group. So, if you feel different from the boys or the girls, regardless of whether you’re a boy or a girl, you’ll come to be vaguely aroused by that sex – whether through apprehension, anger, fear, curiosity, or really anything else, so long as it’s physiologically arousing – and then your body will, at some point, automatically turn that arousal into lasting sexual preferences.

Like most of the theories regarding homosexuality I discussed previously, this one also has very little actual evidence to support it. What it does have is a correlation between retrospective reports of childhood gender nonconformity and current sexual orientation. In fact, that single, underwhelming correlation is about all that EBE has going for it; everything else in the model is an assumption that’s largely built off that correlation. While a retrospective correlation is slightly better than having no evidence at all, it’s not better by a whole lot (in much the same way that 53% accuracy at guessing where some stimuli will show up between two options isn’t much better than 50%, yet apparently both are publishable). So, now that we’ve covered what the theory has going for it, let’s consider some of the things that EBE does not have going for it. You might want to take a break now to use the bathroom or get a snack, because this is a long list.

Let’s begin with the matter of this “non-specific physiological arousal”: at a bare minimum, EBE would require that the sex an individual perceived as exotic ought to be consistently more physiologically arousing, on average, than the sex that individual identified with. Without that arousal, there would be nothing to eventually convert into later sexual preference. So what does Bem have to say about the presence or absence of this arousal?

“To my knowledge, there is no direct evidence for the first step in this sequence beyond the well-documented observation that novel (“exotic”) stimuli produce heightened physiological arousal in many species, including our own”

So, in other words, there is no empirical evidence for this suggestion whatsoever. The problems, however, do not stop there: EBE is also massively under-specified in regards to how this hypothetical “non-specific arousal” is turned into eroticism in some cases but not others. While Bem (1996) proposes three possible mechanisms through which that transition might take place – sexual imprinting, the opponent process, and the extrinsic arousal effect – there are clearly non-human stimuli that produce a great deal of arousal (such as spiders, luxury cars, or, if we are talking about children, new toys) that do not get translated into later sexual attraction. Further, there are also many contexts in which gender-conforming children of the same sex will be around each other while highly physiologically aroused (such as when boys are playing sports and competing against a rival team), but EBE would seem to posit that these highly-aroused children would not develop any short- or long-term eroticism towards each other.

Nope; nothing even potentially erotic about that…

Bem might object that this kind of physiological arousal is somehow different, or missing a key variable. Perhaps, he might say, in addition to this yet-to-be-demonstrated arousal, one also needs to feel different from the target of that arousal. Without being both exotic and arousing, there will be no lasting sexual preference developed. While such a clarification might seem to rescue EBE conceptually in this regard, the theory again falters by being massively under-specified. As Bem (1996) writes:

“…[T]he assertion that exotic becomes erotic should be amended to exotic – but not too exotic – becomes erotic. Thus, an erotic or romantic preference for partners of a different sex, race, or ethnicity is relatively common, but a preference for lying with the beasts in the field is not.”

In addition to not figuring out whether the arousal required for the model to work is even present, in no treatment of the subject does Bem specify precisely how much arousal and/or exoticism ought to be required for eroticism to develop, or how these two variables might interact in ways that are either beneficial or detrimental to that process. While animals might be both “exotic” and “highly arousing” to children, very rarely does a persistent sexual preference towards them develop; the same can be said for feelings between rival groups of boys, though in this case the arousal is generated by fear or anger. EBE does not deal with this issue so much as it avoids it through definitional obscurity.

Continuing along this thread of under-specificity, the only definition of “exotic” that Bem offers involves a perception of being different. Unfortunately for EBE, there are a near-incalculable number of potential ways that children might feel different from each other, and almost none of those potential dimensions are predicted to result in later eroticism. While Bem (1996) does note that feeling different about certain things – interest in sports seems to matter most here – appears to be important for predicting later homosexual orientation, he does not attempt to explain why feeling different about gender-related variables ought to be the determining factor, relative to non-gender-related variables (such as intelligence, social status, or hair color). While erotic feelings do typically develop along gendered lines, EBE gives no a priori reason for why this should be expected. One could imagine a hypothetical population of people who develop preferential sexual attractions to other individuals on any number of non-gendered grounds, and EBE would have little to say about why this outcome does not obtain in any known human population.

The problem with this loose definition of exotic does not even end there. According to the data presented by Bem, many men and women who later reported a homosexual attraction also reported having enjoyed gender-typical activities (37 and 37%, respectively), having been averse to gender-atypical activities (52 and 19%), and having most of their childhood friends be of their same sex (58 and 40%). While these percentages are clearly different between homosexual and heterosexual respondents – with homosexuals reporting enjoying typical activities less, atypical ones more, and being more likely to predominantly have friends of the opposite sex – EBE would seem to be at a loss when attempting to explain why roughly half of homosexual men and women do not seem to report differing from their heterosexual counterparts in these important regards. If many homosexuals apparently did not view their own sex to be particularly exotic during childhood, there could be no hypothetical arousal and, accordingly, no eroticism. This is, of course, provided these retrospective accounts are even accurate in the first place and do not retroactively inflate perceptions of having felt different to accord with the respondents’ current sexual orientation.

“In light of not being hired, I can now officially say I never wanted your stupid job”

On a conceptual level, however, EBE runs into an even more serious concern. Though Bem (1996) is less than explicit about this, his model would seem to suggest that homosexuality is a byproduct of an otherwise adaptive system designed for developing heterosexual mate preferences. While Bem (1996) is likely correct in suggesting that homosexuality is not adaptive itself, his postulated mechanism for developing mate preferences would likely have been far too detrimental to have been selected for. Bem’s model would imply that the mechanism responsible for generating sexual attraction, even when functioning properly, functions so poorly that it would, essentially, render a rather large minority of the population effectively sterile. This would generate intense selection pressure either for modifications of the mechanism that did not carry this reproduction-precluding cost, or for much greater gender conformity. Neither outcome seems to have obtained, which poses a new set of questions regarding why.

Precisely how such a poorly-functioning mechanism would have even come to exist in human populations in the first place is a matter that Bem never addresses. A major issue with the EBE perspective, then, is that it more-or-less takes for granted the base-rate existence of homosexuality in human populations without asking why it ought to be that prevalent for humans but almost no other known species. Though Bem does not discuss it, almost every other species appears to navigate the process of developing sexual attraction in ways that do not result in large numbers of males or females developing exclusive same-sex attractions. If this were any other key adaptation, such as vision, and significant minorities of the population consistently went blind at very young ages in a world where being able to see is adaptive, we would want a better explanation for that failure than the kind that EBE can provide. Now if only the creator of EBE had some kind of ability to see into the future – an extra-sensory ability, if you will – to help him predict that his theory would run into these problems, they might have been avoided or dealt with…

References: Bem, D. (1996). Exotic becomes erotic: A developmental theory of sexual orientation. Psychological Review, 103(2), 320-335. DOI: 10.1037//0033-295X.103.2.320

Cochran, G., Ewald, P., & Cochran, K. (2000). Infectious Causation of Disease: An Evolutionary Perspective. Perspectives in Biology and Medicine, 43(3), 406-448. DOI: 10.1353/pbm.2000.0016

Full Frontal Nerdity: Understanding Elitists And Fakes

“There is no such thing as “fake geek girls”; there are, only, girls who are at different, varying levels of falling in love with something that society generically considers to fall under the “nerd culture” category” -albinwonderland

About a month ago, Tony Harris (a comic book artist) posted a less-than-eloquently phrased rant about how he perceived certain populations within the geek subculture to be “fakes”; specifically, he targeted many female cosplayers (for a more put-together and related train of thought, see here). Predictably, reactions were had and aspersions were cast. Though I have barely dipped my toes into this particular controversy, the foundations of it are hardly new. This experience – the tensions between the “elites” and “fakes” – is immediately relatable to all people, no matter how big of a fish they happen to be in the various social ponds they inhabit. While the specific informal rules of subcultures (how one should dress, what one may or may not be allowed to like, and so on) may differ from group to group, these superficial differences dissolve into the background when considering the underlying similarities of their logic; nerd culture will just be our guide to it at present.

“NERRRRRRDS!”

I get the sense that, because the issue involved gender, a good deal of the collective cognitive work that went into this debate focused on generating and defending against claims of sexism, which, while an interesting topic in its own right, I find largely unproductive for my current purposes. In the interests of examining the underlying issues without the baggage that gender brings, I’d like to start by answering a seemingly unrelated question: why might Maddox feel that bacon has been “ruined” for him by its surge in popularity?

The Internet needs to collectively stop sucking Neil deGrasse Tyson’s dick. And add bacon and zombies to that list. I love bacon, but fuck you for ruining it, everyone. Holy shit, just shut the fuck up about bacon. Yeah, it’s great, we know. Bacon cups, bacon salt, bacon shirts, bacon gum, bacon, bacon, bacon, WE GET IT. Bacon is your Jesus, we know, now do a shot of bleach and take some buckshot to the face.

This comment likely seems strange (if also a bit relatable) to many people: why should anyone else’s preferences influence Maddox’s? If I like chocolate ice cream, it would indeed be odd if I started liking it less if I was around other people who also seemed to like it, especially if the objective properties of the ice cream in question haven’t changed; it’s still the same ice cream (or bacon) that was there a moment ago. It seems clear from that consideration that bacon per se isn’t what Maddox feels has been ruined. What Maddox doesn’t seem to like is that other people like it; too many other people, however many that works out to be. So now that we’ve honed the question somewhat (why doesn’t Maddox like that other people like bacon?), let’s turn to Dr. Seuss to, at least partially, answer it.

In Dr. Seuss’s story, The Sneetches, we find an imaginary population of anthropomorphic birds, some of which have a star on their belly and others of which do not. The Sneetches form group memberships along these lines, with the stars – or lack thereof – serving as signals for which group any given Sneetch belongs to. Knowing whether or not a Sneetch had a star could, in this world, provide you with useful information about that individual: where might this Sneetch stand socially, relative to its peers; who might this Sneetch tend to associate with; what kind of resources might this Sneetch have access to. However, when a man rolls into town and starts to affix stars to the bellies of the starless Sneetches, this system gets thrown out of order. The special status of the initially-starred Sneetches is now questioned, because every Sneetch is sending an identical signal, meaning that signal can no longer provide any useful information. The initially-starred Sneetches then, apparently feeling that the others have “ruined” stars for them, subsequently remove their stars in an attempt to restore the signal value, and everyone eventually learns something about racism. In this example, though, it becomes readily apparent why preferences for having stars changed: it was the signal value of the stars – the information they conveyed – that changed, not the stars themselves, and this information is, or rather was, a valuable resource.

The key insight here is that if an individual is trying to signal some unique quality about itself, it does them no good to try and achieve that goal through a non-unique or easy-to-fake signal. Any benefits that the signal can bring will soon be driven to non-existence if other organisms are free to start sending that same signal. So let’s apply that lesson back to the bacon question: Maddox is likely bothered by too many other people liking bacon because, to some extent, it means the signal strength of his liking bacon has been degraded. As part of Maddox’s affinity for bacon seemed to extend beyond its physical properties, some part of his affinity for the product was lost with that signal value; it no longer said much about him as a person. You’ll notice that I’ve forgone the question of what precisely Maddox might have been trying to signal by advertising his love of bacon. I’ve done this because, while it might be an interesting matter in its own right, it’s entirely beside the point. Regardless of what the signal is supposed to be about, ubiquity will always threaten it.

The “Where’s Waldo” principle

While many might be tempted to take this point as a strike against the elitists (“they don’t really like what they say they like; they only like what liking those things says about them” is roughly how that might get phrased), it’s important to bear in mind that this phenomenon is not restricted to the elites. As Maddox suggests in his post, many of the people whom he deems to be “fake” nerds are attempting to make use of that signal value themselves, despite lacking the trait that the signal is supposed to advertise:

 People love science in the same way they love classical music or art. Science and “geeky” subjects are perceived as being hip, cool and intellectual. So people take a passing interest just long enough to glom unto these labels and call themselves “geeks” or “nerds” every chance they get.

Like the starless Sneetches, these counterfeiters are trying to gain some benefit (in this case, perhaps the perception that they’re intelligent or interesting) through a dishonest means (by not being either of those things, but still telling people they are). This poses a very real social problem for signalers to overcome: how to ensure that (their) communication is (perceived as being) honest so the value of the communication can be maintained (for them)? If I can send a signal that advertises some latent quality about myself, I can potentially gain some benefit by doing so. However, if people who do not have that underlying quality also start sending the signal, they can reap those same benefits at my expense. The resources in question here are zero-sum, so more of those resources going to others means less for me, and vice versa. This means it’s in my interests to send signals that others cannot send in order to better monopolize those benefits, and to likewise strive to give receivers the impression that my signal is of a greater value than the signals of others.

Despite being beset by possible deception from senders, it is also in the interests of those receiving the signals that said signals remain honest. Resources, social or otherwise, that I invest in one individual cannot also be invested in another. Accordingly, when deciding how to allocate that limited investment budget, it’s in my interests to do so on the basis of good information; information which I would no longer have access to as the signal value degrades. Appreciating this problem helps answer a question posed by BlackNerdComedy: why does he remember information about an obscure cartoon called “SpaceCats”? More generally, he wonders aloud why people get “challenged” on their gamer credibility on the basis of their obscure knowledge and what purpose such challenges might serve (ironically, he also has another video where he complains about how some nerds consider others to not be “real nerds”, and then goes on to say that the people judging others for not being “real nerds” are, themselves, not “real nerds” because of it). In the light of the communication problem, the function of these challenges seems to become clear: they’re attempts at assessing honesty. Good information about someone’s latent qualities simply cannot be well-assessed by superficial knowledge. In much the same way, the depths of one’s math abilities can’t be well-assessed by testing on basic addition and subtraction alone; testing on the basics simply doesn’t provide much useful information, especially if the basics are widely known. If you start giving people calculus problems to do instead, you now have a better gauge for relative math abilities.

Obscure knowledge is not the only means through which one can try to guarantee the honesty of a signal, however. Another point that has frequently been raised in this debate involves people talking about “paying your dues” in the community, or suffering because of one’s nerdy inclinations. As Maddox puts it:

Well someone forgot to give the “nerds-are-sexy” memo to my friends, because most of them are nerds and none of them are getting laid. Here’s a quick rule of thumb: if you don’t have to make an effort to get laid, you’re not a nerd. Being a nerd is a byproduct of losing yourself in what you do, often at the expense of friends, family and hygiene. Until or unless you’ve paid your dues, you aren’t welcome.

For starters, the emphasis on costs is enlightening: paying costs helps ensure the signal is harder to fake (Zahavi, 1975), and the greater those costs, the more likely the signal is honest. If someone is socially ridiculed for their hobby, it’s a fairly good sign that they aren’t doing it just to be popular, as this would be rather counterproductive. Maddox’s comment also taps into the distinction that Tooby and Cosmides (1996) made in regards to “genuine” and “fair-weather” friends: genuine friends have a deep interest in your well-being, making them far less likely to abandon you when the going gets tough. By contrast, fair-weather friends stick around only so long as you deliver them the proper benefits, but will be more likely to turn on you or disappear when you become too costly. Again, people are faced with the problem of discriminating one type from the other in the face of less-than-honest communication. Just because someone tells you they deeply value your friendship, it doesn’t mean that they have your best interests at heart. This requires people to make use of other cues in assessing the social commitments of others and, it seems, one good way of doing this involves looking for people who literally have no other social prospects. If one has been rejected by all other social groups except the nerd community, they will be less likely to abandon that community because they have no better alternative. By contrast, those who are deemed to have plenty of viable social alternatives (such as physically attractive people) can be met with a greater degree of skepticism; they have other avenues they could take if the going gets too rough, which might suggest their social commitment is not as strong as it could otherwise be.

His loyalty to the nerd community is the only strong thing about him.

This is only a general sketch of some of the issues at play in this debate; the specifics can get substantially more complicated. For instance, the goal of a signaler, as mentioned before, is to beat out other signalers, and this can involve a good deal of dishonesty in the signaler’s representations of other signalers. It’s worth bearing in mind that all signalers, whether they’re “real” or “fake”, have this same vested interest: beating the other signalers. Being honest is only one way of potentially doing that. There’s also the matter of social value, in that even if a signaler is sending a completely honest signal about their latent qualities and commitments, they might, for other reasons, simply not be very good social investments to most people, even within the community. As I said, it gets complicated, and because the majority of these calculations seem to be made without any conscious awareness, the subject gets even trickier to untangle.

One final point of the debate that caught my eye was the solution that people from either side of the debate appear to agree on (at least to some extent) for dealing with the problem: people should stop trying to label themselves as “gamers” or “nerds” altogether and simply enjoy their hobbies. In the language of the underlying theory, people can free themselves from harassment (from within the community, anyway) if they stop trying to signal something about themselves to others through their hobbies and, by proxy, stop competing for whatever resources are at stake. The problem with this suggestion is two-fold: (a) the resources at stake here are valuable, so it’s unlikely that competition for them will stop, and (b) most people don’t consciously recognize they’re competing in the first place; in fact, most people would explicitly deny such an implication. After all, if one is trying to maintain the perception that they’re honestly saying something about themselves, it would do them no favors to acknowledge any ulterior motives, as this acknowledgement would tend to degrade the value of their signals on its own. It would degrade them, that is, unless one is trying to signal that they’re more conscious of their own biases than other people, and a good social partner because of it…

References: Tooby, J., & Cosmides, L. (1996). Friendship and the banker’s paradox: Other pathways to the evolution of adaptations for altruism. Proceedings of The British Academy, 88, 119-143

Zahavi, A. (1975). Mate selection—A selection for a handicap. Journal of Theoretical Biology, 53(1), 205-214. DOI: 10.1016/0022-5193(75)90111-3

The Drifting Nose And Other “Just-So”s

In my last post dealing with PZ Myers’ praise for the adaptationist paradigm, which was confusingly dressed up as criticism, PZ suggested the following hypothesis about variation in nose shape:

Most of the obvious phenotypic variation we see in people, for instance, is not a product of selection: your nose does not have the shape it does, which differs from my nose, which differs from Barack Obama’s nose, which differs from George Takei’s nose, because we independently descend from populations which had intensely differing patterns of natural and sexual selection for nose shape; no, what we’re seeing are chance variations amplified in frequency by drift in different populations.

Today’s post will be a follow-up on this point. Now, as I said before, I currently have no strong hypotheses about what past selection pressures (or lack thereof) might have been at work shaping the phenotypic variation found in noses; noses which differ noticeably in shape and size from those of chimps, gorillas, orangutans, and bonobos. The cross-species consideration, of course, is a separate matter from phenotypic variation within our species, but these comparisons might at least make one wonder why the human nose looks the way it does compared to other apes. If that reason (or reasons) could be discerned, it might also tell us something about the variation we currently see in modern human populations. The reason why noses vary between species might indeed be “genetic drift” or “developmental constraint” rather than “selection”, just as the answer to within-species variation in that trait might be as well. Before simply accepting those conclusions as “obviously true” on the basis of intuition alone, though, it might do us some good to give them a deeper consideration.

“Follow your nose; it always knows! Alternatively, following evidence can work too!”

One of the concerns I raised about PZ’s hypothesis is that it does not immediately appear to make anything resembling a novel or useful prediction. This concern itself is far from new, with a similar (and more eloquently stated) point being raised by Tooby and Cosmides in 1997 [H/T to one of the commenters for providing the link]:

Modern selectionist theories are used to generate rich and specific prior predictions about new design features and mechanisms that no one would have thought to look in the absence of these theories, which is why they appeal so strongly to the empirically minded….It is exactly this issue of predictive utility, and not “dogma”, that leads adaptationists to use selectionist theories more often than they do Gould’s favorites, such as drift and historical contingency. We are embarrassed to be forced, Gould-style, to state such a palpably obvious thing, but random walks and historical contingency do not, for the most part, make tight or useful prior predictions about the unknown design features of any single species.

That’s not to say that one could not, in principle, derive a useful or novel prediction from the drift hypothesis; just that one doesn’t immediately jump out at me in this case, nor does PZ explicitly mention any specific predictions he had in mind. Without any specific predictions, PZ’s suggestion about variation in nose shape – while it may well be true to some small or large degree (his language chalks most to all of the variation up to drift rather than selection, so it’s unclear precisely what proportion he had in mind) – runs the risk of falling prey to the label of “just-so story”.

Since PZ appears to be really concerned that evolutionary psychologists do not make use of drift in their research as often as he’d like, this, it seems, would have been the perfect opportunity for him to show us how things ought to be done: he could have derived a number of useful and novel predictions from the drift hypothesis and/or shown how drift might better account for some aspects of the data in nose variation that he had in mind, relative to other current competing adaptationist theories on nose variation. I’m not even that particular about the topic of noses, really; PZ might prefer to examine a psychological phenomenon instead, as this is evolutionary psychology he’s aiming his criticisms at. This isn’t just a mindless swipe at PZ’s apparent lack of a testable hypothesis either: as long as his predictions derived from a drift hypothesis lead to some interesting research, that would be a welcome addition to any field.

Let’s move on from the prediction point towards the matter of whether selection played a role in determining current nose variation. In this area, there is another concern of mine about the drift hypothesis that reaches beyond the pragmatic one. As PZ mentions in his post, selection pressures are generally “blind” to very small fitness benefits or detriments. If your nose differs in size from mine by 1/10 of a millimeter, that probably won’t have much of an effect on eventual fitness outcomes, so that variation might stick around in the next generation relatively unperturbed. The next generation will, in turn, introduce new variation into the population due to sexual recombination and mutation. If the average difference in nose shape was 1/10 of a millimeter in the previous generation, that difference may now grow to, say, 2/10th of a millimeter. Since that difference still isn’t likely enough to make much of a difference, it sticks around into the next generation, which introduces new variation that isn’t selected for or against, and so on. These growths in average variation, while insignificant when considered in isolation, can begin to become the target of selection as they accumulate and their fitness costs and benefits begin to become non-negligible. In this hypothetical example, nose shape and size might begin to become the target of stabilizing selection where the more extreme variations are weeded out of the population, perhaps because they’re viewed as less sexually appealing or become less functional than other, less extreme variants (the factors that PZ singled out as not being important).
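To make that verbal argument a bit more concrete, here is a minimal, purely illustrative simulation of my own (not anything PZ or I have formally modeled): a trait drifts freely while deviations from the population mean are small, but once individuals stray past some fitness-relevant threshold, stabilizing selection starts trimming the extremes. Every number in it (mutation size, tolerance threshold, population size) is an arbitrary assumption chosen only to display the qualitative pattern.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate(generations=200, pop_size=1_000, mutation_sd=0.05, tolerance=1.0):
    """Toy model of the verbal argument above: a trait drifts freely while
    deviations from the mean are small, but once deviations exceed some
    fitness-relevant threshold ('tolerance'), stabilizing selection removes
    the extremes. All parameter values are arbitrary illustrations."""
    trait = np.zeros(pop_size)
    spread = []
    for _ in range(generations):
        # New variation enters each generation (mutation/recombination)
        trait = trait + rng.normal(0, mutation_sd, pop_size)
        # Selection is 'blind' to small deviations but culls the extremes
        deviation = np.abs(trait - trait.mean())
        survivors = trait[deviation < tolerance]
        # Survivors reproduce back up to the original population size
        trait = rng.choice(survivors, size=pop_size, replace=True)
        spread.append(trait.std())
    return spread

spread = simulate()
print(round(spread[10], 3), round(spread[-1], 3))  # spread grows early, then plateaus
```

The only point of the toy model is that “drift” and “selection” are not mutually exclusive accounts: variation can accumulate by drift right up until selection starts to care about it.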

Ladies; start your engines…

So let’s say one was to apply an adaptationist research paradigm to nose variation and compare it to PZ’s drift hypothesis (falsely assuming, for the moment, that an adaptationist research paradigm is in some way supposed to be opposed to a drift one). A researcher might begin by wondering what functions nose shape could have. Note that these need not be definitive conclusions; merely plausible alternatives. Once our researcher has generated some possible functions, they would begin to figure out ways of testing these candidate alternatives. Noback et al (2011), for instance, postulated that the nasal cavity might function, in part, to warm and humidify incoming air before it reaches the lungs and, accordingly, predicted that nasal cavities ought to be expected to vary contingent on the requirements of warming and humidifying across varying climates.

This adaptationist research paradigm generated six novel predictions, which is a good start compared to PZ’s zero. Noback et al (2011) then tested these predictions against 100 skulls from 10 different populations spanning 5 different climates. The results indicated significant correlations between nearly every climate factor (temperature and humidity) and nasal cavity shape. Further, the authors managed to disconfirm more than one of their initial hypotheses, and were also able to suggest that these variations in nasal cavity shape were not due solely to allometric effects. They also mention that plenty of variation is left unexplained, and that some of the variation in nasal cavity shape might be due to tradeoffs between warming and humidifying incoming air and other functions of the nose (such as olfaction).

So, just to recap, this adaptationist research concerning nose variation yielded a number of testable predictions (it’s useful), found evidence consistent with them in some cases but not others (it’s falsifiable), tested alternative explanations (variation not solely due to allometry), mentioned tradeoffs between functions, and left plenty of variation unexplained (it did not assume every feature was an adaptation). This is compared to PZ’s drift hypothesis, which made no explicit predictions, cited no data, made no mention of function (presumably because it would postulate there isn’t one), and would seem unable to account well for this pattern of results. Perhaps PZ might note that this research deals primarily with internal features of the nose, not external ones, and that the external features are what he had in mind when he proposed the drift hypothesis. As he’s not explicit about which parts of nose shape were supposed to be under discussion, it’s hard to say whether he feels results like these would pose any problems for his drift hypothesis.

Moving targets can be notoriously difficult to hit

While I still remain agnostic about the precise degree to which variation in nose shape has been the target of selection, as I’m by no means an expert on the subject, the larger point here is how useful adaptationist research can be. It’s not enough to just declare that variation in a trait is obviously the product of drift and not selection and leave it at that, in much the same way that one can’t just assume a trait is an adaptation. As far as I see it, neither drift nor adaptation ought to be the null hypothesis in this case. Predictions need to be developed and tested against the available data, and the adaptationist paradigm is very helpful in generating those predictions and figuring out what data might be worth testing. That’s most certainly not to say those predictions will always be right, or that the research flowing from someone using that framework will always be good. The point is just that adaptationism itself is not the problem PZ seems to think it is.

References: Noback, M., Harvati, K., & Spoor, F. (2011). Climate-related variation of the human nasal cavity. American Journal of Physical Anthropology, 145(4), 599-614. DOI: 10.1002/ajpa.21523

PZ Myers: Missing The Mark

As this year winds itself to a close, I’ve decided to treat myself to writing another post that allows me to engage more fully in my debating habit. The last post I did along these lines dealt with the apparent moral objection some people have for taking money from the wrong person, and why I felt they were misplaced. Today, the subject will be PZ Myers, who, as normally seems to be the case, appears to still have a dim view of evolutionary psychology. In this recent post, PZ suggested that evolutionary psychology is rotten right down to its theoretical core because of an apparently fatal misconception: adaptationism. Confusingly, PZ begins his attack on this fatal misconception by affirming that selection is an important mechanism, essential for fully understanding evolution, and ought not be ignored by researchers. In essence, PZ’s starting point is that the fatal flaw of evolutionary psychology is, in addition to not being a flaw, a vital conceptual paradigm.

Take that?

If you’re looking for anything in this post about why adaptationism per se is problematic, or a comparison demonstrating that research in psychology that makes use of adaptationism is generally inferior to research conducted without that paradigm, you’re liable to be disappointed by PZ’s latest offering. This is probably because very little of his post actually discusses adaptationism besides his praise of it; you know, that thing that’s supposed to be a seriously flawed foundation. So given that PZ doesn’t appear to actually be talking about adaptationism itself being a problem, what is he talking about? His main concern would seem to be that he feels that other evolutionary mechanisms – specifically, genetic drift and chance – are not as appreciated as explanatory factors as he would prefer. He’s more than welcome to his perception of whether or not some factors are under-appreciated. In fact, he’s even willing to share an example:

Most of the obvious phenotypic variation we see in people, for instance, is not a product of selection: your nose does not have the shape it does, which differs from my nose, which differs from Barack Obama’s nose, which differs from George Takei’s nose, because we independently descend from populations which had intensely differing patterns of natural and sexual selection for nose shape; no, what we’re seeing are chance variations amplified in frequency by drift in different populations.

While I currently have no strong hypotheses one way or another about past selection on nose shape, PZ certainly seems to: he feels that current variation in nose shape is obviously due to genetic drift. Now I know it might seem like PZ is advancing a claim about past selection pressures with absolutely no evidence; it also might seem like his claim makes no readily apparent testable predictions, making it more of a just-so story; it might even seem that these sorts of claims are the kind that are relatively less likely to ever see publication for the former two reasons. In all fairness, though, all of that only seems that way because all those things happen to also be true.

Moving onto his next point, PZ notes that chance factors are very important in determining the direction evolution will take when selection coefficients are small and the alleles in question aren’t well-represented in the gene pool. In other words, there will be some deleterious mutations that happen to linger around in populations because they aren’t bad enough to be weeded out by selection, and some variations that would be advantageous but never end up being selected. This is a fine point, really; it just has very little to do with adaptationism. It has even less to do with his next point, which involves whether color preference has any functional design. Apparently, as an evolutionary psychologist, I’m supposed to have some kind of feelings about the matter of color preference by association, and these feelings are supposed to be obviously wrong. (If I’m interpreting PZ properly, that is. Of course, if I’m not supposed to share some opinion about color preference, it would be strange indeed for him to bring that example up…)

“Well, I guess I can’t argue with that logic…”

Unfortunately, PZ doesn’t get his fill of the Pop Anti-Evolutionary Psychology Game in this first go of collective guilt by association, so he takes another pass at it by asserting that evolutionary psychologists begin doing research by assuming that what they’re studying is a functional adaptation. For those unwilling to click through the link:

…[T]he “Pop Anti-Evolutionary Psychology Game.” Anyone can play…First, assert something that evolutionary psychologists think. These assertions can come in any of a number of flavors, the only requirement being that it has to be something that is obviously false, obviously stupid, or both…hyper-adaptationism is always a good option, that evolutionary psychologists assume that all traits are adaptations…The second part of the game should be obvious. Once you’ve baldly asserted what evolutionary psychologists believe…point out the blindingly obvious opposite of the view you’ve hung on evolutionary psychology.

This is, I think, supposed to be the problem that PZ was implying he had with evolutionary psychology more generally and adaptationism specifically. If this was supposed to be his point all along, he really should have put it at the beginning. In fact, had he simply written “not all traits and variations of those traits are adaptations” he could have saved a lot of time and been met with agreement from, of all people, evolutionary psychologists.

Breaking with tradition, PZ does mention that there have been some evolutionary psychology papers that he likes. I can only suppose their foundational concept was somehow different from the ones he doesn’t like. Confusingly, however, PZ also goes on to say that he tends to like evolutionary psychology papers more as they get away from the “psychology” part of things (the quotes are his and I have no idea what they are supposed to mean), and focus more on genetics, which makes me wonder whether he’s actually reading papers in the field he thinks he is…

“No; I’m not lost, and no, I won’t stop and ask for directions”

Finally, PZ ends his rather strange post by asserting that we can’t learn anything of importance evolutionarily from studying undergraduates (which isn’t a novel claim for him). I’m most certainly in favor of research with more diverse cross-cultural samples, and moving beyond the subject pool is a good thing for all researchers in psychology to do. The assertion that we can’t learn anything of value from this sample of people strikes me as rather strange, though. It would be nice, I suppose, if PZ could helpfully inform us as to which types of people we could potentially learn important psychological things from, what kind of important things those might be, and why those things are specific to those samples, but I suspect he’s saving that wisdom up for another day.

Are Associations Attitudes?

If there’s one phrase that people discussing the results of experiments have heard more than any other, a good candidate might be “correlation does not equal causation”. Correlations can often get mistaken for (at least implying) causation, especially if the results are congenial to a preferred conclusion or interpretation. This is a relatively uncontroversial matter which has been discussed to death, so there’s little need to continue on with it. There is, however, a related reasoning error people also tend to make with regard to correlation; one that is less discussed than the former. This mistake is to assume that a lack of correlation (or a very low one) means no causation. Here are two reasons one might find no correlation despite underlying relationships: in the first case, no correlation could result from something as simple as there being no linear relationship between two variables. As correlations only measure linear relationships, a relationship shaped like a bell curve (or any other nonmonotonic pattern) would tend to yield a correlation near zero.
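To illustrate that first case (this is just a toy demonstration with made-up numbers, not data from any study discussed here): if one variable is a bell-shaped function of another, the two can be tightly related while their Pearson correlation sits right around zero.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 10_000)
# y depends strongly on x, but the relationship is bell-shaped, not linear
y = np.exp(-x**2 / 2) + rng.normal(0, 0.05, x.size)

print(round(np.corrcoef(x, y)[0, 1], 3))  # approximately 0: the linear correlation misses the relationship
```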

For the second case, consider the following example: event A causes event B, but only in the absence of variable C. If variable C randomly varies (it’s present half the time and absent the other half), [EDIT: H/T Jeff Goldberg] you might end up with no correlation, or at least a very reduced one, despite direct causation. This example becomes immediately more understandable if you relabel “A” as heterosexual intercourse, “B” as pregnancy, and “C” as contraceptives (ovulation works too, provided you also replace “absence” with presence). That said, even if contraceptives aren’t in the picture, the correlation between sexual intercourse and pregnancy is still pretty low.
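Again purely for illustration, with made-up variables: below, A causes B deterministically, but only when C is absent, and C is present half the time. The A–B correlation ends up well below the 1.0 you would see if C were always absent, despite the causation being direct.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
a = rng.integers(0, 2, n)    # event A occurs or not
c = rng.integers(0, 2, n)    # variable C is present half the time
b = np.where(c == 0, a, 0)   # A causes B, but only in the absence of C

print(round(np.corrcoef(a, b)[0, 1], 2))  # roughly 0.58: attenuated despite direct causation
```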

And just in case you find that correlation reaching significance, there’s always this.

So why all this talk about correlation and causation? Two reasons: first, this is my website and I find the matter pretty neat. More importantly, though, I’d like to discuss the IAT (implicit association test) today; specifically, I’d like to address the matter of how well the racial IAT correlates (or rather, fails to correlate) with other measures of racial prejudice, and how we ought to interpret that result. While I have touched on this test very briefly before, it was in the context of discussing modularity; not dissecting the test itself. Since the IAT has recently crossed my academic path again on more than one occasion, I feel it’s time for a more complete engagement with it. I’ll start by discussing what the IAT is, what many people seem to think it measures, and finally what I feel it actually assesses.

The IAT was introduced by Greenwald et al in 1998. As its name suggests, the test was ostensibly designed to do something it would appear to do fairly well: measure the relative strengths of initial, automatic cognitive associations between two concepts. If you’d like to see how this test works firsthand, feel free to follow the link above, but, just in case you don’t feel like going through the hassle, here’s the basic design (using the race version of the test): subjects are asked to respond as quickly as possible to a number of stimuli. In the first phase, subjects view pictures of black and white faces flashed on the screen and are asked to press one key if the face is black and another if it’s white. In the second phase, subjects do the same task, but this time they press one key if the word that flashes on the screen is positive and another if it’s negative. Finally, these two tasks are combined, with subjects asked to press one key if the face is white or the word is positive, and another key if the face is black or the word is negative (these conditions then flip). Differences in reaction times on this test are taken to be measures of implicit cognitive associations. So, if you’re faster to categorize black faces with positive words, you’re said to have a more positive association towards black people.
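For those who like to see the logic laid out, here is a bare-bones sketch of how a difference score could be computed from those reaction times. To be clear, this is my own simplified illustration with hypothetical numbers; the scoring procedures Greenwald and colleagues actually use are more involved (error penalties, latency trimming, and scaling by pooled standard deviations).

```python
import numpy as np

def iat_effect(congruent_rts_ms, incongruent_rts_ms):
    """Crude IAT-style index: difference in mean reaction times between the two
    combined blocks. A positive value means faster responding when white faces
    shared a key with positive words. (The published D-score algorithm is more
    elaborate; this only sketches the basic logic.)"""
    return np.mean(incongruent_rts_ms) - np.mean(congruent_rts_ms)

# Hypothetical reaction times (in milliseconds) for a single subject
congruent = [612, 587, 640, 598, 605]      # white+positive / black+negative block
incongruent = [701, 688, 720, 695, 710]    # white+negative / black+positive block
print(round(iat_effect(congruent, incongruent), 1))  # ~94.4 ms faster in the congruent block
```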

Having demonstrated that many people seem to show a stronger association between white faces and positive concepts, the natural question arises about how to interpret these results. Unfortunately, many psychological researchers and laypeople alike have taken an unwarranted conceptual leap: they assume that these differential association strengths imply implicit racist attitudes. This assumption happens to meet with an unfortunate snag, however, which is that these implicit associations tend to have very weak to no correlations with explicit measures of racial prejudice (even if the measures themselves, like the Modern Racism Scale, are of questionable validity to begin with). Indeed, as reviewed by Arkes & Tetlock (2004), whereas the vast majority of undergraduates tested manifest exceedingly low levels of “modern racism”, almost all of them display a stronger association between white faces and positivity. Faced with this lack of correlation, many people have gone on to make a second assumption to account for it: that the implicit measure is able to tap some “truer” prejudiced attitude that the explicit measures are not as able to tease out. I can’t help but wonder, though, what those same people would have had to say if positive correlations had turned up…

“Correlations or no, there’s literally no data that could possibly prove us wrong”

Arkes & Tetlock (2004) put forth three convincing reasons to not make that conceptual jump from implicit associations to implicit attitudes. Since I don’t have the space to cover all their objections, I’ll focus on the key points of them. The first is one that I feel ought to be fairly obvious: quicker associations between whites and positive concepts are capable of being generated by merely being aware of racial stereotypes, irrespective of whether one endorses them on any level, conscious or not. Indeed, even African American subjects were found to manifest pro-white biases in these tests. One could take those results as indicative of black subjects being implicitly racist against their own ethnic group, though it would seem to make more sense to interpret those results in terms of the black subjects being aware of stereotypes they did not endorse. The latter interpretation also goes a long way towards understanding the small and inconsistent correlations between the explicit and implicit measures; the IAT is measuring a different concept (knowledge of stereotypes) than the explicit measures (endorsement of stereotypes).

In order to appreciate the next criticism of this conceptual leap, there’s an important point worth bearing in mind concerning the IAT: the test doesn’t measure whether two concepts are associated in any absolute sense; it merely measures the relative strengths of these associations (for example, “bread” might be more strongly associated with “butter” than it is with “banana”, though it might be more associated with both than with “wall”). The importance of this point is that the results of the IAT do not test whether there is a negative association towards any one group; just whether one group is rated more positively than another. While whites might have a stronger association with positive concepts than blacks, it does not follow that blacks have a negative association overall, nor that whites have a particularly positive one either. Both groups could be held in high or low regard overall, with one being slightly favored. In much the same way, I might enjoy eating both pizza and turkey sandwiches, but I would tend to enjoy eating pizza more. Since the IAT does not track whether these response-time differentials are due to hostility, these results do not automatically seem to apply well to most definitions of prejudice.

Finally, the authors make the (perhaps politically incorrect) point that noticing behavioral differences between groups – racial or otherwise – and altering behavior accordingly is not, de facto, evidence of an irrational racial bias; it could well represent the proper use of Bayesian inference, passing correspondence benchmarks for rational behavior. If one group, A, happens to perform behavior X more than group B, it would be peculiar to ignore this information if you’re trying to predict the behavior of an individual from one of those groups. In fact, when people fail to do as much in other situations, people tend to call that failure a bias or an error. However, given that race is a touchy political subject, people tend to condemn others for using what Arkes & Tetlock (2004) call “forbidden base rates”. Indeed, the authors report that previous research found subjects were willing to condemn an insurance company for using base-rate data on the likelihood of property damage in certain neighborhoods when that base rate also happened to correlate with the racial makeup of those neighborhoods (but not when those racial correlates were absent).
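To put a number on that point (the figures below are entirely made up for illustration; they come from neither Arkes & Tetlock nor the insurance example): if behavior X is genuinely more common in group A than in group B, then observing X rationally shifts one’s estimate of group membership, and, by the same logic, knowing group membership rationally shifts one’s prediction of X.

```python
# Hypothetical base rates, chosen only to illustrate the Bayesian point
p_x_given_a = 0.30   # assumed rate of behavior X in group A
p_x_given_b = 0.10   # assumed rate of behavior X in group B
p_a = 0.5            # prior: assume the groups are equally common

# Probability that an individual displaying X belongs to group A
p_x = p_x_given_a * p_a + p_x_given_b * (1 - p_a)
p_a_given_x = p_x_given_a * p_a / p_x
print(round(p_a_given_x, 2))  # 0.75: observing X shifts the estimate from 0.50 to 0.75
```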

A result which fits nicely with other theory I’ve written about, so subscribe now and don’t miss any more exciting updates!

To end this on a lighter, (possibly) less politically charged note, a final point worth considering is that this test measures the automaticity of activation; not necessarily the pattern of activation which will eventually obtain. While my immediate reaction towards a brownie within the first 200 milliseconds might be “eat that”, that doesn’t mean that I will eventually end up eating said brownie, nor would it make me implicitly opposed toward the idea of dieting. It would seem that, in spite of these implicit associations, society as a whole has been getting less overtly racist. The need for researchers to dig this deep to try and study racism could be taken as heartening, given that we, “now attempt to gauge prejudice not by what people do, or by what people say, but rather by millisecs of response facilitation or inhibition in implicit association paradigms” (p.275). While I’m sure there are still many people who will make a lot about these reaction time differentials for reasons that aren’t entirely free from their personal politics, it’s nice to know just how much successful progress our culture seems to have made towards eliminating racism.

References: Arkes, H.R., & Tetlock, P.E. (2004). Attributions of implicit prejudice, or “Would Jesse Jackson ‘fail’ the implicit association test?” Psychological Inquiry, 15, 257-278

Greenwald, A.G., McGhee, D.E., & Schwartz, J.L.K. (1998). Measuring individual differences in implicit cognition: The implicit association test. Journal of Personality and Social Psychology, 74, 1464-1480

The Sometimes Significant Effects Of Sexism

Earlier this week I got an email from a reader, Patrick, who recommended I review a paper entitled “More than ‘just a joke’: The prejudice-releasing function of sexist humor” by Ford et al (2007). As I happen to find discussions of sexism quite interesting and this article reviewable, I’m happy to share some of my thoughts about it. I would like to start by noting that the title of the article is a bit off-putting for me due to the authors’ use of the word “function”. While I’m all for more psychological research taking a functionalist perspective, their phrasing would, at least in my academic circle, carry the implication that the use of sexist humor evolved because of its role in releasing prejudice, and I’m fairly certain that is not the conclusion that Ford et al (2007) intended to convey. Though Ford et al (2007) manage to come close to something resembling a functionalist account (there is some mention of costs and the avoiding of them), it’s far from close enough for the coveted label of “function” to be applied. Until their account has been improved in that regard, I feel the urge to defend my academic semantic turf.

So grab your knife and best dancing shoes.

Concerning the study itself, Ford et al (2007) sought to demonstrate that sexist humor would lead men who were high in “hostile sexism” to act in a discriminatory fashion towards women. Broadly speaking, the authors suggest that people who hold sexist beliefs often try to suppress the expression of those beliefs in the hopes of avoiding condemnation by others who are less sexist; however, when condemnation is perceived to be unlikely, those who hold sexist beliefs will stop suppressing them, at least to some extent. The authors further suggest that humor can serve to create an atmosphere where condemnation of socially unpopular views is perceived as less likely. Stringing this all together, we end up with the conclusion that sexist humor can create an environment that sexist men will perceive as more welcoming for their sexist attitudes, which they will subsequently be more likely to express.

Across two studies, Ford et al (2007) found support for this conclusion. In the first study, male subjects’ hostile sexism scores were assessed through the “Ambivalent Sexism Inventory” 2 to 4 weeks prior to the study. The study itself involved presenting the men with one of three vignettes that included sexist jokes, sexist statements, or neutral jokes, followed by asking them how much they would hypothetically be willing to donate to a women’s organization. The results showed that while measures of hostile sexism alone did not predict how much men were willing to donate, when confronted with sexist humor, those men who scored higher on the sexism measure tended to donate less of their hypothetical $20 to a women’s group. Further, neither the sexist statement nor the neutral joke condition had any effect on a man’s willingness to donate, regardless of his sexism score. In fact, though the effect was not significant, men who scored higher in hostile sexism were more likely to donate more to a women’s group, relative to those who scored low, following the sexist statements.

There are two important points to consider regarding this first set of findings. The first of these points relates to the sexist statements condition: if the mechanism through which Ford et al (2007) propose hostile sexism comes to be acted upon is the perceived tolerance of sexist beliefs, then the sexist statements condition is rather strange. In that condition, it would appear rather unambiguous that there is a local norm for the acceptance of sexism against women, yet the men high in sexism don’t seem to “release” theirs. This is a point that the authors don’t engage with, and that seems like a rather major oversight. My guess is that the point isn’t engaged with because it would be rather difficult for the authors’ model to account for, but perhaps they had some other reason for not considering it (even though it could have easily and profitably been examined in their second study). The second point that I wanted to deal with concerns the way in which Ford et al (2007) seem to write about sexism. Like many people (presumably) concerned about sexism, they only appear concerned with one specific type of sexism: the type where men appear biased against women.

“If he didn’t want to get hit, he shouldn’t have provoked her!”

For instance, in the first study, the authors report, truthfully, that men higher in hostile sexism tended to donate less to a women’s group than men lower in hostile sexism did. What they do not explicitly mention is how those two groups compare to a control: the neutral joke condition. Regardless of one’s sexism score, people donated equally in the neutral condition. In the sexist joke condition, those men high in hostile sexism donated less, relative to the neutral condition; on the other hand, those men low in hostile sexism donated more in the sexist humor condition, relative to the neutral control. While the former is taken as an indication of a bias against women, the latter is not discussed as a bias in favor of women. As I’ve written about before, only discussing one set of biases does not appear uncommon when it comes to sexism research, and I happen to find that peculiar. This entire study is dedicated to looking at the results of ostensibly sexist attitudes held by men against women; there is no condition where women’s biases (either towards men or women) are assessed before and after hearing sexist jokes about men. Again, this is a rather odd omission if Ford et al (2007) are seeking to study gender biases (that is, unless the costs of expressing sexist beliefs are lower for women, though this point is never talked about either). The authors do at least mention in a postscript that women’s results on the hostile sexism scale don’t predict their attitudes towards other women, which kind of calls into question what this measure of “hostile sexism” is actually supposed to be measuring (but more on that later).

The second study had a few (only 15 per condition) male subjects watching sexist or non-sexist comedic skits in small groups, after which they were asked to fill out a measure concerning how they would allocate a 20% budget cut among 5 different school groups, one of which was a women’s group (the others were an African American group, a Jewish group, a study abroad group, and Safe Arrival for Everyone). Following this, subjects were asked how people in their group might approve of their budget cuts, as well as how others outside of the group might approve of their cuts. As before, those who were higher in hostile sexism cut more of the women’s group’s budget, but only in the sexist joke condition. Those with higher hostile sexism scores were also more likely to deduct more money from the African American group, but only in the neutral humor condition, though little is said about this effect (the authors do mention it is unlikely to be driven by the same mechanism; I think it might just reflect chance). Those in the high sexism, sexist humor group were also more likely to believe that others in their condition would approve of their budget reductions to the women’s group, though they were no more likely than the other groups to think that students at large would approve of their budget cuts.

The sample size for this second study is a bit concerning. There were 30 subjects total, across two groups, and the sample was further divided by sexism scores. If we assume there were equal numbers of subjects with high and low sexism scores, we’re looking at only about 7 or 8 subjects per condition, and that’s only if we divide the sample by sexism scores above and below the mid-point. I can’t think of a good reason for collecting such a small sample, and I have some concerns that it might reflect data-peeking, though I have no evidence that it does. Nevertheless, the authors make a lot of the idea that subjects higher in sexism felt there was more consensus about the local approval rating of their budget cuts, but only in the sexist humor condition; that is, they might have perceived the local norms about sexism to be less condemning of sexist behavior following the sexist jokes. As I mentioned before, it would have been a good idea for them to test their mechanism using other, non-humor conditions, such as the sexist statements they used initially and subsequently dropped. There’s not much more to say about that except that Ford et al (2007) mention in their introduction that statement manipulations seemed to work as a releaser for racial bias, without mentioning why they didn’t work for gender.

So it might be safer to direct any derogatory comments you have to the right…

I would like to talk a bit more about the Ambivalent Sexism Inventory before finishing up. I took the test online as research for this post, in order to see what items were being used (and you can as well, by following the link above), and I have some reservations as to what precisely it’s measuring. Rather than measuring sexism, per se, the hostile portion of the inventory appears to deal, at least in part, with whether or not one agrees with certain feminist ideas. For instance, two questions which stand out as being explicit about this are (1) “feminists are making entirely reasonable demands of men”, and (2) “feminists are not seeking for women to have more power than men”. Not only do such questions not necessarily reflect one’s views of women more generally (provided one can be said to have a view of women more generally), but they are so hopelessly vague in their wording that they can be interpreted to have an unacceptably wide range of meanings. As the two previous studies and the footnote demonstrate, there doesn’t seem to be a consistent main effect of one’s score on this test, so I have reservations as to whether it’s really tapping sexism per se.

The other part of the sexism inventory involves what is known as “benevolent sexism” – essentially the notion that men ought to do things for women, like gain their affection or protect and provide for them, or that women are in some respects “better” than men. As the website with the survey helpfully informs us, men and women don’t seem to differ substantially in this type of sexism. However, this type of benevolent sexism is also framed as sexism against women that could turn “ugly” for them; not as sexism directed against men, which I find curious, given certain questions (such as, “Women, compared to men, tend to have a superior moral sensibility.” or “Men should be willing to sacrifice their own well being in order to provide financially for the women in their lives.”). Since this is already getting long, I’ll just wonder aloud why no data from the benevolent sexism measure appear anywhere in this paper, given that the authors appear to have collected it.

References: Ford, T., Boxer, C., Armstrong, J., & Edel, J. (2007). More than “just a joke”: The prejudice-releasing function of sexist humor. Personality and Social Psychology Bulletin, 34(2), 159-170. DOI: 10.1177/0146167207310022

 

Does “Statistical Significance” Imply “Actually Significant”?

P-values below 0.05; the finding and reporting of these values might be considered the backbone of most psychological research. Conceptually, these values are supposed to represent the notion that, if the null hypothesis is true, the odds of observing some set of results are under 5%. As such, if one observes a result unlikely to be obtained by chance, this would seem to carry the implication that the null hypothesis is unlikely to be true and there are likely real differences between the group means under examination. Despite null hypothesis significance testing becoming the standard means of statistical testing in psychology, the method is not without its flaws, both on the conceptual and practical levels. According to a paper by Simmons et al (2011), on the practical end of things, some of the ways in which researchers are able to selectively collect and analyze data can dramatically inflate the odds of obtaining a statistically significant result.

Don’t worry though; it probably won’t blow up in your face until much later in your career.

Before getting to their paper, it’s worth covering some of the conceptual issues inherent with null hypothesis significance testing, as the practical issues can be said to apply just as well to other kinds of statistical testing. Brandstaetter (1999) raises two large concerns about null hypothesis significance testing, though really they’re more like two parts of the same concern, and, ironically enough, almost sound as if they’re opposing points. The first part of this concern is that classic significance testing does not tell us whether the results we observed came from a sample with a mean that was actually different from the null hypothesis. In other words, a statistically significant result does not tell us that the null hypothesis is false; in fact, it doesn’t even tell us the null hypothesis is unlikely. According to Brandstaetter (1999), this is due to the logic underlying significance testing being invalid. The specific example that Brandstaetter uses references the rolling of dice: if you roll a twenty-sided die, it’s unlikely (5%) that you will observe a 1; however, if you observe a 1, it doesn’t follow that it’s unlikely you rolled the die.

While that example addresses null hypothesis testing at a strictly logical level, this objection can be dealt with fairly easily, I feel: in Brandstaetter’s example, the hypothesis that one would be testing is not “the die was rolled”, so that specific example seems a bit strange. If you were comparing the heights of two different groups (say, men and women), and you found the groups differed, in your sample, by an average of six inches, it might be reasonable to conclude that it’s unlikely that the population means the two samples come from are the same. This is where the second part of the criticism comes into play: in reality, the means of different groups are almost guaranteed to be different in some way, no matter how small or large that difference is. This means that, strictly speaking, the null hypothesis (that there is no mean difference) is pretty much always false; the matter then becomes whether your test has enough power to reach statistical significance, and increasing your sample size can generally do the trick in that regard. So, in addition to not telling us whether the null hypothesis is true or false, the best that this kind of significance testing can do is tell us a specific value that a population mean is not. However, since there are an infinite number of possible values that a population mean could hypothetically take, the value of this information may be minimal.
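To put some numbers on that power point, here is a minimal sketch (my own illustration in Python, assuming numpy and scipy are available; the sample sizes and the tiny 0.05 standard deviation “true” difference are arbitrary choices of mine) showing how a trivially small real difference becomes “statistically significant” once the sample grows large enough:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def rejection_rate(n_per_group, true_diff=0.05, n_sims=1000):
    """Proportion of simulated t-tests reaching p < .05 when the two
    population means really do differ, but only by `true_diff` SDs."""
    rejections = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(true_diff, 1.0, n_per_group)
        if stats.ttest_ind(a, b).pvalue < 0.05:
            rejections += 1
    return rejections / n_sims

# A trivial 0.05 SD difference is rarely "significant" with 20 subjects
# per group, but becomes a near-certainty by 20,000 per group.
for n in (20, 200, 2000, 20000):
    print(n, rejection_rate(n))
```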

Even in the best of times, then, significance testing has some rather definite conceptual concerns. These two conceptual issues, however, seem to be overshadowed in importance by the practical issues that arise during the conducting of research; what Simmons et al (2011) call “researcher degrees of freedom”. This term is designed to capture some of the various decisions that researchers might make over the course of collecting and analyzing data while hunting for statistically significant results capable of being published. As publications are important for any researcher’s career, and statistically significant results are the kind that are most likely to be published (or so I’ve been told), this combination of pressures can lead to researchers making choices – albeit not typically malicious ones – that increase their chances of finding such results.

“There’s a significant p-value in this mountain of data somewhere, I tell you!”

Simmons et al (2011) began by generating random samples, all pulled from the same normal distribution, across 15,000 independent simulations. Since every sample came from the same distribution, any “effect” found is a false positive, and under classic significance testing that rate should not tend to exceed 5%. When there were two dependent measures capable of being analyzed (in their example, these were willingness to pay and liking), the ability to analyze these two measures separately or in combination nearly doubled the chances of finding a statistically significant “effect” at the 0.05 level. That is to say, the odds of finding an effect by chance were no longer 5%, but closer to 10%. A similar inflation was found when researchers gave themselves the option of controlling for gender or not. This makes intuitive sense, as it’s basically the same manipulation as the former two-measure case, just with a different label.
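For the curious, here is a rough reconstruction of that kind of simulation (my own sketch in Python, not Simmons et al’s actual code; the 20 subjects per condition and the 0.5 correlation between the two measures are assumptions on my part). Both groups are drawn from the same distribution, so anything “found” is a false positive, and the freedom to report whichever of the two measures, or their average, happens to work pushes the false positive rate well past 5%:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def flexible_false_positive_rate(n_per_group=20, r=0.5, n_sims=5000):
    """False positive rate when a researcher may report DV1, DV2, or
    their average, whichever happens to reach p < .05. The null is
    true: both conditions are drawn from the same distribution."""
    cov = [[1.0, r], [r, 1.0]]  # two DVs correlated at r
    hits = 0
    for _ in range(n_sims):
        group_a = rng.multivariate_normal([0, 0], cov, n_per_group)
        group_b = rng.multivariate_normal([0, 0], cov, n_per_group)
        candidates = [
            (group_a[:, 0], group_b[:, 0]),                 # DV1 alone
            (group_a[:, 1], group_b[:, 1]),                 # DV2 alone
            (group_a.mean(axis=1), group_b.mean(axis=1)),   # their average
        ]
        p_values = [stats.ttest_ind(a, b).pvalue for a, b in candidates]
        if min(p_values) < 0.05:  # report whichever analysis "worked"
            hits += 1
    return hits / n_sims

print(flexible_false_positive_rate())  # roughly 9-10%, not the nominal 5%
```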

There’s similar bad news for the peek-and-test method that some researchers make use of with their data. In these cases, a researcher will collect some number of subjects for each condition – say 20 – and conduct a test to see if they found an effect. If an effect is found, the researcher will stop collecting data; if the effect isn’t found, the researcher will then collect another number of observations per condition – say another 10 – and then retest for significance. A researcher’s ability to peek at their data in this way increased the odds of finding an effect by chance up to about 8%. Finally, if the researcher decides to run multiple levels of a condition (Simmons et al’s example concerned splitting the sample into low, medium, and high conditions), the ability to selectively compare these conditions to each other brought the false positive rate up to 12.6%. Worryingly, if these four degrees of researcher freedom were combined, the odds of finding a false positive were as high as 60%; that is, the odds are better that you would find some effect strictly by chance than that you wouldn’t. While these results might have been statistically significant, they are not actually significant. This is a fine example of Brandstaetter’s (1999) initial point: significance testing does not tell us that the null hypothesis is true or likely, even though it was true in all of these simulated cases.
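The peek-and-test procedure is just as easy to simulate; here is a minimal sketch under the null (again my own reconstruction rather than the authors’ code, using the 20-then-10 numbers from the example above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def peeking_false_positive_rate(first_n=20, extra_n=10, n_sims=5000):
    """False positive rate for 'peek-and-test': run a t-test after
    `first_n` subjects per condition; if p >= .05, collect `extra_n`
    more per condition and test again. The null is true throughout."""
    hits = 0
    for _ in range(n_sims):
        a = rng.normal(size=first_n)
        b = rng.normal(size=first_n)
        if stats.ttest_ind(a, b).pvalue < 0.05:  # first peek
            hits += 1
            continue
        a = np.concatenate([a, rng.normal(size=extra_n)])
        b = np.concatenate([b, rng.normal(size=extra_n)])
        if stats.ttest_ind(a, b).pvalue < 0.05:  # second peek
            hits += 1
    return hits / n_sims

print(peeking_false_positive_rate())  # closer to 8% than to the nominal 5%
```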

As Simmons et al (2011) also note, this rate of false positives might even be conservative, given that there are other, unconsidered liberties that researchers can take. Making matters even worse, there’s the aforementioned publication bias: at least as far as I’ve been led to believe, journals tend to favor publications that (a) find statistically significant results and (b) are novel in their design (i.e. journals tend not to publish replications). This means that when false positives are found, they’re both more likely to make their way into journals and less likely to subsequently be corrected. In turn, those false positives could lead to poor research outcomes, such as researchers wasting time and money chasing effects that are unlikely to be found again, or actually reinforcing the initial false positive in the event that they chase it, happen to find it again by chance, and publish it a second time.

“With such a solid foundation, it’s difficult to see how this could have happened”

Simmons et al (2011) do put forth some suggestions as to how these problems could begin to be remedied. While I think their suggestions are all, in the abstract, good ideas, they would likely also generate a good deal more paperwork for researchers to deal with, and I don’t know a researcher alive who craves more paperwork. While there might be some tradeoff, in this case, between some amount of paperwork and eventual research quality, there is one point that Simmons et al (2011) do not discuss when it comes to remedying this issue, and that’s the matter I have been writing about for some time: the inclusion of theory in research. In my experience, a typical paper in psychology will give one of two explicit reasons for its being conducted: (1) an effect was found previously, so the researchers are looking to either find it again (or not find it), or (2) the authors have a hunch they will find an effect. Without a real theoretical framework surrounding these research projects, there is little need to make sense of or actually explain a finding; one can simply say they discovered a “bias” or a “cognitive blindness” and leave it at that. While I can’t say how much of the false-positive problem could be dealt with by requiring the inclusion of some theoretical framework for understanding one’s results when submitting a manuscript, if any, I feel some theory requirement would still go a long way towards improving the quality of research that ends up getting published. It would encourage researchers to think more deeply about why they’re doing what they’re doing, as well as help readers to understand (and critique) the results they end up seeing. While dealing with false positives should certainly be a concern, merely cutting down on their appearance will not be enough to help research quality in psychology progress appreciably.

References: Brandstaetter (1999). Confidence intervals as an alternative to significance testing. Methods of Psychological Research Online, 4.

Simmons, J., Nelson, L., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359-1366. DOI: 10.1177/0956797611417632

A Frequentist And A Bayesian Walk Into Infinity…

I’m going to preface this post by stating that statistics is not my primary area of expertise. Admittedly, this might not be the best way of generating interest, but non-expertise hasn’t seemed to stop many a teacher or writer, so I’m hoping it won’t be too much of a problem here. This non-expertise, however, has apparently also not stopped me from stumbling upon an interesting question concerning Bayesian statistics. Whether this conceptual problem I’ve been mulling over would actually prove to be a problem in real-world data collection is another matter entirely. Then again, there doesn’t appear to be a required link between academia and reality, so I won’t worry too much about that while I indulge in the pleasure of a little bit of philosophical play time.

The link between academia and reality is about as strong as the link between my degree and a good job.

So first, let’s run through a quick problem using Bayesian statistics. This is the classic example by which I was introduced to the idea: say that you’re a doctor trying to treat an infection that has broken out among a specific population of people. You happen to know that 5% of the people in this population are actually infected and you’re trying to figure out who those people are so you can at least quarantine them. Luckily for you, you happen to have a device that can test for the presence of this infection. If you use this device to test an individual who actually has the disease, it will come back positive 95% of the time; if the individual does not have the disease, it will come back positive 5% of the time. Given that an individual has tested positive for the disease, what is the probability that they actually have it? The answer, unintuitive to most, is 50%.

Though the odds of someone testing positive if they have the disease are high (95%), very few people actually have the disease (5%). So 5% of the 95% of people who don’t have an infection will test positive, and 95% of the 5% of people who do have an infection also will. In case that example ran by too quickly, here’s another brief video example using hipsters drinking beer rather than treating infections. This method of statistical testing would seem to have some distinct benefits: for example, it will tell you the probability of your hypothesis, given your data, rather than the probability of your data, given your hypothesis (which, I’m told, is what most people actually want to be calculating). That said, I see two (possibly major) conceptual issues with this type of statistical analysis. If anyone more versed in these matters feels they have good answers to them, I’d be happy to hear it in the comments section.
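Before getting to those issues, the arithmetic above is quick to verify directly; here is a minimal sketch of Bayes’ theorem applied to the numbers in the example (the function name is my own, hypothetical one, but the calculation is just the one described):

```python
def p_infected_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(infected | positive test) via Bayes' theorem."""
    true_positives = sensitivity * prevalence                  # infected and test positive
    false_positives = false_positive_rate * (1 - prevalence)   # healthy but test positive
    return true_positives / (true_positives + false_positives)

# 5% prevalence, 95% sensitivity, 5% false positive rate
print(p_infected_given_positive(0.05, 0.95, 0.05))  # 0.5 -- a coin flip
```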

The first issue was raised by Gelman (2008), who was discussing the usefulness of our prior knowledge. In the above examples, we know some information ahead of time (the prevalence of an infection or of hipsters); in real life, we frequently don’t know this information; in fact, it’s often what we’re trying to estimate when we’re doing our hypothesis tests. This puts us in something of a bind when it comes to using Bayes’ formula. Lacking objective knowledge, one could use what are called subjective priors, which represent your own set of preexisting beliefs about how likely certain hypotheses are. Of course, subjective priors have two issues: first, they’re unlikely to be shared uniformly between people, and if your subjective beliefs are not my subjective beliefs, we’ll end up coming to two different conclusions given the same set of data. It’s also probably worth mentioning that subjective beliefs do not, to the best of my knowledge, actually affect the goings-on in the world: that I believe it’s highly probable it won’t rain tomorrow doesn’t matter; it either will or it won’t, and no amount of belief will change that. The second issue concerns the point of the hypothesis test: if you already have a strong prior belief about the truth of a hypothesis, for whatever reason you do, that would seem to suggest there’s little need for you to actually collect any new data.

On the plus side, doing research just got way easier!

One could attempt to get around this problem by using a subjective, but uninformative, prior; that is, by distributing your belief uniformly over your set of possible outcomes, entering into your data analysis with no preconceptions about how it’ll turn out. This might seem like a good solution to the problem, but it would also seem to make your priors all but useless: if you’re multiplying by the same constant everywhere, you can just drop it from your analysis (a toy demonstration of that point follows below). So it would seem that in both cases priors don’t do you a lot of good: they’re either strong, in which case you don’t need to collect more data, or uninformative, in which case they’re pointless to include in the analysis. Now perhaps there are good arguments to be made for subjective priors, but that’s not the primary point I hoped to address; my main criticism involves what’s known as the gambler’s fallacy.
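Here is that toy demonstration (my own sketch; the 7-heads-in-10-flips data are made up purely for illustration): multiplying the likelihood by a flat prior and normalizing gives exactly the same posterior as ignoring the prior altogether.

```python
import numpy as np

# Candidate values for a coin's bias and the likelihood of an arbitrary
# observation (7 heads in 10 flips) at each candidate value.
biases = np.linspace(0.01, 0.99, 99)
likelihood = biases**7 * (1 - biases)**3

flat_prior = np.ones_like(biases)               # "uninformative": equal weight everywhere
posterior_with_prior = likelihood * flat_prior
posterior_with_prior /= posterior_with_prior.sum()

posterior_without_prior = likelihood / likelihood.sum()  # drop the constant entirely

print(np.allclose(posterior_with_prior, posterior_without_prior))  # True
```

With that aside out of the way, on to the fallacy itself.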

This logical fallacy can be demonstrated with the following example: say you’re flipping a fair coin; given that this coin has come up heads 10 times in a row, how likely is a tails outcome on the next flip? The answer, of course, is 50%, as a fair coin is one that is unbiased with respect to which outcome will obtain when you flip it; a heads outcome using this coin is always exactly as likely as a tails outcome. However, someone committing the gambler’s fallacy will suggest that the coin is more likely to come up tails, as all the heads outcomes make the tails outcome feel more likely; as if a tails outcome is “due” to come up. This is incorrect, as each flip of this coin is independent of the other flips, so knowing what the previous outcomes of this coin have been tells you nothing about what its future outcomes will be, or, as others have put it, the coin has no memory. As I see it, Bayesian analysis could lead one to engage in this fallacy (or, more precisely, something like the reverse gambler’s fallacy).

Here’s the example I’ve been thinking about: consider that you have a fair coin and an infinite stretch of time over which you’ll be flipping it. Long strings of heads or tails outcomes (say 10,000 in a row, or even 1,000,000 and beyond) are certainly improbable, but given an infinite amount of time, they become an inevitability: outcomes that will obtain eventually. Now, if you’re a good Bayesian, you’ll update your posterior beliefs following each outcome. In essence, after a coin comes up heads, you’ll be more likely to think that it will come up heads on the subsequent flip; since heads have been coming up, more heads are due to come up. Essentially, you’ll be suggesting that these independent events are not actually independent of each other, at least with respect to your posterior beliefs. Given these long strings of heads and tails that will inevitably crop up, over time you will go from believing the coin is fair, to believing that it is nearly completely biased towards heads, then towards tails, and back again.
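Here is a minimal sketch of the kind of updater I have in mind (how to formalize it is an assumption on my part: a uniform Beta(1,1) prior over the coin’s bias, updated after every flip). After one of those inevitable long runs, its prediction for the next flip is nowhere near 50%, even though the coin is fair by stipulation:

```python
def next_flip_prediction(heads, tails, prior_a=1.0, prior_b=1.0):
    """Posterior mean of P(heads) under a Beta(prior_a, prior_b) prior
    after observing `heads` heads and `tails` tails (Beta-Bernoulli model)."""
    return (prior_a + heads) / (prior_a + prior_b + heads + tails)

# Given infinite flips, a fair coin will eventually produce runs like these.
print(next_flip_prediction(heads=10_000, tails=0))  # ~0.9999: near-certain heads
print(next_flip_prediction(heads=0, tails=10_000))  # ~0.0001: near-certain tails
```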

Though your beliefs about the world can never have enough pairs of flip-flops…

It seems to me, then, that if you want to more accurately estimate the parameter (in this case, the fairness of the coin), you want some statistical test that will, to some extent, try to take into account data that you did not obtain, but might have: what might have happened if I flipped the coin another X number of times. This is, generally speaking, anathema to Bayesian statistics as I understand it, which concerns itself only with the data that was actually collected. Of course, that does raise the question of how one can accurately predict what data they might have obtained, but did not, for which I don’t have a good answer. There’s also the matter of precisely how large a problem this hypothetical example poses for Bayesian statistics when you’re not dealing with an infinite number of random observations; in the real world, this conceptual problem might not be much of one, as these events are highly improbable, so it’s rare that anyone will actually end up making this kind of mistake. That said, it is generally a good thing to be as conceptually aware of possible problems as we can be if we want any hope of fixing them.

References: Gelman, A. (2008). Objections to Bayesian statistics. Bayesian Analysis, 3, 445-450. DOI: 10.1214/08-BA318

(Not So) Simple Jury Persuasion: Beauty And Guilt

It should come as no shock to anyone, really, that people have all sorts of interesting cognitive biases. Finding and describing these biases would seem to make up a healthy portion of the research in psychology, and one can really make a name for themselves if the cognitive bias they find happens to be particularly cute. Despite this well-accepted description of the goings-on in the human mind (it’s frequently biased), most research in the field of psychology tends to overlook, explicitly or implicitly, those ever-important “why” questions concerning said biases; the paper by Herrera et al (2012) that I’ll be writing about today (and that the Jury Room covered recently) is no exception, but we’ll deal with that in a minute. Before I get to this paper, I would like to talk briefly about why we should expect cognitive biases in the most general terms.

Hypothesis 1: Haters gonna hate?

When it comes to the way our mind perceives and processes information, one might consider two possible goals for those perceptions: (1) being accurate – i.e. perceiving the world in an “objective” or “correct” way – or (2) doing (evolutionarily) useful things. A point worth bearing in mind is that the latter goal is the only possible route by which any cognitive adaptation could evolve; a cognitive mechanism that did not eventually result in a reproductive advantage would, unsurprisingly, not be likely to spread throughout the population. That’s most certainly not to say that accuracy doesn’t matter; it does, without question. However, accuracy is only important insomuch as it leads to doing useful things. Accuracy for accuracy’s sake is not even a potential selection pressure that could shape our psychology. While, generally speaking, having accurate perceptions can often lead towards adaptive ends, when those two goals are in conflict, we should expect doing useful things to win every time, and, when that happens, we should see a cognitive bias as the result.

A quick example can drive this point home: your very good friend finds himself in conflict with a complete stranger. You have arrived late to the scene, so you only have your friend’s word and the word of the stranger as to what’s going on. If you were an objectively accurate type, you might take the time to listen to both of their stories carefully, do your best to figure out how credible each party is, find out who was harmed and how much, and find the “real” victim in the altercation. Then, you might decide whether or not to get involved on the basis of that information. Now that may sound all well and good, but if you opt for this route you also run the risk of jeopardizing your friendship to help out a stranger, and losing the benefits of that friendship is a cost. Suffering that cost would, all things considered, be an evolutionarily “bad” thing, even if uninvolved parties might consider doing so to be the morally correct action (skirting for the moment the possibility of costs that other parties might impose, though avoiding those could easily be fit into the “doing useful things” side of the equation). This suggests that, all else being equal, there should be some bias that pushes people towards siding with their friends, as siding against them is a costlier alternative.

So where all this leads us is to the conclusion that when you see someone proposing that a cognitive bias exists, they are, implicitly or explicitly, suggesting that there is a conflict between accuracy and some cost of that accuracy, be that conflict over behaving in a way that generates an adaptive outcome, trade-offs between cognitive costs of computation and accuracy, or anything else. With that out of the way, we can now consider the paper by Herrera et al (2012) that purports to find a strange cognitive bias when it comes to the interaction of (a) perceptions of credibility, responsibility, and control of a situation when it comes to domestic violence against women, (b) their physical attractiveness, and (c) their prototypicality as a victim. According to their results, attractiveness might not always be a good thing.

Though, let’s face it, attractiveness is, on the whole, typically a good thing.

In their study, Herrera et al (2012) recruited a sample of 169 police officers (153 of whom were men) from various regions of Spain. They were divided into four groups, each of which read a different vignette about a hypothetical woman who had filed a self-defense plea for killing her husband by stabbing him in the back several times, citing a history of domestic abuse and a fear that he would have killed her during an argument. The woman in these stories – Maria – was either described as attractive or unattractive (no pictures were actually included) along the following lines: thick versus thin lips, smooth features versus stern and jarring ones, straight blonde hair versus dark bundled hair, and slender versus non-slender appearance. In terms of whether Maria was a prototypical battered woman, she was either described as having 2 children, no job with an income, hiding her face during the trial, being poorly dressed, and being timid in answering questions, or as having no children, a well-paying job, being well dressed, and being resolute in her interactions.

Working under the assumption that these manipulations are valid (I feel they would have done better to have used actual pictures of women rather than brief written descriptions, but they didn’t), the authors found an interesting interaction: when Maria was attractive and prototypical, she was rated as being more credible than when she was unattractive and prototypical (4.18 vs 3.30 out of 7). The opposite pattern held for when Maria was not prototypical; here, attractive Maria was rated as being less credible than her unattractive counterpart (3.72 vs 3.85). So, whether attractiveness was a good or a bad thing for Maria’s credibility depended on how well she otherwise met some criteria for your typical victim of domestic abuse. On the other hand, more responsibility was attributed to Maria for the purported abuse when she was attractive overall (5.42 for attractive, 5.99 for unattractive).

Herrera et al (2012) attempt to explain the attractiveness portion of their results by suggesting that attractiveness might not fit in with the prototypical picture of a female victim of domestic abuse, which results in less lenient judgments of her behavior. It seems to me this explanation could have been tested with the data they collected, but they either failed to do so, or did so and did not find significant results. More to the point, this explanation is admittedly strange, given that attractive women were also rated as more credible when they were otherwise prototypical, and the authors’ proximate explanation should, it seems, predict precisely the opposite pattern in that regard. Perhaps they might have ended up with a more convincing explanation for their results had their research been guided by some theory as to why we should see these biases with regard to attractiveness (i.e., what the conflict in perception is being driven by), but it was not.

I mean, it seems like a handicap to me, but maybe you’ll find something worthwhile…

There was one final comment in the paper I would like to briefly consider, with regard to what the authors consider two fundamental due process requirements in cases of women’s domestic abuse: (1) the presumption of innocence on the part of the woman making the claim of abuse and (2) the woman’s right to a fair hearing without the risk of revictimization; revictimization, in this case, referring to instances where the woman’s claims are doubted and her motives are called into question. What is interesting about that claim is that it sets up an apparently unnoticed or unmentioned double standard: it would seem to imply that women making claims of abuse are supposed to be believed by default, which would seem to do violence to the presumption of innocence that the potential perpetrator is supposed to enjoy. Given that part of the focus of this research is on the matter of credibility, this unmentioned double standard seems out of place. This apparent oversight might have to do with the fact that this research was only examining moral claims made by a hypothetical woman, rather than a competing claim also made by a man, but it’s hard to say for sure.

References: Herrera, A., Valor-Segura, I., & Expósito, F. (2012). Is Miss Sympathy a credible defendant alleging intimate partner violence in a trial for murder? The European Journal of Psychology Applied to Legal Context, 4, 179-196.

Differentiating Between Effects And Functions

A few days ago, I had the misfortune of forgetting my iPod when I got to the gym. As it turns out, I hadn’t actually forgotten it; it had merely fallen out of my bag in the car and I hadn’t noticed, but the point is that I didn’t have it on me. Without the music that normally accompanies my workout I found the experience to be far less enjoyable than it normally is; I would even go so far as to say that it was more difficult to lift what I normally do without much problem. When I mentioned the incident to a friend of mine she expressed surprise that I actually managed to stick around to finish my workout without it; in fact, on the rare occasions I end up arriving at the gym without any source of music, I typically don’t end up even working out at all, demonstrating the point nicely.

“If you didn’t want that bar to be crushing your windpipe, you probably shouldn’t have forgotten your headphones…”

In my experience, listening to music most certainly has the effect of allowing me to enjoy my workout more and push myself harder. The question remains, however, as to whether such effects are part of the function of music; that is to ask, do we have some cognitive adaptation(s) designed to generate that outcome from certain given inputs? On a somewhat related note, I recently got around to reading George C. Williams’ book, Adaptation and Natural Selection (1966). While I had already been familiar with most of what he talked about, it never hurts to actually go back and read the classics. Throughout the book, Williams makes a lot of the above distinction between effects and functions; what we might also label as byproducts and adaptations, respectively. A simple example demonstrates the point: while a pile of dung might serve as a valuable resource for certain species of insects, the animals which produce such dung are not doing so because it benefits the insects; the effect in this case (benefiting insects) is not the function of the behavior (excreting wastes).

This is an important theoretical point; one which Williams repeatedly brings to bear against the group selection arguments that people were putting forth at the time he was writing. Just because populations of organisms tend to have relatively stable population sizes – largely by virtue of available resources and predation – that effect does not imply there is a functional group-size-regulation adaptation actively generating that outcome. While effects might be suggestive of functions, or at least preliminary requirements for demonstrating function, they are not, on their own, sufficient evidence for them. Adapted functionality itself is often a difficult thing to demonstrate conclusively, which is why Williams offered his now famous quote about adaptation being an onerous concept.

This finally brings us to a recent paper by Dunbar et al (2012) in which the authors find an effect of performing music on pain tolerance; specifically, it’s the performance of music per se, not the act of passively listening to it, that results in an increased pain tolerance. While it’s certainly a neat effect, effects are a dime a dozen; the question of relevance would seem to be whether this effect bears on a possible function for music. While Dunbar et al (2012) seem to think it does, or at least that it might, I find myself disagreeing with that suggestion rather strongly; what they found strikes me more as an effect without any major theoretical implications.

If that criticism stings too much, might I recommend some vigorous singing?

First, a quick overview of the paper: subjects were tested twice for their pain tolerance (as measured by the time people could stand the application of increasing pressure or holding cold objects), both before and after a situation in which they either performed music (singing, drumming, dancing, or practicing) or listened to it (varying the tempo of the music). In most cases it was the active performance of music which led to a subsequent increase in pain tolerance, rather than listening. The exception to that set of findings was that the groups that were simply practicing in a band setting did not show this increase, a finding which Dunbar et al (2012) suggest has to do with the vigor, likely the physical kind, with which the musicians were engaged in their task, not the performance of music per se.

Admittedly, that last point is rather strange from the point of view of trying to build a functional account for music. If it’s the physical activity that causes an increase in pain tolerance, that would not make the performance of music special with respect to any other kind of physical activity. In other words, one might be able to make a functional account for pain sensitivity, but it would be orthogonal to music. For example, in their discussion, the authors also note that laughter can lead to an increase in pain tolerance as well. So really there isn’t much in this study that can speak to a function of music specifically. Taking this point further, Dunbar et al (2012) also fail to provide a good theoretical account as to how one goes from an increased pain tolerance following music production to increases in reproductive success. From my point of view, I’m still unclear as to why they bothered to examine the link between music production and pain in the first place (or, for that matter, why they included dancing, since while dancing can accompany music, it is not itself a form of music, just like my exercise can accompany music but is not itself music-related).

Dunbar et al (2012) also mention in passing at the end of their paper that music might help entrain synchronized behavior, which in turn could lead to increases in group cooperation, which, presumably, they feel would be a good thing, adaptively speaking, for the individuals involved in said group. Why this is in the paper is also a bit confusing to me, since it appears to have nothing to do with anything they were talking about or researching up to that point. While it would appear to be, at least on the face of it, a possible theoretical account for a function of music (or at least a more plausible one than their non-existent reason for examining pain tolerance), nothing in the paper seems to directly or indirectly speak to it.

And believe you me, I know a thing or two about not being spoken to…

While this paper serves as an excellent example of some of the difficulties in going from effect to function, another point worth bearing in mind is how little gets added to this account by sketching out the underlying physical substrates through which this effect is generated. Large sections of the Dunbar et al paper are dedicated to these physiological outlines of the effect without much apparent payoff. Don’t get me wrong: I’m not saying that exploring the physiological pathways through which adaptations act is a useless endeavor; it’s just that such sketches do not add anything to an account that’s already deficient in the first place. They’re the icing on top of the cake, not its substance. Physiological accounts, while they can be neat if they’re your thing, are not sufficient for demonstrating functionality for exactly the same reasons that effects aren’t; all physiological accounts are, essentially, simply detailed accounts of effects, and byproducts and adaptations alike both have effects.

While this review of the paper itself might have been cursory, there are some valuable lessons to learn from it: (1) always try to start your research with some clearly stated theoretical basis, (2) finding effects does not mean you’ve found a function, (3) sketching effects in greater detail at a physiological level does not always help in developing a functional account, and (4) try to make sure the research you’re doing maps onto your theoretical basis, as tacking on an unrelated functional account at the end of your paper is not good policy; that account should come first, not as an afterthought.

References: Dunbar RI, Kaskatis K, Macdonald I, & Barra V (2012). Performance of music elevates pain threshold and positive affect: Implications for the evolutionary function of music. Evolutionary psychology : an international journal of evolutionary approaches to psychology and behavior, 10 (4), 688-702 PMID: 23089077

Williams, G.C. (1966). Adaptation and natural selection: A critique of some current evolutionary thought. Princeton University Press: NJ