Statistical Issues In Psychology And What Not To Do About Them

As I’ve discussed previously, there are a number of theoretical and practical issues that plague psychological research in terms of statistical testing. On the theoretical end of things, if you collect enough subjects, you’re all but guaranteed to find some statistically significant result, no matter how small or unimportant it might be. On the practical end of things, even if a researcher is given a random set of data they can end up finding a statistically significant (though not actually significant) result more often than they don’t by exercising certain “researcher degrees of freedom”. These degrees of freedom can take many forms, from breaking the data down into different sections, such as by sex, or high, medium, and low values of the variable of interest, to peeking at the data ahead of time and using that information to decide when to stop collecting subjects, among other methods. At the heart of many of these practical issues is the idea that the more statistical tests you can run, the better your chances of finding something significant. Even if the false-positive rate for any one test is low, with enough tests, the chance of a false-positive result rises dramatically. For instance, running 20 independent tests with an alpha of 0.05 on random data would result in at least one false-positive around 64% of the time.
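To make the 64% figure concrete, here is a minimal sketch of the arithmetic. It assumes the tests are independent and that every null hypothesis is true; the function name is just an illustrative choice.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent
    tests, assuming every null hypothesis is actually true."""
    # Each test avoids a false positive with probability (1 - alpha);
    # all of them avoid one with probability (1 - alpha) ** n_tests.
    return 1.0 - (1.0 - alpha) ** n_tests


for n in (1, 5, 20, 100):
    print(n, round(familywise_error_rate(0.05, n), 3))
```

With 20 tests this works out to roughly 0.64, and it keeps climbing toward 1 as more tests are added.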

“Hey everybody, we got one; call off the data analysis and write it up!”

In attempts to banish false-positives from the published literature, some have advocated the use of what are known as Bonferroni corrections. The logic here seems simple enough: the more tests you run, the greater the likelihood that you’ll find something by chance so, to better avoid fluke results, you raise the evidentiary bar for each statistical test you run (or, more precisely, lower your alpha level). So, if you were to run the same 20 tests on random data as before, you can maintain an experiment-wide false-positive rate of 5% (instead of 64%) by adjusting your per-test alpha level to approximately 0.25% (instead of 5%). The correction, then, makes each test you do more conservative as a function of the total number of tests you run. Problem solved, right? Well, no; not exactly. According to Perneger (1998), these corrections not only fail to solve the initial problem we were interested in, but also create a series of new problems that we’re better off avoiding.
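The correction itself is a one-line calculation. The sketch below assumes the plain Bonferroni rule (divide the desired experiment-wide alpha by the number of tests) and independent tests; the function names are illustrative only.

```python
def bonferroni_alpha(experiment_alpha: float, n_tests: int) -> float:
    """Per-test alpha under a plain Bonferroni correction."""
    return experiment_alpha / n_tests


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


per_test = bonferroni_alpha(0.05, 20)
print(per_test)                             # the 0.25% threshold from the text
print(familywise_error_rate(per_test, 20))  # stays just under 0.05
```

Note that the corrected experiment-wide rate comes out slightly below 5% rather than exactly at it, which is one sense in which the correction is conservative.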

Taking these two issues in order, the first is that the Bonferroni correction will only serve to keep the experiment-wide false-positive rate constant. While it might do a fine job at that, people very rarely care about that number. That is, we don’t care about whether there is a false-positive finding; we care about whether a specific finding is a false positive, and these two values are far from the same thing. To understand why, let’s return to our researcher who was running 20 independent hypothesis tests. Let’s say that, hypothetically, out of those 20 tests, 4 come back as significant at the 0.05 level. Now we know that, if every null hypothesis were true, the probability of making at least one type 1 error (a false-positive) would be 64%; what we don’t know is (a) whether any of our positive results are false-positives or, assuming at least one of them is, (b) which result(s) that happens to be. The most viable solution to this problem, in my mind, is not to raise the evidentiary bar across all tests, threatening to make all the results insignificant on account of the fact that one of them might just be a fluke.

There are two major reasons for not doing this: the first is that it will dramatically boost our type 2 error rate (failing to find an effect when one actually exists) and, even though this error rate is not the one that many conservative statisticians are predominantly interested in, these are still errors all the same. Even more worryingly, though, it doesn’t seem to make much sense to deem a result significant or not contingent on what other results you were examining. Consider two experimenters: one collects data on three variables of interest from the same group of subjects while a second researcher collects data on those three variables of interest, but from three different groups. Both researchers are thus running three hypothesis tests, but they’re either running them together or separately. If the two researchers were using a Bonferroni correction contingent on the number of tests they ran per experiment, the results might be significant in the latter case but not in the former, even if the two researchers got identical sets of results. This lack of consistency in terms of which results get to be counted as “real” will only add to the confusion in the psychological literature.
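The cost in type 2 errors mentioned above can be illustrated with a rough power calculation. This is only a sketch under stated assumptions: a two-sided z-test, and a hypothetical real effect whose expected z-score is 2.8 (a value picked because it gives about 80% power at an alpha of 0.05). Neither assumption comes from Perneger (1998).

```python
from statistics import NormalDist


def power_two_sided_z(expected_z: float, alpha: float) -> float:
    """Probability that a two-sided z-test at the given alpha detects an
    effect whose test statistic is normally distributed around expected_z."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Reject when the observed z falls beyond either critical value.
    upper_tail = 1.0 - std_normal.cdf(critical - expected_z)
    lower_tail = std_normal.cdf(-critical - expected_z)
    return upper_tail + lower_tail


expected_z = 2.8  # hypothetical effect size, chosen for illustration
print(power_two_sided_z(expected_z, 0.05))       # uncorrected threshold
print(power_two_sided_z(expected_z, 0.05 / 20))  # Bonferroni, 20 tests
```

Under these assumptions, the chance of detecting the real effect falls from about 80% to roughly 40% once the threshold is corrected for 20 tests, so the type 2 error rate about triples.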

“My results would have been significant, if it wasn’t for those other meddling tests!”

The full scale of the last issue might not have been captured by the two researcher example, so let’s consider another, single researcher example. Here, a researcher is giving a test to a group of subjects with the same 20 variables of interest, looking for differences between men and women. Among these variables, there is one hypothesis that we’ll call a “real” hypothesis: women will be shorter than men. The other 19 variables being assessed are being used to test “fake” hypotheses: things like whether men or women have a preference for drinking out of blue cups or whether they prefer green pens. A Bonferroni correction would, essentially, treat the results of the “fake” hypotheses as being equally as likely to generate a false-positive as the “real” hypothesis. In other words, Bonferroni corrections are theory-independent. Given that some differences between groups are more likely to be real than others, applying a uniform correction to all those tests seems to miss the mark.

To build on that point, as I initially mentioned, any difference between groups, no matter how small, could be considered statistically significant if your sample size is large enough due to the way that significance is calculated; this is one of the major theoretical criticisms of null hypothesis testing. Conversely, however, any difference, no matter how large, could be considered statistically insignificant if you run enough additional irrelevant tests and apply a Bonferroni correction. Granted, in many cases that might require a vast number of additional tests, but the precise number of tests is not the point. The point is that, on a theoretical level, the correction doesn’t make much sense.

While some might claim that the Bonferroni correction guards against researchers making excessive, unwarranted claims, there are better ways of guarding against this issue. As Perneger (1998) suggests, if researchers simply describe what they did (“we ran 40 tests and 3 were significant, but just barely”), that can generally be enough to help readers figure out whether the results were likely to be the chance outcomes of a fishing expedition or not. The issue with this potential safeguard is that it would require researchers to accurately report all their failed manipulations as well as their successful ones, which, for their own good, many don’t seem to do. One guard that Perneger (1998) does not explicitly mention which can get around that reporting issue, however, is the importance of theory in interpreting the results. As most psychological literature currently stands, results are simply redescribed, rather than explained. In this world of observations equaling explanations and theory, there is little way to separate out the meaningful significant results from the meaningless ones, especially when publication bias generally hinders the failed experiments from making it into print.

What failures-to-replicate are you talking about?

So long as people continue to be impressed by statistically significant results, even when those results cannot be adequately explained or placed into some larger theoretical context, these statistical problems will persist. Applying statistical corrections will not solve, or likely even stem, the research issues in the way psychological research is currently conducted. Even if such corrections were honestly and consistently applied, they would likely only change the way psychological research is conducted, with researchers turning to altogether less-efficient means in order to compensate for the reduced power (running one hypothesis per experiment, for instance). Rather than demanding a higher standard of evidence for fishing expeditions, one might instead focus on reducing the prevalence of these fishing expeditions in the first place.

References: Perneger TV (1998). What’s wrong with Bonferroni adjustments? BMJ (Clinical research ed.), 316 (7139), 1236-8 PMID: 9553006

“Nice Guys”, The Friend Zone, And Social Semantics

A little over a year ago, a video entitled, “Why men and women can’t be friends” was uploaded to YouTube. In the video, a man approaches various men and women and presents them with the question, “can men and women just be friends?”. While many of the women answered in the affirmative, most of the men seemed to answer in the negative, suggesting that men would generally be interested in something more; something sexual. When asked about whether their male friends were interested in having sex with them, many of the women seemed to similarly acknowledge that, yes, their male friends probably were interested, so maybe there was more lurking behind that “just friendship”. In a follow-up video, the same man asked whether it would be alright for people in relationships to hang out alone with opposite-sex friends. While the men seemed to be of relatively one mind (no, it would not be appropriate), women, again, initially stated that opposite-sex friends are fine. However, when confronted with the possibility of their significant other hanging out alone with a member of the opposite sex, the tune seemed to change dramatically: now men and women agreed that they would, indeed, be bothered by that state of events.

“And here I thought his sudden interest in jogging was purely platonic”

So why the discrepancy in women’s responses, but not men’s? Perhaps it’s simply due to the magic of video editing, where only certain responses were kept to make a point, but, working under the assumption that’s not happening, I think there’s something interesting going on here. Understanding what that something is will require us to dig deeper into two concepts that have been floating around for some time: “the friend zone” and “Nice guys”. The friend zone, as many of you know, refers to the context where someone wants a relationship with another, but that other doesn’t return the affection. Since the interest isn’t mutual, the party interested in the relationship settles for a friendship with the target of their affections, often with the hope that someday things will change. “Nice guys” on the other hand, are typically men who are stuck in the friend zone and, upon the eventual realization that their friendship will probably not transition into a relationship, become irritated with the person they were interested in, resulting in the friendship being called off and feelings being hurt. The friendship, after all, is not what they were after; they wanted the full relationship (or at least an occasional hook up).

“Nice guys”, in other words, are only being nice because they want to get sex, so they’re not really nice, people seem to feel; hence the quotation marks. Further, “nice guys” are frequently socially maligned, seemingly because of their (actually held or assumed to be held) attitude that women are obligated to have sex or start a relationship with them because they are nice (whether any substantial number of them consciously think this is another matter entirely). Alternatively, “nice guys” are looked down on because they view the friendship – or the friend zone – as, at best, a consolation prize to what they were actually going after or, at worst, something they couldn’t care less about having. The nerve of these people; insisting that just a friendship isn’t enough! There are some very peculiar things about the label of “nice guy”, though; things that don’t quite fit at first glance. The first of these is that the earning of the “nice guy” label appears to be contingent on the target of the affections not returning them. If whomever the “nice guy” is interested in does return the affections, there is no way to tell whether he was a “nice guy” or one of those actually nice guys. In other words, you could have two identical guys enacting identical sets of behavior right up to the moment of truth: if the target returns the man’s affections, he’s a nice guy; if she doesn’t and the man doesn’t find that state of affairs satisfactory, he is now a “nice guy”; not a nice guy.

That, however, is only a surface issue. The much more substantial issue is in the label itself, which would, given its namesake, seem to imply that the problem is the nice behavior of the guys, rather than the attitude of entitlement that the label is ostensibly aimed at. This is very curious. If the entitled attitude is what is supposed to be the problem of the people this term is aimed at, why would the label focus on their otherwise nice behavior; behavior that might not differ in any substantial way from the behavior of genuine nice guys? Further, why is the label male-specific (it’s “nice guys” not “nice people”, and even when it’s a woman doing it, well, she’s just being a “nice guy” too)? With these two questions in mind, we’re now prepared to begin to tackle the initial question: why do women’s responses to the friendship questions, but not men’s, seem discrepant?

“Thanks for taking me shopping; I’m so lucky to have a friend like you…”

Let’s take the questions in a partially-reversed order: the first is why the term focuses on the nice behavior. The answer here would seem to revolve around the matter of cooperation and reciprocity more generally. In the social world, when an altruistic individual provides you with a benefit at a cost to themselves, the altruist generally expects repayment at some point down the line. It’s what’s called reciprocal altruism – or, less formally, cooperation – and forms the backbone of pretty much every successful social relationship among non-kin (Trivers, 1971). However, sometimes relationships are not quite as reciprocal in nature: one individual will continuously reap the benefits of altruism without returning them in kind. Names for those types of individuals abound, though the most common are probably exploiters or cheaters. Having a reputation as a cheater is, generally speaking, bad for business when it comes to making and maintaining friendships, so it’s helpful to maintain a good reputation amongst others.

The implications for why the “nice guy” label focuses on otherwise nice behavior should be immediately apparent: if someone is behaving nicely towards you – even if that nice behavior might be unwanted – it creates the expectation of reciprocity, both among the altruist and potentially other third parties. Failing to return the favor, then, can make one look like a social cheater. This obviously puts the recipient in a bind: while they would certainly like to enjoy the benefits of the nice individual’s behavior (free meals, social support, and so on), they don’t want to have the obligation to repay it if it’s avoidable (it’s that expectation that makes people uncomfortable about accepting gifts; not because they don’t want said gifts). So how can that obligation be effectively avoided? One way seems to be to question the altruist’s motives: if the altruist was only giving to get something else (like sex), and if that something else is viewed to be substantially more valuable than what was initially given (also like sex tends to be), one can frame the ostensible altruist as the exploiter, the cheater, or, in this case, the “nice guy”. If a woman wants to either (a) reap the benefits of nice guys, (b) avoid the costs in not reciprocating what the nice guy wants, or (c) both, then the label of “nice guy” can be quite effective. Since their behavior wasn’t actually nice, there’s no need to reciprocate it.

Bear in mind, none of this needs to be consciously entertained. In fact, in some cases it’s better to not have conscious awareness of such things. For instance, to make that reframing (nice to “nice”) more successful, the person doing the reframing has to come off as having innocent motives themselves: if the woman in question was explicit about her desire to take advantage of men’s niceness towards her with no intentions of any repayment, she’s back to being the cheater in the situation (just as the “nice guy’s” behavior is back to just being plain old nice, if a bit naive). Understanding this point helps us answer the third question: why are women’s responses to the friendship question seemingly discrepant? Conscious awareness of these kinds of mental calculations will typically do a woman no favors, as they might “leak out” into the world, so to speak. To think of it in another way, you’ll have an easier time trying to convince people that you didn’t do something wrong if you legitimately can’t access any memories of you doing something wrong (as opposed to having access to those memories and needing to suppress them). To relate this to the answers in the videos, when a woman is receiving benefits from her male friends, keeping the knowledge that her male friends are trying to get something more from her out of mind can help her defend against the criticism of being a social cheater, as well as avoid the need to pay her male friends back. On the other hand, when it’s her boyfriend who’s now being “nice” to other women, there are benefits to her being rather aware of the underlying motives.

“I swear I was just giving her my opinion about her new bra as a friend!”

Finally, we turn to the second question, the answer to which ought to be obvious by now: why is the “nice guy” term male-specific? This answer has a lot to do with the simple fact that, all else being equal, women do prefer men who invest in them, both in the short and long term, but investment plays a substantially lessened role for women in drawing and maintaining male interest (Buss, 2003). Put simply, males invest because females tend to find that investment attractive. So, to sum up, women want to receive investment and males are generally willing to provide that investment. However, male investment typically comes contingent on the possibility or reality of mating, and when that possibility is withdrawn, so too does male investment wane. The term “nice guy” might serve both to avoid the costs that come with receiving that investment but not returning it, and as a potential shaming tactic for men who withdraw their niceness when it becomes clear that niceness will not pay off as intended. Similarly, a woman might doubt her partner’s “niceness” when it’s directed towards another. This analysis, however, only examines the female-end of things; males face a related set of problems, just from a different angle. Further, the underlying male strategy is, I assure you, not any less strategic.

References: Buss, D. (2003). The evolution of desire: Strategies of human mating. Basic Books: New York

Trivers, R. (1971). The Evolution of Reciprocal Altruism The Quarterly Review of Biology, 46 (1) DOI: 10.1086/406755

What Should We Mean When We Say “Universal”?

My last post prompted a series of spirited discussions, each of which I found interesting for slightly different reasons. Over the course of one of those discussions, a commenter over at Psychology Today (H/T to Anthro_girl) referred me to an article entitled “Darwin in mind: New opportunities for evolutionary psychology” (Bolhuis et al. 2011). I haven’t yet decided if this will turn into a series of posts on the ideas presented in that article, but there is one point in particular I would like to focus in on for the current purposes, and it’s entirely semantic in nature: what the term “universal” ought to mean. Attempts at clearing up semantic confusion tend to be unproductive in my experience, but I think it’s important to at least give these matters a deeper consideration, as they can breed the appearance of disagreement, despite two parties saying essentially the same thing (what has been previously called “violent agreement”, and which I think represents the bulk of the ideas found in the article).

“You’re absolutely right and I respect your position, which is also my own!”

The first point I would like to mention is that I find Bolhuis et al’s (2011) wording quite peculiar: they seem to, at least at some points, contrast “flexibility” with universality. It sounds as if they are trying to contrast “genetic determinism” with flexibility instead, which seems to be a fairly common mistake people make when criticizing what they think evolutionary psychology assumes. Since that point is a fairly common misunderstanding, there’s little need to go over it again here, but it does give me an opportunity to think about what it means for a trait to be universal, using their example of sexual selection. The authors suggest that as a number of environmental cues (encounter rates, cost of parental investment, etc) change, so too should we expect mating strategies to change: change the inputs to a system, change the outputs. Now nothing about that analysis strikes me as particularly incorrect, but the implication that follows it does: specifically, a universal trait ought not to show much, if any, variation. Well, OK, they don’t really imply it so much as they flat-out say it:

“Arguably, the more flexible and variable the exhibited behaviour, the less explanatory power can be attributed to evolved structure in the mind.”  

Their analysis seems to misstep in regard to why those other variables might matter in determining variation. In order for variables, like encounter rates or the likely costs of parental investment, to matter in the first place, some other psychological mechanisms need to be sensitive to those inputs; other evolved structures of the mind. If no evolved structures are sensitive to those inputs, or the structures which are sensitive to those variables aren’t hooked up to the structures that determine sexual behavior, there wouldn’t be any consistent effect of their presence or absence. Thus, finding variation in a trait, like sexual selectivity, doesn’t tell you much about whether the mechanisms involved in determining said behavior are universal or not. This does, however, raise an inevitable question about universality: do we need to expect a near-perfectly consistent expression of a trait in order to call it universal?

I would think not. This gets at a distinction highlighted by Norenzayan & Heine (2005) between various types of universality, specifically the “functional” and “accessible” varieties. The functional type refers to traits that use the same underlying mechanisms and solve the same kinds of problems (so if people in all cultures use hammers to beat in nails, hammers would be functionally universal); the accessible type is the same as the functional type, only that it is used to pretty much the same degree across different cultures (all cultures would need to use their hammers approximately the same amount). In other words, then, different cultures might differ with respect to how sexually selective men tend to be relative to women, but in all people there are still the same underlying mechanisms at work and they are still used to solve the same kinds of problems, so we can still feel pretty good about calling that difference in sexual selectivity a universal. While that’s all well and good, it does create a new problem, though: how much variation counts as “a lot of it”, or at least enough of it to warrant one classification or the other?

Fairly mundane for basketball, but maybe the most exciting soccer match ever.

Two examples should help clear this up. Let’s say you’re a fairly boring kind of researcher and find yourself examining finger length cross-culturally, trying to determine if finger length is universal. You get your ruler out, figure out a way to convince thousands of people the world over to let you examine their hands in dozens of different languages, and locate a nice grant to cover all your travel costs (along with the time you won’t be spending doing other things like teaching or seeing your friends and family). After months of hard work, you’re finally able to report your findings: you have found that middle fingers are approximately 2.75 inches in length and, between cultures, that mean varies between 2.65 inches and 3.25 inches. From this, are we to conclude that middle finger length is or is not universal?

The answer to this question is by no means straightforward; it seems to be more of an “I know it when I see it” kind of judgment call. There clearly is some variation, but is there enough variation there to be meaningful? Would middle fingers be classified as a “functional” universal or an “accessible” universal (if such labels made sense in the case of fingers, that is)? While the finger might seem a bit strange as an example, it has a major benefit: it involves a trait for which it is rather easy to find a generally agreed upon definition and form of measurement. Let’s say that you’re interested in looking at something more difficult to assess, like the aforementioned sexual selectivity. Now all sorts of new questions will come creeping in: is your test the best way of assessing what you hope to? Is your method one that is likely to be interpreted in a consistent manner across cultures? The initial question still needs to be answered as well: how much variation is enough? If the difference in sexual selectivity between men and women is twice as large in culture A, relative to culture B, does that make it a functional or an accessible universal? What if that difference was only 1.5 times the size from culture to culture, or 3 times the size? From what I could gather, there really is no hard-and-fast rule for determining this, so the distinction might appear to be more arbitrary than real.

While these are all worthwhile questions to consider and difficult ones to answer, let’s assume that we were able to provide answers to them, in some form, and find that sexual selectivity, while functionally universal, is not what we would consider an accessible universal (that is, there is a significant amount, whatever that happens to be, of variance between cultures in its size). While the variance you turned up is all well and good, what precisely is that variance a product of? There are many cognitive mechanisms that play a role in determining sexual selectivity, and our finding that sexual selectivity isn’t an accessible universal doesn’t answer the question as to which components that determine that trait are or are not accessible universals. Perhaps approach rate is an accessible universal, but the male/female ratio in a population is only a functional universal. This could, in particular cases, even lead us to some odd conclusions: if one of the mechanisms that helps determine sexual selectivity isn’t an accessible universal in that instance, it might well be considered an accessible universal in another where its output is used to determine some other trait. For instance, hypothetically, sex ratio might not be an accessible universal when it comes to sexual selectivity, but could be one when it comes to determining some propensity for violence. In other cases, sex ratio might be a functional or accessible universal, but only depending on what test you’re using (on a Likert scale, it might only be functionally universal; in a singles bar, it might be accessibly universal).

Riveting as I’m sure you all find this, I’ll try and wrap it up.

So, as before, attempts to clear up semantic confusion have not necessarily been successful. Then again, if matters like this were simple, it’s doubtful that these kinds of disagreements would have cropped up in the first place. Hopefully, some of the issues involved in focusing on the outputs of mechanisms versus the mechanisms themselves have at least been highlighted. There are two final points to make about the idea of universality: first, if there was no underlying universal human nature, cross-cultural research would be all but impossible to conduct, as foreign cultures would not be able to be understood at all in the first place. Secondly, that point is demonstrated well by what I would call cross-cultural cross-fostering. More precisely, as Norenzayan & Heine (2005) note, when infants from other cultures are raised in a new one (say an Asian family immigrates to America), within two or three generations, the children of that family will be all but indistinguishable from their “new” cultural peers. Without an underlying set of universal psychological mechanisms, it’s unclear precisely how such adaptation would be possible.

So yes, while WEIRD undergraduates might not give you a complete picture of human psychology, it doesn’t mean that they offer nothing, or even very little. The differences between cultures can hide the oceans of similarity that lurk right underneath the surface. It’s important to not lose sight of the forest for a few trees.

References: Bolhuis JJ, Brown GR, Richardson RC, & Laland KN (2011). Darwin in mind: new opportunities for evolutionary psychology. PLoS biology, 9 (7) PMID: 21811401

Norenzayan, A., & Heine, S. (2005). Psychological Universals: What Are They and How Can We Know? Psychological Bulletin, 131 (5), 763-784 DOI: 10.1037/0033-2909.131.5.763


Is It Only “Good” Science When It Confirms Your World View?

Most people, when critical of some finding or some field, try to do things like keep their biases hidden, opting instead to try and argue from a position of perceived intellectual neutrality. Kate Clancy, evidently, is not most people. In her recent post at Scientific American, she lays it all out there, right in the title: “5 Ways to Make Progress in Evolutionary Psychology: Smash, Not Match, Stereotypes”. So, there you have it: if evolutionary psychology wants to progress as a field, the practitioners ought to ensure we are getting results that Kate finds to be personally palatable so, rather than run experiments, we ought to just ask her what she likes instead. I can only imagine how much time and money this will save us all when it comes to collecting data and getting through the review boards, never mind all that pesky theory development. Of course, her suggestion for progression in the field might not be useful when it comes to developing and testing hypotheses about subjects that aren’t (heavily) stereotyped, but, in all fairness, her suggestion isn’t likely to be helpful in any case at all.

“Sure, it might not run, but at least it does that 100% of the time!”

Thankfully, Kate is willing to suggest five more specific criticisms of where she thinks evolutionary psychology stands to be improved. I’m sure that her criticisms here will be enlightening for all the evolutionary psychologists, as the alternative – that she’s proposing things which have already been repeatedly acknowledged and cautioned against by every major researcher in the field from its inception – would probably be pretty embarrassing for her. Sure, the critics of evolutionary psychology have been known to be ignorant of the field they’re criticizing as a general rule, but stereotypes aren’t always true. Hopefully Kate will, like any good scientist should, according to her, bust that stereotype, demonstrating both her fluency in understanding the theoretical commitments of the field and also pointing out their deficiencies. Since I’m a non-progressive evolutionary psychologist, this leaves me stuck with the grim task of confirming the stereotype that critics of my field tend to, in fact, know very little about it. Five rounds and one issue: the progression of evolutionary psychology as a field.

Round 1: [Evolutionary Psychologists] aren’t measuring what we think we are.

The point here is that evolutionary psychologists sometimes use proxy measures to measure other variables. So, for instance, if you want to study some theoretical construct like, say, “general intelligence”, you might use the results of some other test, like an IQ test, to draw inferences about the initial construct (people who score high on the IQ test have a lot of general intelligence). Now there’s nothing wrong with pointing out the fact that these proxy measures might not be tapping the underlying construct that you think they are, nor is it particularly problematic to point out that the underlying construct you think you’re measuring might not even exist. I’m fine with all that. Where I get lost is when I consider what any of it has to do with evolutionary psychology, specifically. Are evolutionary researchers worse at creating or using proxy measures? Does this point speak to the theoretical foundations of evolutionary psychology in any way? Since Kate provides no evidence to help answer the first question, I’ll assume that answer is probably a no (unless Kate is just stereotyping evolutionary researchers as poor in this department). Since proxy measures in no way at all speak to the theoretical commitments of the field itself, this entire point seems rather misguided. If she was talking about the field of psychology more generally, sure, this is a research pitfall to avoid; it’s just not one specific to my field. Round one goes to stereotype confirming evolutionary psychology.

Round 2: Undergrads only teach us about undergrads.

Kate’s criticism here comes in two parts: concerns for generalizability across samples and concerns that undergraduates can’t tell us anything useful about human psychology. Taking them in order, in psychology more generally there is a reliance on undergraduate samples, mainly because they’re cheap and convenient. The problem, though, is that the results of research on some of these undergraduates (typically those taking introductory psychology, no less), might not tell us much about people who differ from them, either in age, race, education, nationality, social life, etc. On that account, Kate is indeed correct: there might or might not be problems in generalizing from handfuls of undergraduates to the human race more generally. Again, however, this criticism runs directly into the same hurdle her last one did: it’s not specific to any of the theoretical commitments of evolutionary psychology. The problem here is one faced by psychology more generally and, if anything, the people who tend to realize the importance of cross-cultural as well as cross-species research tend to be evolutionary people, at least in my experience.

Her second point, however, is even worse. Kate seems to go from “undergraduates might not be able to tell us much about the human species” to “undergraduates definitely do not tell us anything useful”, or, as she puts it, are “about as far removed from the conditions in which we evolved as you can get”. What Kate fails to recognize is that, in the vast majority of respects, these undergraduates are very similar to people everywhere else: they form relationships, both sexual and social, they discriminate between potential mates, they reason, they morally condemn others, they defend against moral condemnation, they eat, they sleep, they reciprocate, they punish non-reciprocation, they learn language, and so on. Focusing on a few superficial differences between groups of people can, it seems, make one miss the oceans of similarity between them. Just because undergraduates aren’t living as hunter-gatherers, it does not follow that they have nothing useful to tell us about human psychology. Round two also goes to stereotype confirming evolutionary psychology.

Three more rounds to go. I’m sure you’ll turn it around…

Round 3: It’s not true that everything happens for a reason.

This charge is a classic one: there’s more to evolution than selection; there are also byproducts, drift, and mutation and those evolutionary psychologists need to recognize this! In Kate’s example, for instance, evolutionary psychologists might make up adaptive stories about her choice of sock color. If that was the state of evolutionary psychology, we truly would be a field in need of scolding. Now I could point out that, a little over two decades ago, in what might be considered the foundational text of the field, the byproduct, drift, and mutation issues are all discussed, and every major figure in the field has, at many points, explicitly acknowledged the role of these forces (see here, specifically charge 2) and leave it at that. I could also point out, as I have done before, that hypotheses of drift don’t tend to make very useful predictions. However, there are two additional points to not miss.

First, suggesting that psychological traits have adaptive functions is a step up from most non-evolutionary psychology, which tends to either posit useless functions (e.g. self-esteem or ego defense) or no functions at all. In this regard, evolutionary psychology is better, not worse, for it. Secondly, and more importantly, Kate gets a lot wrong in this section. Her initial point about how not all behaviors are the result of psychological adaptations misses the point entirely. Her behavior – choice of sock color, in this case – might not be the result of a specific module designed with the function of choosing sock color, but it would be a mistake to, from that, conclude it wasn’t the result of other psychological adaptations. This would be as silly as my concluding that, because my body didn’t evolve to eat pop-tarts, my ability to digest them must not be a result of any physiological adaptations designed for digestion. On top of this misunderstanding, she then goes on to suggest that adaptations are heritable, by which she means some variation in them must be due to unique genetic factors. Under this logic, hands aren’t adaptations, because variation in having hands tends to not have a heritable genetic component (as, well, pretty much all people do have hands). Anyone familiar with adaptationist logic will tell you pretty much the opposite: many adaptations – like livers and hands – tend to show very low heritability, because selection tends to remove heritable variation from the population. Round three is over, and it’s not looking so good for stereotype disconfirmation.

Round 4: There is more than one way [to reproduce]

This point suggests that, apparently, evolutionary psychologists have yet to realize that there’s more than one successful strategy that people can adopt when it comes to reproduction. We apparently don’t realize that there are many possible routes to take, and variable degrees of taking them. This is not only false; it’s stunningly false. In fact, in the next paragraph, Kate mentions that, sure, evolutionary psychologists have done research on some of these different, competing strategies, but it apparently wasn’t up to her standards. If she prefers a more nuanced view than the one she (likely incorrectly) perceives in the people doing research concerning whether one is more of a cad or a dad, she’s more than welcome to it. The researchers in the field would, if her view is better or has something they missed, happily accept the contribution. Were she to offer her view, however, my guess is that she’ll end up publicly disagreeing with an opinion that no serious researcher holds; basically what she is doing here. However, to imply, as she does, that evolutionary researchers don’t appreciate and attempt to understand variation, is just plain stupid, especially right after she points out that evolutionary psychologists already do it.

Kate then seems to try and say something about homosexuality, but, I admit, her point there is lost on me. It might be something along the lines of, “people who identify as non-straight sometimes have children, so there’s nothing to see here”, but I’ll admit that I’m having a hard time following what she’s trying to say, much less what the relevance would be. Round four, unsurprisingly, isn’t going to Kate.

Round 5: Just because [it's currently adaptive, that doesn't mean it previously was]

The only point I really want to make here is noting that Kate gets the definition of the environment of evolutionary adaptedness (EEA) dead wrong. As anyone familiar with this concept, or the primer on the subject, can tell you, the EEA is not a time or a place (much less on a savannah where everyone lived happily, as Kate seems to think it is), but the statistical aggregate of selective forces that shaped an adaptation. Thus, the EEA for language is different from the EEA for mate preference which is different still from the EEA for hands. I suppose I could also mention that every evolutionary psychologist knows that people do some things today – like wear heels and use hormonal birth control – that they used to not do during our evolutionary history, but, at this point, it seems to be so blindingly obvious to anyone that it hardly seems worth repeating. Final round goes to stereotype confirmation.

“I don’t understand your position, yet remain convinced you’re wrong!”

Now I would love to be the good, progressive scientist that Kate wants me to be and disconfirm the stereotype that evolutionary psychology’s critics are ignorant of the field they’re criticizing, but it’s difficult to do so when she, like so many others, confirms that stereotype. Of the concerns she lists, a collective none of them deal with the theoretical foundations of the field, the first two have more to do with research methodology than evolutionary psychology specifically (and even those two don’t paint evolutionary psychologists in a particularly bad light), and the remaining ones get basic definitions wrong while simultaneously misrepresenting the researchers in the field as being unsophisticated. Now, in all fairness to Kate, she does mention that she’s talking about what she thinks bad evolutionary psychology is, but it’s not clear to me that she has a solid enough grasp of the field to be making those kinds of pronouncements in the first place (not to mention she wavers back and forth between using that qualifier and dropping it, writing about evolutionary psychology as a whole). I also really don’t appreciate her insinuation at the end that our field does politically-motivated research with the intent of keeping LGBT folks second-class citizens either (which, by the way, we don’t; Tybur, Miller, & Gangestad, 2007), but at least she’s upfront about her biases, no matter how incorrect they happen to be.

References: Tybur, J., Miller, G., & Gangestad, S. (2007). Testing the controversy: An empirical examination of adaptationists’ attitudes towards politics and science. Human Nature, 18 (4), 313-328 DOI: 10.1007/s12110-007-9024-y

Should You Give A Damn About Your Reputation (Part 2)

In my last post, I outlined a number of theoretical problems that stand in the way of reputation being a substantial force for maintaining cooperation via indirect reciprocity. Just to recap them quickly: (1) reputational information is unlikely to be spread much via direct observation, (2) when it is spread, it’s most likely to flow towards people who already have a substantial amount of direct interactions with the bearer of the reputation, and (3) reputational information, whether observed visually or transmitted through language, might often be inaccurate (due to manipulation or misperception) or non-diagnostic of an individual’s future behavior, either in general or towards the observer. Now all of this is not to say that reputational information would be entirely useless in predicting the future behavior of others; just that it seems to be an unlikely force for sustaining cooperation in reality, despite what some philosophical intuitions written in the language of math might say. My goal today is to try and rescue reputation as a force to be reckoned with.

In all fairness, I did only say that I would try

The first – and, I think, the most important – step is to fundamentally rethink what this reputational information is being used to assess. The most common current thinking about what third-party reputation information is being used to assess would seem to be the obvious: you want to know about the character of that third party, because that knowledge might predict how that third party will act towards you. On top of assuming away the above problems, then, one would also need to add in the assumption that interactions between you and the third party would be relatively probable. Let’s return to the example of your friend getting punched by a stranger at a bar one night. Assuming that you accurately observed all the relevant parts of the incident and the behavior of the stranger there was also predictive of how he would behave towards you (that is, he would attack you unprovoked), if you weren’t going to interact with that stranger anyway, regardless of whether you received that information or not, while that information might be true, it’s not valuable.

But what if part of what people are trying to assess isn’t how that third party will behave towards them, but rather how that third party will behave towards their social allies? To clarify this point, let’s take a simple example with three people: A, B, and X. Person A and B will represent you and your friend, respectively; person X will represent the third party. Now let’s say that A and B have a healthy, mutually-cooperative relationship. Both A and B benefit from this relationship and have extensive histories with each other. Person B and X also have a relationship and extensive histories with one another, but this one is not nearly as cooperative; in fact, person X is downright exploitative of B. Given that A and X are otherwise unlikely to ever interact with each other directly, why would A care about what X does?

The answer to this question – or at least part of that answer – involves A and X interacting indirectly. This requires the addition of a simple assumption, however: the benefits that person B delivers to person A are contingent on person B’s state. To make this a little less abstract, let’s just use money. Person B has $10 and can invest that money with A. For every dollar that B invests, both players end up making two. If B invests all his money, then, both him and person A end up with $20. In the next round, B has his $10, but before he gets a chance to invest it with A, person X comes along and robs B of half of it. Now, person B only has $5 left to invest with A, netting them both $10. In essence, person X has now become person A’s problem, even though the two never interacted. All this assumption does, then, is make clear the fact that people are interacting in a broader social context, rather than in a series of prisoner’s dilemmas where your payoff only depends on your own, personal interactions.
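The arithmetic of that example can be sketched in a few lines of Python. This is only an illustration of the numbers above: the function name `payoffs` and its parameters are made up for this sketch, and it assumes, as the example does, that each invested dollar leaves both players with two.

```python
def payoffs(b_endowment, stolen_by_x=0, multiplier=2):
    """Return (A's payoff, B's payoff) for a single round.

    Hypothetical helper: B invests whatever X has not taken, and every
    invested dollar leaves both players with `multiplier` dollars.
    """
    invested = max(b_endowment - stolen_by_x, 0)
    payoff = invested * multiplier
    return payoff, payoff


# Round one: X stays out of it, so B invests the full $10.
print(payoffs(10))                  # (20, 20)

# Round two: X takes $5 from B before B can invest.
print(payoffs(10, stolen_by_x=5))   # (10, 10)
```

Person A's payoff drops from $20 to $10 in the second call even though X only ever appears in the code as something that happens to B.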

Now if only there was a good metaphor for that idea…

With the addition of this assumption, we’re able to circumvent many of the initial problems that reputational models faced. Taking them in reverse order, we are able to get around the direct-interaction issue, since your social payoffs now co-vary to some extent with your friends’, making direct interaction no longer a necessary condition. It also allows us to circumvent the diagnosticity issue: there’s less of a concern about how a third party might interact with you differently than your friend because it’s the third party’s behavior towards your friend that you’re trying to alter. It also, to some extent, allows us to get around the accuracy issue: if your friend was attacked and lies to you about why they were attacked, it matters less, as one of your primary concerns is simply making sure that your friend isn’t hurt, regardless of whether your friend was in the right or not. This takes some of the sting out of the issues of misperception or misinformation.

That said, it does not take all the sting out. In the previous example, person A has a vested interest in making sure B is not exploited, which gives person B some leverage. Let’s alter the example a bit, and say that person B can only invest $5 with person A during any given round; in that case, if X steals $5 from B’s initial $10, it wouldn’t affect person A at all. Since person B would rather not be exploited, they might wish to enlist A’s help, but find person A less than eager to pitch in. This leaves person B with three options: first, B might just suck it up and suffer the exploitation. Alternatively, B might consider withholding cooperation from A until A is willing to help out, similar to B going on a strike. If person B opts for this route, then all concerns for accuracy are gone; person A helping out is merely a precondition of maintaining B’s cooperation. This strategy is risky for B, however, as it might look like exploitation from A’s point of view. As this makes B a costlier interaction partner, person A might consider taking his business elsewhere, so to speak. This would leave B still exploited and out a cooperative partner.
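To see why A has no stake in the altered example, here is a small Python sketch of the capped version. The name `capped_payoffs` is hypothetical, and one detail is my own assumption rather than anything stated in the example: B simply keeps whatever cash is left uninvested.

```python
def capped_payoffs(b_endowment, stolen_by_x=0, cap=5, multiplier=2):
    """Return (A's payoff, B's payoff) when B can invest at most `cap`.

    Assumption for illustration only: B pockets any money that is
    neither stolen nor invested.
    """
    remaining = max(b_endowment - stolen_by_x, 0)
    invested = min(remaining, cap)
    a_payoff = invested * multiplier
    b_payoff = invested * multiplier + (remaining - invested)
    return a_payoff, b_payoff


print(capped_payoffs(10))                  # (10, 15)
print(capped_payoffs(10, stolen_by_x=5))   # (10, 10)
```

A ends up with $10 either way; only B is out the $5, which is why B has to find some other way to get A to care.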

There is another potential way around the issue, though: person B might attempt to persuade A that person X really was interfering in such a way that made B unable to invest; that is, person B might try to convince A that X had really stolen $8 instead of $5. If person B is successful in this task, it might still make him look like a costlier social investment, but not because he is himself attempting to exploit A. Person B looks like he really does want to cooperate, but is being prevented from doing so by another. In other words, B looks more like a true friend to A, rather than just a fair-weather one or an exploiter (Tooby & Cosmides, 1996). In this case, something like manifesting depression might work well for B to recruit support to deal with X (Hagen, 2003). Even if such behavior doesn’t directly stop X from interfering in B’s life, though, it might also prompt A to increase their investment in B to help maintain the relationship despite those losses. Either way, whether through avoiding costs or gaining benefits, B can leverage their value with A in these interactions and maintain their reputation as a cooperator.

“I’ll only show back up to work after you help me kill my cheating wife”

Finally, let’s step out of the simple interaction into the bigger picture. I also mentioned last time that, sometimes, cooperating with one individual necessitates defecting on another. If persons A and B ally against person X while person Y is cooperating with X, person Y may now also incur some of the punishment A and B direct at X, either directly or indirectly. Again, to make this less abstract, consider that you recently found out your friend holds a very unpopular social opinion (say, that women shouldn’t be allowed to vote) that you do not. Other people’s scorn for your friend now makes your association with him all the more harmful for you: by benefiting him, you can, by proxy, be seen to either be helping him promote his views, or be inferred to hold those same views yourself. In either case, being his friend has now become that much costlier, and the value of the relationship might need to be reassessed in that light, even if his views might otherwise have little impact on your relationship directly. Knowing that someone has a good or bad reputation more generally can be seen as useful information in this light, as it might tell you all sorts of things about how costly an association with them might eventually prove to be.

References: Hagen, E.H. (2003). The bargaining model of depression. In: Genetic and Cultural Evolution of Cooperation, P. Hammerstein (ed.). MIT Press, 95-123

Tooby, J., & Cosmides, L. (1996). Friendship and the banker’s paradox: Other pathways to the evolution of adaptations for altruism. Proceedings of the British Academy, 88, 119-143

Should You Give A Damn About Your Reputation? (Part 1)

According to Nowak (2012) and his endlessly-helpful mathematical models, once one assumes that cooperation can be sustained via one’s reputation, one ends up with the conclusion that cooperation can, indeed, be sustained (solely) by reputation, even if the same two individuals in a population never interact with each other more than once. As evidenced by the popular Joan Jett song, Bad Reputation, however, one can conclude there’s likely something profoundly incomplete about this picture: why would Joan give her reputation the finger in this now-famous rock anthem, and why would millions of fans be eagerly singing along, if reputation was that powerful of a force? The answer to this question will involve digging deeper into the assumptions that went into Nowak’s model and finding where they have gone wrong. In this case, not only are some of the assumptions of Nowak’s model a poor fit to reality in terms of the ones he makes, but, perhaps more importantly, the model is also poor in regards to the assumptions he doesn’t make.

Unfortunately, my reply to some current thinking about reputation can’t be expressed as succinctly.

The first thing worth pointing out here is probably that Joan Jett was wrong, even if she wasn’t lying: she most certainly did give a damn about her reputation. In fact, some part of her gave so much of a damn about her reputation that she ended up writing a song about it, despite that not being her conscious intent. More precisely, if she didn’t care about her reputation on any level, advertising that fact to others would be rather strange; it’s not as if that advertisement would provide Joan herself with any additional information. However, if that advertisement had an effect on the way that other people viewed her – updating her reputation among the listeners – her penning of the lyrics is immediately more understandable. She wants other people to think she doesn’t care about her (bad) reputation; she’s not trying to remind herself. There are a number of key insights that come from this understanding, many of which speak to the assumptions of these models of cooperation.

The initial point is that Joan needed to advertise her reputation. Reputations do not follow their owners around like a badge; they’re not the type of thing that can be accurately assessed on sight. Accordingly, if one does not have access to information about someone’s reputation, then their reputation, good or bad, would be entirely ineffective at deciding how to treat that someone. This problem is clearly not unsolvable, though. According to Sigmund (2012), the simple way around this problem involves direct observation: if I observe a person being mean to you, I can avoid that person without having to suffer the costs of their meanness firsthand. Simple enough, sure, but there are many problems with this suggestion too, some of which are more obvious than others. The first of these problems would be that a substantial amount – if not the vast majority – of (informative and relevant) human interactions are not visible to many people beyond those parties who are already directly involved. Affairs can be hidden, thieves can go undetected, and promises can be made in private, among other things (like, say, browsing histories being deleted…). Now that concern alone would not stop reputations derived from indirect information from being useful, but it would weaken its influence substantially if few people ever have access to it.

There’s a second, related concern that weakens it further, though: provided an interaction is observed by other parties, those who are most likely to be doing the observing in the first place are the people who probably already have directly interacted with one or more of the others they’re observing; a natural result of people not spending their time around each other at random. People only have a limited amount of time to spend around others, and, since one can’t be in two places at once, you naturally end up spending a good deal of that time with friends (for a variety of good reasons that we need not get into now). So, if the people who can make the most use of reputational information (strangers) are the least likely to be observing anything that will tell them much about it, this would make indirect reciprocity a rather weak force. Indeed, as I’ve covered previously, research has found that people can make use of indirectly-acquired reputation information, and do make use of it when that’s all they have. Once they have information from direct interactions, however, the indirect variety of reputational information ceases to have an effect on their behavior. It’s your local (in the social sense; not necessarily physical-distance sense) reputation that’s most valuable. Your reputation more globally – among those you’re unlikely to ever interact much with – would be far less important.

See how you don’t care about anyone pictured here? The feeling’s mutual.

The problems don’t end there, though; not by a long shot. On top of information not being available, and not being important, there’s also the untouched matter concerning whether the information is even accurate. Potential inaccuracies can come in three forms: passive misunderstandings, active misinformation, and diagnosticity. Taking these in order, consider a case where you see your friend get punched in the nose from across the room by a stranger. From this information, you might decide that it’s best to steer clear of that stranger. This seems like a smart move, except for what you didn’t see: a moment prior your friend, being a bit drunk, had told the stranger’s wife to leave her husband at the bar and come home with him instead. So, what does this example show us? That even if you’ve directly observed an interaction, you probably didn’t observe one or more previous interactions that led up to the current one, and those might well have mattered. To put this in the language of game theorists, did you just witness a cooperator punishing a defector, a defector harming a cooperator, or some other combination? From your lone observation, there’s no sure way to tell.

But what if your friend told you that the other person had attacked them without provocation? Most reputational information would seem to spread this way, given that most human interaction is not observed by most other people. We could call this the “taking someone else’s word for it” model of reputation. The problems here should be clear to anyone who has ever had friends: it’s possible your friend had misinterpreted the situation, or that your friend had some ulterior motive for actively manipulating your perception of that person’s reputation. To again rephrase this in game theorists’ language, if cooperators can be manipulated into punishing other cooperators, either through misperception or misinformation, this throws another sizable wrench into the gears of the reputation model. If one’s reputation can be easily manipulated, this, to some extent, will make cooperation more costly (if one fails to reap some of cooperation’s benefits or can offset some of defection’s costs). Talk is cheap, and indirect reciprocity models seem to require a lot of it.

This brings us to the final accuracy point: diagnosticity. Let’s say that, hypothetically, the stranger did attack your friend without provocation, and this was observed accurately. What have you learned from this encounter? Perhaps you might infer that the stranger is likely to be an all-around nasty person, but there’s no way to tell precisely how predictive that incident is of the stranger’s later behavior, either towards your friend or towards you. Just because the stranger might make a bad social asset for someone else, it does not mean they’ll make a bad social asset for you, in much the same way that my not giving a homeless person change doesn’t mean my friends can’t count on my assistance when in need. Further, having a “bad” reputation among one group can even result in my having a good relationship with a different group; the enemy of my enemy is my friend, as the saying goes. In fact, that last point is probably what Joan Jett was advertising in her iconic song: not that she has a bad reputation with everyone, just that she has a bad reputation among those other people. The video for her song would lead us to believe those other people are also, more or less, without morals, only taking a liking to Joan when she has something to offer them.

The type of people who really don’t give a damn about their reputation.

While this is not an exhaustive list of ways in which many current assumptions of reputation models are lacking (there are, for instance, also cases where cooperating with one individual necessitates defecting on another), it still poses many severe problems that need to be overcome. Just to recap: information flow is limited, that flow is generally biased away from the people who need it the most, there’s no guarantee of the accuracy of that information if it’s received, and that information, even if received and accurate, is not necessarily predictive of future behavior. The information might not exist, might not be accurate, or might not matter. Despite these shortcomings, however, what other people think of you does seem to matter; it’s just that the reasons it matters need to be, in some respects, fundamentally rethought. Those reasons will be the subject of the next post.

References: Nowak, M. (2012). Evolving cooperation. Journal of Theoretical Biology, 299, 1-8.

Sigmund, K. (2012). Moral assessment in indirect reciprocity. Journal of Theoretical Biology, 299, 25-30 DOI: 10.1016/j.jtbi.2011.03.024