Welcome To Introduction To Psychology

In my last post, I mentioned a hypothetical relatively-average psychologist (caveat: the term doesn’t necessarily apply to any specific person, living or dead). I found him to be a bit strange, since he tended to come up with hypotheses that were relatively theory-free; there was no underlying conceptual framework from which he drew his hypotheses. Instead, most of his research was based on some hunch or personal experience. Perhaps this relatively-average psychologist might also have made predictions on the basis of what previous research had found. For instance, if one relatively-average psychologist found that priming people to think about the elderly made them walk marginally slower, another relatively-average psychologist might predict that priming people to think of a professor would make them marginally smarter. I posited that these relatively-average psychologists might run into some issues when it comes to evaluating published research because, without a theoretical framework with which to understand the findings, all one can really consider are the statistics; without a framework, relatively-average psychologists have a harder time thinking about why some finding might or might not make sense.

If you’re not willing to properly frame something, it’s probably not wall-worthy.

So, if a population of these relatively-average psychologists are looking to evaluate research, what are they supposed to evaluate it against? I suppose they could check and see if the results of some paper jibe with their set of personal experiences, hunches, or knowledge of previous research, but that seems to be a bit dissatisfying. Those kinds of practices would seem to make evaluations of research look more like Justice Stewart trying to define pornography: “I know [good research] when I see it”. Perhaps good research would involve projects that delivered results highly consistent with people’s general personal experiences; perhaps good research would be a project that found highly counter-intuitive or surprising results; perhaps good research would be something else still. In any case, such a practice – if widespread enough – would make the field of psychology look like a grab bag of seemingly scattered and random findings. Learning how to think about one topic in psychology (say, priming) wouldn’t be very helpful when it came to learning how to think about another topic (say, learning). That’s not to say that the relatively-average psychologists have nothing helpful at all to add, mind you; just that their additions aren’t being driven by anything other than those same initial considerations, such as hunches or personal experience. Sometimes people have good guesses; in the land of psychology, however, it can often be difficult to differentiate between good and bad ones a priori.

It seems like topic-to-topic issues would be hard enough for our relatively-average psychologists to deal with, but that problem becomes magnified once the topics shift outside of what’s typical for one’s local culture, and even further when topics shift outside of one’s species. Sure; maybe male birds will abandon a female partner after a mating season if the pair is unable to produce any eggs because the male birds feel a threat to their masculinity that they defend against by reasserting their virility elsewhere. On the flip side, maybe female birds leave the pair because their sense of intrinsic motivation was undermined by the extrinsic reward of a clutch of eggs. Maybe male ducks force copulations on seemingly unwilling female ducks because male ducks use rape as a tactic to keep female ducks socially subordinate and afraid. Maybe female elephant seals aren’t as combative as their male counterparts because of sexist elephant seal culture. Then again, maybe female elephant seals don’t fight as much as males because of their locus of control or stereotype threat. Maybe all of that is true, but my prior on such ideas is that they’re unlikely to end up holding much explanatory water. Applied to non-human species, their conceptual issues seem to pop out a bit better. Your relatively-average psychologist, then, ends up being rather human-centric, if not a little culture- and topic-centric as well. Their focus is on what’s familiar to them, largely because what they know doesn’t help them think much about what they do not know.

So let’s say that our relatively-average psychologist has been tasked with designing a college-level introduction to psychology course. This course will be the first time many of the students are being formally exposed to psychology; for the non-psychology majors in the class, it may also be their last time. This limits what the course is capable of doing, in several regards, as there isn’t much information you can take for granted. The problems don’t end there, however: the students, having a less-than-perfect memory, will generally forget many, if not the majority, of the specifics they will be taught. Further, students may never again in their life encounter the topics they learned about in the intro course, even if they do retain the knowledge about them. If you’re like most of the population, knowing the structure of a neuron or who William James was will probably never come up in any meaningful way unless you find yourself at a trivia night (and even then it’s pretty iffy). Given these constraints, how is our relatively-average psychologist supposed to give their students an education of value? Our relatively-average psychologist could just keep pouring information out, hoping some of it sticks and is relevant later. They could also focus on some specific topics, boosting retention, but at the cost of breadth and, accordingly, the chance of possible relevance. They could even try to focus on a series of counter-intuitive findings in the hopes of totally blowing their students’ minds (to encourage students’ motivation to show up and stay awake), or perhaps on findings intended to push a certain social agenda – the students might not learn much about psychology, but at least they’ll have some talking points for the next debate they find themselves in. Our relatively-average psychologist could do all that, but what they can’t seem to do well is to help students learn how to think about psychology; even if the information is retained, relevant, and interesting, it might not be applicable to any other topics not directly addressed.

“Excuse me, professor: how will classical conditioning help me get laid?”

I happen to feel that we can do better than our relatively-average psychologists when designing psychology courses – especially introductory-level ones. If we can successfully provide students with a framework with which to think about psychology, we don’t necessarily have to concern ourselves with whether one topic or another was covered or whether they remember some specific list of research findings, as such a framework can be applied to any topic the students may subsequently encounter. Seeing how findings “fit” into something bigger will also make the class seem that much more interesting. Granted, covering more topics in the same amount of depth is generally preferable to covering fewer, but there are very real time constraints to consider. With that limited time, I feel that giving students tools for thinking about psychological material is more valuable than providing them with findings from various areas of psychology. Specific topics or findings within psychology should be used predominantly as vehicles for getting students to understand that framework; trying to do things the other way around simply isn’t viable. This will not come as a surprise to any regular reader, but the framework that I feel we ought to be teaching students is the functionalist perspective guided by an understanding of evolution by natural selection. Teaching students how to ask and evaluate questions of “what is this designed to do?” is a far more valuable skill than teaching them about who Freud was or some finding that failed to replicate but is still found in the introductory textbooks.

On that front, there is reason to be both optimistic and disappointed. According to a fairly exhaustive review of introductory psychology textbooks available from 1975 to 2004 (Cornwell et al, 2005), evolutionary psychology has been gaining greater and more accurate representation: whereas the topic was almost non-existent in 1975, in the 2000s, approximately 80% of all introductory texts discussed the subject at some point. Further, the tone that the books take towards the subject has become more neutral or positive as well, with approximately 70% of textbooks treating the topic as such. My enthusiasm about the evolutionary perspective’s representation is dampened somewhat by a few other complicating factors, however. First, many of the textbooks analyzed contained inaccurate information when the topic was covered (approximately half of them overall, and the vast majority of the more recent texts that were considered, even if those inaccuracies appear to have become more subtle over the years). Another concern is that, even when representations of evolutionary psychology were present within the textbooks, the discussion of the topic appeared relatively confined. Specifically, it didn’t appear that many important concepts (like kin selection or parental investment theory) received more than one or two paragraphs on average, if they even got that much space. In fact, the only topic that received much coverage seemed to be David Buss’s work on mating strategies; his citation count alone was greater than that of all other authors within evolutionary psychology combined. As Cornwell et al (2005) put it:

These data are troubling when one considers undergraduates might conclude that EP is mainly a science of mating strategies studied by David Buss. (p. 366)

So, the good news is that introductory psychology books are acknowledging that evolutionary psychology exists in greater and greater numbers. The field is also less likely to be harshly criticized for being something it isn’t (like genetic determinism). That’s progress. The bad news is that this information is, like many topics in introductory books appear to be, cursory, often inaccurate in at least some regards, and largely restricted to the work of one researcher within the field. Though Cornwell et al (2005) don’t specifically mention it, another factor to consider is where the information is presented within the texts. Though I have no data on hand beyond my personal sample of introductory books I’ve seen in recent years (I’d put that number around a dozen or so), evolutionary psychology is generally found somewhere in the middle of the book when it is found at all (remember, approximately 1-in-5 texts didn’t seem to even acknowledge the topic). Rather than being presented as a framework that can help students understand any topic within psychology, it seems to be presented more as just another island within psychology. In other words, it doesn’t tend to stand out.

So not exactly the portrayal I had hoped for…

Now I have heard some people who aren’t exactly fans (though not necessarily opponents, either) of evolutionary psychology suggest that we wouldn’t want to prematurely close off any alternative avenues of theoretical understanding in favor of evolutionary psychology. The sentiment seems to suggest that we really ought to be treating evolutionary psychology as just another lonely island in the ocean of psychology. Of course, I would agree in the abstract: we wouldn’t want to prematurely foreclose on any alternative theoretical frameworks. If a perspective existed that was demonstrably better than evolution by natural selection and the functionalist view in some regards – perhaps for accounting for the data, understanding it, and generating predictions – I’d be happy to make use of it. I’m trying to further my academic career as much as the next person, and good theory can go a long way. However, psychology, as a field, has had about 150 years with which to come up with anything resembling a viable alternative theoretical framework – or really, a framework at all that goes beyond description – and seems to have resoundingly failed at that task. Perhaps that shouldn’t be surprising, since evolution is currently the only good theory we have for explaining complex biological design, and psychology is biology. So, sure, I’m on board with not foreclosing on alternative ideas, just as soon as those alternatives can be said to exist.

References: Cornwell, R., Palmer, C., Guinther, P., & Davis, H. (2005). Introductory psychology texts as a view of sociobiology/evolutionary psychology’s role in psychology. Evolutionary Psychology, 3, 355-374.

I Find Your Lack Of Theory (And Replications) Disturbing

Let’s say you find yourself in charge of a group of children. Since you’re a relatively-average psychologist, you have a relatively strange hypothesis you want to test: you want to see whether wearing a red shirt will make children better at dodge ball. You happen to think that it will. I say this hypothesis is strange because you derived it from, basically, nothing; it’s just a hunch. Little more than a “wouldn’t it be cool if it were true?” idea. In any case, you want to run a test of your hypothesis. You begin by lining the students up, then you walk past them and count aloud: “1, 2, 1, 2, 1…”. All the children with a “1” go and put on a red shirt and are on a team together; all the children with a “2” go and pick a new shirt to put on from a pile of non-red shirts. They serve as your control group. The two teams then play each other in a round of dodge ball. The team wearing the red shirts comes out victorious. In fact, they win by a substantial margin. This must mean that wearing the red shirts made students better at dodge ball, right? Well, since you’re a relatively-average psychologist, you would probably conclude that, yes, the red shirts clearly have some effect. Sure, your conclusion is, at the very least, hasty and likely wrong, but you are only an average psychologist: we can’t set the bar too high.

“Jump was successful (p < 0.05)”

A critical evaluation of the research could note that just because the children were randomly assigned to groups, it doesn’t mean that both groups were equally matched to begin with. If the children in the red shirt group were just better beforehand, that could drive the effect. It’s also likely that the red shirts might have had very little to do with which team ended up winning. The pressing question here would seem to be why would we expect red shirts to have any effect? It’s not as if a red shirt makes a child quicker, stronger, or better able to catch or throw than before; at least not for any theoretical reason that comes to mind. Again, this hypothesis is a strange one when you consider its basis. Let’s assume, however, that wearing red shirts actually did make children perform better, because it helped children tap into some preexisting skill set. This raises the somewhat obvious question: why would children require a red shirt to tap into that previously-untapped resource? If being good at the game is important socially – after all, you don’t want to get teased by the other children for your poor performance – and children could do better, it seems, well, odd that they would ever do worse. One would need to posit some kind of trade-off effected by shirt color, which sounds like kind of an odd variable for some cognitive mechanism to take into account.
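Returning to that first point about random assignment for a moment: here is a quick simulation sketch in Python of how often a single shuffle hands one team a real head start before any shirts are involved. The team sizes and skill numbers are made up purely for illustration.

```python
import random
import statistics

def baseline_gap_rate(team_size=10, simulations=10_000):
    """Estimate how often random assignment alone produces a noticeably
    'better' team before any red shirts enter the picture."""
    big_gaps = 0
    for _ in range(simulations):
        # Hypothetical pre-existing dodge ball skill; everyone drawn from the same distribution
        children = [random.gauss(50, 10) for _ in range(team_size * 2)]
        random.shuffle(children)                      # the "1, 2, 1, 2..." assignment
        red_team, control = children[:team_size], children[team_size:]
        gap = statistics.mean(red_team) - statistics.mean(control)
        if abs(gap) > 5:                              # teams differ by half a standard deviation
            big_gaps += 1
    return big_gaps / simulations

print(baseline_gap_rate())  # with 10 children per team, roughly a quarter of the shuffles
                            # hand one team a half-standard-deviation head start
```

With one game and one shuffle, in other words, “the red team won” tells you very little about red shirts.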

Nevertheless, like any psychologist hoping to further their academic career, you publish your results in the Journal of Inexplicable Findings. The “Red Shirt Effect” becomes something of a classic, reported in Intro to Psychology textbooks. Published reports start cropping up from different people who have had other children wear red shirts and perform various athletic tasks relatively better. While none of these papers are direct replications of your initial study, they also have children wearing red shirts outperforming their peers, so they get labeled “conceptual replications”. After all, since the concepts seem to be in order, they’re likely tapping the same underlying mechanism. Of course, these replications still don’t deal with the theoretical concerns discussed previously, so some other researchers begin to get somewhat suspicious about whether the “Red Shirt Effect” is all it’s made out to be. Part of these concerns is based around an odd facet of how publication works: positive results – those that find effects – tend to be favored for publication over studies that don’t find effects. This means that there may well be other researchers who attempted to make use of the Red Shirt Effect, failed to find anything and, because of their null or contradictory results, also failed to publish anything.

Eventually, word reaches you of a research team that attempted to replicate the Red Shirt Effect a dozen times in the same paper and failed to find anything. More troubling still, for your academic career anyway, their results saw publication. Naturally, you feel pretty upset by this. Clearly the research team was doing something wrong: maybe they didn’t use the proper shade of red shirt; maybe they used a different brand of dodge balls in their study; maybe the experimenters behaved in some subtle way that was enough to counteract the Red Shirt Effect entirely. Then again, maybe the journal the results were published in doesn’t have good enough standards for their reviewers. Something must be wrong here; you know as much because your Red Shirt Effect was conceptually replicated many times by other labs. The Red Shirt Effect just must be there; you’ve been counting the hits in the literature faithfully. Of course, you also haven’t been counting the misses which were never published. Further, you were counting the slightly-altered hits as “conceptual replications” but not the slightly-altered misses as “conceptual disconfirmations”. You still haven’t managed to explain, theoretically, why we should expect to see the Red Shirt Effect anyway, either. Then again, why would any of that matter to you? Part of your reputation is at stake.
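To see how counting only the hits plays out, here is a small Python sketch (all numbers hypothetical) in which every “Red Shirt” study is run on purely random data, but only the positive, significant results make it into the literature:

```python
import random
import statistics

def one_study(n=20):
    """Run one 'Red Shirt' study on random data; return the mean difference
    and whether it would look like a significant positive result."""
    red = [random.gauss(0, 1) for _ in range(n)]
    control = [random.gauss(0, 1) for _ in range(n)]
    diff = statistics.mean(red) - statistics.mean(control)
    se = (statistics.pvariance(red) / n + statistics.pvariance(control) / n) ** 0.5
    return diff, (diff / se) > 1.96          # crude one-sided criterion; fine for illustration

studies = [one_study() for _ in range(1000)]
published = [diff for diff, hit in studies if hit]   # only the "hits" get written up
all_results = [diff for diff, _ in studies]

print(f"Average effect across all studies:  {statistics.mean(all_results):+.3f}")
print(f"Average effect in the 'literature': {statistics.mean(published):+.3f}")
# The true effect is zero, but the published record shows a healthy-looking one.
```

The misses sitting in file drawers are exactly what a count of conceptual replications never captures.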

And these colors don’t run!  (p < 0.05)

In somewhat-related news, there have been some salty comments from social psychologist Ap Dijksterhuis aimed at a recent study (and coverage of the study, and the journal it was published in) concerning nine failures to replicate some work Ap did on intelligence priming, as well as work done by others on the topic (Shanks et al, 2013). The initial idea of intelligence priming, apparently, was that priming subjects with professor-related cues made them better at answering multiple-choice, general-knowledge questions, whereas priming subjects with soccer-hooligan-related cues made them perform worse (and no; I’m not kidding. It really was that odd). Intelligence itself is a rather fuzzy concept, and it seems that priming people to think about professors – people typically considered higher in some domains of that fuzzy concept – is a poor way to make them better at multiple-choice questions. As far as I can tell, there was no theory surrounding why primes should work that way or, more precisely, why people should lack access to such knowledge in the absence of some vague, unrelated prime. At the very least, none was discussed.

It wasn’t just that the failures to replicate reported by Shanks et al (2013) were non-significant but in the right direction, mind you; they often seemed to go in the wrong direction. Shanks et al (2013) even looked for demand characteristics explicitly, but couldn’t find them either. Nine consecutive failures are surprising in light of the fact that the intelligence priming effects were previously reported as being rather large. It seems rather peculiar that large effects can disappear so quickly; they should have had a very good chance of replicating, were they real. Shanks et al (2013) rightly suggest that many of the confirmatory studies of intelligence priming, then, might represent publication bias, researcher degrees of freedom in analyzing data, or both. Thankfully, the salty comments of Ap reminded readers that: “the finding that one can prime intelligence has been obtained in 25 studies in 10 different labs”. Sure; and when a batter in the MLB only counts the times he hit the ball while at bat, his batting average would be a staggering 1.000. Counting only the hits and not the misses will sure make it seem like hits are common, no matter how rare they are. Perhaps Ap should have thought about professors more before writing his comments (though I’m told thinking about primes ruins them as well, so maybe he’s out of luck).

I would like to add that there were similarly salty comments leveled by another social psychologist, John Bargh, when his work on the effects of priming elderly stereotypes on walking speed failed to replicate (though John has since deleted his posts). The two cases bear some striking similarities: claims of other “conceptual replications”, but no claims of “conceptual failures to replicate”; personal attacks on the credibility of the journal publishing the results; personal attacks on the researchers who failed to replicate the finding; even personal attacks on the people reporting about the failures to replicate. More interestingly, John also suggested that the priming effect was apparently so fragile that even minor deviations from the initial experiment could throw the entire thing into disarray. Now it seems to me that if your “effect” is so fleeting that even minor tweaks to the research protocol can cancel it out completely, then you’re really not dealing with an effect of much importance, even if it were real. That’s precisely the kind of shooting-yourself-in-the-foot a “smarter” person might have considered leaving out of their otherwise persuasive tantrum.

“I handled the failure to replicate well (p < 0.05)”

I would also add, for the sake of completeness, that priming effects of stereotype threat haven’t replicated well either. Oh, and the effects of depressive realism don’t show much promise. This brings me to my final point on the matter: given the risks posed by researcher degrees of freedom and publication bias, it would be wise to enact better safeguards against this kind of problem. Replications, however, only go so far. Replications require researchers willing to do them (and they can be low-reward, discouraged activities) and journals willing to publish them with sufficient frequency (which many do not, currently). Accordingly, I feel replications can only take us so far in fixing the problem. A simple – though only partial – remedy for the issue is, I feel, to require the inclusion of actual theory in psychological research; evolutionary theory in particular. While it does not stop false positives from being published, it at least allows other researchers and reviewers to more thoroughly assess the claims being made in papers. This allows poor assumptions to be better weeded out and better research projects crafted to address them directly. Further, updating old theory and providing new material is a personally-valuable enterprise. Without theory, all you have is a grab bag of findings, some positive, some negative, and no idea what to do with them or how they are to be understood. Without theory, things like intelligence priming – or Red Shirt Effects – sound valid.

References: Shanks, D., Newell, B., Lee, E., Balakrishnan, D., Ekelund, L., Cenac, Z., Kavvadia, F., & Moore, C. (2013). Priming intelligent behavior: An elusive phenomenon. PLoS ONE, 8(4). DOI: 10.1371/journal.pone.0056515

An Implausible Function For Depression

Recently, I was involved in a discussion about experimenter-induced expectation biases in performance, also known as demand characteristics. The basic premise of the idea runs along the following lines: some subjects in your experiment are interested in pleasing the experimenter or, more generally, trying to do “well” on the task (others might be trying to undermine your task – the “screw you” effect – but we’ll ignore them for now). Accordingly, if the researchers conducting an experiment are too explicit about the task, or drop hints as to what the purpose is or what results they are expecting, even hints that might seem subtle, they might actually create the effect they are looking for, rather than just observe it. However, the interesting portion of the discussion I was having is that some people seemed to think you could get something for nothing from demand characteristics. That is to say some people seem to think that, for instance, if the experimenter thinks a subject will do well on a math problem, that subject will actually get better at doing math.

Hypothesis 1: Subjects will now be significantly more bullet-proof than they previously were.

This raises the obvious question: if certain demand characteristics can influence subjects to perform better or worse at some tasks, how would such an effect be achieved? (I might add that it’s a valuable first step to ensure that the effect exists in the first place, which, in the case of stereotype threat with regard to math abilities, it might well not.) It’s not as if these expectations are teaching subjects any new skills, so whatever information is being made use of (or not being made use of, in some cases) by the subject must have already been potentially accessible. No matter how much they might try, I highly doubt that researchers are able to simply expect subjects into suddenly knowing calculus or lifting twice as much weight as they normally can. The question of interest, then, would seem to become: given that subjects could perform better at some important task, why would they ever perform worse at it? Whatever specific answer one gives for that question, it will inevitably include the mention of trade-offs, where being better at some task (say, lifting weights) carries costs in other domains (such as risks of injury or the expenditure of energy that could be used for other tasks). Subjects might perform better on math problems after exercise, for instance, not because the exercise makes them better at math, but because there are fewer cognitive systems currently distracting the math one.

This brings us to depression. In attempting to explain why so many people get depressed, there are plenty of people who have suggested that there is a specific function to depression: people who are depressed are thought to be more accurate in some of their perceptions, relative to those who are not depressed. Perhaps, as Neel Burton and, curiously, Steven Pinker suggest, depressed individuals might do better at assessing the value of social relationships with others, or at figuring out when to stop persisting at a task that’s unlikely to yield benefits. The official title for this hypothesis is depressive realism. I do appreciate such thinking insomuch as researchers appear to be trying to explain some psychological phenomenon functionally. Depressed people are more accurate in certain judgments, being more accurate in said judgments leads to some better social outcomes, so there are some adaptive benefits to being depressed. Neat. Unfortunately, such a line of thinking misses the aforementioned critical mention of trade-offs: specifically, if depressed people are supposed to perform better at such tasks – if people have the ability to better assess social relationships and their control over them – why would anyone ever be worse at those tasks?

If people hold unrealistically positive cognitive biases about their performance, and these biases cause people to, on the whole, do worse than they would without them, then the widespread existence of those positive biases needs to be explained. The biases can’t simply exist because they make us feel good. Not only would such an explanation be uninformative (in that it doesn’t explain why we’d feel bad without them), but it would also be useless, as “feeling good” doesn’t do anything evolutionarily useful. Notwithstanding those issues, however, the depressive realism hypothesis doesn’t even seem to be able to explain the nature of depression very well; not on the face of it, anyway. Why should increasing one’s perceptual accuracy in certain domains go hand-in-hand with low energy levels or loss of appetite? Why should women be more likely to be depressed than men? Why should increases in perceptual accuracy similarly increase an individual’s risk of suicidal behavior? None of those symptoms seem like the hallmark of good, adaptive design when considered in the context of overcoming other, unexplained, and apparently maladaptive positive biases.

“We’ve managed to fix that noise the car made when it started by making it unable to start”

So, while the depressive realism hypothesis manages to think about functions, it would appear to fail to consider other relevant matters. As a result, it ends up positing a seemingly-implausible function for depression; it tries to get something (better accuracy) for nothing, all without explaining why other people don’t get that something as well. This might mean that depressive realism identifies an outcome of being depressed instead of explaining depression, but even that much is questionable. This returns to the initial point I made, in that one wants to be sure that the effect in question even exists in the first place. A meta-analysis of 75 studies of depressive realism conducted by Moore & Fresco (2012) did not yield a great deal of support for the effect being all that significant or theoretically interesting. While they found evidence of some depressive realism, the effect size of that realism was typically around or less than a tenth of a standard deviation in favor of the depressed individuals; an effect size that the authors repeatedly mentioned was “below [the] convention for a small effect” in psychology. In many cases, the effect sizes were so close to zero that they might as well have been zero for all practical purposes; in other cases it was the non-depressed individuals who performed better. It would seem that depressed people aren’t terribly more realistic; certainly not relative to the costs that being depressed brings. More worryingly for the depressive realism hypothesis, the effect size appeared to be substantially larger in studies using poor methods of assessing depression, relative to studies using better methods. Yikes.
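For a rough sense of how little a tenth of a standard deviation buys you, here is a back-of-the-envelope sketch (my own illustration, not a calculation from the meta-analysis itself) converting effect sizes into the probability that a randomly chosen depressed individual is more “realistic” than a randomly chosen non-depressed one:

```python
from math import erf, sqrt

def probability_of_superiority(d):
    """P(a random member of group A scores higher than a random member of group B)
    for a standardized mean difference d, assuming normal distributions."""
    return 0.5 * (1 + erf((d / sqrt(2)) / sqrt(2)))

for d in (0.1, 0.5, 0.8):
    print(f"d = {d}: {probability_of_superiority(d):.1%}")
# d = 0.1 works out to roughly 53%, barely distinguishable from a coin flip.
```

An advantage that small would need to come awfully cheap to be worth the costs that the rest of depression carries with it.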

So, just to summarize, what we’re left with is an effect that might not exist and a hypothesis purporting to explain that possible effect which makes little conceptual sense. To continue to pile on, since we’re already here, the depressive realism hypothesis seems to generate few, if any, additional testable predictions. Though there might well be plenty of novel predictions that flow from the suggestion that depressed people are more realistic than non-depressed individuals, there aren’t any that immediately come to my mind. Now I know this might all seem pretty bad, but let’s not forget that we’re still in the field of psychology, making this outcome sort of par for the course in many respects, unfortunate as that might seem.

The curious part of the depressive realism hypothesis, to me, anyway, is why it appears to have generated as much interest as it did. The meta-analysis found over 120 research papers on the topic, which is (a) probably not exhaustive and (b) not representative of any failures to publish research on the topic, so there has clearly been a great deal of research done on the idea. Perhaps it has something to do with the idea that there’s a bright side to depression; some distinct benefit that ought to make people more sympathetic towards those suffering from depression. I have no data that speaks to that idea one way or the other though, so I remain confused as to why the realism hypothesis has drawn so much attention. It wouldn’t be the first piece of pop psychology to confuse me in such a manner.

And if it confuses you too, feel free to stop by this site for more updates.

As a final note, I’m sure there are some people out there who might be thinking that, though the depressive realism idea is, admittedly, lacking in many regards, it’s currently the best explanation for depression on offer. While such conceptual flaws are, in my mind, reason enough to discard the idea even in the event there isn’t an alternative on offer, there is, in fact, a much better alternative theory. It’s called the bargaining model of depression, and the paper is available for free here. Though I’m not an expert on depression myself, the bargaining model seems to make substantially more conceptual sense while simultaneously being able to account for the existing facts about depression. Arguably, it doesn’t paint the strategy of depression in the most flattering light, but it’s at least more realistic.

References: Moore, M., & Fresco, D. (2012). Depressive realism: A meta-analytic review. Clinical Psychology Review, 32(6), 496-509. DOI: 10.1016/j.cpr.2012.05.004

Do People “Really” Have Priors?

As of late, I’ve been dipping my toes ever-deeper into the conceptual world of statistics. If one aspires towards understanding precisely what they’re seeing when it comes to research in psychology, understanding statistics can go a long way. Unfortunately, the world of statistics is a contentious one, and the concepts involved in many of these discussions can be easily misinterpreted, so I’ve been attempting to be as cautious as possible in figuring the mess out. Most recently, I’ve been trying to decipher whether the hype over Bayesian methods is to be believed. There are some people who seem to feel that there’s a dividing line between Bayesian and Frequentist philosophies that one must choose sides over (Dienes, 2011), while others seem to suggest that such divisions are basically pointless and the field has moved beyond them (Gelman, 2008; Kass, 2011). One of the major points which has been bothering me about the Bayesian side of things is the conceptualization of a “prior” (though I feel such priors can easily be incorporated in Frequentist analyses as well, so this question applies to any statistician). Like many concepts in statistics, this one seems both to be useful in certain situations and able to easily lead one astray in others. Today I’d like to consider a thought experiment dealing with the latter cases.

Thankfully, thought experiments are far cheaper than real ones

First, a quick overview of what a prior is and why priors can be important. Here’s an example that I discussed previously:

say that you’re a doctor trying to treat an infection that has broken out among a specific population of people. You happen to know that 5% of the people in this population are actually infected and you’re trying to figure out who those people are so you can at least quarantine them. Luckily for you, you happen to have a device that can test for the presence of this infection. If you use this device to test an individual who actually has the disease, it will come back positive 95% of the time; if the individual does not have the disease, it will come back positive 5% of the time. Given that an individual has tested positive for the disease, what is the probability that they actually have it? The answer, unintuitive to most, is 50%.

In this example, your prior is the base rate of the disease: the percent of people in the population who actually have it. The prior is, roughly, what beliefs or uncertainties you come to your data with. Bayesian analysis requires one to explicitly state one’s prior beliefs, regardless of what those priors are, as they will eventually play a role in determining your conclusions. As in the example above, priors can be exceptionally useful when they’re known values.
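For anyone who wants the arithmetic behind that unintuitive 50%, here is a minimal sketch of Bayes’ theorem applied to the numbers in the quoted example (5% prevalence, 95% hit rate, 5% false-positive rate):

```python
def posterior_probability(prior, sensitivity, false_positive_rate):
    """P(infected | positive test), via Bayes' theorem."""
    true_positives = prior * sensitivity
    false_positives = (1 - prior) * false_positive_rate
    return true_positives / (true_positives + false_positives)

print(posterior_probability(prior=0.05, sensitivity=0.95, false_positive_rate=0.05))
# 0.5 -- a positive test from this device is, quite literally, a coin flip
```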

In the world of research it’s not always (or even generally) the case that priors are objectively known: in fact, they’re basically what we’re trying to figure out in the first place. More specifically, people are actually trying to derive posteriors (prior beliefs that have been revised by the data), but one man’s posteriors are another man’s priors, and the line between the two is more or less artificial. In the previous example, we took the 5% prevalence in the population as a given; if you didn’t know that value and only had the results of your 95% effective test, figuring out how many of your positives were likely false-positives and, conversely, how many of your negatives were likely false-negatives, would be impossible (except by luck). If the prevalence of the disease in the population is very low, you’ll have many false-positives; if the prevalence is very high, you’ll likely have many false-negatives. Accordingly, what prior beliefs you bring to your results will have a substantial effect on how they’re interpreted.
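A quick illustration of just how much the interpretation of a positive result swings with the prior, using the same hypothetical test but different assumed prevalences:

```python
def posterior(prior, sensitivity=0.95, false_positive_rate=0.05):
    """P(has the disease | positive test) for the same hypothetical device."""
    hits = prior * sensitivity
    false_alarms = (1 - prior) * false_positive_rate
    return hits / (hits + false_alarms)

for prevalence in (0.001, 0.01, 0.05, 0.25, 0.75):
    print(f"prevalence {prevalence:>5.1%} -> P(disease | positive) = {posterior(prevalence):.1%}")
# The device never changes, but a positive result means anything from
# "almost certainly a false alarm" to "almost certainly infected",
# depending entirely on the prior you bring to the data.
```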

This is a fairly common point discussed when it comes to Bayesian analysis: the frequent subjectivity of priors. Your belief about whether a disease is common or not doesn’t change the actual prevalence of it; just how you will eventually look at your data. This means that researchers with the same data can reach radically different conclusions on the basis of different priors. So, if one is given free rein over which priors they want to use, this could allow confirmation bias to run wild and a lot of disagreeable data to be all but disregarded. As this is a fairly common point in the debate over Bayesian statistics, there’s already been a lot of ink (virtual and actual) spilled over it, so I don’t want to continue on with it.

There is, however, another issue concerning priors that, to the best of my knowledge, has not been thoroughly addressed. That question is: to what extent can we consider people to have prior beliefs in the first place? Clearly, we feel that some things are more likely than others: I think it’s more likely that I won’t win the lottery than that I will. No doubt you could immediately provide a list of things you think are more or less probable than others with ease. That these feelings can be so intuitive and automatically generated helps to mask an underlying problem with them: strictly speaking, it seems we ought to either not update our priors at all or not say that we “really” have any. A shocking assertion, no doubt (and maybe a bit hyperbolic), but I want to explore it and see where it takes us.

Whether it’s to a new world or to our deaths, I’ll still be famous for it.

We can begin to explore this intuition with another thought experiment involving flipping a coin, which will be our stand-in for a random-outcome generator. Now this coin is slightly biased in a way that results in 60% of the flips coming up heads and the remaining 40% coming up tails. The first researcher has his entire belief centered 100% on the coin being 60% biased towards heads and, since there is no belief left to assign, thinks that all other states of bias are impossible. Rather than having a distribution of beliefs, this researcher has a single point. This first researcher will never update his belief about the bias of the coin no matter what outcomes he observes; he’s certain the coin is biased in a particular way. Because he just so happens to be right about the bias, he can’t get any better, and this lack of updating his priors is a good thing (if you’re looking to make accurate predictions, that is).

Now let’s consider a second researcher. This researcher comes to the coin with a different set of priors: he thinks that the coin is likely fair, say 50% certain, and then distributes the rest of his belief equally between two additional potential values of the coin not being fair (say, 25% sure that the coin is 60% biased towards heads and 25% sure that the coin is similarly biased towards tails). The precise distribution of these beliefs doesn’t matter terribly; it could come in the form of two or an infinite number of points. All that matters is that, because this researcher’s belief is distributed in such a way that it doesn’t lie on a single point, his beliefs are capable of being updated by the data from the coin flips. Researcher two, like a good Bayesian, will then update his priors to posteriors on the basis of the observed flips, then turn those posteriors into new priors and continue updating for as long as he’s getting new data.
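Here is a minimal sketch of what researcher two’s updating might look like, assuming his belief really is spread over just the three candidate biases described above (the particular numbers are only for illustration):

```python
import random

# Researcher two's starting beliefs over three candidate values of P(heads).
# Researcher one, by contrast, puts 100% of his belief on 0.60 and never moves.
priors = {0.40: 0.25, 0.50: 0.50, 0.60: 0.25}

def update(beliefs, flip_is_heads):
    """One round of Bayesian updating: multiply each prior by the likelihood
    of the observed flip under that bias, then renormalize."""
    posterior = {bias: belief * (bias if flip_is_heads else 1 - bias)
                 for bias, belief in beliefs.items()}
    total = sum(posterior.values())
    return {bias: p / total for bias, p in posterior.items()}

beliefs = dict(priors)
for _ in range(300):                          # the coin really is biased 60% towards heads
    beliefs = update(beliefs, random.random() < 0.60)

print(beliefs)  # belief typically piles up on 0.60 after a few hundred flips
```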

On the surface, then, the major difference between the two is that researcher one refuses to update his priors and researcher two is willing to do so. This implies something rather interesting about the latter researcher: researcher two has some degree of uncertainty about his priors. After all, if he was already sure he had the right priors, he wouldn’t update, since he would think he could do no better in terms of predictive accuracy. If researcher two is uncertain about his priors, then, shouldn’t that degree of uncertainty similarly be reflected somehow?

For instance, one could say that researcher two is 90% certain that he got the correct priors and 10% certain that he did not. That would represent his priors about his priors. He would presumably need to have some prior belief about the distribution he initially chose, as he was selecting from an infinite number of other possible distributions. His prior about his priors, however, must have its own set of priors as well. One can quickly see that this leads to an infinite regress: at some point, researcher two will basically have to admit complete uncertainty about his priors (or at least uncertainty about how they ought to be updated, as how one updates their priors depends upon the priors one is using, and there are an infinite number of possible distributions of priors), or admit complete certainty in them. If researcher two ends up admitting to complete uncertainty, this will give him a flat set of priors that ought to be updated very little (he will be able to rule out 100% biased towards heads or tails, contingent on observing either a heads or a tails, but not much beyond that). On the other hand, if researcher two ends up stating one of his priors with 100% certainty, the rest of the priors ought to collapse on each other to 100% as well, resulting in an unwillingness to update.

Then again, math has never been my specialty. I’m about 70% sure it isn’t, and about 30% sure of that estimate…

It is not immediately apparent how we can reconcile these two stances with each other. On the one hand, researcher one has a prior that cannot be updated; on the other, researcher two has a potentially infinite number of priors with almost no idea how to update them. While we certainly could say that researcher one has a prior, he would have no need for Bayesian analysis. Given that people seem to have prior beliefs about things (like how likely some candidate is to win an election), and these beliefs seem to be updated from time to time (once most of the votes have been tallied), this suggests that something about the above analysis might be wrong. It’s just difficult to place precisely what that thing is.

One way of ducking the dilemma might be to suggest that, at any given point in time, people are 100% certain of their priors, but what point they’re certain about changes over time. Such a stance, however, suggests that priors aren’t updated so much as priors just change, and I’m not sure that such semantics can save us here. Another suggestion that was offered to me is that we could just forget the whole thing, as priors themselves don’t need to have priors. A prior is a belief distribution about probability, and probability is not a “real” thing (that is, the biased coin doesn’t come up 60% heads and 40% tails per flip; each result is either a heads or a tails). For what it’s worth, I don’t think such a suggestion helps us out. It would essentially seem to be saying that, out of the infinite number of beliefs one could start with, any subset of those beliefs is as good as any other, even if they lead to mutually-exclusive or contradictory results, and we can’t think about why some of them are better than others. Though my prior on people having priors might have been high, my posteriors about them aren’t looking so hot at the moment.

References: Dienes, Z. (2011). Bayesian versus orthodox statistics: Which side are you on? Perspectives on Psychological Science, 6(3), 274-290. DOI: 10.1177/1745691611406920

Gelman, A. (2008). Rejoinder. Bayesian Analysis, 3, 467-478.

Kass, R. (2011). Statistical inference: The big picture. Statistical Science, 26, 1-9.

How Much Does Amanda Palmer Trust Her Fans?

A new TED talk was put out today (though it won’t be today anymore by the time you read this) by Amanda Palmer entitled, “The Art of Asking”, which you can watch here. If the comments on the YouTube page of the talk are to be believed, it truly was an inspiring affair. Professional cynic that I am, the talk didn’t do much to inspire me; at least not in the way that Amanda probably intended it to. Now, for those of you who don’t know her, Amanda is (primarily, I think) famous for her music in The Dresden Dolls. One of the main thrusts of her talk centers around the question she poses towards the end: how do we let people pay for music, rather than how do we get people to pay for music. Part of Amanda’s answer to this question was to allow people to download her music on her website and let them pay whatever price they wanted for the download. So, if someone downloaded Amanda’s music from her site, they had the option of paying $0, $1, $5, $10, $15, $20, or $100 for it. Amanda further suggests that she views this as a sort of “trust” in her fans, presumably because she had given people the option of paying nothing, which is the option most economists would consider the “rational” one. While her talk is delivered with a strong emotional tone and the message is ostensibly positive, Amanda is still a human, so my guess is that there’s more to her trust than meets the eye.

And more than meets the shaved-off eyebrows as well

Admittedly, I don’t know much about Amanda beyond what I just heard. Though I am familiar with her music to some degree, I’ve never followed her personal life at all. Here’s what I do know: a quick browsing of her website shows me that while she will indeed allow people to choose their own price for her music and download it, she doesn’t seem to have that same policy towards shirts, CDs, vinyl, posters, art books, or the shipping and handling required to send any of them out. Something (the tour section of her website) also tells me the venues she plays at – which I’m imagining represent a substantial proportion of her income – don’t allow anyone to come in to see the show and pay whatever they feel like for tickets. It would seem that telling people, rather than asking them, to pay is the norm; not her exception. This raises the inevitable question: to what extent does the choose-your-own-price option reflect a genuine leap of faith, and how much of her TED talk is actually cheap talk?

Cheap talk is just what it sounds like: it’s a signal that is easy to produce. Like all signals, it functions to attempt to persuade another individual to change their behavior. Cheap talk, however, is of very questionable value precisely because it’s so easy to manufacture. For instance, let’s say that a man tries to convince a woman at a bar to have sex with him. He tells her that he’s fabulously wealthy, will remain faithful to her throughout his life, and will see to it that she wants for nothing if she agrees. Tempting offer, no doubt, but what’s guaranteeing that any of the information the man is sending is true? It costs the man almost nothing to say the words, and once the two have sex, he’s free to go back on his word without penalty. However, if that same offer is made after a month of courtship where the man has paid for multiple dates, consistently dressed in expensive clothes, and accompanies the offer with a diamond ring, wedding ceremony, and legal contract that entitles the woman to half of everything he owns, we’ve stepped out of the realm of cheap talk into costly signals. Because of those high costs, the signal is much harder to fake, so its honesty can be better guaranteed.

Now Amanda would like us to think that her choose-your-own-price option represents a costly signal of trust towards her fans. Indeed, she may well consciously believe that it is one, just as most people consciously believe they’re better than average at things relative to the majority of other people, or less likely to have bad things happen to them. Since her personal, potentially self-serving feelings about the whole thing don’t necessarily reflect reality, this brings us back to the “how costly is her gesture?” question. As the internet stands right now, whether a musician provides the option for free downloading on their own website or not, the option likely exists somewhere. It took me all of three seconds to find a list of websites where I could have downloaded Amanda’s album for free anyway. In other words, if someone wanted to download her album without paying, they likely could have. This suggests that her pay-nothing option isn’t as trusting as it initially comes across. Not only is she not creating that option where it didn’t exist before, but, in all likelihood, it would exist regardless of whether she wanted it to or not. Counting this as “trust” is a bit like my saying that I “trust” gravity to do what it does; I don’t really have a choice in the matter.

Physics has yet to disappoint

On top of that preexisting problem of music downloads already being available, there’s another: to the best of my knowledge, the costs of letting someone download her album are minimal. While one could argue about how much money she would lose on account of people not paying, I’m talking more about the physical costs of sending the information to someone’s computer. Since there really is no cost there – and because the option to download for free would exist with or without Amanda’s seal of approval – Amanda is essentially undertaking zero risk in providing her ostensibly-trusting option. It requires no investment on her part. Without that risk, assessing the credibility of the signal becomes very difficult, as was the case with the sex example above. How trusting would Amanda be when there are some actual risks involved? When she has the ability to create that trusting option, will she? This is where Amanda’s other merchandise comes to the rescue.

Things like shirts, posters, and physical copies of CDs cost actual money to produce, and the option to get these things for free doesn’t already exist. Nothing is stopping Amanda from paying out of pocket to have these items made and allowing her fans to pay whatever they want for them (from $0 a shirt to $100, for instance), yet this isn’t what she does. Once actual risk enters the picture – once Amanda needs to make a real initial investment – her trust seems to dry up in a hurry. Apparently, she doesn’t trust her fans enough to adequately compensate her for a shirt and the shipping cost when she has the option to. One could argue, I suppose, that a handful of amoral people could, in principle, ruin her financially by ordering dozens or hundreds of shirts from her for free online, and that same risk isn’t posed by the downloading of a CD. That would be a fair point, except there are multiple ways around it: the requirement of a credit card for the purchase (whatever the purchase price ended up being), a limit on the number of free or cheap items, or the option to pay whatever you want for the merchandise, but only at live concerts. Admittedly, I don’t know if she does the last one; I just suspect she doesn’t offer it as a default option.

Forgetting about the merchandise, we could also discuss ticket prices to the shows as well. A quick browsing of the links for tickets on her tour schedule shows tickets that can range from $15 to $60. Now of course ticket prices aren’t being set by Amanda herself, but, then again, venues aren’t set in stone either. Presumably Amanda could, if she wanted to, only schedule herself to play at venues where ticket prices could be determined (at least largely) by the willingness of the people who show up to pay. I’m sure there are plenty of venues – though not necessarily traditional ones – that would at least consider such an offer. Rather than take this approach, however, Amanda’s “art of asking” seems to involve first demanding people pay full price for tickets and merchandise and, in addition, asking them to then pay more, whether that more came in the form of additional money placed into a hat she passes around the crowd, giving her food, places to stay, practice space, or other items of interest.

“How can we let people pay $7 for a cup of coffee and then let them pay us even more?”

Now none of this is to say that Amanda is a bad person. As I said, I don’t know nearly enough about her to make that judgment one way or the other. This is merely to point out that the “trust” Amanda has in her fans certainly has its limits – many of them – as pretty much anyone’s does. That’s just the point though; there doesn’t seem to be anything particularly special going on here. Despite there being nothing special about it, Amanda seems to be trying to play it off as if it’s some great exercise in trust. That impossible-to-assess pretense is the part of the talk that inspired this post. There’s also the matter of the kickstarter she mentions. Amanda asked for $100,000 on kickstarter and ended up making over a million. Now I don’t find anything particularly egregious about that; if her fans wanted to support her, nothing was stopping them. What I did find curious, though, was her analysis of how that money would be spent. It seemed that she had a legitimate need for almost the full million. While that’s fine if she does, what’s curious is that, if she needed the full million, why didn’t she ask for, well, the full million? Why only ask for the hundred thousand that clearly would have been grossly insufficient for her plans? Something about that analysis strikes me as off as well. Then again, if a pretense of trust is easy to manufacture, so is a pretense of need.

Statistical Issues In Psychology And What Not To Do About Them

As I’ve discussed previously, there are a number of theoretical and practical issues that plague psychological research in terms of statistical testing. On the theoretical end of things, if you collect enough subjects, you’re all but guaranteed to find some statistically significant result, no matter how small or unimportant it might be. On the practical end of things, even if a researcher is given a random set of data, they can end up finding a statistically significant (though not actually significant) result more often than not by exercising certain “researcher degrees of freedom”. These degrees of freedom can take many forms, from breaking the data down into different sections, such as by sex, or by high, medium, and low values of the variable of interest, to peeking at the data ahead of time and using that information to decide when to stop collecting subjects, among other methods. At the heart of many of these practical issues is the idea that the more statistical tests you can run, the better your chances of finding something significant. Even if the false-positive rate for any one test is low, with enough tests, the chance of a false-positive result rises dramatically. For instance, running 20 tests with an alpha of 0.05 on random data would result in a false-positive around 64% of the time.
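The 64% figure is a one-line calculation; here is that arithmetic, assuming 20 independent tests at an alpha of 0.05 run on data containing no real effects:

```python
alpha, tests = 0.05, 20

p_at_least_one_false_positive = 1 - (1 - alpha) ** tests
print(f"{p_at_least_one_false_positive:.0%}")   # ~64%: more likely than not to "find" something
```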

“Hey everybody, we got one; call off the data analysis and write it up!”

In attempts to banish false-positives from the published literature, some have advocated the use of what are known as Bonferroni corrections. The logic here seems simple enough: the more tests you run, the greater the likelihood that you’ll find something by chance so, to better avoid fluke results, you raise the evidentiary bar for each statistical test you run (or, more precisely, lower your alpha level). So, if you were to run the same 20 tests on random data as before, you can maintain an experiment-wide false-positive rate of 5% (instead of 64%) by adjusting your per-test error rate to approximately 0.25% (instead of 5%). The correction, then, makes each test you do more conservative as a function of the total number of tests you run. Problem solved, right? Well, no; not exactly. According to Perneger (1998), these corrections not only fail to solve the initial problem we were interested in, but also create a series of new problems that we’re better off avoiding.
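Mechanically, the correction itself is nothing more than a division; here is the same arithmetic as above with the corrected per-test threshold, which is where the 0.25% figure comes from:

```python
alpha, tests = 0.05, 20

bonferroni_alpha = alpha / tests                       # 0.0025, i.e., 0.25% per test
family_wise_rate = 1 - (1 - bonferroni_alpha) ** tests
print(f"per-test alpha:   {bonferroni_alpha:.4f}")
print(f"family-wise rate: {family_wise_rate:.1%}")     # back down to about 5%
```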

Taking these two issues in order, the first is that the Bonferroni correction will only serve to keep the experiment-wide false-positive rate constant. While it might do a fine job at that, people very rarely care about that number. That is, we don’t care about whether there is a false-positive finding somewhere in the set; we care about whether a specific finding is a false positive, and these two values are far from the same thing. To understand why, let’s return to our researcher who was running 20 independent hypothesis tests. Let’s say that, hypothetically, out of those 20 tests, 4 come back as significant at the 0.05 level. Now we know that the probability of having made at least one type 1 error (a false positive) is about 64%; what we don’t know is (a) whether any of our positive results are false positives or, assuming at least one of them is, (b) which result(s) that happens to be. Raising the evidentiary bar across all tests – threatening to make all the results non-significant on account of the fact that one of them might just be a fluke – does not strike me as the most viable solution to this problem.

There are two major reasons for not doing this: the first is that it will dramatically boost our type 2 error rate (failing to find an effect when one actually exists) and, even though this error rate is not the one that many conservative statisticians are predominantly interested in, these are still errors all the same. Even more worryingly, though, it doesn’t seem to make much sense to deem a result significant or not contingent on what other results you happened to be examining. Consider two experimenters: one collects data on three variables of interest from the same group of subjects, while a second researcher collects data on those same three variables of interest, but from three different groups. Both researchers are thus running three hypothesis tests, but they’re either running them together or separately. If the two researchers were using a Bonferroni correction contingent on the number of tests they ran per experiment, the results might be significant in the latter case but not in the former, even if the two researchers got identical sets of results. This lack of consistency in terms of which results get to be counted as “real” will only add to the confusion in the psychological literature.
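To illustrate that inconsistency, here is a hypothetical version of the two-researcher example; the p-values are invented purely for the sake of the sketch:

```python
# Three identical results judged under a per-experiment Bonferroni correction.
p_values = [0.02, 0.03, 0.04]  # hypothetical p-values for the three variables

# Researcher 1 bundles all three tests into one experiment: alpha = 0.05 / 3
alpha_bundled = 0.05 / len(p_values)
bundled_verdicts = [p < alpha_bundled for p in p_values]   # [False, False, False]

# Researcher 2 runs one test per experiment: alpha = 0.05 each time
separate_verdicts = [p < 0.05 for p in p_values]           # [True, True, True]

print(bundled_verdicts, separate_verdicts)
```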

“My results would have been significant, if it wasn’t for those other meddling tests!”

The full scale of the last issue might not have been captured by the two-researcher example, so let’s consider another, single-researcher example. Here, a researcher is giving a test to a group of subjects with the same 20 variables of interest, looking for differences between men and women. Among these variables, there is one hypothesis that we’ll call a “real” hypothesis: women will be shorter than men. The other 19 variables are being used to test “fake” hypotheses: things like whether men or women have a preference for drinking out of blue cups, or whether they prefer green pens. A Bonferroni correction would, essentially, treat the results of the “fake” hypotheses as equally likely to generate a false positive as the “real” hypothesis. In other words, Bonferroni corrections are theory-independent. Given that some differences between groups are more likely to be real than others, applying a uniform correction to all those tests seems to miss the mark.

To build on that point, as I initially mentioned, any difference between groups, no matter how small, could be considered statistically significant if your sample size is large enough due to the way that significance is calculated; this is one of the major theoretical criticisms of null hypothesis testing. Conversely, however, any difference, no matter how large, could be considered statistically insignificant if you run enough additional irrelevant tests and apply a Bonferroni correction. Granted, in many cases that might require a vast number of additional tests, but the precise number of tests is not the point. The point is that, on a theoretical level, the correction doesn’t make much sense.
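As a hypothetical illustration of that point, even a very strong result can be rendered non-significant simply by tacking enough irrelevant tests onto the same experiment (the numbers below are invented for the sketch):

```python
# How a fixed, strong result fares under Bonferroni corrections of growing size.
p_real_effect = 0.0001  # a hypothetical, very convincing difference

for n_tests in (1, 10, 100, 1000):
    corrected_alpha = 0.05 / n_tests
    print(n_tests, p_real_effect < corrected_alpha)
# 1 -> True, 10 -> True, 100 -> True, 1000 -> False: the result "disappears"
```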

While some might claim that the Bonferroni correction guards against researchers making excessive, unwarranted claims, there are better ways of guarding against this issue. As Perneger (1998) suggests, if researchers simply describe what they did (“we ran 40 tests and 3 were significant, but just barely”), that can generally be enough to help readers figure out whether the results were likely to be the chance outcomes of a fishing expedition. The issue with this potential safeguard is that it would require researchers to accurately report all of their failed manipulations as well as their successful ones, which, for their own good, many don’t seem to do. One safeguard that Perneger (1998) does not explicitly mention, but which can get around that reporting issue, is the use of theory in interpreting the results. As most of the psychological literature currently stands, results are simply redescribed, rather than explained. In a world where observations stand in for explanations and theory, there is little way to separate the meaningful significant results from the meaningless ones, especially when publication bias generally keeps the failed experiments from making it into print.

What failures-to-replicate are you talking about?

So long as people continue to be impressed by statistically significant results, even when those results cannot be adequately explained or placed into some larger theoretical context, these statistical problems will persist. Applying statistical corrections will not solve, or likely even stem, these issues in the way psychological research is currently conducted. Even if such corrections were honestly and consistently applied, they would likely only change the way psychological research is conducted, with researchers turning to altogether less-efficient means in order to compensate for the reduced power (running one hypothesis per experiment, for instance). Rather than demanding a higher standard of evidence for fishing expeditions, one might instead focus on reducing the prevalence of these fishing expeditions in the first place.

References: Perneger TV (1998). What’s wrong with Bonferroni adjustments? BMJ (Clinical research ed.), 316 (7139), 1236-8 PMID: 9553006

What Should We Mean When We Say “Universal”?

My last post prompted a series of spirited discussions, each of which I found interesting for slightly different reasons. Over the course of one of those discussions, a commenter over at Psychology Today (H/T to Anthro_girl) referred me to an article entitled “Darwin in mind: New opportunities for evolutionary psychology” (Bolhuis et al. 2011). I haven’t yet decided whether this will turn into a series of posts on the ideas presented in that article, but there is one point in particular I would like to focus on for current purposes, and it’s entirely semantic in nature: what the term “universal” ought to mean. Attempts at clearing up semantic confusion tend to be unproductive in my experience, but I think it’s important to at least give these matters deeper consideration, as they can breed the appearance of disagreement despite two parties saying essentially the same thing (what has previously been called “violent agreement“, which I think represents the bulk of the ideas found in the article).

“You’re absolutely right and I respect your position, which is also my own!”

The first point I would like to mention is that I find Bolhuis et al’s (2011) wording quite peculiar: they seem to, at least at some points, contrast “flexibility” with universality. It sounds as if they are trying to contrast “genetic determinism” with flexibility instead, which seems to be a fairly common mistake people make when criticizing what they think evolutionary psychology assumes. Since that point is a fairly common misunderstanding, there’s little need to go over it again here, but it does give me an opportunity to think about what it means for a trait to be universal, using their example of sexual selection. The authors suggest that as a number of environmental cues (encounter rates, cost of parental investment, etc) change, so too should we expect mating strategies to change: change the inputs to a system, change the outputs. Now nothing about that analysis strikes me as particularly incorrect, but the implication that follows it does: specifically, a universal trait ought not to show much, if any, variation. Well, OK, they don’t really imply it so much as they flat-out say it:

“Arguably, the more flexible and variable the exhibited behaviour, the less explanatory power can be attributed to evolved structure in the mind.”  

Their analysis seems to misstep in regard to why those other variables might matter in determining variation. In order for variables like encounter rates or the likely costs of parental investment to matter in the first place, some other psychological mechanisms need to be sensitive to those inputs; that is, other evolved structures of the mind. If no evolved structures are sensitive to those inputs, or the structures which are sensitive to those variables aren’t hooked up to the structures that determine sexual behavior, there wouldn’t be any consistent effect of their presence or absence. Thus, finding variation in a trait, like sexual selectivity, doesn’t tell you much about whether the mechanisms involved in determining said behavior are universal or not. This does, however, raise an inevitable question about universality: do we need to expect a near-perfectly consistent expression of a trait in order to call it universal?

I would think not. This gets at a distinction highlighted by Norenzayan & Heine (2005) between various types of universality, specifically the “functional” and “accessible” varieties. The functional type refers to traits that use the same underlying mechanisms and solve the same kinds of problems (so if people in all cultures use hammers to beat in nails, hammers would be functionally universal); the accessible type is the same as the functional type, except that the trait is also used to pretty much the same degree across different cultures (all cultures would need to use their hammers approximately the same amount). In other words, different cultures might differ with respect to how sexually selective men tend to be relative to women but, so long as the same underlying mechanisms are at work in all people and are being used to solve the same kinds of problems, we can still feel pretty good about calling that difference in sexual selectivity a universal. While that’s all well and good, it does create a new problem: how much variation counts as “a lot of it”, or at least enough of it to warrant one classification or the other?

Fairly mundane for basketball, but maybe the most exciting soccer match ever.

Two examples should help clear this up. Let’s say you’re a fairly boring kind of researcher and find yourself examining finger length cross-culturally, trying to determine if finger length is universal. You get your ruler out, figure out a way to convince thousands of people the world over to let you examine their hands in dozens of different languages, and locate a nice grant to cover all your travel costs (along with the time you won’t be spending doing other things, like teaching or seeing your friends and family). After months of hard work, you’re finally able to report your findings: middle fingers are approximately 2.75 inches long on average and, between cultures, that mean varies from 2.65 inches to 3.25 inches. From this, are we to conclude that middle finger length is or is not universal?

The answer to this question is by no means straightforward; it seems to be more of an “I know it when I see it” kind of judgment call. There clearly is some variation, but is there enough variation to be meaningful? Would middle fingers be classified as a “functional” universal or an “accessible” universal (if such labels made sense in the case of fingers, that is)? While the finger might seem a bit strange as an example, it has a major benefit: it involves a trait for which it is rather easy to find a generally agreed-upon definition and form of measurement. Let’s say that you’re interested in looking at something a bit more difficult to assess, like the aforementioned sexual selectivity. Now all sorts of new questions will come creeping in: is your test the best way of assessing what you hope to assess? Is your method one that is likely to be interpreted in a consistent manner across cultures? The initial question still needs to be answered as well: how much variation is enough? If the difference in sexual selectivity between men and women is twice as large in culture A, relative to culture B, does that make it a functional or an accessible universal? What if that difference were only 1.5 times the size from culture to culture, or 3 times the size? From what I could gather, there really is no hard-and-fast rule for determining this, so the distinction might appear to be more arbitrary than real.

While these are all worthwhile questions to consider and difficult ones to answer, let’s assume that we were able to answer them in some form and found that sexual selectivity, while functionally universal, is not what we would consider an accessible universal (that is, there is a significant amount of variance – whatever that happens to be – in its size between cultures). While the variance you turned up is all well and good, what precisely is that variance a product of? There are many cognitive mechanisms that play a role in determining sexual selectivity, and our finding that sexual selectivity isn’t an accessible universal doesn’t answer the question of which components determining that trait are or are not accessible universals. Perhaps approach rate is an accessible universal, but the male/female ratio in a population is only a functional universal. This could, in particular cases, even lead us to some odd conclusions: if one of the mechanisms that helps determine sexual selectivity isn’t an accessible universal in that instance, it might well be considered an accessible universal in another instance where its output is used to determine some other trait. For instance, hypothetically, sex ratio might not be an accessible universal when it comes to sexual selectivity, but could be one when it comes to determining some propensity for violence. In other cases, sex ratio might be a functional or an accessible universal depending only on what test you’re using (on a Likert scale, it might only be functionally universal; in a singles bar, it might be accessibly universal).

Riveting as I’m sure you all find this, I’ll try and wrap it up.

So, as before, attempts to clear up semantic confusion have not necessarily been successful. Then again, if matters like this were simple, it’s doubtful that these kinds of disagreements would have cropped up in the first place. Hopefully, some of the issues involved in focusing on the outputs of mechanisms versus the mechanisms themselves have at least been highlighted. There are two final points to make about the idea of universality: first, if there were no underlying universal human nature, cross-cultural research would be all but impossible to conduct, as foreign cultures could not be understood in the first place. Secondly, that point is demonstrated well by what I would call cross-cultural cross-fostering. More precisely, as Norenzayan & Heine (2005) note, when infants from other cultures are raised in a new one (say, an Asian family immigrates to America), within two or three generations the children of that family will be all but indistinguishable from their “new” cultural peers. Without an underlying set of universal psychological mechanisms, it’s unclear precisely how such adaptation would be possible.

So yes, while WEIRD undergraduates might not give you a complete picture of human psychology, it doesn’t mean that they offer nothing, or even very little. The differences between cultures can hide the oceans of similarity that lurk right underneath the surface. It’s important to not lose sight of the forest for a few trees.

References: Bolhuis JJ, Brown GR, Richardson RC, & Laland KN (2011). Darwin in mind: new opportunities for evolutionary psychology. PLoS biology, 9 (7) PMID: 21811401

Norenzayan, A., & Heine, S. (2005). Psychological Universals: What Are They and How Can We Know? Psychological Bulletin, 131 (5), 763-784 DOI: 10.1037/0033-2909.131.5.763

 

Is It Only “Good” Science When It Confirms Your World View?

Most people, when critical of some finding or some field, try to do things like keep their biases hidden, opting instead to try and argue from a position of perceived intellectual neutrality. Kate Clancy, evidently, is not most people. In her recent post at Scientific American, she lays it all out there, right in the title: “5 Ways to Make Progress in Evolutionary Psychology: Smash, Not Match, Stereotypes“. So, there you have it: if evolutionary psychology wants to progress as a field, the practitioners ought to ensure we are getting results that Kate finds personally palatable; rather than run experiments, we ought to just ask her what she likes instead. I can only imagine how much time and money this will save us all when it comes to collecting data and getting through the review boards, never mind all that pesky theory development. Of course, her suggestion for progression in the field might not be useful when it comes to developing and testing hypotheses about subjects that aren’t (heavily) stereotyped but, in all fairness, her suggestion isn’t likely to be helpful in any case at all.

“Sure, it might not run, but at least it does that 100% of the time!”

Thankfully, Kate is willing to suggest five more specific criticisms of where she thinks evolutionary psychology stands to be improved. I’m sure that her criticisms here will be enlightening for all the evolutionary psychologists, as the alternative – that she’s proposing things which have already been repeatedly acknowledged and cautioned against by every major researcher in the field from its inception – would probably be pretty embarrassing for her. Sure, the critics of evolutionary psychology have been known to be ignorant of the field they’re criticizing as a general rule, but stereotypes aren’t always true. Hopefully Kate will, like any good scientist should, according to her, bust that stereotype, demonstrating both her fluency in understanding the theoretical commitments of the field and also pointing out their deficiencies. Since I’m a non-progressive evolutionary psychologist, this leaves me stuck with the grim task of confirming the stereotype that critics of my field tend to, in fact, know very little about it. Five rounds and one issue: the progression of evolutionary psychology as a field.

Round 1: [Evolutionary Psychologists] aren’t measuring what we think we are.

The point here is that evolutionary psychologists sometimes use proxy measures to measure other variables. So, for instance, if you want to study some theoretical construct like, say, “general intelligence”, you might use the results of some other test, like an IQ test, to draw inferences about the initial construct (people who score high on the IQ test have a lot of general intelligence). Now there’s nothing wrong with pointing out the fact that these proxy measures might not be tapping the underlying construct that you think they are, nor is it particularly problematic to point out that the underlying construct you think you’re measuring might not even exist. I’m fine with all that. Where I get lost is when I consider what any of it has to do with evolutionary psychology, specifically. Are evolutionary researchers worse at creating or using proxy measures? Does this point speak to the theoretical foundations of evolutionary psychology in any way? Since Kate provides no evidence to help answer the first question, I’ll assume that answer is probably a no (unless Kate is just stereotyping evolutionary researchers as poor in this department). Since proxy measures in no way at all speak to the theoretical commitments of the field itself, this entire point seems rather misguided. If she was talking about the field of psychology more generally, sure, this is a research pitfall to avoid; it’s just not one specific to my field. Round one goes to stereotype confirming evolutionary psychology.

Round 2: Undergrads only teach us about undergrads.

Kate’s criticism here comes in two parts: concerns for generalizability across samples and concerns that undergraduates can’t tell us anything useful about human psychology. Taking them in order, in psychology more generally there is a reliance on undergraduate samples, mainly because they’re cheap and convenient. The problem, though, is that the results of research on some of these undergraduates (typically those taking introductory psychology, no less) might not tell us much about people who differ from them in age, race, education, nationality, social life, etc. On that account, Kate is indeed correct: there might or might not be problems in generalizing from handfuls of undergraduates to the human race more generally. Again, however, this criticism runs directly into the same hurdle her last one did: it’s not specific to any of the theoretical commitments of evolutionary psychology. The problem here is one faced by psychology more generally and, if anything, the people who tend to realize the importance of cross-cultural as well as cross-species research are the evolutionary people, at least in my experience.

Her second point, however, is even worse. Kate seems to go from undergraduates might not be able to tell us much about the human species to undergraduates definitely do not tell us anything useful, or, as she puts it, are “about as far removed from the conditions in which we evolved as you can get“. What Kate fails to recognize is that, in the vast majority of respects, these undergraduates are very similar to people everywhere else: they form relationships, both sexual and social, they discriminate between potential mates, they reason, they morally condemn others, they defend against moral condemnation, they eat, they sleep, they reciprocate, they punish non-reciprocation, they learn language, and so on. Focusing on a few superficial differences between groups of people can, it seems, make one miss the oceans of similarity between them. Just because undergraduates aren’t living as hunter-gatherers, it does not follow that they have nothing useful to tell us about human psychology. Round two also goes to stereotype confirming evolutionary psychology.

Three more rounds to go. I’m sure you’ll turn it around…

Round 3: It’s not true that everything happens for a reason.

This charge is a classic one: there’s more to evolution than selection; there are also byproducts, drift, and mutation, and those evolutionary psychologists need to recognize this! In Kate’s example, for instance, evolutionary psychologists might make up adaptive stories about her choice of sock color. If that were the state of evolutionary psychology, we truly would be a field in need of scolding. Now I could point out that, a little over two decades ago, in what might be considered the foundational text of the field, the byproduct, drift, and mutation issues are all discussed, and every major figure in the field has, at many points, explicitly acknowledged the role of these forces (see here, specifically charge 2) and leave it at that. I could also point out, as I have done before, that hypotheses of drift don’t tend to yield very useful predictions. However, there are two additional points not to miss.

First, suggesting that psychological traits have adaptive functions is a step up from most non-evolutionary psychology, which tends to either posit vague functions (i.e. self-esteem or ego defense) or no functions at all. In this regard, evolutionary psychology is better, not worse, for it. Secondly, and more importantly, Kate gets a lot wrong in this section. Her initial point about how not all behaviors are the result of psychological adaptations misses the point entirely. Her behavior – choice of sock color, in this case – might not be the result of a specific module designed with the function of choosing sock color, but it would be a mistake to conclude from that that it wasn’t the result of other psychological adaptations. This would be as silly as my concluding that, because my body didn’t evolve to eat pop-tarts, my ability to digest them must not be the result of any physiological adaptations designed for digestion. On top of this misunderstanding, she then goes on to suggest that adaptations are heritable, by which she means some variation in them must be due to unique genetic factors. Under this logic, hands aren’t adaptations, because variation in having hands tends not to have a heritable genetic component (as pretty much all of us do have hands). Anyone familiar with adaptationist logic will tell you pretty much the opposite: many adaptations – like livers and hands – tend to show very low heritability, because selection tends to remove heritable variation from the population. Round three is over, and it’s not looking so good for stereotype disconfirmation.

Round 4: There is more than one way [to reproduce]

This point suggests that, apparently, evolutionary psychologists have yet to realize that there’s more than one successful strategy that people can adopt when it comes to reproduction. We apparently don’t realize that there are many possible routes to take, and variable degrees of taking them. This is not only false; it’s stunningly false. In fact, in the next paragraph, Kate mentions that, sure, evolutionary psychologists have done research on some of these different, competing strategies, but it apparently wasn’t up to her standards. If she prefers a more nuanced view than the one she (likely incorrectly) perceives in the people doing research concerning whether one is more of a cad or a dad, she’s more than welcome to it. The researchers in the field would, if her view is better or has something they missed, happily accept the contribution. Were she to offer her view, however, my guess is that she’ll end up publicly disagreeing with an opinion that no serious researcher holds; basically what she is doing here. However, to imply, as she does, that evolutionary researchers don’t appreciate and attempt to understand variation, is just plain stupid, especially right after she points out that evolutionary psychologists already do it.

Kate then seems to try and say something about homosexuality, but, I admit, her point there is lost on me. It might be something along the lines of, “people who identify as non-straight sometimes have children, so there’s nothing to see here”, but I’ll admit that I’m having a hard time following what she’s trying to say, much less what the relevance would be. Round four, unsurprisingly, isn’t going to Kate.

Round 5: Just because [it's currently adaptive, that doesn't mean it previously was]

The only point I really want to make here is noting that Kate gets the definition of the environment of evolutionary adaptedness (EEA) dead wrong. As anyone familiar with this concept, or the primer on the subject, can tell you, the EEA is not a time or a place (much less a savannah where everyone lived happily, as Kate seems to think it is), but the statistical aggregate of selective forces that shaped an adaptation. Thus, the EEA for language is different from the EEA for mate preference, which is different still from the EEA for hands. I suppose I could also mention that every evolutionary psychologist knows that people do some things today – like wear heels and use hormonal birth control – that they did not do during our evolutionary history but, at this point, that seems so blindingly obvious that it hardly seems worth repeating. Final round goes to stereotype confirmation.

“I don’t understand your position, yet remain convinced you’re wrong!”

Now I would love to be the good, progressive scientist that Kate wants me to be and disconfirm the stereotype that evolutionary psychology’s critics are ignorant of the field they’re criticizing, but it’s difficult to do so when she, like so many others, confirms that stereotype. Of the concerns she lists, a collective none of them deal with the theoretical foundations of the field, the first two have more to do with research methodology than evolutionary psychology specifically (and even those two don’t paint evolutionary psychologists in a particularly bad light), and the remaining ones get basic definitions wrong while simultaneously misrepresenting the researchers in the field as being unsophisticated. Now, in all fairness to Kate, she does mention that she’s talking about what she thinks bad evolutionary psychology is, but it’s not clear to me that she has a solid enough grasp of the field to be making those kinds of pronouncements in the first place (not to mention she wavers back and forth between using that qualifier and dropping it, writing about evolutionary psychology as a whole). I also really don’t appreciate her insinuation at the end that our field does politically-motivated research with the intent of keeping LGBT folks second-class citizens (which, by the way, we don’t; Tybur, Miller, & Gangestad, 2007), but at least she’s upfront about her biases, no matter how incorrect they happen to be.

References: Tybur, J., Miller, G., & Gangestad, S. (2007). Testing the controversy: An empirical examination of adaptationists’ attitudes towards politics and science. Human Nature, 18 (4), 313-328 DOI: 10.1007/s12110-007-9024-y

Should You Give A Damn About Your Reputation? (Part 2)

In my last post, I outlined a number of theoretical problems that stand in the way of reputation being a substantial force for maintaining cooperation via indirect reciprocity. Just to recap them quickly: (1) reputational information is unlikely to be spread much via direct observation, (2) when it is spread, it’s most likely to flow towards people who already have a substantial amount of direct interactions with the bearer of the reputation, and (3) reputational information, whether observed visually or transmitted through language, might often be inaccurate (due to manipulation or misperception) or non-diagnostic of an individual’s future behavior, either in general or towards the observer. Now all of this is not to say that reputational information would be entirely useless in predicting the future behavior of others; just that it seems to be an unlikely force for sustaining cooperation in reality, despite what some philosophical intuitions written in the language of math might say. My goal today is to try and rescue reputation as a force to be reckoned with.

In all fairness, I did only say that I would try

The first – and, I think, the most important – step is to fundamentally rethink what this reputational information is being used to assess. The most common current thinking about what third-party reputation information is being used to assess would seem to be the obvious: you want to know about the character of that third party, because that knowledge might predict how that third party will act towards you. On top of assuming away the above problems, then, one would also need to add in the assumption that interactions between you and the third party are relatively probable. Let’s return to the example of your friend getting punched by a stranger at a bar one night. Assume that you accurately observed all the relevant parts of the incident, and that the stranger’s behavior there was also predictive of how he would behave towards you (that is, he would attack you unprovoked). If you weren’t going to interact with that stranger anyway, regardless of whether you received that information or not, then while that information might be true, it’s not valuable.

But what if part of what people are trying to assess isn’t how that third party will behave towards them, but rather how that third party will behave towards their social allies? To clarify this point, let’s take a simple example with three people: A, B, and X. Persons A and B will represent you and your friend, respectively; person X will represent the third party. Now let’s say that A and B have a healthy, mutually-cooperative relationship. Both A and B benefit from this relationship and have extensive histories with each other. Persons B and X also have a relationship and extensive histories with one another, but this one is not nearly as cooperative; in fact, person X downright exploits B. Given that A and X are otherwise unlikely to ever interact with each other directly, why would A care about what X does?

The answer to this question – or at least part of that answer – involves A and X interacting indirectly. This requires the addition of a simple assumption, however: the benefits that person B delivers to person A are contingent on person B’s state. To make this a little less abstract, let’s just use money. Person B has $10 and can invest that money with A. For every dollar that B invests, both players end up making two. If B invests all his money, then, both he and person A end up with $20. In the next round, B has his $10 again but, before he gets a chance to invest it with A, person X comes along and robs B of half of it. Now person B only has $5 left to invest with A, netting them both $10. In essence, person X has now become person A’s problem, even though the two never interacted. All this assumption does, then, is make clear the fact that people are interacting in a broader social context, rather than in a series of prisoner’s dilemmas where your payoff only depends on your own, personal interactions.
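A minimal sketch of that example, with the function name being my own shorthand rather than anything from the literature, might look like this:

```python
# Person B invests whatever X leaves him with; each invested dollar returns
# two dollars to each of A and B, so X's theft lowers A's payoff as well.
def joint_payoff(b_funds: int, stolen_by_x: int) -> int:
    invested = max(b_funds - stolen_by_x, 0)
    return 2 * invested  # payoff to each of A and B

print(joint_payoff(10, 0))  # 20 -> X stays out of it
print(joint_payoff(10, 5))  # 10 -> X robs B of half, and A's payoff falls too
```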

Now if only there was a good metaphor for that idea…

With the addition of this assumption, we’re able to circumvent many of the initial problems that reputational models faced. Taking them in reverse order, we are able to get around the direct-interaction issue, since your social payoffs now co-vary to some extent with your friends’, making direct interaction no longer a necessary condition. It also allows us to circumvent the diagnosticity issue: there’s less of a concern about how a third party might interact with you differently than with your friend, because it’s the third party’s behavior towards your friend that you’re trying to alter. It also, to some extent, allows us to get around the accuracy issue: if your friend was attacked and lies to you about why, it matters less, as one of your primary concerns is simply making sure that your friend isn’t hurt, regardless of whether your friend was in the right or not. This takes some of the sting out of the issues of misperception or misinformation.

That said, it does not take all of the sting out. In the previous example, person A has a vested interest in making sure B is not exploited, which gives person B some leverage. Let’s alter the example a bit and say that person B can only invest $5 with person A during any given round; in that case, if X steals $5 from B’s initial $10, it wouldn’t affect person A at all. Since person B would rather not be exploited, they might wish to enlist A’s help, but find person A less than eager to pitch in. This leaves person B with three options: first, B might just suck it up and suffer the exploitation. Alternatively, B might consider withholding cooperation from A until A is willing to help out, similar to B going on strike. If person B opts for this route, then all concerns for accuracy are gone; person A helping out is merely a precondition of maintaining B’s cooperation. This strategy is risky for B, however, as it might look like exploitation from A’s point of view. As this makes B a costlier interaction partner, person A might consider taking his business elsewhere, so to speak. This would leave B still exploited and out a cooperative partner.
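Extending the earlier sketch under this altered assumption (again, purely illustrative numbers of my own) shows why A might initially shrug:

```python
# With a $5 cap on what B can invest per round, a $5 theft from B's initial
# $10 never shows up in A's payoff; only larger thefts do.
def joint_payoff_capped(b_funds: int, stolen_by_x: int, invest_cap: int = 5) -> int:
    invested = min(max(b_funds - stolen_by_x, 0), invest_cap)
    return 2 * invested  # payoff to each of A and B

print(joint_payoff_capped(10, 0))  # 10 -> A's payoff with no theft
print(joint_payoff_capped(10, 5))  # 10 -> identical; X's theft is invisible to A
print(joint_payoff_capped(10, 8))  # 4  -> only now does A feel the loss
```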

There is another potential way around the issue, though: person B might attempt to persuade A that person X really was interfering in such a way that made B unable to invest; that is, person B might try to convince A that X had really stolen $8 instead of $5. If person B is successful in this task, it might still make him look like a costlier social investment, but not because he is himself attempting to exploit A. Person B looks like he really does want to cooperate, but is being prevented from doing so by another. In other words, B looks more like a true friend to A, rather than just a fair-weather one or an exploiter (Tooby & Cosmides, 1996). In this case, something like manifesting depression might work well for B to recruit support to deal with X (Hagen, 2003). Even if such behavior doesn’t directly stop X from interfering in B’s life, though, it might also prompt A to increase their investment in B to help maintain the relationship despite those losses. Either way, whether through avoiding costs or gaining benefits, B can leverage their value with A in these interactions and maintain their reputation as a cooperator.

“I’ll only show back up to work after you help me kill my cheating wife”

Finally, let’s step out of the simple interaction and into the bigger picture. I also mentioned last time that, sometimes, cooperating with one individual necessitates defecting on another. If persons A and B ally against person X, and person Y is cooperating with X, person Y may now also incur some of the punishment A and B direct at X, either directly or indirectly. Again, to make this less abstract, consider that you recently found out your friend holds a very unpopular social opinion (say, that women shouldn’t be allowed to vote) that you do not. Other people’s scorn for your friend now makes your association with him all the more harmful for you: by benefiting him, you can, by proxy, be seen to be helping him promote his views, or be inferred to hold those same views yourself. In either case, being his friend has now become that much costlier, and the value of the relationship might need to be reassessed in that light, even if his views might otherwise have little impact on your relationship directly. Knowing that someone has a good or bad reputation more generally can be seen as useful information in this light, as it might tell you all sorts of things about how costly an association with them might eventually prove to be.

References: Hagen, E.H. (2003). The bargaining model of depression. In: Genetic and Cultural Evolution of Cooperation, P. Hammerstein (ed.). MIT Press, 95-123

Tooby, J., & Cosmides, L. (1996). Friendship and the banker’s paradox: Other pathways to the evolution of adaptations for altruism. Proceedings of the British Academy (88), 119-143

Should You Give A Damn About Your Reputation? (Part 1)

According to Nowak (2012) and his endlessly-helpful mathematical models, once one assumes that cooperation can be sustained via one’s reputation, one ends up with the conclusion that cooperation can, indeed, be sustained (solely) by reputation, even if the same two individuals in a population never interact with each other more than once. As evidenced by the popular Joan Jett song, Bad Reputation, however, one can conclude there’s likely something profoundly incomplete about this picture: why would Joan give her reputation the finger in this now-famous rock anthem, and why would millions of fans be eagerly singing along, if reputation were that powerful a force? The answer to this question will involve digging deeper into the assumptions that went into Nowak’s model and finding where they have gone wrong. In this case, some of the assumptions of Nowak’s model are a poor fit to reality not only in terms of the ones he makes but, perhaps more importantly, also in terms of the assumptions he doesn’t make.

Unfortunately, my reply to some current thinking about reputation can’t be expressed as succinctly.

The first thing worth pointing out here is probably that Joan Jett was wrong, even if she wasn’t lying: she most certainly did give a damn about her reputation. In fact, some part of her gave so much of a damn about her reputation that she ended up writing a song about it, despite that not being her conscious intent. More precisely, if she didn’t care about her reputation on any level, advertising that fact to others would be rather strange; it’s not as if that advertisement would provide Joan herself with any additional information. However, if that advertisement had an effect on the way that other people viewed her – updating her reputation among the listeners – her penning of the lyrics is immediately more understandable. She wants other people to think she doesn’t care about her (bad) reputation; she’s not trying to remind herself. There are a number of key insights that come from this understanding, many of which speak to the assumptions of these models of cooperation.

The initial point is that Joan needed to advertise her reputation. Reputations do not follow their owners around like a badge; they’re not the type of thing that can be accurately assessed on sight. Accordingly, if one does not have access to information about someone’s reputation, then their reputation, good or bad, would be entirely ineffective at deciding how to treat that someone. This problem is clearly not unsolvable, though. According to Sigmund (2012), the simple way around this problem involves direct observation: if I observe a person being mean to you, I can avoid that person without having to suffer the costs of their meanness firsthand. Simple enough, sure, but there are many problems with this suggestion too, some of which are more obvious than others. The first of these problems would be that a substantial amount – if not the vast majority – of (informative and relevant) human interactions are not visible to many people beyond those parties who are already directly involved. Affairs can be hidden, thieves can go undetected, and promises can be made in private, among other things (like, say, browsing histories being deleted…). Now that concern alone would not stop reputations derived from indirect information from being useful, but it would weaken its influence substantially if few people ever have access to it.

There’s a second, related concern that weakens it further, though: provided an interaction is observed by other parties, those most likely to be doing the observing in the first place are the people who probably have already directly interacted with one or more of the others they’re observing; a natural result of people not spending their time around each other at random. People only have a limited amount of time to spend around others and, since one can’t be in two places at once, you naturally end up spending a good deal of that time with friends (for a variety of good reasons that we need not get into now). So, if the people who can make the most use of reputational information (strangers) are the least likely to be observing anything that will tell them much about it, this would make indirect reciprocity a rather weak force. Indeed, as I’ve covered previously, research has found that people can make use of indirectly-acquired reputation information, and do make use of it when that’s all they have. Once they have information from direct interactions, however, the indirect variety of reputational information ceases to have an effect on their behavior. It’s your local (in the social sense; not necessarily the physical-distance sense) reputation that’s most valuable. Your reputation more globally – among those you’re unlikely to ever interact much with – would be far less important.

See how you don’t care about anyone pictured here? The feeling’s mutual.

The problems don’t end there, though; not by a long shot. On top of information not being available, and not being important, there’s also the untouched matter concerning whether the information is even accurate. Potential inaccuracies can come in three forms: passive misunderstandings, active misinformation, and diagnosticity. Taking these in order, consider a case where you see your friend get punched in the nose from across the room by a stranger. From this information, you might decide that it’s best to steer clear of that stranger. This seems like a smart move, except for what you didn’t see: a moment prior your friend, being a bit drunk, had told the stranger’s wife to leave her husband at the bar and come home with him instead. So, what does this example show us? That even if you’ve directly observed an interaction, you probably didn’t observe one or more previous interactions that led up to the current one, and those might well have mattered. To put this in the language of game theorists, did you just witness a cooperator punishing a defector, a defector harming a cooperator, or some other combination? From your lone observation, there’s no sure way to tell.

But what if your friend told you that the other person had attacked them without provocation? Most reputational information would seem to spread this way, given that most human interaction is not observed by most other people. We could call this the “taking someone else’s word for it” model of reputation. The problems here should be clear to anyone who has ever had friends: it’s possible your friend misinterpreted the situation, or that your friend had some ulterior motive for actively manipulating your perception of that person’s reputation. To again rephrase this in the game theorist’s language, if cooperators can be manipulated into punishing other cooperators, either through misperception or misinformation, this throws another sizable wrench into the gears of the reputation model. If one’s reputation can be easily manipulated, cooperation becomes, to some extent, a costlier strategy: cooperators can fail to reap some of cooperation’s benefits, while defectors can offset some of defection’s costs. Talk is cheap, and indirect reciprocity models seem to require a lot of it.

This brings us to the final accuracy point: diagnosticity. Let’s say that, hypothetically, the stranger did attack your friend without provocation, and this was observed accurately. What have you learned from this encounter? Perhaps you might infer that the stranger is likely to be an all-around nasty person, but there’s no way to tell precisely how predictive that incident is of the stranger’s later behavior, either towards your friend or towards you. Just because the stranger might make a bad social asset for someone else, it does not mean they’ll make a bad social asset for you, in much the same way that my not giving a homeless person change doesn’t mean my friends can’t count on my assistance when in need. Further, having a “bad” reputation among one group can even result in my having a good relationship with a different group; the enemy of my enemy is my friend, as the saying goes. In fact, that last point is probably what Joan Jett was advertising in her iconic song: not that she has a bad reputation with everyone, just that she has a bad reputation among those other people. The video for her song would lead us to believe those other people are also, more or less, without morals, only taking a liking to Joan when she has something to offer them.

The type of people who really don’t give a damn about their reputation.

While this is not an exhaustive list of ways in which many current assumptions of reputation models are lacking (there are, for instance, also cases where cooperating with one individual necessitates defecting on another), it still poses many severe problems that need to be overcome. Just to recap: information flow is limited, that flow is generally biased away from the people who need it the most, there’s no guarantee of the accuracy of that information if it is received, and that information, even if received and accurate, is not necessarily predictive of future behavior. The information might not exist, might not be accurate, or might not matter. Despite these shortcomings, however, what other people think of you does seem to matter; it’s just that the reasons it matters need to be, in some respects, fundamentally rethought. Those reasons will be the subject of the next post.

References: Nowak, M. (2012). Evolving cooperation. Journal of Theoretical Biology, 299, 1-8.

Sigmund, K. (2012). Moral assessment in indirect reciprocity. Journal of Theoretical Biology, 299, 25-30 DOI: 10.1016/j.jtbi.2011.03.024