Let’s say you find yourself in charge of a group of children. Since you’re a relatively-average psychologist, you have a relatively strange hypothesis you want to test: you want to see whether wearing a red shirt will make children better at dodge ball. You happen to think that it will. I say this hypothesis is strange because you derived it from, basically, nothing; it’s just a hunch. Little more than a “wouldn’t it be cool if it were true?” idea. In any case, you want to run a test of your hypothesis.

You begin by lining the students up, then you walk past them and count aloud: “1, 2, 1, 2, 1…”. All the children with a “1” go and put on a red shirt and are on a team together; all the children with a “2” go and pick a new shirt to put on from a pile of non-red shirts. They serve as your control group. The two teams then play each other in a round of dodge ball. The team wearing the red shirts comes out victorious. In fact, they win by a substantial margin. This must mean that wearing the red shirts made the students better at dodge ball, right? Well, since you’re a relatively-average psychologist, you would probably conclude that, yes, the red shirts clearly have some effect. Sure, your conclusion is, at the very least, hasty and likely wrong, but you are only an average psychologist: we can’t set the bar too high.
A critical evaluation of the research could note that just because the children were randomly assigned to groups, it doesn’t mean that both groups were equally matched to begin with. If the children in the red shirt group were just better beforehand, that could drive the effect. It’s also quite possible that the red shirts had very little to do with which team ended up winning. The pressing question here would seem to be: why would we expect red shirts to have any effect? It’s not as if a red shirt makes a child quicker, stronger, or better able to catch or throw than before; at least not for any theoretical reason that comes to mind. Again, this hypothesis is a strange one when you consider its basis. Let’s assume, however, that wearing red shirts actually did make children perform better, because it helped children tap into some preexisting skill set. This raises the somewhat obvious question: why would children require a red shirt to tap into that previously-untapped resource? If being good at the game is important socially – after all, you don’t want to get teased by the other children for your poor performance – and children could do better, it seems, well, odd that they would ever do worse. One would need to posit some kind of trade-off effected by shirt color, which sounds like kind of an odd variable for some cognitive mechanism to take into account.
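The baseline-matching worry is easy to demonstrate with a quick simulation of the counting-off scenario. This is just a sketch with made-up numbers (a normally-distributed “skill” score and an arbitrary gap of 5 points as “noticeable”), not a model of any actual dodge ball data:

```python
import random
import statistics

random.seed(1)

def assign_and_compare(n_children=20):
    """Give each child a random 'skill' score, assign teams by
    alternate counting ('1, 2, 1, 2...'), and return the gap in
    mean skill between the two teams."""
    skills = [random.gauss(100, 15) for _ in range(n_children)]
    red = skills[0::2]      # the children who counted "1"
    control = skills[1::2]  # the children who counted "2"
    return statistics.mean(red) - statistics.mean(control)

# Across many hypothetical classrooms, one team frequently starts out
# noticeably better than the other purely by chance.
gaps = [abs(assign_and_compare()) for _ in range(1000)]
big_gaps = sum(g > 5 for g in gaps) / len(gaps)
```

With groups this small, a sizable fraction of classrooms show a pre-existing skill gap bigger than a third of a standard deviation before anyone puts on a shirt; randomization only guarantees balance on average, across many studies, not within any single one.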
Nevertheless, like any psychologist hoping to further their academic career, you publish your results in the Journal of Inexplicable Findings. The “Red Shirt Effect” becomes something of a classic, reported in Intro to Psychology textbooks. Published reports start cropping up from different people who have had other children wear red shirts and perform various athletic tasks relatively better. While none of these papers are direct replications of your initial study, they also have children wearing red shirts outperforming their peers, so they get labeled “conceptual replications”. After all, since the concepts seem to be in order, they’re likely tapping the same underlying mechanism. Of course, these replications still don’t deal with the theoretical concerns discussed previously, so some other researchers begin to get somewhat suspicious about whether the “Red Shirt Effect” is all it’s made out to be. Part of these concerns is based on an odd facet of how publication works: positive results – those that find effects – tend to be favored for publication over studies that don’t find effects. This means that there may well be other researchers who attempted to make use of the Red Shirt Effect, failed to find anything and, because of their null or contradictory results, also failed to publish anything.
Eventually, word reaches you of a research team that attempted to replicate the Red Shirt Effect a dozen times in the same paper and failed to find anything. More troubling still, for your academic career, anyway, their results saw publication. Naturally, you feel pretty upset by this. Clearly the research team was doing something wrong: maybe they didn’t use the proper shade of red shirt; maybe they used a different brand of dodge balls in their study; maybe the experimenters behaved in some subtle way that was enough to counteract the Red Shirt Effect entirely. Then again, maybe the journal the results were published in doesn’t have good enough standards for their reviewers. Something must be wrong here; you know as much because your Red Shirt Effect was conceptually replicated many times by other labs. The Red Shirt Effect just must be there; you’ve been counting the hits in the literature faithfully. Of course, you also haven’t been counting the misses which were never published. Further, you were counting the slightly-altered hits as “conceptual replications” but not the slightly-altered misses as “conceptual disconfirmations”. You still haven’t managed to explain, theoretically, why we should expect to see the Red Shirt Effect anyway, either. Then again, why would any of that matter to you? Part of your reputation is at stake.
In somewhat-related news, there have been some salty comments from social psychologist Ap Dijksterhuis aimed at a recent study (and coverage of the study, and the journal it was published in) concerning nine failures to replicate some work Ap did on intelligence priming, as well as work done by others on the same effect (Shanks et al., 2013). The initial idea of intelligence priming, apparently, was that priming subjects with professor-related cues made them better at answering multiple-choice, general-knowledge questions, whereas priming subjects with soccer-hooligan-related cues made them perform worse (and no; I’m not kidding. It really was that odd). Intelligence itself is a rather fuzzy concept, and it seems that priming people to think about professors – people typically considered higher in some domains of that fuzzy concept – is a poor way to make them better at multiple-choice questions. As far as I can tell, there was no theory surrounding why primes should work that way or, more precisely, why people should lack access to such knowledge in the absence of some vague, unrelated prime. At the very least, none was discussed.
It wasn’t just that the failures to replicate reported by Shanks et al. (2013) were non-significant but in the right direction, mind you; they often seemed to go in the wrong direction. Shanks et al. (2013) even looked for demand characteristics explicitly, but couldn’t find them either. Nine consecutive failures are surprising in light of the fact that the intelligence priming effects were previously reported as being rather large. It seems rather peculiar that large effects can disappear so quickly; they should have had a very good chance of replicating, were they real. Shanks et al. (2013) rightly suggest that many of the confirmatory studies of intelligence priming, then, might represent publication bias, researcher degrees of freedom in analyzing data, or both. Thankfully, the salty comments of Ap reminded readers that: “the finding that one can prime intelligence has been obtained in 25 studies in 10 different labs”. Sure; and if a batter in the MLB counted only the times he hit the ball while at bat, his batting average would be a staggering 1.000. Counting only the hits and not the misses will surely make it seem like hits are common, no matter how rare they are. Perhaps Ap should have thought about professors more before writing his comments (though I’m told thinking about primes ruins them as well, so maybe he’s out of luck).
I would like to add that there were similarly salty comments leveled by another social psychologist, John Bargh, when his work on priming elderly stereotypes to slow walking speed failed to replicate (though John has since deleted his posts). The two cases bear some striking similarities: claims of other “conceptual replications”, but no claims of “conceptual failures to replicate”; personal attacks on the credibility of the journal publishing the results; personal attacks on the researchers who failed to replicate the finding; even personal attacks on the people reporting about the failures to replicate. More interestingly, John also suggested that the priming effect was apparently so fragile that even minor deviations from the initial experiment could throw the entire thing into disarray. Now it seems to me that if your “effect” is so fleeting that even minor tweaks to the research protocol can cancel it out completely, then the effect, even were it real, isn’t much of an important one. That’s precisely the kind of shooting-yourself-in-the-foot a “smarter” person might have considered leaving out of their otherwise persuasive tantrum.
I would also add, for the sake of completeness, that priming effects of stereotype threat haven’t replicated well either. Oh, and the effects of depressive realism don’t show much promise. This brings me to my final point on the matter: given the risks posed by researcher degrees of freedom and publication bias, it would be wise to enact better safeguards against this kind of problem. Replications, however, only go so far: they require researchers willing to do them (and they can be low-reward, discouraged activities) and journals willing to publish them with sufficient frequency (which many do not, currently). Accordingly, I feel replications can only take us part of the way in fixing the problem. A simple – though only partial – remedy for the issue is, I feel, to require the inclusion of actual theory in psychological research; evolutionary theory in particular. While this does not stop false positives from being published, it at least allows other researchers and reviewers to more thoroughly assess the claims being made in papers. Poor assumptions can then be better weeded out, and better research projects crafted to address them directly. Further, updating old theory and providing new material is a personally-valuable enterprise. Without theory, all you have is a grab bag of findings, some positive, some negative, and no idea what to do with them or how they are to be understood. Without theory, things like intelligence priming – or Red Shirt Effects – sound valid.
References: Shanks, D., Newell, B., Lee, E., Balakrishnan, D., Ekelund, L., Cenac, Z., Kavvadia, F., & Moore, C. (2013). Priming intelligent behavior: An elusive phenomenon. PLoS ONE, 8(4). DOI: 10.1371/journal.pone.0056515