Smart People Are Good At Being Dumb In Politics

While I do my best to keep politics out of my life – usually by selectively blocking people who engage in too much proselytizing via link spamming on social media – I will never truly be rid of it. I do my best to cull my exposure to politics not because I am lazy and looking to stay uninformed about the issues, but rather because I don’t particularly trust most of the sources of information I receive to leave me better informed than when I began. To put the idea in a simple phrase: people are biased. In these socially-contentious domains, we tend to look for evidence that supports our favored conclusions first, and only stop to evaluate it later, if we do at all. If I can’t trust the conclusions of such pieces to be accurate, I would rather not waste my time with them, as I’m not looking to impress a particular partisan group with my agreeable beliefs. Naturally, since I find myself uninterested in politics – perhaps even going so far as to say I’m biased against such matters – this should mean I am more likely to approve of research concluding that people engaged with political issues aren’t very good at reaching empirically-correct conclusions. Speaking of which…

“Holy coincidences, Batman; let’s hit them with some knowledge!”

A recent paper by Kahan et al (2013) examined how people’s political beliefs affected their ability to reach empirically-sound conclusions in the face of relevant evidence. Specifically, the authors were testing two competing theories for explaining why people tend to get certain issues wrong. The first of these is referred to as the Science Comprehension Thesis (SCT), which proposes that people tend to reach different answers to questions like, “Is global warming affected by human behavior?” or “Are GMOs safe to eat?” simply because they lack sufficient education on such topics or possess poor reasoning skills. Put in more blunt terms, we might (and frequently do) say that people get the answers to such questions wrong because they’re stupid or ignorant. The competing theory the authors propose is called the Identity-Protective Cognition Thesis (ICT), which suggests that these debates are driven more by people’s desire to not be ostracized by their in-group, effectively shutting off their ability to reach accurate conclusions. Again, putting this in more blunt terms, we might (and I did) say that people get the answers to such questions wrong because they’re biased. They have a conclusion they want to support first, and evidence is only useful inasmuch as it helps them do that.

Before getting to the matter of politics, though, let’s first consider skin cream. Sometimes people develop unpleasant rashes on their skin and, when that happens, others will create a variety of creams and lotions designed to heal the rash and relieve its associated discomfort. However, we want to know whether these treatments actually work; after all, some rashes will go away on their own, and some rashes might even get worse following treatment. So we do what any good scientist does: we conduct an experiment. Some people will use the cream while others will not, and we track who gets better and who gets worse. Imagine, then, that you are faced with the following results from your research: of the people who did use the skin cream, 223 of them got better, while 75 got worse; of the people who did not use the cream, 107 got better, while 21 got worse. From this, can we conclude that the skin cream works?

A little bit of division tells us that, among those who used the cream, about 3 people got better for each 1 who got worse; among those not using the cream, roughly 5 people got better for each 1 who got worse. Comparing the two ratios, we can conclude that the skin cream is not effective; if anything, it’s having precisely the opposite effect. If you haven’t guessed by now, this is precisely the problem that Kahan et al (2013) posed to 1,111 US adults (though they also flipped the numbers between conditions so that sometimes the treatment was effective). As it turns out, this problem is by no means easy for a lot of people to solve: only about half the sample was able to reach the correct conclusion. As one might expect, though, the participants’ numeracy – their ability to use quantitative skills – did predict their ability to get the right answer: the highly-numerate participants got the answer right about 75% of the time; those in the low-to-moderate range of numeracy ability got it right only about 50% of the time.
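
For readers who want to check the arithmetic, here is a minimal sketch of the ratio comparison above (the numbers come from the example; the function and variable names are my own):

```python
# Outcomes from the hypothetical skin-cream experiment described above.
# Format: (number who got better, number who got worse)
cream = (223, 75)
no_cream = (107, 21)

def improvement_ratio(better, worse):
    """How many people improved for each person who worsened."""
    return better / worse

print(improvement_ratio(*cream))     # ~2.97: about 3 improved per 1 worsened
print(improvement_ratio(*no_cream))  # ~5.10: about 5 improved per 1 worsened
# The untreated group improved at the higher rate, so the cream appears
# ineffective; if anything, counterproductive.
```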

“I need it for a rash. That’s my story and I’m sticking to it”

Kahan et al (2013) then switched up the story. Instead of participants reading about a skin cream, they instead read about gun legislation that banned citizens from carrying handguns concealed in public; instead of looking at whether a rash went away, they examined whether crime in the cities that enacted such bans went up or down, relative to the cities that did not. Beyond the change in variables, all the numbers remained exactly the same. Participants were asked whether the gun ban was effective at reducing crime. Again, people were not particularly good at solving this problem either – as we would expect – but an interesting result emerged: the most numerate subjects were now only solving the problem correctly 57% of the time, as compared with 75% in the skin-cream group. The change of topic seemed to make people’s ability to reason about these numbers quite a bit worse.

Breaking the data down by political affiliation made it clear what was going on. The more numerate subjects were, again, more likely to get the answer to the question correct, but only when it accorded with their political views. The most numerate liberal Democrats, for instance, got the answer right when the data showed that concealed carry bans resulted in decreased crime; when crime increased, however, they were not appreciably better at reaching that conclusion than the less-numerate Democrats. This pattern was reversed in the case of conservative Republicans: when the concealed carry bans resulted in increased crime, the more numerate ones got the question right more often; when the ban resulted in decreased crime, performance plummeted.

More interestingly still, the gap in performance was greatest among the more-numerate subjects. Among the highly-numerate individuals, the average difference in getting the right answer between cases in which the conclusion of the experiment did or did not support their view was about 45%; among the less-numerate, it was only 20%. Worth noting is that these differences did not appear when people were thinking about the non-partisan skin-cream issue. In essence, smart people were either not using their numeracy skills in cases where doing so meant drawing unpalatable political conclusions, or they were using them and subsequently discarding the “bad” results. This is an empirical validation of my complaints about people ignoring base rates when discussing Islamic terrorism. Highly-intelligent people will often get the answers to these questions wrong because of their partisan biases, not because of their lack of education. They ought to know better – indeed, they do know better – but that knowledge isn’t doing them much good when it comes to being right in cases where that means alienating members of their social group.

That future generations will appreciate your accuracy is only a cold comfort

At the risk of repeating this point, numeracy seemed to increase political polarization, not reduce it. These abilities are being used more to metaphorically high-five in-group members than to be accurate. Kahan et al (2013) try to explain this effect in two ways, one of which I think is more plausible than the other. On the implausible front, the authors suggest that using these numeracy abilities is a taxing, high-effort activity that people try to avoid whenever possible. As such, people with numeracy ability would only engage in effortful reasoning when their initial beliefs were threatened by some portion of the data. I find this idea strange because I don’t think that – metabolically – these kinds of tasks are particularly costly or effortful. On the more plausible front, Kahan et al (2013) suggest that these conclusions have a certain kind of rationality behind them: if drawing an unpalatable conclusion would alienate important social relations on which one’s well-being depends, then an immediate cost/benefit analysis can favor being wrong. If you are wrong about whether GMOs are harmful, the immediate effects on you are likely quite small (unless you’re starving); on the other hand, if your opinion about them puts off your friends, the immediate social effects are quite large.

In other words, I think people sometimes interpret data in incorrect ways to suit their social goals, but I don’t think they avoid interpreting it properly because doing so is difficult.

References: Kahan, D., Peters, E., Dawson, E., & Slovic, P. (2013). Motivated numeracy and enlightened self-government. Yale Law School, Public Law Working Paper No. 307.

Men Are Better At Selling Things On eBay

When it comes to gender politics, never take the title of the piece at face value; or the conclusions for that matter.

In my last post, I mentioned how I find some phrases and topics act as red flags regarding the quality of research one is liable to encounter. Today, the topic is gender equality – specifically some perceived (and, indeed, some rather peculiar) discrimination against women – which is an area not renowned for its clear thinking or reasonable conclusions. As usual, the news articles covering this piece of research made an outlandish claim that lacks even remote face validity. In this case, the research in question concludes that people, collectively, try to figure out the gender of the people selling things on eBay so as to pay women substantially less than men for similar goods. Those who found such a conclusion agreeable to their personal biases spread it to others across social media as yet another example of how the world is an evil, unfair place. So here I am again, taking a couple of recreational shots at some nonsense story of sexism.

Just two more of these posts and I get a free smoothie

The piece in question today is an article from Kricheli-Katz & Regev (2016) that examined data from about 1.1 million eBay auctions. The stated goals of the authors involve examining gender inequality in online product markets, so at least we can be sure they’re going into this without an agenda. Kricheli-Katz & Regev (2016) open their piece by talking about how gender inequality is a big problem, launching their discussion almost immediately with a rehashing of that misleading 20% pay gap statistic that’s been floating around forever. As that claim has been dissected so many times at this point, there’s not much more to say about it other than (a) when controlling for important factors, it drops to single digits, and (b) when you see it, it’s time to buckle in for what will surely be an unpleasant ideological experience. Thankfully, the paper does not disappoint in that regard, promptly suggesting that women are discriminated against in online markets like eBay.

So let’s start by considering what the authors did and what they found. First, Kricheli-Katz & Regev (2016) present us with their analysis of eBay data. They restricted their research to auctions only, where sellers post an item and any subsequent interaction occurs between bidders alone, rather than between bidders and sellers. On average, they found that the women had about 10 fewer months of selling experience than the men, though the accounts of both sexes had existed for over nine years, and women also had very slightly better reputations, as measured by customer feedback. Women also tended to set slightly higher initial prices than men for their auctions, controlling for the product being sold. Likely as a result, women tended to receive slightly fewer bids on their items, and ultimately less money per sale when their auctions ended.

However, when the interaction between sex and product type (new or used) was examined, the headline-grabbing result appeared: while women netted a mere 3% less on average than men for used products, they netted a more-impressive 20% less for new products (where, naturally, one expects the products to be identical). Kricheli-Katz & Regev (2016) claim that the discrepancy in the new-product case is due to beliefs about gender. Whatever these unspecified beliefs are, they cause people to pay women about 20% less for the same item. Taking that idea at face value for a moment, why does the gap all but evaporate in the used category of sales? The authors attribute that lack of a real difference to an increased trust people have in women’s descriptions of the condition of their products. So men trust women more when it comes to used goods, but pay them less for new ones, when trust is less relevant. Both these conclusions, as far as I can see from the paper, have been pulled directly out of thin air. There is literally no evidence presented to support them: no data, no citations, no anything.

I might have found the source of their interpretations

By this point, anyone familiar with how eBay works is likely a bit confused. After all, the sex of the seller is not readily apparent in the vast majority of listings. Without that crucial piece of information, people would have a very difficult time discriminating on the basis of it. Never fear, though; Kricheli-Katz & Regev (2016) report the results of a second study in which they pulled 100 random sellers from their sample and asked about 400 participants to try to determine the sex of the sellers in question. Each participant offered guesses about five profiles, for a total of roughly 2,000 attempts. About 55% of the time, participants got the sex right; 9% of the time they got it wrong; and the remaining 36% of the time, they said they didn’t know (which, for the purposes of identification, also counts as a failure). In short, people couldn’t reliably determine the sex of the seller about half the time. The authors do mention that the guesses got better as participants viewed more items the seller had posted, however.

So here’s the story they’re trying to sell: When people log onto eBay, they seek out a product they’re looking to buy. When they find a seller listing the product, they examine the seller’s username, the listing in question, and the other listings in the seller’s store to try to discern the seller’s sex. Buyers subsequently lower their willingness to pay for an item by quite a bit if they see it is being sold by a woman, but only if it’s new. In fact, since women made 20% less, the actual reduction in willingness to pay must be larger than that, as sex can only be reliably determined about half the time even when people are explicitly trying. Buyers do all this despite trusting female sellers more. Also, I do want to emphasize the word they, as this would need to be a fairly collective action. If it weren’t a fairly universal response among buyers, the prices of female-sold items would eventually even out with the male prices, as those who discriminated less against women would be drawn toward the cheaper listings and bid them back up.
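
To make that middle step concrete, here is a back-of-envelope sketch of my own (not a calculation from the paper): it assumes, for simplicity, that buyers who cannot identify a seller as female pay the full price, and that discrimination alone drives the observed gap.

```python
# Back-of-envelope: how large would the per-buyer discount need to be to
# produce a 20% average price gap on new items if only ~55% of buyers can
# correctly identify a seller as female? (Simplifying assumptions: buyers
# who can't tell pay the full price; discrimination alone drives the gap.)
observed_gap = 0.20   # average new-product price gap reported in the paper
p_identified = 0.55   # share of guesses that correctly identified the sex

implied_discount = observed_gap / p_identified
print(f"Implied discount among identifying buyers: {implied_discount:.0%}")
# ~36% -- considerably larger than the 20% headline figure.
```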

Not only do I not buy this story – not even a little – but I wouldn’t pay the authors less for it because they happen to be women if I were looking to make a purchase. While people might be able to determine the sex of a seller on eBay sometimes, when they’re specifically asked to do so, that does not mean people engage in this sort of behavior naturally.

Finally, Kricheli-Katz & Regev (2016) report the results of a third study, asking 100 participants how much they valued a $100 gift card being sold by either an Alison or a Brad. Sure enough, people were willing to pay Alison less for the card: she got a mere $83 to Brad’s $87 – about a 5% difference. I’d say someone should call the presses, but it looks like they already did, judging from the coverage this piece has received. Now this looks like discrimination – because it is – but I don’t think it’s based on sex per se. I say that because, earlier in the paper, Kricheli-Katz & Regev (2016) also report that women, as buyers on eBay, tended to pay about 3% more than men for comparable goods. To the extent that the $4 difference in valuation is meaningful here, there are two things to say about it. First, it may well represent the fact that women aren’t as willing to negotiate prices in their favor. Indeed, while women were 23% of the sellers on eBay, they represented only 16% of the auctions with a negotiation component. If that’s the case, people are likely willing to pay less to women because they perceive (correctly) some population differences in their ability to get a good deal. I suspect that if you gave buyers individuating information about a seller’s abilities, even that 5% difference would disappear. Second, that slight 5% difference would by no means account for the 20% gap the authors report finding with respect to new-product sales; not even close.

But maybe your next big idea will work out better…

Instead, my guess is that, in spite of the authors’ use of the phrase “equally qualified” when referring to the men and women in their seller sample, there were some important differences in the listings that buyers noticed; the kind of differences you can’t account for when you’re looking at over a million listings and your control measures are rough. Kricheli-Katz & Regev (2016) never seemed to consider – and I mean really consider – the possibility that something about these listings, something they didn’t control for, might have been driving the differences in sale price. While they do control for factors like the seller’s reputation, experience, number of pictures, year of the sale, and some of the sentiments expressed by words in the listing (how positive or negative it is), there’s more to making a good listing than that. A more likely story is that differences in sale prices reflect different behaviors on the part of male and female sellers (as we already know other differences exist in the sample). The alternative story being championed would require a level of obsession with gender-based discrimination in the population so wide and deep that we wouldn’t need to research it; it would be plainly obvious to everyone already.

Then again, perhaps it’s time I make my way over to eBay to pick up a new tinfoil hat.

References: Kricheli-Katz, T. & Regev, T. (2016). How many cents on the dollar? Women and men in product markets. Science Advances, 2, DOI: 10.1126/sciadv.1500599

Thoughtful Suggestions For Communicating Sex Differences

Having spent quite a bit of time around the psychological literature – both academic and lay pieces alike – there are some words or phrases I can no longer read without an immediate, knee-jerk sense of skepticism arising in me, as if they taint everything that follows and precedes them. Included in this list are terms like bias, stereotype, discrimination, and, for the present purposes, fallacy. The reason these words elicit such skepticism on my end is the repeated failure of people using them to consistently produce high-quality work or convincing lines of reasoning. This is almost surely due to the perceived social stakes when such terms are being used: if you can make members of a particular group appear uniquely talented, victimized, or otherwise valuable, you can subsequently direct social support towards or away from various ends. When the goal of argumentation becomes persuasion, truth is not a necessary component and can be pushed aside. Importantly, the people engaged in such persuasive endeavors do not usually recognize they are treating information or arguments differently, contingent on how doing so suits their ends.

“Of course I’m being fair about this”

There are few areas of research that seem to engender as much conflict – philosophically and socially – as sex differences, and it is here those words appear regularly. As there are social reasons people might wish to emphasize or downplay sex differences, it has steadily become impossible for me to approach most of the writing I see on the topic with the assumption that it is even sort of unbiased. That’s not to say every paper is hopelessly mired in a particular worldview, rejecting all contrary data, mind you; just that I don’t expect them to reflect earnest examinations of the capital-T Truth. Speaking of which, a new paper by Maney (2016) recently crossed my desk; a paper that concerns itself with how sex differences get reported and how they ought to be discussed. Maney (2016) appears to take a dim view of the research on sex differences in general and attempts to highlight some perceived fallacies in people’s understanding of them. Unfortunately, for someone trying to educate people about issues surrounding the sex-difference literature, the paper does not come off as one written by someone possessing a uniquely deep knowledge of the topic.

The first fallacy Maney (2016) seeks to highlight is the idea that the sexes form discrete groups. Her logic for explaining why this is not the case revolves around the idea that while the sexes do indeed differ to some degree on a number of traits, they also often overlap a great deal on them. Instead, Maney (2016) argues that we ought not to be asking whether the sexes differ on a given trait, but rather by how much they do. Indeed, she even puts the word ‘differences’ in quotes, suggesting that these ‘differences’ between the sexes aren’t, in many cases, real. I like this brief section, as it highlights well why I have grown to distrust words like fallacy. Taking her points in reverse order: if one is interested in how much groups (in this case, sexes) differ, then one must have, at least implicitly, already answered the question as to whether they do. After all, if the sexes did not differ, it would be pointless to talk about the extent of those non-differences; there simply wouldn’t be variation. Second, I know of zero researchers whose primary interest resides in answering the question of whether the sexes differ to the exclusion of the extent of those differences. As far as I’m aware, Maney (2016) seems to be condemning a strange class of imaginary researchers who are content to find that a difference exists and then never look into it further or provide more details. Finally, I see little value in noting that the sexes often overlap a great deal when it comes to explaining the areas in which they do not. In much the same way, if you were interested in understanding the differences between humans and chimpanzees, you are unlikely to get very far by noting that we share a great deal of genes in common. Simply put, you can’t explain differences with similarities. If one’s goal is to minimize the perception of differences, though, this would be a helpful move.

The second fallacy that Maney (2016) seeks to tackle is the idea that sex differences in behavior can be attributed to differing brain structures. Her argument on this front is that it is logically invalid to do the following: (1) note that some brain structure differs between men and women, (2) note that this brain structure is related to a given behavior on which they also differ, and so (3) conclude that the sex difference in brain structure is responsible for the sex difference in behavior. Now while that inference is indeed invalid under the rules of formal logic, it is clear that differences in brain structure will result in differences in behavior; the only way that idea could be false would be if brain structure were not connected to behavior, and I don’t know of anyone crazy enough to try and make that argument. Researchers committing this fallacy thus might not get the specifics right all the time, but their underlying approach is fine: if a difference exists in behavior (between sexes, species, or individuals), there will exist some corresponding structural differences in the brain. The tools we have for studying the matter are a far cry from perfect, making inquiry difficult, but that’s a different issue. Relatedly, then, noting that some formal bit of logic is invalid is assuredly not the same thing as demonstrating that a conclusion is incorrect or the general approach misguided. (Also worth noting is that the above validity issue stops being a problem when conclusions are probabilistic, rather than definitive.)

“Sorry, but it’s not logical to conclude his muscles might determine his strength”

The third fallacy Maney (2016) addresses is the idea that sex differences in the brain must be preprogrammed or fixed, attempting to dispel the notion that sex differences are rooted in biology and thus impervious to experience. In short, she is arguing against the idea of hard genetic determinism. Oddly enough, I have never met a single genetic determinist in person; in fact, I’ve never even read an article that advanced such an argument (though maybe I’ve just been unusually lucky…). As every writer on the subject I have come across has emphasized – often in great detail – the interactive nature of genes and environments in determining the direction of development, it again seems like Maney (2016) is attacking philosophical enemies that are more imagined than real. She could have, for instance, quoted researchers who made claims along the lines of, “trait X is biologically determined and impervious to environmental inputs during development”; instead, it looks like everyone she cites for this fallacy is making a similar criticism of others, rather than anyone making the claims being criticized (though I did not check those references myself, so I’m not 100% certain there). Curiously, Maney (2016) doesn’t seem to be at all concerned about the people who, more or less, disregard the role of genetics or biology in understanding human behavior; at the very least, she doesn’t devote any portion of her paper to addressing that particular fallacy. That rather glaring omission – coupled with what she does present – could leave one with the impression that she isn’t really trying to present a balanced view of the issue.

With those ostensible fallacies out of the way, there are a few other claims worth mentioning in the paper. The first is that Maney (2016) seems to have a hard time reconciling the idea of sexual dimorphisms – traits that occur in one form typical of males and one typical of females – with the idea that the sexes overlap to varying degrees on many traits, such as height. While it’s true enough that you can’t tell someone’s sex for certain if you only know their height, that doesn’t mean you can’t make some good guesses that are liable to be right far more often than they’re wrong. Indeed, the only dimorphisms she mentions are sex chromosomes, external genitalia, and gonads, and she then continues to write as if these were of little to no consequence. Much like with height, however, there couldn’t be selection for any physical sex differences if the sexes did not behave differently. Since behavior is controlled by the brain, physical differences between the sexes, like height and genitalia, are usually also indicative of some structural differences in the brain. This is the case whether the dimorphism is one of degree (like height) or kind (like chromosomes).

Returning to the main point: outside of these all-or-none traits, it is unclear what Maney (2016) would consider a genuine difference, much less what justification underlies that standard. For example, she notes some research that found a 90% overlap between the male and female distributions of interhemispheric connectivity, but then seems to imply that the corresponding 10% non-overlap does not reflect a ‘real’ sex difference. We would surely notice a 10% difference in other traits, like height, IQ, or number of fingers but, I suppose, in the realm of the brain, 10% just doesn’t cut it.

Maney (2016) also seems to take an odd stance when it comes to explanations for these differences. In one instance, she writes about a study on multitasking that found a sex difference favoring men; a difference which, we are told, was explained by a ‘much larger difference in video game experience’ rather than by sex per se. Great, but what are we to make of that ‘much larger’ sex difference in video game experience? That finding, too, requires an explanation, and none is provided. Perhaps video game experience is explained more by, I don’t know, competitiveness than by sex, but then what are we to explain competitiveness with? These kinds of explanations usually end up going nowhere in a hurry unless they eventually land on some kind of adaptive endpoint, as once a trait’s reproductive value is explained, you don’t need to go any further. Unfortunately, Maney (2016) seems to oppose evolutionary explanations for sex differences, scolding those who propose ‘questionable’ functional or evolutionary explanations for being genetic determinists who see no role for sociocultural influences. In her rush to condemn those genetic determinists (who, again, I have apparently never met or read), Maney’s (2016) piece falls victim to the warning laid out by Tinbergen (1963) several decades ago: rather than seeking to improve the shape and direction of evolutionary, functional analyses, Maney (2016) instead recommends that people simply avoid them altogether.

“Don’t ask people to think about these things; you’ll only hurt their unisex brains”

This is a real shame, as evolutionary theory is the only tool available for providing a deeper understanding of these sex differences (as well as our physical and psychological form more generally). Just as species will differ in morphology and behavior to the extent they have faced different adaptive problems, so too will the sexes within a species. By understanding the different challenges faced by the sexes historically, one can get a much clearer sense as to where psychological and physical differences will – and will not – be expected to exist, as well as why (this extra level of ‘why’ is important, as it allows you to better figure out where an analysis has gone wrong if the predictions don’t work out). Maney (2016), it would seem, even missed a golden opportunity within her paper to explain to her readers that evolutionary explanations complement, rather than supplant, more proximate explanations when quoting an abstract that seemed to contrast the two. I suspect this opportunity was missed because she is either legitimately unaware of that point or does not understand it (judging from the tone of her paper), believing (incorrectly) instead that evolutionary means genetic, and therefore immutable. If that is the case, it would be rather ironic: someone who does not seem to have much understanding of the evolutionary literature is lecturing others on how it ought to be reported.

References: Maney, D. (2016). Perils and pitfalls of reporting sex differences. Philosophical Transactions B, 371, 1-11.

Tinbergen, N. (1963). On aims and methods of ethology. Zeitschrift für Tierpsychologie, 20, 410-433.

 

Is Choice Overload A Real Thing?

Within the world of psychology research, time is often not kind to empirical findings. This unkindness was highlighted recently in the results of the Reproducibility Project, which found that the majority of psychological findings tested did not replicate particularly well. There are a number of reasons this happens, including that psychological research tends to be conducted rather atheoretically (allowing large numbers of politically-motivated or implausible hypotheses to be successfully floated), and that researchers have the freedom to analyze their data in rather creative ways (allowing them to find evidence of effects where none actually exist). These practices are engaged in because positive findings tend to be published more often than null results. In fact, even if researchers do everything right, that’s still no guarantee of repeatable results; sometimes people just get lucky with their data. Accordingly, it is a fairly common occurrence for me to revisit some research I learned about during my early psychology education only to find out that things are not quite as straightforward or sensible as they had been presented to be. I’m happy to report that today is (sort of) one of those days. The topic in question has been called a few different things, but for my present purposes I will be referring to it as choice overload: the idea that having access to too many choices actually makes decisions more difficult and less satisfying. In fact, if too many options are presented, people might even avoid making a decision altogether. What a fascinating idea.

Here’s to hoping time is kind to it…

The first time I heard of this phenomenon, it was in the context of exotic jams. The summary of the research goes as follows: Iyengar & Lepper (2000) set up shop in a grocery store, creating a tasting booth with either six or 24 varieties of jam (from which the more-standard flavors, like strawberry, had been removed). Shoppers were invited to stop by the booth and try as many of the jams as they wanted, were given a $1-off coupon for that brand’s jam, and then left. The table with the more extensive variety did attract more customers (60% of those who walked by), relative to the table with fewer selections (40%), suggesting that the availability of more options was, at least initially, appealing to people. Curiously, however, there was no difference in the average number of jams sampled: whether the table had 6 flavors or 24, people only sampled about 1.5 of them on average and, apparently, no one ever sampled more than two flavors (maybe they didn’t want to seem rude or selfish). More interestingly still, because the customers were given coupons, their purchases could be tracked. Of those who stopped at the table with only six flavors, about 30% ended up later purchasing jam; when the table had 24 flavors, a mere 3% of customers ended up buying one.
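
Combining those stop rates and purchase rates gives a sense of how large the difference in actual sales was; the following is my own arithmetic on the reported figures, assuming a hypothetical 100 passersby at each table:

```python
# Expected jam buyers per 100 passersby, combining the reported stop rates
# with the reported purchase rates (my own arithmetic on the figures above).
def buyers_per_100(stop_rate, purchase_rate):
    return 100 * stop_rate * purchase_rate

print(buyers_per_100(0.40, 0.30))  # 6-flavor table:  ~12 buyers per 100
print(buyers_per_100(0.60, 0.03))  # 24-flavor table: ~1.8 buyers per 100
# Despite drawing more foot traffic, the larger display converted far
# fewer passersby into actual buyers.
```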

There are a couple of potential issues with this study, of course, owing to its naturalistic design; issues which were noted by the authors. For instance, it is possible that people who were fairly uninterested in buying jam might have been attracted to the 24-flavor booth nevertheless, simply out of curiosity, whereas those with a greater interest in buying jam would have remained interested in sampling even when a smaller number of options existed. To try to get around these issues, Iyengar & Lepper (2000) designed another two experiments, one of which I want to cover. This other experiment was carried out in a more standard lab setting (to help avoid some of the possible issues with the jam results) and involved tasting chocolate. There were three groups of participants in this case: the first group (n = 33) got to select and sample a chocolate from an array of six possible options, the second group (n = 34) got to select and sample a chocolate from an array of 30 possible options, and a final group (n = 67) was randomly assigned a chocolate they had not selected. In the interest of minimizing preexisting preferences, only those who enjoyed chocolate but did not have experience with that particular brand were selected for the study. After filling out a few survey items and completing the sampling task, the participants were presented with their payment options: either $5 in cash, or a box of chocolates from that brand worth $5.

In accordance with the previous findings, participants who selected from 30 different options were somewhat more likely to say they had been presented with “too many” options (M = 4.88) compared with those who only had 6 possible choices (M = 3.61, on a seven-point scale ranging from “too few” choices at 1 to “too many” choices at 7). Despite the subjects in the extensive-choice group saying that deciding which chocolate to sample was more difficult, however, there was no correlation between how difficult participants found the decision and how much they reported enjoying making it. It seemed people could still enjoy making more difficult choices. Additionally, participants in the limited-choice group were more satisfied with their choice (M = 6.28) than those in the extensive-choice group (M = 5.46), who were in turn more satisfied than those in the no-choice group (M = 4.92). Of particular interest are the compensation findings: those in the limited-choice group were more likely to accept a box of chocolate in lieu of cash (48%) than those in either the extensive-choice (12%) or no-choice (10%) conditions. It seems that having some options was preferable to having none, but having too many options seemed to cause people difficulty in making decisions. The researchers concluded that, to use the term, people could be overloaded by choices, hindering their decision-making process.

“If it can’t be settled via coin flip, I’m not interested”

While such findings are indeed quite interesting, there is no guarantee they will hold up over time; as I mentioned initially, lots of research fails to do so. This is where meta-analyses, which examine the results of many different studies jointly, can help. Scheibehenne et al (2010) set out to conduct one of their own on the choice-overload literature, noting that some of the research on the phenomenon does not point in the same direction. They note a few examples, such as field research in which reducing the number of available items resulted in decreases or no changes in sales, rather than the predicted uptick. Indeed, the lead author also reports that their own attempt at replicating the jam study for their dissertation in 2008 failed, as did the second author’s attempt to replicate the chocolate experiment. These failures to replicate the original research might indicate that the initial choice-overload results were something of a fluke, and so a wider swath of research needs to be examined to determine whether that’s the case.

Towards this end, Scheibehenne et al (2010) collected 50 experiments from the literature on the subject, representing about 5,000 participants across 13 published and 16 unpublished papers from 2000-2009. In total, the average estimated effect size for choice overload across all the experiments was a mere D = 0.02; the effect was all but non-existent. Further analysis revealed that the differences in effect sizes between studies did not seem to be randomly distributed; there were likely relevant differences between these papers determining what kind of results they found. To examine this issue further, Scheibehenne et al (2010) began by trimming off the 6 largest effects from both the top and the bottom ends of the reported research. In the trimmed data set, there was little evidence of differences among the remaining studies, suggesting that most of the variation between studies was being driven by those unusually large positive and negative effects.

Returning to the complete, untrimmed data set, Scheibehenne et al (2010) started to pick apart how several moderating variables might be affecting the reported results. In line with the intuitions of Iyengar & Lepper (2000), preexisting preferences or expertise did indeed have an effect on the choice-overload issue: people with existing preferences were not as troubled by additional items when making a choice, relative to those without such preferences. However, there was also an effect of publication – such that published papers were somewhat more likely to report an effect of choice overload, relative to unpublished ones – as well as a small effect of year – such that papers published more recently were a bit less likely to report choice-overloading effects. In sum, the results of the meta-analysis indicated that the average effect size of choice overload was nearly zero, that older studies which saw publication reported larger effects than those that came later or were not published, and that well-defined, preexisting preferences likely remove the negative effects of having too many options (to the extent those effects existed in the first place). Crucially, what should have been an important variable – the number of different options participants were presented with on the high end – explained essentially none of the variance. That is to say, it didn’t seem to make any difference whether participants faced 18 items or 30 or more.
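
For readers unfamiliar with effect sizes, the D reported here is a standardized mean difference (Cohen’s d): the gap between two group means expressed in units of their pooled standard deviation. A minimal sketch, using made-up satisfaction ratings purely for illustration:

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference between two groups (Cohen's d),
    using the pooled standard deviation."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

# Illustrative, made-up ratings on a 7-point satisfaction scale:
d = cohens_d(mean1=5.5, sd1=1.2, n1=30,   # limited-choice group
             mean2=5.4, sd2=1.2, n2=30)   # extensive-choice group
print(f"d = {d:.2f}")  # ~0.08: a small fraction of a standard deviation
# The meta-analytic average of D = 0.02 means the groups differed by about
# 2% of a standard deviation -- effectively zero.
```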

“Well, there are too many different chip options; guess I’ll just starve”

While this does not rule out choice overload as being a real thing, it does cast doubt on the phenomenon being as pervasive or important as some might have given it credit for. Instead, it appears probable that such choice effects might be limited to particular contexts, assuming they reliably exist in the first place. Such contexts might include how easily the products can be compared to one another (i.e., it’s harder to decide when faced with two equally attractive, but quite distinct options), or whether people are able to use mental shortcuts (known as heuristics) to rapidly whittle down the number of options they actually consider (so as to avoid spending too much time making fairly unimportant choices). While future examination would be required to test some of these ideas, the larger message here extends beyond the choice overload literature to most of psychology research: it is probably fair to assume that, as things currently stand, the first thing you hear about the existence or importance of an effect will likely not resemble the last thing you do.

References: Iyengar, S. & Lepper, M. (2000). When choice is demotivating: Can one desire too much of a good thing? Journal of Personality & Social Psychology, 79, 995-1006.

Scheibehenne, B., Greifeneder, R., & Todd, P. (2010). Can there ever be too many options? A meta-analytic review of choice overload. Journal of Consumer Research, 37, 409-424.

 

Savvy Shoppers Seeking Sex

There exists an idea in the economic field known as revealed preferences theory. People are often said to have preferences for this or that, but preferences are not the kind of thing that can be directly observed (just as much of our psychology cannot). As such, you need to find a way to infer information about these underlying preferences through something observable. In the case of revealed preferences, the general idea is that people’s decisions about what to buy and how much to spend are capable of revealing that information. For instance, if you would rather buy a Honda instead of a Ford for the same price, I have learned that your preferences – at least in the current moment – favor Hondas; if I were interested in determining the degree of that preference, I could see how much more you were willing to pay for the Honda. There are some criticisms of this approach – such as the issue that people sometimes prefer A to B when the two are compared directly, but prefer B to A when presented with a third, irrelevant option – but the general principle behind it seems sound: people’s willingness to purchase goods and services positively correlates with their desires, despite some peculiarities. The more someone is willing to pay for something, the more valuable they perceive it to be.

“Marrying you is worth about $1,500 to me”

Now this is by no means groundbreaking information; it’s a facet of our psychology we are all already intimately familiar with. It does, however, yield an interesting method for examining people’s mating preferences when it’s turned on prostitution. In this case, a new paper by Sohn (2016) sought to examine how well men’s self-reported mating preferences for youthful partners were reflected in the prostitution market, where encounters are often short in duration, fairly anonymous, and people can seek out what they’re interested in, so long as they can afford it. It is worth mentioning at the outset that seeking youth per se is not exactly valuable in the adaptive sense of the word; instead, youth is valued (at least in humans) because of how it relates to both reproductive potential and fertility. Reproductive potential refers to how many expected years of future reproduction a woman has remaining before she reaches menopause and loses that capability. As such, this value is highest around the time she reaches menarche (signaling the onset of her reproductive ability) in her mid-teens and decreases over time until it reaches zero at menopause. Fertility, by contrast, refers to a woman’s likelihood of successful conception following intercourse, and tends to peak around her early twenties, being lower both prior to and after that point.

Since the type of intercourse sought by men visiting prostitutes is usually short-term in nature, we ought to expect the male preference for traits that cue high fertility to be revealed by the relative price men are willing to pay for sex with women displaying them (since short-term encounters are typically aimed at immediate successful reproduction, rather than at monopolizing a woman’s future reproductive potential). As such fertility cues tend to peak at the same ages as fertility itself, we would predict that women in their early twenties should command the highest price on the sexual market, and that this value should decline as women get older or younger. There are some issues with studying the subject matter, of course: sex with minors – much like prostitution in general – is often subject to social and legal sanctions. While the former issue cannot (and, really, should not) be skirted, the latter issue can be. One way of getting around the legal sanctions on prostitution in general is to study it in areas of the world where it is legal. In this instance, Sohn (2016) reports on a data set derived from approximately 8,600 prostitutes in Indonesia, ranging in age from 17 to 40, where, we are told, prostitution is quasi-legal.

The variable of interest in this data set concerns how much money the prostitutes received during their last act of commercial sex. This single-act method was employed in the hopes of minimizing any kinds of reporting inaccuracies that might come with trying to estimate how much money is being earned on average over long periods of time. While this choice necessarily limits the scope of the emerging picture concerning the price of sex, I believe it to be a justifiable one. Age was the primary predictor of this sex-related income, but a number of other variables were included in the analysis, such as frequency of condom use, years of schooling, age of first sex, and time selling sex. Overall, these predictor variables were able to account for over half of the variance in the price of sex, which is quite good.

“Priced to move!”

Supporting the hypothesis that men really do value these cues of fertility, the price of sex rose modestly from age 17 until it peaked at 21, tracking fertility rather than reproductive potential. Following that peak, the price of sex began to decline quickly and continuously through age 40, though the decline slowed past 30. Descriptively, the price of sex at its minimum was only about half the price of sex at peak fertility (which is a helpful tip for all you bargain-seekers out there…). Indeed, when age alone was considered, each additional year reduced the price of sex, on average, by about 4.5%; the size of the decrease uniquely attributable to age fell to about 2% per year when other factors were added into the equation, but both numbers tell the same story. A more detailed examination of this decrease grouped women into blocks of roughly five-year age periods. When considering age alone, there was no statistical difference between women in the 17-19 and 20-25 ranges. After that period, however, differences emerged: those in the 26-30 range earned 22% less, on average; a figure which grew to 42% less in the 30-34 group, and to about 53% less in the 35-40 group.
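
To see how quickly a decline like that compounds, here is a quick projection of my own from the per-year estimate (the paper reports the yearly figure, not this cumulative calculation):

```python
# How a ~4.5% average annual decline compounds from the peak price at age 21
# out to age 40 (my own projection from the per-year estimate; the paper
# reports the yearly figure, not this cumulative calculation).
peak_age, end_age = 21, 40
annual_decline = 0.045

relative_price = (1 - annual_decline) ** (end_age - peak_age)
print(f"Price at {end_age} relative to peak: {relative_price:.0%}")
# ~42% of the peak price -- broadly consistent with the observation that
# the minimum price was about half the price at peak fertility.
```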

This decrease in the price of sex over a woman’s lifespan is the opposite of how income usually works in non-sexual careers, where income rises with time and experience. It would be quite strange to work at a job where you saw your pay get cut by 2% each year you were with the company. It is likely for this reason that prostitutes in the 20-25 range were the most common (representing 32.6% of the sample), and those in older age groups were represented less heavily (27.6% in the 26-30 group, all the way down to 12% in the 35-40 range). When shopping for sex, then, men were not necessarily seeking the most experienced candidate for the position(s), but rather the most fertile one. As fertility declined, so too did the price. As price declined, women tended to leave the market. 

There were a few other findings of note, though the ‘whys’ explaining them are less straightforward. First, more educated prostitutes commanded a higher average asking price than their less educated peers, to the tune of about a 5% increase in price per extra year of school. As men and women both value intelligence highly in long-term partners, it is possible that cues of intelligence remain attractive, even in short-term contexts. Second, controlling for age, each year of selling sex tended to decrease the average price by about 1.5%. It is possible that the effects of prostitution visibly wear down the cues that men find appealing over time. Third, prostitutes who had ever used drugs or drank alcohol earned 12% more than their peers who abstained. Though I don’t know precisely why, it’s unlikely a coincidence that moral views about recreational drug use happen to be well predicted by views about the acceptability of casual sex (data from OKCupid, for instance, tells us the single best predictor of a woman’s interest in casual sex is whether she enjoys the taste of beer). Finally, prostitutes who proposed using condoms more often earned about 10% more than those who never did. I agree with Sohn’s (2016) assessment that this probably has to do with more desirable prostitutes being attractive enough to effectively bargain for condom use, whereas less attractive women compromise there in order to bring in clients. While men prefer sex without condoms, they appear willing to put that preference aside in the face of an attractive-enough prospect.  

“Disappointment now sold in bulk”

So what has been revealed about men’s preferences for sex with these data? Unfortunately, interpreting prices is less straightforward than simply examining the raw numbers: their correspondence to other sources of data and theory should be considered. For instance, at least when seeking short-term encounters, men seem to value fertility highly and are willing to pay a premium to get it. This “real world” data accords well with the self-reports of men in survey and laboratory settings and, as such, seems to be easily interpretable. On the other hand, men usually prefer sex without condoms, so the price premium among prostitutes who always suggest they be used would seem, at face value, to ‘reveal’ the wrong preference. Instead, it is more likely that prostitutes who already command a high price are capable of bargaining effectively for their use. In order to test such an explanation, you would need to pit the prospect of sex with the same prostitute with and without a condom against each other, both at the same price. Further, more educated prostitutes seemed to command a higher price on the sexual market: is this because men value intelligence in short-term encounters, because educated women are more effective at bargaining, because intelligence correlates with other cues of fertility or developmental stability (and thus attractiveness), or because of some other alternative? While one needs to step outside the raw pricing data obtained from these naturalistic observations to answer such questions effectively, the idea of using price data in general seems like a valuable method of analysis; whether it is more accurate, or a “truer” representation of our preferences than our responses to surveys, is debatable but, thankfully, this need not be an either/or type of analysis.

References: Sohn, K. (2016). Men’s revealed preferences regarding women’s ages: Evidence from prostitution. Evolution & Human Behavior, DOI: http://dx.doi.org/10.1016/j.evolhumbehav.2016.01.002 

Clues To The Function Of Moralistic Punishment

One of the major questions I’ve spent the better part of the last few years trying to work up an answer to is the matter of why – in the adaptive sense – people punish others as third parties moralistically. That is to ask why people condemn and punish others for engaging in behavior perceived to be “wrong”. This is not the same question as why people take revenge on others for their behavior; that kind of punishment isn’t terribly mysterious. Chimps, for instance, will punish those who cheated them, but they seem fairly uninterested in punishing those who cheated others (Riedl, Jensen, Call, & Tomasello, 2012). Humans are a bit different in that regard: not only do we involve ourselves in the disputes of others, but we do so on behalf of people who aren’t kin or even social allies. This is rather strange behavior, even though we do not usually perceive it to be so. I’m not the only one who has attempted an evolutionary explanation for moralistic punishment, of course; in my hubris, however, I do feel that the explanation I ended up floating fits the existing data better than other, contemporary theories. Today, I wanted to try to demonstrate that fit by considering some relatively recent empirical research.

“I don’t know, man; I think it might be a little baggy. Try a size down”

The paper in question (Kurzban, DeScioli, & O’Brien, 2007) concerns audience effects on punishment. Specifically, it addresses the following question: do people punish others more (as third parties) when they are being watched and their decisions are being made public, relative to when they’re anonymous? The question was examined across two experiments. In the first of these, 43 participants viewed the outcome of a previous trust game in which one player had the choice between two different outcomes: $39 for himself and $1 for the other player, or $20 for both. In this case, the player had selected the former, selfish option. The participants in question were provided with $7 and told they could spend any amount of that money – from $0 to $7 – to deduct from the selfish player. For each dollar they spent on deductions, the selfish player would lose $3. Importantly, the participants were either doing so in conditions under which their spending would be unknown to anyone but themselves (n = 24), or in conditions where the experimenter would know how much each particular participant spent (n = 19). In the anonymous condition, 38% of participants spent at least some money on punishment, whereas in the public condition, 47% opted to punish at least something. Importantly, however, the amount punished varied between these conditions: when they did punish, those in the public condition tended to spend more than those in the anonymous condition. For instance, though precise numbers are not reported, approximately 20% of the participants in the public condition paid the full $7 to punish the selfish player, as compared with 0% in the private condition.

The second experiment examined people’s responses to a different kind of economic game – a prisoner’s dilemma – in which one player attempted to cooperate with their partner but was met with defection. In terms of the outcome, this means that rather than both players getting $25 (as they would have if both had cooperated), the cooperative partner got $0 while the defector got $30. The participants in this experiment were shown this outcome and given $10. As before, the participants could spend any amount of this $10 to deduct from the defecting player at the 3-to-1 rate. In this case, the punishment decisions would be known by either (a) no one else but the participant (n = 31), (b) the experimenter alone (n = 26), or (c) the entire group of participants (n = 30). Again, fewer participants opted to punish at all when punishment was anonymous (42%) relative to when punishment was known to either the experimenter (65%) or the whole group (67%); the amount spent on punishment increased as well (Ms = $1.06, $2.54, and $3.17, respectively). Further, those in the public conditions also tended to display more anger, disgust, and contempt at the cheating, suggesting that they were more upset by the transgression when other people were watching (or were at least pretending to be).
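
For anyone who finds the 3-to-1 deduction mechanic easy to miss, here is a minimal sketch of the payoff arithmetic in both experiments (the function and the numbers plugged in are mine, drawn from the figures above):

```python
# Payoff arithmetic for the third-party punishment games described above:
# each dollar the punisher spends removes three dollars from the target.
def punish(endowment, spent, target_payoff, rate=3):
    """Return (punisher's remaining money, target's remaining money)."""
    assert 0 <= spent <= endowment
    return endowment - spent, target_payoff - rate * spent

# Experiment 1: the full $7 endowment spent against the selfish player's $39.
print(punish(endowment=7, spent=7, target_payoff=39))       # (0, 18)
# Experiment 2: the average public-group spend against the defector's $30.
print(punish(endowment=10, spent=3.17, target_payoff=30))   # (6.83, 20.49)
```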

The existence of audiences seemed to have an important impact on determining moralistic punishment: not only did the presence of other people affect the percent of third parties willing to punish at all, but it also positively influenced how much they did punish. In a sentence, we could say that the presence of observers was being used as an input by the cognitive systems determining moralistic sentiments. While this may sound like a result that could have been derived without needing to run the experiments, the simplicity and predictability of these findings by no means makes them trivial on a theoretical level when it comes to answering the question, “what is the adaptive value of punishment?” Any theory seeking to explain morality in general – and moral punishment in particular – needs to be able to present a plausible explanation for why cues to anonymity (or lack thereof) are being used as inputs by our moral systems. What benefits arise from public punishment that fail to materialize in anonymous cases?

“If you’re good at something, never do it for free…or anonymously”

The first theoretical explanation for morality that these results cut against is the idea that our moral systems evolved to deliver benefits to others per se. One of the common forms of this argument is that our moral systems evolved because they delivered benefits to the wider group (in the form of maintaining beneficial cooperation between members) even if doing so was costly in terms of individual fitness. This argument clearly doesn’t work for explaining the present data, as the potential benefits that could be delivered to others by deterring cheating or selfishness do not (seem to) change contingent on anonymity, yet moral punishment does.

These results also cut against some aspects of mutualistic theories for morality. This class of theory suggests that, broadly speaking, our moral sense responds primarily to behavior perceived to be costly to the punisher’s personal interests. In short, third parties do not punish perpetrators because they have any interest in the welfare of the victim, but rather because punishers can enforce their own interests through that punishment, however indirectly. To place that idea into a quick example, I might want to see a thief punished not because I care about the people he harmed, but rather because I don’t want to be stolen from and punishing the thief for their behavior reduces that probability for me. Since my interests in deterring certain behaviors do not change contingent on my anonymity, the mutualistic account might feel some degree of threat from the present data. As a rebuttal to that point, the mutualistic theories could make the argument that my punishment being made public would deter others from stealing from me to a greater extent than if they did not know I was the one responsible for punishing. “Because I punished theft in a case where it didn’t affect me,” the rebuttal goes, “this is a good indication I would certainly punish theft which did affect me. Conversely, if I fail to punish transgressions against others, I might not punish them when I’m the victim.” While that argument seems plausible at face value, it’s not bulletproof either. Just because I might fail to go out of my way to punish someone else who was, say, unfaithful in their relationship, that does not necessarily mean I would tolerate infidelity in my own. This rebuttal would require an appreciable correspondence between my willingness to punish those who transgress against others and those who do so against me. As much of the data I’ve seen suggests a weak-to-absent link in both humans and non-humans on that front, that argument might not hold much empirical water.

By contrast, the present evidence is perfectly consistent with the association-management explanation posited in my theory of morality. In brief, this theory suggests that our moral sense helps us navigate the social world, identifying good and bad targets of our limited social investment, and uses punishment to build and break relationships with them. Morality, essentially, is an ingratiation mechanism; it helps us make friends (or, alternatively, not alienate others). Under this perspective, the role of anonymity makes quite a bit of sense: if no one will know how much you punished, or whether you did at all, your ability to use punishment to manage your social associations is effectively compromised. Accordingly, third-party punishment drops off in a big way. On the other hand, when people will know about their punishment, participants become more willing to invest in it in the face of better estimated social return. This social return need not necessarily reside with the actual person being harmed, either (who, in this case, was not present); it can also come from other observers of punishment. The important part is that your value as an associate can be publicly demonstrated to others.

The first step isn’t to generate value; it’s to demonstrate it

The lines between these accounts can seem a bit fuzzy at times: good associates are often ones who share your values, providing some overlap between mutualistic and association accounts. Similarly, punishment, at least from the perspective of the punisher, is altruistic: they are suffering a cost to provide someone else with a benefit. This provides some overlap between the association and altruistic accounts as well. The important point for differentiating these accounts, then, is to look beyond their overlap into domains where they make different predictions in outcomes, or predict the same outcome will obtain, but for different reasons. I feel the results of the present research not only help do that (inconsistent with group selection accounts), but also present opportunities for future research directions as well (such as the search for whether punishment as a third party appreciably predicts revenge).

References: Kurzban, R., DeScioli, P., & O’Brien, E. (2007). Audience effects on moralistic punishment. Evolution & Human Behavior, 28, 75-84.

Riedl, K., Jensen, K., Call, J., & Tomasello, M. (2012). No third-party punishment in chimpanzees. Proceedings of the National Academy of Sciences, 109, 14824–14829.

Exaggerating With Statistics (About Rape)

“As a professional psychology researcher, it’s my job to lie to the participants in my experiments so I can lie to others with statistics using their data”. – On understanding the role of deception in psychology research

In my last post, I discussed the topic of fear: specifically, how social and political agendas can distort the way people reason about statistics. The probable function of such distortions is to convince other people to accept a conclusion which is not exactly well supported by the available evidence. While such behavior is not exactly lying – inasmuch as the people making these claims don’t necessarily know they’re engaged in such cognitive distortions – it is certainly on the spectrum of dishonesty, as they would (and do) reject such reasoning otherwise. In the academic world, related kinds of statistical manipulations go by a few names, the one I like the most being “researcher degrees of freedom”. The spirit of this idea refers to the problem of researchers selectively interpreting their data in a variety of ways until they find a result they want to publish, and then omitting any mention of all the ways their data did not work out or might otherwise have been interpreted. On that note, here’s a scary statistic: 1-in-3 college men would rape a woman if they could get away with it. Fortunately (or unfortunately, depending on your perspective) the statistic is not at all what it seems.

“…But the researchers failed to adequately report their methods! Spooky!”

The paper in question (Edwards et al, 2014) seeks to understand the apparent mystery behind the following finding: when asked if they ever raped anyone, most men will say “no”; when asked instead whether they ever held someone down to coerce them into having sex, a greater percentage of men will indicate that they have. Women’s perceptions about the matter seem to follow suit. As I wrote when discussing the figure that 25% of college women will be raped:

The difference was so stark that roughly 75% of the participants that Koss had labeled as having experienced rape did not, themselves, consider the experience to be rape.

What strikes me as curious about these findings is not the discrepancy in responses; that much can likely be explained by positing that these questions are perceived by the participants to be asking about categorically different behaviors. After all, if they were actually perceived to be asking about the same thing, you would expect greater agreement between men’s and women’s responses across the two questions, which we do not find. Instead, the curious part is that authors – like Edwards et al (2014) – continue to insist that all those participants must be wrong, writing, “…some men who rape do not seem to classify their behavior as such” (Jesse Singal at NYmag.com expresses a similar view, writing: “At the end of the day, after all, the two groups are saying the exact same thing”). Rather than conclude there is something wrong with the questions being asked (such as, say, they are capturing a portion of the population who would have rough, but consensual sex), they instead conclude there is something wrong with everyone else (both men and women) answering them. This latter explanation strikes me as unlikely.

There’s already something of a bait-and-switch taking place, then, but this is far from the only methodological issue involved in deriving that scary-sounding 1-in-3 figure. Specifically, Edwards et al (2014) asked their 86 male participants to fill out part of the “attraction to sexual aggression” scale (Malamuth, 1989). On this scale, participants are asked to indicate, from 1 to 5, how likely they would be to engage in a variety of behaviors, with a “1” corresponding to “not likely at all”, while “5” corresponds to “very likely”. Included on this scale are two questions, one concerning whether the respondent would “rape” a woman, and another asking about whether he would “force her to do something she did not want to do” in a sexual setting. The participants in question were asked about their likelihood of engaging in such behaviors “if nobody would ever know and there wouldn’t be any consequences”. Edwards et al (2014) report that, if such criteria were met, 31% of the men would force a woman to do something sexually, whereas only 13% would rape a woman.

If you’re perceptive, you might have noticed something strange already: that 1-in-3 figure cannot be straightforwardly derived from the sexual aggression scale, as the scale is a 5-point measure, whereas the 1-in-3 statistic is clearly dichotomous. This raises the question of how one translates the scale into a yes/no response format. Edwards et al (2014) do not explicitly mention how they managed such a feat, but I think the answer is clear from the labeling in one of their tables: “Any intention to rape a woman” (emphasis, mine). What the researchers did, then, was code any response other than a “1” as an affirmative; the statistical equivalent of saying that 2 is closer to 5 than it is to 1. In other words, the question was, “Would you rape a woman if you could get away with it”, and the answers were, effectively, “No, Yes, Yes, Yes, or Yes”. Making the matter even worse is that all participants were answering both questions. This means they saw a question asking about “rape” and another question about “forcing a woman to do something she didn’t want to”. As participants likely figured that there was no reason the researchers would be asking the same question twice, they would have very good reason for thinking that these questions refer to categorically different things. For the authors to then conflate the two questions after the fact as being identical is stunningly disingenuous.

“The problem isn’t me; it’s everyone else”

To put these figures in better context, we could consider the results reported by Malamuth (1989). In response to the “Would you rape if you wouldn’t get caught” question, 74% of men indicated “1” and 14% indicated a “2”, meaning a full 88% of them fell below the midpoint of the scale; by contrast, only 7% fell above the midpoint, with about 5% indicating a “4” and 2% indicating a “5”. Of course, reporting that “1-in-3 men would rape” if they could get away with it is much different than saying “less than 1-in-10 probably would”. The authors appear interested in deriving the most-damning interpretation of their data possible, however, as evidenced by their unreported and, in my mind, unjustifiable grouping of the responses. That fact alone should raise alarm bells as to whether the statistics they provide you would do a good job of predicting reality.
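
To see just how much work that coding choice does, here is a quick sketch in Python applying both rules to Malamuth’s (1989) reported distribution for this item. The figures for “1”, “2”, “4”, and “5” are the ones quoted above; the 5% assigned to the midpoint response of “3” is my back-of-the-envelope inference from the reported 88%-below-midpoint and 7%-above-midpoint figures.

```python
# Reported response distribution for the "would you rape if you wouldn't get
# caught" item (Malamuth, 1989). The 5% for "3" is inferred, not reported.
distribution = {1: 0.74, 2: 0.14, 3: 0.05, 4: 0.05, 5: 0.02}

# Edwards et al.'s apparent rule: anything above a "1" counts as a yes.
any_intention = sum(p for score, p in distribution.items() if score > 1)

# A stricter rule: only responses above the scale midpoint count as a yes.
above_midpoint = sum(p for score, p in distribution.items() if score > 3)

print(f"any intention: {any_intention:.0%}")    # 26% -- roughly "1-in-4"
print(f"above midpoint: {above_midpoint:.0%}")  # 7%  -- "less than 1-in-10"
```

Same data, two defensible-sounding cut points, and the headline figure more than triples.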

But let’s go ahead and take these responses at face value anyway, even if we shouldn’t: somewhere between 10-30% of men would rape a woman if there were no consequences for doing so. How alarming should that figure be? On the first front, the hypothetical world of “no consequence” doesn’t exist. Some proportion of men who would be interested in doing such things are indeed restrained from doing so by the probability of being punished. Even within that hypothetical world of freedom from consequences, however, there are likely other problems to worry about, in that you will always find some percentage of the population willing to engage in anti-social behavior that harms others when there are no costs for doing so (in fact, the truly strange part is that lots of people indicate they would avoid such behaviors).

Starting off small, for instance, about 70% of men and women indicate that they would cheat on their committed partner if they wouldn’t get caught (and slightly over 50% have cheated in spite of those possible consequences). What about other acts, like stealing or murder? How many people might kill someone else if there would be no consequences for it? One informal poll I found placed that number around 40%; another put it a little above 50% and, when broken up by sex, 32% of women would and a full 68% of men would. Just let those numbers sink in for a moment: comparing the two numbers for rape and murder, the men in Edwards et al (2014) were between two and seven times less likely to say they would rape a woman than kill someone if they could, depending on how one interprets their answers. That’s a tremendous difference; one that might even suggest that rape is viewed as a less desirable activity than murder. Now that likely has quite a bit to do with some portion of those murders being viewed as defensive in nature, rather than exploitative, but it’s still some food for thought.

There are proportionately fewer defensive rapes than defensive stabbings…

This returns us nicely to the politics of fear. The last post addressed people purposefully downplaying the risks posed by terrorist attacks; in this case, we see people purposefully inflating the reported propensities to rape. The 1-in-3 statistic is clearly crafted in the hopes of making an issue seem particularly threatening and large, as larger issues tend to have more altruism directed towards them in the hopes of a solution. As there are social stakes in trying to make one’s problems seem especially threatening, however, this should immediately make people skeptical when dealing with such statistics, for the same reasons you shouldn’t let me tell you about how smart or nice I am. There is a very real risk in artificially trying to puff one’s statistics up, as people might eventually stop trusting you by default, even on different topics entirely; this should hold true especially if they belong to a group targeted by such misleading results. The undesirable outcome of such a process is not increased altruism and sympathy devoted to a real problem, but apathy and hostility. Lessons learned from fables like The Boy Who Cried Wolf are as timely as ever, it would seem.

References: Edwards, S., Bradshaw, K., & Hinsz, V. (2014). Denying rape but endorsing forceful intercourse: Exploring differences among responders. Violence & Gender, 1, 188-193.

Malamuth, N. (1989). The attraction to sexual aggression scale: Part 1. The Journal of Sex Research, 26, 26-49.

The Politics Of Fear

There’s an apparent order of operations frequently observed in human reasoning: politics first, facts second. People appear perfectly willing to accept flawed arguments or incorrect statistics they would otherwise immediately reject, just so long as they support the reasoner’s point of view; Greg Cochran documented a few such cases (in his simple and eloquent style) a few days ago on his blog. Such a bias in our reasoning ability is not only useful – inasmuch as persuading people to join your side of a dispute tends to carry benefits, regardless of whether you’re right or wrong – but it’s also common: we can see evidence of it in every group of people, from the uneducated to those with PhDs and decades of experience in their field. In my case, the most typical contexts in which I encounter examples of this facet of our psychology – like many of you, I would suspect – are through posts shared or liked by others on social media. Recently, these links have been cropping up concerning the topic of fear. More precisely, there are a number of writers who think that people (or at least those who disagree with them) are behaving irrationally regarding their fears of Islamic terrorism and the threat it poses to their life. My goal here is not to say that people are being rational or irrational about such things – I happen to have a hard time finding substance in such terms – but rather to provide a different perspective than the ones offered by the authors; one that is likely in the minority among my professional and social peers.

You can’t make an omelette without alienating important social relations 

The first article on the chopping block was published on the New York Times website in June of last year. The article is entitled, “Homegrown extremists tied to deadlier toll than Jihadists in U.S. since 9/11,” and it attempts to persuade the reader that we, as a nation, are all too worried about the threat Islamic terrorism poses. In other words, American fears of terrorism are wildly out of proportion to the actual threat it presents. This article attempted to highlight the fact that, in terms of the number of bodies, right-wing, anti-government violence was twice as dangerous as Jihadist attacks in the US since 9/11 (48 deaths from non-Muslims; 26 by Jihadists). Since we seem to dedicate more psychological worry to Islam, something was wrong there. There are three important parts of that claim to be considered: first, a very important word in that last sentence is “was,” as the body count evened out by early December in that year (currently at 48 to 45). This updated statistic yields some interesting questions: were those people who feared both types of attacks equally (if they existed) being rational or not on December 1st? Were those who feared right-wing attacks more than Muslim ones suddenly being irrational on the 2nd? The idea these questions are targeting is whether or not fears can only be viewed as proportionate (or rational) with the aid of hindsight. If that’s the case, rather than saying that some fears are overblown or irrational, a more accurate statement would be that such fears “have not yet been founded.” Unless those fears have a specific cut-off date (e.g., the fear of being killed in a terrorist attack during a given time period), making claims about their validity is something that one cannot do particularly well.

The second important point of the article to consider is that the count begins one day after a Muslim attack that killed over 3,000 people (immediately; that doesn’t count those who were injured or later died as a consequence of the events). Accordingly, if that count is set back just slightly, the fear of being killed in a Muslim terrorist attack would be much more statistically founded, at least in a very general sense. This naturally raises the question of why the count starts when it does. The first explanation that comes to mind is that the people doing the counting (and reporting about the counting) are interested in presenting a rather selective and limited view of the facts that support their case. They want to denigrate the viewpoints of their political rivals first, and so they select the information that helps them do that while subtly brushing aside the information that does not. That seems like a fairly straightforward case of motivated reasoning, but I’m open to someone presenting a viable alternative point of view as to why the count needs to start when it does (such as, “their primary interest is actually in ignoring outliers across the board”).

Saving the largest for last, the final important point of the article to consider is that it appears to neglect the matter of base rates entirely. The attacks labeled as “right-wing” left a greater absolute number of bodies (at least at the time it was written), but that does not mean we learned right-wing attacks (or individuals) are more dangerous. To see why, we need to consider another question: how many bodies should we have expected? The answer to that question is by no means simple, but we can do a (very) rough calculation. In the US, approximately 42% of the population self-identifies as Republican (our right-wing population), while about 1% identifies as Muslim. If both groups were equally likely to kill others, then we should expect that the right-wing terrorist groups leave 42 bodies for every 1 that the Muslim group does. That ratio would reflect a genuine parity in threat. Given that this ratio was 2-to-1 at the time the article was written, and 1-to-1 later that same year, we might reasonably conclude that the Muslim population, per individual member, is actually quite a bit more prone to killing others in terrorist attacks; if we factor in the 9/11 number, that ratio becomes something closer to 0.01-to-1, which is a far cry from demographic expectations.
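
For concreteness, here is that rough calculation in Python. The population shares and body counts are the ones quoted above, and the whole exercise inherits the original comparison’s loose assumption that party identification is a fair proxy for the relevant “right-wing” population.

```python
# Rough per-capita comparison of the two body counts discussed above.
# Population shares and death counts are the figures quoted in the text.
population_share = {"right-wing": 0.42, "Muslim": 0.01}
deaths = {"right-wing": 48, "Muslim": 26}  # the count at the time of writing

for group in deaths:
    print(f"{group}: {deaths[group] / population_share[group]:.0f} "
          "deaths per unit of population share")
# right-wing: ~114; Muslim: ~2600 -- roughly 23 times higher per capita,
# despite the raw count running about 2-to-1 in the other direction.
```

Adding the roughly 3,000 deaths from 9/11 to the Muslim column pushes that per-capita gap up by a further two orders of magnitude, which is why where the count starts matters so much.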

Thankfully, you don’t have to report inconvenient numbers

Another example comes from The New Yorker, published just the other day (perhaps it is something about New York that makes people publish these pieces), entitled, “Thinking rationally about terror.” The insinuation, as before, is that people’s fears about these issues do not correspond well to the reality. In order to make the case that people’s fears are wrongheaded, Lawrence Krauss leans on a few examples. One of these concerns the recent shootings in Paris. According to Lawrence, these attacks represented an effective doubling of the overall murder rate in Paris from the previous year (2.6 murders per 100,000 residents), but that’s really not too big of a deal because that just makes Paris as dangerous as New York City, and people aren’t that worried about being killed in NYC (or are they? No data on that point is mentioned). In fact, Lawrence goes on to say, the average Paris resident is about as likely to have been killed in a car accident during any given year as to have been killed during the mass shooting. This point is raised, presumably, to highlight an irrationality: people aren’t concerned about being killed by cars for the most part, so they should be just as unconcerned about being killed by a terrorist if they want to be rational.

This point about cars is yet another fine example of an author failing to account for base rates. Looking at the raw body count is not enough, as people in Paris likely interact with hundreds (or perhaps even thousands; I don’t have any real sense for that number) of cars every day for extended periods of time. By contrast, I would imagine Paris residents interact markedly less frequently with Muslim extremists. Per unit of exposure, then, cars likely pose a much, much lower threat of death than Muslim extremists do. Further, people do fear the harm caused by cars (we look both ways before crossing a street, we restrict licenses to individuals who demonstrate their competence to handle the equipment, have speed limits, and so on), and it is likely that the harm they inflict would be much greater if such fears were not present. In much the same way, it is also possible that the harms caused by terrorist groups would be much higher if people decided that such things were not worth getting worked up about and took no steps to assure their safety early on. Do considerations of these base rates and future risks fall under the umbrella of “rational” thinking? I would like to think so, and yet they seemed so easily overlooked by someone chiding others for being irrational: Lawrence at least acknowledges that future terror risks might increase for places like Paris, but notes that that kind of life is pretty much normal for Israel; the base-rate problem is not even mentioned.
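
That argument is easier to see with a denominator attached, so here is a sketch of the exposure-adjusted logic in Python. Every number in it is hypothetical – the article supplies none – and the only point is the shape of the calculation: deaths divided by person-hours of exposure, rather than raw counts compared directly.

```python
# Exposure-adjusted risk: deaths per person-hour of exposure to a hazard.
# All inputs below are hypothetical placeholders, chosen only to illustrate
# why raw body counts mislead when exposure differs by orders of magnitude.

def deaths_per_exposure_hour(annual_deaths, people, hours_per_person_per_year):
    return annual_deaths / (people * hours_per_person_per_year)

# A city of ~2 million with heavy daily car exposure...
cars = deaths_per_exposure_hour(200, 2_000_000, 500)
# ...versus vanishingly rare contact with violent extremists.
extremists = deaths_per_exposure_hour(130, 2_000_000, 0.01)

print(f"cars: {cars:.1e} deaths per exposure-hour")              # ~2.0e-07
print(f"extremists: {extremists:.1e} deaths per exposure-hour")  # ~6.5e-03
```

On these made-up numbers, the absolute death counts are in the same ballpark, yet the per-hour risk from extremists comes out more than four orders of magnitude higher – which is the sense in which the raw comparison to cars misleads.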

While there’s more I could say on these topics, the major point I hope to get across is this: if you want to know why people experience fear about certain topics, it’s probably best to not start your analysis with the assumption that these people are wrong to feel the way they do. Letting one’s politics do the thinking is not a reliable way to get at a solid understanding of anything, even if it might help further your social goals. If we were interested in understanding the “why” behind such fears, we might begin, for instance, with the prospect that many people likely fear historically-relevant, proximate cues of danger, including groups of young, violent males making threats against one’s life based on group membership, and cases where those threats are followed through and made credible. Even if such individuals currently reside many miles away, and even if only a few such threats have been acted upon, and even if the dangerous ones represent a small minority of the population, fearing them for one’s own safety does not – by default – seem to be an unreasonable thing to do; neither does fearing them for the safety of one’s relatives, social relations, or wider group members.

“My odds of getting hurt were low, so this isn’t worth getting worked up over”

Now, as I mentioned, all of this is not to say that people ought to fear some particular group or not; my current interests do not reside in directing your fears or their scope. I have no desire to tell you that your fears are well founded or completely off base (in no small part because I earnestly don’t know if they are). My interests are much more general than that, as this kind of thinking is present in all kinds of different contexts. There’s a real problem in beginning with the truth of your perspective and beginning your search for evidence only after the fact. The problem can run so deep that I actually find myself surprised to see someone take up the position that they were wrong after an earnest dig through the available evidence. Such an occurrence should be commonplace if rationality or truth were the goal in these debates, as people get things wrong (at least to some extent) all the time, especially when such opinions are formed in advance of such knowledge. Admitting to incorrect thinking does require, however, that one be willing to, at least occasionally, sacrifice a belief that used to be held quite dear; it requires looking like a fool publicly now and again; it even requires working against your own interests sometimes. These are things you will have to do; not just things that the opposition will. As such, I suspect these kinds of inadequate lines of reasoning will continue to pervade such discussions, which is a bit of a problem when the lives of others literally hang in the balance of the outcome.

Science By Funeral

“A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.”

As the above quote by Max Planck suggests, science is a very human affair. While, in an idealized form, the scientific process is a very useful tool for discovering truth, the reality of using the process in the world can be substantially messier. One of the primary culprits of this messiness is that being a good scientist per se – as defined by one who rigorously and consistently applies the scientific method – is not necessarily any indication that one is particularly bright or worthy of social esteem. It is perfectly possible to apply the scientific method to the testing of any number of inane or incorrect hypotheses. Instead, social status (and its associated rewards) tends to be provided to people who discover something that is novel, interesting, and true. Well, sort of; the discovery itself need not be exactly true as much as people need to perceive the idea as being true. So long as people perceive my ideas to be true, I can reap those social benefits; I can even do so if my big idea was actually quite wrong.

Sure; it looks plenty bright, but it’s mostly just full of hot air

Just as there are benefits to being known as the person with the big idea, there are also benefits to being friends with the person with the big idea, as access to those social (and material) resources tends to diffuse to the academic superstar’s close associates. Importantly, these benefits can still flow to those associates even if they lack the same skill set that made the superstar famous. To put this all into a simple example, getting a professor position at Harvard likely carries social and material benefits to the professor; those who study under the professor and get a degree from Harvard can also benefit by riding the coattails of the professor, even if they aren’t particularly smart or talented themselves. One possible result of this process is that certain ideas can become entrenched in a field, even if the ideas are not necessarily the best: as the originator of the idea has a vested interest in keeping it the order of the day in his field, and his academic progeny have a similar interest in upholding the originator’s status (as their status depends on his), new ideas may be – formally or informally – barred from entry and resisted, even if they more closely resemble the truth. As Planck quipped, then, science begins to move forward as the old guard die out and can no longer defend their status effectively; not because they relinquish their status in the face of new, contradictory evidence.

With this in mind, I wanted to discuss the findings of one of the most interesting papers I’ve seen in some time. The paper (Azoulay, Fons-Rosen, & Zivin, 2015) examined what happens to a field of research in the life sciences following the untimely death of one of its superstar members. Azoulay et al (2015) began by identifying their sample of approximately 13,000 superstars, 452 of whom died prematurely (which, in this case, corresponded to an average age of death of 61). Of those who died, the term “superstar” would certainly describe them well, at least in terms of their output: a median of 138 authored papers, 8,347 citations, and over $16 million in government funding received by the time of their death. These superstars were then linked to various subfields in which they published, their collaborators and non-collaborators within those subfields were identified, and a number of other variables that I won’t go into were also collected.

The question of interest, then, is what happens to these fields following the death of a superstar? In terms of the raw number of publications within a subfield, there was a very slight increase of about 2% following the death. That number does not give much of a sense for the interesting things that were happening, however. The first of these things is that the superstar’s collaborators saw a rather steep decline in their research output; a decline of about 40% over time. However, this drop in productivity of the collaborators was more than offset by an 8% increase in output by non-collaborators. This was an effect that remained (though it was somewhat reduced) even when the analysis excluded papers on which the superstar was an author (which makes sense: if one of your coauthors dies, of course you will produce fewer papers; there was just more to the decline than that). This decline in collaborator output would be consistent with a healthy degree of coattail-riding likely taking place prior to death. Further, there were no hints of these trends prior to the death, suggesting that the death in question was doing the causing when it came to changes in research output.

Figure 2: How much better-off your death made other people

The possible “whys” as to these effects were examined in the rest of the paper. A number of hints as to what is going on follow. First, there is the effect of death on citation counts, with non-collaborators producing more high-impact – but not low-impact – papers after the superstar’s passing. Second, these non-collaborators were producing papers in the very same subfields that the superstar had previously been in. Third, this new work did not appear to be building on the work of the superstar; the non-collaborators tended to cite the superstar less and newer work more. Fourth, the newer authors were largely not competitors of the superstar while the superstar was alive, opting instead to become active in the field following the death. The picture being painted by the data seems to be one in which the superstars initially dominate publishing within their subfields. While new faces might have some interest in researching these same topics, they fail to enter the field while the superstar is alive, instead providing their new ideas – not those already established – only after a hole has opened in the social fabric of the field. In other words, there might be barriers to entry for newcomers keeping them out, and those barriers relax somewhat following the death of a prominent member.

Accordingly, Azoulay et al (2015) turn their attention to what kinds of barriers might exist. The first barrier they posit is one they call “Goliath’s Shadow”, where newcomers are simply deterred by the prospect of having to challenge existing, high-status figures. Evidence consistent with this prospect was reported: the importance of the superstar – as defined by the fraction of papers in the field produced by them – seemed to have a noticeable effect, with more important figures creating a larger void to fill. By contrast, the involvement of the superstar – as defined by what percentage of their papers were published in a given field – did not seem to have an effect. In other words, the larger the share of a field’s publications (and grant money) a superstar accounted for, the less room newcomers seemed to see for themselves.

Two other possible barriers to entry concern the intellectual and social closure of a field: the former refers to the degree that most of the researchers within a field – not just the superstar – agree on what methods to use and what questions to ask; the latter refers to how tightly the researchers within a field work together, coauthoring papers and such. Evidence for both of these came up positive: fields in which the superstar had trained many of the researchers and fields in which people worked very closely did not show the major effects of superstar death. Finally, a related possibility is that the associates of the superstar might indirectly control access to the field by denying resources to newcomers who might challenge the older set of ideas. In this instance, the authors reported that the deaths of those superstars who had more collaborators on editorial and funding boards tended to have less of an impact, which could be a sign of trouble.

The influence of these superstars on generating barriers to entry, then, was often quite indirect. It’s not that the superstars were preventing newcomers themselves; it is unlikely they had the power to do so, even if they were trying. Instead, these barriers were created indirectly, either through the superstar receiving a healthy portion of the existing funding and publication slots, or through the collaborators of the superstar forming a relatively tight-knit community that could wield influence over what ideas got to see the light of day more effectively.

“We have your ideas. We don’t know who you are, and now no one else will either”

While it’s easy (and sometimes fun) to conjure up a picture of some old professor and their intellectual clique keeping out plucky, young, and insightful prospects with the power of discrimination, it is important to not leap to that conclusion immediately. While the faces and ideas within a field might change following the deaths of important figures, that does not necessarily mean the new ideas are closer to that all-important, capital-T, Truth that we (sometimes) value. The same social pressures, costs, and benefits that applied to the now-dead old guard apply in turn to the new researchers, and new status within a field will not be reaped by rehashing the ideas of the past, even if they’re correct. Old-but-true ideas might be cast aside for the sake of novelty, just as new-but-false ideas might be promulgated. Regardless of the truth value of these ideas, however, the present data does lend a good deal of credence to the notion that science tends to move one funeral at a time. While truth may eventually win out by a gradual process of erosion, it’s important to always bear in mind that the people doing science are still only human, subject to the same biases and social pressures we all are.

References: Azoulay, P., Fons-Rosen, C., & Zivin, J. (2015). Does science advance one funeral at a time? National Bureau of Economic Research, Working Paper 21788. DOI: 10.3386/w21788.


When Intuitions Meet Reality

Let’s talk research ethics for a moment.

Would you rather have someone actually take $20 from your payment for taking part in a research project, or would you rather be told – incorrectly – that someone had taken $20, only to later (almost immediately, in fact) find out that your money is safely intact and that the other person who supposedly took it doesn’t actually exist? I have no data on that question, but I suspect most people would prefer the second option; after all, not losing money tends to be preferable to losing money, and the lie is relatively benign. To use a pop culture example, Jimmy Kimmel has aired a segment where parents lie to their children about having eaten all their Halloween candy. The children are naturally upset for a moment and their reactions are captured so people can laugh at them, only to later have their candy returned and the lie exposed (I would hope). Would it be more ethical, then, for parents to actually eat their children’s candy so as to avoid lying to their children? Would children prefer that outcome?

“I wasn’t actually going to eat your candy, but I wanted to be ethical”

I happen to think that answer is, “no; it’s better to lie about eating the candy than to actually do it” if you are primarily looking out for the children’s welfare (there is obviously the argument to be made that it’s neither OK to eat the candy nor to lie about it, but that’s a separate discussion). That sounds simple enough, but according to some arguments I have heard, it is unethical to design research that, basically, mimics the lying outcome. The costs being suffered by participants need to be real in order for research on suffering costs to be ethically acceptable. Well, sort of; more precisely, what I’ve been told is that it’s OK to lie to my subjects (deceive them) about little matters, but only in the context of using participants drawn from undergraduate research pools. By contrast, it’s wrong for me to deceive participants I’ve recruited from online crowd-sourcing sites, like Mturk. Why is that the case? Because, as the logic continues, many researchers rely on MTurk for their participants, and my deception is bad for those researchers because it means participants may not take future research seriously. If I lied to them, perhaps other researchers would too, and I will have poisoned the well, so to speak. In comparison, lying to undergraduates is acceptable because, once I’m done with them, they probably won’t be taking part in many future experiments, so their trust in future research is less relevant (at least they won’t take part in many research projects once they get out of the introductory courses that require them to do so. Forcing undergraduates to take part in research for the sake of their grade is, of course, perfectly ethical).

This scenario, it seems, creates a rather interesting ethical tension. What I think is happening here is that a conflict has been created between looking out for the welfare of research participants (in common research pools; not undergraduates) and looking out for the welfare of researchers. On the one hand, it’s probably better for participants’ welfare to briefly think they lost money, rather than to let them actually lose money; at least I’m fairly confident that is the option subjects would select if given the choice. On the other hand, it’s better for researchers if those participants actually lose money, rather than briefly hold the false belief that they did, so participants continue to take their other projects seriously. An ethical dilemma indeed, balancing the interests of the participants against those of the researchers.

I am sympathetic to the concerns here; don’t get me wrong. I find it plausible to suggest that if, say, 80% of researchers outright deceived their participants about something important, people taking this kind of research over and over again would likely come to assume some parts of it were unlikely to be true. Would this affect the answers participants provide to these surveys in any consistent manner? Possibly, but I can’t say with any confidence if or how it would. There also seem to be workarounds for this poisoning-the-well problem; perhaps honest researchers could write in big, bold letters, “the following research does not contain the use of deception” and research that did use deception would be prohibited from attaching that bit by the various institutional review boards that need to approve these projects. Barring the use of deception across the board would, of course, create its own set of problems too. For instance, many participants taking part in research are likely curious as to what the goals of the project are. If researchers were required to be honest and transparent about their purposes upfront so as to allow their participants to make informed decisions regarding their desire to participate (e.g., “I am studying X…”), this can lead to all sorts of interesting results being due to demand characteristics – where participants behave in unusual manners as a result of their knowledge about the purpose of the experiment – rather than the natural responses of the subjects to the experimental materials. One could argue (and many have) that not telling participants about the real purpose of the study is fine, since it’s not a lie as much as an omission. Other consequences of barring explicit deception exist as well, though, including the lack of control over experimental stimuli during interactions between participants and the inability to feasibly even test some hypotheses (such as whether people prefer the tastes of identical foods, contingent on whether they’re labeled in non-identical ways).

Something tells me this one might be a knock off

Now this debate is all well and good to have in the abstract sense, but it’s important to bring some evidence to the matter if you want to move the discussion forward. After all, it’s not terribly difficult for people to come up with plausible-sounding, but ultimately incorrect, lines of reasoning as for why some research practice is possibly (un)ethical. For example, some review boards have raised concerns about psychologists asking people to take surveys on “sensitive topics”, under the fear that answering questions about things like sexual histories might send students into an abyss of anxiety. As it turns out, such concerns were ultimately empirically unfounded, but that does not always prevent them from holding up otherwise interesting or valuable research. So let’s take a quick break from thinking about how deception might be harmful in the abstract to see what effects it has (or doesn’t have) empirically.

Drawn in by the debate between economists (who tend to think deception is bad) and social scientists (who tend to think it’s fine), Barrera & Simpson (2012) conducted two experiments to examine how deceiving participants affected their future behavior. The first of these studies tested the direct effects of deception: did deceiving a participant make them behave differently in a subsequent experiment? In this study, participants were recruited as part of a two-phase experiment from introductory undergraduate courses (so as to minimize their previous exposure to research deception, the story goes; it just so happens they’re likely also the easiest sample to get). In the first phase of this experiment, 150 participants played a prisoner’s dilemma game which involved cooperating with or defecting on another player; a decision which would affect both players’ payments. Once the decisions had been made, half the participants were told (correctly) that they had been interacting with another real person in the other room; the other half were told they had been deceived, and that no other player was actually present. Everyone was paid and sent home.

Two to three weeks later, 140 of these participants returned for phase two. Here, they played 4 rounds of similar economic games: two rounds of dictator games and two rounds of trust games. In the dictator games, subjects could divide $20 between themselves and their partner; in the trust games, subjects could send some amount of $10 to the other player, this amount would be multiplied by three, and that player could then keep it all or send some of it back. The question of interest, then, is whether the previously-deceived subjects would behave any differently, contingent on their doubts as to whether they were being deceived again. The thinking here is that if you don’t believe you’re interacting with another real person, then you might as well be more selfish than you otherwise would. The results showed that while the previously-deceived participants were more likely to believe that social science researchers used deception somewhat more regularly, relative to the non-deceived participants, their behavior was actually no different. Not only were the amounts of money sent to others no different (participants gave $5.75 on average in the dictator condition and trusted $3.29 when they were not previously deceived, and gave $5.52 and trusted $3.92 when they had been), but the behavior was no more erratic either. The deceived participants behaved just like the non-deceived ones.
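
For readers unfamiliar with these games, here is a minimal sketch of their payoff structures in Python. The pot sizes and the 3x multiplier are from the description above; the return fraction in the usage example is an arbitrary illustrative value, since what the trustee sent back was up to each participant.

```python
# Payoff structures for the phase-two games described above. Pot sizes and
# the 3x multiplier are from the text; the example return fraction is arbitrary.

def dictator_game(given, pot=20):
    """The dictator splits the pot; the receiver has no say."""
    return pot - given, given  # (dictator's payoff, receiver's payoff)

def trust_game(sent, returned_fraction, endowment=10, multiplier=3):
    """The sent amount is tripled; the trustee returns some fraction of it."""
    tripled = sent * multiplier
    returned = tripled * returned_fraction
    return endowment - sent + returned, tripled - returned  # (sender, trustee)

print(dictator_game(5.75))    # the average non-deceived dictator offer
print(trust_game(3.29, 0.5))  # the average non-deceived trust transfer,
                              # assuming half the tripled amount comes back
```

The design’s logic is that a participant who suspects the other player is fictitious has nothing to gain from generosity or trust, so lingering doubts from earlier deception should have shown up as lower transfers; they did not.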

In the second study the indirect effects of deception were examined. One hundred and six participants first completed the same dictator and trust games as above. They were then assigned to read about an experiment that either did or did not make use of deception; a deception which included the simulation of non-existent participants. They then played another round of dictator and trust games immediately afterwards to see if their behavior would differ, contingent on knowing that researchers might deceive them. As in the first study, no behavioral differences emerged. Neither directly deceiving participants about the presence of others in the experiment nor providing them with information that deception does take place in such research seemed to have any noticeable effects on subsequent behavior.

“Fool me once, shame on me; Fool me twice? Sure, go ahead”

Now it is possible that the lack of any effect in the present research had to do with the fact that participants were only deceived once. It is certainly possible that repeated exposures to deception, if frequent enough, will begin to have a lasting effect that is not limited to the researcher employing the deception. In essence, it is possible that some spillover between experimenters might occur over time. However, this is something that needs to be demonstrated; not just assumed. Ironically, as Barrera & Simpson (2012) note, demonstrating such a spillover effect can be difficult in some instances, as designing non-deceptive control conditions to test against the deceptive ones is not always a straightforward task. In other words, as I mentioned before, some research is quite difficult – if not impossible – to conduct without being able to use deception. Accordingly, some control conditions might require that you deceive participants about deceiving them, which is awfully meta. Barrera & Simpson (2012) also mention some research findings that report even when no deception is used, participants who repeatedly take part in these kinds of economic experiments tend to get less cooperative over time. If that finding holds true, then the effects of repeated deception need to be filtered out from the effects of repeated participation in general. In any case, there does not appear to be any good evidence that minor deceptions are doing harm to participants or other researchers. They might still be doing harm, but I’d like to see it demonstrated before I accept that they do.

References: Barrera, D. & Simpson, B. (2012). Much ado about deception: Consequences of deceiving research participants in the social sciences. Sociological Methods & Research, 41, 383-413.