A Curious Case Of Vegan Moral Hypocrisy

I’ve decided to take one of my always-fun breaks from discussing strictly academic matters to instead examine a case of moral hypocrisy I came across recently involving a vegan: one Piper Hoffman, over at Our Hen House. Piper, as you can no doubt guess, frowns upon at least certain aspects of the lifestyles of almost every American (and probably most people in the world as well). In her words, most of us are “arrogant flesh eaters” who are condemnable moral hypocrites for tending to do things like love our pets while eating other animals. There are so many interesting ideas packed into that sentence that it’s hard to know where to start. First is the matter of why people tend to nurture members of other species in a manner resembling the way we nurture our own children. There’s also the matter of why someone like Piper would adopt a moral stance that involves protecting non-human animals. Sure, such a motivation might be intuitively understood when it’s people doing the protecting, but the same cannot be said of non-human species: it would appear particularly strange if you found, say, a lion that simply refused to eat meat on moral grounds.

She might be a flesh-eater, but at least she’s not all arrogant about it.

The third thing I find interesting about Piper’s particular moral stance is that it’s severely unpopular: fewer than 1% of the US population identify as vegan and, in practice, even self-reported vegetarians were more likely to have eaten meat in the last 24 hours than not. Now while diet might often be the primary focus when people think of the word ‘vegan’, Piper assures us that there’s more to being a vegan than what you put in your mouth. Here is Piper’s preferred definition:

“Veganism is a way of living which excludes all forms of exploitation of, and cruelty to, the animal kingdom, and includes a reverence for life. It applies to the practice of living on the products of the plant kingdom to the exclusion of flesh, fish, fowl, eggs, honey, animal milk and its derivatives, and encourages the use of alternatives for all commodities derived wholly or in part from animals.”

Accordingly, not only should vegans avoid eating animal-derived foods, they should also avoid wearing fur, leather, wool, or silk, as all of these involve the suffering or exploitation of the animal kingdom. Bear the word “silk” in mind, as we’ll be returning to it shortly.

Taken together, what emerges is the following picture: a member of species X has adopted all sorts of personally-costly behaviors (like avoiding certain types of food or tools) in order to avoid reducing the welfare of pretty much any other living organism, irrespective of its identity. Further still, that member of species X is not content with just personally behaving in such a manner: she has also taken it upon herself to try to regulate the behavior of others around her to do likewise, morally condemning them if they do not. That latter factor is especially curious, given that most other members of her species are not so inclined. This means her moral stance could potentially threaten otherwise-valuable social ties, and is unlikely to receive the broad social support capable of reducing the costs inherent in moral condemnation. I would like to stress again how absolutely bizarre such behavior would seem if we observed it in pretty much any other species.

Without venturing a tentative explanation for what cognitive systems might be generating such stances at the present time, I would like to consider another post Piper made on October 21st of this year. While in her apartment, Piper heard some strange sounds and, upon investigation, discovered that a colony of ants had taken over her bedroom. Being a vegan who avoids all forms of cruelty to and exploitation of animals, Piper did what one might expect from someone who displays a reverence for life: she bought some canisters of insect poison, personally gassed thousands of the ants herself, then called in professionals to finish the job and kill the rest that were living in the walls. Now one might, as Piper did, suggest that it’s unclear whether insects feel pain; perhaps they do, and perhaps they don’t. What is clear, however, is that Piper previously stated a moral rule against wearing products made from silk. Apparently silk production is exploitative in a way that mass murder is not. In any case, the comments on Piper’s blog are what one might expect from a vegan crowd that condemns cruelty and reveres life: unanimous agreement that mass killing was an appropriate response because, after all, people, even vegans, aren’t perfect.

“If it’s any consolation, I felt bad afterwards. I mean, c’mon; no one’s perfect”

This situation raises plenty of valuable and debatable questions. One is the matter of the hypocrisy itself: why didn’t Piper’s conscience stop her from acting? Another is the matter of those who commented on the article: why was Piper supported by other (presumed) vegans, rather than condemned for a clear act of selfish cruelty? A third is that Piper clearly did not reduce or prevent any animal suffering in this story, so is she, and the vegan code of conduct more generally, truly designed to (or attempting to) reduce suffering per se? If the answer to the last question is “yes”, then one might ask whether the vegan lifestyle encourages people to engage in the behaviors actually capable of doing the most to reduce suffering. While these are all worthwhile questions that can shed light on all sorts of curious aspects of human psychology, I would like to focus on the last point.

Consider the following proposition: humans should exterminate all carnivorous species. This act might seem reasonable from the standpoint of reducing suffering. Why? By their very nature, carnivorous species require that other animals suffer and die so the carnivore can continue living. Since these murder-hungry species are unlikely to respond affirmatively to our polite requests that they kindly stop killing things, we could stop them from doing so ourselves, now and forever. Provided one wishes to reduce suffering in the world, then, there are really only three answers to the question of whether we should exterminate all meat-eating species: “Yes”, because they cause more suffering than they offset (however that’s measured); “No”, because they offset more suffering than they cause; or “I don’t know”, because we can’t calculate such things for sure.

Though I would find either of the first two answers acceptable from a consistency perspective, I have yet to find anyone who advocates for either of those options. What I have come across are people who posit the third answer with some frequency. I will of course grant that such things are incredibly difficult to calculate, especially with a high degree of accuracy, but this clearly does not pose a problem in all cases. Refusing to wear silk clothes, for instance, seemed to be easy enough for Piper to calculate: it’s morally wrong because it involves animal suffering and/or exploitation. Similarly, I imagine most of us would not refrain from judging someone who slowly tortured our pet dog just because we can’t be 100% sure that their actions were, on the whole, causing more suffering than they offset. If we cannot calculate welfare tradeoffs in situations like these with some certainty, then any argument for veganism built on the foundation of reducing animal suffering crumbles, as such a goal would be completely ineffective in guiding our actions.

Still having trouble calculating welfare impacts?

All the previous examples do is make people confront a simple fact: they’re often not all that interested in actually “minimizing suffering”. While it sounds like a noble goal – since most people don’t like suffering in the abstract – it’s too broadly phrased to be of any use. This should be expected for a number of reasons, namely that “reducing suffering per se” is a biologically-implausible function for any cognitive mechanism and, even if reducing suffering is the proximate goal in question, there’s pretty much always something else one could do to reduce it. Despite the latter fact, many people, like Piper, effectively give up on the idea once it becomes too much of a personal burden; they’re interested in reducing suffering, just so long as it’s not terribly inconvenient. But if people are not interested in minimizing suffering per se, what is actually motivating that stated interest? Presumably, it has something to do with the signal one sends by taking such a moral stance. I won’t discuss the precise nature of that signal at the present time, but feel free to offer speculations in the comments section.

Might Doesn’t Make Right, But It Helps

There’s no denying the importance and ubiquity of violence and aggression. Despite the suggestion of the owner of the swamp castle in Monty Python and the Holy Grail, people continue to “bicker and argue about who killed who”. Given that anger is often a key motivator of aggression, developing a satisfying account of anger can go a long way towards understanding and predicting when people will be likely to aggress against others. While there has been a great deal of focus placed on reducing violence, there tends to be somewhat less attention paid to understanding the functions and uses of anger. The American Psychological Association, for instance, notes that anger can be a good thing because “it can give you a way to express negative feelings…or motivate you to find solutions to problems”. They also warn that anger can “get out of hand”. While such suggestions sound plausible (minus the idea that “expressing” an emotion is good in and of itself), they tend to lack the ability to deliver suitably textured predictions about the correlates or shape of anger, much less qualify what counts as “getting out of hand”.

Seems like he had that situation completely under control to me.

Of course, that’s not to suggest that anger is always going to be useful in precisely the measure in which it gets delivered; just that we ought to be interested in attempting to understand the emotion before trying to diagnose the problems with it (in much the same fashion, one might wish to understand the function of, say, a fever before figuring out whether we should try to reduce it). Towards that end, I would like to turn to a paper by Sell, Tooby, & Cosmides (2009), who posit an altogether more specific and biologically-plausible function for anger: the regulation and modification of welfare-tradeoff ratios (WTRs). These ratios essentially represent how much of your own welfare you’re willing to give up to improve the welfare of another. To use a simple economics example, imagine choosing between two options: $6 for yourself and $1 for someone else, or $5 for yourself and $5 for someone else. One’s WTR towards that someone else could be approximated, at least in some sense, by their choice in that and other dilemmas. This propensity to suffer losses to benefit others varies considerably across individuals.
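For the programmatically-inclined, here’s a minimal sketch of the arithmetic implied by that dilemma (the payoffs and function name are my own illustration, not anything from Sell et al.): choosing $5/$5 over $6/$1 means giving up $1 of your own welfare to deliver $4 to the other person, implying a WTR of at least 0.25.

```python
# A hypothetical illustration of inferring a lower bound on someone's
# welfare-tradeoff ratio (WTR) from a single dilemma. Payoffs are
# (self, other) pairs; choosing the generous option means sacrificing
# some of one's own welfare to boost the other person's.

def implied_wtr_bound(selfish, generous):
    """If the generous option is chosen, WTR >= own cost / other's gain."""
    own_cost = selfish[0] - generous[0]    # welfare given up by choosing generously
    other_gain = generous[1] - selfish[1]  # welfare delivered to the other party
    return own_cost / other_gain

# The dilemma from the text: $6/$1 versus $5/$5.
print(implied_wtr_bound(selfish=(6, 1), generous=(5, 5)))  # 0.25
```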

This basic concept can be readily expanded to the wider social world: everything we do tends to have an effect on others and ourselves, and we would be better off, on the whole, if other people were relatively more willing to take our welfare into account when they acted. Sometimes that works out favorably for both parties, as is often the case in kin relationships (shared genetic interests tend to increase the willingness to trade off your own welfare for another’s); other times, it won’t work out so nicely. Since everyone would be better off if they could increase others’ WTRs towards them, and not everyone can possibly achieve that goal at once, WTRs tend to be aligned in non-optimal ways from at least someone’s perspective, if not most people’s. So let’s say someone isn’t taking my welfare into account in a way I deem acceptable when they act; what’s a guy to do? One available option is to attempt to “renegotiate” their WTR towards me through the threat of inflicting costs or withdrawing benefits; the kinds of behaviors that anger helps motivate. Anger, then, might serve the function of attempting to regulate other people’s WTRs towards you (or your allies, and you by extension) by signaling the intention to inflict costs following behavior indicative of an unacceptably-low WTR.

This function immediately suggests some design features we ought to expect to find in the cognitive systems regulating anger, because not everyone is equally capable of inflicting costs on others. Accordingly, someone in a better position to inflict costs on others ought to be more readily roused to anger. One obvious indicator of that capacity to inflict costs is physical formidability: physically stronger males should be more capable of inflicting costs on others, and thus more willing to do so in order to modify the WTRs held by said others. This prediction was borne out well in the data Sell et al (2009) collected: across various measures of men’s strength, the correlations between physical formidability and proneness to anger, history of fighting, sense of entitlement, and the perceived usefulness of violence were all high, ranging from approximately r = 0.3 to 0.5; for women, the same correlations were around 0.05 to 0.1. It was only physical formidability in men that proved to be a good predictor of aggression and anger, which makes a good deal of sense in light of the fact that women tend to be substantially less physically formidable in general.

A relationship that holds even when measured in Hulks.

Women are not without power, though, even if they typically fall behind men in physical strength. Perhaps owing to their ability to recruit the physical strength of others, or to leverage some other social capital, attractive women might also be especially prone to anger. This set of predictions was also confirmed: women who perceived themselves to be attractive – like strong men – were more prone to anger, felt greater entitlement, were more successful in conflicts, and found violence to be more useful, even after controlling for physical strength. As expected, however, attractiveness did not predict a history of fighting in women. While attractive men also tended to feel a greater sense of entitlement and reported more success in conflicts, the variables relating to fighting ability did not reliably correlate with attractiveness once the effect of physical formidability was partialed out. In other words, in relation to anger, what physical strength was for men, attractiveness was for women.

It should also be noted that neither attractiveness nor physical strength correlated well with how long people tended to ruminate when angry; it wasn’t simply the case that strong men and attractive women were angrier for longer periods of time. We ought to expect anger to be roused strategically and contextually in order to solve specific problems, not just generally, as the latter is liable to cause more problems than it solves. These results also cut against some popular misconceptions, like the idea that people get angry to compensate for a lack of physical strength or attractiveness, as the people who lacked those qualities tended to be less prone to anger. These data would also cut against the suggestions from the APA that I initially mentioned: unless there’s some compelling reason to predict that physically strong males and attractive females are particularly likely to be prone to anger in order to “express their emotions” or “solve problems” more generally, we can see that those ostensible functions for anger are clearly lacking in some regards. They fail to deliver good predictions or satisfyingly account for the existing data.

These findings do raise some questions bearing deeper examination. The first of these concerns the often ambiguous nature of causal arrows: do men become more prone to anger and aggression as they become physically stronger, or might there be some developmental window at which point aggressive tendencies become relatively canalized (i.e. does current physical strength matter, or does one’s strength at, say, age 16 matter more)? What role does social influence – in the form of larger groups of allies – play? Are well-liked but physically weak men more or less likely to become angry easily? Does it matter whether one’s friends are physically imposing? How about if the target of one’s anger is more formidable than the one experiencing the anger? Admittedly, these are tricky questions to answer, owing largely to potential logistical issues in conducting the research in an ecologically-valid context, but they’re certainly worth considering.

“Experimental Recruitment: Please bring a dozen close friends”

Returning to the initial point about when anger gets “out of control”, we can see that the question becomes a significantly more nuanced one. For starters, “out of control” will clearly depend on who you ask: while the angry individual might feel that they are not being treated appropriately by others in their social world, the targets of that anger might insist that the angry individual is being unreasonable in their requests for some particular treatment. Further, “out of control” for one individual does not necessarily equal the same amount of aggression for any other, at least in terms of the adaptive value of the behavior. One might also consider, at least at times, a lack of aggression and anger to be unsuitable behavior, such as when meek children are told to stand up to their bullies. The key point here is that we ought to expect all these considerations to vary strategically, rather than as a function of someone needing to “express their emotions” by “venting” them. If Sell et al (2009) are correct, anger can likely be reduced by altering these WTRs in non-aggressive ways: once the desired WTR for one party has been reached, the anger systems ought to deactivate. Whether such methods are likely to be practically feasible is another matter entirely.

References: Sell, A., Tooby, J., & Cosmides, L. (2009). Formidability and the logic of human anger. Proceedings of the National Academy of Sciences, 106, 15073-15078.

Classic Research In Evolutionary Psychology: Learning

Let’s say I were to give you a problem to solve: I want you to design a tool that is good at cutting. Despite the apparent generality of the function, this is actually a pretty vague request. For instance, one might want to know more about the material to be cut: a sword might work if your job involves cutting human flesh, but it would also be unwieldy to keep around the kitchen for preparing dinner (I’m also not entirely sure swords are dishwasher-safe, provided you managed to fit a katana into your machine in the first place). So let’s narrow the request down to some kind of kitchen utensil. Even that request, however, is a bit vague, as evidenced by Wikipedia naming about a dozen different kinds of utensil-style knives (and about 51 different kinds of knives overall). That list doesn’t even manage to capture other kinds of cutting-related kitchen utensils, like egg-slicers, mandolines, peelers, and graters. Why do we see so much variety, even in the kitchen, and why can’t one simple knife be good enough? Simple: when different tasks have non-overlapping sets of best design solutions, functional specificity tends to yield efficiency in one realm at the cost of efficiency in others.

“You have my bow! And my axe! And my sword-themed skillet!”.

The same basic logic has been applied to the design features of living organisms as well, including aspects of our cognition, as I argued in the last post: the part of the mind that functions to logically reason about cheaters in the social environment does not appear to be able to logically reason with similar ease about other, even closely-related, topics. Today, we’re going to expand on that idea, but shift our focus towards the realm of learning. Generally speaking, learning can be conceived of as some change to an organism’s preexisting cognitive structure due to some experience (typically unrelated to physical trauma). As with most things related to biological changes, however, random alterations are unlikely to result in improvement; to modify a Richard Dawkins quote ever so slightly, “However many ways there may be of [learning something useful], it is certain that there are vastly more ways of [learning something that isn’t]”. For this reason, along with some personal experience, no sane academic has ever suggested that our learning occurs randomly. Learning needs to be a highly-structured process in order to be of any use.

Precisely what “highly-structured” entails is a bit of a sticky issue, though. There are undoubtedly still some who would suggest that some general type of reinforcement-style learning might be good enough for learning all sorts of neat and useful things. It’s a simple rule: if [action] is followed by [reward], then increase the probability of [action]; if [action] is followed by [punishment], then decrease the probability of [action]. There are a number of problems with such a simple rule, and they return us to our knife example: the learning rule itself is under-specified for the demands of the various learning problems organisms face. Let’s begin with an analysis of what is known as conditioned taste aversion. Organisms, especially omnivorous ones, often need to learn which things in their environment are safe to eat and which are toxic and to be avoided. One problem in learning which potential foods are toxic is that the action (eating) is often divorced from the outcome (sickness) by a span of minutes to hours, and plenty of intervening actions take place in the interim. On top of that, this is not the type of lesson you want to require repeated exposures to learn, as, and this should go without saying, eating poisonous foods is bad for you. In order to learn the connection between the food and the sickness, then, a learning mechanism would seem to need to “know” that the sickness is related to the food – and not to other, intervening variables – as well as how the two are related in some specific temporal fashion. Events that conform more closely to this anticipated pattern should be more readily learnable.
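To see why that generic rule is under-specified, consider a minimal sketch (the rule, names, and numbers here are my own illustration, not taken from any of the studies below) of a learner that simply credits the most recent action with whatever outcome follows it:

```python
# A sketch of the generic reinforcement rule described above: nudge the
# probability of an action up after reward, down after punishment.

def update(probs, action, outcome, learning_rate=0.1):
    """Adjust the probability of `action` by the signed outcome, then renormalize."""
    probs[action] = max(probs[action] + learning_rate * outcome, 0.01)
    total = sum(probs.values())
    return {a: p / total for a, p in probs.items()}

# The credit-assignment problem: nausea arrives hours after the meal, so a
# rule that only looks at the most recent action blames the wrong behavior.
history = ["eat_berries", "drink_water", "groom"]  # nausea occurs after all three
probs = {"eat_berries": 1/3, "drink_water": 1/3, "groom": 1/3}
probs = update(probs, history[-1], outcome=-1)  # punishes "groom", not the berries
print(probs)  # eating the (toxic) berries is no less likely than before
```

Nothing in the rule itself tells the learner that sickness should be paired with what was eaten hours ago rather than with whatever happened last; that knowledge has to be built into the mechanism.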

The first study we’ll consider, then, is by Garcia & Koelling (1966), who were examining taste conditioning in rats. The experimenters created conditions in which rats were exposed to “bright, noisy” water and “tasty” water. The former condition was created by hooking a drinking apparatus up to a circuit connected to a lamp and a clicking mechanism, so that when the rats drank, they were provided with visual and auditory stimuli. The tasty condition was created by flavoring the water. Garcia & Koelling (1966) then attempted to pair the waters with either nausea or electric shocks, and subsequently measured how the rats’ preferences for the beverages changed. After the conditioning phase, during the post-test period, a rather interesting set of results emerged: while rats readily learned to pair nausea with taste, they did not draw the connection between nausea and audiovisual cues. When it came to the shocks, however, the reverse pattern emerged: rats could pair shocks with audiovisual cues well, but could not manage to pair taste and shock. This result makes a good deal of sense in light of a more domain-specific learning mechanism: things which produce certain kinds of audiovisual cues (like predators) might also have the habit of inflicting certain kinds of shock-like harms (such as with teeth or claws). On the other hand, predators don’t tend to cause nausea; toxins in food tend to do so, and these toxins also tend to come paired with distinct tastes. An all-purpose learning mechanism, by contrast, should be able to pair all these kinds of stimuli and outcomes equally well; it shouldn’t matter whether the conditioning comes in the form of nausea or shocks.

Turns out that shocks are useful for extracting information, as well as communicating it.

The second experiment to consider on the subject of learning, like the previous one, involves rats, and actually pre-dates it. This paper, by Petrinovich & Bolles (1954), examined whether different deprivation states have qualitatively different effects on behavior. In this case, the two deprivation states under consideration were hunger and thirst. Two samples of rats were deprived of either food or water, then placed in a standard T-maze (which looks precisely how you might imagine it would). The relevant reward – food for the hungry rats and water for the thirsty ones – was placed in one arm of the T-maze. The first trial was always rewarded, no matter which side the rat chose. Following that initial choice, the reward was placed on the side of the maze the rat did not choose on the previous trial. For instance, if the rat went ‘right’ on the first trial, the reward was placed in the ‘left’ arm on the second trial. Whether the rat chose correctly or incorrectly didn’t matter; the reward was always placed on the side opposite its previous choice. Did it matter whether the reward was food or water?

Yes; it mattered a great deal. The hungry rats averaged substantially fewer errors in reaching the reward than the thirsty ones (approximately 13 errors over 34 trials, relative to 28 errors, respectively). The rats were further tested until they managed to perform 10 out of 12 trials correctly. The hungry rats met that criterion substantially sooner, requiring a median of 23 total trials to reach the mark. By contrast, 7 of the 10 thirsty rats failed to reach the criterion at all and, of the three that did, approximately 30 trials were required on average to manage that achievement. Petrinovich & Bolles (1954) suggested that these results can be understood in the following light: hunger makes the rat’s behavior more variable, while thirst makes its behavior more stereotyped. Why? The most likely candidate explanation is the nature of the stimuli themselves, as they tend to appear in the world. Food sources tend to be distributed semi-unpredictably throughout the environment, and where there is food today, there might not be food tomorrow. By contrast, the location of water tends to be substantially more fixed (where there was a river today, there is probably a river tomorrow), so returning to the last place you found water would be the more secure bet. To continue driving this point home: a domain-general learning mechanism should do both tasks equally well, so a more general account would seem to struggle to explain these findings.

Shifting gears away from rats, the final study for consideration is one I’ve touched on before, and it involves the fear responses of monkeys. As I’ve already discussed the experiment (Cook & Mineka, 1989), I’ll offer only a brief recap of the paper. Lab-reared monkeys show no intrinsic fear responses to snakes or flowers. However, social creatures that they are, these lab-reared monkeys can readily develop fear responses to snakes after observing another conspecific reacting fearfully to them. This is, quite literally, a case of monkey see, monkey do. Does this same reaction hold when conspecifics are observed reacting fearfully to a flower? Not at all. Despite the lab-reared monkeys being exposed to stimuli they have never seen before in their lives (snakes and flowers), paired with a fear reaction in both cases, it seems that the monkeys are prepared to learn to fear snakes, but not similarly prepared to learn a fear of flowers. Of note is that this isn’t just a fear reaction in response to living organisms in general: while monkeys can learn a fear of crocodiles, they do not learn to fear rabbits under the same conditions.

An effect noted by Python (1975)

When it comes to learning, it does not appear that we are dealing with some kind of domain-general learning mechanism, equally capable of learning all types of contingencies. This shouldn’t be entirely surprising, as organisms don’t face all kinds of contingencies with equivalent frequencies: predators that cause nausea are substantially less common than toxic compounds which do. Don’t misunderstand this argument: humans and nonhumans alike are certainly capable of learning many phylogenetically novel things. That said, this learning is constrained and directed in ways we are often wholly unaware of. The specific content area of the learning is of prime importance in determining how quickly something can be learned, how lasting the learning is likely to be, and which things are learned (or learnable) at all. The take-home message of all this research, then, can be phrased as follows: learning is not the end point of an explanation; it’s a phenomenon which itself requires an explanation. We want to know why an organism learns what it does, not simply that it learns.

References: Cook, M., & Mineka, S. (1989). Observational conditioning of fear to fear-relevant versus fear-irrelevant stimuli in rhesus monkeys. Journal of Abnormal Psychology, 98(4), 448-459. PMID: 2592680

Garcia, J., & Koelling, R. (1966). Relation of cue to consequence in avoidance learning. Psychonomic Science, 4, 123-124.

Petrinovich, L., & Bolles, R. (1954). Deprivation states and behavioral attributes. Journal of Comparative and Physiological Psychology, 47, 450-453.

Classic Research In Evolutionary Psychology: Reasoning

I’ve consistently argued that evolutionary psychology, as a framework, is a substantial and, in many ways, vital remedy to some widespread problems: it allows us to connect seemingly disparate findings under a common understanding and, while the framework is by itself no guarantee of good research, it forces researchers to be more precise in their hypotheses, allowing conceptual problems with hypotheses and theories to be more transparently observed and addressed. In some regards, the framework is quite a bit like the practice of explaining something in writing: while you may intuitively feel as if you understand a subject, it is often not until you try to express your thoughts in actual words that you find your estimation of your understanding has been a bit overstated. Evolutionary psychology forces our intuitive assumptions about the world to be made explicit, often to our own embarrassment.

“Now that you mention it, I’m surprised I didn’t notice that sooner…”

As I’ve recently been discussing one of the criticisms of evolutionary psychology – that the field is overly focused on domain-specific cognitive mechanisms – I feel that now would be a good time to review some classic research that speaks directly to the topic. Though the research to be discussed is itself of recent vintage (Cosmides, Barrett, & Tooby, 2010), the topic has been examined for some time: whether our logical reasoning abilities are best conceived of as domain-general or domain-specific (that is, whether they work equally well regardless of content, or whether content area is important to their proper functioning). We ought to expect domain specificity in our cognitive functioning for two primary reasons (though these are not the only reasons). The first is that specialization yields efficiency. The demands of solving a specific task are often different from the demands of solving a different one, and to the extent that those demands do not overlap, it becomes difficult to design a tool that solves both problems readily. Imagining a tool that can both open wine bottles and cut tomatoes is hard enough; now imagine adding on the requirement that it also needs to function as a credit card, and the problem becomes exceedingly clear. The second reason is outlined well by Cosmides, Barrett, & Tooby (2010) and, as usual, they express it more eloquently than I would:

The computational problems our ancestors faced were not drawn randomly from the universe of all possible problems; instead, they were densely clustered in particular recurrent families.

Putting the two together, we end up with the following: humans tend to face a non-random set of adaptive problems in which the solution to any particular one tends to differ from the solution to any other. As domain-specific mechanisms solve problems more efficiently than domain-general ones, we ought to expect the mind to contain a large number of cognitive mechanisms designed to solve these specific and consistently-faced problems, rather than only a few general-purpose mechanisms more capable of solving many problems we do not face, but poorly-suited to the specific problems we do. While such theorizing sounds entirely plausible and, indeed, quite reasonable, without empirical support for the notion of domain-specificity, it’s all so much bark and no bite.

Thankfully, empirical research abounds in the realm of logical reasoning. The classic tool used to assess people’s ability to reason logically is the Wason selection task. In this task, people are presented with a logical rule taking the form of “if P, then Q”, along with a number of cards representing P, Q, ~P, and ~Q (e.g. “If a card has a vowel on one side, then it has an even number on the other”, with cards showing A, B, 1, and 2). They are asked to point out the minimum set of cards that would need to be checked to test the initial “if P, then Q” statement. People’s performance on the task is generally poor, with only around 5-30% of people getting it right on their first attempt. That said, performance on the task can become remarkably good – up to around 65-80% of subjects getting the correct answer – when the task is phrased as a social contract (“If someone [gets a benefit], then they need to [pay a cost]”, the most well-known being “If someone is drinking, then they need to be at least 21”). Despite the underlying logical form not being altered, the content of the Wason task matters greatly in terms of performance. This is a difficult finding to account for if one holds to the idea of a domain-general logical reasoning mechanism that functions the same way in all tasks involving formal logic. Noting that content matters is one thing, though; figuring out how and why content matters becomes something of a more difficult task.
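For readers who prefer to see the logic spelled out, here’s a minimal sketch (using the hypothetical vowel/even-number deck from the example above) of why only the P and ~Q cards need checking: a card must be flipped only if some possible hidden face could make “if P, then Q” false.

```python
# The rule: "if a card has a vowel on one side, it has an even number
# on the other". A card needs flipping iff a hidden face could violate it.

def is_vowel(ch):
    return ch in "AEIOU"

def is_even(ch):
    return ch.isdigit() and int(ch) % 2 == 0

faces = ["A", "B", "1", "2"]  # visible faces: P, ~P, ~Q, Q

def needs_flip(visible):
    # Assume letters have numbers on the back and vice versa.
    hidden = [f for f in faces if f.isdigit() != visible.isdigit()]
    def violates(letter, number):
        return is_vowel(letter) and not is_even(number)
    if visible.isdigit():
        return any(violates(h, visible) for h in hidden)
    return any(violates(visible, h) for h in hidden)

print([f for f in faces if needs_flip(f)])  # ['A', '1'] -- i.e. P and ~Q
```

The point of the research, of course, is that people who struggle with this abstract version suddenly perform well when the same structure is dressed up as a social contract.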

While some might suggest that content simply matters as a function of familiarity – as people clearly have more experience with age restrictions on drinking and other social situations than with vaguer stimuli – familiarity doesn’t explain the pattern: people will fail the task when it is framed in terms of familiar stimuli, and people will succeed at the task for unfamiliar social contracts. Accordingly, criticisms of the domain-specific social contract (or cheater-detection) mechanism shifted to suggest that the mechanism at work is indeed content-specific, but perhaps not specific to social contracts. Instead, the contention was that people are good at reasoning about social contracts only because they’re good at reasoning about deontic categories – like permissions and obligations – more generally. Assuming such an account were accurate, it would remain debatable whether that mechanism should be counted as a domain-general or domain-specific one. Such a debate need not be had yet, though, as the more general account turns out to be unsupported by the empirical evidence.

We’re just waiting for critics to look down and figure it out.

While all social contracts involve deontic logic, not all deontic logic involves social contracts. If the more general account of deontic reasoning were true, we ought not to expect performance differences between the former and latter types of problems. In order to test whether such differences exist, Cosmides, Barrett, & Tooby’s (2010) first experiment involved presenting subjects with a permission rule – “If you do P, you must do Q first” – varying whether P was a benefit (going out at night), neutral (staying in), or a chore (taking out the trash; Q, in this case, involved tying a rock around your ankle). When the rule was a social contract (the benefit), performance was high on the Wason task, with 80% of subjects answering correctly. However, when the rule involved staying in, only 52% of subjects got it right; that number was even lower in the garbage condition, with only 44% accuracy among subjects. Further, this same pattern of results was subsequently replicated in a new context involving filing and signing forms. This result is quite difficult to account for with a more-general permission schema, as all the conditions involve reasoning about permissions; it is, however, consistent with the predictions from social contract theory, as only the context involving a social contract elicited the highest level of performance.

Permission schemas, in their general form, also appear unconcerned with whether one violates a rule intentionally or accidentally. By contrast, social contract theory is concerned with the intentionality of the violation, as accidental violations do not imply the presence of a cheater the way intentional violations do. To continue testing the distinction between the two models, subjects were presented with the Wason task in contexts where the violations of the rule were likely intentional (with or without a benefit for the actor) or accidental. When the violation was intentional and benefited the actor, subjects performed accurately 68% of the time; when it was intentional but did not benefit the actor, that percentage dropped to 45%; when the violation was likely unintentional, performance bottomed out at 27%. These results make good sense if one is trying to find evidence of a cheater; they do not if one is trying to find evidence of a rule violation more generally.

In a final experiment, the Wason task was again presented to subjects, this time varying three factors: whether one was intending to violate a rule or not; whether it would benefit the actor or not; and whether the ability to violate was present or absent. The pattern of results mimicked those above: when benefit, intention, and ability were all present, 64% of subjects determined the correct answer to the task; when only 2 factors were present, 46% of subjects got the correct answer; and when only 1 factor was present, subjects did worse still, with only 26% getting the correct answer, which is approximately the same performance level as when there were no factors present. Taken together, these three experiments provide powerful evidence that people aren’t just good at reasoning about the behavior of other people in general, but rather that they are good at reasoning about social contracts in particular. In the now-immortal words of Bill O’Reilly, “[domain-general accounts] can’t explain that“.

“Now cut their mic and let’s call it a day!”

Now, of course, logical reasoning is just one possible example for demonstrating domain specificity, and these experiments certainly don’t prove that the entire structure of the mind is domain-specific; there are other realms of life – such as, say, mate selection, or learning – where domain-general mechanisms might work. The possibility of domain-general mechanisms remains just that – possible; perhaps not often well-reasoned on a theoretical level or well-demonstrated at an empirical one, but possible all the same. Differentiating between these accounts may not always be easy in practice, as they are often thought to generate some, or even many, of the same predictions, but in principle it remains simple: we need to place the two accounts in experimental contexts in which they generate opposing predictions. In the next post, we’ll examine some experiments in which we pit a more domain-general account of learning against some more domain-specific ones.

References: Cosmides, L., Barrett, H.C., & Tooby, J. (2010). Adaptive specializations, social exchange, and the evolution of human intelligence. Proceedings of the National Academy of Sciences, 107(Suppl. 2), 9007-9014. PMID: 20445099

Evolutionary Psychology: Tying Psychology Together

Every now and again – perhaps more frequently than many would prefer – someone who apparently fails to understand one or more aspects of the evolutionary perspective in psychology goes on to make rather public proclamations about what it is and what it can and cannot do for us. Notable instances are not particularly difficult to find. The most recent of these to cross my desk comes from Gregg Henriques, and it takes a substantially less-nasty tone than I have come to expect. In it, he claims that evolutionary psychology does not provide us with a viable metatheory for understanding psychology, basing his argument on three main points: (1) evolutionary psychology is overly committed to the domain-specificity concept, (2) the field fails to have the correct map of complexity, and (3) it hasn’t done much for people in a clinical setting. In the course of making these arguments, I feel he stumbles badly on several points, so I’d like to take a little time to point out these errors. Thankfully, given the relative consistency of these errors, doing so is becoming more a routine than anything else.

So feel free to change the channel if you’ve seen this before.

Gregg begins with the natural starting point for many people in criticizing EP: while we have been focusing on how organisms solve specific adaptive problems, there might be more general adaptive problems out there. As Gregg put it:

The EP founders also overlooked the fact that there really is a domain general behavioral problem, which can be characterized as the problem of behavioral investment

There are a number of things to say about such a suggestion. Thankfully, I have said them before, so this is a relatively easy task. To start off, these ostensibly domain-general problems are, in fact, not all that general. To use a simple example, consider one raised by Gregg in his discussion of behavioral investment theory: organisms need to solve the problem of obtaining more energy than they spend in order to keep doing things like being alive and mating. That seems like an awfully general problem but, stated in such a manner, the means by which that general problem is, or can be, solved are massively unspecified. How does an organism calculate its current caloric state? How does an organism decide which things to eat to obtain energy? How does an organism decide when to stop foraging for food in one area and pursue a new one? How is the return on energy calculated and compared against the expenditure? As one can quickly appreciate, the larger, domain-general problem (obtain more energy than one expends) is actually composed of very many smaller problems, and things can get complicated quickly. Pursuing mating rather than food, for instance, is unlikely to result in an organism obtaining more energy than it expends. This leaves the behavioral investment problem – broadly phrased – wanting in terms of any predictive power: why do organisms pursue goals other than gaining energy, and under what conditions do they do so? The issue here, then, is not so much that domain-general problems aren’t being accounted for by evolutionary psychology, but rather that the problems themselves are being poorly formulated by the critics.

The next area in this criticism that Gregg stumbles on is the level of analysis that evolutionary psychology tends to work with. Gregg considers associative learning a domain-general system but, again, it’s trivial to demonstrate that it is not all that general. There are many things that associative learning systems do not do: regulate homeostatic processes, like breathing and heart rate; perceive anything, like light, sound, pleasure, or pain; generate emotions; store memories; and so on. In terms of their function, associative learning systems only really seem to do one thing: make behavior followed by reward more likely than behavior followed by discomfort, and that’s only after other systems have decided what is rewarding and what is not. That this system can apply the same function to many different inputs doesn’t make it a domain-general one. The distinction that Gregg appears to miss, then, is that functional specificity is not the same as input specificity. Calling learning a domain-general system is a bit like calling a knife a domain-general tool because it can be used to cut many different objects. Try to use a knife to weld metal, and you’ll quickly appreciate how domain-specific the function of a knife is.

On top of that, there is also the issue that some associations are learned far more readily than others. To quote Dawkins, “However many ways there may be of being alive, it is certain that there are vastly more ways of being dead”. A similar logic applies to learning: there are many more potentially incorrect and useless things to learn than there are useful ones. This is why learning ends up being a rather constrained process: rats can learn to associate light and sound with shocks, but do not tend to make the association between taste and shock, despite the unpleasantness of the shock itself. Conversely, associations between taste and nausea can be readily learned, but not between light and nausea. To continue beating this point to death, a domain-general account of associative learning has a rather difficult time explaining why some connections are readily learned and others are not. In order to generate more textured predictions, you need to start focusing on the more-specific sub-problems that make up the more general one.

And if doing so is not enough of a pain-in-the-ass, you’re probably doing it wrong.

On a topic somewhat related to learning, the helpful link provided by Gregg concerning behavioral investment theory has several passages that, I think, are rather diagnostic of the perspective he has on evolutionary psychology:

Finally, because [behavioral investment/shutdown theory] is an evolutionary model, it also readily accounts for the fact that there is a substantial genetic component associated with depression (p.61)…there is much debate on the relative amount of genetic constraint versus experiential plasticity in various domains of mental functioning (p.70).

The problem here is that evolutionary psychology concerns itself with far more than genetic components. In the primer on evolutionary psychology, the focus on genetic components in particular is deemed to be nonsensical in the first place, as the dichotomy between genetic and environmental itself is a false one. Gregg appears to be conflating “evolutionary” with “genetic” for whatever reason, and possibly both with “fixed” when he writes:

In contrast to the static model suggested by evolutionary psychologists, The Origin of Minds describes a mind that is dynamic and ever-changing, redesigning itself with each life experience

As far as I know, no evolutionary psychologist has ever suggested a static model of the mind; not one. Given that “evolutionary psychologists” is pluralized in that sentence, I can only assume the error is supposed to have been made by at least several of them, but to whom “them” refers is a mystery to me. Indeed, this passage by Gregg plays by the rules articulated in the pop anti-evolutionary psychology game nearly perfectly:

The second part of the game should be obvious. Once you’ve baldly asserted what evolutionary psychologists believe – and you lose points if, breaking tradition, you provide some evidence for what evolutionary psychologists have actually claimed in print and accurately portray their view – point out the blindingly obvious opposite of the view you’ve hung on evolutionary psychology. Here, anything vacuous but true works. Development matters. People learn. Behavior is flexible. Brains change over time. Not all traits are adaptations. The world has changed. People differ across cultures. Two plus two equals four. Whatever.

The example is so by-the-book that little more really needs to be said about it. Somewhat ironically, Gregg suggests that the evolutionary perspective creates a straw man of other perspectives, like learning and cultural ones. I’ll leave that suggestion without further comment.

The next point Gregg raises, concerning complexity, I have a difficult time understanding. If I’m parsing his meaning correctly, he’s saying that culture adds a level of complexity to analyses of human behavior. Indeed, local environmental conditions can certainly shape how adaptations develop and are activated, whether due to culture or not, but I’m not sure precisely how that is supposed to be a criticism of evolutionary psychology. As I mentioned before, I’m not sure a single contemporary evolutionary psychologist has ever been caught seriously suggesting something to the contrary. Gregg also criticizes evolutionary psychology for not defining psychology as he would prefer. Again, I’m not quite sure I catch his intended meaning here, but I fail to see how it is a criticism of the perspective. Gregg suggests that we need a psychology that can apply to non-humans as well, but I don’t see how an evolutionary framework fails that test. No examples are given for further consideration, so there’s not much more to say on that front.

Gregg’s final criticism amounts to a single line, suggesting that an evolutionary perspective has yet to unify every approach people take in psychotherapy. Not being an expert on psychotherapy myself, I’ll plead ignorance as to the success an evolutionary framework has had in that realm, and no evidence of any kind is provided for assessment. I fail to see why such a claim has any bearing on whether an evolutionary perspective could do so; I just wanted to make note that the criticism has been heard, but perhaps not formulated in a more appreciable fashion.

Final verdict: the prosecution seems confused.

Criticisms of an evolutionary perspective like these are unfortunately common and consistently misguided. Why they continue to abound, despite having been answered time and again since the field’s origins, is curious. Now, in all fairness, Gregg doesn’t appear hostile to the field, and deems it “essential” for understanding psychology. Thankfully, the pop anti-evolutionary psychology game captures this sentiment as well, so I’ll leave it on that note:

The third part of the game is not always followed perfectly, and it is the hardest part. Now that you’ve shown how you are in full command of the way science is conducted or some truth about human behavior that evolutionary psychologists have missed, it’s important to assert that you absolutely acknowledge that of course humans are the product of evolution, and of course humans aren’t exempt from the principles of biology.

Look, you have to say, I’m not opposed to applying evolutionary ideas to humans in principle. This is key, as it gives you a kind of ecumenical gravitas. Yes, you continue, I’m all for the unity of science and cross-pollination and making the social sciences better, and so on. But, you have to add – and writing plaintively, if you can, helps here – I just want things to be done properly. If only evolutionary psychologists would (police themselves, consider development, acknowledge learning, study neuroscience, run experiments, etc…), then I would be just perfectly happy with the discipline.

Sound The Alarm: Sexist Citations

First things first: I would like to wish Popsych.org a happy two-year anniversary. Here’s looking at many more. That’s enough celebration for now; back to the regularly scheduled events.

When it comes to reading and writing, academics are fairly busy people. Despite these constraints on time, some of us (especially the male contingent) still make sure to take the extra time to examine the articles we’re reading to ascertain the gender of the authors so as to systematically avoid citing women, irrespective of the quality of their work. OK; maybe that sounds just a bit silly. Provided people like that actually exist in any appreciable sense of the word, they must surely be a vast minority among academics, else their presence would be well known. So what are we to make of the recently-reported finding that, among some political science journals, female academics tend to have their work cited less often than might be expected, given a host of variables (Maliniak, Powers, & Walter, 2013)? Perhaps there might exist some covert bias against female authors, such that the people doing the citing aren’t even aware that they favor the work of men relative to women. If the conclusions of the current paper are to be believed, this is precisely what we’re seeing (among other things). Sexism – even the unconscious kind – is a bit of a politically hot topic to handle so, naturally, I suggest we jump right into the debate with complete disregard for the potential consequences; you know, for the fun of it all.

Don’t worry; I’m, like, 70% sure I know what I’m doing.

I would like to begin the review of this paper by noting a rather interesting facet of the tone of the introduction: what it does and does not label as “problematic”. What is labeled as problematic is the fact that women do not appear to be earning tenured positions in proportion to the number of women earning PhDs. Though they discuss this fact in light of the political science field, I assume they intend their conclusion to span many fields. This is the well-known leaky pipeline issue, about which much has been written. What is not labeled as problematic are the facts in the next two sentences: women make up 57% of the undergraduate population and 52% of the graduate population, and these percentages are only expected to rise in the future. Admittedly, not every gender gap needs to be discussed in every paper that mentions one and, indeed, this gap might not actually mean much to us. I just want to note that women outnumbering men on campus by 1.3-to-1 and growing is mentioned without so much as the batting of an eye. The focus of the paper is unmistakably on considering the troubles that women will face. Well, sort of; a more accurate way of putting it is that the focus is on the assumed troubles that women will face: difficulty getting cited. As we will see, this citation issue is far from a problem exclusive to women.

Onto the main finding of interest: in the field of international relations, over 3,000 articles across 12 influential journals, spanning about 3 decades, were coded for various descriptors of the article and the authors. Articles authored by men only were cited about 5 more times, on average, than articles authored by women only. Since the average number of citations across all articles was about 25 per paper, this difference of 5 citations is labeled “quite a significant” one, and understandably so; citation count appears to be becoming a more important part of the hiring and promotion process in academia. Importantly, the gap persisted at statistically significant levels even after controlling for factors like the age of the publication, the topic of study, whether it came from an R1 school, the methodological and theoretical approach taken in the paper, and the author’s tenure status. Statistically, being a woman seemed to be bad for citation count.

The authors suggest that this gap might be due to a few factors, though they appear to concede that a majority of the gap remains unexplained. The first explanation on offer is that women might be citing themselves less than men tend to (which they were: men averaged 0.4 self-citations per paper and women 0.25). However, subtracting out self-citation count and the average number of additional citations self-citation was thought to add does not entirely remove the gap either. The other possibility that the authors float involves what are called “citation cartels”, where authors or journals agree to cite each other, formally or informally, in order to artificially inflate citation counts.  While they have no evidence concerning the extent to which this occurs, nor whether it occurs across any gendered lines, they at least report that anecdotes suggest this practice exists. Would that factor help us explain the gender gap? No clue; there’s no evidence. In any case, from these findings, the authors conclude:

“A research article written by a woman and published in any of the top journals will still receive significantly fewer citations than if the same article had been written by a man” (p.29, emphasis mine).

I find the emphasized section rather interesting, as nothing that the authors researched would allow them to reach that conclusion. They were certainly not controlling for the quality of the papers themselves, nor their conclusions. It seems that because they controlled for a number of variables, the authors might have gotten a bit overconfident in assuming they had controlled for all or most of the relevant ones.

“Well, I’m out of ideas. I guess we’re done here”

Like other gender gaps, however, this one may not be entirely what it seems. Means are only one measure of central tendency, and not always the preferable one for describing a sample. For instance, the mean income of 10 people might be a million dollars, provided nine have nothing and one is rather wealthy. A similar example might concern the “average” number of mates your typical male elephant seal has: while some have large harems, others are shut out of the mating game entirely. In other words, a skewed distribution can result in means that are not entirely reflective of what many might consider the “true” average of the population. Another measure of central tendency we might consider, then, is the median: the value that falls in the middle of all the observed values, which is a bit more robust against outliers. Doing just that, we see that the gender gap in citation count vanishes entirely: not only does it no longer favor the men, it slightly favors the women in 2 of the 3 decades considered (the medians for men from the 80s, 90s, and 00s are 5, 14, and 13; for women, 6, 14, and 15, respectively). Further, in two of the decades considered, mixed-gender articles appear to be favored by about 2-to-1 over papers with a single gender of author (medians of 10, 22, and 16, respectively). Overall, the mean citation count looks to be about two-to-three times as high as the median, and the standard deviations of the citation count are huge. For instance, in the 1980s, articles authored by men averaged 17.6 citations per paper (substantially larger than the median of 5), and the SD of that count was 51.63. Yikes. Why is this rather interesting facet of the data not considered in much, if any, depth by the authors? I have no idea.
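
To make that point concrete, here’s a minimal sketch in Python (the lognormal parameters below are my own illustrative assumptions; they are not fit to the paper’s data):

```python
import statistics
import numpy as np

# The income example above, made concrete: nine people with nothing and one
# person with ten million give a mean of a million and a median of zero.
incomes = [0] * 9 + [10_000_000]
print(statistics.mean(incomes), statistics.median(incomes))  # 1000000 0.0

# A heavy-tailed "citation count" behaves the same way: the mean runs about
# twice the median, with a standard deviation dwarfing both.
rng = np.random.default_rng(1)
citations = rng.lognormal(mean=1.8, sigma=1.2, size=3000)
print(np.mean(citations), np.median(citations), np.std(citations))
```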

Now this is not to say that the mean or the median is necessarily the “correct” measure to consider here, but the fact that they return such different values ought to give us some pause. Mean values over twice as large as the medians, paired with huge standard deviations, suggest that we’re dealing with a rather skewed distribution, where some papers garner citation counts remarkably higher than others (a trend I wrote about recently with respect to cultural products). Now the authors do state that their results remain even if outliers above 3 standard deviations are removed from the analysis, but I think that upper limit probably fails to fully capture what’s going on here. The handy graphical representation of citation count provided in the paper can help shed some light on the issue.
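
Before turning to that graph, the 3-standard-deviation point deserves a quick demonstration. Continuing with the same illustrative lognormal sketch (again, assumed parameters, not the paper’s data), trimming at 3 SD removes only a handful of papers and leaves the skew largely intact:

```python
import numpy as np

# The same illustrative heavy-tailed "citation counts" as in the sketch above.
rng = np.random.default_rng(1)
citations = rng.lognormal(mean=1.8, sigma=1.2, size=3000)

# Drop everything above three standard deviations, as the authors describe.
cutoff = citations.mean() + 3 * citations.std()
trimmed = citations[citations < cutoff]

print(len(citations) - len(trimmed))       # only a few dozen papers removed
print(trimmed.mean(), np.median(trimmed))  # mean remains well above the median
```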

This is what science looks like.

What we see is not so much a noticeable trend for men to be cited more than women in general as a trend for the papers with the largest citation counts to come disproportionately from men. The work of most of the men, like most of the women, would seem to linger in relative obscurity. Even the mixed-sex papers fail to reach the heights that the male-only papers do. In other words, the prototypical paper by women doesn’t seem to differ too much from the prototypical male paper; the “rockstar” papers (of which I’d estimate there are about 20 to 30 in that picture), however, do differ substantially along gendered lines. Gendered lines are not the only way in which they might differ, however. A more accurate way of phrasing the questionable conclusion I quoted earlier would be to say, “A research article written by anyone other than the initial author, if published in any of the top journals, might still receive significantly fewer citations, even if it was the same article”. Cultural products can be capricious in their popularity, and even minor variations in initial conditions can set the stage for later popularity, or lack thereof.

Except for black; black is always fashionable.

This naturally raises the question as to precisely why the papers with the largest impact come from men, relative to women. Unfortunately, I don’t have a good answer for that question. There is undoubtedly some cultural inertia to account for; were I to publish the same book as Steven Pinker in a parallel set of universes, I doubt mine would sell nearly as many copies (Steven has over 94,000 Twitter followers, whereas I have more fingers and toes than fans). There is also a good deal of noise to consider: an article might fail to become popular because it was printed in the wrong place at the wrong time, rather than because of its quality. On the subject of quality, however, some papers are better than others, by whatever metric we’re using to determine such things (typically, that standard is “I know it when I wish I had thought of it first”). Though none of these factors lend themselves to analysis in any straightforward way, the important point is to not jump to overstated conclusions about sexism being the culprit, or to suggest that reviewers “…monitor the ratio of male to female citations in articles they publish” so as to point it out to the authors in the hopes of “remedying” any potential “imbalances”. One might also, I suppose, have reviewers suggest that authors make a conscious effort to cite articles with lower citation counts more broadly, so as to ensure greater parity among citation counts across all articles. I don’t know why that state of affairs would be preferable, but one could suggest it.

References: Maliniak, D., Powers, R., & Walter, B. (2013). The gender citation gap in international relations. International Organization DOI: 10.1017/S0020818313000209

Having Their Cake And Eating It Too

Humans are a remarkably cooperative bunch of organisms. This is remarkable because cooperation opens the door wide to all manner of costly exploitation. While it can be a profitable strategy for all involved parties, cooperation requires a certain degree of vigilance and, at times, the credible threat of punishment in order to maintain its existence. Figuring out how people manage to solve these cooperative problems has provided us with no shortage of research and theorizing, some of which is altogether more plausible than the rest. Though I haven’t quite figured out the appeal yet, there are many thoughtful people who favor group selection accounts for explaining why people cooperate. They suggest that people will often cooperate in spite of the personal fitness costs because doing so serves to better the overall condition of the group to which they belong. While no useful predictions appear to have fallen out of such a model, there are those who are fairly certain it can at least account for some known, but ostensibly strange, findings.

That is a rather strange finding you got there. Thanks, Goodwill.

One human trait purported to require a group selection explanation is altruistic punishment and cooperation, especially in one-shot anonymous economic games. The basic logic goes as follows: in a prisoner’s dilemma game, so long as that game is a non-repeated event, there is really only one strategy, and that’s defection. This is because if you defect when your partner defects, you’re better off than if you had cooperated; if your partner cooperated, on the other hand, you’re still better off if you defect. Economists might thus call the strategy of “always defect” a “rational” one. Further, punishing a defector in such conditions is similarly considered irrational behavior, as it only results in a lower payment for the punisher than they would have otherwise had. As we know from decades of research using these games, however, people don’t always behave “rationally”: sometimes they’ll cooperate with the other people they’re playing with, and sometimes they’ll give up some of their own payment in order to punish someone who has either wronged them or, more importantly, wronged a stranger. This pattern of behavior – paying to be nice to people who are nice, and paying to punish those who are not – has been dubbed “strong reciprocity” (Fehr, Fischbacher, & Gachter, 2002).
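
For anyone who would like to see that logic laid out explicitly, here’s a minimal sketch of a one-shot prisoner’s dilemma in Python; the particular payoff values are illustrative assumptions on my part, but any values preserving the dilemma’s ordering yield the same answer:

```python
# Payoffs keyed by (my_move, partner_move); the values are (mine, partner's).
# Only the ordering of the payoffs matters, not the specific numbers.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

def my_payoff(my_move, partner_move):
    return PAYOFFS[(my_move, partner_move)][0]

# Whatever the partner does, defecting pays strictly more than cooperating:
for partner_move in ("cooperate", "defect"):
    assert my_payoff("defect", partner_move) > my_payoff("cooperate", partner_move)
print("defection dominates either way")
```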

The general raison d’etre of strong reciprocity seems to be that groups of people with lots of individuals playing that strategy managed to out-compete groups without them. Even though strong reciprocity is costly on the individual level, the society at large reaps larger overall benefits, as cooperation has the highest overall payoff relative to any kind of defection. Strong reciprocity, then, helps to force cooperation by altering the costs and benefits of cooperation and defection on the individual level. There is a certain kind of unfairness inherent in this argument, though; a conceptual hypocrisy that can be summed up by the ever-popular phrase, “having one’s cake and eating it too”. To consider why, we need to understand the reason people engage in punishment in the first place. The likely, possibly-obvious candidate explanation is that punishment serves a deterrence function: by inflicting costs on those who engage in the punished behavior, it ensures they fail to benefit from that behavior and thus stop engaging in it. This function, however, rests on a seemingly innocuous assumption: actors estimate the costs and benefits of acting, and only act when the expected benefits are sufficiently large, relative to the costs.

The conceptual hypocrisy is that this kind of cost-benefit estimation is something strong reciprocators are thought not to engage in. Specifically, they are punishing and cooperating regardless of the personal costs involved. We might say that a strong reciprocator’s behavior is inflexible with respect to their own payments. This is a bit like playing the game of “chicken”, where two cars face each other from a distance and start driving at one another in a straight line. The first driver to turn away loses the match. However, if both cars continue on their path, the end result is a much greater cost to both drivers than is suffered if either one turns. If a player in this game were to adopt an inflexible strategy, then, by doing something like disabling their car’s ability to steer, they could force the other player’s hand. Faced with a driver who cannot turn, you really only have one choice to make: continue going straight and suffer a huge cost, or turn and suffer a smaller one. If you’re a “rational” being, then, you can be beaten by an “irrational” strategy.
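
The same kind of sketch works for chicken (again, the payoff values are illustrative assumptions; a crash just has to be the worst outcome):

```python
# Payoffs keyed by (my_move, other_move); a crash is by far the worst outcome.
PAYOFFS = {
    ("straight", "straight"): -10,  # both crash
    ("straight", "swerve"):     1,  # I win the match
    ("swerve",   "straight"):  -1,  # I lose face, but survive
    ("swerve",   "swerve"):     0,
}

def best_response(other_move):
    """What a 'rational' driver does, given what the other driver will do."""
    return max(("straight", "swerve"), key=lambda m: PAYOFFS[(m, other_move)])

# Against a driver who has disabled their steering (committed to "straight"),
# the rational driver's best response is to swerve:
print(best_response("straight"))  # -> swerve
```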

Flawless victory. Fatality.

So what would be the outcome if other individuals started playing the ever-present “always defect” strategy in a similarly inflexible fashion? We’ll call those people “strong defectors” for the sake of contrast. No matter what their partner does in these interactions, the strong defectors will always play defect, regardless of the personal costs and benefits. By doing so, these strong defectors might manage to place themselves beyond the reach of punishment from strong reciprocators. Why? Well, any amount of costly punishment directed towards a strong defector would be a net fitness loss from the group’s perspective, as costly punishment is a fitness-reducing behavior: it reduces the fitness of the person engaging in it (in the form of whatever cost they suffer to deliver the punishment) and it reduces the fitness of the target of the punishment. Further, the resources spent punishing the defectors could have been directed towards benefiting other people instead – which would be a net fitness gain for the group – so there are opportunity costs to engaging in punishment as well. These fitness costs would need to be made up for elsewhere, from the group selection perspective.
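
The group-level bookkeeping here is simple enough to write down. In this toy accounting (the numbers are illustrative assumptions), punishing an inflexible defector is a pure loss for the group:

```python
punisher_cost   = 1.0  # what the punisher pays to deliver the punishment
target_cost     = 3.0  # what the punishment costs its target
deterrence_gain = 0.0  # an inflexible defector never changes its behavior

# Summed across the group, punishment destroys fitness and recoups nothing:
group_fitness_change = deterrence_gain - (punisher_cost + target_cost)
print(group_fitness_change)  # -4.0
```

Unless deterrence_gain can be made positive (which is precisely what an inflexible strategy rules out), the group would have been better off had the punisher spent those resources benefiting someone instead.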

The problem is that, because the strong defectors are playing an inflexible strategy, the costs cannot be made up for elsewhere; no behavioral change can be effected. Extending the game of chicken analogy to the group level, let’s say that turning away is the “cooperative” option, and dilemmas like these were at least fairly regular. They might not have involved cars, but they did involve a similar kind of payoff matrix: there’s only one benefit available, but there are potential costs in attempting to achieve it. Keeping in line with the metaphor, it would be in the interests of the larger population if no one crashed. It follows that between-group selective pressures favor turning every time, since the costs are guaranteed to be smaller for the wider population, but the sum of the benefits doesn’t change; only who achieves them does. In order to force the cooperative option, a strong reciprocator might disable their ability to turn, so as to alter the costs and benefits facing others.

The strong reciprocators shouldn’t be expected to be unaffected by costs and benefits, however; they ought to be affected by such considerations, just on the group level rather than the individual one. Their strategy should be just as “rational” as any other, just with regard to a different variable. Accordingly, it can be beaten by other seemingly irrational strategies – like strong defection – that can’t be affected by the threat of costs. Strong defectors who refuse to turn will either force a behavioral change in the strong reciprocators or cause many serious crashes. In either case, the strong reciprocator strategy doesn’t seem to lead to benefits in that regard.

Now perhaps this example sounds a bit flawed. Specifically, one might wonder how appreciable portions of the population might come to develop an inflexible “always defect” strategy in the first place, as the strategy appears costly to maintain at times: there are benefits to cooperation and to being able to alter one’s behavior in response to costs imposed through punishment, and people would be expected to be selected to achieve the former and avoid the latter. On top of that, there is also the distinct concern that repeated attempts at defection or exploitation can result in punishment severe enough to kill the defector. In other words, it seems that there are certain contexts in which strong defectors would be at a selective disadvantage, becoming less prevalent in the population over time. Indeed, such a criticism would be very reasonable, and that’s precisely because the always-defect population behaves without regard to its personal payoff. Of course, such a criticism applies with just as much force to the strong reciprocators, and that’s the entire point: using a limited budget to affect the lives of others regardless of its effects on you isn’t the best way to make the most money.

The interest on “making it rain” doesn’t compete with an IRA.

The idea of strong defectors seems perverse precisely because they act without regard to what we might consider their own rational interests. Were we to replace “rational” with “fitness”, the evolutionary disadvantage of a strategy that functions in such a manner seems remarkably clear. The point is that the idea of a strong reciprocator type of strategy should be just as perverse. Those who put forth a strong reciprocator type of strategy as a plausible account for cooperation and punishment attempt to create a context that allows them to have their irrational-agent cake and eat it as well: strong reciprocators need not behave within their fitness interests, but all the other agents are expected to. This assumption needs to be at least implicit within the models, or else they make no sense. They don’t seem to make very much sense in general, though, so perhaps that assumption is the least of their problems.

References: Fehr, E., Fischbacher, U., & Gachter, S. (2002). Strong reciprocity, human cooperation, and the enforcement of social norms. Human Nature, 13, 1-25 DOI: 10.1007/s12110-002-1012-7

The “Side-Effect Effect” And Curious Language

You keep using that word. I do not think it means what you think it means

That now-famous quote was uttered by the character Inigo Montoya in the movie The Princess Bride. In recent years, the phrase has been co-opted for its apparent usefulness in mocking people during online debates. While I enjoy a good internet argument as much as the next person, I do try to stay out of them these days due to time constraints, though I used to be something of a chronic debater. (As an aside, I started this blog, at least in part, to balance my enjoyment of debates against those time constraints. It’s worked pretty well so far.) As any seasoned internet (or non-internet) debater can tell you, one of the underlying reasons debates tend to go on so long is that people often argue past one another. While there are many factors that explain why people do so, the one I would like to highlight today is semantic in nature: definitional obscurity. There are instances where people will use different words to allude to the same concept, or use the same word to allude to different concepts. Needless to say, this makes agreement hard to reach.

But what’s the point of arguing if it means we’d eventually agree on something?

This brings us to the question of intentions. Defined by various dictionaries, intentions are aims, plans, or goals. By contrast, the definition of a side effect is just the opposite: an unintended outcome. Were these terms used consistently, then, one could never say a side effect was intended; foreseen, maybe, but not intended. Consistency, however, is rarely humanity’s strongest suit – as we ought to expect it not to be – since consistency does not necessarily translate into usefulness: there are many cases in which I would be better off if I could both do X and stop other people from doing X (fill in ‘X’ however you see fit: stealing, having affairs, murder, etc.). So what about intentions? There are two facts about intentions which make them prime candidates for expected inconsistency: (1) intentionally-committed acts tend to receive a greater degree of moral condemnation than unintentional ones, and (2) intentions are not readily observable, but rather need to be inferred.

This means that if you want to stop someone else from doing X, it is in your best interests to convince others that, when someone did X, X was intended, so as to make punishment less costly and more effective (as more people might be interested in punishing, sharing the costs). Conversely, if you committed X, it is in your best interests to convince others that you did not intend X. It is on the former aspect – condemnation of others – that we’ll focus here. In the now-classic study by Knobe (2003), 39 people were given the following story:

The vice-president of a company went to the chairman of the board and said, “We are thinking of starting a new program. It will help us increase profits, but it will also harm the environment.” The chairman of the board answered, “I don’t care at all about harming the environment. I just want to make as much profit as I can. Let’s start the new program.” They started the new program. Sure enough, the environment was harmed.

When asked whether the chairman intentionally harmed the environment, 82% of the participants agreed that he had. However, when the word “harm” was replaced with “help”, now 77% of the subjects said that the benefits to the environment were unintentional (this effect was also replicated using a military context instead). Now, strictly speaking, the only stated intention the chairman had was to make money; whether that harmed or helped the environment should be irrelevant, as both effects would be side effects of that primary intention. Yet that’s not how people rated them.

Related to the point about moral condemnation, it was also found that participants said the chairman who brought about the negative side effect deserved substantially more punishment (4.8 on a 0-to-6 scale) than the chairman who brought about the positive impact deserved praise (1.4), and those ratings correlated pretty well with the extent to which the participants thought the chairman had brought about the effect intentionally. This tendency to asymmetrically see intentions behind negative, but not positive, side effects was dubbed “the side-effect effect”. There exists the possibility, however, that this label is not entirely accurate. Specifically, the effect might not be exclusive to side effects of actions; it might also hold for the means by which an effect is achieved. You know: the things that were actually intended.

Just like how this was probably planned by some evil corporation.

The paper that raised this possibility (Cova & Naar, 2012) began by replicating Knobe’s basic effect with different contexts (unintended targets being killed by a terrorist bombing as the negative side effect, and an orphanage expanding due to the bombing as the positive one). Again, negative side effects were seen as more intentional and blameworthy than positive side effects were seen as intentional and praiseworthy. The interesting twist came when participants were asked about the following scenario:

A man named André tells his wife: “My father decided to leave his immense fortune to only one of his children. To be his heir, I must find a way to become his favorite child. But I can’t figure how.” His wife answers: “Your father always hated his neighbors and has declared war to them. You could do something that would really annoy them, even if you don’t care.” André decides to set fire to the neighbors’ car.

Unsurprisingly, many people here (about 80% of them) said that Andre had intentionally harmed his neighbors. He planned to harm them, because doing so would further another one of his goals (getting money). A similar situation was also presented, however, in which, instead of burning down the neighbors’ car, Andre donates to a humanitarian-aid society because his father would have liked that. In that case, only 20% of subjects reported that Andre had intended to give money to the charity.

Now that answer is a bit peculiar. Surely Andre intended to donate the money, even if his reason for doing so involved getting money from his father. While that might not be the most high-minded reason to donate, it ought not make the donating itself any less intentional (though perhaps it seems a bit grudging). Cova & Naar (2012) raise the following alternative explanation: the way philosophers tend to use the word “intention” is not the only game in town. There are other possible conceptions of the word that people might hold, depending on the context in which it’s found, such as “something done knowingly for which an agent deserves praise or blame”. Indeed, taking these results at face value, we would need something else beyond the dictionary definitions of intention and side effect, since they don’t seem to be applying here.

This returns us to my initial point about intentions themselves. While this is an empirical matter (albeit a potentially difficult one), there are at least two distinct possibilities: (a) people mean something different by “intention” in moral and nonmoral contexts (we’ll call this the semantic account), or (b) people mean the same thing in both cases, but they do actually perceive it differently (the perceptual account). As I mentioned before, intentions are not the kinds of things which are readily observable, but rather need to be inferred, or perceived. What was not previously mentioned, however, is that it is not as if people only have a single intention at any given time; given the modularity of the mind, and the various goals one might be attempting to achieve, it is perfectly possible, at least conceptually, for people to have a variety of different intentions at once – even ones that pull in opposite directions. We’re all intimately familiar with the sensation of having conflicting intentions when we find ourselves stuck between two appealing, but mutually-exclusive options: a doctor may intend to do no harm, intend to save people’s lives, and find himself in a position where he can’t do both.

Simple solution: do neither.

For whatever it’s worth, of the two options, I favor the perceptual account over the semantic account for the following reason: there doesn’t seem to be a readily-apparent reason for definitions to change strategically, though there are reasons for perceptions to change. Let’s return to the Andre case to see why. One could say that Andre had at least two intentions: to get the inheritance, and to complete the act X required to achieve it. Depending on whether one wants to praise or condemn Andre for doing X, one might choose to highlight different intentions, while in both cases keeping the definition of intention the same. In the event you want to condemn Andre for setting the car on fire, you can highlight the fact that he intended to do so; if you don’t feel like praising him for his ostensibly charitable donation, you can choose instead to highlight the fact that (you perceive) his primary intention was to get money – not give it. However, the point of that perceptual change would be to convince others that Andre ought to be punished; simply changing the definition of “intention” when talking with others about the matter wouldn’t accomplish that goal quite as well, as it would require the other speaker to share your definition.

References: Cova, F., & Naar, H. (2012). Side-Effect Effect Without Side Effect: Revisiting Knobe’s Asymmetry. Philosophical Psychology, 25, 837-854

Knobe, J. (2003). Intentional Action and Side Effects in Ordinary Language. Analysis, 63, 190-193 DOI: 10.1093/analys/63.3.190

Can Rube Goldberg Help Us Understand Moral Judgments?

Though many people might be unfamiliar with Rube Goldberg, they are often not unfamiliar with Rube Goldberg machines: anyone who has ever seen the commercial for the game “Mouse Trap” is at least passingly familiar with them. Admittedly, that commercial is about two decades old at this point, so maybe a more timely reference is in order: OK Go’s music video for “This Too Shall Pass” is a fine demonstration (or Mythbusters, if that’s more your cup of tea). The general principle behind a Rube Goldberg machine is that it completes an incredibly simple task in an overly-complicated manner. For instance, one might design one of these machines to turn on a light switch, but that end state will only be achieved after 200 intervening steps and hours of tedious setup. While these machines provide a great deal of novelty when they work (and that is a rather large “when”, since there is the possibility of error in each step), there might be a non-obvious lesson they can also teach us concerning our cognitive systems designed for moral condemnation.

Or maybe they can’t; either way, it’ll be fun to watch and should kill some time.

In the literature on morality, there is this concept known as the doctrine of double effect. The principle states that actions with harmful consequences can be morally acceptable provided a number of conditions are met: (1) the act itself needs to be morally neutral or better, (2) the actor intends to achieve some positive end through acting; not the harmful consequence, (3) the bad effect is not a means to the good effect, and (4) the positive effects outweigh the negative ones sufficiently. While that might all seem rather abstract, two concrete and popular examples can demonstrate the principle easily: the trolley dilemma and the footbridge dilemma. Taking these in order, the trolley problem involves the following scenario:

There is a runaway trolley barreling down the railway tracks. Ahead, on the tracks, there are five people tied up and unable to move. The trolley is headed straight for them. You are standing some distance off in the train yard, next to a lever. If you pull this lever, the trolley will switch to a different set of tracks. Unfortunately, you notice that there is one person on the side track. You have two options: (1) Do nothing, and the trolley kills the five people on the main track. (2) Pull the lever, diverting the trolley onto the side track where it will kill one person.

In this dilemma, most people who have been surveyed (about 90% of them) suggest that it is morally acceptable to pull the lever, diverting the train onto the side track. It also fits the principle of double effect nicely: (1) the act (redirecting the train) is not itself immoral, (2) the actor intends a positive consequence (saving the 5) and not the negative one (1 dies), (3) the bad consequence (the death) is not a means of achieving the outcome, but rather a byproduct of the action (redirecting the train), and (4) the lives saved substantially outweigh the lives lost.

The footbridge dilemma is very similar in setup, but different in a key detail: rather than redirecting the train to a side track, a person is pushed in front of it. While that person dies, his body causes the train to stop before hitting the 5 hikers, saving their lives. In this case, only about 10% of people say it’s morally acceptable to push the man. We can see how double effect fails here: (1) the act (pushing the man) is relatively on the immoral side of things, (2) the death of the person being pushed is intended, and (3) the bad consequence (the man dying) is the means by which the good consequence is achieved; the fact that the positive consequences outweigh the negative ones in terms of lives saved is not enough. But why should this be the case? Why do consequences alone not dictate our actions, and why can factors as simple as redirecting a train versus pushing a person make such tremendous differences in our moral judgments?
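
Since the doctrine amounts to a four-part checklist, it can be written down almost verbatim. Here is a minimal sketch; the boolean encodings of each dilemma are my own, and reasonable people could code them differently:

```python
def double_effect_permissible(act_is_neutral, harm_intended,
                              harm_is_means, good_outweighs_harm):
    """A harmful act passes the doctrine only if all four conditions hold."""
    return (act_is_neutral and not harm_intended
            and not harm_is_means and good_outweighs_harm)

# Trolley: redirecting a train is neutral, the death is a byproduct, and 5 > 1.
print(double_effect_permissible(True, False, False, True))   # True

# Footbridge: pushing a man is itself bad, and his death is the means
# by which the train is stopped.
print(double_effect_permissible(False, True, True, True))    # False
```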

As I suggested recently, the answer to both of those questions can be understood by beginning our analysis of morality with an analysis of condemnation. These questions can be rephrased in that light as follows: “Why might people wish to morally condemn someone for achieving an outcome that is, on the whole, good?” and, “Why might people be less inclined to condemn certain outcomes, contingent on how they’re brought about?” The answer to the first question is fairly straightforward: I might wish to morally condemn someone because their actions (or my failing to morally condemn them) might carry some direct costs for me, even if they benefit others. For instance, I might wish to condemn someone for their behavior in the trolley or footbridge problem if it’s my friend dying, rather than a stranger. That some generally morally positive outcome was achieved is irrelevant to me if it was costly from my perspective. Natural selection doesn’t design adaptations for the good of the group, so the fact that the group’s welfare was increased seems beside the point. Of course, a cost is a cost is a cost, so why should it matter to me at all whether my friend was killed by being pushed or by having the train sent towards him?

“DR. TEDDY! NOOOO!”

Part of that answer depends on what other people are willing to condemn. Trying to punish someone for their actions is not always cheap or easy: there’s always a chance of retaliation by the punished party or their allies. After all, a cost is a cost is a cost, to both me and them. This social variable means that attempting to punish others without additional support might be completely ineffective (or at least substantially less effective) at times. Provided that other parties are less likely to punish negative byproducts, relative to negative intended outcomes, this puts pressure on you to persuade others that the person you want to punish acted with intent, whereas it puts the reverse pressure on the actor: to convince others they did not intend the bad outcome. This brings us back to Rube Goldberg, the footbridge dilemma, and a slight addition to the doctrine of double effect.

There are some who argue that the doctrine of double effect isn’t quite complete. Specifically, there is an unappreciated third type of action: one in which a person acts because a negative outcome will obtain, but they do not intend that outcome (what is known as “triple effect”). This distinction is a bit trickier to grasp, so another example will help. Say that we’re again talking about the footbridge dilemma: there is a man standing on the bridge over the tracks with the oncoming train scheduled to hit the 5 hikers. However, we can pull a lever which will drop the man onto the track where he will be hit, thus stopping the train and saving the five. This is basically identical to the standard footbridge problem, and most people would deem it unacceptable to pull the lever. But now let’s consider another case: again, the man is standing on the bridge, but the mechanism that will drop him off the bridge is a light sensor. If light reflects off the train onto the sensor, the bridge will drop the man, he will die, and the 5 will be saved. Seeing the oncoming train, someone, Rube-Goldberg style, shines a spotlight on the train, illuminating it; the illumination hits the sensor, dropping the man onto the track, killing him and saving the five hikers.

There are some (Otsuka, 2008) who argue there is no meaningful difference between these two cases, but in order to make that claim, they need to infer something about the actor’s intentions in both cases, and precisely what one infers affects the subsequent shape of the analysis. Were one to infer that there is really only one problem to be solved – the train that is going to kill 5 people – then the intentions of the person pulling the lever to illuminate the train and the person pulling the lever to drop the man are equivalent and equally condemnable. However, there is another inference one could make in the light case, as there are multiple facets to the problem: the train will kill 5 people, and the train isn’t illuminated. If one intends to solve the latter problem (so now there will be an illuminated train about to kill 5 people), one also, as a byproduct of solving that problem, causes both the problem of 5 people getting killed to be solved and the death of the man who got dropped onto the track. Now one could argue, as Otsuka (2008) does, that such an example fails because people could not plausibly be motivated to solve the non-illuminated part of the problem, but that seems largely a matter of perspective. The addition of the light variable introduces, even if only to some small degree, plausible deniability capable of shifting the perception of an outcome from intended to byproduct. Someone pulling the lever could have been doing so in order to illuminate the train or to drop the man onto the track, but it’s not entirely clear which is the case.

“Well how was I supposed to know I was doing something dangerous?”

The light case is also a relatively simple one: there are only 3 steps (shine light on train, light opens door, door opening causes man to fall and stop train), and perfect knowledge is assumed (the person shining the light knew this would happen). Changing either of these variables would likely alter the blame assigned to the actor: if the actor didn’t know about the light sensor or the man on the footbridge, condemnation would likely decrease; if the action involved 10 steps, rather than 3, this could introduce further plausible deniability, especially if any of those steps involved the actions of other people. It would thus be in the actor’s best interests to deny their knowledge of the outcome, or to separate the outcome from their initial action as broadly as possible. Conversely, someone looking to condemn the actor would need to do the reverse.

Now maybe this all sounds terribly abstract, but there are real-life cases to which similar kinds of analysis can apply. Consider cases where a child is bullied at school and later commits suicide. Depending on one’s perspective in these kinds of cases, one might condemn or fail to condemn the bullies for the suicide (though one might still blame them for the bullying); one might also condemn the parents for not being there for the child as they should have been, or one might blame no one but the suicide victim themselves. As one thinks about the ways in which the suicide could have been prevented, there are countless potential Rube-Goldberg kinds of variables in the causal chain to point to (violent media, the parents, the bullies, the friends, their diet, the suicide victim, the school, etc.), the modification of any of which might have prevented the negative outcome. This gives condemners (who may wish to condemn people for initially-unrelated reasons) a wide array of plausible potential targets. However, each of these potential sources also gives the other sources some way of mitigating and avoiding blame. While such strategic considerations tend to make a mess of normative moral theories, they do provide us the tools required to actually begin to understand morality itself.

References: Otsuka, M. (2008). Double Effect, Triple Effect and the Trolley Problem: Squaring the Circle in Looping Cases. Utilitas, 20, 92-110 DOI: 10.1017/S0953820807002932

Better Fathers Have Smaller Testicles, But…

There is currently an article making the rounds in the popular media (or at least the range of media that I’m exposed to) suggesting that testicular volume is a predictor of paternal investment in children: the larger the testicles, the less nurturing, fatherly behavior we see. I get the nagging sense that stories about genitals tend to get a larger-than-average share of attention (I did end up tracking the article down, after all), and that might have motivated both the crafting and sharing of this study, at least in the media; I can’t speak directly to the authors’ intentions, though I can note the two domains often fail to overlap. In any case, more attention does not necessarily mean that people end up with an accurate picture of the research. Indeed, the number of people who will – or even can – read the source paper itself is vastly exceeded by the number who will not. So, for whatever it’s worth, here’s a more in-depth look at the flavor-of-the-week research finding.

Our next new flavor will come out at the end of the month…

The paper (Mascaro, Hackett, & Rilling, 2013) begins with a discussion of life history theory. With respect to sexual behavior, life history theory posits a tradeoff between mating effort and parental effort: the energy an organism spends investing in any single offspring is energy not spent making new ones. Since the name of the game in evolution is maximizing fitness, this tradeoff needs to be resolved, and it can be in various ways. Humans, compared to many other species, tend to fall rather heavily on the “investing” side of the scale, pouring immense amounts of time and energy into each highly-dependent offspring. Other species, like salmon, invest all their energy into a single bout of mating, producing many offspring but investing relatively little in each (as dead parents often make poor candidates for sources of potential investment). Life history theory is not just useful for understanding between-species differences, though; it is also useful for understanding individual differences within species (as it must be, since the variation in the respective traits between species needed to have come from some initial population without said variance).

Perhaps the most well-known examples are the between-sex differences in life history tradeoffs among mammals, but let’s just stick to humans to make it relatable. When a woman gets pregnant, provided the baby will be carried to term, her minimum required investment is approximately 9 months of pregnancy and often several years of breastfeeding, much of which precludes additional reproduction. The metabolic and temporal costs of this endeavor are hard to overstate. By contrast, a male’s minimum obligate investment in the process is a single ejaculate and however long intercourse took. One can immediately see that men tend to have more to gain from investing in mating effort, relative to women, at least from the minimum-investment standpoint. However, not all men have the same potential to achieve those mating-effort gains; some men are more attractive sexual partners, and others will be relatively shut out of the mating market. If a man cannot compete in the mating domain, it might pay for him to make himself more appealing in the investment domain, where he can compete more effectively. Accordingly, if a man tends to pursue the investment strategy (though this need not mean a consciously-chosen plan), it’s plausible his body might follow a similar investment strategy, placing fewer resources into the more mating-oriented aspects of our physiology: specifically, the testicles.

Unsurprisingly, testicular volume appears to be correlated with a number of factors, most notably sperm production (this is especially the case between species, as I’ve written about before). Those men who tend to preferentially pursue a mating strategy (relative to an investment one) have slightly different adaptive hurdles to overcome, most notably in the insemination and sperm-competition arenas. Accordingly, Mascaro, Hackett, & Rilling (2013) predicted that we ought to see a relationship between testes size (representing a form of mating effort) and nurturing offspring (representing a form of parental effort). Enter the current study, in which 70 biological fathers who were living with the mother of their children had their testicular volume (n = 55) and testosterone levels (n = 66) assessed. Additionally, reports of their parental behavior were collected, along with a few other measures. As the title of the paper suggests, there was indeed a negative correlation (-0.29) between reported care-giving and testicular volume. This is the point where the highlighted finding begins to need qualifications, however, due to another pesky little factor: testosterone. Testosterone levels were also found to negatively correlate with reports of care-giving (-0.27), as well as with the fathers’ reported desire to provide care (-0.26). Given that these are correlations, it’s not readily apparent that testicular volume per se would be the metaphorical horse pulling the cart.

Pulling the cart, metaphorically, “all the way“, that is.

Perhaps also unsurprisingly, testicular volume showed what the authors called a “moderate positive correlation” with testosterone levels (0.26, p = 0.06). As an aside, I find it interesting that the authors had, only a few sentences prior, reported an almost identically-sized correlation (r = -0.25, p = 0.06) between testicular volume and desire to invest in children, but there they labeled the correlation as a “strong trend”, rather than a “moderate correlation”. The choice of wording seems peculiar.

In any case, if bigger balls tend to go together with more testosterone, it becomes more difficult to make the case that testicular volume itself is driving the relationship with parenting behaviors. In order to attempt to solve this problem, Mascaro, Hackett, & Rilling (2013) created a regression model, using testicular volume, testosterone levels, father’s earning, and hours worked as predictors of childcare. In that model, the only significant predictor of childcare was testosterone level.

Removing the “father’s earning” and “number of hours worked” variables from the regression model resulted in a gain in predictive value for testicular volume (though it was still not significant) but, again, it was testosterone that appeared to be having the greater effect. Whether or not it would be defensible to modify the regression model in that particular way in the first place is debatable, as the modification seems to be done in the interest of making testicular volume appear relatively more predictive than it was previously (also, removing those two previous factors resulted in the model accounting for quite a bit less of the variance in fathers’ overall childcare behaviors). Just because the authors had some a priori prediction about testicular volume and not about hours worked or money earned seems like only a mediocre reason for justifying the exclusion of the latter two variables while retaining the former.
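
That collinearity worry is easy to demonstrate in simulation. In the sketch below (all numbers are illustrative assumptions, not the paper’s data), testosterone drives caregiving and testicular volume merely correlates with testosterone; regressing caregiving on volume alone nonetheless makes volume look predictive:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 70
testosterone = rng.normal(size=n)
volume = 0.3 * testosterone + rng.normal(size=n)  # correlated with testosterone
hours = rng.normal(size=n)
# Caregiving depends on testosterone and hours worked, but NOT on volume:
care = -0.5 * testosterone - 0.2 * hours + rng.normal(size=n)

full = sm.OLS(care, sm.add_constant(
    np.column_stack([volume, testosterone, hours]))).fit()
alone = sm.OLS(care, sm.add_constant(volume)).fit()

# In expectation, volume's coefficient is near zero in the full model, but
# negative in the volume-only model, where it stands in for testosterone.
print("full model coefficients:", full.params)    # const, volume, T, hours
print("volume-only coefficients:", alone.params)  # const, volume
```

Which is just to say that “significant when considered alone” and “doing the causal work” are different claims.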

There was also some neuroscience included in the study, concerning the men’s neural responses while looking at pictures of children’s faces, with those responses correlated against childcare, testicular volume, and testosterone. I’ll preface what I’m about to say with the standard warning: I’m not the world’s foremost expert on neuroscience, so there is a distinct possibility I’m misunderstanding something here. That said, the authors did find a relationship between testicular volume and neural response to children – a relationship that was apparently not diminished when controlling for testosterone. It should be noted that, again, unless I’m misunderstanding something, this connection didn’t appear to translate into significant differences in the childcare actually displayed by the males in the study once the effects of testosterone were considered (if it did, it should have shown up in the initial regression models). Then again, I have historically been overly-cautious about inferring much from brain scans, so take from that what you will.

I’ve got my eye on you, imaging technology…

To return to the title of this post, yes, testicular volume appears to have some predictive value in determining parental care, but this value tends to be reduced, often substantially so, once a few other variables are considered. Now I happen to think that the hypotheses derived from life history theory are well thought out in this paper. I imagine I might be inclined to have made such predictions myself. Testicular measures have already given us plenty of useful information about the mating habits of various species, and I would expect there is still value to be gained from considering them. That said, I would also advise some degree of caution in attempting to fit the data to these interesting hypotheses. Using selective phrasing to highlight some trends (the connection between testicular volume and desire to provide childcare) relative to others (the connection between testicular volume and testosterone) because they fit the hypothesis better makes me uneasy. Similarly, dropping variables from a regression model to improve the predictive power of the variable of interest is also troublesome. Perhaps the basic idea might prove more fruitful were it to be expanded to other kinds of men (single men, non-fathers, divorced, etc) but, in any case, I find the research idea quite an interesting step, and I look forward to hearing a lot more about our balls in the future.

References: Mascaro, J., Hackett, P., & Rilling, J. (2013). Testicular volume is inversely correlated with nurturing-related brain activity in human fathers. Proceedings of the National Academy of Sciences of the United States of America.