More About Dunning-Kruger

Several years back I wrote a post about the Dunning-Kruger effect. At the time I was still getting my metaphorical sea legs for writing and, as a result, I don’t think the post turned out as well as it could have. In the interests of holding myself to a higher standard, today I decided to revisit the topic, both to improve upon the original post and to generate a future reference for me (and hopefully you) when discussing it with others. This is something of a time-saver for me because people talk about the effect frequently despite, ironically, not really understanding it too deeply.

First things first, what is the Dunning-Kruger effect? As you’ll find summarized just about everywhere, it refers to the idea that people who are below-average performers in some domains – like logical reasoning or humor – will tend to judge their performance as being above average. In other words, people are inaccurate at judging how well their skills stack up to their peers or, in some cases, to some objective standard. Moreover, this effect gets larger the more unskilled one happens to be. Not only are the worst performers worse at the task than others, but they’re also worse at understanding that they’re bad at the task. This effect was said to obtain because people need to know what good performance is before they can accurately assess their own. So, because below-average performers don’t understand how to perform a task correctly, they also lack the skills to judge their performance accurately, relative to others.

Now available at Ben & Jerry’s: Two Scoops of Failure

As mentioned in my initial post (and by Kruger & Dunning themselves), this type of effect shouldn’t extend to domains where production and judging skills can be uncoupled. Just because you can’t hit a note to save your life on karaoke night, that doesn’t mean you will be unable to figure out which other singers are bad. This effect should also be primarily limited to domains in which the feedback you receive isn’t objective or standards for performance aren’t clear. If you’re asked to re-assemble a car engine, for instance, unskilled people will quickly realize they cannot do this unassisted. That said, to highlight the reason why the original explanation for this finding doesn’t quite work – not even for the domains that were studied in the original paper – I wanted to examine a rather important graph of the effect from Kruger & Dunning (1999) with respect to their humor study:

My crudely-added red arrows demonstrate the issue. On the left-hand side, we see what people refer to as the Dunning-Kruger effect: those who were the worst performers in the humor realm were also the most inaccurate in judging their own performance, compared to others. They were unskilled and unaware of it. However, the right-hand side betrays the real issue that caught my eye: the best performers were also inaccurate. The pattern you should expect, according to the original explanation, is that the higher one’s performance, the more accurately they estimate their relative standing, but what we see is that the best performers aren’t quite as accurate as those who are only modestly above average. At this point, some of you might be thinking that this is basically a non-issue because the best performers were still more accurate than the worst performers, and the right-hand inaccuracy I’m highlighting isn’t appreciable. Let me try to persuade you otherwise.

Assume for a moment that people were just guessing as to how they performed, relative to others. Because having a good sense of humor is a socially-desirable skill, people all tend to rate themselves “modestly above-average” in the domain to try and persuade others they actually are funny (and because, in that moment, there are no consequences to being wrong). Despite these just being guesses, those who actually are modestly above-average will appear to be more accurate in their self-assessment than those who are in the bottom half of the population; that accuracy just doesn’t have anything to do with their true level of insight into their abilities (referred to as their meta-cognitive skills). Likewise, those who are more than modestly above average (i.e. are underestimating their skills) will be less accurate as well; there will just be fewer of them than those who overestimated their abilities.
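To make that logic concrete, here is a minimal simulation of the scenario just described (the numbers, and the “65th percentile” guess, are made up purely for illustration; this is not a reanalysis of anyone’s data). Every simulated person guesses they are modestly above average regardless of where they actually stand, and the familiar left-hand-side pattern falls out of the grouping alone:

```python
import random

random.seed(1)

N = 10_000
# Actual percentile ranks, spread uniformly across the population.
actual = [random.uniform(0, 100) for _ in range(N)]

# Everyone guesses "modestly above average" (here, around the 65th percentile)
# plus a little noise, with no reference to their actual performance at all.
guessed = [min(100, max(0, random.gauss(65, 10))) for _ in range(N)]

# Group people by actual-performance quartile and compute mean miscalibration.
errors_by_quartile = [[] for _ in range(4)]
for truth, guess in zip(actual, guessed):
    quartile = min(3, int(truth // 25))
    errors_by_quartile[quartile].append(guess - truth)

for q, errs in enumerate(errors_by_quartile, start=1):
    print(f"Quartile {q}: mean (guessed - actual percentile) = {sum(errs) / len(errs):+.1f}")
```

Despite nobody in this toy population having any insight whatsoever, the bottom quartile looks wildly overconfident, the third quartile looks well calibrated, and the top quartile underestimates itself a bit – roughly the pattern described above. Lower the constant guess (as with a task that “feels hard”) and the pattern flips, which becomes relevant shortly.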

Considering the findings of Kruger & Dunning (1999) on the whole, the scenario I just outlined doesn’t reflect reality perfectly. There was a positive correlation between people’s performance and their rating of their relative standing (r = .39), but, for the most part, people’s judgments of their own ability (the black line) appear relatively uniform. Then again, if you consider their results in studies two and three of that same paper (logical reasoning and grammar), the correlations between performance and judgments of performance relative to others drop, ranging from a low of r = .05 to a peak of r = .19, which was statistically significant. People’s judgments of their relative performance were almost flat across several such tasks. To the extent these meta-cognitive judgments of performance use actual performance as an input for determining relative standings, it’s clearly not the major factor for either low or high performers.

They all shop at the same cognitive store

Indeed, actual performance shouldn’t be expected to be the primary input for these meta-cognitive systems (the ones that generate relative judgments of performance) for two reasons. The first of these is the original performance explanation posited by Kruger & Dunning (1999): if the system generating the performance doesn’t have access to the “correct” answer, then it would seem particularly strange that another system – the meta-cognitive one – would have access to the correct answer, but only use it to judge performance, rather than to help generate it.

To put that in a quick memory example, say you were experiencing a tip-of-the-tongue state, where you are sure you know the right answer to a question, but you can’t quite recall it. In this instance, we have a long-term memory system generating performance (trying to recall an answer) and a meta-cognitive system generating confidence judgments (the tip-of-the-tongue state). If the meta-cognitive system had access to the correct answer, it should just share that answer with the long-term memory system, rather than using it to tell the other system to keep looking. The latter path is clearly inefficient and redundant. Instead, the meta-cognitive system should use some cues other than direct access to information in generating its judgments.

The second reason actual performance (relative to others) wouldn’t be an input for these meta-cognitive systems is that people don’t have reliable and accurate access to population-level data. If you’re asking people how funny they are relative to everyone else, they might have some sense for it (how funny are you, relative to some particular people you know), but they certainly don’t have access to how funny everyone is because they don’t know everyone; they don’t even know most people. If you don’t have the relevant information, then it should go without saying that you cannot use it to help inform your responses.

Better start meeting more people to do better in the next experiment

So if these meta-cognitive systems are using inputs other than accurate information in generating their judgments about how we stack up to others, what would those inputs be? One possible input would be task difficulty, not in the sense of how hard the task objectively is for a person to complete, but rather in terms of how difficult a task feels. This means that factors like how quickly an answer can be called to mind likely play a role in these judgments, even if the answer itself is wrong. If judging the humor value of a joke feels easy, people might be inclined to say they are above average in that domain, even if they aren’t.

This yields an important prediction: if you provide people with tasks that feel difficult, you should see them largely begin to guess they are below-average in that domain. If everyone is effectively guessing that they are below average (regardless of their actual performance), this means that those who perform the best will be the most inaccurate in judging their relative ability. In tasks that feel easy, people might be unskilled and unaware; for those that feel hard, people might be skilled but still unaware.

This is precisely what Burson, Larrick, & Klayman (2006) tested, across three studies. While I won’t go into details about the specifics of all their studies (this is already getting long), I will recreate a graph from one of their three studies that captures their overall pattern of results pretty well:

As we can see, when the domains being tested became harder, it was now the case that the worst performers were more accurate in estimating their percentile rank than the best ones. On tasks of moderate difficulty, the best and worst performers were equally calibrated. However, it doesn’t seem that this accuracy is primarily due to any real insight into their performance; it just so happened that their guesses landed closer to the truth. When people think, “this task is hard,” they all seem to estimate their performance as being modestly below average; when the task feels easy instead, they all seem to estimate their performance as being modestly above average. The extent to which that matches reality is largely due to chance, rather than true insight.

Worth noting is that when you ask people to make different kinds of judgments, there is (or at least can be) a modest average advantage for top performers, relative to bottom ones. Specifically, when you ask people to judge their absolute performance (i.e., how many of these questions did you get right?) and compare that to their actual performance, the best performers sometimes had a better grasp on that estimate than the worst ones, but the size of that advantage varied depending on the nature of the task and wasn’t entirely consistent. Averaged across the studies reported by Burson et al. (2006), top-half performers displayed a better correlation between their perceived and actual absolute performance (r = .45), relative to bottom performers (r = .05). The corresponding correlations for actual and relative percentiles were in the same direction, but lower (rs = .23 and .03, respectively). While there might be some truth to the idea that the best performers are more sensitive to their relative rank, the bulk of the miscalibration seems to be driven by other factors.

Driving still feels easy, so I’m still above-average at it

These judgments of one’s relative standing compared to others appear rather difficult for people to get right. As they should, really; for the most part we lack access to the relevant information and feedback, there are possible social-desirability issues to contend with, and there is a lack of consequences for being wrong. This is basically a perfect storm for inaccuracy. Perhaps worth noting is that people’s estimates of their relative performance tracked their actual performance pretty closely for one domain in particular in Burson et al. (2006): knowledge of pop music trivia (the graph of which can be seen here). As pop music is the kind of thing people have more experience learning and talking about with others, it is a good candidate for a case where these judgments might be more accurate because people do have more access to the relevant information.

The important point to take away from this research is that people don’t appear to be particularly good at judging their abilities relative to others, and this obtains regardless of whether the judges are themselves skilled or unskilled. At least for most of the contexts studied, anyway; it’s perfectly plausible that people – again, skilled and unskilled – will be better able to judge their relative (and absolute) performance when they have experience with the domain in question and have received meaningful feedback on their performance. This is why people sometimes drop out of a major or job after receiving consistent negative feedback, opting to believe they aren’t cut out for it rather than persisting in the belief that they are actually above average in that context. You will likely see the least miscalibration in domains where people’s judgments of their ability need to make contact with reality and there are consequences for being wrong.

References: Burson, K., Larrick, R., & Klayman, J. (2006). Skilled or unskilled, but still unaware of it: How perceptions of difficulty drive miscalibration in relative comparisons. Journal of Personality & Social Psychology, 90, 60-77.

Kruger, J. & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality & Social Psychology, 77, 1121-1134.

Classic Theory In Evolution: The Big Four Questions

Explanations for things appear to be a truly vexing issue for people in many instances. Admittedly, that might sound a little strange; after all, we seem to explain things all the time without much apparent effort. We could consider a number of examples for explanations of behavior: people punch walls because they’re angry; people have sex because it feels good; people eat certain foods because they prefer those flavors, and so on. Explanations like these seem to come automatically to us; one might even say naturally. The trouble that people appear to have with explanations is with respect to the following issue: there are multiple, distinct, and complementary ways of explaining the same thing. Now by that I don’t mean that, for instance, someone punched a wall because they were angry and drunk, but rather that there are qualitatively different ways to explain the same thing. For instance, if you ask me what an object is, I could tell you it’s a metallic box that appears to run on electricity and contains a heating element that can be adjusted via knobs; I could also tell you it’s a toaster. The former explanation tells you about various features of the object, while the latter tells you (roughly) what it functions to do (or, at least, what it was initially designed to do).

…And might have saved you that trip to the ER.

More precisely, the two issues people seem to run into when it comes to these different kinds of explanations are that they (a) don’t view these explanations as complementary, but rather as mutually exclusive, or (b) don’t realize that there are distinct classes of explanations that require different considerations from one another. It is on the second point that I want to focus today. Let’s start by considering the questions found in the first paragraph in what is perhaps their most basic form: “what causes that behavior?” or, alternatively, “what preceding events contributed to the occurrence of the behavior?” We could use the man punching the wall as our example to guide us through the different classes of explanations, of which there are four generally agreed-upon categories (Tinbergen, 1963).

The first two of these classes of explanations can be considered proximate – or immediate – causes of the behavior. The standard explanation many people might give for why the man punched the wall would be to reference the aforementioned anger. This would correspond to Tinbergen’s (1963) category of causation which, roughly, can be captured by considerations of how the cognitive systems which are responsible for generating the emotional outputs of anger and corresponding wall-punching work on a mechanical level: what inputs do they use, how are these inputs operated upon to generate outputs, what outputs are generated, what structures in the brain become activated, and so on. It is on this proximate level of causation that most psychological research focuses, and with good reason: the hypothesized proximate causes for behaviors are generally the most open to direct observation. Now that’s certainly not to say that they are easy to observe and distinguish in practice (as we need to determine what cognitive or behavioral units we’re talking about, and how they might be distinct from others), but the potential is there.

The second type of explanation one might offer is also a proximate-type of explanation: an ontogenetic explanation. Ontogeny refers to changes to the underlying proximate mechanisms that take place during the course of development, growth, and aging of an organism. Tinbergen (1963) is explicit in what this does not refer to: behavioral changes that correspond to environmental changes. For instance, a predator might evidence feeding behavior in the presence of prey, but not evidence that behavior in the absence of prey. This is not good evidence that anything has changed in the underlying mechanisms that generate the behavior in question; it’s more likely that they exist in the form they did moments prior, but now have been provided with novel inputs. More specifically, then, ontogeny refers, roughly, to considerations of what internal or external inputs are responsible for shaping the underlying mechanisms as they develop (i.e. how is the mechanism shaped as you grow from a single cell into an adult organism). For instance, if you raise certain organisms in total darkness, parts of their eyes may fail to process visual information later in life; light, then, is a necessary developmental input for portions of the visual system. To continue on with the wall-punching example, ontogenetic explanations for why the man punched the wall would reference what inputs are responsible for the development of the underlying mechanisms that would produce the eventual behavior.

Like their father’s fear of commitment…

The next two classes of explanations refer to ultimate – or distal – causal explanations. The first of these is what Tinbergen calls evolution, though it could be more accurately referred to as a phylogenetic explanation. Species tend to resemble each other to varying degrees because of shared ancestry. Accordingly, the presence of certain traits and mechanisms can be explained by homology (common descent). The more recently two species diverged from one another in their evolutionary history, the more traits we might expect the two to share in common. In other words, all the great apes might have eyes because they all share a common ancestor who had eyes, rather than because they all independently evolved the trait. Continuing on with our example, the act of wall-punching might be explained phylogenetically by noting that the cognitive mechanisms we possess related to, say, aggression, are to some degree shared with a variety of species.

Finally, this brings us to my personal favorite: survival value. Survival value explanations for traits involve (necessarily speculative, but perfectly testable) considerations about what evolutionary function a given trait might have (i.e. what reproductively-relevant problem, if any, is “solved” by the mechanism in question). Considerations of function help inform some of the “why” questions of the proximate levels, such as “why are these particular inputs used by the mechanism?”, “why do these mechanisms generate the output they do?”, or “why does this trait develop in the manner that it does?”. To return to the punching example, we might say that the man punched the wall because aggressive responses to particular frustrations might have solved some adaptive problem (like convincing others to give you a needed resource rather than face the costs of your aggression). Considerations of function also manage to inform the evolution, or phylogeny, level, allowing us to answer questions along the lines of, “why was this trait maintained in certain species but not others?”. As another example, even if cave-dwelling and non-cave-dwelling species share a common ancestor that had working eyes, that’s no guarantee that functional eyes will persist in both populations. Homology might explain why the cave-dweller develops eyes at all, but it would not itself explain why those eyes don’t work. Similarly, noting that people punch walls when they are angry does not, by itself, explain why we do so.

All four types of explanations answer the question “what causes this behavior?”, but in distinct ways. This distinction between questions of function and questions of causation, ontogeny, and phylogeny, for instance, can be summed up quite well by a quote from Tinbergen (1963):

No physiologist applying the term “eye” to a vertebrate lens eye as well as a compound Arthropod eye is in danger of assuming that the mechanism of the two is the same; he just knows that the word “eye” characterizes achievement, and nothing more.

Using the word “eye” to refer to a functional outcome of a mechanism (processing particular classes of light-related information) allows us to speak of the “eyes” of different species, despite them making use of different proximate mechanisms and cues, developing in unique fashions over the span of an organism’s life, and having distinct evolutionary histories. If the functional level of analysis was not distinct, in some sense, from analyses concerning development, proximate functioning, and evolutionary history, then we would not be able to even discuss these different types of “eyes” as being types of the same underlying thing; we would fail to recognize a rather useful similarity.

“I’m going to need about 10,000 contact lens”

To get a complete (for lack of a better word) understanding of a trait, all four of these questions need to be considered jointly. Thankfully, each level of analysis can, in some ways, help inform the other levels: understanding the ultimate function of a trait can help inform research into how that trait functions proximately; homologous traits might well serve similar functions in different species; what variables a trait is sensitive towards during development might inform us as to its function, and so on. That said, each of these levels of analysis remains distinct, and one can potentially speculate about the function of a trait without knowing much about how it develops, just as one could research the proximate mechanisms of a trait without knowing much about its evolutionary history.

Unfortunately, there has been, and sadly continues to be, some hostility and misunderstanding with respect to certain levels of analysis. Tinbergen (1963) had this to say:

It was a reaction against the habit of making uncritical guesses about the survival value, the function, of life processes and structures. This reaction, of course healthy in itself, did not (as one might expect) result in an attempt to improve methods of studying survival value; rather it deteriorated into lack of interest in the problem – one of the most deplorable things that can happen in science. Worse, it even developed into an attitude of intolerance: even wondering about survival value was considered unscientific.

That these same kinds of criticisms continue to exist over 50 years later (and they weren’t novel when Tinbergen was writing either) might suggest that some deeper, psychological issue exists surrounding our understanding of explanations. Ironically enough, the proximate functioning of the mechanisms that generate these criticisms might even give us some insight into their ultimate function. Then again, we don’t want to just end up telling stories and making assumptions about why traits work, do we?

References: Tinbergen, N. (1963). On aims and methods of Ethology. Zeitschrift für Tierpsychologie, 20, 410-433.

Why Parents Affect Children Less Than Many People Assume

Despite what a small handful of detractors have had to say, inclusive fitness theory has proved to be one of the most valuable ideas we have for understanding much of the altruism we observe in both human and non-human species. The basic logic of inclusive fitness theory is simple: genes can increase their reproductive fitness by benefiting other bodies that contain copies of them. So, since you happen to share 50% of your genes in common by descent with a full sibling, you can, to some extent, increase your own reproductive fitness by increasing theirs. This logic is captured by the deceptively-tiny formula of rb > c. In English, rather than math, the formula states that altruism will be favored so long as the benefit delivered to the receiver, discounted by the degree of relatedness between the two, is greater than the cost to the giver. To use the sibling example again, altruism would be favored by selection if the benefit you provided to a full sibling increased their reproductive success by twice as much (or more) as it cost you to give, even if there was zero reciprocation.

“You scratch my back, and then you scratch my back again”

While this equation highlights why a lot of “good/nice” behaviors are observed – like childcare – there’s a darker side to this equation as well. By dividing each side of the inclusive fitness equation by r, you get this: b > c/r. What this new equation highlights is the selfish nature of these interactions: relatives can be selected to benefit themselves by inflicting costs on their kin. In the case of full siblings, I should be expected to value benefits to myself twice as much as benefits to them; for half siblings, I should value myself four times as much, and so on. Let’s stick to full siblings for now, just to stay consistent. Each sibling within a family should, all else being equal, be expected to value itself twice as much as it values any other sibling. The parents of these siblings, however, see things very differently: from the perspective of the parent, each of these siblings is equally related to them, so, in theory, they should value each of these offspring equally (again, all else being equal; all else is almost never equal, but let’s assume it is to keep the math easy).
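For those who like to see the discounting spelled out, here is a minimal sketch of the arithmetic (the benefit and cost values are invented purely for illustration and are not from any of the papers discussed here):

```python
def favored_by_selection(benefit, cost, relatedness):
    """Hamilton's rule: an act is favored when r * b > c."""
    return relatedness * benefit > cost

# A hypothetical act: helping a full sibling (r = 0.5) at some cost to yourself.
benefit_to_sibling = 3.0  # fitness units gained by the sibling (made-up number)
cost_to_actor = 2.0       # fitness units lost by the actor (made-up number)

# From the child's perspective, the sibling's benefit is discounted by r = 0.5.
child_favors_helping = favored_by_selection(benefit_to_sibling, cost_to_actor, 0.5)

# From the parent's perspective, both offspring are equally related to it,
# so the discounting cancels out and helping is worthwhile whenever b > c.
parent_favors_helping = benefit_to_sibling > cost_to_actor

print(child_favors_helping)   # False: 0.5 * 3.0 = 1.5, which is not greater than 2.0
print(parent_favors_helping)  # True: 3.0 > 2.0
```

Any act whose benefit falls between c and 2c lands in exactly this zone of disagreement between parent and child.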

This means that parents should prefer that their children act in a particular way: specifically, parents should prefer their children to help each other whenever the benefit to one outweighs the cost to the other, or b > c. The children, on the other hand, should only wish to behave that way when the benefit to their sibling is at least twice the cost to themselves, or b > 2c. This yields the following conclusion: how parents would like their children to behave does not necessarily correspond to what is in the child’s best fitness interests. Parents hoping to maximize their own fitness have different best interests from the children hoping to maximize theirs. Children who behave as their parents would prefer would be at a reproductive disadvantage, then, relative to children who were resistant to such parental expectations. This insight was formalized by Trivers (1974) when he wrote:

  “…an important feature of the argument presented here is that offspring cannot rely on parents for disinterested guidance. One expects the offspring to be pre-programmed to resist some parental teachings while being open to other forms. This is particularly true, as argued below, for parental teachings that affects the altruistic and egoistic tendencies of the offspring.” (p. 258)

While parents might feel as if they are only acting in the best interests of their children, the logic of inclusive fitness strongly suggests that this feeling might represent an attempt at manipulating others, rather than a statement of fact. To avoid the risk of sounding one-sided, this argument cuts in the other direction as well: children might experience their parents’ treatment of them as being less fair than it actually is, as each child would like to receive twice the investment that parents should be willing to give naturally. The take-home message of this point, however, is simply that children who were readily molded by their parents should be expected to have reproduced those tendencies less, relative to children who were not so affected. In some regards, children should be expected to actively disregard what their parents want for them.

“My parents want me to brush my teeth. They’re such fascists sometimes.”

There are other reasons to expect that parents should not tend to leave lasting impressions on their children’s eventual personalities. One of those very good reasons also has to do with the inclusive fitness logic laid out initially: because parents tend to be 50% genetically related to their children, parents should be expected to invest in their children fairly heavily, relative to non-children at least. The corollary to this idea is that non-parents should be expected to treat the child substantially differently than their parents do. This means that a child should be relatively unable to learn what counts as appropriate behavior towards others more generally from their interactions with their parents. Just because a proud parent has hung their child’s scribbled artwork on the household refrigerator, it doesn’t mean that anyone else will come to think of the child as a great artist. A relationship with your parents is different than a relationship with your friends, which is different from a sexual relationship, in a great many ways. Even within these broad classes of relationships, you might behave differently with one friend than you do with another.

We should expect our behavior around these different individuals to be context-specific. What you learn about one relationship might not readily transfer to any other. Though a child might be unable to physically dominate their parents, they might be able to dominate their peers; some jokes might be appropriate amongst friends, but not with your boss. Though some of what you learn about how to behave around your parents might transfer to other situations (such as the language you speak, if your parents happen to be speakers of the native tongue), it also may not. When it does not transfer, we should expect children to discard what they learned about how to behave around their parents in favor of more context-appropriate behaviors (indeed, when children find their parents speak a different language than their peers, the child will predominantly learn to speak as their peers do, not as their parents do). While a parent’s behavior should be expected to influence how that child behaves around that parent, we should not necessarily expect it to influence the child’s behavior around anyone else.

It should come as little surprise, then, that being raised by the same parents doesn’t actually tend to make children any more similar with respect to their personality than being raised by different ones. Tellegen et al. (1988) compared 44 identical twin (MZ) pairs raised apart with 217 MZ pairs reared together, along with 27 fraternal twin (DZ) pairs reared apart and 114 DZ pairs reared together. On the personality measures overall, the MZ twins were far more alike than the DZ twins, as one would expect from their shared genetics. When it came to comparing rearing environments, however, MZ twins reared together were more highly correlated on 7 of the measures, while those reared apart were more highly correlated on 6 of them. In terms of the DZ twins, those reared together were higher on 9 of the variables, whereas those reared apart were higher on the remaining 5. The size of these differences, when they did exist, was often exceedingly small, typically amounting to a correlation difference of about 0.1 between the pairs, or 1% of the variance.

Pick the one you want to keep. I’d recommend the cuter one.

Even if twins reared together ended up being substantially more similar than twins reared apart – which they didn’t – this would still not demonstrate that parenting was the cause of that similarity. After all, twins reared together tend to share more than their parents; they also tend to share various aspects of their wider social life, such as extended families, peer groups, and other social settings. There are good empirical and theoretical reasons for thinking that parents have less of a lasting effect on their children than many often suppose. That’s not to say that parents don’t have any effects on their children, mind you; just that the effects they have ought to be largely limited to their particular relationship with the child in question, barring the infliction of any serious injuries or other such issues that will transfer from one context to another. Parents can certainly make their children more or less happy when they’re in each other’s presence, but so can friends and more intimate partners. In terms of shaping their children’s later personality, it truly takes a village.

References: Tellegen et al. (1988). Personality similarity in twins reared apart and together. Journal of Personality and Social Psychology, 54, 1031-1039.

Trivers, R. (1974). Parent-Offspring conflict. American Zoologist, 14, 249-264.

Classic Research In Evolutionary Psychology: Learning

Let’s say I were to give you a problem to solve: I want you to design a tool that is good at cutting. Despite the apparent generality of the function, this is actually a pretty vague request. For instance, one might want to know more about the material to be cut: a sword might work if your job is cutting some kind of human flesh, but it might also be unwieldy to keep around the kitchen for preparing dinner (I’m also not entirely sure they’re dishwasher-safe, provided you managed to fit a katana into your machine in the first place). So let’s narrow the request down to some kind of kitchen utensil. Even that request, however, is a bit vague, as evidenced by Wikipedia naming about a dozen different kinds of utensil-style knives (and about 51 different kinds of knives overall). That list doesn’t even manage to capture other kinds of cutting-related kitchen utensils, like egg-slicers, mandolines, peelers, and graters. Why do we see so much variety, even in the kitchen, and why can’t one simple knife be good enough? Simple: when different tasks have non-overlapping sets of best design solutions, functional specificity tends to yield efficiency in one realm, but not in another.

“You have my bow! And my axe! And my sword-themed skillet!”.

The same basic logic has been applied to the design features of living organisms as well, including aspects of our cognition, as I argued in the last post: the part of the mind that functions to logically reason about cheaters in the social environment does not appear to be able to logically reason with similar ease about other, even closely-related topics. Today, we’re going to expand on that idea, but shift our focus towards the realm of learning. Generally speaking, learning can be conceived of as some change to an organism’s preexisting cognitive structure due to some experience (typically unrelated to physical trauma). As with most things related to biological changes, however, random alterations are unlikely to result in improvement; to modify a Richard Dawkins quote ever so slightly, “However many ways there may be of [learning something useful], it is certain that there are vastly more ways of [learning something that isn’t]”. For this reason, along with some personal experience, no sane academic has ever suggested that our learning occurs randomly. Learning needs to be a highly-structured process in order to be of any use.

Precisely how much structure “highly-structured” entails is a bit of a sticky issue, though. There are undoubtedly still some who would suggest that some general type of reinforcement-style learning might be good enough for learning all sorts of neat and useful things. It’s a simple rule: if [action] is followed by [reward], then increase the probability of [action]; if [action] is followed by [punishment], then decrease the probability of [action]. There are a number of problems with such a simple rule, and they bring us back to our knife example: the learning rule itself is under-specified for the demands of the various learning problems organisms face. Let’s begin with an analysis of what is known as conditioned taste aversion. Organisms, especially omnivorous ones, often need to learn about which things in their environment are safe to eat and which are toxic and to be avoided. One problem in learning which potential foods are toxic is that the action (eating) is often divorced from the outcome (sickness) by a span of minutes to hours, and plenty of intervening actions take place in the interim. On top of that, this is not the type of lesson you want to require repeated exposures to learn, as, and this should go without saying, eating poisonous foods is bad for you. In order to learn the connection between the food and the sickness, then, a learning mechanism would seem to need to “know” that the sickness is related to the food and not to other, intervening variables, as well as being related in some specific temporal fashion. Events that conform more closely to this anticipated pattern should be more readily learnable.

The first study we’ll consider, then, is by Garcia & Koelling (1966), who were examining taste conditioning in rats. The experimenters created conditions in which rats were exposed to “bright, noisy” water and “tasty” water. The former condition was created by hooking a drinking apparatus up to a circuit that connected to a lamp and a clicking mechanism, so when the rats drank, they were provided with visual and auditory stimuli. The tasty condition was created by flavoring the water. Garcia & Koelling (1966) then attempted to pair the waters with either nausea or electric shocks, and subsequently measured how the rats responded in their preference for the beverage. After the conditioning phase, during the post-test period, a rather interesting set of results emerged: while rats readily learned to pair nausea with taste, they did not draw the connection between nausea and audiovisual cues. When it came to the shocks, however, the reverse pattern emerged: rats could pair shocks with audiovisual cues well, but could not manage to pair taste and shock. This result makes a good deal of sense in light of a more domain-specific learning mechanism: things which produce certain kinds of audiovisual cues (like predators) might also have the habit of inflicting certain kinds of shock-like harms (such as with teeth or claws). On the other hand, predators don’t tend to cause nausea; toxins in food tend to do so, and these toxins also tend to come paired with distinct tastes. An all-purpose learning mechanism, by contrast, should be able to pair all these kinds of stimuli and outcomes equally well; it shouldn’t matter whether the conditioning comes in the form of nausea or shocks.
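The design and its outcome can be summarized as a simple two-by-two, sketched below (a rough schematic of the result as just described, not the paper’s actual data or analysis):

```python
# Garcia & Koelling's (1966) design, roughly: two cues crossed with two outcomes.
# True marks the pairings the rats readily learned; False marks the ones they didn't.
conditioning_observed = {
    ("taste",       "nausea"): True,   # tasty water -> sickness: readily learned
    ("taste",       "shock"):  False,  # tasty water -> shock: not learned
    ("audiovisual", "nausea"): False,  # bright, noisy water -> sickness: not learned
    ("audiovisual", "shock"):  True,   # bright, noisy water -> shock: readily learned
}

def domain_general_prediction(cue, outcome):
    # A content-blind learner only cares that a cue and an outcome were paired,
    # so it predicts learning in every cell of the table.
    return True

mismatches = [pairing for pairing, learned in conditioning_observed.items()
              if learned != domain_general_prediction(*pairing)]
print("Cells where a content-blind learner predicts wrongly:", mismatches)
```

Half the cells come out wrong for the content-blind account, which is the crux of the argument.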

Turns out that shocks are useful for extracting information, as well as communicating it.

The second experiment to consider on the subject of learning, like the previous one, also involves rats, and actually pre-dates it. This paper, by Petrinovich & Bolles (1954), examined whether different deprivation states have qualitatively different effects on behavior. In this case, the two deprivation states under consideration were hunger and thirst. Two samples of rats were either deprived of food or water, then placed in a standard T-maze (which looks precisely how you might imagine it would). The relevant reward – food for the hungry rats and water for the thirsty ones – was placed in one arm of the T-maze. The first trial was always rewarded, no matter which side the rat chose. Following that initial choice, the reward was placed on the side of the maze the rat did not choose on the previous trial. For instance, if the rat went ‘right’ on the first trial, the reward was placed in the ‘left’ arm on the second trial. Whether the rat chose correctly or incorrectly didn’t matter; the reward was always placed on the opposite side from its previous choice. Did it matter whether the reward was food or water?

Yes; it mattered a great deal. The hungry rats averaged substantially fewer errors in reaching the reward than the thirsty ones (approximately 13 errors over 34 trials, relative to 28 errors). The rats were further tested until they managed to perform 10 out of 12 trials correctly. The hungry rats managed to meet that criterion substantially sooner, requiring a median of 23 total trials before reaching the mark. By contrast, 7 of the 10 thirsty rats failed to reach the criterion at all, and the three that did required approximately 30 trials on average to manage the achievement. Petrinovich & Bolles (1954) suggested that these results can be understood in the following light: hunger makes the rat’s behavior more variable, while thirst makes its behavior more stereotyped. Why? The most likely candidate explanation is the nature of the stimuli themselves, as they tend to appear in the world. Food sources tend to be distributed semi-unpredictably throughout the environment, and where there is food today, there might not be food tomorrow. By contrast, the location of water tends to be substantially more fixed (where there was a river today, there is probably a river tomorrow), so returning to the last place you found water would be the more secure bet. To drive this point home: a domain-general learning mechanism should do both tasks equally well, so a more general account would seem to struggle to explain these findings.
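Because the reward schedule here is itself a simple rule (the reward always moves opposite the previous choice), it is easy to sketch why variable behavior pays. The two strategies below are caricatures, not models of actual rats, but they show the logic under that assumption:

```python
import random

random.seed(0)

def run_trials(strategy, n_trials=34):
    """After a freely rewarded first trial, the reward always sits opposite
    the rat's previous choice; count how often the strategy misses it."""
    errors = 0
    last_choice = random.choice(["left", "right"])  # trial 1 is rewarded either way
    for _ in range(n_trials - 1):
        reward_side = "left" if last_choice == "right" else "right"
        choice = strategy(last_choice)
        if choice != reward_side:
            errors += 1
        last_choice = choice
    return errors

def stay(last_choice):
    # Stereotyped behavior: go back to where you went last time.
    return last_choice

def shift(last_choice):
    # Variable behavior: switch sides from your last choice.
    return "left" if last_choice == "right" else "right"

print("Errors for a pure 'stay' strategy: ", run_trials(stay))   # misses every time
print("Errors for a pure 'shift' strategy:", run_trials(shift))  # never misses
```

Real rats obviously fall somewhere between these extremes, but a bias toward shifting (as hunger apparently induces) beats a bias toward staying on this particular schedule, whereas a water-style “return to where it was last time” bias is exactly backwards here.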

Shifting gears away from rats, the final study for consideration is one I’ve touched on before, and it involves the fear responses of monkeys. As I’ve already discussed the experiment (Cook & Mineka, 1989), I’ll offer only a brief recap of the paper. Lab-reared monkeys show no intrinsic fear responses to snakes or flowers. However, social creatures that they are, these lab-reared monkeys can readily develop fear responses to snakes after observing another conspecific reacting fearfully to them. This is, quite literally, a case of monkey see, monkey do. Does this same reaction hold in response to observations of conspecifics reacting fearfully to a flower? Not at all. Despite the lab-reared monkeys being exposed to stimuli they have never seen before in their life (snakes and flowers) paired with a fear reaction in both cases, it seems that the monkeys are prepared to learn to fear snakes, but not similarly prepared to learn a fear of flowers. Of note is that this isn’t just a fear reaction in response to living organisms in general: while monkeys can learn a fear of crocodiles, they do not learn to fear rabbits under the same conditions.

An effect noted by Python (1975)

When it comes to learning, it does not appear that we are dealing with some kind of domain-general learning mechanism, equally capable of learning all types of contingencies. This shouldn’t be entirely surprising, as organisms don’t face all kinds of contingencies with equivalent frequencies: predators that cause nausea are substantially less common than toxic compounds which do. Don’t misunderstand this argument: humans and nonhumans alike are certainly capable of learning many phylogenetically novel things. That said, this learning is constrained and directed in ways we are often wholly unaware of. The specific content area of the learning is of prime importance in determining how quickly something can be learned, how lasting the learning is likely to be, and which things are learned (or learnable) at all. The take-home message of all this research, then, can be phrased as such: learning is not the end point of an explanation; it’s a phenomenon which itself requires an explanation. We want to know why an organism learns what it does, not simply that it learns.

References: Cook, M. & Mineka, S. (1989). Observational conditioning of fear to fear-relevant versus fear-irrelevant stimuli in rhesus monkeys. Journal of Abnormal Psychology, 98, 448-459.

Garcia, J. & Koelling, R. (1966). Relation of cue to consequence in avoidance learning. Psychonomic Science, 4, 123-124.

Petrinovich, L. & Bolles, R. (1954). Deprivation states and behavioral attributes. Journal of Comparative and Physiological Psychology, 47, 450-453.

Classic Research In Evolutionary Psychology: Reasoning

I’ve consistently argued that evolutionary psychology, as a framework, is a substantial and, in many ways, vital remedy to some widespread problems: it allows us to connect seemingly disparate findings under a common understanding, and, while the framework is by itself no guarantee of good research, it forces researchers to be more precise in their hypotheses, allowing conceptual problems with hypotheses and theories to be more transparently observed and addressed. In some regards the framework is quite a bit like the practice of explaining something in writing: while you may intuitively feel as if you understand a subject, it is often not until you try to express your thoughts in actual words that you find your estimation of your understanding has been a bit overstated. Evolutionary psychology forces our intuitive assumptions about the world to be made explicit, often to our own embarrassment.

“Now that you mention it, I’m surprised I didn’t notice that sooner…”

As I’ve recently been discussing one of the criticisms of evolutionary psychology – that the field is overly focused on domain-specific cognitive mechanisms – I feel that now would be a good time to review some classic research that speaks directly to the topic. Though the research to be discussed is itself of recent vintage (Cosmides, Barrett, & Tooby, 2010), the topic has been examined for some time: namely, whether our logical reasoning abilities are best conceived of as domain-general or domain-specific (whether they work equally well regardless of content, or whether content area is important to their proper functioning). We ought to expect domain specificity in our cognitive functioning for two primary reasons (though these are not the only reasons): the first is that specialization yields efficiency. The demands of solving a specific task are often different from the demands of solving a different one, and to the extent that those demands do not overlap, it becomes difficult to design a tool that solves both problems readily. Imagining a tool that can both open wine bottles and cut tomatoes is hard enough; now imagine adding on the requirement that it also needs to function as a credit card and the problem becomes exceedingly clear. The second reason is outlined well by Cosmides, Barrett, & Tooby (2010) and, as usual, they express it more eloquently than I would:

The computational problems our ancestors faced were not drawn randomly from the universe of all possible problems; instead, they were densely clustered in particular recurrent families.

Putting the two together, we end up with the following: humans tend to face a non-random set of adaptive problems in which the solution to any particular one tends to differ from the solution to any other. As domain-specific mechanisms solve problems more efficiently than domain-general ones, we ought to expect the mind to contain a large number of cognitive mechanisms designed to solve these specific and consistently-faced problems, rather than only a few general-purpose mechanisms more capable of solving many problems we do not face, but poorly-suited to the specific problems we do. While such theorizing sounds entirely plausible and, indeed, quite reasonable, without empirical support for the notion of domain-specificity, it’s all so much bark and no bite.

Thankfully, empirical research abounds in the realm of logical reasoning. The classic tool used to assess people’s ability to reason logically is the Wason selection task. In this task, people are presented with a logical rule taking the form of “if P, then Q”, and a number of cards representing P, Q, ~P, and ~Q (e.g. “If a card has a vowel on one side, then it has an even number on the other”, with cards showing A, B, 1, and 2). They are asked to point out the minimum set of cards that would need to be checked to test the initial “if P, then Q” statement; the correct answer is to check only the P card and the ~Q card, as those are the only two that could falsify the rule. People’s performance on the task is generally poor, with only around 5-30% of people getting it right on their first attempt. That said, performance on the task can become remarkably good – up to around 65-80% of subjects getting the correct answer – when the task is phrased as a social contract (“If someone [gets a benefit], then they need to [pay a cost]”, the most well-known being “If someone is drinking, then they need to be at least 21”). Despite the underlying logical form not being altered, the content of the Wason task matters greatly in terms of performance. This is a difficult finding to account for if one holds to the idea of a domain-general logical reasoning mechanism that functions the same way in all tasks involving formal logic. Noting that content matters is one thing, though; figuring out how and why content matters becomes something of a more difficult task.
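For anyone who wants the abstract version spelled out, here is a small sketch of why only the P and ~Q cards need checking (my own illustration of the task’s logic, not anything from the papers discussed):

```python
# The four cards in the abstract version of the task: each shows one face, and only
# turning a card over can reveal whether it violates the rule
# "if a card has a vowel on one side, then it has an even number on the other".
cards = {
    "A": "P",      # vowel showing: the hidden side could be an odd number (a violation)
    "B": "not-P",  # consonant showing: the rule says nothing about this card
    "2": "Q",      # even number showing: no hidden face could violate the rule
    "1": "not-Q",  # odd number showing: the hidden side could be a vowel (a violation)
}

def could_falsify(card_type):
    # "If P, then Q" is violated only by a case of P paired with not-Q.
    return card_type in ("P", "not-Q")

must_check = [face for face, card_type in cards.items() if could_falsify(card_type)]
print("Minimum set of cards to turn over:", must_check)  # ['A', '1']
```

Most people instead pick the P and Q cards (A and 2), which is why baseline performance is so poor; the drinking-age framing makes the ~Q card – the person under 21 – the intuitively obvious one to check.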

While some might suggest that content simply matters as a function of familiarity – as people clearly have more experience with age restrictions on drinking and other social situations than with vaguer stimuli – familiarity can’t be the whole story: people will fail the task when it is framed in terms of familiar stimuli, and people will succeed at the task for unfamiliar social contracts. Accordingly, criticisms of the domain-specific social contract (or cheater-detection) mechanism shifted to suggest that the mechanism at work is indeed content-specific, but perhaps not specific to social contracts. Instead, the contention was that people are good at reasoning about social contracts, but only because they’re good at reasoning about deontic categories – like permissions and obligations – more generally. Assuming such an account were accurate, it remains debatable whether that mechanism would count as a domain-general or domain-specific one. Such a debate need not be had yet, though, as the more general account turns out to be unsupported by the empirical evidence.

We’re just waiting for critics to look down and figure it out.

While all social contracts involve deontic logic, not all deontic logic involves social contracts. If the more general account of deontic reasoning were true, we ought not to expect performance differences between the former and latter types of problems. In order to test whether such differences exist, Cosmides, Barrett, & Tooby’s (2010) first experiment involved presenting subjects with a permission rule – “If you do P, you must do Q first” – varying whether P was a benefit (going out at night), neutral (staying in), or a chore (taking out the trash; Q, in this case, involved tying a rock around your ankle). When the rule was a social contract (the benefit), performance was high on the Wason task, with 80% of subjects answering correctly. However, when the rule involved staying in, only 52% of subjects got it right; that number was even lower in the garbage condition, with only 44% accuracy among subjects. Further, this same pattern of results was subsequently replicated in a new context involving filing and signing forms. This result is quite difficult to account for with a more general permission schema, as all the conditions involve reasoning about permissions; it is, however, consistent with the predictions from social contract theory, as only the contexts involving some form of social contract elicited the highest levels of performance.

Permission schemas, in their general form, also appear unconcerned with whether one violates a rule intentionally or accidentally. By contrast, social contract theory is concerned with the intentionality of the violation, as accidental violations do not imply the presence of a cheater the way intentional violations do. To continue to test the distinction between the two models, subjects were presented with the Wason task in contexts where the violations of the rule were likely intentional (with or without a benefit for the actor) or accidental. When the violation was intentional and benefited the actor, subjects performed accurately 68% of the time; when it was intentional but did not benefit the actor, that percentage dropped to 45%; when the violation was likely unintentional, performance bottomed out at 27%. These results make good sense if one is trying to find evidence of a cheater; they do not if one is trying to find evidence of a rule violation more generally.

In a final experiment, the Wason task was again presented to subjects, this time varying three factors: whether one was intending to violate a rule or not; whether it would benefit the actor or not; and whether the ability to violate was present or absent. The pattern of results mimicked those above: when benefit, intention, and ability were all present, 64% of subjects determined the correct answer to the task; when only 2 factors were present, 46% of subjects got the correct answer; and when only 1 factor was present, subjects did worse still, with only 26% getting the correct answer, which is approximately the same performance level as when there were no factors present. Taken together, these three experiments provide powerful evidence that people aren’t just good at reasoning about the behavior of other people in general, but rather that they are good at reasoning about social contracts in particular. In the now-immortal words of Bill O’Reilly, “[domain-general accounts] can’t explain that”.

“Now cut their mic and let’s call it a day!”

Now, of course, logical reasoning is just one possible example for demonstrating domain specificity, and these experiments certainly don’t prove that the entire structure of the mind is domain-specific; there are other realms of life – such as, say, mate selection, or learning – where domain-general mechanisms might work. The possibility of domain-general mechanisms remains just that – possible; perhaps not often well-reasoned on a theoretical level or well-demonstrated at an empirical one, but possible all the same. Differentiating between these different accounts may not always be easy in practice, as they are often thought to generate some, or even many, of the same predictions, but in principle it remains simple: we need to place the two accounts in experimental contexts in which they generate opposing predictions. In the next post, we’ll examine some experiments in which we pit a more domain-general account of learning against some more domain-specific ones.

References: Cosmides, L., Barrett, H.C., & Tooby, J. (2010). Adaptive specializations, social exchange, and the evolution of human intelligence. Proceedings of the National Academy of Sciences, 107 (Suppl. 2), 9007-9014.