This Is Water: Making The Familiar Strange

In the fairly recent past, a video called "This is Water" – a speech by David Foster Wallace – went viral across various social media sites. The beginning of the speech tells a story of two fish who are oblivious to the water in which they exist, in much the same way that humans come to take the existence of the air they breathe for granted. The water is so ubiquitous that the fish fail to notice it; it's just the way things are. The larger point of the video – for my present purposes – is that the inferences people make in their day-to-day lives are so automatic as to become taken for granted. Wallace correctly notes that there are many, many different inferences one could make about the people we see in our everyday lives: is the person in the SUV driving it because they fear for their safety, or are they selfish for driving that gas-guzzler? Is the person yelling at their kids not usually like that, or are they an abusive parent? There are two key points in all of this. The first is the aforementioned habit of taking for granted our ability to draw these kinds of inferences in the first place; what Cosmides & Tooby (1994) call instinct blindness. Seeing, for instance, is an incredibly complex and difficult-to-solve task, but the only effort we perceive when it comes to vision involves opening our eyes: the seeing part just happens. The second, related point is the more interesting part to me: it involves the underdetermination of the inferences we draw from the information we're provided. That is to say that no part of the observations we make (the woman yelling at her child) intrinsically provides us with good information for making inferences (what is she like at other times?).

Was Leonidas really trying to give them something to drink?

There are many ways of demonstrating underdetermination, but visual illusions – like this one – prove to be remarkably effective in quickly highlighting cases where the automatic assumptions your visual system makes about the world cease to work. Underdetermination isn't just a problem that needs to be solved with respect to vision, though: our minds make all sorts of assumptions about the world that we rarely find ourselves in a position to appreciate or even notice. In this instance, we'll be considering some of the information our mind automatically fills in concerning the actions of other people. Specifically, we perceive our world along a dimension of intentionality. Not only do we perceive that individuals acted "accidentally" or "on purpose", we also perceive that individuals acted to achieve certain goals; that is, we perceive "motives" in the behavior of others.

Knowing why others might act is incredibly useful for predicting and manipulating their future behavior. The problem that our minds need to solve, as you can no doubt guess by this point, is that intentions and motives are not readily observable from actions. This means that we need to do our best to approximate them from other cues, and that entails making certain assumptions about observable actions and the actors who bring them about. Without these assumptions, we would have no way to distinguish between someone killing in self-defense, killing accidentally, or killing just for the good old-fashioned fun of it. The questions for consideration, then, concern which kinds of assumptions tend to be triggered by which kinds of cues under what circumstances, as well as why they get triggered by that set of cues. Understanding what problems these inferences about intentions and motives were designed to solve can help us more accurately predict the form that these often-unnoticed assumptions will likely take.

While attempting to answer that question about what cues our minds use, one needs to be careful not to lapse into the automatically-generated inferences our minds typically make and remain instinct-blind. The reason one ought to avoid doing this – with regard to inferences about intentions and motives – is made very well by Gawronski (2009):

“…how [do] people know that a given behavior is intentional or unintentional[?] The answer provided…is that a behavior will be judged as intentional if the agent (a) desired the outcome, (b) believed that the action would bring about the outcome, (c) planned the action, (d) had the skill to accomplish the action, and (e) was aware of accomplishing the outcome…[T]his conceptualization implies the risk of circularity, as inferences of intentionality provide a precondition for inferences about aims and motives, but at the same time inferences of intentionality depend on a perceiver's inferences about aims and motives.”

In other words, people often attempt to explain whether or not someone acted intentionally by referencing motives (“he intended to harm X because he stood to benefit”), and they also often attempt to explain someone’s motives on the basis of whether or not they acted intentionally (“because he stood to benefit by harming X, he intended harm”). On top of that, you might also notice that inferences about motives and intentions are themselves derived, at least in part, from other, non-observable inferences about talents and planning. This circularity keeps us from arriving at anything resembling a more complete explanation for what we perceive.

“It looks three-dimensional because it is, and it is 3-D because it looks like it”

Even if we ignore this circularity problem for the moment and just grant that inferences about motives and intentions can influence each other, there is also the issue of the multiple possible inferences which could be drawn about a behavior. For instance, if you observe a son push his father down the stairs and kill him, one could make several possible inferences about motives and intentions. Perhaps the son wanted money from an inheritance, resulting in his intending to push his father to cause death. However, pushing his father not only kills close kin, but also carries the risk of punishment. Since the son might have wanted to avoid punishment (and might well have loved his father), this would suggest that he did not intend to push his father and cause his death (i.e., maybe he tripped, which is what caused the push). Then again, unlikely as it may sound, perhaps the son actively sought punishment, which is why he intended to push. This could go on for some time. The point is that, in order to reach any one of these conclusions, the mind needs to add information that is not present in the initial observation itself.

This leads us to ask what information is added, and on what basis. The answer to this question, I imagine, would depend on the specific inferential goals of the perceiver. One goal could be accuracy: people wish to infer the “actual” motivations and intentions of others, to the extent it makes sense to talk about such things. If it’s true, for instance, that people are more likely to act in ways that avoid something like their own bodily harm, our cognitive systems could be expected to pick up on that regularity and avoid drawing the inference that someone was intentionally seeking it. Accuracy only gets us so far, however, due to the aforementioned issue of multiple potential motives for acting: there are many different goals one might be intending to achieve and many different costs one might be intending to avoid, and these are not always readily distinguishable from one another. The other complication is that accuracy can sometimes get in the way of other useful goals. Our visual system, for instance, while not always accurate, might well be classified as honest. That is to say, though our visual system might occasionally get things wrong, it doesn’t tend to do so strategically; there would be no benefit to sometimes perceiving a shirt as blue and other times as red under the same lighting conditions.

That logic doesn’t always hold for perceptions of intentions and motives, though: intentionally committed moral infractions tend to receive greater degrees of moral condemnation than unintentional ones, and can make one seem like a better or worse social investment. Given that there are some people we might wish to see receive less punishment (ourselves, our kin, and our allies) and some we might wish to see receive more (those who inflict costs on us or our allies), we ought to expect our intention-perceiving systems to perceive identical sets of actions very differently, contingent on the nature of the actor in question. In other words, if we can persuade others about our intentions and motives, or the intentions and motives of others, and alter their behavior accordingly, we ought to expect perceptual biases that assist in those goals to start cropping up. This, of course, rests on the idea that other parties can be persuaded to share your sense of these things, which poses related problems, such as: under what circumstances does it benefit other parties to develop one set of perceptions or another?

How fun this party is can be directly correlated to the odds of picking someone up.

I don’t pretend to have all the answers to questions like these, but they should serve as a reminder that our minds need to add a lot of structure to the information they perceive in order to do many of the things of which they are capable. Explanations for how and why we do things like perceive intentionality and motive need to be divorced from the feeling that such perceptions are just “natural” or “intuitive”; what we might consider the experience of the word “duh”. This is an especially large concern when you’re dealing with systems that are not guaranteed to be accurate or honest in their perceptions. The cues that our minds use to determine what motives people had when they acted and what they intended to do are by no means always straightforward, so saying that inferences are generated by “the situation” is unlikely to be of much help, on top of just being wrong.

References: Cosmides, L. & Tooby, J. (1994). Beyond intuition and instinct blindness: Toward an evolutionarily rigorous cognitive science. Cognition, 50, 41-77.

Gawronski, B. (2009). The multiple inference model of social perception: Two conceptual problems and some thoughts on how to resolve them. Psychological Inquiry, 20, 24-29. DOI: 10.1080/10478400902744261

Sexed-Up Statistics – Female Genital Mutilation

“A lie can travel halfway around the world while the truth is putting on its shoes” – Mark Twain.

I had planned on finishing up another post today (which will likely be up tomorrow now) until a news story caught my eye this morning, changing my plans somewhat. The news story (found on Alternet) is titled, "Evidence shows that female genital cutting is a growing phenomenon in the US". Yikes; that certainly sounds worrying. From that title, and the subsequent article, it would seem two things are likely to be inferred by the reader: (1) there is more female genital cutting in the US in recent years than there was in the past, and (2) some kind of evidence supports that claim. There were several facets of the article that struck me as suspect, however, most of which speak to the second point: I don't think the author has the evidence required to substantiate their claims about FGC. Just to clear up a few initial points before moving forward with this analysis: no, I'm not trying to claim that FGC doesn't occur at all in the US or on overseas trips from the US. Also, I personally oppose the practice in both the male and female varieties; cutting pieces off a non-consenting individual is, on my moral scale, a bad thing. My points here only concern accurate scholarship in reporting. They also raise the possibility that the problem may well be overstated – something which, I think, ought to be good news.

It means we can start with just the pitchforks; the torches aren’t required…yet.

So let's look at the first major alarmist claim of the article: there was a report put out by the Sanctuary for Families that claimed approximately 200,000 women living in the US were at risk of genital cutting. That number sounds pretty troubling, but the latter part of the claim sounds a bit strange: what does "at risk" mean? I suppose, for instance, that I'm living "at risk" of being involved in a fatal car accident, just as everyone else who drives a car is. Saying that there are approximately 200,000,000 people in the US living at risk of a fatal car crash is useless on its own, though: it requires some qualification. So what's the context behind the FGC number? The report itself references a 1997 paper by the CDC that estimated between 150,000 and 200,000 women in the US were at risk of being forced to undergo FGC (which we'll return to later). Given that the reference for the claim is a paper by the CDC, it seems very peculiar that the Sanctuary for Families attaches a citation that instead directs you to another news site that just reiterates the claim.

This is peculiar for two reasons: first, it's a useless reference. It would be a bit like my writing down on a sheet of paper, "I think FGC is on the rise" because I had read it somewhere, and then referencing the fact that I wrote that down when I say it again the next time. Without directing one to the initial source of the claim, it's not a proper citation and doesn't add any information. The second reason that the reference is peculiar is that the 1997 CDC paper (or at least what I assume is the paper) is actually freely available online. It took me all of 15 seconds to find it through a Google search. While I'm not prepared to infer any sinister motivation on the part of the Sanctuary for Families for not citing the actual paper, it does, I think, speak to the quality of scholarship that went into drafting the report, and in a negative way. It makes one wonder whether they actually read the key report in the first place.

Thankfully, the CDC paper does finally provide us with the context as to how the estimated number was arrived at. The first point worth noting is that the estimate the paper delivers (168,000) is a reflection of people living in the US who had either already undergone the procedure before they moved here or who might undergo it in the future (but not necessarily within the US). The estimate is silent on when or where the procedure might have taken place: if it happened in another country years or decades ago, it would still be part of this estimate. In any case, the authors began with the 1990 census data of the US population. On the census, respondents were asked about their country of origin and how long they had lived in the US. From that data, the authors cross-referenced the estimated rates of FGC in people's home countries to estimate whether or not they were likely to have undergone the procedure. Further, the authors assumed throughout that immigrants did not differ from the populations from which they were drawn with respect to the practice of FGC: if 50% of the population in a family's country of origin practiced it, then 50% of immigrants from that country were expected to have undergone it or be expected to in the future. In other words, the 168,000 number is an estimate, based on other estimates, based on an assumption.
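To make that chain of estimates concrete, here is a minimal sketch of the logic as I've described it. The country names, counts, and prevalence figures are all invented for illustration; they are not the CDC's actual data:

```python
# Toy sketch of the estimation logic: census counts of women by country of
# origin, multiplied by estimated FGC prevalence in those countries.
# Every number below is invented for illustration.
census_counts = {      # women living in the US, by (hypothetical) country of origin
    "Country A": 40_000,
    "Country B": 25_000,
    "Country C": 60_000,
}
fgc_prevalence = {     # estimated prevalence of FGC in each country of origin
    "Country A": 0.90,
    "Country B": 0.50,
    "Country C": 0.05,
}

# Key assumption: immigrants mirror their origin population with respect to FGC.
estimate = sum(count * fgc_prevalence[country]
               for country, count in census_counts.items())
print(f"Estimated women who have undergone or may undergo FGC: {estimate:,.0f}")
```

Each input is itself an estimate (or an assumption), which is exactly why the final figure deserves the "very rough" label it gets below.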

It’s an impressive number, but I worry about its foundation.

I would call this figure, well, a very rough estimate, and not exactly solid evidence. Further, it's an estimate of FGC in other countries; not in the US. The authors of the CDC paper were explicit about this point, writing, "No direct information is available on FGC in the United States". It is curious, then, that the Sanctuary report and the Alternet article both invoke the threat of FGC that girls in the US face while citing the CDC estimate. For example, here's how the Sanctuary report phrased the estimate:

In 1997, however, the Centers for Disease Control and Prevention (CDC) estimated that as many as 150,000 to 200,000 girls in the United States were at risk of being forced to undergo female genital mutilation.

See the important differences? The CDC estimate wasn't one concerning people at risk of being forced to undergo the practice; it was an estimate of people who might undergo it and people who might have already undergone it at some point in the past in some other country. Indeed, the CDC document could more accurately be considered an immigration report, rather than a paper on FGC itself. So, when the Sanctuary report and Alternet article suggest that the number of women at risk for FGC is rising, what they appear to mean is that immigration from certain countries where the practice is more common is rising, but that doesn't seem to have quite the same emotional effect. Importantly, the level of risk isn't ever qualified. Approximately 200,000,000 people are "at risk" of being involved in a fatal car crash; how many of them actually are involved in one? (About 40,000 a year, and that number is on the decline.) So how many of the 168,000 women "at risk" for FGC had already undergone the procedure, how many might still be "at risk", and how many of those "at risk" end up actually undergoing it? Good evidence is missing on these points.
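Just to make the "at risk" point concrete with the driving example's ballpark figures (rough numbers, not precise statistics):

```python
# "At risk" alone says nothing about how often the outcome actually occurs.
at_risk = 200_000_000      # rough count of US drivers "at risk" of a fatal crash
fatal_per_year = 40_000    # rough count of annual fatal crashes

annual_rate = fatal_per_year / at_risk
print(f"Annual realized rate among those 'at risk': {annual_rate:.4%}")  # ~0.02%
```

Without an analogous realized rate for FGC, the 168,000 figure can't tell us how large the problem in the US actually is.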

This kind of not-entirely-accurate reporting reminds me of a piece by Neuroskeptic on what he called "sexed-up statistics". These are statistics presented or reported in such a way as to make some problem seem as bad as possible, most likely with the goal of furthering some social, political, or funding agenda (big problems attract money for their solution). It's come up before in the debate over the wage gap between men and women, and when considering the extent of rape among college-aged (and non-college-aged) women, to name just two prominent cases. This ought not be terribly surprising in light of the fact that the pursuit of dispassionate accuracy is likely not the function of human reasoning. The speed with which people can either accept or reject previously-unknown information (such as the rate of FGC in the US and whether it's a growing problem) tells us that concerns for accuracy per se are not driving these decisions. This is probably why the initial quote by Mark Twain carries the intuitive appeal that it does.

“Everyone but me and the people I agree with are so easily fooled!”

FGC ought to be opposed, but it's important not to let one's opposition to it (or, for that matter, one's opposition or support for any other specific issue) get in the way of accurately considering and reporting on the evidence at hand (or at least doing the best one can in that regard). The evidence – and that term is used rather loosely here – presented certainly does not show that illegal FGC is a "growing phenomenon in the US", as Jodie at Alternet suggests. How could the evidence already show it was a growing problem if determining the initial and current scope of the problem hasn't been done and couldn't even feasibly be done? As far as the "evidence" suggests, the problem could be on the rise, on the decline, or static. One of those options just happens to make for the "sexier" story; the story more capable of making its way halfway around the world in an instant.

Mathematical Modeling Of Menopause

Some states of affairs are so ubiquitous in the natural world that – much like the air we breathe – we stop noticing their existence or finding them particularly strange. The effects of aging are good examples of this. All else being equal, we ought to expect organisms that are alive longer to reproduce more. That longevity/reproduction link would seem to make the previously-unappreciated question of why organisms' bodies tend to break down over time rather salient. Why do organisms grow old and frail, before one or more homeostatic systems start failing, if being alive tends to aid in reproduction? One candidate explanation for understanding senescence involves considering the trade-off between the certainty of the present and the uncertainty of the future; what we might consider the discount rate of life. Each day, our bodies need to avoid death from a variety of sources, such as accidental injuries, intentional injuries from predators or conspecifics, the billions of hungry microorganisms we encounter, or lacking access to sufficient metabolic resources. Despite the whole world seemingly trying to kill us constantly, our bodies manage to cheat death pretty well, all things considered.
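To make that discount-rate idea concrete, here is a minimal sketch assuming a constant (and entirely made-up) daily chance of death: the expected value of a fixed benefit shrinks the further into the future you have to wait to collect it, simply because you might not be around to do the collecting.

```python
# Toy illustration: with a constant daily risk of death, the expected value of
# a fixed benefit declines the further in the future it is collected.
daily_survival = 0.999          # assumed probability of surviving any given day

def expected_value(benefit, days_from_now):
    """Benefit discounted by the probability of being alive to collect it."""
    return benefit * (daily_survival ** days_from_now)

for days in (0, 30, 365, 3650):
    print(f"Benefit collected in {days:>4} days is worth "
          f"{expected_value(1.0, days):.3f} of its face value today")
```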

“What do we say to death? Not to..OH MY GOD, WHAT’S BEHIND YOU?”

Of course, we don't always manage to avoid dying: we get sick, we get into fights, and sometimes we jump out of airplanes for fun. Each new day, then, brings new opportunities for the less-than-desirable outcome, and the future is full of new days. This makes each day in the future that much less valuable than each day in the present, as future days come with the same potential benefits, but all the collective added risk. Given the uncertainty of the future, it follows that some adaptations might be designed to increase our chances of being alive today, even if they decrease our odds of being alive tomorrow. These adaptations may well explain why we age the way we do. They would be expected to make us age in very specific ways, though: all of our biological systems ought to be expected to break down at roughly the same time. This is because investing tons of energy into making a liver that never breaks doesn't make much sense if the lungs give out too easily, as the body with the well-functioning liver would die all the same without the ability to breathe; better to divert some of that energy from liver maintenance to lung function.

As noted previously, however, being alive is only useful from an evolutionary perspective if being alive means better genetic representation in the future. The most straightforward way of achieving said genetic representation is through direct reproduction. This makes human menopause a very strange phenomenon indeed. Why do females' reproductive capabilities shut off decades before the rest of their bodies tend to? That pattern of loss of function seems to run counter to the liver/lungs example. Further, as the use of the word 'human' suggests, this cessation of reproductive abilities is not well-documented among other species. It's not that other species don't ever lose the capacity for reproduction, mind you, just that they tend to lose it much closer to the point when they would die anyway. This adds a second part to our initial question concerning the existence of menopause: why does it seem to only really happen in humans?

Currently, the most viable explanation is known as “The Grandmother Hypothesis“. The hypothesis suggests that, due to the highly-dependent nature of human offspring and the risks involved in pregnancy, it became adaptive for women to cease focusing on producing new offspring of their own and shift their efforts towards investing in their existing offspring and grandoffspring. At its core, the grandmother hypothesis is just an extension of kin selection: the benefits to helping relatives begin to exceed the benefits of direct reproduction. While this hypothesis may well prove to not be the full story, it does have two major considerations going for it: first, it explains the loss of reproductive capacity through a tradeoff – time spent investing in new offspring is time not spent investing in existing ones. It doesn’t commit what I would call the “dire straits fallacy” by trying to get something for free, as some standard psychology ideas (like depressive realism) seem to. The second distinct benefit of this hypothesis is perhaps more vital, however: it explains why menopause appears to be rather human-specific by referencing something unique to humans – extremely altricial infants that are risky to give birth to.

A fairly accurate way to conceptualize the costs of the pregnancy-through-college years.

A new (and brief) paper by Morton, Stone, & Singh (2013) sought to examine another possible explanation for menopause: mate choice on the part of males. The authors used mathematical models to attempt to demonstrate that, assuming men have a preference for young mates, mutations with deleterious effects on women's fertility later in life could drift to fixation. Though the authors aren't explicit on this point, they seem to be assuming, de facto, that human female menopause is a byproduct of senescence plus a male sexual preference for younger women, as without this male sexual preference, their simulated models failed to produce female menopause. They feel their models demonstrate that you don't necessarily need something like the grandmother hypothesis to explain menopause. My trust in results derived from mathematical models like these is low at the best of times, so it should come as no surprise that I found this explanation lacking on three rather major fronts.
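Before getting to those fronts, it might help to make the proposed mechanism concrete. The following is my own toy sketch of the general idea – not the authors' actual model – assuming a single hypothetical mutation that abolishes fertility late in life: if males only ever mate with young females, the fertility the mutation removes never produced offspring anyway, so the mutation is effectively neutral and can drift to fixation; if males mate with females of all ages, it gets selected against.

```python
import random

def run_once(pop_size=100, generations=1000, s_old=0.3,
             males_prefer_young=True, init_freq=0.5):
    """One Wright-Fisher run for a hypothetical late-life fertility-loss allele."""
    # If males only mate with young females, late-life fertility never produced
    # offspring anyway, so the allele carries no cost; otherwise carriers lose a
    # fraction s_old of lifetime reproduction.
    w_carrier = 1.0 if males_prefer_young else 1.0 - s_old
    freq = init_freq
    for _ in range(generations):
        p = freq * w_carrier / (freq * w_carrier + (1.0 - freq))            # selection
        freq = sum(random.random() < p for _ in range(pop_size)) / pop_size  # drift
        if freq in (0.0, 1.0):
            break
    return freq

def fixation_rate(n_runs=200, **kwargs):
    """Proportion of runs in which the allele drifts all the way to fixation."""
    return sum(run_once(**kwargs) == 1.0 for _ in range(n_runs)) / n_runs

random.seed(42)
print("Males prefer young mates (allele neutral):  ",
      fixation_rate(males_prefer_young=True))
print("No age preference (allele selected against):",
      fixation_rate(males_prefer_young=False))
```

Under the preference, the allele fixes in roughly half the runs; without it, essentially never. That is the byproduct logic in miniature, and it also previews my complaint below: the result is baked into the assumptions.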

My first complaint is that, while their model might show that – given that certain states of affairs held – explanations like the grandmother hypothesis need not be necessary, they fail to rule out the grandmother hypothesis in any empirical or theoretical way. They don't bother to demonstrate that their states of affairs actually held. Why that's a problem is easy to recognize: it would be trivial to concoct a separate mathematical model that "demonstrated" the strength of the grandmother hypothesis by making a different set of assumptions (such as by assuming that, past a certain age, returns on investment in existing offspring outweighed returns on investment in new ones). Yes, to do so would be pure question-begging, and I fail to see how the initial model provided by Morton et al (2013) isn't doing just that.

My second complaint is that, like the grandmother hypothesis, Morton et al's (2013) byproduct model does consider tradeoffs, avoiding the dire straits fallacy; unlike the grandmother hypothesis, however, the byproduct account fails to posit anything human-specific about menopause. It seems to me that the explanation on offer from the byproduct account could be applied to any sexually-reproducing species. Trying to explain a relatively human-specific trait with a non-human-specific selection pressure isn't as theoretically sound as I would like. "But", Morton et al might object, "we do posit a human-specific trait: a male preference for young female mates". A fine rebuttal, complicated only by the fact that this is actually the weakest point of the paper. The authors appear to be trying to use an unexplained preference to explain the decline in fertility, when it seems the explanation ought to run in precisely the opposite direction. If, as the model initially assumes, ancestral females did not differ substantially in their fertility with respect to age, how would a male preference for younger females ever come to exist in the first place? What benefits would accrue to men who shunned older – but equally fertile – women in favor of younger ones? It's hard to say. By contrast, if our starting point is that older females were less fertile, a preference for younger ones is easily explained.

No amount of math makes this an advisable idea.

Preferences are not explanations in themselves; they require explanations. Much like aging, however, people can take preferences for granted because of how common they are (like the human male's tendency to find females of certain ages maximally attractive), forgetting that basic fact in the process. The demonstration that male mating preferences could have been the driving force behind the existence of menopause, then, seems empty. The model, like many others that I've encountered, seems to do little more than restate the authors' initial assumptions as conclusions, just in the language of math rather than English. As far as I can see, the model makes no testable or novel predictions, and only manages to reach that point by assuming a maladaptive, stable preference on the part of men. I wouldn't mark it down as a strong contender for helping us understand the mystery of menopause.

References: Morton, R., Stone, J., & Singh, R. (2013). Mate choice and the origin of menopause. PLoS Computational Biology, 9(6). DOI: 10.1371/journal.pcbi.1003092

How Hard Is Psychology?

The scientific method is a pretty useful tool for assisting people in testing hypotheses and discerning truth – or coming as close to it as one can. Like the famous Churchill quote about democracy, the scientific method is the worst system we have for doing so, except for all the others. That said, the scientists who use the method are often not doing so in the single-minded pursuit of truth. Perhaps phrased more aptly, testing hypotheses is generally not done for its own sake: people testing hypotheses are typically doing so for other reasons, such as raising their status and furthering their careers in the process. So, while the scientific method could be used to test any number of hypotheses, scientists tend to try and use it for certain ends and to test certain types of ideas: those perceived to be interesting, novel, or useful. I imagine that none of that is particularly groundbreaking information to most people: science in theory is different from science in practice. A curious question, then, is this: given that we ought to expect scientists from all fields to use the method for similar reasons, why are some topics to which the scientific method is applied viewed as "soft" and others as "hard" (like psychology and physics, respectively)?

Very clever, Chemistry, but you’ll never top Freud jokes.

One potential reason for this impression is that these non-truth-seeking (what some might consider questionable) uses to which people attempt to put the scientific method could simply be more prevalent in some fields, relative to others. The further one strays from science in theory to science in practice, the softer one's field might be seen as being. If, for instance, psychology were particularly prone to biases that compromise the quality or validity of its data, relative to other fields, then people would be justified in taking a more critical stance towards its findings. One of those possible biases involves tending to report only the data consistent with one hypothesis or another. As the scientific method requires reporting the data that are both consistent and inconsistent with one's hypothesis, if only the former is being done, then the validity of the method can be compromised and you're no longer doing "hard" science. A 2010 paper by Fanelli provides us with some reason to worry on that front. In that paper, Fanelli examined approximately 2500 papers randomly drawn from various disciplines to determine the extent to which positive results (those which statistically support one or more of the hypotheses being tested) dominate the published literature. The Psychology/Psychiatry category sat at the top of the list, with 91.5% of all published papers reporting positive results.

While that number may seem high, it is important to put the figure into perspective: the field at the bottom of that list – the one which reported the fewest positive results overall – was the space sciences, with 70.2% of all the sampled published work reporting positive results. Other fields ran a relatively smooth line between the upper and lower limits, so the extent to which the fields differ in the dominance of positive results is a matter of degree, not kind. Physics and chemistry, for instance, both ran about 85% in terms of positive results, despite both being considered "harder" sciences than psychology. Now that the 91% figure might seem a little less worrying, let's add some more context to reintroduce the concern: those percentages only consider whether any positive results were reported, so papers that tested multiple hypotheses tended to have a better chance of reporting something positive. It also happened that papers within psychology tended to test more hypotheses on average than papers in other fields. When correcting for that issue, positive results in psychology were approximately five times more likely than positive results in the space sciences. By comparison, positive results in physics and chemistry were only about two-and-a-half times more likely. How much cause for concern should this bring us?
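Before taking that question up, it's worth spelling out roughly where figures like "five times more likely" can come from: converting the raw percentages into odds gets you into the right ballpark. Fanelli's actual estimates come from models that also correct for the number of hypotheses tested, so the sketch below is only an approximation, and the ~85% physics/chemistry figure is the rough value quoted above rather than an exact one:

```python
def odds(p):
    """Convert a proportion of positive results into odds."""
    return p / (1.0 - p)

# Approximate proportions of papers reporting positive results
rates = {
    "Psychology/Psychiatry":      0.915,
    "Physics/Chemistry (approx.)": 0.85,
    "Space Science":              0.702,
}

baseline = odds(rates["Space Science"])
for field, p in rates.items():
    print(f"{field:30s} odds = {odds(p):5.2f}, "
          f"odds ratio vs. space science = {odds(p) / baseline:4.1f}")
```

The raw-percentage version already lands near "about five times" for psychology and "about two-and-a-half times" for physics and chemistry.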

There are two questions to consider before answering that last one: (1) what are the causes of these different rates of positive results, and (2) are these differences in positive results driving the perception among people that some sciences are "softer" than others? Taking these in order, there are still more reasons to worry about the prevalence of positive results in psychology: according to Fanelli, studies in psychology tend to have lower statistical power than studies in the physical sciences. Lower statistical power means that, all else being equal, psychological research should find fewer – not greater – percentages of positive results overall. If psychological studies tend to not be as statistically powerful, where else might the causes of the high proportion of positive results reside? One possibility is that psychologists are particularly likely to be predicting things that happen to be true. In other words, "predicting" things in psychology tends to be easy because hypotheses tend to only be made after a good deal of anecdata has been "collected" through personal experience (incidentally, personal experience is a not-uncommonly cited source of research hypotheses within psychology). Essentially, then, predictions in psychology are being made once a good deal of data is already in, at least informally, making them less predictions and more restatements of already-known facts.
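To see why lower power should, all else being equal, push the positive-result rate down rather than up, a back-of-the-envelope calculation helps. The power values, the alpha level, and the share of tested hypotheses that are actually true are all invented for illustration:

```python
# Expected share of "positive" results as a function of statistical power,
# assuming a fixed false-positive rate (alpha) and a fixed share of tested
# hypotheses that are actually true. All numbers are illustrative guesses.
def expected_positive_rate(power, alpha=0.05, share_true=0.5):
    """True positives from real effects plus false positives from null effects."""
    return share_true * power + (1 - share_true) * alpha

for label, power in [("low-power field", 0.35), ("high-power field", 0.90)]:
    print(f"{label}: expected positive-result rate = "
          f"{expected_positive_rate(power):.0%}")
```

If power alone were doing the work, the lower-powered field should report fewer positives, not more, which is why the observed pattern calls for some other explanation.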

“I predict that you would like a psychic reading, on the basis of you asking for one, just now.”

A related possibility is that psychologists might be more likely to engage in outright dishonest tactics, such as actually collecting their data formally first (rather than just informally), and then making up "predictions" that restate their data after the fact. In the event that publishers within different fields are more or less interested in positive results, then we ought to expect researchers within those fields to attempt this kind of dishonesty on a greater scale (it should be noted, however, that the data is still the data, regardless of whether it was predicted ahead of time, so the effects on the truth-value ought to be minimal). Though greater outright dishonesty is a possibility, it is unclear why psychology would be particularly prone to it, relative to any other field, so it might not be worth worrying too much about. Another possibility is that psychologists are particularly prone to using questionable statistical practices that tend to boost their false-positive rates substantially, an issue which I've discussed before.

There are two issues above all the others that stand out to me, though, and they might help to answer the second question – why psychology is viewed as "soft" and physics as "hard". The first issue has to do with what Fanelli refers to as the distinction between the "core" and the "frontier" of a discipline. The core of a field of study represents the agreed-upon theories and concepts on which the field rests; the frontier, by contrast, is where most of the new research is being conducted and new concepts are being minted. Psychology, as it currently stands, is largely frontier-based. This lack of a core can be exemplified by a recent post concerning "101 great insights from psychology 101". In the list, you'll find the word "theory" used a total of three times, and two of those mentions concern Freud. If you consider the plural – "theories" – instead, you'll find five further uses of the term, four of which mention no specific theory. The extent to which the remaining two uses represent actual theories, as opposed to redescriptions of findings, is another matter entirely. If one is left with only a core-less frontier of research, that could well send the message that the people within the field don't have a good handle on what it is they're studying; thus the "soft" reputation.

The second issue involves the subject matter itself. The “soft” sciences – psychology and its variants (like sociology and economics) – seem to dabble in human affairs. This can be troublesome for more than one reason. A first reason might involve the fact that the other humans reading about psychological research are all intuitive psychologists, so to speak. We all have an interest in understanding the psychological factors that motivate other people in order to predict what they’re going to do. This seems to give many people the impression that psychology, as a field, doesn’t have much new information to offer them. If they can already “do” psychology without needing explicit instructions, they might come to view psychology as “soft” precisely because it’s perceived as being easy. I would also note that this suggestion ties neatly into the point about psychologists possibly tending to make many predictions based on personal experience and intuitions. If the findings they are delivering tend to give people the impression that “Why did you need research? I could have told you that”, that ease of inference might cause people to give psychology less credit as a science.

“We go to the moon because it is hard, making physics a real science”

The other standout reason why psychology might strike people as soft is that, on top of trying to understand other people's psychological goings-on, we also try to manipulate them. It's not just that we want to understand why people support or oppose gay marriage, for instance; it's that we might also want to change their points of view. Accordingly, findings from psychology tend to speak more directly to issues people care a good deal about (like sex, drugs, and moral goals; most people don't seem to argue over the latest implications of chemistry research), which might make people either (a) relatively resistant to the findings or (b) relatively accepting of them, contingent more on one's personal views and less on the scientific quality of the work itself. This means that, in addition to many people having a reaction of "that is obvious" with respect to a good deal of psychological work, they also have the reaction of "that is obviously wrong", neither of which makes psychology look terribly important.

It seems likely to me that many of these issues could be remedied by the addition of a core to psychology. If results needed to fit into theory, various statistical manipulations might become somewhat easier to spot. If students were learning how to think about psychology, rather than just to memorize lists of findings they often feel are trivial or obviously wrong, they might come away with a better impression of the field. Now if only a core could be found…

References: Fanelli, D. (2010). "Positive" results increase down the hierarchy of the sciences. PLoS ONE, 5(4). PMID: 20383332

When (And Why) Is Discrimination Acceptable?

As a means of humble-bragging, I like to tell people that I have been rejected from many prestigious universities; the University of Pennsylvania, Harvard, and Yale are all on that list. Also on that list happens to be the University of New Mexico, home of one Geoffrey Miller. Very recently, Dr. Miller found himself in a bit of moral hot water over what seems to be an ill-conceived tweet. It reads as follows: "Dear obese PhD applicants: if you don't have enough willpower to stop eating carbs, you won't have the willpower to do a dissertation #truth". Miller subsequently deleted the tweet and apologized for it in two follow-up tweets. Now, as I mentioned, I've been previously rejected from Miller's lab – on more than one occasion, mind you (I forget whether it was 3 or 4 times now) – so clearly, I was discriminated against. Indeed, policies for discriminating among candidates are vital to anyone, university or otherwise, with open positions to fill. When you have 10 slots open and you get approximately 750 applications, you need some way of discriminating between them (and whatever method you use will disappoint approximately 740 of them). Evidently, though, obesity is one characteristic that people found morally unacceptable to discriminate on the basis of, even jokingly. This raises the question of why.

Oh no; someone’s going to get a nasty email…

Let's start with a related situation: it's well-known that many universities make use of standardized test scores, such as the SAT or GRE, in order to screen applicants. As a general rule, this doesn't tend to cause too much moral outrage, though it does cause plenty of frustration. One could – and many do – argue that using these scores is not only morally acceptable, but appropriate, given that they predict some facets of performance at school-related tasks. While there might be some disagreement over whether or not the tests are good enough predictors of performance (or whether they're predicting something conceptually important), there doesn't appear to be much disagreement about whether or not they may be made use of, from a moral standpoint. That's a good principle to start the discussion over the obesity comment with, isn't it? If you have a measure that's predictive of some task-relevant skill, it's OK to use it.

Well, not so fast. Let's say, for the sake of this argument, that obesity was actually a predictor of graduate school performance. I don't know if there's actually any predictive value there, but let's assume there is and, just for the sake of this example, let's assume that being obese was indicative of doing slightly worse at school, as Geoffrey suggested; why it might have that effect is, for the moment, of no importance. So, given that obesity could, to some extent, predict graduate school performance, should schools be morally allowed to use it in order to discriminate between potential applicants?

I happen to think the matter is not nearly so simple as predictive value. For starters, there doesn't seem to be any widely-agreed-upon rule as to precisely how predictive some variable needs to be before its use is deemed morally acceptable. If obesity could, controlling for all other variables, predict an additional 1% of the variance in graduate performance, should applications start including boxes for height and weight? While 1% might not seem like a lot, if you could give yourself a 1% better chance at succeeding at some task for free (landing a promotion, getting hired, avoiding being struck by a car or, in this case, admitting a productive student), it seems like almost everyone would be interested in doing so; ignoring or avoiding useful information would be a very curious route to opt for, as it only ensures that, on the whole, you make a worse decision than if you had considered it. One could play around with the numbers to try and find some threshold of acceptability, if they were so inclined (i.e., what if it could predict 10%, or only 0.1%?), to help drive the point home. In any case, there are a number of different factors which could predict graduate school performance in different respects: previous GPAs, letters of recommendation, other reasoning tasks, previous work experience, and so on. However, to the best of my knowledge, no one is arguing that it would be immoral to use any predictor other than the single best one (or the top X predictors, or the second best if you aren't using the first, and so on). The core of the issue seems to center on obesity, rather than discriminant validity per se.
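To put a rough number on what an extra 1% of variance buys an admissions committee, here is a toy simulation. All the figures are invented: it assumes existing criteria already explain 20% of the variance in eventual performance, compares them against a composite explaining 21%, and uses the 750-applicants-for-10-slots scenario from above:

```python
import numpy as np

rng = np.random.default_rng(0)

def admit_quality(r_squared, n_applicants=750, n_slots=10, n_sims=2000):
    """Mean true ability of admitted applicants when the admissions score
    explains `r_squared` of the variance in true ability (toy simulation)."""
    quality = []
    for _ in range(n_sims):
        ability = rng.standard_normal(n_applicants)
        noise = rng.standard_normal(n_applicants)
        score = np.sqrt(r_squared) * ability + np.sqrt(1 - r_squared) * noise
        admitted = np.argsort(score)[-n_slots:]       # take the top-scoring applicants
        quality.append(ability[admitted].mean())
    return np.mean(quality)

base = admit_quality(r_squared=0.20)      # existing predictors only
plus_one = admit_quality(r_squared=0.21)  # plus a predictor adding ~1% of variance
print(f"Mean ability of admits, R² = 0.20: {base:.3f}")
print(f"Mean ability of admits, R² = 0.21: {plus_one:.3f}")
```

The improvement is small, as you'd expect from 1% of variance, but it is an improvement; throwing the information away guarantees slightly worse picks on average.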

*May also apply to PhD applications.

Thankfully, there is some research we can bring to bear on the matter. The research comes from a paper by Tetlock et al (2000), who were examining what they called "forbidden base rates" – an issue I touched on once before. In one study, Tetlock et al presented subjects with an insurance-related case: an insurance executive had been tasked with deciding how to charge people for insurance. Three towns had been classified as high-risk (10% chance of experiencing fires or break-ins), while another three had been classified as low-risk (less than 1% chance). Naturally, you would expect that anyone trying to maximize their risk-to-profit ratio would charge different premiums, contingent on risk. If one is not allowed to do so, they're left with the choice of offering coverage at a price that's too low to be sustainable for them or too high to be viable for some of their customers. While you don't want to charge low-risk people more than you need to, you also don't want to under-charge the high-risk ones and risk losing money. Price discrimination in this example is a good thing.
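A quick sketch of the premium arithmetic makes the executive's dilemma plain; the payout figure is invented, while the 10% and 1% risk levels come from the vignette above:

```python
# Toy premium arithmetic for the insurance scenario (payout figure is illustrative).
payout = 100_000                      # cost to the insurer if a fire/break-in occurs
risks = {"high-risk town": 0.10, "low-risk town": 0.01}

# Break-even premium per policy is just risk * payout.
expected_cost = {town: risk * payout for town, risk in risks.items()}
print(expected_cost)                  # {'high-risk town': 10000.0, 'low-risk town': 1000.0}

# Forced to charge a single pooled premium, the insurer over-charges one group
# and under-charges the other by the same margin.
pooled = sum(expected_cost.values()) / len(expected_cost)
for town, cost in expected_cost.items():
    print(f"{town}: pooled premium is off by {pooled - cost:+,.0f} per policy")
```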

The twist was that these classifications of high- and low-risk either happened to correlate along racial lines or they did not, despite there being no a priori interest in discriminating against any one race. When faced with this situation, something interesting happens: compared to conservatives and moderates, when confronted with data suggesting black people tended to live in the high-risk areas, liberals tended to advocate for disallowing the use of the data to make profit-maximizing economic choices. However, this effect was not present when the people being discriminated against in the high-risk areas happened to be white.

In other words, people don't seem to have an issue with the idea of using useful data to discriminate amongst groups of people per se, but if that discrimination ends up affecting the "wrong" group, it can be deemed morally problematic. As Tetlock et al (2000) argued, people are viewing certain types of discrimination not as "tricky statistical issues" but rather as moral ones. The parallels to our initial example are apparent: even if discriminating on the basis of obesity could provide us with useful information, the act itself is not morally acceptable in some circles. Why people might view discrimination against obese people as morally offensive is itself a separate matter. After all, as previously mentioned, people tend to have no moral problems with tests like the GRE that discriminate not on weight, but on other characteristics, such as working memory, information processing speed, and a number of other difficult-to-change factors. Unfortunately, people tend to not have much in the way of conscious insight into how their moral judgments are arrived at and what variables they make use of (Hauser et al, 2007), so we can't just ask people about their judgments and expect compelling answers.

Though I have no data bearing on the subject, I can make some educated guesses as to why obesity might have moral protection: first, and perhaps most obvious, is that people with moral qualms about discrimination along the weight dimension might themselves tend to be fat or obese, and would prefer not to have that count against them. In much the same way, I'm fairly confident that we could expect people who score low on tests like the GRE to downplay their validity as a measure and suggest that schools really ought to be looking at other factors to determine admission criteria. Relatedly, one might also have friends or family members who are obese, and so adopt moral stances against discrimination that would ultimately harm one's social ingroup. If such groups become prominent enough, siding against them becomes progressively costlier. Adopting a moral rule disallowing discrimination on the basis of weight can spread in those cases, even if enforcing that rule is personally costly, because not adopting the rule can end up being an even greater cost (as evidenced by Geoffrey currently being hit with a wave of moral condemnation for his remarks).

Hopefully it won’t crush you and drag you to your death. Hang ten.

As to one final matter, one could be left wondering why this moralization of judgments concerning certain traits – like obesity – can be successful, whereas the moralization of judgments based on other traits – like whatever the GRE measures – doesn't obtain. My guess in that regard is that some traits simply affect more people, or affect them in much larger ways, and that can have some major effects on the value of an individual adopting certain moral rules. For instance, being obese affects many areas of one's life, such as mating prospects and mobility, and weight cannot easily be hidden. On the other hand, something like GRE scores affects very little (really, only graduate school admissions), and is not readily observable. Accordingly, one trait manages to create a "better" victim of discrimination: one who is proportionately more in need of assistance and, because of that, more likely to reciprocate any given assistance in the future (all else being equal). Such a line of thought might well explain the aforementioned difference we see in judgments between racial discrimination being unacceptable when it predominately harms blacks, but fine when it predominately harms whites. So long as the harm isn't perceived as great enough to generate an appropriate amount of need, we can expect people to be relatively indifferent to it. It just doesn't create the same social-investment potential in all cases.

References: Hauser, M., Cushman, F., Young, L., Kang-Xing Jin, R., & Mikhail, J. (2007). A dissociation between moral judgments and justifications. Mind & Language, 22, 1-21.

Tetlock, P., Kristel, O., Elson, S., Green, M., & Lerner, J. (2000). The psychology of the unthinkable: Taboo trade-offs, forbidden base rates, and heretical counterfactuals. Journal of Personality and Social Psychology, 78 (5), 853-870 DOI: 10.1037//0022-3514.78.5.853

Why Are They Called “Spoilers”?

Imagine you are running experiments with mice. You deprive the mice of food until they get hungry and then you drop them into a maze. Now, obviously, the hungry mice are pretty invested in the idea of finding the food; you have been starving them and all. You're not really that evil of a researcher, though: in one group, you color-code the maze so the mice always know where to go to find the reward. The mice, I expect, would not be terribly bothered by your providing them with information and, if they could talk, I doubt many of them would complain about your "spoiling" the adventure of finding the food themselves. In fact, I would also expect most people to respond the same way when they were hungry: they would rather you provide them with the information they sought directly instead of having to make their own way through the pain of a maze (or some equally-annoying psychological task) before they could eat. We ought to expect this because, at least in this instance, as well as many others, having access to greater quantities of accurate information allows you to do more useful things with your time. Knowing where food is cuts down on your required search time, which allows you to spend that time in other, more fruitful ways (like doing pretty much anything undergraduates can do that doesn't involve serving as a participant for psychologists). So what are we to make of cases where people seem to actively avoid such information and claim they find it aversive?

Spoiler warning: If you would rather formulate your own ideas first, stop reading now.

The topic arose for me lately in the context of the upcoming E3 event, where the next generation of video games will be previewed. There happens to be one video game in particular that I find myself heavily invested in and, for whatever reason, I'm wary of tuning into E3 due to the risk of inadvertently exposing myself to any more content from the game. I don't want to know what the story is; I don't want to see any more gameplay; I want to remain as ignorant as possible until I can experience the game firsthand. I'm also far from alone in that experience: of the approximately 40,000 people who have voiced their opinions, a full half reported that they found spoilers unpleasant. Indeed, the word that refers to the leaking of crucial plot details itself implies that learning them can actually ruin the pleasure of finding them out for yourself, in much the same way that microorganisms make food unpalatable or dangerous to ingest. Am I, along with the other 20,000, simply mistaken? That is, do spoilers actually make the experience of reading some book or playing some video game any less pleasant? At least two people think that answer is "yes".

Leavitt & Chistienfeld (2011) suggest that spoilers, in fact, do not make the experience of a story any less pleasant. After all, the authors mention people are perfectly willing to experience stories again, such as by rereading a book, without any apparent loss of pleasure from the story (curiously they cite no empirical evidence on this front, making it an untested assumption). Leavitt & Christienfeld also suggested that perceptual fluency (in the form of familiarity) with a story might make it more pleasant because the information subsequently becomes easier to process. Finally, the pair appear all but entirely disinterested in positing any reasons as to why so many people might find spoilers unpleasant. The most they offer up is the possibility that suspense might have something to do with it, but we’ll return to that point later. The authors, like your average person discussing spoilers, didn’t offer anything resembling a compelling reason as for why people might not like them. They simply note that many people think spoilers are unpleasant and move on.

In any case, to test whether spoilers really spoil things, they recruited approximately 800 subjects to read a series of short stories, some of which came with a spoiler, some of which came without, and some in which the spoiler was presented as the opening paragraph of the short story itself. These stories were short indeed: between 1,400 and 4,200 words apiece, which amounts to somewhere between the approximate length of this post and about three of them. I think this happens to be another important detail to which I'll return later (as I have no intention of spoiling my ideas fully yet). After the subjects had read each story, they rated how much they enjoyed it on a scale of 1 to 10. Across all three types of stories that were presented – mysteries, ironic twists, and literary ones – subjects actually reported liking the spoiled stories somewhat more than the non-spoiled ones. The difference was slight, but significant, and certainly not in the spoilers-are-ruining-things direction. From this, the authors suggest that people are, in fact, mistaken in their beliefs about whether spoilers have any adverse impact on the pleasure one gets from a story. They also suggest that people might like birthday presents more if they were wrapped in clear cellophane.

Then you can get the disappointment over with much quicker.

Is this widespread avoidance of spoilers just another example of quirky, "irrational" human behavior, then, born from the fact that people tend to not have side-by-side exposure to both spoiled and non-spoiled versions of a story? I think Leavitt & Christenfeld are being rather hasty in their conclusion, to put it mildly. Let's start with the first issue: when it comes to my concern over watching the E3 coverage, I'm not worried about getting spoilers for any and all games. I'm worried about getting spoilers for one specific game, and it's a game from a series I already have a deep emotional commitment to (Dark Souls, for the curious reader). When Harry Potter fans were eagerly awaiting the moment they got to crack open the next new book in the series, I doubt they would have cared much one way or the other if you told them about the plot of the latest Die Hard movie. Similarly, a hardcore Star Wars fan would probably not have enjoyed someone leaving the theater in 1980 blurting out that Darth Vader was Luke's father; by comparison, someone who didn't know anything about Star Wars probably wouldn't have cared. In other words, the subjects likely had absolutely no emotional attachment to the stories they were reading and, as such, the information they were being given was not exactly a spoiler. If the authors weren't studying what people would typically consider aversive spoilers in the first place, then their conclusions about spoilers more generally are misplaced.

One of the other issues, as I hinted at before, is that the stories themselves were all rather short. It would take no more than a few minutes to read even the longest of them. This lack of time investment could pose a major issue for the study but, as the authors didn't posit any good reasons for why people might not like spoilers in the first place, they didn't appear to give the point much, if any, consideration. Those who care about spoilers, though, seem to be those who consider themselves part of some community surrounding the story; people who have made some lasting emotional connection with it, along with at least a moderately deep investment of time and energy. At the very least, people have generally selected the story to which they're about to be exposed themselves (which is quite unlike being handed a preselected story by an experimenter).

If the phenomenon we're considering appears to be a costly act with no apparent compensating benefits – like actively avoiding free information and instead investing a great deal of time to obtain it personally – then it seems we're venturing into the realm of costly signaling theory (Zahavi, 1975). Perhaps people are avoiding the information ahead of time so they can display their dedication to some person or group, or signal something about themselves, by obtaining the information personally. If the signal is too cheap, its information value can be undermined, and that's certainly something people might be bothered by.

So, given the length of these stories, there didn't seem to be much that one could actually spoil. If one doesn't need to invest any real time or energy in obtaining the relevant information, spoilers would not be likely to cause much distress, even in cases where someone was already deeply committed to the story. At worst, the spoilers have ruined what would have been five minutes of effort. Further, as I previously mentioned, people don't seem to dislike receiving all kinds of information ("spoilers" about the location of food, or plot details from stories they don't care about, for instance). In fact, we ought to expect people to crave these "spoilers" with some frequency, as information gained cheaply or for free is, on the whole, generally a good thing. It is only when people are attempting to signal something with their conspicuous ignorance that we ought to expect "spoilers" to actually be spoilers, because it is only then that they have the potential to spoil anything. In this case, they would be ruining an attempt to signal some underlying quality of the person who wants to find out for themselves.

Similar reasoning helps explain why it’s not enough for them to just hate people privately.

In two short pages, then, the paper by Leavitt & Christenfeld (2011) demonstrates a host of problems that can be found in the field of psychological research. In fact, this might be the largest number of problems I've seen crammed into such a small space. First, they appear to fundamentally misunderstand the topic they're ostensibly researching. It seems to me, anyway, as if they're trying to simply find a new "irrational belief" that people hold, point it out, and say, "isn't that odd?". Of course, simply finding a bias or mistaken belief doesn't explain anything about it, and there's little to no apparent effort made to understand why people might hold said odd belief. The best the authors offer is that the tension in a story might be heightened by spoilers, but that only comes after they had previously suggested that such suspense might detract from enjoyment by diverting a reader's attention. While these two claims aren't necessarily opposed, they seem at least somewhat conflicting and, in any case, neither claim is ever tested.

There’s also a conclusion that vastly over-reaches the scope of the data and is phrased without the necessary cautions. They go from saying that their data “suggest that people are wasting their time avoiding spoilers” to intuitions about spoilers just being flat-out “wrong”. I will agree that people are most definitely wasting their time by avoiding spoilers. I would just also add that, well, that waste is probably the entire point.

References: Leavitt JD, & Christenfeld NJ (2011). Story spoilers don’t spoil stories. Psychological science, 22 (9), 1152-4 PMID: 21841150

Zahavi, A. (1975). Mate selection – A selection for a handicap. Journal of Theoretical Biology, 53, 205-214.