Examining Arousal And Homophobia

In my last post, I mentioned the idea of people misplacing or misinterpreting their arousal as being a silly one (as I also did previously here). Today, I wanted to talk about that arousal issue again. In the wake of the Supreme Court’s legalization of same-sex marriage here in the US, let’s consider arousal in the context of straight men’s penises reacting to gay, straight, and lesbian pornography. Specifically, I wanted to discuss a rather strange instance where some people have interpreted men’s physiological arousal as sexual arousal, despite the protests of those men themselves, in the apparent interests of making a political point about homophobia. The political point in question happens to be that a disproportionate number of homophobes are actually latent homosexuals themselves who, in true Freudian fashion, are trying to deny and suppress their gay urges in the form of their homophobic attitudes (see here and here for some examples).

Homosexual individuals, on the other hand, are only repressing a latent homophobia

The paper I wanted to examine today is a 1996 piece by Adams, Wright, & Lohr. The paper was designed to test a Freudian idea about homophobia: namely, as mentioned above, that individuals might express homophobic attitudes as a result of their own internal struggle regarding some unresolved homosexual desires. As an initial note, this idea seems rather on the insane side of things, as many Freudian ideas tend to seem. I won’t get too mired in the reasons the idea is crazy, but it should be sufficient to note that the underlying idea appears to be that people develop maladaptive sexual desires in early childhood (long before puberty, when they’d be relevant) which then need to be suppressed by different mechanisms that don’t actually do that job very well. In other words, the idea seems to be positing that we have cognitive mechanisms whose function is to generate maladaptive sexual behavior, only to develop different mechanisms later that (poorly and inconsistently) suppress the maladaptive ones. If that isn’t tortured logic, I don’t know what would be.

In any case, the researchers recruited 64 men from their college’s subject pool who had all previously self-identified as 100% straight. These men were then given the internalized homophobia scale (IHP), which, though I can’t access the original paper with the questions, appears to contain 25 questions aimed at assessing people’s emotional reactions to homosexuals, largely focused on their level of comfort/dread being around them. The men were divided into two groups: those who scored above the midpoint on the scale (the men labeled as homophobes) and those who scored below the midpoint (the non-homophobes). Each subject was provided with a strain gauge to attach to their penis which functioned to measure changes in penile diameter; basically how erect the men were getting. Each subject then watched three four-minute-long pornographic scenes: one depicting heterosexual intercourse, another gay intercourse, and another lesbian intercourse. After each clip, they were asked how sexually aroused they were and how erect their penis was, before being given a chance to return to flaccid before the next clip was shown.

In terms of the arousal to the heterosexual and lesbian pornography, there was no difference between the homophobic and non-homophobic groups with respect to how erect the men got and how aroused they reported being. However, in the gay porn condition, the homophobic men became more erect. Framed in terms of the degree of tumescence (engorgement), the non-homophobic men displayed no tumescence 66% of the time, modest tumescence 10% of the time, and definite tumescence 24% of the time in response to the gay porn; the corresponding numbers for the homophobic group were 20%, 26%, and 55%, respectively. While there was no difference between the homophobic and non-homophobic groups with respect to how aroused they reported being, the physiological arousal did seem to differ. So what’s going on here? Does homophobia have its roots in some latent homosexual desires being denied?

And does ignoring those desires place you in the perfect position for penetration?

I happen to think that such an idea is highly implausible. There are a few reasons I feel that way, but let’s start with the statistical arguments for why that interpretation probably isn’t right. In terms of the number of men who identify as homosexual or bisexual at a population level, we’re only looking at about 1-3%. Given that rough estimate, with a sample of 64 individuals, you should expect only about one or two gay or bisexual people if you were sampling randomly. However, this sampling was anything but random: the subjects were selected specifically because they identified as straight. This should bias the number of gay or bisexual participants in the study downward. Simply put, this sample size is not large enough to expect that any gay or bisexual male participants were in it at all, let alone in large enough numbers to detect any kind of noticeable effect. That problem gets even worse in that they’re looking to find participants that are both bisexual/gay and homophobic, which cuts the probability down even further.
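To put rough numbers on that sampling argument, here is a minimal sketch in Python. It assumes a simple binomial model (each participant is independently gay or bisexual at the population base rate), which ignores the fact that the subjects were pre-selected for identifying as straight, so it should overstate the counts if anything. The function names and the threshold of 10 participants are hypothetical choices made for illustration; they are not taken from Adams et al.

```python
from math import comb


def expected_count(n, rate):
    """Expected number of 'hits' in a random sample of size n."""
    return n * rate


def prob_at_least(n, rate, k):
    """P(X >= k) for X ~ Binomial(n, rate), computed as 1 - P(X < k)."""
    below = sum(
        comb(n, i) * rate**i * (1 - rate) ** (n - i) for i in range(k)
    )
    return 1 - below


SAMPLE_SIZE = 64                 # number of men recruited, as reported above
BASE_RATES = (0.01, 0.02, 0.03)  # the rough 1-3% population estimate
THRESHOLD = 10                   # hypothetical: an arbitrary "enough to matter" count

for rate in BASE_RATES:
    exp = expected_count(SAMPLE_SIZE, rate)
    p_any = prob_at_least(SAMPLE_SIZE, rate, 1)
    p_many = prob_at_least(SAMPLE_SIZE, rate, THRESHOLD)
    print(
        f"base rate {rate:.0%}: expected {exp:.2f}, "
        f"P(at least 1) = {p_any:.3f}, "
        f"P(at least {THRESHOLD}) = {p_many:.2e}"
    )
```

Even at the top of the 1-3% range, the expected count stays under two people, and the chance of a random sample of this size containing ten or more is very small. That is only a back-of-the-envelope check under the stated assumptions, but it gives a feel for the size of the gap.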

The second statistical reason to be wary of these results is that bisexual men tend to be less common than gay men by a ratio of approximately 1:2. However, the pattern of results observed in the paper from the homophobic group could better be described as bisexual than gay: each group reported the same degree of subjective and physiological arousal to the straight and lesbian porn; there was only the erection difference observed during the homosexual porn. This means that the sample would have needed to be composed of many bisexual homophobes who publicly identified as straight, which seems outlandishly unlikely.

Moreover, the sheer number of the participants displaying “definite tumescence” requires some deeper consideration. If we assume that the physiological arousal translates directly into some kind of sexual desire, then about 25% of non-homophobic men and 55% of homophobic men are sexually interested in homosexual intercourse despite, as I mentioned before, only about 1-3% of the population saying they are gay or bisexual. Perhaps that rather strange state of affairs holds, but a much likelier explanation is that something has gone wrong in the realm of interpretation somewhere. Adams et al (1996) note in their discussion that another interpretation of their results involves the genital swelling being the result of other arousing emotions, such as anxiety, rather than sexual arousal per se. While I can’t say whether such an explanation is true, I can say that it certainly sounds a hell of a lot more plausible than the idea that most homophobes (and about 1-in-4 non-homophobes) are secretly harboring same-sex desires. At least the anxiety-arousal explanation could, in principle, explain why 25% of non-homophobic men’s penises wiggled a little when viewing guy-on-guy action; they’re actually uncomfortable.

Maybe they’re not as comfortable with gay people as they like to say they are…

Now don’t get me wrong: to the extent that one perceives there to be social costs associated with a particular sexual orientation (or social attitude), we should expect people to try and send the message that they do not possess such things to others. Likewise, if I’ve stolen something, there might be a good reason for me to lie about having stolen it publicly if I don’t want to suffer the costs of moral condemnation for having done so. I’m not saying that everyone will be accurate or truthful about themselves at all times to others; far from it. However, we should also expect that others will not be accurate or truthful about others either, at least to the extent they are trying to persuade people about things. In this case, I think people are misinterpreting data on physiological arousal to imply a non-existent sexual arousal for the purposes of making some kind of social progress. After all, if homophobes are secretly gay, you don’t need to take their points into consideration to quite the same degree you might have otherwise (since once we reach a greater level of societal acceptance, they’ll just come out anyway and probably thank you for it, or something along those lines). I’m all for social acceptance; just not at the expense of accurately understanding reality.

References: Adams, H., Wright, L., & Lohr, B. (1996). Is homophobia associated with homosexual arousal? Journal of Abnormal Psychology, 105, 440-445.

Evolutionary Marketing

There are many popular views about the human mind that, roughly, treat it as a rather general-purpose kind of tool: one that’s not particularly suited to this task or that, but more as a Jack of all trades and master of none. In fact, many such perspectives view the mind as (bafflingly) being wrong about the world almost all the time. If one views the mind this way, one can be led into making some predictions about how it ought to behave. As one for instance, some people might predict that our minds will, essentially, mistake one kind of arousal for another. A common example of this thinking involves experiments in which people are placed in a fear-arousal condition in the hopes that they will subsequently report more romantic or sexual attraction to certain partners they meet at that time. The explanation for this finding often hinges on some notion of people “misplacing” their arousal – since both kinds of arousal involve some degree of overlapping physiological responses – or reinterpreting a negative arousal as a positive one (e.g., “I dislike being afraid, so I must actually be turned on instead”). I happen to think that such explanations can’t even possibly be close to true, largely because the response to arousal generated by fear and sexual interest should motivate categorically different kinds of behavior.

Here’s one instance where an arousal mistake like that can be costly

Bit by bit, this view of the human mind is being eroded (though progress can be slow), as it does not fit the empirical evidence or possess any solid theoretical groundings. As a great example of this forward progress, consider the experiments demonstrating that learning mechanisms appear to be elegantly tailored to specific kinds of adaptive problems, since learning to, say, avoid poisonous foods requires much different cognitive rules, inputs, and outputs than learning to avoid predator attacks. Learning, in other words, represents a series of rather domain-specific tasks which a general-purpose mechanism could not navigate successfully. As psychological hypotheses begin to get tailored more closely to considerations of recurrent adaptive problems, new, previously unappreciated features of our minds come into stark relief.

So let’s return to the matter of arousal and think about how arousal might impact our day-to-day behavior, specifically with respect to persuasion; a matter of interest to anyone in the fields of marketing or advertising. If your goal is to sell something to someone else – to persuade them to buy what you’re offering – the message you use to try and sell it is going to be crucial. You might, for example, try to appeal to someone’s desire to stand out from the crowd in order to get them interested in your product (e.g., “Think different”); alternatively, you might try to appeal to the popularity of a product to get them to buy (e.g., “The world’s most popular computer”). Importantly, you can’t try to send both of these messages at once (“Be different by doing that thing everyone else is doing”), so which message should you use, and in what contexts should you use it?

A paper by Griskevicius et al (2009) sought to provide an answer to that very question by considering the adaptive functions of particular arousal states. Previous accounts examining how arousal affected information processing were on the general side of things: the general arousal-based accounts would predict that arousal – irrespective of the source – should yield shallower processing of information, causing people to rely more on mental heuristics, like scarcity or popularity, when assessing a product; affect valence-based accounts took this idea one step further, suggesting that positive emotions, like happiness, should yield shallower processing, whereas negative emotions, like fear, should yield deeper processing. However, the authors proposed a new way of thinking about arousal – based on evolutionary theory – that suggests those previous theories are too vague to help us truly understand how arousal shapes behavior. Instead, one needs to consider what adaptive functions particular arousal states serve in order to understand when one type of message will be persuasive in that context.

Don’t worry; if this gets too complicated, you can just fall back on using sex

To demonstrate this point, Griskevicius et al (2009) examined two arousal-inducing contexts: the aforementioned fear and romantic desire. If the general arousal-based accounts are correct, both the scarcity and popularity appeals should become more persuasive as people become aroused by romance or fear; by contrast, if the affect valence-based accounts are correct, the positively-valenced romantic feelings should make both kinds of heuristics more persuasive, whereas the negatively-valenced fear arousal should make both less persuasive. The evolutionary account instead focuses on the functional aspects of fear and romance: fear activates self-defense-relevant behavior, one form of which would be to seek safety in numbers; a common animal defense tactic. If one were motivated to seek safety in numbers, a popularity appeal might be particularly persuasive (since that’s where a lot of other people are), whereas a scarcity appeal would not be; in fact, sending the message that a product would help make one stand out from the crowd when they’re afraid could actually be counterproductive. By contrast, if one is in a romantic state of mind, positively differentiating oneself from one’s competition can be useful for attracting and subsequently retaining attention. Accordingly, romance-based arousal might have the reverse effect, making popularity heuristics less persuasive while making scarcity appeals more so.

To test these ideas, Griskevicius et al (2009) induced romantic desire or fear in about 300 participants by having them read stories or watch movie clips related to each domain. Following the arousal induction, participants were then asked to briefly examine an advertisement for a museum or restaurant which contained a message that appealed to popularity (e.g., “visited by over 1,000,000 people each year”), scarcity (“stand out from the crowd”), or neither message, and then report on how appealing the location was and whether or not they would be likely to go there (on a 9-point scale across a few questions).

As predicted, the fear condition led popularity messages to be more persuasive (M = 6.5) than the control advertisements (M = 5.9). However, fear had the opposite effect for the scarcity messages (M = 5.0), making them less appealing than the control ads. That pattern of results was flipped for the romantic desire condition: scarcity appeals (M = 6.5) were more persuasive than controls (M = 5.8), whereas the popularity appeals were less persuasive than either (M = 5.0). Without getting too bogged down in the details of their second experiment, the authors also reported that these effects were even more specific than that: in particular, appeals to scarcity and popularity only had their effects when discussing behavioral aspects (stand out from the crowd/everyone’s doing it); when discussing attitudes (everyone’s talking about it) or opportunities (limited time offer), popularity and scarcity did not differ in their effectiveness, regardless of the type of arousal being experienced.

One condition did pose interpretive problems, though…

Thinking about the adaptive problems and selection pressures that shaped our psychology is critical for constructing hypotheses and generating theoretically plausible explanations for understanding its features. Expecting some kind of general arousal, emotional valence, or other such factors to explain much about the human (or nonhuman) mind is unlikely to pan out well; indeed, it hasn’t been working out for the field for many decades now. I don’t suspect such general explanations will disappear in the near future, despite their lack of explanatory power, though; they have saturated much of the field in psychology and many psychologists lack the necessary theoretical background to fully appreciate why such explanations are implausible to begin with. Nevertheless, I remain hopeful that someday the future of psychology might not include reams of thinking about misplaced arousal and general information processing mechanisms that are, apparently, quite bad at solving important adaptive problems.

References: Griskevicius, V., Goldstein, N., Mortensen, C., Sundie, J., Cialdini, R., & Kenrick, D. (2009). Fear and loving in Las Vegas: Evolution, emotion, and persuasion. Journal of Marketing Research, 46, 384-395.

A Curious Case Of Welfare Considerations In Morality

There was a stage in my life, several years back, where I was a bit of a chronic internet debater. As anyone who has engaged in such debates – online or off, for that matter – can attest, progress can be quite slow if any is observed at all. Owing to the snail’s pace of such disputes, I found myself investing more time in them than I probably should have. In order to free up my time while still allowing me to express my thoughts, I created my own site (this one) where I could write about topics that interested me, express my viewpoints, and then be done with them, freeing me from the quagmire of debate. Happily, this is a tactic that has not only proven to be effective, but I like to think that it has produced some positive externalities for my readers in the form of several years’ worth of posts that, I am told, some people enjoy. Occasionally, however, I do still wander back into a debate here and there, since I find them fun and engaging. Sharing ideas and trading intellectual blows is nice recreation.

My other hobbies follow a similar theme

In the wake of the recent shooting in Charleston, the debate I found myself engaged in concerned the arguments for the moral and legal removal of guns from polite society, and I wanted to write a bit about it here, serving both the purposes of cleansing it from my mind and, hopefully, making an interesting point about our moral psychology in the process. The discussion itself centered around a clip from one of my favorite comedians, Jim Jefferies, who happens to not be a fan of guns himself. While I recommend watching the full clip and associated stand-up because Jim is a funny man, for those not interested in investing the time and itching to get to the moral controversy, here’s the gist of Jim’s views about guns:

“There’s one argument and one argument alone for having a gun, and this is the argument: Fuck off; I like guns”

While Jim notes that there’s nothing wrong with saying, “I like something; don’t take it away from me”, the rest of the routine goes through various discussions of how other arguments for the owning of guns are, in Jim’s words, bullshit (including owning guns for self-defense or the overthrow of an oppressive government). For a different comedic perspective, see Bill Burr.

Laying my cards on the table, I happen to be one of those people who enjoys shooting recreationally (just target practice; I don’t get fancy with it and I have no interest in hunting). That said, I’m not writing today to argue with any of Jim’s points; in fact, I’m quite sympathetic to many of the concerns and comments he makes: on the whole, I feel the expected value of guns, in general, to be a net cost for society. I further feel that if guns were voluntarily abandoned by the population, there would probably be many aggregate welfare benefits, including reduced rates of suicide, homicide, and accidental injury (owing to the possibility that many such conflicts are heat of the moment issues, and lacking the momentary ability to employ deadly force might mean it’s never used at all later). I’m even going to grant his point I quoted above: the best justification for owning a gun is recreational in nature. I don’t ask that you agree or disagree with all this; just that you follow the logical form of what’s to come.

Taking all of that together, the argument for enacting some kind of legal ban of guns – or at the very least the moral condemnation of the ability to own them – goes something like this: because the only real benefit to having a gun is that you get to have some fun with it, and because the expected costs to all those guns being around tend to be quite high, we ought to do away with the guns. The welfare balance just shifts away from having lots of deadly weapons around. Jim even notes that while most gun owners will never use their weapons intentionally or accidentally to inflict costs on others or themselves, the law nevertheless needs to cater to the 1% or so of people who would do such things. So, this thing – X – generates welfare costs for others which far outstrip its welfare benefits, and therefore should be removed. The important point of this argument, then, would seem to focus on these welfare concerns.

Coincidentally, owning a gun may make people put a greater emphasis on your concerns

The interesting portion of this debate is that the logical form of the argument can be applied to many other topics, yet it will not carry the same moral weight; a point I tried to make over the course of the discussion with a very limited degree of success. Ideas die one person at a time, the saying goes, and this debate did not carry on to the point of anyone losing their life.

In this case, we can try and apply the above logic to the very legal, condoned, and often celebrated topic of alcohol. On the whole, I would expect that the availability of alcohol is a net cost for society: drunk driving deaths in the US yield about 10,000 bodies a year (a comparable number to homicides committed with a firearm), which directly inflict costs on non-drinkers. While it’s more difficult to put numbers on other costs, there are a few non-trivial matters to consider, such as the number of suicides, assaults, and non-traffic accidents encouraged by the use of alcohol, the number of unintended pregnancies and STIs spread through more casual and risky drunk sex, as well as the number of alcohol-related illnesses and cases of liver damage. Broken homes, abused and neglected children, spirals of poverty, infidelity, and missed work could also factor into these calculations somewhere. Both of these products – guns and booze – tend to inflict costs on individuals other than the actor when they’re available, and these costs appear to be substantial.

So, in the face of all those costs, what’s the argument in favor of alcohol being approved of, legally or morally? Well, the best and most common argument seems to be, as Jim might say, “Fuck off; I like drinking”. Now, of course, there are some notable differences between drinking and owning guns, mainly being that people don’t often drink to inflict costs on others while many people do use guns to intentionally do harm. While the point is well taken, it’s worth bearing in mind that the arguments against guns are not the same arguments against murder. The argument as it pertains to guns seemed to be, as I noted above, that regular people should not be allowed to own guns because some small portion of the population that does have one around will do something reprehensible or stupid with it, and that these concerns trump the ability of the responsible owners to do what they enjoy. Well, presumably, we could say the same thing about booze: even if most people who drink don’t drive while drunk, and even if not all drunk drivers end up killing someone, our morals and laws need to cater to that percentage of people that do.

(As an aside, I spent the past few years at New Mexico State University. One day, while standing outside a classroom in the hall, I noticed a poster about drunk driving. The intended purpose of the flyer seemed to be to inform students that most people don’t drive drunk; in fact, about 75% of students reported not driving under the influence, if I recall correctly. That does mean, of course, that about 1 in 4 students did at some point, which is a worrying figure; perhaps enough to make a solid argument for welfare concerns.)

There is also the matter of enforcement: making alcohol illegal didn’t work out well in the past; making guns illegal could arguably be more successful on a logistical level. While such a point is worth thinking about, it is also a bit of a red herring from the heart of the issue: that is, most people are not opposed to the banning of alcohol because it’s difficult in practice, but otherwise supportive of the measure on principle; instead, people seem as if they would oppose the idea even if it could be implemented efficiently. People’s moral judgments can be quite independent of enforcement capacity. Computationally, it seems like the judgments concerning whether something is worth condemning in the first place ought to precede judgments about whether it could be done feasibly, simply because the latter estimation is useless without the former. Spending time thinking about what one could punish effectively without any interest in following through would be like thinking about all the things one could chew and swallow when they’re hungry, even if they wouldn’t want to eat them.

Plenty of fiber…and there’s lots of it….

There are two points to bear in mind from this discussion to try and tie it back to understanding our own moral psychology and making a productive point. The first is that there is some degree of variance in moral judgments that is not being determined by welfare concerns. Just because something ends up resulting in harm to others, people are not necessarily going to be willing to condemn it. We might (not) accept a line of reasoning for condemning a particular act because we have some vested interest in (encouraging) preventing it while categorically (accepting) rejecting that same line in other cases where our strategic interests run in the opposite direction; interests which we might not even be consciously aware of in many cases. This much, I suspect, will come as no surprise to anyone, especially because other people in debates are known for being so clearly biased to you, the dispassionate observer. Strategic interests lead us to preference our own concerns.

The other point worth considering, though, is that people raise or deny these welfare concerns in the interests of being persuasive to others. The welfare of other people appears to have some impact on our moral judgments; if welfare concerns were not used as inputs, it would seem rather strange that so many arguments about morality lean so heavily and explicitly upon them. I don’t argue that you should accept my moral argument because it’s Sunday, as that fact seems to have little bearing on my moral mechanisms. While this too might seem obvious to people (“of course other people’s suffering matters to me!”), understanding why the welfare of others matters to our moral judgments is a much trickier explanatory issue than understanding why our own welfare matters to us. Both of these are matters that any complete theory of morality needs to deal with.

The Morality Of Guilt

Today, I wanted to discuss the topic of guilt; specifically, what the emotion is, whether we should consider it to be a moral emotion, and whether it generates moral behavioral outputs. The first part of that discussion will be somewhat easier to handle than the latter. In the most common sense, guilt appears to be an emotion aroused by the perception of wrong-doing which has harmed someone else on the part of the individual experiencing guilt. The negative feelings that accompany guilt often lead to the guilty party desiring to make amends to the injured one so as to compensate the damage done and repair the relationship between the two (e.g., “I’m sorry that I totaled your car by driving it into your house; I feel like a total heel. Let me buy you dinner to make up for it”). Because the emotion appears to be aroused by the perceptions of a moral transgression – that is, someone feels they have done something wrong, or impermissible – it seems like guilt could rightly be considered a moral emotion; specifically, an emotion related to moral conscience (a self-regulating mechanism), rather than moral condemnation (an other-regulating mechanism).

Nothing beats packing for a nice, relaxing guilt trip

The understanding that guilt is a moral emotion, then, allows us to inform our opinion about what kind of thing morality is by examining how guilt works in greater, proximate detail. In other words, we can infer what adaptive value our moral sense might have had through studying the form of the emotional guilt mechanisms: what inputs they use and what outputs they produce. This brings us to some rather interesting work I recently dug out of my backlog of papers to read, by de Hooge et al (2011), that focused on figuring out what kinds of effects guilt tends to have on people’s behavior when you take guilt out of a dyadic (two-person) relationship and drop it into larger groups of people. The authors were interested, in part, in deciding whether or not guilt could be classified as a morally good emotion. While they acknowledge guilt is a moral emotion, they question whether it produces morally good outcomes in certain types of situations.

This leads naturally to the following question: what is a morally good outcome? The answer to that question is going to depend on what type of function one thinks morality has. In this case, de Hooge et al (2011) write as if our moral sense is an altruism device – one that functions to deliver benefits to others at a cost to one’s self. Accordingly, a morally good outcome is going to be one that results in benefits flowing to others at a cost to the actor. Framed in terms of guilt, we might expect that individuals experiencing guilt will behave more altruistically than individuals who are not; the guilty’s regard for the welfare of others will be regulated upwards, with a corresponding down-regulation placed on their own welfare. The authors note that much of the previous research on guilt has uncovered evidence consistent with that pattern: guilty parties tend to forgo benefits to themselves or suffer costs in order to deliver benefits to the party they have wronged. This makes guilt look rather altruistic.

Such research, however, was typically conducted in a two-party context: the guilty party and their victim. This presents something of an interpretative issue, inasmuch as the guilty party only has that one option available to them: if, say, I want to make you better off, I need to suffer a cost myself. While that might make the behavior look altruistic in nature, in the social world that we reside within, that is usually not the only option available; I could, for instance, also make you better off not at an expense to myself, but rather at the expense of someone else; an outcome most people wouldn’t exactly call altruism, and one de Hooge et al (2011) wouldn’t consider morally good either. Whether a guilty party is merely interested in making their victim better off or is interested in behaving altruistically towards them, the behavior would look the same in a two-party case; only in a three-party context would those two motives yield different outcomes.

As they usually do…

de Hooge et al (2011) report on the results of three pilot studies and four experiments examining how guilt affects behavior in these three-party contexts in terms of welfare-relevant choices. While I don’t have time to discuss all of what they did, I wanted to highlight one of their experiments in more detail while noting that each of them generated data consistent with the same general pattern. The experiment I will discuss is their third one. In that experiment, 44 participants were assigned to either a guilt or a control condition. In both conditions, the participants were asked to complete a two-part joint effort task with another person to earn payment rewards. Colored letters (red or green) would pop up on each player’s screen and the participant and their partner had to click a button quickly in order to complete the task: the participant would push the button if the letter was green, whereas their partner would have to push if the letter was red. In the first part of the task, the performance of both the participant and their partner would be earning rewards for the participant; in the second part, the pair would be earning rewards for the partner instead. Each reward was worth 8 units of what I’ll call welfare points.

The participants were informed that while they would receive the bonus from the first round, their partner would not receive a bonus from the second. In the control condition, the partner did not earn the bonus because of their own poor performance; in the guilt condition, the partner did not earn the bonus because of the participant’s poor performance. In the next phase of this experiment, the participants were presented with three payoffs: their own, their partner’s, and that of an unrelated individual from the experiment who had also earned the bonus. The participants were told that one of the three would be randomly assigned the chance to redistribute the earnings though, of course, the participants always received that assignment. This allowed participants to give a benefit to their partner, but to do so at either a cost to themselves or at a cost to someone else.

Out of the 8 welfare units the participants had earned, they opted to give an average of 2.2 of them to their partner in the guilt condition, but only 1 unit in the control condition, so guilt did seem to make the participants somewhat more altruistic. Interestingly, however, guilt made participants even more willing to take from the outside party: guilty parties took an average of 4.2 units from the third party for their partner, relative to the 2.5 units they took in the control condition. In short, the participants appeared to be interested in repairing the relationship between themselves and their partners, but were more interested in doing so via taking from someone else, rather than giving up their own resources. Participants also viewed the welfare of the third party as being relatively unimportant as compared to the welfare of the partner they had ostensibly failed.

“To make up for hurting Mike, I think it’s only fair that Karen here suffers”

This returns us to the matter of what kind of thing morality is. de Hooge et al (2011) appear to view morality as an altruism device and view guilt as a moral emotion, yet, strangely, guilt did not appear to make people substantially more altruistic; instead, it seems to make them partial. Given that guilt was not making people behave more altruistically, we might want to reconsider the adaptive function of morality. What if, rather than acting as an altruism device, morality functions as an association management mechanism? If our moral sense functions to build and manage partial relationships, benefiting someone you’ve harmed at the expense of other targets of investment might make more sense. This is because there are good reasons to suspect that friendships represent partial alliances maintained in the service of being able to win potential future disputes (DeScioli & Kurzban, 2009). These partial alliances are rank-ordered, however: I have a best friend, close friends, and more distant ones. In order to signal that I rank you highly as a friend, then, I need to demonstrate that I value you more than other people. Showing that I value you highly relative to myself – as would be the case with acts of altruism – would not necessarily tell you much about your value as my friend, relative to other friends. By contrast, behaving in ways that signal I value you more than others at least temporarily – as appeared to be the case in the current experiments – could serve to repair a damaged alliance. Morality as an altruism device doesn’t fit the current pattern of data; an alliance management device does, though.

References: DeScioli, P. & Kurzban, R. (2009). The alliance hypothesis for human friendship. PLoS ONE 4(6): e5802. doi:10.1371/journal.pone.0005802

de Hooge, I., Nelissen, R., Breugelmans, S., & Zeelenberg, M. (2011). What is moral about guilt? Acting “prosocially” at the disadvantage of others. Journal of Personality & Social Psychology, 100, 462-473.


Privilege And The Nature Of Inequality

Recently, there’s been a new comic floating around my social news feeds claiming that it will forever change the way I think about something. It’s not like there ever isn’t such an article on my feeds, really, but I decided it would provide me with the opportunity to examine some research I’ve wanted to write about for some time. In the case of this mind-blowing comic, the concept of privilege is explained through a short story. The concept itself is not a hard one to understand: privilege here refers to cases in which an individual goes through their life with certain advantages they did not earn. The comic in question looks at an economic privilege: two children are born, but one has parents with lots of money and social connections. As expected, the one with the privilege ends up doing fairly well for himself, as many burdens of life have been removed, while the one without ends up working a series of low-paying jobs, eventually in service to the privileged one. The privileged individual declares that nothing has ever been handed to him in life as he is literally being handed some food on a silver platter by the underprivileged individual, apparently oblivious to what his parents’ wealth and connections have brought him.

Stupid, rich baby…

In the interests of laying my cards on the table at the outset, I would count myself among those born into privilege. While my family is not rich or well-connected the way people typically think about those things, there haven’t been any necessities of life I have wanted for; I have even had access to many additional luxuries that others have not. Having those burdens removed is something I am quite grateful for, and it has allowed me to invest my time in ways other people could not. I have the hard work and responsibility of my parents to thank for these advantages. These are not advantages I earned, but they are certainly not advantages which just fell from the sky; if my parents had made different choices, things likely would have worked out differently for me. I want to acknowledge my advantages without downplaying their efforts at all.

That last part raises a rather interesting question that pertains to the privilege debate, however. In the aforementioned comic, the implication seems to be – unless I’m misunderstanding it – that things likely would have turned out equally well for both children if they had been given access to the same advantages in their life. Some of the differences that each child starts with seem to be the result of their parents’ work, while other parts of that difference are the result of happenstance. The comic appears to suggest the differences in that case were just due to chance: both sets of parents love their children, but one set seems to have better jobs. Luck of the draw, I suppose. However, is that the case for life more generally; you know, the thing about which the comic intends to make a point?

For instance, if one set of parents happens to be more short-term oriented – interested in taking rewards now rather than forgoing them for possibly larger rewards in the future, i.e., not really savers – we could expect that their children will, to some extent, inherit those short-term psychological tendencies; they will also inherit a more meager amount of cash. Similarly, the child of the parents who are more long-term focused should inherit their proclivities as well, in addition to the benefits those psychologies eventually accrued.

Provided that happened to be the case, what would become of these two children if they both started life in the same position? Should we expect that they both end up at similar places? Putting the questions another way, let’s imagine that, all of a sudden, the wealth of this world was evenly distributed among the population; no one had more or less than anyone else. In this imaginary world, how long would that state of relative equality last? I can’t say for certain, but my expectation is that it wouldn’t last very long at all. While the money might be equally distributed in the population, the psychological predispositions for spending, saving, earning, investing, and so on are unlikely to be. Over time, inequalities will again begin to assert themselves as those psychological differences – be they slight or large – accumulate from decision after decision.
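For what it’s worth, the logic of that thought experiment can be sketched with a toy model. Everything in it is made up for illustration: two people start with identical wealth and income and differ only in the share of income they save, with a fixed yearly return on whatever they hold. It is a cartoon of the argument, not an estimate of anything.

```python
def wealth_path(save_rate, years, start=1000.0, income=100.0, annual_return=0.05):
    """Yearly wealth for someone who saves a fixed share of their income.

    All default numbers are hypothetical, chosen only for illustration.
    """
    wealth = start
    path = [wealth]
    for _ in range(years):
        # Existing wealth earns a return; a share of this year's income is added.
        wealth = wealth * (1 + annual_return) + income * save_rate
        path.append(wealth)
    return path


saver = wealth_path(save_rate=0.20, years=20)
spender = wealth_path(save_rate=0.02, years=20)

# Gap between the two people in each year, starting from perfect equality.
gaps = [s - p for s, p in zip(saver, spender)]
print(f"gap after 1 year: {gaps[1]:.0f}; gap after 20 years: {gaps[20]:.0f}")
```

The two start out dead even, and the gap widens every single year, even though nothing differs between them except one behavioral tendency.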

Clearly, this is an experiment that couldn’t be run in real life – people are quite attached to their money – but there are naturally occurring versions of it in everyday life. If you want to find a context in which people might randomly come into possession of a sum of money, look no further than the lottery. Winning the lottery, both whether one wins at all and how much money one gets, is as close to randomly determined as we’re going to get. If the differences between the families in the mind-blowing comic are due to chance factors, we would predict that people who win more money in the lottery should, subsequently, be doing better in life, relative to those who won smaller amounts. By contrast, if chance factors are relatively unimportant, then the amount won should be less important: whether they win large or small amounts, they might spend it (or waste it) at similar rates.

Nothing quite like a dose of privilege to turn your life around

This was precisely what was examined by Hankins et al (2010): the authors sought to assess the relationship between the amount of money won in a lottery and the probability of the winner filing for bankruptcy within a five-year period of their win. Rather than removing inequalities and seeing how things shake out, then, this research took the opposite approach: examining a process that generated inequalities and seeing how long it took for them to dissipate.

The primary sample for this research was the Fantasy 5 winners in Florida from April 1993 to November 2002 who had won $600 or more: approximately 35,000 of them after certain screening measures had been implemented. These lottery winners were grouped into those who won between $10,000 and $50,000, and those who won between $50,000 and $150,000 (subsequent analyses would examine those who won $10,000 or less as well, leading to small, medium, and large winner groups).

Of those 35,000 winners, about 2,000 were linked to a bankruptcy filing within five years of their win, meaning that a little more than 1% of winners were filing each year on average; a rate comparable to the broader Florida population. The first step was to examine whether the large winners were doing comparable amounts of bankruptcy filing prior to their win, relative to the low winners which, thankfully, they were. In pretty much all respects, those who won a lot of money did not differ from those who won less before their win (including race, gender, marital status, educational attainment, and nine other demographic variables). That’s what one would expect from the lottery, after all.
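As a quick check on that “a little more than 1%” figure, here is the back-of-the-envelope arithmetic using the rounded counts above (the variable names are mine, and averaging evenly across the five years is a simplification):

```python
winners = 35_000               # approximate sample size
filers_within_5_years = 2_000  # approximate number linked to a bankruptcy filing
years = 5

five_year_share = filers_within_5_years / winners
average_annual_rate = five_year_share / years

print(f"{five_year_share:.1%} over five years, about {average_annual_rate:.2%} per year")
# prints: 5.7% over five years, about 1.14% per year
```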

Turning to what happened after their win, within the first two years, those who won larger sums of money were less likely to file for bankruptcy than smaller winners; however, in years 3 through 5 that pattern reversed itself, with larger winners becoming more likely to file. The end result of this shifting pattern was that, in five years’ time, large winners were equally likely to have filed for bankruptcy, relative to smaller winners. As Hankins et al (2010) put it, large cash payments did not prevent bankruptcy; they only postponed it. This result was consistently obtained after attempting a number of different analyses, suggesting that the finding is fairly robust. In fact, when the winners eventually did file for bankruptcy, the big winners didn’t have much more to show for it than small winners: those who won between $25,000 and $150,000 only had about $8,000 more in assets than those who had won less than $1,500, and the two groups had comparable debts.

Not much of an ROI on making it rain these days, it seems

At least when it came to one of the most severe forms of financial distress, large sums of cash did not appear to stop people from falling back into poverty in the long term, suggesting that there’s more going on in the world than just poor luck and unearned privilege. Whatever this money was being spent on, it did not appear to be sound investments. Maybe people were making more of their luck than they realized.

It should be noted that this natural experiment does pose certain confounds, perhaps the most important of which is that not everyone plays the lottery. In fact, given that the lottery itself is quite a bad investment, we are likely looking at a non-random sample of people who choose to play it in the first place; people who already aren’t prone to making wise, long-term decisions. Perhaps these results would look different if everyone played the lottery but, as it stands, thinking about these results in the context of the initial comic about privilege, I would have to say that my mind remains un-blown. Unsurprisingly, deep truths about social life can be difficult to sum up in a short comic.

References: Hankins, S., Hoekstra, M., & Skiba, P. (2010). The ticket to easy street? The financial consequences of winning the lottery. Vanderbilt Law and Economics Research Paper, 10-12.

Relaxing With Some Silly Research

In psychology, there is a lot of bad research out there by all estimates. The poor quality of this research can be attributed to concerns about ideology-driven research agendas, research bias, demand characteristics, lack of any real theory guiding the research itself, p-hacking, file-drawer effects, failures to replicate, small sample sizes, and reliance on undergraduate samples, among others. Arguably, there is more bad (or at least inaccurate) research than good research floating around as, in principle, there are many more ways of being wrong about the human mind than there are of being right about it (even given our familiarity with it); a problem made worse by the fact that being (or appearing) wrong or reporting null findings does not tend to garner one social status in the world of academia. If many of the incentives reside in finding particular kinds of results – and those kinds are not necessarily accurate – the predictable result is a lot of misleading papers. Determining what parts of the existing psychological literature are an accurate description of human psychology can be something of a burden, however, owing to the obscure nature of some of these issues: it’s not always readily apparent that a paper found a fluke result or that certain shady research practices have been employed. Thankfully, it doesn’t take a lot of effort to see why some particular pieces of psychological research are silly; criticizing that stuff can be as relaxing as a day off at the beach.

Kind of like this, but indoors and with fewer women

The last time I remember coming across some of the research that can easily be recognized as silly was when one brave set of researchers asked if leaning to the left made the Eiffel Tower look smaller. The theory behind that initial bit of research is called, I think, number line theory, though I’m not positive on that. Regardless of the name, the gist of the idea seems to be that people – and chickens, apparently – associate smaller numbers with a relatively leftward direction and larger numbers with a rightward one. For humans, such a mental representation might make sense in light of our using certain systems of writing; for nonhumans, this finding would seem to make zero sense. To understand why this finding makes no sense, try to place it within a functional framework by asking (a) why might humans and chickens (and perhaps other animals as well) represent smaller quantities with their left, and (b) why might leaning to the left be expected to bias one’s estimate of size? Personally, I’m coming up with a blank on the answer to those questions, especially because biasing one’s estimate of size on the basis of how one is leaning is unlikely to yield more accurate estimates. A decrease in accuracy seems like it could only carry costs in this case; not benefits. So, at best, we’re left calling those findings a developmental byproduct for humans and likely a fluke for the chickens. In all likelihood, the human finding is probably a fluke as well.

Thankfully, for the sake of entertainment, silly research is not to be deterred. One of the more recent tests of this number line hypothesis (Anelli et al, 2014) makes an even bolder prediction than the Eiffel Tower paper: people will actually get better at performing certain mathematical operations when they’re traveling to the left or the right. Specifically, going right will make you better at addition and going left better at subtraction. Why? Because smaller numbers are associated with the left? How does that make one better at subtraction? I don’t know and the paper doesn’t really go into that part. On the face of it, this seems like a great example of what I have nicknamed “dire straits thinking”. Named after the band’s song “Money for Nothing”, this type of thinking leads people to hypothesize that others can get better (or worse) at tasks without any associated costs. The problem with this kind of thinking is that if people did possess the cognitive capacities to be better at certain tasks, one might wonder why people ever perform worse than they could. This would lead me to pose questions like, “why do I have to be traveling right to be better at addition; why not just be better all the time?” Some kind of trade-off needs to be referenced to explain that apparent detriment/bonus to performance, but none ever is in dire straits thinking.

In any case, let’s look at the details of the experiment, which was quite simple. Anelli et al (2014) had a total of 48 participants walk with an experimenter (one at a time; not all 48 at once). The pair would walk together for 20 seconds in a straight line, at which point the experimenter would call out a three-digit number, tell the participant to repeatedly add 3 to it or subtract 3 from it aloud for 22 seconds, give them a direction to turn (right or left), and tell them to begin. At that point, the participant would turn and start doing the math. Each participant completed four trials: two congruent (right/addition or left/subtraction) and two incongruent (right/subtraction or left/addition). The researchers hoped to uncover a congruency effect, such that more correct calculations would be performed in the congruent, relative to incongruent, trials.

Now put the data into the “I’m right” program and it’s ready to publish

Indeed, just such an effect was found: when participants were moving in a direction congruent with their mathematical operation, they performed more correct calculations on average (M = 10.1), relative to when they were traveling in an incongruent direction (M = 9.6). However, when this effect was broken down by operation, it turns out that the effect only exists when participants were doing addition (M = 11.1 when going right, 10.2 when going left); there was no difference for subtraction (M = 9.0 and 9.1, respectively). Why was there no effect for subtraction? Well, the authors postulate a number of possibilities – one of which being that perhaps participants needed to be walking backwards – though none of them include the possibility of the addition finding being a statistical fluke. It’s strange how infrequently this possibility is ever mentioned in published work, especially in the face of inconsistent findings.
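To see how much of the headline congruency effect rides on the addition trials alone, here is a small sketch that rebuilds the congruent and incongruent averages from the four reported cell means. It assumes each cell contributes equally to the averages (one trial of each type per participant), and the names are my own.

```python
# Mean number of correct calculations reported for each operation/direction cell.
cell_means = {
    ("addition", "right"): 11.1,
    ("addition", "left"): 10.2,
    ("subtraction", "right"): 9.0,
    ("subtraction", "left"): 9.1,
}

# Cells the number line hypothesis treats as congruent.
CONGRUENT = {("addition", "right"), ("subtraction", "left")}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


congruent_mean = mean(m for cell, m in cell_means.items() if cell in CONGRUENT)
incongruent_mean = mean(m for cell, m in cell_means.items() if cell not in CONGRUENT)

# Congruent-minus-incongruent difference within each operation.
addition_gap = cell_means[("addition", "right")] - cell_means[("addition", "left")]
subtraction_gap = cell_means[("subtraction", "left")] - cell_means[("subtraction", "right")]

print(f"congruent {congruent_mean:.1f} vs incongruent {incongruent_mean:.1f}")
print(f"addition gap {addition_gap:.1f}, subtraction gap {subtraction_gap:.1f}")
```

The overall difference of about half a calculation is just the average of the two within-operation gaps, and nearly all of it comes from addition.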

Now one obvious criticism of this research is that the participants were never traveling right or left; they were walking straight ahead in all cases. Right or left, unlike East or West, depends on perspective. When I am facing my computer, I feel I am facing ahead; when I turn around to walk to the bathroom, I don’t feel like I’m walking behind me. The current research would thus rely on the effects of a momentary turn affecting participants’ math abilities for about half a minute. Accordingly, participants shouldn’t even have needed to be walking; asking them to turn and stand in place should be expected to have precisely the same effect. If the researchers wanted to measure walking to the right or left, they should have had participants moving to the side by sliding, rather than turning and walking forward.

Other obvious criticisms of the research could include the small sample size, the small effect size, the inconsistency of the effect (works for addition but not subtraction and is inconsistent with other research they cite which was itself inconsistent – people being better at addition when going up in an elevator but not walking up stairs, if I understand correctly), or the complete lack of anything resembling a real theory guiding the research. But let’s say for a moment that my impression of these results as silly is incorrect; let’s assume that these results accurately describe the workings of the human mind in some respect. What are the implications of that finding? What, in other words, happens to be at stake here? Why would this research be published, relative to the other submissions received by Frontiers in Psychology? Even if it’s a true effect – which already seems unlikely, given the aforementioned issues – it doesn’t seem particularly noteworthy. Should people be turning to the right and left while taking their GREs? Do people need to be doing jumping jacks to improve their multiplication skills so as to make their body look more like the multiplication symbol? If so, how could you manage to do them while you’re supposed to be sitting down quietly while taking your GREs without getting kicked out of the testing site? Perhaps someone more informed on the topic could lend a suggestion, because I’m having trouble seeing the importance of it.

Maybe the insignificance of the results is supposed to make the reader feel more important

Without wanting to make a mountain out of a molehill, this paper was authored by five researchers and presumably made it past an editor and several reviewers before it saw publication. At a minimum, that’s probably about 8 to 10 people. That seems like a remarkable feat, given how strange the paper happens to look on its face. I’m not just mindlessly poking fun at the paper, though: I’m bringing attention to it because it seems to highlight a variety of problems in the world of psychological research. There are, of course, many suggestions as to how these problems might be ferreted out, though many of them that I have seen focus more on statistical solutions or combating researcher degrees of freedom. While such measures (like pre-registering studies) might reduce the quantity of bad research, they will be unlikely to increase the absolute quality of good work (since one can pre-register silly ideas like this), which I think is an equally valuable goal. For my money, the requirement of some theoretical functional grounding for research would likely be the strongest candidate for improving work in psychology. I imagine many people would find it harder to propose such an idea in the first place if they needed to include some kind of functional considerations as to why turning right makes you better at addition. Even if such a feat were accomplished, it seems those considerations would make the rationale for the paper even easier to pick apart by reviewers and readers.

Instead of asking for silly research to be conducted on larger, more diverse samples, it seems better to ask that silly research not be conducted at all.

References: Anelli, F., Lugli, L., Baroni, G., Borghi, A., & Nicoletti, R. (2014). Walking boosts your performance in making additions and subtractions. Frontiers in Psychology, 5, doi: 10.3389/fpsyg.2014.01459

Do Moral Violations Require A Victim?

If you’ve ever been a student of psychology, chances are pretty good that you’ve heard about or read a great many studies concerning how people’s perceptions about the world are biased, incorrect, inaccurate, erroneous, and other such similar adjectives. A related sentiment exists in some parts of the morality literature as well. Perhaps the most notable instance is the unpublished paper on moral dumbfounding, by Haidt, Bjorklund, & Murphy (2000). In that paper, the authors claim to provide evidence that people first decide whether an act is immoral and then seek to find victims or harms for the act post hoc. Importantly, the point seems to be that people seek out victims and harm despite them not actually existing. In other words, people are mistaken in perceiving harm or victims. We could call such tendencies the “fundamental victim error” or the “harm bias”, perhaps. If that interpretation of the results is correct, it would carry a number of implications, chief among which (for my present purposes) is that harm is not a required input for moral systems. Whatever cognitive systems are in charge of processing morally-relevant information, they seem to be able to do so without knowledge of who – if anyone – is getting harmed.

Just a little consensual incest. It’s not like anyone is getting hurt.

Now I’ve long found that implication to be a rather interesting one. The reason it’s interesting is that, in general, we should expect that people’s perceptions about the world are relatively accurate. Not perfect, mind you, but they should be expected to be as accurate as available information allows them to be. If our perceptions weren’t generally accurate, this would likely yield all sorts of negative fitness consequences: for example, believing you can achieve a goal you actually cannot could lead to the investment of time and resources in a fruitless endeavor; resources which could be more profitably spent elsewhere. Sincerely believing you’re going to win the lottery does not mean the tickets are wise investments. Given these negative consequences for acting on inaccurate information, we should expect that our perceptual systems evolved to be as accurate as they can be, given certain real-world constraints.

The only context I’ve seen in which being wrong about something could consistently lead to adaptive outcomes is in the realm of persuasion. In this case, however, it’s not that being wrong about something per se helps you, as much as someone else being wrong helps you. If people happen to think my future prospects are bright – even if they’re not – it might encourage them to see me as an attractive social partner or mate; an arrangement from which I could reap benefits. So, if some part of me happens to be wrong, in some sense, about my future prospects, and being wrong doesn’t cause me to behave in too many maladaptive ways, and it also helps persuade you to treat me better than you would given accurate information, being wrong (or biased) could be, at times, adaptive.

How does persuasion relate to morality and victimhood, you may well be wondering? Consider again the initial point about people, apparently, being wrong about the existence of harms and victims of acts they deem to be immoral. If one were to suggest that people are wrong in this realm – indeed, that our psychology appears to be designed in such a way to consistently be wrong – one would also need to couch that suggestion in the context of persuasion (or some entirely new hypothesis about why being wrong is a good thing). In other words, the argument would need to go something like this: by perceiving victims and harms where none actually exist, I could be better able to persuade other people to take my side in a moral dispute. The implications of that suggestion would seem to, in a rather straightforward way, rely on people taking sides on moral issues on the basis of harm in the first place; if they didn’t, claims of harm wouldn’t be very persuasive. This would leave the moral dumbfounding work in a bit of a bind, theoretically speaking, with respect to whether harms are required inputs for moral systems or not: that people perceive something as immoral and then later perceive harms would suggest harms are not required inputs; that arguments about harms are rather persuasive could suggest that harms are required inputs.

Enough about implications; let’s get to some research 

At the very least, the perceptions of victimhood and harm appear intimately tied to perceptions of immorality. The connection between the two was further examined recently by Gray, Schein, & Ward (2014) across five studies, though I’m only going to discuss one of them. In the study of interest, 82 participants each rated 12 actions on whether they were wrong (1-5 scale, from ‘not wrong at all’ to ‘extremely wrong’) and whether the act had a victim (1-5 scale, from ‘definitely not’ to ‘definitely yes’). These 12 actions were broken down into three groups of four acts each: the harmful group (including items like kicking a dog or hitting a spouse), the impure group (including masturbating to a picture of your dead sister or covering a bible with feces), and the neutral group (such as eating toast or riding a bus). The interesting twist in this study involved the time frame in which participants answered: one group was placed under a time constraint in which they had to read the question and provide their answers within seven seconds; the other group was not allowed to answer until at least a seven-second delay had passed, and was given an unlimited amount of time in which to answer. So one group was relying on, shall we say, their gut reaction, while the other was given ample time to reason about things consciously.

Unsurprisingly, there appeared to be a connection between harm and victimhood: the directly harmful scenarios generated more certainty about a victim (M = 4.8) than the impure ones (M = 2.5), and the neutral scenarios didn’t generate any victims (M = 1). More notably, the time constraint did have an effect, but only in the impure category: when answering under time constraints in the impure category, participants reported more certainty about the existence of a victim (M = 2.9) relative to when they had more time to think (M = 2.1). By contrast, the perceptions of victims in the harm (M = 4.8 and 4.9, respectively) and neutral categories (M = 1 and 1) did not differ across time constraints.

This finding puts a different interpretive spin on the moral dumbfounding literature: when people had more time to think about (and perhaps invent) victims for more ambiguous violations, they came up with fewer victims. Rather than people reaching a conclusion about immorality first and then consciously reasoning about who might have been harmed, it seems that people could have instead been reaching implicit conclusions about both harm and immorality quite early on, and only later consciously reasoning about why an act which seemed immoral isn’t actually making any worthy victims. If representations about victims and harms are arising earlier in this process than would be anticipated by the moral dumbfounding research, this might speak to whether or not harms are required inputs for moral systems.

Turns out that piece might have been more important than we thought

It is possible, I suppose, that morality could simply use harm as an input sometimes without it being a required input. That possibility would allow harm to be both persuasive and not required, though it would require some explanation as to why harm is only expected to matter in moral judgments at times. At present, I know of no such argument having ever been made, so there’s not too much to engage with on that front.

It is true enough that, at times, when people perceive victims, they tend to perceive victims in a rather broad sense, naming entities like “society” as being harmed by certain acts. Needless to say, it seems rather difficult to assess such claims, which makes one wonder how people perceive such entities as being harmed in the first place. One possibility, obviously, is that such entities (to the extent they can be said to exist at all) aren’t really being harmed and people are using unverifiable targets to persuade others to join a moral cause without the risk of being proved wrong. Another possibility, of course, is that the part of the brain that is doing the reporting isn’t quite able to articulate the underlying reason for the judgment well to others. That is, one part of the brain is (accurately) finding harm, but the talking part isn’t able to report on it. Yet another possibility still is that harm befalling different groups is strategically discounted (Marczyk, 2015). For instance, members of a religious group might find disrespect towards a symbol of their faith (rubbing feces on the bible, in this case) to be indicative of someone liable to do harm to their members; those opposed to the religious group might count that harm differently – perhaps not as harm at all. Such an explanation could, in principle, explain the time-constraint effect I mentioned before: the part of the brain discounting harm towards certain groups might not have had enough time to act on the perceptions of harm yet. While these explanations are not necessarily mutually exclusive, they are all ideas worth thinking about.

References: Gray, K., Schein, C., & Ward, A. (2014). The myth of harmless wrongs in moral cognition: Automatic dyadic completion from sin to suffering. Journal of Experimental Psychology: General, 143, 1600-1615.

Haidt, J., Bjorklund, F., & Murphy, S. (2000). Moral dumbfounding: When intuition finds no reason. Unpublished Manuscript. 

Marczyk, J. (2015). Moral alliance strategies theory. Evolutionary Psychological Science, 1, 77-90.

(Some Of) My Teaching Philosophy

Over the course of my time at various public schools and universities I have encountered a great many teachers. Some of my teachers were quite good. I would credit my interest in evolutionary psychology to one particularly excellent teacher – Gordon Gallup. Not only was the material itself unlike anything I had previously been presented with in other psychology courses, but the way Gordon taught his classes was unparalleled. Each day he would show up and, without the aid of any PowerPoints or any apparent notes, just lecture. On occasion we would get some graphs or charts drawn on the board, but that was about it. What struck me about this teaching style is what it communicated about the speaker: this is someone who knows what he’s talking about. His command of the material was so impressive I actually sat through his course again for no credit in the following years to transcribe his lectures (and the similarity from year-to-year was remarkable, given that lack of notes). It was just a pleasure listening to him do what he did best.

A feat I was recently recognized for

To say that Gordon was outstanding is to say he was exceptional, relative to his peers (even if many of those peers, mistakenly, believe they are exceptional as well). The converse to that praise, then, is that I have encountered many more professors who were either not particularly good at what they did or downright awful at it (subjectively speaking, of course). I’ve had some professors who acted, more or less, as an audio guide to the textbook and who, when questioned, didn’t seem to really understand the material they were teaching; I’ve had another tell his class “now, we know this isn’t true, but maybe it’s useful” as he reviewed Maslow’s hierarchy of needs for what must have been the tenth time in my psychology education – a statement which promptly turned off my attention for the day. The number of examples I could provide likely outnumbers my fingers and toes, so there’s no need to detail each one. In fact, just about everyone who has attended school has had experiences like this. Are these subjective evaluations of teachers that we have all made accurate representations of their teaching ability, though?

According to some research by Braga et al (2011), that answer is “yes”, but in a rather perverse sense: teacher evaluations tend to be negatively predictive of actual teaching effectiveness. In other words, at the end of a semester when a teacher receives evaluations from their students, the better these evaluations, the less effective the teacher tends to be. As someone who received fairly high evaluations from my own students, this should either be cause for some reflection as to my methods (since I am interested in my students learning; not just their being satisfied with my course) or a hunt for why the research in question must be wrong to make me feel better about my good reviews. In the interests of prioritizing my self-esteem, let’s start by considering the research and seeing if any holes can be poked in it.

“Don’t worry; I’m sure those good reviews will still reflect well on you”

Braga et al (2011) analyzed data from a private Italian university offering programs in economics, business, and law in 1998/9. The students in these programs had to take a fixed course of classes with fixed sets of materials and the same examinations. Additionally, students were randomly assigned to professors, making this one of the most controlled academic settings for this kind of research I could imagine. At the end of the terms, students provided evaluations of their instructors, allowing their ratings of instructors to be correlated – at the classroom level, as the evaluations were anonymous – with their performance in being effective teachers.

Teaching effectiveness was measured by examining how students did in subsequent courses (controlling for a variety of non-teacher factors, like class size), the assumption being that students with better professors in the first course would do better in future courses, owing to their more proficient grasp of the material. These non-teacher factors accounted for about 57% of the variance in future course grades, leaving plenty of room for teacher effects. The effect of teachers was appreciable, with an increase of one standard deviation in effectiveness leading to a gain of about 0.17 standard deviations of grade in future classes (about a 2.3% bump up). Given the standardized materials and the gulf which could exist between the best and worst teachers, it seems there’s plenty of room for teacher effectiveness to matter. Certainly no students want to end up at a disadvantage because of a poor teacher; I know I wouldn’t.

When it came to the main research question, the results showed that teachers who were the least effective in providing future success for their students tended to receive the highest evaluations. This effect was sizable as well: for each standard deviation increase in teaching effectiveness, student evaluation ratings dropped by about 40% of a standard deviation. Perhaps unsurprisingly, grades were correlated with teaching evaluations as well: the better grades the students received, the better the evaluations they tended to give the professors. Interestingly, this effect did not exist in classes comprised of 25% or more of the top students (as measured by their cognitive entrance exams); the evaluations of those classes were simply not predictive of effectiveness.

That last section is the part of the paper that most everyone will cite: the negative relationship between teacher evaluations and future performance. What fewer people seem to do when referencing that finding is consider why this relationship exists and then use that answer to inform their teaching styles (as I get the sense this information will quite often be cited to excuse otherwise lackluster evaluations, rather than to change anything). The authors of the paper posit two main possibilities for explaining this effect: (1) that some teachers make class time more entertaining at the expense of learning, and/or (2) that some teachers might “teach for the test”, even if they do so at the expense of “true learning”. While neither possibility is directly tested in the paper, the latter possibility strikes me as most plausible: students in the “teaching for the test” classes might simply focus on the particular chunks of information relevant for them at the moment, rather than engaging it as a whole and understanding the subject more broadly.

In other words, vague expectations encourage cramming with a greater scope

With that research in mind, I would like to present a section of my philosophy when it came to teaching and assessment. A question of interest that I have given much thought to is what, precisely, are grades aimed at achieving? For many professors – indeed, I’d say the bulk of them – grades serve the ends of assessment. The grades are used to tell people – students and others – how well the students did at understanding the material come test time. My answer to this question is a bit different, however: as an instructor, I had no particular interest in the assessment of students per se; my interest was in their learning. I only wanted to assess my students as a means of pushing them to the end of learning. As a word of caution, my method of assessment demands substantially more effort from those doing the assessing, be it a teacher or assistant, than is typical. It’s an investment of time many might be unwilling to make.

My assessments were all short-essay style questions, asking students to apply theories they had learned about to novel questions we did not cover directly in class; there were no multiple choice questions. According to the speculations of Braga et al (2011), this would put me firmly in the “real teaching” camp, instead of the “teaching to the test” one. There are a few reasons for my decision: first, multiple choice questions don’t allow you to see what the students were thinking when answering the question. Just because someone gets an answer correct on a multiple choice exam, it doesn’t mean they got the correct answer for the right reasons. For my method to be effective, however, it does mean someone needs to read the exams in depth instead of just feeding them through a scantron machine, and that reading takes time. Second, essay exams force students to confront what they do and do not know. Having spent many years as a writer (and even more as a student), I’ve found that many ideas that seem crystal clear in my head do not always translate readily to text. The feeling of understanding can exist in the absence of actual understanding. If students find they cannot explain an idea as readily as they felt they understood it, that feeling might be effectively challenged, yielding a new round of engagement with the material.

After seeing where the students were going wrong, the essay format allowed me to make notes on their work and hand it back to them for revisions; something you can’t do very well with multiple choice questions either. Once the students had my comments on their work, they were free to revise it and hand it back in to me. The grade they got on their revisions would be their new grade: no averaging of the two or anything of the sort. The process would then begin again, with revisions being made on revisions, until the students were happy with their grade or stopped trying. In order for assessment to serve the end of learning, assessment needs to be ongoing if you expect learning to be. If assessment is not ongoing, students have little need to fix their mistakes; they’ll simply look at their grade and then toss their test in the trash, as many of them do. After all, why would they bother putting in the effort to figure out where they went wrong and how to go right if doing so successfully would have no impact whatsoever on the one thing they get from the class that people will see?

Make no mistake: they’re here for a grade. Educations are much cheaper than college.

I should also add that my students were allowed to use any resource they wanted for the exams, be that their notes, the textbook, outside sources, or even other students. I wanted them to engage with the material and think about it while they worked, and I didn’t expect them to have it all memorized already. In many ways, this format mirrors the way academics function in the world outside the classroom: when writing our papers, we are allowed to access our notes and references whenever we want; we are allowed to collaborate with others; we are allowed – and in many cases, required – to make revisions to our work. If academics were forced to do their job without access to these resources, I suspect the quality of it would drop precipitously. If these things all improve the quality of our work and help us learn and retain material, asking students to discard all of them come test time seems like a poor idea. It does require test questions to have some thought put into their construction, though, and that means another investment of time.

Some might worry that my method makes things too easy on the students. All that access to different materials means they could just get an easy “A”, and that’s why my evaluations were good. Perhaps that’s true, but just as my interest is not in assessment, my interest is also not in making a course “easy” or “challenging”; it’s in learning, and tests should be as easy or hard as that requires. As I recall, the class average for each test started at about a 75; by the end of the revisions, the average for each test had risen to about a 90. You can decide from those numbers whether or not that means my exams were too easy.

Now I don’t have the outcome measures that Braga et al (2011) did for my own teaching success. Perhaps my methods were a rousing failure when it came to getting students to learn, despite the high evaluations they earned me (in the Braga et al sample, the average teacher rating was 7 out of 10 with a standard deviation of 0.9; my average rating would be around a 9 on that scale, placing my evaluations about two standard deviations above the mean); perhaps this entire post reflects a defensiveness on my part when it comes to, ironically, having to justify my positive evaluations, just as I suspect people who cite this paper might use the results to justify relatively poor evaluations. In regards to the current results, I think both myself and others have room to be concerned: just because I received good evaluations, it does not mean my teaching method was effective; however, just because you received poor evaluations, it does not mean your teaching method is effective either. Just as students can get the right answer for the wrong reason, they can also give a teacher a good or bad evaluation for the right or wrong reasons. Good reviews should not make teachers complacent, just as poor reviews should not be brushed aside. The important point is that we both think about how to improve on our effectiveness as teachers.
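For anyone who wants to check the "about two standard deviations" figure, here is a minimal sketch of the arithmetic. It uses only the numbers quoted above (a mean of 7, a standard deviation of 0.9, and a rating of roughly 9); the variable and function names are my own, and the rating of 9 is itself only an approximate conversion onto that scale.

```python
# Figures quoted above for the Braga et al sample.
sample_mean = 7.0   # average teacher rating, out of 10
sample_sd = 0.9     # standard deviation of those ratings
my_rating = 9.0     # approximate rating converted to the same scale

def z_score(value, mean, sd):
    """Number of standard deviations that `value` sits above `mean`."""
    return (value - mean) / sd

z = z_score(my_rating, sample_mean, sample_sd)
print(f"{z:.2f} standard deviations above the mean")
```

The result comes out a little over two standard deviations, which is consistent with the rough figure given in the text.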

References: Braga, M., Paccagnella, M., & Pellizzari, M. (2011). Evaluating students’ evaluations of professors. Economics of Education Review, 41, 71-88.  

Should We Expect Cross-Cultural Perceptual Errors?

There was a rather interesting paper that crossed my social media feeds recently concerning stereotypes about women in science fields; a topic about which I have been writing lately. I’m going to do something I don’t usually do and talk about it briefly despite having just read the abstract and discussion section. The paper, by Miller, Eagly, and Linn (2014), reported on people’s implicit gender stereotypes about science, which associated science more readily with men, relative to women. As it turns out, across a number of different cultures, people’s implicit stereotypes corresponded fairly well to the actual representation of men and women in those fields. In other words, people’s perceptions, or at least their responses, tended to be accurate: if more men were associated with science psychologically, it seemed to be because more men also happened to work in science fields. In general, this is how we should expect the mind to work. While our minds might imperfectly gather information about the world, they should do their best to be accurate. The reasons for this accuracy, I suspect, have a lot to do with being right resulting in useful modifications of behaviors.

Being wrong about skateboarding skill, for instance, has some consequences

Whenever people propose psychological hypotheses that have to do with people being wrong, then, we should be a bit skeptical. A psychology designed in such a way so as to be wrong about the world consistently will, on the whole, tend to direct behavior in more maladaptive ways than a more accurate mind would. If one is positing that people are wrong about the world in some regard, it would require either that (a) there are no consequences for being wrong in that particular way or (b) there are some consequences, but the negative consequences are outweighed by the benefits. Most hypotheses for holding incorrect beliefs I have encountered tend towards the latter route, suggesting that some incorrect beliefs might outperform true beliefs in some fitness-relevant way(s).

One such hypothesis that I’ve written about before concerns error management theory. To recap, error management theory recognizes that some errors are costlier to make than others. To use an example in the context of the current paper I’m about to discuss, consider a case in which a man desires to have sex with a woman. The woman in question might or might not be interested in the prospect; the man might also perceive that she is interested or not interested. If the woman is interested and the man makes the mistake of thinking she isn’t, he has missed out on a potentially important opportunity to increase his reproductive output. On the other hand, if the woman isn’t interested and the man makes the mistake of thinking she is, he might waste some time and energy pursuing her unsuccessfully. These two mistakes do not carry equivalent costs: one could make the argument that a missed encounter is costlier on average, from a fitness standpoint, than an unsuccessful pursuit (depending, of course, on how much time and energy is invested in the pursuit).

Accordingly, it has been hypothesized that male psychology might be designed in such a way so as to over-perceive women’s sexual interest in them, minimizing the costs associated with making mistakes, multiplied by their frequency, rather than minimizing the number of mistakes one makes in total. While that sounds plausible at first glance, there is a rather important point worth bearing in mind when evaluating it: incorrect beliefs are not the only way to go about solving this problem. A man could believe, correctly, that a woman is not all that interested in him, but simply use a lower threshold for acceptable pursuits. Putting that into numbers, let’s say a woman has a 5% chance of having sex with the man in question: the man might not pursue any chance below 10%, and so could bias his belief upward to think he actually has a 10% chance; alternatively, he might believe she has about a 5% chance of having sex with him and decide to go after her anyway. It seems that the second route solves this problem more effectively, as a biased probability of success with a woman might have downstream effects on other pursuits.
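To make the two routes concrete, here is a small sketch using the 5% and 10% figures from the example above. The function and variable names are hypothetical, and setting the lowered threshold at exactly 5% is my own assumption for illustration; the point is only that both routes yield the same behavior while just one of them requires a false belief.

```python
def pursues(believed_chance, threshold):
    """Pursue whenever the believed chance of success meets the threshold."""
    return believed_chance >= threshold

true_chance = 0.05  # from the example: a 5% chance of success

# Route 1: keep the 10% threshold and bias the belief upward to meet it.
biased_belief = 0.10
route_1 = pursues(biased_belief, threshold=0.10)

# Route 2: hold the accurate belief and lower the threshold instead
# (the 5% threshold is an assumed value, not one given in the example).
route_2 = pursues(true_chance, threshold=0.05)

# Both routes lead to the same decision to pursue...
print(route_1, route_2)

# ...but only route 1 leaves behind an inaccurate estimate that any
# other decision drawing on that belief would inherit.
belief_error_1 = abs(biased_belief - true_chance)
belief_error_2 = abs(true_chance - true_chance)
print(belief_error_1, belief_error_2)
```

Under these assumptions the behavior is identical, so the biased belief buys nothing that an adjusted threshold could not, and it carries the added risk of distorting other judgments.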

Like on the important task of watching the road

Now in that last post I mentioned, it seems that the evidence that men over-perceive women’s sexual interest might instead be better explained by the hypothesis that women are underreporting their intentions. After all, we have no data on the probability of a woman having sex with someone given that she did something like hold his hand or buy him a present, so concluding that men over-perceive requires assuming that women report accurately (the previous evidence would also require that pretty much everyone else but the woman is wrong about her behavior, male or female). Some new evidence puts the hypothesis of male over-perception into even hotter water. A recent paper by Perilloux et al (2015) sought to test this over-perception bias cross-culturally, as most of the data bearing on it happens to have been derived from American samples. If men possess some adaptation designed for over-perception of sexual interest, we should expect to see it cross-culturally; it ought to be a human universal (as I’ve noted before, this doesn’t mean we should expect invariance in its expression, but we should at least find its presence).

Perilloux et al (2015) collected data from participants in Spain, Chile, and France, representing a total sample size of approximately 400 subjects. Men and women were given a list of 15 behaviors. They were asked to imagine they had been out on a few dates with a member of the opposite sex, and then asked to estimate the likelihood of having sex with them, given that this opposite sex individual engaged in those behaviors (from -3 being “extremely unlikely” to 3 being “extremely likely”). The results showed an overall sex difference in each country, with men tending to perceive more sexual interest than women. While this might appear to support the idea that over-perception is a universal feature of male psychology, a closer examination of the data casts some doubt on that idea.

In the US sample, men perceived more sexual interest than women in 12 of the 15 items; in Spain, that number was 5, in Chile it was 2, and in France it was 1. It seemed that the question concerning whether someone bought jewelry was enough to drive this sex difference in both the French and Chilean samples. Rather than men over-perceiving women’s reported interests in general across a wide range of behaviors, it seemed that the cross-cultural sample’s differences were being driven by only a few behaviors; behaviors which are, apparently, also rather atypical for relationships in those countries (inasmuch as women don’t usually buy men jewelry). As for why there’s a greater correspondence between French and Chilean men and women’s reported likelihoods, I can’t say. However, that men from France and Chile seem to be rather accurate in their perceptions of female sexual intent would cast doubt on the idea that male psychology contains some mechanisms for sexual over-perception.

I’ll bet US men still lead in shooting accuracy, though

This paper helps make two very good points that, at first, might seem like they oppose each other, despite their complementary nature. The first point is the obvious importance of cross-cultural research; one cannot simply take it for granted that a given effect will appear in other cultures. Many sex differences – like height and willingness to engage in casual sex – do, but some will not. The second point, however, is that hypotheses about function can be developed and even tested (albeit incompletely) in the absence of data about their universality. Hypotheses about function are distinct from hypotheses about proximate form or development, though these different levels of analysis can often be used to inform one another. Indeed, that’s what happened in the current paper, with Perilloux et al (2015) drawing the implicit hypothesis about universality from the hypothesis about ultimate functioning, using data about the former to inform their posterior beliefs about the latter. While different levels of analysis inform each other, they are nonetheless distinct, and that’s always worth repeating.

References: Perilloux, C., Munoz-Reyes, J., Turiegano, E., Kurzban, R., & Pita, M. (2015). Do (non-American) men overestimate women’s sexual intentions? Evolutionary Psychological Science, DOI 10.1007/s40806-015-0017-5

Miller, D., Eagly, A., & Linn, M. (2014). Women’s representation in science predicts national gender-science stereotypes: Evidence from 66 nations. Journal of Educational Psychology, http://dx.doi.org/10.1037/edu0000005

Has A Universal Preference Just Been Challenged?

One well-documented physiological feature which plays a role in determining women’s attractiveness is the ratio of their waist to their hips (their WHR). The largest underlying reason for this preference appears to concern fertility: controlling for other factors, women with lower WHRs tend to be more fertile than women with higher ratios (Zaadstra et al, 1993). Historically, men who found lower WHRs more attractive could thus be expected to have ended up pursuing more viable mating opportunities than men who failed to do likewise. It should come as no surprise, then, that this preference for lower WHRs shows up in cross-cultural samples. The preference is so robust in its development that even men who were born blind appear to show evidence of it from touch alone, demonstrating that visual input is not required to shape this preference (doing violence to the notion that these standards are socialized into us by media forces for some arbitrary reason). The cognitive mechanism responsible for generating these perceptions of attractiveness to relatively low WHRs can be considered what we would call a universal feature of human psychology. However, there appears to be some confusion over precisely what is meant by “universal”, which I wanted to address today.

For instance, this is Mrs. Universe; not Mrs. Universal

The point of confusion focuses on whether a universal human preference should be expected to be invariant in its expression. In a new paper, Bovet & Raymond (2015) present some data they claim challenges “the universality of an ideal WHR” of about 0.7. More specifically, their claim seems to be that “the assertion that the preference for [a] WHR [of 0.7] is universal and temporally invariant” (p. 9) is incorrect because preferences for WHRs have changed over time. Before I get to what their methods and results were, I wanted to make an initial note about the assertion they sought to challenge: I find it strange. What I find particularly strange about the assertion that Bovet & Raymond (2015) seek to cast doubt on is that, and I want to be crystal clear about this, I have never heard it before. By that, I mean that I know of no author who has claimed that men have and will continue to show an invariant preference for a specific WHR over time. Checking a citation for Singh (1993) that is mentioned in conjunction with that claim, for instance, reveals no evidence of that assertion being made. The closest Singh (1993) comes to saying anything along those lines is that the significance of WHR – not a particular value of it – should be expected to be culturally invariant. In that respect, it seems that Bovet & Raymond (2015) might be tilting at windmills.

With that out of the way, let’s move on to consider what Bovet & Raymond (2015) did and what they found. For what I would consider the main study in their paper, they collected 216 images of works of art – both paintings and sculptures – representing women over the last 2,500 years. The art was collected so as to show nude or partially nude forms, allowing the WHR of the subject being depicted to be observable. Pictures of these works of art were then presented to about 1,400 diligent Mturk workers, each of whom was asked to examine 17 of these art pieces and to select which female figure each most closely resembled from an array of 12 line drawings of women; drawings which varied on both BMI and WHR, and can be seen here. These estimates of which WHR was depicted were used to create an average estimate of the WHR of the figure in the art. Not the most precise method, admittedly, but let’s move to what they found.

Comparing the antique art (defined as 500 BCE to 400 CE) to the recent art (1400 CE to 2014 CE), no significant difference in the average estimates of depicted WHRs emerged: both groups averaged a WHR of about 0.8. In the more recent works group, there was a slight tendency for more modern art to depict a relatively smaller WHR over time, and no such trend was found in the antique art. It also happened to be the case that works of art designated as specifically depicting female beauty symbols – like Aphrodite – were depicted with relatively lower WHRs than the non-symbolic women – like Eve.

Depictions of clam shells remained highly unrealistic during this time

Study two just involved analyzing a data set of the WHR measurements of Playboy centerfolds and Miss America winners from 1920 to 2000 which, evidently, showed a curvilinear relationship over time right around a mean of 0.7, so there’s not too much to say there. Skipping to the third study, the estimates of the WHRs from the art in study 1 were compared against actual measurements of 13 of the sculptures to try and correct for participants’ estimation errors. As it turns out, participants tended to overestimate the WHRs by about 8% on average. Correcting participants’ estimates, then, it was estimated that the average WHR depicted in the antique set was about 0.73; quite close to the 0.7 figure I mentioned initially. By contrast, the more recent art set, combined with the Playboy and Miss America winners, yielded an average depicted WHR of about 0.75 at 1400 CE down to about 0.68 by the present. This latter set of modern depictions was the substantially larger sample, though I’m not sure what to make of that.
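As a rough sanity check on that correction, here is a back-of-the-envelope sketch. It assumes a simple proportional adjustment (dividing the raw estimate by 1.08), which may well not be the procedure the authors actually used, and it starts from the rounded figures of "about 0.8" and "about 8%" given above; the variable names are my own.

```python
raw_estimate = 0.80   # rounded average WHR estimated by raters for the antique art
overestimate = 0.08   # raters overestimated by about 8% on average

# Assumed correction: treat the raw estimate as 8% too high and divide it out.
corrected = raw_estimate / (1 + overestimate)
print(round(corrected, 2))
```

With these rounded inputs the sketch lands within about a hundredth of the 0.73 reported, which is about as close as one could expect without the unrounded values.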

So, taking these results at face value, two major points fall out: while (1) estimates of artistic depictions of women’s WHRs show a remarkable consistency from 500 BCE to the present day, (2) these depictions do tend to get a little smaller in more recent works; there’s some variance. Does this little bit of variation cut against the heart of the idea that a preference for relatively-small WHR on women is a universal feature of our mating psychology? I would say certainly not. There are a few reasons I would give this answer. The first of those is that, as I mentioned before, I know of no theory which ever claimed 0.7 as the invariant set point for peak attractiveness. Every trait – including the psychological ones which determine perceptions of attractiveness – needs to develop, and development can be a rocky road in many respects. Expecting development to land on a specific value every time would be absurd.

The second, and perhaps more relevant point, is that traits are not depicted, nor selected, in a vacuum. For example, we could consider the ever-popular Playboy centerfolds. While the shape of their body is certainly one rather important factor that comes into play with respect to their selection for the magazine, their WHR is certainly not the only feature relevant to the decision. Also included could be other factors like hair color, breast size and shape, clarity of skin, BMI, whether they are pushing for the position, and so on. The same kind of trade-offs need to be made when selecting a mate: do you want the one with a slightly shapelier body or the one with more intelligence? One might argue that such trade-offs need not be made when it comes to producing pieces of art, and I would concede the point. I would also add in the point that artists, no matter how talented, are not necessarily perfectly accurate in translating their preferences onto canvas or marble.

“Nailed it”

One final point relating to that second one is that preferences were not being directly assessed in any of this research: just depictions. While I (and the authors) would argue that we should expect a rather high degree of concordance between these preferences and depictions, I would also argue that the translation will be imperfect. This adds another source of variation into the mix which might account for a little bit of the inconsistency we notice. While I don’t doubt that preferences for one trait or another should be expected to vary over time adaptively on the basis of environmental inputs, I think that reflects more on trade-offs that have to be made rather than on what some ideal would be in the absence of them. For what it’s worth, I see the current data as rather supportive of the idea that preferences for WHRs are universal features of our psychology, rather than cutting against it.

References: Bovet, J. & Raymond, M. (2015). Preferred women’s waist-to-hip ratio variation over the last 2,500 years. PLoS ONE, 10, e0123284. doi:10.1371/journal.pone.0123284

Singh, D. (1993). Adaptive significance of female physical attractiveness: Role of waist-to-hip ratio. Journal of Personality & Social Psychology, 65, 293-307.

Zaadstra et al. (1993). Fat and fecundity: prospective study of effect of body fat distribution on conception rates. British Medical Journal, 306, 484-48.