Relaxing With Some Silly Research

In psychology, there is a lot of bad research out there by all estimates. The poor quality of this research can be attributed to concerns about ideology-driven research agendas, research bias, demand characteristics, lack of any real theory guiding the research itself, p-hacking, file-drawer effects, failures to replicate, small sample sizes, and reliance on undergraduate samples, among others. Arguably, there is more bad (or at least inaccurate) research than good research floating around as, in principle, there are many more ways of being wrong about the human mind than there are of being right about it (even given our familiarity with it); a problem made worse by the fact that being (or appearing) wrong or reporting null findings does not tend to garner one social status in the world of academia. If many of the incentives reside in finding particular kinds of results – and those kinds are not necessarily accurate – the predictable result is a lot of misleading papers. Determining what parts of the existing psychological literature are an accurate description of human psychology can be something of a burden, however, owing to the obscure nature of some of these issues: it’s not always readily apparent that a paper found a fluke result or that certain shady research practices have been employed. Thankfully, it doesn’t take a lot of effort to see why some particular pieces of psychological research are silly; criticizing that stuff can be as relaxing as a day off at the beach.

Kind of like this, but indoors and with fewer women

The last time I remember coming across research that can easily be recognized as silly was when one brave set of researchers asked if leaning to the left made the Eiffel Tower look smaller. The theory behind that initial bit of research is called, I think, number line theory, though I’m not positive on that. Regardless of the name, the gist of the idea seems to be that people - and chickens, apparently - associate smaller numbers with a relatively leftward direction and larger numbers with a rightward one. For humans, such a mental representation might make sense in light of our using certain systems of writing; for nonhumans, this finding would seem to make zero sense. To understand why this finding makes no sense, try to place it within a functional framework by asking (a) why might humans and chickens (and perhaps other animals as well) represent smaller quantities with their left, and (b) why might leaning to the left be expected to bias one’s estimate of size? Personally, I’m coming up with a blank on the answer to those questions, especially because biasing one’s estimate of size on the basis of how one is leaning is unlikely to yield more accurate estimates. A decrease in accuracy seems like it could only carry costs in this case, not benefits. So, at best, we’re left calling those findings a developmental byproduct for humans and likely a fluke for the chickens. In all likelihood, the human finding is probably a fluke as well.

Thankfully, for the sake of entertainment, silly research is not to be deterred. One of the more recent tests of this number line hypothesis (Anelli et al., 2014) makes an even bolder prediction than the Eiffel Tower paper: people will actually get better at performing certain mathematical operations when they’re traveling to the left or the right: specifically, going right will make you better at addition and left better at subtraction. Why? Because smaller numbers are associated with the left? How does that make one better at subtraction? I don’t know and the paper doesn’t really go into that part. On the face of it, this seems like a great example of what I have nicknamed “Dire Straits thinking”. Named after the band’s song “Money for Nothing”, this type of thinking leads people to hypothesize that others can get better (or worse) at tasks without any associated costs. The problem with this kind of thinking is that if people did possess the cognitive capacities to be better at certain tasks, one might wonder why people ever perform worse than they could. This would lead me to pose questions like, “why do I have to be traveling right to be better at addition; why not just be better all the time?” Some kind of trade-off needs to be referenced to explain that apparent detriment/bonus to performance, but none ever is in Dire Straits thinking.

In any case, let’s look at the details of the experiment, which was quite simple. Anelli et al. (2014) had a total of 48 participants walk with an experimenter (one at a time; not all 48 at once). The pair would walk together for 20 seconds in a straight line, at which point the experimenter would call out a three-digit number, tell the participant to add to or subtract from it by 3 aloud for 22 seconds, give them a direction to turn (right or left), and tell them to begin. At that point, the participant would turn and start doing the math. Each participant completed four trials: two congruent (right/addition or left/subtraction) and two incongruent (right/subtraction or left/addition). The researchers hoped to uncover a congruency effect, such that more correct calculations would be performed in the congruent, relative to incongruent, trials.

Now put the data into the “I’m right” program and it’s ready to publish

Indeed, just such an effect was found: when participants were moving in a direction congruent with their mathematical operation, they performed more correct calculations on average (M = 10.1), relative to when they were traveling in an incongruent direction (M = 9.6). However, when this effect was broken down by operation, it turns out that the effect only exists when participants were doing addition (M = 11.1 when going right, 10.2 when going left); there was no difference for subtraction (M = 9.0 and 9.1, respectively). Why was there no effect for subtraction? Well, the authors postulate a number of possibilities – one of which being that perhaps participants needed to be walking backwards – though none of them include the possibility of the addition finding being a statistical fluke. It’s strange how infrequently this possibility is mentioned in published work, especially in the face of inconsistent findings.
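For anyone who wants to see how those numbers hang together, here is a minimal Python sketch that rebuilds the overall congruent and incongruent means from the four condition means. It assumes every cell carries equal weight (which the four-trial design suggests) and that "respectively" for subtraction means right, then left; the variable names are mine, not the paper's.

```python
# Mean correct calculations per condition, as reported in the post.
# Keys are (operation, direction walked after the turn).
# Assumption: the subtraction means (9.0, 9.1) are ordered right, then left.
condition_means = {
    ("addition", "right"): 11.1,
    ("addition", "left"): 10.2,
    ("subtraction", "right"): 9.0,
    ("subtraction", "left"): 9.1,
}

# Congruent pairings under the number line idea.
CONGRUENT = {("addition", "right"), ("subtraction", "left")}


def mean(values):
    """Plain arithmetic mean; assumes equal weight per condition."""
    values = list(values)
    return sum(values) / len(values)


congruent_mean = mean(m for k, m in condition_means.items() if k in CONGRUENT)
incongruent_mean = mean(m for k, m in condition_means.items() if k not in CONGRUENT)

# Congruent minus incongruent, computed separately for each operation.
addition_effect = (
    condition_means[("addition", "right")] - condition_means[("addition", "left")]
)
subtraction_effect = (
    condition_means[("subtraction", "left")] - condition_means[("subtraction", "right")]
)

print(round(congruent_mean, 1), round(incongruent_mean, 1))
print(round(addition_effect, 1), round(subtraction_effect, 1))
```

Under those assumptions the half-point overall gap is carried almost entirely by the addition trials, which is the inconsistency the paragraph above is pointing at.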

Now one obvious criticism of this research is that the participants were never traveling right or left; they were walking straight ahead in all cases. Right or left, unlike East or West, depends on perspective. When I am facing my computer, I feel I am facing ahead; when I turn around to walk to the bathroom, I don’t feel like I’m walking behind me. The current research would thus rely on the effects of a momentary turn affecting participants’ math abilities for about half a minute. Accordingly, participants shouldn’t even have needed to be walking; asking them to turn and stand in place should be expected to have precisely the same effect. If the researchers wanted to measure walking to the right or left, they should have had participants moving to the side by sliding, rather than turning and walking forward.

Other obvious criticisms of the research could include the small sample size, the small effect size, the inconsistency of the effect (it works for addition but not subtraction and is inconsistent with other research they cite, which was itself inconsistent – people being better at addition when going up in an elevator but not walking up stairs, if I understand correctly), or the complete lack of anything resembling a real theory guiding the research. But let’s say for a moment that my impression of these results as silly is incorrect; let’s assume that these results accurately describe the workings of the human mind in some respect. What are the implications of that finding? What, in other words, happens to be at stake here? Why would this research be published, relative to the other submissions received by Frontiers in Psychology? Even if it’s a true effect – which already seems unlikely, given the aforementioned issues – it doesn’t seem particularly noteworthy. Should people be turning to the right and left while taking their GREs? Do people need to be doing jumping jacks to improve their multiplication skills so as to make their body look more like the multiplication symbol? If so, how could you manage to do them while you’re supposed to be sitting down quietly while taking your GREs without getting kicked out of the testing site? Perhaps someone more informed on the topic could lend a suggestion, because I’m having trouble seeing the importance of it.

Maybe the insignificance of the results is supposed to make the reader feel more important

Without wanting to make a mountain out of a molehill, this paper was authored by five researchers and presumably made it past an editor and several reviewers before it saw publication. At a minimum, that’s probably about 8 to 10 people. That seems like a remarkable feat, given how strange the paper happens to look on its face. I’m not just mindlessly poking fun at the paper, though: I’m bringing attention to it because it seems to highlight a variety of problems in the world of psychological research. There are, of course, many suggestions as to how these problems might be ferreted out, though many of those I have seen focus more on statistical solutions or combating researcher degrees of freedom. While such measures (like pre-registering studies) might reduce the quantity of bad research, they will be unlikely to increase the absolute quality of good work (since one can pre-register silly ideas like this), which I think is an equally valuable goal. For my money, the requirement of some theoretical functional grounding for research would likely be the strongest candidate for improving work in psychology. I imagine many people would find it harder to propose such an idea in the first place if they needed to include some kind of functional considerations as to why turning right makes you better at addition. Even if such a feat were accomplished, it seems those considerations would make the rationale for the paper even easier to pick apart by reviewers and readers.

Instead of asking for silly research to be conducted on larger, more diverse samples, it seems better to ask that silly research not be conducted at all.

References: Anelli, F., Lugli, L., Baroni, G., Borghi, A., & Nicoletti, R. (2014). Walking boosts your performance in making additions and subtractions. Frontiers in Psychology, 5. doi: 10.3389/fpsyg.2014.01459

Do Moral Violations Require A Victim?

If you’ve ever been a student of psychology, chances are pretty good that you’ve heard about or read a great many studies concerning how people’s perceptions about the world are biased, incorrect, inaccurate, erroneous, and other such similar adjectives. A related sentiment exists in some parts of the morality literature as well. Perhaps the most notable instance is the unpublished paper on moral dumbfounding, by Haidt, Bjorklund, & Murphy (2000). In that paper, the authors claim to provide evidence that people first decide whether an act is immoral and then seek to find victims or harms for the act post hoc. Importantly, the point seems to be that people seek out victims and harm despite them not actually existing. In other words, people are mistaken in perceiving harm or victims. We could call such tendencies the “fundamental victim error” or the “harm bias”, perhaps. If that interpretation of the results is correct, it would carry a number of implications, chief among which (for my present purposes) is that harm is not a required input for moral systems. Whatever cognitive systems are in charge of processing morally-relevant information, they seem to be able to do so without knowledge of who – if anyone – is getting harmed.

Just a little consensual incest. It’s not like anyone is getting hurt.

Now I’ve long found that implication to be a rather interesting one. The reason it’s interesting is because, in general, we should expect that people’s perceptions about the world are relatively accurate. Not perfect, mind you, but we should be expected to be as accurate as available information allows us to be. If our perceptions weren’t generally accurate, this would likely yield all sorts of negative fitness consequences: for example, believing you can achieve a goal you actually cannot could lead to the investment of time and resources in a fruitless endeavor; resources which could be more profitably spent elsewhere. Sincerely believing you’re going to win the lottery does not mean the tickets are wise investments. Given these negative consequences for acting on inaccurate information, we should expect that our perceptual systems evolved to be as accurate as they can be, given certain real-world constraints.

The only context I’ve seen in which being wrong about something could consistently lead to adaptive outcomes is in the realm of persuasion. In this case, however, it’s not that being wrong about something per se helps you, as much as someone else being wrong helps you. If people happen to think my future prospects are bright – even if they’re not – it might encourage them to see me as an attractive social partner or mate; an arrangement from which I could reap benefits. So, if some part of me happens to be wrong, in some sense, about my future prospects, and being wrong doesn’t cause me to behave in too many maladaptive ways, and it also helps persuade you to treat me better than you would given accurate information, being wrong (or biased) could be, at times, adaptive.

How does persuasion relate to morality and victimhood, you may well be wondering? Consider again the initial point about people, apparently, being wrong about the existence of harms and victims of acts they deem to be immoral. If one were to suggest that people are wrong in this realm – indeed, that our psychology appears to be designed in such a way as to be consistently wrong – one would also need to couch that suggestion in the context of persuasion (or some entirely new hypothesis about why being wrong is a good thing). In other words, the argument would need to go something like this: by perceiving victims and harms where none actually exist, I could be better able to persuade other people to take my side in a moral dispute. That suggestion would seem to rely, in a rather straightforward way, on people taking sides on moral issues on the basis of harm in the first place; if they didn’t, claims of harm wouldn’t be very persuasive. This would leave the moral dumbfounding work in a bit of a bind, theoretically speaking, with respect to whether harms are required inputs for moral systems or not: that people perceive something as immoral and then later perceive harms would suggest harms are not required inputs; that arguments about harms are rather persuasive could suggest that harms are required inputs.

Enough about implications; let’s get to some research 

At the very least, the perceptions of victimhood and harm appear intimately tied to perceptions of immorality. The connection between the two was further examined recently by Gray, Schein, & Ward (2014) across five studies, though I’m only going to discuss one of them. In the study of interest, 82 participants each rated 12 actions on whether they were wrong (1-5 scale, from ‘not wrong at all’ to ‘extremely wrong’) and whether the act had a victim (1-5 scale, from ‘definitely not’ to ‘definitely yes’). These 12 actions were broken down into three groups of four acts each: the harmful group (including items like kicking a dog or hitting a spouse), the impure group (including masturbating to a picture of your dead sister or covering a bible with feces), and the neutral group (such as eating toast or riding a bus). The interesting twist in this study involved the time frame in which participants answered: one group was placed under a time constraint in which they had to read the question and provide their answers within seven seconds; the other group was not allowed to answer until at least a seven-second delay had passed, and was given an unlimited amount of time in which to answer. So one group was relying on, shall we say, their gut reaction, while the other was given ample time to reason about things consciously.

Unsurprisingly, there appeared to be a connection between harm and victimhood: the directly harmful scenarios generated more certainty about a victim (M = 4.8) than the impure ones (M = 2.5), and the neutral scenarios didn’t generate any victims (M = 1). More notably, the time constraint did have an effect, but only in the impure category: when answering under time constraints in the impure category, participants reported more certainty about the existence of a victim (M = 2.9) relative to when they had more time to think (M = 2.1). By contrast, the perceptions of victims in the harm (M = 4.8 and 4.9, respectively) and neutral categories (M = 1 and 1) did not differ across time constraints.

This finding puts a different interpretive spin on the moral dumbfounding literature: when people had more time to think about (and perhaps invent) victims for more ambiguous violations, they came up with fewer victims. Rather than people reaching a conclusion about immorality first and then consciously reasoning about who might have been harmed, it seems that people could have instead been reaching implicit conclusions about both harm and immorality quite early on, and only later consciously reasoning about why an act which seemed immoral isn’t actually making any worthy victims. If representations about victims and harms are arising earlier in this process than would be anticipated by the moral dumbfounding research, this might speak to whether or not harms are required inputs for moral systems.

Turns out that piece might have been more important than we thought

It is possible, I suppose, that morality could simply use harm as an input sometimes without it being a required input. That possibility would allow harm to be both persuasive and not required, though it would require some explanation as to why harm is only expected to matter in moral judgments at times. At present, I know of no such argument having ever been made, so there’s not too much to engage with on that front.

It is true enough that, at times, when people perceive victims, they tend to perceive victims in a rather broad sense, naming entities like “society” to be harmed by certain acts. Needless to say, it seems rather difficult to assess such claims, which makes one wonder how people perceive such entities as being harmed in the first place. One possibility, obviously, is that such entities (to the extent they can be said to exist at all) aren’t really being harmed and people are using unverifiable targets to persuade others to join a moral cause without the risk of being proved wrong. Another possibility, of course, is that the part of the brain that is doing the reporting isn’t quite able to articulate the underlying reason for the judgment well to others. That is, one part of the brain is (accurately) finding harm, but the talking part isn’t able to report on it. Yet another possibility is that harm befalling different groups is strategically discounted (Marczyk, 2015). For instance, members of a religious group might find disrespect towards a symbol of their faith (rubbing feces on the bible, in this case) to be indicative of someone liable to do harm to their members; those opposed to the religious group might count that harm differently – perhaps not as harm at all. Such an explanation could, in principle, explain the time-constraint effect I mentioned before: the part of the brain discounting harm towards certain groups might not have had enough time to act on the perceptions of harm yet. While these explanations are not necessarily mutually exclusive, they are all ideas worth thinking about.

References: Gray, K., Schein, C., & Ward, A. (2014). The myth of harmless wrongs in moral cognition: Automatic dyadic completion from sin to suffering. Journal of Experimental Psychology: General, 143, 1600-1615.

Haidt, J., Bjorklund, F., & Murphy, S. (2000). Moral dumbfounding: When intuition finds no reason. Unpublished Manuscript. 

Marczyk, J. (2015). Moral alliance strategies theory. Evolutionary Psychological Science, 1, 77-90.

(Some Of) My Teaching Philosophy

Over the course of my time at various public schools and universities, I have encountered a great many teachers. Some of my teachers were quite good. I would credit my interest in evolutionary psychology to one particularly excellent teacher – Gordon Gallup. Not only was the material itself unlike anything I had previously been presented with in other psychology courses, but the way Gordon taught his classes was unparalleled. Each day he would show up and, without the aid of any PowerPoints or any apparent notes, just lecture. On occasion we would get some graphs or charts drawn on the board, but that was about it. What struck me about this teaching style is what it communicated about the speaker: this is someone who knows what he’s talking about. His command of the material was so impressive I actually sat through his course again for no credit in the following years to transcribe his lectures (and the similarity from year to year was remarkable, given that lack of notes). It was just a pleasure listening to him do what he did best.

A feat I was recently recognized for

To say Gordon was outstanding is to say he was exceptional, relative to his peers (even if many of those peers, mistakenly, believe they are exceptional as well). The converse to that praise, then, is that I have encountered many more professors who were either not particularly good at what they did or downright awful at it (subjectively speaking, of course). I’ve had some professors who acted, more or less, as audio guides to the textbook and who, when questioned, didn’t seem to really understand the material they were teaching; I’ve had another tell his class “now, we know this isn’t true, but maybe it’s useful” as he reviewed Maslow’s hierarchy of needs for what must have been the tenth time in my psychology education – a statement which promptly turned off my attention for the day. The number of examples I could provide likely outnumbers my fingers and toes, so there’s no need to detail each one. In fact, just about everyone who has attended school has had experiences like this. Are these subjective evaluations of teachers that we have all made accurate representations of their teaching ability, though?

According to some research by Braga et al. (2011), that answer is “yes”, but in a rather perverse sense: teacher evaluations tend to be negatively predictive of actual teaching effectiveness. In other words, at the end of a semester when a teacher receives evaluations from their students, the better these evaluations, the less effective the teacher tends to be. As someone who received fairly high evaluations from my own students, I should treat this either as cause for some reflection on my methods (since I am interested in my students learning; not just their being satisfied with my course) or as the start of a hunt for why the research in question must be wrong, to make me feel better about my good reviews. In the interests of prioritizing my self-esteem, let’s start by considering the research and seeing if any holes can be poked in it.

“Don’t worry; I’m sure those good reviews will still reflect well on you”

Braga et al. (2011) analyzed data from a private Italian university offering programs in economics, business, and law in 1998/9. The students in these programs had to take a fixed course of classes with fixed sets of materials and the same examinations. Additionally, students were randomly assigned to professors, making this one of the most controlled academic settings for this kind of research I could imagine. At the end of the terms, students provided evaluations of their instructors, allowing their ratings of instructors to be correlated – at the classroom level, as the evaluations were anonymous – with those instructors’ effectiveness as teachers.

Teaching effectiveness was measured by examining how students did in subsequent courses (controlling for a variety of non-teacher factors, like class size), the assumption being that students with better professors in the first course would do better in future courses, owing to their more proficient grasp of the material. These non-teacher factors accounted for about 57% of the variance in future course grades, leaving plenty of room for teacher effects. The effect of teachers was appreciable, with an increase of one standard deviation in effectiveness leading to a gain of about 0.17 standard deviations of grade in future classes (about a 2.3% bump up). Given the standardized materials and the gulf which could exist between the best and worst teachers, it seems there’s plenty of room for teacher effectiveness to matter. Certainly no students want to end up at a disadvantage because of a poor teacher; I know I wouldn’t.

When it came to the main research question, the results showed that teachers who were the least effective in providing future success for their students tended to receive the highest evaluations. This effect was sizable as well: for each standard deviation increase in teaching effectiveness, student evaluation ratings dropped by about 40% of a standard deviation. Perhaps unsurprisingly, grades were correlated with teaching evaluations as well: the better grades the students received, the better the evaluations they tended to give the professors. Interestingly, this effect did not exist in classes comprised of 25% or more of the top students (as measured by their cognitive entrance exams); the evaluations of those classes were simply not predictive of effectiveness.

That last section is the part of the paper that most everyone will cite: the negative relationship between teacher evaluations and future performance. What fewer people seem to do when referencing that finding is consider why this relationship exists and then use that answer to inform their teaching styles (as I get the sense this information will quite often be cited to excuse otherwise lackluster evaluations, rather than to change anything). The authors of the paper posit two main possibilities for explaining this effect: (1) that some teachers make class time more entertaining at the expense of learning, and/or (2) that some teachers might “teach for the test”, even if they do so at the expense of “true learning”. While neither possibility is directly tested in the paper, the latter possibility strikes me as most plausible: students in the “teaching for the test” classes might simply focus on the particular chunks of information relevant for them at the moment, rather than engaging it as a whole and understanding the subject more broadly.

In other words, vague expectations encourage cramming with a greater scope

With that research in mind, I would like to present a section of my philosophy when it came to teaching and assessment. A question of interest that I have given much thought to is what, precisely, are grades aimed at achieving? For many professors – indeed, I’d say the bulk of them – grades serve the ends of assessment. The grades are used to tell people – students and others – how well the students did at understanding the material come test time. My answer to this question is a bit different, however: as an instructor, I had no particular interest in the assessment of students per se; my interest was in their learning. I only wanted to assess my students as a means of pushing them to the end of learning. As a word of caution, my method of assessment demands substantially more effort from those doing the assessing, be it a teacher or assistant, than is typical. It’s an investment of time many might be unwilling to make.

My assessments were all short-essay style questions, asking students to apply theories they had learned about to novel questions we did not cover directly in class; there were no multiple choice questions. According to the speculations of Braga et al. (2011), this would put me firmly in the “real teaching” camp, instead of the “teaching to the test” one. There are a few reasons for my decision: first, multiple choice questions don’t allow you to see what the students were thinking when answering the question. Just because someone gets an answer correct on a multiple choice exam, it doesn’t mean they got the correct answer for the right reasons. For my method to be effective, however, it does mean someone needs to read the exams in depth instead of just feeding them through a scantron machine, and that reading takes time. Second, essay exams force students to confront what they do and do not know. Having spent many years as a writer (and even more as a student), I’ve found that many ideas that seem crystal clear in my head do not always translate readily to text. The feeling of understanding can exist in the absence of actual understanding. If students find they cannot explain an idea as readily as they felt they understood it, that feeling might be effectively challenged, yielding a new round of engagement with the material.

After seeing where the students were going wrong, the essay format allowed me to make notes on their work and hand it back to them for revisions; something you can’t do very well with multiple choice questions either. Once the students had my comments on their work, they were free to revise it and hand it back in to me. The grade they got on their revisions would be their new grade: no averaging of the two or anything of the sort. The process would then begin again, with revisions being made on revisions, until the students were happy with their grade or stopped trying. In order for assessment to serve the end of learning, assessment needs to be ongoing if you expect learning to be. If assessment is not ongoing, students have little need to fix their mistakes; they’ll simply look at their grade and then toss their test in the trash, as many of them do. After all, why would they bother putting in the effort to figure out where they went wrong and how to go right if doing so successfully would have no impact whatsoever on the one thing they get from the class that people will see?
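To make the difference between the two grading rules concrete, here is a small Python sketch. The function names and the sample scores are hypothetical, chosen only to illustrate the "latest revision replaces the old grade" rule described above against the averaging alternative I did not use.

```python
def latest_revision_grade(attempts):
    """Recorded grade when each revision simply replaces the previous grade."""
    if not attempts:
        raise ValueError("at least one graded attempt is required")
    return attempts[-1]


def averaged_grade(attempts):
    """Recorded grade if every attempt were averaged together (not my policy)."""
    if not attempts:
        raise ValueError("at least one graded attempt is required")
    return sum(attempts) / len(attempts)


# Hypothetical student: first submission, then two rounds of revision.
attempts = [75, 84, 90]

print(latest_revision_grade(attempts))  # the grade that goes in the book
print(averaged_grade(attempts))         # what averaging would have recorded
```

Under averaging, every early mistake keeps dragging on the final number no matter how well the student eventually understands the material; under replacement, the payoff for fixing mistakes is the full improvement, which is the incentive I was after.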

Make no mistake: they’re here for a grade. Educations are much cheaper than college.

I should also add that my students were allowed to use any resource they wanted for the exams, be that their notes, the textbook, outside sources, or even other students. I wanted them to engage with the material and think about it while they worked, and I didn’t expect them to have it all memorized already. In many ways, this format mirrors the way academics function in the world outside the classroom: when writing our papers, we are allowed to access our notes and references whenever we want; we are allowed to collaborate with others; we are allowed – and in many cases, required – to make revisions to our work. If academics were forced to do their job without access to these resources, I suspect the quality of it would drop precipitously. If these things all improve the quality of our work and help us learn and retain material, asking students to discard all of them come test time seems like a poor idea. It does require test questions to have some thought put into their construction, though, and that means another investment of time.

Some might worry that my method makes things too easy on the students. All that access to different materials means they could just get an easy “A”, and that’s why my evaluations were good. Perhaps that’s true, but just as my interest is not on assessment, my interest is also not on making a course “easy” or “challenging”; it’s on learning, and tests should be as easy or hard as that requires. As I recall, the class average for each test started at about a 75; by the end of the revisions, the average for each test had risen to about a 90. You can decide from those numbers whether or not that means my exams were too easy.

Now I don’t have the outcome measures that Braga et al (2011) did for my own teaching success. Perhaps my methods were a rousing failure when it came to getting students to learn, despite the high evaluations they earned me (in the Braga et al sample, the average teacher rating was 7 out of 10 with a standard deviation of 0.9; my average rating would be around a 9 on that scale, placing my evaluations about two standard deviations above the mean); perhaps this entire post reflects a defensiveness on my part when it comes to, ironically, having to justify my positive evaluations, just as I suspect people who cite this paper might use the results to justify relatively poor evaluations. In regards to the current results, I think both I and others have room to be concerned: just because I received good evaluations, it does not mean my teaching method was effective; however, just because you received poor evaluations, it does not mean your teaching method is effective either. Just as students can get the right answer for the wrong reason, they can also give a teacher a good or bad evaluation for the right or wrong reasons. Good reviews should not make teachers complacent, just as poor reviews should not be brushed aside. The important point is that we both think about how to improve on our effectiveness as teachers.
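For anyone who wants to check the "about two standard deviations" figure, here is the arithmetic as a minimal sketch. The mean and standard deviation are the ones quoted from the Braga et al sample; my own rating of 9 is only an approximate conversion onto their scale, and the variable names are just for illustration.

```python
# Values quoted in the text; my_rating is an approximate conversion, not a measured score.
sample_mean = 7.0   # average teacher rating in the Braga et al sample (out of 10)
sample_sd = 0.9     # standard deviation of those ratings
my_rating = 9.0     # my rough average rating on the same scale

# Standard score: how many standard deviations the rating sits above the mean.
z_score = (my_rating - sample_mean) / sample_sd
print(f"z = {z_score:.2f}")
```

The result comes out slightly above two, which is why "about two standard deviations" is a fair rounding.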

References: Braga, M., Paccagnella, M., & Pellizzari, M. (2011). Evaluating students’ evaluations of professors. Economics of Education Review, 41, 71-88.  

Should We Expect Cross-Cultural Perceptual Errors?

There was a rather interesting paper that crossed my social media feeds recently concerning stereotypes about women in science fields; a topic about which I have been writing lately. I’m going to do something I don’t usually do and talk about it briefly despite having just read the abstract and discussion section. The paper, by Miller, Eagly, and Linn (2014), reported on people’s implicit gender stereotypes about science, which associated science more readily with men, relative to women. As it turns out, across a number of different cultures, people’s implicit stereotypes corresponded fairly well to the actual representation of men and women in those fields. In other words, people’s perceptions, or at least their responses, tended to be accurate: if more men were associated with science psychologically, it seemed to be because more men also happened to work in science fields. In general, this is how we should expect the mind to work. While our minds might imperfectly gather information about the world, they should do their best to be accurate. The reasons for this accuracy, I suspect, have a lot to do with being right resulting in useful modifications of behaviors.

Being wrong about skateboarding skill, for instance, has some consequences

Whenever people propose psychological hypotheses that have to do with people being wrong, then, we should be a bit skeptical. A psychology designed in such a way so as to be wrong about the world consistently will, on the whole, tend to direct behavior in more maladaptive ways than a more accurate mind would. If one is positing that people are wrong about the world in some regard, it would require either that (a) there are no consequences for being wrong in that particular way or (b) there are some consequences, but the negative consequences are outweighed by the benefits. Most hypotheses for holding incorrect beliefs I have encountered tend towards the latter route, suggesting that some incorrect beliefs might outperform true beliefs in some fitness-relevant way(s).

One such hypothesis that I’ve written about before concerns error management theory. To recap, error management theory recognizes that some errors are costlier to make than others. To use an example in the context of the current paper I’m about to discuss, consider a case in which a man desires to have sex with a woman. The woman in question might or might not be interested in the prospect; the man might also perceive that she is interested or not interested. If the woman is interested and the man makes the mistake of thinking she isn’t, he has missed out on a potentially important opportunity to increase his reproductive output. On the other hand, if the woman isn’t interested and the man makes the mistake of thinking she is, he might waste some time and energy pursuing her unsuccessfully. These two mistakes do not carry equivalent costs: one could make the argument that a missed encounter is costlier on average, from a fitness standpoint, than an unsuccessful pursuit (depending, of course, on how much time and energy is invested in the pursuit).

Accordingly, it has been hypothesized that male psychology might be designed in such a way so as to over-perceive women’s sexual interest in them, minimizing the costs associated with making mistakes, multiplied by their frequency, rather than minimizing the number of mistakes one makes in total. While that sounds plausible at first glance, there is a rather important point worth bearing in mind when evaluating it: incorrect beliefs are not the only way to go about solving this problem. A man could believe, correctly, that a woman is not all that interested in him, but simply use a lower threshold for acceptable pursuits. Putting that into numbers, let’s say a woman has a 5% chance of having sex with the man in question: the man might not pursue any chance below 10%, and so could bias his belief upward to think he actually has a 10% chance; alternatively, he might believe she has about a 5% chance of having sex with him and decide to go after her anyway. It seems that the second route solves this problem more effectively, as a biased probability of success with a woman might have downstream effects on other pursuits.
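The two routes can be laid side by side in a small sketch. The 5% and 10% figures are the ones from the example above; the function names and the idea of adding a fixed "bias" to the belief are my own simplifications for illustration, not anything taken from the error management literature.

```python
# Percentages from the example in the text; everything else is a hypothetical illustration.
TRUE_CHANCE = 5          # actual chance of success, in percent
DEFAULT_THRESHOLD = 10   # the man does not pursue chances below this, in percent

def pursue_with_biased_belief(true_chance, threshold, bias):
    """Route 1: inflate the belief and leave the threshold alone."""
    believed_chance = true_chance + bias
    return believed_chance >= threshold, believed_chance

def pursue_with_lower_threshold(true_chance, lowered_threshold):
    """Route 2: keep the belief accurate and lower the threshold instead."""
    believed_chance = true_chance
    return believed_chance >= lowered_threshold, believed_chance

biased_decision, biased_belief = pursue_with_biased_belief(
    TRUE_CHANCE, DEFAULT_THRESHOLD, bias=5
)
accurate_decision, accurate_belief = pursue_with_lower_threshold(
    TRUE_CHANCE, lowered_threshold=5
)

# Both routes lead to the same behavior, but only one leaves the belief correct.
print("biased belief:  ", biased_decision, biased_belief)
print("lower threshold:", accurate_decision, accurate_belief)
```

Both routes end in a pursuit, but the first leaves the man carrying around a belief that is off by a factor of two, and that number is then available to distort any other decision that draws on it.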

Like on the important task of watching the road

Now in that last post I mentioned, it seems that the evidence that men over-perceive women’s sexual interest might instead be better explained by the hypothesis that women are underreporting their intentions. After all, we have no data on the probability of a woman having sex with someone given that she did something like hold his hand or buy him a present, so concluding that men over-perceive requires assuming that women report accurately (the previous evidence would also require that pretty much everyone else but the woman is wrong about her behavior, male or female). Some new evidence puts the hypothesis of male over-perception into even hotter water. A recent paper by Perilloux et al (2015) sought to test this over-perception bias cross-culturally, as most of the data bearing on it happens to have been derived from American samples. If men possess some adaptation designed for over-perception of sexual interest, we should expect to see it cross-culturally; it ought to be a human universal (as I’ve noted before, this doesn’t mean we should expect invariance in its expression, but we should at least find its presence).

Perilloux et al (2015) collected data from participants in Spain, Chile, and France, representing a total sample size of approximately 400 subjects. Men and women were given a list of 15 behaviors. They were asked to imagine they had been out on a few dates with a member of the opposite sex, and then to estimate the likelihood of having sex with them, given that this opposite sex individual engaged in those behaviors (from -3 being “extremely unlikely” to 3 being “extremely likely”). The results showed an overall sex difference in each country, with men tending to perceive more sexual interest than women. While this might appear to support the idea that over-perception is a universal feature of male psychology, a closer examination of the data casts some doubt on that idea.

In the US sample, men perceived more sexual interest than women in 12 of the 15 items; in Spain, that number was 5, in Chile it was 2, and in France it was 1. It seemed that the question concerning whether someone bought jewelry was enough to drive this sex difference in both the French and Chilean samples. Rather than men over-perceiving women’s reported interests in general across a wide range of behaviors, it seemed that the cross-cultural sample’s differences were being driven by only a few behaviors; behaviors which are, apparently, also rather atypical for relationships in those countries (inasmuch as women don’t usually buy men jewelry). As for why there’s a greater correspondence between French and Chilean men and women’s reported likelihoods, I can’t say. However, that men from France and Chile seem to be rather accurate in their perceptions of female sexual intent would cast doubt on the idea that male psychology contains some mechanisms for sexual over-perception.

I’ll bet US men still lead in shooting accuracy, though

This paper helps make two very good points that, at first, might seem like they oppose each other, despite their complementary nature. The first point is the obvious importance of cross-cultural research; one cannot simply take it for granted that a given effect will appear in other cultures. Many sex differences – like height and willingness to engage in casual sex – do, but some will not. The second point, however, is that hypotheses about function can be developed and even tested (albeit incompletely) in the absence of data about their universality. Hypotheses about function are distinct from hypotheses about proximate form or development, though these different levels of analysis can often be used to inform others. Indeed, that’s what happened in the current paper, with Perilloux et al (2015) drawing the implicit hypothesis about universality from the hypothesis about ultimate functioning, using data about the former to inform their posterior beliefs about the latter. While different levels of analysis inform each other, they are nonetheless distinct, and that’s always worth repeating.

References: Perilloux, C., Munoz-Reyes, J., Turiegano, E., Kurzban, R., & Pita, M. (2015). Do (non-American) men overestimate women’s sexual intentions? Evolutionary Psychological Science, DOI 10.1007/s40806-015-0017-5

Miller, D., Eagly, A., & Linn, M. (2014). Women’s representation in science predicts national gender-science stereotypes: Evidence from 66 nations. Journal of Educational Psychology,