About Jesse Marczyk

An Evolutionary-Minded Psychologist, of All Things

Does Diversity Per Se Pay?

One of the most interesting short reports I’ve read recently described research conducted in Australia examining the effect of blind reviews on hiring. The premise of the research, as far as I can surmise, was a fear of conscious or unconscious bias against women and minority groups when it came to getting hired. This bias would naturally make it harder for those groups to find employment, ultimately yielding a less diverse workforce. In the interests of avoiding that bias, the research team compared what happened when candidates were assessed on either standard resumes or de-identified ones. The latter were identical to the former, except that group-relevant information (like gender and race) had been removed. If reviewers don’t have information about race or gender available, then they can’t possibly assess the candidates on that basis, whether consciously or unconsciously. That seems straightforward enough. The aim was to compare the results from the blind assessments to those from the standard resumes. As it turned out, there were indeed hints of bias; sometimes relatively small in size, but present nonetheless. However, the bias did not go in the direction that had been feared.

Shocking that the headline wasn’t “Blind review processes are biased”

Specifically, when the participants assessing the resumes had information about gender, they were about 3% more likely to select women, and 3% less likely to select men. Further, minorities were more likely to be selected as well when the information was available (about 6% for males and 9% for females). While there’s more to the picture than that, the primary result seemed to be that, when given the option, these reviewers discriminated in favor of women and minority groups simply because of their group membership. If these results had run in the opposite direction (against women and minorities), there would no doubt have been calls for more blind reviews. However, because blind reviews seemed to disfavor women and minorities, the authors had a different suggestion:

Overall, the results indicate the need for caution when moving towards ‘blind’ recruitment processes in the Australian Public Service, as de-identification may frustrate efforts aimed at promoting diversity

It’s hard to interpret that statement as anything other than “we should hire more women and minorities, regardless of qualifications.” Even if sex and race ought to be irrelevant to the demands of the job and candidates should be assessed on their merit, people should apparently also be cautious about removing those irrelevant details from the application process. The authors seemed to favor discrimination based on sex or race so long as it benefited the right groups. Such discriminatory practices have, as one might expect, drawn negative reactions from others.

This brings me to another question: why should we value diversity when it comes to hiring decisions? To be clear, the diversity being sought is often strictly demographic in nature (many organizations tout diversity in race, for instance, but not in perspective. I don’t recall the draw of many positions being that you will meet a variety of people who fundamentally disagree with your view of the world). It’s also usually the kind of diversity that benefits women and minorities (I’ve never come across calls to get more white males into certain fields dominated by women or other races. Perhaps they exist; I just haven’t seen them). But are there real economic benefits to increasing diversity per se? Could it be the case that more diverse organizations just do better? On the face of it, I would assume the answer is “no” if the diversity in question is simply demographic in nature. What matters when it comes to job performance is not the color of one’s skin or which sex chromosomes one possesses, but rather the skills and competencies one brings along. While some of those skills and competencies might be very roughly approximated by race and gender if you have no additional information about your applicants, we thankfully don’t need to rely on those indirect measures. Rather than asking about gender or race, one could just ask directly about skill sets and interests. When you can do that, the additional value of knowing one’s group membership is likely close to nil. Why bother using a predictor of a variable when you can just use the variable itself?

Do you really love roundabouts that much?

Nevertheless, it has apparently been reported before that demographic diversity predicts the relative success of companies (Herring, 2009). A business case was made for diversity, such that diverse companies were found to generally do better than less diverse ones across a number of different metrics. Not that those in favor of increasing diversity really seemed to need a financial justification, but having one certainly wouldn’t hurt their case. As this paper was apparently popular within the literature (for what I assume is that reason), a replication was attempted (Stojmenovska et al., 2017), beginning in a graduate course as an assignment to help students “learn from the best.” Since “social science research” and “replications” seem to mix about as well as oil and water as of late, the results turned out a bit worse than hoped. The student wasn’t even looking for problems; they just stumbled upon them.

In this instance, the replication attempt failed to reproduce the published result, instead uncovering two primary mistakes in the original paper (errors, as opposed to anything malicious): there were a number of coding mistakes within the data, and the sample data itself was skewed. Without going too deeply into why this is a problem, it should suffice to say that coding mistakes are bad for all the obvious reasons. Fixing the coding mistakes by deleting missing data resulted in a substantial reduction in sample size (25–50% smaller). As for the issue of skew, a skewed sample can lead to underestimating the relationships among variables. In brief, the diversity measures were confounded with other variables tied to company performance, and those confounds were not adequately controlled for in the original paper. To correct for the skew, a log transformation was applied to the data, resulting in a dramatic increase in the relationships between particular variables.

To give a concrete sense of that increase: in the original report, the correlation between company size and racial diversity was .14; after the log transformation, that correlation increased to .41. This means that larger companies tended to be more racially diverse than smaller ones, but that relationship was not fully accounted for in the original paper examining how diversity impacted success. The same issue held for gender diversity and establishment size.

Once these two issues – coding errors and skewed data – were addressed, the new results showed that gender and racial diversity were effectively unrelated to company performance. The only remaining relationship was a small one between gender diversity and the logged number of customers. While seven of the original eight hypotheses were supported in the first paper, the replication attempt correcting these errors only found one of the eight to be statistically supported. As most of the effects no longer existed, and the one that did exist was small in size, the business justification for increasing racial and gender diversity failed to receive any real support.

Very colorful, but they ultimately all taste the same

As I initially mentioned, I don’t see a very good reason to expect that a more demographically diverse group of employees should yield better outcomes. They don’t seem to yield worse outcomes either. However, the study from Australia suggests that the benefits of diversity (or the lack thereof) are basically beside the point in many instances. That is, not only do I imagine this failure to replicate won’t have a substantial impact on many people’s views about whether diversity should be increased, but I don’t think it would even if diversity were found to be a bad thing, financially speaking. This is because I don’t suspect many views on increasing diversity rest on the foundation that it’s economically beneficial in the first place. Increasing diversity isn’t viewed as a tricky empirical matter so much as a moral one; one in which certain groups of people are viewed as owing or deserving various things.

This is only looking at the outcomes of adding diversity, of course. The causes of such diverse levels of diversity across different walks of life are another beast entirely.

References: Stojmenovska, D., Bol, T., & Leopold, T. (2017). Does diversity pay? Replication of Herring (2009). American Sociological Review, 82, 857–867.

Herring, C. (2009). Does diversity pay? Race, gender, and the business case for diversity. American Sociological Review, 74, 208–224.

If You Got It, Think Hard About Flaunting It

I’ve attended the Gay Pride Parade in New York on more than one occasion. The event itself holds a special significance for many people who have been close to me and I’m always happy to see them happy, even if parades normally aren’t my cup of tea. That said, I have found certain aspects of the event a little peculiar, at least with regard to its execution. I had this to say about it some years ago:

One could be left wondering what a straight pride parade would even look like anyway, and admittedly, I have no idea. Of course, if I didn’t already know what gay pride parades do look like, I don’t know why I would assume they would be populated with mostly naked men and rainbows, especially if the goal is fostering acceptance and rejection of bigotry. The two don’t seem to have any real connection, as evidenced by black civil rights activists not marching mostly naked for the rights afforded to whites, and suffragettes not holding any marches while clad in assless leather chaps.

Colorful exaggerations aside, there’s something very noteworthy to think about here. While it might seem normal for gay pride events to be rather flamboyant affairs, nothing about the event inherently requires displays of promiscuous sexuality. That is, if people were celebrating a straight, monogamous relationship style with a parade, I don’t think we’d see many people dressing down or, in some cases, going without clothing at all. I imagine the event would be substantially more modest, as, well, most other parts of life tend to be.

“From: Straight Pride Boat Ride, 2016”

The relevance of this point comes when one begins to consider what types of people in the world are most opposed to homosexual lifestyles and, accordingly, pose the largest obstacles to things like marriage and adoption rights for the gay community. When considering who those people are, the image that will no doubt spring to many minds is the conservative, religious type (likely because that would be the correct answer). But why are such people most likely to condemn homosexuality on a moral level? A tempting answer would be to make reference to some religious texts condemning homosexuality, but that’s a rather circular explanation: religious people condemn homosexuality because they believe in a doctrine that condemns homosexuality. It’s also not entirely complete, as many other parts of the doctrine are only selectively followed. We’re also left wondering why those doctrines condemned homosexuality in the first place, placing us back at square one.

A more detailed picture begins to emerge when you consider what predicts religiosity in the first place; what type of person is most drawn to such groups. As it turns out, one of the better predictors of who ends up associating themselves with religious groups and who does not is sexual strategy. Those who are more inclined to monogamy (or, more precisely, opposed to promiscuity) tend to be more religious, and this holds across cultures and religions. By contrast, religiosity is not well predicted by general cooperative morals or behavior. It would be remarkable if religions from all parts of the world happened to stumble upon a common distaste for promiscuity if that distaste were not somehow tied to religious belief itself. Something about sexual behavior is uniquely predictive of religiosity, which ought to seem strange when you consider that one’s sexual behavior should have little bearing on whether a deity (or several deities) exists. It has even been proposed that religious groups themselves function to support particular kinds of relatively monogamous mating arrangements. In that light, religious groups can be viewed as a support structure for monogamous couples who plan on having many children.

With that perspective in mind, the religious opposition to promiscuity becomes substantially clearer: promiscuity makes monogamous arrangements more difficult to sustain, and vice versa. If you plan on having a lot of children, men face risks of cuckoldry (raising a child that was unknowingly sired by another man) while women face risks of abandonment (if their husband runs off with another woman, leaving her to care for the children alone). As such, having lots of promiscuous men and women around who might lure your partner away or stop them from investing in you in the first place does the monogamous type no favors. In order to support their more monogamous lifestyle, then, these people begin to punish those who engage in promiscuous behaviors to make such strategies more costly to engage in and, accordingly, more rare.

The first punishment for promiscuity – spankings – didn’t have the intended effect

While homosexual individuals themselves don’t exactly pose direct risks to heterosexual, long-term mating couples, they may nevertheless be condemned to the extent that the gay community is viewed as promiscuous. There are a few possible reasons this might happen. Perhaps homosexuals are viewed as supporting and encouraging promiscuity, and letting that go unpunished would start other people down a path towards promiscuity (similar to how recreational drug use is also condemned by long-term maters). Perhaps all sorts of non-traditional sexual behavior are condemned by conservative groups, and homosexuality just ends up condemned as a byproduct. Whatever the explanation for this condemnation, however, a key prediction falls out of this framework: moral condemnation of homosexuality ought to increase to the extent that gay people are viewed as promiscuous and decrease to the extent that they are viewed as monogamous. As gay people (particularly men) are viewed as more promiscuous than their heterosexual counterparts (because they are, in every data set I’ve seen), this might help explain the condemnation and, in turn, suggest a way to do something about it.

This is exactly what a new paper by Pinsof & Haselton (2017) sought to test. The pair recruited approximately 1,000 participants online. The participants read either an article reporting that gay men had more partners than straight men, or an article reporting that gay and straight men had the same number of partners. Participants were also asked about their own perceptions of how promiscuous gay men are, their stance on gay rights, and their own mating orientation (whether they thought short-term sexual encounters were acceptable or not).

As expected, there was an appreciable relationship between one’s mating orientation and one’s support of gay rights: the more long-term their mating strategy, the less supportive of gay rights they were (r = -0.4). That said, despite men being more accepting of promiscuity than women, there was no relationship between gender and support for gay rights. Crucially, an interaction was observed between experimental condition and mating orientation when it came to predicting support for gay rights: Those who were particularly accepting of short-term mating arrangements opposed gay rights very little regardless of which article they had read regarding gay men’s sexual behavior (Ms = approximately 2.25 in both groups, on a scale from 1-7). However, among those who were relatively less accepting of short-term mating, there was a significant difference between the two conditions: when reading an article about how gay men were more promiscuous, opposition to gay rights was higher (M = 4.25) than it was in the condition where they read about how gay men were equally promiscuous (M = 3.5).
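The interaction described above amounts to a difference of differences. A minimal sketch using the approximate cell means quoted here (opposition to gay rights on a 1–7 scale); treating mating orientation as just two groups is a simplification for illustration, and the group labels are made up for this example:

```python
# Approximate means from the text; keys are (mating orientation, article read).
means = {
    ("short-term accepting", "more promiscuous"): 2.25,
    ("short-term accepting", "equally promiscuous"): 2.25,
    ("less short-term accepting", "more promiscuous"): 4.25,
    ("less short-term accepting", "equally promiscuous"): 3.5,
}

def article_effect(orientation: str) -> float:
    """Simple effect of reading the 'more promiscuous' article within one group."""
    return means[(orientation, "more promiscuous")] - means[(orientation, "equally promiscuous")]

effect_accepting = article_effect("short-term accepting")    # article makes no difference
effect_less = article_effect("less short-term accepting")    # article raises opposition
interaction = effect_less - effect_accepting                 # difference of differences

print(effect_accepting, effect_less, interaction)  # 0.0 0.75 0.75
```

A nonzero interaction like this is what allows the same article to matter for one group of readers but not the other.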

Acceptable

By manipulating perceptions of whether gay men were promiscuous, the researchers were also able to manipulate opposition to gay rights. So, if one is interested in achieving greater support for the homosexual community, that’s important information to bear in mind. It also brings me back to the initial point I mentioned about the Gay Pride events I have attended. While I was there, I couldn’t help but wonder whether the atmosphere of sexual promiscuity surrounding the parade would be off-putting to a substantial percentage of the population (even within the gay community), and it seems that intuition was borne out by the present data. At points, Gay Pride events go beyond a simple celebration and acceptance of homosexuality, as they are frequently coupled with displays of sexual promiscuity. It seems that many people might have less of a problem with the former if the latter weren’t tagging along.

Then again, perhaps promiscuity will simply be a bit more closely linked with the homosexual community in general, given that children do not result from such unions (making them less costly to engage in) and because heterosexual men are usually only as promiscuous as women allow them to be. If women were just as interested in casual sex as men, there would likely be a lot more casual sex going on. When men are attracted to other men, however, the barriers that usually hold promiscuity in check (children and women’s desires) are much weaker. That does raise the interesting question of whether a different pattern holds for lesbian relationships (which are less promiscuous than gay men’s), and it’s certainly one worth pursuing.

References: Pinsof, D., & Haselton, M. (2017). The effect of the promiscuity stereotype on opposition to gay rights. PLoS ONE, 12(7), e0178534. https://doi.org/10.1371/journal.pone.0178534

Not-So-Leaky Pipelines

There’s an interesting perspective many people take when trying to understand the distribution of jobs in the world, specifically with respect to men and women: they look at the percentage of men and women in a population (usually in terms of country-wide percentages, but sometimes more localized), make note of any deviations from those percentages in terms of representation in a job, and then use those deviations to suggest that certain desirable fields (but not usually undesirable ones) are biased against women. So, for instance, if women make up 50% of the population but only represent 30% of lawyers, there are some who would conclude this means the profession (and associated organizations) is likely biased against women, usually because of some implicit sexism (as evidence of explicit and systematic sexism in training or hiring practices is exceptionally hard to come by). Similar methods have been used when substituting race for gender as well.

Just another gap, no doubt caused by sexism

Most of the ostensible demonstrations of this sexism issue are wanting, and I’ve covered a number of these examples in previous posts. Simply put, there are a lot of factors in the world that determine where people ultimately end up working (or whether they’re working at all). Finding a consistent gap between groups tells you something is different, just not what. As such, you don’t just get to assume that the cause of the difference is sexism and call it a day. My go-to example in that regard has long been plumbing. As a profession, it is almost entirely male dominated: something like 99% of the plumbers in the US are men. That’s as large a gender gap as you could ask for, yet I have never once seen a campaign to get more women into plumbing or complaints about sexism in the profession keeping otherwise-interested women out. Similarly, men make up about 96% of the people shot by police, but the focus on police violence has never been on getting officers to shoot fewer men per se. In those cases, most people seem to recognize that factors other than sex are the primary determinants of the observed sex differences. Correlation isn’t causation: maybe women simply aren’t as interested in digging around through human waste or committing violent felonies as men are. That’s not to say that many men are interested, just that more of those who are end up being men.

If that is the case and these sex differences aren’t caused by sexism, any efforts to “fix” the gap by focusing on sexism would ultimately be unsuccessful. At the risk of saying something too obvious, you change outcomes by changing their causes, not unrelated issues. If we have the wrong idea about what is causing an outcome, we end up wasting time and money (which often does not belong to us) trying to change it and accomplishing very little in the process (outside of annoying the people whose time and money we wasted).

Today I wanted to add to that pile of questionable claims of sexism concerning an academic neighbor to psychology: philosophy. Though I was unaware of this debate, there is apparently some contention within the field concerning the perceived under-representation of women. As is typical, the apparent under-representation of women in this field has been chalked up to sexist biases keeping women discouraged and out of a job. To be clear about things, some people are looking at the percentage of men and women in the field of philosophy, noting that it differs from their expectations (whatever those are and however they were derived), calling it under-representation because of those expectations, and then further assuming a culprit in the form of sexism. As it turns out, the data has something to say about that.

It also has some great jokes about Polish people if you’re a racist.

The data in question come from a paper by Allen-Hermanson (2017), which examined sex differences in tenure-track hiring and academic publishing in philosophy departments. The reasoning behind this line of research was that if insidious forces are at work against women in philosophy departments, we ought to expect something of a leaky pipeline: women should not be as successful as men at landing desirable, tenure-track jobs, relative to the rates at which each sex earns philosophy degrees. So, if women earned, say, 40% of the philosophy PhDs last year, we might expect them to get 40% of the tenure-track jobs the next, all else being equal. Across the 10-year period examined (2005–2014), there were three years in which women were hired very slightly below their relative percentage into tenure-track jobs (and by “very slightly” I mean in the range of about 1–2%), one year in which it was dead even, and six years in which women were hired above the expected rate by much more substantial margins (in the range of 5–10%).

Putting some rough numbers to that, women earned about 28% of the PhDs and received about 36% of the jobs in the most recent hiring seasons. It seems, then, that women tended to be over-represented in those positions, on average. Other data discussed in the paper correspond to those findings, again suggesting that women had about a 25% advantage over men in landing desirable positions (for less desirable positions, men and women were hired in about equal numbers).

This finding is made all the stranger by Allen-Hermanson (2017) noting that male and female degree holders differed with respect to how often they published. On average, new female tenure-track candidates who had never held such a position before had 0.77 publications; the comparable male number was 1.37. Of those who secured a job in 2012–2013, men averaged 2.4 publications to women’s 1.17. Not only are the men publishing about twice as much, then, but they’re also modestly less successful at landing a job (and this effect did not appear to be driven by particularly prolific publishers). While one could possibly make the case that female publications are in some sense higher quality, that remains to be seen. One could more easily make the case that female candidates were held to lower standards than male ones.
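The rough figures in the last two paragraphs reduce to a few ratios. A minimal sketch: `representation_ratio` is a hypothetical helper (not from the paper) that divides the share of hires by the share expected from the pipeline, so 1.0 means parity. The roughly 29% over-representation it yields from the 28%/36% figures is a separate back-of-the-envelope calculation, not the roughly 25% advantage the paper reports from other data.

```python
def representation_ratio(share_of_hires: float, share_of_phds: float) -> float:
    """Observed share of hires divided by the share expected from PhD completions.

    1.0 means hiring matches the pipeline; values above 1.0 mean over-representation.
    """
    return share_of_hires / share_of_phds

# Rough figures quoted above for recent hiring seasons.
women_ratio = representation_ratio(0.36, 0.28)

# Average publications, male divided by female, for the two comparisons quoted above.
pubs_new_candidates = 1.37 / 0.77  # first-time tenure-track candidates
pubs_hired_2012_13 = 2.4 / 1.17    # those who secured a job in 2012-2013

print(f"{women_ratio:.2f} {pubs_new_candidates:.2f} {pubs_hired_2012_13:.2f}")  # 1.29 1.78 2.05
```

So the men’s publication counts run roughly 1.8 to 2 times the women’s, depending on which comparison you use.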

As the data currently stand, I can’t imagine many people will be making a fuss about them and crying sexism. Perhaps the men with the degrees went out to seek work elsewhere and that explains why women are over-represented. Perhaps there are other causes. The world is a complicated place, after all. The point here is that there won’t be talk about how philosophy departments are biased against men, just as there wasn’t much talk that I saw the last time research found a much larger academic bias in favor of women, holding candidate quality constant. I think that is largely because the data apparently favor women with respect to hiring. If the results had run in the opposite direction, I can imagine that a lot more noise would have been made about them and many people would be getting scolded right now about their tolerance of sexism. But that’s just an intuition.

“Now, if you’ll excuse me, I’m off to find bias against my group somewhere else”

When asking a question of under-representation, the most pressing matter should always be, “under-represented with respect to what expectation?” In order to say that a group is under-represented, you need to make it clear what the expected degree of representation is as well as why. We shouldn’t expect men and women to be killed by police in equal numbers unless we also expect both groups to behave more-or-less identically. We similarly shouldn’t expect men and women to enter certain fields in the same proportion unless they have identical sets of interests. Moreover, if the two groups differ with respect to some key factor that determines an outcome, such as interests, using sex itself is just a poor variable choice. Compared to interest in fixing toilets (and other such relevant factors), I imagine sex itself uniquely predicts very little about who ultimately ends up becoming a plumber. If we can use those better, more directly-relevant factors, we should. You don’t build your predictive model with irrelevant factors; not if accuracy is your goal, in any case.

References: Allen-Hermanson, S. (2017). Leaking pipeline myths: In search of gender effects on the job market and early career publishing in philosophy. Frontiers in Psychology, 8. https://doi.org/10.3389/fpsyg.2017.00953

Understanding Sex In Advertising

When people post videos on YouTube, one major point of interest for content creators and aggregators is to capture as much attention as possible. Your video is adrift in a sea of information and you’re trying to get as many eyes/clicks on your work as possible. In that realm, first impressions are all-important: you want your video to have an attention-grabbing thumbnail image, as that will likely be the only thing viewers see before they decide whether or not to click on it. So how do people go about capturing attention in that realm? One popular method is to ensure their thumbnail has a very emotive expression on it; a face of shock, embarrassment, stress, or any similar emotion. That’s certainly one way of attracting attention: trying to convince people there is something worth looking at, not unlike articles titled along the lines of “Five shocking tips for a better sex life (and number 3 will blow your mind!).” Speaking of sex, that’s another popular method of grabbing attention: it’s fairly common for video thumbnails to feature people or body parts in various stages of undress. Not much will pull eyes towards a video like the promise of sex (and if you felt an urge to click on a title like that one, you’ve experienced exactly what I’m talking about).

Case in point: most of that content is unrelated to the featured women

If sex happens to be attention-grabbing, the natural question concerns what you might do with that attention once you have it. Much of the time, the answer will involve selling some good or service. In other words, sex is used as a form of advertising to try to sell things. “If you enjoyed that picture of a woman wearing a thong, you’ll surely love our reasonably priced laptops!” Something along those lines, anyway. Provided that’s your goal, lots of questions naturally start to crop up: How effective is sex at these goals? Does it capture attention well? Does it help people notice and remember your product or brand? Are those who viewed your sexy advert more likely to buy the product you’re selling? How do other factors, such as the sex of the person viewing the ad, contribute to your success in these realms?

These are some of the questions examined in a recent meta-analysis by Wirtz, Sparks, & Zimbres (2017). The researchers searched the literature and found about 80 studies, representing about 18,000 participants. They sought to find out what effects featuring sexually provocative material had, on average. Sexual content was defined in terms of style of dress, sexual behavior, innuendo, or sexual embeds (hidden messages or images placed within the ad, like the word “sex” added somewhere to the picture, which is something people apparently think is a good idea sometimes). To be included in the analysis, a study had to compare the sexual ad against a comparable, non-sexual ad for the same product, so that the two could be compared for effectiveness.

The effectiveness of these ads was assessed across a number of domains as well, including ad recognition (in aided and unaided contexts), whether the brand being advertised in the ad could be recalled (i.e., were people paying attention to just the sex, or did they remember the product?), the positive or negative response people had to the ad, what people thought about the brand being advertised with sex, and whether the ad actually got them interested in purchasing the product (does sex sell?).

Finally, a number of potentially moderating factors were considered. The first of these was gender: did these ads have different impacts on men and women? Other factors included the gender of the model used in the advertisement, the date the article was published (to see if attitudes shifted over time), the sample used (college students or not), and – most interestingly – product/ad congruity: did the type of product being advertised matter when it came to whether sex was effective? Perhaps sex might help sell a product like suntan lotion (as the beach might be a good place to pick up mates), but be much less effective for selling, say, laptops.

Maybe even political views

In terms of capturing attention, sex works. Across the 20 effects looking at recall for ads, the average size was d = .38. Interestingly, this effect was slightly larger for the congruent ads (d = .45), but completely reversed for the incongruent ones (d = -.45). Sex was good at getting people to remember ads selling a sex-related product, but it wasn’t useful across the board. It was also better at getting people to remember the ads themselves than what they were selling: when the researchers turned to whether the brands within the ads were more likely to be recalled, the 31 effects looking at brand recall barely broke zero (d = .09). While sex might be attention-grabbing, it didn’t seem especially good at getting people to remember the objects being sold.
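For readers unfamiliar with the d values used throughout: Cohen’s d expresses a difference between two means in standard-deviation units, so d = .38 means recall for the sexual ads averaged about 0.38 standard deviations higher than for the non-sexual ones. A minimal sketch of the common pooled-SD version for a single comparison, using hypothetical scores invented for illustration (a meta-analysis additionally weights and combines many such estimates, which this does not attempt):

```python
import math
from statistics import mean, variance  # variance() is the sample (n - 1) variance

def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Standardized mean difference between two groups, using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical recall scores, not data from the meta-analysis.
sexual_ad = [5, 6, 7, 6, 6]
control_ad = [5, 5, 6, 5, 4]

print(round(cohens_d(sexual_ad, control_ad), 2))  # 1.41
```

Here the means differ by 1 point and the pooled SD is about 0.71, giving a deliberately large d; the effects in the meta-analysis are far smaller.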

Regarding people’s attitudes towards the ads, sex seems like something of a wash (d = -.07). Digging a little deeper revealed a more nuanced picture of these reactions, though: while sexual ads seemed to be a modest hit with men (d = .27), they had the opposite effect on women (d = -.38). Women seemed to dislike the ads modestly more than men liked them, as sexual strategies theory would suggest. (For the record, the type of model depicted didn’t make much of a difference: people liked male models the least (d = -.28), then female models (d = -.20), while couples were mildly positive (d = .08).)

Curiously, men and women seemed to be in agreement regarding their stance towards brands that used sex to sell things: negative, on the whole (d = -.22). For women, this makes some intuitive sense: they didn’t seem to be fans of the sexual ads, so they weren’t exactly feeling ingratiated towards the brand itself. But why were the men negatively inclined towards the brand if they were favorably inclined towards the ads? I can only speculate on that front, but I assume it would have something to do with their inevitable disappointment: either the brands were making promises about sex that male customers likely knew the products couldn’t keep, or perhaps the men simply wanted to enjoy the sex part and the brand itself ended up getting in their way. I can’t imagine men would be too happy with their porn time being interrupted by an ad for toilet paper or fruit snacks mid-video.

Finally, turning to the matter of purchase intentions – whether the ads made people want to buy the product or not – it seemed that sex didn’t really sell, but it didn’t really seem to hurt, either (d = .01). One interesting exception was that sex appeals were actually less likely to get people to buy a product when the product being sold was incongruent with the sexual appeal (d = -.24). To put that in a simple example, the phrase “strip club buffet” probably doesn’t whet many appetites, and wouldn’t be a strong selling point for such a venue. Sex can be something of a disease vector, and associating your food with it might elicit more than a bit of disgust.

“Oh good, I was starving. This seems like as good a place as any”

As I’ve noted before, context matching matters in advertising. If you’re looking to sell people something that highlights their individuality, then doing so in a mating context works better than in a context of fear (as animals aren’t exactly aiming to look distinct when predators are nearby). The same seems to hold for using sex. While it might be useful for getting eyes on your advertisement, sex is by no means guaranteed to ensure that people like what they see once you have their attention. In that regard, sex – like any other advertising tool – needs to be used selectively, targeting the correct audience in the correct context if it’s going to succeed at increasing people’s interest in buying. Sex in general doesn’t sell. However, it might prove more effective for those with more promiscuous attitudes than for those with more monogamous ones; it might prove useful if advertising a product related to sex or mating, but not for selling domain names (like the old GoDaddy commercials; coincidentally, GoDaddy was also the brand I used to register this site); and it might work better if you associate your product with things that lead to sex (like status), rather than sex itself. These are all avenues worth pursuing further to see when, where, and why sex works or fails.

That said, it is still possible that sex might prove useful, even in some inappropriate contexts. Consider the following hypothetical example: people will consider buying a product only after they have seen an advertisement for it. Advertisement X isn’t sexual but, when paired with the product, will increase people’s intentions to buy it by 10%. However, it also won’t get noticed by many people, as the content is bland. By contrast, advertisement Y is sexual and will decrease people’s intentions to buy the product by 10%, but it will also get four times as many eyes on it. The latter ad might well be more successful, as it will capture the attention of more potential customers who may still buy the product despite the inappropriate use of sex. While targeted advertising might be more effective, the attention model of advertising shouldn’t be ruled out entirely, especially if targeting would prove too cumbersome.
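That hypothetical can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that ad X reaches 1,000 viewers, that a viewer's baseline chance of buying is 5%, and that a 10% change in intentions translates directly into a 10% change in purchase probability; none of these numbers come from the meta-analysis.

```python
# Toy comparison of two hypothetical ads (all numbers are illustrative assumptions).

def expected_buyers(viewers: int, baseline_rate: float, intention_change: float) -> float:
    """Expected purchases: viewers x baseline purchase rate x (1 + change in intention)."""
    return viewers * baseline_rate * (1 + intention_change)

BASELINE = 0.05   # assumed chance that a viewer buys after seeing a neutral ad
VIEWS_X = 1_000   # assumed audience for the bland, non-sexual ad X

buyers_x = expected_buyers(VIEWS_X, BASELINE, +0.10)      # +10% intention, normal reach
buyers_y = expected_buyers(4 * VIEWS_X, BASELINE, -0.10)  # -10% intention, four times the reach

print(f"Ad X: {buyers_x:.0f} expected buyers")
print(f"Ad Y: {buyers_y:.0f} expected buyers")
print(f"Y/X ratio: {buyers_y / buyers_x:.2f}")
```

Because the audience size and baseline rate cancel out of the comparison, the ratio (4 × 0.9 / 1.1, or roughly 3.3) holds whatever values are assumed for them; what matters is how many extra viewers the attention-grabbing ad brings in relative to how much it depresses intentions.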

References: Wirtz, J., Sparks, J., & Zimbres, T. (2017). The effect of exposure to sexual appeals in advertisements on memory, attitude, and purchase intention: A meta-analytic review. International Journal of Advertising, https://doi.org/10.1080/02650487.2017.1334996

 

Divorced Dads And Their Daughters

Despite common assumptions, parents have less of an impact on their children’s future development than they’re often credited with. Twins reared apart usually aren’t much different than twins reared together, and adopted children don’t end up resembling their adoptive parents substantially more than strangers. While parents can indeed affect their children’s happiness profoundly, a healthy (and convincing) literature exists supporting the hypothesis that differences in parenting behaviors don’t do a whole lot of shaping in terms of children’s later personalities (at least when the child isn’t around the parent; Harris, 2009). This makes a good deal of theoretical sense, as children aren’t developing to be better children; they’re developing to become adults in their own right. What children learn works when it comes to interacting with their parents might not readily translate to the outside world. If you assume your boss will treat you the same way your parents would, you’re likely in for some unpleasant clashes with reality. 

“Who’s a good branch manager? That’s right! You are!”

Not that this has stopped researchers from seeking to find ways that parent-child interactions might shape children’s future personalities, mind you. Indeed, I came upon a very new paper purporting to do just that this last week. It suggested that the quality of a father’s investment in his daughters causes shifts in their willingness to engage in risky sexual behavior (DelPriore, Schlomer, & Ellis, 2017). The analysis in the paper is admittedly a bit tough to follow, as the authors examine three- and even four-way interactions (which are difficult to keep straight in one’s mind: the importance of variable A changes contingent on the interaction between B, C, & D), so I don’t want to delve too deeply into the specific details. Instead, I want to discuss the broader themes and design of the paper.

Previous research looking at parenting effects on children’s development often suffers from the problem of relatedness, as genetic similarities between parents and children make it hard to tease apart the unique effects of parenting behaviors (how the parents treat their children) from natural resemblances (nice parents have nice children). In a simple example, parents who love and nurture their children tend to have children who grow up kinder and nicer, while parents who neglect their children tend to have children who grow up to be mean. However, it seems likely that parents who care for their children are different in some important regards than those who neglect them, and those tendencies are perfectly capable of being passed on through shared genes. So are the nice kids nice because of how their parents treated them or because of inheritance? The adoption studies I mentioned previously tend to support the latter interpretation. When you control for genetic factors, parenting effects tend to drop out.

What’s good about the present research is its innovative design to try and circumvent this issue of genetic similarities between children and parents. To accomplish this goal, the authors examined (among other things) how divorce might affect the development of different daughters within the same family. The reasoning for doing so seems to go roughly as follows: daughters should base their sexual developmental trajectory, in part, on the extent of paternal investment they’re exposed to during their early years. When daughters are regularly exposed to fathers that invest in them and monitor their behavior, they should come to expect that subsequent male parental investment will be forthcoming in future relationships and avoid peers who engage in risky sexual behavior. The net result is that such daughters will engage in less risky sexual behavior themselves. By contrast, when daughters lack proper exposure to an investing father, or have one who does not monitor their peer behavior as tightly (due to divorce), they should come to view future male investment as unlikely, associate with those who engage in riskier sexual behavior, and engage in such behavior themselves.

Accordingly, if a family with two daughters experiences a divorce, the younger daughter’s development might be affected differently than the older daughter’s, as they have different levels of exposure to their father’s investment. The larger this age gap between the daughters, the larger this effect should be. After recruiting 42 sister pairs from intact families and 59 sister pairs from divorced families and asking them some retrospective questions about what their life was like growing up, this is basically the result the authors found. Younger daughters tended to receive less monitoring than older daughters in families of divorce and, accordingly, tended to associate with more sexually-risky peers and engage in such behaviors themselves. This effect was not present in biologically intact families. Do we finally have some convincing evidence of parenting behaviors shaping children’s personalities outside the home?

Look at this data and tell me the first thing that comes to your mind

I don’t think so. The first concern I would raise regarding this research is the monitoring measure utilized. Monitoring, in this instance, represented a composite score of how much information the daughters reported their parents had about their lives (rated from (1) didn’t know anything, (2) knew a little, or (3) knew a lot) in five domains: who their friends were, how they spent their money, where they spent their time after school, where they were at night, and how they spent their free time. While one might conceptualize that as monitoring (i.e., parents taking an active interest in their children’s lives and seeking to learn about/control what they do), it seems that one could just as easily think of that measure as how often children independently shared information with their parents. After all, the measure doesn’t specify, “how often did your parents try to learn about your life and keep track of your behavior?” It just asked about how much they knew.

To put that point concretely, my close friends might know quite a bit about what I do, where I go, and so on, but it’s not because they’re actively monitoring me; it’s because I tell them about my day voluntarily. So, rather than talking about how a father’s monitoring of his daughter might have a causal effect on her sexual behavior, we could just as easily talk about how daughters who engage in risky behavior prefer not to tell their parents about what they’re doing, especially if their personal relationship is already strained by divorce.
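One way to see the problem is to write out a composite of this kind. The sketch below is hypothetical: it assumes the five ratings are simply averaged (the paper may score them differently), and the two respondents and the `how_parents_knew` field are invented for illustration. Because the items ask only how much parents knew, the score has no input for how they came to know it, so active monitoring and voluntary disclosure come out identical.

```python
# Hypothetical sketch of a parental-knowledge ("monitoring") composite.
# Assumption: the five 1-3 ratings are averaged; the paper's exact scoring may differ.
from statistics import mean

DOMAINS = ("friends", "money", "after_school", "night", "free_time")
VALID = {1, 2, 3}  # 1 = didn't know anything, 2 = knew a little, 3 = knew a lot

def monitoring_score(ratings: dict[str, int]) -> float:
    """Average the five ratings; nothing here records HOW the parents knew."""
    for domain in DOMAINS:
        if ratings[domain] not in VALID:
            raise ValueError(f"rating for {domain!r} must be 1, 2, or 3")
    return mean(ratings[d] for d in DOMAINS)

same_answers = {"friends": 3, "money": 2, "after_school": 3, "night": 3, "free_time": 2}

# Two hypothetical daughters: same answers, different routes to parental knowledge.
watched_closely = {"ratings": same_answers, "how_parents_knew": "father actively checked up"}
told_voluntarily = {"ratings": same_answers, "how_parents_knew": "daughter shared unprompted"}

score_watched = monitoring_score(watched_closely["ratings"])
score_told = monitoring_score(told_voluntarily["ratings"])
print(score_watched, score_told, score_watched == score_told)
```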

My second concern involves divorce itself. Divorce can indeed affect children’s personal relationships with their parents. However, that’s not the only thing that happens after a divorce; there are other effects that extend beyond emotional closeness. An important example is financial. If a father has been working while the mother took care of the children – or if both parents were working – divorce can result in massive financial hits for the children (as most end up living with their mother or in a joint custody arrangement). Adding economic problems to an already emotionally upsetting divorce can produce not only additional resentment between children and parents (and, accordingly, less sharing of information between them; the reduced monitoring), but also major alterations to the children’s living conditions. These lifestyle shifts could include moving to a new home, upsetting existing peer relations, entering new social groups, and presenting children with new logistical problems to solve.

Any observed changes in a daughter’s sexual behavior in the years following a divorce, then, can be thought of as a composite of all the changes that take place post-divorce. While the quality and amount of the father-daughter relationship might indeed change during that time, there are additional and important factors that aren’t controlled for in the present paper.

Too bad the house didn’t split down the middle as nicely

The final concern I wanted to discuss was more of a theoretical one, and it’s slightly larger than the methodological points above. According to the theory proposed at the beginning of the paper:

“…the quality of fathering that daughters receive provides information about the availability and reliability of male investment in the local ecology, which girls use to calibrate their mating behavior and expectations for long-term investment from future mates.”

This strikes me as a questionable foundation for a few reasons. First, it would require that the relationship between a daughter’s parents be substantially predictive of the relationships she is likely to encounter in the world with regard to male investment. In other words, if your father didn’t invest heavily in your mother (or you), or at least not during your childhood, that would need to mean that many other potential fathers are likely to do the same to you (if you’re a girl). This would further require that male investment be appreciably uniform across time in the world. If male investment weren’t stable between males and across time within a given male, then trying to predict the general availability of future male investment from your father’s seems like a losing formula for accuracy.

It seems unlikely the world is that stable. For similar reasons, I suggested that children probably can’t accurately gauge future food availability from their access to food at a young age. Making matters even worse in this regard is that, unlike food shortages, the presence or absence of male parental investment doesn’t seem like the kind of thing that will be relatively universal. Some men in a local environment might be perfectly willing to invest heavily in women while others are not. But that’s only considering the broad level: men who are willing to invest in general might be unwilling to invest in a particular woman, or might be willing or unwilling to invest in that woman at different stages in her life, contingent on her mate value shifting with age. Any kind of general predictive power that could be derived about men in a local ecology seems weak indeed, especially if you are basing that decision off a single relationship: the one between your parents. In short, if you want to know what men in your environment are generally like, one relationship should be as informative as another. There doesn’t seem to be a good reason to assume your parents will be particularly informative.
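The statistical side of this argument can be illustrated with a toy model (my framing, not the paper's). Suppose each local ecology has some average level of male investment, and individual men vary around that average. If the variation among men within an ecology is large relative to the variation between ecologies, a single man's behavior, such as your father's, correlates only weakly with the ecology's average. The two standard deviations below are arbitrary assumptions chosen for illustration.

```python
# Toy model: how well does one man's investment predict his ecology's average?
# Assumptions (illustrative only): ecology means ~ Normal(0, SD_BETWEEN),
# individual men ~ Normal(ecology mean, SD_WITHIN).
import math
import random

SD_BETWEEN = 1.0  # assumed spread of average male investment across ecologies
SD_WITHIN = 3.0   # assumed spread of investment among men within one ecology

def expected_r(sd_between: float, sd_within: float, n_men: int = 1) -> float:
    """Correlation between the average of n sampled men and the true ecology mean."""
    return math.sqrt(sd_between**2 / (sd_between**2 + sd_within**2 / n_men))

def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

rng = random.Random(42)
ecology_means = [rng.gauss(0, SD_BETWEEN) for _ in range(20_000)]
one_father = [rng.gauss(m, SD_WITHIN) for m in ecology_means]  # one man sampled per ecology

simulated = pearson(one_father, ecology_means)
print(f"simulated r (one man): {simulated:.2f}")
print(f"expected r (one man):  {expected_r(SD_BETWEEN, SD_WITHIN):.2f}")
print(f"expected r (ten men):  {expected_r(SD_BETWEEN, SD_WITHIN, 10):.2f}")
```

Under these assumptions, any one man is about as informative as any other, and none is very informative on his own; the estimate only sharpens by averaging over many men, which is the opposite of leaning on a single parental relationship.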

Matters get even worse for the predictive power of father-daughter relationships when one realizes the contradiction between that theory and the predictions of the authors. The point can be made crystal clear simply by considering the families examined in this very study. The sample of interest was made up of daughters from the same family who had different levels of exposure to paternal investment. That ought to mean, if I’m following the predictions properly, that the daughters – the older and the younger one – should develop different expectations about future paternal investment in their local ecology. Strangely, however, these expectations would have been derived from the same father’s behavior. This is a problem because both daughters cannot be right about the general willingness of males to invest if they hold different expectations. If the older daughter with more years of exposure to her father comes to believe male investment will be available and the younger daughter with fewer years of exposure comes to believe it will be unavailable, these are opposing expectations of the world.

However, if those different expectations are derived from the same father, that alone should cast doubt on the ability of a single parental relationship to predict broad trends about the world. It doesn’t even seem to be right within families, let alone between them (and it’s probably worth mentioning at this point that, if children are going to be right about the quality of male investment in their local ecology more generally, all the children in the same area should develop similar expectations, regardless of their parents’ behavior. It would be strange for literal neighbors to develop different expectations of general male behavior in their local environment just because the parents of one home got divorced while the other stayed together. Then again, it should be strange for daughters of the same home to develop different expectations, too).

Unless different ecologies have rather sharp borders

On both a methodological and theoretical level, then, there are some major concerns with this paper that render its interpretation suspect. Indeed, at the heart of the paper is a large contradiction: if you’re going to predict that two girls from the same family develop substantially different expectations about the wider world from the same father, then it seems impossible that the data from that father is very predictive of the world. In any case, the world doesn’t seem as stable as it would need to be for that single data point to be terribly useful. There ought not be anything special about the relationship of your parents (relative to other parents) if you’re looking to learn something about the world in general.

While I fully expect that children’s lives following their parents’ divorce will be different – and those differences can affect development, depending on when they occur – I’m not so sure that the personal relationship between fathers and daughters is the causal variable of primary interest.

References: DelPriore, D., Schlomer, G., & Ellis, B. (2017). Impact of Fathers on Parental Monitoring of Daughters and Their Affiliation With Sexually Promiscuous Peers: A Genetically and Environmentally Controlled Sibling Study. Developmental Psychology. Advance online publication. http://dx.doi.org/10.1037/dev0000327

Harris, J. (2009). The Nurture Assumption: Why Children Turn Out the Way They Do. Free Press, NY.

Why Do So Many Humans Need Glasses?

When I was very young, I was given an assignment in school to write a report on the Peregrine Falcon. One interesting fact about this bird happens to be that it’s quite fast: when the bird spots prey (sometimes from over a mile away), it can enter into a high-altitude dive, reaching speeds in excess of 200 mph, and snatch its prey out of midair. The Peregrine would be much less capable of achieving these tasks – both the location and capture of prey – if its vision were not particularly acute: failures of eyesight can result in not spotting the prey in the first place, or failing to capture it if distances and movements aren’t properly tracked. For this reason I suspect (though I am not positive) that you’ll find very few Peregrines that have bad vision: their survival depends very heavily on seeing well. These birds would probably not be in need of corrective lenses, like the glasses and contacts that humans regularly rely upon in modern environments. This raises a rather interesting question: why do so many humans wear glasses?

And why does this human wear so many glasses?

What I’m referring to in this case is not the general degradation of vision with age. As organisms age, all their biological systems should be expected to break down and fail with increasing regularity, and eyes are no exception. Crucially, these systems should all be expected to break down more or less at the same time. This is because there’s little point in a body investing loads of metabolic resources into maintaining a completely healthy heart that will last for 100 years if the liver is going to shut down at 60. The whole body will die if the liver does, healthy heart (or eyes) included, so it would be adaptive to allocate those developmental resources differently. The mystery posed by frequently poor human eyesight is appreciably different, as poor vision can develop early in life, often before puberty. When you observe apparently maladaptive development that early in life, it requires another type of explanation.

So what might explain why human visual acuity appears so lackluster early in life (to the tune of over 20% of teenagers using corrective lenses)? There are a number of possible explanations we might entertain. The first of these is that visual acuity hasn’t been terribly important to human populations for some time, meaning that having poor eyesight did not have an appreciable impact on people’s ability to survive and reproduce. This strikes me as a rather implausible hypothesis on the face of it not only because vision seems rather important for navigating the world, but also because it ought to predict that having poor vision should be something of a species universal. While 20% of young people using corrective lenses is a lot, eyes (and the associated brain regions dedicated to vision) are costly organs to grow and maintain. If they truly weren’t that important to have around, then we might expect that everyone needs glasses to see better; not just pockets of the population. Humans don’t seem to resemble the troglobites that have lost their vision after living in caves away from sunlight for many generations.

Another possibility is that visual acuity has been important – it’s adaptive to have good vision – but people’s eyes sometimes fail to develop properly because of developmental insults, like infectious organisms. While this isn’t implausible in principle – infectious agents have been known to disrupt development and result in blindness, deafness, and even death on the extreme end – the sheer number of people who need corrective lenses seems a bit high to be caused by some kind of infection. Further, the number of younger children and adults who need glasses appears to have been rising over time, which would be strange given that medical knowledge and technologies have been steadily improving. If the need for glasses were caused by some kind of infectious agent, we would need to have been unaware of its existence and never to have accidentally treated it with antibiotics or similar medications. We might also expect glasses to be associated with other signs of developmental stress, like bodily asymmetries, low IQ, or other such outcomes: if your immune system didn’t fight off the bugs that harmed your eyes, it might not be good enough to fight off other development-disrupting infections. However, there seems to be a positive correlation between myopia and intelligence, which would be strange under a disease hypothesis.

The negative correlation with fashion sense begs for explanation, too

A third possible explanation is that visual acuity is indeed important for humans, but our technologies have been relaxing the selection pressures that kept it sharp. In other words, once humans invented glasses and gave those who cannot see as well a crutch to overcome this issue, any reproductive disadvantage associated with poor vision was effectively removed. It’s an interesting hypothesis, and it should predict that people’s eyesight in a population begins to get worse following the invention and/or proliferation of corrective lenses. So, if glasses were invented in Italy around 1300, that should have led to the Italian population’s eyesight growing worse, followed by the eyesight of other cultures to which glasses spread, but not beforehand. I don’t know much about the history of vision across time in different cultures, but something tells me that pattern wouldn’t show up if it could be assessed. In no small part, that intuition is driven by the relatively brief window of historical time between the invention of glasses and today, during which they also had to be refined, produced in sufficient numbers, and distributed globally. A window of only about 700 years for all of that to happen and reduce selection pressures for vision isn’t a lot of time. Further, there seems to be evidence that myopia can develop rather rapidly in a population, sometimes as quickly as within a generation:

One of the clearest signs came from a 1969 study of Inuit people on the northern tip of Alaska whose lifestyle was changing. Of adults who had grown up in isolated communities, only 2 of 131 had myopic eyes. But more than half of their children and grandchildren had the condition.

That’s much too fast for a relaxation of selection pressures to be responsible for the change.

This brings us to the final hypothesis I wanted to cover today: an evolutionary mismatch hypothesis. In the event that modern environments differ in some key ways from the typical environments humans have faced ancestrally, it is possible that people will develop along an atypical path. In this case, the body is (metaphorically) expecting certain inputs during its development, and if they aren’t received things can go poorly. For instance, it has been suggested that people develop allergies, in part, as a result of improved hygiene: our immune systems are expecting a certain level of pathogen threat which, when not present, can result in our immune systems attacking inappropriate targets, like pollen.

There does seem to be some promising evidence on this front for understanding human vision issues. A paper by Rose et al. (2008) reports on myopia in two samples of similarly aged Chinese children: 628 children living in Singapore and 124 living in Sydney. Of those living in Singapore, 29% appeared to display myopia, relative to only 3% of those living in Sydney. These dramatic differences in rates of myopia are all the stranger when you consider that the rates of myopia in their parents were quite comparable. For the Sydney/Singapore samples, respectively, 32/29% of the children had no parent with myopia, 43/43% had one parent with myopia, and 25/28% had two parents with myopia. If myopia were simply the result of inherited genetic mutations, its frequencies between countries shouldn’t be as different as they are, disqualifying hypotheses one and three from above.
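The logic of that comparison can be checked with a bit of arithmetic. Suppose, as a deliberately extreme hypothetical, that a child's risk of myopia depended only on how many parents had it, with the same three risks applying in both cities. The expected prevalence in each sample is then a weighted average of those risks, weighted by the parental percentages reported above. The gap between the samples works out to 0.03 × (risk with two myopic parents − risk with none), which can never exceed 3 percentage points. The per-group risks in the sketch are invented; only the parental percentages and the 29%/3% figures come from Rose et al. (2008).

```python
# Could parental myopia alone explain the Sydney/Singapore gap?
# Parental distributions are from the text; the per-group risks tried below are hypothetical.
from itertools import product

# Share of children with 0, 1, or 2 myopic parents
SYDNEY = (0.32, 0.43, 0.25)
SINGAPORE = (0.29, 0.43, 0.28)
OBSERVED_GAP = 0.29 - 0.03  # 29% myopia in Singapore vs 3% in Sydney

def expected_prevalence(parent_mix, risks):
    """Weighted average of per-group risks, weighted by the parental distribution."""
    return sum(share * risk for share, risk in zip(parent_mix, risks))

# Try every combination of per-group risks from 0% to 100% in 5-point steps.
steps = [i / 20 for i in range(21)]
largest_gap = max(
    expected_prevalence(SINGAPORE, risks) - expected_prevalence(SYDNEY, risks)
    for risks in product(steps, repeat=3)
)

print(f"largest gap inheritance alone could produce: {largest_gap:.3f}")
print(f"observed gap: {OBSERVED_GAP:.2f}")
```

Even under the most favorable risks (0% with no myopic parents, 100% with two), the parental mix accounts for a 3-point gap against an observed gap of 26 points.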

When examining what behavioral correlates of myopia existed between countries, several were statistically – but not practically – significant, including number of books read and hours spent on computers or watching TV. The only appreciable behavioral difference between the two samples was the number of hours the children tended to spend outdoors. In Sydney, the children spent an average of about 14 hours a week outside, compared to a mere 3 hours in Singapore. It might be the case, then, that the human eye requires exposure to certain kinds of stimulation provided by outdoor activities to develop properly, and some novel aspects of modern culture (like spending lots of time indoors in a school when children are young) reduce such exposure (which might also explain the aforementioned IQ correlation: smarter children may be sent to school earlier). If that were true, we should expect that providing children with more time outdoors when they are young is preventative against myopia, which it actually seems to be.

Natural light and no Wifi? Maybe I’ll just go blind instead…

It should always strike people as strange when key adaptive mechanisms appear to develop along an atypical path early in life that ultimately makes them worse at performing their function. An understanding of what types of biological explanations can account for these early maladaptive outcomes goes a long way in helping you understand where to begin your searches and what patterns of data to look out for.

References: Rose, K., Morgan, I., Smith, W., Burlutsky, G., Mitchell, P., & Saw, S. (2008). Myopia, lifestyle, and schooling in students of Chinese ethnicity in Singapore and Sydney. Archives of Ophthalmology, 126, 527-530.

More About Dunning-Kruger

Several years back I wrote a post about the Dunning-Kruger effect. At the time I was still getting my metaphorical sea legs for writing and, as a result, I don’t think the post turned out as well as it could have. In the interests of holding myself to a higher standard, today I decided to revisit the topic both in the interests of improving upon the original post and generating a future reference for me (and hopefully you) when discussing it with others. This is something of a time-saver for me because people talk about the effect frequently despite, ironically, not really understanding it too deeply.

First things first, what is the Dunning-Kruger effect? As you’ll find summarized just about everywhere, it refers to the idea that people who are below-average performers in some domains – like logical reasoning or humor – will tend to judge their performance as being above average. In other words, people are inaccurate at judging how well their skills stack up to their peers or, in some cases, to some objective standard. Moreover, this effect gets larger the more unskilled one happens to be. Not only are the worst performers worse at the task than others, but they’re also worse at understanding they’re bad at the task. This effect was said to obtain because people need to know what good performance is before they can accurately assess their own. So, because below-average performers don’t understand how to perform a task correctly, they also lack the skills to judge their performance accurately, relative to others.

Now available at Ben & Jerry’s: Two Scoops of Failure

As mentioned in my initial post (and by Kruger & Dunning themselves), this type of effect shouldn’t extend to domains where production and judging skills can be uncoupled. Just because you can’t hit a note to save your life on karaoke night, that doesn’t mean you will be unable to figure out which other singers are bad. This effect should also be primarily limited to domains in which the feedback you receive isn’t objective or the standards for performance aren’t clear. If unskilled people are asked to reassemble a car engine, for instance, they will quickly realize they cannot do this unassisted. That said, to highlight why the original explanation for this finding doesn’t quite work – not even for the domains studied in the original paper – I wanted to examine a rather important graph of the effect from Kruger & Dunning (1999) with respect to their humor study:

My crudely-added red arrows demonstrate the issue. On the left-hand side, we see what people refer to as the Dunning-Kruger effect: those who were the worst performers in the humor realm were also the most inaccurate in judging their own performance, compared to others. They were unskilled and unaware of it. However, the right-hand side betrays the real issue that caught my eye: the best performers were also inaccurate. The pattern you should expect, according to the original explanation, is that the higher one’s performance, the more accurately they estimate their relative standings, but what we see is that the best performers aren’t quite as accurate as those who are only modestly above average. At this point, some of you might be thinking that this point I’m raising is basically a non-issue because the best performers were still more accurate than the worst performers, and the right-hand inaccuracy I’m highlighting isn’t appreciable. Let me try to persuade you otherwise.

Assume for a moment that people were just guessing as to how they performed, relative to others. Because having a good sense of humor is a socially-desirable skill, people all tend to rate themselves “modestly above-average” in the domain to try and persuade others they actually are funny (and because, in that moment, there are no consequences to being wrong). Despite these just being guesses, those who actually are modestly above-average will appear to be more accurate in their self-assessment than those who are in the bottom half of the population; that accuracy just doesn’t have anything to do with their true level of insight into their abilities (referred to as their meta-cognitive skills). Likewise, those who are more than modestly above average (i.e., who are underestimating their skills) will be less accurate as well; there will just be fewer of them than those who overestimated their abilities.
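The guessing scenario above can be made concrete with a toy simulation. What follows is a minimal sketch under assumptions chosen purely for illustration, not anything taken from Kruger & Dunning’s data: true percentile ranks are spread evenly from 0 to 100, everyone reports roughly the same guessed percentile regardless of how they actually did, and a small amount of random noise is added to each guess. The function name mean_error_by_quartile and its parameters are hypothetical.

```python
import random


def mean_error_by_quartile(guess_percentile, n=10_000, noise_sd=5.0, seed=0):
    """Toy model (hypothetical): self-estimates ignore actual performance.

    Each simulated person has a true percentile rank drawn uniformly from
    0-100 and reports roughly the same guessed percentile plus Gaussian
    noise. Returns the mean absolute error of those self-estimates for each
    performance quartile, ordered from bottom quartile to top quartile.
    """
    rng = random.Random(seed)
    errors = [[] for _ in range(4)]
    for _ in range(n):
        actual = rng.uniform(0, 100)
        # The estimate depends only on the shared guess, never on `actual`.
        estimate = min(100.0, max(0.0, rng.gauss(guess_percentile, noise_sd)))
        quartile = min(3, int(actual // 25))  # 0 = bottom, 3 = top
        errors[quartile].append(abs(estimate - actual))
    return [sum(errs) / len(errs) for errs in errors]


if __name__ == "__main__":
    # Everyone guesses "modestly above average".
    print("guess 60:", [round(e, 1) for e in mean_error_by_quartile(60)])
    # Everyone guesses "modestly below average".
    print("guess 40:", [round(e, 1) for e in mean_error_by_quartile(40)])
```

With a guess of 60, the bottom quartile shows the largest error and the top quartile is less accurate than the third, even though no simulated person uses any insight at all. Passing a below-average guess such as 40 flips the pattern so that the top quartile becomes the least accurate, which anticipates what should happen when a task feels hard.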

Considering the findings of Kruger & Dunning (1999) as a whole, the scenario I just outlined doesn’t reflect reality perfectly. There was a positive correlation between people’s performance and their rating of their relative standing (r = .39), but, for the most part, people’s judgments of their own ability (the black line) appear relatively flat. Then again, if you consider their results in studies two and three of that same paper (logical reasoning and grammar), the correlations between performance and judgments of performance relative to others were weaker still, ranging from a low of r = .05 up to a peak of r = .19, the latter of which was statistically significant. People’s judgments of their relative performance were almost flat across several such tasks. To the extent these meta-cognitive judgments use actual performance as an input for determining relative standing, it’s clearly not the major factor for either low or high performers.

They all shop at the same cognitive store

Indeed, actual performance shouldn’t be expected to be the primary input for these meta-cognitive systems (the ones that generate relative judgments of performance) for two reasons. The first of these is the original performance explanation posited by Kruger & Dunning (1999): if the system generating the performance doesn’t have access to the “correct” answer, then it would seem particularly strange that another system – the meta-cognitive one – would have access to the correct answer, but only use it to judge performance, rather than to help generate it.

To put that in terms of a quick memory example, say you were experiencing a tip-of-the-tongue state, where you are sure you know the right answer to a question but can’t quite recall it. In this instance, we have a long-term memory system generating performance (trying to recall an answer) and a meta-cognitive system generating confidence judgments (the tip-of-the-tongue state). If the meta-cognitive system had access to the correct answer, it should just share it with the long-term memory system, rather than using that answer only to tell the other system to keep looking for it. The latter path is clearly inefficient and redundant. Instead, the meta-cognitive system should use cues other than direct access to the answer in generating its judgments.

The second reason actual performance (relative to others) wouldn’t be an input for these meta-cognitive systems is that people don’t have reliable and accurate access to population-level data. If you’re asking people how funny they are relative to everyone else, they might have some sense for it (how funny are you, relative to some particular people you know), but they certainly don’t have access to how funny everyone is because they don’t know everyone; they don’t even know most people. If you don’t have the relevant information, then it should go without saying that you cannot use it to help inform your responses.

Better start meeting more people to do better in the next experiment

So if these meta-cognitive systems are using inputs other than accurate information in generating their judgments about how we stack up to others, what would those inputs be? One possible input would be task difficulty, not in the sense of how hard the task objectively is for a person to complete, but rather in terms of how difficult a task feels. This means that factors like how quickly an answer can be called to mind likely play a role in these judgments, even if the answer itself is wrong. If judging the humor value of a joke feels easy, people might be inclined to say they are above average in that domain, even if they aren’t.

This yields an important prediction: if you provide people with tasks that feel difficult, you should see most of them begin to guess they are below average in that domain. If everyone is effectively guessing that they are below average (regardless of their actual performance), then those who perform the best will be the most inaccurate in judging their relative ability. In tasks that feel easy, people might be unskilled and unaware; in those that feel hard, people might be skilled but still unaware.

This is precisely what Burson, Larrick, & Klayman (2006) tested, across three studies. While I won’t go into details about the specifics of all their studies (this is already getting long), I will recreate a graph from one of their three studies that captures their overall pattern of results pretty well:

As we can see, when the domains being tested became harder, it was now the case that the worst performers were more accurate in estimating their percentile rank than the best ones. On tasks of moderate difficulty, the best and worst performers were equally calibrated. However, it doesn’t seem that this accuracy is primarily due to real insight into their performance; their guesses just happened to land closer to the truth. When people think, “this task is hard,” they all seem to estimate their performance as being modestly below average; when the task feels easy instead, they all seem to estimate their performance as being modestly above average. The extent to which that matches reality is largely a matter of chance rather than true insight.

Worth noting is that when you ask people to make different kinds of judgments, there is (or at least can be) a modest average advantage for top performers, relative to bottom ones. Specifically, when you ask people to judge their absolute performance (i.e., how many of these questions did you get right?) and compare that to their actual performance, the best performers sometimes had a better grasp on that estimate than the worst ones, but the size of that advantage varied depending on the nature of the task and wasn’t entirely consistent. Averaged across the studies reported by Burson et al. (2006), top-half performers displayed a stronger correlation between their perceived and actual absolute performance (r = .45) than bottom-half performers (r = .05). The corresponding correlations for perceived and actual relative percentiles were in the same direction, but lower (rs = .23 and .03, respectively). While there might be some truth to the idea that the best performers are more sensitive to their relative rank, the bulk of the miscalibration seems to be driven by other factors.

Driving still feels easy, so I’m still above-average at it

These judgments of one’s standing relative to others appear rather difficult for people to get right. As they should, really; for the most part we lack access to the relevant information and feedback, and there are possible social-desirability issues to contend with, coupled with a lack of consequences for being wrong. This is basically a perfect storm for inaccuracy. Perhaps worth noting is that the correspondence between people’s estimated relative performance and their actual performance was pretty close for one domain in particular in Burson et al. (2006): knowledge of pop music trivia. As pop music is the kind of thing people have more experience learning and talking about with others, it is a good candidate for a case in which these judgments might be more accurate, because people have more access to the relevant information.

The important point to take away from this research is that people don’t appear to be particularly good at judging their abilities relative to others, and this obtains regardless of whether the judges are themselves skilled or unskilled. At least for most of the contexts studied, anyway; it’s perfectly plausible that people – again, skilled and unskilled – will be better able to judge their relative (and absolute) performance when they have experience with the domain in question and have received meaningful feedback on their performance. This is why people sometimes drop out of a major or job after receiving consistent negative feedback, concluding they aren’t as cut out for it rather than continuing to believe they are actually above average in that context. You will likely see the least miscalibration in domains where people’s judgments of their ability need to hit reality and there are consequences for being wrong.

References: Burson, K., Larrick, R., & Klayman, J. (2006). Skilled or unskilled, but still unaware of it: How perceptions of difficulty drive miscalibration in relative comparisons. Journal of Personality & Social Psychology, 90, 60-77.

Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality & Social Psychology, 77, 1121-1134.

Why Do We Roast The Ones We Love?

One very interesting behavior that humans tend to engage in is murder. While we’re far from the only species that does this (as there are some very real advantages to killing members of your species – even kin – at times), it does tend to garner quite a bit of attention, and understandably so. One especially revealing aspect of this behavior concerns motives: why people kill. If you were to hazard a guess as to some of the most common motives for murder, what would you suggest? Infidelity is a good one, as is murder resulting from other deliberate crimes, like when a robbery is resisted or witnesses are killed to reduce the probability of detection. Another major factor that many might not guess is minor slights or disagreements, such as one person stepping on another person’s foot by accident, followed by an insult (“watch where you’re going, asshole!”), which is responded to with another insult, and things kind of get out of hand until someone is dead (Daly & Wilson, 1988). Understanding why seemingly minor slights get blown so far out of proportion is a worthwhile matter in its own right. The short version of the answer is that one’s social status (especially if you’re a male) can be determined, in large part, by whether other people know they can push you around. If I know you will tolerate negative behavior without fighting back, I might be encouraged to take advantage of you in more extreme ways more often. If others see you tolerating insults, they too may exploit you, knowing you won’t fight back. On the other hand, if I know you will respond to even slight threats with violence, I have a good reason to avoid inflicting costs on you. The more dangerous you are, the more people will avoid harming you.

“Anyone else have something to say about my shirt?! Didn’t think so…”

This is an important foundation for understanding why another facet of human behavior is strange (and, accordingly, interesting): friends frequently insult each other in a manner intended to be cordial. This behavior is exemplified well by the popular Comedy Central Roasts, where a number of comedians will get together to publicly make fun of each other and their guest of honor. If memory serves, the (unofficial?) motto of these events is, “We only roast the ones we love,” which is meant to capture the idea that these insults are not intended to burn bridges or truly cause harm. They are insults born of affection, playful in nature. This is an important distinction because, as the murder statistics help demonstrate, strangers often do not tolerate these kinds of insults. If I were to go up to someone I didn’t know well (or knew well as an enemy) and start insulting their drug habits, dead loved ones, or even something as simple as their choice of dress, I could reasonably expect anything from hurt feelings to a murder. This raises a series of interesting mysteries: why the stranger might want to kill me while my friends laugh, and when my friends might be inclined to kill me too.

Insults can be spoken in two primary manners: seriously and in jest. In the former case, harm is intended, while in the latter it often isn’t. As many people can attest, however, the line between serious and jesting insults is not always as clear as we’d like. Despite our best intentions, ill-phrased or poorly-timed jokes can do harm in much the same way that a serious insult can. This suggests that the nature of the insults is similar between the two contexts. As the function of a serious insult between strangers would seem to be to threaten or lower the insulted target’s status, this is likely also the function of an insult made in jest between friends, though the degree of intended threat is lower in those contexts. The closest analogy that comes to mind is the difference between a serious fight and a friendly tussle, where the combatants either are, or are not, trying to inflict serious harm on each other. Just like play fighting, however, things sometimes go too far and people do get hurt. I think joking insults between friends go much the same way.

This raises another worthwhile question: as friends usually have a vested interest in defending each other from outside threats and being helpful, why would they risk threatening the well-being of their allies through such insults? It would be strange if such insults were all risk and no reward, so it’s up to us to explain what that reward is. There are a few explanations that come to mind, all of which focus on one crucial facet of friendships: they are dynamic. While friendships can be – and often are – stable over time, both who you are friends with and the degree of each friendship change over time. Given that friendships are important social resources that do shift, it’s important that people have reliable ways of assessing the strength of these relationships. If you don’t assess these relationships now and again, you might come to believe that your social ties are stronger than they actually are, which can be a problem when you find yourself in need of social support and realize that you don’t have it. Better to assess what kind of support you have before you actually need it so you can tailor your behavior appropriately.

“You guys got my back, right?….Guys?….”

Insults between friends can help serve this relationship-monitoring function. As insults – even the joking kind – carry the potential to inflict costs on their target, the willingness of an individual to tolerate the insult – to endure those costs – can serve as a credible signal of friendship quality. After all, if I’m willing to endure the costs of being insulted by you without responding aggressively in turn, this likely means I value your friendship more than I dislike the costs being inflicted. Indeed, if these insults did not carry costs, they would not be reliable indications of friendship strength. Anyone could tolerate behavior that didn’t inflict costs to maintain a friendship, but not everyone will tolerate behaviors that do. This yields another prediction: the strength of a friendship can be gauged by how severe an insult each party is willing to tolerate. In other words, the more it takes to “go too far” when it comes to insults, the closer and stronger the friendship between two individuals. Conversely, if you were to make a joke about your friend that they became incredibly incensed over, this might lead you to reevaluate the strength of that bond: if you thought the bond was stronger than it was, you might either take steps to remedy the cost you just inflicted and strengthen the friendship (if you value the person highly) or spend less time investing in the relationship, even to the point of walking away from it entirely (if you do not).

Another possible related function of these insults could be to ensure that your friends don’t start to think too highly of themselves. As mentioned previously, friendships are dynamic things based, in part, on what each party can offer to the other. If one friend begins to see major changes to their life in a positive direction, the other friend may no longer be able to offer the same value they did previously. To put that in a simple example, if two friends have long been poor, but one suddenly gets a new, high-paying job, the new status that job affords will allow that person to make friends he likely could not before. Because the job makes them more valuable to others, others will now be more inclined to be their friend. If the lower-status friend wishes to retain their friendship with the newly-employed one, they might use these insults to potentially undermine the confidence of their friend in a subtle way. It’s an indirect way of trying to ensure the high-status friend doesn’t begin to think he’s too good for his old friends.

Such a strategy could be risky, though. If the lower-status party can no longer offer the same value to the higher-status one, relative to their new options, that might not be the best time to test the higher-status one’s willingness to tolerate insults. At the same time, times of change are also precisely when the value of reassessing relationship strength can be at its highest. There’s less risk of a person abandoning a friendship when nothing has changed, relative to when something has. In either case, the assessment and management of social relationships is likely the key to understanding why people tolerate insults from friends but not from strangers.

“Enjoy your new job, sellout. You used to be cool”

This analysis can speak to another interesting facet of insults as well: they’re sometimes directed at the speaker, which is referred to as self-deprecating humor when done in jest (and just self-deprecation when not). It might seem strange that people would insult themselves, as doing so would directly threaten their own status. That people do so with some regularity suggests there might be some underlying logic to these self-directed insults as well. One possibility is that these insults do what was just discussed: signal that one doesn’t hold oneself in high esteem and, accordingly, that one isn’t “too good” to be friends with others. This seems like a profitable place from which to understand self-deprecating jokes. When such self-directed insults are not made in jest, they likely carry additional implications as well, such as that expectations should be set lower (e.g., “I’m really not able to do that”) or that one is in need of additional investment, relative to the joking kind.

References: Daly, M., & Wilson, M. (1988). Homicide. New York: Aldine de Gruyter.

To Meaningfully Talk About Gender

Let’s say I were to tell you I am a human male. While this sentence is short and simple, the amount of information you could glean from it is a potential goldmine, assuming you are starting from a position of near total ignorance about me. First, it provides you with my species identification. In the most general sense, that lets you know what types of organisms in the world I am capable of potentially reproducing with (to produce reproductively-viable offspring in turn). In addition to that rather concrete fact, you also learn about my likely preferences. Just as humans share a great many genes in common (which is why we can reproduce with one another), we also share a large number of general preferences and traits in common (as these are determined heavily by our genes). For instance, you likely learn that I enjoy the taste of fruit, that I make my way around the world on two feet, and that hair continuously grows from the top of my head but much more sparingly on the rest of my body, among many other things. While these probable traits might not hold true for me in particular – perhaps I am totally hairless/covered in hair, have no legs, and find fruit vile – they do hold for humans more generally, so you can make some fairly-educated guesses as to what I’m like in many regards even if you know nothing else about me as a person. It’s not a perfect system, but you’ll do better on average with this information than you would without it. To make the point crystal clear, imagine trying to figure out what kind of things I liked if you didn’t even know my species.

Could be delicious or toxic, depending on my species. Choose carefully.

When you learn that I am a male, you learn something concrete about the sex chromosomes in my body: specifically, that I have an XY configuration and tend to produce particular types of gametes. In addition to that concrete fact, you also learn about my likely traits and preferences. Just as humans share a lot of traits in common, males tend to share more traits in common with each other than they do with females (and vice versa). For instance, you likely learn that I carry more upper-body muscle mass than a typical female, that I have a general willingness to relax my standards when it comes to casual sex, that I have a penis, and that I’m statistically more likely to murder you than a female is (I’m also more likely to be murdered myself, for the record). Again, while these might not all hold true for me specifically, if you knew nothing else about me, you could still make some educated guesses as to what I enjoy and how I’m likely to behave because of my group membership.

One general point I hope these examples illuminate is that, to talk meaningfully about a topic, we need to have a clear sense of our terms. Once we know what the terms “human” and “male” mean, we can begin to learn a lot about what membership in those groups entails. We can learn quite a bit about deviations from those general commonalities as well. For instance, some people might have an XY set of chromosomes and no penis. This would pose a biological mystery to us, while someone having an XX set and no penis would pose much less of one. The ability to consistently apply a definition – even an arbitrary one – is the first step in being able to say something useful about a topic. Without clear boundary conditions on what we’re talking about, you can end up with people talking about entirely different concepts using the same term. This yields unproductive discussions and is something to be avoided if you’re looking to cut down on wasted time.

Speaking of unproductive discussions, I’ve seen a lot of metaphorical ink spilled over the concept of gender; a term that is supposed to be distinct from sex, yet is highly related to it. According to many of the sources one might consult, sex is supposed to refer to biological features (as above), while gender is supposed to refer, “…to either social roles based on the sex of the person (gender role) or personal identification of one’s own gender based on an internal awareness (gender identity).” I wanted to discuss the latter portion of that gender definition today: the one referring to people’s feelings about their gender. Specifically, I’ve been getting the growing sense that this definition is not particularly useful. In essence, I’m not sure it really refers to anything in particular and, accordingly, doesn’t help advance our understanding of much in the world. To understand why, let’s take a quick trip through some interesting current events. 

Some very colorful, current events…

In this recent controversy, a woman called Rachel Dolezal claimed her racial identity was black. The one complicating factor in her story is that she was born to white parents. Again, there’s been a lot of metaphorical ink spilled over the issue (including the recent mudslinging directed at Rebecca Tuvel, who published a paper on the matter), with most of the discussions seemingly unproductive and, from what I can gather, mean-spirited. What struck me when I was reading about the issue is how little of those discussions explicitly focused on what should have been the most important, first point: how are we defining our terms when it comes to race? Those who opposed Rachel’s claims to be black appear to fall back on some kind of implicit hereditary definition: that one or more of one’s parents need to be black in order to consider oneself a member of that group. That’s not a perfect definition, as we then need to determine what makes a parent black, but it’s a start. Like the definition of sex I gave above, this concept of race references some specific feature of the world that determines one’s racial identity, and I imagine it makes intuitive sense to most people. Crucially, this definition is immune to feelings. It doesn’t matter if one is happy, sad, indifferent, or anything else with respect to their ethnic heritage; it simply is what it is regardless of those feelings. In this line of thinking, Rachel is white regardless of how she feels about it, how she wears her hair, dresses, acts, or even whether we want to accept her identification as black and treat her accordingly (whatever that is supposed to entail). What she – or we – feel about her racial identity is a different matter than her heritage.

On the other side of the issue, there are people (notably Rachel herself) who think that what matters is how you feel when it comes to determining identity. If you feel black (i.e., your internal awareness tells you that you’re black), then you are black, regardless of biological factors or external appearances. This idea runs into some hard definitional issues, as above: what does it mean to feel black, and how is it distinguished from other ethnic feelings? In other words, when you tell me that you feel black, what am I supposed to learn about you? Currently, that’s a big blank in my mind. This definitional issue is doubly troubling in this case, however, because if one wants to say they are black because they feel black, then it seems one first needs to identify a preexisting group of black people to have any sense at all for what those group members feel like. However, if you can already identify who is and is not black from some other criteria, then it seems the feeling definition is out of place as you’d already have another definition for your term. In that case, one could just say they are white but feel like they’re black (again, whatever “feeling black” is supposed to mean). I suppose they could also say they are white and feel unusual for that group, too, without needing to claim they are a member of a different ethnic group.

The same problems, I feel, apply to the gender issue despite the differences between gender and race. Beginning with the feeling definition, the parallels are clear. If someone told me they feel like a woman, a few things have to be made clear for that statement to mean anything. First, I’d need to know what being a woman feels like. In order to know what being a woman feels like, I’d need to already have identified a group of women so the information could be gathered. This means I’d need to know who was a woman and who was not in advance of learning about their specific feelings. However, if I can do that – if I can already determine who is and is not a woman – then it seems I don’t need to identify them on the basis of their feelings; I would be doing so with some other criterion. Presumably, the most common criterion leveraged in such a situation would be sex: you’d go out and find a bunch of females and ask them what it was like to be a woman. If those responses are to be meaningful, though, you need to consider “female” to equate to “woman” which, according to the definitions I listed above, it does not. This leaves us in a bit of a catch-22: we need to identify women by how they feel, but we can’t say how they feel until we identify them. Tricky business indeed (even setting aside claims that there are other genders).

Just keep piling the issues on top of each other and hope that sorts it out

On the other hand, let’s say gender is defined by some objective criteria and is distinct from sex. So, someone might be a male because of their genetic makeup but fall under the category of “woman” because, say, their psychology has developed in a female-typical pattern for enough key traits. Perhaps enough of their metaphorical developmental dials have been turned towards the female portion. Now that’s just a hypothetical example, but it should demonstrate the following point well enough: regardless of whether the male in question wants to be identified as a woman or not, it wouldn’t matter in terms of this definition. It might matter a whole bunch if you want to be polite and nice to them, but not for our definition. Once we had a sense of which dials – or how many of them – needed to be flipped to “female” for a male to be considered a woman, and had a way of measuring that, one’s internal awareness seems to be beside the point.

While this definition helps us talk more meaningfully about gender, at least in principle, it also seems like the gender term is a little unnecessary. If we’re just using “man” as a synonym for “male” and “woman” as one for “female”, then the entire sex/gender distinction kind of falls apart, which defeats the whole purpose. You wouldn’t feel like a man; you’d feel like a male (whatever that feels like, and I say that as a male myself). Rather than calling our female-typical male a woman, we could also call him an atypical man.

The second issue nagging at me is that almost no traits run on a spectrum from male to female. Let’s consider traits with psychological sex differences, like depression or aggression. Since females are more likely to experience depression than males, we could consider experiencing depression as something that pushes one towards the “woman” end of the gender spectrum. However, when one feels depressed, they don’t feel like a woman; they feel sad and hopeless. When someone feels aggressive, they don’t feel like a man; they feel angry and violent. The same kind of logic can be applied to most other traits as well, including components of personality, risk-seeking, and so on. These don’t run on a spectrum between male/masculine and female/feminine, just as it would make no sense to say that one has a feminine height.

If this still all sounds very confusing to you, then you’re on the same page as me. As far as I’ve seen, it is incredibly difficult for people to verbalize anything of a formal definition or set of standards that tells us who falls into one category or the other when it comes to gender. In the absence of such a standard, it seems profitable to just discard the terms and find something better – something more precise – to use instead.

Unusual Names In Learning Research

Learning new skills and bodies of knowledge takes time, repetition, and sustained effort. It’s a rare thing indeed for people to learn even simple skills or bodies of knowledge fluently from a single exposure, even if they’re properly motivated. Given how important learning is for success in life, a healthy body of literature in psychology examines people’s ability to learn and remember information. This literature extends both to how we learn successfully and to the contexts in which we fail. Good research in this realm will often leverage something in the way of adaptive function for understanding why we learn what we do. Unfortunately, this theoretical foundation appears to be lacking in much of the research in psychology in general, and learning and memory research is no exception. In the course I taught on the topic last semester, for instance, I’m not entirely sure the word “relevance” appeared once in the textbook I was using to help the reader understand our memory mechanisms. There were, however, a number of passages in that book which caught my attention, though not for the best reasons.

You have my attention, but no longer have a working car.

Recently, for instance, I came across a reference in this textbook to a phenomenon called the labor-in-vain effect. The effect was summarized as follows:

Here’s the basic methodology. Nelson and Leonesio (1988) asked participants to study words paired with nonsense syllables (e.g., monkey–DAX). Participants made judgments of learning in an initial stage. Then, when given a chance to study the items again, each participant could choose the amount of time to study for each item. Finally, in a cued recall test, participants were given the English word and asked to recall the nonsense syllable….Even though they spent most of their time studying the difficult items, they were still better at remembering the easy ones. For this reason, Nelson and Leonesio labeled the effect labor in vain because their experiment showed that participants were unable to compensate for the difficulty of those items

As I like to be thorough when preparing the materials for my course, I did what every self-respecting teacher should do (even though not all of them will): I tracked down and read the primary literature on which this passage was based. Professors (or anyone who wants to talk about these findings) ought to read the source material themselves for two reasons: first, because you want to be an expert in the material you’re teaching your students (why else would they be listening to you?) and, second, because textbooks – really, secondary sources in general – have a bad habit of getting details wrong. What I found in this case was not only that the textbook mischaracterized the effect and omitted crucial details about the research, but also that the original study itself was a bit ambitious in its naming and assessment of the phenomenon. Let’s take those points in order.

First, to see why the textbook’s description wasn’t on point, let’s consider the research itself (Nelson & Leonesio, 1988). The general procedure in their experiments was as follows: participants (i.e., undergraduate students looking for extra credit) were given lists to study. In the first experiment these were trigrams (like BUG or DAX), in the second they were words paired with trigrams (like Monkey-DAX), and in the third they were general-information questions the participants had previously failed to answer correctly (like, “what is the capital of Chile?”). In each experiment, participants were divided into groups that emphasized either speed or accuracy in learning. Both groups were told they could study the target information at their own pace and that the goal was to remember as much of it as possible, but the speed groups were told their study time would count against their eventual score. Following that study phase, participants were given a recall task after a brief delay to see how successful their study had been.

As one might expect, the speed-emphasis groups studied the information for less time than the accuracy-emphasis groups. Crucially, the extra study time invested by the participants did not yield statistically significant gains in their ability to subsequently recall the information in 2 of the 3 experiments (in experiment three, the difference was significant). This was dubbed the labor-in-vain effect because participants were putting in extra labor for effectively little to no gain.

We can see from this summary that the textbook’s description of the labor-in-vain effect isn’t quite accurate. The labor-in-vain effect does not refer to participants being unable to make up the difference between the easy and hard items (which they actually did in one of the three studies); instead, it refers to the idea that participants were gaining little or nothing from their extra study time. To quote the original paper:

We refer to this finding of substantial extra study time yielding little or no gain in recall as the labor-in-vain effect. Although we had anticipated that extra study time might yield diminishing (i.e., negatively accelerated) gains in recall, the present findings are quite extreme in showing not even a reliable gain in recall after more than twice as much extra study time.

This mischaracterization might seem like a minor error that speaks only to the author’s meticulousness, but it’s not the only problem with the book’s presentation of the information. Specifically, the textbook gave no sense of the methodological details, the associated data, or whether the interpretation of these findings was accurate. So let’s turn to those now.

If the labor will all be in vain, why bother laboring at all?

The general summary of the research I just provided is broadly true, but it is missing important details that help contextualize it. The first of these involves how the study phases of the experiments took place. Let’s consider just the first experiment, as the methods are broadly similar across all three. In the study phase, the participants had 27 trigrams to commit to memory. The participants were seated at a computer, and the trigrams appeared on the screen one at a time. When participants felt they had studied an item enough, they would hit the enter key to advance to the next one, but they could not go back to previous items once they did. This meant there was no opportunity to restudy items or to test oneself in advance of the formal test. To be frank, this method of study resembles nothing I know humans to engage in naturally. Since the context of studying in the experiment is so strange, I would be hesitant to say that it tells us much about how learning occurs in the real world, but the problems get worse than that.

As I mentioned before, these are undergraduate participants trying to earn extra credit. With that mental picture of the samples in mind, we might expect the participants to be a little less than motivated to deliver a flawless performance. If they’re anything like the undergraduates I’ve known, they likely just want to get the experiment over with so they can get back to doing the things they actually want to do. Learning nonsense syllables isn’t high on the list of college students’ interests; in fact, I don’t think that task is high on anybody’s list. The practical information value of what they’re learning is nonexistent, and very little is riding on their success. It might come as no surprise, then, that the participants dedicated effectively no time to studying these items. Bear in mind, there were 27 of these trigrams to learn. In the speed group, the average study time was 1.9 seconds per trigram. Two whole seconds of learning per bit of nonsense. In the accuracy group, this study time skyrocketed to a substantial… 5.4 seconds.

An increase of 3.5 seconds per item does not strike me as anything I’d refer to as labor, even if the study time was nominally over twice as long. A similar pattern emerged in the other two experiments: the speed/accuracy study times were 4.8 and 15.2 seconds in the second study, and 1.2 and 8.4 seconds in the third. Putting this together so far, we have (likely unmotivated, undergraduate) participants studying useless information in unnatural ways for very brief periods of time. Given that, why on Earth would anyone expect to find large differences in later recall performance?

Speaking of eventual performance, though, let’s finally consider how well each group performed during the recall task, and how much of that laboring was being done in vain. In the first experiment, the speed group recalled 43% of the trigrams; the accuracy group got 49% correct. That extra study time of about 3.5 seconds per item yielded a 6-percentage-point improvement in performance. The difference wasn’t statistically significant but, again, how large an improvement should have been expected, given the context? In the second study, these percentages were 49% and 57%, respectively (a gain of 8 points); in the third, they were 75% and 83% (another 8-point difference, which actually was statistically significant given the larger sample size for experiment 3). So, across three studies, we do not see evidence of people laboring in vain; not really. Instead, what we see is that very small amounts of extra time devoted to studying nonsense in unusual ways by people who want to be doing other things yield correspondingly small – but consistent – gains in recall performance. It’s not that this labor was in vain; it’s that not much labor was invested in the first place, so the gains were minimal.
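Putting the per-experiment figures side by side makes the point easier to see. The short Python sketch that follows is only an illustration of the arithmetic. It is not anything from Nelson and Leonesio's own analysis. It takes the mean study times (seconds per item) and recall percentages as they are reported in this post. For each experiment, it computes three values: the extra study time, how many times longer the accuracy group studied, and the recall gain in percentage points. The dictionary layout and the `summarize` helper are my own hypothetical choices. The sketch also does no significance testing.

```python
# Figures as reported in this post for Nelson & Leonesio (1988).
# Study times are mean seconds per item; recall is percent correct.
# The data layout and helper name are illustrative assumptions.
experiments = {
    1: {"speed_time": 1.9, "accuracy_time": 5.4, "speed_recall": 43, "accuracy_recall": 49},
    2: {"speed_time": 4.8, "accuracy_time": 15.2, "speed_recall": 49, "accuracy_recall": 57},
    3: {"speed_time": 1.2, "accuracy_time": 8.4, "speed_recall": 75, "accuracy_recall": 83},
}


def summarize(exp):
    """Return (extra seconds per item, recall gain in points, time ratio)."""
    extra_time = exp["accuracy_time"] - exp["speed_time"]
    gain = exp["accuracy_recall"] - exp["speed_recall"]
    ratio = exp["accuracy_time"] / exp["speed_time"]
    # Round to one decimal to hide floating-point noise (e.g. 3.5000000000000004).
    return round(extra_time, 1), gain, round(ratio, 1)


for n, exp in experiments.items():
    extra, gain, ratio = summarize(exp)
    print(f"Experiment {n}: +{extra} s/item ({ratio}x as long), +{gain} points recall")
```

Laid out this way, the pattern is plain. Per item, the accuracy groups studied roughly three to seven times as long as the speed groups. Yet each experiment shows only a modest recall gain of 6 to 8 points.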

If you want to make serious gains, you’ll need more than baby weight

On a theoretical level, it sure would be strange if people spent substantial extra time laboring over study only to make effectively no gains. Why waste all that valuable time and energy doing something that has no probability of paying off? That’s not something anyone using evolutionary theory to guide their thinking should posit a brain would do. It would be strange to truly observe a labor-in-vain effect in the biological sense of the word. However, given a fuller picture of the research methods and the data they produced, the name of that effect doesn’t seem particularly apt. The authors of the original paper seem to have tried to make these results sound more exciting than they are (through their naming of the effect, their use of phrases like “…substantial extra study time” and differences in study time that are “highly significant,” and an exclamation point here and there). That the primary literature is a little ambitious is one thing, but we also saw that my textbook’s secondary summary of the research was less than thorough or accurate. Anyone reading the textbook would not leave with a good sense of what this research found. It’s not hard to imagine how this example could extend further, to a student summarizing the summary they read to someone else, at which point all the information to be gained from the original study is effectively gone.

The key point to take away from this is that textbooks (indeed, secondhand sources in general) should certainly not be used as an endpoint for research; they should be used as a tentative beginning that helps you track down the primary literature. However, that primary literature is not always to be taken at face value either. Even assuming the original study was well-designed and interpreted properly, it would still only represent a single island of information in the academic ocean. Obtaining true and useful information from that ocean takes time and effort which, unfortunately, you often cannot trust others to do on your behalf. To truly understand the literature, you need to dive into it yourself.

References: Nelson, T., & Leonesio, R. (1988). Allocation of self-paced study time and the “labor-in-vain effect.” Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 676–686.