My Father Was A Gambling Man

And if you think I stole that title from a popular song, you’re very wrong

Hawaii recently introduced some bills aimed at prohibiting the sale of games with for-purchase loot boxes to anyone under 21. For those not already in the know concerning the world of gaming, loot boxes are effectively semi-random grab bags of items within video games. These loot boxes are usually received by players either as a reward for achieving something within a game (such as leveling up) and/or can be purchased with currency, be that in-game currency or real world money. Specifically, then, the bills in question are aimed at games that sell loot boxes for real money, attempting to keep them out of the hands of people under 21.

Just like tobacco companies aren’t permitted to advertise to minors out of fear that children will come to find smoking an interesting prospect, the fear here is that children who play games with loot boxes might develop a taste for gambling they otherwise wouldn’t have. At least that’s the most common explicit reason for this proposal. The gaming community seems to be somewhat torn about the issue: some gamers welcome the idea of government regulation of loot boxes while others are skeptical of government involvement in games. In the interest of full disclosure for potential bias – as a long-time gamer and professional loner – I consider myself to be a part of the latter camp.

My hope today is to explore this debate in greater detail. There are several questions I’m going to discuss, including (a) whether loot boxes are gambling, (b) why gamers might oppose this legislation, (c) why gamers might support it, (d) what other concerns might be driving the acceptance of regulation within this domain, and (e) whether these kinds of random mechanics actually make for better games.

Let’s begin our investigation in gaming’s seedy underbelly

To set the stage, a loot box is just what it sounds like: a randomized package of in-game items (loot) that is earned by playing the game or purchased. In my opinion, loot boxes are gambling-adjacent, but not bona fide gambling. The prototypical example of gambling is a slot machine: you put money into it and have no idea what you’re going to get out. You could get nothing (most of the time), a small prize (some of the time), or a large prize (almost never). Loot boxes share some of those features – paying money for randomized outcomes – but they don’t share others. First, with loot boxes there isn’t a “winning” and “losing” outcome in the same way there is with a slot machine. If you purchase a loot box, you should have some general sense of what you’re buying; say, 5 items of varying rarities. It’s not as though one loot box contains no items, another contains 5, and another 20 (though more on that in a moment). The number of items you receive is usually fixed even if the contents are random. Second, the items you “receive” you often don’t truly own. If the game servers get shut down or you violate the terms of service, for instance, your account and its items get deleted, they disappear from existence, and you don’t get to sue anyone for stealing from you. Many of these games also offer no formal way of cashing out. In that sense, there is less of a gamble in loot boxes than in what we traditionally consider gambling.

Importantly, the value of these items is debatable. Usually players really want to open some items and don’t care about others. In that sense, it’s quite possible to open a loot box and get nothing of value, as far as you’re concerned, while hitting the jackpot in another. However, if that valuation is almost entirely subjective in nature, then it’s hard to say that not getting what you want is losing while getting what you do want is winning, as that will vary from person to person. What you are buying with loot boxes isn’t a chance at a specific item you want; it is a set number of random items from a pool of options. To put that into an incomplete but simple example, if you put money into a gumball machine and get a gumball, that’s not really a gamble and you didn’t really lose. It doesn’t become gambling, nor do you lose, if the gumballs come in different colors/flavors and you wanted a blue one but got a green one.

One potential exception to this equal-value argument arises when the items opened aren’t bound to the opener; that is, when they can be traded or sold to other players. You don’t like your gumball flavor? Well, now you can trade your friend your gumball for theirs, or even buy their gumball from them. When this possibility exists, secondary markets pop up for the digital items, where some can be sold for lots of real money while others are effectively worthless. Now, as far as the developers are concerned, all the items can have the same value, which makes it look less like gambling; it’s the secondary market that makes it look more like gambling, but the game developers aren’t in control of that.

Kind of like these old things

An almost-perfect metaphor for this can be found in the sale of baseball cards (which I bought when I was younger, though I don’t remember what the appeal was): packs containing a set number of cards – let’s say 10 – are purchased for a set price – say $5 – but the contents of those packs are randomized. The value of any single card, from the perspective of the company making them, is 1/10 the cost of the pack. However, some people value specific cards more than others; a rookie card of a great player is more desired than the card of a veteran who never achieved anything. In such cases, a secondary market crops up among those who collect the cards, and those collectors are willing to pay a premium for the desired items. One card might sell for $50 (10 times the price of a pack), while another might be unable to find a buyer at all, effectively worth $0.
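To make that arithmetic concrete, here is a minimal sketch (all probabilities and resale prices are hypothetical numbers of my own, not real market data) comparing the manufacturer’s nominal per-card value with the expected value implied by a secondary market:

```python
# Hypothetical numbers illustrating the pack-value arithmetic above.
PACK_PRICE = 5.00     # cost of one pack, in dollars
CARDS_PER_PACK = 10

# Nominal value per card, from the manufacturer's perspective
nominal_value = PACK_PRICE / CARDS_PER_PACK

# A toy secondary market: (probability of pulling this tier, resale price)
secondary_market = [
    (0.01, 50.00),  # rare rookie card: 1% of cards, sells for 10x a pack
    (0.99, 0.00),   # everything else: effectively unsellable
]

# Expected resale value of a single random card
expected_resale = sum(p * price for p, price in secondary_market)

print(f"nominal per-card value: ${nominal_value:.2f}")
print(f"expected resale value:  ${expected_resale:.2f}")
```

Under these toy numbers the two values coincide at $0.50: the expected value of a random card can exactly match its nominal price even when nearly every individual card is worthless and a lucky few are worth many packs. That is precisely the gap between the manufacturer’s view and the collector’s view described above.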

This analogy, of course, raises other questions about the potential legality of existing physical items, like sports cards, or those belonging to any trading card game (like Magic: The Gathering, Pokemon, or Yugioh). If digital loot boxes are considered a form of gambling and might have effects worth protecting children from, then their physical counterparts likely pose the same risks. If anything, the physical versions look more like gambling because at least some digital items cannot be traded or sold between players, while all physical items pose that risk of developing real value on a secondary market. Imagine putting money into a slot machine, hitting the jackpot, and then getting nothing out of it. That’s what many virtual items amount to.

Banning the sale of loot boxes in games to people under the age of 21 likely also entails banning the sale of card packs to them as well. While the words “slippery slope” are usually paired with the word “fallacy,” there does seem to be a very legitimate slope here worth appreciating. The parallels between loot boxes and physical packs of cards are almost perfect (and, where they differ, card packs look more like gambling, not less). Strangely, I’ve seen very few voices in the gaming community suggesting that the sale of packs of cards to minors should be banned; some do (mostly for consistency’s sake; they almost never raise the issue independently of the digital loot box issue, as far as I’ve seen), but most don’t seem concerned with the matter. The bill introduced in Hawaii doesn’t seem to mention baseball or trading cards anywhere either (unless I missed it), which would be a strange omission. I’ll return to this point later when we get to the motives behind gamers’ approval of government regulation in the digital realm.

The first step towards addiction to that sweet cardboard crack

But, while we’re on the topic of slippery slopes, let’s also consider another popular game mechanic that might be worth examination: randomized item drops from in-game enemies. These aren’t items you purchase with money (at least not in game), but rather ones you purchase with time and effort. Let’s consider one of the more well-known games to use this: WoW (World of Warcraft). In WoW, when you kill enemies with your character, you may receive valued items from their corpses as you loot the bodies. The items are not found in a uniform fashion: some are very common and others quite rare. I’ve watched a streamer kill the same boss dozens of times over the course of several weeks hoping to finally get a particular item to drop. There are many moments of disappointment and discouragement, complete with feelings of wasted time, after many attempts are met with no reward. But when the item finally does drop? There is a moment of elation and celebration, complete with a chatroom full of cheering viewers. If you could only see the emotional reaction of the people getting their reward and not their surroundings, my guess is that you’d have a hard time differentiating a gamer getting a rare drop they wanted from someone opening the desired item out of a loot box for which they paid money.

What I’m not saying is that I feel random loot drops in World of Warcraft are gambling; what I am saying is that if one is concerned about the effects loot boxes might have on people when it comes to gambling, they share enough in common with randomized loot drops that the latter are worth examining seriously as well. Perhaps the item a player is after has a fundamentally different psychological effect on them depending on whether chances at obtaining it are purchased with real money, in-game currency, or play time. Then again, perhaps there is no meaningful difference; it’s not hard to find stories of gamers who spent more time than is reasonable trying to obtain rare in-game items, to the point that it could easily be labeled an addiction. Whether buying items with money or with time has different effects is a matter that would need to be settled empirically. But what if they were fundamentally similar in terms of their effects on players? If you’re going to ban loot boxes sold for cash out of fear of the impact they have on children’s propensity to gamble or develop a problem, you might also end up with a good justification for banning randomized loot drops in games like World of Warcraft as well, since both resemble pulling the lever of a slot machine in enough meaningful ways.

Despite that, I’ve seen very few people in the pro-regulation camp raise the concern about the effects that World of Warcraft loot tables are having on children. Maybe it’s because they haven’t thought about it yet, but that seems doubtful, as the matter has been brought up and hasn’t been met with any concern. Maybe it’s because they view the costs of paying real money for items as more damaging than paying with time. Either way, it seems that even after thinking about it, those who favor regulation of loot boxes largely don’t seem to care as much about card games, and even less about randomized loot tables. This suggests there are other variables beyond the presence of gambling-like mechanics underlying their views.

“Alright; children can buy some lottery tickets, but only the cheap ones”

But let’s talk a little more about the fear of harming children in general. Not that long ago there were examinations of other aspects of video games: specifically, the violence often depicted within them. Indeed, research into the topic is still a thing today. The fear sounded plausible to many: if violence is depicted within these games – especially within the context of achieving something positive, like winning by killing the opposing team’s characters – those who play the games might become desensitized to violence or come to think it acceptable. In turn, they would behave more violently themselves and be less interested in alleviating violence directed against others. This fear was especially pronounced when it came to children, who were still developing psychologically and potentially more influenced by the depictions of violence.

Now, as it turns out, those fears appear to be largely unfounded. Violence has not been increasing as younger children have been playing increasingly violent video games more frequently. The apparent risk factor for increasing aggressive behavior (at least temporarily; not chronically) was losing at the game or finding it frustrating to play (such as when the controls feel difficult to use). The violent content per se didn’t seem to be doing much of the causing when it came to later violence. While players who are more habitually aggressive might prefer somewhat different games than those who are not, that doesn’t mean the games are causing them to be violent.

This gives us something of a precedent for worrying about the face validity of the claims that loot boxes are liable to make gambling seem more appealing on a long-term scale. It is possible that the concern over loot boxes represents more of a moral panic on the part of legislators than a real issue having a harmful impact. Children who are OK with ripping an opponent’s head off in a video game are unlikely to be OK with killing someone for real, and violence in video games doesn’t seem to make real killing more appealing. It might similarly be the case that opening loot boxes makes people no more likely to want to gamble in other domains. Again, this is an empirical matter that requires good evidence to prove the connection (and I emphasize the word good because there exists plenty of low-quality evidence that has been used to support the claim that violence in video games causes it in real life).

Video games inspire cosplay; not violence

If it’s not clear at this point, I believe the reasons that some portion of the gaming community supports this type of regulation have little to nothing to do with concerns about children gambling. For the most part, children do not have access to credit cards and so cannot themselves buy lots of loot boxes, nor do they have access to lots of cash they can funnel into online gift cards. As such, I suspect that very few children do serious harm to themselves or their financial futures by buying loot boxes. The ostensible concern for children is more of a plausible-sounding justification than one actually doing most of the metaphorical cart-pulling. Instead, I believe the concern over loot boxes (at least among gamers) is driven by two more mundane concerns.

The first of these is simply the perceived cost of a “full” game. There has long been growing discontent in the gaming community over DLC (downloadable content), where new pieces of content are added to a game after release for a fee. While that might seem like the simple purchase of an expansion pack (which is not a big deal), the discontent arises when a developer is perceived to have made a “full” game already, but then purposefully cut sections out of it to sell later as “additional” content. To place that into an example, imagine a fighting game released with 8 characters. The game became wildly popular, so the developers later put together 4 new characters and sold them because demand was that high. Alternatively, imagine a developer that created 12 characters up front but only made 8 available at launch, knowingly saving the other 4 to sell later when they could just as easily have been included in the original release. In that case, intent matters.

Loot boxes do something similar psychologically at times. When people go to the store and pay $60 for a game, then take it home to find out the game wants them to pay $10 or more (sometimes a lot more) to unlock parts of the game that already exist on the disk, that feels very dishonest. You thought you were purchasing a full game, but you didn’t exactly get it. What you got was more of an incomplete version. As games become increasingly likely to use these loot boxes (as they seem to be profitable), the true cost of games (having access to all the content) will go up.

Just kidding! It’s actually 20-times more expensive

Here is where the distinction between cosmetic and functional (pay-to-win) loot boxes arises. For those not in the know about this, the loot boxes that games sell vary in terms of their content. In some games, these items are nothing more than additional colorful outfits for your characters that have no effect on game play. In others, you can buy items that actually increase your odds of winning a game (items that make your character do more damage or automatically improve their aim). Many people who dislike loot boxes seem to be more OK (or even perfectly happy) with them so long as the items are only cosmetic. So long as they can win the game as effectively spending $0 as they could spending $1000, they feel that they own the full version. When it feels like the game you bought gives an advantage to players who spent more money on it, it again feels like the copy of the game you bought isn’t the same version as theirs; that it’s not as complete an experience.

Another distinction arises here in that I’ve noticed gamers seem more OK with loot boxes in games that are free-to-play. These are games that cost nothing to download, but much of their content is locked up front. To unlock content, you usually invest time or money. In such cases, the feeling of being lied to about the cost of the game doesn’t really exist. Even if such free games are ultimately more expensive than traditional ones if you want to unlock everything (often much more expensive if you want to do so quickly), the actual cost of the game was $0. You were not lied to about that much, and anything else you spent afterwards was completely voluntary. Here the loot boxes look more like a part of the game than an add-on to it. Now this isn’t to say that some people don’t dislike loot boxes even in free-to-play games; just that they mind them less.

“Comparatively, it’s not that bad”

The second, related concern, then, is that developers might be making design decisions that ultimately make games worse in order to sell more loot boxes. To put that in perspective, there are some win/win scenarios, like when a developer tries to sell loot boxes by making a game that’s so good people enjoy spending money on additional content to show off how much they like it. Effectively, people are OK with paying for quality. Here, the developer gets more money and the players get a great game. But what happens when there is a conflict? A decision needs to be made that will either (a) make the gameplay experience better but sell fewer loot boxes, or (b) make the gameplay experience worse but sell more loot boxes. However frequently these decisions need to be made, they assuredly are made at some point.

To use a recent example, many of the rare items in the game Destiny 2 were found within an in-game store called Eververse. Rather than unlocking rare items through months of completing game content over and over again (like in Destiny 1), many of these rare, cosmetic items were found only within Eververse. You could unlock them with time, in theory, but only at very slow rates (which were found to actually be intentionally slowed down by the developers if a player put too much time into the game). In practice, the only way to unlock these rare items was through spending money. So, rather than put interesting and desirable content into the game as a reward for being good at it or committed to it, it was largely walled off behind a store. This was a major problem for people’s motivation to continue playing the game, but it traded off against people’s willingness to spend money on the game. These conflicts created a worse experience for a great many players. It also yielded the term “spend-game content” to replace “end-game content.” More loot boxes in games potentially means more decisions like that will be made where reasons to play the game are replaced with reasons to spend money.

Another such system was discussed in regard to a potential patent by Electronic Arts (EA), though as far as I’m aware it has not made its way into a real game yet. This system revolved around online, multiplayer games with items available for purchase. The system would be designed such that players who spent money on some particular item would be intentionally matched against players of lower skill. As the lower-skill players would be easier for the buyer to beat with their new items, it would make the purchaser feel like their decision to buy was worth it. By contrast, the lower-skill player might be impressed by how well the player with the purchased item performed and feel they would become better at the game if they too purchased it. While this might encourage players to buy in-game items, it would yield an ultimately less competitive and less interesting matchmaking system. While such systems are indeed bad for the gameplay experience, it is at least worth noting that such a system would work whether the items being sold came from loot boxes or were purchased directly.
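The core of that idea can be sketched in a few lines. This is purely my own illustration of the general mechanism described in reporting on the patent; the function name, skill numbers, and handicap threshold are all hypothetical, not taken from EA’s actual filing:

```python
import random

# Toy sketch of purchase-biased matchmaking. All names and thresholds
# are hypothetical illustrations, not the patent's actual design.

def find_opponent(buyer_skill, recent_purchase, player_pool, handicap=200):
    """Pick an opponent. If the player recently bought an item, bias the
    match toward lower-skill opponents so the purchase 'feels' effective."""
    if recent_purchase:
        # Rigged match: only consider clearly weaker opponents
        candidates = [p for p in player_pool
                      if p["skill"] <= buyer_skill - handicap]
    else:
        # Normal matchmaking: opponents of roughly equal skill
        candidates = [p for p in player_pool
                      if abs(p["skill"] - buyer_skill) <= handicap]
    return random.choice(candidates) if candidates else None

# A pool of players with skill ratings from 800 to 2100
pool = [{"name": f"player{i}", "skill": s}
        for i, s in enumerate(range(800, 2200, 100))]

random.seed(0)
fair = find_opponent(1500, recent_purchase=False, player_pool=pool)
rigged = find_opponent(1500, recent_purchase=True, player_pool=pool)
print("fair match skill:  ", fair["skill"])
print("rigged match skill:", rigged["skill"])
```

Note that nothing in the sketch cares where the purchased item came from: the bias is applied after any purchase, which is why the concern applies equally to loot boxes and direct sales.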

“Buy the golden king now to get matched against total scrubs!”

If I’m right and the reasons gamers favor regulation center on the cost and design direction of games, why not just say that instead of talking about children and gambling? Because, frankly, it’s not very persuasive. It’s too selfish a concern to rally much social support. It would be silly for me to say, “I want to see loot boxes regulated out of games because I don’t want to spend money on them and I think they make for worse gaming experiences for me.” People would just tell me to either not buy loot boxes or not buy games with loot boxes. Since both suggestions are reasonable and I can do them already, the need for regulation isn’t there.

Now if I decide to vote with my wallet and not buy games with loot boxes, that won’t have any impact on the industry. My personal impact is too small. So long as enough other people buy those games, they will continue to be produced, and my enjoyment of games will be decreased because of the aforementioned cost and design issues. What I need to do, then, is convince enough people to follow my lead and not buy these games either. It wouldn’t be until enough gamers stopped buying the games that there would be incentives for developers to abandon that model. One reason to talk about children, then, is that you don’t trust that the market will swing in your favor. Rather than allow the market to decide freely, you can say that children are incapable of making good choices and are being actively harmed. This will rally more support to tip the scales of that market in your favor by forcing government intervention. If you don’t trust that enough people will vote with their wallets like you do, make it illegal for younger gamers to be allowed to vote in any other way.

A real concern about children, then, might not be that they will come to view gambling as normal, but rather that they will come to view loot boxes (or other forms of added content, like dishonest DLC) in games as normal. They will accept that games often have loot boxes and they will not be deterred from buying titles that include them. That means more consumers now and in the future who are willing to tolerate or purchase loot boxes/DLC. That means fewer games without them which, in turn, means fewer options available to those voting with their wallets and not buying them. Children and gambling are brought up not because they are the gamer’s primary target of concern, but rather because they’re useful for a strategic end.

Of course, there are real issues when it comes to children and these microtransactions: they don’t tend to make great decisions, and sometimes they get access to their parents’ credit card information and go on insane spending sprees in their games. This type of family fraud has been the subject of previous legal disputes, but it is important to note that this is not a loot box issue per se. Children will just as happily waste their parents’ money on known quantities of in-game resources as on loot boxes. It’s also more a matter of parental responsibility and purchase verification than the heart of the matter at hand. Even if children do occasionally make lots of unauthorized purchases, I don’t think major game companies are counting on that as an intended source of vital revenue.

They start ballin’ out so young these days

For what it’s worth, I think loot boxes do run certain risks for the industry, as outlined above. They can make games costlier than they need to be, and they can result in design decisions I find unpleasant. In many regards I’m not a fan of them. I just happen to think that (a) they aren’t gambling and (b) they don’t require government intervention to remove on the grounds that they are harming children, persuading them that gambling is fun, and leading to more of it in the future. I think any kind of microtransaction – whether random or not – can result in the same kinds of harms, addiction, and reckless spending. However, when it comes to human psychology, I think loot boxes are designed more as a tool to fit our psychology than as one that shapes it, not unlike how water takes the shape of the container it is in and not the other way around. As such, it is possible that some facets of loot boxes and other random item-generation mechanics make players engage with the game in a way that yields more positive experiences, in addition to the costs they carry. If these gambling-like mechanics weren’t, in some sense, fun, people would simply avoid games with them.

For instance, having content that one is aiming to unlock can provide a very important motivation to continue playing a game, which is a big deal if you want your game to last and be interesting for a long time. My most recent example of this is Destiny 2 again. Though I didn’t play the first Destiny, I have a friend who did and told me about it. In that game, items dropped randomly, and they dropped with random perks. This means you could get several versions of the same item that were all different. It gave you a reason and a motivation to be excited about getting the same item for the 100th time. This wasn’t the case in Destiny 2. In that game, when you got a gun, you got the gun. There was no need to try to get another version of it, because that didn’t exist. So what happened when Destiny 2 removed the random rolls from items? The motivation for hardcore players to keep playing long-term largely dropped off a cliff. At least that’s what happened to me. The moment I got the last piece of gear I was chasing, a sense of “why am I playing?” washed over me almost instantly and I shut the game off. I haven’t touched it since. The same thing happened to me in Overwatch when I unlocked the last skin I was interested in at the time. Had all that content been available from the start, the turning-off point likely would have come much sooner.

As another example, imagine a game like World of Warcraft, where a boss has a random chance to drop an amazing item. Say this chance is 1 in 500. Now imagine an alternative reality where this practice is banned because it’s deemed too much like gambling (I’m not saying it will be; just imagine that it was). Now the item is obtained in the following way: whenever the boss is killed, it is guaranteed to drop a token. After you collect 500 of those tokens, you can hand them in and receive the item as a reward. Do you think players would have a better time under the gambling-like system, where each boss kill represents a metaphorical pull of the slot machine lever, or under the consistent one? I don’t know the answer to that question offhand, but what I do know is that collecting 500 tokens sure sounds boring, and that’s coming from a person who values consistency and saving, and who doesn’t enjoy traditional gambling. No one is going to make a compilation video of people reacting to finally collecting 500 tokens, because all you’d have is another moment just like the last 499 in which the same thing happened. People would – and do – make compilation videos of streamers finally getting valuable or rare items, as such moments are more entertaining for viewers and players alike.
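The arithmetic behind that comparison is worth spelling out. Both schemes cost the same on average, but they differ enormously in spread, and the spread is where the highlight-reel moments come from. Here is a minimal simulation (the 1-in-500 drop rate is from the example above; the sample size is my own arbitrary choice):

```python
import random

# Compare the two reward schemes from the example above:
# random drops at 1-in-500 per kill vs. a guaranteed token per kill.
DROP_CHANCE = 1 / 500

def kills_until_drop(rng):
    """Random-drop scheme: kill the boss repeatedly until the item drops."""
    kills = 1
    while rng.random() >= DROP_CHANCE:
        kills += 1
    return kills

rng = random.Random(42)
samples = [kills_until_drop(rng) for _ in range(10_000)]
mean_kills = sum(samples) / len(samples)

# The token scheme is always exactly 500 kills: same mean, zero variance.
lucky_tenth = sorted(samples)[len(samples) // 10]
print(f"random-drop mean kills: {mean_kills:.0f} (token scheme: always 500)")
print(f"luckiest 10% of players finished by kill {lucky_tenth}")
```

The number of kills needed under the random scheme follows a geometric distribution, so the average is the same 500 kills as the token scheme, but roughly a tenth of players get their item within the first 50-odd kills while an unlucky few grind well past 1,000. Those lucky early drops and long droughts are precisely what produce the elation (and the reaction videos) that a guaranteed 500-token grind never could.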

Diversity: A Follow-Up

My last post focused on the business case for demographic diversity. Summarizing briefly, an attempted replication of a paper claiming that companies with greater gender and racial diversity outperformed those with less diversity failed to reach the same conclusion. Instead, these measures of diversity were effectively unrelated to business performance once you controlled for a few variables. This should make plenty of intuitive sense, as demographic variables per se aren’t related to job performance. While they might prove to be rough proxies if you have no information (men or women might be better at tasks X or Y, for instance), once you can assess skills, competencies, and interests, the demographic variables cease to be good predictors of much else. Being a man or a woman, African or Chinese, does not itself make you competent or interested in any particular domain. Today, I wanted to tackle the matter of diversity itself on more of a philosophical level. With any luck, we might be able to understand some of the issues that can cloud discussions on the topic.

And if I’m unlucky, well…

Let’s start with the justifications for concerns with demographic diversity. As far as I’ve seen, there are two routes people take with this. The first – and perhaps most common – has been the moral justification for increasing diversity of race and gender in certain professions. The argument here is that certain groups of people have been historically denied access to particular positions, institutions, and roles, and so they need to be proactively included in such endeavors as a means of reparation to make up for past wrongs. While that’s an interesting discussion in its own right, I have not found many people who claim that, say, more women should be brought into a profession no matter the impact. That is, no one has said, “So what if bringing in more women would mess everything up? Bring them in anyway.” This brings us to the second justification for increasing demographic diversity that usually accompanies the first: the focus on the benefits of cognitive diversity. The general idea here is not only that people from all different groups will perform at least as well in such roles, but that having a wider mix of people from different demographic groups will actually result in benefits. The larger your metaphorical cognitive toolkit, the more likely you will successfully meet and overcome the challenges of the world. Kind of like having a Swiss Army knife with many different attachments, just with brains.

This idea is appealing on its face but, as we saw last time, diversity wasn’t found to yield any noticeable benefits. There are a few reasons why we might expect that outcome. The first is that cognitive diversity itself is not always going to be useful. If you’re on a camping trip and you need to saw through a piece of wood, the saw attachment on your Swiss Army knife would work well; the scissors, toothpick, and can opener will all prove ineffective at solving your problem. Even the non-serrated knife will prove inefficient at the task. The solutions to problems in the world are not general-purpose in nature. They require specialized equipment to solve. Expanding that metaphor into the cognitive domain, if you’re trying to extract bitumen from tar sands, you don’t want a team of cognitively diverse individuals including a history major, a psychology PhD, and a computer scientist, along with a middle-school student. Their diverse set of skills and knowledge won’t help you solve your problem. You might do better if you hired a cognitively non-diverse group of petroleum engineers.

This is why companies hiring for positions regularly list rather specific qualification requirements. They understand – as we all should – that cognitive diversity isn’t always (or even usually) useful when it comes to solving particular tasks efficiently. Cognitive specialization does that. Returning this point to demographic diversity, the problem should be clear enough: whatever cognitive diversity exists between men and women, or between different racial groups, needs to be task-relevant in order for it to even potentially improve performance outcomes. Even if the differences are relevant, in order for diversity to improve outcomes, the different demographic groups in question need to complement each other’s skill sets. If, say, women are better at programming than men, then a diverse mix of men and women wouldn’t improve programming outcomes; the non-diverse option of hiring women instead of men would.

Just like you don’t improve your track team’s relay time by including diverse species

Now it’s not impossible that such complementary cognitive demographic differences exist, at least in theory, though the restrictions outlined above are already onerous. However, the next question that arises is whether such cognitive differences would actually exist in practice by the time hiring decisions were made. There’s reason to expect they would not, as people do not specialize in skills or bodies of knowledge at random. While there might be an appreciable amount of cognitive diversity between groups like men and women, or between racial groups, in the entire population (indeed, meaningful differences would need to exist in order for the beneficial-diversity argument to make any sense in the first place), people do not get randomly sorted into groups like professions or college majors.

Most people probably aren’t that interested in art history, or computer science, or psychology, or math to the extent they would pursue it at the expense of everything else they could do. As such, the people who are sufficiently interested in psychology are probably more similar to one another than they are to people who major in engineering. Those who are interested in plumbing are likely more similar to other plumbers than they are to nurses.

As such, whatever differences exist between demographics on the population level may be reduced in part or in whole once people begin to self-select into different groups based on skills, interests, and aptitudes. Even if men and women possess some cognitive differences in general, male and female nurses, or psychologists, or engineers, might not differ in those same regards. The narrower the skill set you’re looking for when it comes to solving a task, the more similar we might expect people who possess those skills to be. Just to use my profession, psychologists might be more similar than non-psychologists; those with a PhD might be more similar than those with just a BA; those who do research may differ from those who enter into the clinical field, and so on.

I think these latter points are where a lot of people get tripped up when thinking about the possible benefits of demographic diversity to task performance. They notice appreciable and real differences between demographic groups on a number of cognitive dimensions, but fail to appreciate that these population differences might (a) not be large once enough self-selection by skills and interests has taken place, (b) not be particularly task-relevant, and (c) not be complementary.

Ironically, one of the larger benefits to cognitive diversity might be the kind that people typically want to see the least: the ability of differing perspectives to help check the personal biases we possess. As people become less reliant on those in their immediate vicinity and increasingly able to self-segregate into similar-thinking social and political groups around the world, they may begin to likewise pursue policies and ideas that are increasingly self-serving and less likely to benefit the population on the whole. Key assumptions may go unchallenged and the welfare of others may be taken into account less frequently, resulting in everyone being worse off. Groups like the Heterodox Academy have been set up to try and counteract this problem, though the extent of their success is debatable.

A noble attempt to hold back the oncoming flood all the same

To condense this post a little, the basic idea is this: men and women (to use just one example), on average, are likely to show a greater degree of between-group cognitive diversity than are male and female computer science majors. Or male and female literature majors. Or any other group you can imagine. Once people are segregating themselves into different groups on the basis of shared abilities and interests, those within the groups should be much more similar to one another than you’d expect on the basis of their demographics alone. If much of the cognitive diversity between these groups is removed through self-selection, then there isn’t much reason to expect that demographic diversity within those groups will have much of an effect one way or the other. If male and female programmers already know the same sets of skills and have fairly similar personalities, making those groups look more male or more female won’t have much of an overall effect on their performance.

For it to even be possible that such diversity might help, we need to grant that meaningful, task-relevant differences between demographic groups exist, that they are retained throughout a long process of self-selection, and that these differences complement each other, rather than one group being superior. Further, these differences would need to create more benefits than conflicts. While there might be plenty of cognitive diversity in, say, the US Congress in terms of ideology, that doesn’t necessarily mean it helps people achieve useful outcomes all the time once you account for all the dispute-related costs and lack of shared goals.

If qualified and interested individuals are being kept out of a profession simply because of their race or gender, that obviously carries costs and should be stopped. There would be many valuable resources going untapped. If, however, people left to their own devices are simply making choices they feel suit them better – creating some natural demographic imbalances – then just changing their representation in this field or that shouldn’t impact much.

Why Do We Roast The Ones We Love?

One very interesting behavior that humans tend to engage in is murder. While we’re far from the only species that does this (as there are some very real advantages to killing members of your species – even kin – at times), it does tend to garner quite a bit of attention, and understandably so. One very interesting piece of information about this behavior concerns motives; why people kill. If you were to hazard a guess as to some of the most common motives for murder, what would you suggest? Infidelity is a good one, as is murder resulting from other deliberate crimes, like when a robbery is resisted or witnesses are killed to reduce the probability of detection. Another major factor that many might not guess is minor slights or disagreements, such as one person stepping on another person’s foot by accident, followed by an insult (“watch where you’re going, asshole!”), which is responded to with an additional insult, and things kind of get out of hand until someone is dead (Daly & Wilson, 1988). Understanding why seemingly minor slights get blown so far out of proportion is a worthwhile matter in its own right. The short version of the answer is that one’s social status (especially if one is male) can be determined, in large part, by whether other people know they can push you around. If I know you will tolerate negative behavior without fighting back, I might be encouraged to take advantage of you in more extreme ways more often. If others see you tolerating insults, they too may exploit you, knowing you won’t fight back. On the other hand, if I know you will respond to even slight threats with violence, I have a good reason to avoid inflicting costs on you. The more dangerous you are, the more people will avoid harming you.

“Anyone else have something to say about my shirt?! Didn’t think so…”

This is an important foundation for understanding why another facet of human behavior is strange (and, accordingly, interesting): friends frequently insult each other in a manner intended to be cordial. This behavior is exemplified well by the popular Comedy Central Roasts, where a number of comedians get together to publicly make fun of each other and their guest of honor. If memory serves, the (unofficial?) motto of these events is, “We only roast the ones we love,” which is intended to capture the idea that these insults are not intended to burn bridges or truly cause harm. They are insults born of affection, playful in nature. This is an important distinction because, as the murder statistics help demonstrate, strangers often do not tolerate these kinds of insults. If I were to go up to someone I didn’t know well (or knew well as an enemy) and start insulting their drug habits, dead loved ones, or even something as simple as their choice of dress, I could reasonably expect anything from hurt feelings to a murder. This raises an interesting series of mysteries surrounding the matter of why the stranger might want to kill me while my friends will laugh, as well as when my friends might be inclined to kill me too.

Insults can be delivered in two primary manners: seriously and in jest. In the former case, harm is intended, while in the latter it often isn’t. As many people can attest, however, the line between serious and jesting insults is not always as clear as we’d like. Despite our best intentions, ill-phrased or poorly-timed jokes can do harm in much the same way that a serious insult can. This suggests that the nature of the insult is similar between the two contexts. As the function of a serious insult between strangers would seem to be to threaten or lower the insulted target’s status, this is likely the same function of an insult made in jest between friends, though the degree of intended threat is lower. The closest analogy that comes to mind is the difference between a serious fight and a friendly tussle, where the combatants either are, or are not, trying to inflict serious harm on each other. Just like play fighting, however, things sometimes go too far and people get hurt. I think joking insults between friends go much the same way.

This raises another worthwhile question: as friends usually have a vested interest in defending each other from outside threats and being helpful, why would they then risk threatening the well-being of their allies through such insults? It would be strange if the insults were all risk and no reward, so it falls to us to explain what that reward is. There are a few explanations that come to mind, all of which focus on one crucial facet of friendships: they are dynamic. While friendships can be – and often are – stable over time, who you are friends with in general, as well as the degree of those friendships, changes over time. Given that friendships are important social resources that do shift, it’s important that people have reliable ways of assessing the strength of these relationships. If you are not assessing these relationships now and again, you might come to believe that your social ties are stronger than they actually are, which can be a problem when you find yourself in need of social support and realize that you don’t have it. Better to assess what kind of support you have before you actually need it so you can tailor your behavior more appropriately.

“You guys got my back, right?….Guys?….”

Insults between friends can help serve this relationship-monitoring function. As insults – even the joking kind – carry the potential to inflict costs on their target, the willingness of an individual to tolerate the insult – to endure those costs – can serve as a credible signal of friendship quality. After all, if I’m willing to endure the costs of being insulted by you without responding aggressively in turn, this likely means I value your friendship more than I dislike the costs being inflicted. Indeed, if these insults did not carry costs, they would not be reliable indications of friendship strength. Anyone could tolerate behavior that didn’t inflict costs to maintain a friendship, but not everyone will tolerate behaviors that do. This yields another prediction: the strength of a friendship can also be assessed by the degree of insult one is willing to tolerate. In other words, the more it takes to “go too far” when it comes to insults, the closer and stronger the friendship between two individuals. Conversely, if you were to make a joke about your friend that they become incredibly incensed over, this might result in your reevaluating the strength of that bond: if you thought the bond was stronger than it was, you might either take steps to remedy the cost you just inflicted and make the friendship stronger (if you value the person highly) or perhaps spend less time investing in the relationship, even to the point of walking away from it entirely (if you do not).

Another possible related function of these insults could be to ensure that your friends don’t start to think too highly of themselves. As mentioned previously, friendships are dynamic things based, in part, on what each party can offer to the other. If one friend begins to see major changes to their life in a positive direction, the other friend may no longer be able to offer the same value they did previously. To put that in a simple example, if two friends have long been poor, but one suddenly gets a new, high-paying job, the new status that job affords will allow that person to make friends he likely could not before. Because the job makes them more valuable to others, others will now be more inclined to be their friend. If the lower-status friend wishes to retain their friendship with the newly-employed one, they might use these insults to potentially undermine the confidence of their friend in a subtle way. It’s an indirect way of trying to ensure the high-status friend doesn’t begin to think he’s too good for his old friends.

Such a strategy could be risky, though. If the lower-status party can no longer offer the same value to the higher-status one, relative to their new options, that might also not be the time to test the willingness of the higher-status one to tolerate insults. At the same time, times of change are also precisely when the value of reassessing relationship strength can be at its highest. There’s less of a risk of a person abandoning a friendship when nothing has changed, relative to when it has. In either case, the assessment and management of social relationships is likely the key for understanding the tolerance of insults from friends and intolerance of them from strangers.

“Enjoy your new job, sellout. You used to be cool”

This analysis can speak to another interesting facet of insults as well: they’re directed towards the speaker at times, referred to as self-deprecating humor when done in jest (and just self-deprecation when not). It might seem strange that people would insult themselves, as doing so would act to directly threaten their own status. That people do so with some regularity suggests there might be some underlying logic to these self-directed insults as well. One possibility is that these insults do what was just discussed: signal that one doesn’t hold themselves in high esteem and, accordingly, signal that one isn’t “too good” to be your friend. This seems like a profitable place from which to understand self-deprecating jokes. When such insults directed towards the self are not made in jest, they likely carry additional implications as well, such as that expectations should be set lower (e.g., “I’m really not able to do that”) or that one is in need of additional investment, relative to the joking kind.

References: Daly, M. & Wilson, M. (1988). Homicide. New York: Aldine de Gruyter.

To Meaningfully Talk About Gender

Let’s say I were to tell you I am a human male. While this sentence is short and simple, the amount of information you could glean from it is a potential goldmine, assuming you are starting from a position of near-total ignorance about me. First, it provides you with my species identification. In the most general sense, that lets you know what types of organisms in the world I am capable of potentially reproducing with (to produce reproductively-viable offspring in turn). In addition to that rather concrete fact, you also learn about my likely preferences. Just as humans share a great number of genes in common (which is why we can reproduce with one another), we also share a large number of general preferences and traits in common (as these are determined heavily by our genes). For instance, you likely learn that I enjoy the taste of fruit, that I make my way around the world on two feet, and that hair continuously grows from the top of my head but much more sparingly on the rest of my body, among many other things. While these probable traits might not hold true for me in particular – perhaps I am totally hairless/covered in hair, have no legs, and find fruit vile – they do hold for humans more generally, so you can make some fairly educated guesses as to what I’m like in many regards even if you know nothing else about me as a person. It’s not a perfect system, but you’ll do better on average with this information than you would without it. To make the point crystal clear, imagine trying to figure out what kinds of things I liked if you didn’t even know my species.

Could be delicious or toxic, depending on my species. Choose carefully.

When you learn that I am a male, you learn something concrete about the sex chromosomes in my body: specifically, that I have an XY configuration and tend to produce particular types of gametes. In addition to that concrete fact, you also learn about my likely traits and preferences. Just as humans share a lot of traits in common, males tend to share more traits in common with each other than they do with females (and vice versa). For instance, you likely learn that the distribution of muscle mass in my upper body is more substantial than in females, that I have a general willingness to relax my standards when it comes to casual sex, that I have a penis, and that I’m statistically more likely to murder you than a female is (I’m also more likely to be murdered myself, for the record). Again, while these might not all hold true for me specifically, if you knew nothing else about me, you could still make some educated guesses as to what I enjoy and my probable behavior because of my group membership.

One general point I hope these examples illuminate is that, to talk meaningfully about a topic, we need to have a clear sense of our terms. Once we know what the terms “human” and “male” mean, we can begin to learn a lot about what membership in those groups entails. We can learn quite a bit about deviations from those general commonalities as well. For instance, some people might have an XY set of chromosomes and no penis. This would pose a biological mystery to us, while someone having an XX set and no penis would pose much less of one. The ability to consistently apply a definition – even an arbitrary one – is the first step in being able to say something useful about a topic. Without clear boundary conditions on what we’re talking about, you can end up with people talking about entirely different concepts using the same term. This yields unproductive discussions and is something to be avoided if you’re looking to cut down on wasted time.

Speaking of unproductive discussions, I’ve seen a lot of metaphorical ink spilled over the concept of gender; a term that is supposed to be distinct from sex, yet is highly related to it. According to many of the sources one might consult, sex is supposed to refer to biological features (as above), while gender is supposed to refer, “…to either social roles based on the sex of the person (gender role) or personal identification of one’s own gender based on an internal awareness (gender identity).” I wanted to discuss the latter portion of that gender definition today: the one referring to people’s feelings about their gender. Specifically, I’ve been getting the growing sense that this definition is not particularly useful. In essence, I’m not sure it really refers to anything in particular and, accordingly, doesn’t help advance our understanding of much in the world. To understand why, let’s take a quick trip through some interesting current events. 

Some very colorful, current events…

In this recent controversy, a woman named Rachel Dolezal claimed her racial identity was black. The one complicating factor in her story is that she was born to white parents. Again, there’s been a lot of metaphorical ink spilled over the issue (including the recent mudslinging directed at Rebecca Tuvel, who published a paper on the matter), with most of the discussions seemingly unproductive and, from what I can gather, mean-spirited. What struck me when I was reading about the issue is how little of those discussions explicitly focused on what should have been the most important, first point: how are we defining our terms when it comes to race? Those who opposed Rachel’s claims to be black appear to fall back on some kind of implicit hereditary definition: that one or more of one’s parents need to be black in order to consider oneself a member of that group. That’s not a perfect definition, as we then need to determine what makes a parent black, but it’s a start. Like the definition of sex I gave above, this concept of race references some specific feature of the world that determines one’s racial identity, and I imagine it makes intuitive sense to most people. Crucially, this definition is immune to feelings. It doesn’t matter if one is happy, sad, indifferent, or anything else with respect to their ethnic heritage; it simply is what it is regardless of those feelings. In this line of thinking, Rachel is white regardless of how she feels about it, how she wears her hair, dresses, acts, or even whether we want to accept her identification as black and treat her accordingly (whatever that is supposed to entail). What she – or we – feel about her racial identity is a different matter from her heritage.

On the other side of the issue, there are people (notably Rachel herself) who think that what matters is how you feel when it comes to determining identity. If you feel black (i.e., your internal awareness tells you that you’re black), then you are black, regardless of biological factors or external appearances. This idea runs into some hard definitional issues, as above: what does it mean to feel black, and how is it distinguished from other ethnic feelings? In other words, when you tell me that you feel black, what am I supposed to learn about you? Currently, that’s a big blank in my mind. This definitional issue is doubly troubling in this case, however, because if one wants to say they are black because they feel black, then it seems one first needs to identify a preexisting group of black people to have any sense at all for what those group members feel like. However, if you can already identify who is and is not black from some other criteria, then it seems the feeling definition is out of place as you’d already have another definition for your term. In that case, one could just say they are white but feel like they’re black (again, whatever “feeling black” is supposed to mean). I suppose they could also say they are white and feel unusual for that group, too, without needing to claim they are a member of a different ethnic group.

The same problems, I feel, apply to the gender issue despite the differences between gender and race. Beginning with the feeling definition, the parallels are clear. If someone told me they feel like a woman, a few things have to be made clear for that statement to mean anything. First, I’d need to know what being a woman feels like. In order to know what being a woman feels like, I’d need to already have identified a group of women so the information could be gathered. This means I’d need to know who was a woman and who was not in advance of learning about their specific feelings. However, if I can do that – if I can already determine who is and is not a woman – then it seems I don’t need to identify them on the basis of their feelings; I would be doing so with some other criteria. Presumably, the most common criteria leveraged in such a situation would be sex: you’d go out and find a bunch of females and ask them about what it was like to be a woman. If those responses are to be meaningful, though, you need to consider “female” to equate to “woman” which, according to definitions I listed above, it does not. This leaves us in a bit of a catch-22: we need to identify women by how they feel, but we can’t say how they feel until we identify them. Tricky business indeed (even forgoing the matter of claims that there are other genders).

Just keep piling the issues on top of each other and hope that sorts it out

On the other hand, let’s say gender is defined by some objective criteria and is distinct from sex. So, someone might be a male because of their genetic makeup but fall under the category of “woman” because, say, their psychology has developed in a female-typical pattern for enough key traits. Perhaps enough of their metaphorical developmental dials have been turned towards the female portion. Now that’s just a hypothetical example, but it should demonstrate the following point well enough: regardless of whether the male in question wants to be identified as a female or not, it wouldn’t matter in terms of this definition. It might matter a whole bunch if you want to be polite and nice to them, but not for our definition. Once we had a sense of which dials – or how many of them – needed to be turned to “female” for a male to be considered a woman, and had a way of measuring that, one’s internal awareness seems to be beside the point.

While this definition helps us talk more meaningfully about gender, at least in principle, it also seems like the gender term is a little unnecessary. If we’re just using “man” as a synonym for “male” and “woman” as one for “female”, then the entire sex/gender distinction kind of falls apart, which defeats the whole purpose. You wouldn’t feel like a man; you’d feel like a male (whatever that feels like, and I say that as a male myself). Rather than calling our female-typical male a woman, we could also call him an atypical man.

The second issue nagging at me with this idea is that almost all traits do not run on a spectrum from male to female. Let’s consider traits with psychological sex differences, like depression or aggression. Since females are more likely to experience depression than males, we could consider experiencing depression as something that pushes one towards the “woman” end of the gender spectrum. However, when one feels depressed, they don’t feel like a woman; they feel sad and hopeless. When someone feels aggressive, they don’t feel like a man; they feel angry and violent. The same kind of logic can be applied to most other traits as well, including components of personality, risk-seeking, and so on. These don’t run on a spectrum between male/masculine and female/feminine, just as it would make no sense to say that one has a feminine height.

If this still all sounds very confusing to you, then you’re on the same page as me. As far as I’ve seen, it is incredibly difficult for people to verbalize anything like a formal definition or set of standards that tells us who falls into one category or the other when it comes to gender. In the absence of such a standard, it seems profitable to just discard the terms and find something better – something more precise – to use instead.

Academic Perversion

As an instructor, I have made it my business to enact a unique kind of assessment policy for my students. Specifically, all tests are short-essay style and revisions are allowed after a grade has been received. This ensures that students always have some motivation to figure out what they got wrong and improve on it. In other words, I design my assessments to incentivize learning. From some abstract perspective on the value of education, this seems like a reasonable policy to adopt (at least to me, though I haven’t heard any of my colleagues argue with the method). It’s also, for lack of a better word, a stupid thing for me to do from a professional perspective. What I mean here is that – on the job market – my ability to get students to learn successfully is not exactly incentivized, or at least that’s the impression that others with more insight have passed on to me. Not only are people on hiring committees not particularly interested in how much time I’m willing to devote to my students’ learning (it’s not the first thing they look at, or even in the top 3, I think), but the time I do invest in this method of assessment is time I’m not spending doing other things they value, like seeking out grants or trying to publish as many papers as I can in the most prestigious outlets available.

“If you’re so smart, how come you aren’t rich?”

And my method of assessment does involve quite a bit of time. When each test takes about 5-10 minutes to grade and make comments on and you’re staring down a class of about 100 students, some quick math tells you that each round of grading will take up about 8 to 16 hours. By contrast, I could instead offer my students a multiple choice test which could be graded almost automatically, cutting my time investment down to mere minutes. Over the course of a semester, then, I could devote 24 to 48 hours to helping students learn (across three tests) or I could instead provide grades for them in about 15 minutes using other methods. As far as anyone on a hiring committee will be able to tell, those two options are effectively equivalent. Sure, one helps students learn better, but being good at getting students to learn isn’t exactly incentivized on a professional level. Those 24 to 48 hours could have instead been spent seeking out grant funding or writing papers and – importantly – that’s per 100 students; if you happen to be teaching three or more classes a semester, that number goes up.
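The estimates above are simple multiplication; here is a quick sketch of the arithmetic (figures taken directly from the paragraph above; note the per-round totals round to roughly 8.3–16.7 hours, which the text rounds down to 8–16):

```python
# Back-of-the-envelope grading-time arithmetic:
# 5-10 minutes per short-essay test, ~100 students, 3 tests per semester.

MINUTES_PER_TEST_LOW, MINUTES_PER_TEST_HIGH = 5, 10
STUDENTS = 100
TESTS_PER_SEMESTER = 3

# Hours for one round of grading (one test across the whole class)
round_low = MINUTES_PER_TEST_LOW * STUDENTS / 60    # ~8.3 hours
round_high = MINUTES_PER_TEST_HIGH * STUDENTS / 60  # ~16.7 hours

# Hours across a full semester of three tests
semester_low = round_low * TESTS_PER_SEMESTER    # 25.0 hours
semester_high = round_high * TESTS_PER_SEMESTER  # 50.0 hours

print(f"One round of grading: {round_low:.1f}-{round_high:.1f} hours")
print(f"One semester: {semester_low:.1f}-{semester_high:.1f} hours")
```

Against the few minutes a machine-graded multiple-choice test would take, the per-semester gap is stark, and it scales linearly with each additional class taught.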

These incentives don’t just extend to tests and grading, mind you. If hiring committees aren’t all that concerned with my students’ learning outcomes, that has implications for how much time I should spend designing my lecture material as well. Let’s say I was faced with the task of having to teach my students information I was not terribly familiar with, be that the topic of the class as a whole or a particular novel piece of information within that otherwise-familiar topic. I could take the time-consuming route and familiarize myself with the information first, tracking down relevant primary sources, reading them in depth, assessing their strengths and weaknesses, and searching out follow-up research on the matter. I could also take the quick route and simply read the abstract/discussion section of the paper, or just report the summary of the research provided by textbook writers or publishers’ materials.

If your goal is to prep about 12 weeks’ worth of lecture material, it’s quite clear which method saves the most time. If having well-researched courses full of information you’re an expert on isn’t properly incentivized, then why would we expect professors to take the former, time-consuming path? Pride, perhaps – many professors want to be good at their job and helpful to their students – but it seems other incentives push against devoting time to quality education if one is looking to make themselves an attractive hire*. I’ve heard teaching referred to as a distraction by more than one instructor, hinting strongly at where they perceive the incentives to exist.

The implications of these concerns about incentives extend beyond any personal frustrations I might have, and they’re beginning to get a larger share of the spotlight. One of the more recent events highlighting this issue was dubbed the replication crisis, in which many published findings did not show up again when independent research teams sought them out. This wasn’t some small minority, either; in psychology it was well over 50% of them. There’s little doubt that a healthy part of this state of affairs owes its existence to researchers purposefully using questionable methods to find publishable results, but why would they do so in the first place? Why are they so motivated to find these results? Again, pride factors into the equation but, as is usually the case, another part of that answer revolves around the incentive structure of academia: if academics are judged, hired, promoted, and funded on their ability to publish results, then they are incentivized to publish as many of those results as they can, even if the results themselves aren’t particularly trustworthy (they’re also disincentivized from trying to publish negative results in many instances, which causes other problems).

Incentives so perverse I’m sure they’re someone’s fetish

A new paper has been making the rounds discussing these incentives in academia (Edwards & Roy, 2017), and it begins with a simple premise: academic researchers are humans. Like other humans, we tend to respond to incentives. While the incentive structures within academia might have been created with good intentions in mind, there is always a looming threat from the law of unintended consequences. In this case, those unintended consequences are captured by Goodhart’s Law, which can be expressed as follows: “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes,” or, more simply, “when a measure becomes a target, it ceases to be a good measure.” In essence, this means that people will follow the letter of the law rather than its spirit.

Putting that into an academic example, a university might want to hire intelligent and insightful professors. However, assessing intelligence and insight is difficult to do, so, rather than assess those traits directly, the university assesses proxy measures of them: something that tends to be associated with intelligence and insight but is not itself either of those things. In this instance, it might be noticed that intelligent, insightful professors tend to publish more papers than their peers. Because the number of papers someone publishes is much easier to measure, the university simply measures that variable instead when determining whom to hire and promote. While publication records are initially good predictors of performance, once they become the target of assessment, that correlation begins to decline. As publishing papers per se becomes the target behavior people are assessed on, they begin to maximize that variable rather than the thing it was intended to measure in the first place. Instead of publishing fewer, higher-quality papers full of insight, they publish many papers that do a worse job of helping us understand the world.

In much the same vein, student grades on a standardized test might be a good measure of a teacher’s effectiveness; more effective teachers tend to produce students who learn more and subsequently do better on the test. However, if the poor teachers are then penalized and told to improve their performance or find a new job, the teachers might try to game the system. Now, instead of teaching their students about a subject in a holistic fashion that results in real learning, they just start teaching to the test. Rather than being taught, say, chemistry, students begin to get taught how to take a chemistry test, and the two are decidedly not the same thing. So long as teachers are only assessed on the grades of the students who take those tests, this is the incentive structure that ends up getting created.

Pictured: Not actual chemistry

Beyond just impacting the number of papers that academics might publish, the paper discusses a number of other potential unintended consequences of incentive structures. One of these involves measures of the quality of published work. We might expect that theoretically and empirically meaningful papers will receive more citations than weaker work. However, because the meaningfulness of a paper can’t be assessed directly, we look at proxy measures, like citation count (how often a paper is cited by other papers or authors). The consequence? People cite their own work more often, and peer reviewers request that their work be cited by those seeking to publish in the field. The number of pointless citations is inflated. There are also incentives for publishing in “good” or prestigious journals: those thought to preferentially publish meaningful work. Again, we can’t just assess how “good” a journal is, so we use other metrics, like how often papers from that journal are cited. The net result here is much the same, with journals preferring to publish papers that cite papers they have previously published. Going a step further, when universities are ranked on certain metrics, they are incentivized to game those metrics or simply misreport them. Apparently a number of colleges have been caught outright lying on that front to raise their rankings, while others can improve their rankings without really improving their institution.

There are many such examples we might run through (and I recommend you check out the paper itself for just that reason), but the larger point I wanted to discuss is what all this means on a broader scale. To the extent that those who are more willing to cheat the system are rewarded for their behavior, those who are less willing to cheat will be crowded out, and then we have a real problem on our hands. For perspective, Fanelli (2009) reports that 2% of scientists admit to fabricating data and 10% report engaging in less overt, but still questionable, practices, on average; he also reports that when asked whether they know of a case of their peers doing such things, those numbers are around 14% and 30%, respectively. While those numbers aren’t straightforward to interpret (it’s possible that some people cheat a lot, that several people know of the same cases, or that one might be willing to cheat if the opportunity presented itself even if it hasn’t yet, for instance), they should be taken very seriously as a cause for concern.

(It’s also worth noting that Edwards & Roy misreport the Fanelli findings by citing his upper bounds as if they were the averages, making the problem of academic misconduct seem as bad as possible. This is likely just a mistake, but it highlights the possibility that mistakes – not just cheating – likely follow the incentive structure as well. Just as researchers have incentives to overstate their own findings, they also have incentives to overstate the findings of others to help make their points convincingly.)

Which is ironic for a paper complaining about incentives to overstate results

When it’s not just the case that a handful of bad apples within academia are contributing to a problem of, say, cheating with their data, but rather an appreciable minority of them are, this has the potential to have at least two major consequences. First, it can encourage more non-cheaters to become cheaters. If I were to observe my colleagues cheating the system and getting rewarded for it, I might be encouraged to cheat myself just to keep up when faced with (very) limited opportunities for jobs or funding. Parallels can be drawn to steroid use in sports, where those who do not initially want to use steroids might be encouraged to if enough of their competitors did.

The second consequence is that, as more people take part in that kind of culture, public faith in universities – and perhaps scientific research more generally – erodes. With eroding public faith comes reduced funding and increased skepticism towards research findings; both responses are justified (why would you fund researchers you can’t trust?) and worrying, as there are important problems that research can help solve, but only if people are willing to listen.    

*To be fair, it’s not that my ability as a teacher is entirely irrelevant to hiring committees; it’s that not only is this ability secondary to other concerns (i.e., my teaching ability might be looked at only after they narrow the search down by grant funding and publications), but my teaching ability itself isn’t actually assessed. What is assessed are my student evaluations and that is decidedly not the same thing.

References: Edwards, M. & Roy, S. (2017). Academic research in the 21st century: Maintaining scientific integrity in a climate of perverse incentives and hypercompetition. Environmental Engineering Science, 34, 51-61.

Fanelli, D. (2009). How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLoS One, 4, e5738.

Courting Controversy

“He says true but unpopular things. If you can’t talk about problems, you can’t fix them.”

The above quote comes to us from an interview with Trump supporters. Regardless of what one thinks about Trump and the truth of what he says, that idea holds a powerful truth itself: the world we live in can be a complicated one, and if we want to figure out how to best solve the problems we face, we need to be able to talk about them openly; even if the topics are unpleasant or the ideas incorrect. That said, there are some topics that people tend to purposefully avoid talking about. Not because the topics themselves are in some way unimportant or uninteresting, but rather because the mere mention of them is not unlike the prodding of a landmine. They are taboo thoughts: things that are made difficult to even think without risking moral condemnation and social ostracism. As I’m no fan of taboos, I’m going to cross one of them today myself, but in order to talk about those topics with some degree of safety, one needs to begin by talking about other topics which are safe. I want to first talk about something that is not dangerous, and slowly ramp up the danger. As a fair warning, this does require that this post be a bit longer than usual, but I think it’s a necessary precaution. 

“You have my attention…and it’s gone”

Let’s start by talking about driving. Driving is a potentially dangerous task, as drivers are controlling heavy machinery traveling at speeds that regularly exceed 65 mph. The scope of that danger can be highlighted by estimates that put the odds of pedestrian death – were they to be struck by a moving vehicle – at around 85% at only 40 mph. Because driving can have adverse consequences for both the driver and those around them, we impose restrictions on who is allowed to drive what, where, when, and how. The goal we are trying to accomplish with these restrictions is to minimize harm while balancing benefits. After all, driving isn’t only risky; it’s also useful and something people want to do. So how are we – ideally – going to determine who is allowed to drive and who is not? The most common solution, I would think, is to determine what risks we are trying to minimize and then ensure that people are able to surpass some minimum threshold of demonstrated ability. Simply put, we want to know people are good drivers.

Let’s make that concrete. In order to safely operate a vehicle you need to be able to: (a) see out of the windows, (b) operate the car mechanically, (c) have the physical strength and size to operate the car, (d) understand the “rules of the road” and all associated traffic signals, (e) have adequate visual acuity to see the world you’ll be driving through, (f) possess adequate reaction time to respond to the ever-changing road environment, and (g) possess the psychological restraint not to take excessive risks, such as traveling at unreasonably high speeds or cutting people off. This list is non-exhaustive, but it’s a reasonable place to start.

If you want to drive, then, you need to demonstrate that you can see out of the car while still being able to operate it. This would mean that those who are too small to accomplish both tasks at once – like young children or very short adults – shouldn’t be allowed to drive. Similarly, those who are physically large enough to see out of the windows but possess exceptionally poor eyesight should similarly be barred from driving, as we cannot trust they will respond appropriately. If they can see but not react in time, we don’t want them on the road either. If they can operate the car, can see, and know the rules but refuse to obey them and drive recklessly, we either don’t grant them a license or revoke it if they already have one.

In the service of assessing these skills we subject people to a number of tests: there are written tests that must be completed to determine knowledge of the rules of the road; there are visual tests; there are tests of driving ability. Even once these tests are passed, drivers are still reviewed from time to time, and a buildup of infractions can lead to a revocation of driving privileges.

However, we do not test everyone for these abilities. All of these things that we want a driver’s license to reflect – like every human trait – need to develop over time. In other words, they tend to fall within some particular distribution – often a normal one – with respect to age. As such, younger drivers are thought to pose more risk than adult drivers along a number of these desired traits. For instance, while not every person who is 10 years old is too small to operate a vehicle, the large majority of them are. Similarly, your average 15-year-old might not appropriately understand the risks of reckless driving and avoid it as we would hope. Moreover, the benefits that these young individuals can obtain from driving are lower as well; it’s not common for 12-year-olds to need a car to commute to work.

Accordingly, we also set minimum age laws regarding when people can begin to be considered for driving privileges. These laws are not set because it is impossible that anyone below the specified age might have need of a car and be able to operate it safely and responsibly, but rather as a recognition that a small enough percentage of them can that it’s not really worth thinking about (in the case of two-year-olds, for instance, that percentage is 0, as none could physically operate the vehicle; in the case of 14-year-olds it’s non-zero, but judged to be sufficiently low all the same). There are even proposals floating around concerning something like a maximum driving age, as driving abilities appear to deteriorate appreciably in older populations. As such, it’s not that we’re concerned about the age per se of the drivers – we don’t just want anyone over the age of 18 on the road – but age is still a good correlate of other abilities and allows us to save a lot of time in not having to assess every single individual for driving ability from birth to death under every possible circumstance.

Don’t worry; he’s watched plenty of Fast & Furious movies

This brings us to the first point of ramping up the controversy. Let’s talk a bit about drunk driving. We have laws against operating vehicles while drunk because of the effects that drinking has: reduced attention and reaction time, reduced inhibitions resulting in more reckless driving, and impaired ability to see or stay awake, all of which amount to a reduction in driving skill and an increased potential for harmful accidents. Reasonable as these laws sound, imagine, if you would, two hypothetical drivers: the worst driver legally allowed to get behind a wheel, as well as the best driver. Sober, we should expect the former to pose a much greater risk to himself and others than the latter but, because they both pass the minimum threshold of ability, both are allowed to drive. It is possible, however, that the best driver’s abilities while he is drunk still exceed those of the worst driver’s while he is sober.

Can we recognize that exception to the spirit of the law against drunk driving without saying it is morally or legally acceptable for the best driver to drive drunk? I think we can. There are two reasons we might do so. The first is that we might say even if the spirit of the rule seems to be violated in this particular instance, the rule is still one that holds true more generally and should be enforced for everyone regardless. That is, sometimes the rule will make a mistake (in a manner of speaking), but it is right often enough that we tolerate the mistake. This seems perfectly reasonable, and is something we accept in other areas of life, like medicine. When we receive a diagnosis from a doctor, we accept that it might not be right 100% of the time, but (usually) believe it to be right often enough that we act as if it were true. Further, the law is efficient: it saves us the time and effort in testing every driver for their abilities under varying levels of intoxication. Since the consequences of making an error in this domain might outweigh the benefits of making a correct hit, we work on maximizing the extent to which we avoid errors. If such methods of testing driving ability were instantaneous and accurate, however, we might not need this law against drunk driving per se because we could just be looking at people’s ability, rather than blood alcohol content. 

The second argument you might make to uphold the drunk driving rule is to say that even if the best drunk driver is still better than the worst sober one, the best drunk driver is nevertheless a worse driver than he is while sober. As such, he would be imposing more risk on himself and others than he reasonably needs to, and should not be allowed to engage in the behavior because of that. This argument is a little weaker – as it sets up a double standard – but it could be defensible in the right context. So long as you’re explicit about it, driving laws could be set such that people need to pass a certain threshold of ability and need to be able to perform within a certain range of their maximum ability. This might do things like make driving while tired illegal, just like drunk driving. 

The larger point I hope to hit on here is the following, which I hope we all accept: there are sometimes exceptions (in spirit) to rules that generally hold true and are useful. It is usually the case that people below a certain minimum driving age shouldn’t be trusted with the privilege, but it’s not like something magical happens at that age where an ability appears fully-formed in their brain. People don’t entirely lack the ability to drive at 17.99 years old and possess it fully at 18.01 years. That’s just not how development works for any trait in any species. We can recognize that some young individuals possess exceptional driving abilities (at least for their age, if not in the absolute sense, like this 14-year-old NASCAR driver) without suggesting that we change the minimum age driving law or even grant those younger people the ability to drive yet. It’s also not the case (in principle) that every drunk driver is incapable of operating their vehicle at or above the prescribed threshold of minimum safety and competency. We can recognize those exceptional individuals as being unusual in ability while still believing that the rule against drunk driving should be enforced (even for them) and be fully supportive of it.

That said, 14-year-old drunk drivers are a recipe for disaster

Now let’s crank up the controversy meter further and talk about sex. Rather than talking about when we allow people to drive cars and under what circumstances, let’s talk about when we accept their ability to consent to sex. Much like driving, sex can carry potential costs, including pregnancy, emotional harm, and the spread of STIs. Also like driving, sex tends to carry benefits, like physical pleasure, emotional satisfaction and, depending on your perspective, pregnancy. Further, much like driving, there are laws setting the minimum age at which someone can be said to legally consent to sex. These laws seem to be set to balance the costs and benefits of the act; we do not trust that individuals below certain ages are capable of making responsible decisions about when to engage in the act, with whom, in what contexts, and so on. There is a real risk that younger individuals can be exploited by older ones in this realm. In other words, we want to ensure that people are at least at a reasonable point in their physical and psychological development that allows them to make an informed choice. Much like driving (or signing contracts), we want people to possess a requisite level of skills before they are allowed to give consent for sex.

This is where the matter begins to get complicated because, as far as I have seen throughout discussions on the matter, people are less than clear about what skills or bodies of knowledge people should possess before they are allowed to engage in the act. While just about everyone appears to believe that people should possess a certain degree of psychological maturity, what that precisely means is not outlined. In this regard, consent is quite unlike driving: people do not need to obtain licenses to have sex (excepting some areas in which sex outside of marriage is not permitted) and do not need to demonstrate particular skills or knowledge. They simply need to reach a certain age. This is (sort of) like giving everyone over the age of, say, 16, a license to drive regardless of their abilities. This lack of clarity regarding what skills we want people to have is no doubt at least partially responsible for the greater variation in age-of-consent laws, relative to driving-age laws, across the globe.

The matter of sex is complicated by a host of other factors, but the main issue is this: it is difficult for people to outline what psychological traits we need to have in order to be deemed capable of engaging in the behavior. For driving, this is less of a problem: pretty much everyone can agree on what skills and knowledge they want other drivers to have; for sex, concerns are much more strategic. Here’s a great for instance: one potential consequence (intended for some) to sex is pregnancy and children. Because sex can result in children and those children need to be cared for, some might suggest that people who cannot reasonably be expected to be able to provide well enough for said children should be barred from consenting to sex. This proposal is frequently invoked to justify the position that non-adults shouldn’t be able to consent to sex because they often do not have access to child-rearing resources. It’s an argument that has intuitive appeal, but it’s not applied consistently. That is, I don’t see many people suggesting that the age of consent should be lowered for rich individuals who could care for children, nor that people who fall below a certain poverty line be barred from having sex because they might not be able to care for any children it produced.

There are other arguments one might consider on that front as well: because the biological consequences of sex fall on men and women differently, might we actually hold different standards for men and women when considering whether they are allowed to engage in the behavior? That is, would it be OK for a 12-year-old boy to consent to sex with a 34-year-old woman because she can bear the costs of pregnancy, but not allow the same relationship when the sexes were reversed? Legally we have the answer: no, it’s not acceptable in either case. However, there are some who would suggest such the former relationship is actually acceptable. Even in the realm of law, it would seem, a sex-dependent standard has been upheld in the past. 

Sure hope that’s his mother…

This is clearly not an exhaustive list of questions regarding how age-of-consent laws might be set, but the point should be clear enough: without a clear standard for what capabilities one needs in order to engage in sex, we end up with rather unproductive discussions. Making things even trickier, sex is more of a strategic act than driving, yielding greater disagreements over the matter and inflamed passions. It is very difficult to make explicit what abilities we want people to demonstrate in order to be able to consent to sex, and to reach consensus on them, for just this reason. Toss in the prospect of adults taking advantage of teenagers and you have all the makings of a subject people really don’t want to talk about. As such, we are sometimes left in a bit of an awkward spot when thinking about whether exceptions to the spirit of age-of-consent laws exist. Much like driving, we know that nothing magical happens to someone’s body and brain when they hit a certain age: development is a gradual process that, while exhibiting regularities, does not occur identically for all people. Some people will possess the abilities we’d like them to have before the age of consent; some people won’t possess those abilities even after it.

Importantly – and this is the main point I’ve been hoping to make – this does not mean we need to change or discard these laws. We can recognize that these laws do not fit every case like a glove while still behaving as if they do and intuitively judging them as being about right. Some 14-year-olds do possess the ability to drive, but they are not allowed to legally; some 14-year-olds possess whatever requisite abilities we hope those who consent to sex will have, but we still treat them as if they do not. At least in the US: in Canada, the age of consent is currently 16, up from 14 a few years ago, in some areas of Europe it is still 14, and in some areas of Mexico it can be lower than that.

“Don’t let that distract from their lovely architecture or beaches, though”

Understanding the variation in these intuitions between countries, between individuals, and over time is an interesting matter in its own right. However, there are some who worry about the consequences of even discussing the issue. That is, if we acknowledge that even a single individual is an exception to the general rule, we would be threatening the validity of the rule itself. Now, I don’t think this is the case, as I have outlined above, but it is worth adding the following point to that concern: recognizing possible exceptions to the rule is an entirely different matter from the consequences of doing so. Even if there are negative consequences to discussing the matter, that doesn’t change the reality of the situation. If your argument requires that you fail to recognize parts of reality because it might upset people – or that you decree, from the get-go, that certain topics cannot be discussed – then your argument should be refined.

There is a fair bit of danger in accepting these taboos: while it might seem all well and good when the taboo is directed against a topic you feel shouldn’t be discussed, a realization needs to be made that your group is not always going to be in charge of what topics fall under that umbrella, and to accept the taboo as legitimate when it benefits you is to accept it as legitimate when it hurts you as well. For instance, not wanting to talk about sex with children out of fear it would cause younger teens to become sexually active yielded the widely ineffective abstinence-only sex education (and, as far as I can tell, teaching comprehensive sex education does not result in worse outcomes, but I’m always open to evidence that it does). There is a real hunger in people to understand the world and to be able to voice what is on their mind; denying that comes with very real perils.

Intergenerational Epigenetics And You

Today I wanted to cover a theoretical matter I’ve discussed before, but apparently not on this site: the idea of intergenerational epigenetic transmission. In brief, epigenetics refers to chemical markers attached to your DNA that regulate how it’s expressed without changing the DNA itself. You could imagine your DNA as a book full of information, with each cell in your body containing the same book. However, not every cell expresses the full genome; each cell only expresses part of it (which is why skin cells are different from muscle cells, for instance). The epigenetic portion, then, could be thought of as black tape placed over certain passages in the book so they are not read. As this tape is added or removed by environmental influences, different portions of the DNA become active. From what I understand about how this works (which is admittedly very little at this juncture), these markers are usually not passed on to offspring from parents. The life experiences of your parents, in other words, will not be passed on to you via epigenetics. However, there has been some talk lately of people hypothesizing that not only are these changes occasionally (perhaps regularly?) passed on from parents to offspring, but that they might be passed on in an adaptive fashion. In short, organisms might adapt to their environment not just through genetic factors, but also through epigenetic ones.

Who would have guessed Lamarckian evolution was still alive?

One of the examples given in the target article on the subject concerns periods of feast and famine. While rare in most first-world nations these days, these events probably used to be more recurrent features of our evolutionary history. The example involves the following context: during some years in early-1900s Sweden, food was abundant, while during other years it was scarce. Boys who were hitting puberty at the time of a feast season tended to have grandchildren who died six years earlier than the grandchildren of boys who had experienced a famine season during the same developmental window. The causes of death, we are told, often involved diabetes. Another case involves the children of smokers: men who smoked right before puberty tended to have children who were fatter, on average, than fathers who smoked habitually but didn’t start until after puberty. The speculation, in this case, is that development was in some way permanently affected by food availability (or smoking) during a critical window, and those developmental changes were passed on to their sons and the sons of their sons.

As I read about these examples, a few things stuck out to me as rather strange. First, it seems odd that no mention was made of daughters or granddaughters in the smoking case, whereas in the food example there wasn’t any mention of the in-between male generation (they only mentioned grandfathers and grandsons there, not fathers). Perhaps there’s more to the data than is let on but – in the event that no effects were found for fathers or daughters of any kind – it is also possible that a single data set was sliced up into a number of different pieces until the researchers found something worth talking about (e.g., didn’t find an effect in general? Try breaking the data down by gender and testing again). That might or might not be the case here, but as we’ve learned from the replication troubles in psychology, one way of increasing your false-positive rate is to divide your sample into a number of different subgroups. For the sake of this post, I’m going to assume that is not the case and treat the data as representing something real, rather than a statistical fluke.

Assuming this isn’t just a false positive, there are two issues with the examples as I see them. I’m going to focus predominantly on the food example to highlight these issues: first, passing on such epigenetic changes seems maladaptive and, second, the story behind it seems implausible. Let’s take the issues in turn.

To understand why this kind of inter-generational epigenetic transmission seems maladaptive, consider two hypothetical children born one year apart (in, say, the years 1900 and 1901). At the time the first child’s father was hitting puberty, there was a temporary famine taking place and food was scarce; at the time of the second child, the famine had passed and food was abundant. According to the logic laid out, we should expect that (a) both children will have their genetic expression altered due to the epigenetic markers passed down by their parents, affecting their long-term development, and (b) the children will, in turn, pass those markers on to their own children, and their children’s children (and so on).

The big Thanksgiving dinner that gave your grandson diabetes

The problems here should become apparent quickly enough. First, let’s begin by assuming these epigenetic changes are adaptive: they are passed on because they are reproductively useful at helping a child develop appropriately. Specifically, a famine or feast at or around the time of puberty would need to be a reliable cue as to the type of environments their children could expect to encounter. If a child is going to face shortages of food, they might want to develop in a different manner than if they’re expecting food to be abundant.

Now that sounds well and good, but in our example these two children were born just a year apart and, as such, should be expected to face (broadly) the same environment, at least with respect to food availability (since feasts and famines tend to be more global). Clearly, if the children were adopting different developmental plans in response to that feast or famine, both of them (plan A affected by the famine and plan B not so affected) cannot be adaptive. Specifically, if this epigenetic inheritance is trying to anticipate children’s future conditions from those present around the time of their father’s puberty, at least one of the children’s developmental plans will be anticipating the wrong set of conditions. That said, both developmental plans could be wrong, and conditions could look different than either anticipated. Trying to anticipate the future conditions one will encounter over a lifespan (and over one’s children’s and grandchildren’s lifespans) using only information from the brief window of time around puberty seems like a plan doomed to failure, or at least suboptimal results.

A second problem arises because these changes are hypothesized to be intergenerational: capable of transmission across multiple generations. If that is the case, why on Earth would the researchers in this study pay any mind to the conditions the grandparents were facing around the time of puberty per se? Shouldn’t we be more concerned with the conditions being faced a number of generations back, rather than the more immediate ones? To phrase this as a chicken/egg problem: shouldn’t the grandparents in question have inherited epigenetic markers of their own from their grandparents, and so on down the line? If that were the case, the conditions they were facing around their puberty would either be irrelevant (because they had already inherited such markers from their own parents) or would have altered the epigenetic markers as well.

If we opt for the former possibility, then studying grandparents’ puberty conditions shouldn’t be too impactful. However, if we opt for the latter possibility, we are again left in a bit of a theoretical bind: if the conditions faced by the grandparents altered their epigenetic markers, shouldn’t those same markers also have been altered by the parents’ experiences, and their grandsons’ experiences as well? If they are being altered by the environment each generation, then they are poor candidates for intergenerational transmission (just as DNA that was constantly mutating would be). There is our dilemma, then: if epigenetic markers change across one’s lifespan, they are unlikely candidates for transmission between generations; if epigenetic changes can be passed down across generations stably, why look at the specific period pre-puberty for the grandparents? Shouldn’t we be concerned with their grandparents, and so on down the line?

“Oh no you don’t; you’re not pinning this one all on me”

Now, to be clear, a famine around the time of conception could affect development in other, more mundane ways. If a child isn’t receiving adequate nutrition at the time they are growing, then it is likely certain parts of their developing body will not grow as they otherwise would. When you don’t have enough calories to support your full development, trade-offs need to be made, just like if you don’t have enough money to buy everything you want at the store you have to pass up on some items to afford others. Those kinds of developmental outcomes can certainly have downstream effects on future generations through behavior, but they don’t seem like the kind of changes that could be passed on the way genetic material can. The same can be said about the smoking example provided as well: people who smoked during critical developmental windows could do damage to their own development, which in turn impacts the quality of the offspring they produce, but that’s not like genetic transmission at all. It would be no more surprising than finding out that parents exposed to radioactive waste tend to have children of a different quality than those not so exposed.

To the extent that these intergenerational changes are real and not just statistical oddities, it doesn’t seem likely that they could be adaptive; they would instead likely reflect developmental errors. Basically, the matter comes down to the following question: are the environmental conditions surrounding a particular developmental window good indicators of future conditions, to the point that you’d want to not only focus your own development around them, but also the development of your children and their children in turn? To me, the answer seems like a resounding “No,” and that seems like a prime example of developmental rigidity, rather than plasticity. Such a plan would not allow offspring to meet the demands of their unique environments particularly well. I’m not hopeful that this kind of thinking will lead to any revolutions in evolutionary theory, but I’m always willing to be proven wrong if the right data comes up.

What Might Research Ethics Teach Us About Effect Size?

Imagine for a moment that you’re in charge of overseeing medical research approval for ethical concerns. One day, a researcher approaches you with the following proposal: they are interested in testing whether a foodstuff that some portion of the population occasionally consumes for fun – like spicy chilies – is actually quite toxic. They think that eating even small doses of this compound will cause mental disturbances in the short term – like paranoia and suicidal thoughts – and might even cause those negative changes permanently in the long term. As such, they intend to test their hypothesis by bringing otherwise-healthy participants into the lab, providing them with a dose of the possibly-toxic compound (either just once or several times over the course of a few days), and then seeing if they observe any negative effects. What would your verdict on the ethical acceptability of this research be? If I had to guess, I suspect that many people would not allow the research to be conducted, because one of the major tenets of research ethics is that harm should not befall your participants, except when absolutely necessary. In fact, I suspect that were you the researcher – rather than the person overseeing the research – you probably wouldn’t even propose the project in the first place, because you might have some reservations about possibly poisoning people, either harming them directly and/or those around them indirectly.

“We’re curious if they make you a danger to yourself and others. Try some”

With that in mind, I want to examine a few other research hypotheses I have heard about over the years. The first is the idea that exposing men to pornography will cause a number of harmful consequences, such as increasing how appealing rape fantasies are, bolstering the belief that women would enjoy being raped, and decreasing the perceived seriousness of violence against women (as reviewed by Fisher et al., 2013). Presumably, the effect on those beliefs over time is serious, as it might lead to real-life behavior on the part of men to rape women or approve of such acts on the parts of others. Other, less-serious harms have also been proposed, such as the possibility that exposure to pornography might have harmful effects on the viewer’s relationship, reducing their commitment and making it more likely that they would do things like cheat on or abandon their partner. Now, if a researcher earnestly believed they would find such effects, that the effects would be appreciable in size to the point of being meaningful (i.e., large enough to be reliably detected by statistical tests in relatively small samples), and that their implications could be long-term in nature, could this researcher even ethically test such issues? Would it be ethically acceptable to bring people into the lab, randomly expose them to this kind of (in a manner of speaking) psychologically-toxic material, observe the negative effects, and then just let them go?

Let’s move onto another hypothesis that I’ve been talking a lot about lately: the effects of violent media on real life aggression. Now I’ve been specifically talking about video game violence, but people have worried about violent themes in the context of TV, movies, comic books, and even music. Specifically, there are many researchers who believe that exposure to media violence will cause people to become more aggressive through making them perceive more hostility in the world, view violence as a more acceptable means of solving problems, or by making violence seem more rewarding. Again, presumably, changing these perceptions is thought to cause the harm of eventual, meaningful increases in real-life violence. Now, if a researcher earnestly believed they would find such effects, that the effects would be appreciable in size to the point of being meaningful, and that their implications could be long-term in nature, could this researcher even ethically test such issues? Would it be ethically acceptable to bring people into the lab, randomly expose them to this kind of (in a manner of speaking) psychologically-toxic material, observe the negative effects, and then just let them go?

Though I didn’t think much of it at first, the criticisms I read of the classic Bobo doll experiment are actually kind of interesting in this regard. In particular, researchers were purposefully exposing young children to models of aggression, the hope being that the children would come to view violence as acceptable and engage in it themselves. The reason I didn’t pay it much mind is that I didn’t view the experiment as causing any kind of meaningful, real-world, or lasting effects on the children’s aggression; I don’t think mere exposure to such behavior will have meaningful impacts. But if one truly believed that it would, I can see why that might cause some degree of ethical concern.

Since I’ve been talking about brief exposure, one might also worry about what would happen were researchers to expose participants to such material – pornographic or violent – for weeks, months, or even years on end. Imagine a study that asked people to smoke for 20 years to test the negative effects in humans; that’s probably not getting past the IRB. As a worthy aside on that point, though, it’s worth noting that as pornography has become more widely available, rates of sexual offending have gone down (Fisher et al., 2013); as violent video games have become more available, rates of youth violent crime have gone down too (Ferguson & Kilburn, 2010). Admittedly, it is possible that such declines would be even steeper if such media weren’t in the picture, but the effects of this media – if they cause violence at all – are clearly not large enough to reverse those trends.

I would have been violent, but then this art convinced me otherwise

So what are we to make of the fact that this research was proposed, approved, and conducted? There are a few possibilities to kick around. The first is that the research was proposed because the researchers themselves don’t give much thought to the ethical concerns, happy enough if it means they get a publication out of it regardless of the consequences; but that wouldn’t explain why it got approved by other bodies like IRBs. It is also possible that the researchers and those who approved the work believe it to be harmful, but view the benefits of such research as outstripping the costs, working under the assumption that once the harmful effects are established, further regulation of such products might follow, ultimately reducing the prevalence or use of such media (not unlike the warnings and restrictions placed on the sale of cigarettes). Since any declines in availability or censorship of such media have yet to manifest – especially given how access to the internet provides means for circumventing bans on the circulation of information – whatever practical benefits might have arisen from this research are hard to see (again, assuming that things like censorship would yield benefits at all).

There is another aspect to consider as well: during discussions of this research outside of academia – such as on social media – I have not noted a great deal of outrage expressed by consumers of these findings. Anecdotal as this is, when people discuss such research, they do not appear to be raising the concern that the research itself was unethical to conduct because it will do harm to people’s relationships or to women more generally (in the case of pornography), or because it will result in making people more violent and accepting of violence (in the video game studies). Perhaps those concerns exist en masse and I just haven’t seen them yet (always possible), but I see another possibility: people don’t really believe that the participants are being harmed in this case. People generally aren’t afraid that the participants in those experiments will dissolve their relationships or come to think rape is acceptable because they were exposed to pornography, or will get into fights because they played 20 minutes of a video game. In other words, they don’t think those negative effects are particularly large, if they even really believe they exist at all. While this point is a rather implicit one, the lack of consistent moral outrage expressed over the ethics of this kind of research does speak to the matter of how serious these effects are perceived to be: at least in the short term, not very.

What I find very curious about these ideas – pornography causes rape, video games cause violence, and their ilk – is that they all seem to share a certain assumption: that people are effectively acted upon by information, placing human psychology in a distinctly passive role while information takes the active one. Indeed, in many respects, this kind of research strikes me as remarkably similar in its underlying assumptions to the research on stereotype threat: the idea that you can, say, make women worse at math by telling them men tend to do better at it. All of these theories seem to posit a very exploitable human psychology readily manipulated by information, rather than a psychology which interacts with, evaluates, and transforms the information it receives.

For instance, a psychology capable of distinguishing between reality and fantasy can play a video game without thinking it is being threatened physically, just like it can watch pornography (or, indeed, any videos) without actually believing the people depicted are present in the room with them. Now clearly some part of our psychology does treat pornography as an opportunity to mate (else there would be no sexual arousal generated in response to it), but that part does not necessarily govern other behaviors (generating arousal is biologically cheap; aggressing against someone else is not). The adaptive nature of a behavior depends on context.

Early hypotheses of the visual-arousal link were less successful empirically

As such, expecting something like a depiction of violence to translate consistently into some general perception that violence is acceptable and useful in all sorts of interactions throughout life is inappropriate. Learning that you can beat up someone weaker than you doesn’t mean it’s suddenly advisable to challenge someone stronger than you; relatedly, seeing a depiction of people who are not you (or your future opponent) fighting shouldn’t make it advisable for you to change your behavior either. Whatever the effects of this media, they will ultimately be assessed and manipulated internally by psychological mechanisms and tested against reality, rather than just accepted as useful and universally applied.

I have seen similar thinking about information manipulating people another time as well: during discussions of memes. Memes are posited to be similar to infectious agents that will reproduce themselves at the expense of their host’s fitness; information that literally hijacks people’s minds for its own reproductive benefits. I haven’t seen much in the way of productive and successful research flowing from that school of thought quite yet – which might be a sign of its effectiveness and accuracy – but maybe I’m just still in the dark there. 

References: Ferguson, C. & Kilburn, J. (2010). Much ado about nothing: The misestimation and overinterpretation of violent video game effects in eastern and western nations: Comment on Anderson et al (2010). Psychological Bulletin, 136, 174-178.

Fisher, W., Kohut, T., Di Gioacchino, L., & Fedoroff , P. (2013). Pornography, sex crime, and paraphilia. Current Psychiatry Reports, 15, 362.

Getting To Know Your Outliers: More About Video Games

As I mentioned in my last post, I’m a big fan of games. For the last couple of years, the game which has held the majority of my attention has been a digital card game. In this game, people design decks with different strategies, and the success of your strategy depends on the strategy of your opponent; you can think of it as having a more complicated rock-paper-scissors component. Players are often interested in understanding how well certain strategies match up against others, so some have taken it upon themselves to collect match data from the player base to answer those questions. You don’t need to know much about the game to understand the example I’m about to discuss, but let’s just consider two decks: deck A and deck B. Those collecting the data managed to aggregate the outcomes of approximately 2,200 matches between the two and found that, overall, deck A was favored to win 55% of the time. Given the large sample size, this should be some pretty convincing data when it comes to getting a sense for how things generally work out.

Only about 466 more games to Legend with that win rate

However, this data will only be as useful to us as our ability to correctly interpret it. A 55% success rate captures the average performance, but there is at least one well-known outlier player within the game in that matchup. This individual manages to consistently perform at a substantially higher level than average, winning that same matchup around 70-90% of the time across large sample sizes. What are we to make of that particular data point? How should it affect our interpretation of the matchup? One possible interpretation is that his massively positive success rate is simply due to variance and, given enough games, his win rate should be expected to drop. It hasn’t yet, as far as I know. Another possible explanation is that this player is particularly good relative to his opponents, and that factor of general skill explains the difference. In much the same way, an absolutely weak 15-year-old might look pretty strong if you put him in a boxing match against a young child. However, the way the game is set up, you can be assured that he will be matched against people of (relatively) equal skill, and that difference shouldn’t account for such a large disparity.
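To put a rough number on the “just variance” interpretation, here’s a back-of-the-envelope calculation (the 300-game sample size is my own hypothetical; I don’t know this player’s exact game count): if deck A truly won 55% of the time, how likely is a 70% win rate over a large sample by luck alone?

```python
from statistics import NormalDist

def p_at_least(p_true, p_observed, n):
    # Normal approximation to the binomial: standard error of a
    # sample proportion under the true rate, then a one-tailed
    # probability of doing at least this well by chance.
    se = (p_true * (1 - p_true) / n) ** 0.5
    z = (p_observed - p_true) / se
    return 1 - NormalDist().cdf(z)

p = p_at_least(0.55, 0.70, 300)
print(f"P(win rate >= 70% over 300 games | true rate 55%): {p:.1e}")
```

The probability comes out on the order of one in ten million, which is why “he just got lucky” stops being a serious contender and the skill-based interpretations deserve the attention.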

A third interpretation – one which I find more appealing, given my deep experience with the game – is that skill matters, but in a different way. Specifically, deck A is more difficult to play correctly than deck B; it’s just easier to make meaningful mistakes and you usually have a greater number of options available to you. As such, if you give two players of average skill decks A and B, you might observe the 55% win rate initially cited. On the other hand, if you give an expert player both decks (one who understands that match as well as possible), you might see something closer to the 80% figure. Expertise matters for one deck a lot more than the other. Depending on how you want to interpret the data, then, you’ll end up with two conclusions that are quite different: either the match is almost even, or the match is heavily lopsided. I bring this example up because it can tell us something very important about outliers: data points that are, in some way, quite unusual. Sometimes these data points can be flukes and worth disregarding if we want to learn about how relationships in the world tend to work; other times, however, these outliers can provide us valuable and novel insights that re-contextualize the way we look at vast swaths of other data points. It all hinges on the matter of why that outlier is one. 

This point bears on some reactions I received to my last post, about a fairly new study which found no relationship between violent content in video games and subsequent measures of aggression once you account for the difficulty of a game (or, perhaps more precisely, the ability of a game to impede people’s feelings of competence). Glossing the results into a single sentence, the general finding is that the frustration induced by a game, but not violent content per se, is a predictor of short-term changes in aggression (the gaming community tends to agree with such a conclusion, for whatever that’s worth). In conducting this research, the authors hoped to address what they perceived to be a shortcoming in the literature: many previous studies had participants play either violent or non-violent games, but they usually achieved this by having them play entirely different games. This means that while violent content did vary between conditions, so too could a number of other factors, and the presence of those other factors poses some confounds in interpreting the data. Since more than violence varied, any subsequent changes in aggression are not necessarily attributable to violent content per se.

Other causes include being out $60 for a new controller

The study I wrote about, which found no effect of violence, stands in contrast to a somewhat older meta-analysis of the relationship between violent games and aggression. A meta-analysis – for those not in the know – is when a large number of studies are examined jointly to better estimate the size of some effect. As any individual study only provides us with a snapshot of information and could be unreliable, it should be expected that a greater number of studies will provide us with a more accurate view of the world, just as running 50 participants through an experiment should give us a better sense than asking a single person or two. The results of some of those meta-analyses seem to settle on a pretty small relationship between violent video games and aggression/violence (approximately r = .15 to .20 for non-serious aggression, and about r = .04 for serious aggression, depending on who you ask and what you look at; Anderson et al., 2010; Ferguson & Kilburn, 2010; Bushman et al., 2010), but there have been concerns raised about publication bias and the use of non-standardized measures of aggression.
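For a sense of the mechanics, here’s a minimal sketch of how a fixed-effect meta-analysis pools correlations via the Fisher z-transform (the study values below are made up for illustration; they are not the actual meta-analytic data):

```python
import math

def pool_correlations(studies):
    # studies: list of (r, n) pairs. Each correlation is Fisher
    # z-transformed (atanh), weighted by n - 3 (the inverse of its
    # sampling variance), averaged, then back-transformed (tanh).
    num = sum((n - 3) * math.atanh(r) for r, n in studies)
    den = sum(n - 3 for _, n in studies)
    return math.tanh(num / den)

studies = [(0.22, 120), (0.10, 300), (0.18, 80), (0.05, 500)]
print(f"Pooled r = {pool_correlations(studies):.3f}")
```

Note what pooling does and doesn’t buy you: averaging over more studies shrinks sampling error, but if every study shares the same methodological flaw, the pooled estimate simply inherits that flaw with extra precision.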

Further, even were there no publication bias to worry about, that does not mean the topic itself is being researched by people without biases, which can affect how data gets analyzed, research gets conducted, measures get created and interpreted, and so on. If r = .2 is about the best one can do with those degrees of freedom (in other words, assuming the people conducting such research are looking for the largest possible effect and develop their research accordingly), then it seems unlikely that this kind of effect is worth worrying too much about. As Ferguson & Kilburn (2010) note, youth violent crime rates have been steadily decreasing as sales of violent games have been increasing (r = -.95). The quality of that violence has also improved over time, not just the quantity; look at the violence in Doom over the years to get a better sense of that improvement. Now it’s true enough that the relationship between youth violent crime and violent video game sales is by no means a great examination of the relationship in question, but I do not doubt that if the relationship ran in the opposite direction (especially if it were as large), many of the same people who disregard it as unimportant would never leave it alone.

Again, however, we run into that issue where our data is only as good as our ability to interpret it. We want to know why the meta-analysis turned up a positive (albeit small) relationship whereas the single paper did not, despite multiple chances to find it. Perhaps the paper I wrote about was simply a statistical fluke; for whatever reason, the samples recruited for those studies didn’t end up showing the effect of violent content, but the effect is still real in general (perhaps it’s just too small to be reliably detected). That seems to be the conclusion contained in some responses I received. In fact, I had one commenter cite the results of three different studies suggesting there was a causal link between violent content and aggression. However, when I dug up those studies and looked at the methods sections, what I found was that, as I mentioned before, all of them had participants play entirely different games in the violent and non-violent conditions. This messes with your ability to interpret the data only in light of violent content, because you are varying more than just violence (even if unintentionally). On the other hand, the paper I mentioned in my last post had participants playing the same game between conditions, just with content (like difficulty or violence levels) manipulated. As far as I can tell, then, the methods of the paper I discussed last week were superior, since they were able to control more, apparently-important factors.

This returns us to the card game example I raised initially: when people play a particular deck incorrectly, they find it is slightly favored to win; when someone plays it correctly they find it is massively favored. To turn that point to this analysis, when you conduct research that lacks the proper controls, you might find an effect; when you add those controls in, the effect vanishes. If one data point is an outlier because it reflects research done better than the others, you want to pay more attention to it. Now I’m not about to go digging through over 130 studies for the sake of a single post – I do have other things on my plate – but I wanted to make this point clear: if a meta-analysis contains 130 papers which all reflect the same basic confound, then looking at them together makes me no more convinced of their conclusion than looking at any of them alone (and given that the specific studies that were cited in response to my post all did contain that confound, I’ve seen no evidence inconsistent with that proposal yet). Repeating the same mistake a lot does not make it cease to be a mistake, and it doesn’t impress me concerning the weight of the evidence. The evidence acquired through weak methodologies is light indeed.  

Research: Making the same mistakes over and over again for similar results

So, in summation, you want to really get to know your data and understand why it looks the way it does before you draw much in the way of meaningful conclusions from it. A single outlier can potentially tell you more about what you want to know than lots of worse data points (in fact, poorly-interpreted data might not even be recognized as such until contrary evidence rears its head). This isn’t always the case, but to write off any particular data point because it doesn’t conform to the rest of the average pattern – or to assume its value is equal to that of other points – isn’t always right either. Getting to know your data, your methods, and your measures is quite important for figuring out how to interpret it all.

For instance, it has been proposed that – sure – the relationship between violent game content and aggression is small at best (there seems to be some heated debate over whether it’s closer to r = .1 or .2), but it could still be important because lots of small effects can add up over time into a big one. In other words, maybe you ought to be really wary of that guy who has been playing a violent game for an hour each night for the last three years. He could be about to snap at the slightest hint of a threat and harm you…at least to the extent that you’re afraid he might suggest you listen to loud noises or eat slightly more of something spicy – two methods used to assess “physical” aggression in this literature due to ethical limitations (despite the fact that, “Naturally, children (and adults) wishing to be aggressive do not chase after their targets with jars of hot sauce or headphones with which to administer bursts of white noise.”). That small, r = .2 correlation I referenced before concerns behavior like that in a lab setting, where experimental demand characteristics are almost surely present, suggesting the effect on aggressive behavior in naturalistic settings is likely overstated.

Then again, in terms of meaningful impact, perhaps all those small effects weren’t really amounting to much. Indeed, the longitudinal research in this area seems to find the smallest effects (Anderson et al., 2010). To put that into what I think is a good example, imagine going to the gym. Listening to music helps many people work out, and the choice of music is relevant there. The type of music I would listen to at the gym is not always the kind I would listen to if I wanted to relax, or dance, or set a romantic mood. In fact, the music I listen to at the gym might even make me somewhat more aggressive, in a manner of speaking (e.g., aggressive thoughts might be more accessible to me for the hour I listen than if I had no music, but that doesn’t actually lead to any observable, meaningful changes in my violent behavior while at the gym or once I leave). In that case, repeated exposure to this kind of aggressive music would not really make me any more aggressive in my day-to-day life over time.

Thankfully, these warnings managed to save people from dangerous music

That’s not to say that media has no impact on people whatsoever: I fully suspect that people watching a horror movie probably feel more afraid than they otherwise would; I also suspect someone who just watched an action movie might have some violent fantasies in their head. However, I also suspect such changes are rather specific and of a short duration: watching that horror movie might increase someone’s fear of being eaten by zombies or ability to be startled, but not their fear of dying from the flu or their probability of being scared next week; that action movie might make someone think about attacking an enemy military base in the jungle with two machine guns, but it probably won’t increase their interest in kicking a puppy for fun, or lead to them fighting with their boss next month. These effects might push some feelings around in the very short term, but they’re not going to have lasting and general effects. As I said at the beginning of last week, things like violence are strategic acts, and it doesn’t seem plausible that violent media (like, say, comic books) will make them any more advisable.

References: Anderson, C. et al. (2010). Violent video game effects on aggression, empathy, and prosocial behavior in Eastern and Western countries: A meta-analytic review. Psychological Bulletin, 136, 151-173.

Bushman, B., Rothstein, H., & Anderson, C. (2010). Much ado about something: Violent video game effects and a school of red herring: Reply to Ferguson and Kilburn (2010). Psychological Bulletin, 136, 182-187.

Elson, M. & Ferguson, C. (2013). Twenty-five years of research on violence in digital games and aggression: Empirical evidence, perspectives, and a debate gone astray. European Psychologist, 19, 33-46.

Ferguson, C. & Kilburn, J. (2010). Much ado about nothing: The misestimation and overinterpretation of violent video game effects in Eastern and Western nations: Comment on Anderson et al. (2010). Psychological Bulletin, 136, 174-178.

The Fight Against Self-Improvement

In the abstract, most everyone wants to be the best version of themselves they can be. More attractive bodies, developing and improving useful skills, a good education, achieving career success; who doesn’t want those things? In practice, lots of people, apparently. While people might like the idea of improving various parts of their life, self-improvement takes time, energy, dedication, and restraint; it involves doing things that might not be pleasant in the short term in the hope that long-term rewards will follow. Those rewards are by no means guaranteed, though, either in terms of whether they happen at all or the degree to which they do. While people can usually improve various parts of their life, not everyone can achieve the levels of success they might prefer, no matter how much time they devote to their crafts. All of those are common reasons people will sometimes avoid improving themselves (it’s difficult and carries opportunity costs), but they do not straightforwardly explain why people sometimes fight against others improving.

“How dare they try to make a better life for themselves!”

I was recently reading an article about the appeal of Trump and came across this passage concerning the fight against the self-improvement of others:

“Nearly everyone in my family who has achieved some financial success for themselves, from Mamaw to me, has been told that they’ve become “too big for their britches.”  I don’t think this value is all bad.  It forces us to stay grounded, reminds us that money and education are no substitute for common sense and humility. But, it does create a lot of pressure not to make a better life for yourself…”

At first blush, this seems like a rather strange idea: if people in your community – your friends and family – are struggling (or have yet to build a future for themselves), why would anyone object to the prospect of their achieving success and bettering their lot in life? Part of the answer is found a little further down:

“A lot of these [poor, struggling] people know nothing but judgment and condescension from those with financial and political power, and the thought of their children acquiring that same hostility is noxious.”

I wanted to explore this idea in a bit more depth to help explain why these feelings might rear their heads when people are faced with the social or financial success of others, be they close or distant relations.

Understanding these feelings requires drawing on a concept my theory of morality leaned heavily on: association value. Association value refers to the abstract value that others in the social world have for each other; essentially, it asks the question, “how desirable of a friend would this person make for me (and vice versa)?” This value comes in two parts: first, there is the matter of how much value someone could add to your life. As an easy example, someone with a lot of money is more capable of adding value to your life than someone with less money; someone who is physically stronger tends to be able to provide benefits a weaker individual could not; the same goes for individuals who are more physically attractive or intelligent. It is for this reason that most people wish they could improve on some or all of these dimensions if doing so were possible and easy: you end up as a more desirable social asset to others.

The second part of that association value is a bit trickier, however, reflecting the crux of the problem: how willing someone is to add value to your life. Those who are unwilling to help me have a lower value than those willing to make the investment. Reliable friends are better than flaky ones, and charitable friends are better than stingy ones. As such, even if someone has a great potential value they could add to my life, they still might be unattractive as associates if they are not going to turn that potential into reality. An unachieved potential is effectively the same thing as having no potential value at all. Conversely, those who are very willing to add to my life but cannot actually do so in meaningful ways don’t make attractive options either. Simply put, eager but incompetent individuals wouldn’t make good hires for a job, but neither would competent yet absent ones.

“I could help you pay down your crippling debt. Won’t do it, though”

With this understanding of association value, there is only one piece left to add to the equation: the zero-sum nature of friendship. Friendship is a relative term; it means that someone values me more than they value others. If someone is a better friend to me, it means they are a worse friend to others; they would value my welfare over the welfare of others and, if a choice had to be made, would aid me rather than someone else. Having friends is also useful in the adaptive sense of the word: they help provide access to desirable mates, protection, and provisioning, and can even help you exploit others if you’re on the aggressive side of things. Putting all these pieces together, we end up with the following idea: people generally want access to the best friends possible. What makes a good friend is a combination of their ability and willingness to invest in you over others. However, their willingness to do so depends in turn on your association value to them: how willing and able you are to add things to their lives. If you aren’t able to help them out – now or in the future – why would they want to invest resources into benefiting you when they could instead put those resources into others who could?

Now we can finally return to the matter of self-improvement. By increasing your association value through various forms of self-improvement (e.g., making yourself more physically attractive and stronger through exercise, improving your income by advancing in your career, learning new things, etc.), you make yourself a more appealing friend to others. Crucially, this includes both existing friends and higher-status individuals who might not have been willing to invest in you before you were able to add value to their lives. In other words, as your value as an associate rises, unless the value of your existing associates rises in turn, it is quite possible that you can now do better than them socially, so to speak. If you have more appealing social prospects, then, you might begin to neglect or break off existing contacts in favor of newer, more profitable friendships or mates. It is likely that your existing contacts understand this – implicitly or otherwise – and might seek to discourage you from improving your life, or preemptively break off contact with you if you do, under the assumption that you will do likewise to them in the future. After all, if you’re moving on eventually, they would be better off building new connections sooner rather than later. They don’t want to invest in failing relationships any more than you do.

In turn, those who are thinking about self-improvement might actually decide against pursuing their goals not necessarily because they wouldn’t be able to achieve them, but because they’re afraid that their existing friends might abandon them, or even that they themselves might be the ones who do the abandoning. Ironically, improving yourself can sometimes make you look like a worse social prospect.

To put that in a simple example, we could consider the world of fitness. The classic trope of the weak high-schooler being bullied by the strong jock type has been ingrained in many stories in our culture. For those doing the bullying, their targets don’t offer them much socially (the targets’ association value to others is low, while the bully’s is high) and they are unable to effectively defend themselves, making exploitation an attractive option. In turn, those who are the targets of this bullying are, in some sense, wary of adopting some of the self-improvement behaviors the jocks engage in, such as working out, either because they don’t feel they can effectively compete against the jocks in that realm (e.g., they wouldn’t be able to get as strong, so why bother getting stronger) or because they worry that improving their association value by working out will lead to their adopting a similar pattern of behavior to those they already dislike, resulting in their losing value to their current friends (usually those of similar, but relatively low, association value). The movie Mean Girls is an example of this dynamic playing out in a different domain.

So many years later, and “Fetch” still never happened…

This line of thought has, as far as I can tell, also been leveraged (again, consciously or otherwise) by one brand within the fitness community: Planet Fitness. The last time I heard one of their advertisements on the radio, their slogan appeared to be, “we’re not a gym; we’re Planet Fitness.” An odd statement to be sure, because they are a gym, so what are we to make of it? Presumably that they are in some important respects different from their competition. How are they different from other gyms? The “About” section on their website lays those differences out in true, ironic form:

“Make yourself comfy. Because we’re Judgement Free…you deserve a little cred just for being here. We believe no one should ever feel Gymtimidated by Lunky behavior and that everyone should feel at ease in our gyms, no matter what his or her workout goals are…We’re fiercely protective of our Planet and the rights of our members to feel like they belong. So we create an environment where you can relax, go at your own pace and just do your own thing without ever having to worry about being judged.”

This marketing panders fairly transparently to those who currently do not feel they can compete with the very fit, or who worry about becoming a “lunk” themselves (they even have an alarm in the gym designed to be set off if someone is making too much noise while lifting, or wearing the wrong outfit). In doing so, however, they devalue those who are successful or passionate in their pursuit of self-improvement. I have never seen a gym more obsessed with judging its would-be members than Planet Fitness; so long as that judgment is pointed at the right targets, though, it appeals (presumably effectively) to portions of the population untapped by other gyms. Planet Fitness wants to be your friend; not the friend of those jerks who make you feel bad.

There is value in not letting success go to one’s head; no one wants a fair-weather friend who will leave the moment it’s expedient, and such an attitude undermines loyalty. The converse, however, is that using that concern as an excuse to avoid (or condemn) self-improvement will make you and others worse off in the long term. A better solution to this dilemma is to improve yourself so you can improve the lives of those who matter most to you, hoping they reciprocate in turn (or improve together for even greater success).