Men Are Better At Selling Things On eBay

When it comes to gender politics, never take the title of the piece at face value; or the conclusions for that matter.

In my last post, I mentioned how I find some phrases and topics act as red flags regarding the quality of research one is liable to encounter. Today, the topic is gender equality – specifically some perceived (and, indeed, some rather peculiar) discrimination against women – which is an area not renowned for its clear-thinking or reasonable conclusions. As usual, the news articles circulating this piece of research made some outlandish claim that lacks even remote face validity. In this case, the research in question concludes that people, collectively, try to figure out the gender of the people selling things on eBay so as to pay women substantially less than men for similar goods. Those who found such a conclusion agreeable to their personal biases spread it to others across social media as yet another example of how the world is an evil, unfair place. So here I am again, taking a couple recreational shots at some nonsense story of sexism.

Just two more of these posts and I get a free smoothie

The piece in question today is an article from Kricheli-Katz & Regev (2016) that examined data from about 1.1 million eBay auctions. The stated goals of the authors involve examining gender inequality in online product markets, so at least we can be sure they’re going into this without an agenda. Kricheli-Katz & Regev (2016) open their piece by talking about how gender inequality is a big problem, launching their discussion almost immediately with a rehashing of that misleading 20% pay gap statistic that’s been floating around forever. As that claim has been dissected so many times at this point, there’s not much more to say about it other than (a) when controlling for important factors, it drops to single digits and (b) when you see it, it’s time to buckle in for what will surely be an unpleasant ideological experience. Thankfully, the paper does not disappoint in that regard, promptly suggesting that women are discriminated against in online markets like eBay.

So let’s start by considering what the authors did, and what they found. First, Kricheli-Katz & Regev (2016) present us with their analysis of eBay data. They restricted their research to auctions only, where sellers will post an item and any subsequent interaction occurs between bidders alone, rather than between bidders and sellers. On average, they found that the women had about 10 fewer months of experience than men, though the accounts of both sexes had existed for over nine years, and women also had very slightly better reputations, as measured by customer feedback. Women also tended to set slightly higher initial prices than men for their auctions, controlling for the product being sold. As such, women also tended to receive slightly fewer bids on their items, and ultimately less money per sale when the auctions ended.

However, when the interaction between sex and product type (new or used) was examined, the headline-grabbing result appeared: while women netted a mere 3% less on average for used products than men, they netted a more-impressive 20% less for new products (where, naturally, one expects products to be the same). Kricheli-Katz & Regev (2016) claim that the discrepancy in the new-product case is due to beliefs about gender. Whatever these unspecified beliefs are, they cause people to pay women about 20% less for the same item. Taking that idea at face value for a moment, why does that gap all but evaporate in the used category of sales? The authors attribute that lack of a real difference to an increased trust people have in women’s descriptions of the condition of their products. So people trust women more when it comes to used goods, but pay them less for new ones when trust is less relevant. Both these conclusions, as far as I can see from the paper, have been pulled directly out of thin air. There is literally no evidence presented to support them: no data; no citations; no anything.

I might have found the source of their interpretations

By this point, anyone familiar with how eBay works is likely a bit confused. After all, the sex of the seller is at no point readily apparent in almost any listing. Without that crucial piece of information, people would have a very difficult time discriminating on the basis of it. Never fear, though; Kricheli-Katz & Regev (2016) report the results of a second study where they pulled 100 random sellers from their sample and asked about 400 participants to try to determine the sex of the sellers in question. Each participant offered their guesses about five profiles, for a total of 2000 attempts. About 55% of the time, participants got the sex right, 9% of the time they got it wrong, and the remaining 36% of the time, they said they didn’t know (which, since they don’t know, also means they got it wrong). In short, people couldn’t determine the sex reliably about half the time. The authors do mention that the guesses got better as participants viewed more items that the seller had posted, however.
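For a sense of scale, those percentages can be turned into rough counts. The short Python tally below uses only the figures reported above (about 400 participants, five profiles each, and the 55/9/36 split); since those figures are approximate, so are the counts, and the variable names are my own.

```python
# Rough tally of the guessing study, using the approximate figures from the text.
participants = 400
profiles_each = 5
attempts = participants * profiles_each

shares = {"correct": 0.55, "incorrect": 0.09, "did_not_know": 0.36}
counts = {label: round(attempts * share) for label, share in shares.items()}

# "Incorrect" and "did not know" both mean the seller's sex was not identified.
not_identified = counts["incorrect"] + counts["did_not_know"]

print(f"attempts: {attempts}")
print(f"counts: {counts}")
print(f"not identified: {not_identified} ({not_identified / attempts:.0%})")
```

In other words, even with participants explicitly trying, something like 900 of the 2000 guesses failed to pin down the seller's sex.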

So here’s the story they’re trying to sell: When people log onto eBay, they seek out a product they’re looking to buy. When they find a seller listing the product, they examine the seller’s username, the listing in question, and the other listings in the seller’s store to attempt to discern the sex of the seller. Buyers subsequently lower their willingness to pay for an item by quite a bit if they see it is being sold by a woman, but only if it’s new. In fact, since women made 20% less, the actual reduction in willingness to pay must be larger than that, as sex can only be determined reliably about half of the time, even when people are trying. Buyers do all this despite even trusting female sellers more. Also, I do want to emphasize the word they, as this would need to be a pretty collective action. If it wasn’t a fairly universal response among buyers, the prices of female-sold items would eventually even out with the male price, as those who discriminated less against women would be drawn towards the cheaper prices and bump them back up.
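To see roughly how much larger that reduction would need to be, here is a minimal Python sketch. It rests on a deliberately simple assumption of my own, not a model from the paper: buyers apply a discount only when they correctly identify a seller as a woman, and pay the male price otherwise. Under that assumption, the average gap equals the share of identified sellers times the discount, so the discount is the gap divided by that share.

```python
def implied_discount(observed_gap, share_identified):
    """Discount required among identified sellers to yield the observed average gap.

    Hypothetical model: no discount at all when the seller's sex is not identified.
    """
    if not 0 < share_identified <= 1:
        raise ValueError("share_identified must be in (0, 1]")
    return observed_gap / share_identified


# 20% average gap for new products; sex identified about 55% of the time.
discount = implied_discount(0.20, 0.55)
print(f"implied discount when sex is identified: {discount:.1%}")
```

With the figures above, that works out to a discount of roughly 36% whenever a buyer works out that the seller is a woman, and the 55% figure comes from people who were explicitly asked to guess, so the number for ordinary shoppers would presumably need to be larger still.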

Not only do I not buy this story – not even a little – but I wouldn’t pay the authors less for it because they happen to be women if I was looking to make a purchase. While people might be able to determine the sex of the seller on eBay sometimes, when they’re specifically asked to do so, that does not mean people engage in this sort of behavior naturally.

Finally, Kricheli-Katz & Regev (2016) report the results of a third study, asking 100 participants how much they value a $100 gift card being sold by either an Alison or a Brad. Sure enough, people were willing to pay Alison less for the card: she got a mere $83 to Brad’s $87; a 5% difference. I’d say someone should call the presses, but it looks like they already did, judging from the coverage this piece has received. Now this looks like discrimination – because it is – but I don’t think it’s based on sex per se. I say that because, earlier in the paper, Kricheli-Katz & Regev (2016) also report that women, as buyers on eBay, tended to pay about 3% more than men for comparable goods. To the extent that the $4 difference in valuation is meaningful here, there are two things to say about it. First, it may well represent the fact that women aren’t as willing to negotiate prices in their favor. Indeed, while women were 23% of the sellers on eBay, they only represented 16% of the auctions with a negotiation component. If that’s the case, people are likely willing to pay less to women because they perceive (correctly) some population differences in their ability to get a good deal. I suspect if you gave them individuating information about the seller’s abilities, sex would stop mattering even 5%. Second, that slight, 5% difference would by no means account for the 20% gap the authors report finding with respect to new product sales; not even close.
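The gift card arithmetic is easy to check. This small Python snippet uses only the $83 and $87 figures from the study as reported above; the choice of baseline (Brad's price or Alison's) is the only thing that varies.

```python
alison, brad = 83, 87
difference = brad - alison

# The percentage depends slightly on which price is used as the baseline.
pct_of_brad = difference / brad
pct_of_alison = difference / alison

print(f"difference: ${difference}")
print(f"relative to Brad's price:   {pct_of_brad:.1%}")
print(f"relative to Alison's price: {pct_of_alison:.1%}")
```

Either way the gap rounds to about 5%, which is roughly a quarter of the 20% gap reported for new products.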

But maybe your next big idea will work out better…

Instead, my guess is that in spite of the authors’ use of the phrase “equally qualified” when referring to the men and women in their seller sample, there were some important differences in listings the buyers noticed; the type of differences that you can’t account for when you’re looking at over a million of them and rough control measures aren’t effective. Kricheli-Katz & Regev (2016) never seemed to consider – and I mean really consider – the possibility that something about these listings, something they didn’t control for, might have been driving sale price differences. While they do control for factors like the seller’s reputation, experience, number of pictures, year of the sale, and some of the sentiments expressed by words in the listing (how positive or negative it is), there’s more to making a good listing than that. A more likely story is that differences in sale prices reflect different behaviors on the part of male and female sellers (as we already know other differences exist in the sample), as the alternative story being championed would require a level of obsession with gender-based discrimination in the population so wide and deep that we wouldn’t need to research it; it would be plainly obvious to everyone already.

Then again, perhaps it’s time I make my way over to eBay to pick up a new tinfoil hat.

References: Kricheli-Katz, T. & Regev, T. (2016). How many cents on the dollar? Women and men in product markets. Science Advances, 2, DOI: 10.1126/sciadv.1500599

Thoughtful Suggestions For Communicating Sex Differences

Having spent quite a bit of time around the psychological literature – both academic and lay pieces alike – there are some words or phrases I can no longer read without an immediate, knee-jerk sense of skepticism arising in me, as if they taint everything that follows and precedes them. Included in this list are terms like bias, stereotype, discrimination, and, for the present purposes, fallacy. The reason these words elicit such skepticism on my end is due to the repeated failure of people using them to  consistently produce high-quality work or convincing lines of reasoning. This is almost surely due to the perceived social stakes when such terms are being used: if you can make members of a particular group appear uniquely talented, victimized, or otherwise valuable, you can subsequently direct social support towards and away from various ends. When the goal of argumentation becomes persuasion, truth is not a necessary component and can be pushed aside. Importantly, the people engaged in such persuasive endeavors do not usually recognize they are treating information or arguments differently, contingent on how it suits their ends.

“Of course I’m being fair about this”

There are few areas of research that seem to engender as much conflict – philosophically and socially – as sex differences, and it is here those words appear regularly. As there are social reasons people might wish to emphasize or downplay sex differences, it has steadily become impossible for me to approach most of the writing I see on the topic with the assumption it is at least sort of unbiased. That’s not to say every paper is hopelessly mired in a particular worldview, rejecting all contrary data, mind you; just that I don’t expect them to reflect earnest examinations of the capital-T truth. Speaking of which, a new paper by Maney (2016) recently crossed my desk; a paper that concerns itself with how sex differences get reported and how they ought to be discussed. Maney (2016) appears to take a dim view of the research on sex differences in general and attempts to highlight some perceived fallacies of people’s understandings of them. Unfortunately, for someone trying to educate people about issues surrounding the sex difference literature, the paper does not come off as one written by someone possessing a uniquely deep knowledge of the topic.

The first fallacy Maney (2016) seeks to highlight is the idea that sexes form discrete groups. Her logic for explaining why this is not the case revolves around the idea that while the sexes do indeed differ to some degree on a number of traits, they also often overlap a great deal on them. Instead, Maney (2016) argues that we ought to not be asking whether the sexes differ on a given trait, but rather by how much they do. Indeed, she even puts the word ‘differences’ in quotes, suggesting that these ‘differences’ between sexes aren’t, in many cases, real. I like this brief section, as it highlights well why I have grown to distrust words like fallacy. Taking her points in reverse order, if one is interested in how much groups (in this case, sexes) differ, then one must have, at least implicitly, already answered the question as to whether or not they do. After all, if the sexes did not differ, it would be pointless to talk about the extent of those non-differences; there simply wouldn’t be variation. Second, I know of zero researchers whose primary interest resides in answering the question of whether the sexes differ to the exclusion of the extent of those differences. As far as I’m aware, Maney (2016) seems to be condemning a strange class of imaginary researchers who are content to find that a difference exists and then never look into it further or provide more details. Finally, I see little value in noting that the sexes often overlap a great deal when it comes to explaining the areas in which they do not. In much the same way, if you were interested in understanding the differences between humans and chimpanzees, you are unlikely to get very far by noting that we share a great many genes in common. Simply put, you can’t explain differences with similarities. If one’s goal is to minimize the perception of differences, though, this would be a helpful move.

The second fallacy that Maney (2016) seeks to tackle is the idea that the cause of a sex difference in behavior can be attributed to differing brain structures. Her argument on this front is that it is logically invalid to do the following: (1) note that some brain structure differs between men and women, (2) note that this brain structure is related to a given behavior on which they also differ, and so (3) conclude that a sex difference in brain structure between men and women is responsible for that different behavior. Now while this argument is true within the rules of formal logic, it is clear that differences in brain structure will result in differences in behavior; the only way that idea could be false would be if brain structure was not connected to behavior, and I don’t know of anyone crazy enough to try and make that argument. The researchers engaging in the fallacy thus might not get the specifics right all the time, but their underlying approach is fine: if a difference exists in behavior (between sexes, species, or individuals), there will exist some corresponding structural differences in the brain. The tools we have for studying the matter are a far cry from perfect, making inquiry difficult, but that’s a different issue. Relatedly, then, noting that some formal bit of logic is invalid is assuredly not the same thing as demonstrating that a conclusion is incorrect or the general approach misguided. (Also worth noting is that the above validity issue stops being a problem when conclusions are probabilistic, rather than definitive.)

“Sorry, but it’s not logical to conclude his muscles might determine his strength”

The third fallacy Maney (2016) addresses is the idea that sex differences in the brain must be preprogrammed or fixed, attempting to dispel the notion that sex differences are rooted in biology and thus impervious to experience. In short, she is arguing against the idea of hard genetic determinism. Oddly enough, I have never met a single genetic determinist in person; in fact, I’ve never even read an article that advanced such an argument (though maybe I’ve just been unusually lucky…). As every writer on the subject I have come across has emphasized – often in great detail – the interactive nature of genes and environments in determining the direction of development, it again seems like Maney (2016) is attacking philosophical enemies that are more imagined than real. She could have, for instance, quoted researchers who made claims along the lines of, “trait X is biologically-determined and impervious to environmental inputs during development”; instead, it looks like everyone she cites for this fallacy is making a similar criticism of others, rather than anyone making the claims being criticized (though I did not check those references myself, so I’m not 100% there). Curiously, Maney (2016) doesn’t seem to be at all concerned about the people who, more-or-less, disregard the role of genetics or biology in understanding human behavior; at the very least she doesn’t devote any portion of her paper to addressing that particular fallacy. That rather glaring omission – coupled with what she does present – could leave one with the impression that she isn’t really trying to present a balanced view of the issue.

With those ostensible fallacies out of the way, there are a few other claims worth mentioning in the paper. The first is that Maney (2016) seems to have a hard time reconciling the idea of sexual dimorphisms – traits that occur in one form typical of males and one typical of females – with the idea that the sexes overlap to varying degrees on many of them, such as height. While it’s true enough that you can’t tell someone’s sex for certain if you only know their height, that doesn’t mean you can’t make some good guesses that are liable to be right a lot more often than they’re wrong. Indeed, the only dimorphisms she mentions are the presence of sex chromosomes, external genitalia, and gonads and then continues to write as if these were of little to no consequence. Much like height, however, there couldn’t be selection for any physical sex differences if the sexes did not behave differently. Since behavior is controlled by the brain, physical differences between the sexes, like height and genitalia, are usually also indicative of some structural differences in the brain. This is the case whether the dimorphism is one of degree (like height) or kind (like chromosomes).

Returning to the main point, outside of these all-or-none traits, it is unclear what Maney (2016) would consider a genuine difference, much less any clear justification for that standard. For example, she notes some research that found a 90% overlap in interhemispheric connectivity between the male and female distributions, but then seems to imply that the corresponding 10% non-overlap does not reflect a ‘real’ sex difference. We would surely notice a 10% difference in other traits, like height, IQ, or number of fingers but, I suppose in the realm of the brain, 10% just doesn’t cut it.

Maney (2016) also seems to take an odd stance when it comes to explanations for these differences. In one instance, she writes about a study on multitasking that found a sex difference favoring men; a difference which, we are told, was explained by a ‘much larger difference in video game experience,’ rather than sex per se. Great, but what are we to make of that ‘much larger’ sex difference in video game experience? It would seem that that finding too requires an explanation, and one is not present. Perhaps video game experience is explained more by, I don’t know, competitiveness than sex, but then what are we to explain competitiveness with? These kinds of explanations usually end up going nowhere in a hurry unless they eventually land on some kind of adaptive endpoint, as once a trait’s reproductive value is explained, you don’t need to go any further. Unfortunately, Maney (2016) seems to oppose evolutionary explanations for sex differences, scolding those who propose ‘questionable’ functional or evolutionary explanations for sex differences for being genetic determinists who see no role for sociocultural influences. In her rush to condemn those genetic determinists (who, again, I have never met or read, apparently), Maney’s (2016) piece appears to fall victim to the warning laid out by Tinbergen (1963) several decades ago: rather than seeking to improve the shape and direction of evolutionary, functional analyses, Maney (2016) instead recommends that people simply avoid them altogether.

“Don’t ask people to think about these things; you’ll only hurt their unisex brains”

This is a real shame, as evolutionary theory is the only tool available for providing a deeper understanding of these sex differences (as well as our physical and psychological form more generally). Just as species will differ in morphology and behavior to the extent they have faced different adaptive problems, so too will the sexes within a species. By understanding the different challenges faced by the sexes historically, one can get a much clearer sense as to where psychological and physical differences will – and will not – be expected to exist, as well as why (this extra level of ‘why’ is important, as it allows you to better figure out where an analysis has gone wrong if the predictions don’t work). Maney (2016), it would seem, even missed a golden opportunity within her paper to explain to her readers that evolutionary explanations complement, rather than supplant, more proximate explanations when quoting an abstract that seemed to contrast the two. I suspect this opportunity was missed because she is either legitimately unaware of that point, or does not understand it (judging from the tone of her paper), believing (incorrectly) instead that evolutionary means genetic, and therefore immutable. If that is the case, it would be rather ironic for someone who does not seem to have much understanding of the evolutionary literature to be lecturing others on how it ought to be reported.

References: Maney, D. (2016). Perils and pitfalls of reporting sex differences. Philosophical Transactions B, 371, 1-11.

Tinbergen, N. (1963). On aims and methods of ethology. Zeitschrift für Tierpsychologie, 20, 410-433.

 

Is Choice Overload A Real Thing?

Within the world of psychology research, time is often not kind to empirical findings. This unkindness was highlighted recently in the results of the reproducibility project, which found that the majority of psychological findings tested did not appear to replicate particularly well. There are a number of reasons this happens, including that psychological research tends to be conducted rather atheoretically (allowing large numbers of politically-motivated or implausible hypotheses to be successfully floated), and that researchers have the freedom to analyze their data in rather creative ways (allowing them to find evidence of effects where none actually exist). These practices are engaged in because positive findings tend to be published more often than null results. In fact, even if the researchers do everything right, that’s still not a guarantee of repeatable results; sometimes people just get lucky with their data. Accordingly, it is a fairly common occurrence for me to revisit some research I learned about during my early psychology education only to find out that things are not quite as straightforward or sensible as they had been presented to be. I’m happy to report that today is (sort of) one of those days. The topic in question has been called a few different things, but for my present purposes I will be referring to it as choice overload: the idea that having access to too many choices actually results in making decisions more difficult and less satisfying. In fact, if too many options are presented, people might even avoid making a decision altogether. What a fascinating idea.

Here’s to hoping time is kind to it…

The first time I heard of this phenomenon, it was in the context of exotic jams. The summary of the research goes as follows: Iyengar & Lepper (2000) set up shop in a grocery store, creating a tasting booth for either six or 24 varieties of jams (from which the more-standard flavors, like strawberry, were removed). Shoppers were invited to stop by the booths and try as many of the jams as they wanted; they were then given a $1-off coupon for that brand’s jam before leaving. The table with the more extensive variety did attract more customers (60% of those who walked by), relative to the table with fewer selections (40%), suggesting that the availability of more options was, at least initially, appealing to people. Curiously, however, there was no difference between the average number of jams sampled: whether the table had 6 flavors or 24, people only sampled about 1.5 of them, on average, and apparently, no one ever sampled more than two flavors (maybe they didn’t want to seem rude or selfish). More interestingly still, because the customers were given coupons, their purchases could be tracked. Of those who stopped at the table with only six flavors, about 30% ended up later purchasing jam; when the table had 24 flavors, a mere 3% of customers ended up buying one.
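Putting the two stages together makes the contrast starker. The Python sketch below chains the stop rate and the purchase rate for each booth, using the approximate percentages above; the figure of 100 passersby is purely a hypothetical round number for illustration, not a count from the study.

```python
def buyers_per_passersby(stop_rate, purchase_rate, passersby=100):
    """Expected buyers, chaining two stages: stopping at the booth, then buying.

    `passersby` defaults to a hypothetical 100 shoppers walking past.
    """
    return passersby * stop_rate * purchase_rate


limited = buyers_per_passersby(stop_rate=0.40, purchase_rate=0.30)    # 6 flavors
extensive = buyers_per_passersby(stop_rate=0.60, purchase_rate=0.03)  # 24 flavors

print(f"6-flavor booth:  about {limited:.1f} buyers per 100 passersby")
print(f"24-flavor booth: about {extensive:.1f} buyers per 100 passersby")
```

So even though the larger display drew more people in, the smaller one would be expected to yield several times as many eventual buyers from the same foot traffic.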

There are a couple of potential issues with this study, of course, owing to its naturalistic design; issues which were noted by the authors. For instance, it is possible that people who were fairly uninterested in buying jam might have been attracted to the 24-flavor booth nevertheless, simply out of curiosity, whereas those with a greater interest in buying jams would have remained interested in sampling them even when a smaller number of options existed. To try and get around these issues, Iyengar & Lepper (2000) designed another two experiments, one of which I wanted to cover. This other experiment was carried out in a more standard lab setting (to help avoid some of the possible issues with the jam results) and involved tasting chocolate. There were three groups of participants in this case: the first group (n = 33) got to select and sample a chocolate from an array of six possible options, the second group (n = 34) got to select and sample a chocolate from an array of 30 possible options, and a final group (n = 67) was randomly assigned to taste a chocolate they had not selected. In the interests of minimizing the influence of people’s existing preferences for such things, only those who enjoyed chocolate, but did not have experience with that particular brand, were selected for the study. After filling out a few survey items and completing the sampling task, the participants were presented with their payment option: either $5 in cash, or a box of chocolates from that brand worth $5.

In accordance with the previous findings, participants who selected from 30 different options were somewhat more likely to say they had been presented with “too many” options (M = 4.88) compared with those who only had 6 possible choices (M = 3.61, on a seven-point scale, ranging from “too few” choices at 1, to “too many” choices at 7). Despite the subjects in the extensive-choice group saying that making a decision as to which chocolate to sample was more difficult, however, there was no correlation between how difficult participants found the decision and how much they reported enjoying making it. It seemed people could enjoy making more difficult choices. Additionally, participants in the limited-choice group were more satisfied with their choice (M = 6.28) than those in the extensive-choice group (M = 5.46), who were in turn more satisfied than those in the no-choice group (M = 4.92). Of particular interest are the compensation findings: those in the limited-choice group were more likely to accept a box of chocolates in lieu of cash (48%) than those in either the extensive-choice (12%) or no-choice conditions (10%). It seems that having some options was preferable to having no options, but having too many options seemed to cause people difficulty in making decisions. The researchers concluded that, to use the term, people could be overloaded by choices, hindering their decision-making process.

“If it can’t be settled via coin flip, I’m not interested”

While such findings are indeed quite interesting, there is no guarantee they will hold up over time; as I mentioned initially, lots of research fails to do so. This is where meta-analyses can help. This is the kind of research where the results from many different studies can be examined jointly. Scheibehenne et al. (2010) set out to conduct one of their own on the research surrounding choice overload, noting that some of the research on the phenomenon does not point in the same direction. They note a few examples, such as field research in which reducing the number of available items resulted in decreases or no changes to sales, rather than what should have been a predicted uptick in them. Indeed, the lead author also reports that their own attempt at replicating the jam study for their dissertation in 2008 failed, as did the second author’s attempt to replicate the chocolate experiment. These failures to replicate the original research might indicate that the initial results of choice overload were something of a fluke, and so a wider swath of research needs to be examined to determine if that’s the case.

Towards this end, Scheibehenne et al. (2010) collected 50 experiments from the literature on the subject, representing about 5,000 participants in 13 published and 16 unpublished papers from 2000-2009. In total, the average estimated effect size for the choice overload effect across all the experiments was a mere d = 0.02; the effect was all but non-existent. Further analysis revealed that the differences in effect sizes between studies did not seem to be randomly distributed; there were likely relevant differences between these papers determining what kind of results they found. To examine this issue further, Scheibehenne et al. (2010) began by trimming off the 6 largest effects from both the top and the bottom ends of the reported research. The results showed that, in the trimmed data set, there was little evidence of differences among the remaining studies. This suggests that most of the differences between these studies were being driven by unusually large positive and negative effects.
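The trimming step can be illustrated with a toy example in Python. To be clear about what is assumed here: the effect sizes below are invented purely for illustration (they are not the values from the meta-analysis), and the standard deviation is used as a crude stand-in for spread rather than the formal heterogeneity test a meta-analysis would use.

```python
import statistics


def trim_extremes(effects, k):
    """Return the effects with the k smallest and k largest values removed."""
    if k < 0 or 2 * k >= len(effects):
        raise ValueError("k must be non-negative and leave at least one effect")
    ordered = sorted(effects)
    return ordered[k:len(ordered) - k]


# Hypothetical effect sizes (Cohen's d), made up for this sketch only.
hypothetical_d = [-0.9, -0.6, -0.1, -0.05, 0.0, 0.02, 0.05, 0.1, 0.7, 0.95]
trimmed = trim_extremes(hypothetical_d, k=2)

for label, data in (("all", hypothetical_d), ("trimmed", trimmed)):
    mean = statistics.mean(data)
    spread = statistics.pstdev(data)
    print(f"{label:>8}: n={len(data)}, mean d={mean:.3f}, spread={spread:.3f}")
```

In this made-up data the average effect sits near zero both before and after trimming, while the spread shrinks considerably once the extremes are removed, which is the general pattern described above.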

Returning to the complete, untrimmed data set, Scheibehenne et al. (2010) started to pick apart how several moderating variables might be affecting the reported results. In line with the intuitions of Iyengar & Lepper (2000), preexisting preferences or expertise did indeed have an effect on the choice overload issue: people with existing preferences were not as troubled by additional items when making a choice, relative to those without such preferences. However, there was also an effect of publication – such that published papers were somewhat more likely to report an effect of choice overload, relative to unpublished ones – as well as a small effect of year – such that papers published more recently were a bit less likely to report choice overloading effects. In sum, the results of the meta-analysis indicated that the average effect size of choice overload was nearly zero, that older studies which saw publication reported larger effects than those that came later or were not published, and that well-defined, preexisting preferences likely remove the negative effects of having too many options (to the extent they actually existed in the first place). Crucially, what should have been an important variable – the number of different options participants were presented with on the high end – explained essentially none of the variance. That is to say that 18 items didn’t seem to make any difference, compared to 30 items or more.

“Well, there are too many different chip options; guess I’ll just starve”

While this does not rule out choice overload as being a real thing, it does cast doubt on the phenomenon being as pervasive or important as some might have given it credit for. Instead, it appears probable that such choice effects might be limited to particular contexts, assuming they reliably exist in the first place. Such contexts might include how easily the products can be compared to one another (i.e., it’s harder to decide when faced with two equally attractive, but quite distinct options), or whether people are able to use mental shortcuts (known as heuristics) to rapidly whittle down the number of options they actually consider (so as to avoid spending too much time making fairly unimportant choices). While future examination would be required to test some of these ideas, the larger message here extends beyond the choice overload literature to most of psychology research: it is probably fair to assume that, as things currently stand, the first thing you hear about the existence or importance of an effect will likely not resemble the last thing you do.

References: Iyengar, S. & Lepper, M. (2000). When choice is demotivating: Can one desire too much of a good thing? Journal of Personality & Social Psychology, 79, 995-1006.

Scheibehenne, B., Greifeneder, R., & Todd, P. (2010). Can there ever be too many options? A meta-analytic review of choice overload. Journal of Consumer Research, 37, 409-424.