As of late, I’ve been dipping my toes ever-deeper into the conceptual world of statistics. If one aspires to understand precisely what one is seeing when it comes to research in psychology, understanding statistics can go a long way. Unfortunately, the world of statistics is a contentious one, and the concepts involved in many of these discussions can easily be misinterpreted, so I’ve been attempting to be as cautious as possible in figuring the mess out. Most recently, I’ve been trying to decipher whether the hype over Bayesian methods is to be believed. There are some people who seem to feel that there’s a dividing line between Bayesian and Frequentist philosophies that one must choose sides over (Dienes, 2011), while others seem to suggest that such divisions are basically pointless and the field has moved beyond them (Gelman, 2008; Kass, 2011). One of the major points that has been bothering me about the Bayesian side of things is the conceptualization of a “prior” (though I feel such priors can easily be incorporated into Frequentist analyses as well, so this question applies to any statistician). Like many concepts in statistics, this one seems both useful in certain situations and able to easily lead one astray in others. Today I’d like to consider a thought experiment dealing with the latter cases.
First, a quick overview of what a prior is and why they can be important. Here’s an example that I discussed previously:
say that you’re a doctor trying to treat an infection that has broken out among a specific population of people. You happen to know that 5% of the people in this population are actually infected, and you’re trying to figure out who those people are so you can at least quarantine them. Luckily for you, you happen to have a device that can test for the presence of this infection. If you use this device to test an individual who actually has the disease, it will come back positive 95% of the time; if the individual does not have the disease, it will come back positive 5% of the time. Given that an individual has tested positive for the disease, what is the probability that they actually have it? The answer, unintuitive to most, is 50%.
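For the curious, the arithmetic behind that 50% is just Bayes’ theorem. Here’s a quick Python sketch (the variable names are my own, not anything from the example itself):

```python
# Bayes' theorem for the diagnostic-test example above.
prior = 0.05          # P(infected): 5% of the population is infected
sensitivity = 0.95    # P(positive | infected)
false_pos = 0.05      # P(positive | not infected)

# P(positive) by the law of total probability
p_positive = sensitivity * prior + false_pos * (1 - prior)

# P(infected | positive) by Bayes' theorem
posterior = sensitivity * prior / p_positive
print(posterior)  # → 0.5
```

The true positives (95% of the infected 5%) and the false positives (5% of the uninfected 95%) happen to be equally numerous here, which is why the answer lands exactly on 50%.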
In this example, your prior is the percentage of people who have the disease (the 5% prevalence). The prior is, roughly, the set of beliefs or uncertainties you come to your data with. Bayesian analysis requires one to explicitly state one’s prior beliefs, regardless of what those priors are, as they will eventually play a role in determining your conclusions. As in the example above, priors can be exceptionally useful when they’re known values.
In the world of research it’s not always (or even generally) the case that priors are objectively known: in fact, they’re basically what we’re trying to figure out in the first place. More specifically, people are actually trying to derive posteriors (prior beliefs that have been revised by the data), but one man’s posteriors are another man’s priors, and the line between the two is more or less artificial. In the previous example, the 5% prevalence in the population was taken as a given; if you didn’t know that value and only had the results of your 95%-effective test, accurately estimating how many of your positives were likely false positives and, conversely, how many of your negatives were likely false negatives would be impossible (except by luck). If the prevalence of the disease in the population is very low, you’ll have many false positives; if the prevalence is very high, you’ll likely have many false negatives. Accordingly, the prior beliefs you bring to your results will have a substantial effect on how they’re interpreted.
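To see how strongly the assumed prevalence drives the conclusion, we can reuse the same test characteristics and vary only the prior (again, a sketch of my own; the function name is just illustrative):

```python
def p_infected_given_positive(prevalence, sensitivity=0.95, false_pos=0.05):
    """Posterior probability of infection given a positive test result."""
    p_pos = sensitivity * prevalence + false_pos * (1 - prevalence)
    return sensitivity * prevalence / p_pos

# The very same positive test result means very different things
# depending on what you assume about the base rate.
for prev in (0.01, 0.05, 0.50, 0.90):
    print(f"prevalence {prev:.0%}: P(infected | +) = {p_infected_given_positive(prev):.2f}")
```

At a 1% prevalence a positive test leaves you with only about a 16% chance of true infection, while at 90% prevalence a positive result is almost certainly correct, despite the test itself never changing.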
This is a fairly common point of discussion when it comes to Bayesian analysis: the frequent subjectivity of priors. Your belief about whether a disease is common or not doesn’t change its actual prevalence, only how you will eventually look at your data. This means that researchers with the same data can reach radically different conclusions on the basis of different priors. So, if one is given free rein over which priors to use, this could allow confirmation bias to run wild and a lot of disagreeable data to be all but disregarded. As this is a fairly common point in the debate over Bayesian statistics, there’s already been a lot of ink (virtual and actual) spilled over it, so I don’t want to continue on with it.
There is, however, another issue concerning priors that, to the best of my knowledge, has not been thoroughly addressed. That question is: to what extent can we consider people to have prior beliefs in the first place? Clearly, we feel that some things are more likely than others: I think it’s more likely that I won’t win the lottery than that I will. No doubt you could immediately produce a list of things you think are more or less probable than others with ease. That these feelings can be so intuitive and automatically generated helps to mask an underlying problem with them: strictly speaking, it seems we ought either not to update our priors at all or not to say that we “really” have any. A shocking assertion, no doubt (and maybe a bit hyperbolic), but I want to explore it and see where it takes us.
We can begin to explore this intuition with another thought experiment involving flipping a coin, which will be our stand-in for a random-outcome generator. This coin is slightly biased in a way that results in 60% of the flips coming up heads and the remaining 40% coming up tails. Imagine a first researcher who has his entire belief centered 100% on the coin being 60% biased towards heads and, since there is no belief left to assign, thinks that all other states of bias are impossible. Rather than having a distribution of beliefs, this researcher has a single point. This first researcher will never update his belief about the bias of the coin no matter what outcomes he observes; he’s certain the coin is biased in a particular way. Because he just so happens to be right about the bias, he can’t get any better, and this lack of updating his priors is a good thing (if you’re looking to make accurate predictions, that is).
Now let’s consider a second researcher. This researcher comes to the coin with a different set of priors: he thinks that the coin is likely fair, say 50% certain, and then distributes the rest of his belief equally between two additional potential values of the coin’s bias (say, 25% sure that the coin is 60% biased towards heads and 25% sure that the coin is similarly biased towards tails). The precise distribution of these beliefs doesn’t matter terribly; it could come in the form of two points or an infinite number of them. All that matters is that, because this researcher’s beliefs are distributed in such a way that they don’t lie on a single point, they are capable of being updated by the data from the coin flips. Researcher two, like a good Bayesian, will then update his priors to posteriors on the basis of the observed flips, turn those posteriors into new priors, and continue updating for as long as he’s getting new data.
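Researcher two’s procedure can be sketched concretely. Here’s a small Python illustration of discrete Bayesian updating over his three hypotheses (the particular run of flips is one I made up to roughly match a 60%-heads coin):

```python
# Researcher two's three-point prior over P(heads).
beliefs = {0.4: 0.25, 0.5: 0.50, 0.6: 0.25}

def update(prior, flip):
    """Return the posterior over the hypotheses after one flip ('H' or 'T')."""
    likelihood = {h: (h if flip == 'H' else 1 - h) for h in prior}
    unnorm = {h: likelihood[h] * p for h, p in prior.items()}
    total = sum(unnorm.values())
    return {h: p / total for h, p in unnorm.items()}

# Each posterior becomes the prior for the next flip.
for flip in "HHTHHHTHTH":  # 7 heads, 3 tails
    beliefs = update(beliefs, flip)

print({h: round(p, 3) for h, p in beliefs.items()})
```

After ten flips slightly favoring heads, belief has drained away from the 40%-heads hypothesis and flowed towards the fair and 60%-heads hypotheses, just as the thought experiment describes.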
On the surface, then, the major difference between the two is that researcher one refuses to update his priors and researcher two is willing to do so. This implies something rather interesting about the latter: researcher two has some degree of uncertainty about his priors. After all, if he were already sure he had the right priors, he wouldn’t update, since he would think he could do no better in terms of predictive accuracy. If researcher two is uncertain about his priors, then, shouldn’t that degree of uncertainty similarly be reflected somehow?
For instance, one could say that researcher two is 90% certain that he chose the correct priors and 10% certain that he did not. That would represent his priors about his priors. He would presumably need to have some prior belief about the distribution he initially chose, as he was selecting from an infinite number of other possible distributions. His priors about his priors, however, must have their own set of priors as well. One can quickly see that this leads to an infinite regress: at some point, researcher two will basically have to admit complete uncertainty about his priors (or at least uncertainty about how they ought to be updated, as how one updates one’s priors depends upon the priors one is using, and there are an infinite number of possible distributions of priors), or admit complete certainty in them. If researcher two ends up admitting to complete uncertainty, this will give him a flat set of priors that ought to be updated very little (he will be able to rule out the coin being 100% biased towards heads or tails, contingent on observing a tails or a heads respectively, but not much beyond that). On the other hand, if researcher two ends up holding one of his priors with 100% certainty, the rest of the priors in the regress ought to collapse to 100% certainty as well, resulting in an unwillingness to update.
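The “ruling out the extremes” point can be made concrete. Here’s a sketch of the completely uncertain researcher: a flat prior over a grid of possible biases, including the two extreme hypotheses, updated on a single observed heads (the grid spacing is my own arbitrary choice):

```python
# A "completely uncertain" researcher: flat prior over eleven possible biases,
# including the extremes 0.0 (never heads) and 1.0 (always heads).
grid = [i / 10 for i in range(11)]           # 0.0, 0.1, ..., 1.0
prior = {h: 1 / len(grid) for h in grid}     # flat prior

# Observe a single heads: the likelihood of hypothesis h is simply h.
unnorm = {h: h * p for h, p in prior.items()}
total = sum(unnorm.values())
posterior = {h: p / total for h, p in unnorm.items()}

print(posterior[0.0])  # → 0.0 : "always tails" is now ruled out
```

The one observation eliminates the “coin never comes up heads” hypothesis entirely, but every other hypothesis survives with nonzero belief.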
It is not immediately apparent how we can reconcile these two stances with each other. On the one hand, researcher one has a prior that cannot be updated; on the other, researcher two has a potentially infinite number of priors with almost no idea how to update them. While we certainly could say that researcher one has a prior, he would have no need for Bayesian analysis. Given that people seem to have prior beliefs about things (like how likely some candidate is to win an election), and these beliefs seem to be updated from time to time (once most of the votes have been tallied), this suggests that something about the above analysis might be wrong. It’s just difficult to place precisely what that thing is.
One way of ducking the dilemma might be to suggest that, at any given point in time, people are 100% certain of their priors, but the point they’re certain about changes over time. Such a stance, however, suggests that priors aren’t updated so much as priors just change, and I’m not sure that such semantics can save us here. Another suggestion that was offered to me is that we could just forget the whole thing, as priors don’t themselves need to have priors. A prior is a belief distribution about probability, and probability is not a “real” thing (that is, the biased coin doesn’t come up 60% heads and 40% tails per flip; the result of any given flip will either be a heads or a tails). For what it’s worth, I don’t think such a suggestion helps us out. It would essentially seem to be saying that, out of the infinite number of beliefs one could start with, any subset of those beliefs is as good as any other, even if they lead to mutually exclusive or contradictory results, and that we can’t think about why some of them are better than others. Though my prior on people having priors might have been high, my posteriors about them aren’t looking so hot at the moment.
References: Dienes, Z. (2011). Bayesian versus orthodox statistics: Which side are you on? Perspectives on Psychological Science, 6(3), 274-290. DOI: 10.1177/1745691611406920
Gelman, A. (2008). Rejoinder. Bayesian Analysis, 3, 467-478.
Kass, R. (2011). Statistical inference: The big picture. Statistical Science, 26, 1-9.