I’m going to preface this post by stating that statistics is not my primary area of expertise. Admittedly, this might not be the best way of generating interest, but non-expertise hasn’t seemed to stop many a teacher or writer, so I’m hoping it won’t be too much of a problem here. This non-expertise, however, has apparently also not stopped me from stumbling upon an interesting question concerning Bayesian statistics. Whether this conceptual problem I’ve been mulling over would actually prove to be a problem in real-world data collection is another matter entirely. Then again, there doesn’t appear to be a required link between academia and reality, so I won’t worry too much about that while I indulge in the pleasure of a little bit of philosophical play time.
So first, let’s run through a quick problem using Bayesian statistics. This is the classic example by which I was introduced to the idea: say that you’re a doctor trying to treat an infection that has broken out among a specific population of people. You happen to know that 5% of the people in this population are actually infected, and you’re trying to figure out who those people are so you can at least quarantine them. Luckily for you, you happen to have a device that can test for the presence of this infection. If you use this device to test an individual who actually has the disease, it will come back positive 95% of the time; if the individual does not have the disease, it will come back positive 5% of the time. Given that an individual has tested positive for the disease, what is the probability that they actually have it? The answer, unintuitive to most, is 50%.
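For anyone who wants to check that 50% figure, here is a minimal sketch of the calculation in Python. The function and parameter names are my own invention for illustration; the numbers are the ones from the example above.

```python
def posterior_infected(prior, sensitivity, false_positive_rate):
    """Return P(infected | positive test) using Bayes' rule."""
    # Share of the whole population that is infected AND tests positive.
    true_positives = prior * sensitivity
    # Share of the whole population that is healthy AND tests positive.
    false_positives = (1 - prior) * false_positive_rate
    # Among everyone who tests positive, the fraction actually infected.
    return true_positives / (true_positives + false_positives)


# 5% prevalence, 95% positive rate for the infected, 5% for the healthy.
p = posterior_infected(prior=0.05, sensitivity=0.95, false_positive_rate=0.05)
print(round(p, 3))
```

The two groups of positive testers turn out to be the same size (0.05 × 0.95 in each case), which is why the result lands at one half.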
Though the odds of someone testing positive if they have the disease are high (95%), very few people actually have the disease (5%). So 5% of the 95% of people who don’t have an infection will test positive, and 95% of the 5% of people who do have an infection also will. Those two groups are the same size, so a positive result is equally likely to have come from either. In case that example ran by too quickly, here’s another brief video example that uses hipsters drinking beer instead of treating an infection. This method of statistical testing would seem to have some distinct benefits: for example, it will tell you the probability of your hypothesis, given your data, rather than the probability of your data, given your hypothesis (which, I’m told, is what most people actually want to be calculating). That said, I see two (possibly major) conceptual issues with this type of statistical analysis. If anyone more versed in these matters feels they have good answers to them, I’d be happy to hear them in the comments section.
The first issue was raised by Gelman (2008), who was discussing the usefulness of our prior knowledge. In the above examples, we know some information ahead of time (the prevalence of an infection or of hipsters); in real life, we frequently don’t know this information. In fact, it’s often what we’re trying to estimate when we’re doing our hypothesis tests. This puts us in something of a bind when it comes to using Bayes’ formula. Lacking objective knowledge, one could use what are called subjective priors, which represent your own set of preexisting beliefs about how likely certain hypotheses are. Of course, subjective priors have two issues. First, they’re unlikely to be shared uniformly between people, and if your subjective beliefs are not my subjective beliefs, we’ll end up coming to two different conclusions given the same set of data. It’s also probably worth mentioning that subjective beliefs do not, to the best of my knowledge, actually affect the goings-on in the world: that I believe it’s highly probable it won’t rain tomorrow doesn’t matter; it either will or it won’t, and no amount of belief will change that. The second issue concerns the point of the hypothesis test: if you already have a strong prior belief about the truth of a hypothesis, for whatever reason you do, that would seem to suggest there’s little need for you to actually collect any new data.
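To make the disagreement concrete, here is a small sketch that reuses the infection test from earlier. The two observers and their priors are hypothetical, made up purely for illustration: one believes 5% of the population is infected, the other believes 50%. Both see the same positive test result.

```python
def posterior_given_positive(prior, hit_rate=0.95, false_alarm_rate=0.05):
    """P(infected | positive test) for an observer holding the given prior."""
    infected_and_positive = prior * hit_rate
    healthy_and_positive = (1 - prior) * false_alarm_rate
    return infected_and_positive / (infected_and_positive + healthy_and_positive)


# Hypothetical observers: same test, same result, different starting beliefs.
skeptic = posterior_given_positive(prior=0.05)
alarmist = posterior_given_positive(prior=0.50)

print(f"prior  5% -> posterior {skeptic:.2f}")
print(f"prior 50% -> posterior {alarmist:.2f}")
```

Under these assumptions the first observer ends up at a coin flip while the second ends up nearly certain, even though nothing about the data differs between them.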
One could attempt to get around this problem by using a subjective but uninformative prior; that is, by distributing your belief uniformly over your set of possible outcomes or, in other words, entering into your data analysis with no preconceptions about how it’ll turn out. This might seem like a good solution to the problem, but it would also seem to make your priors all but useless. If you’re multiplying every hypothesis by the same constant, you can just drop it from your analysis. So it would seem that in both cases, priors don’t do you a lot of good: they’re either strong, in which case you don’t need to collect more data, or uninformative, in which case they’re pointless to include in the analysis. Now perhaps there are good arguments to be made for subjective priors, but that’s not the primary point I hoped to address; my main criticism involves what’s known as the gambler’s fallacy.
This logical fallacy can be demonstrated with the following example: say you’re flipping a fair coin; given that this coin has come up heads 10 times in a row, what is the probability of a tails outcome on the next flip? The answer, of course, is 50%, as a fair coin is one that is unbiased with respect to which outcome will obtain when you flip it; a heads outcome using this coin is always exactly as likely as a tails outcome. However, someone committing the gambler’s fallacy will suggest that the coin is more likely to come up tails, as all the heads outcomes make the tails outcome feel more likely, as if a tails outcome is “due” to come up. This is incorrect, as each flip of this coin is independent of the other flips, so knowing what the previous outcomes of this coin have been tells you nothing about what the future outcomes of the coin will be; or, as others have put it, the coin has no memory. As I see it, Bayesian analysis could lead one to engage in this fallacy (or, more precisely, something like the reverse gambler’s fallacy).
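The “no memory” claim is easy to check by simulation. The sketch below is only an illustration: the function name, the seed, and the choice of a 5-heads streak (shorter than the 10 in the example, so that streaks show up often enough in a modest number of flips) are all my own assumptions.

```python
import random


def tails_rate_after_heads_streak(n_flips, streak, seed=0):
    """Simulate a fair coin and measure how often tails follows a heads streak.

    Returns (tails_rate, followups), where followups counts the flips that
    came immediately after at least `streak` consecutive heads.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    run = 0        # length of the current run of consecutive heads
    followups = 0  # flips made right after a qualifying streak
    tails = 0      # how many of those flips came up tails
    for _ in range(n_flips):
        is_heads = rng.random() < 0.5
        if run >= streak:
            followups += 1
            if not is_heads:
                tails += 1
        run = run + 1 if is_heads else 0
    return tails / followups, followups


rate, n = tails_rate_after_heads_streak(n_flips=500_000, streak=5)
print(f"{n} flips followed a run of 5+ heads; tails came up {rate:.3f} of the time")
```

If tails really were “due” after a streak, the measured rate would sit noticeably above one half. Raising `streak` to 10 should tell the same story, though it takes many more flips before enough streaks appear to measure anything.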
Here’s the example I’ve been thinking about: consider that you have a fair coin and an infinite stretch of time over which you’ll be flipping it. Long strings of heads or tails outcomes (say 10,000 in a row, or even 1,000,000 and beyond in a row) are certainly improbable, but given an infinite amount of time, they become inevitable: outcomes that will obtain eventually. Now, if you’re a good Bayesian, you’ll update your posterior beliefs following each outcome. In essence, after a coin comes up heads, you’ll be more likely to think that it will come up heads on the subsequent flip; since heads have been coming up, more heads are due to come up. Essentially, you’ll be suggesting that these independent events are not actually independent of each other, at least with respect to your posterior beliefs. Given these long strings of heads and tails which will inevitably crop up, over time you will go from believing the coin is fair, to believing that it is nearly completely biased towards heads, to believing that it is nearly completely biased towards tails, and back again.
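Here is a rough sketch of the kind of updating I have in mind. It assumes a Beta prior over the coin’s bias, which is a standard textbook choice but one I am supplying myself; the function name and the prior pseudo-counts of 1 and 1 (a uniform prior) are likewise assumptions for illustration.

```python
from fractions import Fraction


def predictive_heads(heads, tails, prior_heads=1, prior_tails=1):
    """P(next flip is heads) after the observed flips.

    Assumes a Beta(prior_heads, prior_tails) prior on the coin's bias, so
    the posterior mean is simply (prior + observed heads) / (all counts).
    """
    return Fraction(prior_heads + heads,
                    prior_heads + prior_tails + heads + tails)


# Belief about the next flip after an unbroken run of heads of each length.
for run_length in (0, 1, 10, 100):
    p = predictive_heads(heads=run_length, tails=0)
    print(f"after {run_length:>3} heads in a row: P(heads next) = {float(p):.3f}")
```

Under these assumptions the belief starts at one half and climbs towards 1 as the run of heads lengthens, even though the coin generating the flips is fair by construction. How far it climbs depends on the pseudo-counts chosen for the prior: larger values of `prior_heads` and `prior_tails` make the estimate move more slowly.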
Though your beliefs about the world can never have enough pairs of flip-flips…
It seems to me, then, that if you want to estimate the parameter more accurately (in this case, the fairness of the coin), you want some statistical test that will, to some extent, try to take into account data that you did not obtain but might have: what might have happened if I had flipped the coin another X number of times? This is, generally speaking, anathema to Bayesian statisticians, as I understand them, who concern themselves only with the data that was collected. Of course, that does raise the question of how one can accurately predict what data they might have obtained but did not, for which I don’t have a good answer. There’s also the matter of precisely how large a problem this hypothetical example poses for Bayesian statistics when you’re not dealing with an infinite number of random observations; in the real world, this conceptual problem might not be much of one, as these events are highly improbable, so it’s rare that anyone will actually end up making this kind of mistake. That said, it is generally a good thing to be as conceptually aware of possible problems as we can be if we want any hope of fixing them.
References: Gelman, A. (2008). Objections to Bayesian statistics. Bayesian Analysis, 3, 445-450. DOI: 10.1214/08-BA318