I’m going to preface this post by stating that statistics is not my primary area of expertise. Admittedly, this might not be the best way of generating interest, but non-expertise hasn’t seemed to stop many a teacher or writer, so I’m hoping it won’t be too much of a problem here. This non-expertise, however, has apparently also not stopped me from stumbling upon an interesting question concerning Bayesian statistics. Whether this conceptual problem I’ve been mulling over would actually prove to be a problem in real-world data collection is another matter entirely. Then again, there doesn’t appear to be a required link between academia and reality, so I won’t worry too much about that while I indulge in the pleasure of a little bit of philosophical play time.

The link between academia and reality is about as strong as the link between my degree and a good job.

So first, let’s run through a quick problem using Bayesian statistics. This is the classic example that I was introduced to the idea by: say that you’re a doctor trying to treat an infection that has broken out among a specific population of people. You happen to know that 5% of the people in this population are actually infected, and you’re trying to figure out who those people are so you can at least quarantine them. Luckily for you, you happen to have a device that can test for the presence of this infection. If you use this device to test an individual who actually has the disease, it will come back positive 95% of the time; if the individual does not have the disease, it will come back positive 5% of the time. Given that an individual has tested positive for the disease, what is the probability that they actually have it? The answer, unintuitive to most, is 50%.

Though the odds of someone testing positive *if* they have the disease are high (95%), very few people *actually have* the disease (5%). So 5% of the 95% of people who *don’t* have the infection will test positive, and 95% of the *5%* of people who *do* have it will as well. In case that example ran by too quickly, here’s another brief video example that uses hipsters drinking beer instead of infected patients. This method of statistical testing would seem to have some distinct benefits: for example, it will tell you the probability of your hypothesis, given your data, rather than the probability of your data, given your hypothesis (which, I’m told, is what most people actually want to be calculating). That said, I see two (possibly major) conceptual issues with this type of statistical analysis. If anyone more versed in these matters feels they have good answers to them, I’d be happy to hear them in the comments section.
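The arithmetic behind that 50% can be checked in a few lines. This is just a minimal sketch plugging the example’s numbers into Bayes’ theorem:

```python
# Bayes' theorem applied to the infection example above.
# Numbers come straight from the post: 5% prevalence,
# 95% true-positive rate, 5% false-positive rate.
prevalence = 0.05        # P(infected)
sensitivity = 0.95       # P(positive | infected)
false_positive = 0.05    # P(positive | not infected)

# Total probability of a positive test:
p_positive = (sensitivity * prevalence
              + false_positive * (1 - prevalence))

# P(infected | positive) = P(positive | infected) * P(infected) / P(positive)
p_infected_given_positive = sensitivity * prevalence / p_positive
print(p_infected_given_positive)  # 0.5
```

The two terms in the denominator happen to be equal (0.95 × 0.05 and 0.05 × 0.95), which is exactly why the answer lands on 50%.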

The first issue was raised by Gelman (2008), who was discussing the usefulness of our prior knowledge. In the above examples, we *know* some information ahead of time (the prevalence of an infection or of hipsters); in real life, we frequently don’t know this information; in fact, it’s often what we’re trying to estimate when we’re doing our hypothesis tests. This puts us in something of a bind when it comes to using Bayes’ formula. Lacking objective knowledge, one could use what are called subjective priors, which represent your own set of preexisting beliefs about how likely certain hypotheses are. Of course, subjective priors have two issues: first, they’re unlikely to be shared uniformly between people, and if your subjective beliefs are not my subjective beliefs, we’ll end up coming to two different conclusions given the same set of data. It’s also probably worth mentioning that subjective beliefs do not, to the best of my knowledge, actually affect the goings-on in the world: that I believe it’s highly probable it won’t rain tomorrow doesn’t matter; it either will or it won’t, and no amount of belief will change that. The second issue concerns the point of the hypothesis test: if you already have a strong prior belief about the truth of a hypothesis, for whatever reason, that would seem to suggest there’s little need for you to actually collect any new data.

One could attempt to get around this problem by using a subjective, but uninformative prior; that is, distribute your belief uniformly over your set of possible outcomes, or to enter into your data analysis with no preconceptions about how it’ll turn out. This might seem like a good solution to the problem, but it would also seem to make your priors all but useless. If you’re multiplying by the same constant, you can just drop it from your analysis. So it would seem in both cases, priors don’t do you a lot of good: they’re either strong, in which case you don’t need to collect more data, or uninformative, in which case they’re pointless to include in the analysis. Now perhaps there are good arguments to be made for subjective priors, but that’s not the primary point I hoped to address; my main criticism involves what’s known as the gambler’s fallacy.

This logical fallacy can be demonstrated with the following example: say you’re flipping a fair coin; given that this coin has come up heads 10 times in a row, what is the probability of a tails outcome on the next flip? The answer, of course, is 50%, as a fair coin is one that is unbiased with respect to which outcome will obtain when you flip it; a heads outcome from this coin is always exactly as likely as a tails outcome. However, someone committing the gambler’s fallacy will suggest that the coin is more likely to come up tails, as all the heads outcomes make a tails outcome *feel* more likely; as if a tails outcome is “due” to come up. This is incorrect, as each flip of this coin is independent of the other flips, so knowing the previous outcomes of this coin tells you nothing about what its future outcomes will be, or, as others have put it, the coin has no memory. As I see it, Bayesian analysis *could* lead one to engage in this fallacy (or, more precisely, something like the reverse gambler’s fallacy).
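The independence point can be illustrated with a small simulation: among fair-coin sequences that happen to open with a streak of heads, the next flip still comes up heads about half the time. (A streak of 5 is used here rather than 10 purely to keep the simulation quick; the logic is identical.)

```python
import random

# Simulate fair-coin sequences and keep only those that begin with
# 5 consecutive heads; then look at how often the NEXT flip is heads.
random.seed(1)
next_flips = []
while len(next_flips) < 2000:
    seq = [random.random() < 0.5 for _ in range(6)]
    if all(seq[:5]):               # condition on 5 straight heads
        next_flips.append(seq[5])  # record the 6th flip

freq = sum(next_flips) / len(next_flips)
print(freq)  # close to 0.5 -- the streak carries no information
```

Conditioning on the streak changes nothing, which is exactly what “the coin has no memory” means.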

Here’s the example I’ve been thinking about: consider that you have a fair coin and an infinite stretch of time over which you’ll be flipping it. Long strings of heads or tails outcomes (say 10,000 in a row, or even 1,000,000 and beyond in a row) are certainly *improbable*, but given an infinite amount of time, they become an inevitability: outcomes that will obtain eventually. Now, if you’re a good Bayesian, you’ll update your posterior beliefs following each outcome. In essence, after a coin comes up heads, you’ll be *more likely* to think that it will come up heads on the subsequent flip; since heads have been coming up, more heads are *due* to come up. Essentially, you’ll be suggesting that these independent events are not actually independent of each other, at least with respect to your posterior beliefs. Given these long strings of heads and tails which will inevitably crop up, over time you will go from believing the coin is fair, to believing that it is nearly completely biased towards heads or towards tails, and back again.
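The updating described above can be sketched with a Beta prior on the coin’s heads probability. The Beta(1, 1) uniform prior is a standard conjugate choice, not something specified in the post; under it, the predictive probability of heads is Laplace’s rule of succession:

```python
def posterior_predictive_heads(n_heads, n_tails, a=1, b=1):
    """P(next flip is heads) under a Beta(a, b) prior, after observing
    n_heads heads and n_tails tails (Laplace's rule of succession)."""
    return (a + n_heads) / (a + b + n_heads + n_tails)

print(posterior_predictive_heads(0, 0))    # 0.5 before any data
print(posterior_predictive_heads(10, 0))   # 11/12, after 10 straight heads
print(posterior_predictive_heads(10, 10))  # back to 0.5 once tails catch up
```

So the good Bayesian really does assign more than 50% to heads after a run of heads, exactly as the post says: not because the flips are dependent, but because the run is evidence about the coin’s bias.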

Though your beliefs about the world can never have enough pairs of flip-flops…

It seems to me, then, that you want some statistical test that will, to some extent, try and take into account data that you *did not* obtain, but *might have*, if you want to more accurately estimate the parameter (in this case, the fairness of the coin: what might have happened if I flipped the coin another X number of times). This is, generally speaking, anathema to Bayesian statistics as I understand it, which concerns itself only with the data that were actually collected. Of course, that does raise the question of how one can accurately predict what data they might have obtained, but did not, for which I don’t have a good answer. There’s also the matter of precisely how large a problem this hypothetical example poses for Bayesian statistics when you’re not dealing with an infinite number of random observations; in the real world, this conceptual problem might not be much of one, as these events *are* highly improbable, so it’s rare that anyone will actually end up making this kind of mistake. That said, it is generally a good thing to be as conceptually aware of possible problems as we can be if we want any hope of fixing them.

**References:** Gelman, A. (2008). Objections to Bayesian statistics. *Bayesian Analysis, 3*, 445–450. DOI: 10.1214/08-BA318

I like the post and I use the gambler’s fallacy in my statistics classes all the time (and point out both versions of it). After positing that we have a fair coin, I point out that nonetheless, after a long sequence of heads, there are those who will think that tails is due. And there are others who will think that heads is on a hot streak. I point out that both of these positions require that the coin somehow remembers what it’s been doing. THEN I ask what they would bet on if a “fair” coin came up Heads 10 times in a row. I give them 3 options: H, T, Doesn’t Matter. Most vote for “Doesn’t Matter”. I then tell them that if I had to bet, I’d bet on heads, because I’m not so sure that it is a fair coin. (They understand this even without a discussion of Bayesian statistics.)

You are right, of course, that in an infinitely long sequence of tosses, there WILL be arbitrarily long sequences of heads and tails, but I wanted to point out that this does not imply that you will at any stage in the process move from thinking that the coin is biased in favor of heads to biased in favor of tails and back. Let me slightly oversimplify: Though you will eventually see a string of 100 heads, this is unlikely to happen until such a string has VERY little effect on the overall percentage of heads seen thus far. So, even a string of 100 consecutive heads (after a HUGE overall number of tosses) isn’t enough to sway our belief that the coin is fair (or whatever belief we have come to by then).

With regards to your second paragraph, if we’re talking about an infinite stretch of time, eventually you’d get outcomes where heads comes up more times in a row than the entire number of times (or several times that number) that you’ve already flipped the coin. One could be forgiven for worrying about the practical problems this would pose (which I feel would be minimal), but I wasn’t talking about 100 heads in a row; I was thinking more like billions or trillions in a row. Impractical, yes, but OK in the land of philosophy.

I know this sounds crazy, but with infinity, you could have infinity within the infinite tosses, so I’ll entertain the idea that there could be infinite streaks of heads and tails, those infinities adding up to the infinite tosses made (Think Hilbert’s Hotel).

So with this hypothetical infinite tossing, you could already have been tossing for an infinity, with a record of infinite streaks of heads, and still have to conclude that the next toss is 50/50 (with a fair coin).

But I have no idea whether that makes any sense. Sorry for the digression.

Frequency definitions of probability that depend upon infinite sequences have that kind of problem. Even if the probability of heads is ½ at each coin toss, an infinite sequence of heads is possible. OC, the probability of that would be infinitesimal. The probability that half the tosses would be heads is also infinitesimal. However, the expected proportion of heads would be ½, not 1. But we define expectations in terms of probability.

I think that to avoid circularity, we need some basic notion, whether we call it probability or not, that does not depend upon the result of an infinite sequence. The Law of Large Numbers is not good enough.

I realize that you meant truly arbitrarily long sequences of heads (billions, trillions, or whatever). My point stands. While it’s true that in an infinitely long sequence of tosses, there WILL be sequences of N consecutive heads for any N, the probability is vanishingly small that this will happen soon enough to distort our judgment about the fairness of the coin. If the coin is actually fair, then the probability of a sequence of N consecutive heads starting at any specific location in the sequence is (1/2)^N. So, if we toss the coin 2^N times, we expect such a sequence to occur roughly once. If N = 100, then 2^N is approximately 10^30. We’re likely to see a sequence of 100 heads somewhere in the first 10^30 tosses. By the time we have much chance of seeing it, that string of 100 heads is inconsequential to the cumulative probability calculation we’re making. This isn’t to say that it’s IMPOSSIBLE for it to happen early, but we should not expect it to. It only gets worse for a string of a trillion (10^12) heads. In that case, we expect a sequence of a trillion heads to occur once every 2^(10^12) tosses – roughly 10^(3×10^11). The probability of such a string happening in time to make a dent in even the first hundred decimal places of our cumulative calculation is vanishingly small. It just gets worse and worse as N increases. It simply isn’t the case that the existence of arbitrarily long sequences of heads (as predicted by probability theory) implies that there will also be arbitrarily large swings in our ongoing cumulative probability calculation (# of heads / # of flips).
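The commenter’s numbers are easy to check. This is a rough sketch of the arithmetic above, nothing more:

```python
import math

# A streak of N consecutive heads starting at a given position has
# probability (1/2)**N, so we expect roughly one such streak per
# 2**N positions.
N = 100
log10_tosses_needed = N * math.log10(2)
print(log10_tosses_needed)  # about 30.1, i.e. roughly 10**30 tosses

# Even when a 100-heads streak finally shows up around toss 10**30,
# it shifts the running proportion of heads by only ~100 / 10**30.
shift = 100 / 10.0**30
print(shift)  # 1e-28
```

Which is the commenter’s point: by the time such a streak is at all likely, it is far too small a drop in the cumulative proportion to move anyone’s posterior noticeably.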

You state: “It seems to me, then, that you want some statistical test that will, to some extent, try and take into account data that you did not obtain, but might have if you want to more accurately estimate the parameter (in this case, the fairness of the coin: what might have happened if I flipped the coin another X number of times).”

Could you explain that? How do you see that affecting your estimate of the probability of heads or tails?

I.e. you want to estimate the parameter as if you had flipped the coin an infinite number of times, not just however many observations you had. Short of psychic abilities, I have little clue on how to do this, so it’s more just a thought than much else.

Thanks for the clarification.

Unfortunately, I am very busy through the weekend, but I would like to come back to this topic next week.

You write, “Now, if you’re a good Bayesian, you’ll update your posterior beliefs following each outcome. In essence, after a coin comes up heads, you’ll be more likely to think that it will come up heads on the subsequent flip; since heads have been coming up, more heads are due to come up.”

While that might be true in the short term, Bayesian reasoning would eventually come to the correct conclusion that a fair coin is fair. That is, a Bayesian will eventually recognize that there is no flip-to-flip correlation. So would a frequentist.

Eventually, Bayesian reasoning would come to many different conclusions, among which would be that the coin is likely fair. They would also come to the conclusion that the coin is unfair in a relatively infinite number of ways, depending on what time in the flipping process they make conclusions.

OC, that is true for the frequentist theory, as well.

“They would also come to the conclusion that the coin is unfair in a relatively infinite number of ways, depending on what time in the flipping process they make conclusions.” No, they would not. The quoted sentence seems to presume that the Bayesian would at any given time only consider the most recent observations (say, the last 1000). After a few hundred tosses the data collected will have overwhelmed the prior and from that point forward the posterior interval around .5 is going to have so much density that an occasional string of many heads (or tails) won’t change the posterior distribution much at all.

My point does not assume the Bayesian only cares about the last X number of observations. At some point, the Bayesian will have flipped the coin N times. Given an infinite amount of time, there will be strings of consecutive heads or tails that are 2N long, 5N, 0.5N, or any constant times N, no matter how large N is. Yes, it’s incredibly improbable, and exponentially more improbable as N increases. Further, the amount of time that the Bayesian would believe that the fair coin was heavily biased would, over time, be vastly overwhelmed by the amount of time they’d believe it was fair. Accordingly, the extent to which this would actually pose a problem in real-world data collection is debatable but, for the purposes of this example, the Bayesian would continually update their posteriors to, at various points, represent beliefs about the coin’s bias ranging from almost completely biased towards tails to almost completely biased towards heads.

Perhaps I don’t understand what you are envisioning. Suppose the first 1000 tosses are HTH followed by 997 Tails; then we see some heads, about 50-50 for the next several thousand tosses. We agree that this is highly unlikely, but possible. OK, then for a stretch of time (say, from toss 12 through toss 3000 or so) the Bayesian would think the coin to be biased towards tails. I count that as coming to the conclusion that the coin is unfair in one way, not “a relatively infinite number of ways.”

Ah. I was getting at the idea that if, after each toss, you asked for the posteriors, you’d see them representing an infinite number of values between -1 and 1, where -1 is totally biased towards tails and 1 totally towards heads. The specific degree of bias falls into 3 general categories (towards heads, tails, or neither), but the precise degree can be represented across an infinite number of possible values in that range.

“Further, the amount of time that the Bayesian would believe that the fair coin was heavily biased would, over time, be vastly overwhelming by the amount of time they’d believe it was fair.”

With a flat prior, which is what I think you have in mind, the Bayesian probability for the next toss at each point will be closer to 50:50 than the frequentist probability estimate. Work it out.
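Working it out as suggested: with a flat Beta(1, 1) prior, the Bayesian posterior-mean estimate of P(heads) after h heads in n tosses is (h + 1) / (n + 2) (the rule of succession), which always lies at least as close to 0.5 as the frequentist estimate h / n. A minimal sketch, with the example counts made up for illustration:

```python
def frequentist_estimate(h, n):
    """Plain relative frequency of heads."""
    return h / n

def bayesian_estimate(h, n):
    """Posterior mean for P(heads) under a flat Beta(1, 1) prior."""
    return (h + 1) / (n + 2)

# The Bayesian estimate is never farther from 0.5 than the
# frequentist one -- the flat prior shrinks it towards fairness.
for h, n in [(7, 10), (60, 100), (10, 10)]:
    f, b = frequentist_estimate(h, n), bayesian_estimate(h, n)
    assert abs(b - 0.5) <= abs(f - 0.5)
    print(f"{h}/{n}: frequentist {f:.3f}, Bayesian {b:.3f}")
```

Even after 10 heads in 10 tosses, the flat-prior Bayesian says 11/12 rather than the frequentist’s certainty of 1.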

A Bayesian would not necessarily come to the conclusion that the coin is fair. To do so would require what I. J. Good called a Type II probability distribution, i. e., a prior distribution of priors, and one that included a prior that the coin is fair.
