<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Do People &#8220;Really&#8221; Have Priors?</title>
	<atom:link href="http://popsych.org/do-people-really-have-priors/feed/" rel="self" type="application/rss+xml" />
	<link>http://popsych.org/do-people-really-have-priors/</link>
	<description>The Internet&#039;s Best Evolutionary Psycholo-guy</description>
	<lastBuildDate>Wed, 03 Jan 2018 01:05:13 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.2</generator>
	<item>
		<title>By: Jesse Marczyk</title>
		<link>http://popsych.org/do-people-really-have-priors/#comment-541</link>
		<dc:creator>Jesse Marczyk</dc:creator>
		<pubDate>Fri, 08 Mar 2013 17:18:52 +0000</pubDate>
		<guid isPermaLink="false">http://popsych.org/?p=1577#comment-541</guid>
		<description>&lt;blockquote&gt;If you have a prior over priors then you can just compute a single prior that encodes your prior over priors.&lt;/blockquote&gt;

I think that&#039;s right, except that your priors about your priors also need priors of their own, which leads to another recomputation of what priors you&#039;re using. Then the same goes for your priors about your priors about your priors. If this continues, it will either hit a point where you are 100% confident in your prior (or your distribution of priors), meaning you have assigned a 0% belief state to &lt;em&gt;all other&lt;/em&gt; states of affairs (which means you cannot update your priors if you&#039;re using Bayes&#039; theorem), or it continues infinitely, meaning you&#039;ll have (a) a uniform distribution and (b) no idea how it should be updated. I don&#039;t see a way out of that issue.

I know something &lt;em&gt;sounds&lt;/em&gt; wrong about that analysis in the same way that something &lt;em&gt;sounds&lt;/em&gt; wrong about Zeno&#039;s paradox. Zeno&#039;s paradox seems to be easily resolved, though, in that it&#039;s missing a time component that is part of the calculation of motion; the same cannot be said of Bayes&#039; theorem as it stands. If you still have belief in other possible distributions, you need to recalculate your priors to reflect that; if you don&#039;t have any belief left, you can&#039;t update. 

Clearly, beliefs &lt;em&gt;are&lt;/em&gt; sometimes updated on the basis of evidence; of that there can be little doubt. It&#039;s what makes this example seem so strange. How that updating is done, however, doesn&#039;t seem to be through the use of Bayes&#039; theorem.

[EDIT] I feel it would be worthwhile to add an example: let&#039;s return to the doctor example I raised in the post. Here, the doctor is starting with a given 5% prior about the prevalence of the disease. When the results of his test come back, let&#039;s assume that, on the basis of that evidence, the doctor recalculates his prior belief about the prevalence of the disease: now he thinks it&#039;s more common than it was beforehand. So one could say he updated his prior about the disease, but, if he did so, he would need to recalculate the results of his initial test with that new prior. Given that his prior is now higher, he might come to think that there were fewer false positives than previously imagined. This, however, makes the disease seem &lt;em&gt;even more prevalent&lt;/em&gt;, given the same evidence. In other words, every time his priors change, his interpretation of the data changes, and every time the interpretation changes, so too should his priors, and so on.

Bayes&#039; theorem works in the initial example because the priors are being used as a given to compute an unknown value. When priors are not taken as a given, however, Bayes&#039; theorem no longer works. The same data points could be used, it seems, to recalculate one&#039;s priors, which would recalculate one&#039;s likelihoods, which would recalculate one&#039;s priors, and so on. Unless I&#039;m missing something, like some stopping rule for doing so?    </description>
		<content:encoded><![CDATA[<blockquote><p>If you have a prior over priors then you can just compute a single prior that encodes your prior over priors.</p></blockquote>
<p>I think that&#8217;s right, except that your priors about your priors also need priors of their own, which leads to another recomputation of what priors you&#8217;re using. Then the same goes for your priors about your priors about your priors. If this continues, it will either hit a point where you are 100% confident in your prior (or your distribution of priors), meaning you have assigned a 0% belief state to <em>all other</em> states of affairs (which means you cannot update your priors if you&#8217;re using Bayes&#8217; theorem), or it continues infinitely, meaning you&#8217;ll have (a) a uniform distribution and (b) no idea how it should be updated. I don&#8217;t see a way out of that issue.</p>
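<p>To make that dead end concrete, here is a minimal sketch (the likelihood numbers are invented purely for illustration): once the prior on a hypothesis reaches 100%, Bayes&#8217; theorem hands that same 100% back no matter what evidence arrives.</p>
<pre><code># P(H|D) = P(D|H) * P(H) / P(D), with P(D) expanded over H and not-H.
def posterior(prior_h, p_d_given_h, p_d_given_not_h):
    p_d = p_d_given_h * prior_h + p_d_given_not_h * (1 - prior_h)
    return p_d_given_h * prior_h / p_d

print(posterior(1.0, 0.01, 0.99))  # 1.0  -- a point-mass prior never moves
print(posterior(0.5, 0.01, 0.99))  # 0.01 -- a non-degenerate prior updates as usual
</code></pre>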
<p>I know something <em>sounds</em> wrong about that analysis in the same way that something <em>sounds</em> wrong about Zeno&#8217;s paradox. Zeno&#8217;s paradox seems to be easily resolved, though, in that it&#8217;s missing a time component that is part of the calculation of motion; the same cannot be said of Bayes&#8217; theorem as it stands. If you still have belief in other possible distributions, you need to recalculate your priors to reflect that; if you don&#8217;t have any belief left, you can&#8217;t update. </p>
<p>Clearly, beliefs <em>are</em> sometimes updated on the basis of evidence; of that there can be little doubt. It&#8217;s what makes this example seem so strange. How that updating is done, however, doesn&#8217;t seem to be through the use of Bayes&#8217; theorem.</p>
<p>[EDIT] I feel it would be worthwhile to add an example: let&#8217;s return to the doctor example I raised in the post. Here, the doctor is starting with a given 5% prior about the prevalence of the disease. When the results of his test come back, let&#8217;s assume that, on the basis of that evidence, the doctor recalculates his prior belief about the prevalence of the disease: now he thinks it&#8217;s more common than it was beforehand. So one could say he updated his prior about the disease, but, if he did so, he would need to recalculate the results of his initial test with that new prior. Given that his prior is now higher, he might come to think that there were fewer false positives than previously imagined. This, however, makes the disease seem <em>even more prevalent</em>, given the same evidence. In other words, every time his priors change, his interpretation of the data changes, and every time the interpretation changes, so too should his priors, and so on.</p>
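<p>For concreteness, the single pass of that calculation looks like this (only the 5% prior comes from the post; the test&#8217;s hit and false-alarm rates below are hypothetical):</p>
<pre><code># One application of Bayes' theorem for the doctor example.
prior = 0.05                 # P(disease), the 5% prior from the post
p_pos_given_disease = 0.90   # hypothetical test sensitivity
p_pos_given_healthy = 0.10   # hypothetical false-positive rate

p_pos = p_pos_given_disease * prior + p_pos_given_healthy * (1 - prior)
posterior = p_pos_given_disease * prior / p_pos
print(round(posterior, 3))   # 0.321 -- belief in the disease after one positive test
</code></pre>
<p>The regress described above amounts to asking whether anything licenses feeding that 0.321 back in as a new prior against the very same test result.</p>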
<p>Bayes&#8217; theorem works in the initial example because the priors are being used as a given to compute an unknown value. When priors are not taken as a given, however, Bayes&#8217; theorem no longer works. The same data points could be used, it seems, to recalculate one&#8217;s priors, which would recalculate one&#8217;s likelihoods, which would recalculate one&#8217;s priors, and so on. Unless I&#8217;m missing something, like some stopping rule for doing so?    </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Artem Kaznatcheev</title>
		<link>http://popsych.org/do-people-really-have-priors/#comment-540</link>
		<dc:creator>Artem Kaznatcheev</dc:creator>
		<pubDate>Fri, 08 Mar 2013 16:28:29 +0000</pubDate>
		<guid isPermaLink="false">http://popsych.org/?p=1577#comment-540</guid>
		<description>There is a series of misconceptions in your reasoning about Bayesian (or even frequentist) inference that I wanted to clear up.

&lt;q&gt;The first researcher has his entire belief centered 100% on the coin being 60% biased towards heads&lt;/q&gt;

When you are doing Bayesian inference and you define some hypothesis class, your prior needs to have full support on that space. In this degenerate case you are simply not doing inference, since your approach is equivalent to restricting your language so much that it allows you to generate only one hypothesis.

&lt;q&gt;His prior about his priors, however, must have its own set of priors as well. One can quickly see that this leads to an infinite regress&lt;/q&gt;

This is only a challenge to inference insofar as Zeno&#039;s paradox is a challenge to motion. If you have a prior over priors then you can just compute a single prior that encodes your prior over priors. The easiest example is to look at the following two priors: (A) I believe with 90% certainty that the coin will land heads 40% of the time and with 10% certainty that it will land heads 60% of the time, and (B) I believe with 90% certainty that the coin will land heads 60% of the time and with 10% certainty that it will land heads 40% of the time. My prior over priors is uniform: 50% I expect (A) and 50% I expect (B). The information in the prior over priors can be collapsed into the single prior &quot;I am 50% certain the coin will land heads 40% of the time and 50% certain the coin will land heads 60% of the time&quot;, and all the Bayesian inference and predictions will be the same whether you do inference over priors over priors or just over the combined prior. Of course, you can repeat this procedure to collapse 3 or 4 or arbitrarily many levels of priors. In fact, you can collapse infinitely many levels of priors (although obviously the calculation gets more difficult with more levels).

In fact, this approach is regularly used in non-toy models when you know that you want to model your process as a simple distribution (say, a Gaussian) but don&#039;t know the parameters of the distribution (say, the variance). Instead you might have some prior (or priors over priors all the way down, as long as they are all computable). If you want a real example, the first sentence of this paragraph is actually a description of what is done in &lt;a href=&quot;http://www.tandfonline.com/doi/abs/10.1080/14697680902855384&quot; rel=&quot;nofollow&quot;&gt;financial modeling of stock returns&lt;/a&gt;.

In general, though, people don&#039;t just arbitrarily pick a prior, or an infinite hierarchy of priors that they then need to collapse down. When doing the &#039;best&#039; inference, a modeler determines what they mean by &#039;best&#039; by defining an error function that they wish their learner to minimize (with respect to the &#039;real distribution&#039; they are learning; in your case, how often the coin comes up heads). This error function can then be used to calculate the best possible prior to start with.

&lt;q&gt; If researcher two ends up admitting to complete uncertainty, this will give him a flat set of priors that ought to be updated very little (he will be able to rule out 100% biased towards heads or tails, contingent on observing either a heads or tails, but not much beyond that).&lt;/q&gt;

This is just plain wrong. The researcher will in fact have a well-defined maximum likelihood estimate for the probability of heads (&lt;a href=&quot;http://egtheory.wordpress.com/2013/01/28/subjective-bayes/&quot; rel=&quot;nofollow&quot;&gt;a blog post&lt;/a&gt; that guides you through it). In the error-minimization scheme described above, it would be equivalent to minimizing the squared-error loss function.

In general, however, these are not the relevant points for understanding how Bayesian models describe human inference, or whether they make a good model. The above is just a clarification of technical difficulties with statistics. If you want to look at actual modeling questions, then consider the following Cognitive Sciences StackExchange questions:

&lt;a href=&quot;http://cogsci.stackexchange.com/q/599/29&quot; rel=&quot;nofollow&quot;&gt;What are some of the drawbacks to probabilistic models of cognition?&lt;/a&gt;

&lt;a href=&quot;http://cogsci.stackexchange.com/q/1278/29&quot; rel=&quot;nofollow&quot;&gt;How can the success of Bayesian models be reconciled with demonstrations of heuristic and biased reasoning?&lt;/a&gt;</description>
		<content:encoded><![CDATA[<p>There is a series of misconceptions in your reasoning about Bayesian (or even frequentist) inference that I wanted to clear up.</p>
<p><q>The first researcher has his entire belief centered 100% on the coin being 60% biased towards heads</q></p>
<p>When you are doing Bayesian inference and you define some hypothesis class, your prior needs to have full support on that space. In this degenerate case you are simply not doing inference, since your approach is equivalent to restricting your language so much that it allows you to generate only one hypothesis.</p>
<p><q>His prior about his priors, however, must have its own set of priors as well. One can quickly see that this leads to an infinite regress</q></p>
<p>This is only a challenge to inference insofar as Zeno&#8217;s paradox is a challenge to motion. If you have a prior over priors then you can just compute a single prior that encodes your prior over priors. The easiest example is to look at the following two priors: (A) I believe with 90% certainty that the coin will land heads 40% of the time and with 10% certainty that it will land heads 60% of the time, and (B) I believe with 90% certainty that the coin will land heads 60% of the time and with 10% certainty that it will land heads 40% of the time. My prior over priors is uniform: 50% I expect (A) and 50% I expect (B). The information in the prior over priors can be collapsed into the single prior &#8220;I am 50% certain the coin will land heads 40% of the time and 50% certain the coin will land heads 60% of the time&#8221;, and all the Bayesian inference and predictions will be the same whether you do inference over priors over priors or just over the combined prior. Of course, you can repeat this procedure to collapse 3 or 4 or arbitrarily many levels of priors. In fact, you can collapse infinitely many levels of priors (although obviously the calculation gets more difficult with more levels).</p>
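<p>Here is a minimal sketch of that collapse, using exactly the numbers above; updating on one observed head gives the same posterior over the coin&#8217;s bias whether you update the two-level hierarchy or the collapsed prior:</p>
<pre><code>biases = [0.4, 0.6]      # candidate values of P(heads)
prior_a = [0.9, 0.1]     # prior (A) from the example
prior_b = [0.1, 0.9]     # prior (B) from the example
meta = [0.5, 0.5]        # uniform prior over (A) and (B)

# Collapse: marginalize out the second level.
collapsed = [meta[0] * a + meta[1] * b for a, b in zip(prior_a, prior_b)]

def update(prior):
    """Condition on one observed head; return posterior and evidence."""
    z = sum(b * p for b, p in zip(biases, prior))
    return [b * p / z for b, p in zip(biases, prior)], z

# Route 1: update the collapsed prior directly.
post_flat, _ = update(collapsed)

# Route 2: update each level-one prior, reweight the meta level by each
# prior's evidence, then marginalize.
post_a, z_a = update(prior_a)
post_b, z_b = update(prior_b)
w_a, w_b = meta[0] * z_a, meta[1] * z_b
post_hier = [(w_a * pa + w_b * pb) / (w_a + w_b)
             for pa, pb in zip(post_a, post_b)]

print(post_flat)  # [0.4, 0.6]
print(post_hier)  # [0.4, 0.6] -- identical (up to floating-point noise)
</code></pre>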
<p>In fact, this approach is regularly used in non-toy models when you know that you want to model your process as a simple distribution (say, a Gaussian) but don&#8217;t know the parameters of the distribution (say, the variance). Instead you might have some prior (or priors over priors all the way down, as long as they are all computable). If you want a real example, the first sentence of this paragraph is actually a description of what is done in <a href="http://www.tandfonline.com/doi/abs/10.1080/14697680902855384" rel="nofollow">financial modeling of stock returns</a>.</p>
<p>In general, though, people don&#8217;t just arbitrarily pick a prior, or an infinite hierarchy of priors that they then need to collapse down. When doing the &#8216;best&#8217; inference, a modeler determines what they mean by &#8216;best&#8217; by defining an error function that they wish their learner to minimize (with respect to the &#8216;real distribution&#8217; they are learning; in your case, how often the coin comes up heads). This error function can then be used to calculate the best possible prior to start with.</p>
<p><q> If researcher two ends up admitting to complete uncertainty, this will give him a flat set of priors that ought to be updated very little (he will be able to rule out 100% biased towards heads or tails, contingent on observing either a heads or tails, but not much beyond that).</q></p>
<p>This is just plain wrong. The researcher will in fact have a well-defined maximum likelihood estimate for the probability of heads (<a href="http://egtheory.wordpress.com/2013/01/28/subjective-bayes/" rel="nofollow">a blog post</a> that guides you through it). In the error-minimization scheme described above, it would be equivalent to minimizing the squared-error loss function.</p>
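<p>As a minimal sketch of that point (my numbers, not taken from the linked post): under a flat Beta(1, 1) prior over the bias, observing h heads in n flips yields a Beta(h + 1, n - h + 1) posterior, whose mode is exactly the maximum likelihood estimate h/n and whose mean, (h + 1)/(n + 2), tracks the observed frequency rather than staying put:</p>
<pre><code>from fractions import Fraction

def posterior_mean(heads, flips):
    # Mean of Beta(heads + 1, flips - heads + 1): Laplace's rule of succession.
    return Fraction(heads + 1, flips + 2)

def mle(heads, flips):
    # Posterior mode under the flat prior, i.e. the maximum likelihood estimate.
    return Fraction(heads, flips)

for h, n in [(7, 10), (70, 100), (700, 1000)]:
    print(n, float(mle(h, n)), float(posterior_mean(h, n)))
# 10    0.7  0.666...
# 100   0.7  0.696...
# 1000  0.7  0.699... -- far from being updated "very little"
</code></pre>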
<p>In general, however, these are not the relevant points for understanding how Bayesian models describe human inference, or whether they make a good model. The above is just a clarification of technical difficulties with statistics. If you want to look at actual modeling questions, then consider the following Cognitive Sciences StackExchange questions:</p>
<p><a href="http://cogsci.stackexchange.com/q/599/29" rel="nofollow">What are some of the drawbacks to probabilistic models of cognition?</a></p>
<p><a href="http://cogsci.stackexchange.com/q/1278/29" rel="nofollow">How can the success of Bayesian models be reconciled with demonstrations of heuristic and biased reasoning?</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jesse Marczyk</title>
		<link>http://popsych.org/do-people-really-have-priors/#comment-539</link>
		<dc:creator>Jesse Marczyk</dc:creator>
		<pubDate>Tue, 05 Mar 2013 05:03:46 +0000</pubDate>
		<guid isPermaLink="false">http://popsych.org/?p=1577#comment-539</guid>
		<description>I&#039;ve seen the comic before, and understanding what its author gets wrong goes a long way.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve seen the comic before, and understanding what its author gets wrong goes a long way.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jim Birch</title>
		<link>http://popsych.org/do-people-really-have-priors/#comment-538</link>
		<dc:creator>Jim Birch</dc:creator>
		<pubDate>Tue, 05 Mar 2013 05:01:45 +0000</pubDate>
		<guid isPermaLink="false">http://popsych.org/?p=1577#comment-538</guid>
		<description>But if you&#039;re 100.00000000000% sure that a coin has a 60% bias, you&#039;re an idiot.

http://xkcd.com/1132/

Cheers
Jim</description>
		<content:encoded><![CDATA[<p>But if you&#8217;re 100.00000000000% sure that a coin has a 60% bias, you&#8217;re an idiot.</p>
<p><a href="http://xkcd.com/1132/" rel="nofollow">http://xkcd.com/1132/</a></p>
<p>Cheers<br />
Jim</p>
]]></content:encoded>
	</item>
</channel>
</rss>
