I have recently been thinking about a question I first had a long time ago. It arose during a discussion at an astro-imaging workshop in Switzerland. If I remember correctly, the discussion split along two schools of thought on how to model astronomical images. The frequentist school was primarily represented by Stefan Harmeling; Christian Schuler and Rob Fergus, on the other hand, represented the Bayesian school. Others present at the discussion included Bernhard Schölkopf, David Hogg, Dilip Krishnan, and Michael Hirsch.

In short, the story goes like this:

Stefan believed that he could solve the problem by directly formulating an objective function and optimizing it. The message, as I understood it, was to avoid relying on any prior knowledge. There is more to his point of view on this problem, but for the sake of brevity I will skip it, as it is not the main topic we are going to discuss.

On the other hand, Christian and Rob had a slightly different point of view. They believed one should incorporate "*prior information*", and they pointed out that the prior is key to modelling astronomical images. Again, there is more to the story, but I will skip it.

As an observer, I agreed with all three of them. Using only his objective function, I think Stefan could find a reasonably good solution. Similarly, Christian and Rob might be able to find a better one with a "reasonably right" prior. **The question is: which approach should I use?**

This question essentially arises before you actually solve the problem. Christian and Rob may have a *good prior* that helps them obtain better solutions than Stefan's approach would. But as an observer who knows nothing about the prior, I seem to face yet another source of uncertainty: **is the prior actually a good one?** The statement above may no longer hold if the prior is a bad one.

In summary, I would like to know the answer to the following questions:

- Does incorporating a prior actually introduce more uncertainty into the problem we are trying to solve?
- If so, is it then harder to solve a problem with a prior than without one?
- How do statisticians typically deal with this uncertainty?

Feel free to leave a comment if you have one. Thanks.

Very interesting question!

Firstly, let's assume that if your prior is 'true', then your Bayesian analysis performs well and you reach your goal of inferring the posterior.

But if you are not certain about your prior, you can place a hyperparameter on it, which describes your uncertainty about the prior distribution. Then you simply marginalise it out.
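A toy sketch of what I mean (my own made-up example, not anything from the workshop): estimate a Gaussian mean whose prior scale tau is itself uncertain, put a flat hyperprior on a grid of tau values, and marginalise tau out numerically. Every number here is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0                              # known noise std (assumed for the sketch)
y = rng.normal(0.5, sigma, size=20)      # synthetic data
n, ybar = len(y), y.mean()

# Grid over the prior std tau (the hyperparameter), with a flat hyperprior.
taus = np.linspace(0.1, 5.0, 200)

# Marginal likelihood of the sample mean given tau:
# ybar | tau ~ N(0, tau^2 + sigma^2 / n)
var = taus**2 + sigma**2 / n
log_w = -0.5 * (np.log(2 * np.pi * var) + ybar**2 / var)
w = np.exp(log_w - log_w.max())
w /= w.sum()                             # posterior weights over tau

# Conditional posterior mean of theta given tau (shrinkage toward 0),
# then average over tau under the weights, i.e. marginalise tau out.
shrink = (n / sigma**2) / (n / sigma**2 + 1 / taus**2)
post_mean = np.sum(w * shrink * ybar)
```

The marginalised estimate still shrinks the sample mean toward the prior mean, but by an amount the data itself helps choose.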

That's just off the top of my head, what do you think?

Greetings from London!

Tomek

This is a question that I have also asked myself several times. As Tomek says, one can specify a vague prior or one that is defined through hyper-parameters that you learn from data and marginalise out when doing predictions.

You can interpret the prior as a way to regularise your learning problem. To me it is simply more transparent to encode your assumptions about the problem as probability distributions than in other ways: it gives you a common language for discussing the assumptions that went into a model. That is not always the case when your assumptions are encoded in the way you define your learning algorithm, your regularisation, and so on.
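To make the "prior as regulariser" point concrete, here is the standard textbook identity, sketched with made-up data: a zero-mean Gaussian prior on linear-regression weights gives a MAP estimate identical to ridge regression with penalty lam = sigma^2 / tau^2.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.5, size=50)

sigma2, tau2 = 0.25, 1.0           # noise and prior variances (assumed)
lam = sigma2 / tau2                # the ridge penalty the prior implies

# MAP estimate under y ~ N(Xw, sigma2 I) with prior w ~ N(0, tau2 I):
w_map = np.linalg.solve(X.T @ X / sigma2 + np.eye(3) / tau2,
                        X.T @ y / sigma2)

# Ridge-regularised least squares with penalty lam:
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
```

The two solutions coincide exactly, so choosing tau2 is the same decision as choosing how strongly to regularise.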

Hi Tomek and Roger,

Many thanks for your comments. I agree that one can deal with uncertainty by marginalizing out certain hyper-parameters and it works well if you assume that you have a "correct" prior.

Maybe this can also be viewed as a bias-variance tradeoff. Without a prior, your solution is unbiased but has high variance. Incorporating a prior, on the other hand, yields a biased solution with smaller variance. I think that is why one should be careful when choosing a "prior": the solution will be biased toward it.
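The tradeoff is easy to see in a small simulation (again a toy sketch with invented numbers): estimating a Gaussian mean, the sample mean is unbiased, while the posterior mean under a zero-centred prior is shrunk toward 0, so it picks up bias but loses variance.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, sigma, n = 2.0, 1.0, 10     # true mean, noise std, sample size (assumed)
tau = 1.0                          # prior std; prior mean is 0, mildly "wrong" here
shrink = (n / sigma**2) / (n / sigma**2 + 1 / tau**2)

trials = 20_000
ybar = rng.normal(theta, sigma / np.sqrt(n), size=trials)  # sample means
mle = ybar                         # unbiased, higher variance
bayes = shrink * ybar              # biased toward 0, lower variance

print("bias:     mle %.3f  bayes %.3f" % (mle.mean() - theta, bayes.mean() - theta))
print("variance: mle %.4f  bayes %.4f" % (mle.var(), bayes.var()))
```

Whether the reduced variance compensates for the bias depends on how far the truth sits from the prior, which is exactly the worry about having a bad prior.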

Krik