Monthly Archives: December 2012

Does incorporating prior cause additional uncertainty?

I have recently been thinking about a question I first had a long time ago. It arose during a discussion at the astro-imaging workshop in Switzerland. If I remember correctly, the discussion split into two schools of thought on how to model astronomical images. The frequentist school was primarily represented by Stefan Harmeling, while Christian Schule and Rob Fergus represented the Bayesian school. Others at the discussion included Bernhard Schölkopf, David Hogg, Dilip Krishnan, Michael Hirsch, and more.

In short, the story goes like this:

Stefan believed he could solve the problem by directly formulating an objective function and optimizing it. The message, as I understood it, was to avoid relying on any prior knowledge. There is more to his point of view, but for brevity I will skip it, as it is not the main topic we are going to discuss.

Christian and Rob, on the other hand, had a slightly different point of view. They believed one should incorporate prior information, and pointed out that the choice of prior is key to modelling astronomical images. Again, there is more to the story, but I will skip it.

As an observer, I agreed with all three of them. Using only his objective function, I think Stefan could find a reasonably good solution. Similarly, Christian and Rob might find an even better solution with a "reasonably right" prior. The question is: which approach should I use?

This question essentially arises before you actually solve the problem. Christian and Rob may have a good prior that helps them obtain better solutions than Stefan's approach would. But as an observer who knows nothing about the prior, it seems I must deal with another source of uncertainty: is the prior actually a good one? The advantage I just described may no longer hold if the prior is a bad one.
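The trade-off can be made concrete with a toy example. Below is a minimal sketch (not from the workshop discussion; the numbers and the conjugate Gaussian model are my own illustration) of estimating a mean from a few measurements: with a flat prior (the "objective function only" route), with a prior centred near the truth, and with a badly mis-specified prior.

```python
import numpy as np

# Illustrative setup: 5 noisy measurements of an unknown mean mu_true = 2,
# with known noise standard deviation sigma = 1. The data are made up.
mu_true, sigma = 2.0, 1.0
x = np.array([1.7, 2.4, 2.1, 1.5, 2.6])

def map_estimate(x, sigma, mu0, tau):
    """Posterior mean under a Gaussian likelihood and a Gaussian prior
    N(mu0, tau^2). Since the posterior is Gaussian, this is also the
    MAP estimate."""
    n = len(x)
    precision = n / sigma**2 + 1.0 / tau**2
    return (x.sum() / sigma**2 + mu0 / tau**2) / precision

mle = x.mean()                                       # flat prior: plain MLE
good = map_estimate(x, sigma, mu0=2.0, tau=1.0)      # prior centred at the truth
bad = map_estimate(x, sigma, mu0=-5.0, tau=1.0)      # confidently wrong prior

print(f"MLE:        error = {abs(mle - mu_true):.3f}")
print(f"good prior: error = {abs(good - mu_true):.3f}")
print(f"bad prior:  error = {abs(bad - mu_true):.3f}")
```

A well-placed prior shrinks the estimate toward the truth and beats the MLE; the mis-specified prior drags the estimate far off. The catch, which is exactly the question above, is that you cannot tell from inside the analysis which case you are in.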

In summary, I would like to know the answer to the following questions:

  1. Does incorporating a prior actually cause more uncertainty about the problem we are trying to solve?
  2. If so, is it then harder to solve a problem with a prior than without one?
  3. Statistically speaking, how do most statisticians deal with this uncertainty?

Feel free to leave comments if you have any. Thanks.

SVM, SMM, and the kernel trick

Today I gave a talk (and led an informal discussion) on the fundamental concepts behind support vector machines (SVM), support measure machines (SMM), and the kernel trick at the Center for Cosmology and Particle Physics (CCPP), NYU. Most of the audience were astronomers who knew very little about SVMs and kernel methods, but they seemed to grasp the concepts very quickly. I was quite impressed.

The highlight of the day was the discussion of the benefits of the kernel trick. David Hogg (NYU), who organized the talk for me, offered many interesting insights about kernels and how the technique can be applied in astronomy; in fact, he seemed very excited about the kernel trick. David also noted that the distance metric we used in the SMM for quasar target selection looks like a chi-square distance, which is nice because that is naturally what arises when comparing two Gaussian distributions. Moreover, Jonathan Goodman (NYU), a mathematician, gave some insights about kernel functions and Mercer's theorem. He was also curious about the difference between an SMM on distributions and an SVM on infinitely many samples drawn from those distributions, which was one of the most fundamental questions we addressed in our NIPS 2012 paper.
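For readers who have not seen the kernel trick before, here is the standard textbook illustration (my own minimal example, using the quadratic kernel in two dimensions): the kernel evaluates an inner product in a higher-dimensional feature space without ever constructing the feature vectors.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the quadratic kernel k(x, z) = (x . z)^2
    on 2-D inputs: phi(x) = (x1^2, sqrt(2) x1 x2, x2^2)."""
    return np.array([x[0]**2, np.sqrt(2.0) * x[0] * x[1], x[1]**2])

def k(x, z):
    """The same inner product, computed directly in input space —
    the 'trick' is that phi never has to be built."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(np.dot(phi(x), phi(z)))  # inner product in feature space
print(k(x, z))                 # identical value via the kernel
```

Both numbers agree, yet the kernel side never touched the 3-D (and in the Gaussian RBF case, infinite-dimensional) feature space. That is the benefit David was excited about.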

At the end of the talk, I explained very briefly how one can use the SMM for quasar target selection. Quasar target selection is essentially a classification problem in which one is interested in detecting quasars, which look very much like stars.
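To give a flavour of what "a kernel on distributions" means in the SMM setting, here is a toy sketch (not the quasar pipeline itself; the data and bandwidth are made up): the kernel between two distributions is estimated as the average pairwise kernel between their samples, i.e. the inner product of their kernel mean embeddings.

```python
import numpy as np

def rbf(X, Y, gamma=0.5):
    """Gaussian RBF kernel matrix between two sample sets X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mean_embedding_kernel(X, Y, gamma=0.5):
    """Empirical estimate of K(P, Q) = E_{x~P, y~Q}[k(x, y)], the inner
    product of the kernel mean embeddings of P and Q. This is the kind
    of kernel an SMM places on distributions, here estimated from
    i.i.d. samples of each distribution."""
    return rbf(X, Y, gamma).mean()

rng = np.random.default_rng(1)
P1 = rng.normal(0.0, 1.0, size=(200, 2))  # samples from N(0, I)
P2 = rng.normal(0.0, 1.0, size=(200, 2))  # more samples, same distribution
Q = rng.normal(4.0, 1.0, size=(200, 2))   # samples from a shifted Gaussian

print(mean_embedding_kernel(P1, P2))  # large: same underlying distribution
print(mean_embedding_kernel(P1, Q))   # near zero: distributions far apart
```

Once such a kernel matrix is computed between all pairs of objects (each object being a distribution, e.g. a source with measurement uncertainty), a standard SVM solver can be run on it unchanged, which is the practical appeal of the SMM.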