Also, I have to say that it is my pleasure to serve as one of the workflow managers for NIPS2016. I'll try to write my experience on this in another post.

Cheers.

]]>An extrasolar planet, or exoplanet, is a planet outside the Solar System. As far as I understand, an ultimate goal is to discover an "Earth 2.0": an extrasolar planet that orbits in the habitable zone, where it is possible for liquid water to exist on the surface. The detection of exoplanets itself is very difficult, let alone the extraction of the molecular composition of the planets, because planets are extremely faint compared to their parent stars.

On Friday morning, I also met Ralf Herbrich, who is currently a director of machine learning science at Amazon. We didn't talk much, but I guess I will meet him again at UAI2013.

This basically concludes my trip to NYC. I will be at ICML (Atlanta, Georgia) next week and looking forward to meeting many renowned machine learning people.

]]>One of the possibilities that we discussed is "causal inference". Causal inference has been one of the main research directions of our department in Tuebingen (see http://webdav.tuebingen.mpg.de/causality/ for people who work in this area and their contributions). I have to admit that this topic is new to me. I have very little knowledge about causal inference, which is why I am quite excited about it.

In a sense, the goal of causal inference is rather different from that of standard statistical inference. In statistical inference, given random variables X and Y, the goal is to discover the *association patterns* between them encoded in the joint distribution P(X,Y). Causal inference, on the other hand, aims to discover the *causal relations* between X and Y, i.e., whether X causes Y, Y causes X, or there is a common cause of X and Y. Since revealing a causal relation involves an intervention on one of the variables, it is not trivial to do so from non-experimental data. Moreover, there is an issue of identifiability: several causal models could have generated the same P(X,Y). As a result, certain assumptions about the model are necessary.
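To make the identifiability point concrete, here is a minimal sketch of one well-known assumption-based approach, the additive-noise model. Everything below (the function name `anm_score`, the polynomial regression, and the correlation-based independence proxy) is my own toy simplification; a proper implementation would use an independence test such as HSIC.

```python
import numpy as np

def anm_score(inp, out, deg=3):
    """Fit a polynomial regression out ~ f(inp) and return a crude
    dependence score between the residual magnitude and |inp|.
    (A real test would use HSIC; correlation keeps the sketch short.)"""
    residuals = out - np.polyval(np.polyfit(inp, out, deg), inp)
    return abs(np.corrcoef(np.abs(residuals), np.abs(inp))[0, 1])

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 1000)
y = x ** 3 + rng.normal(0.0, 1.0, 1000)   # ground truth: X causes Y

# Under the additive-noise assumption Y = f(X) + N with N independent
# of X, the residuals look independent of the input only in the true
# causal direction, which breaks the symmetry hidden in P(X,Y).
forward = anm_score(x, y)    # residuals of Y given X
backward = anm_score(y, x)   # residuals of X given Y
print("X -> Y" if forward < backward else "Y -> X")
```

The point of the sketch is only that an extra assumption (additive, independent noise) is what makes the direction identifiable from observational data at all.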

After submitting the papers, we hung out with many people, including Rob Fergus, in the park to celebrate our submissions.

Right. The main topic of this post is supernova classification. Early this week, Bernhard and I had a quick meeting with astronomers from CCPP (Center for Cosmology and Particle Physics). They are working on the problem of supernova classification (identifying the type of a supernova from its spectrum) and are interested in applying machine learning techniques to it. Briefly, the main challenge of this problem is that the supernova itself changes over time; that is, it can appear to belong to a different type depending on when it is observed. Another challenge is that the dataset is small, usually on the order of hundreds of examples.

According to Wikipedia, a supernova is an energetic explosion of a star. The explosion can be triggered either by the reignition of nuclear fusion in a degenerate star or by the collapse of the core of a massive star. Either way, a massive amount of energy is released. Interestingly, the expanding shock waves of supernova explosions can trigger the formation of new stars.

Supernovae are important in cosmology because the maximum intensities of their explosions can be used as "standard candles"; briefly, these help astronomers measure astronomical distances.

One of the previous works used the correlation between the objects' spectra and a set of templates to identify their type. I will read the paper over the weekend and see if we can build something better than simple correlation.
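For my own notes, the template-correlation baseline can be sketched roughly as follows. This is a hypothetical toy version (the function name, the synthetic "spectra", and the type labels are all made up, and I am assuming a shared wavelength grid), not the actual method from the paper:

```python
import numpy as np

def classify_by_template(spectrum, templates):
    """Return the label of the template with the highest normalized
    correlation (Pearson) with the observed spectrum."""
    best_label, best_score = None, -np.inf
    s = (spectrum - spectrum.mean()) / spectrum.std()
    for label, template in templates.items():
        t = (template - template.mean()) / template.std()
        score = np.dot(s, t) / len(s)  # Pearson correlation coefficient
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy example: two synthetic "spectral templates" on a common grid.
grid = np.linspace(0, 1, 200)
templates = {
    "Ia": np.exp(-((grid - 0.3) ** 2) / 0.01),
    "II": np.exp(-((grid - 0.7) ** 2) / 0.01),
}
rng = np.random.default_rng(1)
observed = templates["Ia"] + 0.1 * rng.normal(size=grid.size)
print(classify_by_template(observed, templates))  # → Ia
```

Seeing the baseline written out makes it clearer where a kernel method could improve on it, e.g., by comparing spectra nonlinearly rather than through a single linear correlation score.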

]]>Hi readers, I am now in New York, a city that never sleeps and one where many great minds in science reside (many great machine learners also live here).

It is good to be here. I will take this opportunity to interact with people who are working in different fields, such as astrophysics, particle physics, computer vision, etc., and hopefully learn something new.

The primary goal of this trip is to visit my advisor, Prof. Bernhard Schoelkopf, who is visiting NYU for three months, and to finish our NIPS paper. Another goal is to continue a collaboration with David Hogg and Jo Bovy on quasar target selection and to see if we can extend the collaboration in another direction.

The week started on Monday, when Dustin Lang from CMU also visited David and Bernhard for three days to work on image denoising. I am very impressed by how much work they got done in three days. Bernhard also told me about the idea of inferring the CCD sensitivity from image patches, which I find very interesting. Dustin also took us to Etsy, the company where one of his friends works; it's a website company that sells hand-made goods. We had a quick tour inside the company, and the office is quite relaxing.

While everyone was busy, I tried my best to finish the first draft of our NIPS paper. It's now in its final shape.

On Friday, we hung out with Will Freeman from MIT. I met Will at the Astroimaging workshop in Switzerland. We spent the whole morning together with Bernhard, David, Rob, Ross, etc., discussing random stuff. Will then gave a talk in the afternoon about his work on image/motion amplification. It's very cool stuff.

I am now looking forward to another exciting week.

]]>This year, I have been involved in two ICML papers, both of which are in the area of kernel methods and transfer learning. The first paper is

Domain Generalization via Invariant Feature Representation

K. Muandet (MPI-IS), D. Balduzzi (ETH Zurich), and B. Schoelkopf (MPI-IS)

As opposed to domain adaptation, where one usually assumes that data from the target domain are available during training, domain generalization solves the problem without that assumption: one collects information from several source domains and uses it to generalize to an unseen target domain at test time. The paper is already available online (see the link above).

The second paper is

Domain Adaptation under Target and Conditional Shift

K. Zhang (MPI-IS), B. Schoelkopf (MPI-IS), K. Muandet (MPI-IS), and Z. Wang (MPI-IS)

This work investigates the domain adaptation problem when the conditional distribution also changes, as opposed to the previous setting in which only the marginal distribution can change. We make use of knowledge from causality to solve this problem. The paper will be available soon.
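To see what the classical marginal-only setting looks like, here is my own toy illustration of it (not the paper's method): under plain covariate shift only the marginal p(x) changes, and a standard fix is to reweight source examples by the density ratio p_target(x)/p_source(x). All names and distributions below are invented for the sketch, and the ratio is computed exactly only because the toy densities are known.

```python
import numpy as np

rng = np.random.default_rng(0)
x_src = rng.normal(0.0, 1.0, 2000)     # source inputs ~ N(0, 1)
x_tgt = rng.normal(1.0, 1.0, 2000)     # target inputs ~ N(1, 1)
y_src = (x_src > 0.5).astype(float)    # labels observed on source only

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Density ratio p_target(x) / p_source(x), known in this toy setting.
w = gauss_pdf(x_src, 1.0, 1.0) / gauss_pdf(x_src, 0.0, 1.0)

# Importance-weighted source statistics now estimate target statistics.
estimate = np.average(y_src, weights=w)   # weighted mean of y on the source
target = np.mean(x_tgt > 0.5)             # direct target-side estimate
print(abs(estimate - target) < 0.1)       # → True
```

The interesting part of the paper is precisely that this simple reweighting is no longer enough once the conditional P(Y|X) shifts as well.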

]]>

When I think of overfitting, it is unavoidable to refer to "generalisation". In fact, as we can see above, the definition of overfitting is based on the generalisation ability of the concept. The notion of overfitting is also closely related to the notion of ill-posedness in inverse problems. This is in fact the motivation for the regularisation techniques we often encounter in machine learning.
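The connection between ill-posedness and regularisation can be made concrete with a standard example; this is a generic ridge-regression sketch of my own, not tied to any particular paper:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.size)

# Degree-8 polynomial features on only 10 points: a nearly ill-posed design.
X = np.vander(x, 9, increasing=True)

def ridge_fit(X, y, lam):
    """Closed-form Tikhonov-regularised solution w = (X'X + lam I)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_unreg = ridge_fit(X, y, 0.0)    # unregularised, near-interpolating fit
w_ridge = ridge_fit(X, y, 1e-3)   # regularised fit

# Regularisation stabilises the inverse problem by shrinking the
# solution: the ridge coefficients have smaller norm than the
# unregularised least-squares solution.
print(np.linalg.norm(w_ridge) < np.linalg.norm(w_unreg))  # → True
```

Here overfitting, ill-posedness, and regularisation meet in one place: the unstable inverse (X'X)^{-1} is exactly what lets the unregularised fit chase the noise.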

In the past few weeks, I have had to deal with an estimation problem. In principle, it is different from a regression or classification problem (and from a regularisation problem as well). However, there seems to be a connection between estimation and regularisation that still puzzles me. In estimation theory, the problem is mostly unsupervised, so it is not clear how to define "overfitting" in this case. Can we look at overfitting based on something beyond generalisation?

So the question I would like to ask in this post is "what else can we see/consider as overfitting?" If you have good examples, please feel free to leave comments.

]]>Apart from work, I also made some trips to Tachikawa and downtown Tokyo, despite the fact that the weather was not on my side. These included a trip to Shinjuku (walking around the area and enjoying the Japanese lifestyle), Asakusa, and the Tokyo Skytree. The weather is getting better this week, so I hope I will have a wonderful trip this weekend.

Well, although this post is not really about machine learning, I will keep posting about what I learn while I am here.

]]>In short, the story goes like this:

Stefan believed that he could somehow solve the problem by directly formulating the objective function and optimizing it. The message here, as I understood it, was to avoid any prior knowledge. I believe there is more to his point of view on this problem, but for the sake of brevity, I will skip it, as it is not the main topic we are going to discuss.

On the other hand, Christian and Rob had a slightly different point of view. They believed one should incorporate *prior information*. They pointed out that the prior used for modelling astronomical images is key. Similarly, there is more to the story, but I will skip it.

As an observer, I agreed with all three of them. Using only Stefan's objective function, I think he could find a reasonably good solution. Similarly, Christian and Rob might be able to find a better solution with a "reasonably right" prior. **The question is: which approach should I use?**

This question essentially arises before you actually solve the problem. Christian and Rob may have a *good prior* which could possibly help them obtain better solutions than Stefan's approach. But as an observer who does not know anything about the prior, it seems that I need to deal with another source of uncertainty: **is the prior actually a good one?** The aforementioned statement may not hold anymore if one has a bad prior.

In summary, I would like to know the answer to the following questions:

- Does incorporating a prior actually cause more uncertainty about the problem we are trying to solve?
- If so, is it then harder to solve a problem with a prior than without one?
- Statistically speaking, how do most statisticians deal with this uncertainty?

Feel free to leave comments if you have any. Thanks.

]]>The highlight of the day seemed to be the discussion on the benefits of the kernel trick. David Hogg (NYU), who organized this talk for me, pointed out many interesting insights into kernels and how one can apply this technique in astronomy. In fact, he seemed very excited about the idea of the kernel trick. David also pointed out that the distance metric we used in SMM for quasar target selection looks like the chi-square distance, which is nice because this is naturally the case when comparing two Gaussian distributions. Moreover, Jonathan Goodman (NYU), who is a mathematician, also gave some insights about kernel functions and Mercer's theorem. He was also curious about the difference between SMM on distributions and SVM on infinitely many samples drawn from those distributions, which was one of the most fundamental questions we addressed in our NIPS2012 paper.
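The "kernel between distributions" idea behind SMM can be sketched in a few lines via kernel mean embeddings: two sample sets are compared through the average pairwise kernel value, which also yields the (squared) MMD between the distributions. This is a generic illustration of the embedding idea under an RBF kernel, with made-up toy data, not the exact construction from our paper:

```python
import numpy as np

def gaussian_kernel(a, b, gamma=1.0):
    """RBF kernel matrix between sample sets a and b."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def mean_embedding_kernel(X, Y, gamma=1.0):
    """Level-2 kernel between two distributions represented by samples:
    <mu_P, mu_Q> estimated as the average of k(x_i, y_j)."""
    return gaussian_kernel(X, Y, gamma).mean()

def mmd2(X, Y, gamma=1.0):
    """Squared MMD = <mu_P,mu_P> + <mu_Q,mu_Q> - 2 <mu_P,mu_Q>."""
    return (mean_embedding_kernel(X, X, gamma)
            + mean_embedding_kernel(Y, Y, gamma)
            - 2 * mean_embedding_kernel(X, Y, gamma))

rng = np.random.default_rng(0)
P = rng.normal(0.0, 1.0, size=(500, 2))   # samples from distribution P
Q = rng.normal(0.0, 1.0, size=(500, 2))   # fresh samples, same distribution
R = rng.normal(3.0, 1.0, size=(500, 2))   # shifted distribution

print(mmd2(P, Q) < mmd2(P, R))  # → True
```

This is also, in spirit, the answer to Jonathan's question: the SMM works with these finite-sample embeddings directly, rather than treating each distribution as infinitely many individual SVM examples.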

At the end of the talk, I explained very briefly how one can use SMM for quasar target selection. Quasar target selection is essentially a classification problem in which one is interested in detecting quasars, which look very much like stars.

]]>