It is very exciting to see so many interesting papers at ICML this year (see http://icml.cc/2013/?page_id=43 for the list of accepted papers). It is also good to see that several papers are co-authored by AGBS members.
This year, I have been involved in two ICML papers, both of which are in the area of kernel methods and transfer learning. The first paper is
Domain Generalization via Invariant Feature Representation
K. Muandet (MPI-IS), D. Balduzzi (ETH Zurich), and B. Schoelkopf (MPI-IS)
As opposed to domain adaptation, where one usually assumes that data from the target domain are available during training, domain generalization drops that assumption: it pools information from several source domains and, given data from a previously unseen target domain, generalizes to that domain at test time. The paper is already available online (see the link above).
The second paper is
Domain Adaptation under Target and Conditional Shift
K. Zhang (MPI-IS), B. Schoelkopf (MPI-IS), K. Muandet (MPI-IS), and Z. Wang (MPI-IS)
The work investigates the domain adaptation problem when the conditional distribution also changes, as opposed to the previous setting in which only the marginal distribution may change. We make use of knowledge from causality to solve this problem. The paper will be available soon.
I am currently visiting Prof. Kenji Fukumizu at the Institute of Statistical Mathematics in Tokyo, Japan, where I will be spending most of my time working. Since I arrived last week, Kenji and I have already produced some interesting results in our joint work on kernel mean embeddings of distributions. Hopefully, I can keep myself consistently productive over the next few weeks.
Apart from work, I also made some trips to Tachikawa and downtown Tokyo, despite the fact that the weather was not on my side. They include a trip to Shinjuku (walking around the area and enjoying the Japanese lifestyle), Asakusa, and the Tokyo Skytree. The weather is getting better this week, so I hope to have a wonderful trip this weekend.
Well, this post is not really about machine learning, but I will keep posting about what I learn while I am here.
My current research interest is in developing machine learning techniques for probability distributions. That is, instead of using samples as training data, we have probability distributions as training samples (these probability distributions may themselves be seen as random measures drawn from some unknown distribution). In support measure machines (SMMs), the training sample is $\{(\mathbb{P}_i, y_i)\}_{i=1}^{n}$, where $\mathbb{P}_i$ denotes a probability distribution over some input space $\mathcal{X}$. In the simplest case, $y_i \in \{-1, +1\}$, i.e., a classification problem where each input is a probability distribution.
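To make this concrete, here is a rough numpy sketch (not code from any paper) of the basic ingredient behind such methods: the inner product between the kernel mean embeddings of two distributions, estimated from i.i.d. samples. The Gaussian RBF kernel and its bandwidth are illustrative choices.

```python
import numpy as np

def rbf(X, Z, gamma=1.0):
    """Gaussian RBF kernel matrix: k(x, z) = exp(-gamma * ||x - z||^2)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mean_embedding_kernel(X, Z, gamma=1.0):
    """Empirical estimate of <mu_P, mu_Q> = E_{x~P, z~Q}[k(x, z)],
    given samples X drawn from P and Z drawn from Q."""
    return rbf(X, Z, gamma).mean()

rng = np.random.default_rng(0)
P = rng.normal(0.0, 1.0, size=(200, 2))  # sample set representing P
Q = rng.normal(3.0, 1.0, size=(200, 2))  # sample set representing Q (shifted mean)

k_pp = mean_embedding_kernel(P, P)
k_pq = mean_embedding_kernel(P, Q)
# A distribution's embedding is closer to itself than to a shifted distribution:
print(k_pp > k_pq)  # True
```

With this kernel between distributions in hand, any standard kernel machine can, in principle, be run on a training set of distributions instead of a training set of points.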
Today, I will talk briefly about one of my ideas on learning from probability distributions, which I call "distribution output learning". As the name suggests, we consider the learning problem in which the output is a probability distribution. That is, the training sample in this case is $\{(x_i, \mathbb{P}_i)\}_{i=1}^{n}$, where $x_i \in \mathcal{X}$ and $\mathbb{P}_i$ is a probability distribution defined over some output space $\mathcal{Y}$. Note that the input space may as well be a space of probability distributions, but to simplify the problem we will focus on ordinary vectorial inputs.
Why is it useful to have such a framework? Is there any application that supports this idea? These are important questions we need to answer before really putting our effort into constructing the learning algorithm. To give some motivation, consider the following examples.
- Preference prediction -- one may view a "preference" as a probability distribution (or positive measure) over a set of objects (either discrete or continuous, depending on how the objects are represented). I will call it the "preference distribution". If one object is preferred over another, the probability associated with that object will be relatively higher. In a recommendation system, we can therefore view the set of products purchased by a customer as draws from that customer's preference distribution. Given the purchase histories of several customers, one may want to construct an algorithm that predicts the preference distribution of a new customer, so that products can be recommended according to the predicted preference. Note that in this case we are predicting a "distribution".
- Multi-class prediction -- multi-class classification is very important in machine learning, and there has been much research in this direction. Generally speaking, the main aim of multi-class classification is to estimate $p(y \mid x)$ given a measurement $x$. The conditional probability $p(y \mid x)$ is a distribution over the output space $\mathcal{Y}$ and varies as the measurement $x$ changes. Therefore, one can view this as a "distribution output learning" problem.
Although I have a rough idea of how to perform such a prediction algorithmically, there are some theoretical questions that I would like to investigate further, such as:
- How should one define a distribution over the space of probability measures (or measures in general)? This may seem trivial at first glance, but there are some technical issues here that need further investigation.
- What characterizes the universal kernels for distribution output learning?
- What does the generalization bound look like?
This framework is of course closely related to "structured output learning". Many researchers have paid attention to structured output learning in the past few years, and they have proposed many different approaches to the problem. In other words, this will keep me busy for a while.
Thanks for reading,
We have a paper at NIPS this year.
Learning from Distributions via Support Measure Machines (Spotlight)
K. Muandet, K. Fukumizu, F. Dinuzzo, B. Schoelkopf
Abstract This paper presents a kernel-based discriminative learning framework on probability measures. Rather than relying on large collections of vectorial training examples, our framework learns using a collection of probability distributions that have been constructed to meaningfully represent training data. By representing these probability distributions as mean embeddings in the reproducing kernel Hilbert space (RKHS), we are able to apply many standard kernel-based learning techniques in a straightforward fashion. To accomplish this, we construct a generalization of the support vector machine (SVM) called a support measure machine (SMM). Our analysis of SMMs provides several insights into their relationship to traditional SVMs. Based on such insights, we propose a flexible SVM (Flex-SVM) that places different kernel functions on each training example. Experimental results on both synthetic and real-world data demonstrate the effectiveness of our proposed framework.
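To give a flavour of what "applying standard kernel methods to mean embeddings" can look like, here is a small numpy sketch. Note the caveats: this is not the paper's code, the SMM itself is an SVM, and for brevity I substitute kernel ridge regression as the downstream kernel machine; the data, bandwidth, and regularizer are all illustrative assumptions.

```python
import numpy as np

def rbf_mean_kernel(X, Z, gamma=0.5):
    """Empirical <mu_P, mu_Q> from sample sets X ~ P and Z ~ Q."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2).mean()

rng = np.random.default_rng(1)
# Each training example is a *distribution*, represented by a sample set:
# class +1: Gaussians centred near (0, 0); class -1: near (2, 2).
dists, y = [], []
for label, centre in [(+1, (0.0, 0.0)), (-1, (2.0, 2.0))]:
    for _ in range(10):
        dists.append(rng.normal(centre, 0.5, size=(50, 2)))
        y.append(label)
y = np.array(y, dtype=float)

# Gram matrix between the training distributions.
n = len(dists)
G = np.array([[rbf_mean_kernel(dists[i], dists[j]) for j in range(n)]
              for i in range(n)])

# Kernel ridge "classifier": f(P) = sum_i alpha_i <mu_P, mu_Pi>.
alpha = np.linalg.solve(G + 1e-3 * np.eye(n), y)

# Predict the class of two fresh distributions, one from each group.
test = [rng.normal((0.0, 0.0), 0.5, size=(50, 2)),
        rng.normal((2.0, 2.0), 0.5, size=(50, 2))]
k = np.array([[rbf_mean_kernel(t, d) for d in dists] for t in test])
pred = np.sign(k @ alpha)  # class +1 for the first, -1 for the second
```

The point is that once the Gram matrix between distributions is available, any kernel machine that accepts a precomputed kernel (an SVM in the paper's case) runs unchanged.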
This is joint work with Kenji Fukumizu (The Institute for Statistical Mathematics, Japan), Francesco Dinuzzo (MPI-IS), and my supervisor Prof. Bernhard Schoelkopf (MPI-IS). Parts of this work were done while Kenji was visiting us in summer 2011.
The arXiv manuscript can be found here. It is not the most up-to-date version, but it will give a basic idea of this work.
I have been working on the quasar target selection problem for a while. Essentially, this is a classification problem in which one wants to identify objects in the sky as quasars or stars based on their flux measurements. The problem is easy in the low-redshift range because there is a clear separation between quasars and stellar objects, but in the medium- and high-redshift ranges, quasar target selection becomes more difficult. For z > 2.2, objects must be targeted down to g = 22 mag, where the photometric measurement uncertainty becomes substantial. Moreover, at z = 2.8, the quasar and stellar loci cross in color space.
Despite the challenges of the problem itself, it was very important to me to understand why such distant objects are worth detecting at all. So I did some research and came up with a simple explanation.
Shortly after the Big Bang, the cosmic plasma of photons and baryons was excited by the initial perturbations. Initially, radiation pressure keeps the photons and baryons coupled, and this plasma carries a sound wave that moves outward until the Universe becomes neutral at redshift around 1000. Once the Universe has cooled enough, protons capture electrons to form neutral hydrogen, which decouples the photons from the baryons. The photons continue to stream away, leading to the dramatic acoustic oscillations seen in cosmic microwave background anisotropy data. The baryons, on the other hand, remain in place, leaving the baryon peak stalled at about 150 comoving Mpc. This causes a small excess in the number of pairs of galaxies separated by that distance. These features are referred to as the baryon acoustic oscillations (BAO). BAO act as a standard ruler that relates the growth of cosmic structure to the overall expansion of the universe; observing them helps cosmologists measure the expansion history of the universe and thereby probe cosmic dark energy.
In principle, BAO can also be observed in all forms of cosmic structure, including the distribution of the intergalactic medium as probed by the Lyman-alpha forest (LAF). The LAF can be seen in the spectra of high-redshift quasars. To detect BAO in the LAF, one may cross-correlate the absorption spectra of widely separated quasar pairs. This has previously been impossible due to the lack of sufficient data. Therefore, detecting a sufficiently large number of high-redshift quasars becomes substantially important.
After working in this direction for a while, I have a feeling that machine learning in astronomy has not been explored much. There might be some open problems that one can tackle from a machine learning point of view. I also drew inspiration for this from a talk by David Hogg.
I have now been at the Department of Empirical Inference at the Max Planck Institute for Intelligent Systems for a full year, and I've learnt and experienced a lot during this stage of my PhD.
Apart from getting my own research projects done, I have always been enthusiastic about learning new things and broadening and deepening my knowledge. So I've been reading extensively on many different topics, ranging from econometrics to cosmology. Understanding the theoretical aspects of machine learning is, I think, very important, but understanding its role in real-world applications is even more important.
Reading lots of papers, of course, already gives me a big picture of where machine learning stands in the scientific community. However, it lacks social context: I would also like to know what other people think about it.
So I have recently set up a journal club called the Empirical Inference Journal Club, with the strong hope that it will provide such a platform for students and postdocs in the department to share their knowledge on particular topics related to empirical inference. People in the department have actually organized reading groups on different topics before, but to my knowledge they ran for a short period of time and then stopped.
I am committed to keeping this journal club running. Of course, it requires some extra work, but I think it's worthwhile. After three weeks, things seem to be going smoothly. I hope more people will join and contribute to the journal club.
We always have two options: accepting things the way they are or having enough courage to change them.
After many attempts at starting an academic blog, I have finally succeeded.
Primarily, I will try to write regularly about my ongoing work and the ideas I have during the day. I hope it will be somewhat helpful both to myself and to the people who read my blog.