My research interest lies in the area of machine learning, its real-world applications, and ultimately how it can improve our understanding of intelligent systems.

One of the most exciting aspects of machine learning is its multidisciplinary nature, which brings together researchers from different backgrounds and often creates diverse areas of research. To cope with this multifaceted nature, my research methodology is to approach a problem from a theoretical point of view and then develop practical algorithms by incorporating the relevant domain knowledge. My areas of interest include statistical learning theory, reinforcement learning, semi-supervised learning, kernel methods, Bayesian inference, and Bayesian nonparametrics. By applying these machine learning techniques, the main goal of my research is to understand phenomena in complex and large-scale data, including, but not limited to, networked data analysis, computational biology, bioinformatics, computer vision, natural language processing, and climate informatics. Briefly, my research interest lies in theoretical machine learning and its applications in a variety of fields.

Examples of research topics that I am interested in include:

Statistical Learning Theory

Statistical learning theory governs the problem of estimating a functional dependency from a set of empirical data. The problem is quite broad and closely related to classical problems in statistics, including discriminant analysis and density estimation. More specifically, we are given N pairs of observations {(x_1,y_1),...,(x_N,y_N)}, called the training set, which are independently and identically distributed (i.i.d.) according to some unknown but fixed probability distribution F(x,y). The goal of the learning machine is to construct an appropriate approximation to the functional dependency f: X -> Y using only the data in the training set, where x_i in X are the inputs to the function and y_i in Y are the corresponding outputs.
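This setting can be made concrete with a toy sketch (the linear "true" dependency and noise level below are assumptions for illustration only): draw i.i.d. training pairs from a fixed but, to the learner, unknown distribution, then pick f from a hypothesis class by minimizing the empirical risk.

```python
import random

random.seed(0)

def sample_pair():
    # The "true" dependency y = 2x + 1 plus Gaussian noise -- unknown
    # to the learner, assumed here purely for the sketch.
    x = random.uniform(-1, 1)
    y = 2 * x + 1 + random.gauss(0, 0.1)
    return x, y

train = [sample_pair() for _ in range(200)]  # the training set

# Hypothesis class: f(x) = a*x + b. Minimise the empirical risk
# (mean squared error) in closed form via ordinary least squares.
n = len(train)
mx = sum(x for x, _ in train) / n
my = sum(y for _, y in train) / n
a = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
b = my - a * mx
print(round(a, 2), round(b, 2))  # estimates close to the true (2, 1)
```

With enough i.i.d. samples, the empirical risk minimizer recovers the underlying dependency despite the noise.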


Kernel Methods

Many problem settings in machine learning lack a direct interaction with the environment, and simulating the environment is usually too complex to implement. Instead, it may be easier to collect empirical data from which the desired models can be learnt. Most algorithms that learn from empirical data fall into two broad categories, namely supervised and unsupervised learning. Much research in this framework has been devoted to solving classification, regression, and dimensionality reduction problems.

In difficult classification and regression problems, real-world data often require non-linear methods for a successful application. Instead of constructing the non-linear model explicitly, kernel methods allow an implicit mapping of data from the input space to the kernel-induced feature space. A linear model in this new space can correspond to a non-linear model in the input space. Consequently, a powerful non-linear model can be built efficiently using kernel methods. Since a kernel encodes prior knowledge about a task, the choice of kernel function plays a crucial role in the success of these algorithms.
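The implicit mapping can be verified numerically with the degree-2 homogeneous polynomial kernel on 2-D inputs (the specific kernel and test points are chosen here just for illustration): k(x, z) = (x . z)^2 equals an inner product in a 3-dimensional feature space, so a linear model there is a quadratic model in the input space, without ever computing the feature map explicitly.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def kernel(x, z):
    # Implicit evaluation: never leaves the 2-D input space.
    return dot(x, z) ** 2

def phi(x):
    # Explicit feature map for the same kernel, into 3 dimensions.
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)

x, z = (1.0, 2.0), (3.0, -1.0)
implicit = kernel(x, z)          # (1*3 + 2*(-1))^2 = 1
explicit = dot(phi(x), phi(z))   # same value via the feature map
print(implicit, round(explicit, 10))
```

For higher-degree polynomials or the RBF kernel the explicit feature space becomes huge or infinite-dimensional, which is exactly where the implicit evaluation pays off.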

In addition to their successful application in SVMs, kernel methods have proven very fruitful in the regularization framework. One of the most powerful results is the Representer theorem. Roughly speaking, it states that the solutions to certain regularization problems admit a finite representation. This result reduces the complexity of learning algorithms considerably, as one can efficiently find the optimal model in a high-dimensional feature space through a much simpler representation of the solution. The drawback of this approach, however, is that strong conditions must be met, e.g., the smoothness penalty must be a monotonically increasing function of the RKHS norm, for the solution to admit the finite representation. As a result, this precludes models based on the l1-penalty such as the LASSO. Since regularization has been shown to have a strong connection with Bayesian analysis, it would be interesting to approach kernel methods from a Bayesian perspective. More specifically, the regularizer acts as a constraint on the functions contained in the reproducing kernel Hilbert space (RKHS) induced by the kernel; this constraint is in fact prior knowledge incorporated into the estimation process. The Bayesian approach would possibly allow these constraints to be relaxed and enable estimates of confidence and reliability without additional methods such as bootstrapping or cross-validation.
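The Representer theorem is easiest to see in kernel ridge regression: for squared loss with an RKHS-norm penalty, the minimizer has the finite form f(x) = sum_i alpha_i k(x_i, x), and alpha solves the linear system (K + lambda*I) alpha = y. The toy 1-D data, the RBF kernel, and its bandwidth below are assumptions made for this sketch.

```python
import math, random

def rbf(x, z, gamma=10.0):
    return math.exp(-gamma * (x - z) ** 2)

def solve(A, b):
    # Plain Gaussian elimination with partial pivoting.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            fac = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= fac * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

random.seed(1)
xs = [i / 9 for i in range(10)]
ys = [math.sin(2 * math.pi * x) + random.gauss(0, 0.05) for x in xs]

# Solve (K + lam*I) alpha = y.
lam = 1e-3
K = [[rbf(xi, xj) for xj in xs] for xi in xs]
A = [[K[i][j] + (lam if i == j else 0.0) for j in range(len(xs))] for i in range(len(xs))]
alpha = solve(A, ys)

def f(x):
    # The learned function: a finite kernel expansion over training points,
    # exactly the form guaranteed by the Representer theorem.
    return sum(a * rbf(xi, x) for a, xi in zip(alpha, xs))

print(round(f(0.25), 2))  # near sin(pi/2) = 1
```

Note that the optimization is over an infinite-dimensional RKHS, yet only N coefficients ever need to be computed.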


Semi-supervised Learning

Learning algorithms in supervised learning have been used successfully in many applications such as handwriting recognition, object detection, and speech recognition. However, most of these techniques suffer from the scarcity of labelled examples. Labelled examples are often difficult to obtain, as opposed to unlabelled examples, which are easier to collect. Thus far, attempts have been made to exploit unlabelled data in addition to labelled data. Semi-supervised learning (SSL) is one of the machine learning techniques that copes with this problem. With few labelled examples, the goal of SSL is to build classifiers that are as accurate as, or even better than, those we would obtain from traditional supervised learning.

The exploitation of a large number of unlabelled examples in conjunction with a few labelled examples will become increasingly important for many learning algorithms, as illustrated by SSL. Most successful applications of SSL, however, rely on strong assumptions. For example, the optimal classifier is assumed to pass through a low-density region of the data space. In general, these assumptions assert that there are non-trivial relationships between the distributions of labelled and unlabelled data, but they do not always guarantee superior performance of the classifiers. When such assumptions are made but do not hold, performance can degrade and become worse than that of plain supervised learning. Moreover, the advantage of unlabelled data in SSL is still unclear, so more theoretical analysis of this issue is needed in a more general problem setting.
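Self-training is one simple SSL heuristic that makes this concrete (the two-cluster toy data and the nearest-centroid classifier below are assumptions for illustration, not a method from the text): starting from one labelled point per class, the most confidently classified unlabelled points are pseudo-labelled and fed back into training.

```python
import math, random

random.seed(2)

def make_cluster(cx, cy, n):
    return [(cx + random.gauss(0, 0.3), cy + random.gauss(0, 0.3)) for _ in range(n)]

pool0, pool1 = make_cluster(0, 0, 50), make_cluster(3, 3, 50)
labelled = [(pool0[0], 0), (pool1[0], 1)]   # one labelled point per class
unlabelled = pool0[1:] + pool1[1:]

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def centroids(data):
    groups = {}
    for p, y in data:
        groups.setdefault(y, []).append(p)
    return {y: (sum(p[0] for p in ps) / len(ps), sum(p[1] for p in ps) / len(ps))
            for y, ps in groups.items()}

# Self-training loop: pseudo-label the points closest to a centroid
# (the most "confident" predictions) and re-estimate the centroids.
for _ in range(5):
    cs = centroids(labelled)
    scored = sorted(unlabelled, key=lambda p: min(dist(p, c) for c in cs.values()))
    for p in scored[:20]:
        y = min(cs, key=lambda k: dist(p, cs[k]))
        labelled.append((p, y))
        unlabelled.remove(p)

cs = centroids(labelled)
correct = sum(min(cs, key=lambda k: dist(p, cs[k])) == 0 for p in pool0)
correct += sum(min(cs, key=lambda k: dist(p, cs[k])) == 1 for p in pool1)
print(correct / 100)  # fraction of all points classified correctly
```

The cluster assumption holds here by construction, so the pseudo-labels are reliable; if the two classes overlapped, the same loop could reinforce its own early mistakes, which is exactly the failure mode discussed above.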

In conclusion, learning problems are becoming more complex and require increasingly sophisticated algorithms and huge amounts of data. Even though such data are abundant and cheap to obtain, their conversion into valid training data for a specific task requires human intervention, which can be expensive and time-consuming. Therefore, semi-supervised learning will certainly become one of the most significant learning paradigms in the future.


Bayesian Nonparametrics

When supervision is not available, the goal of machine learning is instead to discover the structures underlying the data. Many tasks in unsupervised learning are closely related to problems in statistics where probability distributions are used to model uncertainty about the world and the objects of interest. These probabilities encode subjective degrees of belief. Bayesian inference, as opposed to frequentist inference, allows one to update these degrees of belief by combining prior knowledge with observed evidence. This process of updating probabilities reflects many situations in our everyday life. Moreover, there is a growing connection between Bayesian methods and Markov chain Monte Carlo (MCMC) simulation, since complex models are often analytically intractable.
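The updating process is clearest in a conjugate setting (the coin example is an illustrative assumption, not from the text): a Beta(a, b) prior over a coin's bias combined with Bernoulli observations yields a Beta posterior, so beliefs are revised by simple counting. MCMC becomes necessary precisely when no such closed-form update exists.

```python
# Beta-Bernoulli conjugate updating: prior Beta(a, b), each observation
# y in {0, 1} shifts the belief by incrementing the matching count.
a, b = 1.0, 1.0                          # uniform prior: no initial preference
observations = [1, 1, 0, 1, 1, 1, 0, 1]  # 6 heads, 2 tails

for y in observations:                    # each observation updates the belief
    a += y
    b += 1 - y

posterior_mean = a / (a + b)
print(a, b, posterior_mean)  # Beta(7, 3), posterior mean 0.7
```

After eight tosses the posterior mean has moved from the prior's 0.5 toward the observed frequency, and further data would continue to sharpen it.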

In Bayesian parametric models, the use of probability distributions restricts the class of objects that can be modeled to finite-dimensional ones. The framework of Bayesian nonparametric models, on the other hand, provides flexible probabilistic representations that allow us to work with more general objects such as lists, trees, and partitions. To achieve this flexibility, the prior and posterior are no longer restricted to probability distributions but are general stochastic processes. Moving from simple probability distributions to stochastic processes makes it possible to manipulate objects in infinite-dimensional spaces. The Gaussian process (GP), Chinese restaurant process (CRP), and Indian buffet process (IBP) are well-known Bayesian nonparametric models that have been used in many successful applications.
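A draw from the CRP illustrates what "nonparametric" means here (the customer count and concentration value are arbitrary choices for this sketch): customer n joins an existing table with probability proportional to its size, or opens a new table with probability proportional to the concentration alpha, so the number of clusters in the partition grows with the data instead of being fixed in advance.

```python
import random

random.seed(3)

def crp(n_customers, alpha):
    tables = []       # tables[k] = number of customers at table k
    assignments = []  # table index chosen by each customer
    for _ in range(n_customers):
        weights = tables + [alpha]   # existing tables, then a new one
        total = sum(weights)
        r = random.uniform(0, total)
        acc = 0.0
        for k, w in enumerate(weights):
            acc += w
            if r <= acc:
                break
        if k == len(tables):
            tables.append(1)         # open a new table
        else:
            tables[k] += 1
        assignments.append(k)
    return tables, assignments

tables, assignments = crp(100, alpha=1.0)
print(len(tables), sum(tables))  # a random partition of 100 customers
```

Because the "rich get richer" weights favour large tables, the number of occupied tables grows only logarithmically in the number of customers, yet it is never capped.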


To conclude, my research interest lies in general machine learning and the problems I have discussed above. In my opinion, it is impossible for anyone to fully master machine learning, as its core knowledge overlaps several disciplines and its applications range across a wide variety of fields. Although recent machine learning research is producing impressive advances, a great deal of work remains to be done. As a result, I am always open to challenging problems beyond those I have mentioned that will facilitate understanding of real-world phenomena.

If you are interested, feel free to contact me; exciting collaborations are always welcome.