Generalized Kernel Trick

If you work in machine learning on anything related to kernel methods, you are probably familiar with the so-called kernel trick, which is fundamental to most kernel-based learning machines. The equation below states the kernel trick formally:

 \langle\phi(x),\phi(y)\rangle_{\mathcal{H}} = k(x,y)

That is, the inner product between the feature maps \phi(x) and \phi(y) can be written in terms of some positive semidefinite function k. This allows one to replace the inner product with a kernel evaluation, so \phi(x) never needs to be computed explicitly. Similar to the standard kernel trick, the generalized version can be written as

 \langle\mathcal{T}\phi(x),\phi(y)\rangle_{\mathcal{H}} = [\mathcal{T}k(x,\cdot)](y)

where \mathcal{T} is an operator in \mathcal{L}(\mathcal{H}), the set of bounded linear operators on \mathcal{H}. Note that the generalized kernel trick reduces to the standard kernel trick when \mathcal{T}=\mathcal{I}, where \mathcal{I} is the identity operator. Kadri et al. (2012) showed that this trick holds for any implicit mapping \phi of a Mercer kernel, provided that \mathcal{T} is a self-adjoint operator. This trick is particularly useful when deriving learning algorithms for structured output learning.
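
To make both identities concrete, here is a minimal numerical sketch in Python. Everything in it is an illustrative assumption rather than something taken from Kadri et al. (2012): the kernel is the homogeneous quadratic kernel k(x,y) = (x \cdot y)^2 on R^2, whose explicit feature map \phi is known, and \mathcal{T} is chosen as the self-adjoint operator \mathcal{T} = \sum_i \phi(z_i)\langle\phi(z_i),\cdot\rangle built from a few hypothetical anchor points z_i. For this choice, [\mathcal{T}k(x,\cdot)](y) = \sum_i k(z_i,x)k(z_i,y), so the right-hand side needs only kernel evaluations.

```python
import math
import random


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map of the quadratic kernel on R^2 (assumed example)."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


def k(x, y):
    """Kernel k(x, y) = (x . y)^2, which equals <phi(x), phi(y)>."""
    return dot(x, y) ** 2


def apply_T(anchors, v):
    """Apply T = sum_i phi(z_i) <phi(z_i), .> to a feature vector v.

    T is symmetric in feature space, hence self-adjoint.
    """
    out = [0.0, 0.0, 0.0]
    for z in anchors:
        fz = phi(z)
        coeff = dot(fz, v)
        out = [o + coeff * f for o, f in zip(out, fz)]
    return out


random.seed(0)
x = [random.gauss(0, 1), random.gauss(0, 1)]
y = [random.gauss(0, 1), random.gauss(0, 1)]
# Hypothetical anchor points that define the operator T.
anchors = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(5)]

# Standard kernel trick: <phi(x), phi(y)> = k(x, y).
standard_lhs = dot(phi(x), phi(y))
standard_rhs = k(x, y)

# Generalized kernel trick: <T phi(x), phi(y)> = [T k(x, .)](y).
generalized_lhs = dot(apply_T(anchors, phi(x)), phi(y))          # explicit features
generalized_rhs = sum(k(z, x) * k(z, y) for z in anchors)        # kernel evaluations only

standard_ok = math.isclose(standard_lhs, standard_rhs, rel_tol=1e-9, abs_tol=1e-12)
generalized_ok = math.isclose(generalized_lhs, generalized_rhs, rel_tol=1e-9, abs_tol=1e-12)
print(standard_ok, generalized_ok)
```

Both comparisons should agree up to floating-point rounding. The point of the sketch is that the left-hand sides use \phi explicitly, while the right-hand sides never touch it.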

