# Generalized Kernel Trick

If you are a machine learning researcher working on something related to kernel methods, you are most likely familiar with the so-called kernel trick, which is fundamental to most kernel-based learning machines. The equation below gives a formal definition of the kernel trick:

$$\langle \phi(x), \phi(y) \rangle_{\mathcal{H}} = k(x, y).$$

That is, the inner product between the feature maps $\phi(x)$ and $\phi(y)$ can be written in terms of some positive semidefinite function $k$. This allows one to replace the inner product with a kernel evaluation, and thereby avoid computing $\phi(x)$ explicitly. Similar to the standard kernel trick, the generalized version can be written as

$$\langle \phi(x), \mathcal{T}\phi(y) \rangle_{\mathcal{H}} = (\mathcal{T}k(x, \cdot))(y),$$

where $\mathcal{T}$ is an operator in $\mathcal{L}(\mathcal{H})$. Note that the generalized kernel trick reduces to the standard kernel trick when $\mathcal{T} = \mathcal{I}$, where $\mathcal{I}$ is the identity operator. Kadri et al. (2012) showed that this trick holds for any implicit mapping $\phi$ of a Mercer kernel, provided that the operator $\mathcal{T}$ is self-adjoint. This trick is particularly useful when deriving learning algorithms for structured output learning.
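In finite dimensions both identities can be checked numerically. The sketch below is an illustration I am adding, not code from Kadri et al. (2012): it uses the explicit feature map of the homogeneous quadratic kernel $k(x, y) = (x^\top y)^2$ and models $\mathcal{T}$ as a symmetric matrix acting on the feature space; the helper `phi` and the operator matrix `T` are my own choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    """Explicit feature map of the quadratic kernel k(x, y) = (x @ y) ** 2."""
    return np.outer(x, x).ravel()

x = rng.standard_normal(3)
y = rng.standard_normal(3)

# Standard kernel trick: <phi(x), phi(y)> = k(x, y).
assert np.isclose(phi(x) @ phi(y), (x @ y) ** 2)

# A self-adjoint operator on the (9-dimensional) feature space,
# modelled as a symmetric matrix.
A = rng.standard_normal((9, 9))
T = (A + A.T) / 2

# Generalized kernel trick: <phi(x), T phi(y)> = (T k(x, .))(y).
# The RKHS function k(x, .) is represented by the vector phi(x), so
# T k(x, .) is represented by T phi(x), and evaluating it at y is an
# inner product with phi(y).
lhs = phi(x) @ (T @ phi(y))
rhs = (T @ phi(x)) @ phi(y)
assert np.isclose(lhs, rhs)

# Self-adjointness matters: with the non-symmetric A, the two sides
# differ by phi(x) @ (A - A.T) @ phi(y), which is nonzero in general.
assert not np.isclose(phi(x) @ (A @ phi(y)), (A @ phi(x)) @ phi(y))
```

Setting `T = np.eye(9)` collapses the generalized identity back to the standard kernel trick, matching the $\mathcal{T} = \mathcal{I}$ case above.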