Reproducing Kernel Hilbert Space


This post collects my notes from learning about the Reproducing Kernel Hilbert Space (RKHS).

1. Hilbert space

Definition. A norm induced by an inner product is called a canonical norm: $\|f\| = \sqrt{\langle f, f \rangle}$.

Equivalent definition. (Canonical norm)

  • The parallelogram law is a necessary and sufficient condition for a norm to be induced by an inner product:

    $\|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2)$ for all $x, y \in V$

  • The Ptolemy inequality is also a necessary and sufficient condition for a norm to be induced by an inner product:

    $\|x-y\|\|z\|+\|y-z\|\|x\| \geq \|x-z\|\|y\|$ for every $x,y,z \in V$

Definition. A Hilbert space is an inner product space that is complete and separable with respect to the norm induced by the inner product.

$\color{blue}{Example}:$

  • The space of continuous functions $C[a,b]$ with \(\|f\|_{\infty} = \max_{x\in[a,b]} |f(x)|\). It is complete under this norm (a Banach space), but the norm is not induced by any inner product (so it is NOT a Hilbert space).

    Proof: To show that no inner product induces this norm, it suffices to exhibit a counterexample to the parallelogram law, which is a necessary and sufficient condition for a norm to come from an inner product. The parallelogram law is:

    \[\|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2)\]

    Let’s choose two functions in $C[a,b]$:

    $f(x) = 1$ and $g(x) = \frac{x-a}{b-a}$ for all $x$ in $[a, b]$.
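    Then $\|f\|_{\infty} = \|g\|_{\infty} = 1$, $\|f+g\|_{\infty} = 2$, and $\|f-g\|_{\infty} = 1$, so

    \[\|f + g\|_{\infty}^2 + \|f - g\|_{\infty}^2 = 4 + 1 = 5 \neq 4 = 2(\|f\|_{\infty}^2 + \|g\|_{\infty}^2),\]

    and the parallelogram law fails; hence $\|\cdot\|_{\infty}$ is not induced by any inner product.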

  • The space of square-integrable functions $L_2[a,b]$: a Hilbert space but $\color{orange}{NOT}$ an RKHS.

    A square-integrable function may be unbounded, e.g. $f(x) = (x-a)^{-1/4}$ for $x\in(a,b]$, so point evaluation cannot be a bounded functional (indeed, elements of $L_2$ are equivalence classes, and the value $f(t)$ at a single point is not even well defined).

    The norm is induced by the inner product \(\langle f,g\rangle = \int_a^bf(x)g(x)\,dx\)

Definition. (Kernel function) $K: X\times X\to \mathbb{R}$ is a kernel function on the set $X$ if:

  1. Symmetric: $K(x,y) = K(y,x)$ for all $x,y \in X$.
  2. Positive semi-definite (p.s.d.): for any finite set $\{x_1,\dots,x_n\}\subset X$, the Gram matrix $[K]_{i,j} = K(x_i,x_j)$ satisfies $a^\top[K]a \geq 0$ for every $a\in\mathbb{R}^n$ (see the numerical sketch below).
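As a quick numerical illustration (a minimal sketch in NumPy; the Gaussian kernel and the sample points are arbitrary choices, not part of the definition), we can build a Gram matrix and confirm it is positive semi-definite by checking its eigenvalues:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel K(x, y) = exp(-(x - y)^2 / (2 sigma^2))."""
    return np.exp(-(x - y) ** 2 / (2 * sigma ** 2))

# Arbitrary sample points x_1, ..., x_n in X = R.
x = np.array([0.0, 0.5, 1.3, 2.7])

# Gram matrix [K]_{ij} = K(x_i, x_j); symmetric by construction.
K = gaussian_kernel(x[:, None], x[None, :])

# PSD check: a symmetric matrix is PSD iff all eigenvalues are >= 0
# (allow a little floating-point slack).
eigvals = np.linalg.eigvalsh(K)
assert np.all(eigvals >= -1e-10)
print(eigvals)
```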

Definition. (Evaluation functional)

An evaluation functional over the Hilbert space $\mathcal{H}$ is a linear functional $F_t:\mathcal{H}\to \mathbb{R}$ that evaluates each function in the space at the point $t$: $F_t[f] = f(t)$.

Now we are ready to define the RKHS.

Definition. (RKHS)

$K(\cdot,\cdot)$ is a reproducing kernel of a Hilbert space $\mathcal{H}$ if for every $f\in\mathcal{H}$ and every $x$, $f(x) = \langle K(x,\cdot),f\rangle$. A Reproducing Kernel Hilbert Space (RKHS) is a Hilbert space $\mathcal{H}$ with a reproducing kernel whose span is dense in $\mathcal{H}$.

This is equivalent to: the evaluation functionals are bounded, i.e. for each $t$ there exists $M_t$ such that $\vert F_t[f]\vert = \vert f(t)\vert \leq M_t \|f\|_{\mathcal{H}}$ for all $f\in \mathcal{H}$.

Later we will show that the representer $K(t,\cdot)$ of each evaluation functional is unique for a given RKHS.

$\color{red}{Remark}: $

  1. Inner product space + completeness (closed under the metric induced by the inner product) → Hilbert space
  2. Hilbert space $\mathcal{H}$ + bounded linear evaluation functionals $F_t: \mathcal{H} \to \mathbb{R}$ → RKHS

Figure: the hierarchy of mathematical spaces (IonnKorr, “Mathematical Spaces & Evolution”).


2. Main results from RKHS

First of all, let’s look at the structure of an RKHS $\mathcal{H}$ with a kernel $K$. Define the kernel feature map $\Phi: \mathcal{X}\to \mathcal{H}$ as:

\[\begin{align*} \Phi(x) = K(\cdot,x) \end{align*}\]

The vector space spanned by the feature maps:

\[\begin{align*} span(\{\Phi(x):x\in \mathcal{X}\}) = \{f(\cdot) = \sum_{i=1}^na_iK(\cdot,x_i):n\in\mathbb{N},x_i\in \mathcal{X},a_i\in \mathbb{R} \} \end{align*}\]

By definition, this span is dense in $\mathcal{H}$. The Riesz representation theorem then implies that each evaluation functional has a unique representer in $\mathcal{H}$, namely $K(t,\cdot)$:

\[\begin{align*} F_t(f) &= \langle K(t,\cdot),f\rangle \\ & = \langle K(t,\cdot), \sum_{i=1}^na_iK(\cdot,x_i)\rangle \\ & = \sum_{i=1}^na_iK(t,x_i) \\ & = f(t) \end{align*}\]

where the third equality holds because of the reproducing property.
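To make this concrete, here is a small numerical check (a sketch: the Gaussian kernel, the points $x_i$, and the coefficients $a_i$ are arbitrary choices). For $f = \sum_i a_iK(\cdot,x_i)$ in the span, the inner product $\langle K(t,\cdot), f\rangle$ computed from the bilinear formula agrees with the pointwise evaluation $f(t)$:

```python
import numpy as np

def K(x, y, sigma=1.0):
    """Gaussian kernel K(x, y) = exp(-(x - y)^2 / (2 sigma^2))."""
    return np.exp(-(x - y) ** 2 / (2 * sigma ** 2))

def span_inner(a, xs, b, ys):
    """RKHS inner product of f = sum_i a_i K(., xs_i) and
    g = sum_j b_j K(., ys_j):  <f, g> = sum_{i,j} a_i b_j K(xs_i, ys_j)."""
    return a @ K(xs[:, None], ys[None, :]) @ b

xs = np.array([0.0, 1.0, 2.0])    # arbitrary points x_i
a = np.array([0.5, -1.0, 2.0])    # arbitrary coefficients a_i
t = 0.7

f_t_direct = np.sum(a * K(t, xs))                # f(t) evaluated pointwise
f_t_repro = span_inner(a, xs, np.array([1.0]),   # <f, K(t, .)>: the kernel
                       np.array([t]))            # "reproduces" the evaluation
print(f_t_direct, f_t_repro)                     # identical
```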

It’s not hard to check that a reproducing kernel is symmetric and positive semi-definite. The converse also holds:

Theorem (Moore-Aronszajn)

Suppose $K$ is a symmetric, positive definite kernel on a set $X$. Then there is a unique Hilbert space of functions on $X$ for which $K$ is a reproducing kernel.

$\color{red}{Remark}:$

This theorem implies that once we obtain a positive definite kernel $K$, we can construct a corresponding RKHS $\mathcal{H}_K$ from it, as sketched below.
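The construction behind the theorem is worth recording: on the span of the feature maps, define an inner product of finite sums by bilinearity,

\[\begin{align*} \Big\langle \sum_i a_i K(\cdot,x_i), \sum_j b_j K(\cdot,y_j)\Big\rangle := \sum_{i,j} a_i b_j K(x_i,y_j), \end{align*}\]

which is well defined and an inner product precisely because $K$ is symmetric and positive definite; completing this pre-Hilbert space yields $\mathcal{H}_K$.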

Next we show a way to characterize a symmetric positive definite kernel:

Theorem (Mercer’s theorem) [a.k.a. spectral decomposition in infinite dimensions]

Suppose $K$ is a continuous, symmetric positive-definite kernel on a compact set $\mathcal{X}$, and the integral operator $T_K:L_2(\mathcal{X})\to L_2(\mathcal{X})$ defined by

\[\begin{align*} (T_Kf)(\cdot) = \int_{\mathcal{X}}K(\cdot,x)f(x)\,dx \end{align*}\]

is positive semi-definite, that is, $\forall f\in L_2(\mathcal{X})$,

\[\begin{align*} \left \langle T_Kf,f \right \rangle = \int_{\mathcal{X}}\int_{\mathcal{X}}K(u,v)f(u)f(v)\,du\,dv \geq 0. \end{align*}\]

Then there is an orthonormal basis $\{\phi_i\}$ of $L_2(\mathcal{X})$ consisting of eigenfunctions of $T_K$ such that the corresponding sequence of eigenvalues $\{\lambda_i\}$ is non-negative. Most importantly, the kernel function has the representation (similar to the spectral decomposition we have seen in the finite-dimensional case)

\[\begin{align*} K(u,v) = \sum_{i=1}^\infty\lambda_i\phi_i(u)\phi_i(v) \end{align*}\]

The convergence is absolute and uniform, that is, $\lim_{n\to\infty}\sup_{u,v}\vert K(u,v)-\sum_{i=1}^n\lambda_i\phi_i(u)\phi_i(v)\vert = 0$.
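We can watch Mercer’s expansion emerge numerically (a sketch under arbitrary choices: a Gaussian kernel on a uniform grid over $[0,1]$, with the integral operator approximated by a Riemann sum). The eigendecomposition of the discretized operator yields a truncated expansion $\sum_{i=1}^m \lambda_i\phi_i(u)\phi_i(v)$ that approaches $K(u,v)$ uniformly:

```python
import numpy as np

def K(u, v, sigma=0.5):
    """Gaussian kernel, continuous and p.d. on the compact set [0, 1]."""
    return np.exp(-(u - v) ** 2 / (2 * sigma ** 2))

# Riemann-sum discretization: T_K f(u_i) ~ sum_j K(u_i, u_j) f(u_j) * h,
# so the operator becomes the symmetric matrix K_mat * h.
n = 200
grid = np.linspace(0.0, 1.0, n)
h = grid[1] - grid[0]
K_mat = K(grid[:, None], grid[None, :])

evals, evecs = np.linalg.eigh(K_mat * h)     # ascending eigenvalues
evals, evecs = evals[::-1], evecs[:, ::-1]   # sort descending
phis = evecs / np.sqrt(h)                    # eigenfunctions, L2-normalized

# Truncated Mercer expansion with the top m terms.
m = 20
K_approx = (phis[:, :m] * evals[:m]) @ phis[:, :m].T
print(np.max(np.abs(K_mat - K_approx)))      # sup-norm error: should be tiny
```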

3. Applications of RKHS

RKHS were explicitly introduced in learning theory by Girosi (1991). Poggio and Girosi (1989) introduced Tikhonov regularization in learning theory but worked with RKHS only implicitly, because they dealt mainly with hypothesis spaces on unbounded domains.

Algorithm (Tikhonov regularization)

Let $\mathcal{H}$ be the RKHS defined by the p.d. kernel $K(\cdot,\cdot)$ and let $\ell$ be a loss function. Tikhonov regularization considers the optimization problem restricted to $\mathcal{H}$: \(\begin{align*} f_S^\lambda = \arg\min_{f\in \mathcal{H}}\frac{1}{n}\sum_{i=1}^n\ell(f(x_i),y_i)+\lambda\|f\|_{\mathcal{H}}^2 \end{align*}\)
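For the squared loss $\ell(f(x_i),y_i) = (f(x_i)-y_i)^2$, the representer theorem says the minimizer lies in the span of $\{K(\cdot,x_i)\}_{i=1}^n$, and the coefficients have a closed form: $f_S^\lambda = \sum_j a_j K(\cdot,x_j)$ with $a = (K + n\lambda I)^{-1}y$. A minimal kernel ridge regression sketch (the Gaussian kernel, the toy data, and the hyperparameters are arbitrary choices):

```python
import numpy as np

def K(x, y, sigma=1.0):
    """Gaussian kernel."""
    return np.exp(-(x - y) ** 2 / (2 * sigma ** 2))

# Toy 1-D regression data: noisy samples of sin(x).
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 2.0 * np.pi, 30)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(30)

lam = 1e-2                      # regularization strength lambda
n = len(x_train)
K_mat = K(x_train[:, None], x_train[None, :])

# Closed-form Tikhonov solution for the squared loss:
# a = (K + n * lambda * I)^{-1} y.
a = np.linalg.solve(K_mat + n * lam * np.eye(n), y_train)

def f_hat(t):
    """Evaluate f_S^lambda(t) = sum_j a_j K(t, x_j) at new points t."""
    return K(np.atleast_1d(t)[:, None], x_train[None, :]) @ a

print(f_hat(np.array([0.0, np.pi / 2])))   # should be close to sin: 0 and 1
```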

4. Examples of Kernels

Gaussian Kernel:

How do we find the RKHS norm for the Gaussian kernel? Basically, we can show that the inner product structure of the function space is captured by an integral of Fourier transforms.
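A sketch of the idea (the constants depend on the Fourier convention used): for a translation-invariant kernel $K(x,y) = \psi(x-y)$ on $\mathbb{R}^d$, the RKHS norm can be written in the frequency domain as

\[\begin{align*} \|f\|_{\mathcal{H}}^2 = \frac{1}{(2\pi)^d}\int_{\mathbb{R}^d}\frac{\vert \hat{f}(\omega)\vert^2}{\hat{\psi}(\omega)}\,d\omega. \end{align*}\]

For the Gaussian kernel $\psi(x) = e^{-\|x\|^2/(2\sigma^2)}$ we have $\hat{\psi}(\omega) \propto e^{-\sigma^2\|\omega\|^2/2}$, so $\|f\|_{\mathcal{H}} < \infty$ forces $\vert\hat{f}(\omega)\vert^2$ to decay faster than a Gaussian: the Gaussian RKHS contains only very smooth functions.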

Neural Tangent Kernel:


References

F. Girosi, M. Jones and T. Poggio, “Regularization Theory and Neural Networks Architectures,” in Neural Computation, vol. 7, no. 2, pp. 219-269, March 1995, doi: 10.1162/neco.1995.7.2.219.

Poggio T, Girosi F. A theory of networks for approximation and learning. Massachusetts Institute of Technology, Artificial Intelligence Laboratory; 1989.