Motivation
The direct motivation of RKHS is to extend the following concepts and properties in $\mathbb{R}^n$
- the concept of evaluation functional $L_i$ at point $i\in[n]$ in $\mathbb{R}^n$ $$ L_{i}:x\mapsto \left\langle \delta_i, x\right\rangle=x_i, \forall x\in\mathbb{R}^n, $$ where $\left\langle \cdot, \cdot\right\rangle$ is linear functional notation and $\delta_i(\cdot)$ is Dirac function with mass at $i$,
- the eigenvector $e_i:=(0,0,\cdots,1,\cdots,0)$ in $\mathbb{R}^n$ such that $$ \left\langle \delta_i, x\right\rangle=\left\langle e_i, x\right\rangle_{\mathbb{R}^n}, \forall x\in\mathbb{R}^n, $$ and more generally there exists $k_i$ in Hibert space with inner product $\left\langle \cdot, \cdot\right\rangle_H$ such that $$ \left\langle \delta_i, x\right\rangle=\left\langle k_i, x\right\rangle_{H}, \forall x\in\mathbb{R}^n, $$
- the property of expressing inner product $\left\langle x, y\right\rangle_H$ in $\mathbb{R}^n$ as a production $$ x^{\textsf{T}}Hy, \forall x,y\in\mathbb{R}^n, $$
- the property that there exists an orthonormal basis $\{e_1,e_2,\cdots,e_n\}$ bases in $ \mathbb{R}^n $ such that the above positive definite matrix can be denoted as a diagonal matrix $$ U=(e_1,e_2,\cdots,e_n), U^{\textsf{T}}HU=\text{diag}(\lambda_1,\lambda_2,\cdots,\lambda_n)=\sum_{i=1}^n\lambda_ie_ie_i^\textsf{T}, $$ to corresponding ones in infinite dimensional linear space $\mathcal{B}$ like followings, and we hope:
- an evalutation functional $L_x$ at point $x\in\mathcal{X}$ in infinite dimensional linear space $\mathcal{B}(\mathcal{X})$ of functions on set $\mathcal{X}$ (usually card$(\mathcal{X})\geq\aleph_0$) can be defined as $$ L_{x}:f\mapsto \left\langle \delta_x, f\right\rangle=f(x), \forall x\in\mathcal{X}, f\in\mathcal{B}, $$
- the eigenvector $\delta_x(\cdot)$ in $\mathbb{L}^2(\mathcal{X},\mu)$ such that $$ \left\langle \delta_x, f\right\rangle=\left\langle \delta_x, f\right\rangle_{\mathbb{L}^2}, \forall f\in\mathbb{L}^2(\mathcal{X},\mu), $$ where $\mu$ is a measure, and more generally there exists $K_x(\cdot)\in\mathcal{H}$ can be defined as $$ \left\langle \delta_x, f\right\rangle=\left\langle K_x, f\right\rangle_{\mathcal{H}}, \forall f\in\mathcal{H}, $$
- the inner product in Hilbert space $\mathcal{H}$ can be defined as a binary integral form $$ \left\langle f, g\right\rangle_\mathcal{H}=\int_{\mathcal{X}\times\mathcal{X}} f(x)h(x,y)g(y)dxdy, \forall x,y \in\mathcal{X}, f,g \in\mathcal{H}, $$
- there exists a set of orthonormal basis $\phi(\cdot,z)$ such that the inner product in a Hilbert space can be expressed in a “diagonal matrix” form, $$ \begin{align*} \phi(z,w)^{\textsf{T}}&=\phi(w,z),\int_{\mathcal{X}\times\mathcal{X}} \phi^{\textsf{T}}(z,w)h(w,z)\phi(z,y)dwdz\\ &=\int_{\mathcal{X}\times\mathcal{X}} \sigma(z)\phi(x,z)\phi(y,z)dz\\ &=\sigma(y)\delta(x-y). \end{align*} $$
Conditions, Problems, and Solutions
-
It is obviously that $L_x$ exists. Due to the motivation involves inner product, $\mathcal{B}$ must be a Hilbert space $\mathcal{B}:=\mathcal{H}$.
-
To express the evaluation functional $L_x$ in the form of an inner product, we need to require that the linear functional is continuous (i.e. $sup_{f\in\mathcal{H}}\frac{|L_x(f)|}{\|f\|_\mathcal{H}}<\infty, \forall x\in\mathcal{X}$), since the inner product must be continuous (i.e.$\left\langle f, g\right\rangle_\mathcal{H}\leq{\|f\|_\mathcal{H}}{\|g\|_\mathcal{H}}$). Then, the Riesz representation theorem implies that there exists a unique elements $K_x\in\mathcal{H}$ called reproducing kernel such that $\left\langle K_x, f\right\rangle_{\mathcal{H}}=f(x),\forall f\in \mathcal{H}$. Therefore, $\left\langle K_x, K_y\right\rangle_{\mathcal{H}}=K_y(x):=K(y,x)$. Due to the sysmetry of inner product, $K(x,y)=K(y,x)$. In addition,
$$ \mathcal{H}_0:=\{f|f\in\overline{\text{span}}\{K_x(\cdot)|x\in{\mathcal{X}}\},\left\langle f, f\right\rangle_{\mathcal{H}}<\infty\}=\mathcal{H}. $$ -
Considering Due to the motivation involves integral on $\mathcal{X}$, we must introduce non-negative measure $\mu$ (the sign is put into coefficient $f_K$) and define
$$f:=\int f_K(x)K(x,\cdot)\mu(dx)=\sum_{i=1}^{\infty}f_K(i)K(x_i,\cdot)<\infty.$$where $\{K(x,\cdot)|x\in\mathcal{X}, s.t. K(x,\cdot)\text{s} \; \text{form a basis with}\;x\;\text{as less as possible}\}=E$ is a basis set, since Hilbert space is complete (any point in it can be approximated by finite linear combination of basis). If $\mathcal{H}$ is separable (e.g. $\mathbb{L}^2(\mathbb{R})$ and $\mathbb{L}^2([0,1])$ with orthonormal basis Hermite polynomials $H_n(x)$ times $e^{-x^2/2}$ and Legendre polynomials), $\text{card}(E)\leq\aleph_0$. If $\text{card}(E)>\aleph_0$, we can choose a countable subset $E_0\subset E$ to define $f$.
$$ \begin{align*} \|f\|^2_\mathcal{H}&=\left\langle\int f_K(x)K(x,\cdot)\mu(dx),\int f_K(y)K(y,\cdot)\mu(dy)\right\rangle_{\mathcal{H}}\\ &=\int f_K(x)K(x,y)f_K(y)\mu(dx)\mu(dy)\\ &=\int f(x)f_K(x)\mu(dx)<\infty. \end{align*} $$
Of course, we haveThe inner product is
$$ \begin{align*} \left\langle f, g\right\rangle_\mathcal{H}&=\left\langle\int f_K(x)K(x,\cdot)\mu(dx),\int g_K(y)K(y,\cdot)\mu(dy)\right\rangle_{\mathcal{H}}\\ &=\int f_K(x)K(x,y)g_K(y)\mu(dx)\mu(dy)<\infty. \end{align*} $$For any generaly linearly independent basis set $E$ of $\mathcal{H}$, and measure $\mu_E$ on $E$, it behaves like in finite dimensional case
$$ \left\langle f, g\right\rangle_\mathcal{H}=\int_E f_E(e_x)H_E(e_x,e_y)g_E(e_y)\mu_E(de_x)\mu_E(de_y), $$where $H_E$ is the representation of $\left\langle \cdot, \cdot \right\rangle_\mathcal{H}$ and $f_E$ is the representation (coefficient) of $f$ under $E$ ($H_E(e_x,e_y):=\left\langle e_x, e_y\right\rangle_\mathcal{H}, e_x(\cdot), e_y(\cdot)\in E, f=\int_E f_E(e_x)e_x(\cdot)\mu_E(de_x)$). If we can identify $(E,\mu_E)$ with $(\mathcal{x},\mu) $, we have
$$ \left\langle f, g\right\rangle_\mathcal{H}=\int_\mathcal{X} f_E(x)H_E(x,y)g_E(y)\mu(dx)\mu(dy). $$ -
For general cases, the measure $\mu_E$ or $\mu$ is count measure. Using Zorn’s Lemma, we know there exist a orthonormal basis $E$ of $\mathcal{H}$, under which
$$ \left\langle f, g\right\rangle_\mathcal{H}=\int_E f_E(e_x)g_E(e_x)\mu_E(de_x). $$ -
Fortunately, card($\{K_x(\cdot)\})=\aleph_0$ usually holds, but what is the condition required. Of course, $\mathcal{H}$ is separable is a sufficient condition, but not pragmatic. In fact, if we regard it as a non-negative definite matrix $K(x,y)$ we hope we can obtain \begin{equation*} K(x,y)=\sum_{i=1}^{\infty}\sigma_i\phi_i(x)\phi_i(y). \end{equation*} Luckily, the spectral theory guarantees the above equation holds, if $K$ is a compact operator. According to the requirement of spectral theory, one of following conditions should be satisfied:
- we let $\mu$ is $\sigma$-finite measure, $K$ is squared-integrable on $\mu\otimes\mu$, and then we find $T_K:\mathbb{L}^{2}(\mathcal{X})\to \mathbb{L}^{2}(\mathcal{X})$, $$[T_{K}f](\cdot)=\int _{\mathcal{X}}K(\cdot,x)f(x)\,d\mu (x)$$ is a Hilbert–Schmidt operator (compact operator).
- Mercer’s theorem. we let $K$ is continuous, $\mathcal{X}$ is compact, $\mu$ is positive finite Borel measure, and then we get $T_K:\mathbb{L}^{2}(\mathcal{X})\to \mathbb{L}^{2}(\mathcal{X})$ as $$[T_{K}f](\cdot)=\int _{\mathcal{X}}K(\cdot,x)f(x)\,d\mu (x)$$ is compact operator.
In both cases, $K(x, \cdot)\in \mathbb{L}^{2}(\mathcal{X},\mu),\;\forall x\in\mathcal{X}$. $K(x,y)$ can be regard as inner product ‘martix’ in both $\mathbb{L}^{2}(\mathcal{X},\mu)$ space $(\left\langle e_x, e_y\right\rangle_{\mathbb{L}^{2}})$ and $\mathcal{H}$ space $(\left\langle K_x, K_y\right\rangle_\mathcal{H})$. From $\mathbb{L}^{2}(\mathcal{X},\mu)$ space perspective,
$$ \begin{align} K(x,y)&=\sum_{i=1}^{\infty}\sigma_i\phi_i(x)\phi_i(y).\label{k} \end{align} $$From $\mathcal{H}$ space perspective, under basis
$$ h_i(\cdot)=\int K(\cdot,x)\phi_i(x)\mu(dx)=\sigma_i\phi_i(\cdot), $$inner product $\left\langle \cdot, \cdot\right\rangle_\mathcal{H}$ can be written as diag($\sigma_1,\sigma_2,\cdots$), hence
$$ \left\langle h_i, h_j\right\rangle_\mathcal{H}=\sigma_i\sigma_j\left\langle \phi_i, \phi_j\right\rangle_\mathcal{H}=\sigma_i\delta_{ij}=\sigma_i\left\langle \phi_i, \phi_j\right\rangle_{\mathbb{L}^{2}(\mathcal{X})} , $$For case one, we have $\sum_i \sigma_i^2<\infty$ ($\sigma_i\to 0$), therefore Im$(T_{K})\subset\mathcal{H}\subset\mathbb{L}^{2}(\mathcal{X})$. Conditions in case two is strictly stronger than that in one, it comes to the same conclusion as in case one. The difference is the decomposition of $K$ \eqref{k} in case one converges in $\mathbb{L}^{2}$ norm rather than uniformly in case two.
Remark: Conditions on reproducing kernel $K(\cdot, \cdot)$ (integrable, continuous) embed $\mathcal{H}$ into a countable dimensional subspace of $\mathbb{R}^\mathcal{X}$ without considering the topology on these two space. For any $h\in\mathcal{H}$, we can not evaluate $h(x)$ at $x$ randowly, as values at different points fulfill some constraints inherted from reproducing kernel $K$. The operator $T_K$ can be seen as a infinite dimensional version SVD decomposition. Due to bounded $\sigma_i$, $T_K$ compresses and projects a point in $\mathbb{L}^2$ to $\mathcal{H}$. Therefore,
$$ \left\langle f, g\right\rangle_\mathcal{H}=\left\langle \sum_i\left\langle f, \phi_i\right\rangle_{\mathbb{L}^{2}}\phi_i , \sum_j\left\langle f, \phi_j\right\rangle_{\mathbb{L}^{2}}\phi_j\right\rangle_\mathcal{H}=\sum_i\frac{\left\langle f, \phi_i\right\rangle_{\mathbb{L}^{2}}\left\langle g, \phi_i\right\rangle_{\mathbb{L}^{2}}}{\sigma_i}. $$Besides, considering an embedding map $I:\mathcal{H}\to\mathbb{L}^{2}$, we find the map is continuous (i.e.$\|x-y\|_{\mathbb{L}^{2}}\leq \|I(x)-I(y)\|_\mathcal{H},\; \forall x,y\in \mathcal{H}$). Hence topologies ($\mathcal{T}_{\mathbb{L}^{2}}, \mathcal{T}_\mathcal{H}$) determinded by their metric have relationship $\mathcal{T}_{\mathbb{L}^{2}}\subset\mathcal{T}_\mathcal{H}$.
-
All the conclusion can be applied to complex cases. Some values need to be written as their conjugates.
Examples
- Gaussian
- linear
- polynomial
Citation
Cited as:
https://zhaotong94.github.io/blog/rkhs/.
Or
@article{zhao2022reproducing,
title = "Reproducing Kernel Hilbert Space",
author = "Zhao, Tong",
journal = "zhaotong94.github.io",
year = "2022",
month = "May",
url = "https://zhaotong94.github.io/blog/rkhs/"}
...