When studying kernel methods a few years ago, I got a bit confused about the concepts of feature space, hypothesis space, and reproducing kernel Hilbert space. Recently, I revisited some questions I asked myself back then (with newly acquired math background) and noticed that some things are still unclear to me. I would appreciate help and pointers to good mathematical literature.

Let's consider the following learning problem: We are given a training sample $((x_1, y_1), \dots, (x_n, y_n)) \in (\mathcal{X} \times \mathbb{R})^n$ and want to learn a functional relationship between $\mathcal{X}$ and $\mathbb{R}$ from this data. Two popular hypothesis spaces are:
$$\mathcal{H}_{\phi} = \{\langle w, \phi(\cdot)\rangle: w \in \mathcal{F}\},$$ in which $\phi: \mathcal{X} \to \mathcal{F}$ denotes a feature map from the input space into a Hilbert space $\mathcal{F}$, and $$\mathcal{H}_{k} = \left\{\sum_{i=1}^m \alpha_i k(x_i, \cdot): m \in \mathbb{N},\ x_i \in \mathcal{X},\ \alpha_i \in \mathbb{R}\ \text{for}\ i \in \{1, \dots, m\}\right\},$$
in which $k$ denotes the positive definite kernel $k(x, x') := \langle\phi(x),\phi(x')\rangle$. The question that bugs me is whether these two hypothesis spaces are equivalent. The Moore-Aronszajn theorem states that there is a one-to-one correspondence between reproducing kernel Hilbert spaces and positive definite kernels. When I studied kernel theory, I proved that $\mathcal{H}_{\phi}$ is a reproducing kernel Hilbert space. Still, I did not manage to prove that the two hypothesis spaces above are the same without requiring that $\phi$ be surjective (which is rarely required in the ML literature).
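To make the inclusion $\mathcal{H}_{k} \subseteq \mathcal{H}_{\phi}$ concrete, here is a minimal numeric sketch (my own illustration, not from any particular reference), assuming the degree-2 polynomial kernel on $\mathbb{R}^2$ with its explicit feature map into $\mathbb{R}^4$: any $f = \sum_i \alpha_i k(x_i, \cdot) \in \mathcal{H}_k$ equals $\langle w, \phi(\cdot)\rangle$ with $w = \sum_i \alpha_i \phi(x_i)$.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on R^2:
    phi(x) lists all coordinate products, so <phi(x), phi(x')> = (x . x')^2."""
    return np.array([x[0]*x[0], x[0]*x[1], x[1]*x[0], x[1]*x[1]])

def k(x, xp):
    """Kernel k(x, x') = <phi(x), phi(x')> = (x . x')^2."""
    return np.dot(x, xp) ** 2

rng = np.random.default_rng(0)
xs = rng.normal(size=(3, 2))   # sample points x_1, ..., x_m (here m = 3)
alphas = rng.normal(size=3)    # coefficients alpha_1, ..., alpha_m

# Element of H_k:  f(x) = sum_i alpha_i * k(x_i, x)
# The matching w in F is w = sum_i alpha_i * phi(x_i), giving <w, phi(x)> in H_phi.
w = sum(a * phi(x) for a, x in zip(alphas, xs))

x_test = rng.normal(size=2)
f_k = sum(a * k(x, x_test) for a, x in zip(alphas, xs))
f_phi = np.dot(w, phi(x_test))
assert np.isclose(f_k, f_phi)  # the two representations agree at x_test
```

This only checks the easy direction; the converse (every $\langle w, \phi(\cdot)\rangle$ lying in $\mathcal{H}_k$, or its closure) is exactly the part of my question that seems to need surjectivity of $\phi$.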