So basically, the orthogonalization, normalization, and convergence steps are straightforward and simple to implement. Unfortunately, I'm having a hard time figuring out the first steps:

How am I supposed to compute the covariance $\Sigma_x$ given that $x$ is one
of the feature vectors of the input? (By the way, is the described algorithm actually
only explaining the definition of $\Sigma$ for one single feature vector of the
$n$ input vectors?)

I was unable to understand how to apply the definition of the covariance
matrix to that case; is it possible to have a simple example of what it
takes in input, and what gets out?

Now suppose I am able to compute the covariance $\Sigma_x$: what do steps 2 and
3 mean? $\varphi_p$ is supposed to be one column of the $d \times h$ matrix
$\varphi$, so what does the random initialization of $\varphi_p$ mean, and
how do I apply the transformation $\varphi_p \leftarrow \Sigma_x\varphi_p$ using
the previously computed covariance?

2 Answers

If $d$ is the dimension of your feature vectors, the covariance matrix is a (symmetric) $d \times d$ matrix. It is defined by $\Sigma = E[(x - \mu)(x - \mu)^T]$. The subscript "$x$" in $\Sigma_x$ does not denote a single vector, but just says that it is the covariance matrix "of all $x$'s". $E[...]$ is the expected value, which you can just think of as "average" in your case.

To compute it, you first have to find the mean $\mu$ of your feature vectors. The covariance matrix can then be built by iterating over all feature vectors, subtracting the mean from each one, and summing up the respective components of the matrix (effectively, by adding up the matrices $(x_i - \mu)(x_i - \mu)^T$ for each feature vector $x_i$, although you can do this on-the-fly without actually creating these temporary matrices - just write this out explicitly for a low-dimensional case and you will see the pattern). Finally, multiply the entire matrix by $1/n$.
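A minimal sketch of this in NumPy, using toy data (the variable names are my own, not from the algorithm description): sum the outer products $(x_i - \mu)(x_i - \mu)^T$ and divide by $n$, then check against the vectorized computation.

```python
import numpy as np

# Toy data: n = 5 feature vectors of dimension d = 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))        # each row is one feature vector x_i
n, d = X.shape

mu = X.mean(axis=0)                # mean feature vector

# Build Sigma by summing the outer products (x_i - mu)(x_i - mu)^T ...
Sigma = np.zeros((d, d))
for x in X:
    diff = x - mu
    Sigma += np.outer(diff, diff)
Sigma /= n                         # ... and multiplying by 1/n at the end.

# Matches NumPy's population covariance (bias=True gives the 1/n estimate).
assert np.allclose(Sigma, np.cov(X, rowvar=False, bias=True))
```

Note that the result is a symmetric $d \times d$ matrix, regardless of how many feature vectors you have.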

"Random initialization" just means that you can freely choose the elements of the vector, so $(1, 0, ..., 0)$ will probably do.

$\varphi_p \leftarrow \Sigma_x\varphi_p$ is just a matrix-vector multiplication: you multiply $\Sigma_x$ by $\varphi_p$ and write the result back to $\varphi_p$.
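Put together, the multiply-then-normalize loop is ordinary power iteration. A small sketch with a hand-picked $2 \times 2$ covariance (the matrix and iteration count are illustrative choices, not from the original algorithm):

```python
import numpy as np

Sigma = np.array([[4.0, 1.0],
                  [1.0, 3.0]])      # toy symmetric covariance matrix

phi = np.array([1.0, 0.0])          # "random" initialization, e.g. (1, 0, ..., 0)
for _ in range(100):
    phi = Sigma @ phi               # phi_p <- Sigma_x phi_p
    phi /= np.linalg.norm(phi)      # normalization step

# phi now approximates the dominant eigenvector of Sigma.
top = np.linalg.eigh(Sigma)[1][:, -1]
assert np.isclose(abs(phi @ top), 1.0)
```

Each iteration stretches $\varphi_p$ toward the direction of largest variance, and the normalization keeps it from blowing up; this is why the loop converges to the first principal component.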

Actually, it is not necessary to compute the covariance matrix $\Sigma_x$ explicitly. Since $\Sigma_x = HH^T$ with $H = (1/\sqrt{n})\,[(x_1-u), (x_2-u), \ldots, (x_n-u)]$ (where $n$ is the number of samples and $u$ is the centroid of all $x$), the product $\Sigma_x \varphi_p$ can be written as $HH^T \varphi_p$. Now $v = H^T \varphi_p$ can be computed first (a matrix-vector product, yielding an $n$-dimensional vector $v$), and then $Hv$ (another matrix-vector product). Thus, the computation of $\Sigma_x$ is avoided altogether.
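A quick sketch of this trick, again with toy data of my own (not from the answer): form $H$ from the centered samples, apply the two matrix-vector products, and verify they agree with multiplying by the explicit covariance.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))        # n = 6 samples, d = 4 features
n, d = X.shape
u = X.mean(axis=0)                 # centroid of all x

H = (X - u).T / np.sqrt(n)         # d x n matrix of centered, scaled samples
phi = rng.normal(size=d)

# Two matrix-vector products instead of forming the d x d covariance:
v = H.T @ phi                      # n-dimensional intermediate vector
result = H @ v

Sigma = H @ H.T                    # explicit covariance, built only to compare
assert np.allclose(result, Sigma @ phi)
```

This matters when $d$ is large: storing $\Sigma_x$ costs $O(d^2)$ memory, while the factored form only needs the $d \times n$ matrix $H$.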