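subtract: from each nonblank $m_{ij}$, the average rating of user $i$, then the average rating of item $j$ (computed on the modified matrix).

subtract: in the opposite order, first the item average, then the user average.

subtract: half of the item average plus half of the user average, i.e. the mean of the two averages.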
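
A minimal sketch of the first option (user average first, then item average), assuming $M$ is stored as a list of rows with `None` for blanks. The function name `normalize` and the `None` convention are illustrative choices, not part of the notes.

```python
def normalize(M):
    """Subtract each user's average, then each item's average, from nonblank entries.

    M is a list of rows (users) over columns (items); None marks a blank.
    Returns the normalized matrix plus the averages that were subtracted.
    """
    def mean(values):
        values = [v for v in values if v is not None]
        return sum(values) / len(values) if values else 0.0

    # Step 1: subtract the average rating of user i from every nonblank in row i.
    user_avg = [mean(row) for row in M]
    N = [[None if v is None else v - user_avg[i] for v in row]
         for i, row in enumerate(M)]

    # Step 2: subtract the average of item j, computed on the modified matrix.
    item_avg = [mean(col) for col in zip(*N)]
    N = [[None if v is None else v - item_avg[j] for j, v in enumerate(row)]
         for row in N]
    return N, user_avg, item_avg
```

A prediction made on the normalized matrix has to be shifted back by adding `user_avg[i] + item_avg[j]` before it is reported as a rating.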

Initializing $U$ and $V$.
choice: give every element of $UV$ the value $a$, the average of the nonblank elements of $M$.
$\implies$ every element of $U$ and $V$ should be $\sqrt{a/d}$,
where $d$ is the length of the short sides of $U$ and $V$ (the number of columns of $U$, which equals the number of rows of $V$).
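
To see why this value works, write out one element of the product when every entry of $U$ and $V$ equals $\sqrt{a/d}$ (the indices $i$, $j$, $k$ are introduced here only for the check):

$$
(UV)_{ij} \;=\; \sum_{k=1}^{d} u_{ik}\, v_{kj} \;=\; d \cdot \sqrt{\frac{a}{d}} \cdot \sqrt{\frac{a}{d}} \;=\; a .
$$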

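the local minima include the global minimum, but a single run may stop at a worse one; to improve the odds of reaching the global minimum, try many starting points or search paths: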

vary the initial values of $U$ and $V$:
perturb the value $\sqrt{a/d}$ randomly.
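
A hedged sketch of the perturbed initialization, assuming $a \ge 0$ and a uniform perturbation. The notes do not fix the distribution, so the `spread` parameter, the uniform draw, and the name `init_uv` are assumptions made for illustration.

```python
import math
import random

def init_uv(a, n, m, d, spread=0.1, seed=None):
    """Build U (n x d) and V (d x m) with entries near sqrt(a / d).

    a      : average of the nonblank elements of M (assumed non-negative)
    spread : half-width of the uniform perturbation (an illustrative choice)
    seed   : different seeds give different starting points
    """
    rng = random.Random(seed)
    base = math.sqrt(a / d)

    def entry():
        # Perturb the common starting value by a small random amount.
        return base + rng.uniform(-spread, spread)

    U = [[entry() for _ in range(d)] for _ in range(n)]
    V = [[entry() for _ in range(m)] for _ in range(d)]
    return U, V
```

Calling `init_uv` with several seeds yields several starting points; the run with the lowest final RMSE can then be kept.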

vary the way we seek the optimum.

Performing the Optimization.
different optimization path:
choose a permutation of the elements and follow that order for every round.
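
One way to fix such a visiting order, sketched under the assumption that an element is identified by the tuple `(matrix name, row, column)`. The function name `visiting_order` and this encoding are hypothetical.

```python
import random

def visiting_order(n, m, d, seed=None):
    """Return one random permutation of all element positions of U (n x d) and V (d x m).

    The same list is meant to be reused in every round of the optimization.
    """
    positions = [("U", r, s) for r in range(n) for s in range(d)]
    positions += [("V", r, s) for r in range(d) for s in range(m)]
    # Shuffle once; a different seed gives a different optimization path.
    random.Random(seed).shuffle(positions)
    return positions
```

Each round would then loop over the returned list and optimize the element at each position in turn.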

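Gradient Descent $\to$ stochastic gradient descent: when $M$ is too large to visit every nonblank element in each round, use only a randomly chosen fraction of the data.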

Converging to a Minimum.
track the improvement in RMSE obtained in each round of optimization.

stop conditions:

stop when the improvement in RMSE over one round falls below a threshold.

variant: stop when the maximum improvement from optimizing any single element during a round falls below a threshold.
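
A small sketch of both stopping tests, assuming $M$ is a list of rows with `None` for blanks and RMSE is taken over the nonblank elements only. The helper names and the default threshold are illustrative assumptions.

```python
import math

def rmse(M, U, V):
    """Root-mean-square error between M and UV over the nonblank elements of M."""
    total, count = 0.0, 0
    for i, row in enumerate(M):
        for j, m_ij in enumerate(row):
            if m_ij is None:
                continue  # blanks do not contribute to the error
            p_ij = sum(U[i][k] * V[k][j] for k in range(len(V)))
            total += (m_ij - p_ij) ** 2
            count += 1
    return math.sqrt(total / count)

def should_stop(prev_rmse, curr_rmse, threshold=1e-4):
    """First test: the RMSE improvement over one full round is below the threshold."""
    return prev_rmse - curr_rmse < threshold

def should_stop_max(element_improvements, threshold=1e-4):
    """Variant: the largest improvement from any single element in the round is below the threshold."""
    return max(element_improvements) < threshold
```

In a training loop, `rmse` would be evaluated once per round and compared with the previous round's value; for the variant, the improvement after each single-element update is recorded and the list is checked at the end of the round.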