%------------------------------------------------------------------------------
\subsection{Measuring prediction accuracy}
%------------------------------------------------------------------------------

A common performance metric for probabilistic approaches is the maximum data
likelihood or approximations to it such as the \ac{bic} (see
\S\ref{sec:hmm:structure_learning}). However, for our particular application,
this metric has the drawback of lacking any geometric interpretation.
Intuitively, we would like to know \emph{how far} the predicted state was from
the real one. Hence, we have preferred to measure the performance of our
algorithm in terms of the average error, computed as the expected distance
between the prediction for a time horizon $H$ and the effective observation
$O_{t+H}$
\begin{equation}
  \langle E \rangle = \sum_{i \in \states} P([S_{t+H}=i] \mid O_{1:t})
  \lVert O_{t+H} - \mu_i \rVert^{1/2}
  \label{eq:results:expected_distance}
\end{equation}
\noindent for a single time step. This measure may be generalized to a complete
data set containing $K$ observation sequences:

\begin{equation}
  \langle E \rangle =
  \frac{1}{K}
  \sum_{k = 1}^{K}
  \frac{1}{T^k - H}
  \sum_{t = 1}^{T^k - H}
  \sum_{i \in \states}
  P([S_{t+H}=i] \mid O_{1:t}^k)
  \lVert O^k_{t+H} - \mu_i \rVert^{1/2}
  \label{eq:results:expected_distance_general}
\end{equation}
It is worth noting that, as opposed to the standard approach in machine
learning of conducting tests using separate ``learning'' and ``testing'' data
sets, the experiments we have presented here use only a single data set. The
reason is that, since learning takes place after prediction, there is no need
for such a separation: every observation sequence is still ``unknown'' when
prediction takes place.