I'm working on a problem that I've cast as an HMM, except that unlike the "traditional" case where the transition probabilities $a(i,j) = p(s_t = j \,|\, s_{t-1}=i)$, emission probabilities $b(j,o) = p(o_t = o \,|\, s_t = j)$, and initial probabilities $\pi(j) = p(s_0 = j)$ are all just independent numbers, in my case they are (relatively complicated) functions of a common set of parameters. I'm reasonably sure this means I can't rely on Baum–Welch to find the optimal parameters (because Baum–Welch assumes the three sets of probabilities can be optimized independently), so I've tried directly optimizing the likelihood, computing it with the forward algorithm. However, this has proved slow (my sequence is extremely long) and somewhat unreliable (the likelihood surface is a bit weird looking). A friend suggested that both problems could be solved with an EM algorithm: forward–backward posteriors in the E-step, numerical optimization in the M-step.
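For concreteness, here's a minimal sketch of the direct approach I've been using (scaled forward algorithm for the log-likelihood; tabular probabilities stand in for my parametric functions, and all the names here are just my own conventions):

```python
import numpy as np

def forward_loglik(pi, A, B, obs):
    """Log-likelihood of an observation sequence via the scaled forward
    algorithm (discrete emissions, for illustration).
    pi: (m,) initial probs; A: (m, m) transition probs, A[i, j] = a(i, j);
    B: (m, k) emission probs, B[j, o] = b(j, o); obs: length-n symbol sequence.
    """
    alpha = pi * B[:, obs[0]]               # unnormalized forward probs at t = 0
    c = alpha.sum()                         # scaling constant to avoid underflow
    loglik = np.log(c)
    alpha /= c
    for t in range(1, len(obs)):
        alpha = (alpha @ A) * B[:, obs[t]]  # O(m^2) work per time step
        c = alpha.sum()
        loglik += np.log(c)
        alpha /= c
    return loglik
```

Each optimizer iteration re-runs this whole $O(nm^2)$ recursion, which is where the slowness comes from.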

However, I'm having a hell of a time figuring out exactly what the Q function should be. I believe it is
$$
Q(\theta\,|\, \theta^{\text{old}}) = \sum_j\log(\pi(j))\,p(s_0 = j \,|\, x, \theta^{\text{old}}) \\+ \sum_{t=1}^{n}\left[\sum_{i,j} \log(a(i,j))\,p(s_{t-1} = i, s_t = j \,|\, x, \theta^{\text{old}}) \right]\\+\sum_{t=0}^{n}\left[\sum_j\log(b(j,o_t))\,p(s_t = j \,|\, x, \theta^{\text{old}}) \right].
$$
However, it's not clear to me that this makes life any easier: it still takes $O(nm^2)$ time to evaluate, where $n$ is the length of the sequence and $m$ is the number of hidden states, and that's on top of the forward–backward pass needed to compute all the posterior probabilities for the E-step.
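To be explicit about what I mean by "evaluate": given the posteriors from forward–backward, each evaluation of the sum above looks roughly like this (the `gamma`/`xi` names and shapes are just my own convention, not anything standard):

```python
import numpy as np

def q_function(pi, A, B, obs, gamma, xi):
    """Evaluate Q(theta | theta_old) from fixed E-step posteriors.
    gamma: (n, m), gamma[t, j] = p(s_t = j | x, theta_old)
    xi: (n-1, m, m), xi[t, i, j] = p(s_t = i, s_{t+1} = j | x, theta_old)
    pi, A, B are the candidate parameters theta (tabular, for illustration).
    """
    q = np.sum(gamma[0] * np.log(pi))         # initial-state term
    q += np.sum(xi * np.log(A))               # transition term: O(n m^2)
    q += np.sum(gamma * np.log(B[:, obs].T))  # emission term: O(n m)
    return q
```

The M-step would then numerically maximize this over the common parameters (with `pi`, `A`, `B` recomputed from them at each trial point), which is why I don't see where the savings come from.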

So I feel like I must be doing something wrong, or else missing the point of using the EM algorithm rather than directly optimizing the likelihood in this problem.