We address the problem of learning discrete hidden Markov models from very long sequences of observations. Incremental versions of the Baum-Welch algorithm that approximate the β-values used in the backward procedure are commonly used for this problem, since their memory complexity is independent of the sequence length. We introduce an improved incremental Baum-Welch algorithm with a new backward procedure that approximates the β-values based on a one-step lookahead in the training sequence. We justify the new approach analytically, and report empirical results showing that it converges faster than previous incremental algorithms.
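To make the idea concrete, here is a minimal sketch (not the paper's exact procedure) of a one-step lookahead approximation to the backward variables of a discrete HMM. The exact backward recursion is β_t(i) = Σ_j A[i,j] · B[j, o_{t+1}] · β_{t+1}(j); an incremental variant can truncate this recursion after a single step, treating the unknown β_{t+1}(j) as constant, so that only the next observation is needed. The function name, toy parameters, and the constant-1 truncation are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def lookahead_beta(A, B, o_next):
    """One-step lookahead approximation of the backward variables.

    Approximates beta_t(i) ~ sum_j A[i, j] * B[j, o_next], i.e. the exact
    backward recursion with beta_{t+1}(j) replaced by 1 (an illustrative
    truncation, not the paper's exact backward procedure).
    """
    return A @ B[:, o_next]

# Toy 2-state, 2-symbol model (illustrative parameters only).
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])   # transition matrix, A[i, j] = P(next=j | state i)
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # emission matrix, B[j, k] = P(obs=k | state j)

beta_hat = lookahead_beta(A, B, o_next=1)
print(beta_hat)  # one approximate backward value per state
```

Because the approximation looks only one symbol ahead, it can be computed online as each observation arrives, which is what makes the memory cost independent of the sequence length.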