Abstract

We demonstrate that inference-based, goal-directed behavior can be
generated by exploiting the temporal gradients in a recurrent neural network (RNN). The
RNN learns a dynamic sensorimotor forward model. Once the RNN is trained, it can
be used to execute active-inference-based, goal-directed policy optimization. The
internal neural activities of the trained RNN essentially model the predictive
state of the controlled entity. The implemented optimization process projects the
neural activities into the future via the RNN recurrences following a tentative
sequence of motor commands (encoded in neurons akin to recurrent parametric
biases). This sequence is adapted by back-projecting the error between the
forward-projected hypothetical states and desired (goal-like) system states onto
the motor commands. A few cycles of forward projection and goal-based error
backpropagation yield the sequence of motor commands that controls the dynamical
systems. As an example, we show that a trained RNN model can be used to
effectively control a quadrocopter-like system.
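The optimization loop sketched in the abstract (forward projection through a learned model, then back-projection of the goal error onto the motor commands) can be illustrated in miniature. The sketch below is not the paper's implementation: it substitutes a known two-dimensional linear forward model for the trained RNN, uses a single goal state at the end of the horizon, and the names `forward_model` and `infer_commands` are invented for illustration. Only the gradient-based adaptation of a tentative command sequence mirrors the described procedure.

```python
import numpy as np

# Toy linear sensorimotor forward model (stand-in for the trained RNN).
# A: state transition, b: influence of a scalar motor command u.
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
b = np.array([0.0, 0.1])

def forward_model(state, u):
    """Predict the next state from the current state and motor command u."""
    return A @ state + b * u

def infer_commands(state0, goal, horizon=20, iters=300, lr=2.0):
    """Adapt a tentative command sequence by repeated forward projection
    and back-projection of the goal error onto the commands."""
    u = np.zeros(horizon)  # tentative motor-command sequence
    for _ in range(iters):
        # Forward projection: roll the model out along the command sequence.
        states = [state0]
        for t in range(horizon):
            states.append(forward_model(states[t], u[t]))
        # Error between the projected final state and the desired goal state;
        # this is the gradient of 0.5 * ||error||^2 w.r.t. the final state.
        grad = states[-1] - goal
        # Back-projection: chain the error gradient backwards through the
        # recurrence and take a gradient step on each motor command.
        for t in reversed(range(horizon)):
            u[t] -= lr * (b @ grad)  # sensitivity of final state to u[t]
            grad = A.T @ grad        # propagate through one model step
    return u

state0 = np.zeros(2)
goal = np.array([1.0, 0.0])
cmds = infer_commands(state0, goal)
```

Rolling the model forward with the inferred commands drives the final state toward the goal; with a trained RNN, the same back-projection would run through the network's recurrent weights instead of the fixed matrices above.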