Abstract

The current trend towards integrating software agents into safety-critical systems that must operate in uncertain environments, such as drones, autonomous cars and medical devices, creates the need for on-line detection of unexpected behavior. In this work, on-line monitoring is carried out by comparing environmental state transitions with prior beliefs that describe optimal behavior. The agent policy is computed analytically using linearly solvable Markov decision processes. Active inference based on these prior beliefs allows a monitor to proactively rehearse future agent actions on-line over a rolling horizon, generating expectations against which surprising behavior can be discovered. A Bayesian surprise metric based on twin Gaussian processes is proposed to measure the difference between prior and posterior beliefs about state transitions in the agent environment. Using a sliding window of sampled data, beliefs are updated a posteriori by comparing a sequence of observed state transitions with those predicted by the optimal policy. An artificial pancreas for diabetic patients is used as a representative example.
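The Bayesian surprise idea can be illustrated with a minimal sketch. This is not the twin-Gaussian-process metric proposed in this work. It is a deliberately simplified stand-in that assumes several things: the monitor is already given the next state predicted by the optimal policy, each transition is reduced to a scalar residual (observed minus predicted next state), and the belief about the mean residual is a Gaussian with a zero-centered prior. This prior encodes "the agent behaves as the optimal policy predicts". The belief is updated with the residuals in a sliding window, and surprise is scored as the Kullback-Leibler divergence from posterior to prior. The class `SurpriseMonitor`, its parameters `window`, `prior_var` and `noise_var`, and all numeric values are hypothetical and chosen only for illustration.

```python
import math
from collections import deque


def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) for scalar Gaussians, in nats."""
    return 0.5 * (math.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)


class SurpriseMonitor:
    """Hypothetical sliding-window Bayesian surprise on transition residuals.

    Simplification (assumption): a scalar conjugate Gaussian model replaces
    the twin Gaussian processes described in the abstract.
    """

    def __init__(self, window=10, prior_var=1.0, noise_var=0.25):
        # Oldest residual is dropped automatically once the window is full.
        self.residuals = deque(maxlen=window)
        self.prior_var = prior_var  # prior belief: mean residual ~ N(0, prior_var)
        self.noise_var = noise_var  # assumed noise variance of each residual

    def update(self, observed_next, predicted_next):
        """Add one transition and return the current Bayesian surprise."""
        self.residuals.append(observed_next - predicted_next)
        n = len(self.residuals)
        # Conjugate Gaussian update of the belief about the mean residual.
        post_prec = 1.0 / self.prior_var + n / self.noise_var
        post_var = 1.0 / post_prec
        post_mean = post_var * sum(self.residuals) / self.noise_var
        # Surprise = KL(posterior || prior): how far the window moved the belief.
        return gaussian_kl(post_mean, post_var, 0.0, self.prior_var)


# Usage: compare a transition stream that matches the optimal-policy
# predictions with one that drifts away from them (illustrative numbers).
nominal = SurpriseMonitor(window=5)
faulty = SurpriseMonitor(window=5)
for _ in range(8):
    s_ok = nominal.update(observed_next=1.0, predicted_next=1.0)
    s_bad = faulty.update(observed_next=2.0, predicted_next=1.0)
```

With a full window of five residuals, the nominal stream scores about 1.05 nats and the drifting stream about 1.50 nats. Because the posterior always narrows as data arrive, surprise in this sketch is positive even for nominal behavior. Unexpected behavior therefore appears as surprise above the nominal baseline for the same window length, not as a nonzero value.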