The most common way of evaluating a computational model is to see whether it fits the empirical data well, yet recent literature on theory testing and model selection criticizes the assumption that a good fit is strong evidence for a model's validity. This paper presents a case study from music cognition (modeling the ritardandi in music performance) and compares two families of computational models (kinematic and perceptual) using three different model selection criteria: goodness-of-fit, model simplicity, and the degree of surprise in the predictions. In light of what counts as strong evidence for a model's validity (namely, that it makes limited-range, non-smooth, and relatively surprising predictions), the perception-based model is preferred over the kinematic model.