Affective computing has strong ties with literature and film studies, for example in text sentiment analysis and the affective tagging of movies. In this work we report recent findings on identifying highlights in movies based on the synchronization of spectators' physiological and behavioral signals. The proposed architecture uses dynamic time warping to measure the distance between the multimodal signals of pairs of spectators. The reported results suggest that this distance is indicative of the existence and dynamics of aesthetic moments in movies.
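
As an illustration of the pairwise distance described above, the following is a minimal sketch of textbook dynamic time warping between two multimodal sequences. It is not the architecture of this work: the function name `dtw_distance`, the Euclidean local cost, and the toy two-channel samples are all assumptions made for the example.

```python
import math


def dtw_distance(x, y):
    """Textbook DTW between two sequences of equal-dimension feature vectors.

    Assumption: the local cost is the Euclidean distance between samples;
    the source does not state which local cost is used.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(x[i - 1], y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in x only
                cost[i][j - 1],      # advance in y only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# Hypothetical two-channel samples (e.g. one physiological and one
# behavioral feature per time step) for two spectators.
spectator_a = [(0.0, 1.0), (1.0, 1.0), (2.0, 0.0)]
spectator_b = [(0.0, 1.0), (0.0, 1.0), (1.0, 1.0), (2.0, 0.0)]

# spectator_b is spectator_a with its first sample repeated, so the
# warping absorbs the delay and the distance is zero.
print(dtw_distance(spectator_a, spectator_b))
```

In this reading, a small distance over a time window would correspond to strongly synchronized spectators. How windows are chosen and how the distance is mapped to highlights are not specified here and are left open in the sketch.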