A constraint-based approach to uniformly combining information from multiple representations and sources of sensory data is described. The approach is important to research in intermediate grouping, knowledge-based model matching, and information fusion. The techniques presented extend the capabilities of an earlier system that applied constraints to attributes of single types of extracted image events called tokens. Relational measures are defined between symbolic tokens so that sets of tokens across representations can be selected and grouped on the basis of constraint functions applied to these relational measures. Since typical low-level representations involve hundreds or thousands of tokens in each representation, even binary relational measures can involve very large numbers of token pairs. Control strategies for ordering and filtering tokens, based upon constraints on token attributes and token relationships, can be formed to reduce the computation involved in producing token aggregations. The system is demonstrated using region and line data and an associated set of relational measures. The approach can be naturally extended to include tokens extracted from motion, stereo, and range data.
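The filtering strategy summarized above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `Token` class, the centroid-distance relational measure, and the specific constraint functions are all hypothetical stand-ins. The point it shows is the two-stage control strategy: unary attribute constraints prune each token set before the quadratic pairwise stage, and a binary constraint on a relational measure then selects the surviving cross-representation pairs.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class Token:
    kind: str    # representation type, e.g. "region" or "line"
    attrs: dict  # token attributes, e.g. {"area": 120, "centroid": (10, 10)}

def rel_distance(a, b):
    # Hypothetical binary relational measure: Euclidean distance
    # between the centroids of two tokens.
    (ax, ay), (bx, by) = a.attrs["centroid"], b.attrs["centroid"]
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

def group_pairs(regions, lines, attr_filter, rel_constraint):
    # Stage 1: attribute filtering. Unary constraints discard tokens
    # up front, shrinking both sets before any pairs are formed.
    regions = [t for t in regions if attr_filter(t)]
    lines = [t for t in lines if attr_filter(t)]
    # Stage 2: relational filtering. Only cross-representation pairs
    # whose relational measure satisfies the binary constraint survive.
    return [(r, l) for r, l in product(regions, lines)
            if rel_constraint(rel_distance(r, l))]

regions = [Token("region", {"area": 200, "centroid": (10, 10)}),
           Token("region", {"area": 180, "centroid": (50, 50)})]
lines = [Token("line", {"length": 14, "centroid": (12, 11)})]

# Pair a line with any region whose centroid lies within 5 pixels.
pairs = group_pairs(regions, lines,
                    attr_filter=lambda t: True,
                    rel_constraint=lambda d: d < 5.0)
```

With hundreds of tokens per representation, the attribute-filtering stage is what keeps the pairwise stage tractable, since the number of candidate pairs grows with the product of the surviving set sizes.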