<p><b>Abstract</b>—Recognizing the behaviors of multiple objects in nonsegmented image sequences is a difficult problem because most motion recognition methods proposed so far share the limitation of the <it>single-object assumption</it>. With existing methods, the problem can be solved only by <it>bottom-up image sequence segmentation followed by sequence classification</it>. This straightforward approach depends entirely on bottom-up segmentation, which is easily corrupted by occlusions and outliers. This paper presents a novel approach to the task that requires no bottom-up segmentation. Our approach is based on <it>assumption generation and verification</it>: feasible assumptions about the present behaviors, consistent with the input images and the behavior models, are dynamically generated and then verified by finding their supporting evidence in the input images. This is realized by an architecture called the <it>selective attention model</it>, which consists of a <it>state-dependent event detector</it> and an <it>event sequence analyzer</it>. The former detects image variations (events) within limited image regions (focusing regions) and is therefore unaffected by occlusions and outliers. The latter analyzes the sequences of detected events and activates all feasible states, which represent assumptions about multiobject behaviors. In this architecture, event detection can be regarded as a verification process for the generated assumptions because each focusing region is determined by its corresponding assumption. The architecture is sound in that all feasible assumptions are generated; however, these redundant assumptions introduce ambiguity into the recognition result. Hence, we further extend the system by introducing 1) <it>colored-token propagation</it> to discriminate between different objects in the state space and 2) integration of <it>multiviewpoint image sequences</it> to disambiguate the single-view recognition results. 
Extensive experiments on human behavior recognition in real-world environments demonstrate the soundness and robustness of our architecture. </p>