The ability to detect and understand other people{\textquoteright}s social interactions is a fundamental part of the human visual experience that develops early in infancy and is shared with other primates. However, the neural computations underlying this ability remain largely unknown. Is the detection of social interactions a rapid perceptual process or a slower post-perceptual inference? Here we used magnetoencephalography (MEG) decoding and computational modeling to ask whether social interactions can be detected via fast, feedforward processing. During MEG recording, subjects viewed snapshots of visually matched real-world scenes containing a pair of people who were either engaged in a social interaction or acting independently. The presence versus absence of a social interaction could be read out from subjects{\textquoteright} MEG data spontaneously, even while subjects performed an orthogonal task. This readout generalized across different scenes, revealing abstract representations of social interactions in the human brain. These representations, however, did not come online until late, at 300 ms after image onset, well after the time period of feedforward visual processes. In a second experiment, we found that social interaction readout occurred at this same latency even when subjects performed an explicit task detecting social interactions. Consistent with these latency results, a standard feedforward deep neural network did not contain an abstract representation of social interactions at any model layer. We further showed that MEG responses distinguished between different types of social interactions (mutual gaze vs. joint attention) even later, around 500 ms after image onset. Taken together, these results suggest that the human brain spontaneously extracts the presence and type of others{\textquoteright} social interactions, but does so slowly, likely relying on iterative top-down computations.

We experience our visual environment as a seamless, immersive panorama. Yet, each view is discrete and fleeting, separated by expansive eye movements and discontinuous views of our spatial surroundings. How are discrete views of a panoramic environment knit together into a broad, unified memory representation? Regions of the brain{\textquoteright}s {\textquotedblleft}scene network{\textquotedblright} are well poised to integrate retinal input and memory [1]: they are visually driven [2, 3] but also densely interconnected with memory structures in the medial temporal lobe [4]. Further, these regions harbor memory signals relevant for navigation [5{\textendash}8] and adapt across overlapping shifts in scene viewpoint [9, 10]. However, it is unknown whether regions of the scene network support visual memory for the panoramic environment outside of the current field of view and, further, how memory for the surrounding environment influences ongoing perception. Here, we demonstrate that specific regions of the scene network{\textemdash}the retrosplenial complex (RSC) and occipital place area (OPA){\textemdash}unite discrete views of a 360{\textdegree} panoramic environment, both current and out of sight, in a common representational space. Further, individual scene views prime associated representations of the panoramic environment in behavior, facilitating subsequent perceptual judgments. We propose that this dynamic interplay between memory and perception plays an important role in weaving the fabric of continuous visual experience.