Contextual Cueing (CC) is the powerful ability of humans to extract regularities from a seemingly chaotic environment and to use this knowledge to assist visual search. Extensive research has shown that CC is a robust and ubiquitous phenomenon, but it is still unclear what exactly is being learned and what constitutes a "context". Prominent models of CC have typically focused on how people learn regularities in spatial configuration, and accordingly most of the research has used simplified, meaningless search stimuli. Nevertheless, the world we live in is filled with meaningful, heterogeneous objects, and there is ample behavioral and neuropsychological evidence that object identities ("what") frequently interact with spatial information ("where"). What is the role of "what" processing in CC? In this study I tested CC with everyday objects and found that the mere repetition of arbitrary spatial configurations was not sufficient to facilitate search when identity information varied across trials. This finding was replicated in three experiments and thus refutes the view that the repetition of spatial configuration is sufficient for context learning. At the same time, the results revealed that the repetition of spatial configuration might be necessary for context learning, as no learning was observed when only identity information was repeated. Instead, context learning was found only when both "what" and "where" information remained constant across trials. Moreover, similar results were obtained when CC was tested in hybrid search tasks, in which people looked for multiple possible targets ("where are the keys, wallet and phone?"). Together these findings challenge current models of CC as well as the ecological validity of standard lab-based CC procedures, and indicate that although visual context learning might be highly specific, this learning is robust and not modulated by memory load.