According to usage-based approaches to language acquisition, linguistic knowledge is represented in the form of constructions – form-meaning pairings – at multiple levels of abstraction and complexity. Syntactic knowledge is assumed to emerge through the gradual abstraction of lexically specific, item-based linguistic knowledge. In this article, we explore how the gradual emergence of a network of constructions of varying complexity can be modeled computationally. Linguistic knowledge is learned by observing natural language utterances in ambiguous contexts. To determine the meanings of constructions from these ambiguous contexts, we rely on the principle of cross-situational learning. While this mechanism has been implemented in several computational models, these models typically focus on learning mappings between words and referents. In contrast, our model shows how cross-situational learning can be applied consistently to learn form-meaning correspondences that go beyond such simple word-referent mappings.