Integrating physical, virtual and textual conceptual spaces into a unified new media artifact can be challenging. In our work we embrace a constant state of reconceptualization as we transition between the physical aspects of a performance and AI-based manipulations that are both visual (art abstraction) and semantic. In these experiments the AI system works with meaning in the form of semantic keywords that describe a work and its emotions. To fully explore the capabilities of these new immersive, virtual and semantic technologies, we can no longer rely on traditional creation, editing and production techniques, but must develop new practices and styles. We propose a new framework for creating what we refer to as ‘multimodal media-spaces.’ These media-spaces include interactive and video-based work that seeks to combine physical, virtual and textual entities. This paper details our framework and its use in several juried multimedia pieces. In our work we use 360 VR and multi-camera filming, movement performance by amateurs and professionals, and distortions of space and time using AI cinematic projections. We then process this footage using AI deep learning systems we have written, which take text-based cues and artfully abstract the performance footage to create layers of emergent meaning. We explore and define this framework in relation to previous hybrid multimodal media-spaces we have created, and describe how it has led to the development of existing works and how other media artists can use it to explore emerging physical, virtual and textual conceptual multimodal media-spaces.