Issues of syntax have dominated research in multimedia information systems (MMISs), with video developing as a technology of images and audio as one of signals. But when we use video and audio, we do so for their content. This is a semantic issue. Current research in multimedia on semantic content-based models has adopted a structure-oriented approach, where video and audio content is described on a frame-by-frame or segment-by-segment basis (where a segment is an arbitrary set of contiguous frames). This approach has failed to cater for semantic aspects, and thus has not been fully effective when used within an MMIS. The research undertaken for this thesis reveals seven semantic aspects of video and audio: (1) explicit media structure; (2) objects; (3) spatial relationships between objects; (4) events and actions involving objects; (5) temporal relationships between events and actions; (6) integration of syntactic and semantic information; and (7) direct user-media interaction. This thesis develops a full-scale semantic content-based model that caters for the above seven semantic aspects of video and audio. To achieve this, it uses an entities of interest approach, instead of a structure-oriented one, where the MMIS integrates relevant semantic content-based information about video and audio with information about the entities of interest to the system, e.g. mountains, vehicles, employees. A method for developing an interactive MMIS that encompasses the model is also described. Both the method and the model are used in the development of ARISTOTLE, an interactive instructional MMIS for teaching young children about zoology, in order to demonstrate their operation.