This presentation introduces my research group's work on multimedia &
multimodal corpus integration, for which Professor Gu (2006, 2009) has
formulated a conceptual model. The present procedure is
to build a data model according to the conceptual model and integrate
different data types. The research is made more demanding by the fact
that the Spoken Chinese Corpus of Situated Discourse (SCCSD), which we
established, involves three basic types of data: orthographic transcripts,
audio streams, and video streams, all recorded in everyday scenarios (and
not in sound-proof studios). In other words, the integration has to cope
with heterogeneous data that is sometimes considered
"impure" or "noisy". With this in mind, my own work plan falls into
four parts: (1) to demonstrate the conceptual model and data model
for the different multimedia types; (2) to identify where the problem
lies; (3) to introduce the procedure for constructing a multimedia &
multimodal corpus; and (4) to show certain applications based on the
multimedia & multimodal corpus.