The
Text REtrieval Conference (TREC) is a workshop series designed to encourage research on text retrieval for realistic applications by providing large test collections, uniform scoring procedures and a forum for organizations
interested in comparing results. In recent years the conference has contained one main task and a set of additional tasks called tracks. The main task investigates the performance of systems that search a static set of
documents using new questions. This task is similar to how a researcher might use a library - the collection is known, but the questions likely to be asked are not known. The tracks focus research on problems related to the main
task, such as retrieving documents written in a variety of languages using questions in a single language (cross-language retrieval), retrieving documents from very large (100GB) document collections, and evaluating retrieval performance with
humans in the loop (interactive retrieval). Taken together, the tracks represent the majority of the research performed in the most recent TRECs, and they keep TREC a vibrant research program by encouraging research in new areas of
information retrieval.

The three most recent TRECs have had a track on Spoken Document Retrieval (SDR), that is, on content-based retrieval of excerpts from recordings of speech. In practice, SDR is accomplished by using a
combination of automatic speech recognition and information retrieval technologies. A speech recognizer is applied to an audio stream and generates a time-marked transcription of the speech. The transcript is then indexed and
searched by a retrieval system. The result returned for a query is a list of temporal pointers to the audio stream ordered by decreasing similarity between the content of the speech being pointed to and the query.
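As a rough sketch of this pipeline's retrieval half, the toy index below treats each time-marked story transcript as a document and ranks temporal pointers by a tf-idf overlap score. The function names, data layout, and the simplified cosine-style scoring are illustrative assumptions, not the specification any track system was required to follow:

```python
import math
from collections import Counter

def tf_idf_index(stories):
    """Build a toy tf-idf index over transcribed stories.

    `stories` is a list of (start_time, transcript_text) pairs; the
    time stamp stands in for the temporal pointer into the audio."""
    n = len(stories)
    df = Counter()
    for _, text in stories:
        df.update(set(text.lower().split()))
    idf = {term: math.log(n / count) for term, count in df.items()}
    index = []
    for start, text in stories:
        tf = Counter(text.lower().split())
        index.append((start, {t: f * idf[t] for t, f in tf.items()}))
    return index

def search(index, query):
    """Return (score, start_time) pairs ordered by decreasing
    similarity between the transcript content and the query."""
    q = Counter(query.lower().split())
    def score(vec):
        overlap = sum(w * q[t] for t, w in vec.items() if t in q)
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return overlap / norm
    return sorted(((score(vec), start) for start, vec in index), reverse=True)
```

A real system would add stemming, stopword removal, and a stronger weighting scheme, but the shape of the output is the same: a ranked list of pointers back into the audio stream.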

The aim of the
TREC SDR track is to provide the infrastructure required to enable research on the SDR problem. Large collections of multimedia documents are already being assembled, and content-based access to all of the information is required.
While the component speech recognition and information retrieval technologies are mature enough to expect usable SDR systems for some domains, there remain a number of research issues. The track fosters research on the development
of large-scale, near-real-time, continuous speech recognition technology as well as on retrieval technology that is robust in the face of input errors. More importantly, the track provides a venue for investigating hybrid systems
that may be more effective than simple stovepipe combinations.

The three SDR tracks used different corpora, but for each track the corpus was a collection of broadcast news stories that were made available in several different
forms. A hand-produced transcript of the entire corpus was the "reference" (ground truth) transcript. The reference transcript contained story boundaries that defined the documents in the collection; all other versions of the
corpus used these same human-defined boundaries. "Baseline" transcripts were produced using one particular speech recognizer configured for different levels of recognition accuracy. In addition, many participants ran their own
speech recognition system against audio files to produce their own "speech" transcripts.

The National Institute of Standards and Technology (NIST) provided a set of written information needs (called "topics" in TREC) that were
used to search each version of the transcripts. The different versions of the transcripts allowed participants to observe the effect of recognizer errors on their retrieval strategy. The different speech transcripts provided a
comparison of how different recognition strategies affect retrieval. To make this comparison as complete as possible, participants were also encouraged to retrieve using other groups' speech transcripts.

The TREC-6 (1997) SDR
track was the first formal evaluation of SDR technology. The corpus was 50 hours of news broadcasts, an enormous amount of audio to recognize at the time, but a tiny IR document collection of only 1451 stories. The task in the
track was "known item" searching using 49 test topics. The goal in known item searching is to retrieve a single specific document rather than a set of relevant documents. Although the TREC-6 track was primarily a feasibility
experiment, it did demonstrate that speech recognition and IR technologies were sufficiently advanced to do a credible job of retrieving specific documents. The better systems were able to retrieve the target document at rank 1
over 70% of the time using their own speech transcripts, compared with a best of 79% on the reference transcripts. Search performance was a bigger factor in the overall results than recognition accuracy, although
participants that had both speech and IR expertise obtained the best results. These promising results were considered preliminary, however, because the known item task is diagnostically limited and the collection size was so small.
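The rank-1 figures above come from known-item scoring: each topic has exactly one target document, and a run is credited by where that target lands in the ranking. The helper below (a hypothetical sketch, not NIST's actual evaluation code) computes the fraction of topics with the target at rank 1 along with mean reciprocal rank:

```python
def known_item_scores(runs, targets):
    """Score a known-item run.

    `runs` maps topic id -> ranked list of document ids (best first);
    `targets` maps topic id -> the single known item for that topic.
    Returns (fraction of topics with the target at rank 1,
             mean reciprocal rank over all topics)."""
    at_rank_1 = 0
    rr_total = 0.0
    for topic, ranking in runs.items():
        target = targets[topic]
        if ranking and ranking[0] == target:
            at_rank_1 += 1
        if target in ranking:
            # reciprocal rank: 1 for rank 1, 1/2 for rank 2, ...
            rr_total += 1.0 / (ranking.index(target) + 1)
    n = len(runs)
    return at_rank_1 / n, rr_total / n
```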

The TREC-7 (1998) SDR track used the standard ad hoc retrieval task and an 87-hour, 2866-story broadcast news corpus. A team of three NIST assessors created 23 test topics and judged the retrieved documents for relevance after
the retrieval results were submitted to NIST. Once again, the overall performance of the systems was quite good, with only a very gradual decline in retrieval performance as recognition errors rose. Nonetheless, analysis of the
retrieval results when participants used each other's speech transcripts did show a correlation between recognition word error rate and retrieval performance, a correlation that was not present in the TREC-6 known item search
results. The correlation is stronger when recognizer error is computed over content-based words (for example, named entities) rather than all words.
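Word error rate, the recognition measure behind that correlation, is standardly defined as substitutions plus deletions plus insertions over the number of reference words, found by aligning the recognizer output against the reference transcript. The implementation below is a sketch of that definition, not the NIST scoring tool:

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions)
    divided by the number of reference words, computed from a
    word-level Levenshtein alignment."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)
```

A content-word error rate of the kind mentioned above could be approximated by filtering both word lists to content terms (for example, named entities) before the alignment.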

The 1999 TREC-8 SDR track was designed to determine if the technology scaled for
realistically large spoken document collections. As such, the track used a subset of the TDT-2 corpus consisting of 557 hours and almost 22,000 stories. This amount of audio was large enough that it required recognition algorithms
that worked in close to real time, as opposed to the 40- or even 300-times-real-time algorithms that were common in other speech recognition evaluations. In addition to the test conditions supported in TREC-7, a story-boundaries-unknown condition was added to provide a more realistic picture of how systems would perform if given a set of continuous, unsegmented recording streams to recognize and search. Despite the required focus on recognition speed, the
recognition error rates improved from 1998. The retrieval results were comparable to TREC-7, suggesting that the technology scaled for a collection almost an order of magnitude larger with no loss in accuracy. The rate at which
retrieval performance degrades due to increasing recognition errors also appears to be independent of collection size. Retrieving from unsegmented streams is a harder problem, however. Retrieval effectiveness for the unknown
boundary condition was always worse than the corresponding run using known story boundaries.

The TREC SDR Track has provided an infrastructure for the development and evaluation of spoken document retrieval technology and a
common forum for the exchange of knowledge between the speech recognition and information retrieval research communities. It has also provided objective, demonstrable proof that the technology can be successfully applied to
realistic audio collections. The track is scheduled to continue in TREC-9 and beyond, with an eventual goal of expanding the retrieval task to include other media types.

TREC-7 (1998)

AT&T Labs Research
Carnegie Mellon University (2 groups)
Defense Evaluation and Research Agency
Royal Melbourne Institute of Technology/University of Melbourne/CSIRO
TNO-TPD TU-Delft
University of Cambridge
University of Maryland
University of Massachusetts
University of Sheffield/University of Cambridge/SoftSound/ICSI
U.S. Department of Defense

TREC-8 (1999)

AT&T Labs Research
Carnegie Mellon University
IBM T.J. Watson Research Center
LIMSI-CNRS
Royal Melbourne Institute of Technology
SUNY Buffalo
TwentyOne
University of Cambridge
University of Massachusetts
University of Sheffield/University of Cambridge/SoftSound/ICSI

Ellen Voorhees and John Garofolo are with the National Institute of Standards and Technology (NIST) in Gaithersburg, MD 20899.