Our lives are full of memorable and important moments, as well as important items of information. The last few years have seen the proliferation of digital devices intended to support prosthetic memory (PM), to help users recall experiences and conversations and retrieve personal information. We nevertheless have little systematic understanding of when and why people might use such devices in preference to their own organic memory (OM). Although OM is fallible, it may be more efficient than accessing information from a complex PM device. We report a controlled lab study that investigates when and why people use PM and OM. We found that PM use depended on users' evaluation of the quality of their OM, as well as on PM device properties. In particular, we found that users trade off Accuracy and Efficiency, preferring rapid access to potentially inaccurate information over laborious access to accurate information. We discuss the implications of these results for future PM design and theory. Rather than replacing OM, future PM designs need to focus on allowing OM and PM to work in synergy.

...ow us to make sense of the controversy surrounding the utility of PM in the form of pen and paper notes. Some researchers have argued (counter-intuitively) that notes are of little use as memory aids [8,18], while others suggest they are useful [10]. Our results suggest a resolution to this dispute. We found that notes may be useful short-term, but have little long-term utility, becoming no better than ...

The widespread availability of broadband connections has led to an increase in the use of Internet broadcasting (webcasting). Most webcasts are archived and accessed numerous times retrospectively. In the absence of transcripts of what was said, users have difficulty searching and scanning for specific topics. This research investigates user needs for transcription accuracy in webcast archives, and measures how the quality of transcripts affects user performance in a question-answering task and the overall user experience. We tested 48 subjects in a within-subjects design under four conditions: perfect transcripts, transcripts with 25% Word Error Rate (WER), transcripts with 45% WER, and no transcript. Our data reveal that speech recognition accuracy linearly influences both user performance and experience, show that transcripts with 45% WER are unsatisfactory, and suggest that transcripts having a WER of 25% or less would be useful and usable in webcast archives.
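The WER figures quoted in this abstract can be made concrete with a small sketch. It assumes the conventional definition of WER (word-level edit distance between a reference and a recognized transcript, divided by the number of reference words); the abstract does not say how the authors scored their transcripts, so the function name `word_error_rate` and the whitespace tokenization are illustrative assumptions only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("the quick brown fox", "the quick brown dog"))
```

Under this definition, a 25% WER corresponds to roughly one erroneous word in every four reference words. Because inserted words also count as errors, the value can exceed 100% for very poor transcripts.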

...pts, and to develop better ASR systems that deliver transcripts with lower WERs. Equally important, since ASR techniques that achieve close to 0% WER are not likely to be available in the near future [28], more studies are needed to understand users’ expectations from transcripts and to explore how imperfect transcripts should be integrated into a highly-interactive webcast system. Transcribing lectur...

Searching audio data can potentially be facilitated by the use of automatic speech recognition (ASR) technology to generate text transcripts which can then be easily queried. However, since current ASR technology cannot reliably generate 100% accurate transcripts, additional techniques for fluid browsing and searching of the audio itself are required. We explore the impact of transcripts of various qualities, dichotic presentation, and time-compression on an audio search task. Results show that dichotic presentation and reasonably accurate transcripts can assist in the search process, but suggest that time-compression and low accuracy transcripts should be used carefully. Author Keywords: dichotic listening, transcripts, audio time-compression.

The increased availability of broadband connections has recently led to an increase in the use of Internet broadcasting (webcasting). Most webcasts are archived and accessed numerous times retrospectively. One challenge to skimming and browsing through such archives is the lack of text transcripts of the webcast’s audio channel. This paper describes a procedure for prototyping an Automatic Speech Recognition (ASR) system that generates realistic transcripts of any desired Word Error Rate (WER), thus overcoming the drawbacks of both prototype-based and Wizard of Oz simulations. We used such a system in a user study showing that transcripts with WERs less than 25% are acceptable for use in webcast archives. As current ASR systems can only deliver, in realistic conditions, Word Error Rates (WERs) of around 45%, we also describe a solution for reducing the WER of such transcripts by engaging users to collaborate in a “wiki” fashion on editing the imperfect transcripts obtained through ASR.

...ines are also clickable, allowing users to cue the video playback to the corresponding location. As it is expected that such systems will not reach perfect or near perfect accuracy in the near future [6], we are also proposing alternative tools to reduce current WER levels to the 25% level determined acceptable by our study. For this, we have developed a collaborative tool that extends ePresence func...

As the use of Internet broadcasting (webcasting) increases, more webcasts will be archived and accessed numerous times retrospectively. One challenge in skimming and browsing through such archives is the lack of textual transcripts of the archived media's audio channel. Ideally, transcripts would be obtainable through Automatic Speech Recognition (ASR). However, current ASR systems can only deliver, in realistic conditions, Word Error Rates (WERs) of around 45%. This is unsatisfactory, as shown in our recent study [1], which revealed that transcripts are useful and usable in webcast archives for WERs equal to or less than 25%. We therefore propose an extension to the ePresence webcast system that engages users to collaborate in a wiki manner on editing the imperfect transcripts obtained through ASR.

... webcast with transcripts 20-30% in more artificial and better controlled conditions [5, 6]). Also, it is expected that such systems will not reach perfect or near-perfect accuracy in the near future [7]. In order to achieve useful and usable transcript-enhanced webcast archives of lectures and presentations, we are proposing alternative tools to reduce current WER levels of 40-45% to the desired 25%...

The increased availability of broadband connections has recently led to an increase in the use of Internet broadcasting (webcasting). Most webcasts are archived and accessed numerous times retrospectively. One of the hurdles users face when browsing and skimming through archives is the lack of text transcripts of the audio channel of the webcast archive. In this paper, we propose a procedure for prototyping an Automatic Speech Recognition (ASR) system that generates realistic transcripts of any desired Word Error Rate (WER), thus overcoming the drawbacks of both prototype-based and Wizard of Oz simulations. We used such a system in a study where human subjects performed question-answering tasks using archives of webcast lectures, and showed that their performance and perception of transcript quality are linearly affected by WER, and that transcripts with a WER equal to or less than 25% would be acceptable for use in webcast archives.

... perform satisfactorily in domains such as transcribing lectures or conference presentations. Also, it is expected that such systems will not reach perfect or near perfect accuracy in the near future [6]. Currently, due to the adverse acoustic and linguistic characteristics of lecture speech (large vocabulary, speaker independent, continuous speech, imperfect recording conditions), most lecture recog...

A growing number of lecture webcasts are archived after being delivered live. In the absence of transcripts, users are faced with increased difficulty in performing tasks easily achieved with text documents (retrieval, browsing, skimming). Unfortunately, speech recognition systems do not perform satisfactorily when transcribing lectures. In this paper, we present an overview of the ePresence lecture transcription project, whose goal is to improve the usefulness and usability of automatically generated transcripts of webcast lectures. We achieve this by integrating novel speech recognition techniques aimed specifically at increasing the accuracy of webcast transcriptions with the development of an interactive collaborative interface that facilitates users' contribution to the improvement of machine-generated transcripts. We conclude by discussing the challenges of (and possible solutions for) successfully integrating transcripts into archives of webcast lectures.

...a 20-30% WER for lectures given in more artificial and better controlled conditions [6]). Moreover, it is expected that such systems will not reach perfect or near-perfect accuracy in the near future [7]. Therefore, in our research we measured the acceptable WER of webcast lecture transcripts, and we developed ASR- and HCI-based solutions to reduce the current WER to desirable values. Making speech ...

People are aware that their memories are fallible, and as a result they spend significant amounts of time preparing for subsequent memory challenges, e.g. by taking notes about information they think they will later have to remember. There has been extensive research into note taking and whether it is effective as a memory aid, but most of this has concerned pen and paper rather than digital notes. We conducted an experiment investigating the relationship between note-taking behaviors (whether digital or paper-based) and subsequent recall. We gave people two systems: a note-taking device called ChittyChatty (CC) that combines digital notes with an audio record (Fig 1), and conventional Pen & Paper (PP) (Fig 2). We observed the note-taking patterns that occurred in digital CC notes and paper-based PP notes. We then examined whether the quality and quantity of

...do such PM notes aid our OM in general and in the longer-term? Some research studies suggest that notes are only effective as a short-term memory aid [7]. Research into note taking has been extensive [7, 6, 8, 4]; however, most of this work has focused on when and why people take paper notes. Another crucial issue that has not been systematically explored is the relationship between the quality and quanti...

When users access information from text, they engage in strategic fixation, visually scanning the text to focus on regions of interest. However, because speech is both serial and ephemeral, it does not readily support strategic fixation. This paper describes two design principles, indexing and transcript-centric access, that address the problem of speech access by supporting strategic fixation. Indexing involves users constructing external visual indices into speech. Users visually scan these indices to find information-rich regions of speech for more detailed processing and playback. Transcription involves transcribing speech using automatic speech recognition (ASR) and enriching that transcription with visual cues. The resulting enriched transcript is time-aligned to the original speech, allowing users to scan the transcript as a whole, or the additional visual cues present in the transcript, to fixate on and play regions of interest. We tested the effectiveness of these two approaches on a set of reference tasks derived from observations of current voicemail practice. A field trial evaluation of JotMail, an index-based interface similar to commercial unified messaging clients, showed that our approaches were effective in supporting speech scanning, information extraction and status tracking, but not archive management. However, users found it onerous to take manual notes with JotMail to provide effective retrieval indices. We therefore built SCANMail, a transcript-based interface that constructs indices automatically, using ASR to generate a transcript of the speech data. SCANMail also uses information extraction techniques to identify regions of
