Technology to Watch

The InSite system is a living, breathing set of best practices (InSite: A Guide for Recording, Transcribing and Publishing Interviews). We’re counting on it getting better over time, making it ever easier and more efficient to make source material more transparent. Even as our current practices improve, we’d like to encourage other software and technologies that might be useful for some element of recording, transcribing, organizing, publishing and/or sharing.

There are three particularly tricky steps: transcribing audio into an accurate transcript, connecting the transcript at the sentence level to timecodes in the audio/video, and keeping this connection in the published version.

What follows are details on software and research projects, turned up in our research, that bear watching.

These “to watch” applications didn’t make our best practices, but they each contain some key elements useful for transcribing and formatting interactive transcripts, and they might be encouraged to add others. The more tools out there aiming to make recording, transcription and formatting easier, the better.

The tools are organized by whether they run locally on a desktop or laptop, or on a server in the cloud. Cloud servers require an Internet connection, and so can’t guarantee privacy. The names of current best practices also appear in their respective categories for context.

Desktop/Laptop transcription and formatting

Manual with optional ASR

Express Scribe is PC/Mac software with an efficient interface and the ability to integrate Dragon automatic transcription.
Mac/PC, free or $40.
Downsides: timestamps must be entered manually, no support for markup, no WebVTT export.
www.nch.com.au/scribe

Manual

Inqscribe is an audio/video transcription player that lets you add timestamps with a keyboard shortcut and exports to WebVTT.
Mac/PC, $99.
Downsides: no integrated automatic transcription, requires more steps than Audio Notetaker, no support for markup.
www.inqscribe.com

Audio Note (as opposed to Audio Notetaker) is smartphone/tablet/Mac/PC software for recording, taking notes and transcribing that timestamps a line every time you hit Enter and connects the transcription to the audio. You can record using the smartphone version, then go back and transcribe using the desktop version. It also supports drawing.
Mac/PC, $20; iPhone/Android including tablet, $5.
Downsides: no ability to adjust time codes, no export with timecodes, no WebVTT export, no automatic speech recognition integration.
luminantsoftware.com/iphone/audionote.html

Scrivener is smartphone/tablet/Mac/PC writing software (see writing section) that also provides a player and an efficient interface for manual transcription.
Mac/PC $40/45; smartphone/tablet $15.
Downsides: it doesn’t connect transcription to audio, so timestamps must be entered manually. No automatic speech recognition integration, no WebVTT export.
www.literatureandlatte.com/scrivener.php

Transcriva is Mac manual transcription software with an efficient interface that connects the transcription to audio via a QuickTime file and automatically timestamps.
Mac-only, free.
Downsides: Mac only, doesn’t export with time codes, no automatic speech recognition integration, no support for markup, no WebVTT export.
ranscriva.en.softonic.com/mac

OTranscribe is an open source manual transcription tool with an unusual twist: it’s used through a web browser, but it doesn’t transmit any information. Instead it just uses the local browser cache, so it’s private. It includes a timestamp button, which speeds entering timestamps manually. Easy export to Google Drive.
Mac/PC, free.
Downsides: because it uses the browser cache it can be unstable, losing information; you have to clear your browser cache to remove extra copies of the transcription; no automatic speech recognition integration.
otranscribe.com

Cloud transcription and formatting

If guaranteeing privacy is not an issue, it may be useful to use hosted web tools for transcribing, formatting and organizing source material. There are several hosted web alternatives to watch that contain some key elements and might be encouraged to add others:

Manual with optional ASR

Cadet is a web-based open source captioning tool from WGBH. Imports and exports WebVTT.
It’s in beta as of 2017-03-10. Free.
Downsides: aimed at captioning (timestamps at random places in text, no paragraphing or subheads).
ncamftp.wgbh.org/cadet

Pop-up Archive is a hosted web tool that includes automatic speech recognition with automatic timestamps, automatic tagging, and a good interface for correcting ASR mistakes. Exports WebVTT.
Downsides: aimed at captioning (timestamps at random places in text, no paragraphing or subheads), hosted by a third party, starts with automatic transcription, so less of an option for some accents, recordings with background noise etc.
$0.20-0.25/minute
www.popuparchive.com
Update: Pop-up Archive has been acquired by Apple. The Pop-up Archive site shut down in November 2017.

Trint is a hosted web tool that includes automatic transcription and annotation. Good interface elements for a reporter’s notebook, including cross out and highlight text. WebVTT export, including just highlighted text. Share with others who have an account. Embeddable player.
Downsides: hosted by a third party, starts with automatic transcription, so less of an option for some accents, recordings with background noise etc. Won’t take any recording with more than one track, including Skype or Google Hangouts recordings. Share snapshot (does not remain in sync) with others only if they also have an account. Can’t copy. WebVTT export has timestamps at random places in text rather than by sentence.
$0.25/minute
trint.com/#features

YouTube hosts videos and also contains a web tool that includes automatic speech recognition and timestamps. It also has a tool to timestamp (sync) an existing transcript. Free.
Downsides: hosted by a third party, aimed at captioning (timestamps at random places in text, no paragraphing or subheads), automatic speech recognition doesn’t work well on low-quality recordings.
www.youtube.com/

Desktop/laptop Syncing

AVID is professional video editing software, $1,299 plus ScriptSync add-on $499 (or ScriptSync and PhraseFind add-on $599).
ScriptSync phonetically syncs an existing transcript to a video.
Downsides: no export options for the transcript, so it’s not a tool you can use to prepare a transcript for publication. No facility to segment by sentence.
www.avid.com/media-composer

Cloud Syncing

autoEdit is an open source tool that uses automatic transcription (via the IBM Watson engine or Kaldi/Gentle) as a navigation aid to cut video. Aimed at using the transcription to cut a video via an Edit Decision List. There’s also a paper edit option designed to make short videos from excerpts. Exports as JSON.
Downsides: transcript editing facility is basic and not efficient – it takes a couple of clicks and typing in the right word to correct one word at a time. Does not export as WebVTT.
www.autoedit.io

Gentle (beta as of 4/8/18) is an open source automatic transcription and syncing tool based on the Kaldi open-source speech engine. Upload your transcript and audio, and download a JSON file that contains a timecode for every line or every word.
Downsides: just one piece of the puzzle – lacks facility to segment by sentence. Requires download and upload steps.
lowerquality.com/gentle

Speechmatics has a syncing service – upload your transcript and audio, and download a JSON file that contains a timecode for every line or every word.
Downsides: just one piece of the puzzle – lacks facility to segment by sentence. Requires download and upload steps.
www.speechmatics.com/solutions/#time-alignment

Note: there are many syncing tools aimed at captioning – they run the gamut from Cadet, an open source tool from WGBH, to YouTube’s automatic captioning tool. These tools automatically sync text to audio and add periodic timestamps appropriate for captioning. We haven’t found any that are a practical tool for transcripts, however. The missing ingredient is a facility that lets the content creator choose between caption-style segments, which need to stay below a certain word count to fit at the bottom of a video, and transcript-style segments, which need to be structured by sentence so they can be easily followed and shared. To do this, captioning tools would need facilities to programmatically determine the beginnings and ends of sentences. Tools of this sort that might eventually prove useful are those that output a JSON file that contains a timecode for every line or every word. Files that contain timecode information for every word can, in theory, be used in another tool to segment by sentence.
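To make the idea of segmenting by sentence concrete, here is a minimal sketch in Python. It assumes a per-word list in which each entry has "word", "start" and "end" fields. Those field names and the sample data are invented for illustration and are not the documented output of Gentle, Speechmatics or any other tool. It also assumes the transcript's punctuation survives in the word entries, and it treats any word ending in a period, question mark or exclamation point as the end of a sentence.

```python
import json

# Hypothetical per-word alignment data. The field names ("word", "start",
# "end") are assumptions for illustration, not any tool's documented schema.
SAMPLE_JSON = """
[
  {"word": "We", "start": 0.00, "end": 0.20},
  {"word": "met", "start": 0.20, "end": 0.45},
  {"word": "in", "start": 0.45, "end": 0.55},
  {"word": "March.", "start": 0.55, "end": 1.10},
  {"word": "Was", "start": 1.60, "end": 1.80},
  {"word": "it", "start": 1.80, "end": 1.90},
  {"word": "cold?", "start": 1.90, "end": 2.40},
  {"word": "Very", "start": 2.90, "end": 3.30}
]
"""

SENTENCE_END = (".", "?", "!")


def _close(words):
    """Collapse a run of timed words into one sentence-level segment."""
    return {
        "start": words[0]["start"],
        "end": words[-1]["end"],
        "text": " ".join(w["word"] for w in words),
    }


def segment_by_sentence(words):
    """Group timed words into sentences, keeping each sentence's start and end time."""
    sentences = []
    current = []
    for w in words:
        current.append(w)
        # Ignore trailing quotes/brackets so a word like 'cold?"' still ends a sentence.
        if w["word"].rstrip("\"'”’)]").endswith(SENTENCE_END):
            sentences.append(_close(current))
            current = []
    if current:  # trailing words with no final punctuation
        sentences.append(_close(current))
    return sentences


sentences = segment_by_sentence(json.loads(SAMPLE_JSON))
for s in sentences:
    print(f'{s["start"]:.2f}-{s["end"]:.2f}  {s["text"]}')
```

The punctuation rule is deliberately naive: abbreviations such as "Dr." or "U.S." would be split in the wrong place, so a real tool would need a smarter sentence detector or a way for the person preparing the transcript to adjust the breaks.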

Publishing

The key to publishing transcripts connected to audio is an audio/video player that supports text connected to video. A good way to do this is to support the WebVTT standard markup language. WebVTT is very readable by humans and therefore easy to work with, but contains time codes that link the text to audio/video. It’s also a standard that works across players.
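As a rough illustration of how simple the format is, the following Python sketch writes a WebVTT file with one cue per sentence. The segment data is made up, and the function names are our own rather than part of any library. The output follows the basic WebVTT layout: a "WEBVTT" header line, then blank-line-separated cues, each made of a start and end time joined by "-->" followed by the cue text.

```python
def vtt_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp, hh:mm:ss.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(segments):
    """Build a WebVTT document with one cue per (start, end, text) segment."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


# Made-up sentence-level segments, for illustration only.
segments = [
    (0.0, 1.1, "We met in March."),
    (1.6, 2.4, "Was it cold?"),
    (3725.5, 3727.25, "Very."),
]
vtt = to_webvtt(segments)
print(vtt)
```

This sketch skips details a production tool would have to handle, such as cue text that contains a blank line, the string "-->", or characters like "&" and "<" that need escaping.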

Giving the Developers Feedback

Desktop/laptop transcribing and formatting

Here are places to send encouragement, suggestions and complaints to the developers of desktop/laptop/device transcribing and formatting tools that may someday be useful for producing interactive transcripts.

Publishing

Here’s where to encourage video players to support WebVTT, interactive transcripts, and other features that make interactive transcripts easier to use.
Video.js: github.com/videojs/video.js/issues

Automatic Speech Recognition to Watch

There are many research projects aiming to make automatic transcription better. As this technology progresses, there will be more situations where correcting an automatic transcript is faster and more comfortable than doing a manual transcription from scratch.

For the foreseeable future, however, even as automatic transcripts become more useful, they’ll have to be verified to some degree by humans. It’s a little like spellchecking – early spellcheckers had a button you could push to correct all, which quickly turned out to be a bad idea. Thirty years later, we still put human eyes on the mistakes the spellchecker catches to verify each correction.

Here are some automatic transcription engines and research that bear watching:

Desktop/laptop/device

Android
There’s good real-time speech recognition software built into the Android phone operating system. The thing to watch for here is the option to transcribe an existing audio file. It would be even more useful if the automatic transcription were kept linked to audio and could be downloaded as WebVTT.

iOS
Since the iPhone 6s, real-time speech recognition software has run locally as part of the Apple iPhone operating system. The thing to watch for here is the option to transcribe an existing audio file. It would be even more useful if the automatic transcription were kept linked to audio and could be downloaded as WebVTT.

MacOS
Apple’s Mac operating system comes with good real-time speech recognition software, but there’s no option for automatically transcribing an audio or video file. It would be very useful if Apple opted to include this feature in its operating system. It would be even more useful if the automatic transcription were kept linked to audio and could be downloaded as WebVTT.

Cloud APIs

These application programming interfaces connect a speech recognition engine to a web app, enabling any web developer to use a given speech engine.

Google automatic speech recognition is built into YouTube. The Scrb project, which is based on the YouTube automatic speech recognition, gives a good sense of how good the engine is at transcribing speech under the best of circumstances – a recorded TED talk given by someone who speaks slowly and clearly.
www.scrb.co
Scroll to the bottom of the page to get to the automatically transcribed talk in a web-based tool where you can listen and correct. The other major speech engines listed give generally similar results.