InSite: A Guide for Recording, Transcribing and Publishing Interviews

Introduction

The InSite (for “Interview Site”) system makes processing interviews easier and creates new possibilities for publishing excerpts or entire interviews. It includes best practices for recording, transcribing and organizing interviews, and an open-source publishing system that enables journalists to post interactive transcripts, making their work more transparent, and engaging audiences with deeper content.

The InSite publishing system gives audiences two key tools:

Navigation: readers can explore an interview transcript and click on a sentence to navigate directly to that sentence in the video recording or podcast.

Sharing: audiences can select one or more sentences in the transcript, and, with a single click, copy the excerpt plus direct link, or share it on Twitter or Facebook. They can paste copied links into documents or email. And they can build excerpt/link playlists.

The InSite system also provides publishers with three key abilities:

Annotation: content providers can add related text, images, maps and links to a transcript to enhance an interview’s depth and transparency. Link options include links to a given point in an audio or video file. This allows publishers to illuminate and link related content from different interviews – from their own collections or from collections elsewhere on the web.

Timelines: content providers can build timelines that include direct links to interview quotes.

Adapting Existing Media: a publisher can publish an interactive transcript using any video published on YouTube.

One goal of InSite is to help make interviews, an essential source material of journalism, more useful for journalists and audiences. Our method increases quotation accuracy and transparency, which boosts reporting credibility. It makes it easy to annotate interviews and integrate timelines, which add depth and nuance to reporting.

A living archive of interviews can become a resource for exploring history and current events through primary sources. Adding interactive transcripts to audio and video makes them not only more accessible to site visitors, but searchable across the web.

InSite’s best practices are flexible enough to accommodate existing tools and workflows. They can be adapted to different interviewing techniques and include options for manual and automatic speech transcription.

Below we’ll take you through each of the steps needed to produce an interactive transcript. The InSite Best Practices flowchart maps the full workflow of record->transcribe->organize->publish->share.

Recording

The first step in the InSite process is recording.

There are many brands of stand-alone digital recorders, and there are many recording apps for smartphones. When choosing a recorder you should consider the following:

How easy is it to turn the recorder on?

How easy is it to make sure that the recorder is still recording?

Is the recorder likely to be a distraction to the interviewee while recording?

How reliable is the recorder battery life?

Can you mark or highlight quotes as you’re recording?

How easy is it to transfer the recording from the recorder to your computer?

If you’re using a smartphone as your recorder, what happens when you get a phone call or text?

A stand-alone digital recorder and a smartphone each come with advantages and drawbacks.

A stand-alone digital recorder has a longer battery life, and you don’t have to worry about a call or text interrupting the recording. But some stand-alone recorders don’t have a good way to check battery life, and microphone quality may not be high, especially on inexpensive digital recorders. The ease of transferring the recording to your computer varies by brand.

The advantages of using a smartphone app include not carrying an extra device, more data transfer options and more ways to mark the recording on the fly. The quality of the microphone is usually fairly high – often higher than all but relatively expensive stand-alone recorders.

You can also increase smartphone microphone quality by adding an external microphone. A high quality lavalier microphone is a small step up in recording quality. A larger standalone condenser microphone gives you a larger step up because it captures a wider range of sound, especially at the lower register. Better sound quality makes it a bit more comfortable to manually transcribe and slightly increases accuracy in automatic speech recognition.

One good choice for a lavalier microphone is the Apogee/Sennheiser lavalier microphone. And one good choice for a standalone condenser microphone is the Apogee microphone.

Possible downsides of using a separate microphone include the small hassle of setting it up, and a more obvious reminder to the interviewee that you are recording.

If you’re using a smartphone, also consider using a stand-alone digital recorder as a backup if there’s a chance you might get caught short without enough phone battery. It’s also a useful backup when your interview subject says something important just as your smartphone takes over the sound system to ring or make a text noise, which cuts off the recording for a couple of seconds. You can also avoid this by putting your phone in airplane mode during an interview. There are many stand-alone digital recorder models and they range widely in price. One good choice for a fairly high-quality backup is the Sony ICD-UX533, which has an easy-to-use interface, a relatively good microphone and a direct USB connection.

Recording Best Practice

Our best practice for recording is a smartphone with the Sonocent recording app (iPhone or Android), plus a Flic button remote. The Sonocent recording app is the mobile companion app of the Audio Notetaker software.

To follow the InSite process you can use any recorder or smartphone app that exports common audio file formats. But we found that the Sonocent recorder includes a number of key features in one place that make recording and managing interviews easier.

The Sonocent Recorder app allows you to do the following:

Section and mark highlights of an interview while recording, and while replaying. The sectioning and marking are visible when you replay or transcribe the recording.

Transfer the recording (directly, or via Wi-Fi or Dropbox), with sectioning and highlight marks preserved, to a PC or Mac running Audio Notetaker software.

Here’s what the Sonocent Recorder App looks like:

The app also has a glance mode where the screen appears mostly black with a simple white counter and section number. The low-key screen is non-distracting and preserves battery life. The app is also subtly mistake-proof: when you put the phone into glance mode it starts recording, and it’s impossible to stop recording while in glance mode. You first have to switch to a much different-looking screen to stop recording.

Sonocent Recorder App, glance mode:

You can also use an optional Flic button as a remote to section and mark recordings on-the-fly without having to reach for the recorder. Single-press to indicate a new section; double-press to mark an important moment by highlighting. This lets you use your non-writing hand to section and mark as you take written notes.

Flic button:

This setup, with or without the Flic button, can be used for in-person recordings or with a phone tap.

Recording from an iPhone

As of May 2017 it’s not possible to directly record calls made on an iPhone. If it’s necessary to record an iPhone call there are two options, neither ideal. If two smartphones or a smartphone and an iPad are available, and it’s relatively quiet and there’s no problem with being overheard, the best choice is to make the call on one smartphone on speaker and record and mark using the Sonocent recording app on the second phone or iPad.

If it’s necessary to use an iPhone for both the phone call and recording, the only way to do this is to use an app that sets up a three-way call, records the call, then emails the recording to you. One good choice is TapeACall. This isn’t ideal in several ways: the recording is stored on a company server and so privacy is not guaranteed, and you can’t mark the recording on the fly. Also, there’s no way to monitor that the system is working while you’re recording.

Recording from Skype

You can record directly from Skype onto your laptop computer using Audio Notetaker (and other programs). Because your voice and the incoming voice are in separate channels, the software automatically marks the recording by speaker. If you can connect your computer to the Internet by tethering your laptop to your smartphone’s cell connection (many mobile phone carriers have tethering options) you can use Skype on your laptop anywhere you have cell phone coverage.

To record from Skype, set Audio Notetaker to record from microphone and speakers:

Start a Skype call, and click the Audio Notetaker record button. Audio coming through the microphone is light blue. Audio coming through the speakers is hatched:

Note: as of May 2017, only the PC version of Audio Notetaker supports recording directly from Skype.

Best Practice Resources: recording

Sonocent Recording App download: iOS, Android
The free version allows recording, marking, transferring but limits playback on the smartphone to the first five minutes. The $12 version adds full playback.

Transcribing and Organizing

The second and third steps in the InSite process are transcribing interviews and organizing the transcribed material.

There are numerous transcription tools available. But there is no fast, free or foolproof way to automatically generate a flawless transcript.

In identifying the best transcription tools for you, there are two major considerations.

The first choice is whether to use transcription software that resides on your computer or is Web-based. Our best practices are limited to tools that reside on your computer rather than on the web. This sidesteps the formidable problem of guaranteeing privacy for data that goes over the Internet, and also allows you to access your data without an Internet connection.

The second choice is whether to use automatic transcription, do manual transcription from scratch, or pay a human to do a manual transcription. Automated transcription and outsourcing will still require you to proofread and (especially with automation) correct errors. Your choice may depend on the quality of the audio and length of the recording. If you have a long recording where a single person is speaking or being interviewed, the recording is very clear, and that person doesn’t have an unusual accent, then using automatic transcription and correcting and paragraphing the results may be faster than doing the transcription from scratch.

If you transcribe manually, there is a further, hybrid option: you can opt to type text using your keyboard, or you can “re-speak” the interview, using headphones to listen to playback and your own voice to dictate using a speech recognition program that will have already adapted to your voice, generating fewer errors.

Transcribing and Organizing Best Practice

Our best practice for transcribing and organizing is to use Audio Notetaker software. The program preserves the highlights and sectioning from the Sonocent Recorder, has good options for manual and automatic transcription, and allows for annotation, marking key sections with color, and searching across files.

Note: as of May 2017, only the PC version of Audio Notetaker integrates Dragon automatic transcription.

You can download a recording from the Sonocent Recorder onto your laptop using a physical connection, Wi-Fi or Dropbox. You can also download any recording into the Audio Notetaker desktop app for PC or Mac.

When you open a file in Audio Notetaker you’ll see up to four columns (any of the columns can be hidden).

The column on the far right is the audio column. Audio is represented by rectangles of varying lengths. The spaces between rectangles are pauses in the speech. If you sectioned the audio during recording, your audio will be separated into rows. If you highlighted excerpts during recording, the rectangles that represent the portions of audio you marked will be a different color.

You can transcribe manually or use one of the two automatic transcription options.

Manual Transcription

Audio Notetaker has keyboard shortcuts for all audio controls. You can type into a column and use the audio shortcuts without having to change focus. Two of Audio Notetaker’s columns are reserved for text. You can transcribe into one of the columns and use the other for notes.

It’s also possible to navigate the audio by moving the cursor among the rectangles. This makes it easy to go directly to a highlighted section, or to find a quote that comes before or after another quote. The visual representation spares you from having to fast-forward or rewind in search of a specific quote.

Also, you can listen to the audio and speed it up or slow it down without changing the pitch.

This all makes for a relatively speedy manual transcription process, and also keeps things organized if you do an initial pass and only transcribe portions you marked, and then come back later.

Re-speaking

For manual transcription there are two ways to input the words: typing or re-speaking. Re-speaking can be considerably faster than typing, but it takes a little getting used to. Re-speaking requires using headphones to listen to the recording, a microphone connected to your computer, speech input software, and a reasonably quiet environment. A good noise-canceling microphone boosts accuracy, which, in turn, boosts speed.

Re-speaking can speed manual transcription once you get used to listening through headphones and speaking into the microphone at the same time. But this method also has some downsides. It requires a lot of talking. People who are nearby may hear the re-speaking. Noisy environments can decrease its accuracy.

Best practice for re-speaking is Dragon NaturallySpeaking on a PC or native speech input on a Mac (Windows also has built-in speech input software, but it is not as accurate). The quality of built-in Mac speech input is generally equal to Dragon.

Google Docs running in a Chrome browser has free built-in speech input. Because this option is web-based you can’t guarantee privacy, and it depends on an Internet connection. (It’s a good free option for trying out re-speaking).

Automatic Transcription

The usefulness of automatic transcription depends on the quality of the recording.

If you have Dragon Preferred, Professional Individual or Professional installed on your PC, it can automatically transcribe a recording or selected sections from within Audio Notetaker. Because Dragon is installed on your PC, the automatic transcription remains private.

The Dragon automatic speech transcription does not paragraph text. If all your audio is in one section in Audio Notetaker the automatic transcription will come out as a single paragraph. The automatic transcription is much more useful if the audio is already sectioned. The most efficient way to do this is to section the recording as you are making it. You can also section existing audio both on the smartphone and on the desktop.

You can also use the Speechmatics automatic transcription service from within Audio Notetaker. This service adds paragraphing and punctuation and separates speakers. There are some privacy measures including that the recording and transcription are erased from the Speechmatics website once you download it, but because it is a web service it’s not possible to guarantee privacy.

It’s important to remember that automatic speech transcription is not perfect. There are going to be mistakes. Although the technology can be very good given a recording with no background noise, it’s important to verify any automatic transcription. If you’re only verifying the portions you need, it’s a good idea to keep track of what’s verified and what is not. The section colors in Audio Notetaker are one way to do this (mark a finished section a new color).

Uncorrected automatic transcriptions may be useful for searching, although it’s important to remember that automatic transcription mistakes can cause both false positives and false negatives.

When the recording is of adequate quality, the fastest method may be to start with automatic transcription, and then listen to the recording to correct and verify. When the recording quality is not good, however, it may be more comfortable and reassuring to transcribe manually than to correct numerous errors.

One More Transcription Tool

Back in the days of cassette recorders, foot pedals allowed you to control playback with your feet, leaving your hands free to type a transcript. The foot pedal is still an option. Xkeys makes a back-hinged pedal that connects to a PC or Mac. You can program the pedals with keyboard shortcuts. These can speed up both transcription and correcting automatic transcription.

Organizing: the Reporter’s notebook

Integrating the whole workflow from recording to transcribing to publishing can save a lot of time. For an oral history site publishing raw transcripts, that can be the whole workflow.

Journalists may want to do more, either as part of the reporting process or to enhance a published story. Reporters may see value in associating transcripts with notes, photographs of an interviewee, or what was written on a whiteboard during the interview. Reporters also need to be able to mark up and annotate transcripts, search across transcripts, and share transcripts – or certain portions of transcripts – with colleagues.

Organizing Best Practice

The most efficient way to organize is to work within the framework of the recording and transcription. Our best practice is to use the tools in Audio Notetaker to mark up transcripts. You can color text and, separately, color sections. You can add notes in a separate research column beside each section of the transcript. And you can add pictures – of people, places, objects or your own notes – in an image column. Transcript excerpts can be extracted according to background color with or without the notes in the notes column.

Audio Notetaker allows you to search across all your files by topic, speaker, title, any word or phrase in the body of the text, or any combination of those. For instance, you can search all files that include “Nebraska” in the title for the word “pipeline”. When you search, the program returns a list of hits in context. Click on a hit and the file opens with the text highlighted.

You can also import slides and PDFs into Audio Notetaker and extract slide and PDF text into a text column so that these can also be included in searches across files.

Publishing and Sharing

The InSite system makes interactive transcripts easy to publish and share. The interactive transcripts give audiences the ability to navigate to a specific point in the video by clicking on the transcript. Viewers can easily clip and share sections of video and text. They can even make their own playlists of quotes from an interview or documentary film. Searchable transcripts also make audio and video more discoverable by audiences via search engines.

Publishers can easily annotate transcripts by connecting them at the sentence level to supporting material, including to other interactive transcript quotes. This provides layers of depth and transparency to recorded interviews.

Publishing and Sharing Best Practices

Our best practices for publishing are a combination of the WordPress content management system and videos and audio hosted on YouTube and played by Able Player. Able Player is a video player that supports the WebVTT standard and is available as a plugin to WordPress.

We implemented this system to bring interactive transcripts to Duke’s Rutherfurd Living History site. The interface is designed to make publishing efficient from the first to last step.

Navigating Video via Transcript

When you click a given sentence in the transcript, the video advances or rewinds to that sentence in the video. In this way, viewers can navigate around a video using its transcript. Each interview also has a drop-down list of subheadings. You can click a subheading on the drop-down list to navigate to the subhead in the transcript.

Searching

You can search the text of interview titles and descriptions site-wide or narrowed to a particular collection using the site search bar that appears near the top right of the homepage and collections pages.

You can also search the full text of any individual interview using the Cmd-F (Mac), Ctrl-F (PC) functionality of your browser. Here’s an example using the search term “Trump” in the Chrome browser.

Sharing at the Quote Level

When you select text, a share dialog box appears. Click on the Facebook or Twitter icon to share the excerpt plus a link that starts at the beginning of the text you highlighted. Click on the link icon to copy the excerpt and direct link.

You can copy an excerpt plus direct link and email it.

Or you can email a whole playlist of quotes.

Here are a couple of quote playlists that you can click on:

Key quotes from an interview with David Fahrenthold (from the Duke collection What Just Happened: Making Sense of the 2016 Election):

So I talked to a couple of tax experts after that and said, “Well, does that hold water, the idea that Trump’s business is just, you know, storing free of charge its painting on the wall of the sports bar?” And one of the tax experts said, “It’s hard to make an IRS auditor laugh, but that would do it.” http://livinghistory.sanford.duke.edu/interviews/david-fahrenthold/#226

Insightful quotes from leaders reminiscing about the 20th century (from the Duke collection The Lessons of Crisis: Vietnam and the Cold War):

Robert McNamara:
I was in high school and university at the time of the Great Depression, a time when 25% of the males, adult males, were unemployed, and there was tremendous distress in the country. The only reason I was able to attend the University was that it was in a sense a free university, one of the great universities of the world, but available at essentially no cost. http://livinghistory.sanford.duke.edu/interviews/robert-mcnamara/#126
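To make the excerpt-plus-link idea concrete, here is a minimal Python sketch of how a share string, a tweet link and a quote playlist might be assembled. It is an illustration only, not the site's code. The function names are hypothetical, the number after the # is treated as an opaque anchor because this guide does not say how it is derived, and the tweet link assumes Twitter's web intent URL pattern.

```python
from urllib.parse import urlencode


def direct_link(page_url, anchor):
    """Join an interview page URL and a sentence anchor, e.g. .../david-fahrenthold/#226."""
    return f"{page_url}#{anchor}"


def share_text(excerpt, page_url, anchor):
    """Excerpt in quotation marks followed by its direct link, ready to paste into an email."""
    return f'"{excerpt}" {direct_link(page_url, anchor)}'


def tweet_url(excerpt, page_url, anchor):
    """Build a tweet link (assumption: Twitter's web intent URL with text and url parameters)."""
    query = urlencode({"text": f'"{excerpt}"', "url": direct_link(page_url, anchor)})
    return f"https://twitter.com/intent/tweet?{query}"


def playlist(items):
    """items: list of (excerpt, page_url, anchor) tuples; returns one share string per quote."""
    return "\n\n".join(share_text(*item) for item in items)
```

A playlist built this way is just plain text, which is why it can be pasted into a document or an email.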

Annotations and Cross-linking Quotes

Publishers can also annotate the transcript to provide additional layers of depth and transparency.

An annotation dialog box can contain any of seven types of content: text, block quote, image, gallery, map, file download, and links, including links scrubbed to a particular place in the same or another interactive transcript. This lets content providers cross-link within and between interviews, enabling readers to see conflicting or complementary versions or compare quotes.

Annotation boxes can be toggled open or closed by the viewer. The content provider can set each annotation box to be open or closed by default.

Timeline

The InSite system also contains a vertical timeline that can include the same types of annotations, including direct quotes from interviews.

An example from the PBS series FRONTLINE

In collaboration with Duke, the documentary series FRONTLINE adapted the InSite system for its web publishing to accompany the January 2017 film Trump’s Road to the White House. The FRONTLINE adaptation allowed viewers to stream the film while seeing an annotated version of the script. Viewers could navigate to specific scenes by using the script. FRONTLINE used inline annotations rather than the annotation boxes.

Publishing Under the Hood

The key to making interactive transcripts practical is making sure all the steps along the way are as efficient as possible. Beyond recording and transcribing, this means finding good ways to connect transcripts to video and annotations, so that the package can be quickly and easily published on the web.

Formatting

To publish an interactive transcript you need an efficient way to add a timestamp to each sentence. We use Audio Notetaker to section the transcript and audio by sentence either during or after transcribing. Audio Notetaker automatically adds a timestamp to each section. Timestamping a transcript at the sentence level is what makes it possible to show each sentence highlighted on the transcript as the video plays.
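As a rough illustration of sentence-level timestamping, the Python sketch below turns a list of timed sentences into WebVTT text. It is not Audio Notetaker's export code. The function names and the (start, end, text) input shape are assumptions made for the example. The "WEBVTT" first line is the file header required by the WebVTT standard.

```python
def format_timestamp(seconds):
    """Render seconds as a WebVTT timestamp: mm:ss.ttt, or hh:mm:ss.ttt from one hour on."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    if hours:
        return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def sentences_to_webvtt(sentences):
    """sentences: list of (start_seconds, end_seconds, text), one entry per sentence."""
    blocks = ["WEBVTT"]  # header line required by the WebVTT standard
    for start, end, text in sentences:
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{timing}\n{text}")
    # Cues are separated from each other by a blank line.
    return "\n\n".join(blocks) + "\n"
```

Because each cue holds exactly one sentence, a player that highlights the current cue ends up highlighting the current sentence.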

We put supporting content (web links, maps, images, galleries, downloads and internal links that connect to a specific sentence of a transcript) into the reference column of Audio Notetaker. We then export the supporting content to WebVTT.

Here’s a view that shows supporting content in the reference column on the left and the automatic timestamps at the top of each section in the audio column on the right:

Exporting

Once the transcript is set in Audio Notetaker in the Text (middle) column, we export it into WebVTT, a format that can be read by a video player that supports the WebVTT standard. WebVTT includes the timestamps that show an audio player where to connect the transcript and video to make it interactive. This file is uploaded or pasted into a WebVTT window on the publishing site.
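The connection between timestamps and interactivity can be sketched as two small lookups: from a clicked sentence to a playback position, and from a playback position to the sentence to highlight. This Python sketch is an illustration under assumptions, not Able Player's actual code. The function names are hypothetical, and the start times are assumed to be in seconds and sorted in ascending order.

```python
from bisect import bisect_right


def seek_time(cue_starts, sentence_index):
    """Clicking sentence N moves playback to that sentence's start time."""
    return cue_starts[sentence_index]


def active_sentence(cue_starts, playback_seconds):
    """Return the index of the last sentence that started at or before the
    playback position, or None if playback is still before the first sentence."""
    index = bisect_right(cue_starts, playback_seconds) - 1
    return index if index >= 0 else None
```

A binary search keeps the lookup fast even for long interviews with thousands of sentences.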

We use the video player Able Player. Although several video players support WebVTT, Able Player was the player that best supported key elements of WebVTT when we were testing players to use in our system. As a bonus, it is open source software, and there’s a plug-in for WordPress.

(Note: as of 2017-04-21 the Audio Notetaker WebVTT export feature is in the beta version of the software.)

Here’s a view of the first three sections of the transcript formatted in WebVTT:

00:21.000 --> 00:31.000
But, I understand from your record that that began after you were about 55 years old, that you have half a century of experience that leads up to that point.

00:31.000 --> 00:47.000
From the record, it appears that your first entry into official diplomatic work was in 1951 when you were chairman of the board of the National Sugar Refining Company and became ambassador to Argentina.

The export is pasted into the Transcript WebVTT field in WordPress:

Special Cases: Subheads and Paragraphs

As we put together this system it became apparent that transcripts are different from captions in several ways. Automatic captioning systems, including YouTube, generally segment by a given amount of time, such as every three seconds. But we wanted to highlight interactive transcripts by sentence, which meant we needed to provide timestamps at the start of each sentence. We accomplished this by sectioning by sentence in Audio Notetaker. We also needed a way to mark headings and paragraphs. These weren’t provided by the existing tools.

So we improvised. We adapted the WebVTT “NOTE” tag to indicate subheads and paragraphs. The NOTE tag is ordinarily used for notes that don’t show up on the live site. A WebVTT player knows to not show the text after a NOTE tag. We used “NOTE chapter” followed by some words to indicate that the words be treated as a subhead on the site. And we used “NOTE paragraph” to indicate a paragraph break in the text. These are understood by the site as long as they follow a blank line. In Audio Notetaker these tags appear immediately before a new section so they’ll be positioned before the next timestamp in the exported copy of the transcript.

These special tags are specially integrated into the Duke site. But we are also working with the makers of Able Player, asking them to integrate this use of the tags into the player.

The “NOTE chapter Unique History” subhead and “NOTE paragraph” paragraph indicator both appear in this portion of a transcript in Audio Notetaker.

00:21.000 --> 00:31.000
But, I understand from your record that that began after you were about 55 years old, that you have half a century of experience that leads up to that point.

00:31.000 --> 00:47.000
From the record, it appears that your first entry into official diplomatic work was in 1951 when you were chairman of the board of the National Sugar Refining Company and became ambassador to Argentina.

NOTE paragraph

00:47.000 --> 00:54.000
Could you just tell us the story of that transition that you made from business life to diplomatic work?
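To show how the “NOTE chapter” and “NOTE paragraph” conventions could be read back out of an exported file, here is a minimal Python sketch. It is not the Duke site's parser. The function name and the returned structure are hypothetical, and the sketch assumes each cue block begins with its timing line (no cue identifiers) and that blocks are separated by blank lines.

```python
import re

# A cue's first line: start timestamp, the WebVTT arrow, end timestamp.
TIMING = re.compile(r"^(\S+) --> (\S+)")


def structure_transcript(vtt_text):
    """Group cue sentences into paragraphs using the NOTE conventions.

    Returns a list of dicts: {"subhead": str or None, "sentences": [(start, text), ...]}.
    """
    paragraphs = [{"subhead": None, "sentences": []}]
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.strip().splitlines()
        if not lines:
            continue
        first = lines[0].strip()
        if first.startswith("NOTE chapter"):
            # A subhead opens a new paragraph and labels it.
            title = first[len("NOTE chapter"):].strip()
            paragraphs.append({"subhead": title, "sentences": []})
        elif first == "NOTE paragraph":
            paragraphs.append({"subhead": None, "sentences": []})
        elif first.startswith("NOTE") or first.startswith("WEBVTT"):
            continue  # ordinary comment or file header: nothing to display
        else:
            match = TIMING.match(first)
            if match:
                text = " ".join(lines[1:])
                paragraphs[-1]["sentences"].append((match.group(1), text))
    # Drop any paragraph that ended up with neither a subhead nor sentences.
    return [p for p in paragraphs if p["sentences"] or p["subhead"]]
```

Run on the three-cue excerpt above, this sketch yields two paragraphs: the first with two sentences and the second with one.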

Annotations

Supporting content needs to appear as an annotation in the correct place on a webpage — beside the corresponding section of transcript. We collect this content in the research column in Audio Notetaker, beside the corresponding spot in the transcript, then export the research column including timestamps into WebVTT and paste it into the supporting content WebVTT field in WordPress:

Save the file, and the annotations appear on the site in the correct places beside the posted transcript.
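One way to picture how annotations land beside the right sentence is to match the two WebVTT exports on their start timestamps. The Python sketch below is an illustration under assumptions, not the WordPress plugin's code. The function names are hypothetical, the annotation payload is treated as opaque text because this guide does not describe its syntax, and matching on identical start timestamps is assumed.

```python
import re

# A cue's first line: start timestamp, the WebVTT arrow, end timestamp.
TIMING = re.compile(r"^(\S+) --> (\S+)")


def cue_map(vtt_text):
    """Map each cue's start timestamp to its text, in file order."""
    cues = {}
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.strip().splitlines()
        if lines and (match := TIMING.match(lines[0])):
            cues[match.group(1)] = "\n".join(lines[1:])
    return cues


def attach_annotations(transcript_vtt, supporting_vtt):
    """Return (start, sentence, annotation or None) for every transcript sentence."""
    sentences = cue_map(transcript_vtt)
    notes = cue_map(supporting_vtt)
    return [(start, text, notes.get(start)) for start, text in sentences.items()]
```

Sentences without supporting content simply carry None, so the page can render an annotation box only where one exists.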

Map, image and gallery supporting content entries can be added in a drag-and-drop view.

Image entry in drag-and-drop supporting content tab:

Map entry in drag-and-drop supporting content tab:

Automating most of the annotations using WebVTT allows all the non-graphical annotations for a transcript to be posted in a couple of minutes versus a more common interface where annotations and timestamps are pasted into separate fields one-by-one. Using a direct WebVTT interface for annotations this way saves about half an hour for a transcript that contains 35 annotations.

Sharing: the big picture

Adding interactive transcripts to audio and video makes them more transparent not only because they’re more accessible to site visitors, but because it makes them searchable across the web.

The more oral history sites and publications that publish interactive transcripts, the more powerful the ecosystem becomes as sites are able to cross-link each other’s interviews.

Best Practice Resources: publishing and sharing

Improving the Tools

Give the Developers Feedback

Reach out with clear and constructive commentary or requests. Start by describing to a developer the feature or change you’re looking for:

I very much like that timestamps appear in the audio column, and I very much like that I can include timestamps when I export the text column. I’d also like to be able to copy a timestamp by clicking on it.

It’s also important to tell the developer why you want this by including a use case:

Use case: I’m emailing a colleague different versions of a quote – different lengths. I’d like to do this quickly – my aim is to include the timestamps of the different versions in that email without having to type out the time codes. So I’d like to section in one way and copy/paste the timecode for that version, then section in an alternative way and copy/paste the timecode for that version. Being able to copy the timestamp would save me time.

Here are places to encourage, make suggestions and complain to the developers of the software we’ve mentioned:

Technology to Watch

In the course of our research, we tested many applications, devices and services for recording, transcribing, organizing, publishing and sharing interviews. We took hundreds of pages of notes, and sent hundreds of emails to technologists explaining what’s needed, asking questions, and requesting features.

Technology to Watch is a distillation of some of that research. It contains a list of tools that are worth watching. These tools contain some elements that are useful for recording, transcribing, organizing, publishing and sharing, but aren’t currently as effective or easy to use as our current best practices. It’s always good to have alternatives, however, and we’d like to encourage as many alternatives as possible.

Join the Effort

We’re continuing to improve the InSite system by working with existing software makers to improve the best practices software and add alternatives.

We are also encouraging software developers – and especially open software developers – to provide new tools that improve the process of recording, transcribing, organizing and publishing interactive transcripts.

Open software efforts we’re currently working with:

We encourage volunteer programmers to join the open source Able Player effort. We’d like to see it fully support the W3C WebVTT standard, including nested cues and meta-tags. If you’re interested in helping with the Able Player effort, take a look at Able Player on GitHub.

We’re also working with volunteer programmers on an open-source formatting tool that will parse any transcript that includes time codes into WebVTT, including those prepared manually by transcriptionists.

And we’re working with volunteer programmers to develop open-source software that will allow reporters to more easily correct transcripts generated by automatic transcription.

If you’d like to be part of these efforts, either contact me directly (kim@scriven.com) or join the effort through CodeAlliance.

InSite and the Rutherfurd Living History program

The Rutherfurd Living History program at Duke was used to develop the InSite system.

FRONTLINE is the first news organization to adapt InSite to its own publishing platform.

We’re encouraging other oral history collections and news sites to use the system to publish interactive transcripts that include shareable links at the sentence level and annotations that can point to shareable links.

This adds to the ecosystem of transcripts and transcript excerpts that are transparent, shareable and able to be cross-linked.

Our current Web configuration, including a link to the GitHub repository that contains the source code for the current site, is detailed in the Colophon.