Transcription Guidelines for Captioning

Closed Captioning your files with CaptionSync allows you to submit your own verbatim transcript. This article shows how to format it correctly.

How do I format and upload my own transcript?

CaptionSync allows you to upload your own formatted transcript. An accurate transcript is the basis from which our automated captioning process generates captions. So please follow these guidelines:

Check the table of contents below and format your transcript according to the guidelines. At the very least, use properly formatted speaker IDs and parenthetical comments, and save your transcript as a UTF-8 .txt file.

Note that you need to submit a verbatim transcript of the audio content, not a script or a screenplay. Also remove all ancillary text such as the title, date, author, headers, and footers.

If your content is "sweetened", i.e., it's mixed with music, noise, sound-effects, talk-over, pauses or unclear audio, ensure you use sync markers to improve the results. A complete description of how to use sync markers in the transcript is available in our Sync Marker Summary article.

We recommend using Notepad, Word or TextEdit to create the transcript file.

If you would rather not format your own transcript, you can also request Transcription and send your existing text as guidance for the transcriber. Our transcribers are trained to generate well-formatted transcripts that yield optimal results with our automated system.

1. General Guidelines

1.1. Transcribe Verbatim:

The words in the audio should be transcribed exactly as the speaker says them, in the same order, with no additions or deletions. Remove all ancillary text such as the title, date, author, and pagination. Transcribers sometimes condense the audio by summarizing slightly or by leaving out segments of speech that do not affect the meaning. For goals other than automated captioning this can serve a useful purpose, but for automated captioning the transcript must match the audio, even for sentence fragments. If a portion of the audio cannot be made out, simply transcribe [inaudible].

Example:

Speaker: Alright ladies and gentlemen. If we could get started, please.

Transcription for automated captioning:

Correct: Alright ladies and gentlemen. If we could get started, please.

Incorrect: If we could get started ladies and gentlemen.

Exceptions - What does not need to be transcribed:

Speaker hesitations and disfluencies, such as “um,” “uh,” and “mmm,” do not need to be transcribed. It is OK to include these in the transcription, but not absolutely necessary.

If a speaker backs up a bit and repeats a short phrase, it is not absolutely necessary to transcribe this. Again, it is fine to include these in the transcription, but not required.

Example:

Speaker: I have…I have some …um…administrative announcements.

Transcription for automated captioning:

OK: I have some administrative announcements.

1.2. Spell Out Words Instead of Using Symbols:

Special symbols in the text can lead to uncertainty about exactly what was said, making automated captioning more difficult. Also, many special symbols are not included in the standard character set for captioning. Instead, transcribe the exact words of the speaker. For example, if the speaker says something like “backslash”, don’t try to use a backslash symbol in the transcription. Instead, spell it out. This applies to all mathematical and other symbolic expressions, such as N² (use “N squared”), division signs (use “divided by”), or multiplication signs.

Exceptions - Digits, such as “6” instead of “six,” or “25” instead of “twenty-five,” are OK.

Example:

Speaker: It is written like this: eight backslash twenty-five.

Transcription for automated captioning:

Correct: It is written like this: Eight backslash twenty-five.

Correct: It is written like this: 8 backslash 25.

Incorrect: It is written like this: 8 \ 25.
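If you are cleaning up an existing transcript, the substitution described above can be scripted. The following is a minimal sketch, not part of CaptionSync: the function name and the symbol-to-words mapping are assumptions made for illustration, and the mapping must be adjusted so that each replacement matches the words the speaker actually said.

```python
import re

# Hypothetical mapping from symbols to spoken words. Adjust it to the audio:
# the transcript must contain what the speaker really said.
SPOKEN_FORMS = {
    "\\": "backslash",
    "\u00f7": "divided by",   # division sign
    "\u00b2": "squared",      # superscript two
}


def spell_out_symbols(text, spoken_forms=SPOKEN_FORMS):
    """Replace each listed symbol with its spoken form."""
    for symbol, words in spoken_forms.items():
        text = text.replace(symbol, f" {words} ")
    # Collapse the extra spaces introduced above without touching line breaks.
    text = re.sub(r"[ \t]{2,}", " ", text)
    # Remove a space left in front of closing punctuation.
    text = re.sub(r"[ \t]+([.,;:?!])", r"\1", text)
    return text.strip()


print(spell_out_symbols("It is written like this: 8 \\ 25."))
```

Digits are left alone, since the guidelines accept them as written.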

1.3. Omit Background Noise/Sounds from the Transcription:

Background sounds should not be transcribed. If the transcriber feels a sound is necessary for the caption reader's understanding, it should be transcribed in square brackets to set it off.

Example:

Speaker: Let’s pause so you can discuss among yourselves.

Background noise while students discuss among themselves for a few minutes.

Speaker: Ok, let’s compare notes.

Transcription for automated captioning:

Correct: Let's pause so you can discuss among yourselves. Ok, let's compare notes.

Incorrect: Let's pause so you can discuss among yourselves. Now we hear some background noise. Ok, let's compare notes.

1.4. Speaker IDs:

Speaker intros can be formatted in a number of acceptable ways:

Standalone multiple chevrons. e.g. >> Hi!

Multiple chevrons, name colon. e.g. >> Brent: Hi!

Open square brace, name colon, close square brace. No space before closing brace. e.g. [Brent:] Hi!

Example:

Speaker named Paul: Let’s take a look at this function.

Transcription for automated captioning:

Correct: [Paul:] Let's take a look at this function.

Correct: >> Paul: Let's take a look at this function.

Correct: Let's take a look at this function.

Correct: >> Let's take a look at this function.

Incorrect: Paul: Let's take a look at this function.

Make sure speaker IDs are no longer than 58 characters. A speaker ID cannot span captions, so choose a line length that accommodates unusually long speaker IDs and the first word of their speech: a caption that includes a speaker ID must fit the maximum speaker ID size (58) plus the first word the speaker says. Don't include excessive punctuation or special characters in speaker IDs. For example, it is correct to write >> Dr. Patrick Smith, Geology Lecturer: , but not >> Dr. *Patrick Smith*, our Geology Lecturer!: .

Example:

The settings on your submission are a Line Length of 32 characters and 2 Lines per Caption.

Please note that our transcribers identify speaker changes with just a double chevron, e.g., >> Speech . If you're making a Captioning and/or Transcription request and wish to have our transcribers identify speakers by name (e.g. >> Pat: Speech ), you need to make that request in the Guidance for Transcriber field on the New Submission page.
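The accepted speaker ID forms and the 58-character limit can be checked before you upload. The sketch below is illustrative only and is not CaptionSync's own parser: the function name is hypothetical, and it assumes the limit is measured over the whole ID, including the chevrons or brackets and the colon.

```python
import re

MAX_SPEAKER_ID_LENGTH = 58  # limit stated in the guidelines

# The three accepted forms, matched against the speaker ID on its own.
ACCEPTED_FORMS = (
    re.compile(r">{2,}"),              # standalone chevrons: >>
    re.compile(r">{2,} ?[^\[\]:]+:"),  # chevrons, name, colon: >> Brent:
    re.compile(r"\[[^\[\]:]+:\]"),     # [Brent:] with no space before ]
)


def speaker_id_problems(speaker_id):
    """Return a list of problems found in a single speaker ID string."""
    problems = []
    if not any(form.fullmatch(speaker_id) for form in ACCEPTED_FORMS):
        problems.append("unrecognized format")
    # Assumption: the whole ID, delimiters included, counts toward the limit.
    if len(speaker_id) > MAX_SPEAKER_ID_LENGTH:
        problems.append("longer than 58 characters")
    return problems


for candidate in (">> Brent:", "[Brent:]", "Paul:", "[Brent: ]"):
    print(candidate, speaker_id_problems(candidate))
```

The check for excessive punctuation is left out on purpose, because the guidelines do not define a precise threshold for it.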

1.5. Use Square Brackets for non-spoken Content:

Non-spoken content (e.g., music, applause, noise, etc.), or any content that is not present in the audio (e.g., credits or Speaker IDs), must be enclosed in square brackets (parenthetical comments). The transcript should contain only what the speaker said, and nothing more. Any other content must be in square brackets.

Note that CaptionSync differentiates between parentheticals with spaces and those without, i.e., [ Laughter ] is different from [Laughter]. The former is a standalone descriptive caption, whereas the latter is an inline comment within a caption. For this reason, speaker introductions should not have a space before the closing bracket.

Example:

Speaker named Paul throws his chalk then says: Let’s take a look at this function.

Transcription for automated captioning:

Correct: [ Throws chalk ] [Paul:] Let's take a look at this function.

Correct: [ Throws chalk ] Let's take a look at this function.

Correct: Let's take a look at this function.

Incorrect: (Throws chalk) Paul: Let's take a look at this function.

Example:

Children playing, music and multiple speakers: Where's the yellow bike? I left it (inaudible). (Shouts) Did you see it?

Transcription for automated captioning:

^M00:03:36
[ Children playing ]
^M00:03:45
[ Music ]
^M00:04:28
>> Where's the yellow bike?
>> I left it [inaudible].
>> [Shouts] Did you see it?

Example:

Multiple speakers saying the same thing.

Transcription for automated captioning:

>> [All Together] We'll be back! We'll be back!

>> [Al Unísono] Feliz Cumpleaños!

1.6. Ensure Square Brackets are Matched and are Symmetrical:

Ensure that every opening square bracket has a matching closing one and the spacing matches.
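Both rules, matched brackets and symmetrical spacing, are easy to check mechanically. The following sketch is an assumption about how such a check could look, not a description of CaptionSync's parser. The function names are hypothetical, and escaped brackets (\[ and \]) are not handled.

```python
import re

# A bracket pair with no other brackets inside it.
PAIR = re.compile(r"\[([^\[\]]*)\]")


def classify_parenthetical(inner):
    """Classify the text between the brackets by its spacing."""
    starts_with_space = inner.startswith(" ")
    ends_with_space = inner.endswith(" ")
    if starts_with_space and ends_with_space:
        return "standalone"   # e.g. [ Laughter ]
    if not starts_with_space and not ends_with_space:
        return "inline"       # e.g. [Laughter]
    return "asymmetric"       # e.g. [ Laughter]


def check_brackets(text):
    """Return a list of bracket problems found in the transcript text."""
    problems = []
    for match in PAIR.finditer(text):
        if classify_parenthetical(match.group(1)) == "asymmetric":
            problems.append(f"asymmetric spacing: {match.group(0)}")
    # Anything left after removing well-formed pairs is an unmatched bracket.
    leftover = PAIR.sub("", text)
    if "[" in leftover or "]" in leftover:
        problems.append("unmatched square bracket")
    return problems


print(check_brackets("[ Throws chalk ] [Paul:] Let's take a look."))
print(check_brackets("[ Laughter] I left it [inaudible."))
```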

1.7. Avoid Abbreviations:

Avoid using abbreviations in the text whenever possible, as they are not always clear to an automated parser. “St.” for example, could mean “saint” or “street”. “No.” could be a statement, or an abbreviation for “number”.

Example:

Speaker: Use a number one pencil.

Transcription for automated captioning:

Correct: Use a number one pencil.

Correct: Use a #1 pencil.

Incorrect: Use a No. one pencil.

1.8. Avoid High ASCII Characters:

Depending on the media type, captions usually use a very restricted character set. Most characters in the so-called "high ASCII" set are not permitted. Characters such as special symbols (e.g. the degree symbol: °), or single quotes (e.g. ’) should not be used. Because they are not permitted in the captioning output, they are replaced by a space by the automated system – this can result in some odd-looking captions. For special symbols, type out the name; and for single quotes, use the apostrophe symbol. Formatting like bold, different font types, bullet points, etc., are also not required and can interfere with the automation process. Keep the formatting as simple as possible.

If you are using Microsoft Word, you need to turn off "Smart quotes" to prevent it from automatically using quotes instead of apostrophes. To do this, go to Tools -> AutoCorrect Options -> AutoFormat As You Type, and turn off both smart quotes and symbol characters. This keeps those high ASCII characters out of everything you type afterwards, but does not correct what has already been typed!
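Text that was typed before smart quotes were turned off can be cleaned up afterwards. The sketch below is a hedged illustration with hypothetical function names: it replaces curly single quotes with the apostrophe, as recommended above, and lists any remaining non-ASCII characters so you can review them by hand. It does not decide which of those characters your output format permits.

```python
# Curly single quotes mapped to the plain apostrophe.
SMART_SINGLE_QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'"})


def straighten_single_quotes(text):
    """Replace curly single quotes with the ASCII apostrophe."""
    return text.translate(SMART_SINGLE_QUOTES)


def non_ascii_characters(text):
    """Return the distinct non-ASCII characters in the text, for manual review."""
    return sorted({char for char in text if ord(char) > 127})


line = "Let\u2019s heat it to 90\u00b0."
cleaned = straighten_single_quotes(line)
print(cleaned)
print(non_ascii_characters(cleaned))
```

Here the degree symbol is reported rather than replaced, because the right replacement is whatever the speaker actually said.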

3.2. Sync Markers:

AST uses the following markup to communicate timing information, which is particularly useful for isolating intro music or heavy “sweetening”. The frame (:ff) is optional.

Markup           Description
^Bhh:mm:ss:ff    Begin synchronization at this timestamp.
^Ehh:mm:ss:ff    End synchronization at this timestamp.
^Mhh:mm:ss:ff    Arbitrary midstream marker at this timestamp.
^Fhh:mm:ss:ff    Hard end caption at this timestamp.

Example: a ^E00:00:30 marker will put an end marker after the current text at 30 seconds, but that caption will be allowed to end normally -- i.e., it is subject to all of the caption timing rules about minimum hang, distance from subsequent caption, and caption gapping. By contrast, a ^F00:00:30 marker puts an end marker after the current text at 30 seconds and forces that caption to end at 00:00:30.
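To make the marker syntax concrete, here is a minimal sketch of how the markers could be located in a transcript. It is an illustration under stated assumptions, not AST's implementation: the names are hypothetical, each field is assumed to be exactly two digits, and a missing frame is read as 0.

```python
import re
from typing import NamedTuple


class SyncMarker(NamedTuple):
    kind: str      # one of B, E, M, F
    hours: int
    minutes: int
    seconds: int
    frames: int    # 0 when the optional :ff part is omitted


# ^ followed by the marker letter and hh:mm:ss, with an optional :ff.
MARKER = re.compile(r"\^([BEMF])(\d{2}):(\d{2}):(\d{2})(?::(\d{2}))?")


def find_sync_markers(text):
    """Return every sync marker in the text, in order of appearance."""
    markers = []
    for match in MARKER.finditer(text):
        kind, hh, mm, ss, ff = match.groups()
        markers.append(SyncMarker(kind, int(hh), int(mm), int(ss), int(ff or 0)))
    return markers


for marker in find_sync_markers("^M00:03:36[ Children playing ]^M00:03:45[ Music ]"):
    print(marker)
```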

You can use the ^B and ^E as many times as you like, but they must be logical.

Example:

^B00:00:01 Hello Walter.
>> What time does this end?
^B00:00:09:26 Never!!

This is invalid since you cannot have two begins in a row.

^B and ^M tags refer to time at the beginning of the caption -- they will start a new caption if placed in the middle.

^E tags refer to the time at the end of the caption -- they will end the caption where they are placed.

Example:

This text is ignored ^B00:01:02 Robert Smith, correct?.
>> What's the number called ^E00:01:04:20
This text will be ignored too
^B00:02:01:20 Great tune! We're working again...

This is valid, but keep in mind that text before the ^B or after the ^E is ignored.

Robert Smith, correct?          This caption starts after 00:01:02:00
>> What's the number called     This caption ends after 00:01:04:20
Great tune!                     This caption starts after 00:02:01:20
We're working again...

The timestamps must be increasing!

Example:

^M00:04:01:20 Slow down!
^M00:04:01 ...or else!!

This is invalid because the second timestamp is smaller than the first (00:04:01:00 < 00:04:01:20).
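The two ordering rules in this section can be checked before submission. The following is a rough sketch, not the validation AST performs: the function name is hypothetical, a missing frame is read as 0, equal timestamps are not flagged, and the check for two ^E markers in a row is an assumption that mirrors the stated rule for ^B.

```python
import re

# ^ followed by the marker letter and hh:mm:ss, with an optional :ff.
MARKER = re.compile(r"\^([BEMF])(\d{2}):(\d{2}):(\d{2})(?::(\d{2}))?")


def sync_marker_problems(text):
    """Return a list of ordering problems among the sync markers in the text."""
    problems = []
    previous_time = None
    previous_begin_or_end = None  # the last ^B or ^E seen, if any
    for match in MARKER.finditer(text):
        kind = match.group(1)
        # (hours, minutes, seconds, frames); a missing frame counts as 0.
        time = tuple(int(part or 0) for part in match.groups()[1:])
        if previous_time is not None and time < previous_time:
            problems.append(f"{match.group(0)} is earlier than the previous marker")
        previous_time = time
        if kind in ("B", "E"):
            # Two ^B in a row is stated as invalid; two ^E in a row is assumed.
            if kind == previous_begin_or_end:
                problems.append(f"two ^{kind} markers in a row at {match.group(0)}")
            previous_begin_or_end = kind
    return problems


print(sync_marker_problems("^M00:04:01:20 Slow down!\n^M00:04:01 ...or else!!"))
```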

3.3. Style:

AST uses the following markup to apply style or print special characters:

Markup    Description
^IT       Adds Italics to the current style. In effect until reset to Normal.
^UL       Adds Underline to the current style. In effect until reset to Normal.
^ST       Adds Bold to the current style. In effect until reset to Normal.
^NO       Resets all formatting to Normal.
^MU       Prints the music symbol character.
^P        Forces a paragraph break in the clean transcript, i.e., creates a new paragraph.

Example:

I need the following words in italics. ^ITUsing these markers, this text will be in italics.^NO ^M00:00:21:20 Let's add a marker, then a music symbol ^MU. Don't worry about spaces!

This gets presented as follows:

I need the following words in italics.
Using these markers, this text will be in italics.
Let's add a marker, then a music symbol ♪.
Don't worry about spaces!

Note that only the short forms described in the table above are supported: ^IT, ^NO and ^MU.

Note that for some outputs a suitable replacement for the ♪ symbol will be presented, as not all of them support this symbol.
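The "in effect until reset to Normal" behavior can be pictured as a running set of active styles. The sketch below illustrates that idea only and is not how AST renders captions. The function name and the (text, styles) output shape are assumptions, and sync markers, ^P, and escaped carets preceded by an escaped backslash are not handled.

```python
import re

# Style markers that are not escaped with a backslash.
STYLE_TOKEN = re.compile(r"(?<!\\)\^(IT|UL|ST|NO|MU)")
STYLE_NAMES = {"IT": "italic", "UL": "underline", "ST": "bold"}


def styled_runs(text):
    """Split text into (text, active_styles) runs according to the style markers."""
    runs = []
    active = set()
    position = 0
    for match in STYLE_TOKEN.finditer(text):
        chunk = text[position:match.start()]
        if chunk:
            runs.append((chunk, frozenset(active)))
        code = match.group(1)
        if code == "NO":
            active.clear()                      # reset all formatting
        elif code == "MU":
            runs.append(("\u266a", frozenset(active)))  # music symbol
        else:
            active.add(STYLE_NAMES[code])       # styles accumulate until ^NO
        position = match.end()
    tail = text[position:]
    if tail:
        runs.append((tail, frozenset(active)))
    return runs


for run in styled_runs("plain ^ITslanted ^STboth^NO plain ^MU"):
    print(run)
```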

3.4. Position:

AST uses the following markup to apply positioning to individual captions. Note that results will only be visible in formats that store positioning data:

Markup    Description
^TO       Caption at top of CEA-608 area (top of the screen).
^BO       Caption at bottom of CEA-608 area (bottom of the screen).
^RI       Right justification of the caption.
^LE       Left justification of the caption.
^CE       Center justification of the caption.

3.5. Escape Sequences:

If the captions are not for the EIA-608 character set (e.g. broadcast constrained), the following escape sequences can be used:

Markup    Description
\\        Prints the \
\^        Prints the ^, and interprets following characters as spoken words
\*        Prints the *
\[        Prints the [, and does not apply descriptive text processing rules
\]        Prints the ], and does not apply descriptive text processing rules

Notes:

If ^ or * are seen without the backslash, they are passed through for webcasts. If they are seen for broadcast, the transcript is rejected.

Example:

This webcast shows the \^MUSIC symbol syntax plus \^\*. \[ and that this text is in the audio! \]

This gets presented as follows:

This webcast shows the ^MUSIC symbol syntax plus ^*.
[ and that this text is in the audio! ]
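If spoken content contains these characters, the escaping can be applied programmatically. The sketch below is an illustration with hypothetical function names, not part of AST: one helper adds the backslashes listed in the table, and the other shows the character-level result by removing them. It does not reproduce caption splitting or any other processing.

```python
import re

# The characters from the table: backslash, caret, asterisk, square brackets.
SPECIAL = re.compile(r"([\\^*\[\]])")
ESCAPED = re.compile(r"\\([\\^*\[\]])")


def escape_literals(text):
    """Put a backslash in front of each special character so it prints literally."""
    return SPECIAL.sub(r"\\\1", text)


def presented_text(transcript):
    """Drop the escaping backslashes to show the characters as they would print."""
    return ESCAPED.sub(r"\1", transcript)


print(escape_literals("the ^MUSIC symbol syntax plus ^*."))
print(presented_text(r"\[ and that this text is in the audio! \]"))
```

Only escape characters that are part of what was spoken. Escaping a real marker such as ^IT would turn it into literal text.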

The following escape sequences can be used in both broadcast and web captions:

Markup    Description
\.        Do not treat this period as end of sentence
\?        Do not treat this question mark as end of sentence
\!        Do not treat this exclamation point as end of sentence

Example:

This punctuation should be banned\. and\? or limited. Right?

This gets presented as follows:

This punctuation should be banned. and? or limited.
Right?
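The effect of these escapes on sentence splitting can be sketched as follows. This is a simplified model and not AST's actual segmentation logic: the function name is hypothetical, and a sentence is assumed to end at an unescaped period, question mark, or exclamation point that is followed by whitespace.

```python
import re

# Whitespace that follows ., ? or ! unless that mark is escaped with a backslash.
SENTENCE_BREAK = re.compile(r"(?<=[.?!])(?<!\\[.?!])\s+")
ESCAPED_MARK = re.compile(r"\\([.?!])")


def split_sentences(transcript):
    """Split on unescaped sentence-ending marks, then drop the backslashes."""
    pieces = SENTENCE_BREAK.split(transcript.strip())
    return [ESCAPED_MARK.sub(r"\1", piece) for piece in pieces]


example = r"This punctuation should be banned\. and\? or limited. Right?"
for sentence in split_sentences(example):
    print(sentence)
```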