Transcoding Preserving Captions

author=Christopher Neufeld
short=Transcode to remove commercials while retaining closed caption data
file=xcode_to_h264.pl
category=User Job Scripts

Author

Christopher Neufeld

Description

This framework supplies the means for a user job to do a cut on a MythTV recording, while still retaining the ability of MythTV to display closed captions. It also provides methods for transcoding the video to H.264 format, with a valid seek table, while preserving captions.

A common request is to perform a transcode without losing the closed caption data in the ivtv data stream. For technical reasons, this is a difficult problem. The technique outlined here preserves captions across a transcode, not by keeping the caption data in the transcoded stream, but by extracting it into a .srt subtitle file, which MythTV automatically detects and makes available when caption display is enabled. Once the .srt file exists, it can accompany the stream however the stream is transcoded, so conversion to H.264 is also supported without losing the captions.
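For readers unfamiliar with the format: a .srt file is plain text made of numbered cues, each with a start and end time in HH:MM:SS,mmm form followed by the caption text, with blank lines between cues. The Python sketch below only illustrates that layout; the names `Cue`, `srt_timestamp` and `to_srt` are hypothetical and are not part of pull-captions.pl or ccextractor, which produce the real file.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption: start and end in milliseconds, plus its text."""
    start_ms: int
    end_ms: int
    text: str


def srt_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Render cues as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(cue.start_ms)} --> {srt_timestamp(cue.end_ms)}\n"
            f"{cue.text}\n"
        )
    return "\n".join(blocks)
```

Because the captions live in a separate text file keyed only by time, the video can be re-encoded freely; only a cut, which changes the timeline, requires the cue times to change.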

This script honours the cutlist, doing exact cuts even when the cuts are not located on keyframes. It does this by forcing extra keyframes at the edges of the cuts during the transcode, then cutting between these new keyframes.
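The keyframe-forcing idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the code in xcode_to_h264.pl: the cutlist is taken to be a list of non-overlapping, half-open (start, end) frame ranges to remove, the frame rate is constant, and times are measured from the start of the stream. Keyframes are requested at every cut boundary through ffmpeg's `-force_key_frames` option, which places each keyframe at the first frame after a listed time, so the times are truncated (not rounded) to milliseconds to avoid landing one frame late. `map_time` shows why captions must be retimed after a cut: every cue after a removed range moves earlier by that range's duration. All function names are hypothetical.

```python
import math
from fractions import Fraction

# Assumption: cuts are non-overlapping, half-open (start_frame, end_frame)
# ranges to REMOVE, at a constant frame rate, timed from the stream start.


def keep_segments(cuts, total_frames):
    """Invert the cutlist into the (start, end) frame ranges that are kept."""
    segments = []
    pos = 0
    for start, end in sorted(cuts):
        if start > pos:
            segments.append((pos, start))
        pos = max(pos, end)
    if pos < total_frames:
        segments.append((pos, total_frames))
    return segments


def force_key_frame_times(cuts, fps):
    """Exact times (seconds, as Fractions) of every cut boundary."""
    edges = sorted({frame for cut in cuts for frame in cut})
    return [Fraction(frame) / fps for frame in edges]


def to_ffmpeg_arg(times):
    """Comma-separated seconds for ffmpeg's -force_key_frames option.

    Times are truncated to whole milliseconds so each stays at or just
    before its boundary frame instead of drifting past it."""
    parts = []
    for t in times:
        ms = math.floor(t * 1000)
        parts.append(f"{ms // 1000}.{ms % 1000:03d}")
    return ",".join(parts)


def map_time(t, cuts, fps):
    """Map a time in the original recording to the cut output.

    Returns None if t falls inside a removed range."""
    removed = Fraction(0)
    for start, end in sorted(cuts):
        cut_start, cut_end = Fraction(start) / fps, Fraction(end) / fps
        if t >= cut_end:
            removed += cut_end - cut_start
        elif t >= cut_start:
            return None
        else:
            break
    return t - removed
```

For example, at 30000/1001 fps the cuts [(0, 300), (1800, 3000)] give the `-force_key_frames` argument `0.000,10.010,60.060,100.100`, and a caption originally at 50 s lands at 39.99 s in the cut output. Exact fractions are used so that 29.97 fps frame times do not accumulate floating-point error.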

This technique uses the pull-captions.pl script shown on the Closed_captioning page, which in turn depends on the ccextractor project (see that page for details).

To use this script, first put the following .pm file into a directory on your Perl library path (@INC). The name I've used is MythXCode.pm, but you can rename it with a global search-and-replace if it collides with an existing module name. To view the documentation for MythXCode.pm, simply run:

pod2man MythXCode.pm | groff -man -Tascii | less

Changes

2013-01-03:

Updated for compatibility with MythTV 0.25.2. Added the ability to use .srt files that already exist when the script is invoked, for instance those produced from an HD-PVR recording using the technique on the Captions_With_HD_PVR page. There have been some API changes since the original version: one now calls $worker->prepare_captions() instead of $worker->cut_and_caption(), which no longer exists.

We now use -qmax instead of forced bitrate values when generating the H.264 transcode.

We now use the default (High) profile, rather than Baseline.
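The two encoder changes above might translate into an ffmpeg invocation along these lines. This is a hypothetical Python sketch, not the script's actual command: the function name, file names and qmax value are placeholders, audio and container options are omitted, and the function only builds the argument list without running anything. It passes `-qmax` rather than a bitrate such as `-b:v`, and sets no `-profile:v`, leaving libx264 to choose its default profile.

```python
import shlex


def build_h264_command(src, dst, qmax, keyframe_times=None):
    """Build (but do not run) an ffmpeg argument list for an H.264 transcode.

    -qmax caps the worst-case quantizer instead of forcing a bitrate, and no
    -profile:v is given, so libx264 falls back to its default profile."""
    cmd = ["ffmpeg", "-i", src, "-c:v", "libx264", "-qmax", str(qmax)]
    if keyframe_times:
        # e.g. "10.010,60.060" to put keyframes on the cut boundaries
        cmd += ["-force_key_frames", keyframe_times]
    cmd.append(dst)
    return cmd


if __name__ == "__main__":
    # Placeholder file names and qmax value, chosen only for illustration.
    print(shlex.join(build_h264_command("recording.mpg", "recording.mkv", 30)))
```

Returning a list rather than a shell string avoids quoting problems when the list is later handed to a process launcher.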

We no longer support generating a lossless transcode preserving captions by omitting the transcode_to_h264() function call. That behaviour is easily obtained by running the pull_captions.pl script and then transcoding through the usual interfaces; this script isn't needed for that.

Edited some documentation related to the debug level and to testing against a new database schema. The backend process no longer logs the output of user jobs, so for testing the script must now be run from the command line and its on-screen output examined.

2014-01-04:

Updated for compatibility with MythTV 0.27. Note that the arguments used to invoke the script must change: use %STARTTIMEISOUTC% in place of %STARTTIMEISO%.

Read the documentation for MythXCode.pm for more details. You can set up a locking scheme that lets transcode jobs move out of the way of commercial flagging jobs, in case your transcoding runs take a very long time. You can (and probably should) use the backup facility to produce files capable of exactly restoring the state of the recording in case something goes wrong. You can also use a remote ffmpeg offload engine (I do, as my 64-bit machine transcodes much faster than my 32-bit backend).
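MythXCode.pm documents its own locking scheme, which is not reproduced here. Purely as a generic illustration of the idea, a transcode job could test an advisory lock held by a commercial-flagging wrapper and wait, between transcode steps, until it is released. The lock path and function names below are hypothetical, and the sketch assumes a POSIX system where `fcntl.flock` is available.

```python
import fcntl
import time

# Hypothetical lock file that a commercial-flagging wrapper would hold
# (with an exclusive flock) for as long as it runs.
COMMFLAG_LOCK = "/var/lock/mythtv-commflag.lock"


def commflag_running(lock_path=COMMFLAG_LOCK):
    """True if another open file description holds an exclusive flock on lock_path."""
    with open(lock_path, "a") as handle:
        try:
            # Non-blocking attempt: fails immediately if the lock is held.
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        fcntl.flock(handle, fcntl.LOCK_UN)
        return False


def wait_for_commflag(lock_path=COMMFLAG_LOCK, poll_seconds=30):
    """Block (e.g. between transcode steps) until commercial flagging is idle."""
    while commflag_running(lock_path):
        time.sleep(poll_seconds)
```

An advisory flock is released automatically when the holding process exits, so a crashed flagging job cannot leave the transcoder waiting forever, which a plain "lock file exists" check could.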