These two were my first (and only) failed attempts at producing SRT files (no --video) out of unencrypted DVDs using t2extract:

1. Using the GUI, I first obtained the VTS and PGC info; I then ported it into several command lines, run one by one (-sub EN):
--vts 1 --pgc 2 worked fine, yielding a usable, if not perfect, 17-minute SRT
--vts 1 --pgc 3 same, an 18-minute clip, OK
--vts 1 --pgc 4 got to 42% and then t2extract terminated abnormally
(same result, at the same spot, when using only the GUI, so it appears to be data dependent)

2. Different DVD (from a PBS show, via a standalone DVD burner), this time extracting from the closed captions. First I tried the logical (program chain) extraction, which produced one large SRT, then the physical (VOB-based) one, which produced 2 SRT files (which, if concatenated, would equal the first). Extraction completed fine and the text rendered well, but there were two problems:
- a minor one: the character pair "HD" popped up at random intervals in the SRT text; this was easily corrected with a Replace All.
- a more serious one: the caption timestamps were not in ascending order, so when played back via Streambaby the captions raced by at high speed and were used up within the first minute of play.
I am told the original show (in HD) displayed the captions properly on TV when aired.

Chalk it up to beginner's (bad) luck? I would appreciate any help before venturing out again. Thanks in advance.

There are not many settings or adjustments that you can make in this process, so it is unlikely to be beginner's luck. At the same time, there are many ways to prepare a decrypted DVD for processing. This makes it hard to say anything definitive about t2sami without seeing the streams. If you want me to try, PM me, because I will need to get a copy of the stream and there is no reason to run that transaction in the forum. Note that subtitles are video streams and have to be converted to text via OCR, so they will never be perfect.

Quote:

Originally Posted by gliobene

2. Different DVD (from a PBS show, via a standalone DVD burner), this time extracting from the closed captions. First I tried the logical (program chain) extraction, which produced one large SRT, then the physical (VOB-based) one, which produced 2 SRT files (which, if concatenated, would equal the first). Extraction completed fine and the text rendered well, but there were two problems:
- a minor one: the character pair "HD" popped up at random intervals in the SRT text; this was easily corrected with a Replace All.
- a more serious one: the caption timestamps were not in ascending order, so when played back via Streambaby the captions raced by at high speed and were used up within the first minute of play.
I am told the original show (in HD) displayed the captions properly on TV when aired.

t2sami has only been tested against TiVos and commercial DVDs, so I can't speak to its reliability against other recorders (PVRs or DVD recorders). There are no standards for recording captions for manufacturers to follow, so their methods and results vary widely. However, if the resulting .srt file does not have a monotonically increasing time index, it will mess up caption playback. I have a Panasonic DVD recorder that will sometimes produce great captions right up to the end, where the time index drops back to 00:00:00.0. I usually have to delete the stuff at the end to make it work well. In the .srt file, every caption begins with a sequence number followed by its time range, in the form hh:mm:ss,mmm --> hh:mm:ss,mmm.
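If you want to find the spot where the time index jumps backwards before deleting the tail by hand, a small script can locate it. This is a minimal Python sketch, not part of t2sami; it assumes a plain SRT file with hh:mm:ss,mmm --> hh:mm:ss,mmm timing lines, and the names (parse_srt, first_backward_jump, to_srt) are made up for illustration. It reports the first cue whose start time is earlier than the one before it, so everything before that index can be kept.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Timing line, e.g. "00:00:42,968 --> 00:00:43,468" (a '.' separator is also accepted).
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def _fmt(ms: int) -> str:
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def parse_srt(data: str) -> list[Cue]:
    """Split SRT text into cues; blocks without a timing line are ignored."""
    cues = []
    for block in re.split(r"\n\s*\n", data.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                cues.append(Cue(_to_ms(*g[:4]), _to_ms(*g[4:]), "\n".join(lines[i + 1:])))
                break
    return cues


def first_backward_jump(cues: list[Cue]) -> int | None:
    """Index of the first cue that starts earlier than the cue before it, or None."""
    for i in range(1, len(cues)):
        if cues[i].start_ms < cues[i - 1].start_ms:
            return i
    return None


def to_srt(cues: list[Cue]) -> str:
    """Write cues back out, renumbered from 1."""
    return "\n".join(
        f"{n}\n{_fmt(c.start_ms)} --> {_fmt(c.end_ms)}\n{c.text}\n"
        for n, c in enumerate(cues, 1)
    )
```

Usage: read the file, call first_backward_jump on parse_srt's result, keep cues[:cut] when it returns an index, and write to_srt(...) back out. For a disc like the PBS one, where inversions show up in the middle rather than only at the end, truncating would throw away good captions; sorting the cues by start_ms is one alternative, but whether that is right depends on why the times are out of order in the first place.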

You could try VideoReDo QSF in this case. Use "Open Title from DVD" to save the logical program to an MPEG, then extract the captions from that. This approach runs QSF as part of the process, cleaning up any stream glitches that might be there. ccextractor is also an alternative, but since I never use it, I can't give you any advice on it.

This makes it hard to say anything definitive about t2sami without seeing the streams. If you want me to try, PM me, because I will need to get a copy of the stream and there is no reason to run that transaction in the forum.

Thanks for your suggestions. I'll PM you once I figure out how to cut the stream down to size while preserving a piece of both the bad and a good program chain (as a mini-DVD). I guess the important thing is to find the cause of the abnormal termination (could it be in the OCR?). I have saved the names of all DLLs present at the time, as well as the t2extract offset.
I will also try the ccextractor route for the out-of-sequence SRT.
Thanks again for the prompt response.

Note that subtitles are video streams and have to be converted to text via OCR, so they will never be perfect.
...
t2sami has only been tested against TiVos and commercial DVDs, so I can't speak to its reliability against other recorders (PVRs or DVD recorders). There are no standards for recording captions for manufacturers to follow, so their methods and results vary widely.

Whoa! Is OCR also employed if one uses kmttg to transfer and decrypt high-def video (resulting in a .mpg file) from a TiVo HD?

It seems that, at least over HDMI and component video in 720p or 1080i modes, there's no means of carrying CC data... one must turn on CC via the TiVo UI. It's totally unavailable on my TV in those two modes.

I only care about t2sami for the purpose of having some sort of subtitles when archiving my HD content to DVD+/-R. It seems I've settled on making AVCHD discs.

No, OCR is only used when we convert DVD subtitle streams to text caption files such as .srt. Those streams are secondary video streams that get overlaid on the main video; they are bitmap images, not text. As a result, OCR is the only way to end up with an .srt version.

Internally, the TiVo HD uses text captions for everything. When it wants to send captions across HDMI or component, it renders them as video and overlays them onto the outgoing video itself. That is a final step that we never see in any stored program material. There is no industry standard for passing them to your television in any other way at 720p or 1080i.

if the resulting .srt file does not have a monotonically increasing time index, it will mess up caption playback. I have a Panasonic DVD recorder that will sometimes produce great captions right up to the end, where the time index drops back to 00:00:00.0. I usually have to delete the stuff at the end to make it work well. In the .srt file, every caption begins with a sequence number followed by its time range, in the form hh:mm:ss,mmm --> hh:mm:ss,mmm.

You could try VideoReDo QSF in this case. Use "Open Title from DVD" to save the logical program to an MPEG, then extract the captions from that. This approach runs QSF as part of the process, cleaning up any stream glitches that might be there. ccextractor is also an alternative, but since I never use it, I can't give you any advice on it.

Curiously, these inversions were all about a minute long.
I must be the unluckiest guy: the third DVD I tried to subtitle had two subtitle streams (2 PGCs). One completed OK; the other got stuck at 34% in the SRT file. (This is unlike my first subtitle attempt, where t2extract terminated abnormally, which I now believe was because the file had been truncated while being copied from a drive to a memory stick.)
I have a question I think Emillion first asked a while ago: what names should I give the SRT files if I want to stream with Streambaby directly from the VOBs (without generating an MPG), given that chains can span different VOBs? The name of the first VOB (VTS_01_1.SRT) if I press FF, or the name of the directory (VIDEO_TS.SRT) if I hit play on it? What if there are several episodes? (For captioning, my approach is one SRT per VOB.)
Also, and I guess this too is a Streambaby question, how can I correctly render accented subtitles (as in VOBSUB, where the font script can be selected in the configuration)?

I have a question I think Emillion first asked a while ago: what names should I give the SRT files if I want to stream with Streambaby directly from the VOBs (without generating an MPG), given that chains can span different VOBs? The name of the first VOB (VTS_01_1.SRT) if I press FF, or the name of the directory (VIDEO_TS.SRT) if I hit play on it? What if there are several episodes? (For captioning, my approach is one SRT per VOB.)

As far as I know most players including Streambaby require that the name of the .srt file match the playback file. So if the video is VTS_17_1.VOB, the .srt file must be VTS_17_1.SRT to be found.
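Since the lookup is by name, the simplest fix is to give the .srt the same base name as the video before playback. A minimal Python sketch; srt_for and attach_srt are hypothetical helpers, and keeping the extension in the same letter case as the video's (.SRT next to .VOB) is an assumption that only matters on case-sensitive file systems.

```python
import shutil
from pathlib import Path


def srt_for(video: Path) -> Path:
    """Caption path with the same base name as the video, e.g. VTS_17_1.VOB -> VTS_17_1.SRT."""
    # Assumption: mirror the letter case of the video's extension.
    suffix = ".SRT" if video.suffix.isupper() else ".srt"
    return video.with_suffix(suffix)


def attach_srt(srt: Path, video: Path) -> Path:
    """Copy an extracted .srt next to the video under the matching name."""
    target = srt_for(video)
    shutil.copyfile(srt, target)
    return target
```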

In general, I would not recommend using VOBs for episodic disks, extras and features, or if you want to use subtitles as a caption source. In all of these cases it is more reliable to work from the logical program chain (VTS, PGC). VOBs are related more to file systems than they are to files. They contain multiple logical files, and on episodic disks the individual episodes often begin in one VOB and conclude in another. Typically, the logical chains are sequential and in order, so concatenating the VOBs to produce one long file works. The fact that this does work is happenstance, however. DVD players always use the logical chains, and the logical pieces within the VOB can be interleaved, mixed with other content or out of order. If they are, you get garbage out. To extract an episode or get the subtitle video stream for OCR, you really need to follow the logical chain and piece it together. Trying to get something useful out of whole VOBs can be dicey.
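For the common case described here, where the chains happen to be stored sequentially, concatenating a title set's VOBs in numeric order looks roughly like the sketch below. The helper names are made up, and it assumes the usual VIDEO_TS naming in which VTS_nn_0.VOB holds the menu and VTS_nn_1.VOB onward hold the title content. It does not follow the PGC cell order the way a DVD player does, so the output is only correct when the logical chain really is laid out in file order.

```python
import re
import shutil
from pathlib import Path

# Matches title-set VOB names such as VTS_01_1.VOB.
VOB_NAME = re.compile(r"VTS_(\d{2})_(\d)\.VOB", re.IGNORECASE)


def title_vobs(video_ts: Path, vts: int) -> list[Path]:
    """Title VOBs of one title set in numeric order, skipping the VTS_nn_0 menu VOB."""
    found = []
    for path in video_ts.iterdir():
        match = VOB_NAME.fullmatch(path.name)
        if match and int(match.group(1)) == vts and int(match.group(2)) >= 1:
            found.append((int(match.group(2)), path))
    return [path for _, path in sorted(found)]


def concat_vobs(video_ts: Path, vts: int, out: Path) -> None:
    """Byte-for-byte concatenation; correct only if the chain is stored sequentially."""
    with out.open("wb") as dst:
        for vob in title_vobs(video_ts, vts):
            with vob.open("rb") as src:
                shutil.copyfileobj(src, dst)
```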

In part this is why I would like to find the source of this problem. ccextractor and VideoReDo will help if you are using DVD closed captions, but I don't think either will help if you want captions from the subtitles. I think I need to fix t2sami directly to resolve that.

most players including Streambaby require that the name of the .srt file match the playback file...

Typically, the logical chains are sequential and in order so concatenating the VOBs to produce one long file works

I understand the difficulty with subtitles as SRT. I am also drawn to as unintrusive a solution as possible, i.e., moving the least number of bytes in a hard-drive-based DVD library that plays double duty: TiVo and PC playback (via PowerDVD, etc.).
So if, as you say, VOB concatenation works (albeit only de facto), why not also logical concatenation, which I believe Streambaby implements via the FF key? If Streambaby actually treated this as a true concatenation (I don't think it does now), then I could also concatenate the PGC SRTs generated by t2sami (with a join that applies straight offsets to the timestamps) and rename the larger result to something like VTS_01_1.SRT (to match the file I started playback with).
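A "join that applies straight offsets" could look something like this: each chain's cues are shifted by the total running time of the chains before it, then everything is renumbered. This is a minimal sketch; join_srts is a hypothetical name, and the chain durations have to come from elsewhere (for example the PGC lengths shown in the t2extract GUI), which is an assumption. It also assumes the player runs the chains back to back with no gap between them.

```python
import re

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")
TIMING_LINE = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}")


def _shift_line(line: str, offset_ms: int) -> str:
    """Add offset_ms to both timestamps of a timing line."""
    def repl(match: re.Match) -> str:
        h, m, s, ms = map(int, match.groups())
        total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
        h, total = divmod(total, 3_600_000)
        m, total = divmod(total, 60_000)
        s, ms = divmod(total, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
    return TIMESTAMP.sub(repl, line)


def join_srts(srt_texts: list[str], durations_ms: list[int]) -> str:
    """Concatenate per-chain SRTs; chain i is offset by the durations of chains 0..i-1."""
    blocks = []
    offset = 0
    for text, duration in zip(srt_texts, durations_ms):
        for block in re.split(r"\n\s*\n", text.strip()):
            lines = block.splitlines()
            if not lines:
                continue
            if lines[0].strip().isdigit():
                lines = lines[1:]  # drop the old sequence number; renumbered below
            lines = [_shift_line(line, offset) if TIMING_LINE.match(line) else line
                     for line in lines]
            blocks.append("\n".join(lines))
        offset += duration
    return "\n\n".join(f"{n}\n{b}" for n, b in enumerate(blocks, 1)) + "\n"
```

Only timing lines are shifted, so caption text that happens to contain something shaped like a timestamp is left alone.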
Better yet, Streambaby could be improved to switch automatically to the next SRT file (using a simple numbering scheme such as the one t2sami uses) as soon as a new chain begins. (The only downside: to play PGC 3, you'd need to FF from the beginning.)
Of course, in an ideal world of personal super-dupercomputers, Streambaby should call t2sami (GOCR and all) on the fly and stream the A/V with the resulting text.

Until then, I will shortly PM you with the hanging title DVD - thanks again for your support.

However, if the resulting .srt file does not have a monotonically increasing time index, it will mess up caption playback. I have a Panasonic DVD recorder that will sometimes produce great captions right up to the end, where the time index drops back to 00:00:00.0.

I wonder if there's a way within t2extract to make the end time of each caption in the .srt file at least equal to its start time?

For example, I recorded SNL, and then used kmttg to bring it into iTunes for watching later on Apple TV 2.

The first two captions in the .srt file are:

1
00:00:42,968 --> 00:00:43,468
>>> WE'RE OUT OF TIME.

2
00:00:43,469 --> 00:00:43,468
>>> WE'RE OUT OF TIME.
TH

Note that the end time for caption #2 (43 seconds, 468 milliseconds) is before the start time (43 seconds, 469 milliseconds).

Apple TV 2 doesn't deal well with this, and displays this caption for the remainder of the show.

I honestly think this is an Apple TV bug, but wonder if there's a way to make t2extract replace the 468 with (say) 469 in this case?
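Until t2extract does this itself, a post-processing pass over the .srt can apply the fix described above. This is a minimal Python sketch, not a t2extract option: for each timing line whose end is earlier than its start, it sets the end equal to the start, so 00:00:43,469 --> 00:00:43,468 becomes 00:00:43,469 --> 00:00:43,469. clamp_end_times is a hypothetical name, and whether Apple TV handles a zero-length cue any better is an open question; pushing the end time a bit later, or dropping such cues entirely, are alternatives.

```python
import re

# Timing line, e.g. "00:00:43,469 --> 00:00:43,468".
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})")


def to_ms(timestamp: str) -> int:
    """Convert "hh:mm:ss,mmm" to milliseconds."""
    hms, ms = timestamp.split(",")
    h, m, s = map(int, hms.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + int(ms)


def clamp_end_times(srt_text: str) -> str:
    """Rewrite any timing line whose end precedes its start so that end == start."""
    def fix(match: re.Match) -> str:
        start, end = match.group(1), match.group(2)
        if to_ms(end) < to_ms(start):
            end = start
        return f"{start} --> {end}"
    return TIMING.sub(fix, srt_text)
```

Usage: read the .srt as text, pass it through clamp_end_times, and write the result back out before handing the file to iTunes.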