Video begins on first keyframe while audio begins at start of file

I have noticed that Wowza video-on-demand begins video playback at the first keyframe, rather than the actual start of the file, but audio begins at the start regardless of keyframe. This creates sync issues. For example, in our workflow we might:

1) live stream and record for two hours at our desired streaming settings (frame rate 30, keyframe every 150 frames)

2) use QuickTime Pro to delete the first 30 minutes from the resulting file (maybe this was just padding before our event, or maybe there were consecutive events recorded to one file, and we want to stream the second event on-demand)

3) use QuickTime Pro to quickly export the trimmed video to an MP4 container (passing through the existing video/audio rather than re-encoding)

4) copy this MP4 to a streaming server for on-demand streaming

When streamed on-demand in Wowza (by any method: RTMP, RTSP, or HTTP), the audio and video begin simultaneously when you hit play, but the audio starts precisely from the trim point, while the video starts from the first keyframe (which might be up to 5 seconds later). So the audio and video are out of sync.
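To put numbers on the "up to 5 seconds" figure, here is a rough sketch of the arithmetic. It assumes keyframes fall on exact multiples of 150 frames counted from the start of the original recording, which a real encoder may not guarantee; the trim point and the function name are made up for illustration.

```python
import math

FPS = 30                 # frame rate from step 1
KEYFRAME_INTERVAL = 150  # frames between keyframes, from step 1


def video_delay_seconds(trim_frame: int, fps: int = FPS,
                        interval: int = KEYFRAME_INTERVAL) -> float:
    """Seconds from the trim point to the first keyframe at or after it.

    Assumes keyframes sit on exact multiples of `interval` frames.
    """
    next_keyframe = math.ceil(trim_frame / interval) * interval
    return (next_keyframe - trim_frame) / fps


# Hypothetical trim point: 30 minutes and 1 second into the recording.
trim_frame = (30 * 60 + 1) * FPS

print("keyframe spacing (s):", KEYFRAME_INTERVAL / FPS)
print("video starts this many seconds past the trim point:",
      video_delay_seconds(trim_frame))
```

The worst case is a trim point one frame after a keyframe, where the video content starts almost a full keyframe interval later than the audio.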

This does not happen when streaming the same file through Darwin streaming server (RTSP) -- the first few seconds might show "green" video until the first keyframe comes, but the video and audio are in sync.

(Sorry for the lack of demo files/streams to illustrate this -- I can provide some later if necessary.)

I have tried to work around this issue by trimming exactly to a keyframe point instead of an arbitrary time (using MPEG Streamclip, which is the only free software I know which locates keyframe times), but this doesn't help us when we record two video sources for parallel streaming playback in a SMIL file (for example, a camera feed next to a laptop feed). Even though both source streams/files get the same specified frame and keyframe rates in the encoder (Wirecast or QuickTime Broadcaster), no encoder seems able to maintain a precise enough frame rate to keep the keyframes aligned across the two files, even just for a few minutes. (Again, Darwin Streaming Server has no problem streaming two files in parallel in SMIL with different keyframe intervals -- green video is simply shown whenever starting or seeking at a non-keyframe point.)

Is there any way to correct this behavior in Wowza, just for RTSP? I know I could re-encode the trimmed video to create new keyframes, but that can take a lot of time and will degrade the video quality. I haven't had a chance to look yet, but I'm guessing this throws off parallel SMIL sync in Wowza even when I don't trim the beginning of the files.

Richard -- thanks. I assume that only affects when the Wowza server starts recording the file? I had been recording locally using Wirecast (or QuickTime Broadcaster), then editing and uploading to the Wowza server, so I don't think this would affect me.

I am recording with Wowza too, but mainly as a backup, although I could try streaming that file instead (back in Wowza 2 I thought those recorded files showed a lot of artifacts etc. but maybe that has improved -- I am on a high speed reliable LAN). Is there an easy way to send a reset/restart command to Wowza, through JMX, so I can keep the stream running but start/restart the recorded file at the desired point?

Back to the original issue: can this start-at-keyframe behavior indeed affect audio sync if there isn't a keyframe at the very beginning of the file?

Yes, I think audio before the first keyframe can be the problem. I'm not sure how to fix this on the encoding side with your workflow. Try recording with a Wowza server that is on the same machine or LAN as the encoder (to avoid artifacts from data loss) with that Property setting.

You said, "but this doesn't help us when we record two video sources for parallel streaming playback in a SMIL file (for example, a camera feed next to a laptop feed). no encoder seems able to maintain a precise enough frame rate to keep the keyframes aligned across the two files, even just for a few minutes"

I don't understand why you need keyframe alignment here. Keyframe alignment applies to multi-bitrate switching, which doesn't apply when you have different video sources. Maybe you're just switching between videos? Then enhanced seek should work for you.

I had never heard of the "seekTarget" variable before, but changing it to "enhanced" did not fix the issue. I definitely see the "enhanced" seek behavior now, but it seems the audio still begins and stays out of sync in RTMP unless there is a keyframe at the beginning of the file. The same is true for RTSP. (Is there any difference between "enhanced" and "audio" as seekTarget values?)

The original source was a Wirecast m4v file. Then I used QuickTime Pro to trim a random small segment from the middle and export to an MP4 container without re-encoding. (I tried trimming with FFMPEG too but I got similar poor results.) If I instead trim a segment beginning at a keyframe, everything works fine.

My point about keyframe alignment is that if I have two synchronized movies (live recordings started and stopped at the same time) with different or drifting keyframe intervals, Wowza can't maintain sync when RTSP streaming them in parallel (side-by-side) in a SMIL file. Even if they both start on a keyframe, they drift out of sync when seeking, presumably as their keyframes drift.
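A small model of the drift described above, as a sketch only. It assumes each file snaps a seek to its own last keyframe at or before the requested time, and that one encoder holds 30.00 fps while the other averages 29.97 fps. Both rates are hypothetical, and how Wowza actually picks the keyframe is not confirmed here.

```python
import math


def last_keyframe_time(seek_time: float, actual_fps: float,
                       interval_frames: int = 150) -> float:
    """Time of the last keyframe at or before seek_time for a file whose
    encoder really produced actual_fps frames per second."""
    keyframe_spacing = interval_frames / actual_fps
    return math.floor(seek_time / keyframe_spacing) * keyframe_spacing


def seek_mismatch(seek_time: float, fps_a: float, fps_b: float) -> float:
    """Gap between the keyframes two parallel files snap to for one seek."""
    return abs(last_keyframe_time(seek_time, fps_a)
               - last_keyframe_time(seek_time, fps_b))


# Hypothetical rates: file A at 30.00 fps, file B averaging 29.97 fps.
for seek_time in (12.0, 62.0, 302.0, 3602.0):
    gap = seek_mismatch(seek_time, 30.0, 29.97)
    print(f"seek to {seek_time:7.1f} s -> keyframes differ by {gap:.3f} s")
```

Under these assumptions the gap starts at a few hundredths of a second and grows with position in the file, which would match the side-by-side streams looking fine at the start and worse after seeking further in.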

So I'm stuck using a separate Darwin server to do this RTSP stuff, in addition to Wowza. Would love to finally consolidate if I can get this sorted out.

We do not currently support edit lists, which are used to synchronize the audio and video when they are written unsynchronized. This means the files will play back unsynchronized. We may address this in the future, but I don't know when. I'll pass along your feedback to our product management team.

It plays fine locally, but it played fine locally before doing this too.

If I do a re-encode (ffmpeg -i test2.mp4 -vcodec libx264 -acodec aac -strict -2 test2reencode.mp4), I get a 24 second file where the first ~4 seconds are silent when played locally so the audio stays in sync. Still out of sync in Wowza, however:

Remember, that temp2.mp4 file was just a random segment trimmed with QuickTime Pro from a larger file recorded with keyframes every 5 seconds. So it sounds like the start point I randomly picked was about four seconds after a keyframe, and roughly one second before the next (hence why FFMPEG reports a -4 start time, and your examination found the next keyframe around +1). VLC, QuickTime, Darwin, etc. all play/stream temp2.mp4 beginning at my non-keyframe start point just fine. FFMPEG starts the video at the -4 keyframe and waits until 0 to start the audio. But Wowza wants to simultaneously start the video from the -4 keyframe and the audio from 0 and thus won't be in sync.
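The timeline in that paragraph can be written out as a quick sketch. Times are relative to the cut point (t = 0), the 5 second keyframe spacing and the 4 second offset come from the description above, and the function and key names are invented for illustration.

```python
def cut_timeline(seconds_after_keyframe: float,
                 keyframe_spacing: float = 5.0) -> dict:
    """Describe a cut made partway through a keyframe interval.

    All times are in seconds relative to the cut point (t = 0).
    """
    return {
        # Last keyframe before the cut: where decoding has to begin.
        "previous_keyframe": -seconds_after_keyframe,
        # First keyframe after the cut.
        "next_keyframe": keyframe_spacing - seconds_after_keyframe,
        # A player that decodes from the previous keyframe and holds the
        # audio until t = 0 stays in sync.
        "audio_hold_to_stay_in_sync": seconds_after_keyframe,
        # Starting both tracks at once pairs audio from t = 0 with video
        # from the previous keyframe, so the audio runs ahead by this much.
        "audio_lead_if_started_together": seconds_after_keyframe,
    }


timeline = cut_timeline(seconds_after_keyframe=4.0)
for name, value in timeline.items():
    print(f"{name}: {value:+.1f} s")
```

This only restates the behavior reported in the thread; it does not model what Wowza does internally.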

I was hoping to do edits within files and stream them in Wowza without re-encoding, but I'm finding that's just not possible unless my edit start point lands on a keyframe. I've tried QuickTime Pro, MPEG Streamclip, and FFMPEG -- is there some other tool I should be using?
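Since a cut that lands on a keyframe is the only case reported to work, one option is to list the keyframe times first and move the start point onto one of them. Below is a minimal sketch that wraps ffprobe. It assumes ffprobe is installed and that the build reports `pts_time` for frames (some older builds name the field differently); the helper names are my own.

```python
import bisect
import subprocess


def keyframe_probe_command(path: str) -> list[str]:
    """ffprobe arguments that print one timestamp per video keyframe."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",       # first video stream only
        "-skip_frame", "nokey",         # decode keyframes only
        "-show_entries", "frame=pts_time",
        "-of", "csv=p=0",               # bare values, one per line
        path,
    ]


def parse_keyframe_times(ffprobe_output: str) -> list[float]:
    """Turn ffprobe's CSV output into a sorted list of seconds."""
    times = []
    for line in ffprobe_output.splitlines():
        field = line.split(",")[0].strip()  # tolerate trailing commas
        try:
            times.append(float(field))
        except ValueError:
            continue  # blank lines or "N/A" entries
    return sorted(times)


def snap_to_keyframe(start: float, keyframes: list[float]) -> float:
    """Latest keyframe at or before start, or the first keyframe if none."""
    if not keyframes:
        raise ValueError("no keyframes found")
    index = bisect.bisect_right(keyframes, start) - 1
    return keyframes[max(index, 0)]


def list_keyframes(path: str) -> list[float]:
    """Run ffprobe on a file and return its keyframe times."""
    result = subprocess.run(keyframe_probe_command(path),
                            capture_output=True, text=True, check=True)
    return parse_keyframe_times(result.stdout)
```

For example, `snap_to_keyframe(1801.0, list_keyframes("recording.mp4"))` would give a start time to pass to the trimming tool, where `recording.mp4` is a placeholder name. This does not help with the two-file SMIL case, where the keyframes of the two files do not line up.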

We currently use ffmpeg to create video clips from existing files with the -vcodec copy option, which, as previously stated, does not necessarily start the clip on an I-frame (it almost never will when using the accurate seek method described here:
http://ffmpeg.org/trac/ffmpeg/wiki/Seeking%20with%20FFmpeg)

The clips play fine locally and with the standard HTML5 <video> tag. They start by showing a static image, a black screen, or some gibberish while the audio plays in the background, and as soon as playback hits a keyframe the video starts and everything is in sync.

Playing through Flash over Wowza's RTMP stream, however, starts the video instantly at the nearest keyframe and the audio at the beginning of the actual stream, so they are out of sync, and the audio ends before the video.

I was just wondering if there have been any updates or added features/variables to Wowza in the last year that may be able to alter this behavior? This is the only thread I was able to find that accurately explains the issue.

Regarding seek, you can either re-encode with ffmpeg, setting a strict keyframe interval, or use enhanced seek to seek to specific timecodes using Flash. Definitely look into recording the two streams with Wowza using the LiveRecordModule to set specific record times.
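For the re-encode route, a sketch of an ffmpeg invocation with a fixed keyframe interval, built as an argument list. The file names are placeholders, the 30 fps and 5 second values are taken from the workflow earlier in the thread, and the combination of `-g`, `-keyint_min`, and `-sc_threshold 0` is one common way to pin the interval with libx264, not something confirmed by Wowza in this thread.

```python
import shlex


def strict_keyframe_command(source: str, target: str,
                            fps: int = 30,
                            keyframe_seconds: int = 5) -> list[str]:
    """ffmpeg arguments for a re-encode with a fixed keyframe interval."""
    gop = fps * keyframe_seconds        # 150 frames at 30 fps
    return [
        "ffmpeg", "-i", source,
        "-c:v", "libx264",
        "-g", str(gop),                 # maximum distance between keyframes
        "-keyint_min", str(gop),        # minimum distance between keyframes
        "-sc_threshold", "0",           # no extra keyframes on scene cuts
        "-c:a", "aac",
        target,
    ]


# Placeholder file names; print the command rather than running it.
print(shlex.join(strict_keyframe_command("input.mp4", "output.mp4")))
```

Older ffmpeg builds may also need `-strict -2` for the built-in AAC encoder, as in the command quoted earlier in the thread. A re-encode still costs time and some quality, as noted in the original post.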