audio/video sync in SMPEG

Hi,
Recently I put some effort into trying to understand
the audio/video synchronisation in smpeg and wasn't
comfortable with what I understood. So I decided to
prepare this writeup on what I digested and place
it before the group for critique. In the process I hope
to learn the missing pieces.
<CONFESS>
Before I start I must admit that my knowledge of
MPEG1 comes from the brief chapter on MPEG1 in the
book "Video Demystified".
Also, what I write here could be wrong, very wrong.
</CONFESS>
In the following, XXXX marks the spots where I
am very confused and look forward to some guidance.
--------
Synchronisation in smpeg:
===========================
In smpeg, a/v sync is broadly achieved using MPEG
Presentation Time Stamps (PTS).
Presentation time stamp:
" ..The optional PTS is a 33-bit number coded using three
fields, separated by marker bits. PTS indicates the
intended time of display by the decodder. The value
of PTS is the number of periods of a 90 kHZ system
clock. This field is present only if PTS_bits is
present and stream_ID not equal to private stream 2...."
From what I have seen so far, it appears that a time stamp
is calculated for each packet and is inserted into the
MPEGstream along with the packet data. Mainly the
time stamp is derived from the PTS.
AUDIO SYNCHRONISATION
=====================
As the raw audio output (after decoding) is played back by
SDL, the SDL driver independently synchronises the audio
playback based on the expected duration of one frame of audio.
That is, there is NO LINK BETWEEN THE MPEG TIMESTAMPS AND THE SDL
SYNCHRONISATION !!! (XXXX)
typedef struct {
    int freq;         /* DSP frequency -- samples per second */
    Uint16 format;    /* Audio data format */
    Uint8  channels;  /* Number of channels: 1 mono, 2 stereo */
    Uint8  silence;   /* Audio buffer silence value (calculated) */
    Uint16 samples;   /* Audio buffer size in samples (power of 2) */
    Uint16 padding;   /* Necessary for some compile environments */
    Uint32 size;      /* Audio buffer size in bytes (calculated) */
    .......
} SDL_AudioSpec;
From this information two values are calculated:
#define frame_ticks (this->hidden->frame_ticks)
    - based on the spec data structure, the time duration
      of one audio frame is calculated in ticks
      (nothing but milliseconds)
#define next_frame (this->hidden->next_frame)
    - this contains the expected start time of
      the next frame

frame_ticks = (float)(spec->samples*1000)/spec->freq;
next_frame = SDL_GetTicks()+frame_ticks;
One thing that has bugged me is the SDL_AudioSpec->samples member.
How is this computed ? XXXX
The actual synchronisation....

#ifndef USE_BLOCKING_WRITES  /* Not necessary when using blocking writes */
    /* See if we need to use timed audio synchronization */
    if ( frame_ticks ) {
        /* Use timer for general audio synchronization */
        Sint32 ticks;

        ticks = ((Sint32)(next_frame - SDL_GetTicks()))-FUDGE_TICKS;
        if ( ticks > 0 ) {
            SDL_Delay(ticks);
        }
    }
#endif

Preparing for the next frame...

    /* If timer synchronization is enabled, set the next write frame */
    if ( frame_ticks ) {
        next_frame += frame_ticks;
    }
Note:
- synchronisation means a wait (the assumption being that
  the incoming data rate is much faster than the rate at
  which the driver is outputting). XXXX
  If it is not, won't there be underruns at the device
  driver level? XXXX
- FUDGE_TICKS - this is linked to RR scheduling delays
  (overheads). How is this calibrated ? XXXX
SMPEG: Time stamp processing in MPEGaudio
=========================================
The MPEG PTS info is kept in this array, found in the
MPEGaudio class.
/* Timestamp sync */
#define N_TIMESTAMPS 5
double timestamp[N_TIMESTAMPS];
Here timestamp[] is used as a FIFO to store and then use the timestamps.
PTS -> stream -> MPEGring -> timestamp array[]
The timestamp info is put into the MPEGring by the MPEG audio
decoder (in Decode_MPEGaudio).
In MPEGaudio::run, the timestamp info is read from the MPEGstream
and put into the MPEGring buffer in the call

    audio->ring->WriteDone(...., timestamp);
Time stamps from the MPEG ring are compared with audio->Time()
(which internally uses the play_time variable) to calculate the
difference. This difference is added to play_time to keep it in
sync with the PTS. The main purpose of the timestamp[] array seems
to be to let the Time() virtual method return the correct playback
time.
The MPEG::seekIntoStream(int position) method is used by
MPEG::Rewind and MPEG::Seek to implement their functionality.
seekIntoStream uses the MPEGstream->time() method to
get the timestamp info.
In addition, the frags_playing and frag_time variables are used
as shown below. Their meanings have completely eluded me XXXX.
int Play_MPEGaudio(MPEGaudio *audio, Uint8 *stream, int len)

    /* Increment the current play time (assuming fixed frag size) */
    switch (audio->frags_playing++) {
        // Vivien: Well... the theorical way seems good to me :-)
        case 0:     /* The first audio buffer is being filled */
            break;
        case 1:     /* The first audio buffer is starting playback */
            audio->frag_time = SDL_GetTicks();
            break;
        default:    /* A buffer has completed, filling a new one */
            audio->frag_time = SDL_GetTicks();
            audio->play_time += ((double)len)/audio->rate_in_s;
            break;
    }
VIDEO SYNCHRONISATION
----------------------
The basic synchronisation strategy is to skip frames if
the playback is slow (less than the desired framerate)
or wait if it is fast.
To do this, two times are important and are used:
- the actual playback time
- the desired or specified time in the encoded
  stream
There is another twist to this: video timing is
tied to the audio timing. This is done by using
the audio time stamps in the synchronisation
calculations. The 'SetTimeSource' method is used to put
a reference to the audio timing information in the video module.
void MPEG::EnableAudio(bool enabled) {
    videoaction->SetTimeSource(audioaction);
The MPEG video module appears to have three ways of knowing the time:

gdith.cpp::CurrentTime()
    This inline function returns the current playback
    time of the MPEGaudio module.

MPEGaction::Time()/play_time
    The Time() method in the base class MPEGaction
    returns the current playback time of the video module.
    It is a 'get' method for the attribute play_time,
    but the attribute is modified directly in the video module.

vid_stream->current->show_time
    This contains the fine-grained timing information.
    The MPEG PTS gives the frame-level timing information,
    while the GoP timestamp gives the timing at the
    sub-frame (picture) level.
- Other data structures used in the synchronisation:

vid_stream->_skipFrame
    This can have values -1, 0, ... n. When it
    is > 0, it indicates that that many frames have to
    be parsed but not displayed. However, the following
    is a mystery to me: XXXX

    MPEGvideo::RenderFinal
        /* Process all frames without displaying any */
        _stream->_skipFrame = 1;

vid_stream->_skipCount
    How is this relayed to _skipFrame ? It is updated
    only in timeSync and carries its value across
    frame decodes.

vid_stream->_jumpFrame
    This contains the frame no. to which a seek can be done.

vid_stream->need_frameadjust
    When a seek is done in seconds, the current_frame
    attribute will not contain the correct value.
    This flag indicates that the current frame no.
    should be re-calculated using the GoP time code.

vid_stream->totNumFrames
vid_stream->current_frame
    It may sound silly but I still haven't got the true
    meaning of these variables.
At the beginning of the timeSync method, we have...

    /* Update the number of frames displayed */
    vid_stream->totNumFrames++;
    vid_stream->current_frame++;

XXXX Why increment both ? Why do we need both ?
Here is the sync loop...
------------------------
while loop
    mpegVidRsrc( 0, mpeg->_stream, 0 );
        < parse video stream >
        if (need to skip frame)
            look for new Picture Start Code
            timeSync(..)
        else
            ....
        when a complete frame is decoded:
            ....
            call MPEGvideo::ExecuteDisplay( VidStream* vid_stream )
                if( ! vid_stream->_skipFrame )
                    DisplayFrame(vid_stream);
                timeSync( vid_stream );
end of loop
Sync logic
----------
'time behind' = audio playback time - video playback time

The 'time behind' range is used to compute the sync level:

....-TimeSlice....0....FUDGE_TIME....MAX_FUDGE_TIME....2*MAX_FUDGE_TIME....
         |             |                  |                    |
--Ahead--> <--In Sync--> <--a little out of sync--> <--a lot out of sync-->

Based on this range, the no. of frames to be skipped is computed
and this information is used in the sync loop to skip frames.
XXXX But won't 'time behind'/(one frame time) give the correct no.
of frames to skip ???
SOME FINAL COMMENTS XXXX
===================
- The sync logic assumes RR scheduling for the threads. If
  a different policy is in place for the threads (e.g. FIFO),
  the sync breaks down.
- The strength of smpeg compared to other (open source) MPEG
  decoders seems to be that smpeg is architecture-neutral, while
  the others seem to be optimised for i386.
---
Satish