H04N21/45—Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts

H04N21/462—Content or additional data management, e.g. creating a master electronic program guide from data received from the Internet and a Head-end, controlling the complexity of a video stream by scaling the resolution or bit-rate based on the client capabilities

Abstract

A device (1800) for processing an input data stream comprising a sequence of input frames, wherein the device comprises a processing unit (1802) for generating an output data stream as a trick-play stream comprising a sequence of output frames based on the input data stream and based on a predetermined replication rate, and a timing unit (1803) for assigning timing information to the output frames, said timing information pointing from an output frame to be reproduced for the first time to a subsequent one of the output frames which is to be reproduced for the first time.

Description

A device for and a method of processing an input data stream comprising a sequence of input frames

FIELD OF THE INVENTION

The invention relates to a device for processing an input data stream comprising a sequence of input frames.

The invention further relates to a method of processing an input data stream comprising a sequence of input frames.

The invention further relates to a program element.

The invention further relates to a computer-readable medium.

BACKGROUND OF THE INVENTION

Electronic entertainment devices are becoming increasingly important.

Particularly, an increasing number of users buy hard disk based audio/video players and other entertainment equipment.

Since the reduction of storage space is an important issue in the field of audio/video players, audio and video data are often stored in a compressed manner, and for security reasons in an encrypted manner.

MPEG2 is a standard for the generic coding of moving pictures and associated audio and creates a video stream out of frame data that can be arranged in a specified order called the GOP ("Group Of Pictures") structure. An MPEG2 video bit stream is made up of a series of data frames encoding pictures. The three ways of encoding a picture are intra-coded (I picture), forward predictive (P picture) and bi-directional predictive (B picture). An intra-coded frame (I-frame) is an independently decodable frame. A forward predictive frame (P-frame) needs information of a preceding I-frame or P-frame. A bi-directional predictive frame (B-frame) is dependent on information of a preceding and/or subsequent I-frame or P-frame.

It is an interesting function in a media playback device to switch from a normal reproduction mode, in which media content is played back in a normal speed, to a trick-play reproduction mode, in which media content is played back in a modified manner, for instance with a reduced speed ("slow forward"), a still picture, or vice versa.

Davies, C., Murray, K., "Buy a PVR, get one free", IBC 2003 Conference Papers, discloses an in-home network that uses existing digital set top boxes (STBs) as fully functional clients of a personal video recorder (PVR). The architecture enables users to simultaneously view multiple live digital TV programs and multiple stored digital TV programs within the home. The architecture and the technical issues of such a system are disclosed, with a focus on the problems of providing trick modes, such as fast forward, rewind and slow motion between a PVR and a standard STB. The technical solutions proposed are extended to support content protection in the network. The approach described could be commercially attractive when introducing PVRs to homes that already have an existing STB. Functionally, for the user with an existing STB, it really can be "buy a PVR, get one free".

BRIEF SUMMARY OF THE INVENTION

It is an object of the invention to enable efficient processing of a data stream. In order to achieve the object defined above, a device for processing an input data stream comprising a sequence of input frames, a method of processing an input data stream comprising a sequence of input frames, a program element and a computer-readable medium according to the independent claims are provided.

According to an exemplary embodiment of the invention, a device for processing an input data stream comprising a sequence of input frames is provided, wherein the device comprises a processing unit for generating an output data stream as a trick-play stream comprising a sequence of output frames based on the input data stream and based on a predetermined replication rate, and a timing unit for assigning timing information to the output frames, said timing information pointing from an output frame to be reproduced for the first time to a subsequent one of the output frames which is to be reproduced for the first time.

According to another exemplary embodiment of the invention, a method of processing an input data stream comprising a sequence of input frames is provided, the method comprising generating an output data stream as a trick-play stream comprising a sequence of output frames based on the input data stream and based on a predetermined replication rate, and assigning timing information to the output frames, said timing information pointing from an output frame to be reproduced for the first time to a subsequent one of the output frames which is to be reproduced for the first time.
Beyond this, according to another exemplary embodiment of the invention, a computer-readable medium is provided, in which a computer program is stored, which computer program, when being executed by a processor, is adapted to control or carry out the above-mentioned method. Moreover, according to still another exemplary embodiment of the invention, a program element is provided, which program element, when being executed by a processor, is adapted to control or carry out the above-mentioned method.

The data processing according to the invention can be realized by a computer program, that is to say by software, or by using one or more special electronic optimization circuits, that is to say in hardware, or in hybrid form, that is to say by means of software components and hardware components.

According to an exemplary embodiment, an input data stream, for instance a stream of video data, may be processed so as to generate a trick-play stream, for instance a slow-forward stream, in which some of the frames (B-frames in the context of MPEG-2) are repeated a plurality of times in accordance with a trick-play factor (of for instance 3). Other frames (I-frames and P-frames in the context of MPEG-2) may be treated in such a manner that spaces in between the frames are stuffed with empty frames. When such a processing is performed, it may happen that the original timing information does not fit any longer with the modified stream. In other words, the timing information of the input data stream may no longer be suitable for use for the output frames, since the frames may be stretched in time or may be reordered. Thus, it may be appropriate or necessary that the timing information is corrected. In such a scenario it might be reasonable to correct the timing information (for instance a time stamp) so as to point from an output frame to be played back for the first time during slow-forward play to a subsequent one of the output frames to be reproduced for the first time.

According to such an embodiment, the timing data does not point to output frames to be reproduced for the second, the third, and so on, time and does not point to output frames which are empty frames being inserted between subsequent output frames to be repeated for the first time, but the timing data essentially points from one picture to the next one. By taking this measure, it can be securely ensured with low computational burden that the timing of the reproduction of the output data stream in the trick-play does not suffer from artefacts due to the update or new calculation of the timing information.
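The pointing relation described above may be illustrated by a minimal Python sketch. It assumes a hypothetical list of output frames, each flagged as either a frame reproduced for the first time or a repeated or empty frame, and a constant frame period of 3600 ticks of a 90 kHz clock (25 frames per second). The names `OutFrame`, `assign_timing` and `TICKS_PER_FRAME`, as well as the constant frame period, are assumptions made for illustration only and are not part of the embodiments.

```python
from dataclasses import dataclass
from typing import List, Optional

TICKS_PER_FRAME = 3600  # assumption: 90 kHz time base at 25 frames per second


@dataclass
class OutFrame:
    label: str        # hypothetical label, e.g. "I0", "empty", "B1", "B1-repeat"
    first_time: bool  # True if the frame is reproduced for the first time


def assign_timing(frames: List[OutFrame]) -> List[Optional[int]]:
    """For each first-time frame, return the start time (in ticks) of the
    next first-time frame. Repeated and empty frames, and the last
    first-time frame, are left unset (None) in this sketch."""
    timing: List[Optional[int]] = [None] * len(frames)
    next_first: Optional[int] = None
    # Walk backwards so that the next first-time frame is always known.
    for idx in range(len(frames) - 1, -1, -1):
        if frames[idx].first_time:
            if next_first is not None:
                timing[idx] = next_first * TICKS_PER_FRAME
            next_first = idx
    return timing
```

With a trick-play factor of 3, the first-time frames sit at positions 0, 3 and 6 of the output sequence, so the timing of the first frame points to the start of the fourth frame period and skips the two frames in between.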

Thus, timing information may be newly created when switching from normal play to trick-play. By correcting the timing reference between subsequent frames, it may be ensured that no problems arise during the decoding as a consequence of an inappropriate synchronization. Particularly, anchor frames may be smoothed as long as possible over the entire period, and, if desired, PCRs (Program Clock References) may be placed after the transmission of data. Therefore, a smooth bit rate may be obtained.

According to an exemplary embodiment, a storage device for storing MPEG transport streams with a digital interface to an MPEG compliant decoder may be provided that is capable of providing an MPEG compliant transport stream for slow-forward play mode. Embodiments of the invention provide an algorithm for the placement of frames and packets on an MPEG compliant time axis for a slow-forward stream without using the original timing relation of packets from the normal play stream, that is, the so-called time stamps. The Decoding Time Stamp (DTS) may be corrected to point to a time that corresponds to the start of the transmission period of the next original frame, that is, a frame that is not a repeated frame produced by the slow-forward algorithm. A new Program Clock Reference (PCR) has to be created based upon the slow-forward factor for smoothing the local bit rate of the transport stream.

A further exemplary embodiment is directed to situations when the slow-forward factor is large, which may limit the maximum period over which anchor frames are smoothed. The replacement of complete B-frame data with empty B-frames referring to anchor frames may be performed to reduce the local bit rate of the transport stream for exceptional cases of large B-frames caused by shortcuts in the middle of a GOP. A detailed algorithm to calculate the corrected DTS and Presentation Time Stamps (PTS) will be explained below.

According to an exemplary embodiment, frames and packets for slow-forward may be placed without using the original recording time stamps. In a scheme for smoothing a local bit rate in a slow-forward MPEG stream, the DTS of the current anchor frame is adjusted to point to the time at which the next original frame (not the repeated frame) is transmitted, and the corresponding PCR values are computed and placed at appropriate points in the time scale.

Anomalies which may be caused by large slow motion factors may be handled by limiting the number of anchor frames smoothed to a predetermined value of, for example, six. Hence, if the slow motion factor is more than this predetermined value of, for instance, six, the first six frames (original anchor frame plus five empty frames) may be smoothed together over six display periods. The remaining empty frames, B-frames, may be smoothed over one display period individually.
Furthermore, exceptionally large B-frames may be replaced by empty B-frames, which in turn may repeat the previous or next anchor frame instead of a B-frame.

Thus, a scheme or method may be provided to regenerate the timing information for playing out a (modified) recorded stream.

Next, further exemplary embodiments of the invention will be described. In the following, exemplary embodiments of the device for processing an input data stream comprising a sequence of input frames will be described. However, these embodiments also apply for the method of processing an input data stream comprising a sequence of input frames, for the program element and for the computer-readable medium.

The timing unit may be adapted for assigning timing information to the output frames independently from timing information of the input frames. In other words, a straightforward approach may be taken, namely to generate new timing information instead of correcting the timing information of the original input frames, since such a correction might require more calculation time and resources.

The timing unit may further be adapted for assigning Decoding Time Stamps as the timing information to the output frames. Such Decoding Time Stamps, in the nomenclature of MPEG2, may provide the information at which time an encrypted portion of a data stream is to be decoded. This timing information, in an encrypted data stream, may be necessary to ensure proper reproduction of the data stream.

The timing unit may be adapted for inserting a timing packet in the sequence of the output frames at a position preceding an output frame to be reproduced for the first time. Therefore, after the last repetition of a frame, a new timing packet, in the nomenclature of MPEG2 a Program Clock Reference (PCR), may be inserted in the output data stream so as to properly adjust timing information for the output data stream.

The processing unit may be adapted such that the sequence of output frames is formed from the sequence of input frames being reproduced a number of times and/or being filled with empty frames in accordance with the predetermined replication rate.
It is possible that, for a slow-forward trick-play mode, a part of the output frames is repeated several times. For other output frames, empty frames may be inserted succeeding such a frame. Particularly, the processing unit may be adapted such that bi-directional predictive frames (B-frames) are reproduced a number of times in accordance with the predetermined replication rate. For example, when the replication rate is 3, such a B-frame may simply be repeated three times. This is a simple and very efficient way of generating a trick-play stream from a normal play stream.
Accordingly, the processing unit may be adapted such that anchor frames may be repeated by using empty frames in accordance with the predetermined replication rate so as to smooth the output data stream. The term "anchor frame" may particularly denote a frame which, in transmission order and/or in display order, keeps its relative temporal position with respect to other anchor frames. In the context of MPEG2, I-frames and P-frames may be denoted as anchor frames. In contrast to this, B-frames would not be denoted as anchor frames in the context of MPEG2.

Although it may be possible, in special scenarios, that also anchor frames are simply repeated several times in accordance with the trick-play factor, it may be easier and better that the anchor frames are repeated by using empty frames which are inserted after or before an anchor frame and do not contain new or additional information, but may make it possible that the trick-play reproduction may be performed.
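The formation of the output sequence described above may be sketched in Python as follows. The sketch is a simplified illustration under stated assumptions: frames are represented by hypothetical string labels whose first letter gives the frame type, empty frames are labelled "E", and the empty frames are inserted after the anchor frame (the description also allows insertion before it). The function name `build_slow_forward` is likewise an assumption.

```python
from typing import List


def build_slow_forward(frames: List[str], rate: int) -> List[str]:
    """Expand a frame sequence for slow-forward with replication rate `rate`.

    B-frames are repeated `rate` times. Anchor frames (I-frames and
    P-frames) are emitted once and followed by `rate - 1` empty frames,
    labelled "E" in this sketch.
    """
    if rate < 1:
        raise ValueError("replication rate must be at least 1")
    out: List[str] = []
    for frame in frames:
        if frame.startswith("B"):
            out.extend([frame] * rate)
        else:
            out.append(frame)
            out.extend(["E"] * (rate - 1))
    return out
```

For a replication rate of 3, every input frame occupies three frame periods in the output, so that four input frames yield twelve output frames.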

The processing unit may be adapted such that the sequence of output frames is formed from the sequence of input frames being filled with empty frames in accordance with the predetermined replication rate only in case that the predetermined replication rate does not exceed a predetermined threshold value. In other words, very large replication rates (wherein a very large replication rate may be defined to be six, or any value larger than 4) may be treated separately from smaller replication rates. Thus, artifacts caused by large trick-play factors may be prevented by limiting the number of frame periods used to transmit anchor frames to a predetermined threshold value of, for instance, six. In case that the trick-play factor should exceed this predetermined threshold value, the first frames (for instance an anchor frame and a number of succeeding empty frames) may be smoothed together over a number of display periods which may be equal to the predetermined threshold value. The remaining empty frames may be smoothed separately.

The processing unit may further be adapted such that bi-directional predictive frames (B-frames) having a size exceeding a predetermined threshold value may be substituted by empty bi-directional predictive frames. Such a predetermined threshold value of exceptionally large B-frames may be, for instance, 60 kB. Such very large B-frames (which may occur very rarely in practice) may be replaced by empty B-frames which in turn may repeat the previous or next anchor frame instead of the B-frame.
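The two threshold rules described above may be sketched in Python as follows. The sketch uses the example values of six display periods and 60 kB mentioned in the description. The function names, the label "B-empty" and the interpretation of 1 kB as 1000 bytes are assumptions made for illustration only.

```python
from typing import List, Tuple

MAX_SMOOTHING_PERIODS = 6      # example threshold value from the description
MAX_B_FRAME_BYTES = 60 * 1000  # example value of 60 kB; assumes 1 kB = 1000 bytes


def smoothing_plan(rate: int, limit: int = MAX_SMOOTHING_PERIODS) -> Tuple[int, int]:
    """Return a pair (joint, remaining).

    `joint` is the number of display periods over which the anchor frame
    and its first empty frames are smoothed together; `remaining` is the
    number of empty frames smoothed individually over one period each.
    """
    if rate < 1:
        raise ValueError("trick-play factor must be at least 1")
    joint = min(rate, limit)
    return joint, rate - joint


def replace_large_b_frames(frames: List[Tuple[str, int]],
                           limit: int = MAX_B_FRAME_BYTES) -> List[str]:
    """Return the frame types, with every B-frame whose size in bytes
    exceeds `limit` substituted by the label "B-empty"."""
    return ["B-empty" if kind == "B" and size > limit else kind
            for kind, size in frames]
```

For a trick-play factor of 8, for example, the anchor frame and five empty frames would be smoothed together over six display periods, and the two remaining empty frames would each be smoothed over one display period.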

The processing unit may be adapted for generating the output data stream in a trick-play reproduction mode of the group consisting of a slow-forward reproduction mode, a slow-reverse reproduction mode, a stand still reproduction mode, a step reproduction mode, and an instant replay reproduction mode. This trick-play generation may be adjusted or controlled by a user by selecting corresponding options in a user interface, for instance buttons of a device, a keypad or a remote control. For trick-play, only a portion of subsequent data shall be used for output (for instance for visual display and/or for acoustical output) or the same content may be used several times.

The input frames and/or the output frames may include at least one frame of the group consisting of an intra-coded frame (I-frame), a forward predictive frame (P-frame) and a bi-directional predictive frame (B-frame). Such frames may be part of an MPEG2 video bit stream. An intra-coded frame is related to a particular picture and contains the corresponding data. A forward predictive frame needs information of a preceding I-frame or P-frame. A bi-directional predictive frame may be dependent on information of a preceding and/or of a subsequent I-frame or P-frame.

The device may comprise a storing unit for storing the input data stream and/or the output data stream. Such a storage unit may be a hard disk, a flash card or any other data carrier like a CD or a DVD. However, the storage unit may also be an Internet server to which the device has (network) access for downloading required information.

The device may further be adapted to process a plaintext data stream, a fully encrypted data stream or a mixture of encrypted parts and plaintext parts (a so-called hybrid stream). In other words, the entire data streams may be entirely encrypted or entirely decrypted or may be a combination of both. Thus, decrypters and/or encryptors may be foreseen at appropriate positions of a data processing device according to an embodiment of the invention.

The device may further be adapted to process a data stream of video data or audio data. However, such content is not the only type of data which may be processed with the scheme according to embodiments of the invention. Trick-play generation in similar applications may be an issue for both, video processing and (pure) audio processing.

The device may further be adapted to process a data stream of digital data.

Furthermore, the device may comprise a reproduction unit for reproducing the processed data stream. Such a reproduction unit may comprise a loudspeaker or earphones and/or an optical display device so that both, audio and visual data can be reproduced perceivable for a human being.

The device according to exemplary embodiments of the invention may be adapted to process an MPEG2 data stream. MPEG2 is a designation for a group of audio and video coding standards agreed upon by MPEG (Moving Picture Experts Group), and published as the ISO/IEC 13818 International Standard. For example, MPEG2 is used to encode audio and video broadcast signals including digital satellite and cable TV, but may also be used for DVD.

However, the device according to exemplary embodiments of the invention may also be adapted to process an encrypted MPEG4 data stream. More generally, any codec scheme may be implemented which uses anchor frames from which other frames are dependent, particularly any type of encoding using predictive frames and thus any kind of MPEG encoding/decoding.

The device according to embodiments of the invention may be realized as one of the group consisting of a digital video recording device, a network-enabled device, a conditional access system, a portable audio player, a portable video player, a mobile phone, a DVD player, a CD player, a hard disk based media player, an Internet radio device, a public entertainment device, and an MP3 player. However, these applications are only exemplary. The aspects defined above and further aspects of the invention are apparent from the examples of embodiment to be described hereinafter and are explained with reference to these examples of embodiment.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be described in more detail hereinafter with reference to examples of embodiment but to which the invention is not limited. Fig. 1 illustrates a time-stamped transport stream packet.

Fig. 2 shows an MPEG2 group of picture structure with intra-coded frames and forward predictive frames.

Fig. 26 illustrates the insertion of empty P-frames before the anchor frames.

Fig. 27 illustrates the use of backward predictive empty B-frames. Fig. 28 illustrates the use of forward predictive empty B-frames.

Fig. 29 illustrates a temporal reference for normal play.

Fig. 30 illustrates a temporal reference for slow-forward with Bf-frames.

Fig. 31 illustrates a temporal reference for pre-insertion of Bb-frames.

Fig. 32 illustrates a temporal reference for pre-insertion of Pe-frames. Fig. 33 illustrates temporal references for three types of B-frames.

Fig. 34 illustrates a distance D and a slow motion factor L for a normal play and a slow-forward stream.

Fig. 35 illustrates a temporal reference for the I-frame with empty B-frames used.
Fig. 36 illustrates a temporal reference for the P-frame when empty B-frames are used.

Fig. 37 illustrates a temporal reference for the I-frame when empty P-frames are used. Fig. 38 illustrates a temporal reference for the P-frame when empty P-frames are used.

Fig. 39 illustrates a temporal reference for empty P-frames. Fig. 40 illustrates the splitting of the stream for one PES packet per frame. Fig. 41 illustrates the splitting of the stream at the start of a PES header. Fig. 42 illustrates the splitting of the stream at the start of a Picture Start Code.

Fig. 43 illustrates the splitting of the stream within a Picture Start Code. Fig. 44 illustrates an incomplete picture start code at the concatenation point. Fig. 45 illustrates an example of n+m=4. Fig. 46 illustrates an example of n+m>4. Fig. 47 illustrates an example of n+m<4.

Fig. 48 shows a table illustrating Delta as a function of the frame rate. Fig. 49 illustrates a structure of a slow-forward stream. Fig. 50 illustrates limitations of the smoothing period. Fig. 51 illustrates an I-frame size in kB for an exemplary stream. Fig. 52 illustrates a P-frame size in kB for the exemplary stream.

Fig. 53 illustrates a B-frame size in kB for the exemplary stream. Fig. 54 illustrates replacement of a B-frame by an empty frame. Fig. 55 illustrates packet smoothing in trick-play.

Fig. 56 illustrates a Decoding Time Stamp and a Presentation Time Stamp in relation to a Program Clock Reference time base.

Fig. 57 illustrates inserting ECMs between trick-play GOPs. Fig. 58 illustrates inserting ECMs within an I-frame. The Figures are schematically drawn and not true to scale, and the identical reference numerals in different Figures refer to corresponding elements. It will be clear for those skilled in the art, that alternative but equivalent embodiments of the invention are possible without deviating from the true inventive concept, and that the scope of the invention will be limited by the claims only.
DETAILED DESCRIPTION OF THE INVENTION

In the following, referring to Fig. 1 to Fig. 13, different aspects of trick-play implementation for transport streams according to exemplary embodiments of the invention will be described. Particularly, several possibilities to perform trick-play on an MPEG2 encoded stream will be described, which may be partly or totally encrypted, or non-encrypted. The following description will target methods specific to the MPEG2 transport stream format. However, the invention is not restricted to this format.

Experiments were actually done with an extension, the so-called time-stamped transport stream. This comprises transport stream packets, all of which are prepended with a 4-byte header in which the transport stream packet arrival time is placed. This time may be derived from the value of the program clock reference (PCR) time-base at the time the first byte of the packet is received at the recording device. This is a proper method to store the timing information with the stream, so that playback of the stream becomes a relatively easy process.
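Reading back such a time-stamped record may be sketched in Python as follows. The sketch only relies on the 4-byte arrival-time header described above and on the fact that an MPEG2 transport stream packet starts with the sync byte 0x47. The byte order of the arrival time (big-endian here) and the function name `split_record` are assumptions, since the description does not specify them.

```python
import struct
from typing import Tuple

TIMESTAMP_BYTES = 4  # length of the prepended arrival-time header
TS_SYNC_BYTE = 0x47  # first byte of every MPEG2 transport stream packet


def split_record(record: bytes) -> Tuple[int, bytes]:
    """Split one stored record into (arrival_time, transport_packet).

    Assumption: the arrival time is stored as an unsigned big-endian
    32-bit value in front of the transport stream packet.
    """
    if len(record) <= TIMESTAMP_BYTES:
        raise ValueError("record is too short to hold a time stamp and a packet")
    (arrival_time,) = struct.unpack(">I", record[:TIMESTAMP_BYTES])
    packet = record[TIMESTAMP_BYTES:]
    if packet[0] != TS_SYNC_BYTE:
        raise ValueError("transport packet does not start with the sync byte")
    return arrival_time, packet
```

During playback, the difference between the arrival times of successive records could then be used to restore the relative timing of the packets.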

One problem during playback is to ensure that the MPEG2 decoder buffer neither overflows nor underflows. If the input stream was compliant to the decoder buffer model, restoring the relative timing ensures that the output stream is also compliant. Some of the trick-play methods described herein are independent of the time stamp and perform equally well on transport streams with and without time stamps.

Fig. 1 illustrates a time stamped transport stream packet 100 having a total length 104 of 188 Bytes and comprising a time stamp 101 having a length 105 of 4 Bytes, a packet header 102, and a packet payload 103 having a length of 184 Bytes.

The following description will give an overview of the possibilities to create an MPEG/DVB (digital video broadcasting) compliant trick-play stream from a recorded transport stream and intends to cover the full spectrum of recorded streams from those that are completely plaintext, so every bit of data can be manipulated, to streams that are completely encrypted (for instance according to the DVB scheme), so that only transport stream headers and some tables may be accessible for manipulation.

When creating trick-play for an MPEG/DVB transport stream, problems may arise when the content is at least partially encrypted. It may not be possible to descend to the elementary stream level, which is the usual approach, or even access any packetized elementary stream (PES) headers before decryption. This also means that finding picture frames is not possible. Known trick-play engines need to be able to access and process this information.

In the frame of this description, the term "ECM" denotes an Entitlement Control Message. This message may particularly comprise secret provider proprietary information and may, among others, contain encrypted Control Words (CW) needed to decrypt the MPEG stream. Typically, Control Words expire in 10-20 seconds. The ECMs are embedded in packets in the transport stream.

In the frame of this description, the term "keys" particularly denotes data that may be stored in a smart card and may be transferred to the smart card using EMMs, that is so-called "Entitlement Management Messages" that may be embedded in the transport stream. These keys may be used by the smart card to decrypt the Control Words present in the ECM. An exemplary validity period of such a key is one month. In the frame of this description, the term "Control Words" (CW) particularly denotes decryption information needed to decrypt actual content. Control words may be decrypted by the smart card and then stored in a memory of the decryption core.

Some aspects related to trick-play on plaintext streams will now be described. It is preferable that any MPEG2 streams created are MPEG2 compliant transport streams. This is because the decoder may not only be integrated within a device, but may also be connected via a standard digital interface, such as an IEEE 1394 interface, for example.

Account should also be taken of any problems that may occur when using a video coding technique like MPEG2 that exploits the temporal redundancy of video to achieve high compression ratios. Frames may no longer be decoded independently. A structure of a plurality of groups of pictures (GOPs) is shown in Fig. 2. Particularly, Fig. 2 shows a stream 200 comprising several MPEG2 GOP structures with a sequence of I-frames 201 and P-frames 202. The GOP size is denoted with reference numeral 203. The GOP size 203 is set to 12 frames, and only I-frames 201 and P-frames 202 are shown here.

In MPEG, a GOP structure may be used in which only the first frame is coded independently of other frames. This is the so-called intra-coded or I-frame 201. The predictive frames or P-frames 202 are coded with a unidirectional prediction, meaning that they only rely on the previous I-frame 201 or P-frame 202 as indicated by arrows 204 in Figure 2. Such a GOP structure has typically a size of 12 or 16 frames 201, 202.
Another structure 300 of a plurality of GOPs is shown in Fig. 3. Particularly, Fig. 3 shows the MPEG2 GOP structure with a sequence of I-frames 201, P-frames 202 and B-frames 301. The GOP size is again denoted with reference numeral 203.

It is possible to use a GOP structure containing also bi-directionally predictive frames or B-frames 301 as shown in Fig. 3. A GOP size 203 of 12 frames is chosen for the example. The B-frames 301 are coded with a bi-directional prediction, meaning that they rely on a previous and a next I- or P-frame 201, 202 as indicated for some B-frames 301 by curved arrows 204. The transmission order of the compressed frames may be not the same as the order in which they are displayed.

To decode a B-frame 301, both reference frames before and after the B-frame 301 (in display order) are needed. To minimize the buffer demand in a decoder, the compressed frames may be reordered. So in transmission, the reference frames may come first. The reordered stream, as it is transmitted, is also shown in Fig. 3, lower part. The reordering is indicated by straight arrows 302. A stream containing B-frames 301 can give a nice looking trick-play picture if all the B-frames 301 are skipped. For the present example, this leads to a trick-play speed of 3x forward.
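The reordering and the resulting trick-play speed may be illustrated by a short Python sketch. It assumes a hypothetical GOP of 12 frames in display order with two B-frames between successive anchor frames, represented by string labels. The function names are assumptions, and the handling of the trailing B-frames (which would in practice follow the I-frame of the next GOP) is simplified.

```python
from typing import List


def to_transmission_order(display: List[str]) -> List[str]:
    """Reorder frames from display order to transmission order.

    B-frames are held back until the anchor frame that follows them in
    display order has been emitted, so that both reference frames reach
    the decoder before the B-frames that depend on them.
    """
    out: List[str] = []
    pending: List[str] = []
    for frame in display:
        if frame.startswith("B"):
            pending.append(frame)
        else:
            out.append(frame)
            out.extend(pending)
            pending = []
    # Trailing B-frames depend on an anchor frame outside the input;
    # this sketch simply appends them.
    out.extend(pending)
    return out


def skip_b_frames(frames: List[str]) -> List[str]:
    """Keep only the anchor frames (I-frames and P-frames)."""
    return [f for f in frames if not f.startswith("B")]


display = ["I0", "B1", "B2", "P3", "B4", "B5",
           "P6", "B7", "B8", "P9", "B10", "B11"]
speed = len(display) // len(skip_b_frames(display))
```

In this example, four of the twelve frames are anchor frames, so displaying only the anchor frames at the normal frame rate corresponds to the 3x forward speed mentioned above.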

Even if an MPEG2 stream is not encrypted (that is to say plaintext), trick-play is not trivial. The possibility of a slow-reverse based on I-frames only is briefly mentioned. An efficient frame based slow-reverse is more difficult though, due to the necessary inversion of the MPEG2 GOP.

Slow-forward, which is also known as slow motion forward, is a mode in which the display picture runs at a lower than normal speed. A rudimentary form of slow-forward is already possible with the technique making use of a fast-forward algorithm that generates trick-play GOPs. Setting the fast-forward speed to a value between zero and one results in a slow-forward stream based on a repetition of fast-forward trick-play GOPs. For a plaintext stream this is no problem but for an encrypted stream it can lead to the erroneous decryption of part of the I-frame in certain specific conditions. There are several options to solve this problem but the most suitable way is not to repeat the fast-forward trick-play GOP but to extend the size of the trick-play GOP by the addition of empty P-frames. This technique in fact also enables slow-reverse, because it is based on the trick-play GOPs used for fast-forward/reverse and therefore on the independently decodable I-frames.

However, it is not preferred to make use of this kind of I-frame based slow-forward or slow-reverse for the following reason. The distance between I-frames in normal play is around half a second and for slow-forward/reverse it is multiplied with the slow motion factor. So this type of slow-forward or slow-reverse is not really the slow motion consumers are used to but in fact it is more like a slide show with a large temporal distance between the successive pictures.

In another trick-play mode called still picture mode the display picture is halted. This can be achieved by adding empty P-frames to the I-frame for the duration of the still picture mode. This means that the picture resulting from the last I-frame is halted. When switching to still picture from normal play, this can also be the nearest I-frame according to the data in the CPI file. This technique is an extension of the fast-forward/reverse modes and results in nice still pictures especially if interlace kill is used. However, the positional accuracy is often not sufficient when switching from normal play or slow-forward/reverse to still picture.

The still picture mode can be extended to implement a step mode. The step command advances the stream to some next or previous I-frame. The step size is at minimum one GOP but can also be set to a higher value equal to an integer number of GOPs. Step forward and step backward are both possible in this case because only I-frames are used.

The slow-forward can also be based on a repetition of every frame, which results in a much smoother slow motion. The best form of slow-forward would in fact be a repetition of fields instead of frames because the temporal resolution is doubled and there are no interlace artifacts. This is however practically impossible for the intrinsically frame based MPEG2 streams and even more so if they are largely encrypted. The interlace artifacts can be significantly reduced for the I- and P-frames by using special empty frames to force the repetition. Such an interlace reduction technique is not available for the B-frames though. Whether the use of interlace kill for the I- and P-frames is still advantageous in this case or in fact leads to a more annoying picture for the viewer can only be verified by experiments.

Slow-reverse on the basis of individual frames is in fact very complicated for MPEG signals due to the temporal predictions. A complete GOP has to be buffered and reversed. There is no simple method that we know of to recode the frames in a GOP to the reverse order. So an almost complete decoding and encoding might be necessary with an inversion of the frame order between these two. This asks for the buffering of a complete decoded GOP as well as an MPEG decoder and encoder.

Still picture mode can be defined as an extension of the frame-based slow-forward mode. It is based on a repeated display of the current frame for the duration of the still picture mode whatever the type of this frame is. This is in fact a slow-forward with an infinite slow motion factor if this indicates the factor with which the normal play stream is slowed down. No interlace kill is possible if the picture is halted on a B-frame. In that sense this still picture mode is worse than the trick-play GOP based still picture mode. This can be corrected by only halting the picture at an I- or P-frame at the cost of a somewhat less accurate still picture position. Discontinuities in the temporal reference and the PTS can also be avoided in this case. Moreover, the bit rate is significantly reduced because the repetition of an I- or P-frame is forced by the insertion of empty frames instead of a repetition of the frame data itself as is necessary for the B-frames. So, technically speaking, the halting of a picture at an I- or P-frame is the best choice.

The still picture mode can also be extended with a step mode. The step command advances the stream in principle to the next frame. Larger step sizes are possible by stepping to the next P-frame or some next I-frame. A step backward on frame basis is not possible. The only option is to step backward to one of the previous I-frames.

Two types of still picture mode have been mentioned, namely trick-play GOP based and frame based. The first one is most logically connected to fast-forward/reverse whereas the second one is related to slow-forward. When switching from some mode to still picture, it is preferable to choose the related still picture mode to minimize the switching delay. The streams resulting from both methods look very alike because they are both based on the insertion of empty frames to force the repetition of an anchor frame. But on detailed stream construction level there are some differences. In the following, some aspects related to a CPI ("characteristic point information") file will be described.

Finding I-frames in a stream usually requires parsing the stream to find the frame headers. Locating the positions where the I-frame starts can be done while the recording is being made, or off-line after the recording is completed, or semi on-line, in fact being off-line but with a small delay with respect to the moment of recording. The I-frame end can be found by detecting the start of the next P-frame or B-frame. The meta-data derived this way can be stored in a separate but coupled file that may be denoted as characteristic point information file or CPI file. This file may contain pointers to the start and possibly the end of each I-frame in the transport stream file. Each individual recording may have its own CPI file. The structure of a characteristic point information file 400 is visualized in Fig. 4.

Apart from the CPI file 400, stored information 401 is shown. The CPI file 400 may also contain some other data that are not discussed here.
With the data from the CPI file 400 it is possible to jump to the start of any I-frame 201 in the stream. If the CPI file 400 also contains the end of the I-frames 201, the amount of data to read from the transport stream file is exactly known to get a complete I-frame 201. If for some reason the I-frame end is not known, the entire GOP or at least a large part of the GOP data is to be read to be sure that the entire I-frame 201 is read. The end of the GOP is given by the start of the next I-frame 201. It is known from measurements that the amount of I-frame data can be 40% or more of the total GOP data.
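The read strategy described above may be sketched as follows. This is a minimal illustration only: the record layout (`CpiEntry`, byte offsets, an exclusive end offset) and the function name `read_range` are assumptions made for the example and are not taken from the CPI file 400 itself.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CpiEntry:
    """One I-frame record of a hypothetical CPI index (byte offsets assumed)."""
    start: int                 # offset of the first byte of the I-frame
    end: Optional[int] = None  # exclusive end offset, None if not recorded


def read_range(entries: List[CpiEntry], i: int, file_size: int) -> Tuple[int, int]:
    """Return (offset, length) that is guaranteed to cover I-frame i."""
    entry = entries[i]
    if entry.end is not None:
        # End is known: read exactly the I-frame data.
        return entry.start, entry.end - entry.start
    # End is unknown: read up to the start of the next I-frame,
    # which marks the end of the GOP (or up to the end of the file).
    gop_end = entries[i + 1].start if i + 1 < len(entries) else file_size
    return entry.start, gop_end - entry.start


entries = [CpiEntry(0, 40_000), CpiEntry(100_000), CpiEntry(210_000, 255_000)]
print(read_range(entries, 0, 300_000))  # end known
print(read_range(entries, 1, 300_000))  # end unknown, whole GOP is read
```

In the second call the whole GOP has to be read, which illustrates why storing the I-frame end in the CPI file reduces the amount of data fetched from the storage device.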

It is known that reducing the trick-play picture refresh rate can be achieved by displaying each I-frame 201 several times. The bit rate will be reduced accordingly. This may be achieved by adding so-called empty P-frames 202 between the I-frames 201. Such an empty P-frame 202 is not really empty but may contain data instructing the decoder to repeat the previous frame. This has a limited bit cost, which can in many cases be neglected compared to an I-frame 201. From experiments it is known that trick-play GOP structures like IPP or IPPP may be acceptable for the trick-play picture quality and even advantageous at high trick-play speeds. The resulting trick-play bit rate is of the same order as the normal play bit rate. It is also mentioned that these structures may reduce the required sustained bandwidth from the storage device.

Here some aspects related to timing issues and stream construction will be described. A trick-play system 500 is schematically depicted in Fig. 5.

The I-frame selector 502 reads specific I-frames 201 from the storage device 501. Which I-frames 201 are chosen depends on the trick-play speed as will be described below. The retrieved I-frames 201 are used to construct an MPEG-2/DVB compliant trick- play stream that is then sent to the MPEG-2 decoder 504 for decoding and rendering.

The position of the I-frame packets in the trick-play stream cannot be coupled to the relative timing of the original transport stream. In trick-play, the time axis may be compressed or expanded with the speed factor and additionally inversed for reverse trick-play. Therefore, the time stamps of the original time stamped transport stream may not be suitable for trick-play generation.

Moreover, the original PCR time base may be disturbing for trick-play. First of all it is not guaranteed that a PCR will be available within the selected I-frame 201. But even more important is that the frequency of the PCR time base would be changed.

According to the MPEG2 specification, this frequency should be within 30 ppm from 27 MHz. The original PCR time base fulfils this requirement, but if used for trick-play it would be multiplied by the trick-play speed factor. For reverse trick-play this even leads to a time base running in the wrong direction. Therefore, the old PCR time base has to be removed and a new one added to the trick-play stream.
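The effect of reusing the original PCR time base may be illustrated with a small calculation. The sketch below only restates the 27 MHz and 30 ppm figures given above; the function names are chosen for this example.

```python
NOMINAL_HZ = 27_000_000   # nominal PCR time base frequency
TOLERANCE_PPM = 30        # allowed deviation according to the text above


def apparent_pcr_frequency(speed: float) -> float:
    """Frequency the original PCR time base would appear to have in trick-play."""
    return NOMINAL_HZ * speed


def pcr_frequency_ok(freq_hz: float) -> bool:
    """True if freq_hz lies within the tolerance around 27 MHz."""
    max_deviation = NOMINAL_HZ * TOLERANCE_PPM / 1_000_000  # 810 Hz
    return abs(freq_hz - NOMINAL_HZ) <= max_deviation


for speed in (1, 4, -2):
    freq = apparent_pcr_frequency(speed)
    print(speed, freq, pcr_frequency_ok(freq))
```

Only the speed factor 1 stays within the tolerance, and a negative speed factor yields a negative apparent frequency, that is, a time base running backwards.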

Finally, I-frames 201 normally contain two time stamps that tell the decoder 504 when to start decoding the frame (decoding time stamp, DTS) and when to start presenting, for instance displaying, it (presentation time stamp, PTS). Decoding and presentation may be started when the DTS and the PTS, respectively, are equal to the PCR time base, which is reconstructed in the decoder 504 by means of the PCRs in the stream. The distance between, e.g., the PTS values of two I-frames 201 corresponds to their nominal distance in display time. In trick-play this time distance is compressed or expanded with the speed factor. Since a new PCR time base is used in trick-play, and because the distance for DTS and PTS is no longer correct, the original DTS and PTS of the I-frame 201 have to be replaced.

To solve the above-mentioned complications, the I-frame 201 may first be parsed into an elementary stream in the parsing unit 505. Then the empty P-frames 202 are added on elementary stream level. The obtained trick-play GOP is mapped into one PES packet and packetized to transport stream packets. Then corrected tables like PAT, PMT, etc. are added. At this stage, a new PCR time base together with DTS and PTS are included. The transport stream packets are pre-pended with a 4-byte time stamp that is coupled to the PCR time base such that the trick-play stream can be handled by the same output circuitry as used for normal play.

In the following, some aspects related to trick-play speeds will be described. In this context, firstly, fixed trick-play speeds will be discussed. As mentioned before, a trick-play GOP structure like IPP may be used in which the I-frame 201 is followed by two empty P-frames 202. It is assumed that the original GOP has a GOP size 203 of 12 frames and that all the original I-frames 201 are used for trick-play. This means that the I-frames 201 in the normal play stream have a distance of 12 frames and the same I-frames 201 in the trick-play stream a distance of 3 frames. This leads to a trick-play speed of 12/3 = 4x. If the original GOP size 203 in frames is denoted by G, the trick-play GOP size in frames by T and the trick-play speed factor by Nb, the trick-play speed in general is given by:

Nb=G/T (1)

Nb will also be denoted as the basic speed. Higher speeds can be realized by skipping I-frames 201 from the original stream. If every second I-frame 201 is taken, the trick-play speed is doubled, if every third I-frame 201 is taken, the trick-play speed is tripled and so on. In other words, the distance between the used I-frames 201 of the original stream is 2, 3 and so on. This distance is always an integer number. If the distance between the I-frames 201 used for trick-play generation is denoted by D (D=1 meaning that every I-frame 201 is used), then the general trick-play speed factor N is given by:

N=D*G/T (2)

This means that all integer multiples of the basic speed can be realized, leading to an acceptable set of speeds. It should be noticed that D is negative for reverse trick-play and that D=0 results in a still picture. Data can only be read in a forward direction. Therefore, in reverse trick-play, data is read forward and jumps are made backwards to retrieve the preceding I-frame 201 given by D. It should also be noticed that a larger trick-play GOP size T results in a lower basic speed. For instance, IPPP leads to a finer grained set of speeds than IPP.

Referring to Fig. 6, time compression in trick-play will be explained. Fig. 6 shows the situation for T=3 (IPP) and G=12. For D=2, an original display time of 24 frames is compressed into a trick-play display time of 3 frames resulting in N=8. In the given example, the basic speed is an integer but this is not necessarily the case. For G=16 and T=3, the basic speed is 16/3 = 5 1/3 which does not result in a set of integer trick-play speeds. Therefore, the IPPP structure (T=4) is better suited for a GOP size of 16 resulting in a basic speed of 4x. If a single trick-play structure is desired that fits to the most common GOP sizes of 12 and 16, IPPP may be chosen.
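Equations (1) and (2) and the examples above may be checked with a few lines of code. The sketch uses exact fractions so that non-integer basic speeds such as 16/3 are not rounded; the function names are chosen for this illustration.

```python
from fractions import Fraction


def basic_speed(G: int, T: int) -> Fraction:
    """Equation (1): Nb = G/T."""
    return Fraction(G, T)


def trick_play_speed(D: int, G: int, T: int) -> Fraction:
    """Equation (2): N = D*G/T, with D < 0 for reverse and D = 0 for still picture."""
    return D * Fraction(G, T)


print(basic_speed(12, 3))           # IPP with G=12
print(trick_play_speed(2, 12, 3))   # every second I-frame
print(basic_speed(16, 3))           # non-integer basic speed
print(basic_speed(16, 4))           # IPPP with G=16
print(trick_play_speed(-1, 12, 3))  # reverse trick-play at the basic speed
```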

Secondly, arbitrary trick-play speeds will be discussed.

In some cases, the set of trick-play speeds resulting from the method described above is satisfying, in some cases not. In the case of G=16 and T=3 one probably still would prefer integer trick-play speed factors. Even in the case of G=12 and T=4 it might be preferred to have a speed not available in the set like for instance 7x. Now, the trick-play speed formula will be inverted and the distance D will be calculated which is given by:

D=N*T/G (3)

Using the above example with G=12, T=4 and N=7 results in D=2 1/3. Instead of skipping a fixed number of I-frames 201, an adaptive skipping algorithm might be used that chooses the next I-frame 201 based on which I-frame 201 best matches the required speed. To choose the best matching I-frame 201, the next ideal point Ip with the distance D may be calculated and one of the I-frames 201 closest to this ideal point may be chosen to construct a trick-play GOP. In the following step, again the next ideal point may be calculated by increasing the last ideal point by D.

As visualized in Fig. 7 illustrating trick-play with fractional distances, there are particularly three possibilities to choose the I-frame 201 :

A. The I-frame closest to the ideal point; I = round(Ip)

B. The last I-frame before the ideal point; I = int(Ip)

C. The first I-frame after the ideal point; I = int(Ip)+1

As can clearly be seen, the actual distance is varying between int(D) and int(D)+1, the ratio between the occurrences of the two being dependent on the fraction of D, such that the average distance is equal to D. This means that the average trick-play speed is equal to N, but that the actually used frame has a small jitter with respect to the ideal frame.

Several experiments have been performed with this, and although the trick-play speed may vary locally, this is not visually disturbing. Usually, it is not even noticeable especially at somewhat higher trick-play speeds. It is also clear from Fig.7 that it makes no essential difference whether to choose method A, B or C.

With this method, trick-play speed N does not need to be an integer but can be any number above the basic speed Nb. Also speeds below this minimum can be chosen, but then the picture refresh rate may be lowered locally because the effective trick-play GOP size T is doubled or at still lower speeds even tripled or more. This is due to a repetition of the trick-play GOPs, as the algorithm will choose the same I-frame 201 more than once.

Fig. 8 shows an example for D=2/3 which is equivalent to N=2/3 Nb. Here, the round function is used to select the I-frames 201 and as can be seen frames 2 and 4 are selected twice.
Anyway, the described method will allow for a continuously variable trick-play speed. For reverse trick-play a negative value is chosen for N. For the example of Fig. 7 this simply means that the arrows 700 are pointing in the other direction. The method described will also include the sets of fixed trick-play speeds mentioned earlier and they will have the same quality, especially if the round function is used. Therefore, it might be appropriate that the flexible method described in this section should always be implemented whatever the choice of the speeds will be.
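The adaptive selection of I-frames may be sketched as follows. This is an illustrative reading of equation (3) and of methods A, B and C, not a definitive implementation. Two assumptions are made: `round` is taken as rounding half up, and `int` is taken as the floor function (which matters only for negative positions). The function names and the `start` parameter are chosen for this example.

```python
from fractions import Fraction
from math import floor


def distance_for_speed(N, G: int, T: int) -> Fraction:
    """Equation (3): D = N*T/G."""
    return Fraction(N) * T / G


def select_iframes(D, count: int, start: int = 0, method: str = "round"):
    """Return the indices of the I-frames chosen for `count` trick-play GOPs."""
    step = Fraction(D)
    ideal = Fraction(start)  # ideal point Ip, advanced by D each step
    chosen = []
    for _ in range(count):
        if method == "round":     # method A: closest I-frame
            index = floor(ideal + Fraction(1, 2))
        elif method == "before":  # method B: last I-frame before Ip
            index = floor(ideal)
        elif method == "after":   # method C: first I-frame after Ip
            index = floor(ideal) + 1
        else:
            raise ValueError(f"unknown method: {method}")
        chosen.append(index)
        ideal += step
    return chosen


D = distance_for_speed(7, G=12, T=4)                # 7x with IPPP and G=12
print(D, select_iframes(D, 4))                      # distances of 2 and 3
print(select_iframes(Fraction(2, 3), 7, start=1))   # speed below Nb
```

With D=2/3 and frames numbered from 1, the second call selects frames 2 and 4 twice, which corresponds to the repetition of trick-play GOPs described for speeds below the basic speed.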

Now some aspects related to the refresh rate of the trick-play picture will be discussed. The term "refresh rate" particularly denotes the frequency with which new pictures are displayed. Although not speed dependent, it will be briefly discussed here because it can influence the choice of T. If the refresh rate of the original picture is denoted by R (25Hz or 30Hz), the refresh rate of the trick-play picture (Rt) is given by:

Rt=R/T (4)

With a trick-play GOP structure of IPP (T=3) or IPPP (T=4), the refresh rate Rt is 8 1/3 Hz respectively 6 1/4 Hz for Europe and 10 Hz respectively 7 1/2 Hz for the USA. Although the judgment of trick-play picture quality is a somewhat subjective matter, there are clear hints from experiments that these refresh rates are acceptable for low speeds and even advantageous at higher speeds.
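The refresh rates quoted above follow directly from equation (4), as the following short check shows. The function name is chosen for this illustration.

```python
from fractions import Fraction


def refresh_rate(R: int, T: int) -> Fraction:
    """Equation (4): Rt = R/T, in Hz."""
    return Fraction(R, T)


for R in (25, 30):       # original refresh rate in Hz
    for T in (3, 4):     # IPP and IPPP
        print(R, T, float(refresh_rate(R, T)))
```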

In the following, some aspects related to encrypted stream environments will be described.

Here some information about encrypted transport streams is presented as a basis for the description of trick-play on encrypted streams. It is focussed on the Conditional Access System used for broadcast.

Fig. 9 illustrates a conditional access system 900 which will now be described.

In the conditional access system 900, content 901 may be provided to a content encryption unit 902. After having encrypted the content 901, the content encryption unit 902 supplies a content decryption unit 904 with encrypted content 903.

In this specification, ECM denotes Entitlement Control Messages, KMM denotes Key Management Messages, GKM denotes Group Key Messages and EMM denotes Entitlement Management Messages. A Control Word 906 may be supplied to the content encryption unit 902 and to an ECM generation unit 907. The ECM generation unit 907 generates an ECM and provides the same to an ECM decoding unit 908 of a smart card 905. The ECM decoding unit 908 generates from the ECM a Control Word, that is the decryption information that is needed by and provided to the content decryption unit 904 to decrypt the encrypted content 903.

Furthermore, an authorization key 910 is provided to the ECM generation unit 907 and to a KMM generation unit 911, wherein the latter generates a KMM and provides the same to a KMM decoding unit 912 of the smart card 905. The KMM decoding unit 912 provides an output signal to the ECM decoding unit 908.

Moreover, a group key 914 may be provided to the KMM generation unit 911 and to a GKM generation unit 915 which may further be provided with a user key 918. The GKM generation unit 915 generates a GKM and provides the same to a GKM decoding unit 916 of the smart card 905, wherein the GKM decoding unit 916 gets as a further input a user key 917.

Beyond this, entitlements 919 may be provided to an EMM generation unit 920 that generates an EMM signal and provides the same to an EMM decoding unit 921. The EMM decoding unit 921 located in the smart card 905 is coupled with an entitlement list unit 913 which provides the ECM decoding unit 908 with corresponding control information.

In many cases, content providers and service providers want to control access to certain content items through a conditional access (CA) system. To achieve this, the broadcasted content 901 is encrypted under the control of the CA system 900. In the receiver, content is decrypted before decoding and rendering if access is granted by the CA system 900.

The CA system 900 uses a layered hierarchy (see Fig. 9). The CA system 900 transfers the content decryption key (Control Word CW 906, 909) from server to client in the form of an encrypted message, called an ECM. ECMs are encrypted using an authorization key (AK) 910. For security reasons, the CA server 900 may renew the authorization key 910 by issuing a KMM. A KMM is in fact a special type of EMM, but for clarity the term KMM may be used. KMMs are also encrypted using a key that for instance can be a group key (GK) 914, which is renewed by sending a GKM that is again a special type of EMM. GKMs are then encrypted with the user key (UK) 917, 918, which is a fixed unique key embedded in the smart card 905 and known by the CA system 900 of the provider only. Authorization keys and group keys are stored in the smart card 905 of the receiver.

Entitlements 919 (for instance viewing rights) are sent to individual customers in the form of an EMM and stored locally in a secure device (smart card 905). Entitlements 919 are coupled to a specific program. An entitlements list 913 gives access to a group of programs depending on the type of subscription. ECMs are only processed into keys (Control Words) by the smart card 905 if an entitlement 919 is available for the specific program. Entitlement EMMs are subject to an identical layered structure as the KMMs (not depicted in Fig. 9).

In an MPEG2 system, encrypted content, ECMs and EMMs (including the KMM and GKM types) are all multiplexed into a single MPEG2 transport stream. The description above is a generalized view of the CA system 900. In digital video broadcasting, only the encryption algorithm, the odd/even Control Word structure, the global structure of ECMs and EMMs and their referencing are defined. The detailed structure of the CA system 900 and the way the payloads of ECMs and EMMs are encoded and used are provider specific. Also the smart card is provider specific. However, from experience it is known that many providers follow essentially the structure of the generalized view of Fig. 9. In the following, DVB Encryption/Decryption topics will be discussed. The applied encryption and decryption algorithm is defined by the DVB standardization organization. In principle two encryption possibilities are defined namely PES level encryption and TS level encryption. However, in real life mainly the TS level encryption method is used. Encryption and decryption of the transport stream packets is done packet based. This means that the encryption and decryption algorithm is restarted every time a new transport stream packet is received. Therefore, packets can be encrypted or decrypted individually. In the transport stream, encrypted and plaintext packets are mixed because some stream parts are encrypted (e.g. audio/video) and others are not (e.g. tables). Even within one stream part (e.g. video) encrypted and plaintext packets may be mixed.

The stream packet 1000 has a length 1001 of 188 Bytes and comprises three portions. A packet header 1002 has a size 1003 of 4 Bytes. Subsequent to the packet header 1002, an adaptation field 1004 may be included in the stream packet 1000. After that, a DVB encrypted packet payload 1005 may be sent.

Fig. 11 illustrates a detailed structure of the transport stream packet header 1002 of Fig. 10.

The transport stream packet header 1002 comprises a synchronization unit (SYNC) 1010, a transport error indicator (TEI) 1011 which may indicate transport errors in a packet, a payload unit start indicator (PLUSI) 1012 which may particularly indicate a possible start of a PES packet in the subsequent payload 1005, a transport priority unit (TPI) 1017 indicating priority of the transport, a packet identifier (PID) 1013 used for determining the assignment of the packet, a transport scrambling control (SCB) 1014 used to select the CW that is needed for decrypting the transport stream packet, an adaptation field control (AFLD) 1015, and a continuity counter (CC) 1016.

Thus, Fig. 10 and Fig. 11 show the MPEG2 transport stream packet 1000 that has been encrypted and which comprises different parts:

- Packet header 1002 is in plaintext. It serves to obtain important information such as a packet identifier (PID) number, presence of an adaptation field, scrambling control bits, etc.

- Adaptation field 1004 is also in plaintext. It can contain important timing information such as the PCR.

- DVB Encrypted Packet Payload 1005 contains the actual program content that may have been encrypted using the DVB algorithm.

In order to select the correct CW that is needed to decrypt the broadcasted program it is necessary to parse the transport stream packet header. A schematic overview of this header is given in Fig.11. An important field for the decryption of the broadcasted program is the scrambling control bits (SCB) field 1014. This SCB field 1014 indicates which CW the decrypter must use to decrypt the broadcasted program. Moreover, it indicates whether the payload of the packet is encrypted or in plaintext. For every new transport stream packet, this SCB 1014 must be parsed since it changes over time and can change from packet to packet.
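Parsing of the 4-byte packet header may be sketched as follows. The bit layout follows the MPEG2 transport stream packet header (sync byte 0x47, 13-bit PID, 2-bit scrambling control, 2-bit adaptation field control, 4-bit continuity counter). The mapping of the scrambling control values to the even and odd CW reflects the common DVB convention and is stated here as an assumption, since the text above does not list the values. The class and function names are chosen for this illustration.

```python
from dataclasses import dataclass

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


@dataclass(frozen=True)
class TsHeader:
    transport_error: bool          # TEI
    payload_unit_start: bool       # payload unit start indicator
    transport_priority: bool
    pid: int                       # 13-bit packet identifier
    scrambling_control: int        # 2-bit SCB field
    adaptation_field_control: int  # 2 bits
    continuity_counter: int        # 4 bits


def parse_ts_header(packet: bytes) -> TsHeader:
    """Parse the plaintext header of one transport stream packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid 188-byte transport stream packet")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return TsHeader(
        transport_error=bool(b1 & 0x80),
        payload_unit_start=bool(b1 & 0x40),
        transport_priority=bool(b1 & 0x20),
        pid=((b1 & 0x1F) << 8) | b2,
        scrambling_control=(b3 >> 6) & 0x03,
        adaptation_field_control=(b3 >> 4) & 0x03,
        continuity_counter=b3 & 0x0F,
    )


def cw_selection(scb: int) -> str:
    """Assumed DVB convention: 00 = plaintext, 10 = even CW, 11 = odd CW."""
    return {0b00: "plaintext", 0b10: "even", 0b11: "odd"}.get(scb, "reserved")


packet = bytes([0x47, 0x41, 0x00, 0x97]) + bytes(184)
header = parse_ts_header(packet)
print(header.pid, cw_selection(header.scrambling_control))
```

Because the header is in plaintext, this check can be made for every packet before the payload is passed to the decrypter.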

In the following, some aspects related to trick-play on fully encrypted streams will be described.

The first reason why this is an interesting topic is that trick-play on plaintext and fully encrypted streams are the two extremes of a range of possibilities. Another reason is that there exist applications in which it may be necessary to record fully encrypted streams. Thus, it would be useful to have a technique at hand to perform trick-play on a fully encrypted stream. A basic principle is to read a large enough block of data from the storage device, decrypt it, select an I-frame in the block and construct a trick-play stream with it. Such a system 1200 is depicted in Fig. 12.

Fig. 12 shows the basic principle of trick-play on a fully encrypted stream. For this purpose, data stored on a hard disk 1201 are provided as a transport stream 1202 to a decrypter 1203. Further, the hard disk 1201 provides a smart card 1204 with an ECM, wherein the smart card 1204 generates Control Words from this ECM and sends the same to the decrypter 1203.

Using the Control Words, the decrypter 1203 decrypts the encrypted transport stream 1202 and sends the decrypted data to an I-frame detector and filter 1205. From there, the data are provided to an insert empty P frame unit 1206 which conveys the data to a set top box 1207. From there, data are provided to a television 1208.

Some aspects will be mentioned with respect to the question of what a recording contains.

Making a recording of a single channel, the recording must contain all the data required to playback the recording of the channel at a later stage. One can resort to just record everything on a certain transponder, but this way one would record far more than one needs to playback the program intended to record. This means that both bandwidth and storage space would be wasted. So instead of this, only the packets really needed should be recorded. For each program this means one must record all the MPEG2 mandatory packets like PAT (program association table), CAT (conditional access table), and obviously for each program the video and audio packets as well as the PMT (program map table) that describes which packets belong to a program. Furthermore, the CAT/PMT may describe CA packets (ECMs) needed for decryption of the stream. Unless the recording is made in plaintext after decryption, those ECM packets have to be recorded as well. If the recording made does not consist of all packets from the full multiplex, the recording becomes a so-called partial transport stream 1300 (see Fig. 13). Further, Fig. 13 illustrates a full transport stream 1301. The DVB standard requires that if a partial transport stream 1300 is played, all normal DVB mandatory tables like NIT (network information table), BAT (bouquet association table) etc. are removed. Instead of these tables, the partial stream should have SIT (selection information table) and DIT (discontinuity information table) tables inserted.

In the following, some aspects related to dealing with ECMs will be described. Jumping to the next block during trick-play can mean jumping back in the stream. It will be explained that this may not be only the case for trick-play reverse but also for trick-play forward at moderate speeds. The situation for forward trick-play with forward jumps and for reverse trick-play with inherently backward jumps will be explained afterwards.

Specific problems may occur caused by the fact that data has to be decrypted. A conditional access system may be designed for transmission. In normal play, the transmitted stream may be reconstructed with original timings. But trick-play may have severe implications for the handling of cryptographic metadata due to changed timings. The data may be compressed or expanded in time due to trick-play, but the latency of the smart card may remain constant. To create a trick-play stream, the mentioned data blocks may go through a decrypter. This decrypter needs the Control Words used in the encryption process to decrypt the data blocks. These Control Words may also be encrypted and stored in ECMs. In a normal set-top-box (STB), these ECMs may be part of the program tuned to. A conditional access module may extract the ECMs, send them to a smart card, and, if the card has rights or an authorization to decrypt these ECMs, may receive the decrypted Control Words from it. Control Words usually have a relatively short lifetime of, for instance, approximately 10 seconds. This lifetime may be indicated by the Scrambling Control Bit, SCB 1014, in the transport stream packet headers. If it changes, the next Control Word has to be used. This SCB change or toggle is indicated in Fig. 14 by a vertical line and with a reference numeral 1402.

Referring to Fig. 14, particularly two different scenarios or stream types may be distinguished:

According to a stream type I shown in a lower row 1401 in Fig. 14, two Control Words (CWs) are provided per ECM.

According to a stream type II shown in an upper row 1400 in Fig. 14, only one Control Word (CW) is provided per ECM.

Fig. 14 illustrates the two data streams 1400, 1401 comprising subsequently arranged periods or segments A, B, C denoted with reference numeral 1403. In the scenario illustrated in the upper row 1400 of Fig. 14, essentially one Control Word per corresponding ECM is provided. In contrast to this, in the lower row 1401, each ECM comprises two Control Words, namely the Control Word relating to the current period or ECM, and additionally the Control Word of the subsequent period or ECM. Thus, there is some redundancy concerning the provision of the Control Words.

During the short lifespan, items of the decryption information may be transmitted several times, so that tuning to such a channel halfway through the lifespan of such a Control Word does not mean waiting for the next Control Word. The conditional access module may only send the first unique ECM it finds to the smart card to reduce or minimize the traffic to the card, as it may have a fairly slow processor.
This shows that there may be a limitation of trick-play on encrypted streams. There may be an implicit upper speed limit, coming from the limited speed of the processing capability of the smart card. In trick-play, the Control Word lifetime of 10 seconds may be compressed or expanded with the trick-play speed factor. Sending an ECM to a smart card and receiving the decrypted Control Words may take approximately half a second. The way Control Words are packed into an ECM may be provider-specific and particularly different for stream type I and stream type II, as depicted in Fig. 14.
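The order of magnitude of this speed limit may be estimated from the two figures given above. The sketch below is a rough illustration under a simplifying assumption that is not stated in the text: exactly one ECM has to be processed by the smart card per Control Word period and nothing else delays the decryption. The function names are chosen for this example.

```python
def compressed_lifetime(cw_lifetime_s: float, speed: float) -> float:
    """Duration of one Control Word period as seen in the trick-play stream."""
    if speed == 0:
        raise ValueError("still picture: the period never ends")
    return cw_lifetime_s / abs(speed)


def max_speed(cw_lifetime_s: float, card_latency_s: float) -> float:
    """Speed at which a compressed period becomes as short as the card latency."""
    return cw_lifetime_s / card_latency_s


print(compressed_lifetime(10.0, 4))  # period length at 4x
print(max_speed(10.0, 0.5))          # rough bound under the stated assumption
```

Above this bound a Control Word period would pass faster than the smart card can deliver the next Control Word.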

CW A denotes the CW that was used to encrypt period A, CW B denotes the CW that was used to encrypt period B, and so on. Horizontally, the transmission time axis is plotted. ECM A may be defined as being the ECM that is present during the major part of period A. It can be seen that, in that case, ECM A holds the CW for the current period A and for stream type I additionally for the next period B. In general, an ECM may hold at least the CW for the current period and might hold the CW for the next period. Due to zapping, this may probably be true for all or many providers.

Before going on, more information will be provided about a decrypter and how it may handle the CWs. The decrypter may contain two registers, one for the "odd" and one for the "even" CW. "Odd" and "even" does not have to mean that the values of the CWs themselves are odd or even. The terms are particularly used to distinguish between two subsequent CWs in the stream. Which CW has to be used for the decryption of a packet is indicated by the SCB 1014 in the packet header. So the CWs used to encrypt the stream are alternating between odd and even. In Fig. 14 this means that, for instance, CW A and CW C are odd, whereas CW B and CW D are even. After the decryption by the smart card, CWs may be written to the corresponding registers in the decrypter overwriting previous values, as indicated in Fig. 15.

Fig. 15 illustrates the two registers 1501, 1502 containing even CWs (register 1501) and containing odd CWs (register 1502). Further, smart card latency 1500, that is a time needed by the smart card to retrieve or decrypt a CW from an ECM, is illustrated in Fig. 15.

In the case of stream type I, each ECM holds two CWs and as a result both registers 1501, 1502 may be overwritten after the decryption of the ECM. One of the registers 1501, 1502 is active and the other is inactive. Which one is active depends on the SCB 1014. In the example, the SCB 1014 will indicate during period B that the even register 1501 is the active one. The active register may only be overwritten with a CW identical to the one it already holds because it is still needed for decryption of the remainder of that particular period. Therefore, only the inactive register may be overwritten with a new value.
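The register rule described above may be modelled as follows. This is a minimal sketch, not an implementation of any real decrypter; the class and method names are hypothetical, and CWs are represented by simple labels.

```python
class Decrypter:
    """Toy model of the even and odd Control Word registers."""

    def __init__(self):
        self.registers = {"even": None, "odd": None}
        self.active = None  # parity currently selected by the SCB

    def set_active(self, parity):
        """Select the register indicated by the SCB of the current packets."""
        self.active = parity

    def load(self, parity, cw):
        """Write a CW delivered by the smart card.

        Returns False when the write is refused because it would replace
        the key of the active register with a different value.
        """
        current = self.registers[parity]
        if parity == self.active and current is not None and current != cw:
            return False
        self.registers[parity] = cw
        return True


decrypter = Decrypter()
decrypter.load("odd", "CW A")
decrypter.load("even", "CW B")
decrypter.set_active("even")                  # period B is being decrypted
same_ok = decrypter.load("even", "CW B")      # identical value is harmless
new_refused = decrypter.load("even", "CW D")  # would break period B
next_ok = decrypter.load("odd", "CW C")       # inactive register, new value
```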

Period B in trick-play will now be considered in more detail. It is assumed that an ECM is sent to the smart card at the start of this period, that is, at the moment the SCB toggle 1402 is crossed. The question is which ECM could then be sent to the smart card.

This ECM should hold CW C to ensure a timely decryption by the smart card for usage at the start of period C.

It may also hold CW B without disturbing the correct availability of CWs in the decrypter. Looking again at Fig. 14, it can be seen that for stream type I this means sending ECM B and for stream type II ECM C at the start of period B. In general, the current ECM can be sent in case it holds two CWs, and one period in advance if it holds only one CW. Sending an ECM one period in advance may be contradictory though to the embedded ECMs, so the latter have to be removed from the stream in that case. For a more generalized approach it may be preferred that the original ECMs are always removed from the stream by the trick-play generation circuitry or software. However, this cannot always be true.
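The selection rule stated above may be sketched as follows for forward playback. Periods are numbered consecutively (A = 0, B = 1, C = 2, and so on) and ECM n denotes the ECM present during the major part of period n; this numbering and the function name are assumptions made for illustration.

```python
def ecm_to_send(current_period: int, cws_per_ecm: int) -> int:
    """Number of the ECM to send when the SCB toggle into current_period is crossed."""
    if cws_per_ecm == 2:
        # The ECM holds the CW of its own period and of the next one.
        return current_period
    if cws_per_ecm == 1:
        # The ECM holds only its own CW, so it is sent one period in advance.
        return current_period + 1
    raise ValueError("an ECM is assumed to hold one or two CWs")


PERIOD_B = 1
stream_type_1 = ecm_to_send(PERIOD_B, cws_per_ecm=2)  # ECM B
stream_type_2 = ecm_to_send(PERIOD_B, cws_per_ecm=1)  # ECM C
```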

Fig. 16 shows ECM handling in a fast forward mode.

In a plurality of subsequent periods 1403 separated by SCB toggles 1402, a plurality of data blocks 1600 are reproduced, wherein a switching 1601 occurs between different data blocks.

For stream type I, an ECM B is sent at a border between periods A and B. For stream type II, an ECM C is sent at a border between period A and period B. Furthermore, according to stream type I, an ECM C is sent at a border between period B and period C. For a stream type II, an ECM D is sent at a border between period B and period C. For ECMs to be available for trick-play at the correct moment, the ECMs may be stored in a separate file. In this file it may also be indicated to which period an ECM belongs (which part of the recorded stream). The packets in the MPEG stream file may be numbered. The number of the first packet of a period (SCB toggle 1402) may be stored alongside with the ECM for this same period 1403. The ECM file may be generated during recording of the stream.

The ECM file is a file that may be created during the recording. In the stream, ECM packets may be located which may contain the Control Words needed to decrypt the video data. Every ECM may be used for a certain period, for instance 10 seconds, and may be transmitted (repeated) several times during this period (for instance 100 times). The ECM file may contain every first new ECM of such a period. The ECM data may be written into this file, and may be accompanied by some metadata. First of all, a serial number (counting up from 1) may be given. As a second field, the ECM file may contain the position of the SCB toggle. This may denote the first packet that can use this ECM to correctly decrypt its content. Then the position in time of this SCB toggle may follow as the third field. These three fields may be followed by the ECM packet data itself.
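One possible in-memory representation of such an ECM file entry is sketched below, together with a lookup that finds the period of a given packet number. The field names, the helper functions and the use of a sorted list are assumptions for illustration; the on-disk layout is not specified here.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class EcmEntry:
    serial: int           # serial number, counting up from 1
    toggle_packet: int    # number of the first packet that can use this ECM
    toggle_time_s: float  # position in time of that SCB toggle
    ecm_data: bytes       # the ECM packet data itself


def period_index(entries, packet_number):
    """Index of the entry whose period contains packet_number.

    Returns -1 if the packet lies before the first stored toggle.
    The entries are assumed to be sorted by toggle_packet.
    """
    positions = [entry.toggle_packet for entry in entries]
    return bisect_right(positions, packet_number) - 1


def toggle_crossed(entries, from_packet, to_packet):
    """True if a jump between the two packets crosses at least one SCB toggle."""
    return period_index(entries, from_packet) != period_index(entries, to_packet)


ecm_file = [
    EcmEntry(1, 0, 0.0, b"ecm-a"),
    EcmEntry(2, 1000, 10.0, b"ecm-b"),
    EcmEntry(3, 2000, 20.0, b"ecm-c"),
]
```

With the packet numbers stored per entry, a jump from packet 500 to packet 2500 can be recognized as crossing a toggle without inspecting the skipped stream data.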

Using the SCB toggles stored in the ECM file, it may be easy to detect if such toggle is crossed even if this would be during a jump. To send the correct ECM, it may be required to know whether the ECMs contain one or two CWs. In principle, this is not known because it is provider-specific and secret. However, this can easily be determined experimentally by sending ECMs at various moments and observing the results on the display. An alternative method that is particularly suitable for implementation in the storage device itself is as follows. Send one single ECM to the smart card at the moment of an SCB toggle, decrypt the stream and check for PES headers in the coming two periods. With one PES header per GOP, there are around twenty PES headers in each period. The position of a PES header may be easily detected because a PLUSI bit in the plaintext header of the packet may indicate its presence. If correct PES headers are only found during the first period (after the latency of the smartcard), the ECM contains one CW. If they are also found during the second period, it contains two CWs. Such a situation is depicted in Fig. 17.

Fig. 17 illustrates a situation for one CW detection and for two CW detection. As can be seen, different periods 1403 of encrypted content 1700 are provided. With a smartcard latency 1500, an ECM A may be decrypted to generate corresponding CWs. By decrypting the encrypted content 1700, decrypted content 1701 may be generated. Further shown in Fig. 17 are PES headers 1702, namely a PES header A in period A (left) and a PES header B in period B (right).
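The decision made from the PES header check may be sketched as follows. The sketch relies on the fact that a PES packet begins with the start code prefix 0x000001, so a payload decrypted with the wrong CW will almost certainly not show this prefix. The function names are hypothetical, and the payloads are assumed to be taken from packets flagged as carrying a PES header.

```python
PES_START_CODE_PREFIX = b"\x00\x00\x01"


def looks_like_pes_header(decrypted_payload: bytes) -> bool:
    """True if the payload starts with the PES start code prefix."""
    return decrypted_payload[:3] == PES_START_CODE_PREFIX


def cws_in_ecm(first_period_payload: bytes, second_period_payload: bytes) -> int:
    """Number of CWs in the single ECM that was sent at the SCB toggle."""
    if not looks_like_pes_header(first_period_payload):
        raise ValueError("no valid PES header in the first period; check inconclusive")
    return 2 if looks_like_pes_header(second_period_payload) else 1


good = b"\x00\x00\x01\xe0" + bytes(8)  # plausible video PES header start
scrambled = bytes([0x5A, 0x13, 0xC7, 0x99]) + bytes(8)
```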

The area 1703 of period B for one CW in Fig. 17 indicates that the data is decrypted with the wrong key and therefore scrambled. This checking could be done while recording, in which case it will take for instance 20 to 30 seconds. It could also be done off-line and, because only two packets indicated by the PLUSIs (one in each period) would have to be checked, it could be very quick. In the unlikely event that adequate PES headers are not available, the picture headers could be used instead. In fact, any known information may be useable for detection. Anyway, a one/two CW indication may be stored in the ECM file.

In the following, some aspects related to dealing with slow-forward streams in particular will be described.

Next, trick-play GOP based slow-forward, still picture and step mode will be explained. Slow-forward, which may also be denoted as slow motion forward, is a mode in which the display picture runs at a lower than normal speed. One form of slow-forward is already possible with the technique explained above referring to Fig. 7 and Fig. 8. Setting the fast-forward speed to a value between zero and one results in a slow-forward stream based on a repetition of fast-forward trick-play GOPs. For a plaintext stream, this is a proper solution, but for an encrypted stream it may lead to the erroneous decryption of a part of the I-frame in certain specific conditions. One option to solve this problem is not to repeat the fast-forward trick-play GOP but to extend the size of the trick-play GOP by the addition of empty P-frames. This technique in fact may also enable slow-reverse, because it is based on the trick-play GOPs used for fast-forward/reverse and therefore on the independently decodable I-frames.

Such an I-frame based slow-forward or slow-reverse may be inappropriate in special cases for the following reason. The distance between I-frames in normal play is around half a second and for slow-forward/reverse it is multiplied with the slow motion factor. So this type of slow-forward or slow-reverse is not exactly what is usually understood as slow motion but in fact more like a slide show with a large temporal distance between the successive pictures.

In a still picture mode, the display picture may be halted. This can be achieved by adding empty P-frames to the I-frame for the duration of the still picture mode. This means that the picture resulting from the last I-frame is halted. When switching from normal play to still picture, this can also be the nearest I-frame according to the data in the CPI file. This technique is an extension of the fast-forward/reverse modes and results in nice still pictures especially if interlace kill is used. However, the positional accuracy is not always satisfactory when switching from normal play or slow-forward/reverse to still picture.

The still picture mode can be extended to implement a step mode. The step command advances the stream to some next or previous I-frame. The step size is at minimum one GOP but can also be set to a higher value equal to an integer number of GOPs. Step forward and step backward are both possible in this case because only I-frames are used.

For the construction of a slow-forward stream many considerations apply. For example, the construction of a slow-forward stream on elementary stream level can only be performed on fully plaintext data. As a consequence, the slow-forward stream will be fully plaintext, even if the normal play stream was originally encrypted. Such a situation may be unacceptable to a copyright holder. Furthermore, this is worse than in the case of a fast-forward/reverse stream because all information, i.e. each and every frame, is present in plaintext in the slow-forward stream and not just a subset of the frames as is the case for true fast-forward/reverse streams. Therefore a plaintext normal play stream can easily be reconstructed from a plaintext slow-forward stream. So the slow-forward stream should be encrypted if the normal play stream is encrypted. Since a DVB encryptor is not permissible in a consumer device, this can only be realized if the slow-forward stream is constructed on transport stream level using the encrypted data packets from the originally transmitted encrypted data stream.

In the following, referring to Fig. 18 to Fig. 58, systems will be described which are capable of processing a data stream in a system according to exemplary embodiments of the invention. It is emphasized that the systems described in the following can be implemented in the frame of and in combination with any of the systems described referring to Fig. 1 to Fig. 17.

In the following, referring to Fig. 18, a data processing device 1800 for processing an MPEG2 data stream including (for instance video) content according to an exemplary embodiment of the invention will be described.

The data processing device 1800 comprises a hard disk 1801 on which video content is stored. Such video content may be transferred from the hard disk 1801 to a processing unit 1802 which may be capable of generating a trick-play stream from the normal play data read from the hard disk 1801. In another operation mode, the processing unit 1802 is adapted to generate a normal play stream, as selected by a user operating a user input/output unit 1804.

When the user adjusts a reproduction mode of the device 1800, a corresponding command may be supplied to a control unit 1805, which may communicate with the hard disk 1801, with the processing unit 1802 and with a timing unit 1803. The timing unit 1803 may calculate corrected or updated timing information when there is a switch between different reproduction modes. The output of the processing unit 1802 may be supplied to an input of the timing unit 1803. An output of the timing unit 1803 may be coupled to a reproduction unit 1806 (a loudspeaker and/or an optical display) which may play back the processed data in accordance with the selected operation mode, for instance in a normal play mode or in a trick-play mode. The reproduction unit 1806 is also communicatively coupled to the control unit 1805.

Thus, the device 1800 is adapted for processing a video input stream comprising a sequence of input frames, wherein the processing unit 1802 generates an output data stream as a trick-play stream when a user operating the user input/output unit 1804 has selected a trick-play mode. The trick-play stream may comprise a sequence of output frames based on the input data stream and based on a trick-play factor of, for instance, three when a slow-forward reproduction is desired.

The timing unit 1803 may assign timing information to the output frames, the timing information pointing from an output frame to be reproduced for the first time to a subsequent one of the output frames which is reproduced or to be reproduced for the first time. Namely, the reproduction of the trick-play mode may include playing back one and the same frame (for instance a B-frame) several times, or playing back frames followed by empty frames so as to generate the perception of a trick-play. Therefore, the timing information assigned to the trick-play stream by the timing unit 1803 may differ and may be independent from the original timing information which may be stored in the hard disk 1801. Thus, no time and resource consuming recalculation of timing references is necessary. It is easy to simply calculate new time pointers to take into account new frame conditions in the trick-play reproduction mode. The system 1800 may be adapted for processing an MPEG2 stream so that the timing unit 1803 may assign Decoding Time Stamps (DTS) indicating a time of decoding portions of the data stream, and/or may insert PCR packets at appropriate positions in the output data stream. Thus, the reproduction may include anchor frames (like I-frames and P-frames) which are repeated once and are followed by empty frames, and may include B-frames which are simply repeated a number of times, for instance three times when the trick-play factor is three.

Exceptionally large trick-play factors and exceptionally large B-frames may be treated separately.

In the following, further details concerning the slow-forward trick-play reproduction according to exemplary embodiments of the invention will be explained.

Next, splitting of the stream into separate frames will be explained.

To be able to construct a slow-forward stream on transport level it is advantageous that each individual frame is available as a series of transport stream packets. In case of one PES packet per frame this comes naturally. A PES packet is contained in a series of transport stream packets because PES and transport stream packets are aligned. In the case of one PES packet per GOP this is only the case for the start of the I-frame. All other frame boundaries are mostly located somewhere inside a packet. This packet contains information from the two frames. So first this packet may be split up into two packets, the first one containing the data from the first frame and the second one containing the data from the next frame. Each of the two packets resulting from the splitting may be stuffed with an Adaptation Field (AF).

This situation is indicated in Fig. 19.
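The splitting of a plaintext packet at a frame boundary may be sketched as follows. The sketch assumes a 188 byte transport stream packet that carries payload only (no Adaptation Field yet), clears the payload unit start indicator of the second packet and gives it the next continuity counter value; these choices, and the function names, are illustrative assumptions. The continuity counters of later packets of the same PID would have to be renumbered accordingly, which is not shown.

```python
TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 4


def _stuffed_packet(header: bytes, payload: bytes) -> bytes:
    """Header, then an Adaptation Field used only for stuffing, then payload."""
    af_length = TS_PACKET_SIZE - TS_HEADER_SIZE - 1 - len(payload)
    if af_length < 0:
        raise ValueError("payload leaves no room for an Adaptation Field")
    if af_length == 0:
        adaptation_field = bytes([0])  # length byte only
    else:
        # length byte, flags byte with all flags zero, 0xFF stuffing bytes
        adaptation_field = bytes([af_length, 0x00]) + b"\xff" * (af_length - 1)
    return header + adaptation_field + payload


def split_ts_packet(packet: bytes, split_offset: int):
    """Split the payload at split_offset into two full-size packets."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != 0x47:
        raise ValueError("not a transport stream packet")
    if (packet[3] >> 4) & 0x3 != 0b01:
        raise ValueError("this sketch handles payload-only packets")
    payload = packet[TS_HEADER_SIZE:]
    if not 0 < split_offset < len(payload):
        raise ValueError("split point must lie inside the payload")
    scrambling_bits = packet[3] & 0xC0
    counter = packet[3] & 0x0F
    af_and_payload = 0x30  # adaptation_field_control = '11'
    first_header = packet[:3] + bytes([scrambling_bits | af_and_payload | counter])
    second_header = bytes([
        packet[0],
        packet[1] & 0xBF,  # clear the payload unit start indicator
        packet[2],
        scrambling_bits | af_and_payload | ((counter + 1) & 0x0F),
    ])
    return (_stuffed_packet(first_header, payload[:split_offset]),
            _stuffed_packet(second_header, payload[split_offset:]))


original = bytes([0x47, 0x00, 0x64, 0x15]) + bytes(range(184))
first, second = split_ts_packet(original, 100)
```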

Fig. 19 shows splitting of the packet at a frame boundary. Particularly, Fig. 19 illustrates a plurality of TS packets 1900 each comprising a header 1901 and a frame portion 1902. As can be taken from a central portion of the data stream shown in Fig. 19, a packet comprising a header 1901 and two subsequent frames 1902 is split up into two separate portions each having a separate header 1901 followed by an Adaptation Field 1903 and followed by the corresponding frame 1902.

The splitting of packets is not difficult for a plaintext stream. A first option is to fully decrypt the normal play data as depicted in Fig. 20. Fig. 20 shows a slow-forward construction after decryption of normal play data. Encrypted normal play data 2000 from a harddisk 2001 are supplied to a decrypter 2002 generating a plaintext stream 2003. The plaintext stream 2003 is supplied to a frame splitting unit 2004 for splitting the different frames in a manner as shown in Fig. 19. Then, this data is supplied to a slow-forward construction unit 2005 constructing a slow-forward stream, which is then supplied to a set top box 2006.

The decryption and slow-forward mode of a stored fully encrypted stream 2000 or a stored hybrid stream is not difficult because no stream data is skipped or duplicated in the stream by the decrypter 2002. The stored stream 2000 (fully encrypted or hybrid) is simply fed at a lower than normal rate through the decrypter 2002 which also means that there are no problems with embedded ECMs (Entitlement Control Messages). The plaintext stream 2003 coming from the decrypter unit 2002 can then be used to split the packets or in fact to perform any necessary stream manipulation in the frame splitting unit 2004. The resulting slow-forward stream is a plaintext stream in this case.

The construction of an encrypted slow-forward stream from an encrypted normal play stream is performed on transport level because the use of a DVB (Digital Video Broadcasting) encryptor in a consumer device may not be allowed in special cases. For this, a hybrid stream (see Fig. 21) with only a few plaintext packets 2100 and 2102 on all frame boundaries is needed. Fig. 21 furthermore shows encrypted packets 2101 which belong to the I-frames 2103, B-frames 2104 or P-frames 2105.

Below, it will be described how such a stream could be generated on the playback side of the storage device if the stored stream is fully encrypted. In this case, the decrypter unit 2002 in Fig. 20 may be a selective type that only decrypts the necessary packets. But preferably the stream is already stored as a hybrid stream as indicated in Fig. 22.

Fig. 22 illustrates slow-forward construction on a stored hybrid stream 2200. In the array shown in Fig. 22, no decryption unit 2002 is foreseen between the harddisk 2001 and the frame splitting unit 2004. However, a decrypter unit 2201 may then be foreseen in the set top box 2006.

The plaintext packets 2100, 2102 in the hybrid stream should now also allow for the splitting of packets containing data from the two frames. This may be guaranteed by a criterion which will be described below in more detail. However, some part of the sequence header code or picture start code can still be located in an encrypted packet. In this case, an ideal splitting is not easily possible. In fact the split may be made between the encrypted and plaintext packets. Solutions for these problems will be described below in more detail. In that situation only empty P-frames are concatenated to an I-frame and vice versa. For a frame based slow-forward, also other types of concatenation may be considered, among which is the concatenation of B-frames to B-frames. This may result in some kind of gluing algorithm at these frame boundaries as will be clarified referring to Fig. 23.

Fig. 23 illustrates a data stream in which a previous frame 2300, a current frame 2312 and a next frame 2301 are shown. At the end of the previous frame 2300, three bytes of picture start code 2302 are provided. Furthermore, at the beginning of the current frame 2312 one byte of picture start code 2303 is foreseen. Coming now to the next frame 2301, the frame end of the packet before comprises one byte of picture start code 2304. At the beginning of the next frame 2301, three bytes of picture start code 2305 are provided. Fig. 23 shows that an incomplete picture start code may be present at the concatenation point. This may make a gluing necessary at a connection region 2306. Thus, gluing should be performed between the B-frame 2307 and a repetition of the B-frame 2308. Fig. 23 particularly illustrates a packet header 2309, plaintext data 2310 and encrypted data 2311. In the example of Fig. 23, there is only one byte of the picture start code at the start and the end of the B-frame. As a result, two bytes are missing at the concatenation point. The gluing algorithm, which will be described below in more detail, may heal such a problem. For this gluing it should be known how the picture start code is split. This information may be obtained with a method that will be described below in more detail.

In the following, repetition of the frames will be described in more detail. In a slow-forward mode, the decoder has somehow to be forced to repeat the display of a picture in accordance with the slow-forward factor. Empty P-frames may be used to force the repetition of a picture resulting from an I-frame. This technique can also be applied for pictures resulting from P-frames. However, this technique cannot be easily applied for B-frames because empty P-frames always point to an anchor frame being an I-frame or a P-frame. This is in fact the case for any type of empty frame. So the repetition of a picture resulting from a B-frame has to be realized in another way. A possible method is to repeat the B-frame data itself. Since the repeated B-frames point to the same anchor frames as the original B-frame, the resulting pictures will be identical. The amount of data for a B-frame is usually much more than for an empty P-frame but in general it is still significantly less than for an I-frame. Anyway, the transmission time is also multiplied with the slow-motion factor so there need not be an increasing bit rate, at least on average.

The empty frames used to force the repetition of pictures resulting from an I-frame or a P-frame can be of the interlace kill type, thus reducing interlace artefacts for these pictures. But such a reduction is not easily possible for pictures resulting from the B-frames because the repetition is not forced by an empty frame but by the repetition of the B-frame data itself. So the B-frames will have the original interlace effects. If interlace kill would be used for the I-frames and P-frames this might look very awkward because pictures with and without interlace effects are sequentially present in the stream of displayed pictures. It is presently believed that it might be better to only use empty frames without interlace kill to construct the slow-forward stream. The repetition of the I- and P-frames may be enforced by the insertion in the transmission stream of empty P-frames after the original I-frame or P-frame. Such a method may be used for the fast forward/reverse stream comprising I-frames followed by empty P-frames. However, this method may be not absolutely correct for a stream that also includes B-frames, as is the case for a slow-forward stream constructed from a stored transmission stream with B-frames. Due to the reordering from transmission data to display stream, the I-frames and P-frames will be repeated in the wrong position, thus disturbing the normal display order of the frames. This is illustrated in Fig. 24 and Fig. 25.

Fig. 24 illustrates the effect of reordering in normal play, showing a transmission order 2400 and a display order 2401. The top line shows a normal play transmission stream 2400 with a GOP size of 12 frames comprising I-frames 2103, P-frames 2105 and B-frames 2104. The first four frames of the next transmission GOP are also shown for clarity. The bottom line of Fig. 24 shows the stream 2401 after reordering to the display order. The index indicates the display frame order. According to the MPEG2 standard ISO/IEC 13818-2: 1995(E) (see in particular pages 24 and 25), the reordering may be performed as follows:

- B-frames keep their original position;

- Anchor frames (that is I-frames and P-frames) are shifted to the position of the next anchor frame.

Fig. 25 shows the effect of reordering in slow-forward mode. Particularly, Fig. 25 illustrates the transmission order 2500, an order after the reordering 2501 and an order of the displayed pictures 2502. Looking at the slow-forward stream constructed from the normal play stream in more detail, the top line of Fig. 25 shows the transmission order 2500 of the first part of the slow-motion stream for this case, assuming a slow-motion factor of three. Empty P-frames may be inserted after the I-frames and the P-frames, and the B-frames may be repeated. The middle line of Fig. 25 shows the effect of the reordering. The bottom line of Fig. 25 shows how the I-frames and the P-frames are repeated by the empty P-frames in this case. An empty P-frame may result in a display picture that is a copy of the picture resulting from the previous anchor frame, which itself could also be an empty P-frame. It is visible in Fig. 25 that the normal display order 2502 indicated by the index is disturbed because the display of frame 14 is split up into two parts. Only the last time frame 14 is displayed in the correct position. This also means that the B-frames may be decoded erroneously.

In the following, several options will be described how to correct such deficiencies. One possibility is shown in Fig. 26. Fig. 26 shows the insertion of empty P-frames before the anchor frames. The three rows in Fig. 26 are similar to the three lines of Fig. 25. In Fig. 26, the empty P-frames are inserted before the anchor frames in the transmitted stream extracted from the storage device as is shown in the top line 2500. In the reordered stream 2501, the empty P-frames are now positioned after the anchor frames. This is where they should be for a correct repetition of the anchor frames as is clear from the display pictures 2502 of Fig. 26.
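The effect of the two insertion positions may be reproduced with a small simulation of the reordering rule (B-frames keep their position, anchor frames are shifted to the position of the next anchor frame). The sketch uses a shortened GOP of six frames and a slow-motion factor of three; the frame representation and the function names are assumptions for illustration, and an empty P-frame is modelled as a copy of the picture of the previous anchor frame.

```python
def build_slow_forward(gop, factor, empty_before_anchor):
    """gop: list of (type, display_index) in transmission order."""
    stream = []
    for frame in gop:
        if frame[0] == "B":
            stream.extend([frame] * factor)  # B-frame data is repeated
            continue
        empties = [("Pe", None)] * (factor - 1)
        if empty_before_anchor:
            stream.extend(empties + [frame])
        else:
            stream.extend([frame] + empties)
    return stream


def displayed_pictures(transmission):
    """Display indices of the pictures shown after decoder reordering."""
    shown = []
    held = None         # anchor picture waiting for the next anchor frame
    last_anchor = None  # picture that an empty P-frame would copy
    for frame_type, index in transmission:
        if frame_type == "B":
            shown.append(index)
            continue
        picture = last_anchor if frame_type == "Pe" else index
        if held is not None:
            shown.append(held)
        held = picture
        last_anchor = picture
    if held is not None:
        shown.append(held)
    return shown


gop = [("I", 2), ("B", 0), ("B", 1), ("P", 5), ("B", 3), ("B", 4)]
after = displayed_pictures(build_slow_forward(gop, 3, empty_before_anchor=False))
before = displayed_pictures(build_slow_forward(gop, 3, empty_before_anchor=True))
```

In this simulation, inserting the empty P-frames after the anchor frames shows picture 2 twice before pictures 0 and 1, whereas inserting them before the anchor frames yields a non-decreasing sequence of display indices.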

However, there are arguments why it may be appropriate to avoid empty P-frames. One is related to the propagation of errors within a GOP. P-frames depend on the previous anchor frame and B-frames depend on the surrounding anchor frames. A data error during the transfer to the set top box results in decoding errors and therefore disturbances in the picture. If this error is in an anchor frame it propagates until the end of the GOP because subsequent P-frames depend on this anchor frame. Also the B-frames are affected because they use the pictures from the disturbed surrounding anchor frames for the decoding. This may have the consequence that the picture disturbances gradually increase towards the end of the GOP. This may be especially important for slow-forward where the GOP size can be very large and therefore very long in time. On the other hand, a data error in a B-frame has only a very limited effect because no other frames depend on it. So the picture disturbances are restrained to this B-frame and its repetitions. One might argue that data errors should not occur on a digital interface but there may be a second advantage in preventing the use of empty P-frames. If these are of the interlace kill type they change the decoded picture by nature, resulting in decoding errors for the subsequent frames. So interlace kill may be not possible.

Referring to the construction of empty frames, several types of empty B-frames can be constructed. They may have the advantage that no additional error propagation is introduced and that interlace kill can be used.

Possible types of empty B-frames are the forward predictive empty B-frames (which may be denoted as Bf frames) and backward predictive empty B-frames (which may be denoted as Bb frames).

A B-frame is normally bi-directionally predictive, but uni-directional predictive B-frames can also exist. In the latter case they can be forward or backward predictive. Forward predictive means that an anchor frame is used to predict the following B-frames during encoding. So the picture resulting from a forward predictive B-frame is reconstructed during decoding from the previous anchor frame. This means that the Bf-frame forces the repetition of the previous anchor frame. Therefore, it has the same effect as an empty P- or Pe-frame. The Bb-frame has the opposite effect. It forces the display of the anchor frame following it. For both types of empty B-frames, an interlace kill version is possible as well.

In the following, it will be described how to use such empty B-frames for the construction of a slow-forward stream. A first possibility on the basis of Bb-frames is depicted in Fig. 27.

The Bb-frames are inserted before the anchor frames and keep their position during the reordering. The anchor frames are shifted to the position of the next anchor frame. The Bb frame forces the display of the anchor frame following it in the reordered stream.

Another option is the use of Bf-frames as shown in Fig. 28.

The Bf-frames are inserted after the anchor frames in the transmission stream. The repeated display of the anchor frames in the reordered stream is forced by the Bf-frames that follow them.
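Both variants may be checked with a small simulation. The sketch below assumes that a Bf-frame displays a copy of the older of the two most recently decoded anchor frames and that a Bb-frame displays a copy of the most recently decoded anchor frame; empty B-frames at the very start of the stream, for which no such anchor frame exists yet, are simply skipped. The frame representation and the function names are illustrative assumptions.

```python
def build_with_empty_b(gop, factor, empty_type):
    """gop: list of (type, display_index) in transmission order."""
    stream = []
    for frame in gop:
        if frame[0] == "B":
            stream.extend([frame] * factor)  # B-frame data is repeated
        elif empty_type == "Bb":
            stream.extend([("Bb", None)] * (factor - 1) + [frame])
        else:
            stream.extend([frame] + [("Bf", None)] * (factor - 1))
    return stream


def displayed_pictures(transmission):
    """Display indices of the pictures shown after decoder reordering."""
    shown = []
    older, newer = None, None  # the two most recently decoded anchors
    for frame_type, index in transmission:
        if frame_type == "B":
            picture = index
        elif frame_type == "Bf":
            picture = older    # forward prediction: previous anchor
        elif frame_type == "Bb":
            picture = newer    # backward prediction: following anchor
        else:
            picture = newer    # a new anchor releases the held anchor
            older, newer = newer, index
        if picture is not None:
            shown.append(picture)
    if newer is not None:
        shown.append(newer)
    return shown


gop = [("I", 2), ("B", 0), ("B", 1), ("P", 5), ("B", 3), ("B", 4)]
with_bb = displayed_pictures(build_with_empty_b(gop, 3, "Bb"))
with_bf = displayed_pictures(build_with_empty_b(gop, 3, "Bf"))
```

Under these assumptions both variants give the same, correctly ordered sequence of displayed pictures for the shortened GOP used here.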

The use of Bf-frames is similar to the use of empty P-frames for the construction of fast-forward and fast-reverse streams. In fact the use of Bf-frames is also possible in that case thus commonising the trick-play generation even further. But when Bf-frames are used for fast-forward and fast-reverse, the effect of reordering should be considered. This means that some parameters in the fast-forward/reverse stream like PTS/DTS and temporal reference have to be chosen appropriately. In the following, further details concerning the temporal reference will be explained.

The display order within the transmission GOP starting with a GOP header is indicated by the temporal reference in each picture header. The first frame to be displayed has a temporal reference equal to zero. This is depicted in Fig. 29 for a normal play stream. Fig. 29 illustrates a temporal reference 2900 for the transmission order 2902 and illustrates a temporal reference 2901 for a display order 2903.

In display order 2903, the temporal references 2901 are a monotonically increasing series from 0 to 11. Due to the reordering, the temporal references of the anchor frames in the transmission stream are shifted. Considering the temporal references in the case of a slow-forward stream, the situation for the preferred case that the Bf-frames are inserted is depicted in Fig. 30 for a slow motion factor of three.

Fig. 30 indicates the temporal reference for slow-forward with Bf-frames.

The top line of Fig. 30 indicates the frames taken from the normal play stream shown in Fig. 29 with the original temporal references. The second line of Fig. 30 shows the insertion of Bf-frames and the repetition of the B-frames. The original temporal references are shown above this line and the values they should have are shown below this line. The third line of Fig. 30 shows the frames after reordering, and the bottom line of Fig. 30 shows the displayed pictures. The temporal references of the reordered frames are shown below these lines. They form an increasing series from 0 to 35. The temporal references in the case of pre-insertion of Bb-frames or Pe-frames are depicted in Fig. 31 and in Fig. 32 for comparison.

It can be taken from Fig. 30, Fig. 31 and Fig. 32 that the frames of the slow-forward stream should be provided with new temporal references. How these are derived is explained hereinafter. It should be mentioned that in theory a GOP does not need to be preceded by a GOP header. Although a GOP without GOP header has not been encountered in practice, this situation will also be considered. The temporal reference is only reset to zero for the first displayed frame after a GOP header. So in the absence of a GOP header the temporal reference will not be reset to zero but increased to its maximum value of 1023 and then return to zero. In this case, the I-frame has to be treated in the same way as the P-frame, and the B-frame following an I-frame in the same way as a B-frame following a P-frame. All calculations are performed on a modulo 1024 basis. For the generation of new temporal references, a distinction is made between the new temporal references for the B-frames and for the anchor frames. In the following, new temporal references for the B-frames will be described.

No distinction is here made between original B-frames, repeated B-frames or inserted empty B-frames. But another categorization of the B-frames is made in relation to the temporal reference.

Fig. 33 shows an example for the case that Bf- or Bb-frames are inserted (note that BB is not Bb). In general, three types of B-frames are distinguished:

1. B-frames following an I-frame (Bi).

This is always the first frame to be displayed of the current transmission GOP. If no GOP header is present, it is treated as a B-frame following a P-frame. When a GOP header is present, the temporal reference in this B-frame is zero:

T(Bi) = 0 (5)

2. B-frames following a P-frame (Bp).

Due to the reordering, this B-frame is displayed after the last anchor frame preceding the P-frame in the transmission stream in front of this B-frame. This last anchor frame is denoted by AL and can be an I-frame, a P-frame or an empty P-frame. In this case, the temporal reference of the B-frame is equal to the temporal reference of the last anchor frame AL increased by 1:

T(Bp) = T(AL) + 1 (6)

3. B-frames following another B-frame (BB).

It is displayed after the preceding B-frame (BL) in the transmission stream, which can also be an empty B-frame. In this case, the temporal reference of the B-frame is equal to the temporal reference of the preceding B-frame increased by 1:

T(BB) = T(BL) + 1 (7)
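
The three B-frame rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the string labels "Bi", "Bp" and "BB", and the argument names are hypothetical and do not come from the described device.

```python
TEMPORAL_REFERENCE_MODULUS = 1024  # all calculations are modulo 1024


def b_frame_temporal_reference(kind, previous_reference=None, gop_header_present=True):
    """Return the new temporal reference of a B-frame.

    kind: "Bi" (follows an I-frame), "Bp" (follows a P-frame)
          or "BB" (follows another B-frame).
    previous_reference: T(AL) for "Bp" (and for "Bi" without a GOP header),
                        T(BL) for "BB".
    """
    if kind not in ("Bi", "Bp", "BB"):
        raise ValueError(f"unknown B-frame kind: {kind}")
    if kind == "Bi" and gop_header_present:
        return 0  # first displayed frame after a GOP header
    if previous_reference is None:
        raise ValueError("previous_reference is required for this case")
    # Bp and BB rules: one higher than the frame displayed just before.
    # A Bi without a GOP header is treated as a Bp.
    return (previous_reference + 1) % TEMPORAL_REFERENCE_MODULUS
```

For example, `b_frame_temporal_reference("BB", 1023)` wraps around to 0, which reflects the modulo 1024 behaviour described for streams without a GOP header.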

Next, new temporal references for the anchor frames will be described. Due to the reordering, the anchor frames are displayed after the sequence of B-frames following them in the transmission stream. So, to determine the new temporal reference of an I-frame or P-frame, it is important to know how many B-frames will follow it in the slow-forward stream. In the case of a varying GOP size or a varying GOP structure, this cannot be derived from history. In practice, a varying GOP structure is not common. Even for stations having a varying GOP size, the anchor frames are always followed by the same number of B-frames. Nevertheless, a varying GOP structure is possible and will be considered. To be able to handle a varying GOP structure, the number of B-frames that will follow an individual anchor frame in the transmitted slow-forward stream has to be determined. This can be calculated from the slow motion factor and the number of B-frames following this anchor frame in the original recorded stream, taking into account whether empty B-frames or empty P-frames are inserted. So the number of B-frames following the anchor frame in the original stream has to be obtained in some way. One possibility is to read all the data up to the next anchor frame, but this demands a substantial amount of buffering. Another possibility, which avoids this buffering, is to store this information in the CPI file and extract it from there. The number of B-frames can easily be derived from the distance in frames to the next anchor frame in the transmitted stream; in fact, it is equal to this distance minus one. There are two ways to store this information in the CPI file:

1. The CPI file holds an entry for each frame including its type;

2. The CPI file holds an entry for each anchor frame that includes the distance in frames to the previous anchor frame.

In the first case, the distance in frames to the next anchor frame can easily be counted in the CPI file. The second case may seem a bit strange because the distance to the previous anchor frame is stored with the frame instead of the distance to the next anchor frame. This is chosen because the distance to the previous anchor frame is known at the moment that an anchor frame is received. The distance from the current anchor frame to the next anchor frame is simply found by reading the distance information from the next anchor frame in the CPI file. This distance will be denoted by D and the slow motion factor will be denoted by L, both of which are integers larger than zero (see Fig. 34).

Fig. 34 shows the distance D and the slow motion factor L for normal play 3400 and for slow-forward play 3401. The factor L is therefore not the speed factor but the slow-down factor.

The total number of B-frames following the anchor frame depends on the insertion of empty B-frames or P-frames. So a distinction is made between two situations, namely that empty B-frames (Bf or Bb) or empty P-frames (Pe) are inserted. In case no GOP header is present, the I-frame is treated as a P-frame. Next, the new temporal reference in case that empty B-frames (Bf or Bb) are inserted will be described.

The original distance to the next anchor frame is equal to D (see Fig. 34).

The distance to the next anchor frame in the slow-forward stream is equal to L x D. So the total number of B-frames following the anchor frame is equal to

L x D - 1.

The first B-frame following an I-frame has a temporal reference of zero (see Fig. 35).

So the last B-frame following the I-frame has a temporal reference equal to L x D - 2. The I-frame is the next one to be displayed, so its temporal reference is one higher. Then the temporal reference for the I-frames is given by:

T(I) = L x D - 1 (8)

The temporal reference for the P-frame also depends on the temporal reference of the previous anchor frame in the slow-forward stream. This previous anchor frame (I-frame or P-frame) will be denoted by AL, and its temporal reference is denoted by T(AL) (see Fig. 36).

The B-frame following the P-frame will be displayed after the previous anchor frame AL. So the temporal reference of this B-frame is equal to T(AL) + 1.

The temporal reference of the last B-frame following the P-frame is T(AL) + L x D - 1.

The P-frame is the next one to be displayed, so its temporal reference is one higher. Then the temporal reference for the P-frames is given by:

T(P) = T(AL) + L x D (9)
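
As a worked illustration of formulas (8) and (9), the following Python sketch computes the anchor-frame temporal references for the case that empty B-frames are inserted. The function and parameter names are hypothetical; the modulo 1024 wrap follows the statement that all calculations are performed on a modulo 1024 basis.

```python
def anchor_reference_with_empty_b(frame_type, distance, slow_factor,
                                  previous_anchor_reference=None):
    """Temporal reference of an I- or P-frame when empty B-frames are inserted.

    distance: D, frames to the next anchor frame in the original stream.
    slow_factor: L, the slow motion factor.
    previous_anchor_reference: T(AL), needed for P-frames only.
    """
    if distance < 1 or slow_factor < 1:
        raise ValueError("D and L must be integers larger than zero")
    if frame_type == "I":
        # L x D - 1 B-frames (references 0 .. L x D - 2) are displayed first.
        return (slow_factor * distance - 1) % 1024
    if frame_type == "P":
        if previous_anchor_reference is None:
            raise ValueError("T(AL) is required for a P-frame")
        return (previous_anchor_reference + slow_factor * distance) % 1024
    raise ValueError(f"not an anchor frame type: {frame_type}")
```

With D = 3 and L = 2, the I-frame gets temporal reference 5, and a following P-frame with the same D gets 11, since five B-frames are displayed between the two anchor frames.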

In the following, it will be explained how the temporal reference is defined in case that empty P-frames (Pe) are inserted.

Since no empty B-frames are inserted, the total number of B-frames following an anchor frame is now L x (D - 1) instead of L x D - 1 (see Fig. 37).

The temporal reference for the I-frames is now given by:

T(I) = L x (D - 1) (10)

A distinction is now made between P-frames and Pe-frames.

The anchor frame previous to the P-frame is normally a Pe-frame, except for the case L = 1, where it is an I-frame or P-frame. In any case, the previous anchor frame will be denoted by AL and its temporal reference by T(AL) (see Fig. 38).

The temporal reference for the P-frames excluding the Pe-frames is now given by:

T(P) = T(AL) + L x (D - 1) + 1 (11)
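
Formulas (10) and (11) can be illustrated with a similar Python sketch for the case that empty P-frames are inserted. The names are hypothetical. The sketch also accepts a "Pe" frame type, which it handles as a P-frame with D = 1, as the description of the Pe-frame indicates.

```python
def anchor_reference_with_empty_p(frame_type, slow_factor, distance=1,
                                  previous_anchor_reference=None):
    """Temporal reference of an I-, P- or Pe-frame when empty P-frames are inserted.

    slow_factor: L, the slow motion factor.
    distance: D, frames to the next anchor frame in the original stream
              (ignored for "Pe", which always uses D = 1).
    previous_anchor_reference: T(AL), needed for "P" and "Pe".
    """
    if distance < 1 or slow_factor < 1:
        raise ValueError("D and L must be integers larger than zero")
    if frame_type == "I":
        return (slow_factor * (distance - 1)) % 1024  # formula (10)
    if frame_type in ("P", "Pe"):
        if previous_anchor_reference is None:
            raise ValueError("T(AL) is required for P- and Pe-frames")
        d = 1 if frame_type == "Pe" else distance
        # formula (11); with d = 1 this reduces to T(AL) + 1
        return (previous_anchor_reference + slow_factor * (d - 1) + 1) % 1024
    raise ValueError(f"not an anchor frame type: {frame_type}")
```

For D = 3 and L = 2, the I-frame gets reference 4 (after four B-frames with references 0 to 3), the following Pe-frame gets 5, and the next P-frame gets 10.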

After the reordering, a Pe-frame immediately follows a previous I-frame, P-frame or Pe-frame, that is, a previous anchor frame. As a result, the temporal reference of the Pe-frame is always one higher than that of the previous anchor frame AL (see Fig. 39). The temporal reference for the Pe-frame can also be calculated with the formula for the P-frame by taking D = 1. This results from the fact that a Pe-frame in the transmission stream is always followed by another anchor frame. It should also be noted that L = 1 corresponds to normal play and results in a normal temporal reference in all cases.

Next, gluing of the individual frames will be described. Particularly, the gluing of frames in the case of incomplete picture start codes will be discussed. In order to determine the required gluing activities at the concatenation point in the slow-forward stream, it should first be clear where the original stream is explicitly split into individual frames. In the following, the practical situation of one PES packet per GOP or per frame will be considered.
In the case of one PES packet per frame, the original stream may be split between the packet with the PLUSI and the preceding packet, as indicated in Fig. 40.

In Fig. 40, the splitting of the stream for one PES packet per frame is illustrated. The data streams shown in Fig. 40 include plaintext packet headers 4000, Adaptation Fields 4001, plaintext data 4002, encrypted data 4003 and plaintext PES header 4004. Furthermore, a PLUSI present is denoted with reference numeral 4005, and a PES header is denoted with reference numeral 4006.

The individual frames comprise a number of complete original packets. So no packet splitting is necessary. This frame splitting could also be performed in a completely encrypted stream, but access to some plaintext data is still necessary for the construction of the slow-forward stream. The splitting at the start of a packet with a PLUSI also means that there are no picture start codes that are spread over two packets. Each individual frame contains its own correct and complete picture start code. Therefore, no gluing activity is necessary in this case. However, in the case of one PES packet per GOP, the situation is different.

The split between frames is made at the picture start code of a new frame, unless a PES header precedes it.

The following algorithm may be used to determine the splitting point:

1. The original stream is simultaneously searched for a packet with a PLUSI bit set, a picture start code and a picture coding extension;

2. If the packet with the PLUSI bit set is encountered first, the split is made at the start of this packet (see Fig. 41, including a picture start code 4100 and a picture coding extension 4101). Subsequently, the stream is searched for the picture coding extension. After this is found, the search is continued as described in point 1.;

3. If the picture start code is encountered first, the split is made at the start of the picture start code. In many cases this means that the packet containing the picture start code has to be split into two packets, of which the first is assigned to the previous frame and the second to the subsequent frame (see Fig. 42 illustrating splitting of a stream at the start of a picture start code 4100, wherein places of insertion of an Adaptation Field are denoted with reference numeral 4200). Both packets are stuffed with an Adaptation Field 4200. The payload of the second packet then starts with the picture start code 4100. The recording time stamp of the original packet is copied to each of the two packets resulting from the split. Whether the two packets from the split or the original packet will be used at a concatenation point of two frames depends on the specific situation, as will be explained below. Subsequently, the stream is searched for the picture coding extension 4101. After having found this, the search is continued as described in point 1.;

4. If the picture coding extension is encountered first, the picture start code must be undetectable because it is partially encrypted. This means that the current plaintext area starts with some bytes of the picture start code. In this case the split is made at the start of the first plaintext packet of the current plaintext area (see Fig. 43 showing the splitting of the stream within a picture start code 4100, and illustrating bytes of picture start code 4300 as well as picture coding extension 4101). The search which is described in point 1. is continued after having found picture coding extension 4101.

The described algorithm would also result in the correct splitting points for a stream with one PES packet per frame. Moreover, the algorithm is designed for application to plaintext streams as well as the hybrid streams mentioned above.
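
The decision taken in one round of this search can be sketched as follows. This is a simplified illustration only: it assumes that a hypothetical parser has already located the candidate markers and passes their byte offsets (or None if a marker was not found), and it does not model the follow-up search for the picture coding extension or any packet handling.

```python
def choose_split_point(plusi_packet_start, picture_start_code_start,
                       picture_coding_extension_start, plaintext_area_start):
    """Return (split_offset, is_ideal), or None if no marker was found.

    All arguments are byte offsets into the stream, or None.
    plaintext_area_start is the start of the first plaintext packet of the
    plaintext area in which the picture coding extension was found.
    """
    candidates = {
        "plusi": plusi_packet_start,
        "picture_start_code": picture_start_code_start,
        "picture_coding_extension": picture_coding_extension_start,
    }
    found = {name: pos for name, pos in candidates.items() if pos is not None}
    if not found:
        return None
    first = min(found, key=found.get)  # the marker encountered first
    if first == "plusi":
        return (plusi_packet_start, True)        # point 2
    if first == "picture_start_code":
        return (picture_start_code_start, True)  # point 3
    # Point 4: the start code is partially encrypted, so the split falls
    # inside the picture start code (a non-ideal splitting point).
    return (plaintext_area_start, False)
```

The second element of the result marks whether the splitting point is ideal, which is the property that decides later whether any gluing is needed.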

Gluing is only necessary in the case of incomplete picture start codes, which can only result from point 4. of the given algorithm. So only point 4. leads to a non-ideal splitting point. A plaintext stream contains only ideal splitting points because the picture start code is always found. So no gluing is necessary in this case. But hybrid streams will contain non-ideal splitting points. A method described below may be used to determine how many bytes of the picture start code are on either side of the non-ideal splitting points. The effects of a non-ideal splitting point will be explained in detail hereinafter.

Next, the situation will be considered that empty P-frames of any type are inserted at such a non-ideal splitting point. How to handle the first empty frame will be explained below. A number of bytes equal to the part of the picture start code after the splitting point is removed from the picture start code of the first empty frame. The intermediate empty frames are unchanged. The last empty frame has to be corrected for the missing part of the picture start code of the subsequent frame. So this missing part may be added to the end of the last empty frame. No changes are necessary to empty frames that are inserted at ideal splitting points.

In the following, the repetition of the B-frames will be considered. In case the B-frame has ideal splitting points on both sides, no gluing action is necessary for the repetition. But if a non-ideal splitting point is present on either side of the frame, gluing actions may be necessary or advantageous. The original frame and its repetitions form a series of identical B-frames. No gluing action is necessary at the start or end of the series because here the frame is either connected to the same frame as in the normal play stream or to an empty frame. In the first case there is no discontinuity because the normal order of the data is restored at this point. The solution for the second case has been given above. So only the intermediate concatenation points have to be considered, where the end of a B-frame is connected to the start of the same B-frame. The example described here refers to the example given above referring to Fig. 23 and is repeated in more detail in Fig. 44 for clarity. Fig. 44 illustrates an incomplete picture start code at the concatenation point.

For a correct gluing it is necessary to know the number of bytes of the picture start code (within MPEG2 the start code may be 4 bytes in length) at the end and the start of the B-frame. Denoting the number of bytes at the end by n and at the start by m, for an ideal splitting point n=0 and m=4. In the case of a non-ideal splitting point, the number n for one frame and the number m for the subsequent frame may be determined with a method which will be illustrated below.

It is evident that n can never be equal to 4 because then the split would have been made at the start of the picture start code, resulting in n=0. On the other hand, m can never be 0 because in that case the picture start code would be completely in a previous frame and the split would have been made in the ideal position, thus leading to m=4. So 0 ≤ n ≤ 3 and 1 ≤ m ≤ 4 always hold.

In order to get the numbers n and m for one and the same frame N, these numbers have to be extracted from the information of the two splitting points surrounding the frame. So n and m now represent the number of bytes of the picture start code at the end and start of a B-frame that has to be repeated. As a consequence, they also represent a number of bytes of the picture start code before and after an intermediate concatenation point.

Next, it will be assumed that n+m=4. This is the case when both splitting points surrounding the B-frame are ideal. But it is already known that no gluing action is needed in that case. However, this can also be the case when both splitting points are non-ideal. This is the situation depicted in Fig. 45.

Fig. 45 therefore illustrates the example of n+m=4.

The last packet of frame N is denoted with reference numeral 4500, and Fig. 45 further shows the first packet of frame N denoted with reference numeral 4501. No gluing action is necessary at a border 4502. The bytes of the picture start code (n=3) are denoted with reference numeral 4503, and the byte of the picture start code (m=1) is denoted with reference numeral 4504.

The fact that n+m=4 means that the correct number of picture start code bytes is present at the concatenation point and that no gluing action is necessary.

However, Fig. 46 shows the situation with n+m>4.

This means that there are 1, 2 or 3 bytes too many at the concatenation point. In this case a number of bytes equal to n+m-4 is removed from the start of the second frame. This is accomplished by replacing these plaintext bytes by an Adaptation Field (AF) containing stuffing bytes. If an Adaptation Field is already present, its length has to be increased by n+m-4 and the data to be discarded is replaced by stuffing bytes that, according to the standard, have the hexadecimal value FF.

In the special cases of n+m>4 and n<3 it is also possible to do no gluing. Effectively, one gets elementary stream stuffing.

A point at which a gluing action is necessary is denoted with reference numeral 4600. In the example, the bytes of the picture start code (n=2) are denoted with reference numeral 4601. Bytes of the picture start code (m=3) are denoted with reference numeral 4602. Furthermore, bytes of the picture start code (n=2) are denoted with reference numeral 4603 and bytes of the picture start code (m=2) are denoted with reference numeral 4604. A position of replaced bytes using Adaptation Fields (n+m-4) is denoted with reference numeral 4605.

Referring to Fig. 47, it is assumed that n+m<4.

This means that 1, 2 or 3 bytes are missing from the picture start code at the concatenation point. In this case it should be known which byte or bytes are missing. Because n and m are both known, the missing bytes can be uniquely identified. The missing bytes are now placed in a new packet that is further stuffed with an Adaptation Field. This gluing packet is then placed between the two frames. This gluing packet is denoted with reference numeral 4700. Reference numeral 4701 denotes bytes of the picture start code (n=2), and reference numeral 4702 denotes bytes of the picture start code (m=1). Reference numeral 4704 denotes inserted bytes (4-n-m). Reference numeral 4705 illustrates bytes of the picture start code (m=1).
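
The three gluing cases (n+m=4, n+m>4 and n+m<4) can be summarized in a short Python sketch. The function name and the returned tuple layout are hypothetical. The sketch relies on the MPEG-2 picture start code being the four bytes 00 00 01 00 and only decides which action is needed; building the Adaptation Field or the gluing packet is not shown.

```python
PICTURE_START_CODE = bytes([0x00, 0x00, 0x01, 0x00])  # MPEG-2 picture_start_code


def gluing_action(n, m):
    """Decide the gluing action at an intermediate concatenation point.

    n: picture start code bytes at the end of the repeated B-frame.
    m: picture start code bytes at the start of the repeated B-frame.
    Returns (action, bytes_to_remove, bytes_to_insert).
    """
    if not (0 <= n <= 3 and 1 <= m <= 4):
        raise ValueError("expected 0 <= n <= 3 and 1 <= m <= 4")
    total = n + m
    if total == 4:
        return ("none", 0, b"")
    if total > 4:
        # Surplus bytes at the start of the second frame are replaced
        # by Adaptation Field stuffing.
        return ("remove", total - 4, b"")
    # The first n bytes and the last m bytes are present; the bytes in
    # between are missing and go into a separate gluing packet.
    return ("insert", 0, PICTURE_START_CODE[n:4 - m])
```

For n=2 and m=1, the case of Fig. 47, the single missing byte is the third byte of the start code, 0x01.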

In the following, DTS (Decoding Time Stamps) and PTS (Presentation Time Stamps) in the slow-forward stream will be explained.

This description includes the generation of new DTS and new PTS values for all the frames in the slow-forward stream, including repeated B-frames and empty frames. The given DTS and PTS formulas result in a continuous PTS in the display stream (that is, after reordering) when switching from normal play to slow-forward. A discontinuity in the display of frames at the switching point may thus be avoided. No additional DTS or PTS has to be inserted in the stream; only an existing DTS or PTS is replaced by a new value. The PES packet length may be changed to zero (unbounded) whatever its original value. In the case of one PES packet per GOP, an incorrect PES packet length at the switching point cannot be avoided if this value was other than zero, unless substantial buffering is used or the switch is delayed to the start of an I-frame. In practical broadcast streams, the PES packet length is mostly set to unbounded (zero).

In the following, calculation of the DTS value will be explained. According to the MPEG-2 standard (ISO/IEC 13818-1:1996(E), see in particular pages 95 and 96), the DTS of frames with no DTS has to be calculated sequentially from the most recent frame with a DTS by means of the following formula:

DTS[F] = DTS[FL] + Delta (12)

In formula (12), F designates the current frame and FL the previous frame in the transmission stream. The origin of formula (12) can be easily understood. The stream is reordered so that the decoding order is identical to the transmission order, and the decoding times of subsequent frames are separated by one frame time. These observations immediately lead to the given formula, assuming that Delta is a DTS increment that corresponds to one frame time. Formula (12) may for instance be used in the case of one PES packet per GOP to successively calculate the DTS of all the frames in the GOP from the DTS of the I-frame. But this formula can also be used to easily derive the DTS of all frames in the slow-forward stream from the DTS of the last frame before the switching point.

The parameter Delta is equal to the number of 90 kHz periods in one frame time because the DTS is linked to the PCR base. Some values for Delta depending on the frame rate are given in Fig. 48.
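
As an illustration of formula (12) and of the definition of Delta, the following Python sketch derives Delta from the frame rate and assigns sequential DTS values. The function names are hypothetical. The sketch assumes that the DTS is a 33-bit value that wraps around, and it simply truncates non-integer results, which is a simplification for frame rates where Delta is not an integer.

```python
from fractions import Fraction

CLOCK_90KHZ = 90_000      # DTS/PTS tick rate
DTS_MODULUS = 1 << 33     # DTS and PTS are 33-bit values


def delta_for_frame_rate(frame_rate):
    """Number of 90 kHz periods in one frame time.

    frame_rate may be an int or a Fraction such as Fraction(30000, 1001).
    """
    return Fraction(CLOCK_90KHZ) / Fraction(frame_rate)


def assign_dts(first_dts, frame_count, frame_rate):
    """Apply DTS[F] = DTS[FL] + Delta successively in transmission order."""
    delta = delta_for_frame_rate(frame_rate)
    return [int(first_dts + i * delta) % DTS_MODULUS for i in range(frame_count)]
```

At 25 Hz this gives Delta = 3600, so `assign_dts(1000, 3, 25)` yields 1000, 4600 and 8200.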

Fig. 48 shows Delta as a function of the frame rate.

In the following, a calculation of PTS values will be explained. According to the MPEG-2 standard (see page 34 of ISO/IEC 13818-1:1996(E)), the PTS of a B-frame is given by:

PTS[B] = DTS[B] (13)

According to the same standard, the PTS for an anchor frame is given by:

PTS[A] = DTS[A] for a low_delay sequence (14)

PTS[A] = DTS[AN] for a non-low_delay sequence (15)

In formula (15), AN stands for the next anchor frame in the transmitted stream. A low delay sequence is a stream without B-frames in which the low-delay flag is set. In practice, streams without B-frames have been encountered, but not a low delay sequence.

For a non-low delay sequence, the PTS may be expressed as a function of the DTS of the same frame. In the normal play stream, the distance to the next anchor frame is equal to D frames. The distance to the next original anchor frame in the slow-forward stream is increased by the slow-motion factor L to L x D frames. In the case that empty B-frames are used, no additional anchor frames are present in the slow-forward stream. Since DTS values increase by Delta from frame to frame, the following relation holds:

DTS[AN] = DTS[A] + L x D x Delta (16)

Substitution of DTS[AN] by PTS[A] leads to the following formula for the PTS of anchor frames in the case that empty B-frames are inserted:

PTS[A] = DTS[A] + L x D x Delta (17)

In the case of pre-insertion of empty P-frames, the distance of an original anchor frame to a next anchor frame, which is now an empty P-frame, is reduced by L - 1 frames. The distance of an empty P-frame to the next anchor frame is always equal to one frame. The PTS of anchor frames in the case that empty P-frames are inserted is then given by:

PTS[A] = DTS[A] + [L x D - (L - 1)] x Delta (18)

PTS[Pe] = DTS[Pe] + Delta (19)
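
The PTS formulas (13) and (17) to (19) can be collected in a small Python sketch for a non-low delay sequence. The function names are hypothetical, and the 33-bit wrap-around of the PTS is an assumption of the sketch rather than something stated in the formulas.

```python
PTS_MODULUS = 1 << 33  # PTS and DTS are 33-bit values


def pts_b_frame(dts):
    """Formula (13): a B-frame is presented when it is decoded."""
    return dts % PTS_MODULUS


def pts_anchor_with_empty_b(dts, slow_factor, distance, delta):
    """Formula (17): anchor frame when empty B-frames are inserted."""
    return (dts + slow_factor * distance * delta) % PTS_MODULUS


def pts_anchor_with_empty_p(dts, slow_factor, distance, delta):
    """Formula (18): original anchor frame when empty P-frames are inserted."""
    frames = slow_factor * distance - (slow_factor - 1)
    return (dts + frames * delta) % PTS_MODULUS


def pts_empty_p(dts, delta):
    """Formula (19): an empty P-frame is followed by another anchor frame."""
    return (dts + delta) % PTS_MODULUS
```

With L = 2, D = 3 and Delta = 3600, an anchor frame with DTS 0 gets PTS 21600 in the empty B-frame case and PTS 18000 in the empty P-frame case.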

In the case of one PES packet per GOP, only the DTS and PTS of the I-frame are replaced by the calculated values. No additional DTS/PTS have to be added to the stream. Although this may result in a violation of the maximum distance of 700 ms between two PTS values in the presentation stream, no problems are expected. The reason is that the DTS/PTS values of the total slow-forward stream are calculated according to the rules for missing DTS/PTS values. This means that no discrepancies should be present between the values calculated by the STB for the frames in the slow-forward stream and the actual DTS/PTS values. Since the calculated PTS is not used for any other purpose than the replacement of an original PTS, it only needs to be calculated for the I-frames in the case of one PES packet per GOP. The DTS has to be calculated for every frame though, first of all because the DTS of a frame is calculated from the DTS of the previous frame, but additionally because the old and new DTS of a frame are used in the following for the positioning of the data of this frame in the slow-forward stream.

Next, several aspects related to positioning of the frames and packets will be described.

In the following, the placement of frames and packets on the time axis of the slow-forward stream will be explained, without using the original timing relation the packets had in the normal play stream.

Methods to calculate the new DTS and PTS for the frames in the slow-forward stream have been described above. From those values it is possible to reconstruct the PCR clock in such a way that all frame data is available in the decoder buffer when needed. This means that all data belonging to a frame should be transmitted to the decoder buffer before the PCR clock reaches the value of the DTS for that frame. So it should be decided where to place the DTS on the time axis. In fast-forward, the DTS may point to the start of the transmission period of the next GOP. In slow-forward, that would require an enormous amount of buffering in the MPEG2 decoder. This originates from the fact that in the case of slow-forward, the GOP size determines the slow motion factor (and vice versa). In fast-forward, the GOP size only determines the refresh rate. The speed factor may be changed by skipping frames. However, in slow-forward, a slow motion factor increase may be achieved by lowering the refresh rate. This may result in a GOP size increased in proportion to the slow motion factor, potentially leading to a very large GOP size. But as in fast-forward, the DTS should correspond to a point after the end of the frame. In slow-forward, a possible choice is that it corresponds to the start of the transmission period of the next original frame.

In the following, smoothing of the frame data will be explained. Each frame is repeated the same number of times, so only integer slow motion factors can be achieved. Although this is not required, it eases the construction of the slow-forward stream. For the anchor frames, which are repeated by using empty frames, this means that the data of the following frame(s) is very small. The original anchor frame data can therefore easily be smoothed out in time over the entire period it is displayed. This however means that there will be a relation between the slow motion factor and the slow-forward stream bit rate during the transmission of an anchor frame and its repeating frames. For high slow motion factors, the bit rate during the transmission of an anchor frame and its repeating frames can be very low, but for small slow motion factors, the bit rate can be very high as well. If the original duration of an anchor frame is more than L frames, where L is the slow motion factor, the anchor frame will be time compressed in the slow-forward stream. This means that the bit rate of the slow-forward stream can locally be higher than the bit rate of the normal play stream.

B-frames cannot be repeated by empty frames and must be repeated by retransmitting the entire B-frame data. So each repeated B-frame requires the same amount of data as the original B-frame. This means that there is no relation between the slow motion factor and the slow-forward stream bit rate during the transmission of a B-frame and its repeating frames. It also means that there is no advantage in smoothing a B-frame and its repetitions over their total display period of L frames. Because these frames are equal in size, each frame will be smoothed over exactly one frame period anyway. So smoothing a B-frame over a single frame period can just as well be done directly.

Next, some aspects related to the creation of the PCR time base will be mentioned.

As noted before, the formulas for calculating the DTS and PTS are given above. The DTS is kept continuously increasing at the switching point from normal play to slow-forward. The DTS should point to the start of the transmission period of the next original frame. Without a PCR, however, the DTS has no meaning. This determines a strict relation between the DTS and the PCR, which may be different from normal play. This may imply that a PCR discontinuity can occur during the switch from normal play to trick-play. MPEG2 has a PCR discontinuity flag to indicate this. The PCR comprises two parts, a base called PCRbase and an extension called PCRext. The PCRbase can be reconstructed directly from the DTS of a frame and the slow motion factor L as follows:

PCRbase = DTS - L x Delta (20)

This implies that the PCR packet is inserted at the start of every original frame. It will be clear that the original PCRs have to be removed from the stream. Due to the fact that the PCRbase is related to the DTS, one might expect that a PCR packet could be inserted at any location where the DTS value is known, which is at the start of every frame. But for the PCRs to be in the correct position according to their value, they would have to be placed on a frame grid in this case. This however means that all the frames, including the anchor frames, are smoothed over one frame period. However, the anchor frame data should be smoothed over the entire display period to avoid excessive bit rates.

Therefore, PCR packets should only be inserted at the start of original frames in the slow-forward stream. This may result in a violation of the required maximum PCR distance, especially for high slow motion factors. Due to the smoothing, intermediate PCR values can be calculated assuming a constant bit rate between successive PCR packets if needed. With the exception of some exotic frame rates (see description below), the PCRext can be kept constant. As a PCR discontinuity is indicated anyway, PCRext can just be set to zero. The effect of these formulas is depicted in Fig. 49, which indicates a structure 4900 of the slow-forward stream.
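
Formula (20) together with the choice PCRext = 0 can be sketched as follows. The function name is hypothetical; the sketch assumes a 33-bit PCRbase that wraps around and an integer Delta.

```python
PCR_BASE_MODULUS = 1 << 33  # PCRbase is a 33-bit counter at 90 kHz


def pcr_for_original_frame(dts, slow_factor, delta):
    """Return (PCRbase, PCRext) for the PCR packet inserted at the start
    of an original frame in the slow-forward stream.

    PCRbase = DTS - L x Delta, so that the DTS of the frame points to the
    start of the transmission period of the next original frame.
    """
    pcr_base = (dts - slow_factor * delta) % PCR_BASE_MODULUS
    pcr_ext = 0  # kept constant; a PCR discontinuity is signalled anyway
    return pcr_base, pcr_ext
```

For a frame with DTS 36000, L = 4 and Delta = 3600 (25 Hz), the inserted PCR has a PCRbase of 21600, which is four frame periods ahead of the decoding time.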

A PCR is inserted at the start of each L-sized block of data, which indicates an effective display period for a single frame. The PCR value at that point is chosen such that the DTS of the first frame in that block is located at the start of the next L-sized block. This is indicated with the arrows 4901 in Fig. 49. The arrows 4902 indicate the location of the DTS for all the remaining frames. With the anchor frames 2103, 2105, the effect is shown that almost all the original data is smoothed over the entire display period, as the empty frames are relatively small. For the B-frame 2104 however, each frame is equally large, which results in a smoothing effect over exactly one display period.

Next, it will be described how to handle large slow motion factors.

A method to smooth anchor frames over the entire GOP display period as described in the previous sub-section may have one disadvantage. It was already mentioned there that the change in bit rate during transmission of the anchor frames is proportional to the slow motion factor. For larger slow motion factors this also implies that both the DTS and the PTS of these anchor frames are placed proportionally further away in the future. This means that the algorithm cannot handle the extreme case where the slow motion factor could be set to an infinite value.

Although an infinite slow motion factor may at first sound like a purely theoretical problem, this is not entirely true. The reason for this is that using an infinite slow motion factor would be desired to create a still picture mode as a natural extension of the slow-forward processing. In this extreme case of an infinite slow motion factor, the packets would all be smoothed over an infinite amount of time. Effectively, this would mean that no packet at all is able to leave the slow-forward processing engine any more.

A solution is to set a maximum period over which anchor frames can be smoothed. This is only required for anchor frames, as B-frames are always spread over exactly one display period. A suitable maximum value would be six display periods. It is highly unlikely that any anchor frame exceeds this maximum in normal play, so usually the bit rate will be decreased during the transmission of the anchor frames. For a slow motion factor L larger than six, the anchor frame and five empty frames will be spread over six display periods. But the picture on the screen resulting from this anchor frame may be displayed L times. So another L-6 empty frames have to be added. These extra empty frames are added just as before, but each of them should be smoothed individually over exactly one display period, or at least only one of them should be transmitted in each display period. This is shown in Fig. 50.

Fig. 50 illustrates a limitation of the smoothing period. In this context, the term "FP" means frame period. An anchor frame is indicated with reference numeral 5000.
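
The limitation of the smoothing period can be illustrated with a small Python sketch. The function name and the returned tuple are hypothetical; the default maximum of six display periods is the value suggested above.

```python
def anchor_smoothing_plan(slow_factor, max_periods=6):
    """Plan the transmission of an anchor frame and its empty repetitions.

    Returns (smoothed_periods, empty_frames_in_smoothed_block,
             extra_empty_frames), where the extra empty frames are each
    sent in their own display period after the smoothed block.
    """
    if slow_factor < 1 or max_periods < 1:
        raise ValueError("slow_factor and max_periods must be at least 1")
    smoothed_periods = min(slow_factor, max_periods)
    empty_in_block = smoothed_periods - 1
    extra_empty = slow_factor - smoothed_periods
    return smoothed_periods, empty_in_block, extra_empty
```

For L = 10 this gives a block of six display periods containing the anchor frame and five empty frames, followed by four more empty frames, one per display period. For L up to six, nothing changes compared with unlimited smoothing.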

This also offers a partial solution to the PCR packet frequency problem. Above, it was also mentioned that a large slow motion factor might result in the violation of the MPEG2 specification on the maximum PCR packet distance. The reason for this violation is the simple fact that it is relatively difficult to calculate the intermediate PCR values, due to the packet smoothing. Therefore, only one PCR packet was added at the start of a smoothing period equal to L display times. In fact, a PCR packet could easily have been added to each B-frame because these are smoothed over one display period. So if the highest smoothing distance is limited to six display periods, the PCR distance can also easily be limited to that same amount. This will still violate the restrictions MPEG2 puts on the PCR packet distance, but at least it is no longer unbounded.

Next, a local bit rate rise in slow-forward will be explained. The bit rate of the slow-forward stream can easily be locally compared to the bit rate of the normal play stream it is generated from. The reason is that frames are compressed in limited time slots, so that the required timing information can easily be generated. The local bit rate when anchor frames are transmitted is coupled to the slow motion factor, as the time slot scales with the slow motion factor. Fig. 51 and Fig. 52 show a measurement of the anchor frame size in an exemplary recording.

In Fig. 51, a diagram 5100 is shown illustrating the I-frame size of an exemplary data stream. Along an abscissa 5101, the time in seconds is plotted. Along an ordinate 5102, the size in kB is plotted.
Fig. 52 shows a diagram 5200 illustrating a P-frame size in kB for an exemplary data stream.

The following relation holds for the trick-play stream bit rate during the transmission of an anchor frame (neglecting the size of the empty frames used to repeat the anchor frame):

LocalPeakBitrate(A) = (FrameRate x FrameSize) / L (21)

From relation (21), it is easy to see that when L increases, the peak bit rate will decrease. But if L is small, the peak bit rate during the transmission of the anchor frame may be higher than the bit rate in the normal play stream.

The bit rate of the B-frames does not scale with the slow motion factor, as these are always compressed into one display period. This means that high peak bit rates can occur with large B-frames as well as with large anchor frames. Ordinarily, B-frames are not very big, but occasionally they can be.

Fig. 53 shows a measurement of the B-frame sizes in an actual recording in the form of the diagram 5300.

Particularly, Fig. 53 shows the B-frame size in kB for an exemplary data stream. However, for the trick-play stream bit rate during the transmission of the B-frames, the same relation as for the anchor frames holds, with the exception that L is no longer part of the relation as a B-frame is always spread over exactly one frame period.

LocalPeakBitrate(B) = FrameRate x FrameSize (22)

Looking again at the constraints MPEG2 puts on the stream, one can find the maximum peak bit rate that can be reached. Below it will be explained that no single frame can be larger than 1.2 x 1.835 = 2.202 Mbits or 275 kB. If this frame were a B-frame, the maximum local peak bit rate according to the above formula would be over 55 Mbps with a frame rate of 25 Hz. This would violate the restriction of MP@ML that limits each video stream to 18 Mbps at the transport stream level. Fortunately, B-frames are rarely this large, as can be seen in Fig. 53. In broadcast streams, I-frames hardly ever reach 100 kB and are typically 60 kB. A B-frame could be just as large, but a size over 60 kB is rarely seen. This size would lead to a peak bit rate for the compressed B-frames of approximately 12 Mbps.

Anchor frames will usually be larger, but are stretched over two or more frame periods. So they must be at least 120 kB to reach the same peak bit rate of 12 Mbps. This value is more likely to be reached in this case, especially for I-frames. The remedy is easier for anchor frames, though: increasing the slow motion factor will drastically decrease the peak bit rate. The measurement of Fig. 51 does not show frames over 100 kB in size, so the minimum slow motion factor of 2 is possible in this case. In fact, it is possible to use this factor up to an anchor frame size of 180 kB. A slow motion factor of 3 can be used up to a size of 270 kB, which is almost equal to the maximum frame size. So a minimum slow motion factor of 3 seems to be the best practical choice. In the following, reduction of the peak bit rate for the B-frames will be explained.
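
The figures quoted above for B-frames and anchor frames follow directly from relations (21) and (22). The sketch below reproduces them under the assumption that 1 kB equals 1000 bytes and that the 18 Mbps transport stream limit is the governing constraint; the function names are hypothetical and chosen for this illustration only.

```python
TS_VIDEO_LIMIT_BPS = 18_000_000  # MP@ML limit at transport stream level


def local_peak_bitrate(frame_size_bytes, frame_rate_hz, slow_motion_factor=1):
    """Relations (21) and (22): bits per second while one frame is sent.

    Use slow_motion_factor=1 for a B-frame, which is always spread over
    exactly one frame period (relation 22).
    """
    return frame_rate_hz * frame_size_bytes * 8 / slow_motion_factor


def min_slow_motion_factor(frame_size_bytes, frame_rate_hz,
                           limit_bps=TS_VIDEO_LIMIT_BPS):
    """Smallest integer L that keeps an anchor frame at or below limit_bps."""
    bits_per_second = frame_rate_hz * frame_size_bytes * 8
    # Integer ceiling division avoids floating point rounding at the limit.
    return max(1, (bits_per_second + limit_bps - 1) // limit_bps)
```

With a frame rate of 25 Hz, a 60 kB B-frame gives local_peak_bitrate(60_000, 25) of 12 Mbps, and min_slow_motion_factor returns 2 for a 180 kB anchor frame and 3 for a 270 kB anchor frame, in line with the values given in the text.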

As mentioned above, a bit rate rise can occur in the slow-forward stream. An excessive rise in the bit rate for anchor frames can be avoided by the choice of an increased lowest slow motion factor, because the anchor frame is spread over L frame times. This does not decrease the bit rate of B-frames though, because they are spread over one frame time. The reason is that a repeated picture resulting from a B-frame can only be realized by a retransmission of all the B-frame data.

To reduce the bit rate, a B-frame should be spread over more than one frame time, which can only be realized by the addition of an empty frame instead of the full B-frame data. There is however no empty frame that can repeat a B-frame. Until now, the pictures resulting from every frame are repeated exactly the same number of times. This is not required though if a local variation in the visible slow motion speed is acceptable. This means that the empty frame may repeat an anchor frame one additional time at the cost of one fewer repetition of the B-frame. This should not disturb the normal play display order of the frames. Therefore, only the B-frame data neighbouring an anchor frame or an empty frame in the original slow-forward stream can be replaced by an additional empty frame. This is depicted in Fig. 54.

Fig. 54 shows a replacement of a B-frame by an empty frame. If, in the original slow-forward stream, an anchor frame or an empty frame would precede the B-frame, this B-frame can be replaced by a Bf-frame in order to repeat the picture resulting from this preceding anchor frame. If, in the original slow-forward stream, an anchor frame or an empty frame would follow the B-frame, this frame can be replaced by a Bb-frame in order to repeat the picture resulting from the previous anchor frame. In theory, this could also be a Pe-frame in this case, but this would necessitate the change of the PTS of the previous anchor frame.

The net effect of this replacement will be that L - 1 B-frames (plus a very small empty frame) can be evenly spread over L display periods, thus lowering the local bit rate by a ratio (L - 1) / L. For larger slow motion factors, this ratio becomes close to one, thus reducing the effect considerably. But in fact empty frames can replace more B-frames as long as they are in a connected series. In this case, the ratio becomes (L - n) / L, assuming that empty frames replace a series of n subsequent B-frames. The visible effect is small considering the fact that this method would only be applied to large B-frames. As this replacement technique can only be applied to B-frames in the slow-forward stream that are neighboured by an anchor frame or an empty frame, it can only reduce the bit rate of B-frames that are neighboured by an anchor frame in the original normal play stream. The application of this technique is only useful if the bit rate of all the B-frames can be reduced. So at most two contiguous B-frames may be present between the anchor frames in the original stream in order to appropriately apply this technique. In the following, stream construction on transport stream level will be explained.
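
As a numerical illustration of the ratio (L - n) / L, the following sketch computes the reduced local bit rate of a large B-frame. The function name and the requirement that at least one copy of the B-frame data remains (n smaller than L) are assumptions made for this example.

```python
from fractions import Fraction


def b_frame_bitrate_ratio(slow_motion_factor, replaced_copies=1):
    """Ratio (L - n) / L by which the local B-frame bit rate is lowered."""
    L, n = slow_motion_factor, replaced_copies
    if L < 2 or not 0 <= n < L:
        raise ValueError("need L >= 2 and 0 <= n < L")
    return Fraction(L - n, L)


# A 60 kB B-frame at 25 Hz peaks at 12 Mbps when sent in one frame period.
peak_bps = 25 * 60_000 * 8
# With L = 4 and one repetition replaced by an empty frame, three copies of
# the B-frame data are spread over four display periods.
reduced_bps = peak_bps * b_frame_bitrate_ratio(4, 1)
```

In this example the local bit rate drops from 12 Mbps to 9 Mbps, neglecting the size of the empty frame. For L = 10 and n = 1 the ratio is 9/10, which shows why the gain is small for large slow motion factors.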

Next, packet positioning will be mentioned.

The position of the packets copied to the trick-play stream cannot be coupled to the relative timing of the original transport stream, due to the compression and possible inversion (reverse) of the time axis in trick-play. Therefore, the pre-pended time stamps of the original time-stamped transport stream are not used for trick-play generation. This is the reason why the described trick-play method can also be used for transport streams without pre-pended time stamps. Because the original relative timing cannot be used, another timing mechanism should be chosen. As will become clear later, the best way is to smooth the packets over a trick-play GOP as depicted in Fig. 55. Fig. 55 illustrates packet smoothing in trick-play.

The number of packets for the I-frame is known, as it is for the empty frames and some additional packets (e.g. PCR, ECM, SIT, DIT, etc.). The total of the packets is transmitted in the nominal GOP time that is equal to 1 / Rt or T / R. The packet distance is calculated from the number of packets and the GOP time. In fact, the calculated packet transmission moment is translated into new time stamps that are pre-pended to the trick-play packets. These time stamps may be derived from the calculated value of the new PCR trick-play time base at the start of the packet. In this way, the generated trick-play stream can be handled by the same output circuitry as used for normal play. Next, the Program Clock Reference (PCR) is mentioned.
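
The calculation of the packet distance may be sketched as follows. It is assumed here, for illustration only, that the pre-pended time stamps count ticks of the 27 MHz system clock and that T is the trick-play GOP size in frames and R the frame rate in Hz; the function name is hypothetical.

```python
from fractions import Fraction

SYSTEM_CLOCK_HZ = 27_000_000  # assumed resolution of the pre-pended time stamps


def smoothed_packet_times(num_packets, gop_size_frames, frame_rate_hz,
                          gop_start_ticks=0):
    """Spread num_packets evenly over the nominal GOP time T / R.

    Returns the transmission moment of every packet in 27 MHz ticks,
    counted from gop_start_ticks. Exact fractions are used so that frame
    rates such as 30000/1001 Hz do not accumulate rounding errors.
    """
    if num_packets < 1:
        raise ValueError("a trick-play GOP contains at least one packet")
    gop_ticks = Fraction(SYSTEM_CLOCK_HZ * gop_size_frames) / Fraction(frame_rate_hz)
    packet_distance = gop_ticks / num_packets
    return [gop_start_ticks + int(i * packet_distance)
            for i in range(num_packets)]
```

For T = 4, R = 25 Hz and a trick-play GOP of 1000 packets, the nominal GOP time is 160 ms or 4,320,000 ticks, so the packets are placed 4320 ticks apart.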

The original PCR time base cannot be used for trick-play. First of all, it is probable but not guaranteed that a PCR will be present within the selected I-frame. More importantly, the frequency of the PCR time base is no longer correct. This frequency should be within 30 ppm from 27 MHz but is now multiplied by the trick-play speed factor, even leading to a time base running in the wrong direction for reverse trick-play.

So clearly the old PCR time base has to be removed and a new one added. Old PCRs are removed by cleaning the Adaptation Fields in which they are located. Adaptation Fields are not encrypted. The new PCRs are added by placing an additional PCR packet at the start of each trick-play GOP as indicated in Fig. 55. Because these GOPs are transmitted exactly in a nominal GOP time, the distance between PCR values is constant and can be derived from this nominal GOP time. As a result, the addition of a new PCR time base with high timing accuracy is simple.

The PCR comprises two parts named PCR base and PCR extension. The latter is the LSB part of 9 bits and ranges from 0 to 299. The PCR base is the MSB part, 33 bits in size, and uses its full range. The frequency of the PCR base is 27 MHz / 300 = 90 kHz. Almost all frame rates fit to this 90 kHz. For these rates, the PCR extension is constant for points that are an integer multiple of the frame time apart. Because the nominal GOP time is such an integer multiple, the PCR extension of all inserted PCRs of the new time base can be set to zero. Only the eccentric rates of 23.976 and 59.94 Hz do not fit to the 90 kHz. However, for 59.94 Hz the PCR extension is constant for a distance equal to an even multiple of the frame time and in case of 23.976 Hz for a fourfold frame time. So with the IPPP (T = 4) trick-play GOP structure, a fixed value of zero for the PCR extension can be used for all frame rates, further simplifying the insertion of a new PCR time base.
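
The statement that a PCR extension of zero can be used for all frame rates with T = 4 can be checked with exact arithmetic. In the sketch below, the rates 23.976, 29.97 and 59.94 Hz are taken as 24000/1001, 30000/1001 and 60000/1001 Hz, which is an assumption about the exact values meant; the names are chosen for this illustration only.

```python
from fractions import Fraction

SYSTEM_CLOCK_HZ = 27_000_000
FRAME_RATES = {
    "23.976": Fraction(24000, 1001),
    "24": Fraction(24),
    "25": Fraction(25),
    "29.97": Fraction(30000, 1001),
    "30": Fraction(30),
    "50": Fraction(50),
    "59.94": Fraction(60000, 1001),
    "60": Fraction(60),
}


def pcr_extension_step(frame_rate, frames):
    """Change of the PCR extension (0..299) over `frames` frame times.

    Returns None if the span is not a whole number of 27 MHz ticks.
    """
    ticks = Fraction(SYSTEM_CLOCK_HZ * frames) / frame_rate
    if ticks.denominator != 1:
        return None
    return ticks.numerator % 300


# With a trick-play GOP of four frames the extension never changes.
steps_for_t4 = {name: pcr_extension_step(rate, 4)
                for name, rate in FRAME_RATES.items()}
```

Every entry of steps_for_t4 is zero, whereas a single frame time at 23.976 Hz or 59.94 Hz changes the extension by a non-zero amount.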

The distance between subsequent PCRs in a transmitted stream, according to the MPEG2 standard, should not exceed 100 ms. In the DVB standard, this value is even lower, namely 40 ms. Sending only one PCR every trick-play GOP clearly violates these limits. In the envisaged worst case situation with T = 4 and R = 25 Hz, the distance between PCRs is 160 ms. In performed experiments, no problems have been experienced with violating this distance. Additional PCRs could be included in the stream but this is more complex and does not seem necessary.

In the following, the Decoding Time Stamp (DTS) and the Presentation Time Stamp (PTS) will be illustrated. Frames can contain two time stamps, which inform the decoder when to start decoding the frame (DTS) and when to start presenting (displaying) it (PTS). Decoding and presentation are started when the DTS and the PTS, respectively, are equal to the PCR time base, which is reconstructed in the decoder by means of the PCRs in the stream. Since a new PCR time base is added to the trick-play stream and because the time distances for the DTS and PTS are no longer correct anyway, the DTS and PTS of the I-frame have to be replaced if present. They are located in the PES header. In principle, two ways exist to reconstruct a trick-play GOP, namely with one PES packet per frame or with one PES packet per GOP. In the case of a partly encrypted picture start code, one PES packet per frame can in fact not be used. So one PES packet per GOP may be chosen even if the original stream was one PES packet per frame. Therefore, the inserted empty P-frames have no DTS or PTS. The PES packet length is set to zero (unbounded) whatever its original value.

Concerning the question when the decoding of the I-frames starts, the packets of the trick-play GOP are spread out over the constant GOP time. Almost all of the trick-play GOP consists of I-frame data, so the end of the I-frame is close to the start of the next GOP. Therefore, the decoding of the I-frame can start at the beginning of the next GOP. So the DTS of the I-frame is set to a value corresponding to the PCR time base at the start of the next GOP. The DTS and PTS can only contain a reference to the PCR base. The DTS is therefore identical to the PCR base that will be inserted at the start of the next GOP.

Concerning the question when the presentation of the I-frame starts, a time of one frame between DTS and PTS is not only common practice for the stream with only I-frames and P-frames, but is exactly what the MPEG2 standard prescribes for such a stream if the low delay flag is not set. So the PTS of the I-frame is set to the DTS value plus a value corresponding to one frame time. For the frame rates of 23.976 and 59.94 Hz, this is a value near to one frame time. The PCR distance between the start of successive trick-play GOPs has a precision equal to the PCR base and therefore equal to the DTS and PTS. The offset value between PTS and DTS can be calculated by dividing the PCR distance by the trick-play GOP size T. This is in fact very simple in the case of an IPPP (T = 4) structure where one has to divide by 4. One can simply shift the bits of the PCR distance right by two places to calculate the PTS/DTS offset. This is depicted in Fig. 56.
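
The derivation of the DTS and PTS of the I-frame from the new PCR time base may be sketched as follows. All values are in units of the 90 kHz PCR base; the wrap-around at 33 bits and the function name are assumptions added for this illustration.

```python
PCR_BASE_MODULUS = 1 << 33  # PCR base, DTS and PTS are 33-bit counters


def i_frame_time_stamps(gop_start_pcr_base, pcr_distance, gop_size_frames=4):
    """Return (dts, pts) for the I-frame of a trick-play GOP.

    gop_start_pcr_base: PCR base inserted at the start of this GOP.
    pcr_distance: PCR base distance between successive trick-play GOPs.
    """
    # Decoding starts at the beginning of the next trick-play GOP.
    dts = (gop_start_pcr_base + pcr_distance) % PCR_BASE_MODULUS
    if gop_size_frames == 4:
        offset = pcr_distance >> 2  # divide by 4 with a two-place shift
    else:
        offset = pcr_distance // gop_size_frames
    # Presentation follows (about) one frame time after decoding.
    pts = (dts + offset) % PCR_BASE_MODULUS
    return dts, pts
```

For T = 4 and R = 25 Hz the PCR distance is 14400, so a GOP starting at PCR base 0 gets a DTS of 14400 and a PTS of 18000. For 23.976 Hz the PCR distance is 15015 and the shift gives an offset of 3753, which is the value near to one frame time mentioned above.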

Fig. 56 shows a diagram 5600 having an abscissa 5601 along which the time is plotted. Along an ordinate 5602, the PCR base is plotted. Thus, Fig. 56 shows DTS and PTS in relation to the PCR time base. Next, the insertion of ECMs will be explained.

In the case of an encrypted trick-play stream, ECMs have to be present in the stream to enable the decryption by the receiver (STB). Concerning the question which ECMs have to be inserted and where in the stream, one may say that in the preferred case where the recorded stream always contains the necessary plaintext packets, the data block read from the storage device will only contain I-frame data. The ECM insertion method should however also allow for the more general case with larger block sizes. The generated trick-play stream only contains the data of the first I-frame in a data block. So ECMs cannot be inserted exactly at the indicated positions in the trick-play stream itself. Still the ECMs have to be inserted more or less in the described position. This means that the first I-frame of the data block may be used to construct a trick-play GOP. Most ECMs will have to be sent somewhere between these I-frames, which is in fact between two trick-play GOPs. All trick-play GOPs have an equal length in time and the packets of a GOP are spread out over this time to smooth the bit rate. Inserting ECMs between these GOPs would unnecessarily increase the local bit rate. It is better to embed the ECM in a trick-play GOP. So one has to decide in which GOP the ECM is added. There are particularly two options:

1. Add ECM to the end of the previous trick-play GOP;

2. Add ECM to the start of the next trick-play GOP.

In the second option, the ECM is not really the first packet of the next GOP because that is the inserted PCR packet, which should remain in its position for timing reasons. So the ECM is the second packet in this case. Although in practice the difference between the two options is negligible, the optimal position is given by option 1 because it increases the available time for the decryption of the ECM. This situation is depicted in Fig. 57.

Fig. 57 illustrates inserting ECMs between trick-play GOPs. A nominal GOP time T / R is denoted with reference numeral 5700. An SCB toggle is denoted with reference numeral 5701. Empty P's are denoted with reference numeral 5702. Furthermore, a PCR packet has reference numeral 5703, an ECM packet reference numeral 5704 and I-frame data is denoted with the reference numeral 5705.

With forward trick-play it can also occur sometimes that the SCB toggle is not located between the I-frames but somewhere within the selected I-frame. An ECM has to be sent when the SCB toggle is crossed. This means that in this case the ECM should be inserted at the correct location within the I-frame. Again there are two options to do this:

1. Insert ECM before the I-frame packet with the SCB toggle;

2. Insert ECM after the I-frame packet with the SCB toggle.

The packet with the SCB toggle is the encrypted video packet with an SCB value different from that of the preceding encrypted video packet. In reality it does not really matter whether option 1 or 2 is used, but in theory the better position is before the packet with the SCB toggle. This is because on the one hand the CW of the previous period is no longer needed from this moment on and on the other hand the time to decrypt the ECM is increased. Option 1 is depicted in Fig. 58.
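
Option 1 may be sketched as follows. The representation of packets as dictionaries with an "scb" entry (None for plaintext packets) and the function name are hypothetical and serve only to illustrate where the ECM is placed; the correction of the continuity counter is not modelled.

```python
def insert_ecm_at_scb_toggle(packets, ecm_packet):
    """Insert ecm_packet before every packet that carries an SCB toggle.

    A packet carries an SCB toggle if it is an encrypted video packet whose
    SCB value differs from that of the preceding encrypted video packet.
    Plaintext packets (scb is None) are skipped when comparing SCB values.
    Returns a new list; the input list is left unchanged.
    """
    result = []
    previous_scb = None
    for packet in packets:
        scb = packet.get("scb")
        if scb is not None:
            if previous_scb is not None and scb != previous_scb:
                result.append(ecm_packet)  # option 1: ECM before the toggle
            previous_scb = scb
        result.append(packet)
    return result
```

For an I-frame whose encrypted packets carry the SCB values 2, 2, 3, 3 with a plaintext packet in between, the ECM ends up directly in front of the first packet with the value 3.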

In all cases the PID number and table ID of the inserted ECMs are preferably the original ones to enable a smooth switching between normal play and trick-play in both directions. The continuity counter in the ECM packet header has to be corrected though.

In the following, the block size will be described, and particularly how channels behave.

If it is unknown where I-frames start in the stream, it is not possible to guarantee that an I-frame will be present within a certain amount of data. The reason is that a channel can theoretically have an infinite GOP size. Because zapping would become very problematic with such a GOP size, this fortunately does not occur in practice. But the fact that the frequency at which I-frames occur in the stream is unknown may still be a problem. It is not possible to be 100% sure that an I-frame is found, whatever amount of data is read, but it is possible to measure the GOP sizes that channels use to get a feeling for what is normal practice.

Particularly in Europe, a GOP size of 12 frames is widely used. A 16-frame GOP and a variable GOP size have also been seen, but never a GOP size larger than 16. This value translates to a worst case zapping delay of somewhat more than half a second. It is not very likely that much larger GOP sizes will be chosen, mainly because of the increase in zapping time. Just knowing the GOP size as a frame count is not of much help though, because what needs to be known is the number of bytes that have to be read to make sure that a complete I-frame is included in that block of data. But the size of frames varies a lot, I-frames are usually much bigger than P-frames or B-frames, and successive I-frames are almost never the same size.

Because it is unknown where an I-frame starts in the stream when the data is encrypted, it may be advantageous to read more than just one I-frame and even more than one GOP. In an unlucky situation one may start just one byte beyond the start of an I-frame, so the first complete I-frame encountered in the stream is the I-frame from the next GOP. Therefore, one should read at least a GOP plus the next I-frame.

The specification of the VBV buffer size of 1.835 Mbits (table 8-14 in ISO/IEC 13818-2) maximizes the size of a single frame. However, the maximum size of a GOP in bytes is not equal to the product of the number of frames and the VBV buffer size. Although MPEG streams are not bound by an upper limit, MP@ML is bound by a maximum bit rate of 15 Mbps (table 8-13 in ISO/IEC 13818-2) that limits the GOP size in bytes. These numbers represent MPEG ES data, so translating them to a transport stream, where extra overhead for packet headers (both for PES and TS) is present, is not straightforward. For the transport stream, an additional buffer for the video stream is specified, which has a leakage rate of 1.2 x 15 = 18 Mbps. Besides video, the recorded transport stream also contains audio and system data packets. Audio is bound at 2 Mbps and system data at 1 Mbps, leading to a total maximum rate of 21 Mbps, which is 1.4 x 15 Mbps. The size of an I-frame on transport stream level is limited to 1.4 x 1.835 = 2.569 Mbits or 321 kB or roughly 1700 packets. A practical upper limit for a block that will at least contain one complete I-frame is given by:

B = 1.4 x (G / R x 15 + 1.835) (23)

wherein B is the block size in Mbits, G is the GOP size in frames and R is the frame rate in Hz.
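
Relation (23) can be evaluated for the GOP sizes mentioned above. The sketch below is an illustration only; the function name is hypothetical and the constants are those given in the text.

```python
MAX_ES_BITRATE_MBPS = 15.0     # MP@ML maximum bit rate
VBV_BUFFER_MBITS = 1.835       # VBV buffer size, bounds a single frame
TS_OVERHEAD_FACTOR = 1.4       # 21 Mbps total versus 15 Mbps video ES


def block_size_mbits(gop_size_frames, frame_rate_hz):
    """Relation (23): upper limit in Mbits for one GOP plus the next I-frame."""
    gop_seconds = gop_size_frames / frame_rate_hz
    return TS_OVERHEAD_FACTOR * (gop_seconds * MAX_ES_BITRATE_MBPS
                                 + VBV_BUFFER_MBITS)
```

For G = 12 and R = 25 Hz this gives approximately 12.65 Mbits, and for G = 16 approximately 16.01 Mbits.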

These values are so high that it would still be necessary to provide a very high bandwidth all throughout the system. In many cases, broadcasters are interested in placing as many channels as possible on each transponder frequency. So the bit rate used is always much lower than the allowed maximum bit rate. The bit rate used by different providers/channels can be measured. To do this, one may make a recording for each channel on which one wants to perform these measurements. For the measurements, it is possible to make the recording in plaintext (decrypting before it is necessary), so it is easy to locate the positions of all the frames. With these measurements, it is possible to determine the bit cost of every GOP plus the next I-frame in the recorded stream.

A list of abbreviations used in the specification is provided in Table 1.

AFLD Adaptation Field Control

BAT Bouquet Association Table

CA Conditional Access

CAT Conditional Access Table

CC Continuity Counter

CW Control Word

CPI Characteristic Point Information

DIT Discontinuity Information Table

DTS Decoding Time Stamp

DVB Digital Video Broadcasting

ECM Entitlement Control Messages

EMM Entitlement Management Messages

GK Group Key

GKM Group Key Message

GOP Group Of Pictures

HDD Hard Disk Drive

KMM Key Management Message

MPEG Moving Picture Experts Group

NIT Network Information Table

PAT Program Association Table

PCR Program Clock Reference

PES Packetized Elementary Stream

PID Packet Identifier

PLUSI Payload Unit Start Indicator

PMT Program Map Table

PTS Presentation Time Stamp

SIT Selection Information Table

SCB Scrambling Control Bits

STB Set-top-box

SYNC Synchronization Unit

TEI Transport Error Indicator

TPI Transport Priority Unit

TS Transport Stream

UK User Key

Table 1 Abbreviations of terms related to trick-play

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be capable of designing many alternative embodiments without departing from the scope of the invention as defined by the appended claims. Furthermore, any of the embodiments described comprise implicit features, such as an internal current supply, for example a battery or an accumulator. In the claims, any reference signs placed in parentheses shall not be construed as limiting the claims. The words "comprising" and "comprises", and the like, do not exclude the presence of elements or steps other than those listed in any claim or the specification as a whole. The singular reference of an element does not exclude the plural reference of such elements and vice versa. In a device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. The terms "data" and "content" have been used interchangeably throughout the text, and are to be understood as equivalents.

Claims

CLAIMS:

1. A device (1800) for processing an input data stream comprising a sequence of input frames, wherein the device comprises a processing unit (1802) for generating an output data stream as a trick-play stream comprising a sequence of output frames based on the input data stream and based on a predetermined replication rate; and a timing unit (1803) for assigning timing information to the output frames, said timing information pointing from an output frame to be reproduced for the first time to a subsequent one of the output frames which is to be reproduced for the first time.

2. The device (1800) according to claim 1, wherein the timing unit (1803) is adapted for assigning timing information to the output frames which is independent of timing information of the input frames.

3. The device (1800) according to claim 1, wherein the timing unit (1803) is adapted for assigning Decoding Time Stamps as the timing information to the output frames.

4. The device (1800) according to claim 1, wherein the timing unit (1803) is adapted for inserting a timing packet in the sequence of the output frames at a position before an output frame to be reproduced for the first time.

5. The device (1800) according to claim 4, wherein the timing packet is a Program Clock Reference.

6. The device (1800) according to claim 1, wherein the processing unit (1802) is adapted such that the sequence of output frames is formed based on the sequence of input frames being reproduced a number of times and/or being filled with empty frames in accordance with the predetermined replication rate.

7. The device (1800) according to claim 6, wherein the processing unit (1802) is adapted such that a bi-directional predictive frame is reproduced a number of times in accordance with the predetermined replication rate.

8. The device (1800) according to claim 6, wherein the processing unit (1802) is adapted such that anchor frames are repeated by using empty frames in accordance with the predetermined replication rate so as to smooth a bitrate of the output data stream.

9. The device (1800) according to claim 1, wherein the processing unit (1802) is adapted such that the sequence of output frames is formed based on the sequence of input frames being filled with empty frames in accordance with the predetermined replication rate only in case that the predetermined replication rate does not exceed a predetermined threshold value.

10. The device (1800) according to claim 9, wherein the processing unit (1802) is adapted such that in case that the predetermined replication rate exceeds the predetermined threshold value, further empty frames are added but not used for smoothing a bitrate of the output data stream.

11. The device (1800) according to claim 9, wherein the predetermined threshold value is larger than four.

12. The device (1800) according to claim 1, wherein the processing unit (1802) is adapted such that bi-directional predictive frames having a size exceeding a predetermined threshold value are substituted by empty bi-directional predictive frames.

13. The device (1800) according to claim 1, wherein the processing unit (1802) is adapted for generating the trick-play stream as the output data stream in a trick-play reproduction mode of the group consisting of a slow-forward reproduction mode, a slow-reverse reproduction mode, a stand still reproduction mode, a step reproduction mode, and an instant replay reproduction mode.

14. The device (1800) according to claim 1, wherein the input frames and/or the output frames include at least one frame of the group consisting of an intra-coded frame, a forward predictive frame and a bi-directional predictive frame.

16. The device (1800) according to claim 1, adapted to process an input data stream of video data or audio data.

17. The device (1800) according to claim 1, adapted to process an input data stream of digital data.

18. The device (1800) according to claim 1, comprising a reproduction unit (1806) for reproducing the output data stream.

19. The device (1800) according to claim 1, adapted to process an MPEG2 input data stream or an MPEG4 input data stream.

20. The device (1800) according to claim 1, adapted to process an at least partially encrypted input data stream.

21. The device (1800) according to claim 1, realized as at least one of the group consisting of a digital video recording device; a network-enabled device; a conditional access system; a portable audio player; a portable video player; a mobile phone; a DVD player; a CD player; a hard disk based media player; an Internet radio device; a computer; a television; a public entertainment device; and an MP3 player.

22. A method of processing an input data stream comprising a sequence of input frames, the method comprising generating an output data stream as a trick-play stream comprising a sequence of output frames based on the input data stream and based on a predetermined replication rate; and assigning timing information to the output frames, said timing information pointing from an output frame to be reproduced for the first time to a subsequent one of the output frames which is to be reproduced for the first time.

23. A computer-readable medium, in which a computer program of processing an input data stream comprising a sequence of input frames is stored, which computer program, when being executed by a processor (1805), is adapted to control or carry out the following method: generating an output data stream as a trick-play stream comprising a sequence of output frames based on the input data stream and based on a predetermined replication rate; and assigning timing information to the output frames, said timing information pointing from an output frame to be reproduced for the first time to a subsequent one of the output frames which is to be reproduced for the first time.

24. A program element of processing an input data stream comprising a sequence of input frames, which program element, when being executed by a processor (1805), is adapted to control or carry out a method of: generating an output data stream as a trick-play stream comprising a sequence of output frames based on the input data stream and based on a predetermined replication rate; and assigning timing information to the output frames, said timing information pointing from an output frame to be reproduced for the first time to a subsequent one of the output frames which is to be reproduced for the first time.