It probably seems like half of this book has already been about video—I've assumed you had video media for the chapters on playback, editing, and components (Chapters 2 through 4), even though the material there would be perfectly well suited for use on audio-only media like MP3 files. Well, this chapter is only about video, showing a handful of useful tricks for working with it.

Because video is simply a progression of images, alternated quickly enough to suggest movement, you probably won't be too surprised to know that the material covered in the QuickDraw graphics chapter (Chapter 5) pays off in this chapter. QuickDraw and QD-like APIs are the means by which you create and/or manipulate video media. If you skipped that chapter and have problems herein with QDGraphics (a.k.a. GWorlds), Matrixes, GraphicsImporters, or compression, you might need to check back there. But I'll try to keep things fairly self-explanatory.

Combining Video Tracks

It's not hard to understand how two audio tracks can coexist in a movie—the sounds are mixed and played together. But the idea of combining video tracks is less intuitive.

By default, if you have two video tracks of the same size in a movie, one will totally overlap the other. But you can change the default behavior by specifying 2D transformations with Matrix objects, and the Z-axis ordering by setting "layering" behavior.

One way to play with Matrix-based spatial arrangement is to set up a picture-in-picture movie. In such a movie, the foreground video is scaled and moved into a corner relative to the background video.

How do I do that?

To do a picture-in-picture effect, you must have a movie with two video tracks and you must do three things to the foreground video track:

Scale it to a size smaller than the background track.

Optionally move it to a location other than (0,0).

Set layering to ensure it appears above the background track.

Fortunately, a few methods in the Track class provide all of this. The application in Example 8-1 brings up a window with a picture-in-picture effect achieved with matrix transformations and layering.

Note

Run this example from the downloaded book code with ant run-ch08-matrixvideotracks.

When this is run, the user is shown two consecutive movie-opening dialogs, for the background and foreground movies, respectively. Assuming that both have video tracks, the result looks like Figure 8-1.

Note

This example looks for a track with video media, so don't use audio-only files, or MPEG-1, which has a special "MPEG media" track instead of video.

What just happened?

After the two movies are loaded, this demo creates a new empty target movie and, through a convenience method called addVideoTrack(), finds the video tracks of the selected movies, creates new video tracks in the target movie, and inserts the VideoMedia from the source movies. This produces a movie with two concurrent video tracks.

To scale and move the foreground track, you use a Matrix transformation. In this case, the example takes the background movie's video track size and finds its center point, then sets up a destination rectangle with that point as its upper-left corner and with width and height equal to half the background's width and height, respectively. Finally, it tells the foreground track to use this matrix by calling Track.setMatrix():
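The listing itself isn't reproduced here; what follows is a sketch reconstructed from that description against the QTJ API. The variable names foreTrack and backTrack come from the example, but the width/height variables and the QDRect construction are assumptions:

```java
// Sketch: scale the foreground into the lower-right quadrant of the background.
QDRect foreBounds = new QDRect(foreWidth, foreHeight);   // foreground track size
// the background's center point becomes the upper-left corner of the PIP rect
QDRect pipRect = new QDRect(backWidth / 2, backHeight / 2,
                            backWidth / 2, backHeight / 2);
Matrix matrix = new Matrix();
// define a translate-and-scale from the foreground's bounds into pipRect
matrix.rect(foreBounds, pipRect);
foreTrack.setMatrix(matrix);
```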

Next, to ensure that the foreground track draws above the background—if it doesn't, all this matrix work will be wasted—the demo calls Track.setLayer(-1). The layers are numbered from -32,767 to 32,767, with lower-numbered layers appearing above higher-numbered layers. The background track keeps its default layer, 0, so setting the foreground to -1 forces it to be on top.

What about...

...the point of this? Am I really ever going to want to overlay video tracks? It's more common than you might think. Consider Apple's iChat AV application—it uses a very similar picture-in-picture effect, so you can see yourself when you videoconference with a friend.

But there's one other interesting thing that iChat AV does: it shows the video of you as a mirror image. This, presumably, is more natural for users—if you raise your right hand, it somehow makes more sense to see your hand go up on the right side of the preview window, even if that's not what the camera is really seeing. Fortunately, a mirror image is really simple to do with a Matrix transformation.

In the preceding example, add the following two lines right after the Matrix is created:
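The two lines weren't carried over into this text; a reconstruction from the description that follows (assuming the Matrix variable is named matrix and foreWidth holds the foreground's width) would be:

```java
// flip x-coordinates around x=0, leaving y untouched
matrix.scale(-1, 1, 0, 0);
// shift the now-negative coordinates back into positive space
matrix.translate(foreWidth, 0);
```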

The scale() call makes the matrix multiply all x-coordinates by -1, effectively "flipping" the image around the vertical axis. The y-coordinates are unchanged, so the scaling factor there is 1. The last two arguments define the "anchor point." By using 0, this says "flip around the line x=0" (the y anchor is similar but irrelevant here). Given an image width of w, this scaling operation makes the pixels run from -w to 0. The translate() call then moves the coordinates back into positive coordinate space. Figure 8-2 shows this transformation conceptually.
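To see the flip-then-translate arithmetic outside of QuickTime, here is the same transformation modeled with Java 2D's AffineTransform—a stand-in for QTJ's Matrix, not part of the example itself:

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

public class MirrorDemo {
    public static void main(String[] args) {
        int w = 320; // image width
        AffineTransform mirror = new AffineTransform();
        // transforms are applied to points in reverse order of these calls:
        // first scale x by -1 (pixels now run from -w to 0), then translate by +w
        mirror.translate(w, 0);
        mirror.scale(-1, 1);
        Point2D p = mirror.transform(new Point2D.Double(10, 5), null);
        // a point 10 pixels from the left edge ends up 10 pixels from the right
        System.out.println(p.getX() + "," + p.getY()); // 310.0,5.0
    }
}
```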

For this to work, you also need to change the Matrix.rect() call to Matrix.map(). rect() clears out any previous transformations, essentially defining a new matrix that represents only the translate-and-scale from one rectangle to another, while map() preserves the previous transformations and then applies the translate-and-scale on top of them.

Figure 8-3 shows the demo running with this mirror image added to the foreground transformation. For this figure, I've used the same video source for foreground and background, to make the mirror transformation more obvious.

This mirror effect is pretty handy, and you might use it all by itself for doing something like a capture preview. Because the Matrix can be used on movie tracks, GraphicsImporters, and various other parts of the QuickTime API, mastering Matrix transformations will get you pretty far.

Note

Did you notice the capture settings dialog in Chapter 6 showed a mirror image? You could use a Matrix to make the motion detector in that chapter render a mirror image, too.

Overlaying Video Tracks

When one video track is drawn on top of another, the top doesn't necessarily have to obscure the bottom. QuickTime gives you the option of specifying a GraphicsMode to combine pixels from multiple video layers to create interesting effects.

How do I do that?

You can create a GraphicsMode object to describe the means of combining overlapping colors. To try it out, take the previous lab's code and replace all the matrix stuff (after the foreTrack and backTrack are created, but before the MovieController is created) with the following:
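The replacement code isn't reproduced in this text; based on the descriptions below, a sketch of it against the QTJ API (method names and the handler cast reconstructed from memory, so treat them as assumptions) would look like:

```java
// Sketch: composite the foreground over the background with addMax.
// The QDColor operand matters only to some modes (e.g., transparent).
GraphicsMode addMaxMode = new GraphicsMode(QDConstants.addMax, QDColor.black);
VideoMediaHandler handler =
    (VideoMediaHandler) ((VideoMedia) foreTrack.getMedia()).getHandler();
handler.setGraphicsMode(addMaxMode);
// layering still applies: keep the foreground on top
foreTrack.setLayer(-1);
```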

When run, this sample program asks you to open two movies, then creates a new movie with video tracks from the source movies' media, and combines the pixels of the foreground movie with the background, so the foreground appears atop the background. The result is shown in Figure 8-4.

What just happened?

Setting a GraphicsMode instructs QuickTime to apply a specific behavior to combine overlapping pixels. The GraphicsMode has a "mode" int, which indicates which kind of behavior to use, and a QDColor that is used by some behaviors to indicate a color to operate on. For example, you might use mode QDConstants.transparent and QDColor.green to make all green pixels transparent. The default mode is srcCopy, which simply copies one set of pixels on top of another.

Note

Chapter 5 showed how to set up GraphicsMode compositing of still images. Video works in pretty much the same way.

To apply this GraphicsMode to overlapping video tracks, you call setGraphicsMode(), a method not defined by Track but, rather, by the VideoMediaHandler. As a reminder, movies have tracks, tracks have media, and media have handlers. Actually, setGraphicsMode() is defined by the VisualMediaHandler interface, making it available for all visual media (MPEGMedia, TextMedia, etc.).

The addMax behavior combines background and foreground pixels, using the maximum red, green, and blue values of each. This has the effect of producing something of a washed-out combination of the two video tracks, because bright colors in either source will be copied over to the screen.
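The per-channel arithmetic behind addMax is easy to model in plain Java. This stand-alone sketch (not QTJ code) combines two packed RGB pixels the way the mode's description implies:

```java
public class AddMaxDemo {
    // take the per-channel maximum of two 0xRRGGBB pixels
    static int addMax(int a, int b) {
        int r = Math.max((a >> 16) & 0xFF, (b >> 16) & 0xFF);
        int g = Math.max((a >> 8) & 0xFF, (b >> 8) & 0xFF);
        int bl = Math.max(a & 0xFF, b & 0xFF);
        return (r << 16) | (g << 8) | bl;
    }

    public static void main(String[] args) {
        // a strong red combined with a strong green yields yellow:
        // bright values from either source survive, washing out the mix
        System.out.printf("%06x%n", addMax(0xFF2000, 0x20FF00)); // ffff00
    }
}
```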

The available QDConstants modes offer several dozen behaviors—check them out in the QuickTime documentation by searching Apple's site for "Graphics Transfer Modes"—though some of them aren't suitable for color images, and many of them produce garish results with real-world video. For example, Figure 8-5 shows the rather psychedelic effect of using the srcBic mode.

What about...

...practical uses for this? Granted, compositing two full-frame natural images is atypical, but composited video is used all the time in TV production. Modern video often represents many layers of overlapping sources. Watch a football game and you might see a shot of the game, overlaid by a graphic of a player and his stats (and maybe a video "head shot" of him), overlaid with a scoreboard for the corner, overlaid with a moving "bug" of the network's logo in another corner. Each source contains some amount of "useful" video, and the rest is a solid color (often black for synthetic video, green or blue for real-world video). The solid color becomes transparent, so only the useful data is copied over to the target. In terms of GraphicsModes, this would be the transparent mode, with the specified color as the operand.

Tip

If you're serious about shooting bluescreen video, there are sites on the Internet that list the supplies you'll need. For example, http://www.studiodepot.com/ sells chroma-key-friendly fabric and tape for making bluescreen and greenscreen backdrops.

Building a Video Track from Raw Samples

You can create a video track "from scratch" by adding video samples, one by one, to the video media. This is perhaps the ultimate in low-level access to QuickTime video, because it makes you responsible for every pixel in every frame. One way to demonstrate this is by making a movie from a still image and using slightly different parts of it in each frame to suggest a camera moving across the image.

Tip

This concept is called the "Ken Burns Effect" in Apple's iMovie, after the documentary filmmaker who used the technique extensively in documentaries like The Civil War, for which no film or video sources were available.

How do I do that?

To build a movie from samples taken from an image, use the following approach:

Import an image.

Pick source and destination rectangles.

Calculate a series of rectangles between the source and destination. These represent which part of the source image will be used for each frame.

Create an empty movie, new video track, and new video media.

Use a Matrix to convert each source rectangle to the size of the movie.

Compress each frame and add it to the VideoMedia.
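The rectangle-calculation step is plain arithmetic. Here's a minimal stand-alone sketch (simple int arrays rather than QTJ's QDRect) that interpolates linearly between the start and end rectangles over the frame count:

```java
public class RectInterp {
    // rects are {x, y, width, height}
    static int[] lerpRect(int[] from, int[] to, int frame, int frameCount) {
        int[] out = new int[4];
        for (int i = 0; i < 4; i++) {
            // linear interpolation: from + (to - from) * frame / (frameCount - 1)
            out[i] = from[i] + (to[i] - from[i]) * frame / (frameCount - 1);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] start = {0, 0, 640, 480};   // full frame
        int[] end = {160, 120, 320, 240}; // zoomed into the center
        int frames = 5;
        for (int f = 0; f < frames; f++) {
            int[] r = lerpRect(start, end, f, frames);
            System.out.printf("frame %d: %d,%d %dx%d%n", f, r[0], r[1], r[2], r[3]);
        }
    }
}
```

Each intermediate rectangle becomes the source region for one frame of the movie.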

You might already know how to do some of this; the new part is how to compress frames into a movie. Chapter 5 made use of the QTImage.compress() method to compress QDGraphics (a.k.a. GWorlds) into EncodedImages, but video is a little different in that you use a CSequence, short for compression sequence. The difference is that in many video compression formats, you may need information from previous or subsequent frames to render a specific frame. In other words, some frames are encoded as just the data that has changed from a previous frame. So, you can't compress a single image in isolation; you must work with a sequence of images. This is called temporal compression because it is time-based.

The VideoSampleBuilder demo, shown in Example 8-2, creates a movie called videotrack.mov from a source graphic.

Tip

This is the most involved example in the book and uses concepts from several chapters, such as enabling editing and adding a new Track (Chapter 3), using a GraphicsImporter (Chapter 4), setting up an off-screen GWorld (Chapter 5), using Matrix-based image manipulation (Chapter 5 and this chapter), and adding raw samples to a Media (a sound equivalent was shown in Chapter 7). So, don't be intimidated if it seems a little complicated the first time you read it.


When finished, you can play the videotrack.mov file in QuickTime Player, the player and editor examples in Chapters 2 and 3, or an equivalent. Figure 8-6 shows two screenshots from different times in the movie to indicate the zoom effect that is created by using different parts of the picture.

Figure 8-6. Movie built via addSample( ) from portions of a static image

What just happened?

One of the first things to notice is the constant CODEC_TYPE, which is used later on in setting up the CSequence. This indicates which of the supported QuickTime video codecs is to be used for the video track. The codec is indicated by a FOUR_CHAR_CODE int, in this case "SVQ3", which identifies the Sorenson Video 3 codec. Most of the usable codecs exist as constants in the StdQTConstants classes—for example, I could have put this as StdQTConstants6.kSorenson3CodecType. The advantage of using the FOUR_CHAR_CODE directly is that you can use any supported codec, even those that don't have constants defined in QTJ yet. In fact, Sorenson Video 3 and MPEG-4 video (StdQTConstants6.kMPEG4VisualCodecType) didn't have constants in QTJ until I filed a bug report for them, and the Pixlet codec (whose 4CC is "pxlt") still doesn't, as of this writing.

Tip

"So, what's the best codec?" I hear someone asking. Don't go there. There's no such thing as a best codec. There are so many different codecs, because they're engineered to serve different purposes. For example, some codecs are difficult to compress (in terms of CPU power, encoder expertise, etc.) but easy to decompress, making them well suited for mass-distribution media like DVDs where the encoding is done only once. On the other hand, a codec used for video conferencing must be light enough to do on the fly, with minimal configuration. Others are tuned to specific bitrates and uses, losing their advantages outside their preferred realm. The new MPEG-4 codec, H.264 (AVC), claims to be able to scale from cell phone to HDTV bandwidths...we'll see if it delivers on this.

To build the image movie, create an empty movie file, add a track, and create a VideoMedia for the track. You do this by creating a Movie with the constructor that takes a file reference (so that QuickTime knows where to put the samples you'll be adding), calling Movie.addTrack() to create the track, and constructing a VideoMedia. Then call Media.beginEdits() to signal that you're going to be altering the VideoMedia.
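A sketch of those setup steps against the QTJ API follows; here the file is created via the Movie.createMovieFile() factory, and VIDEO_WIDTH, VIDEO_HEIGHT, and the time scale are assumed values, not necessarily the demo's exact ones:

```java
// Sketch: create the movie file, a track, and its video media
QTFile movFile = new QTFile("videotrack.mov");
Movie movie = Movie.createMovieFile(movFile,
    StdQTConstants.kMoviePlayer,
    StdQTConstants.createMovieFileDeleteCurFile |
    StdQTConstants.createMovieFileDontCreateResFile);
Track videoTrack = movie.addTrack(VIDEO_WIDTH, VIDEO_HEIGHT, 0);
VideoMedia videoMedia = new VideoMedia(videoTrack, 600); // 600 units/second
videoMedia.beginEdits(); // about to add samples to this media
```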

Note

These steps are similar to those in Chapter 7's square-wave sample-building example.

The next step is to get the image with a GraphicsImporter. This will be the source of every frame of the movie. However, it's not the right size. So create an off-screen QDGraphics (a.k.a. GWorld, the term used in the native API and all its getters and setters in QTJ) with the desired movie dimensions. By calling GraphicsImporter.setGWorld(), you tell the importer that subsequent calls to draw() will draw pixels from the imported graphic into the off-screen GWorld, which will be the source of the compressed frames later on.

Next, after calculating how far the source rectangle will move each frame, you set up the compression sequence. To do this, you need a buffer big enough to hold compressed images, which in turn requires a call to figure out how big that buffer needs to be. QTImage.getMaxCompressionSize() provides this size. You need to pass in the following data (in the order shown):

The QDGraphics/GWorld to compress from.

A QDRect indicating what part of the QDGraphics will be used.

The color depth of the pixels (i.e., how many bits are in each pixel).

A constant to indicate the compressed image quality level.

The codec's FOUR_CHAR_CODE.

A constant to indicate which codec component to pick if several can handle the codec. You can pass a specific component, or the behavior constants anyCodec, bestSpeedCodec, bestFidelityCodec, and bestCompressionCodec.

Given this, you can allocate memory for the image by constructing a new QTHandle, and then wrap it with a RawEncodedImage object. This is where the compressed frames will go.
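Putting the sizing and allocation together, a sketch against the QTJ API (gw is assumed to be the off-screen QDGraphics; the exact accessor and factory names are reconstructed from memory):

```java
// Sketch: size and allocate the compressed-frame buffer
int maxSize = QTImage.getMaxCompressionSize(
    gw,                                // GWorld to compress from
    gw.getBounds(),                    // portion of it to use
    gw.getPixMap().getPixelSize(),     // color depth
    StdQTConstants.codecNormalQuality, // quality constant
    CODEC_TYPE,                        // codec's FOUR_CHAR_CODE
    CodecComponent.anyCodec);          // let QuickTime pick a component
QTHandle imageHandle = new QTHandle(maxSize, true);
imageHandle.lock(); // pin the handle while we write into it
RawEncodedImage compressedImage = RawEncodedImage.fromQTHandle(imageHandle);
```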

Now you have enough information to create the CSequence. Its constructor takes a whopping 10 arguments. The first five identify the source and the codec—the QDGraphics to compress from, a QDRect for the portion to be used, the pixel depth, the codec's FOUR_CHAR_CODE, and a codec component—and the remaining five are:

Spatial quality (in other words, the quality of images after 2D compression, using one of the constants codecMinQuality, codecLowQuality, codecNormalQuality, codecHighQuality, codecMaxQuality, or codecLosslessQuality)

Temporal quality (this uses the same constants as for spatial quality, but refers to quality maintained or lost when using data from adjacent frames; you also can set this to 0 to not use temporal compression)

Key frame rate (the maximum number of frames allowed between "key frames" [those that have all image data for a frame and don't depend on other frames], or 0 to not use key frames)

A custom color lookup table, or null to use the table from the source image

Behavior flags (these can include the codecFlagWasCompressed flag, which indicates the source image was previously compressed and asks the codec to compensate, and codecFlagUpdatePrevious and codecFlagUpdatePreviousComp, both of which hold on to previously compressed frames for temporal-compression codecs, the latter of which may produce better results but consumes more CPU power)
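Assembled as a single call, the constructor might look like the following sketch (gw is the off-screen GWorld and KEY_FRAME_RATE an assumed constant; argument order reconstructed from the QTJ API as best I recall it):

```java
// Sketch of the 10-argument CSequence constructor
CSequence seq = new CSequence(
    gw,                                  // source GWorld
    gw.getBounds(),                      // source rect
    gw.getPixMap().getPixelSize(),       // pixel depth
    CODEC_TYPE,                          // codec's FOUR_CHAR_CODE
    CodecComponent.bestFidelityCodec,    // codec component behavior
    StdQTConstants.codecNormalQuality,   // spatial quality
    StdQTConstants.codecNormalQuality,   // temporal quality
    KEY_FRAME_RATE,                      // max frames between key frames
    null,                                // color table: use the source's
    0);                                  // behavior flags
```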

Now you've got everything you need to build the frames: a GWorld for source images, a RawEncodedImage to compress into, a CSequence to compress frames, and a VideoMedia to put them into.

So, start looping. Each time through the loop, you draw a different part of the source image into the off-screen GWorld. This is done by resetting the GraphicsImporter's Matrix, using rect() to scale-and-translate from a source rectangle to a new rectangle at (0,0) with the dimensions of the off-screen GWorld. Then use GraphicsImporter.draw() to draw from the source image into the GWorld.

With the frame's pixels in the GWorld, call CSequence.compressFrame() to compress the pixels into the RawEncodedImage. This returns a CompressedFrameInfo object that wraps the size of the compressed image and a "similarity" value comparing the current frame to the previous one. The similarity is used to determine whether this sample is a "key frame" (also called a "sync sample" in Apple's terminology), which in this context means an image so different from its predecessors that the compressor should encode all the data for the image in this frame instead of depending on any previous frames.

Finally, you call addSample() to add the frame to the VideoMedia. This call, inherited from Media, takes a pointer to the sample data, an offset into the data, the data size, the time represented by the sample (in the media's time scale), a description of the data (here an ImageDescription retrieved from the CSequence), the number of samples being added with the call, and a flag that indicates whether this sample is a key frame (if it's not, pass StdQTConstants.mediaSampleNotSync).
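One pass through the frame loop might be sketched like this; the variable names (seq, gw, compressedImage, imageHandle, videoMedia, FRAME_DURATION) are assumptions, not necessarily the demo's own:

```java
// Sketch: compress one frame, then add it to the media as one sample
CompressedFrameInfo info = seq.compressFrame(
    gw,                 // pixels to compress
    gw.getBounds(),     // rect within the GWorld
    StdQTConstants.codecFlagUpdatePrevious, // behavior flags
    compressedImage);   // destination buffer
// similarity 0 means "completely different," i.e., this must be a key frame
boolean keyFrame = (info.getSimilarity() == 0);
ImageDescription desc = seq.getDescription();
videoMedia.addSample(
    imageHandle,            // the compressed sample data
    0,                      // offset into the data
    info.getDataSize(),     // size of this frame's data
    FRAME_DURATION,         // duration, in the media's time scale
    desc,                   // describes what's in the untyped handle
    1,                      // adding one sample
    keyFrame ? 0 : StdQTConstants.mediaSampleNotSync); // sync-sample flag
```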

Note

Notice addSample() has the same signature for any kind of media. That's why it needs a parameter like the ImageDescription to explain what's in the essentially untyped QTHandle.

When you're done adding frames, call Media.endEdits(), then insert the media into the track with Track.insertMedia(). Finally, save the movie with the Movie.addResource() call.
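A sketch of that wrap-up (variable names movie, movFile, videoTrack, and videoMedia are assumed, and the OpenMovieFile usage is reconstructed from the QTJ API):

```java
// Sketch: finish editing, insert the media, and save the movie
videoMedia.endEdits();
// insert the whole media into the track at time 0, at normal rate
videoTrack.insertMedia(0, 0, videoMedia.getDuration(), 1);
// write the movie resource into the file created earlier
OpenMovieFile outFile = OpenMovieFile.asWrite(movFile);
movie.addResource(outFile, StdQTConstants.movieInDataForkResID, movFile.getName());
outFile.close();
```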

Note

Run this demo with ant run-ch08-videosamplebuilder.

When run, the demo looks for a file called videosamplebuilder.properties, in which you can define the source image and the start and end rectangles. The properties file should have entries like this:
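The actual listing isn't reproduced in this text, so the key names below are hypothetical—check the downloaded book code for the real ones—but the shape of the file would be something like:

```
# hypothetical key names -- the demo's actual keys may differ
file=/Users/me/Pictures/liberty.jpg
start.x=0
start.y=0
start.width=640
start.height=480
end.x=200
end.y=150
end.width=320
end.height=240
```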

If no properties file is found, the demo queries the user for an image and randomly selects the start and end rectangles.

As each frame is compressed, the program prints an update to the console indicating the frame count, the source frame, and how "similar" the CSequence decided the frame was to its predecessor. The console log looks something like this:


Note

Again, this wrap-up is the same as Chapter 7's audio sample-building technique.

What about...

...appropriate codecs to use? I've pointed out Sorenson Video 3 and MPEG-4 Visual because they have very nice compression ratios and still look pretty good with natural images. Other codecs of interest in a standard QuickTime installation are shown in Table 8-1.

Table 8-1. Some standard QuickTime codecs

Animation (kAnimationCodecType, "rle"): Good for long runs of solid colors, such as those found in simple synthetic 2D graphics.

Cinepak (kCinepakCodecType, "cvid"): The most popular codec of the early to mid-1990s, thanks to a good compression/quality tradeoff, wide support (even Sun's JMF handles it), and the fact that it could run on very modest CPUs. Today, there are better options.

H.263 (kH263CodecType, "h263"): Originally designed for videoconferencing, yet surprisingly good across a wide range of bitrates.

Pixlet (no QTJ constant, "pxlt"): A wavelet-based codec, introduced in 2003, that achieves high compression ratios (20:1) without showing the artifacts other codecs exhibit at similar compression levels. It requires a powerful CPU (PowerPC G4 or G5 at 1GHz and up) to decode.

As of this writing, Apple has demonstrated but not released an H.264 (aka AVC) codec for QuickTime. This is the newest and most powerful MPEG-4 codec, offering broadcast-quality video at 1.5 megabits per second (Mbps) and HDTV quality at 5-9Mbps, assuming your computer is powerful enough to decode it.

Also, other than making these "Ken Burns Effects," what am I going to do with writing video samples? This technique is the key to creating anything you want in a video track. Want to make a movie of your screen? Use the screen-grab lab from Chapter 5 and compress its GWorld into a video track. Have some code to decode a format that QuickTime doesn't understand? Now you can transcode to any QuickTime-supported format. You even can take 3D images from an OpenGL or JOGL application and make them into movies.

Note

Considering Chapter 5 showed how to grab the screen (even with the DVD Player running) into a GWorld, and considering you can make video tracks from any GWorld...uh-oh.