QTJ Audio

QuickTime Java can be the heart and soul of cross-platform video players
and editors. As you will see in this article, QTJ is also well-suited to be the engine of audio-only
applications, such as MP3 players. This article will develop an audio player, QTBebop, that displays song metadata, band levels, and current time, all of which help introduce the useful audio-related
tools provided by QuickTime to the Java developer. We'll also look at QuickTime's "callbacks," which are critical to all kinds of QT apps.

Too Good, Too Bad

We tend to think of audio and video applications as separate realms
— iTunes and WinAmp are one kind of application, while
iMovie and RealPlayer are another — but this separation exists
at the application layer, not the media framework layer. QuickTime
treats sound the same way it treats video or any other kind of dynamic media. In fact, there's nothing special
about opening and playing a sound file in QuickTime: you create an
OpenMovieFile to reference a file in a supported format
(MP3, AAC, WAV, etc.), hand that to Movie.fromFile(), and
call Movie.start(). The qtj61-player code
from the last article in this series will play audio files with no code changes. As far as QuickTime is concerned, imported audio files are just movies
with a single track of audio media.

Given that, it's quite easy to write a bare-bones, GUI-less audio player. In fact, it seems like it should consist of simply opening a file, making a Movie from that, and starting the
Movie. We could express this as the following code with only
a single caveat ... it doesn't work:

With Java 1.4 on Mac OS X, this returns immediately, without playing any music. On Windows
2000 and XP, it seems to play for a few seconds and then hang. Either way, chances are you're not happy with the result. What has happened?

The problem has to do with tasking, the arcane art
of giving a QuickTime movie enough cycles to actually play itself.
Typically, when we build a QTJ application with a GUI, we pick up
the tasking calls automatically, and thus don't have to worry
about, or even know about, the need to periodically call
Movie.task(). In this case, we haven't picked up any
automatic tasking calls, nor set up any of our own.

On Windows, the side effect is that after getting time to play a
little bit of its buffer, the Movie is never given another chance to
decode and play the audio. The Mac OS X case is a little stranger -- I
believe what we're seeing is that our MP3 is handed off to a native
library, not actually played by Java, so the main() method
returns and the JVM, seeing no non-daemon threads running, decides to
shut down.
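The need for tasking can be modeled without QuickTime at all. The following is a toy sketch, not QTJ code: `ToyMovie`, its `task()` method, and `TaskingDemo.taskAll()` are hypothetical stand-ins, assuming only that a movie makes progress when something hands it a time slice.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of tasking. ToyMovie is a hypothetical stand-in, not a QTJ class.
class ToyMovie {
    private final int totalChunks;
    private int decoded = 0;

    ToyMovie(int totalChunks) {
        this.totalChunks = totalChunks;
    }

    // One time slice: decode at most one more chunk.
    void task() {
        if (decoded < totalChunks) {
            decoded++;
        }
    }

    boolean isDone() {
        return decoded >= totalChunks;
    }

    int decodedChunks() {
        return decoded;
    }
}

class TaskingDemo {
    // Plays the role of a tasking loop: keep handing out time slices
    // until every movie has finished. Returns the number of passes made.
    static int taskAll(List<ToyMovie> movies) {
        int passes = 0;
        boolean anyActive = true;
        while (anyActive) {
            anyActive = false;
            for (ToyMovie movie : movies) {
                if (!movie.isDone()) {
                    movie.task();
                    anyActive = true;
                }
            }
            if (anyActive) {
                passes++;
            }
        }
        return passes;
    }

    public static void main(String[] args) {
        ToyMovie untasked = new ToyMovie(3);
        ToyMovie tasked = new ToyMovie(3);
        taskAll(Arrays.asList(tasked));
        System.out.println("untasked decoded " + untasked.decodedChunks() + " chunks");
        System.out.println("tasked decoded " + tasked.decodedChunks() + " chunks");
    }
}
```

A movie that nobody tasks never advances, which is the toy equivalent of the silent or stalled playback described above.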

In any case, we need to provide regular callbacks to the
task() method to give our movie a chance to decode and
play the data. Fortunately, QTJ provides a class called
TaskAllMovies, which runs a thread that provides
tasking callbacks to all active movies. So we can solve our problems
on Mac and Windows by adding the two highlighted lines below after the
Movie object is created:

Call Me, Call Me

At this point, when the selected audio is finished, the application will just sit around forever. We'd like it to do something a little more sensible, like terminating the app at the end of the song.
A kludgy approach would be to spawn a thread to periodically poll the movie and see if the current time has reached the end.

A better approach is to register to be notified when the movie is finished playing, using one of the callbacks that QuickTime provides. We can provide a small piece of code and tell QTJ to call this code at the end of the
movie.

In the included sample CloseOnCallbackAudio.java, we
extend the bare-bones player to register a callback that will be
called when the movie (the audio) finishes playing. This registration
is done with the callMeWhen() method:

callback = new ShutdownCallBack (movie);
callback.callMeWhen();

The ShutdownCallBack is an inner class that extends
QuickTime's ExtremesCallBack. In its constructor, we
indicate what Movie we're interested in (specifically, the
TimeBase of the movie), and provide flags to indicate on
which events we want to be called:

The callMeWhen() call does the actual registration
of the callback. This may seem a lot like registering a listener in
various Java APIs, but there's a big difference:
callMeWhen() only registers code for one
callback, as opposed to listeners that get called over and over until
they're specifically removed. To get that kind of behavior in QTJ,
we'd need to issue a new callMeWhen() call each time the
callback is executed.
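The difference between a one-shot callback and a listener can be sketched in plain Java. This is a toy model and not the QTJ API: `OneShotSource`, `registerOnce()`, and `fire()` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy event source whose registrations are good for exactly one event.
class OneShotSource {
    private final List<Runnable> pending = new ArrayList<>();

    // Registers code to run on the next event only.
    void registerOnce(Runnable callback) {
        pending.add(callback);
    }

    // Fires one event: each pending callback runs once and is then forgotten.
    void fire() {
        List<Runnable> toRun = new ArrayList<>(pending);
        pending.clear();
        for (Runnable callback : toRun) {
            callback.run();
        }
    }
}

class OneShotDemo {
    // Returns {calls seen by the plain callback, calls seen by the re-registering one}.
    static int[] run() {
        OneShotSource source = new OneShotSource();
        int[] counts = new int[2];

        // Registered once, so it runs for the first event only.
        source.registerOnce(() -> counts[0]++);

        // Re-registers itself every time it runs, so it behaves like a listener.
        Runnable[] repeating = new Runnable[1];
        repeating[0] = () -> {
            counts[1]++;
            source.registerOnce(repeating[0]);
        };
        source.registerOnce(repeating[0]);

        for (int i = 0; i < 3; i++) {
            source.fire();
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println("one-shot: " + counts[0] + ", re-registering: " + counts[1]);
    }
}
```

Re-registering from inside the callback is the pattern the paragraph above describes for getting listener-like behavior.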

When the callback fires, its execute() method is called.
Here's our simple implementation:

Note: The cancelAndCleanup() call is a
required call to disassociate our callback from QuickTime when we're
done using it. As the name suggests, there are two parts: a
"cancel" that cancels any pending callbacks from occurring,
and a "cleanup" that cleans up system resources. A separate
cancel() method exists to just cancel pending
callbacks. This would be useful if we wanted to reschedule or change
the conditions under which the code is called back — we would then
reschedule with a new call to callMeWhen().

As you might have expected from the fact that we subclassed
ExtremesCallBack, there are different classes to extend
in order to achieve different behaviors. All are subclasses of
QTCallBack, but provide different constructors, since
some take more detailed parameters. Each takes a
TimeBase, typically fetched from a Movie,
and some take a flags argument whose possible values are
defined as trigger... constants in the
StdQTConstants class.

Class: Description

ExtremesCallBack: Called when the given TimeBase reaches its start or stop point. You specify the behavior with the flags triggerAtStart or triggerAtStop.

RateCallBack: Called when the TimeBase's rate changes. Using the flag triggerRateChange provides a callback on any rate change. Otherwise, you can use constants such as triggerRateLT or triggerRateGT to get called when the rate becomes less than, or greater than (respectively), a supplied value. The full set of possible flags is listed in the documentation for the native CallMeWhen() function.

TimeCallBack: Called when a specific time value is reached. The flags determine whether the callback occurs only when the time is moving forward (triggerTimeFwd), backward (triggerTimeBwd), or either (triggerTimeEither).

TimeJumpCallBack: Called when the TimeBase's time value changes by an amount other than would be expected from continuing to play at its current rate. An obvious example would be when the user clicks on the scrubber to "jump" to a different part of the movie. Setting up this callback takes no behavior flags or parameters.

While this is primarily an article about audio, it should be clear
that the callbacks have a wide range of uses in many QuickTime
applications. For example, a movie-playing GUI may want to enable or
disable some of its buttons and menu items, based on whether a movie is
currently playing.

What Planet Is This?

Now that we understand the basics of playing audio with QuickTime,
let's think about what else we'd need to provide a more complete
player application to end users.

One of the most obvious needs for a modern player is the ability to
present metadata about the current song: information such as the title,
the artist's name, what album it's from, etc. Practically any player
puts this information front and center in the GUI.

There are different schemes for different audio formats, since some
were designed to contain metadata and others weren't. MP3s, for
example, weren't designed with these needs in mind -- arguably the only
"metadata" per se is a copyright bit in the MPEG
frame header. However, the ID3
standard was cleverly developed as a means of attaching metadata to
MP3 files by defining a format that could be placed inside of an MP3 file but
outside of the individual media frames. Typically, this information
is simply placed at the beginning of an MP3 file, before its first
MPEG frame.
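As a rough illustration of metadata living outside the media frames, the sketch below checks for an ID3v2 tag at the start of a file and reads its declared size. It assumes the ID3v2 header layout (the three bytes "ID3", two version bytes, one flags byte, and a four-byte size that uses seven bits per byte); the class and method names are hypothetical.

```java
// Minimal ID3v2 header check; a sketch, not a full tag parser.
class Id3Header {
    // True if the buffer is long enough for a header and starts with "ID3".
    static boolean hasId3v2(byte[] fileStart) {
        return fileStart.length >= 10
                && fileStart[0] == 'I'
                && fileStart[1] == 'D'
                && fileStart[2] == '3';
    }

    // Size of the tag body, not counting the 10-byte header.
    // Each of the four size bytes contributes only its low 7 bits.
    static int tagSize(byte[] fileStart) {
        if (!hasId3v2(fileStart)) {
            throw new IllegalArgumentException("no ID3v2 header");
        }
        return ((fileStart[6] & 0x7F) << 21)
                | ((fileStart[7] & 0x7F) << 14)
                | ((fileStart[8] & 0x7F) << 7)
                | (fileStart[9] & 0x7F);
    }

    public static void main(String[] args) {
        byte[] header = {'I', 'D', '3', 3, 0, 0, 0, 0, 2, 1};
        System.out.println("tag body is " + tagSize(header) + " bytes");
    }
}
```

A player that wanted to skip straight to the audio would look for the first MPEG frame after the header plus the tag body.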

When we open an MP3 file in QuickTime, we're really
importing it, changing it into a QuickTime movie in memory. In
the course of doing this, the ID3 data is parsed and placed in the
movie's structure. As you may recall from an earlier article on the QuickTime file format, QuickTime movies are represented
both in memory and on disk as a tree of "atoms." These
atoms can either contain data or other atoms, but not both. Typically,
the top level of a self-contained movie file will contain an
mdat atom to hold the media samples and a
moov atom, which defines the movie's structure. The
moov contains one or more trak atoms, and
also a handy atom called udta, short for "user
data."
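To make the atom tree concrete, here is a small sketch that walks sibling atoms in a byte array. It assumes the basic layout of a four-byte big-endian size followed by a four-byte type, and it deliberately ignores special cases such as extended sizes; `AtomWalker` and its methods are hypothetical helpers, not part of QTJ.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class AtomWalker {
    // Lists the types of the sibling atoms found in buf[start, end).
    static List<String> siblingTypes(byte[] buf, int start, int end) {
        List<String> types = new ArrayList<>();
        ByteBuffer bb = ByteBuffer.wrap(buf); // big-endian by default
        int pos = start;
        while (pos + 8 <= end) {
            int size = bb.getInt(pos);
            if (size < 8 || size > end - pos) {
                break; // malformed, or a special size this sketch does not handle
            }
            types.add(new String(buf, pos + 4, 4, StandardCharsets.ISO_8859_1));
            pos += size;
        }
        return types;
    }

    // Builds an atom for demonstration: size, type, then payload.
    static byte[] atom(String type, byte[] payload) {
        byte[] typeBytes = type.getBytes(StandardCharsets.ISO_8859_1);
        if (typeBytes.length != 4) {
            throw new IllegalArgumentException("type must be four 8-bit characters");
        }
        ByteBuffer bb = ByteBuffer.allocate(8 + payload.length);
        bb.putInt(8 + payload.length);
        bb.put(typeBytes);
        bb.put(payload);
        return bb.array();
    }

    public static void main(String[] args) {
        byte[] moov = atom("moov", atom("udta", new byte[0]));
        System.out.println(siblingTypes(moov, 0, moov.length));
        // Children start 8 bytes in, just past the parent's size and type.
        System.out.println(siblingTypes(moov, 8, moov.length));
    }
}
```

Walking a container's children is the same loop started eight bytes past the parent's header, which is why the format nests so naturally.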

Once we know the atom type as a four-character code, getting the
atom's contents from the Movie is pretty straightforward.
We get a UserData object with
Movie.getUserData(), and then find our atom and retrieve
its contents with UserData.getTextAsString(). This
method takes three arguments: an int for the requested
atom type, an index that indicates our interest in the
index-th instance of the given type (note that
multiple atoms of the same type are legal, and also that this call is
one-based, not zero-based), and finally an "international region
tag" that takes one of the lang... constants from
quicktime.io.IOConstants (langUnspecified is
a useful wildcard value here).
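Since the atom type is passed as an int, it helps to see how a four-character code maps to one. The sketch below packs four 8-bit characters into a big-endian int; `FourCC` is a hypothetical helper written for this illustration, so check whether your QTJ version already offers an equivalent utility before using it.

```java
class FourCC {
    // Packs four 8-bit characters into an int, first character in the high byte.
    // For example, "udta" becomes 0x75647461.
    static int toInt(String code) {
        if (code.length() != 4) {
            throw new IllegalArgumentException("need exactly 4 characters");
        }
        int value = 0;
        for (int i = 0; i < 4; i++) {
            char c = code.charAt(i);
            if (c > 0xFF) {
                throw new IllegalArgumentException("not an 8-bit character: " + c);
            }
            value = (value << 8) | c;
        }
        return value;
    }

    // Reverses toInt(): unpacks the int back into four characters.
    static String fromInt(int value) {
        char[] chars = new char[4];
        for (int i = 3; i >= 0; i--) {
            chars[i] = (char) (value & 0xFF);
            value >>>= 8;
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(toInt("udta")));
        System.out.println(fromInt(0x6D657461));
    }
}
```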

This article's sample application, QTBebop, contains a
MetadataJTable with a setMovie method that
retrieves all of the defined metadata entries and turns them into the
model of a Swing JTable. It defines all of the constants
from Movies.h in an array called TAG_NAMES
and looks for matches in a UserData object like this:

After this section, the foundTags and
foundValues are converted into a two-dimensional array
and passed to a DefaultTableModel constructor.

Notice the squashed catch block. If a given type is
not found, QuickTime throws a QTException. For our current
purposes, we do nothing, because this exception simply means that one of the
many possible metadata atom types wasn't found in the user data. Returning an
error code may make sense in C, but in Java, using exceptions to
control program flow is considered something of a worst practice
because of the expense of building a stack trace that won't be used,
since the exception isn't really signaling an error state. From a purely
Java point of view, it would be nice if QTJ had something like a
UserData.hasType(int) method, so we could check for an
atom without the performance hit of building a throwaway stack-trace
if it isn't there.

That said, the MetadataJTable does its job, and works
fairly quickly. Figure 1 shows an example of the table, running
against an MP3 I ripped from my CD collection:

So how does iTunes support Unicode ID3 tags? Presumably, it has its
own ID3 library, which makes sense, considering that it needs to both read
and write ID3 data. So while QuickTime gives us easy ID3 tag parsing,
the lack of support for international character sets might make you consider
using another library for tag parsing, or rolling your own.

Bad Dog, No Biscuit

Since we know that QuickTime is used to play the AAC files
supported by iTunes 4 and
sold by the iTunes Music Store, we'd want and expect it to be able to handle metadata from those files, too.

In fact, since the M4A format for user-ripped AACs
and the M4P format for Apple-DRM'ed songs are both variants of the
MPEG-4 file format, which itself was adapted from the QuickTime file
format, we might reasonably expect that their metadata tags are
already in the user-data atom, arranged in the same way that ID3 tags
are parsed.

Yeah, we might expect that ... but we'd be wrong.

The metadata is still in the movie's user data, but in a much
different and apparently undocumented format. So we have to examine
it by hand. (Sigh ... This kind of thing is why I keep HexEdit on my dock.)

These iTunes-ripped files have an atom in the user data called
meta. Its contents look like valid atoms, but aren't,
since the first four bytes, which should be the size of the first
child atom, are 0x00000000. Maybe that's meant to throw
off QuickTime file parsers. Interestingly, a set of valid atoms begins
after that, with four bytes of size and a four-byte type, just
as we'd expect.

meta has a child called ilst, which in
turn has children that use tag-name constants that we saw before. We
can't use UserData.getTextAsString() to get values from these
atoms because we're now two levels below the user data, and besides, we're not through with
undocumented oddities yet. In this AAC world, these atoms seem not to
contain data, but rather a child atom called data, which
contains eight junk bytes (perhaps flags) and then, finally, the data
for the tag.

MetadataJTable also handles this kind of metadata.
Its strategy in setMovie(), which kicks off a parse, is
to look in the user data for the meta atom. If it is absent,
the movie is assumed to be an ID3-tagged MP3 and the
previously described code is used. If it finds meta, then it
looks for an ilst atom. If that succeeds, it starts
looking for atoms named by TAG_NAMES. When one is found,
it jumps ahead 24 bytes (to skip the size, type, size, "data," and 8
junk bytes) and reads the value.
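The 24-byte skip can be sketched against a synthetic byte array. This is not the MetadataJTable code. It assumes the layout described above (outer size and type, inner size and "data", then 8 bytes to skip) and, as a further assumption the article does not state, that the value is UTF-8 text. `IlstItem` and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class IlstItem {
    // size + type + size + "data" + 8 skipped bytes
    static final int HEADER_BYTES = 4 + 4 + 4 + 4 + 8;

    // Reads the value of one tag atom that starts at the given offset.
    static String readValue(byte[] buf, int offset) {
        int atomSize = ByteBuffer.wrap(buf).getInt(offset); // covers header and value
        if (atomSize < HEADER_BYTES || atomSize > buf.length - offset) {
            throw new IllegalArgumentException("bad atom size: " + atomSize);
        }
        // Assumption: the value is UTF-8 text.
        return new String(buf, offset + HEADER_BYTES, atomSize - HEADER_BYTES,
                StandardCharsets.UTF_8);
    }

    // Builds a synthetic tag atom in the same layout, for demonstration only.
    static byte[] build(String type, String value) {
        byte[] typeBytes = type.getBytes(StandardCharsets.ISO_8859_1);
        if (typeBytes.length != 4) {
            throw new IllegalArgumentException("type must be four 8-bit characters");
        }
        byte[] text = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer bb = ByteBuffer.allocate(HEADER_BYTES + text.length);
        bb.putInt(HEADER_BYTES + text.length);                // outer atom size
        bb.put(typeBytes);                                    // outer atom type
        bb.putInt(16 + text.length);                          // inner "data" atom size
        bb.put("data".getBytes(StandardCharsets.ISO_8859_1)); // inner atom type
        bb.put(new byte[8]);                                  // the 8 bytes that get skipped
        bb.put(text);
        return bb.array();
    }

    public static void main(String[] args) {
        byte[] atom = build("\u00A9nam", "So What");
        System.out.println(readValue(atom, 0));
    }
}
```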

An example of parsing a song purchased from the iTunes Music Store
is shown in Figure 2.

Figure 2. Parsed M4P metadata

You Make Me Cool

Surprisingly, everything we've done so far is in the main QuickTime
API and is not strictly limited to audio content. Again, this speaks to QT's
worldview that anything it reads in is a movie. Still, there are cool
features that are specific to audio that we get at by retrieving a
"handler" for the low-level audio data.

One thing we might want to provide for an audio player is a visual
representation of the sound. On a home stereo or professional
recording or mixing equipment, this would be represented as level
meters that show the intensity of various frequency bands at an instant
in time. In iTunes, these values are used to distort the
visualizations and express the sound data in a visually pleasing
way.

We can get these levels from QuickTime by first getting an
AudioMediaHandler, which provides methods for
getting and setting balance and for metering audio levels. Note that
AudioMediaHandler is an interface rather than a class, implemented by
SoundMediaHandler, StreamMediaHandler, and
MPEGMediaHandler. The first is used for audio files and
sound tracks within normal QuickTime movies, and the second is used for streaming
data. The third reflects the long-annoying fact that QuickTime
sees multiplexed MPEG-1 files not as separate audio and video tracks
but as a single opaque media type, which makes extracting sound and
video from MPEG-1 quite difficult. Fortunately, MPEG-4 files read in
as normal QuickTime movies, with separate video and audio tracks.

But how do we get an AudioMediaHandler? Again, it's
helpful to state things in terms of QuickTime's view of the world:

Notice that once again a QuickTime get-by-index call,
Movie.getTrack() in this case, uses indices that start at 1,
not 0.

Now that we have the AudioMediaHandler, we can set
balance, set bass and treble, and monitor sound levels. The first two are
trivial. For the third, we need to pass in a structure representing
which sets of frequencies, or "bands," we want to monitor.
We do this with a MediaEQSpectrumBands object, which
wraps the desired bands. For the QTBebop sample application, I've
used the bands shown by iTunes' graphic equalizer, represented by the
array EQ_LEVELS. So setting up for monitoring looks like this:

To get the levels, we call getSoundEqualizerBandLevels(), passing in the number of
bands that we set up in the first place (e.g., EQ_LEVELS.length). This returns an
int array, with values from 0 to 255. The QTBebop sample app uses a javax.swing.Timer to call this method every 100 milliseconds and redraw an offscreen java.awt.Graphics buffer with rectangles of a height proportional to the returned level values -- in other words, the rectangle gets 0 height if the level is zero, and is the height of the buffer when the level is 255.
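The level-to-height mapping is simple proportional scaling. The sketch below is not the QTBebop drawing code. `LevelBars` and its methods are hypothetical, and the only assumptions are levels in the range 0 to 255 and a drawing surface whose origin is at the top left, as in AWT.

```java
class LevelBars {
    static final int MAX_LEVEL = 255;

    // Maps a level in 0..255 to a bar height in 0..bufferHeight pixels.
    static int barHeight(int level, int bufferHeight) {
        int clamped = Math.max(0, Math.min(MAX_LEVEL, level));
        return clamped * bufferHeight / MAX_LEVEL;
    }

    // With a top-left origin, a bar that stands on the bottom edge
    // starts this many pixels down from the top.
    static int barTop(int level, int bufferHeight) {
        return bufferHeight - barHeight(level, bufferHeight);
    }

    // Converts a whole array of band levels at once.
    static int[] barHeights(int[] levels, int bufferHeight) {
        int[] heights = new int[levels.length];
        for (int i = 0; i < levels.length; i++) {
            heights[i] = barHeight(levels[i], bufferHeight);
        }
        return heights;
    }

    public static void main(String[] args) {
        int[] heights = barHeights(new int[] {0, 128, 255}, 100);
        System.out.println(java.util.Arrays.toString(heights));
    }
}
```

Each height and top pair can then be handed to a fill-rectangle call on whatever Graphics object backs the offscreen buffer.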

The resulting application is shown in Figure 3.

Figure 3. The QTBebop application, with level meter

Author's Note: When run on Mac OS X with Java 1.4.1, the scrubber bar has repaint problems when a file is opened but is not yet playing. It does not have problems on OS X's Java 1.3.1 or on Windows, so this may be a version-specific bug, and has been filed appropriately. You can look in the sample code for the many workarounds
I tried to get the scrubber repainted correctly.

See You, Space Cowboy

Obviously, our sample application could benefit from a graphical
upgrade to make the bars more attractive -- perhaps spacing between
bars, LED-like blocks of color, use of red and yellow regions in the
upper part of each level, or a "sticky" line that represents
the peak level of each band over the last second. Adding
balance and bass/treble controls would also be an easy
improvement.

A more significant feature to add would be support for audio
streams. As covered much earlier in this series, you can create a Movie from a URL by creating a DataRef from the URL string, which you
then pass to the static Movie.fromDataRef() method. In
terms of playable URLs, QuickTime can play RTSP-streamed content, of
course, and can handle Shoutcast-style HTTP-streamed
audio by changing the URL's http: protocol to the pseudo-protocol icy:, as detailed in the QuickTime 6 documentation.
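The protocol swap itself is a one-line string operation. In this sketch, `IcyUrl.toIcy()` is a hypothetical helper and the example URL is made up; it only rewrites the scheme and leaves everything else, including non-HTTP URLs, untouched.

```java
class IcyUrl {
    // Rewrites "http://host/path" to "icy://host/path".
    // URLs with any other scheme (including https) are returned unchanged.
    static String toIcy(String url) {
        String prefix = "http:";
        if (url.regionMatches(true, 0, prefix, 0, prefix.length())) {
            return "icy:" + url.substring(prefix.length());
        }
        return url;
    }

    public static void main(String[] args) {
        System.out.println(toIcy("http://example.com:8000/stream"));
    }
}
```

The rewritten string is what you would then use to build the DataRef for the stream.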