This is the first of three chapters dealing with specific media types. Video will be covered in Chapter 8, and several other kinds of media—including things you might not have thought of as media, such as text and time codes—will be covered in Chapter 9.

It's possible that you've never thought of QuickTime as being the engine for audio-only applications—the ubiquity of QuickTime's .mov file format probably makes it more readily recognized as a video standard. But QuickTime's support for audio has been critical to many applications. For example, the fact that QuickTime was already ported to Windows made bringing iTunes and its music store over to Windows a lot easier.

In fact, iTunes is probably responsible for getting QuickTime onto a lot more Windows machines than it would have reached otherwise. So, I'll begin with a few labs that are particularly applicable to the MP3s and AACs collected by iTunes users.

Reading Information from MP3 Files

If you've ever listened to an MP3 music file—and at this point, who hasn't?—you've surely appreciated the fact that useful information like artist, song title, album title, etc., is stored inside the file. Not only does this make it convenient to organize your music, but also, when you move a song from one device to another, this metadata travels with it.

The most widely accepted standard for doing this is the ID3 standard, which puts this metadata into parts of the file that are not interpreted as containing audio data—MP3s arrange data in frames, and ID3 puts metadata between these frames. ID3 tags typically are found at the beginning of a file, which makes them stream-friendly, although some files tagged with earlier versions of the standard have the metadata at the end of the file.

When QuickTime imports an MP3 file, it reads ID3 tags and makes them available to your program through the movie's user data, allowing you to display the tags to the user, or use them in any other way you see fit.

How do I do that?

Once you open an MP3 as a movie, you need to get at the user data, which contains the imported ID3 tags. Fortunately, it's wrapped as an object called UserData:

UserData userData = movie.getUserData( );

The user data is something of a grab bag of data that you can read from and write to freely. Items are keyed by FOUR_CHAR_CODEs, and the contents aren't required to adhere to any particular standard or format (after all, you're free to write whatever you like in user data). For example, QuickTime Player writes a "WLOC" entry that stores the window location last used for the movie.
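Those FOUR_CHAR_CODEs are just four ASCII characters packed big-endian into a 32-bit int. Here's a pure-Java sketch of that packing—QTJ's QTUtils.toOSType() does the real work in native code; this standalone version (class and method names are mine) just shows the bit-twiddling:

```java
// How a FOUR_CHAR_CODE packs four ASCII characters into an int.
// This mirrors what QTJ's QTUtils.toOSType() produces; the class
// itself is illustrative, not part of QTJ.
public class FourCharCode {
    public static int toOSType (String s) {
        if (s.length() != 4)
            throw new IllegalArgumentException ("need exactly 4 chars");
        return ((s.charAt (0) & 0xff) << 24) |
               ((s.charAt (1) & 0xff) << 16) |
               ((s.charAt (2) & 0xff) << 8)  |
               (s.charAt (3) & 0xff);
    }

    public static String fromOSType (int type) {
        return new String (new char[] {
            (char) ((type >> 24) & 0xff), (char) ((type >> 16) & 0xff),
            (char) ((type >> 8) & 0xff),  (char) (type & 0xff) });
    }

    public static void main (String[] args) {
        int wloc = toOSType ("WLOC");
        System.out.println (Integer.toHexString (wloc)); // prints 574c4f43
        System.out.println (fromOSType (wloc));          // prints WLOC
    }
}
```

Going the other way, fromOSType() is handy when touring unknown user data items and you want a readable name for each type you find.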

Apple has a standard set of keys that you can use to retrieve the data parsed from an MP3's ID3 tags. Because these are text values, you use UserData's getTextAsString() method to pull them out. getTextAsString() takes three arguments: the type you're requesting; an index to indicate whether you want the first, second, etc., instance of that type; and a region tag that's irrelevant in the ID3 case.

Example 7-1 shows a basic exercise of this technique, getting the UserData object and asking for album, artist, creation date, and song title information.

Note

Run this example from the downloadable book code with ant run-ch07-id3tagreader.

When run, this dumps the found tags to standard out, as seen in the following console output:

cadamson% ant run-ch07-id3tagreader
Buildfile: build.xml
run-ch07-id3tagreader:
[java] Album: Arthur Or The Decline And Fall Of The British Empire
[java] Full Name: Victoria
[java] Artist: The Kinks

What just happened?

The application sets up some static values for keys it is interested in and maps them to human-readable names. For example, the FOUR_CHAR_CODE "©alb" is mapped to "Album."

The program prompts the user to select an MP3 file and imports it as a movie, from which it gets a UserData object. In dumpTagsFromUserData(), it calls getTextAsString() to attempt to get a value for each known tag. If successful, it writes the key and value to the console. If a given tag is absent from the user data, QuickTime throws an exception, which this program quietly ignores.

QuickTime has an important and disappointing limitation: it does not import tags written in non-Western scripts. For example, here's the output when I run the application against an MP3 whose "artist" tag is written in Japanese kana:

Because the artist's name ("Yoko Kanno" in romaji) is written in non-Western characters, QuickTime doesn't attempt to import it, and thus there's no artist item to retrieve from the user data.

What about...

...other tags? A big list of metadata tags is defined in the native API's Movies.h file. Unfortunately, these aren't in the StdQTConstants classes, or anywhere else in QTJ, so you have to define your own constants for them. Table 7-1 lists the supported values.

Also, instead of requesting specific keys from the user data, can I just tour what's in there? Yes, you can use UserData.getNextType() to discover the types of items in the user data. This method takes an int of the last discovered type (use 0 on the first call), and returns the next type after that one. When it returns 0, there are no more types to discover. Given a type, you can get its data with getTextAsString(), but because you can't know that a discovered piece of user data necessarily represents textual data, it might be safer to call getData(), which returns a QTHandle, from which you can get a byte array with getBytes().

Note

This technique is a lot like the "Discovering All Installed Components" lab in Chapter 4.

Reading Information from iTunes AAC Files

If you read the last lab and thought about how ID3 metadata is imported into a QuickTime movie's UserData, you might well expect that the same thing would be true of AAC files created by iTunes: .m4a files for songs "ripped" by the user and .m4p files sold by the iTunes Music Store. In fact, because these files use an MPEG-4 file format that is itself based on QuickTime, you might think that using the same user data scheme would be a slam dunk.

But...you'd be wrong.

These AAC files do put the metadata in the user data, but they do so in a way that resists straightforward retrieval via QuickTime. Fortunately, it's not too hard to get the values out with some parsing.

Note

Buckle up, this one is rough.

How do I do that?

For once, theory needs to come before code—you need to see the format to understand how to parse it. Here's a /usr/bin/hexdump of an iTunes Music Store AAC file from my collection, Toto Dies.m4p:

Granted, this is not easy to read, but I'll bet you can pick out the artist (Nellie McKay) and the song title ("Toto Dies"), so you know this is the relevant section of the file. In fact, you also might notice the string "udta"...sounds a little like "user data," doesn't it?

At work here is the QuickTime file format and its concept of atoms, which are tree-structured pieces of data used to describe a movie, its contents, and its metadata. Without going too deeply into the details—there's a whole book on the format—each atom consists of 4 bytes of size, a 4-byte type, and then data. Atoms contain either data or other atoms, but not both. The 4 bytes before "udta", 0x000318da, indicate the size of all the user data. The first child is an atom called "meta". Because its size is 0x000318d2, just 8 less than the size of "udta", the "meta" atom is clearly the only child of "udta".
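Pulling the size and type out of an atom header takes only a few lines of plain Java. This helper is illustrative (it's not QTJ code); it assumes the big-endian byte order the QuickTime file format uses and ASCII type codes:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Reads one atom header -- a 4-byte big-endian size followed by a
// 4-byte type -- from raw bytes. Illustrative helper, not part of QTJ.
public class AtomHeader {
    public final long size;   // includes the 8 header bytes
    public final String type;

    public AtomHeader (byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap (data, offset, 8);
        size = buf.getInt() & 0xffffffffL;   // treat size as unsigned
        byte[] t = new byte[4];
        buf.get (t);
        type = new String (t, StandardCharsets.US_ASCII);
    }

    public static void main (String[] args) {
        // the "udta" header from the hexdump: size 0x000318da
        byte[] sample = { 0x00, 0x03, 0x18, (byte) 0xda, 'u', 'd', 't', 'a' };
        AtomHeader h = new AtomHeader (sample, 0);
        System.out.println (h.type + " " + h.size); // prints "udta 202970"
    }
}
```

To walk children, you'd construct another AtomHeader at offset + 8 (the start of the first child) and advance by each child's size.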

Unfortunately, because this is user data, the contents don't have to adhere to any published standard, and they don't. The first thing after "meta" should be the 4-byte size of its first child atom, but the value is 0x00000000—an illegal "no size" value—so, a normal QuickTime parser would ignore the contents of "meta".

The "album" and "created" data didn't appear in the earlier hexdump because in the file they occur after the cover art data, which is several kilobytes long.

What just happened?

The program gets the UserData, gets its "meta" atom as a byte array, and looks for the "ilst" pseudo-atom. If it finds one, it skips ahead 8 bytes (over "ilst" and its size) and goes into a loop of discovering and parsing potential pseudo-atoms.

To parse, you look at the first 4 bytes and consider whether it's a plausible size—in other words, whether it's big enough to contain data, but small enough to not run past the end of the byte array. If so, interpret the next 4 bytes as a FOUR_CHAR_CODE type and check against the list of known metadata types. If it matches one of the known types, you've got a valid piece of metadata, which this program simply writes to standard out.
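The scan just described can be sketched in pure Java, with a byte array standing in for the contents of the "meta" atom. The class name, the tiny set of known types, and the byte-at-a-time slide on an implausible size are my own choices, not necessarily what the book's example does:

```java
import java.nio.charset.StandardCharsets;
import java.util.*;

// Walks a byte array looking for pseudo-atoms: a 4-byte big-endian
// candidate size, accepted only when it's plausible and the next
// 4 bytes match a known metadata type. Illustrative sketch.
public class IlstScanner {
    static final Set<String> KNOWN_TYPES =
        new HashSet<> (Arrays.asList ("\u00a9nam", "\u00a9ART", "\u00a9alb"));

    public static List<String> scan (byte[] data, int offset) {
        List<String> found = new ArrayList<>();
        int i = offset;
        while (i + 8 <= data.length) {
            int size = ((data[i] & 0xff) << 24) | ((data[i+1] & 0xff) << 16)
                     | ((data[i+2] & 0xff) << 8) | (data[i+3] & 0xff);
            String type =
                new String (data, i + 4, 4, StandardCharsets.ISO_8859_1);
            // plausible: big enough for a header, small enough to fit
            if (size >= 8 && i + size <= data.length
                    && KNOWN_TYPES.contains (type)) {
                found.add (type);
                i += size;   // skip over this whole pseudo-atom
            } else {
                i++;         // not plausible; slide forward one byte
            }
        }
        return found;
    }
}
```

Sliding forward one byte on failure is conservative—it costs a little time, but it won't miss a valid pseudo-atom that starts at an unexpected offset.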

What about...

...combining this with the MP3 approach of the previous lab so that there's just one codebase? A good strategy for that would be to get the UserData and look for a "meta" atom. If you get one, assume you have an iTunes AAC and do the previous parsing. If not, assume you have an MP3, and start asking for the various metadata types with UserData.getTextAsString(), as in the previous lab.

Providing Basic Audio Controls

Most audio applications provide some basic audio controls to allow the user to customize the sound output to suit their environment. The MovieController provides a volume control, but you can do better than that: you can control balance, bass, and treble with simple method calls.

How do I do that?

The AudioMediaHandler class provides the methods setBalance() and setSoundBassAndTreble(), so it's just a matter of getting the handler object. The key is to remember that:

Movies have tracks.

Tracks have exactly one Media each.

Each Media has a MediaHandler.

Iterate over the movie's tracks to get each track's media and handler. To figure out whether a given track is audio, you can use a simple instanceof to see if the handler is an AudioMediaHandler.

setBalance() takes a float, which ranges from -1.0 (all the way to the left) to 1.0 (all the way to the right), with 0 representing equal balance.

setSoundBassAndTreble() is interesting because it's officially undocumented. As it turns out, you pass in ints for bass and treble, where 0 is normal, -256 is minimum bass or treble, and 256 is maximum.

Note

Well, the native version is undocumented. For once, the Javadocs have the useful info.

When run, the program asks the user to select a file to play, and then shows a GUI, as seen in Figure 7-1.

Figure 7-1. Balance, treble, and bass controls

What just happened?

The key to this example is the use of Swing JSlider s, which can be configured with appropriate bounds for the features they represent. For example, the bass and treble sliders run in a -256 to 256 range, with 0 as a default:

trebleSlider = new JSlider(-256, 256, 0);

The balance slider needs to pass a float between -1 and 1, but JSliders work with ints, so it uses a range of -1000 to 1000, which is scaled to an appropriate float before calling setBalance( ):

balanceSlider = new JSlider(-1000, 1000, 0);
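The int-to-float scaling itself is a one-liner, shown here as a minimal sketch (the class and method names are mine, not from the book's example):

```java
// Scales a JSlider's int value (-1000..1000) down to the -1.0..1.0
// float range that setBalance() expects. Minimal illustrative sketch.
public class BalanceScale {
    public static float sliderToBalance (int sliderValue) {
        return sliderValue / 1000.0f;   // e.g., 500 becomes 0.5f
    }
}
```

Inside the shared ChangeListener, you'd then call something like audioHandler.setBalance(sliderToBalance(balanceSlider.getValue())).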

All the sliders share a ChangeListener implementation that reads the new value from the affected JSlider and makes a corresponding call to the AudioMediaHandler.

Providing a Level Meter

Many audio applications also provide a graphical "level meter," which is an on-screen display of the loudness or softness of certain frequencies within the audio. In QuickTime Player, this is shown as a set of bars on the right side of the control bar, as seen in Figure 7-2.

Figure 7-2. Audio level meter in QuickTime Player

The intensity of lower frequencies, like bass, is shown in the leftmost columns, while higher frequencies are to the right.

How do I do that?

AudioMediaHandler provides two key methods: setSoundEqualizerBands() to set up monitoring and getSoundLevelMeterLevels() to actually get the data. setSoundEqualizerBands() indicates which frequencies you want to monitor for your graphical display. These are passed in the form of a MediaEqSpectrumBands object, which you build by constructing it with the number of bands you intend to monitor, then repeatedly calling setFrequency() to indicate which frequency a given band will monitor.

Note

Unfortunately, most of the level-metering methods are officially undocumented.

As the audio plays, you can repeatedly call getSoundLevelMeterLevels(), which returns an array of ints representing the measured levels.

When run, this example provides the level-meter display shown in Figure 7-3.

Figure 7-3. Frequency bands displayed as a level meter

What just happened?

This example sets up levels that, according to a demo in the native API, correspond to the same frequency bands metered by QuickTime Player:

int[] EQ_LEVELS = {
    200, 400, 800, 1600,
    3200, 6400, 12800, 21000
};

When the user opens a movie, the program finds the AudioMediaHandler of the first audio track and calls setSoundEqualizerBands() with these bands. Then it creates an instance of the LevelMeter inner class, along with a Swing Timer to repaint the level meter every 50 milliseconds.

When the repaint calls the meter's paint() method, it divides its width by the number of bands to figure out how wide each bar should be. The height takes a little more work: the returned levels are in the range 0 to 255, so the program calculates a "level percent" float by dividing by 255, then multiplying this by the height of the component. With the height and width of each frequency band, the component can draw a set of boxes, up to that height, to represent the band's level.
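The geometry math just described can be isolated into a couple of pure functions. This sketch (names mine) assumes the 0-255 level range mentioned above:

```java
// The level-meter bar geometry: each band gets an equal share of the
// component width, and bar height is the level (0..255) taken as a
// fraction of the component height. Illustrative sketch.
public class MeterGeometry {
    public static int barWidth (int componentWidth, int bandCount) {
        return componentWidth / bandCount;
    }

    public static int barHeight (int level, int componentHeight) {
        float levelPercent = level / 255.0f;
        return (int) (levelPercent * componentHeight);
    }
}
```

A paint() method would then loop over the bands, drawing each bar at x = band * barWidth(...) with its computed height.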

What about...

...the frequency values you can pass in, and how many of them? Unfortunately, with no documentation for this feature, there's only trial and error to fall back on. One thing I've found is that you can have only 10 bands—you can pass in as many frequencies as you want, and you'll get that many entries back in the int array returned by getSoundLevelMeterLevels(), but only the first 10 will have nonzero values.

Building an Audio Track from Raw Samples

As I've said many times before: movies have tracks, tracks have media, media have samples. But what are these samples? In the case of sound, they indicate how much voltage should be applied to a speaker at an instant of time. By itself, a sample is meaningless, but as a speaker is repeatedly excited and relaxed, it creates waves of sound that move through the air and can be picked up by the ear.

So, why would you want to do this? One plausible scenario is that you have code that generates this uncompressed pulse code modulation (PCM) data, like a decoder for some format that QuickTime doesn't support. By writing the raw samples to an empty movie, you can expose it to QuickTime and then play it, export it to QT-supported formats, and use other QuickTime-related functions.

How do I do that?

SoundMedia inherits an addSample() method from the Media class. This can be used to pack samples into a Media, which in turn can be added to a Track, which then can be added to a Movie.

But what values do you provide to create an audible sound? The example shown in Example 7-5 creates a square wave at a constant frequency. A square wave is one in which the voltage is either fully on or completely off. To create a 1000-hertz (Hz) tone, you write samples to alternate between full voltage and zero voltage, 1,000 times per second. Figure 7-4 shows a graph of sample values for the square wave.

When run, this creates a five-second, audio-only movie file called buildaudio.mov. Open it in QuickTime Player or an equivalent (like the level meter player from the previous lab) to listen to the file.

Note

Square waves are not easy on the ears. Turn down your speakers or headphones before you play this file.

What just happened?

Two constants at the beginning define important values. SAMPLING is the number of samples to be played every second. This example uses 44,100, which is the same as on a compact disc.

Tip

An important consideration for choosing a sampling frequency is the Nyquist-Shannon Sampling Theorem, which states that you need to sample at a rate double the highest frequency you want to capture. So, a sampling rate of 44,100 will properly reproduce frequencies less than 22,050 Hz. Given that human hearing typically ranges from 20 to 20,000 Hz, this effectively covers any humanly audible sound.

The FREQUENCY constant is the frequency of the sound wave to be produced. This example uses 262, which is approximately middle C on a piano.

Note

To be more precise, middle C is approximately 261.625565 Hz.

To start writing samples, you need a SoundMedia object and a place to put your data. The example does this by:

Creating a new Movie with createMovieFile(). Using this approach—instead of the no-arg Movie constructor—has the benefit of indicating where the samples are to be stored.

Adding a new track to the movie, with no size, and a volume of 1 (full volume).

Creating a new SoundMedia object. This constructor takes the track the media is associated with and a time scale for the media. In this case, 44,100 is a good choice because then each sample will correspond to one unit of the media's time scale. You could use higher values, but not lower ones, because a sample can't be expressed as less than one unit of the time scale.

Calling beginEdits() on the media to indicate that the program will be making changes to the media.

Most of the rest of the code in the example has to do with setting up the call to addSample(), which is somewhat tricky. The method takes seven arguments:

A QTHandleRef that points to the data to be added

An offset into the handle

The size of the data to be inserted

The durationPerSample—how much time each sample represents, in the media's time scale

A SampleDescription describing the format of the data

The number of samples contained in the data

Flags describing the samples (0 for this uncompressed sound data)

The first thing to do is to create a SampleDescription that can be reused on every call to addSample(). To do this, create a SoundDescription object. The constructor takes a "format" FOUR_CHAR_CODE, which for uncompressed data is "NONE".

Next, you customize the SampleDescription object with some setter methods to indicate the number of channels, the size of each sample in bits, and the sampling frequency. For this example, I used one channel and 16 bits per sample. This means that when the byte array with the data is parsed, QuickTime will take the data 2 bytes at a time and assume it to be a 16-bit value. If there were two channels, there would be 4 bytes per sample: two 2-byte samples, one for each speaker.

You might expect that you'd then simply loop through, adding one sample at a time to the Media and creating one second of audio every 44,100 times through the loop. Although this is legal, the resulting file won't actually play. The problem is that QuickTime wants you to put audio data in larger and more manageable chunks. To quote from the native AddMediaSample docs:

You should set the value of this parameter so that the resulting sample size represents a reasonable compromise between total data retrieval time and the overhead associated with input and output. [ . . . ] For a sound media, choose a number of samples that corresponds to between 0.5 and 1.0 seconds of sound. In general, you should not create groups of sound samples that are less than 2 KB in size or greater than 15 KB.

So, in this example, I've created a byte array to represent one second of samples, which is filled in a method called buildOneSecondSample( ). This method figures out where the waveform is at each sample time and writes either 0x7fff or 0x0000 to each 2-byte pair. Because the "NONE" format assumes signed shorts, 0x7fff is the maximum, not 0xffff.
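Here's a pure-Java sketch of that one-second square-wave fill, assuming one channel of big-endian 16-bit samples as described above (the book's actual buildOneSecondSample() may differ in its details):

```java
// Fills one second of square-wave audio: 16-bit big-endian signed
// samples alternating between 0x7fff and 0x0000, FREQUENCY cycles
// per second. Illustrative sketch of the method described in the text.
public class SquareWave {
    static final int SAMPLING = 44100;
    static final int FREQUENCY = 262;   // roughly middle C

    public static byte[] buildOneSecondSample() {
        byte[] buf = new byte[SAMPLING * 2];        // 2 bytes per sample
        int samplesPerWavelength = SAMPLING / FREQUENCY;
        for (int i = 0; i < SAMPLING; i++) {
            boolean high =
                (i % samplesPerWavelength) < samplesPerWavelength / 2;
            int value = high ? 0x7fff : 0x0000;
            buf[2 * i]     = (byte) ((value >> 8) & 0xff); // high byte first
            buf[2 * i + 1] = (byte) (value & 0xff);
        }
        return buf;
    }
}
```

The high byte goes first because the samples here are packed big-endian, matching how this uncompressed 16-bit data is laid out for QuickTime.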

With the byte array filled, you can wrap it with a QTHandle, and you're ready to call addSample().

Once you're done adding samples, it's cleanup time. You use endEdits() to tell the Media you're done editing, then actually put the media into the track with Track.insertMedia(), which tells the track what parts of the media object to use and where it goes relative to the track's time scale. Finally, the movie is written to disk with the curiously named Movie.addResource().

What about...

...some other kind of wave because hearing that square wave is really unpleasant? A sine wave offers a nicer alternative, because it is much more like a naturally occurring sound. Figure 7-5 shows what its waveform looks like.

Figure 7-5. Sine wave

The alternate implementation of buildOneSecondSample() that produces a sine wave didn't make it into the preceding example—that code is already complicated enough without the trigonometry this version requires.
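Here's my own pure-Java sketch of that sine-wave version, matching the description that follows: compute the wavelength in samples, sweep the angle through 2π per wavelength, and translate sin()'s -1.0..1.0 output up to the 0..0x7fff sample range (the big-endian byte packing is my assumption):

```java
// One second of sine-wave audio: the angle advances through 2*pi
// radians over each wavelength, and sin()'s -1.0..1.0 output is
// translated to 0..0x7fff. Illustrative sketch.
public class SineWave {
    static final int SAMPLING = 44100;
    static final int FREQUENCY = 262;   // roughly middle C

    public static byte[] buildOneSecondSample() {
        byte[] buf = new byte[SAMPLING * 2];
        double samplesPerWavelength = (double) SAMPLING / FREQUENCY;
        for (int i = 0; i < SAMPLING; i++) {
            double angle = 2 * Math.PI * (i % samplesPerWavelength)
                           / samplesPerWavelength;
            // translate -1.0..1.0 into 0..0x7fff
            int value = (int) ((Math.sin (angle) + 1.0) / 2.0 * 0x7fff);
            buf[2 * i]     = (byte) ((value >> 8) & 0xff);
            buf[2 * i + 1] = (byte) (value & 0xff);
        }
        return buf;
    }
}
```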

This implementation calculates the width of a wavelength in samples, then divides it into equal steps across 2π radians for its calls to Math.sin(). The returned values are then translated so that instead of running from -1.0 to 1.0, they run from 0 to 0x7fff.

It's also worth noting that the middle C sine wave is pretty hard to hear over basic computer speakers. You might have better results with a frequency of 440, which is the A above middle C.