Designing the Boot Sound for the Original Xbox

The following blog post, unless otherwise noted, was written by a member of Gamasutra’s community.
The thoughts and opinions expressed are those of the writer and not Gamasutra or its parent company.

Old-skool Game Audio and the Original Xbox “Boot Sound”

Remember the original Xbox startup sound (https://www.youtube.com/watch?v=AY0rg5V2ymY)? The 10th anniversary of the launch of the Xbox made me reminisce about some of the more fun things we did, including making the startup sound, and some of the challenges we had.

Towards the end of the design of Xbox, we had a problem. When you power on a game console, you’re entertained by the startup sequence, and it’s there for a couple of reasons. One reason, of course, is to provide you with a visual and sonic ‘logo’ for the box. The other is somewhat more practical. When you turn on the Xbox, you want something to happen immediately; you want fun right away. But the system still has to ‘boot’ itself: start spinning the hard disk and DVD drive, and go through all the things computer systems need to do when they start themselves from scratch. The Xbox team spent tremendous effort to minimize that boot time. But as Star Trek’s Scotty used to say, “you canna change the laws of physics!” It just takes time to get things like the disks spinning to the point where you can actually use them. So the boot-up sequence, although fast, was still about 8 seconds, and the artists did some cool visuals to cover those 8 seconds.

This left sound. Normally, doing the sound for existing visuals would be pretty straightforward: get the visuals, go into the studio, create a wave file to match the graphics, and you’re done. So back to our problem: there wasn’t enough memory available to store an 8-second wave file. In fact, after the Xbox kernel and the opening animation sequences, there was only about 25 kilobytes of room left in the 256kB ROM it needed to come from. Do the math, and 25 kilobytes isn’t even 1 second of 8-bit mono sound.
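To make the "do the math" step concrete, here is a small back-of-the-envelope sketch. The sample rates in it are assumptions chosen for illustration (the text does not say which rate was considered), and `seconds_of_audio` is a hypothetical helper name.

```python
# How long can ~25 KB of uncompressed 8-bit mono PCM play?
ROM_BUDGET_BYTES = 25 * 1024   # "about 25 kilobytes" from the text
BYTES_PER_SAMPLE = 1           # 8-bit mono: one byte per sample


def seconds_of_audio(budget_bytes, sample_rate_hz, bytes_per_sample=BYTES_PER_SAMPLE):
    """Duration in seconds of raw PCM that fits in budget_bytes."""
    return budget_bytes / (sample_rate_hz * bytes_per_sample)


def bytes_needed(duration_s, sample_rate_hz, bytes_per_sample=BYTES_PER_SAMPLE):
    """Bytes of raw PCM needed to hold duration_s seconds of audio."""
    return duration_s * sample_rate_hz * bytes_per_sample


if __name__ == "__main__":
    # Assumed rates, for illustration only.
    for rate in (48_000, 44_100, 32_000):
        fits = seconds_of_audio(ROM_BUDGET_BYTES, rate)
        need_kb = bytes_needed(8, rate) / 1024
        print(f"{rate} Hz: {fits:.2f} s fits; 8 s would need {need_kb:.0f} KB")
```

Under these assumptions the budget holds well under a second at any rate above roughly 25.6kHz, while a full 8 seconds would need hundreds of kilobytes.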

So, the question became how to create an 8-second boot sound using only 25 kilobytes of ROM memory. Fortunately, I’d spent most of my video game career prior to Microsoft doing just that: creating sound and music for extremely small spaces in arcade games, pinball machines, Genesis (Mega Drive) and SNES games, and various other gaming machines. Rather than recording a wave file, I needed to create a MIDI-file-like sequence, which would synthesize the boot sound on the fly.

First I needed to write a tiny sequencer program (a normal MIDI player was far too big), something that would allow me to define and play back something akin to MIDI data to drive the synthesizer. The sequencer would read a MIDI/MOD-like sequence and use it to drive the powerful synthesis chip in the Xbox. I wrote a quick-and-dirty version of a sequencer I’d written for Genesis and arcade games; these sequencers take lists of time-stamped commands in a text file (aka “note lists”) and use them to control the synth chip.
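As a rough illustration of the note-list idea (not the original Xbox sequencer), the sketch below parses a text list of time-stamped commands and dispatches them in time order. The line syntax, the `Event` type, the helper names, and the argument values are all hypothetical. The handlers stand in for whatever would actually program the synth chip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time_ms: int      # when the command should fire
    command: str      # command name, e.g. "note"
    args: tuple       # remaining tokens, kept as strings


def parse_note_list(text):
    """Parse lines of the (hypothetical) form '<time_ms> <command> [args...]'."""
    events = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        time_ms, command, *args = line.split()
        events.append(Event(int(time_ms), command, tuple(args)))
    events.sort(key=lambda e: e.time_ms)      # stable: ties keep file order
    return events


def play(events, handlers):
    """Dispatch events in time order. A real player would wait on a clock."""
    for ev in events:
        if ev.command not in handlers:
            raise ValueError(f"unknown command: {ev.command}")
        handlers[ev.command](*ev.args)


# Illustrative track; the syntax and values are made up for this sketch.
DEMO_TRACK = """
0    patch   PatchSaw1
0    volume  100
10   note    36      # start the note
"""

if __name__ == "__main__":
    log = []
    handlers = {name: (lambda *a, n=name: log.append((n, a)))
                for name in ("patch", "volume", "note")}
    play(parse_note_list(DEMO_TRACK), handlers)
    print(log)
```

Keeping the data as plain text and the player as a single dispatch loop is what makes this kind of sequencer so small. Several tracks can be handled by parsing each one and merging the events by timestamp.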

The Xbox synthesis chip (called MCPX) was quite full-featured. It provided 256 channels of sound with programmable filters on each voice as well as a 6-stage envelope, more than enough to synthesize a good boot sound on the fly. By creating a long enough note list, with multiple tracks, I could cover the entire 8-second boot time and match the visuals.

For raw materials, I used a couple of different techniques. I was able to synthesize a few very useful waveforms on the fly: white noise, sine, and sawtooth waves. These are fast and easy to calculate and provide great raw material from which to make sequences, particularly given the signal processing available on the Xbox. And they required virtually zero ROM memory! But that wasn’t quite enough. So for additional waves, I recorded 8-bit sound data, concentrating on the attacks. In total, 3 short digitized waveforms were used: a thunder sound, a cannon sound, and a glockenspiel, each recorded at a horrifyingly low sampling rate of between 6kHz and 10kHz. I also wrote a quick piece of code to reverse the data in the thunder sound, which gave me a 4th digitized sound, “reverse thunder,” without needing any extra ROM memory. It is mixed in as part of the lead-in to the big green flash about 6 seconds in.
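The sketch below shows why these raw materials cost almost nothing to store: each waveform is a few lines of arithmetic, and reversing a sample is a single operation on its bytes. The function names and the 256-entry table length are assumptions made for illustration, and this is not the original code.

```python
import math
import random


def sine_table(length=256):
    """One cycle of a sine wave, values in [-1, 1]."""
    return [math.sin(2.0 * math.pi * i / length) for i in range(length)]


def sawtooth_table(length=256):
    """One cycle of a rising sawtooth, from -1 up to just under +1."""
    return [2.0 * i / length - 1.0 for i in range(length)]


def white_noise(length, seed=0):
    """Uniform white noise in [-1, 1]; seeded so the result is repeatable."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(length)]


def reversed_sample(pcm):
    """Reverse 8-bit mono PCM: one byte per sample, so flip the byte order."""
    return bytes(pcm)[::-1]


if __name__ == "__main__":
    thunder = bytes([128, 200, 90, 140, 128])   # placeholder data, not real audio
    print(list(reversed_sample(thunder)))
    print(sawtooth_table(8))
```

Note that the byte-flip only works this simply for 8-bit mono data; wider or interleaved samples would have to be reversed frame by frame.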

One nice thing about combining synthesized waveforms with digitized ones is that I could get the full fidelity of the synthesized waveforms (48kHz sampling rate, 24-bit) combined with the punch of the digitized but far lower fidelity sounds.

Take one of the tracks from the boot sequence: the opening, low-pitched “wwwwaaaaaaa,” which is a 256-sample looping sawtooth wave run through a low-pass filter, with the filter gradually opening up as the note is played. This track selects the patch (PatchSaw1) and sets the volume, then sets the low-pass frequency parameters. The “note” command initiates the note. As the note plays, the ‘finc’ command gradually opens the filter from about 60Hz to about 3kHz and then slowly closes it again, resulting in the wwwaaaaaauummmm sound that starts the boot sound.
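The effect of sweeping a low-pass filter over a looping sawtooth can be approximated in software as follows. This is only a sketch under stated assumptions. The text does not describe the MCPX filter design, so a simple one-pole low-pass stands in for it. The note pitch, the duration, and the exponential shape of the sweep are illustrative guesses, and the per-sample cutoff update plays the role of the ‘finc’ command.

```python
import math

SAMPLE_RATE = 48_000


def saw_loop(length=256):
    """256-sample sawtooth cycle, as described in the text."""
    return [2.0 * i / length - 1.0 for i in range(length)]


def cutoff_at(t, duration, low_hz=60.0, high_hz=3000.0):
    """Cutoff sweep low -> high -> low, exponential in frequency (assumed shape)."""
    phase = t / duration
    up = 2.0 * phase if phase < 0.5 else 2.0 * (1.0 - phase)   # 0 -> 1 -> 0
    return low_hz * (high_hz / low_hz) ** up


def render_sweep(duration_s=2.0, note_hz=55.0):
    """Loop the sawtooth at note_hz through a one-pole low-pass with moving cutoff."""
    table = saw_loop()
    n = round(duration_s * SAMPLE_RATE)
    step = note_hz * len(table) / SAMPLE_RATE   # table entries per output sample
    pos, y, out = 0.0, 0.0, []
    for i in range(n):
        x = table[int(pos) % len(table)]
        pos += step
        fc = cutoff_at(i / SAMPLE_RATE, duration_s)
        a = 1.0 - math.exp(-2.0 * math.pi * fc / SAMPLE_RATE)  # one-pole coefficient
        y += a * (x - y)                        # low-pass: move toward the input
        out.append(y)
    return out


if __name__ == "__main__":
    samples = render_sweep()
    print(len(samples), max(abs(s) for s in samples))
```

With the cutoff near 60Hz only the fundamental gets through, so the note starts dull and quiet. As the cutoff rises, more of the sawtooth's harmonics pass and the sound brightens, then darkens again as the filter closes.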

The direction we were given for the boot sequence was the notion of immense power striving to break free from the confines of the box—so the opening up of the filter seemed to fit that nicely.

The visuals also had some explosive elements. For those I relied on the digitized thunder and cannon sounds, combined with filtered white noise. The samples provided the realism and boom of the thunder and cannon, while the white noise let me extend the sound without requiring a long sample and add high frequencies that were missing from the digitized sound because of its low sampling rate.

The fast, tinkling hi-hat-like notes are actually very short filtered white noise notes, with a fast attack and decay. By narrowly filtering them and playing them at different pitches, they take on an almost metallic character.
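One way to imitate this in software is to run a short noise burst through a narrow resonant band-pass filter and shape it with a fast envelope. The sketch below assumes a standard two-pole resonator and made-up envelope times. None of these values come from the original boot sound, and `noise_tick` is a hypothetical name.

```python
import math
import random

SAMPLE_RATE = 48_000


def noise_tick(center_hz, duration_s=0.03, r=0.98, seed=0):
    """Short noise burst, narrowly band-passed around center_hz.

    r (just below 1) sets how narrow the resonance is; values are assumptions.
    """
    rng = random.Random(seed)
    n = round(duration_s * SAMPLE_RATE)
    w = 2.0 * math.pi * center_hz / SAMPLE_RATE
    b1, b2 = 2.0 * r * math.cos(w), -r * r      # two-pole resonator feedback
    attack_n = round(0.001 * SAMPLE_RATE)        # 1 ms linear attack (assumed)
    decay_n = round(0.005 * SAMPLE_RATE)         # 5 ms decay constant (assumed)
    y1 = y2 = 0.0
    out = []
    for i in range(n):
        x = rng.uniform(-1.0, 1.0)               # white noise input
        y = x + b1 * y1 + b2 * y2                # resonant band-pass
        y2, y1 = y1, y
        if i < attack_n:
            env = i / attack_n
        else:
            env = math.exp(-(i - attack_n) / decay_n)
        out.append(y * env)
    peak = max(abs(v) for v in out) or 1.0
    return [v / peak for v in out]               # normalize to full scale


if __name__ == "__main__":
    for hz in (6000.0, 8000.0, 10000.0):         # "different pitches"
        print(hz, len(noise_tick(hz)))
```

Moving `center_hz` changes the perceived pitch of the tick, and pushing `r` closer to 1 narrows the band, which is what gives the noise its ringing, metallic quality.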

The sound of the energy blobs is the sawtooth again, filtered and pitch-modulated to give it a 'throbbing' feel.

The brief melody at the end of the boot sound is done with a combination of digitized attack and synthesized sustain: the digitized glockenspiel provides the attack, while a simple sine wave gives some more duration to the decaying sound.
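The attack-plus-sustain layering can be sketched as a simple mix of two buffers. Everything here is illustrative: the gains, decay rate, and function names are assumptions. The sketch also assumes that the attack sample has already been converted to the output rate, which a real player using 6kHz to 10kHz samples would have to handle.

```python
import math

SAMPLE_RATE = 48_000


def decaying_sine(freq_hz, duration_s, decay_per_s=4.0):
    """Sine tone with an exponential decay, used here as the sustain layer."""
    n = round(duration_s * SAMPLE_RATE)
    return [math.exp(-decay_per_s * i / SAMPLE_RATE)
            * math.sin(2.0 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


def layer(attack, sustain, attack_gain=0.5, sustain_gain=0.5):
    """Mix a short attack with a longer sustain; the shorter one is zero-padded."""
    n = max(len(attack), len(sustain))
    out = []
    for i in range(n):
        a = attack[i] if i < len(attack) else 0.0
        s = sustain[i] if i < len(sustain) else 0.0
        out.append(attack_gain * a + sustain_gain * s)
    return out


if __name__ == "__main__":
    fake_attack = [1.0, -0.8, 0.6, -0.4, 0.2]    # placeholder, not a real recording
    note = layer(fake_attack, decaying_sine(880.0, 0.5))
    print(len(note))
```

The short recorded attack carries the character of the instrument, so only a small sample needs to be stored, and the computed sine fills in the tail for free.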

All in all, I used 9 concurrent tracks of sound. Put it all together, sequence it and you have the original Xbox boot sequence.

People sometimes wonder why the Xbox boot sound is only in stereo. The problem is that a receiver will default to stereo, and then only switch to Dolby Digital if it detects a Dolby Digital signal coming in. Most receivers take a good 2 to 3 seconds to do that, and they mute the sound during the switch. So having a 5.1 boot sound (which would have been very cool) would have meant that most people would have had the first couple of seconds of the boot sound cut off as their receiver detected and switched to the Dolby Digital signal. So we decided to go with stereo.

So that’s the origin of the original Xbox Boot Sound. Sometimes, even in the days of ‘DVD quality,’ live orchestral game soundtracks, real-time Dolby Digital and powerful synthesis chips, it’s necessary to go back to the old-school video game sound bag of tricks to get the job done...

Brian Schmidt worked for 10 years at Microsoft as the architect for the Xbox and Xbox 360 audio systems and created the Xbox startup sound. Currently he’s an independent consultant and audio game designer, as well as founder and executive director of GameSoundCon, a conference on video game music and sound design.