Designing modern USB audio systems

USB audio is a ubiquitous interface supported by all but the most ancient personal computer hardware and operating systems. With its robust connection and data rate, one might believe that delivering high quality audio over this interface is simple. However, today's successful USB-based audio products are the result of a lot of chip- and system-level attention to solving the thorny problem of clock recovery.

In essence, the problem is that the final output device that delivers audio to the speakers, headphones or line-out socket needs a 'master clock' to pace the audio conversion cleanly. This master clock needs to have two independent attributes:

1) it must be at exactly the correct multiple of the underlying audio sample rate (so that you never have to lose or duplicate an audio sample through timing failure) and

2) it must have low enough jitter (or, equivalently, phase noise) that the performance of the digital-to-analogue process isn't compromised.

The challenge lies in meeting both these requirements simultaneously.
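To give a feel for how demanding the first requirement is, here is a rough sketch of how quickly a small frequency error turns into a lost or duplicated sample. The function name and the 100 ppm figure are illustrative assumptions, not values from any particular product.

```python
def seconds_between_slips(sample_rate_hz: float, clock_error_ppm: float) -> float:
    """Time for a master clock with the given frequency error to drift by
    one whole sample period relative to the incoming audio stream."""
    if clock_error_ppm == 0:
        return float("inf")  # a perfectly matched clock never slips
    # Samples gained or lost per second because of the frequency error.
    slipped_samples_per_second = sample_rate_hz * abs(clock_error_ppm) * 1e-6
    return 1.0 / slipped_samples_per_second


if __name__ == "__main__":
    # Assumed example: 44.1 kHz audio with a 100 ppm error on the local clock.
    print(seconds_between_slips(44_100, 100.0))
```

Under these assumptions a sample has to be dropped or repeated several times a second, which is why a free-running local oscillator on its own cannot meet requirement 1.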

Part of the difficulty comes from the fact that the receiving end for the digital data streaming over the USB cable doesn't know the exact sample rate. In fact, it can only infer the nominal sample rate.

What's more, the data that comes in over the USB 'pipe' isn't accompanied by any form of clock. This is in significant contrast to most other serial interfaces, which either send a clock, or structure the data so that such a clock can always be extracted from the link when it is running.

The only clocking information available on the USB interface is that every millisecond, a specific type of data packet announces the start of a frame, and this event is detected by the receiving hardware. This millisecond is derived in a known way from the system clock at the transmitting end – and so is the original audio sample rate (well, we'll briefly discuss an exception later).
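As a sketch of how a receiver might infer the nominal sample rate from this frame timing alone, the snippet below counts the audio samples delivered in each 1 ms frame and snaps the average to the nearest standard rate. It assumes that each frame carries a whole number of samples, so 44.1 kHz audio would arrive as a mix of 44- and 45-sample frames. The rate table and function name are illustrative, not taken from any specification or driver.

```python
# Candidate nominal rates; this list is an assumption for illustration.
NOMINAL_RATES_HZ = (32_000, 44_100, 48_000, 88_200, 96_000)
FRAME_PERIOD_S = 1e-3  # one start-of-frame event per millisecond


def infer_nominal_rate(samples_per_frame):
    """Estimate the nominal sample rate from per-frame sample counts."""
    frames = len(samples_per_frame)
    if frames == 0:
        raise ValueError("need at least one frame of data")
    # Average samples per second, measured against the frame clock.
    measured_hz = sum(samples_per_frame) / (frames * FRAME_PERIOD_S)
    # Snap to the closest standard rate.
    return min(NOMINAL_RATES_HZ, key=lambda rate: abs(rate - measured_hz))


if __name__ == "__main__":
    # Nine frames of 44 samples plus one of 45 average 44.1 samples per frame.
    print(infer_nominal_rate([44] * 9 + [45]))
```

Note that this only recovers the nominal rate. It says nothing about the exact frequency of the sender's clock, which is the information the master clock actually needs.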

An apparently simple solution is to feed this 1 kHz pacing clock into a PLL-based multiplier and multiply it up by whatever factor is required to create the audio master clock and all the sub-clocks that depend on it. However, in a system handling CD-derived audio, the sample frequency is 44.1 kHz and a typical conventional audio DAC requires a master clock of 256x this, or 11.2896 MHz. The 1 kHz reference would therefore have to be multiplied by 11,289.6.

The fact is that multiplying an input reference frequency by such a large factor in a single-stage PLL gives inherently rotten performance. It takes a hit from multiple constraints: loop bandwidth, reference spur rejection, and VCO jitter. What's more, the number we need to multiply the 1 kHz by in this case isn't an integer, making the task that much harder.
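The arithmetic behind that last point can be checked with a few lines of Python. This is only a sketch: the 256x master clock ratio and the 1 kHz frame rate come from the text above, while the function name and the choice of 48 kHz as a comparison rate are illustrative.

```python
from fractions import Fraction

SOF_RATE_HZ = 1_000  # start-of-frame reference: one event per millisecond
MCLK_RATIO = 256     # master clock as a multiple of the sample rate


def pll_multiplier(sample_rate_hz: int) -> Fraction:
    """Exact factor by which the 1 kHz reference must be multiplied."""
    return Fraction(MCLK_RATIO * sample_rate_hz, SOF_RATE_HZ)


if __name__ == "__main__":
    for fs in (44_100, 48_000):
        m = pll_multiplier(fs)
        kind = "integer" if m.denominator == 1 else "fractional"
        print(f"{fs} Hz: multiply by {m} ({float(m)}), {kind}")
```

For 44.1 kHz the factor reduces to 56448/5, so a single loop would need fractional division on top of an already very large multiplication ratio. A 48 kHz stream would need an integer factor of 12288, which is still large.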

Cascading two fairly complex multiplier loops can be made to work from a phase noise and spurious rejection standpoint. This approach, however, tends to be power-hungry, to demand high levels of chip- and system-level analogue design smarts, and to be rather slow to respond to changes in demanded clock frequency.

The nominal sample rate being used on a USB audio link can change rapidly between tracks, and waiting for a large fraction of a second for the loops to stabilize can result in unreliable performance. Cascaded multiplier loops of this kind are therefore found primarily in fixed-frequency, studio-based digital audio links, where cost and size aren't important.

Over the years, various ways of creating the necessary audio master clock without suffering from PLL multiplier problems have been integrated into the dedicated chipsets that have gone into innumerable USB speakers, headsets, and external sound cards. These parts do what's needed of them, but don't spend extra silicon area or pin count on "what if" capability. This certainly keeps the cost down, and everybody is happy.

But what do you do if your next-generation USB interfacing needs can't be met by any of these special-function chips? Mobile devices, such as media players and the latest tablets, are built on new platforms and run new operating systems that are increasingly standardizing on USB as the wired link of choice for a wide range of accessories and enhancements. USB audio is one of the capabilities increasingly being demanded in these small mobile devices, and some of these systems have combinations of requirements that aren't met by existing USB audio chipsets, causing a 'shock' to the component supply infrastructure.

Extracting audio in digital form from a mobile device has several benefits. The analogue audio interface no longer needs to be the limiting factor in the system's sound quality. This allows the manufacturer of an audio system or accessory connected to the player to aim for higher levels of measured and sonic performance through their own circuit design. Just as important is the digital audio link's improved resistance to TDMA interference coupled from the cellular modem of the mobile device into the analogue circuits of the audio replay path of the system.

There are plenty of microcontrollers on the market that integrate a USB device port, but none has been designed to include the clock generation and recovery circuitry needed to deliver the high standard of audio reproduction that's now demanded. Sometimes this issue can be resolved by using external 'clock cleaning' chips, or more sophisticated audio converters that integrate PLLs or sample rate converters, in an attempt to bridge the master clock accuracy/quality gap.

However, this moves a system back into the territory of high cost, high current consumption, high component count, or all of these. In addition, the technique of 'under-clocking' audio into a very long memory buffer is unusable in any system where video images (even PowerPoint slides) need to be time-aligned to the audio.
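A rough calculation shows why the long-buffer approach clashes with picture sync. The sketch below assumes that 'under-clocking' means running the output clock deliberately a little slow, so that the buffer only ever fills and audio falls steadily further behind. The function names and the 200 ppm offset are illustrative assumptions.

```python
def accumulated_lag_seconds(play_time_s: float, underclock_ppm: float) -> float:
    """Audio delay built up after play_time_s when the output clock runs
    underclock_ppm slower than the incoming stream."""
    return play_time_s * underclock_ppm * 1e-6


def buffer_samples_needed(play_time_s: float, underclock_ppm: float,
                          sample_rate_hz: float) -> float:
    """Buffer depth, in samples per channel, needed to hold that backlog."""
    return accumulated_lag_seconds(play_time_s, underclock_ppm) * sample_rate_hz


if __name__ == "__main__":
    # Assumed example: one hour of 44.1 kHz playback, output 200 ppm slow.
    print(accumulated_lag_seconds(3600, 200.0))
    print(buffer_samples_needed(3600, 200.0, 44_100))
```

Under these assumptions the audio ends up lagging by a large fraction of a second after an hour of playback, a delay that would be plainly visible against video.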