Q&A with Andreas Koch

With all of the attention DSD has garnered of late, it seemed like a good time to go to one of SACD's creators for answers to some common questions. Andreas Koch was involved in developing the standards and equipment for SACD production, recording, and playback; his company, Playback Designs, was one of the first to offer DSD capability in a consumer product; and he was an integral player in the development of the DSD over PCM (DoP) specification. Andreas was kind enough to answer some questions about DSD, which will hopefully clarify some aspects of how it actually works.

Can you give us an overview of what DSD is and how it differs from PCM?
The term Direct Stream Digital (DSD) was coined by Sony and Philips when they jointly launched the SACD format. It is nothing other than processed Delta-Sigma modulation, first developed by Philips in the 1970s. Its first wide market entry did not come until the 1980s, when it was used as an intermediate format inside A/D and D/A converter chips.

Fig. 1

Figure 1 shows how an analog source is converted to digital PCM through the A/D converter and then back again to analog via the D/A converter. The A/D internally contains 2 distinct processes:

Delta-Sigma modulation: the analog signal is converted directly to DSD with a very high sampling rate. Various algorithms are in use depending on the application and required fidelity. They can generate 1-bit DSD or multibit DSD oversampled at 64x or 128x compared to regular CD rate.

Decimation filter: the DSD signal from the previous step is downsampled and converted to PCM. Word length is increased (for instance 16 or 24 bits) and sample rate reduced to CD rate or a low multiple of it for high resolution PCM formats.
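
As a rough illustration of the two steps above, the following Python sketch runs a first-order, 1-bit Delta-Sigma modulator and then a crude block-averaging decimation filter. It is a toy under stated assumptions: real converters use higher-order modulators and far better decimation filters, and the function names (delta_sigma_1bit, decimate) are hypothetical, chosen only for this example.

```python
import math

OVERSAMPLING = 64                 # 64x the CD rate, as in single-rate DSD
FS_PCM = 44_100                   # CD sample rate in Hz
FS_DSD = OVERSAMPLING * FS_PCM    # 2,822,400 Hz

def delta_sigma_1bit(samples):
    """First-order modulator: floats in [-1, 1] -> list of +1.0 / -1.0 values."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback                    # integrate input minus last output
        feedback = 1.0 if integrator >= 0 else -1.0   # 1-bit quantizer
        bits.append(feedback)
    return bits

def decimate(bits, factor):
    """Toy decimation filter: average each block of `factor` bits into one sample."""
    return [sum(bits[i:i + factor]) / factor
            for i in range(0, len(bits) - factor + 1, factor)]

# 10 ms of a 1 kHz sine at half scale, sampled at the DSD rate.
n = OVERSAMPLING * 441
analog = [0.5 * math.sin(2 * math.pi * 1000 * t / FS_DSD) for t in range(n)]

dsd = delta_sigma_1bit(analog)    # high rate, 1 bit per sample
pcm = decimate(dsd, OVERSAMPLING) # CD rate, multibit values

print(len(dsd), "DSD samples ->", len(pcm), "PCM samples")
```

The 1-bit stream only takes the values +1 and -1, yet its local average tracks the input, which is why a lowpass (here a plain average) can recover a multibit signal at a lower rate.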

This technology was chosen because of its improved linearity and consistent quality across physical components, as most of the heavy-duty signal processing was shifted to the digital domain, where it was not susceptible to the variability of electronic components. It was quickly adopted in most converter systems, and we can say that since about the late 1980s we have been listening to some form of DSD without even knowing it.

While DSD is used at a sample rate of 2.8224MHz (64 x 44.1kHz) with 1 bit per sample mostly for SACD production, recording equipment has also been used at double that rate at 5.6448MHz (128 x 44.1kHz). Often studios use this format to archive their library of analog recordings. Recording equipment for this double rate DSD is available relatively inexpensively at great quality so that consumers can use it to archive their beloved vinyl and tape recordings onto a digital format and then play that back directly via an audiophile grade D/A converter (such as any Playback Designs product) in the comfort of their own listening room.

The theoretical frequency bandwidth of a DSD signal with a sample rate of 2.8224MHz (64 x 44.1kHz) is 1.4112MHz. Compare this to a 96kHz PCM signal which has a theoretical bandwidth of 48kHz, or 192kHz PCM signal with a bandwidth of 96kHz. However, this wide bandwidth comes at a price: pure Delta-Sigma signals are quantized to 1 bit and, therefore, do not have a great dynamic range by themselves. That is why Delta-Sigma converters need to incorporate a process called “noise shaping” that increases the dynamic range in the usable audio range (0-20kHz) and then slowly decreases it over higher frequencies. It is this noise-shaped delta sigma signal that is then called DSD. Fig.2 below shows the typical dynamic range of a DSD signal sampled at 2.8224MHz and at 5.6448MHz. The slowly rising noise floor at higher frequencies also follows to some degree our hearing threshold for transient signals that have been proven to be audible up to 100kHz.

Of course, DSD at double the rate (5.6448MHz) has an extended audio range of 0-40kHz, above which the noise floor then starts to rise gently.
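
The sample rates and theoretical bandwidths quoted above follow from simple arithmetic: the DSD rate is a multiple of the 44.1kHz CD rate, and the theoretical bandwidth is half the sample rate (the Nyquist limit). A minimal Python sketch, with helper names that are purely illustrative:

```python
CD_RATE_HZ = 44_100

def dsd_rate_hz(multiple):
    """DSD sample rate as a multiple of the CD rate (64 = single rate, 128 = double rate)."""
    return multiple * CD_RATE_HZ

def nyquist_hz(sample_rate_hz):
    """Theoretical bandwidth: half the sample rate."""
    return sample_rate_hz / 2

formats = {
    "DSD 64x": dsd_rate_hz(64),
    "DSD 128x": dsd_rate_hz(128),
    "PCM 96kHz": 96_000,
    "PCM 192kHz": 192_000,
}

for name, fs in formats.items():
    print(f"{name:10s} fs = {fs / 1e6:.4f} MHz, bandwidth = {nyquist_hz(fs) / 1e3:.1f} kHz")
```

Note that this is only the theoretical bandwidth; the usable dynamic range within it depends on the noise shaping described above.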

Fig.2 also shows the theoretical dynamic ranges of high resolution PCM signals at various sample rates. Note the steep brickwalls that PCM signals typically have. It is those brickwalls that can generate very audible side-effects such as pre-ringing, if not processed with special algorithms (such as in all Playback Designs products). By design DSD signals do not generate these side effects.

Fig. 2

As we can see from this, DSD is characterized by the following:

great dynamic range in the audio band (0-20kHz)

slowly rising noise floor in higher frequencies (no brickwalls)

extended frequency range into MHz

This makes DSD a serious contender among high resolution audio formats. Sometimes DSD is criticized for its high frequency noise content (as shown in Fig.2). But all DACs limit the amount of noise that actually gets through to the analog side. This noise is generally not correlated to the music signal and is therefore easy for our psychoacoustic hearing system to filter out; most listeners do not even hear it. Double rate DSD addresses this problem by pushing the ramp of the rising noise floor up on the frequency axis by about 20kHz, thus reducing the absolute noise floor in the higher frequencies quite dramatically.

You have been involved with SACD from the beginning. Can you provide us with an overview of your history with SACD/DSD?
I have been involved in the creation of SACD from the beginning while working at Sony and was leading a team of engineers designing the world’s first multichannel DSD recorder and editor for professional recording (Sonoma workstation), world’s first multichannel DSD converters (ADC and DAC) and participated in various standardization committees world-wide for SACD. Later I founded AKDesign which designs and markets OEM products incorporating a number of proprietary DSD processing algorithms for converting PCM to DSD and DSD to PCM, and other technologies for D/A conversion and clock jitter control in DACs. In 2008 I co-founded Playback Designs to bring to market my exceptional experience and know-how in DSD in the form of D/A converters and CD/SACD players.

Some manufacturers, including Gordon Rankin of Wavelength Audio, have pointed out that there are no current production DAC chips that handle DSD natively. If this is the case, are all DSD DACs that use current production chips converting DSD to PCM internally?
Most DSD DAC chips, if not all, lowpass filter the DSD signal to get rid of the high frequency noise (see Fig.2) before the signal gets converted to analog. The resulting signal behind this lowpass filter (and before the actual analog conversion) may still have the same sample rate as the original DSD signal, but it is no longer 1 bit. So can this still be considered DSD?

It is all a matter of definition: DSD, or Delta-Sigma Modulation, can be encoded with more than just 1 bit, and PCM can have a very high sample rate. When looking at the criteria of word length and sample rates only, the boundary between DSD and PCM can become fuzzy. Sometimes it is more useful to distinguish DSD from PCM in the frequency domain and look for the characteristic behavior in the higher frequency bands, as pointed out in Figure 2 above.

Since I believe that the source of the sonic difference between DSD and PCM lies in how these signals behave at higher frequencies, I also believe that filtering a DSD signal with an aggressive filter to flatten the upper frequencies will make it behave and sound more like a PCM signal again.

The reason why chip manufacturers like to add an aggressive lowpass filter at the input to their DACs is simple: the analog output measures better. Whether it sounds better with real music signals instead of measurement tones is an entirely different question. Similarly, most audio manufacturers and even end users who do not understand DSD are mostly concerned with Signal-to-Noise performance even at high frequencies and would not choose a chip with a frequency response that is not completely flat and optimally low all the way up to Nyquist.

With that, the answer is yes, most if not all DSD DAC chips convert to PCM before converting to analog.

That opens the door for discretely built DACs that don’t have to follow the criteria of measurements with sine waves, but rather the listening experience with real music signals.

How do people know if their "DSD capable DAC" is able to handle DSD natively or not?
Many manufacturers define “DSD capable” as being able to receive DSD signals natively via their digital input. What happens to the DSD signal once it is inside their converter is an entirely different question. To find out more, you need to ask the manufacturer of your DAC which chip is being used or, if no off-the-shelf chip is used, what kind of algorithm. Most DAC chips have a publicly available data sheet that you can download and study, but these are not always easy to read for the less technically inclined.

Unfortunately, this question is not so easy to answer for many users. But in the end, shouldn’t we use our own ears to be the ultimate judge for what sounds great and what sounds not so great?

Gordon Rankin goes on to point out that the DoP (DSD over PCM) protocol introduces overhead in the encoding/decoding process. Can the DoP protocol be improved upon and if so will these improvements result in better sound quality?
The overhead associated with the DoP protocol is for the identification of DSD signals while being transmitted in “PCM containers”. It has no bearing on the actual bits of the sound signal. The argument surely couldn’t be that overhead negatively impacts the sound quality, because then I wouldn’t know why USB generally can sound so good. The overhead of USB is huge compared to more traditional audio transmission formats.

DoP is a compromise solution for applications that do not allow a dedicated DSD signal transmission (for instance between Mac computer and external DAC). It was created with the contributions from a number of manufacturers. Of course, such solutions are never ideal and can create bigger or smaller headaches for certain manufacturers depending on their existing architecture.
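
To make the "PCM container" idea concrete, here is a minimal Python sketch of DoP-style packing for one channel. It assumes the layout described in the published DoP open standard as commonly summarized: each 24-bit PCM word carries 16 DSD bits in its lower 16 bits (oldest bit first), and the top 8 bits hold a marker that alternates between 0x05 and 0xFA. Treat those details as assumptions to check against the specification; the function names are hypothetical.

```python
DOP_MARKERS = (0x05, 0xFA)  # alternating marker bytes (assumption, per the DoP open standard)

def pack_dop(dsd_bits):
    """Pack one channel's DSD bits (0/1, oldest first) into 24-bit DoP words."""
    if len(dsd_bits) % 16:
        raise ValueError("need a multiple of 16 DSD bits")
    words = []
    for i in range(0, len(dsd_bits), 16):
        payload = 0
        for bit in dsd_bits[i:i + 16]:
            payload = (payload << 1) | (bit & 1)   # oldest bit ends up in the MSB
        marker = DOP_MARKERS[(i // 16) % 2]
        words.append((marker << 16) | payload)
    return words

def unpack_dop(words):
    """Check the markers and recover the original DSD bits unchanged."""
    bits = []
    for k, word in enumerate(words):
        if (word >> 16) & 0xFF != DOP_MARKERS[k % 2]:
            raise ValueError("marker mismatch: not a DoP stream")
        payload = word & 0xFFFF
        bits.extend((payload >> shift) & 1 for shift in range(15, -1, -1))
    return bits

bits = [1, 0] * 16                      # 32 DSD bits -> 2 DoP words
words = pack_dop(bits)
print([f"{w:06X}" for w in words])
print(unpack_dop(words) == bits)        # the audio bits survive the round trip

# Single-rate DSD therefore needs a PCM frame rate of 2,822,400 / 16 Hz.
print(2_822_400 // 16)
```

The marker byte is the overhead under discussion: 8 of every 24 bits identify the stream as DSD, while the 16 payload bits pass through untouched.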

Like in anything in life, there is always room for improvement and that is certainly true for DoP as well. But whether any improvement will also improve the sound quality is quite questionable.

One byproduct of DSD is unwanted ultrasonic noise. Can you talk about why we should or should not be concerned with this?
The real question here is whether this ultrasonic noise is unwanted or not. Our human hearing is a complex and very dynamic process. We often make the mistake of trying to describe it with a frequency response resulting from measurements with sine waves. We have to understand that this is only a very rudimentary approximation and doesn’t explain at all how the process works with dynamic signals such as music.

In order to even begin to understand the complexity of the hearing process, we have to go into the deeper psychology of it. The physical process of transforming sound pressures into nerve signals, as performed by the cochlea, is the easy part. What happens next in our brain is the least understood and therefore quite controversial part.

Without going further into details on this subject, I want to just point out that it has been shown that our hearing does not stop at 20kHz, but goes much beyond that for dynamic transient signals. The dynamic range above 20kHz gets gradually reduced, but never shows a sharp edge. Just like naturally occurring sounds that never show a sharp drop off either. Now look again at the graphs in Fig.2 and tell me which curves may be most similar to the human hearing thresholds. Wouldn’t you pick the DSD signals?

The ultrasonic noise of the DSD signal is, first of all, uncorrelated to the music signal. Our hearing algorithm is very good at “tuning out” uncorrelated signals – this is how 2 people can have a conversation in a noisy environment. Second, for the most part this noise is very low in amplitude, often even below the hearing threshold.

By counting on the mostly misunderstood capability of our hearing as a filter we can avoid certain algorithms in the way we design our technology and therefore avoid some pitfalls associated with certain algorithms. In that sense a flat PCM-like response in higher frequencies (with the potential of negative side effects) may be “overkill”, because our ear and associated psychology already perform the function of noise removal.

At the recent RMAF 2013, I heard a number of attendees asking about double rate DSD. What are the benefits of double rate DSD?
Double rate DSD pushes the noise shaper up in the frequency domain as shown in Fig. 2 above. That is most interesting for recording and post production when the intent is to release the product in DSD, because DSD2x gives the extra headroom that recording engineers need in order to record and edit without causing any degradation when releasing their final product in single rate DSD.

It may also be interesting for hobbyists who for instance want to archive their analog music library to a digital format. In such applications you may not care about the extra storage space that is required and you certainly wouldn’t be bothered with bandwidth bottlenecks when sending DSD2x files through the internet.

However, as a delivery format from studio to end user single rate DSD seems to offer an optimal combination of sound quality, bandwidth and storage space.
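
The storage and bandwidth trade-off can be put in numbers. The Python sketch below computes raw stereo audio data per minute for single and double rate DSD alongside a few PCM formats. It assumes two channels and ignores file-container overhead and any compression; the function names are illustrative only.

```python
def dsd_bytes_per_minute(rate_multiple, channels=2, base_hz=44_100):
    """Raw DSD payload per minute: 1 bit per sample per channel."""
    bits_per_second = rate_multiple * base_hz * channels
    return bits_per_second * 60 // 8

def pcm_bytes_per_minute(sample_rate_hz, bits, channels=2):
    """Raw PCM payload per minute."""
    return sample_rate_hz * bits * channels * 60 // 8

rows = [
    ("CD 44.1kHz/16", pcm_bytes_per_minute(44_100, 16)),
    ("PCM 96kHz/24", pcm_bytes_per_minute(96_000, 24)),
    ("PCM 192kHz/24", pcm_bytes_per_minute(192_000, 24)),
    ("DSD 64x", dsd_bytes_per_minute(64)),
    ("DSD 128x", dsd_bytes_per_minute(128)),
]

for name, size in rows:
    print(f"{name:14s} {size / 1e6:6.1f} MB per minute (stereo)")
```

Under these assumptions single rate DSD needs four times the raw data of CD, and double rate DSD twice that again.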

A number of people including myself and Stereophile's Stephen Mejias have commented on the sonic qualities of DSD playback. Stephen said it very well in his RMAF show report, "There’s an overall smoothness and effortlessness, combined with wonderfully natural and powerful dynamics." Is there some technical aspect of DSD that you could point to that accounts for DSD's superb dynamic capabilities?
We talked about that a little already above. I think it is DSD’s lack of any “sharp edges” in its characteristics that makes it sound superior. Wherever there are sharp edges in nature, funny things happen. That is true with sunlight hitting the corner of a house, with radar dish antennas, and also with audio encoded in PCM with a brick wall.

Some people appear to be reluctant to get involved with DSD because of their experience with SACD, essentially buying into a technology that was more or less abandoned by Sony. Is DSD different and if so, how?
Encoding formats generally don’t disappear, it is usually the physical delivery format that ages and then disappears. DSD is an encoding format and is no longer tied to a physical carrier.

The Playback Designs MPD-5 DAC can handle up to 6.1MHz DSD through USB. Why 6.1MHz DSD?
6.1MHz (128 x 48kHz = 6.144MHz) is the theoretical limit at which the input receiver accepts data in all PBD DACs. The actual D/A converter runs at an even higher frequency (built-in future-proofing).

I'm going to largely ignore the stuff about "proven" response to high-frequency signals/the idea that the human hearing very slowly rolls off. I'm spoilt for citations on highly reputable psychoacoustic authorities declaring that's hooey, but I don't think there'd be much point me starting on that one.

I'm also going to largely ignore the statements alleging that pre-ringing is a problem with modern PCM signals. It isn't in the slightest (especially not with modern oversampling technology), but again I don't think I'm going to get anywhere on this one either.

What I can't ignore is that monstrously misleading graph. On this point, regardless of your opinion on the sound of PCM or DSD, there is a simple, completely unambiguous factual error: the graph is drawing the PCM noise floor in completely the wrong place. -98dBFS is the RMS noise level across the entire signal: the actual noise floor, assuming standard triangular dither, is about 130dB down.

I think that perhaps Figure 2 may be a little hard to interpret. The purple dotted graph is for a 44.1 kHz sampling rate and 16 bits of resolution.

The Audio Precision plots in JA's article really do NOT show an SNR of 130+ dB. They show the noise power in much smaller noise bandwidths than the 20 kHz audio band, since the AP uses averaging and filtering to allow you to measure below the whole sampled bandwidth noise floor. The total power of the noise for the entire 22.05 kHz sampled bandwidth gives a much worse SNR. Not that this is necessarily the system noise limit.
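
The gap between a full-band SNR figure and the noise floor seen on a spectrum plot can be estimated with a short calculation. This Python sketch uses the textbook approximation of 6.02 x bits + 1.76 dB for full-band quantization SNR, optionally adds the roughly 4.8 dB noise penalty of TPDF dither, and spreads that noise evenly over the FFT bins. It ignores window effects and averaging, so the numbers are indicative only; the FFT sizes are arbitrary examples and the function names are hypothetical.

```python
import math

def full_band_noise_db(bits, tpdf_dither=False):
    """Total noise relative to a full-scale sine, over the whole sampled bandwidth."""
    level = -(6.02 * bits + 1.76)
    if tpdf_dither:
        level += 10 * math.log10(3)   # TPDF dither triples the noise power
    return level

def fft_bin_floor_db(total_noise_db, fft_size):
    """Per-bin level when white noise is spread evenly over fft_size / 2 bins."""
    return total_noise_db - 10 * math.log10(fft_size / 2)

total = full_band_noise_db(16)
total_dithered = full_band_noise_db(16, tpdf_dither=True)
print(f"16-bit full-band noise: {total:.1f} dB, with TPDF dither: {total_dithered:.1f} dB")

for fft_size in (4096, 32768):
    floor = fft_bin_floor_db(total_dithered, fft_size)
    print(f"FFT size {fft_size:6d}: per-bin noise floor about {floor:.1f} dB")
```

Every doubling of the FFT length lowers the plotted floor by about 3 dB without changing the total noise, which is why a single SNR number and a plotted noise floor cannot be compared directly.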

Pre-ringing is a function of the filter function. You can make filters with zero pre-ringing using the right algorithm.

One of the problems is that most studios and editors use software like Pro Tools that is optimized more for speed and functionality than for lack of pre-ringing. Once the ringing is there, well, it's there to stay. The existing DSD tools don't do this, since they are pretty limited in capability. Most DSD editing, if done, is performed by first converting the DSD signal to PCM - back to the same problem.

So, we're back to good recordings sounding good and not so good recordings sounding not so good. That's even before you run them through playback software and whatever filter is used in the DAC. Maybe the whole thing centers around the ringing going on.

I'm not saying that 16-bit audio has an SNR of 130dB: as you state, that would be wrong (SNR being 98dB). This does not mean, however, that the noise floor sits -98dB down, which can be trivially demonstrated by the capacity of 16-bit audio to resolve a -105dB tone well above the noise floor (as demonstrated below, admittedly with noise shaping to push the noise floor even lower where the ear is most sensitive at the cost of much higher noise further up the spectrum).

For the line representing DSD, they have done a proper noise floor measurement (as you say, one that allows you to 'see below' the SNR - horrible sentence, but you get my point!), and then drawn an approximation of that noise floor.

Compare it with the second image I've attached to this post, which is even more blatant (I think it originates in Sony's publicity materials for SACD, but don't quote me on that): DSD gets a proper FFT, whilst 24-bit PCM gets a line at its SNR.

The above figure is similar, but seemingly modified to make the misleading nature of the comparison less obvious (!).

I can just tell that this is the kind of discussion that will not end well. So, I'll avoid that rabbit hole.

For everyone else... You can read more about the reasons for the shape of the noise floors if you Google "noise shaping dither". There's about 47,000 (that's the number!) hits on the subject. At least a few will be worth reading.

This one is directed right at the subject. Skip over the obligatory conceptual math, and you'll still get the idea.

I'm not sure what point you're trying to make. Noise shaping can indeed push the noise floor even lower in much of the audio band, but even with normal TPDF dither, the noise floor is *still* not -98dB down. There are figures demonstrating this in your linked paper.

It's all very well to talk of avoiding rabbit holes rather than pointing out where I'm wrong (which is entirely possible, but I can link to other, more qualified individuals, saying essentially the same thing as I am).

From my understanding, record labels like 2L are recording in DXD format at a sampling rate of 352.8 kHz / 24 bit. With the exponential growth of high-speed internet and data transfer, it will be just a matter of time before such sampling rates are easy to store and crunch in new generations of recording studios and DAC systems.

There are some manufacturers over here in the Netherlands who claim that even a standard 44 kHz / 16 bit signal sent directly from PC to DAC via the shortest signal path, with NO conversion via USB (using an I2S-I2S connection instead), sounds much better than DSD recordings...

When performing on-the-fly upsampling tricks, converting a standard 44 kHz / 16 bit .wav file to a 192 kHz / 32 bit .wav file and streaming it, the sound quality was enhanced even further. I was also very much surprised that this trick indeed revealed a significant improvement in my system as well, which more or less proved that there is still much to improve in digital-to-analog conversion. The effect was almost as dramatic as with HDtracks remastered 192/24 albums...

I am aware that much will depend on the type of software, PC settings, USB cable, etc., and for this reason it seems logical to me that the shorter the signal path and the less hardware, conversion, and computer involvement there is, the better the sound reproduction from any digital source or format will be.

But whether this will be of interest to the audio industry is the question. On the other hand, if data storage, memory, and fast algorithms can easily be compressed onto a chip, it will be just a matter of time before such an idea is launched into the market by small audio pioneers from more exotic countries like China, Romania, or elsewhere.