Friday, September 23, 2016

More on DSD vs PCM

Here is the most helpful and informative page I have ever read about DSD, by Charles Hansen of Ayre Acoustics. To summarize his points: DSD ("Direct Stream Digital") is simply a meaningless trademark term which Sony has in this case defined as 1-bit delta sigma modulation at 2.8224 MHz and 7th order noise shaping. They own the trademark so they can say it means anything they like. Any time you deviate from 1-bit, as is essential for any kind of mixing or mastering or even level setting, you are forced out of the delta sigma domain into the PCM domain. The Sonoma so-called DSD workstation is really a PCM workstation that happens to operate at 2.8224 MHz with 8 bit data. DSD and PCM are interpreted by the same delta sigma DACs, just with different digital filter algorithms. The difference in filters explains everything people hear--it has to, because there are no other differences. Any superiority comes from the loss of the need for brick wall filters in high speed systems. Now that we have 4x PCM, we don't need brick wall filters in PCM any more either, so we can achieve the same benefits with PCM, which is far easier to work with, but few have ventured into this new landscape yet (except Ayre of course, who has a QA-9 A/D converter with no brick wall filter; instead it uses a "moving average" filter which has no time smear or ringing). The only "pure" DSD recordings are all analog then converted to DSD, or live performances. And there are just a very small number of those. [And btw, Charles Hansen is the greatest!!! After reading this hugely informative yet no-nonsense post, I'm a fan.]
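To make the "no ringing" point a little more concrete, here is a minimal Python sketch comparing the impulse response of a plain moving-average filter with that of a steep windowed-sinc low-pass. This is only a generic illustration and not Ayre's actual QA-9 filter; the function names, tap counts and cutoff are my own assumptions, and "ringing" is crudely defined here as taps that swing negative.

```python
import math

def moving_average_taps(width):
    """Impulse response of a moving-average FIR: 'width' equal taps."""
    return [1.0 / width] * width

def windowed_sinc_taps(cutoff, half_length):
    """Impulse response of a steep FIR low-pass (Hann-windowed sinc).

    cutoff is a fraction of the sample rate (0 < cutoff < 0.5).
    """
    taps = []
    for n in range(-half_length, half_length + 1):
        x = 2.0 * cutoff * n
        sinc = 1.0 if n == 0 else math.sin(math.pi * x) / (math.pi * x)
        hann = 0.5 * (1.0 + math.cos(math.pi * n / (half_length + 1)))
        taps.append(2.0 * cutoff * sinc * hann)
    return taps

def has_ringing(taps):
    """Crude test (assumption): ringing means some taps swing negative."""
    return min(taps) < 0.0

average = moving_average_taps(8)       # hypothetical 8-tap averager
steep = windowed_sinc_taps(0.25, 15)   # hypothetical 31-tap steep low-pass
print("moving average rings:", has_ringing(average))
print("windowed sinc rings: ", has_ringing(steep))
```

The moving average is a flat rectangle in time, so it has no oscillating lobes, while the steep filter does. The sketch says nothing about stopband rejection, which is where the moving average gives ground.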

Lotsa people think DSD and HighRes PCM are pretty equivalent. I think that's a reasonable view, though one I don't exactly agree with (I still favor PCM), with the equivalence being that DSD is about the same as 88.2/20 or 96/20.

Of course, DSD fanboys have always claimed that DSD is some special magic, that NO PCM could equal. (Don't tell them about the feedback that delta sigma systems rely on. That might collapse the magic.)

Many if not most think 44.1kHz/16 bit "perfect sound forever" is still perfectly fine, and I know a number of people who think CD quality PCM is superior to DSD, especially in the highs. A common thread is that the highs in DSD sound fake, for which there is a tiny bit of technical justification (more noise and noise shaping is going on; meanwhile there is less brick-wall filtering, so you could also take this the other way).

Famously, much SACD and DSD content is made from PCM sources, which defeats one of the long running claims (which probably most serious audio engineers would regard as hype) that DSD bypasses several phases of processing used in PCM.

Although Sony unleashed DSD onto the world, it supported the format poorly according to many industry insiders. AFAIK, and in contrast to many earlier format releases, Sony did not sell any DSD mastering equipment. Instead, they gave it away to specific "partners." If you were not one of the chosen handful, you were out of luck: you would have to do your mastering in high rez PCM and convert to DSD. Even the equipment Sony gave away may not have been fully featured. Apparently Sony designed a fully featured DSD mixing system with a European partner, then never actually bothered to make it. It's not impossible to make such a thing, and I believe that there are now, 18 years after launch, full DSD mixing and EQ system(s) available from companies other than Sony.

Speaking of how DSD allegedly bypasses the decimation and integration phases (the hype which some believe as the magic of DSD that makes it inherently better), there are a bunch of problems with the argument (in addition to the one that PCM processing is nearly always used anyway). Even if you had pure DSD mastering and playback (almost never the case) the claim would be inaccurate because:

1) First, it assumes that 1-bit DAC's are being used at the DSD sampling rate. This is almost never the case anymore. Almost all DAC's used for DSD now are multi-bit Delta Sigma DAC's. It's still considered DSD if you use multi-bit Delta Sigma DAC's at the DSD frequency or higher, which requires a lot of complex mathematics to do optimally.

The last 1-bit DAC's were used in devices like my 2001 DVP-9000ES. Those were Sony DACs which actually operated at 70 MHz if I understand correctly, which would be roughly 25x the DSD rate. Sony was doing some kind of extreme upsampling to increase dynamic range. So it was never as simple as the cute block diagram Sony used to make DSD look simpler.

(Interestingly enough, it does not appear that the spec sheets for the Sony converter chips used in DVP-9000ES and back to CDP-707ES have ever been made public. But Sony did advertise these as 70 MHz 1-bit converters. I wonder if Sony made these at the long closed Sony Semiconductor factory in San Antonio, Texas. Sony subsequently found it cheaper to buy off the shelf multibit sigma delta converters from the likes of Burr Brown. Cynics

2) Second, it assumes that 1-bit Sigma Delta ADC's are used. I haven't found much discussion about this, but I believe that in the early days of digital audio, sigma delta ADC's were considered too noisy. Noise shaping is required when you use a sigma delta ADC. So is very high oversampling. I believe some if not all of the earliest ADC's were actually SAR (successive approximation), which is one of the most widely used approaches for analog to digital conversion. Even now when Sigma Delta ADC's are used, they are used with multi-bit converters and high oversampling.

Even if Delta Sigma ADC's are used, there's a lot more going on than you might think. Quoting from the above linked article:

These are usually very-high-order sigma-delta modulators (for example, 4th-order or higher), incorporating a multibit ADC and multibit feedback DAC.

Sigma Delta systems are inherently approximate (aka noisy) systems which almost always require feedback to operate correctly. This is something NEVER mentioned. This is one reason why I've personally moved back to PCM as much as possible. PCM does not require feedback to work correctly. The dirty word "feedback" would destroy the claimed "magic" of simplicity.
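For anyone who hasn't seen the feedback loop spelled out, here is a minimal sketch of a first-order 1-bit sigma delta modulator in Python. It is a textbook toy under my own naming (sigma_delta_1bit is a made-up function), nothing like the 7th order modulators used for real DSD, but it shows where the feedback sits: each output bit is subtracted from the next input before integration.

```python
def sigma_delta_1bit(samples):
    """Toy first-order 1-bit sigma delta modulator.

    samples: input values expected to lie within -1.0 .. +1.0.
    Returns a list of +1 / -1 output bits.
    """
    integrator = 0.0
    previous_bit = 0.0  # the fed-back output; nothing has been emitted yet
    bits = []
    for x in samples:
        # Feedback: integrate the error between input and last output bit.
        integrator += x - previous_bit
        # 1-bit quantizer: only the sign of the integrator survives.
        bit = 1 if integrator >= 0.0 else -1
        bits.append(bit)
        previous_bit = bit
    return bits

# A constant input of 0.5 cannot be represented by any single +1/-1 bit,
# but the running average of the bit stream converges toward it.
dc_bits = sigma_delta_1bit([0.5] * 10000)
dc_average = sum(dc_bits) / len(dc_bits)
print(dc_bits[:8], round(dc_average, 3))
```

Remove the subtraction of previous_bit and the output is just the sign of a runaway sum, which is useless. The feedback is what makes the average of the bits track the input.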

Of course it is also because of feedback that delta sigma systems can get near perfect linearity without requiring extensive trimming the way PCM systems do. Either you do the fine tuning beforehand (trimming), when you can only guess, or you do it after the fact (feedback), which can always be perfect.

Now this also is probably a non-issue. While the feedback used in Delta Sigma would smear the highs, Delta Sigma ADC's and DAC's generally operate at such high frequencies that high frequency information might even be better preserved as compared with slower PCM systems. It's actually quite hard to know without extensive analysis and/or testing which system preserves the high frequency integrity better.

However, one can also just look at the measured performance. DSD does quite well compared to 16 bit systems in the midrange, but has much more noise in the upper octave 10-20kHz. That greater high frequency noise means that by definition high frequency information is NOT being preserved as well. OTOH, there is ultimately response to an even higher frequency, and there may be less phase shift in the upper audible octave. So it looks like a toss up.

Listening Tests

The best published investigation of audible differences between PCM and DSD was done in Germany using some of the very best megabuck PCM and DSD equipment. (IIRC the PCM was either 88.2kHz/20 or 96kHz/20, so as to have comparable bandwidth and bit depth.) Monitoring was done with Stax headphones (you can't get more transparent than that). And the result was: there is no audible difference! Not only was the null hypothesis not rejected but most identification was no better than random for nearly all people.

I believe this is basically correct. DSD is simply an inefficient high resolution system which takes more bits to achieve 88.2kHz/20bit fidelity than PCM does, and PCM is more easily worked with in many ways, including incrementally increasing fidelity with just a few more bits. The very idea of 2xDSD and 8xDSD is monstrous--a monstrous waste of bits.
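The "more bits" claim is simple arithmetic, sketched below in Python. The rates are per channel and uncompressed; the helper names are my own, and DSD64/128/512 follow the usual convention of 64x, 128x and 512x the 44.1kHz CD rate at 1 bit per sample.

```python
def pcm_bits_per_second(sample_rate_hz, bit_depth):
    """Raw PCM data rate for one channel."""
    return sample_rate_hz * bit_depth

def dsd_bits_per_second(multiple_of_cd_rate):
    """Raw DSD data rate for one channel: 1 bit per sample."""
    return 44_100 * multiple_of_cd_rate

rates = {
    "PCM 44.1/16": pcm_bits_per_second(44_100, 16),
    "PCM 88.2/20": pcm_bits_per_second(88_200, 20),
    "PCM 96/20": pcm_bits_per_second(96_000, 20),
    "PCM 96/24": pcm_bits_per_second(96_000, 24),
    "PCM 192/24": pcm_bits_per_second(192_000, 24),
    "DSD64": dsd_bits_per_second(64),
    "DSD128 (2xDSD)": dsd_bits_per_second(128),
    "DSD512 (8xDSD)": dsd_bits_per_second(512),
}

for name, rate in rates.items():
    print(f"{name:16s} {rate / 1e6:7.4f} Mbit/s")

# How many times more bits DSD64 spends than 88.2/20 PCM.
overhead = rates["DSD64"] / rates["PCM 88.2/20"]
print(f"DSD64 vs 88.2/20: {overhead:.2f}x")
```

By this count DSD64 spends 1.6 times the bits of 88.2/20, and DSD128 already exceeds uncompressed 192/24.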

I've argued that DSD operates a bit as if it has an infinitely varying digital filter. Varying the digital filters in 44.1/16 can make a slight audible difference (or larger if you throw out the book with NOS, which is not high fidelity IMO). Once you get to modern apodizing reconstruction filters using ordinary PCM, it's not clear from published research that better can be achieved or is necessary, but an end-to-end apodizing system like MQA promises to be would be a step better. That is, a step better than DSD-in-principle.

To DSD or not to DSD

If only Sony had marketed DSD on a fairly straightforward technical basis, I might have signed on in 1999 and never looked back. Forget the simplicity crapola; the real technical advantage of DSD compared to plain vanilla PCM is the superb impulse response.

[Update: after the nth revision of this post, I discovered that Charles Hansen had already debunked the above graph in great detail. It's a pack of lies from beginning to end! It's no wonder that Sony didn't plaster this on everything; more people might have called them out. This is not to say that you couldn't come up with a relatively more honest graph to make the point that DSD has better impulse response than the usual 44.1kHz with brick wall filtering, but in that case there would be competing hirez PCM systems that could do as well. As the graph is drawn, no real system can produce those results at all. BTW, I'm now a little bit concerned that the MQA impulse response graph shown in TAS is also inaccurate, though in showing more rather than less time smearing with standard PCM.]

Now, PCM defenders will argue, and they have got to be at least mostly correct, that this difference, which is caused by high frequency phase response in the anti-aliasing and reconstruction filters, is not audible. But it sure looks like it would be important.

Only if you take the worst case for PCM noise, 24/192, and then combine it with No Oversampling (NOS), which as I always argue isn't really high fidelity or standard PCM, do you get the noise to rise a bit closer to DSD. But the DSD noise is still higher. Archimago doesn't show the NOS and DSD noise spectra on the same graph, or even the same page, but I can compare them and they are still quite different. At 40kHz, 24/192 with NOS reaches -113dB. Meanwhile, DSD64 has reached -100dB, which is 13dB worse.
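To put that 13dB gap in linear terms, here is a tiny Python sketch. The two levels are simply the figures quoted above, as read off Archimago's graphs; the variable names are mine.

```python
# Noise levels at 40kHz as quoted in the text (read off published graphs).
pcm_nos_noise_db = -113.0   # 24/192 PCM with NOS
dsd64_noise_db = -100.0     # DSD64

difference_db = dsd64_noise_db - pcm_nos_noise_db

# Standard decibel conversions: 20*log10 for amplitude, 10*log10 for power.
amplitude_ratio = 10 ** (difference_db / 20)
power_ratio = 10 ** (difference_db / 10)

print(f"DSD64 noise is {difference_db:.0f} dB higher")
print(f"about {amplitude_ratio:.1f}x in amplitude, {power_ratio:.0f}x in power")
```

So "13dB worse" means roughly 4.5 times the noise voltage, or about 20 times the noise power, at that frequency.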

I still wouldn't say "they are the same below 45kHz" but close. But I'm not even sure why we are doing THIS comparison. Well, Hiro also mentions that DSD64 can very simply be upsampled to DSD128. Now here we have an interesting case, however. Upsampling will push the digital noise upwards. But it seems to me very much unlike "noise shaping" in one critical way. As a purely 1-way process, upsampling cannot possibly restore lost information. The information loss from the original DSD64 encoding cannot be undone. So while the noise will be reduced, the lost information cannot be restored, and I'd predict a kind of dark sound, the same thing you get in clunkier fashion with noise gating.

Meanwhile, I would have been (and was) rightfully turned off by a large number of things about DSD right from the start:

1) DSD recorders have been almost unobtainable (there were no consumer DSD recorders until 2007 or so, right now one is available for $999).
2) SACD discs are impossible for most people to make, they require a manufactured watermark (some old machines will accept a fake DVD/SACD, and the newest ones will read DSD files).
3) DSD does not lend itself to simple DSP for crossover and room correction functions--so conversion to and from PCM is required anyway, so the best approach is high rez PCM end-to-end.

I'm less bothered by (3) than I was years ago, for an interesting reason. The reason is that conversion to and from PCM is extremely transparent. It's so transparent that I find I often prefer taking the analog outputs of digital devices and resampling to digital at 24/96 than just letting the 44.1/16 pass through all the way. So, if I'm fine with resampling analog to PCM, why not DSD to PCM, or even DSD to Analog to PCM? I see now I can fit DSD into my system as a perfectly fine music delivery system, though not as a final digital conversion approach.

Of course, as many have pointed out, SACD was an attempt to impose DRM on an industry. If Sony could have led everyone to abandon PCM, we'd be locked into their new system with a DRM system that has not even yet been broken. Of course, we know in retrospect this was never going to happen.

But from the beginning, there was no consumer recording of the new formats, and, very curiously, the first generation SACD machines had problems dealing with user-recorded media that had already become well established by that time. As if to send a message to the industry. Well it was too late.

Now I just said that PCM conversion is very transparent, as was demonstrated by the Meyer/Moran experiments in 2006: up to 10 levels of PCM conversion/deconversion were still found to be audibly transparent. Very little noise is added; however, there is an increasing amount of high frequency phase shift. This doesn't look good on photos but has never been proven to be audible.

Meanwhile, DSD is not amenable to multiple generations because of high frequency noise that keeps on growing until you get overloading in the highs.

However, DSD128 is looking like it might have reasonably low noise levels in the near supersonic, and still of course give you the natural (noise shaping aka feedback driven) impulse response. DSD64 is so noisy you can easily see the HF noise on high bandwidth oscilloscope traces of sine waves, as Archimago shows. DSD128 looks just like analog on the scope.

I'm not sure we've seen the end of this, since now filter designers are showing how perfect impulse response can be obtained with PCM and slightly higher sampling and end-to-end mathematical apodizing. This retains the advantages of PCM in relative compactness and mathematical tractability--it can easily be worked with in DSP.

DSD confounds mathematics not because of sigma delta itself--that's the trivial part that had me fooled for the longest time. Equally fundamental to DSD is Noise Shaping, based on continuous high level feedback. This means, in effect, each pulse is NOT equal. Each pulse is in the context of everything before and after it, which actually determines what it means. This context dependence makes the mathematics infinite. You can't just "add things up" to make a mixer, etc.
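Here is a small Python sketch of why a mixer can't just "add things up." It illustrates only the narrow, uncontroversial part of the argument: summing two 1-bit streams produces a three-level signal, and getting back to 1-bit needs a feedback modulator whose output bits depend on what came before. The two bit patterns are made up, and the re-modulator is a toy first-order one, not a real DSD modulator.

```python
def mix(stream_a, stream_b, gain=0.5):
    """Sample-by-sample sum of two streams, scaled to stay within -1..+1."""
    return [gain * (a + b) for a, b in zip(stream_a, stream_b)]

def remodulate_1bit(samples):
    """Toy first-order sigma delta re-quantizer back to +1/-1 bits."""
    integrator = 0.0
    previous_bit = 0.0
    bits = []
    for x in samples:
        integrator += x - previous_bit   # feedback of the last output bit
        bit = 1 if integrator >= 0.0 else -1
        bits.append(bit)
        previous_bit = bit
    return bits

# Two arbitrary (made-up) 1-bit streams.
stream_a = [1, -1, 1, 1, -1, 1, -1, -1]
stream_b = [1, 1, -1, 1, -1, -1, 1, -1]

mixed = mix(stream_a, stream_b)
levels = sorted(set(mixed))          # the mix is no longer 1-bit
remodulated = remodulate_1bit(mixed)

# Collect the output bits produced wherever the mixed input was exactly 0.
bits_for_zero_input = {b for x, b in zip(mixed, remodulated) if x == 0.0}

print("levels after mixing:", levels)
print("back to 1-bit:", remodulated)
print("bits emitted for a zero input:", sorted(bits_for_zero_input))
```

The same input value of zero comes out as +1 in one place and -1 in another, depending on the integrator's history. In that sense each pulse only means something in context.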

Meanwhile, PCM is reborn every few years with some interesting innovations, though I consider apodizing important but little else. IMO, by the time we get to CD players like the Pioneer PD-75 around 1991 we're in the modern era of high sound quality, thanks to high linearity, low jitter and high stability, and flat but closer to linear phase digital filters: plain old 44.1/16 bits done fairly well is incredibly good! For the longest time, the best published research in JAES was that it was perfectly transparent. It may never have been perfectly transparent, but it's obviously quite close. It has only barely been established in the AES literature that it isn't perfectly transparent: significantly improved apodizing can be slightly audibly better, as demonstrated in DBT (published by Meridian). Nothing comparable has been scientifically established for DSD; in fact the reverse has been demonstrated in the most recent and well done experiment--it is indistinguishable from comparable PCM. Most talk to the contrary has not been well founded.

DSD stays alive simply by slowly making what used to be impossible less so. And I'm happy to play with it as I can without huge expense. I will never have full DSD end-to-end because that would require me to give up DSP. But I can accept DSD inputs, converted through analog resampling to 96/24.

Which, in a way, is not surprising. Recall that DSD was originally invented not as a mixing or mastering format, but as an archival format. Now I'm not sure it's especially good at that either, because of noise, but for an archival format there isn't much concern regarding mixing and mastering, or even distribution and playback. Also, DSD64 was invented specifically for the mastering of 44.1 and 48kHz, the two popular rates of the time, but not for the high rez PCM formats of today.

Archimago does usefully propose combining DSD128 with lossless compression. If it can indeed be compressed to the same size as 192/24, perhaps it's not that bad. But we have no reason to believe this complexity is needed. As far as we know now, 24/96 PCM is as high definition as is needed, and it is far easier to work with than DSD128.