Raising The Standards

This article is almost entirely unlike
the original version published in the July 1998 edition of AudioMedia UK, which
is available as a PDF. The present article
is shorter and contains late-breaking news, but the original, while longer,
has more laughs. Take your pick!

You may have thought that the format
for the next High Quality Audio Disc (HQAD) was already settled. Surely
it’s DVD-Audio, isn’t it? That means 96kHz sampling and 24-bit word length; 5.1
surround and stereo audio; and Meridian Lossless Packing (MLP) for inaudible
data compression (now
officially endorsed by DVD’s WG-4) which also allows for innovative surround
schemes such as Ambisonic B-Format. But as you may have heard relatively recently,
there is now another contender in the shape of Sony and Philips’ Super-Audio
CD (SACD for short).

Standards

DVD-Audio and SACD are fundamentally
different. (We can discount the inclusion, for political reasons, of SACD-like
technology as an option in the current draft of the DVD-Audio spec. If you make
a standard too wide, manufacturers will only implement the bits that suit them
– and in the case of DVD-Audio, this would probably be PCM.) So there are currently
two separate proposals on the table.

It’s time we looked at what we actually
need from a HQAD, as well as trying to guess what we are likely to get.
Which of these two techniques gets closest to what we desire, and what are their
pros and cons?

We rightfully want to make records
with the highest quality gear available, preferably gear capable of higher
quality than consumer equipment. This is why there is a quest for
ever-higher sampling rates and longer word lengths. In my personal opinion,
perceptible improvements in audio quality due to increased sampling rates for
conventional PCM (i.e. not DSD) tail off above about 60kHz – but 88.2 and
96 are nice multiples of current practice, so why not? 24 bits is an improvement
over 20, which is a significant step forward over 16-bit. Of course, few of
us use all that dynamic range apart from during fades and in reverb tails.

48 vs 96

We can probably hear the difference
between 48 and 96kHz sampling in a quiet, modern studio, but it is difficult
to say whether record buyers can. Even if they have home theater systems with
surround capability, there are plenty of people around who will tell you that
they’re happy with 20/48 (the capability of most DVD-Video players). This includes
some noted producers, who feel that 20/48 DVDs with surround – and even surround-encoded
CDs – are entirely adequate. 20/48, they believe, satisfies the quality requirements
of the material, and the missing link is not higher digital resolution but surround-sound,
which we can do with current systems (though I would argue not very well --
see Ambisonics in the Age of DVD, AudioMedia
April 1998). Others actively dislike 96kHz, because on current converters they
can hear problems like jitter. That doesn’t mean to say that the same people
won’t use higher resolution systems in the studio, of course.

And are 24/96 converters real anyway?
Yes, but they are difficult to do well. You may be able to get 24 or so bits
to wiggle 96,000 times per second, but that doesn’t mean that the data itself
carries any additional real information. Clock jitter is more difficult to deal
with, for example, and noise levels in the analog stages -- more than the digital
circuitry -- define the actual achievable dynamic range as much worse than the
theoretical 144 dB. But compare converters and see what you think.
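
For reference, the theoretical figure follows from the usual rule of thumb of roughly 6 dB per bit. The short Python sketch below just does that arithmetic; the function name is mine, and it says nothing about what a real converter achieves once analog noise is taken into account.

```python
import math

def theoretical_dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, expressed in dB."""
    return 20 * math.log10(2 ** bits)

# Compare the word lengths discussed in the text.
for bits in (16, 20, 24):
    print(f"{bits}-bit: {theoretical_dynamic_range_db(bits):.1f} dB")
```

Each extra bit buys about 6 dB on paper, which is where the 144 dB figure for 24 bits comes from.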

But converters with higher sampling
rates can still sound better. Placing the anti-imaging and anti-aliasing filters
at a sufficiently high frequency that they can’t do any audible damage to the
sound is one obvious reason, and the other I’ve never seen in print before,
so it may be rubbish (but here goes anyway).

Why Record Ultrasonics?

As is widely recognized, we can’t
hear much above 18kHz, but that does not mean that there isn’t anything up
there that we need to record – and here’s the second reason for higher sampling
rates. Plenty of acoustic instruments produce usable output up to around the
30 kHz mark – something that would be picked up in some form by a decent 30
in/s half-inch analogue recording. A string section, for example, could well
produce some significant ultrasonic energy. Arguably, the ultrasonic content
of all those instruments blends together to produce audible beat frequencies
which contribute to the overall timbre of the sound. If you record your string
section at a safe distance with a Soundfield mic, for example, all those interactions
will have taken place in the air before your microphones ever capture the sound.
You can record such a signal at 44.1kHz sampling and never worry about losing
anything – as long as your filters are decent and you have enough bits.

If, however, your idea of recording
a string section is with a couple of 48-track digital machines, a mic on each
desk feeding its own track so that you can mix it all later, you are doomed.
Your close-mic technique does not pick up any interactions, so the only place
they can happen is when you mix it – by which time the ultrasonic stuff has
all been knocked off by your 48 kHz multitrack machines, so that will never
happen. So if I were to be uncharitable, I could say that high sampling rates
allow you to use bad mic technique with better results.

Pick A Number

Having established that higher sampling
rates are a good idea, there is a question as to what the sample rate should
actually be in a studio environment. On the face of it, 96kHz takes care of
capturing any audio that might ever happen, and 24 bits offer quite enough quantization
steps. Is that enough?

Yes, in theory. But there are some
potential problems, real or imaginary, with having a production environment that
has the same resolution as the consumer distribution format. Think of it as
a kind of "headroom". We need higher resolution in the studio than
consumers have, so that we start with a higher level of quality in case some gets lost
on the way, which might well happen. And what happens when you modify a digital
signal in the digital domain, say by EQing it? You create more bits. You ought
to have spare bits so you have room to work. You can always lose resolution,
but you can’t easily get it back again.
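
To see why processing "creates more bits", here is a minimal Python sketch of a single gain change in integer arithmetic. The sample value, the gain of 0.7 and the Q1.23 fixed-point coefficient format are all assumptions chosen for illustration, not a description of any particular console or workstation.

```python
COEFF_FRAC_BITS = 23  # assumed fixed-point format for the gain coefficient

def signed_bits_needed(value: int) -> int:
    """Minimum two's-complement word length that can hold `value`."""
    magnitude = value if value >= 0 else -value - 1
    return magnitude.bit_length() + 1  # +1 for the sign bit

sample = 5_000_001                         # arbitrary value that fits in 24 bits
gain = round(0.7 * 2 ** COEFF_FRAC_BITS)   # gain of 0.7 as a fixed-point integer

product = sample * gain                    # exact result, before any rounding
truncated = product >> COEFF_FRAC_BITS     # squeezed back into the original word

print("bits in sample: ", signed_bits_needed(sample))
print("bits in product:", signed_bits_needed(product))
print("information lost on truncation:",
      (truncated << COEFF_FRAC_BITS) != product)
```

The exact product needs far more than 24 bits, so forcing it back into a 24-bit word throws low-order information away. A longer internal word length simply postpones that loss until the final stage.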

There’s another way of looking at
it, which will be familiar to engineers and producers who recall the way things
were in the Seventies. With recording facilities of the time, you could make
an album which sounded fine to you, and probably to most people. But if you
ever heard an audiophile’s playback system sucking every last ounce out of the
vinyl, you’d hear not only what you recorded, but what you didn’t know
you’d recorded – guitar amps humming, someone tapping their foot, and weird
breaths at a drop-in point.

With the arrival of Compact Disc,
everyone suddenly had the equivalent of an audiophile system, even if
it cost far, far less. Suddenly, everyone could hear all the things you had
recorded but over which you had no control.

I was only partly joking when I remarked
some years ago that, quite frankly, listeners at home should degrade the replay
quality of their gear to match the industrial audio setting of the studio. That
way, they could hear our records the way that we heard them when we all
agreed that Take 146 was the master. Do you want the closest approach to the
original sound or not? What do listeners think the "original sound"
is, anyway? Does it include things you couldn’t hear?

This, it seems to me, is another
reason to have a production environment that has a higher intrinsic resolution
than the consumer distribution medium: listeners are then a whole lot less likely
to get more information out of your recording than you knowingly put
in. We simply can’t afford to have people recovering undefined sonic experiences
from our albums, enjoying things that we had never known were there and would
have removed if we had.

So even if the consumer format is
based on 24/96 PCM, you may well feel that you need something even more exotic
to make records with. Presumably this will include a sensibly-designed surround
control-room where you can hear it all properly – but talking about that
would be to broach a topic that would make Pandora’s Box seem as innocuous as
a pack of Oreos. Let’s go there another time.

DSD and SACD

Unfortunately, today it isn’t even
as simple as asking whether or not we should upgrade to 24/96 or even 24/192,
because with Super Audio CD and Direct Stream Digital there is another,
fundamentally different, option.

DSD, with its "1-bit" bitstream
approach, also features lossless compression, an idea whose time has evidently,
and thankfully, come. SACD is the distribution medium for this system: a disc
made with DVD-like technology but containing a high-density layer with 5.1 surround
and stereo areas, plus a Red-Book-compatible layer which can be read by a regular
CD player.

At the recent Hi-Fi 98 show in Los
Angeles, I had the opportunity to listen to demonstrations of Super-Audio CD,
and to hear the originators talk about it. At a Sony/Philips demo, we heard
excellent stereo recordings by Michael Bishop of Telarc, and surround recordings
by Philips engineers. We also heard a real Super Audio CD played on a real SACD
player – and on a boom-box, showing that the dual layer SACD/Red Book construction
really does work.

I asked if there were plans for a
higher sample rate than 2.8224 MHz for production applications – perhaps one
that would divide down nicely to all the likely PCM sample rates we might need
in the marketplace, bearing in mind that non-integral sample-rate conversion
is not easy to do without it sounding nasty. The answer I got was that there
apparently wasn’t a decision on that point, but 7.056 MHz – Fs x 160 – "would
be logical".
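
The arithmetic behind "dividing down nicely" is easy to check. The Python sketch below assumes Fs means the 44.1 kHz CD rate and that the current DSD rate is 64 x Fs; the helper names are mine.

```python
from math import gcd

CD_RATE = 44_100           # Red Book sampling rate in Hz (assumed meaning of Fs)
DSD_RATE = 64 * CD_RATE    # 2,822,400 Hz
PROPOSED = 160 * CD_RATE   # 7,056,000 Hz, the "Fs x 160" figure

def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

def integer_ratio(master: int, target: int):
    """Return master/target if it is a whole number, else None."""
    quotient, remainder = divmod(master, target)
    return quotient if remainder == 0 else None

for master in (DSD_RATE, PROPOSED):
    for pcm in (44_100, 48_000, 88_200, 96_000):
        print(f"{master:>9} Hz -> {pcm:>6} Hz: divide by {integer_ratio(master, pcm)}")

print("LCM of 44.1 kHz and 48 kHz:", lcm(44_100, 48_000))
```

On these assumptions, 7.056 MHz is the lowest rate that divides exactly to both 44.1 and 48 kHz (by 160 and 147 respectively), whereas 64 x Fs divides exactly only to the 44.1 kHz family. Neither rate divides exactly to 96 kHz.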

Even though I was never conscious
of the 100kHz tweeters (part of the DSD technology, apparently), the sound was
excellent. Gus Skinas of Sony kindly took me around the equipment room afterwards
where Sony stereo hard disk recorders (interfaced with SDIF-II) and Philips
multitrack HD recorders (interfaced with ST Optical cable) were arrayed.

In another room on the same floor,
Marantz (owned by Philips) was demonstrating SACD versus 24/96. And whilst bearing
in mind that the listening environment was a typically nasty hotel room, my
colleagues, quite honestly, couldn’t hear the difference. Hmmm.

DSD Converters

DSD is what you might call a "scalable"
technology. The D/A is simply a low-pass filter, which you can implement as
cleverly as you can afford. You can do simple, cheap, OK-sounding LPFs relatively
easily (for an SACD ‘WalkPerson’ for example), while with a significantly higher
degree of effort, truly high levels of quality are possible. The same recording
– and the same physical disc – could satisfy the jogger, the in-car listener
and the audiophile.

The sigma-delta style of A/D conversion
used in the vast majority of PCM converters is still employed – you simply record
the 1-bit stream directly instead of decimating it. And some existing A/D chips
already have an output which can be used to derive a DSD stream.
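
For readers who have not met a 1-bit stream before, here is a toy illustration in Python. It uses a first-order sigma-delta modulator and a plain moving average as the "low-pass filter"; both are simplifying assumptions of mine, far cruder than the noise-shaping and filtering in any real converter, and the function names are hypothetical.

```python
import math

def sigma_delta_1bit(samples):
    """First-order sigma-delta modulator: floats in [-1, 1] -> list of +1.0/-1.0."""
    integrator = 0.0
    previous_output = 0.0
    bits = []
    for x in samples:
        # Accumulate the error between the input and the last 1-bit output.
        integrator += x - previous_output
        previous_output = 1.0 if integrator >= 0.0 else -1.0
        bits.append(previous_output)
    return bits

def moving_average(values, window):
    """Crude low-pass filter standing in for the D/A reconstruction filter."""
    out = []
    running_sum = 0.0
    for i, v in enumerate(values):
        running_sum += v
        if i >= window:
            running_sum -= values[i - window]
        out.append(running_sum / window)
    return out

RATE = 64 * 44_100            # assumed 1-bit sampling rate in Hz
TONE_HZ = 1_000
n = RATE // TONE_HZ           # roughly one cycle of the test tone
signal = [0.5 * math.sin(2 * math.pi * TONE_HZ * i / RATE) for i in range(n)]

bitstream = sigma_delta_1bit(signal)
recovered = moving_average(bitstream, window=64)

print("first 32 bits:", "".join("1" if b > 0 else "0" for b in bitstream[:32]))
print("input vs recovered at the positive peak:",
      round(signal[n // 4], 3), round(recovered[n // 4], 3))
```

The density of ones in the stream follows the input level, which is why averaging (that is, low-pass filtering) the bits gets you back to something like the original waveform. A better filter gives a better result from the same bits.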

And assuming you don’t simply want
to convert your recording to DSD at the mastering stage – which many might see
as rather missing the point of using the technology at all – then you need to
replace virtually every piece of digital equipment in your studio. Ooops.

Luckily, you don’t need a whole lot
for high-quality classical recording: multichannel DSD converters to capture
a surround signal, a recorder to store the output of the converters, and an editing
system to put it all together. While you’re at it, put a couple of extra
mics up and record a stereo version at 44.1 for the Red Book layer (or the CD
version if you’re still doing them). All these products already exist, and generating
sonically-decent 44.1 PCM from DSD at 2.8224MHz is not too daunting.

When it comes to multitrack recording
and mixing, however, more gear is involved, and DSD begins to get a bit scary.
If PCM-based DVD-Audio becomes the standard, all we do is to upgrade our studio
gear until we reach and possibly exceed (depending on the headroom we would
like to have) 24/96 performance. It’s technology we know, and it’s an evolutionary
strategy: comparatively safe.

An obvious problem, however, is that
generating a 44.1 version will not be simply a matter of putting up a couple
of mics for most people: DVD is based around 48 kHz and multiples, while Red
Book CD is based around 44.1. Sample-rate conversion from 96 to 44.1 may not
sound very nice – it’s not a simple divide-by-n -- so we will have to do two
or even three mixes, stereo at 44.1 and 96, and 5.1 at 96. This may be too expensive
for many record companies to consider. It may be why there are second thoughts
about single-inventory, and why nobody is quite sure whether a DVD-Audio disc
will have a Red Book layer.
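
To put numbers on "not a simple divide-by-n", the Python sketch below reduces a few conversion ratios to lowest terms. It shows only the arithmetic, under the assumption of a straightforward rational resampler (interpolate by the numerator, decimate by the denominator); it is not a resampler itself.

```python
from fractions import Fraction

def conversion_ratio(source_hz: int, target_hz: int) -> Fraction:
    """Target/source rate in lowest terms: up by numerator, down by denominator."""
    return Fraction(target_hz, source_hz)

for source in (88_200, 96_000, 48_000):
    ratio = conversion_ratio(source, 44_100)
    print(f"{source} Hz -> 44100 Hz: up by {ratio.numerator}, "
          f"down by {ratio.denominator}")
```

Going from 88.2 to 44.1 is a plain divide-by-two, while 96 to 44.1 works out at 147/320, which is why the filtering is so much harder to do transparently.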

DSD/SACD on the other hand is a revolutionary
strategy, and thus more risky. If we end up using DSD for multitrack-based production,
a whole load of gear is required, almost all of which is currently imaginary.
Or it may be, after all, that we can use high-enough-quality PCM systems in
the studio and convert to DSD at the end of the day. In this case, the classical
recordists are the only people who need to invest in (relatively simple) systems
relying on DSD throughout.

The pros and cons do not stop there.
There are many who believe bitstream signal processing to be fraught with difficulties.
There are even potential problems with bitstream technology as a whole, which
may render it intrinsically inferior to PCM – see the ARA Web site at http://www.meridian-audio.com/ara/
for details – although I would not claim to know enough of the theory to back
one side or the other.

Crystal Gazing

My guess is that we will end up with
a consumer audio distribution format based either on SACD, or upon DVD-Audio
-- PCM discs recorded at 24/96 -- or both, which might be the worst of all possible
worlds. I suspect both will have a Red Book CD-compatible layer, and 5.1 as
well as stereo mixes in most cases.

I can imagine that SACD will find
most adherents among the fans of "serious" music, while the backers
of PCM will be more inclined towards "popular". But it is difficult
to imagine that the classical field could support its very own HQAD format,
whether at the release or the replay end of the chain. As Mike Batt once put
it, there are no such things as "popular" and "serious"
music – just "popular" and "unpopular": and this technology
is sufficiently expensive that whatever the next HQAD may be, it will need to
be popular.

We have all the signs of a format
war on our hands that, as always, will be expensive and controversial however
it falls out in the end. The worst-case scenario – both formats co-existing
– could have the same result as that caused in the past by having two incompatible
open-reel digital systems. This arguably held back the introduction of serious
digital recorders into the studio for years, and resulted in both formats being
eclipsed in many minds by the MDM. Such a war would not be good for our business,
and I would join a growing number of industry bodies in calling for a single-format
solution, as quickly as possible.

Richard Elen (relen@brideswell.com)
has been a frequent writer on professional audio for over two decades. He is
now VP of Marketing at Apogee Electronics Corporation in California.

This article reflects the personal
views of its author, which may not be those of his employers.