On the Audibility of Phase

This post is nothing so much as some extended thinking aloud on the subject of the audibility of phase. I have written before about how phase relationships can profoundly affect the actual waveform of a complex sound even though the frequency content remains unaltered. Experiments to determine whether those phase-induced changes are actually audible, using synthesized sounds, are unsatisfactory. I personally am totally unable to hear any difference between the sounds of different tracks where all I do is vary the phase content. But this doesn’t really prove much, because the human brain is not well adapted to discern subtle differences in synthetic non-real-world sounds. Remember – the EARS listen, but the BRAIN hears.

A great, and very valid, point of reference is the range of ultra high-end loudspeakers from Wilson Audio (whose “entry-level” models cost more than my daughter paid for her two-year-old Ford). These top-end models include a facility for adjusting the positioning of the mid-range and tweeter units. The idea, as claimed by Wilson, is to permit fine adjustment of the “time alignment” between the treble, mid, and bass drivers. That such adjustments should have an audible effect is not surprising since, most obviously in the crossover regions, the signal reaching the listener is a combination of signals emitted by two or more drivers. The “time alignment” of those signals can make the difference between those signals reinforcing one another, or trying to cancel one another out. Those effects will manifest themselves in aspects of the speakers’ measured frequency response. But beyond that, these adjustments have the effect of fine tuning the phase response of the speakers, at least to some degree.
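The reinforce-or-cancel behaviour can be put into rough numbers. The sketch below assumes two drivers emitting equal-amplitude sine waves at a single frequency, a speed of sound of 343 m/s, and no phase shift contributed by the crossover itself. The function name and the 3kHz crossover frequency are illustrative choices, not Wilson’s figures.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees C


def summed_level_db(freq_hz, path_difference_mm):
    """Level of two equal sine waves summed, relative to perfect alignment."""
    phase_rad = 2 * math.pi * freq_hz * (path_difference_mm / 1000.0) / SPEED_OF_SOUND_M_S
    relative = abs(math.cos(phase_rad / 2))  # equals |1 + e^(j*phase)| / 2
    if relative < 1e-12:
        return float("-inf")  # complete cancellation
    return 20 * math.log10(relative)


wavelength_mm = SPEED_OF_SOUND_M_S / 3000.0 * 1000.0  # hypothetical 3 kHz crossover
for fraction in (0.0, 0.25, 0.5):
    level = summed_level_db(3000.0, fraction * wavelength_mm)
    print(f"offset of {fraction} wavelength: {level:.2f} dB")
```

An offset of a quarter wavelength costs about 3dB, and half a wavelength cancels the two signals completely, which is why the effect shows up in a measured frequency response.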

What effect do these adjustments actually have? I can tell you from personal experience that they are most effective. And it is not a question of optimizing the tone colour for ‘naturalness’ as you might presume if the effect you were hearing were that of the phase reinforcement/cancellation effects alone. No, what I heard, and what everyone else I have spoken to who has listened for themselves has reported, is that when the adjustments are ‘just right’ the whole soundstage seems to suddenly snap into focus in a way that only the Big Wilsons seem able to command.

This is personally interesting because a good 30 years ago I bought my first ever pair of seriously high-end loudspeakers, the Advanced Acoustic Designs ‘Solstice’ model produced by Colin Brett, a one-man operation whose day job was as owner of the local shaver repair shop. Inspired by the now-legendary Dahlquist DQ-10, Colin designed a speaker with a sealed bass unit, above which he mounted an open-frame midrange unit and tweeter. The open-frame units were progressively set back from the front of the bass unit’s baffle in order to provide a degree of time-alignment. By the time I came on the scene he had completed that phase of the design by mounting a selection of differently cut frames and listening to how they each sounded. I, on the other hand, wanted to hear this for myself, and suggested that he repeat the experiment this time using a pair of staggeringly precise piezo-electric slides, which I could conveniently borrow from where I worked. Sadly, that experiment never came about. I still have my pair of Solstice loudspeakers in my basement, although one of the mid-range units, long since out of production, has gone to meet its maker.

Just how much ‘time alignment’ do the Big Wilsons provide for, and how significant might you expect that to be? The full range of adjustment is confined to something like a couple of millimetres (by my estimation). That’s a little more than one tenth of the wavelength of a 20kHz sound wave, which is about 17mm in air. The process of homing in on the ‘right’ position involves setting it within what looks like less than 1/10 of a millimetre. It seems a little surprising that mechanical adjustments of that order are necessary to fine tune the temporal response of a loudspeaker, but for the sake of this discussion let’s take it at face value. The adjustable Wilsons make me yearn for what Colin Brett might have heard if he had voiced the Solstice with a precision positioner instead of the much cruder and significantly less precise method he chose! Although I wonder whether he would have been able to maintain such tolerances in manufacture, given the technology of loudspeaker cabinets in the early 1980s.
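For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes a speed of sound of 343 m/s (air at around 20°C), and the function names are just illustrative.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees C


def wavelength_mm(freq_hz):
    """Wavelength in air, in millimetres."""
    return SPEED_OF_SOUND_M_S / freq_hz * 1000.0


def travel_time_us(distance_mm):
    """Time for sound to cover a distance, in microseconds."""
    return distance_mm / 1000.0 / SPEED_OF_SOUND_M_S * 1e6


lam = wavelength_mm(20_000)
print(f"20 kHz wavelength: {lam:.2f} mm")
print(f"2 mm as a fraction of that: {2.0 / lam:.3f}")
print(f"2 mm of travel: {travel_time_us(2.0):.2f} us")
print(f"0.1 mm of travel: {travel_time_us(0.1):.2f} us")
```

On those assumptions the full 2mm range corresponds to a little under 6 microseconds, and the 0.1mm setting precision to roughly 0.3 microseconds.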

Phase and Time Alignment are different ways of looking at the same thing. Phase is measured in fractions of the oscillation period of a wave, and Time Alignment in fractions of a second. A fixed Phase error corresponds to a progressively smaller time error as the frequency gets higher and higher. Alternatively, as the frequency gets higher, a fixed amount of time represents a progressively larger fraction of the period of the oscillation and therefore its Phase. So ‘Time Alignment’ is more critical at higher frequencies than at lower ones, because it induces – or corrects for – a larger Phase error.
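The relationship is simply phase (in degrees) = 360 × frequency × time. A short sketch makes the frequency dependence concrete; the 10 microsecond offset and the 90 degree error are arbitrary illustrative values.

```python
def phase_deg(freq_hz, time_offset_s):
    """Phase shift produced by a fixed time offset at a given frequency."""
    return 360.0 * freq_hz * time_offset_s


def time_offset_s(freq_hz, phase_error_deg):
    """Time offset equivalent to a fixed phase error at a given frequency."""
    return phase_error_deg / (360.0 * freq_hz)


for freq in (100.0, 1_000.0, 20_000.0):
    shift = phase_deg(freq, 10e-6)            # fixed 10 us offset
    offset_us = time_offset_s(freq, 90.0) * 1e6  # fixed 90 degree error
    print(f"{freq:>7.0f} Hz: 10 us = {shift:.2f} deg, 90 deg = {offset_us:.1f} us")
```

The same 10 microseconds is a negligible 0.36 degrees at 100Hz but 72 degrees at 20kHz.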

So to the extent that the Big Wilsons provide a crude “Phase Response” correction tool, and to the extent that the audible changes heard by the listener in response to those corrections represent the audibility of phase, we can look at various processes that affect the phase response of an audio signal and compare those to the magnitude of the phase errors which are ‘audible’ on the Big Wilsons. There are a lot of ‘ifs’ in there, but if you bear with me it might be instructive.

I like digital filters when it comes to this sort of discussion, because digital filters can – if designed properly – have a known and precisely constrained effect. By constrained, I mean that all of their effects are knowable and are precisely quantifiable, even if, like the ‘phase response’, we may have trouble knowing what they all mean in terms of audibility. By contrast, in an analog filter, both capacitors and inductors are in reality complex physical constructs whose behaviour we can only ever approximate, and can never precisely know.

I want to look at a simple low-pass filter and try to draw some very general conclusions regarding the audibility (or otherwise) of its phase response. I am going to choose a filtering operation I know quite well – a digital low-pass filter designed to convert a DSD source signal to PCM. Filters similar to these are used in virtually all modern PCM ADCs. Let’s make some simplistic assumptions for the design of that filter. We’ll specify the low-pass corner frequency to be 20kHz, the accepted upper limit of human audibility. In order to eliminate any aliasing effects the filter needs to remove all signals above one half of the PCM sampling rate. If the PCM bit depth is 24 bits, then we need to attenuate such frequencies by at least 144dB. Finally, we want the character of the filter to have a Pass Band (the region below the corner frequency) with a frequency response as flat as possible. There are some other parameters I won’t trouble you with. Let’s go away and design some filters and see how they look.
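The attenuation figure comes from the usual rule of thumb of roughly 6dB per bit, and the frequency above which everything must be removed is half the sample rate. A quick sketch of both calculations (the function names are illustrative):

```python
import math


def quantization_range_db(bits):
    """Ratio between full scale and one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)


def nyquist_hz(sample_rate_hz):
    """Half the sample rate: content above this would alias."""
    return sample_rate_hz / 2


for bits in (16, 24):
    print(f"{bits}-bit: {quantization_range_db(bits):.2f} dB")

for rate in (44_100, 88_200, 176_400, 352_800):
    print(f"{rate} Hz sampling: stop band starts at {nyquist_hz(rate):.0f} Hz")
```

The exact values are about 144.5dB for 24 bits and 96.3dB for 16 bits, conventionally rounded to 144dB and 96dB.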

We’ll start by designing filters for 24/88.2, 24/176.4, and 24/352.8 PCM formats. We’ll come back to 16/44.1 PCM later because, as we’ll see, it is a lot more complicated. The first decision we need to make is regarding the type of filter we want to use. There are two types of filter that we would ideally prefer to choose from, which both have a flat frequency response characteristic in the Pass Band. Those are the Butterworth and Type-II Chebyshev filters. Butterworth has the advantage that the attenuation keeps falling the higher the frequency gets, whereas the Type-II Chebyshev only provides a minimum guaranteed attenuation.

With each filter design we are going to look at two things. First the ‘order’ of the filter. This is something I am not going to get too deeply into, save to say that if the ‘order’ is too high then the filter may become unstable or inaccurate. You’re going to have to take my word for it if I say the order of the filter is too high. Second, we’re going to look at the ‘Group Delay’ of the filter. This is a calculation that takes the Phase Response, corrects for the phase-vs-frequency relationship, and spits out the corresponding time delay. In essence, if we had a hypothetical loudspeaker that had one drive unit for every frequency, ‘Group Delay’ would tell us how far forward or backward we would have to adjust the position of the drive unit – Wilson style – to correct for it. The important thing here is the difference between corrected positions of the bass unit (the ‘lowest frequency driver’) and the 20kHz unit (the ‘highest frequency driver’). I will call that the ‘Wilson Length’, which is the distance by which the tweeter position would have to be adjusted in order to correct for it. This is the result that I will report. I hope that makes sense.
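To make the ‘Wilson Length’ idea concrete: group delay is the negative slope of the phase response with respect to angular frequency, and multiplying a delay difference by the speed of sound turns it into a distance. The sketch below uses a simple first-order low-pass with a 20kHz corner purely as a stand-in (it is not one of the high-order filters discussed here), and assumes a speed of sound of 343 m/s. All names are illustrative.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees C


def first_order_phase_rad(freq_hz, corner_hz):
    """Phase response of a first-order low-pass filter (stand-in example)."""
    return -math.atan(freq_hz / corner_hz)


def group_delay_s(phase_fn, freq_hz, step_hz=1.0):
    """Group delay = -d(phase)/d(omega), estimated by central difference."""
    dphi = phase_fn(freq_hz + step_hz) - phase_fn(freq_hz - step_hz)
    return -dphi / (2 * math.pi * 2 * step_hz)


def wilson_length_mm(delay_low_s, delay_high_s):
    """Distance equivalent of the delay difference between two frequencies."""
    return abs(delay_high_s - delay_low_s) * SPEED_OF_SOUND_M_S * 1000.0


phase = lambda f: first_order_phase_rad(f, 20_000.0)
delay_bass = group_delay_s(phase, 0.0)
delay_treble = group_delay_s(phase, 20_000.0)
print(f"delay at DC:     {delay_bass * 1e6:.2f} us")
print(f"delay at 20 kHz: {delay_treble * 1e6:.2f} us")
print(f"Wilson Length:   {wilson_length_mm(delay_bass, delay_treble):.2f} mm")
```

Even this gentlest of filters shows a delay difference of about 4 microseconds between the bottom and top of the audio band, equivalent to a little under 1.4mm.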

We’ll start with a Butterworth filter for 24/88.2 PCM. After doing my Pole Dance, what I come up with is a 31st-order filter, whose ‘Wilson Length’ is 14mm. That 31st-order filter is a non-starter to begin with. For 24/176.4 PCM the filter is 17th-order, which ought to be acceptable, and its Wilson Length is 3.5mm. For 24/352.8 PCM, the filter is 12th-order, which is fine, and the Wilson Length is 1.3mm. Given that experience with the Big Wilsons suggests that the Wilson Length needs to be optimized to within a fraction of a millimetre, it implies that the phase distortions of ALL of these filters could well result in audible deterioration of the perceived sound quality.
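The filter orders can be sanity-checked with the textbook Butterworth order formula. This is only a sketch: it uses the analog form of the formula (ignoring digital frequency warping), and the pass band tolerance of 0.000001dB at 20kHz is an illustrative assumption, since that is one of the parameters left unstated above. The function name is hypothetical.

```python
import math


def butterworth_order(f_pass_hz, f_stop_hz, pass_tol_db, stop_atten_db):
    """Minimum Butterworth order meeting pass band and stop band specs."""
    stop_term = 10 ** (stop_atten_db / 10) - 1
    # expm1 keeps precision when the pass band tolerance is tiny
    pass_term = math.expm1(pass_tol_db / 10 * math.log(10))
    exact = math.log10(stop_term / pass_term) / (2 * math.log10(f_stop_hz / f_pass_hz))
    return math.ceil(exact)


PASS_TOL_DB = 1e-6  # assumed pass band tolerance, not a stated design value

for sample_rate in (88_200, 176_400, 352_800):
    order = butterworth_order(20_000, sample_rate / 2, PASS_TOL_DB, 144)
    print(f"24/{sample_rate / 1000:g}: order {order}")
```

With that assumed tolerance the formula returns orders of 31, 17 and 12, the same as the figures quoted above. A looser tolerance would give lower orders, so the result is quite sensitive to how flat you insist the Pass Band must be.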

Type-II Chebyshev filters are the traditional workhorse for low-pass audio filters because they give good frequency response without requiring as high an order filter as the equivalent Butterworth. For the three applications above, the filter orders work out to be 18th, 12th and 9th respectively, all of which ought to be acceptable. Their Wilson Lengths work out to be 7.6mm, 1.8mm, and 0.6mm respectively. In all, the Type-II Chebyshev filters seem to be slightly better than their Butterworth counterparts although without really knocking the Wilson Length parameter out of the park. Only the 24/352.8 filter appears to have a shot at being ‘inaudible’. Bear in mind, though, that the specific filter designs I described may not be optimal for those applications. They were just chosen for illustrative purposes.
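The Chebyshev orders can be checked in the same spirit using the standard Chebyshev order formula, which is the same for Type-I and Type-II designs. As a sketch only: this is the analog form of the formula, and the pass band tolerance of 0.000001dB at 20kHz is an illustrative assumption rather than a stated design parameter. The function name is hypothetical.

```python
import math


def chebyshev_order(f_pass_hz, f_stop_hz, pass_tol_db, stop_atten_db):
    """Minimum Chebyshev order meeting pass band and stop band specs."""
    stop_term = 10 ** (stop_atten_db / 10) - 1
    # expm1 keeps precision when the pass band tolerance is tiny
    pass_term = math.expm1(pass_tol_db / 10 * math.log(10))
    selectivity = math.sqrt(stop_term / pass_term)
    exact = math.acosh(selectivity) / math.acosh(f_stop_hz / f_pass_hz)
    return math.ceil(exact)


PASS_TOL_DB = 1e-6  # assumed pass band tolerance, not a stated design value

for sample_rate in (88_200, 176_400, 352_800):
    order = chebyshev_order(20_000, sample_rate / 2, PASS_TOL_DB, 144)
    print(f"24/{sample_rate / 1000:g}: order {order}")
```

With that assumed tolerance the formula returns orders of 18, 12 and 9, in line with the figures quoted above.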

At this time it is instructive to look at the 16/44.1 variant of this filter. With only 16 bits of bit depth we can reduce the attenuation requirement to 96dB, but with the Nyquist frequency of 22.05kHz so close to the corner frequency of 20kHz this places great demands on the filter. With a Butterworth design what we get is a 192nd-order filter, which is a total non-starter. With the Type-II Chebyshev it is a 44th-order filter, which, despite being a much smaller number, is still of no practical value. To get the level of performance we require will need what is called an Elliptic filter. This can actually be achieved with an acceptable filter order, but an analysis of its ‘Wilson Length’ behaviour is more complicated, and in any case the result will be much poorer than any of those obtained above.

The above analysis seriously reduces a complex subject to an unfairly simple catch-all number, but I think it has some value if taken on its own terms. I hold the view that the sound of PCM is the sound of the anti-aliasing filters to which the source signal has to be subjected prior to being encoded in the PCM format. We understand those filters very well, and in terms of frequency response we know that those filters ought to be inaudible, but we are less clear on whether their phase responses are in any way audible. I personally suspect that the things we don’t like about PCM are the artifacts of the phase response of its anti-aliasing filters, which are baked into the signal. If we are willing to accept at face value that (i) the ‘time-alignment’ capability of the Big Wilsons provides an audible and beneficial optimization; (ii) the underlying cause of such optimizations is a change in signal phase; and (iii) the amount of adjustment needed to bring the Big Wilsons into their ‘optimum alignment’ reflects the sensitivity of the human brain to phase errors; then this would seem to be a good basis for arguing that the phase distortions induced by anti-aliasing filters are more than capable of adversely impacting the sound of PCM audio – particularly so in the 16/44.1 format.

I think that’s rather interesting. While I recognize that there are a lot of broad sweeps and generalizations involved in all this, I think it has significant validity, provided it is confined to being taken on its own terms.

I want to conclude by commenting on ‘time alignment’ in the specific context of speaker design. Clearly, if you apply the same signal to each drive unit of a loudspeaker, there can be only one unique position at which the two drive units are correctly time-aligned to one another. Any other position would be, by definition, out of alignment. So why offer the possibility of adjusting that alignment? The answer lies in the caveat “… if you apply the same signal …”, because we don’t. Different drive units receive different signals, each contoured to the drive unit’s needs by the loudspeaker’s crossover. Crossovers are filters, and yes, they too have a phase response. Those phase responses mean that there is usually no one fundamentally correct time alignment. Wherever the alignment is set there are going to be some frequencies for which the alignment is ideal, and others for which it is less than ideal, and this may change with, for example, the relative listener position. Whether or not an audibly optimum position exists at all will vary from speaker to speaker, according to its design. So it doesn’t necessarily follow that you will be able to replicate the “Wilson Effect” by jury-rigging some sort of alignment capability on your own loudspeakers, although, as I have mentioned in a previous post, simply tilting the speaker can have a surprising effect. I suspect the speakers have to be designed from the ground up to take full advantage of this design approach.