The newer versions of ALSA (the Advanced Linux Sound Architecture) enable software mixing by default. All sounds which are played are converted to 48kHz (by default) and mixed in software. When I installed the new Ubuntu and it was set up this way, I was somewhat concerned about the sound quality implications of this setup. I dove into the ALSA source and discovered some very interesting things.

The first issue is the actual mixing. This is done with 24 bits of precision (32 bits are used internally, but the lowest 8 are used for saturation). The algorithm seems to be fairly good and I don't see any major quality implications from using it.
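To make the idea of mixing with saturation concrete, here is a minimal sketch in Python. It is not ALSA's code and does not reproduce its 24/32-bit layout. It assumes 16-bit signed samples and uses a hypothetical helper, mix_saturating, purely to show why sums are clamped rather than allowed to wrap around.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def mix_saturating(*streams):
    """Mix equal-rate streams of 16-bit samples, clamping instead of wrapping.

    Illustrative only: the name and the 16-bit range are assumptions,
    not taken from the ALSA source.
    """
    mixed = []
    for frame in zip(*streams):
        # Python integers do not overflow, so this sum stands in for
        # the wider accumulator a C implementation would need.
        total = sum(frame)
        # Saturate: out-of-range sums are pinned to the nearest limit.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed

if __name__ == "__main__":
    print(mix_saturating([1000, 30000, -30000], [2000, 10000, -10000]))
```

Without the clamp, a sum that exceeds the sample range would wrap to the opposite sign in fixed-width arithmetic, which sounds far worse than clipping.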

The second (and much bigger) issue is sample rate conversion. Obviously, for software mixing all the signals need to be converted to a single rate. This rate, by default, is 48kHz. The algorithm used by default is a very rudimentary linear interpolation algorithm. This algorithm is fast but very low quality and is likely to audibly degrade the sound quality of 44.1kHz material played through a mixed device. I tested this SRC using a loopback cable and some generated square, sine and triangle waves of various frequencies. It appears to add harmonic distortion of around -10dB in some cases and about -20dB on music. I'll post some graphs when I get home this afternoon.
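For anyone unfamiliar with linear interpolation as a resampling method, the following Python sketch shows the general technique. It is my own illustration, not a transcription of ALSA's rate plugin. The function names resample_linear and peak_error are hypothetical, and the measurement is a simple time-domain peak error, not the loopback test described above.

```python
import math

def resample_linear(samples, src_rate, dst_rate):
    """Resample by drawing a straight line between neighbouring input samples."""
    if len(samples) < 2:
        return list(samples)
    step = src_rate / dst_rate  # input samples advanced per output sample
    n_out = int((len(samples) - 1) / step) + 1
    out = []
    for i in range(n_out):
        pos = i * step            # fractional position in the input
        k = int(pos)              # input sample just before that position
        frac = pos - k            # how far we are towards the next sample
        nxt = samples[min(k + 1, len(samples) - 1)]
        out.append(samples[k] * (1.0 - frac) + nxt * frac)
    return out

def peak_error(freq_hz, src_rate=44100, dst_rate=48000, n=4410):
    """Largest deviation of the resampled sine from an ideal sine at dst_rate."""
    src = [math.sin(2 * math.pi * freq_hz * i / src_rate) for i in range(n)]
    out = resample_linear(src, src_rate, dst_rate)
    ideal = [math.sin(2 * math.pi * freq_hz * i / dst_rate)
             for i in range(len(out))]
    return max(abs(a - b) for a, b in zip(out, ideal))

if __name__ == "__main__":
    for f in (100, 1000, 10000):
        print(f, "Hz peak error:", peak_error(f))
```

Running it shows the error is far larger for a 10kHz tone than for a 100Hz one: a straight line is a poor approximation of the waveform once there are only a few samples per cycle.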

There are a number of solutions to this problem.

1) Install the libasound_module_rate plugin and set defaults.pcm.rate_converter "samplerate_best" in your .asoundrc. This plugin uses the SRC_SINC_BEST_QUALITY algorithm from libsamplerate and seems to offer excellent quality. The problem here is that most distributions don't package this plugin and it is only included with very new versions of ALSA. I would say this is the ideal solution and it would be nice if the ALSA developers would make this (or at least libsamplerate's SRC_SINC_FASTEST algorithm) the default.

2) Set the dmix plug to use 44.1kHz. This is great if you are mostly listening to 44.1kHz material, but not so useful if you are listening to a mixture of 48kHz and 44.1kHz material.

3) Create separate plugs for 44.1kHz and 48kHz material. This works fine if you don't need software mixing to work when playing both types of material simultaneously. A good setup here would be a 44.1kHz plug used by default and by your MP3 player, and a 48kHz plug used by your movie player. System sounds and other stuff can use either (I don't generally care about sound quality when watching Google Video content, for example).

ALSA by default seems to offer very disappointing sound quality, but it is possible to set it up to offer much better quality. I will post a detailed guide in the next couple of days to explain how to set up your Linux machine for better playback and capture quality.

Update: I have posted about this issue to the alsa-devel mailing list and received a number of good responses. Hopefully there will be enough interest among ALSA developers that this issue will be fixed soon. I see it as a bit of a showstopping bug in ALSA and hope to communicate that to the list.

One of the more interesting replies I got was:

QUOTE

The problem is actually far more complicated than you described. A fair amount of ALSA core will have to be modified to really fix the problem. The main problem is the buffer and period sizes as they pass through dmix. For example, if the sound card hardware is running with 1024 samples per period at 48000, and an application wished to use a sample rate of 44100, the application should really get 940.8 samples per period. That is not possible, so the application gets 940. dmix then tries to convert 940 samples to 1024 samples ready for the hardware. So, even if the dmix algorithm used the super high quality sample rate conversion function, the actual rate change applied would still be slightly off.

To which I replied

QUOTE

In the scenario that you presented the application is told that there are 940 samples per period. ALSA, internally, converts these to 1024 samples per period at 48kHz sample rate. So the material (sampled at 44100Hz) is played back at a rate of 44062Hz? I don't know enough about the field to know if this error (0.08%) will be audible.

While this isn't, in my opinion, as big a problem as the bad SRC, it is one more problem with dmix and one more reason not to use it for music playback until it is fixed. Without dmix, ALSA's sound output quality isn't a problem - so don't panic about it.
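The numbers in the exchange above can be reproduced with a few lines of Python. This is only the arithmetic from the quoted example (1024 samples per period at 48kHz, truncated to a whole number of samples for a 44.1kHz application). The function name dmix_period_mismatch is hypothetical, and whether dmix really behaves this way is the claim from the list reply, not something this snippet verifies.

```python
def dmix_period_mismatch(hw_period=1024, hw_rate=48000, app_rate=44100):
    """Work through the period-size example from the mailing list reply."""
    # Period length the application would need for an exact rate match.
    exact = hw_period * app_rate / hw_rate
    # A period must be a whole number of samples, so it is truncated.
    granted = int(exact)
    # Those samples are stretched over one hardware period, so the
    # material is effectively played back at this rate instead.
    effective_rate = granted * hw_rate / hw_period
    error_pct = (app_rate - effective_rate) / app_rate * 100
    return exact, granted, effective_rate, error_pct

if __name__ == "__main__":
    exact, granted, rate, err = dmix_period_mismatch()
    print("exact period:", exact)
    print("granted period:", granted)
    print("effective rate (Hz):", rate)
    print("error (%):", round(err, 3))
```

The effective rate comes out at 44062.5Hz, which is where the 44062Hz and roughly 0.08% figures in my reply come from.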

Does anybody know how, for example, Foobar 2000's DSPs handle this problem of chunk size mismatch? Or do they not use chunks?