I've read a few contentions against lossy compression and another against many ABX tests. I'm wondering if there is any established data to confirm or refute them, or whether these factors have been taken into account in the development of LAME.

**The response of different speakers to the same power amplifier signals varies more or less everywhere across the frequency spectrum.** The audio delivered to the ears from any given speaker system varies considerably from room to room due to room response and speaker-room interactions (not to mention the position of the listener in the room).

Both of these are true and can be objectively measured. The contention is that psychoacoustic models, where they are based on masking, are thus not completely valid except for the exact listening circumstances under which they were developed. This is because masking will differ somewhat under different listening conditions, making transparency difficult or impossible to achieve in everyone's living room.

It seems well established that the listening room must be properly compensated to avoid adverse effects on what is heard. "Masking" of a sort takes place at various frequency ranges in most "raw" rooms due to enhancements and cancellations from room reflections.

It is also well established that the majority of home music speakers, even rather expensive ones, are designed to deliver something other than exactly what is fed into them. This is why professional monitors exist.

The question is whether or not these factors are taken into account in encoding. If not, then the audiophile contention, that even the best MP3s are not transparent if the equipment is good enough, may have a basis in fact. Of course, in principle, ABX testing could be applied to investigating these claims. Has it been? I suspect that most people who make such claims believe the differences are so apparent that further testing is irrelevant. It would probably take a skeptic -- with access to such equipment -- to make such tests.

The ABX testing "difficulty" contention is that the auditory system suffers from reduced discrimination ability after some time due to fatigue, and that this time can be as short as 20 minutes. A single ABX test probably most often takes less time, but more extensive tests probably run much longer. Is this a known factor? Is it considered relevant to encoder development?

Pretty deep post there. I think the point of ABX testing is that, all things being equal (except the tester), the files should sound the same.

It seems to me, however, that it would be impossible to account for every variation in listening setup in psychoacoustic models. Thus, perhaps this conundrum is one of the primary reasons for lossless, since the output audio from the file itself is exactly identical to the original.

Personally I don't buy the "fatigue" concept. When I ABX successfully, it's because I can pick certain points in the tracks where they actually sound different. I'm not guessing. Thus, it doesn't matter how many times I listen; as long as I focus on those points I'll catch the difference and the test will be a success. If you're referring to ABXing several tracks in a row, however, I agree - that's something I never do, because the cognitive overhead of pinpointing the difference in the first place can be tiring.

The ABX testing "difficulty" contention is that the auditory system suffers from reduced discrimination ability after some time due to fatigue, and that this time can be as short as 20 minutes. A single ABX test probably most often takes less time, but more extensive tests probably run much longer. Is this a known factor? Is it considered relevant to encoder development?

If this were correct, it would be correct for non-ABX listening as well, which would make the argument irrelevant.

ABX is king. It is used to eliminate the placebo effect from much important scientific research, like the trials that determine the efficacy of certain drugs in fighting certain diseases. Apply ABX to audio and you will find that the large majority of people are unable to differentiate a 16-bit LAME MP3 file at 256 kbps from any other "superior" source. Have you tried it yourself?
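
The arithmetic behind an ABX verdict is easy to check for yourself. Here is a minimal Python sketch (the trial counts are just examples) of the one-sided binomial test that ABX scoring is based on: the probability of doing at least this well by pure guessing.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial test: probability of getting at least
    `correct` answers out of `trials` by guessing (p = 0.5 per trial)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# 13 of 16 correct is hard to explain as luck (p ~ 0.011),
# while 10 of 16 is easily explained by guessing (p ~ 0.227).
print(round(abx_p_value(13, 16), 3))
print(round(abx_p_value(10, 16), 3))
```

This is why a handful of trials proves little either way: the test only becomes convincing once the guessing probability drops well below chance.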

QUOTE (AndyH-ha @ Apr 8 2007, 16:49)

I've read a few contentions against lossy compression and another against many ABX test...If not, then the audiophile contention, that even the best mp3s are not transparent ...

You are hanging out with the wrong crowd. That's why you are coming out with those illogical ideas.

The contention is that psychoacoustic models, where they are based on masking, are thus not completely valid except for the exact listening circumstances under which they were developed.

AFAIK, psycho-acoustic models are not developed under any particular listening circumstances, but with scientific data about the inner workings of human hearing. They are based on what (most) humans can't hear, not on what some developer's speaker system and room conditions sound like. Of course, some otherwise audible artifacts can be masked by cheap speakers, but if your hearing is not capable of recognizing a certain artifact (a lowpass, for example), even the world's best listening equipment won't do the trick. Cheap equipment only increases the masking threshold further. That audiophile blabber intends to justify expensive "high-end" record players and other stuff, IMO.

Even the best MP3s are not transparent if the listener is good enough.

**Fatigue** Fatigue and acclimation are common for other senses; they probably occur for hearing too. This has nothing in particular to do with ABX tests. If it happens, it would affect one's ability to make some auditory discriminations, relative to how able one is before it happens. Logically, if it happens, it probably mostly affects the ability to hear more subtle aspects of the sound.

If it happens, it would impact blind A/B tests that attempt to discriminate between sounds that don't differ very much. Since this is frequently the use of blind testing in audio, it would be significant. Small differences that are audible enough before fatigue sets in could become inaudible after one's hearing becomes "fatigued."

I run across subtle sounds when working on my recordings that I can't hear every time I replay the section, even when I'm listening specifically for what I just heard seconds ago. I don't think this generally has anything to do with fatigue; it is just a characteristic of hearing. I don't think I am any different from other people in this respect (think about straining to hear that soft but unpleasant sound in the darkness, the one you thought you heard just a moment ago).

I suspect whether or not there is such an effect is well known from psychological and physiological research, but not by me. So, does anyone involved in LAME's development and testing know if there is such a hearing factor?

**Audiophile Imagination** The equipment does affect some of what one hears. Headphones, even very good ones, cannot provide the perception of ambience and sound stage that is normal with a proper stereo speaker setup. Neither can the stereo speaker setups that I've seen in most people's homes. There, speakers are put wherever the rest of the room's furniture and decor provided a hole, with no consideration of proper stereo listening.

Lack of those aspects might not properly be called artifacts but, if the recording is capable of providing them, their absence can certainly be called a defect. Uneven speaker responses and poor room acoustics certainly affect them. This is physics, not audiophile religious doctrine.

**Effects on Masking** Masking occurs when a sound at one frequency is sufficiently louder than a sound at a nearby frequency. How much louder, and how near, depends upon where in the audio spectrum we are at the moment, and perhaps upon other factors. So, here is our audio file, and right there in it is an instance where masking will occur. Sound A will mask sound B. The encoder therefore reduces the resolution of sound B under the assumption that people won't hear it anyway.
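
As an illustration of the mechanism described above, here is a deliberately simplified Python sketch of such a masking decision. The frequency-to-Bark conversion is Zwicker's standard approximation, but the offset and per-Bark slope are illustrative placeholders, not LAME's actual psymodel:

```python
from math import atan

def bark(f_hz: float) -> float:
    """Zwicker's approximation of the Bark critical-band scale."""
    return 13.0 * atan(0.00076 * f_hz) + 3.5 * atan((f_hz / 7500.0) ** 2)

def is_masked(masker_hz, masker_db, probe_hz, probe_db,
              slope_db_per_bark=15.0, offset_db=10.0):
    """Toy decision: the probe is masked if it lies below the masker's
    level minus an offset and a per-Bark rolloff (numbers illustrative)."""
    dz = abs(bark(probe_hz) - bark(masker_hz))
    threshold = masker_db - offset_db - slope_db_per_bark * dz
    return probe_db < threshold

# A 70 dB tone at 1 kHz vs a 40 dB neighbour at 1.1 kHz: masked here.
print(is_masked(1000, 70, 1100, 40))
# The same 40 dB sound far away at 4 kHz is not masked.
print(is_masked(1000, 70, 4000, 40))
```

The key property the sketch shows is that masking falls off with distance on the Bark scale, not in raw Hz, which is why nearby quiet components are the ones an encoder can afford to encode coarsely.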

But, it seems reasonable that there has to be some kind of assumption involved about how people will listen to the compressed audio. Most speaker systems do not have a completely flat response. Room acoustics generally make the situation worse. It certainly seems possible that, in a particular listening room, the acoustics will conspire to make B audible in the uncompressed audio. If so, the lossy compression would probably sound different and thus be distinguishable from the source during blind testing. What conclusion does the data support?

**Other Considerations** Furthermore, since aspects of audio such as ambience, sound stage and imaging are real, it would seem they need to be considerations in encoder development. The question is: have these aspects been reasonably well tested under blind A/B setups? Is there data that says they are, or are not, adversely affected by LAME encoding? No one who uses a listening setup that does not reveal these aspects well could tell whether the MP3 was deficient or not.

The contention is that psychoacoustic models, where they are based on masking, are thus not completely valid except for the exact listening circumstances under which they were developed.

AFAIK, psycho-acoustic models are not developed under any particular listening circumstances, but with scientific data about the inner workings of human hearing. They are based on what (most) humans can't hear, not on what some developer's speaker system and room conditions sound like. Of course, some otherwise audible artifacts can be masked by cheap speakers, but if your hearing is not capable of recognizing a certain artifact (a lowpass, for example), even the world's best listening equipment won't do the trick. Cheap equipment only increases the masking threshold further. That audiophile blabber intends to justify expensive "high-end" record players and other stuff, IMO.

Even the best MP3s are not transparent if the listener is good enough.

I was nodding with you all through the post, but your last sentence made me do a double-take. Who are these listeners who can ABX all MP3 encodes?

The ABX testing "difficulty" contention is that the auditory system suffers from reduced discrimination ability after some time due to fatigue, and that this time can be as short as 20 minutes. A single ABX test probably most often takes less time, but more extensive tests probably run much longer. Is this a known factor? Is it considered relevant to encoder development?

If fatigue does happen, would it also mean that when you actually listen to music, the benefits of a really good reproduction system would only last for the first ~20 minutes? If that were the case, and the news got out, it would be bad news for the spurious end of the hi-fi market.

Yes, if such fatigue is real, it may well affect all "serious" listening. It may, however, only come into play significantly, or have its effect so quickly, when one is really concentrating one's attention on a "serious" task, such as attempting to hear small differences that one can't even be sure exist. During relaxed listening for pleasure, even if one's attention is focused on the music and not wandering all over the place, the effect might be much less severe due to less stress. I have no data with which to do more than speculate.

If the "stress" aspect is true, a person might learn to do "intense" listening, such as ABX testing, without much fatigue by learning to stay relaxed and emotionally unattached. However, most people are unlikely to achieve that state.

You can quickly (seconds) switch between files when ABXing. Are your "senses" going to be "tired" for every single ABX you perform?!

QUOTE (AndyH-ha @ Apr 9 2007, 00:04)

The equipment does effect some of what one hears. Headphones, even very good ones, can not provide the perception of ambience and sound stage that is normal with a proper stereo speaker setup...

So go ahead and do some ABXing with your full audio gear! And let us know if you can ABX above 128 kbps instead of telling us what "could" happen. To perform an ABX between 2 computer files is pretty quick and easy. There is nothing standing in your way!

QUOTE (AndyH-ha @ Apr 9 2007, 00:04)

Furthermore, since aspects of audio such as ambience, sound stage and imaging are real, it would seem they need to be considerations of encoder development...

Once again: go ahead and ABX with your stereo/surround and give the developers feedback. In the extremely remote possibility of your being able to find any difference, you would be helping develop the encoders instead of creating or feeding superstition.

I was nodding with you all through the post, but your last sentence made me do a double-take. Who are these listeners who can ABX all MP3 encodes?

If the encodes have the so-called 'killer samples', then MP3s at any quality level are ABX-able.

But no matter what, audiophools should at least realize that what they ought to be striving for is enjoyment, no matter what the source of the sound is. Listen to lossily-encoded tracks without prejudice, and I do believe there is no difference in enjoyment between lossily-encoded tracks and CD reproduction.

ABX is good if you want to ascertain that an artifact you hear is really there. The question is: does an audiophile, or any normal human being, scan the whole track seeking artifacts? I don't think so.

Most often audiophools complain of 'damage to the stereo stage', 'loss of warmth', or whatever quackery their brain happens to manufacture at the moment. Nowhere do I see an audiophool claiming to 'hear an artifact'. In other words: placebo effect.

I do strongly believe that whatever deficiencies audiophools blame on lossy-encoded audio are make-believe, due to the psychological need to justify spending a lot of moolah on overpriced sound systems.

You can quickly (seconds) switch between files when ABXing. Are your "senses" going to be "tired" for every single ABX you perform?!

You can quickly throw individual stones at a target. Eventually fatigue reduces your accuracy at hitting the target. The hearing fatigue contention has nothing to do with the individual samples, it has to do with the cumulative result of extended use. The mechanical part of your hearing becomes tired, reducing its ability to respond, or the glucose level in your brain gets used up, reducing your ability to concentrate adequately, or a neurotransmitter supply gets used up, reducing the ability to process certain kinds of input, or ... .

I have not made any claims or perpetuated any myths. I asked some logical questions, the answers to which need to be based on data. I asked if anyone knew of relevant data. Certainly much testing data has been collected over the years in the process of developing LAME, but I don't know everything that has been investigated. My explaining what is involved in the questions does not constitute championing of the claims that have been made.

Obviously no one who has so far responded has any idea about the data. It seems clear to me that I am not the source of any prejudice that may be expressed in this thread. Perhaps someone who has worked on LAME will eventually wander by and shed some light.

Do I have the resources to do the original research? Certainly not everything that could be involved. Repeating tests, to verify or refute, is always valuable in science, but me doing the tests could not answer my questions about what has been considered and utilized in developing LAME.

Think of the audio system as two parts: the 'A' chain, everything up to the equaliser on the amplifier, and the 'B' chain, everything from the equaliser on.

If you consider that the listener has set the 'B' chain to their preference, then all sources should have a similar sound in the given listening environment. So the object becomes to make compressed signals sound as close to the original as far as the 'A' chain is concerned. Thus 'B' chain fatigue or inaccuracies are common for all sources.

I was nodding with you all through the post, but your last sentence made me do a double-take. Who are these listeners who can ABX all MP3 encodes?

I don't know any - well, maybe guruboolez - but that last sentence was merely an altered quote from the OP. I tried to emphasize that artifact masking depends more on the listener than on the listening equipment.

Do I have the resources to do the original research? Certainly not everything that could be involved. Repeating tests, to verify or refute, is always valuable in science, but me doing the tests could not answer my questions about what has been considered and utilized in developing LAME.

If you have a stereo system and a computer, you have the resources. There is (free) software out there that allows you to perform this analysis quite easily. And regardless, the onus of proof should be on you (since you are challenging the fact that a LAME MP3 file just above 128 kbps is indistinguishable from a "superior" source). I will run the tests you are asking for some day this week and post the results in this thread. Why don't you do the same, so we can compare results?

I did not challenge anything. I asked some questions about what has been done during the development of LAME, whether or not certain aspects with some potential for problems have been addressed. I attempted to explain the logic of why those seem reasonable questions.

Only someone who has adequate listening conditions could investigate these matters, that is, the ones dealing with ambience, imaging and soundstage. Simple possession of a "stereo system" is not necessarily adequate. Does your system provide good, stable placement of musicians in a jazz ensemble? Does that include depth as well as width? Does it reproduce realistic ambience for recordings that contain such material? If so, it may be adequate. It seems to me difficult to judge if the room acoustics have not been set up by someone well qualified for the task.

All psychoacoustic-based lossy encoders work on the assumption that the output system's frequency response is flat.

If there is a small deviation (i.e. real-life conditions), then it should be covered by the safety threshold of your encodes. That is why many people encode tracks at a higher quality level than the first one providing transparency for them. Example: you are unable to discern encoding at -V4, but choose to encode at -V3.

If there is more than a small deviation, then the sound reproduction system is considered to be broken (sometimes intentionally). Psychoacoustic-based encoders are not designed to be used over such systems. Example: using a kind of "stereo enhancer" on your reproduction system.

If your sound reproduction system deviates significantly from flat response and you compensate with equalization, or if you simply apply equalization so that it sounds "better" to you, then the psychoacoustic model in a lossy codec may be inaccurate. In this case it might be better to apply the equalization BEFORE lossy compression so that the psychoacoustic model will be more accurate for your particular situation.
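
A toy numerical sketch of why the order matters: coarse quantization (standing in here for the lossy step) and gain (standing in for an EQ boost) do not commute, because boosting after encoding amplifies the quantization error along with the signal. All the numbers below are illustrative, not drawn from any real codec:

```python
import random

random.seed(0)
signal = [random.gauss(0, 0.1) for _ in range(1000)]  # stand-in for audio samples

def quantize(x, step=0.05):
    """Crude stand-in for a lossy codec's coarse quantization."""
    return round(x / step) * step

gain = 4.0  # stand-in for a +12 dB EQ boost

eq_then_encode = [quantize(s * gain) for s in signal]   # EQ applied before "encoding"
encode_then_eq = [quantize(s) * gain for s in signal]   # EQ applied after "decoding"

# Worst-case error of each chain against the ideal (EQ'd, unquantized) signal:
err_a = max(abs(a - s * gain) for a, s in zip(eq_then_encode, signal))
err_b = max(abs(b - s * gain) for b, s in zip(encode_then_eq, signal))
print(err_a < err_b)  # the EQ-first chain has the smaller worst-case error
```

The EQ-first chain keeps the error bounded by half a quantization step, while the EQ-last chain multiplies that error by the boost, which is the intuition behind applying equalization before lossy compression.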

Speakers regularly vary by 10-20 dB in their frequency response across the spectrum. Does the encoder's ATH model have enough margin for that? If not, how would you possibly correct for it? Room correction won't correct for all listening positions.
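
For reference, the textbook threshold-in-quiet curve that much of the psychoacoustics literature starts from is Terhardt's approximation. A quick sketch of it shows why a 20 dB speaker peak matters much more in the ear's sensitive 2-4 kHz region than down at 50 Hz, where the threshold in quiet is already around 40 dB SPL (whether any given encoder's ATH model matches this curve is exactly the open question here):

```python
from math import exp

def ath_db_spl(f_hz: float) -> float:
    """Terhardt's approximation of the absolute threshold of
    hearing in quiet, in dB SPL."""
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

# The curve dips below 0 dB SPL near 3-4 kHz (most sensitive region)
# and rises steeply toward low frequencies.
for f in (50, 1000, 3300, 10000):
    print(f, round(ath_db_spl(f), 1))
```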

All psychoacoustic-based lossy encoders work on the assumption that the output system's frequency response is flat.

My impression is that LAME development has been empirical; it has been fine-tuned over and over based on "listening tests." No?

Music is also mixed and mastered through listening tests, according to a well-developed model. Professional monitors cost a lot because they are expensive to build, not because it is a snake-oil, whatever-the-market-will-bear business. The near-field use of those monitors, in a properly treated room, is based on sound theoretical and empirical reasons. It is through development on this kind of system that the finished product is most likely to sound good on the widest range of real-world playback systems.

I've seen many comments in this forum about what to use for one's personal testing, often along the lines of "cheap earbuds that came with your walkman." For personal tests, to determine what encoding one wants to use, using whatever equipment you normally use to listen to music makes good sense. For development of the codec itself, however, that does not seem reasonable.

Most people have no means to set up a good near-field monitoring situation, nor an ideal living-room listening setup. However, since there are significant differences in how at least some things sound on such systems, it seems to me that critical development would have to be done on a good monitoring setup. The only way anyone could reasonably claim otherwise is by collecting enough data, from enough people, using both "proper" and "other" setups, to support the contrary contention.

I have no idea what in fact has been done with LAME development, that is why I ask. I am not saying LAME is deficient, I am merely asking how much these particular factors have been considered and proved. Is there data to say they are indeed well provided for or is there an underlying assumption that, since the majority of people will never hear recorded music played back under "proper" conditions, these factors don't matter?

I guess the basic problem is that there's no concrete evidence saying that this is a problem with real situations. By that, I mean that if this were a problem, then lowering the ATH would result in real quality improvements for most people, even though each frequency's ATH change would only affect those people with speakers whose response peaks at that frequency.

But that's not what we see. In fact, from what I understand, an incorrect ATH is pretty low on the list of possible encoder issues. This may be simply because the ATH numbers might be pulled from the literature, which tested hearing in anechoic chambers etc. A real listening environment is so much noisier that even if a speaker peaks 20 dB up, you'd still be under the ATH imposed by the environment.

Long story short: Speaker response deviations could be an issue, but as long as the deviation is smaller than the difference in background noise between the listening environment and the environment used to measure the ATH, it should not unmask artifacts at those frequencies.

I have no idea what in fact has been done with LAME development, that is why I ask.

From this, I understand that they took the ISO reference psymodel, which I assume was developed on "good" monitors, and then this model was improved simply by listening tests from people like you and me. And I think this is not the worst method to optimize an encoder.

But I remember reading something about another psymodel called nspsytune. Perhaps someone could take the time to enlighten me about that.

Pretty deep post there. I think the point of ABX testing is that: all things being equal (except the tester), the files should sound the same.

The idealistic model human does not exist - at all!

QUOTE

It seems to me, however, that it would be impossible to account for every variation in listening setup in psychoacoustic models.

Exactly. Variations abound - hearing abilities, speakers, room acoustics, DSPs... I fail to see how those factors differ in any way regarding the topic at hand. Any psychoacoustic compression has to have a reasonable safety margin anyway, and it has to accept that absolute perfection is impossible. There will be exceptions - the point is to make them so rare that psychoacoustic lossy compression is "good enough" in the majority of cases. That's why there is no such thing as "absolute transparency." When I read posts on ha.org saying things along the lines of "transparency is binary," I can only smile and dismiss such naivety and lack of understanding of the background theory of psychology, and even of how critical realism works in general. Lossy compression is not perfect and there is no need for perfection - if you need something perfect, then use lossless. Lossy compression is meant to be "good enough" in *almost* every case of normal listening. It has to be "good enough," not perfect. This is why neither so-called "killer samples" nor other extreme and unusual conditions show that an encoder failed - because such extreme and unusual conditions weren't the target anyway.

Last point first: clearly a particular lossy encoding can be transparent to a particular person. Most of the factors are Gaussian distributed, meaning that a lossy encoder setting which is transparent for all samples for all people is probably impossible. However, getting faults down to, say, one in a million must be possible. I'd suggest that we're there already.
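
One way to make the "one in a million" figure concrete: if per-listener detection thresholds really were Gaussian around the encoder's design point, the required safety margin follows directly from the normal tail. The 5 dB spread below is a made-up illustrative number, not a measured one:

```python
from math import erf, sqrt

def fraction_hearing(margin_db: float, sigma_db: float = 5.0) -> float:
    """Assume each listener's detection threshold for an artifact is
    normally distributed around the encoder's design point with spread
    sigma_db (illustrative). Returns the fraction of listeners for whom
    a safety margin of `margin_db` is NOT enough."""
    # Normal upper-tail probability P(threshold > margin)
    return 0.5 * (1.0 - erf(margin_db / (sigma_db * sqrt(2.0))))

print(fraction_hearing(0.0))             # 0.5: with zero margin, half could hear it
print(fraction_hearing(24.0) < 1e-6)     # ~4.8 sigma of margin: under one in a million
```

The point of the sketch is only that under a Gaussian model, each extra few dB of margin buys an exponentially smaller fraction of affected listeners, which is consistent with over-encoding being cheap insurance.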

Given that the hearing and masking curves of people may follow an approximately Gaussian distribution, then, as Lyx says, it makes no sense to talk about a single perfect psy model, and any encoder which appears to work for most people must be using a psy model which keeps more than most people need.

This also means that speakers, rooms and EQ will have to boost masked frequencies greatly before they become audible for most people.

And, as Gabriel mentioned, most people here are already encoding beyond what is usually necessary in order to minimise problem samples. This means most tracks are well over-encoded, so the subject of this thread is a non-issue.