There's nothing wrong with that. Now imagine a heterogeneous sample: the beginning (first few seconds) is quiet, while the following part is very different. Suppose that most people will rate the file based on only a small portion (4-5 seconds), and that most will favour the first thing they hear (the beginning). Most, but not all... HE-AAC is very good on the beginning, but fails on the second part. Will the overall rating be representative? Isn't it better to give everyone only a short sample?

If people evaluate different parts of one sample, it could be considered as evaluating two (or more) different samples (at least if a long sample offers some variety). But correct me if I'm wrong: the purpose of a collective test is to obtain results from different listeners evaluating the same thing (same sample, same listening material). We can't fully achieve that, since people don't have the same hardware. But we can at least ensure one thing: that everyone is listening to the same musical information.

Regarding Vorbis, I would love it if some Vorbis users could run a small listening test comparing aoTuV beta 3 and Xiph 1.1, so that the better version is used in this test.

Before the test is performed, I need to submit the newest experimental version. It is clearly better than aoTuV beta 3 on some samples (at low bitrate settings).

Were you planning on releasing a new version soon anyway? I wouldn't want you to feel rushed to get a version out the door just to be in time for this listening test...

At this moment, I'm extremely busy with real-life and work-related stuff, but next week I'll probably have some time to do a few Vorbis listening tests...

I can at least provide the version tuned for the 64 kbps range. I would like it to be tested.

Another suggestion (related to the samples): instead of focusing too much on musical genre (metal, jazz, classical...), I think it would be better to choose samples for the kind of signal they represent: loud, quiet, noisy, tonal, sharp attacks...

When I sent Roberto the very quiet sample called Debussy.wav, which apparently had nothing hard to encode, most people were ultimately surprised by the poor performance of the champion (Musepack). This sample revealed severe issues with Musepack (even WMA & ATRAC3 were better) at moderate bitrates. I know that some lossy encoders have serious problems with very tonal music (ringing); others suffer with low-volume content. There's also pre-echo...

One thing I'd like is to let encoders adapt to the content before the test position. Most encoders have adaptive thresholds, and so need some time to adapt at the beginning. This means a specific passage would not be encoded the same way at the beginning of the track as in the middle. I think a 1 s delay should be reasonable.

So would it be possible to:

* cut the first second of the decompressed sample?

or

* instruct the ABC/HR software to only allow testing past the first second?
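The first option can be sketched with Python's standard `wave` module. This is only an illustrative sketch (the function name and file paths are mine, not from the thread): it drops the first second of a decoded WAV so that the encoders' adaptive warm-up is excluded from the tested region.

```python
# Sketch: drop the first second of a decoded WAV sample so that
# adaptive-threshold warm-up at the start is excluded from the test.
# Function name and paths are illustrative, not from the original post.
import wave

def trim_first_second(src_path, dst_path):
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        src.readframes(rate)                  # discard the first second
        rest = src.readframes(src.getnframes())  # read everything remaining
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)                 # nframes is fixed up on close
        dst.writeframes(rest)
```

The same effect could of course be achieved with any audio editor or command-line trimmer; the point is only that the cut happens after decoding, so every codec still "sees" the full sample.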

I really don't see why LAME at 128kbps should be included in a 64kbps listening test, as it was last time. It probably has something to do with the claim that WMA at 64kbps sounds "as good as" MP3 at 128kbps. I don't think many here believe that claim. In any case, IMO, a 64kbps listening test should only include music encoded at 64kbps. It is misleading to encode at 128kbps in one format and at 64kbps in all the others. Everything in a 64kbps listening test should be encoded at 64kbps.

A credible listening test should have a low and high anchor.

What does that mean, a high and low anchor? I guess I really don't know what that means--it just seems strange, that for a 64kbps listening test, one of the formats would be tested at 128 kbps rather than at 64 kbps.

If it is to have a reference to compare to, then why not have one sample uncompressed, for listeners to compare the compressed versions with? (Perhaps that's already done.) That makes sense, but I still don't understand the "high and low anchor". Please explain.

Does "high anchor" always mean one format is tested at a higher bit rate than the others? For low anchor a lower bit rate? Will you test one format at 32kbps for the "low anchor"?

In the 128kbps listening test, was one of the formats tested at 192kbps for the "high anchor"?

The purpose of anchors is to bind the results to the real world, i.e. with an anchor your results no longer "float" in the air. When you have anchors, you can compare codecs featured in different listening tests to each other, to some extent.

1) the rating scale description should be changed to the "excellent" to "poor" labels; I already know this option exists, but it should be forced from the configuration file

I'm not sure what exactly you mean by "forced from the configuration file". The custom rating labels can be specified in the test setup dialog and will be saved to the configuration file.

Quote

2) the start time should be forced to X sec into the clip without allowing the listener to hear anything before that time, also specified from the configuration file.

The offset setting could be used for this. Just adding 1000*X to each of the offsets will have exactly that effect.
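That "add 1000*X to each offset" step could be automated with a small script. This is a hedged sketch only: I'm assuming the offsets appear in the configuration file as lines like `Offset=5000` (milliseconds); the actual ABC/HR key name may differ, so adjust the pattern accordingly.

```python
# Sketch: shift every offset in a config file by X seconds (1000*X ms).
# The "Offset=" key name is an ASSUMPTION about the ABC/HR config format;
# change the regex to match the real key if it differs.
import re

def shift_offsets(config_text, seconds):
    return re.sub(
        r"(Offset=)(\d+)",
        lambda m: m.group(1) + str(int(m.group(2)) + 1000 * seconds),
        config_text,
    )
```

Running every tester from one shared, pre-shifted configuration file would then guarantee they all start listening at the same point.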

What I meant is that Sebastian should be able to create a configuration file that everyone uses, and which will control the rating labels. Doh, forgot about those offsets in the config file! That's the easy solution, of course.

I also think that 30s might be too long. Perhaps 6s is too short, but I think that 15s should be enough.

Letting testers decide which portion to use perhaps reduces the usefulness of the results. It is as if they are testing different samples, which makes correlating results for the same sample harder.

If a sample has some quite different parts within a 30s clip, then it could be interesting to split it into 2 samples, making interpretation of the results easier.

I've just uploaded an 18-second track [a href="http://www.hydrogenaudio.org/forums/index.php?showtopic=32691"]here[/a] that I feel would be good for this test. I deliberated over which section to use and also how long that section should be - the song is 21min long and has a lot of demanding parts. I think I chose the best part.

What I meant is that Sebastian should be able to create a configuration file that everyone uses, and which will control the rating labels.

Yes, that is possible.

Quote

Doh, forgot about those offsets in the config file! That's the easy solution, of course.

Heh. Yes, I was just about to get to work on the "new" feature myself, when I noticed it's not such a new feature, really.

For the high anchor, I would prefer LAME 3.97 (probably with an ABR setting), which will probably be at least in beta by the time the test starts.

Just out of curiosity: are you saying that some ABR preset in the new LAME 3.97 build might be better than -V5 --athaa-sensitivity 1?

--alt-presets are there for a reason! These other switches DO NOT work better than them, trust me on this. LAME + Joint Stereo doesn't destroy 'Stereo'

I did a small listening test of the WMA9 encoders. As samples, I used all of those selected by Roberto for his last 128 kbps Multiformat Listening Test.

Two important things:

• I haven't browsed HA since last Thursday (if decisions were made in this topic during the past week, I wasn't aware of them).
• This listening test was a very fast one. Too fast, I would say. I didn't ABX anything, and I've probably missed some details.

• WMA9Pro is better, but the bitrate doesn't come close to 64 kbps at -q10. WMA9Pro is nevertheless not that much better.
• Statistically, CBR 2-pass and VBR 2-pass are tied, but CBR 64kbps 2-pass appeared to be a bit more consistent in quality than VBR at low bitrate.

As I said, I was not fully satisfied with this test (too fast, too imprecise). If the collective test doesn't start in the next few days, I think I could test CBR and VBR again, without WMA9Pro this time, and with an ABX phase to be sure that the differences were audible.