Machine #1 takes 1:23 to complete the task while machine #2 takes 0:50. This is completely counterintuitive to me, since machine #1 beats machine #2 in all benchmarks, even single-threaded integer performance (24% faster). What am I missing? More importantly, is there anything I can do to speed up wavpack on machine #1?

EDIT - Just in case anyone was wondering, it's not a hard drive issue. Machine #1 has an NVMe drive and machine #2 is just spinning platters.

Re: Wavpack.exe Slow on AMD Ryzen

The issue, if I recall correctly, with Intel's compiler is that it will not generate MMX, SSE, etc. for any processor other than Intel, even if they support those instructions. It's not about how the source is written, but the machine code generated by the Intel compiler.

I'm looking through the sources right now to see if there's something checking for an Intel CPU instead of checking for MMX support.

On the i7 machine I'm getting 3.7 GHz during encoding. When I get to the other machine I'll get you the actual clock running the wavpack process. I don't suspect this is the issue since the benchmarks on machine #1 far surpass the same benchmarks on machine #2. If there were a boost clock issue I wouldn't think that would be the case.

Re: Wavpack.exe Slow on AMD Ryzen

It's not about how the source is written, but the machine code generated by the Intel compiler.

Let me repeat this once again because it clearly didn't register: the relevant part of the source code is written directly in machine language by hand. There is nothing for the compiler to compile. The code IS MMX code.

The 64-bit binary assumes MMX because that's guaranteed to be available in AMD64 mode. It has no need to detect the CPU vendor to know it can use MMX.

Re: Wavpack.exe Slow on AMD Ryzen

What's weird? Almost all audio encoders are single-threaded, AFAIK. So yes, wavpack is constrained by single-threaded performance.

It's weird that it's not even using 100% of the single core it's running on.

Anyways, I broke down and downloaded the 2GB of Visual Studio and built wavpack without the ASM. It STILL runs faster on this i7-4770 than the assembly-enabled version runs on the Ryzen 3900X ... by about 30%.

Re: Wavpack.exe Slow on AMD Ryzen

I set wavpack's affinity to core #0 on the Ryzen 3900X, and core #0 boosts to 4.5GHz while running wavpack. It's still 60% slower than the i7-4770 running at 3.7GHz. Clock-for-clock, the i7 is almost precisely twice as fast.
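For what it's worth, the clock-for-clock figure can be checked with quick arithmetic. This is only a rough sketch: it assumes "60% slower" means the Ryzen run takes 1.6 times as long, and that both cores hold the quoted clocks for the whole encode. The function name is made up for illustration.

```python
def cycles_ratio(time_ratio, slow_ghz, fast_ghz):
    """How many times more clock cycles the slower machine spends on the same job.

    time_ratio: slower machine's run time divided by the faster machine's run time.
    slow_ghz / fast_ghz: clock speeds of the slower and faster machine.
    """
    # cycles = time * frequency, so the ratio of cycles is the product of both ratios
    return time_ratio * slow_ghz / fast_ghz


# Assumption: "60% slower" is read as 1.6x the run time.
ratio = cycles_ratio(1.6, slow_ghz=4.5, fast_ghz=3.7)
print(f"Ryzen needs about {ratio:.2f}x the cycles of the i7")
```

With those inputs the ratio comes out just under 2, which is consistent with the "almost precisely double" estimate.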

I tried the non-asm version of wavpack I built and it's even slower than the asm-enabled version.

Re: Wavpack.exe Slow on AMD Ryzen

What's weird? Almost all audio encoders are single-threaded, AFAIK. So yes, wavpack is constrained by single-threaded performance.

It's weird that it's not even using 100% of the single core it's running on.

It's because you don't understand what the numbers represent. Your CPU has 8 logical cores, and 100% of 1/8 = 12.5%. So yes, it's using close to 100% of a core, but that is only close to 12.5% of the whole CPU. 11.5% of your CPU = 92% of a core.

So that does not seem weird at all for a single threaded process
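The conversion from a whole-CPU percentage to a per-core percentage is simple enough to sketch. This is just the arithmetic from the post, assuming the monitoring tool reports utilization as a share of all logical cores; the function name is hypothetical.

```python
def per_core_percent(total_cpu_percent, logical_cores):
    """Convert whole-CPU utilization into utilization of a single logical core."""
    return total_cpu_percent * logical_cores


logical_cores = 8                       # i7-4770: 4 cores, 2 threads each
ceiling = 100 / logical_cores           # most a single thread can show: 12.5
observed = per_core_percent(11.5, logical_cores)

print(f"single-thread ceiling: {ceiling}% of the CPU")
print(f"11.5% of the CPU = {observed}% of one core")
```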

Going back to your Ryzen: some of the performance you are missing could also come from CCX jumping, which results in large cache latency penalties. Try using affinity to lock it to just one of your CCXs. Avoiding CCX jumping has been shown to give up to a 23% FPS boost in games, so I would think it could give some boost for something even more CPU-bound.

The Ryzen CCX layout really hurts thread performance if you don't take control of your threads.
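If you want to try this from the command line, Windows' `start /affinity` takes a hexadecimal bitmask of logical processors. Here is a small sketch for building that mask. Which logical processors belong to one CCX is an assumption here (0 to 5, i.e. three cores with two threads each), so check the actual topology on your machine before relying on it. The helper name is made up.

```python
def affinity_mask(logical_cpus):
    """Build a bitmask with one bit set per logical processor index."""
    mask = 0
    for cpu in logical_cpus:
        mask |= 1 << cpu
    return mask


# Assumption: logical processors 0-5 are the first CCX on this system.
first_ccx = range(6)
mask_hex = format(affinity_mask(first_ccx), "X")

# The input and output file names below are placeholders.
print(f"start /affinity {mask_hex} wavpack.exe input.wav output.wv")
```

Locking to a single core, as opposed to a whole CCX, would be a mask with just one bit set (1 for core #0).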

Re: Wavpack.exe Slow on AMD Ryzen

I understand what they mean. I think some people are misled because they're thinking 11.5% = 12.5%. That's like Intel floating point math. The i7-4770 is 4 cores, with 4 pretend cores. 100% utilization of a core should be at least 12.5%. 11.5% means it's not using 100% of the core. If there's no I/O bottleneck, what's the deal with the 8% speed penalty?

Going back to your Ryzen: some of the performance you are missing could also come from CCX jumping, which results in large cache latency penalties. Try using affinity to lock it to just one of your CCXs. Avoiding CCX jumping has been shown to give up to a 23% FPS boost in games, so I would think it could give some boost for something even more CPU-bound.

The Ryzen CCX layout really hurts thread performance if you don't take control of your threads.

That's why I was saying I locked the affinity to core #0, to no effect. The difference is close enough to exactly 2x clock-for-clock to be within the margin of calculation error. That got me thinking about the way the first-generation Ryzen processors issued two AVX128 operations to simulate AVX256, which incurred a 50% speed penalty relative to Intel CPUs with AVX256, but this is different. This is the first program I've noticed this issue with, so there's something peculiar about that loop that really hurts on Ryzen for some reason, even with double the RAM at twice the speed.

Re: Wavpack.exe Slow on AMD Ryzen

I understand what they mean. I think some people are misled because they're thinking 11.5% = 12.5%. That's like Intel floating point math. The i7-4770 is 4 cores, with 4 pretend cores. 100% utilization of a core should be at least 12.5%. 11.5% means it's not using 100% of the core.

That is normal sampling error. If you want the exact stats you'll need to log them yourself using a profiler.

Also, you don't have 4 real and 4 pretend cores. You have 4 cores, each of which is 2 threads wide.

Re: Wavpack.exe Slow on AMD Ryzen

If I profile wavpack release version, I don't get the function names, but if I profile a debug build it's like 75% slower and might be masking the speed issue. Not sure which is better. I'm not knowledgeable enough to interpret the results, but this is what I see profiling the release version on the i7-4770:

Re: Wavpack.exe Slow on AMD Ryzen

It occurs to me that this information may not be helpful. The 3900X could be 10X slower but still spend the same percentage of time in the same loops. Is there a way to quantify the actual number of instructions or the amount of time to execute an instruction using this profiler? Is that what the numbers next to the percentages show? For example, the 18,656 next to the pack_decorr_stereo_pass_cont_common: Is that the number of times that particular loop executed in the 30.685 seconds of profiling (i.e., 608 per second)?

Re: Wavpack.exe Slow on AMD Ryzen

It occurs to me that this information may not be helpful. The 3900X could be 10X slower but still spend the same percentage of time in the same loops. Is there a way to quantify the actual number of instructions or the amount of time to execute an instruction using this profiler?

You would need a lower-level profiler (I don't think VS can do it, but I could be wrong) like Intel's VTune or AMD's CodeXL. The problem is that you'd most likely have to use each vendor's own tool, since you want access to low-level timing information and hardware counters.

However, your profiling above says that essentially the entire runtime is spent in that one bit of hand-written MMX. Unfortunately, since it looks like the linker inlines, you're not able to see finer detail. Have you tried profiling the version with MMX disabled and just letting the compiler generate its own code?

Is that what the numbers next to the percentages show? For example, the 18,656 next to the pack_decorr_stereo_pass_cont_common: Is that the number of times that particular loop executed in the 30.685 seconds of profiling (i.e., 608 per second)?

That is the number of milliseconds spent in that function and anything it calls.
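Read that way, the figure turns into a share of the run rather than a call count. A quick sketch of the arithmetic, assuming the 18,656 ms and the 30.685 s both refer to the same profiling session (the function name is only for illustration):

```python
def inclusive_share(function_ms, total_seconds):
    """Percentage of the profiled run spent in a function and anything it calls."""
    return 100 * function_ms / (total_seconds * 1000)


share = inclusive_share(18_656, 30.685)
print(f"pack_decorr_stereo_pass_cont_common: {share:.1f}% of the profiled time")
```

Comparing this percentage between the two machines won't show the slowdown by itself, as noted above, but comparing the absolute milliseconds for the same input file would.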