I tried to encode and decode several different WAV files from CD rips with WavPack (version 4.60.1) in hybrid mode, on Arch Linux, on two different machines (Intel Core i7 and Xeon, x86_64). I always got the same result: the files are heavily distorted (groups of samples repeating themselves), and obviously not lossless. See samples.

original md5: e180f2ec2c4c65fd214bc6c4f70c7988
unpacked md5: 2d5edeb0d3d7ba971282428b77fd41c6
The MD5 signatures should match, but do not!

I tried the win32 binary, which worked fine (lossless result). I don't have another Linux distro installed anywhere to see if this problem is distribution-specific. I also tried three different builds (including one built from source by myself), always with the same result.

I confirm the bug. I got mismatched MD5s (although my corrupted MD5 differs from the one reported here). I confirm that the FLAGS set in the build environment are not the cause. I confirm that this is not a compiler-specific bug (both gcc and clang yield the same result).

What I discovered is that building with "gcc -m32" or "clang -m32" fixes the bug. So this looks like an architecture-related bug.

The bug itself appears not to be architecture-related, although due to implementation differences in glibc, it manifests itself (on Linux) only on amd64.

For those interested, memcpy() is being used to copy overlapping regions of memory. The result of such a copy is undefined, which means it might work, but it doesn't have to. glibc on x86 appears to copy in such a way that corruption does not occur, but the amd64 version does not. This is almost certainly because optimized versions of memcpy() are being used, and architectural differences mean differences in how best to implement the function. Because the C standard leaves the copying of overlapping areas undefined, these optimizations are allowed, resulting in faster code. There is another function, memmove(), that is specified to copy correctly between overlapping regions.

I've sent a patch off to David, assuming he hasn't already seen this thread and fixed the problem.

Thanks guys! I have checked Angenial's patch into SVN and will post something to the mailing list.

I didn't read the whole Red Hat thread, but I agree with Linus that changing memcpy() to copy backwards is silly, especially if it isn't any faster and breaks things. Of course I know that memcpy() is not guaranteed to work with overlapping regions, but I'm sure that when I did that I was just lazy and assumed that I'd be okay if dst < src. Won't happen again!

I guess I've sort of dropped the ball on this. I am always hoping that I'll get a chance to do a major release soon and so I can skip a minor release for just this fix, especially since it doesn't affect too many users (Linux, 64-bit, and a mode which isn't even supported in gstreamer). But I have accumulated some other small fixes and it's probably time for a minor release just to avoid the appearance of abandonware. I'll start looking into this and shoot for sometime later this month.

...it's probably time for a minor release just to avoid the appearance of abandonware

No way, sir! The idea of WavPack being abandonware in the lossless circuit is laughable =D

I think WavPack is the most versatile of its kind, even though I have a lot of favorite audio codecs (both lossless and lossy).

Putting all my cards on the table: my initial peeve with WavPack was its separate binaries for encoding and decoding. I am sure there are plenty of good reasons for this, and my continued interest made it easy to work around.

I apologize if I interrupted the discussion (I personally run Windows and use .WV primarily for 32-bit float), but I could not resist extending another "thanks" to David. IMO WavPack will be around for a long time.

Thanks for the link to the SVN and the instructions. I compiled and installed it, and I still get the same stuttering and false starts. I double-checked to make sure I was using the correct binary. It might be something else on my system.

I assume that this bug can affect both the command-line programs and the installed libwavpack (which gstreamer, for example, uses). Do you see the problem when you decode from the command line (and did that change when you built and installed the new version)? Or is this something that only happens in a player? If it only affects hybrid lossless (wv + wvc) and sounds like the sample at the top of the thread, I'm sure it's the same issue.

Thanks for asking. It happens when using the command-line tools, both with the package compiled from the Gentoo repos and with the package I compiled manually from SVN. If I take a WAV file, encode it using wavpack -b400 -c -m hybrid.wav (to use skamp's example for testing purposes), and then play the resulting file in any media player, I get the stuttering and distortion. I've tried mplayer, mpc, and gstreamer just to try different libraries with it.

If I then decode the wv+wvc back to WAV and play the WAV in any of those players, it shows the same behavior. Sometimes the stuttering starts right away and sometimes it starts seconds into the track, but the stuttering is exactly the same in both the wv+wvc and the decompressed WAV. I don't know whether mplayer is actually using the wvc file, but mpd is, and the same things happen in all 3 players.

What confuses me is that I've been using wavpack-4.60.1 since June 5th and this problem only started occurring on November 28th. The only thing I'd updated on my system when it stopped working was virtual/libiconv-0, a virtual package for GNU charset conversion for libc. What that would have to do with it, I've no clue. It's possible that I rebooted the PC and something that was updated a while ago got loaded into memory (I usually don't reboot for months at a time). The previous kernel I was using was 3.4.9 from August 27th, though on Nov 21st I did go from linux-headers-3.4.r2 to linux-headers-3.6. I have recompiled the repo's wavpack package a few times and run revdep-rebuild, which checks for packages that need to be recompiled due to changed libraries and headers, to no avail.

Well, the first question is: do you get the same MD5 mismatch message that skamp reported, no matter which version of wvunpack you use? And I assume that when you do a regular lossless encode, everything works fine? Also, have you tried to run the version you built locally directly from the build directory (with "cli/wvunpack" from "trunk")?

I assume that when you play in the players, everything works fine if you move or rename the .wvc file, right?

At this point, the only thing I can imagine is that you aren't really using the new library. That seems much more likely to me than that you've discovered a new bug. Unfortunately, I am not really clear on exactly how Linux libraries work, so I can't offer solid advice on checking or verifying that... perhaps someone else can chime in on the proper procedure.

Got it! Thanks Bryant and Mr_rabid_teddybear for your assistance. The SVN build was installing things into the wrong directories. After I figured out what the correct config flags should be, all was well. Now I'm having an issue with files that have covers in the tags, but that's a subject for another thread.

By the way, this kind of bug calls for generating internal MD5 hashes systematically (wavpack -m). When I create a WavPack file without that option while the memcpy() bug applies, and then verify the file's integrity with "wvunpack -v", the check passes anyway (wvunpack reports the file is OK when it's really not). The same happens with foobar2000, since it has no internal hash to work with. I suggest making the MD5 hashing option the default, as with FLAC.

Yeah, the problem with this bug is that the issue was not in the core encoding and decoding (which is verified for every block with a fancy checksum). Instead, the data corruption happened before the data even got to the compression, so only an MD5 would catch it (and would not have caught it in lossy mode, BTW). Fortunately, this is the kind of bug that gets caught quickly because it's not triggered by some weird rare audio data; it happens every time and is very obvious. Nobody is likely to encode a lot of audio before they discover it.

In response to this I added the verify option (-v) to the 4.70 encoder which uses an MD5 (even if not selected with -m) to verify the actual written file. I'm not sure I would recommend this for daily use (it can double encoding time) but it's certainly a good way to build confidence in a setup and it's probably a good idea when using the new transcoding feature. Of course, with a high -x value, it's basically free.

What I would really like is to have a batch test built into the build that would exercise the encoder in verify mode to catch architecture-related issues up front. That's on my to-do list...

Another shameless plug: caudec SVN (upcoming version 1.7.0) can compute hashes (MD5, CRC32, SHA1/256/512) independently from the decoded WAV file (raw PCM, actually) and store them as metadata (caudec -H). It will use those when checking file integrity (caudec -t), i.e. it will decode lossless files, compute the decoded WAV file's hash, and compare it to the one stored as metadata. If decoding fails (the decoder exits with a code that's not zero), or if the hashes don't match, caudec will report an error.

Since the hashing process is completely independent from all codecs, it should be immune to such bugs. I have to say, while the bug is clearly audible in this instance, it's making me a bit paranoid. It's what prompted me to use caudec's hashing facility when testing file integrity.