I was reading that you could take 2-3 of those Nvidia Tesla cards and build a supercomputer that would rival a million-dollar rack system. You would use a server-type mobo with something like 64 gigabytes of RAM and two Intel Xeon processors. The throughput is just ridiculous, look at these specs! I realize it's a $5k PC we're building, but damn! That's what I call getting it done!

If you just want to do audio, the GTX from Newegg is ~$220 and a little faster than that model. I've got one in my PC; they're pretty nice.

Apologies if this is not the right thread, but I couldn't find any that looked like a better place. I would like to request support for multichannel audio (at least up to 8 channels) in FlaCuda. It doesn't seem to be there now - I tried feeding 0.6 a 7.1 multichannel FLAC for re-compression and it threw an exception saying that the input was an invalid FLAC file.

Multichannel support is important to me because I rip all of my Blu-rays and re-encode the lossless audio tracks (TrueHD/MLP, DTS Master Audio and raw LPCM) to FLAC. The FLAC encoding accounts for at least half of the wall-clock time of the rip (typically 15+ minutes just for the audio re-encoding). Occasionally I'll rip a DVD-A with 5.1 audio to FLAC for easy playback in foobar too, and that can also take roughly the same amount of time as a Blu-ray.

I haven't looked at the source for FlaCuda, but I was wondering if multichannel encoding might get a super-linear speed-up due to the parallel nature of GPUs - similar to the way that Kaspersky is doing massively parallel pattern matching on the GPU to speed up their anti-virus scanning.

Since I've read more than once that some people don't trust GPU encoding because of errors that can creep in, I tried to force FlaCuda to produce some. My 24/7 overclock on my 260 is 648/1100 at lowered voltage; I've never gotten an error there since I started using FlaCuda. So I tried 684/1161, which is where gaming may hang or crash to desktop after a while at this voltage. I encoded ~20 albums without a problem. Then I tried a surely unstable overclock of 725/1242, but only 5 albums in a row, because I don't want to fry anything. All 5 albums encoded without a problem.

I get the impression that using CUDA isn't really stressing my card in any way.

Now I have two questions: How secure is the verify implementation? Are these fears of increased chances of errors in the data justified in any way for FlaCuda?

Multichannel support is important to me because I rip all of my Blu-rays and re-encode the lossless audio tracks

I'm afraid those multichannel tracks also have a higher bit depth, most probably 24 bits per sample instead of CD's 16 bits per sample. That might be a problem, because those require 64-bit integer arithmetic at some point, which current GPUs aren't very good at. I'm not saying it can't be done, but the encoder might be very slow.
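To make the 64-bit concern concrete, here's a rough back-of-the-envelope sketch. The 15-bit coefficient precision is an illustrative assumption (a typical quantized LPC coefficient width), not FlaCuda's actual internal setting: a single worst-case sample-times-coefficient product in an LPC predictor already overflows a 32-bit integer once the samples are 24-bit.

```python
# Rough illustration of why 24-bit input pushes LPC math past 32-bit
# integers. COEF_BITS = 15 is an assumed, typical quantized-coefficient
# precision, not FlaCuda's actual internal value.
COEF_BITS = 15

def product_bits(sample_bits: int, coef_bits: int = COEF_BITS) -> int:
    """Bits of magnitude in one worst-case sample * coefficient product."""
    max_sample = 2 ** (sample_bits - 1)
    max_coef = 2 ** (coef_bits - 1)
    return (max_sample * max_coef).bit_length()

print(product_bits(16))  # 30 -- a 32-bit accumulator still has some headroom
print(product_bits(24))  # 38 -- already past 32 bits, so int64 is needed
```

Summing an LPC order's worth of such products needs a few more bits of headroom on top, which is why 24-bit material ends up needing 64-bit accumulators throughout the prediction loop.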

Now I have two questions: How secure is the verify implementation? Are these fears of increased chances of errors in the data justified in any way for FlaCuda?

Verify is secure. It decodes each frame (on the CPU) and compares each sample with the original, so we can be sure that the result can be decoded into the original, at least using this decoder.

The fears of GPU errors aren't really justified in general, and are completely unjustified for FlaCuda. It uses the GPU to calculate the best encoding options for each frame, but generates the output on the CPU, so even if the GPU produced an error, this would only result in slightly lower compression.
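The safety net described here is a CPU-side round trip: decode what was just encoded and compare it sample-for-sample against the input. A minimal sketch of that pattern, with a toy delta codec standing in for FLAC (all names here are hypothetical, not FlaCuda's actual API):

```python
# Sketch of the verify pattern described above: decode each encoded frame
# on the CPU and compare every sample with the original. The delta codec
# below is a toy stand-in for FLAC; all names are hypothetical.

def verify_frame(original, encoded, decode):
    decoded = decode(encoded)
    return len(decoded) == len(original) and all(
        a == b for a, b in zip(original, decoded)
    )

def encode(samples):
    # Toy "encoder": store first-order differences.
    out, prev = [], 0
    for s in samples:
        out.append(s - prev)
        prev = s
    return out

def decode(deltas):
    # Matching decoder: a running sum restores the samples.
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out

samples = [3, 5, 4, 4, 9]
frame = encode(samples)
print(verify_frame(samples, frame, decode))  # True

frame[2] += 1  # simulate a corrupted frame
print(verify_frame(samples, frame, decode))  # False
```

The same logic explains the claim above: a GPU glitch can only change which encoding options get picked, and verify would still catch any frame whose CPU-generated output fails to decode back to the input.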

I do compare the original WAV to the one decoded from the FLAC file just to be sure, and so far I've had no checksum mismatch between the files. My laptop does become really laggy (visual lag of the Aero UI) while encoding with FlaCuda, but at least I'm now compressing in about half the time for the same file size. Thank you for writing this tool.

Two things that I miss are the --cuesheet and --image parameters. I have a batch file that embeds both the cue and the cover into the FLAC automatically, but now I have to do it manually... could you please add this? (And possibly other general option switches, like ReplayGain, that the reference encoder has?)

edit:

A 59-minute 16-bit 44.1 kHz WAV file:

flac 1.2.1b -8 --verify

Size: 434 MB

Encoding time: 2:21

flaCUDA 0.6 default compression --verify

Size: 434 MB

Encoding time: 1:13

flaCUDA 0.6 -11 --verify

Size: 432 MB

Encoding time: 4:46

The 2 MB difference isn't worth the doubled encoding time for me, so I kept the default level. Oh, and I encoded onto a RAM disk, not an HDD. That's on an 8600M GT & C2D T7250 CPU (2 GHz).

I noticed that in some cases the CPU is still a bottleneck for FlaCuda, so I'm experimenting with utilizing multicore processors. Here is an experimental alpha version: FlaCuda07(08). I strongly recommend not using it to encode anything valuable. Experimental features are activated using two new command line parameters:

"--gpu-only" tries to utilize the GPU even for the tasks which are maybe better suited for the CPU. Use it if you have a fast GPU and/or a slow CPU. Note that it also provides a very slightly better compression ratio.

"--cpu-threads N" tries to utilize N additional CPU cores.

I also somewhat retuned the compression levels. -8 is still the maximum compression level compatible with the flac subset (used by some hardware implementations). It is however quite impractical now; -7 is much faster and provides almost identical compression.
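My reading of what "--cpu-threads N" implies (a sketch under assumptions, not FlaCuda's actual code, which is C#): FLAC frames are encoded independently, so the stream can be split into frames and handed to N workers, as long as the output is reassembled in frame order. Roughly:

```python
# Sketch of per-frame parallel encoding as "--cpu-threads N" suggests.
# encode_frame is a stand-in for real per-frame FLAC encoding; 4096 is
# the default block size mentioned elsewhere in the thread.
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096  # samples per frame

def encode_frame(frame):
    # Placeholder for real frame encoding. Frames are independent,
    # which is what makes the scheme trivially parallel.
    return bytes((x * 31) & 0xFF for x in frame)

def encode_parallel(samples, threads):
    frames = [samples[i:i + BLOCK] for i in range(0, len(samples), BLOCK)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in submission order, so the reassembled
        # stream keeps its frames in the right order regardless of which
        # worker finishes first.
        return b"".join(pool.map(encode_frame, frames))

data = list(range(10000))
assert encode_parallel(data, 3) == encode_parallel(data, 1)
```

In practice the returns diminish once the workers saturate the cores or the GPU stage becomes the bottleneck, which would be consistent with the "3 threads are enough on a quad-core" observations reported in the thread.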

I got a nice speed boost from the --gpu-only option even though my CPU is quite fast (Core i7 940). The CPU thread count affected results only very little, and with different compression modes the winner changed. The changes were too small to capture in my traditional graph.

Thanks, Mr. Chudov, for improving FlaCuda again. --gpu-only --cpu-threads 1 here on my C2D/GTX 260 seems to utilize my system best also. With 2 threads, encoding seems to get tiny hiccups. Compression at -8 got a bit better again compared to 0.6. Edit: I saw block size 4096 is the default now; that of course may be the source of the small benefit.

Summary on threading tests:
1. In the 0.7 version, compression of the -7 and -8 modes was slightly increased at the expense of speed; compression of the -11 mode was decreased with a little speed-up.
2. On a quad-core Q6600, 4 threads are useless; 3 threads total are enough.
3. The -7 and -8 modes are practically equal.

Summary on GPU tests:
1. Compression in all modes is a bit better than in non-GPU-only modes.
2. 4 threads on the Q6600 are useless; 3 threads are the best.
3. "GPU-only + 3 threads" is slightly slower than "3 threads".

Final words: encoding an 800+ MB file in just 16 (sixteen) seconds looks VERY impressive. That's 50 MB/s, and the speed may well be limited by the hard drive. Nice work!

I have been lurking on HA for several years, but with the release of Flacuda 0.7, I just had to register to say AMAZING!!!!

I could not get v0.6 to run stably on my non-OC'd 8400GS for long periods of time. 1-2 albums would go fine, but more than 10-15 would always fail; I think the problem was heat-related. Using --gpu-only and --cpu-threads 1, I was able to get 40 albums to convert bit-perfect with foobar. Additionally, the performance (at level -11) went from 6.5x realtime in v0.6 to 20x realtime in v0.7. After remounting the heat sink with Arctic Silver 5, I was able to OC from 459/400/918 to 550/475/1500 MHz (core/mem/shaders), and now it runs at over 30x realtime.

THANK YOU!

This program is incredible, excellent work, and thank you for the X-mas present!

Even though I'm a newcomer here, it was very difficult for me to simply ignore such a topic. I have performed some of my own tests, based on three different recordings which - I believe - represent very different music types so that we can see how the encoder behaves when fed with various musical styles. I've used Morrissey's "Live at Earls Court", My Bloody Valentine's "Loveless" and David Bowie's "Low" for my tests. All these recordings were ripped from original CDs by EAC in secure mode as single-file WAV CD images.

I've encoded the files with FlaCuda 0.7 with two different sets of switches, FlaCuda 0.6, a 64-bit compile of FLAC v1.2.1 (from here; the exe itself says that it's flac 1.2.0, but the encoder tag is 1.2.1, so I think it's 1.2.1) and the ordinary 32-bit FLAC v1.2.1. I know that the last two are probably a bit offtopic, but I've been looking for a chance to try out the 64-bit binary I found some time ago.

My setup isn't very impressive, it's a laptop with Core 2 Duo T7300, 2GB of RAM, and GeForce 8600M GT with 256MB of RAM. The input files were stored on the laptop's internal drive, encoder's output files were directed to an external hard drive connected via eSATA. I would've used a ramdisk if only I could find a ramdisk driver for the 64-bit version of Windows 7 Professional, which I have installed.

The results seem very interesting to me. I'm most surprised (and let down) by the inferior performance of the 64-bit binary in comparison to the 32-bit one. FlaCuda performed extremely well; I couldn't believe my eyes when I saw how fast the conversion was going with the newest version and the -9 switch. As expected, enabling --gpu-only slightly reduces the file size at the cost of slightly longer encode time.

I'm taken aback by all the fabulous work you've done, Mr. Chudov, and hope to see new versions soon. By the way, how long do you intend to keep this project in alpha stage?

It comes to mind that people who played with the FLAC codec before, or even Mr. Coalson himself, may once have had ideas on how to improve compression but never dug deeper because of the maniac computing power it would need. I bet Mr. Chudov has already done some digging under the hood, or at least loses some sleep over it. Now the time has come. And imagine if Fermi hits the road, or similar code works under OpenCL on recent DX11 ATI cards...

Thanks to all for the kind words and detailed test results, especially for the test results

QUOTE (XAVeRY @ Dec 29 2009, 22:00)

I'm most surprised (and let down) by the inferior performance of the 64-bit binary in comparison to the 32-bit one.

That's to be expected. A 64-bit compile by itself doesn't normally make code faster. In some applications you can gain some speed by rewriting parts of the code, but more often you lose some speed. In this case, the 64-bit compile most probably has SSE optimizations disabled, because the SSE assembler code has to be rewritten for 64-bit mode. The increased number of registers in 64-bit mode allowed the compiler to make up for it and almost reach the speed of the SSE code. Modern compilers are that good.

QUOTE (XAVeRY @ Dec 29 2009, 22:00)

how long do you intend to keep this project in alpha stage?

At least until I can test it using Josh's flac test suite, and make sure it runs OK on the next generation of GPUs (Fermi).

Ideally, I would like to see it incorporated into mainstream flac in some form.

QUOTE (Wombat @ Dec 30 2009, 03:10)

It comes to mind that people who played with the FLAC codec before, or even Mr. Coalson himself, may once have had ideas on how to improve compression but never dug deeper because of the maniac computing power it would need. I bet Mr. Chudov has already done some digging under the hood, or at least loses some sleep over it.

I did for some time, but now I'm quite sure that we have reached the limit of the FLAC format. There's no room for further compression improvement without a new one.

It comes to mind that people who played with the FLAC codec before, or even Mr. Coalson himself, may once have had ideas on how to improve compression but never dug deeper because of the maniac computing power it would need. I bet Mr. Chudov has already done some digging under the hood, or at least loses some sleep over it.

I did for some time, but now I'm quite sure that we have reached the limit of the FLAC format. There's no room for further compression improvement without a new one.

I was under the impression there still is some room. At least I seem to remember that Mr. Beck, the TAK developer, mentioned somewhere that he has some ideas to improve FLAC's compression. That may of course require some changes to the code's structure.

If we've reached the end, that of course has at least one positive side: I never have to re-encode again.

It's a bit early for OpenCL. According to NVIDIA, Fermi will be their first OpenCL-optimized architecture. The only upside to OpenCL is that such code would be easier to modify to work with AMD GPUs. That would require an AMD GPU, and I don't have one yet. I would also probably have to upgrade my computer to get a second PCIe slot, and I'm not even sure that I can have two different sets of GPUs/drivers/SDKs running on one computer.

0.8 is basically a re-branded 0.7 with the default compression mode changed from -5 to -7 and --gpu-only enabled by default, to provide better results for the casual user who doesn't want to bother with command line switches. It doesn't deserve separate testing.

The flac test suite is available in flac's sources, but has to be adapted for FlaCuda. This should be easy; it was done once with Flake. I'm just very lazy and haven't found time to do it.