Never mind. I just found out my system's processor doesn't support SSE3. I'd have to have one that's WinXP/SSE/SSE2/foobar2000 compatible. So if something like that isn't available, I know the guys over at dBpoweramp have their own aoTuV b5.7/Lancer builds for use with their program, so that would be the only way, for me, to get a stable, up-to-date Lancer build to use.

@forat.eu: I guess you are aware that you cannot go with an "if" statement every time you have to decide whether to use one technology or another, right?

By this I mean that for an application to switch properly and automatically (or even manually, via a setting) between using SSE or not, without losing all the gains, the programmer has to write fully independent paths for those operations. ("If" statements really do slow things down.)

Yet this also means a lot of work for the programmer, since he has to do manually what a compiler does automatically for him. I guess the best way to do so for a multi-file program would be to have different DLLs, each one with the same code but compiled for a different processor (by the compiler). The main program could choose to load one DLL or another depending on where it is run. Anyway, this is not a perfect solution.
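To make the DLL idea concrete, here is a minimal Python sketch of choosing a build once at startup instead of branching inside the hot loops. Everything named here is an assumption for illustration: the file names (`encoder_sse3.dll` and so on), the feature labels, and the helpers `pick_build` and `load_encoder` are hypothetical, and the list of CPU features has to come from whatever detection the host program already has.

```python
import ctypes

# Hypothetical build names, ordered from most to least demanding.
# Neither the file names nor the feature labels come from this thread.
BUILDS = [
    ("sse3", "encoder_sse3.dll"),
    ("sse2", "encoder_sse2.dll"),
    ("sse", "encoder_sse.dll"),
]
FALLBACK = "encoder_generic.dll"


def pick_build(cpu_features):
    """Return the file name of the most demanding build the CPU can run."""
    features = {name.lower() for name in cpu_features}
    for required, filename in BUILDS:
        if required in features:
            return filename
    return FALLBACK


def load_encoder(cpu_features):
    """Load the chosen library once; callers keep the handle and reuse it."""
    return ctypes.CDLL(pick_build(cpu_features))
```

Because the decision is made a single time, the code inside each build needs no feature checks of its own; the price is shipping several copies of the same code.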

Also, when you throw in x86/x64, are you talking about an installer or an application? If it is an installer, the point is moot, since we were talking about an executable program here.

The only program that I've seen doing a sort of "universal binary" for x86 and x64 is Microsoft's (or Mark Russinovich's) Process Explorer. Version 11 of this software, when run on an x64 machine, extracts an x64 binary from itself and then executes that file (the downloaded file is 3.7 MB; the x64 file is 950 KB). So if you add it up, there's clearly more space used than with two separate files. It is just handier when you're talking about small files.

Also, on checking dBpoweramp's website, it seems that they've updated the available Ogg Vorbis Lancer encoder installer for dBpoweramp to the b5.7 2009-03-03 version, from the previous b5 2006-10-24 version they had on the site a few months ago ( http://forum.dbpoweramp.com/showthread.php?t=18713 ). I do not know the source or compiler of this one, though.

I hope Steve had better luck getting this to work, because all I got was that infamous "Windows has to shut down this encoder because something's wrong with it" message when I tried to use it in foobar2000 :-(

I converted FLAC to Ogg at q5. It seems to generate correct files, but CPU usage is abnormal. I ran 4 encoders simultaneously, and each process consumed around 5% of CPU time, so the 4 processes consumed just 20% of CPU time and 80% stayed free. It looks like a lack of I/O performance, but I don't think that is the cause, because I tested on a free 7200 rpm HDD. The SSE2 version has the same problem.

In addition, I tested another encoder, 'BS; (LancerMod [20091214](SSE3) based on aoTuV b5d [20090301])'. It works great and is faster than john's earlier build. Peak speed is up to 150x, fantastic!

I hope this will help john's work.

QUOTE (john33 @ Jul 20 2010, 20:32)

Thanks for the feedback and suggestions. In the hope of resolving this, here are three compiles, this time with oggenc2.87:

I have to say that for standard-length song tracks, i.e. approx. 4 mins, there seems to be a negligible speed difference between them on a Q6600 @ 3.2GHz with 8GB DDR2, although any difference will no doubt be more apparent on a longer encoding exercise.

Is this a known bug of the Intel compiler? Did it print any warnings about unsafe optimizations being used, so that one has a chance to see the problem coming? I guess I'll have to do some code review of LAME, looking out for similar potential problems.

Sorry for re-upping this nearly one-year-old post, but I was wondering, in relation to this thread (aoTuV beta6.02): where can I download the fastest (and latest) SSE3 (or SSE2, or even SSE4.1) 64-bit version of accelerated oggenc2? lvqcl's one is not online anymore (from this post). Has there been any new progress in that field?

Dunno if it can help in any way, but here's a reply from Eric Gur (Processor Client Application Engineer @ Intel Corp.) to my message about an MT library:

QUOTE

For threading I recommend using Intel's free TBB library. It's very fast, cross-platform, simple to use and has an important feature: malloc replacement. I used it in a previous project, a multithreaded application on Linux x64 with 1M lines of code. Just the malloc replacement boosted performance by 3x without changing any code (1 line in the makefile).

BTW, there are a number of malloc replacements available, including this and one from Google...

Many thanks lvqcl for your quick and effective answer! I don't know anything about compiling yet, but I think I'll start giving it a shot... I've seen that you gave your optimization options in another thread, so I'll start with that :-) Sorry to ask, but is there any storage site or FTP server where you upload your compiles, or do you do it on an on-demand basis? :-)

Thanks again anyway, now I can encode an album in Ogg in no time, which was kind of a problem so far. Good continuation and cheers for the help!

Thanks, I managed to compile your sources using GCC with SSE3 acceleration as shared libraries (libvorbis, libvorbisenc and libvorbisfile) natively on my Linux box. Encoding a CD to Ogg now takes 30 seconds less on my old Athlon X2 4600XP. Unfortunately, oggdec and ogg123 cannot decode anymore with the new libvorbisfile lib. After the method ov_raw_seek is called, the programs exit with a "Segmentation Fault". After this method is called for the first time, the data members of vorbis_info like mode, rate, ... show only junk numbers like "-1223863434", hence the programs crash ...

I compared different encoders and, for Ogg Vorbis specifically, several builds, encoding a whole CD image (697 MB) on an AMD Phenom II X6 1045T, 2700 MHz. Times were taken with ptime; best of 3 consecutive runs.
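For anyone wanting to repeat a "best of 3" measurement without ptime, a rough Python stand-in could look like the sketch below. It is not how the numbers above were taken; the helper name `best_of` is made up for illustration, and the command you pass in (encoder path, options, input file) is yours to fill in.

```python
import subprocess
import time


def best_of(command, runs=3):
    """Run `command` `runs` times; return the shortest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the encoder's console output so printing does not skew the timing.
        subprocess.run(
            command,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return min(timings)
```

Taking the minimum rather than the average filters out runs that were slowed down by disk caching or background activity, which is presumably why best-of-N is the usual choice for encoder comparisons.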

That shouldn't leave any doubt that Ogg Vorbis, fine-tuned by Lancer, is now probably the most efficient audio encoder in practice, weighing quality efficiency against speed efficiency. The FhG AAC encoder is close, but lacks bitrate tuning flexibility (there is quite a large gap between VBR presets 4 and 5, targeting 128 or 192 kbps).