@nu774: Very nice! Now, if anyone finds any samples which prove to be difficult to encode gaplessly, it should be fairly easy to add some countermeasures. The simplest to add would be to add an LPC function, like the one Vorbis employs, which I later lifted for opus-tools. Then you predict maybe 1-2ms away from each end of the input file, depending on the size of the frames or the profile being encoded, and add the extra start samples to the delay, and add the rest to the padding.

@kareha: That rather depends if you consider "not needing to have quicktime, or its libraries in the system" as a "need".

If your question was specifically about audio quality, I am not sure if it has been tested after fixing some of the problems that arose in this thread, but the encoder in Winamp is known to be of good quality, and this one comes from the same company.

I don't feel like writing a decoder frontend only for AAC-ELD, and I guess applications that actually need a low-delay codec (such as VoIP) will not be using a batch-style CLI frontend anyway. Do you really need it?

I ran these samples by Case, with various encoding modes, and although I don't necessarily hear a gap, he claims to hear a loud pop on track change transition.

At the very least, taking a note from how vorbisenc works should help somewhat with cases like these. libvorbis bundles a handy linear predictive coding function. You pick an LPC order, like 32 or 64, prime one filter per input channel with at least that many of the last samples, then use exactly the last order samples to predict forwards for about 1 or 2 milliseconds' worth of samples. Since the beginning of the file comes first, you actually start by doing the same thing in reverse there: predict backwards from the first samples to get a few milliseconds of extra lead-in.

Once all that extra delay is predicted around the input data, the lead-in delay is added to the encoder delay, and the duration is left alone, so the lead-out delay is truncated as well.
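
To make the steps above concrete, here is a rough single-channel sketch in Python. It is not the libvorbis or opus-tools code: the function names (lpc_coefficients, extrapolate, pad_for_gapless) and the encoder_delay value are made up for illustration, and it assumes the coefficients are fitted with a plain autocorrelation plus Levinson-Durbin recursion over whatever samples you pass in. For a real file you would run it once per channel and probably only feed it a window near each end.

```python
import math


def lpc_coefficients(samples, order):
    """Fit LPC coefficients with the Levinson-Durbin recursion.

    Returns a list a where a[j] weights the sample j+1 steps back,
    so the prediction is x[n] ~= -sum(a[j] * x[n-1-j]).
    """
    n = len(samples)
    # Autocorrelation for lags 0..order.
    r = [sum(samples[i] * samples[i - lag] for i in range(lag, n))
         for lag in range(order + 1)]
    a = [0.0] * order
    error = r[0]
    for i in range(order):
        if error <= 1e-9 * r[0] + 1e-12:
            break  # silent input, or nothing left to predict
        acc = r[i + 1] + sum(a[j] * r[i - j] for j in range(i))
        k = -acc / error
        prev = a[:i]
        for j in range(i):
            a[j] = prev[j] + k * prev[i - 1 - j]
        a[i] = k
        error *= 1.0 - k * k
    return a


def extrapolate(samples, order, count):
    """Predict count samples past the end of samples (needs len >= order)."""
    a = lpc_coefficients(samples, order)
    history = list(samples[-order:])  # prime with exactly the last order samples
    predicted = []
    for _ in range(count):
        y = -sum(a[j] * history[-1 - j] for j in range(order))
        predicted.append(y)
        history.append(y)
    return predicted


def pad_for_gapless(samples, order, count, encoder_delay):
    """Add predicted lead-in and lead-out around one channel.

    Returns (padded samples, total delay to signal, unchanged duration).
    """
    lead_out = extrapolate(samples, order, count)
    # Predicting backwards is forward prediction on the reversed signal.
    lead_in = extrapolate(samples[::-1], order, count)[::-1]
    padded = lead_in + list(samples) + lead_out
    return padded, encoder_delay + count, len(samples)


# Hypothetical usage: 1 ms of padding per side at 44.1 kHz, order 32.
rate = 44100
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(4096)]
padded, delay, duration = pad_for_gapless(tone, order=32, count=rate // 1000,
                                          encoder_delay=2048)
print(len(padded), delay, duration)
```

The point of returning the delay and duration separately is the bookkeeping described above: the lead-in is counted as extra encoder delay, the duration stays the original length, and so the decoder trims both predicted ends.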

I can look into implementing this and running test files by Case to see if his apparently decent ears can catch a gap even after that much precaution is applied.

(EDIT: For an example of how it could be applied to an input chain, see Opus Tools, where I implemented it as a post EOF padding function. I also implemented it as a pre-track padding function, but that patch was never accepted, I don't think.)

Thanks for the suggestion. Maybe I can take the padder from opus-tools. However, without someone who can actually find or hear the glitches, how could it be tested whether it is really fixed or improved? I tried those samples with fdkaac, but I couldn't hear the glitches.

I compiled it this morning and have been trying it out. It works exactly as described as far as I can tell. A couple of days ago I built libfdk_aac from git in order to build ffmpeg with libfdk_aac support. For music I prefer the fdkaac binary to using ffmpeg with libfdk_aac for several reasons:

ffmpeg with libfdk_aac doesn't produce files which play back gaplessly. fdkaac standalone does produce files which play back perfectly gaplessly by default. I've tested this with some passages I know well, where I always notice any defect, and I also checked with the files linked in the "Small Pop between tracks" thread, which apparently can cause problems with lame.

fdkaac can write tags using simple, sane syntax and has a very neat feature where it can set tags from ffprobe's json output, so you run

CODE

ffprobe -v quiet -print_format json -show_format INFILE >TAGS

then encode with

CODE

fdkaac ENCODE_OPTIONS --tag-from-json TAGS?format.tags INFILE

This is great in a script and saves having to do text manipulation gymnastics with sed, awk or grep.
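
In case the TAGS?format.tags part looks cryptic: the bit after the ? is a dotted path to the object inside the JSON that holds the tags. Here is a small Python sketch of that lookup. It only illustrates the idea and is not how fdkaac implements it; the select_tags helper and the sample JSON (a cut-down stand-in for ffprobe output) are made up for the example.

```python
import json

# Hypothetical, trimmed-down stand-in for what ffprobe writes to TAGS.
SAMPLE = """
{
  "format": {
    "filename": "INFILE",
    "tags": {
      "title": "Example Title",
      "artist": "Example Artist"
    }
  }
}
"""


def select_tags(document, dotted_path):
    """Walk a dotted path such as 'format.tags' down into parsed JSON."""
    node = document
    for key in dotted_path.split("."):
        node = node[key]
    return node


tags = select_tags(json.loads(SAMPLE), "format.tags")
for name, value in sorted(tags.items()):
    print(f"{name}={value}")
```

So with ffprobe's layout, format.tags points at the dictionary of tag names and values, which is what gets written to the output file.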

I haven't done any abx testing but I have encoded various problem samples such as trumpet, eig_essence and some others. fdkaac is fine with eig_essence at 128k CBR or any LC VBR mode, while ogg, lame and faac all fail very unambiguously at default settings.

I tried the HE modes for some speech files and it works really well. Music with the LC VBR settings seems great, though I will have to do some more listening and some abx tests of stuff I know has made me wince in the past.

This is the first really useable free(dom) software aac encoder I've found. FAAC would be ideal if it didn't sound like ess aich eye tee, so I resorted to using neroAacEnc for movie soundtracks but it is sometimes extremely inconvenient because it's 32-bit only and while it runs on multi-arch 64-bit Debian it has no large file support. For music I've been using Ogg Vorbis -q 7 and for mono speech (converting audio book CDs and similar) I've been using "lame --resample 22.05 -m m --abr 64" (lame is surprisingly good for speech at these settings). I'm starting to think fdkaac/libfdk_aac can probably work at least as well in each distinct role (movie audio tracks, music, speech). It encodes at about the same speed as the other lossy encoders I use and all my players support it.

There is no need to enable afterburner in ffmpeg or in fdkaac, as it is enabled by default. To switch it off, use -afterburner 0 (ffmpeg) or -a 0 (fdkaac). It's hard to think of a circumstance where switching it off makes sense, but if you can think of one, the option is there.

The LC VBR modes don't really correspond with the above values. Perhaps they did then, but they don't now (unless libav does something radically different than ffmpeg or vanilla libfdk/fdkaac). One big difference between the modes is the lowpass filter: vbr 1 and 2 use the same value of about 13 kHz; vbr 3 is at slightly over 15 kHz; vbr 4 is at about 15.75 kHz and vbr 5 appears not to use any lowpass filter at all.

vbr 1 and 2 are usually so similar as to be interchangeable, to the point that one mode may be redundant. I tend to get stereo files encoded to between 110 and 128 kbps, though sometimes over 140. vbr 3 gives me stereo files a little larger, maybe from 135 to over 160 kbps, occasionally over 180 (that was solo piano, an old analogue recording with obvious hiss). vbr 4 might produce stereo files from 155 kbps up to over 220 kbps. vbr 5 bitrates can be very large, over 350 kbps, though 280 to 320 seems normal.

Those ranges of values are for two channels and come from encoding numerous two-channel tracks extracted from pressed CDs: some orchestral music, some rock/pop, some solo piano, some speech. Peaks can be very high in any mode. In broad terms the per-channel values D404 estimated can realistically be multiplied by 1.5 with the current version. In terms of bitrates and lowpass I get the same values whether using ffmpeg or fdkaac (the only difference being that ffmpeg output doesn't play gaplessly).

It seems odd to have two (relatively) lower bitrate vbr modes, 1 and 2, which use the same low lowpass filter and produce extremely similar results. Modes 3 and 4 are well differentiated in terms of lowpass and usually bitrate, while the difference between modes 4 and 5 can be massive, though the likelihood of any human hearing a difference must be very small indeed. It seems to me that anyone interested in the LC -vbr 1 setting would do better to look at the HE CBR mode, while -vbr 5 might only be useful for killer samples.

nu774, your news page has the changelog for fdkaac 0.2.0, where can I find the one for 0.3.0?

A new option, --moov-before-mdat, was added in 0.3.0 (that was 3 months ago). By default, fdkaac works the same as "qaac --no-optimize" regarding container layout.

Since fdkaac is source only and you have to fetch it from the GitHub repo, I have been thinking that the git history in the repo should be enough. Maybe you can at least blame me for the empty ChangeLog in the source, which was created merely to shut up GNU autotools...