I have a question about a possible new iTunesEncode feature (or one for any other program driving Apple's AAC encoder). Wouldn't it be possible to add a gapless feature? It might sound silly, but after all, nyaochi did it with the latest Fraunhofer MP3 encoder (here and here). I'm not a coding expert (I'm not a coder at all), but I don't think it would be an impossible task.

Would storing the precise offset (constant for Apple's encoder) somewhere in the tags and calculating the number of padded samples suffice? I don't really know. faac and Nero AAC are gapless, and apparently Apple is in no hurry to implement this feature. It's a pity, because Apple's encoder is pretty good. Gapless playback is not possible on any iPod, but on a computer (playing with foobar2000, for example) users would benefit from it.

Could we reproduce Nyaochi's ACMenc patch to work with iTunes? Or make something similar? What do you think?

Long answer: yes, but I'm not going to add it to iTunesEncode for a number of reasons:

1) iTunesEncode is designed to just be a CLI to access the functionality that iTunes itself provides via the COM interface, with regards to encoding. Other than copying the resulting encoded file around and renaming it, the file itself is never touched by iTunesEncode. It doesn't do anything at all with the actual data in the file, so adding that sort of functionality is beyond the scope of what iTunesEncode currently does.

2) I lost the source code to iTunesEncode via an accident, and so what you get is what there is. Yeah, I could rewrite the missing pieces easily enough, but there's no real compelling reason to do so at this point. I have already added all the functionality I could feasibly add via the iTunes COM interface. Supposedly, the new iTunes will support VB scripting, which might be advantageous to use in some way, but COM is basically a dead end as far as new functionality goes.

Honestly, it makes more sense to write a new program that modifies existing iTunes-created AAC files to add that info.

This may be a dumb question, but if the encoding delay is constant, would it be possible to just add gapless info in the tags with a special MP4 remuxer? Forgive my ignorance, but to non-coders it seems simpler than I'm sure it is.

I don't know much about AAC and the MP4 container (I don't even have iTunes installed on my computer), but technically speaking, it should be possible to write such a frontend program for iTunes. We need: 1) the encoder delay of the iTunes AAC encoder; 2) the number of silent samples iTunes pads at the end of the stream (of course it's not constant); and 3) a container to store the information from 1) and 2).

In the ACMENC implementation, 1) is specified by the user (through a command-line preset), and 2) is calculated from the number of samples in the input audio and the number of samples (MP3 frames) in the output MP3 stream. We use the MP3 Info frame to store the encoder delay/padding information. To achieve 2), we must manually count the number of frames in the output stream generated by the F-IIS ACM codec. We cannot reuse my source code because MP3 and AAC/MP4 are totally different.

I'm not sure how AAC gapless is achieved, but the scenario I guess would be:
1) convert an input audio file into an AAC file through iTunes' COM interface;
2) open the input audio file and obtain the number of samples;
3) open and parse the output AAC file to count the number of frames;
4) construct an MP4 stream from the AAC stream and the delay/padding information by using libmp4v2(?), similarly to FAAC.

In addition to general audio-programming knowledge, knowledge of the AAC stream format and the MP4 container format will be necessary.
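To make steps 2) and 3) concrete, here is a rough Python sketch of how the padding could be derived once the input sample count and the output frame count are known. The function names are made up for illustration, the 1024-sample frame size is the one discussed in this thread, and the frame count is assumed to come from a separate parser of the encoder output (not shown here).

```python
import wave

AAC_FRAME_SIZE = 1024  # samples per AAC frame, as discussed in this thread


def count_wav_samples(source):
    """Return the number of samples per channel in a WAV file (step 2)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes()


def gapless_info(input_samples, frame_count, encoder_delay,
                 frame_size=AAC_FRAME_SIZE):
    """Return (delay, padding) for an encoded stream.

    frame_count is assumed to come from parsing the encoder output (step 3);
    encoder_delay must be measured or supplied by the user.
    """
    decoded_samples = frame_count * frame_size
    padding = decoded_samples - encoder_delay - input_samples
    if padding < 0:
        raise ValueError("stream is too short for the given delay and input")
    return encoder_delay, padding


if __name__ == "__main__":
    # Hypothetical numbers: 441000 input samples encoded into 432 frames
    # with a delay of 1088 samples.
    print(gapless_info(441000, 432, 1088))
```

Step 4), writing these two numbers into the container, is the part that depends on the MP4 tooling and is not covered by this sketch.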

We need: 1) encoder delay of iTunes AAC encoder; 2) number of padded silent samples at the end of stream by iTunes (of course it's not constant)

How did you find the delay of the FhG encoders nyaochi? How exactly is an encoder delay determined in any format, or is it format specific?

Removing silent samples has been done so many times in audio editors that it shouldn't be a problem. Or will removing silent samples that aren't padded ones by iTunes cause even bigger problems?

QUOTE (nyaochi @ Jun 21 2005, 07:40 PM)

I'm not sure how AAC gapless is achieved

There's the FAAC source.

Please no developers take this as an insult. Note that I'm minimizing the amount of work this would take as I'm

1) Not a programmer
2) Someone who would really like to see someone at least attempt this, and who worries that the amount of time needed may scare some talented people away, so I'm greatly underestimating the time such a project would take
3) Afraid that when iTunes receives VBR support from QuickTime 7, even if the quality ends up being better than Nero, most users here will have to stick with Nero for gapless.

How did you find the delay of the FhG encoders nyaochi? How exactly is an encoder delay determined in any format, or is it format specific?

I looked at this page to guess the delay, then measured and confirmed the value using a wave editor.

QUOTE (Tropican @ Jun 22 2005, 10:19 AM)

Removing silent samples has been done so many times in audio editors that it shouldn't be a problem. Or will removing silent samples that aren't padded ones by iTunes cause even bigger problems?

You missed the point. An AAC stream seems to have a frame size of 1024 samples, which means that you will always get 1024*n samples after decoding an AAC stream. That's one reason why iTunes must pad silent samples to fill the last frame. There's another reason related to encoder delay, but I won't go into it here. Anyway, the necessary task is not removing the silence, but telling a decoder the number of samples to remove for playback. That cannot be achieved with an audio editor.
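To illustrate what "telling a decoder the number of samples to remove" amounts to on the playback side, here is a small Python sketch. It assumes the decoder somehow already knows the delay and padding values; where those values are stored is exactly the open question in this thread. The function name and the numbers are made up for illustration.

```python
def trim_decoded(samples, encoder_delay, padding):
    """Drop the encoder-delay samples at the start and the padding at the end.

    samples is the full decoder output, whose length is a multiple of the
    frame size; delay and padding are assumed to be known to the decoder.
    """
    if encoder_delay < 0 or padding < 0:
        raise ValueError("delay and padding must be non-negative")
    if encoder_delay + padding > len(samples):
        raise ValueError("delay plus padding exceeds the decoded length")
    end = len(samples) - padding
    return samples[encoder_delay:end]


if __name__ == "__main__":
    decoded = list(range(2 * 1024))  # stand-in for two decoded frames
    audio = trim_decoded(decoded, encoder_delay=100, padding=48)
    print(len(audio))  # number of samples that should actually be played
```

An audio editor can perform the same cut on a decoded WAV, but the encoded file would still decode to the full 1024*n samples, which is why the information has to travel with the file.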

QUOTE (Tropican @ Jun 22 2005, 10:19 AM)

There's the FAAC source.

Please no developers take this as an insult. Note that I'm minimizing the amount of work this would take as I'm

1) Not a programmer
2) Someone who would really like to see someone at least attempt this, and who worries that the amount of time needed may scare some talented people away, so I'm greatly underestimating the time such a project would take
3) Afraid that when iTunes receives VBR support from QuickTime 7, even if the quality ends up being better than Nero, most users here will have to stick with Nero for gapless.

There's always the chance Apple will add support themselves

Of course I saw the FAAC source. I understand your wish to minimize/simplify the problem, but it cannot be simplified as much as you expected. The simplest solution would be something like what I wrote in the previous post, which shouldn't scare talented people.

You missed the point. An AAC stream seems to have a frame size of 1024 samples, which means that you will always get 1024*n samples after decoding an AAC stream. That's one reason why iTunes must pad silent samples to fill the last frame. There's another reason related to encoder delay, but I won't go into it here. Anyway, the necessary task is not removing the silence, but telling a decoder the number of samples to remove for playback. That cannot be achieved with an audio editor.

I actually didn't miss the point, but used an incredibly bad example. I was just wondering if we could reuse already existing silence cutoff code. A better example would probably be how some Winamp plugins are able to just cut off the silence at the end of a file during decoding, thus achieving gapless playback. Or isn't that true gapless?

QUOTE (nyaochi @ Jun 21 2005, 09:11 PM)

Of course I saw the FAAC source. I understand your wish to minimize/simplify the problem, but it cannot be simplified as much as you expected. The simplest solution would be something like what I wrote in the previous post, which shouldn't scare talented people.

I didn't doubt you saw the FAAC source, as you specifically mentioned the decoding library used by it and other programs. I just wanted to make sure others reading this thread knew the information was available. My mentioning that I was minimizing the problem was just me apologizing in advance, to ensure no one would take offence at what I was saying. Also, talented people may be scared off if they think that working on this means they themselves must complete it. Publishing the source of whatever they did would be more than good enough. If there is enough interest, others are then able to continue.

I think we are putting the cart before the horse though, don't you? After all, this program will be about as popular and useful as your ACMENC and Otto's iTunesEncode. I'm not saying they aren't loved here at HA and other select places on the net, but they are nowhere near large enough to have their development or developers questioned as to who's working on them and their progress. There really shouldn't be any planning, just us making a thread like this with info, and then down the line, if someone ends up working on such an app from what we and hopefully other people write here, they can release it to the community. That was the point I was trying to make: just because there's demand doesn't mean someone who wants such a feature has to make an awesome program, or even anything more than a hack. I didn't want people to think I was alluding to that.

Long answer: yes, but I'm not going to add it to iTunesEncode for a number of reasons:

1) iTunesEncode is designed to just be a CLI to access the functionality that iTunes itself provides via the COM interface, with regards to encoding. Other than copying the resulting encoded file around and renaming it, the file itself is never touched by iTunesEncode. It doesn't do anything at all with the actual data in the file, so adding that sort of functionality is beyond the scope of what iTunesEncode currently does.

2) I lost the source code to iTunesEncode via an accident, and so what you get is what there is. Yeah, I could rewrite the missing pieces easily enough, but there's no real compelling reason to do so at this point. I have already added all the functionality I could feasibly add via the iTunes COM interface. Supposedly, the new iTunes will support VB scripting, which might be advantageous to use in some way, but COM is basically a dead end as far as new functionality goes.

Honestly, it makes more sense to write a new program that modifies existing iTunes-created AAC files to add that info.

How would you get the encoder delay out of iTunes though? Unless it'll give you the exact sample length of the original CD Audio track, I don't see how you could calculate it.

That's pretty easy then for encoding audio, as you can calculate the number of samples in a .wav. Just would have to build that into a program. Right Gabriel? Sorry, I know little about implementing such a feature.

Pardon my ignorance on the technicalities involved, but doesn't the MPEG-4 structure allow for chapter stops, or index points of some sort? And if so, wouldn't it be simpler to re-encode an album as a single *.m4a file, with possible plugin or hardware support for using those indices? That would eliminate the entire problem of offsets, calculated or actual, and enable true gapless playback for any player that didn't choke on the metadata. (Yes, I realize this would entail encoding each album as a single, large track, and that it would require players to buffer portions of the track - ideally, to buffer until the next index/chapter marker - but in my end-user/non-programmer/non-hardware-designer/feeble brain the method just makes sense!)

Many music CDs contain songs that blend into each other, and importing them to iTunes may create a small gap between songs that interrupts the flow. If you use the iTunes Join Tracks feature, the program melds two or more songs into one, continuous gap-free track. So now you can enjoy listening to classical music, concept rock albums and extended dance mixes without the silent treatment.

Pardon my ignorance on the technicalities involved, but doesn't the MPEG-4 structure allow for chapter stops, or index points of some sort? And if so, wouldn't it be simpler to re-encode an album as a single *.m4a file, with possible plugin or hardware support for using those indices?

Audible is doing exactly what you describe for their audiobooks... large m4a files with chapter stops. It could potentially be used for true gapless albums.

Unfortunately it's still a mystery as to how the feature is implemented...

I think I see what's going on here. It seems that Audible are using a variety of formats. The file I have is directly from Audible.com and uses a proprietary speech-based codec.

However, the Audible files from the iTunes Music Store apparently are AAC and have the same bookmark & chapter-stop features. Check this out:

QUOTE

2. Audible File Formats

A. Enhanced playback features

Audible utilizes a proprietary file format that includes custom features that improve the user experience over regular MP3 formats when listening to spoken word audio.

Audible files provide the ability to bookmark and remember your last heard position on each and every file stored on the iPod. You can switch between Audible files, exit and listen to music, and go back and forth and the iPod will remember your last position played and pick up where you left off.

Also, Audible files are broken up into different sections, either by timed intervals, chapters, or program segments. These segment markers allow you to quickly advance backward or forward to the next section.

. . .

Audible files purchased from the iTunes music store are encoded in Apple's AAC format, and provide the same enhanced playback improvements as files with .aa extension downloaded directly from Audible.com.

I think I see what's going on here. It seems that Audible are using a variety of formats. The file I have is directly from Audible.com and uses a proprietary speech-based codec.

Audible uses the ACELP.net codec for many audiobooks. (Before anyone asks, no, I do not have any such files... although perhaps this is what soundcheck has? The bitrate should be in the neighborhood of ~16kbps, if so.) Although the Helix implementation includes ACELP.net encoding, documentation of the *.aa container is almost non-existent. If we could find a way to convert Helix-encoded ACELP audio to iPod-compatible files, that would also be useful.

Since no one seemed to be taking on this task, I read the MP4 file format specification, installed the latest iTunes on my machine, and downloaded the latest mpeg4ip tools to dump MP4 streams.

I googled and found a thread about implementing a gapless solution (but I later found the original thread, which contained an important piece of information, and realized that I shouldn't have taken this approach) and implemented a tool to set the "ctts" and "stts" MP4 boxes in iTunes' MP4 files (again, don't take this approach).

I measured iTunes' encoder delay and found it to be probably 1088 (= 1024+64?). Then I modified the MP4 stream to store gapless-playback information. The following is an example of a text dump of an MP4 stream that my experimental program generated:

RESULT:
In short, I could not get gapless playback/decoding with foobar2000/faad, even though foobar2000 displays the song lengths I expected:
01-itunes-1088.m4a: 441000 (= 431 * 1024 + 744 - 1088)
01-itunes.m4a: 442368 (= 432 * 1024)
foobar2000 and faad won't remove the samples at the beginning that come from the encoder delay of iTunes' AAC encoder.
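The two lengths above can be reproduced with a few lines of Python. This is only a check of the arithmetic in the post; the function names are made up, and the reading of "431 * 1024 + 744" as 431 full frames plus 744 valid samples in the last frame is an interpretation of the numbers given.

```python
FRAME_SIZE = 1024  # samples per AAC frame


def decoded_length(frame_count):
    """Length of the raw decoder output with no gapless information."""
    return frame_count * FRAME_SIZE


def playable_length(full_frames, last_frame_valid, encoder_delay):
    """Length after dropping the end padding and the encoder delay."""
    return full_frames * FRAME_SIZE + last_frame_valid - encoder_delay


if __name__ == "__main__":
    print(decoded_length(432))              # 442368
    print(playable_length(431, 744, 1088))  # 441000
    print(FRAME_SIZE - 744)                 # 280 padded samples in the last frame
```

So the length shown for the patched file matches the intended trimming, and the failure is only in what the decoder actually discards at the start.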

REASON FOR FAILURE:
I found a post saying "don't use 'ctts' and 'stts' boxes for gapless playback" in the original HA thread, which is not in Google's cache. Now I understand why foobar and faad did not implement "ctts" for removing samples at the beginning.

HOW GAPLESS PLAYBACK IS ACHIEVED IN FAAC:
I have no idea how faac implements gapless playback. To remove the padded samples, we can use the duration field in the 'mdhd' MP4 box instead of 'stts'. But I was wondering how the decoder removes the samples that come from the FAAC encoder's delay. So I compared the waveforms of: original wave; iTunes (delay = 1024; this is for debugging purposes only); iTunes (delay = 1088); iTunes (no delay information); faac (MP4 stream); and faac (AAC stream), in this order: http://nyaochi.sakura.ne.jp/temp/mp4-delay.png
All streams made with iTunes have the same delay even though I added delay information. Another interesting thing is that the AAC stream does not have any delay. AFAIK, an AAC stream does not contain gapless playback information, right? If so, the encoder delay of FAAC is found to be zero...

In conclusion, I found a way to remove padded samples, but no way to remove the samples at the beginning of a track that come from the encoder's delay. Does anyone know how to store the encoder's delay in an MP4 stream? I'm disappointed to have wasted my weekend... I've gotta sleep.

I think the Nero/Audiocoding guys never bothered to hack into the MP4 container a way to store delay, because both Nero and FAAC have the same delay, and FAAD is compatible with that delay, so it takes it into account automatically when decoding.

If that is correct, you would need to hack a way to store delay in MP4 yourself, and then patch FAAD to take this information into account.

QUOTE

If so, the encoder delay of FAAC is found to be zero...

Nope, but the encoder/decoder pair delay is zero

If you decoded the FAAC-generated stream in iTunes, you would probably notice some delay.

This post has been edited by rjamorim: Jun 27 2005, 03:59

somehow i get the feeling that this "hacking" will not lead to anything good regarding interoperability with normal aac implementations and might break more (can't back this up, just a feeling and experience with private hacks)

therefore i would like to point out that the mp4 container explicitly offers one place where private info of any kind can and should be stored, and that's the udta (userdata) atom

i would propose using this to store private gapless data and making the decoder of your choice (probably faad2) read this data from the udta
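a rough python sketch of what such a private udta entry could look like on disk. the general box layout (32-bit big-endian size that includes the 8-byte header, then a four-character type, then the payload) is standard mp4; the inner box type "gapl" and its payload of two 32-bit integers are purely hypothetical and not something faad2 or any other decoder understands today.

```python
import struct


def make_box(box_type, payload):
    """Serialize one MP4 box: 32-bit size, 4-byte type, then the payload."""
    if len(box_type) != 4:
        raise ValueError("box type must be exactly four bytes")
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def make_gapless_udta(encoder_delay, padding):
    """Build a udta box holding a hypothetical private 'gapl' box.

    'gapl' and its layout (delay, padding as unsigned 32-bit integers)
    are invented here for illustration only.
    """
    private = make_box(b"gapl", struct.pack(">II", encoder_delay, padding))
    return make_box(b"udta", private)


if __name__ == "__main__":
    box = make_gapless_udta(1088, 280)
    print(len(box), box.hex())
```

note that udta normally lives inside moov, so actually inserting it into a file also means updating the moov size, and the chunk offsets too if moov sits before mdat. a patched decoder would then have to look for the private box and skip the stated number of samples.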