The ESP8266 runs a ported version of your "VLSI Solution generic microcontroller example player / recorder for VS1053", v1.10 2016-05-09 HH.
Programming is done with "https://github.com/esp8266/Arduino".

In the meantime I tried to load the plugins "vs1053b-patches.plg" and "vs1053b-patches-latm.plg" with your LoadPlugin(const u_int16 *d, u_int16 len) function.
After loading the plugin (each in a separate test), the HDAT1 register no longer contains 0x4154 or 0xAD34 when I send the m4a or ADTS data.

I am not sure whether the plugin is loaded correctly. Can I check this by reading some register?
Do I need to write a start address to SCI_AIADDR after calling LoadPlugin()?
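
For reference, here is a minimal host-side sketch of how a compressed .plg table is decoded, assuming the usual layout of (register address, count, data) records in which a set top bit in the count marks a run-length-encoded repeat. `SciWrite`, `g_writes` and `writeSci` are hypothetical stand-ins for the real SCI write routine, so the decoding can be checked on a PC without the chip. It does not answer the SCI_AIADDR question; that depends on the plugin's own documentation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of one SCI register write (stand-in for real SPI traffic).
struct SciWrite {
    uint8_t addr;
    uint16_t value;
};

// Hypothetical log of all writes, so the decoding can be inspected off-target.
std::vector<SciWrite> g_writes;

// Hypothetical stand-in for the real SCI write routine.
void writeSci(uint8_t addr, uint16_t value) {
    g_writes.push_back({addr, value});
}

// Decode a compressed plugin table made of (addr, n, data...) records.
// If bit 15 of n is set, one data word is repeated (n & 0x7FFF) times;
// otherwise n data words follow and are written one after another.
void loadPlugin(const uint16_t* d, uint16_t len) {
    uint16_t i = 0;
    while (i < len) {
        assert(i + 2 <= len);  // a record needs at least addr and n
        uint16_t addr = d[i++];
        uint16_t n = d[i++];
        if (n & 0x8000U) {  // RLE run: repeat a single value
            n &= 0x7FFF;
            assert(i < len);
            uint16_t val = d[i++];
            while (n--) writeSci(static_cast<uint8_t>(addr), val);
        } else {  // copy run: n literal values
            assert(i + n <= len);
            while (n--) writeSci(static_cast<uint8_t>(addr), d[i++]);
        }
    }
}
```

On real hardware, `writeSci` would perform the SPI transfer and wait for DREQ between writes. Logging the writes like this is one way to confirm that the whole table was consumed and that `len` matches the array size.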

Yes, I write SCI_CLOCKF before loading the patch. After writing SCI_CLOCKF and after loading the patch I wait until DREQ becomes HIGH.
My initial SPI speed is 250 kHz. After writing to SCI_CLOCKF I set it to 4 MHz. Then I call LoadPlugin(..).

I looked at the files. The M4A.m4a in your samples should play with or without the patch. I recommend setting a fixed 4.5x clock though (0xc000).
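
To show where 0xc000 comes from, here is a small sketch that decodes the multiplier field of SCI_CLOCKF, assuming the VS1053 register layout (SC_MULT in bits 15:13, SC_ADD in bits 12:11, SC_FREQ in bits 10:0). The helper names `clockMultiplier` and `coreClockMHz` are made up for this example.

```cpp
#include <cstdint>

// SC_MULT field (bits 15:13) selects the clock multiplier applied to XTALI.
constexpr double kMultTable[8] = {1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0};

// Hypothetical helper: extract the multiplier from an SCI_CLOCKF value.
constexpr double clockMultiplier(uint16_t clockf) {
    return kMultTable[(clockf >> 13) & 0x7];
}

// Hypothetical helper: resulting core clock, ignoring SC_ADD (0 in 0xC000).
constexpr double coreClockMHz(uint16_t clockf, double xtalMHz) {
    return xtalMHz * clockMultiplier(clockf);
}
```

With 0xc000, SC_MULT is 6 and SC_ADD is 0, so the multiplier stays fixed at 4.5x; with a 12.288 MHz crystal that is roughly 55.3 MHz.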

ADTS.ts is a transport stream, so I don't expect it to play with or without the patch unless you remove the TS wrapper.

Are you using just one SPI for the SCI (xCS) and SDI (xDCS)? SCI_MODE should probably be 0x8800 for the new communication mode. Also, if the crystal is 12.288 MHz, you should not set the top bit; SCI_MODE should then be 0x0800.
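
The two SCI_MODE values differ only in the top bit. A sketch of how they are composed, using the datasheet bit names SM_SDINEW and SM_CLK_RANGE; the helper `sciModeFor` and its 24 MHz threshold are assumptions for illustration:

```cpp
#include <cstdint>

constexpr uint16_t SM_SDINEW = 0x0800;     // bit 11: new (native) SPI mode
constexpr uint16_t SM_CLK_RANGE = 0x8000;  // bit 15: divide the XTALI input by 2

// Hypothetical helper: pick SCI_MODE from the crystal frequency.
// Assumes only two cases: a 12..13 MHz crystal or a 24..26 MHz crystal.
constexpr uint16_t sciModeFor(double xtalMHz) {
    return static_cast<uint16_t>(SM_SDINEW | (xtalMHz >= 24.0 ? SM_CLK_RANGE : 0));
}
```

So a 12.288 MHz crystal gives 0x0800, and a 24.576 MHz crystal gives 0x8800.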

mp4 files may not be streamable if they have the "mdat" atom before the decoding parameters. However, the one you attached decodes fine with vs1053b.

Do the mp3 files play 100% correctly?

You may have some subtle error in the number of bytes you send to SDI. mp3 can synchronize to all audio frames even when there are extra bytes, but the mpeg-1/mpeg-2 container starts with structures that require the correct number of bytes.

You do get "M4" in HDAT1, so the first bytes are correct (the decoder detects the mp4 container), but you should get 44100Hz/stereo (AUDATA=0xac45) or 22050Hz/stereo, depending on SCI_CLOCKF and your capability of servicing the SDI.
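
As a quick way to interpret these two registers, here is a sketch that splits HDAT1 into its two ASCII bytes and decodes AUDATA, assuming the VS1053 convention that bit 0 of AUDATA is the stereo flag and the remaining bits give the sample rate. The struct and function names are made up for this example.

```cpp
#include <cstdint>

// Hypothetical container for the decoded SCI_AUDATA contents.
struct AudioFormat {
    uint16_t sampleRateHz;
    bool stereo;
};

// Bit 0 is the stereo flag; masking it off leaves the sample rate in Hz.
constexpr AudioFormat decodeAudata(uint16_t audata) {
    return AudioFormat{static_cast<uint16_t>(audata & 0xFFFE), (audata & 1) != 0};
}

// HDAT1 read as two ASCII characters, e.g. 'M' and '4' for an mp4 container.
constexpr char hdat1High(uint16_t hdat1) { return static_cast<char>(hdat1 >> 8); }
constexpr char hdat1Low(uint16_t hdat1) { return static_cast<char>(hdat1 & 0xFF); }
```

With these, 0xac45 decodes to 44100 Hz stereo, 22050 Hz stereo would read as 0x5623, and "M4" corresponds to the bytes 0x4D and 0x34.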

No, there is some noise. At the beginning I had some buffering problems (because of too little memory), and I thought this was the reason for the noise.
I have fixed this issue by now, but the noise is still there.
The noise consists of short impulses that occur almost regularly at 100 ms intervals. Indeed, I send approximately 1600 bytes (in 32-byte chunks) every 100 ms, which matches the 128 kbit/s of the mp3 file.
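
The arithmetic behind that figure, as a small sketch; it assumes 128 kbit/s means 128000 bit/s, and the helper names are made up for this example:

```cpp
#include <cstdint>

// Bytes that must be delivered per interval to sustain a given bitrate.
constexpr uint32_t bytesPerInterval(uint32_t bitrateBps, uint32_t intervalMs) {
    return bitrateBps / 8 * intervalMs / 1000;
}

// Number of fixed-size chunks needed for that many bytes (rounded up).
constexpr uint32_t chunksPerInterval(uint32_t bytes, uint32_t chunkSize) {
    return (bytes + chunkSize - 1) / chunkSize;
}
```

128000 bit/s is 16000 bytes/s, so 1600 bytes per 100 ms, which is 50 chunks of 32 bytes. For comparison, a cap of 48 chunks per pass corresponds to 1536 bytes.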

for (int i = 0; i < 48 && buffer->space < 1024; i++) { // as long as the FIFO is not empty and not more than 48*32 bytes
  player->playChunk(buffer->read(), 32); // buffer->read() returns a char* (32 bytes) from my FIFO
}

Thanks for your work! But I think we are searching in the dark. If you don't see something wrong at once, I will ask my university for an oscilloscope.
I will post the solution if I find it. But at the moment I have little time, because I have exams to write.

playChunk looks fine. I assume SPI.transfer() returns when the transfer has been finished (does it have a return value?), in which case the delays are not necessary.

The one suspect thing is using two conditions in the byte send loop (i < 48 && buffer->space < 1024). 48 tells me you have a 48*32 = 1536-byte buffer, which is 3 disk blocks. What does buffer->space contain? Can the loop be exited early in some cases?
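
To illustrate why the second condition matters, here is a host-side model of that loop under one possible reading of `buffer->space`. It is purely an assumption that `space` counts free bytes in the FIFO and that each 32-byte read frees 32 bytes; the real FIFO may behave differently.

```cpp
#include <cstdint>

// Hypothetical model of the send loop: returns how many 32-byte chunks are
// sent in one pass, given the FIFO's free space (in bytes) at entry.
// Assumption: every chunk read from the FIFO increases `space` by 32.
constexpr int chunksSent(int space) {
    int i = 0;
    for (; i < 48 && space < 1024; i++) {
        space += 32;  // one chunk handed to playChunk, 32 bytes freed
    }
    return i;
}
```

Under this reading, a pass that starts with a completely full FIFO (`space` == 0) ends after 32 chunks (1024 bytes), not 48, and a pass that starts with `space` == 1024 sends nothing at all. If `space` means something else, the numbers change, but the loop can still exit before 48 chunks.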