Details

When reviewing the libmad function mad_bit_read(), it became obvious that on LITTLE_ENDIAN targets betoh32() is called on every call of mad_bit_read(). This is often unnecessary, because the address from which the data is loaded does not change on every call of mad_bit_read(). So we can buffer the betoh32'ed data.
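To make the per-call cost concrete, here is a simplified sketch of a bit reader that converts the current big-endian word on every call. It is not libmad's or Rockbox's mad_bit_read(). The names bitreader, be32_load() and bit_read() are hypothetical. be32_load() assembles four bytes MSB first, so it gives the same result on any host, and it stands in for "32-bit load + betoh32()".

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for "load 32 bits + betoh32()":
   builds the host-order value of a big-endian word. */
static uint32_t be32_load(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical reader state: a big-endian bitstream and a bit position. */
typedef struct {
    const uint8_t *data;   /* stream, word-aligned, readable up to the next word */
    size_t pos;            /* current bit position */
} bitreader;

/* Read len bits (1..32), MSB first. The current word is converted
   on EVERY call, even when the previous call used the same word. */
static uint32_t bit_read(bitreader *br, unsigned len)
{
    size_t word = br->pos >> 5;                 /* index of the 32-bit word */
    unsigned off = (unsigned)(br->pos & 31);    /* bits already consumed in it */
    uint64_t acc = (uint64_t)be32_load(br->data + 4 * word) << 32;

    if (off + len > 32)                         /* read spills into the next word */
        acc |= be32_load(br->data + 4 * (word + 1));

    br->pos += len;
    return (uint32_t)((acc << off) >> (64 - len));
}
```

For example, on the bytes AB CD EF 12 34 56 78 9A, reads of 4, 8 and 24 bits return 0xA, 0xBC and 0xDEF123; the first two reads convert the same word twice.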

Patch v01 adds two global variables: the first holds the address, the second holds the data loaded from it. If the address has not changed since the previous call of mad_bit_read(), the old, already betoh32'ed data is used. If a new address has to be loaded, both global variables are updated accordingly.
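A minimal sketch of the v01 idea, assuming a simplified reader rather than libmad's real one: two file-scope variables remember the last address and its converted value, so a repeated address skips the conversion. All names here (cached_addr, cached_word, load_word_cached(), bit_read_cached(), be32_load()) are hypothetical. Note the implicit assumption that the bytes at a cached address do not change while they are buffered; if a buffer is refilled in place, the cache has to be invalidated.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for "load 32 bits + betoh32()". */
static uint32_t be32_load(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* v01-style buffer: two globals (file-scope here). Assumes the memory
   at cached_addr is not modified while it is buffered. */
static const uint8_t *cached_addr = NULL;   /* address of the last converted word */
static uint32_t cached_word = 0;            /* its host-order value */

static uint32_t load_word_cached(const uint8_t *addr)
{
    if (addr != cached_addr) {      /* new address: load, convert, remember */
        cached_word = be32_load(addr);
        cached_addr = addr;
    }
    return cached_word;             /* same address: reuse converted data */
}

typedef struct {
    const uint8_t *data;            /* word-aligned big-endian stream */
    size_t pos;                     /* current bit position */
} bitreader;

/* Read len bits (1..32), MSB first, through the one-word buffer. */
static uint32_t bit_read_cached(bitreader *br, unsigned len)
{
    size_t word = br->pos >> 5;
    unsigned off = (unsigned)(br->pos & 31);
    uint64_t acc = (uint64_t)load_word_cached(br->data + 4 * word) << 32;

    if (off + len > 32)             /* spill: buffer now holds the next word */
        acc |= load_word_cached(br->data + 4 * (word + 1));

    br->pos += len;
    return (uint32_t)((acc << off) >> (64 - len));
}
```

The extra work on every call is the comparison against cached_addr and the branch; the saving on a hit is one load plus one conversion.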

Measurements on a sample file showed that the buffered data is used in 2/3 of all calls, so on LITTLE_ENDIAN targets 2/3 of the betoh32() calls are eliminated. On Coldfire targets (big-endian, so betoh32() does no swapping there) it might be worth measuring speed as well, because the buffered variables are located in IRAM. A very similar approach in libmpc sped up Coldfire by several MHz.
Strangely, the effect on arm7tdmi is (slightly) negative: 37.0 MHz without the patch versus 37.15 MHz with the patch (single core).
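How often the buffer can be reused depends on the read lengths: with a one-word buffer, a read can start from buffered data only if it begins in the word where the previous read ended. The following hypothetical simulation (cache_stats and simulate_hits() are illustrative names) counts such reads for a sequence of read lengths. It is a simplified model: it counts one hit or miss per read and ignores the extra word load that a read crossing a word boundary needs.

```c
#include <stddef.h>

/* Number of reads and how many of them start in the buffered word. */
typedef struct {
    size_t calls;
    size_t hits;
} cache_stats;

/* Simulate a one-word buffer over reads of lens[i] bits (each 1..32).
   After a read, the buffer holds the word containing its last bit. */
static cache_stats simulate_hits(const unsigned *lens, size_t n)
{
    cache_stats s = {0, 0};
    size_t pos = 0;          /* bit position in the stream */
    long cached = -1;        /* index of buffered word; -1 = nothing buffered */

    for (size_t i = 0; i < n; i++) {
        long first = (long)(pos >> 5);      /* word where this read starts */
        s.calls++;
        if (first == cached)
            s.hits++;                       /* converted data can be reused */
        pos += lens[i];
        cached = (long)((pos - 1) >> 5);    /* word holding the last bit read */
    }
    return s;
}
```

For eight 8-bit reads, only the first read in each 32-bit word misses, giving 6 hits out of 8 calls; shorter average reads push the ratio up, longer ones push it down.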

Patch v02 gets rid of the global variables and adds both variables to struct mad_bitptr. Libmad's code defines two different structs with this name! Therefore patch v02 also needs to add dummy variables to the "other" struct to avoid memory being overwritten. I will look for a better solution that cleans up this mess...
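A sketch of the v02 idea, with the buffer moved into the reader state so that each bit pointer carries its own cache and no globals are needed. Only the field name buffered_addr comes from this thread; struct bitptr_sketch, buffered_data, bitptr_init(), bitptr_read() and be32_load() are hypothetical, and the struct is a simplification, not libmad's struct mad_bitptr.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for "load 32 bits + betoh32()". */
static uint32_t be32_load(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Simplified reader state with the buffer as struct members. */
struct bitptr_sketch {
    const uint8_t *data;            /* word-aligned big-endian stream */
    size_t pos;                     /* current bit position */
    const uint8_t *buffered_addr;   /* address of the buffered word */
    uint32_t buffered_data;         /* its converted (host-order) value */
};

static void bitptr_init(struct bitptr_sketch *bp, const uint8_t *data)
{
    bp->data = data;
    bp->pos = 0;
    bp->buffered_addr = NULL;       /* forces a conversion on the first read */
    bp->buffered_data = 0;
}

/* Return the converted word at addr, reusing this reader's buffer. */
static uint32_t bitptr_word(struct bitptr_sketch *bp, const uint8_t *addr)
{
    if (addr != bp->buffered_addr) {
        bp->buffered_data = be32_load(addr);
        bp->buffered_addr = addr;
    }
    return bp->buffered_data;
}

/* Read len bits (1..32), MSB first. */
static uint32_t bitptr_read(struct bitptr_sketch *bp, unsigned len)
{
    size_t word = bp->pos >> 5;
    unsigned off = (unsigned)(bp->pos & 31);
    uint64_t acc = (uint64_t)bitptr_word(bp, bp->data + 4 * word) << 32;

    if (off + len > 32)
        acc |= bitptr_word(bp, bp->data + 4 * (word + 1));

    bp->pos += len;
    return (uint32_t)((acc << off) >> (64 - len));
}
```

Because the two new fields change the struct's size, every definition used for the same objects has to agree on the layout; code compiled against the larger definition would otherwise write past the end of objects allocated with the smaller one, which is the overwrite the dummy fields guard against.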

Needed:
a) Tests on Coldfire targets!
b) Ideas on why there is a slight slowdown on ARM (which should be LITTLE_ENDIAN)?
c) Clean-up of the struct mess.

For Coldfire it doesn't seem so surprising: IIUC, your patch introduces an extra load from IRAM (bitptr->buffered_addr), a comparison and a conditional branch, while replacing a load from DRAM with one from IRAM.

I just disassembled the resulting asm for betoh32(): it takes only 4 cycles. Taking the added code into account and comparing it to the unpatched code, the break-even point on ARM is reached when betoh32() plus the ldr from DRAM costs ~10 cycles. If it costs more, the patch speeds things up; otherwise it slows things down.
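The break-even argument can be written as a simple per-call cost model. This is a sketch under simplifying assumptions (a fixed cycle cost per operation, a hit fraction h, no pipeline or wait-state effects), not the calculation behind the ~10-cycle figure; all function and parameter names are hypothetical, and the numbers in the usage are placeholders, not measurements.

```c
/* Hypothetical per-call cycle model; every cost is an input.
   x       = betoh32() + ldr from DRAM (unpatched cost per call)
   c_check = extra load of the buffered address + compare + branch
   c_iram  = load of the buffered data from IRAM on a hit
   c_store = updating both buffered variables on a miss
   h       = fraction of calls that hit the buffer (0 < h <= 1) */

static double cost_unpatched(double x)
{
    return x;
}

static double cost_patched(double x, double c_check, double c_iram,
                           double c_store, double h)
{
    /* every call pays the check; hits read IRAM, misses do the full work */
    return c_check + h * c_iram + (1.0 - h) * (x + c_store);
}

/* Solve cost_patched(x) == cost_unpatched(x) for x:
   h * x = c_check + h * c_iram + (1 - h) * c_store */
static double break_even(double c_check, double c_iram,
                         double c_store, double h)
{
    return (c_check + h * c_iram + (1.0 - h) * c_store) / h;
}
```

With placeholder costs c_check = 3, c_iram = 1, c_store = 2 and h = 0.5, break_even() returns 9.0; with h = 2/3 it drops to 6.5. Plugging in cycle counts from the actual disassembly gives the threshold for a given target: if the real x is below it, the patch is a net loss.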