The ARM Cortex-A9 CPU has a SIMD extension called NEON MPE. It allows for vector instructions that can perform operations on multiple elements in a single instruction. Whilst this usually improves performance, certain IIR filters called biquads pose problems as only five operations are necessary per sample and every iteration is dependent on the result of the previous result. A brief overview is given for IIR filters, the NEON extension and fixed-point processing.
In order to analyse optimisation of biquad filters, an audio effect with four different implementations is produced, comparing results with/without fixed-point processing and with/without NEON optimisation. The problems introduced by the use of biquad filters are solved by running multiple channels in parallel. As the audio channels are independent, two samples can be calculated in parallel, which approximately doubles peformance. Further performance improvement is provided by improved memory operation efficiency and the use of fixed-point processing.
The results show that the fixed-point NEON implementation is the fastest, however the floating-point NEON implementation is marginally slower but simpler to write. The use of NEON MPE improves performance by between 1.7 to 2.8 times in this case.