Various "long" variables used to hold timestamps should be "unsigned long".
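A minimal sketch of why this matters: Arduino's micros() and millis() return unsigned long, and unsigned subtraction still gives the right elapsed time if the counter rolls over between the two readings. Here std::uint32_t stands in for a 32-bit unsigned long, and elapsedMicros is a hypothetical helper name, not something taken from the benchmark.

```cpp
#include <cstdint>

// Hypothetical helper: elapsed time between two 32-bit microsecond timestamps.
// std::uint32_t stands in for a 32-bit Arduino "unsigned long".
// Unsigned subtraction wraps modulo 2^32, so the result is still correct
// when the counter rolled over between `start` and `now`.
constexpr std::uint32_t elapsedMicros(std::uint32_t start, std::uint32_t now) {
    return now - start;
}

// The counter rolled over: start is 100 us before the wrap, now is 50 us after it.
static_assert(elapsedMicros(0xFFFFFFFFu - 99u, 50u) == 150u, "rollover handled");
```

With a signed long the same timestamps would turn negative halfway through the counter range, so comparisons on them are easy to get wrong.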

Thanks westfw. Our initial version of the code included some operations in the INT loop, but we reasoned that the FOR statement already contained an increment operation. The code uses the INT loop to calibrate the speed of the FLOAT loop, and a rough comparison between the platforms we have is probably good enough.

I could probably write a WHILE statement in which the comparison and the increment appear as separate, recognizable operations, but I have the feeling that it would not make much difference to the compiler.

At high speed the results are imprecise. Example: Teensy 3.6 (Cortex-M4 @ 180 MHz). The result of FLOAT_MUL is 181.82 MIPS.

The empty reference loop has the following repetitive high-level operations:
1) increment
2) compare
3) jump

It takes 502 microseconds for 30000 iterations, so 59.76 M loops per second. The high-level operation rate is 59.76 * 3 = 179.28 MIPS. How is it possible to achieve 181.82 MIPS with FLOAT_MUL? Without optimizations it should be at most 180 MIPS, or maybe 179.28.
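The arithmetic above can be written out as a small sketch. The function names are made up for illustration; the inputs (30000 iterations, 502 microseconds, 3 operations per iteration) are the figures quoted in this post.

```cpp
// Iterations per microsecond is the same number as millions of loops per second.
constexpr double megaLoopsPerSecond(double iterations, double elapsedMicros) {
    return iterations / elapsedMicros;
}

// Each loop iteration is counted as `opsPerLoop` high-level operations
// (here: increment, compare, jump).
constexpr double highLevelMips(double iterations, double elapsedMicros, double opsPerLoop) {
    return megaLoopsPerSecond(iterations, elapsedMicros) * opsPerLoop;
}

// Figures from the post: 30000 iterations in 502 us, 3 operations per iteration.
constexpr double kLoops = megaLoopsPerSecond(30000.0, 502.0);   // about 59.76
constexpr double kMips  = highLevelMips(30000.0, 502.0, 3.0);   // about 179.28
```

Counting exactly three operations per iteration is itself an approximation, since the compiler decides how many machine instructions the loop really takes.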

Each measured operation is really an operation plus an assignment, and maybe the assignment time was negligible. Including an assignment of a constant in the LONG calibration loop may be a better approach, as suggested by westfw.
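One possible shape for such a calibration loop, sketched in portable C++ with std::chrono in place of micros(). The benchmark's actual source is not shown in this thread, so both function names and the loop bodies are assumptions; the point is only that both loops share the same structure and both contain an assignment.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical calibration loop: the body only assigns a constant, so the
// measured time covers loop overhead plus the cost of one assignment.
inline std::int64_t timeCalibrationLoopMicros(long iterations) {
    volatile long dummy = 0;  // volatile keeps the store from being optimized away
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        dummy = 1;            // assignment of a constant, no arithmetic
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Hypothetical test loop with the same structure: one multiply plus one assignment.
inline std::int64_t timeFloatMulLoopMicros(long iterations) {
    volatile float a = 1.5f;
    volatile float b = 2.5f;
    volatile float result = 0.0f;
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        result = a * b;       // operation plus assignment
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Subtracting the first time from the second then leaves, roughly, the cost of the multiply alone rather than multiply plus assignment.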

It may be interesting to measure the assignment time (and MIPS) of different data types.

I wrote the code a while ago (indeed, 180 MHz microcontrollers were not exactly a target).

If I recall correctly, I tried to make all the loops look similar in structure to the calibration loop (so that I could subtract the loop overhead). A float operation should give about 180 MFLOPS on a Cortex-M4 with FPU. I see your points; however, the accuracy is badly undermined by the use of the micros() function (which has a granularity of 8 microseconds), and a loop of 30000 iterations is probably insufficient. Actually, I think 181.82 MFLOPS is quite close, but that many digits are definitely pointless.
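A rough sketch of how much the timer granularity matters. It assumes the benchmark divides the iteration count by the time difference between the test loop and the calibration loop; under that assumption 181.82 corresponds to a difference of about 165 microseconds (30000 / 181.82). That 165 is back-calculated, not a value measured in this thread, and the names below are hypothetical.

```cpp
// Lowest and highest rate (in millions of operations per second) that are
// consistent with a measured time difference and a given timing error.
struct RateRange {
    double low;
    double high;
};

// Assumed formula: rate = iterations / (time difference in microseconds).
// `errMicros` is the uncertainty of that difference.
constexpr RateRange rateRange(double iterations, double deltaMicros, double errMicros) {
    return RateRange{ iterations / (deltaMicros + errMicros),
                      iterations / (deltaMicros - errMicros) };
}

// 30000 iterations, ~165 us difference (back-calculated), 8 us granularity:
// the result could lie anywhere between roughly 173 and 191.
constexpr RateRange kEightMicros = rateRange(30000.0, 165.0, 8.0);

// Even a 1 us error moves the result by about one unit in either direction.
constexpr RateRange kOneMicro = rateRange(30000.0, 165.0, 1.0);
```

Under these assumptions the difference between 181.82 and 180 is well inside the measurement noise, and more iterations would shrink the range.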

The "DUMMY" assignments were made (if I recall correctly) because they had some effect on the compiled code. A better programmer would probably have coded directly in assembler, taking care to make all the loops exactly the same (and I am also a lazy programmer most of the time!).

I recall testing the different suggestions (looking at the compiled code), but I did not have time to improve the benchmark for high speeds (without affecting the old results).

I also wrote a benchmark for different MCUs, both AVRs and ARMs. The benchmark performs low- and high-level tests for integers, floats, doubles, bit shifts, random numbers, sorting, matrix algebra, GPIO read/write, and graphics. The test will run even without a TFT attached; you may keep the #included Adafruit libs or optionally substitute them with proprietary ones.

Update: the test for the Raspberry Pi has now been completed as well.

As AVRs don't feature 64-bit doubles, the 32-bit float test is performed twice, without issuing penalty points though (which admittedly is not fair to the ARM boards).
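A small sketch of how a sketch could detect this at compile time instead of hard-coding it per board. The function names are hypothetical and not part of the benchmark; on 8-bit AVRs with the default avr-gcc settings double is the same 32-bit type as float, while on ARM boards and on a PC it is 64 bits wide.

```cpp
#include <climits>

// Width of `double` in bits on the platform this is compiled for.
constexpr unsigned doubleBits() {
    return sizeof(double) * CHAR_BIT;
}

// True when `double` is wider than `float`, i.e. a separate double test
// measures something different from the float test.
constexpr bool hasDistinctDouble() {
    return sizeof(double) > sizeof(float);
}
```

A benchmark could use hasDistinctDouble() to label the "double" column as a repeated float test on boards where the two types are identical.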