Thanks for posting. Your results are similar, though perhaps slightly faster, compared to the graphs that James uploaded which I converted to portable network graphics:
Here "My Computer" refers to the new Raspberry Pi 4B.

Compared to the original Pi B the Pi 4B is 26.8474 times faster. That's about double the performance of the 3B+ overall; however, I find it surprising that the merge sort timings are actually slower than on the 3B+. I wonder if this result is related to the compiler version or an optimization setting. It would be nice to find a set of compiler flags for which the merge sort timings were faster.

From Eben when I showed him the results for the merge, "Could be expensive line moves between L1s, but I suspect it's actually measuring the cost of forking processes in LPAE."

Which is why some of the other Pie charts were comparing LPAE kernels on the Pi3B+.

My understanding is that the task parallel constructs in modern OpenMP implementations fork a pool of threads at the beginning of the run (which isn't measured by the timing routines) and then use either work stealing or some sort of grand central dispatch to assign parcels of work to the threads in the pool. Maybe the cost of Linux thread synchronization primitives goes up when LPAE is enabled; however, it is strange that the serial version also runs slower.
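To make the above concrete, here is a minimal sketch of a task-parallel merge sort in C with OpenMP. It is not the pichart source; the names `parallel_merge_sort`, `sort_tasks`, `merge` and the `CUTOFF` value are hypothetical and chosen only for illustration. The `parallel` region is where the runtime sets up (or reuses) its team of threads, and each `task` is a parcel of work that the runtime hands to whichever thread is free. How that scheduling is done (work stealing or otherwise) is up to the OpenMP implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold: below this size, tasks are not deferred. */
#define CUTOFF 1024

/* Merge the sorted halves a[0..mid) and a[mid..n), using tmp as scratch. */
static void merge(int *a, int *tmp, size_t mid, size_t n)
{
    size_t i = 0, j = mid, k = 0;
    while (i < mid && j < n)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i < mid)
        tmp[k++] = a[i++];
    while (j < n)
        tmp[k++] = a[j++];
    memcpy(a, tmp, n * sizeof *a);
}

/* Sort a[0..n) recursively; each half becomes one OpenMP task. */
static void sort_tasks(int *a, int *tmp, size_t n)
{
    if (n < 2)
        return;
    size_t mid = n / 2;
    /* The two halves use disjoint slices of a and tmp, so the tasks
       do not touch each other's data. */
    #pragma omp task if (n > CUTOFF)
    sort_tasks(a, tmp, mid);
    #pragma omp task if (n > CUTOFF)
    sort_tasks(a + mid, tmp + mid, n - mid);
    /* Wait for both halves before merging them. */
    #pragma omp taskwait
    merge(a, tmp, mid, n);
}

/* Entry point: one thread seeds the recursion, the team runs the tasks. */
void parallel_merge_sort(int *a, size_t n)
{
    int *tmp = malloc((n ? n : 1) * sizeof *tmp);
    assert(tmp != NULL);
    #pragma omp parallel
    #pragma omp single
    sort_tasks(a, tmp, n);
    free(tmp);
}
```

Built with `gcc -O2 -fopenmp` the tasks are spread over the cores; built without `-fopenmp` the pragmas are ignored and the same code runs serially, which gives a simple way to time the serial and parallel versions from a single source file.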

I wonder if this is a gcc version 8.x compiler regression. Have you tried any compiler flags to remedy the situation?

No, I have not looked at any of this, just did the charts with the default compiler on the pi itself. There's probably some mileage in using the latest compilers.

Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed. Here's an example...
"My grief counseller just died, luckily, he was so good, I didn't care."

Aha, one last thing. It doesn't solve the merge-sort anomaly, but I realized there were still some changes in config.txt including a small overclock. Resetting config.txt to stock settings just lowers the speeds proportionately:-

Here are my Pi 3B vs Pi 4 numbers. I used the same binary, that I compiled with gcc 6.3 on an older image (because I am having huge performance issues with Raspbian Buster and I wanted to investigate those). For some reason, these performance issues don't show up at all in this pichart-test - but they do in software that I wrote (an old 2017 Raspbian image is more than twice as fast as Buster on that, no idea why, if anyone has an idea: https://www.raspberrypi.org/forums/view ... 8&t=243859 ).

Ah. And there's the cause of the MergeSort problem: It broke in gcc.
Here are the same numbers for the Pi 4, but using gcc 8.3 that's on the Buster image:
Prime Sieve: 1690
Merge sort: 328
Fourier: 253
Lorenz 96: 5747

Thanks for running those tests. The results make me more confident in the engineering behind the Pi 4B and the Cortex-A72. I'm sorry it didn't help with your problem.

The fact that merge sort compiled with gcc version 8.3 has only 60% of the performance of the same code compiled with gcc version 6.3 makes one imagine too much time has been spent optimizing 64-bit at the expense of 32-bit targets. Maybe the regression is due to the mitigation of Spectre-like side-channel vulnerabilities that might leak information about the numbers being sorted when the test is running. Maybe in-kernel side-channel information leakage mitigations are responsible for the slowdown you are experiencing.

It's not all bad. Even though the performance of merge sort decreased, the performance of prime sieve and the Lorenz 96 simulation increased. Therefore the final Pi ratio didn't decrease as much as might be expected. In summary

Also interesting: Has anyone done a 64 bit test? It's too bad that Raspbian is still 32 bit, but there are other images that are 64 bits. On my Pi 3B I saw (on one specific program, and I think we were using different gcc versions too) a 10% increase in performance vs 32 bit.

I'll do some tests next week.

(Btw the other problem is solved, was a sound card speed issue in the older image, it apparently ran at a lower sample rate than selected).

More numbers, this time with Ubuntu Mate 64 bit, Pi 3B (because there's no Pi 4 version yet). And unfortunately with gcc 7.4, because that's the one that's delivered with Ubuntu Mate...

My very very unreliable estimate would be that 64 bit is between 10 and 15% faster than 32 bit. Which matches values that I've read elsewhere, and values that I've seen in my own software (compiled with the same compiler version in both 32 and 64 bit). It would be helpful to use the same gcc versions (and I could, I have gcc 8.2 running at both 32 and 64 bit on some other Pi's), but that's too much effort for now. I'm more interested in how it affects my own software than on how it affects a benchmark.

Thanks for the report. It's good to know that running pichart using 64-bit ARM doesn't make the surprising difference it does with sysbench. In a way I share your sentiment about only being interested in how 64-bit affects the speed of the software you wrote yourself. The only difference is that pichart is my software.

I'm somewhat disappointed that switching to 64-bit didn't solve the performance problems with newer compilers and merge sort. From this post it looks like the Pi 4B will soon run a 64-bit version of Gentoo Linux. I wonder if there is anything that can be done with the current C code to make merge sort run faster with the newer versions of gcc.

These seem to be the best scores yet for a single run of pichart on a Raspberry Pi 4B. For the record, there is a nice comparison to a Rock64 here which shows that single-board computer has roughly the same performance as the Pi 3B+. Since both machines use a quad-core Cortex-A53 processor this is not unexpected. However, I still find it interesting.

Just note the overclock though. I assume most boards will run at it; it's the one Tom's Hardware used in their announcement and I just copied it straight off. It's a 17% frequency increase.
It'll probably go faster as it's not sweating with just a gentle fan blowing over it.

No doubt we'll see these faster speeds if they build in active cooling solutions in, say, a + board in the future. The room is there in the SoC.