According to Wikipedia thunks are used in the call by name (not by value or reference) method of passing parameters in Algol 60.
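
As a rough illustration of what call by name means, here is a minimal Python sketch in which each argument is passed as a thunk (a small function that is re-evaluated every time the parameter is used). The names `sum_by_name`, `set_i` and `env` are made up for this example. Algol 60 compilers generated such thunks automatically; here they are written out by hand as lambdas.

```python
def sum_by_name(set_i, lo, hi, term):
    """Sum term for i = lo..hi, re-evaluating the thunk on every pass."""
    total = 0
    for k in range(lo, hi + 1):
        set_i(k)         # assign to the caller's variable i
        total += term()  # the thunk is evaluated now, so it sees the new i
    return total


env = {"i": 0}  # stands in for the caller's variable i

squares = sum_by_name(
    lambda v: env.__setitem__("i", v),  # thunk that writes to i
    1, 5,
    lambda: env["i"] * env["i"],        # thunk for the expression i*i
)
print(squares)
```

With call by value the expression `i*i` would have been evaluated once, before the call; with call by name it is evaluated afresh on each use, which is why the loop can sum the squares.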

I think this use was long forgotten by the time MS coined the term for the process of passing parameters between 16- and 32-bit code (which has nothing to do with "call by name"). Was there a thing called "The Universal Thunk"?
PeterO

Discoverer of the PI2 XENON DEATH FLASH!
Interests: C,Python,PIC,Electronics,Ham Radio (G0DZB),1960s British Computers.
"The primary requirement (as we've always seen in your examples) is that the code is readable. " Dougie Lawson

I will say this: Pis are mainstream now, beyond education. When people enter the arena they will basically get Raspbian (32-bit) first. Where they go next, if they feel the need to move up to 64-bit, will basically be elsewhere. This is good for Gentoo. The way you keep them in the fold is by providing a 64-bit system.

Better that you grow your ecosystem.

We make money from HW sales, not software. So in the real world, in some ways, it doesn't matter who provides the OS, as long as we continue to sell HW. Of course, having our own developed OS means we can ensure that it works correctly on new models on the day of launch etc., whilst third-party offerings do lag behind. One could argue that if we produced our own 64-bit OS version, then sales would increase, but I am not sure how much of a sales increase would result.

Our ecosystem, right now, runs on every single Pi version ever made. There is something to be said for that. A 64bit OS will not, so the ecosystem would actually fragment...

Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed. Here's an example...
I've been saying "Mucho" to my Spanish friend a lot more lately. It means a lot to him.

Providing a proprietary operating system can ensure repeat customers for the hardware, particularly at the corporate level where IT departments create bespoke code that runs under that operating system. Examples include IBM z/OS, Apple OS/X and formerly Sun Solaris and DEC VMS. Imagine what would happen today, however, if Dell created a unique operating system called DellishOS that only ran on their hardware.

Open source software that runs on multiple architectures as well as Microsoft Windows allows people and IT departments to run their programs on any hardware they choose. As a result, the programming skills and programs developed using the Raspberry Pi transfer immediately to all machines running GNU/Linux. This means students learn not only the theory of computation and computer science but also practical skills that without modification can be used on everything from embedded devices to servers and all the way to the world's most powerful supercomputers. One of the strengths of the Raspberry Pi is that Raspbian is just another GNU/Linux distribution. From this point of view, the only reason Raspbian exists is because 32-bit ARMv6 with hard floating point is unusual enough that the Debian project didn't want to maintain it.

Rather than discuss why Raspbian isn't 64-bit, it might be expedient to focus on whether a Raspberry Pi Desktop for ARMv8 could be added to the downloads next to the Raspberry Pi Desktop for x86.

This thread has named a few 64-bit images including bamarni's Debian Pi64 and sakaki's Gentoo. One that was mentioned briefly but I'd like to re-highlight is the 64-bit kernel Raspbian build by Crazyhead90. Lite and desktop versions available here: https://github.com/Crazyhead90/pi64/releases

As with all unofficial distros it has its lingering issues, but the defining feature is that it is still Raspbian. The Pixel UI, Sonic Pi, Minecraft Pi, and other batteries included.

Today I tried installing minetest:arm64 on Ubuntu and it ran at about 10 fps on my Pi 3. That's about the same as what I've seen on 32-bit Raspbian. When it comes to Minetest's performance on the Pi, it's not entirely a question of whether rendering is hardware-accelerated. It's that the mess of Minetest on top of Irrlicht is in need of profiling and performance debugging to find the real bottlenecks. I once wrote about the state of Minetest performance on Pi in this post

Further to jdonald's point above, as an alternative to using a 64-bit kernel with 32-bit native userland (per Crazyhead90's pi64 Raspbian spins), or indeed a full multilib userland, you can also run a 'pure' 64 bit (kernel + userland) system, and then just chroot into a 32-bit OS like Raspbian when required.
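
Because the kernel and the userland can differ in width in these setups, it can help to check both. The following is a small Python sketch using only the standard library; the function name `describe_bitness` is made up here. It assumes that `platform.machine()` reflects what the kernel reports (typically `aarch64` under a 64-bit kernel, though a 32-bit personality such as one set by `linux32` may report something else), while the pointer size reflects the running interpreter, i.e. the userland.

```python
import platform
import struct


def describe_bitness():
    """Report the kernel's machine string and this process's pointer width."""
    return {
        # What the kernel says it is (e.g. 'aarch64' or 'armv7l').
        "kernel_machine": platform.machine(),
        # Pointer size of this Python process, in bits (32 or 64).
        "userland_pointer_bits": struct.calcsize("P") * 8,
    }


info = describe_bitness()
print(info)
```

Run inside a 32-bit chroot on a 64-bit host, the two values would be expected to disagree, which is exactly the mixed situation described above.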

(NB: in the current state of play, once you elect to use a 64-bit kernel, in any configuration, you will forfeit MMAL / OpenMAX IL access to the VC4's h/w acceleration - albeit still allowing e.g. Eric Anholt's vc4-(f)kms-v3d / vc4 / mesa approach - so you need to determine whether this is acceptable for your use case.)

Most Raspbian stuff should work this way (i.e., when chrooted), and you can also install packages using apt-get etc. as usual. For example, here's a screenshot of the 32-bit Lazarus IDE running within a 32-bit Raspbian chroot (and installed via apt-get), from the gentoo-on-rpi3-64bit image:

(instructions for this are on the project's wiki, here; can easily be adapted for any 64-bit host OS)

You can even run apps like Mathematica in this way, if you patch the machine identity test ^-^ (as described in this post):

(assuming Wolfram are still licensing it for RPi3 usage at all ^-^)

The advantage of a 64-bit host userland / chroot approach, is that you get the easy availability of e.g. 64-bit (only or mostly) apps like Firefox Quantum (current release 63.0.1 on the gentoo-on-rpi3-64bit project's binhost), LibreOffice (6.1.3.2) etc.

As a second option, you can run guest OSes under KVM on your RPi3 also (much as you might use e.g. VirtualBox on a PC), if you have an appropriately configured system. Here, for example, is 64-bit Ubuntu 18.04 LTS server guest running on 64-bit Gentoo on an RPi3B+:

(instructions for doing this are on the project's wiki, here and here; again, can easily be adapted for any 64-bit host O/S that has the appropriate kvm kernel support)
Note how the guest is running a different kernel from the host here; it's not just a chroot.

You can run 32-bit guests too under KVM from a 64-bit host (setting "-cpu host,aarch64=off" when invoking QEMU), so e.g. Raspbian may be started this way, with its full 32-bit kernel, systemd init etc., but be careful your guest doesn't try to take control of the VC4 for h/w acceleration etc. It isn't a properly virtualized GPU, and any attempted shared access between host and guest will lock up your system. (I'm working on some instructions for "safe" 32-bit Raspbian KVM, but they aren't quite done yet. I do have a working system running locally though.)

I got a bit fed up with Raspbian being so hostile to multiarch out of the box while it would just work on Pi64 or Ubuntu. After getting a better understanding of the APT rules and poking around at this, I've turned my hacks into a script for patching 64-bit Raspbian. This tool can be used to prep the environment and install firefox:arm64 plus other aarch64 programs, while keeping the Pixel GUI and Raspbian's key features intact:

I had some pleasant surprises from building this. One is that once all the APT conflicts are out of the way, the system becomes amenable not only to installing arm64 programs but also i386 ones. This makes it straightforward to run Linux x86 commands (with qemu-user) or even Wine x86 without having to maintain the whole wheezy chroot. Basically, it's the situation I'm used to on Linux PCs where if you install a cross-compiler and qemu-user, ARM-compiled executables just run. When multiarch actually works on Raspbian, to some extent we get the same benefit in the other direction.

Feedback welcome. As this script is experimental, I recommend trying it on freshly installed SD cards but hold off on transforming existing systems if not backed up!

This isn't necessarily better than chrooting or virtualization, but in some ways can be a more natural 32-bit / 64-bit side-by-side experience.

Last edited by jdonald on Sat Nov 17, 2018 10:39 pm, edited 1 time in total.

When will Raspberry Pi organization release a 64-bit operating system?! Ideally the Raspberry Pi 3 B+ will work faster if a 64-bit operating system were available in Raspbian. The current 32-bit operating system may present a bottleneck.

64-bit is not any faster than 32-bit for 99.9% of anything you do. There are very few things that gain an advantage from 64-bit integers, and the presence of the extra registers is not a lot of help as of yet (we already have a lot of general purpose registers in 32-bit, 16 of them, and have had since 1987). OK, there are a very few things that could benefit from running 64-bit.

I am looking at the AARCH64 information currently, and the performance gains are not that much, if any, at this time. AARCH64 will provide some noticeable performance gains in the future, with newer versions of the ARM core and other upgrades to the other hardware, though at this time we are not there yet.

Going 64-bit today is kind of like going 64-bit on the Intel/AMD based PCs in 1998: there really is not a lot to gain other than bragging rights.

RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

There are 64bit SUSE versions out there ... I had one running on the 3B with no problems ... if you really want 64bit then you might want to look there ...

Haven’t tried yet on the 3B+ but planning to soon ... however, until we see a Pi with more than 4GB onboard then benefits for me are limited ...

I think I may have reached the summit of Enthusiasm Mountain and am now descending towards Sufficiency Plateau ... I think the Pi is a fantastic product which I thoroughly enjoy but my days of bleeding edge pain are definitely behind me ... so I’ll leave 64bit to the diehards ...

And even with more than 4 GB the benefits are limited, as it is rare for a single application to use that large an address space on any system, and the MMU can handle mapping in memory as needed beyond the 4 GB barrier.

DavidS,
Have you actually measured the performance difference for real code?
YMMV but I have seen around 15% improvement for similar code (and the code is smaller). (Other people may see more or less).

Ignoring the larger addresses (which we can't use on the Pi) and wider registers (which are sometimes useful - you don't need to check for overflow!), ARM have rethought everything in the light of experience.

The slow instructions have been removed or replaced with faster alternatives.

Don't underestimate the advantage of "more than double" the number of registers. The compiler uses all of them to good effect.
As you know ARM cannot address memory operands directly. So "a = b + c" might involve a couple of ldr instructions, an add, and an str.
There is great benefit in keeping values in registers, even to be used a long way further down the code. With 31 registers available, it's easy for the compiler to save a value in a spare register for later use.
Unless you have arrays, or a ridiculous number of scalar variables in a function, the stack doesn't need adjusting all the time, which usually takes a couple of instructions on ARM32.
And the zero register is brilliant!
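
To make the register argument a little more concrete, here is a deliberately simplified Python model of it. It treats the register file as a small least-recently-used cache of variables and counts how many loads from memory are needed for a given access pattern. This is only a toy: real compilers use far more sophisticated register allocation, and the function name `count_loads`, the LRU policy and the access pattern are all assumptions made for illustration.

```python
from collections import OrderedDict


def count_loads(accesses, num_registers):
    """Count memory loads when variables are kept in an LRU 'register file'."""
    regs = OrderedDict()
    loads = 0
    for var in accesses:
        if var in regs:
            regs.move_to_end(var)  # already in a register, no load needed
        else:
            loads += 1             # must be fetched with a load (ldr)
            if len(regs) >= num_registers:
                regs.popitem(last=False)  # spill the least recently used
            regs[var] = True
    return loads


# Twenty scalar variables, each used three times in the same order.
pattern = list("abcdefghijklmnopqrst") * 3

loads_12 = count_loads(pattern, 12)  # roughly the ARM32 working set
loads_31 = count_loads(pattern, 31)  # the AArch64 general purpose set
print(loads_12, loads_31)
```

In this contrived pattern every access misses with 12 registers, while with 31 each variable is loaded only once. Real code is rarely this extreme, which is consistent with measured gains being in the tens of percent rather than multiples.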

I run Gentoo64, it does seem faster and better than previous Raspbian.
Have not tested against just released spin, today is first day on new release.

But I don't expect much improvement as the SDRAM is still only 32bit wide.
To get max use of AARCH64 code would need to be optimized too?
Those that say 64bit is not needed don't understand some of us want to "learn" aarch64 coding now not later.

jamesh wrote:
More than 4GB? Not happening for years. RAM is too expensive to put that much on and keep anywhere close to the $35 price point. And RAM prices are currently INCREASING....

I find it amazing how much RAM we 'need' nowadays, when we were doing very similar tasks in 32MB devices not that long ago. Just badly written code in my opinion.

+1
Agreed. I am currently using 134MB of RAM on my Raspberry Pi B+ while I am typing this. And of that, 64MB is a RAM disk, so that makes 70MB that I am using for a web browser, a syntax highlighting text editor (with 5 open documents), a shell, a CPU temperature monitoring program, all of the filesystem cache space, all of the drivers, all of the loaded modules, and the OS itself.

And I still feel like I am using way too much RAM on this RISC OS box.

Heater wrote:
I also find it amazing and sympathize with your point of view.

I have some ideas as to how to fix the "bloat" problem. Perhaps Raspbian could lead the way:

1) Remove all internationalization from every application. English should be good enough for anyone smart enough to use a computer.

Why? Internationalization does not require ANY extra RAM (as you load only the current locale's messages), and does not take much extra disk space.

2) Remove Unicode support, which we don't need because of 1) above. And what is wrong with simple ASCII anyway? And besides, it's computationally impossible to support Unicode properly.

Unicode support is not that big of an issue, as proven by the many OS's that have it and only use 4MB to 16MB at boot, and are able to run many internationalized applications in under 32MB.

3) Remove support for proportional fonts and antialiasing. What is that for? Only fluff.

You want to make it take more RAM by requiring the use of bitmapped fonts?

Vector fonts save us RAM and disk space.

4) Probably we don't need colour displays either. Green, text only, screens were good enough back in the day.

That would save some, but it is not worth the savings.

Needless to say, I don't have much hope of my suggestions getting much traction.

OK, let me put this another way...

Some guy or girl, or many of them, have been working very hard, 8 hours per day or more, for some years, to bring us a 64 bit Pi.

After all that effort on their part, we say "Meh" don't need that.

No it is more like:
Some people worked very hard to bring us the advantage of the performance of out-of-order execution, better branch prediction, and full NEON, even in 32-bit modes, and we all said YEAH GREAT COOL let's put this to some good work. That is the Raspberry Pi 3B.

ejolson wrote:
The advantage of 64-bit mode is not only 64-bit memory addresses, but also 64-bit integers in general. Any compute-bound program that makes heavy use of 64-bit integers will show significantly better performance in 64-bit mode.

Yes yes yes. And it is extremely rare to have a real need for 64-bit integer math in any program, and even more rare to need a lot of 64-bit integer math.

And even I could come up with a few examples of where we could benefit from 64-bit, though these are so few and far between that 99% of people will likely never have a real use for them.

As it stands there are still more advantages to staying 32-bit for the majority of people.

A bad but common example of a 64-bit integer computation is the prime number finder that sysbench uses to determine CPU speed. Sysbench runs about 15 times slower in 32-bit mode compared to 64-bit mode. While the sysbench CPU score doesn't mean very much, this fact is not obvious to many technology bloggers. The result is that the Raspberry Pi is frequently described as being much slower than other single-board computers when, in fact, its speed is almost exactly the same.
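
As a rough sketch of why 64-bit integer arithmetic costs more on a 32-bit core, the Python below builds a 64-bit add and a 64-bit multiply out of 32-bit pieces, which is broadly what a compiler has to emit when the registers are only 32 bits wide. The function names are made up for this example, and it is not taken from sysbench. Division is costlier still, since on 32-bit ARM a 64-bit divide is generally handled by a runtime library routine rather than a single instruction.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def add64_via_32(a, b):
    """64-bit wrapping add using two 32-bit adds and a carry."""
    lo = (a & MASK32) + (b & MASK32)
    carry = lo >> 32                      # carry out of the low word
    hi = (a >> 32) + (b >> 32) + carry    # add-with-carry on the high word
    return ((hi & MASK32) << 32) | (lo & MASK32)


def mul64_via_32(a, b):
    """Low 64 bits of a 64x64 product from three 32-bit multiplies."""
    al, ah = a & MASK32, a >> 32
    bl, bh = b & MASK32, b >> 32
    low = al * bl                         # full 32x32 -> 64-bit product
    cross = (ah * bl + al * bh) & MASK32  # only the low 32 bits matter
    return (low + (cross << 32)) & MASK64


x = 0xFFFFFFFFFFFFFFFF
y = 0x123456789ABCDEF0
print(hex(add64_via_32(x, y)), hex(mul64_via_32(x, y)))
```

On a 64-bit core each of these is a single instruction, so a benchmark dominated by such operations exaggerates the difference compared with typical application code.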

It is unfortunate that the tool many use for benchmarking gives the wrong impression. For real-world applications the RPi 3B is just as fast in 32-bit mode as in 64-bit.

Even with the extra registers, the advantage is not YET significant enough to just jump.

jahboater wrote:
DavidS,
Have you actually measured the performance difference for real code?
YMMV but I have seen around 15% improvement for similar code (and the code is smaller). (Other people may see more or less).

Yes I have. The performance in most cases is pretty close to identical; in a few cases 32-bit ARM is actually a little faster on the same CPU, and in a few cases 64-bit ARM is a little faster. In most cases there is not enough difference to notice. It has been a while, and I do not have identical 64-bit and 32-bit OS setups to repeat the tests with at this time. I played with it a while back and have some notes about what I found, and that is it.

These are the same advantages the 32-bit ARM ISA has had since 1985, which I think is when it was first made. Nothing new about that.

Don't underestimate the advantage of "more than double" the number of registers. The compiler uses all of them to good effect.
As you know ARM cannot address memory operands directly. So "a = b + c" might involve a couple of ldr instructions, an add, and an str.

Yes, and as we have 12 working registers (according to most calling conventions), that gives us the ability to optimize reused values or results between statements, a kind of optimization that is easy to implement and does not noticeably slow down a compiler.

As things evolve there may be a time that the extra registers help with this kind of optimization (which I think was the idea), if the data being operated on allows the extra registers to help.

Gavinmc42 wrote:
I run Gentoo64, it does seem faster and better than previous Raspbian.
Have not tested against just released spin, today is first day on new release.

That is not comparing apples to apples. Different configurations, there are many 32-bit configurations that will feel even faster.

But I don't expect much improvement as the SDRAM is still only 32bit wide.
To get max use of AARCH64 code would need to be optimized too?
Those that say 64bit is not needed don't understand some of us want to "learn" aarch64 coding now not later.

I thought that the bus between the SoC and RAM was 128-bits wide on the Raspberry Pi. This seems more reasonable as it would explain why copying an area of ram with the cache set to write through for the destination by:

Is more than 4 times faster than doing the same thing with single register load and store instructions. Obviously the above example would be even faster if it were unrolled (taking advantage of the conditional nature of all ARM instructions; oops, AARCH64 stands in the way).

which has the additional benefit that you are not using four of your general purpose registers.
I think you will find this loop is very very fast on a Pi3, but should still work well on an older Pi.

Obviously the above example would be even faster if it were unrolled

I thought loop unrolling was a thing of the past? Probably because the loop is less likely to fit in a cache line (or the fancy loop buffer that x86 decoders have). Compilers stopped doing it ages ago.

By the way, "BNE Loop" is a conditionally executed B instruction and works fine on aarch64!

which has the additional benefit that you are not using four of your general purpose registers.
I think you will find this loop is very very fast on a Pi3, but should still work well on an older Pi.

That is actually a better way about it, though I was giving an example to be readable by all.

I thought loop unrolling was a thing of the past? Probably because the loop is less likely to fit in a cache line (or the fancy loop buffer that x86 decoders have). Compilers stopped doing it ages ago.

By the way, "BNE Loop" is a conditionally executed B instruction and works fine on aarch64!

There are many cases where gcc still does some limited loop unrolling. It does still have some benefit; just do not take it as far, maybe four times in the above example at most, and make sure the data is aligned to a 16-word boundary in memory for even better results (especially on 32-bit ARMv6).
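
For readers unfamiliar with the technique, here is a minimal Python sketch of a copy loop unrolled four times, with a tail loop for the leftover elements. It only illustrates the shape of the transformation; the function name `copy_unrolled4` is made up, and in Python there is no speed benefit to be expected. The point on real hardware is fewer branch and counter updates per element copied.

```python
def copy_unrolled4(src):
    """Copy a list element by element, four elements per loop pass."""
    n = len(src)
    dst = [0] * n
    i = 0
    limit = n - (n % 4)  # largest multiple of 4 not exceeding n
    while i < limit:
        # Four copies per iteration: one compare-and-branch per four words.
        dst[i] = src[i]
        dst[i + 1] = src[i + 1]
        dst[i + 2] = src[i + 2]
        dst[i + 3] = src[i + 3]
        i += 4
    while i < n:
        # Tail loop handles the 0 to 3 elements left over.
        dst[i] = src[i]
        i += 1
    return dst


copied = copy_unrolled4(list(range(10)))
print(copied)
```

Unrolling further makes the loop body larger, which is the trade-off against cache line and loop buffer size discussed in this exchange.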

Though you are correct about it being important to keep a loop within a single cache line. So there is a limit to unrolling nowadays.

BNE is a conditional branch. I was talking in this case about conditional instructions like LDMFABE.

I am still studying ARMv8 AARCH64 so I may end up being surprised, though real world tests do not yet show a benefit. I am sure that this is likely to change over time, as the extra registers are a benefit, while the lack of conditionals I am not sure about.

Great analysis on everything, jamesh! I like the point that vector fonts actually save RAM.

I've got another tip for those looking to run aarch64 software on Raspbian or any 32-bit Pi OS. It turns out 32-bit Docker can run 64-bit Docker images. You'll still have to run a 64-bit kernel, but don't need to battle Raspbian's system setup with conflicting packages. After installing docker-ce try with:

Great analysis on everything, jamesh! I like the point that vector fonts actually save RAM.

Just for the record it was DavidS that made that statement about vector fonts.

It's not actually true.

A 5 by 7 pixel font for the ASCII character set, as displayed on our green screens of days gone by, will be a lot smaller than any vector font you can find. Not to mention requiring a lot less software to use it, if any is required at all!

Hey, thanks for the heads up on 64 bit code in Docker on a 32 bit OS.

That is the most brilliant and useful thing I have heard the whole week!

Now repeat with a 32px font. We used vector fonts in the early 90s, to save memory. They also are easier to antialias.
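
The bitmap side of this comparison is simple arithmetic, sketched below in Python. The assumptions are mine: 95 printable ASCII glyphs, each row padded to a whole number of bytes, and either 1 bit per pixel or 8 bits per pixel for an antialiased (greyscale) glyph. Vector font sizes vary too much by font to give a single figure, so none is computed here.

```python
def bitmap_font_bytes(width, height, glyphs=95, bits_per_pixel=1):
    """Storage for a fixed-size bitmap font, rows padded to whole bytes."""
    row_bytes = (width * bits_per_pixel + 7) // 8
    return row_bytes * height * glyphs


small = bitmap_font_bytes(5, 7)                          # 5x7, 1 bpp
large = bitmap_font_bytes(32, 32)                        # 32px, 1 bpp
large_aa = bitmap_font_bytes(32, 32, bits_per_pixel=8)   # 32px, antialiased

print(small, large, large_aa)  # 665 12160 97280
```

A bitmap font also needs one such table per size, whereas a vector font is stored once and scaled, so the bitmap cost grows with both the pixel size and the number of sizes in use.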

My Raspbian bloat complaint has been addressed by the new Desktop-only version.
Thank you RPF. Very handy for compiling 32-bit baremetal apps.
Using Lite and loading a Desktop is tedious.

Gentoo64 is for 64bit coding, it could be used for 32bit but then I need to mess with compiler flags etc.
And since I am trying to do everything in Pascal, learning C compiler flags seems distracting.

I have not checked but are there any vector fonts buried in the start.elf/VC4 RTOS that could be used by Arm code?
AJ has a font2openvg utility but that uses memory, not useful if I want the smallest Demoscenes. https://github.com/ajstarks/openvg

Is AJ's OpenVG font smaller than Leon's 8x16.h? I know which one scales
Now will 64bit code do OpenVG?
Hmm have I tried Leon's 64bit OpenGLES version yet?
Another senior moment, don't remember if I had checked
Anyway I assume if I delete all the other kernels and only leave the kernel8.img and it boots on a 3A+ then it must be 64bit?

Very interesting, the 3A+ runs all 4 versions of Leon's OpenGLES kernel.img.
start.elf is checking for a kernel8.img before the kernel8-32.img.
Time to do some C coding?
Is Leon the first one to get 64bit OpenGLES baremetal going?

I have created a 64-bit Debian MATE image for the Pi 3. It supports the Pi 3's built-in WiFi out of the box and is built upon the standard Debian ARM64 base, meaning you can install any package available for ARM64 in the Debian repositories. It has firmware for pretty much everything such as WiFi, USB, and Ethernet. You can also enable ARM32 (armhf) as a secondary architecture using dpkg. Everything runs smoothly on Debian MATE, LibreOffice works as well as Chromium.

In a different thread appears a short self-contained C program which computes the first Fibonacci number with a million digits. This program implements big-number arithmetic using 64-bit integers as the underlying type. The Pi 3B+ running in 32-bit compatibility mode completes the computation in 15.43 seconds. Based on rescaling the clock speeds of a different ARM-based single-board computer, it was estimated that the Pi 3B+ running in 64-bit mode should complete this same computation in only 7.49 seconds. If true, that would be a two-fold increase in speed for a particular application just by switching operating systems.
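
To show why the limb width matters for big-number arithmetic, here is a small Python sketch of multi-precision addition in which the limb size is a parameter. It is not the program from the other thread; the helper names `to_limbs`, `add_limbs` and `from_limbs` are invented for this illustration. The step count stands in for the number of add-with-carry operations a native implementation would need.

```python
def to_limbs(n, limb_bits):
    """Split a non-negative integer into little-endian limbs."""
    mask = (1 << limb_bits) - 1
    limbs = []
    while n:
        limbs.append(n & mask)
        n >>= limb_bits
    return limbs or [0]


def add_limbs(a, b, limb_bits):
    """Schoolbook addition with carry; returns (limbs, steps taken)."""
    mask = (1 << limb_bits) - 1
    out, carry, steps = [], 0, 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        out.append(s & mask)   # keep the low limb_bits bits
        carry = s >> limb_bits # carry into the next limb
        steps += 1
    if carry:
        out.append(carry)
    return out, steps


def from_limbs(limbs, limb_bits):
    """Reassemble an integer from little-endian limbs."""
    return sum(limb << (i * limb_bits) for i, limb in enumerate(limbs))


x, y = 2**1000 - 1, 1
sum32, steps32 = add_limbs(to_limbs(x, 32), to_limbs(y, 32), 32)
sum64, steps64 = add_limbs(to_limbs(x, 64), to_limbs(y, 64), 64)
print(steps32, steps64)  # 32 16
```

Halving the number of limb operations is in line with the roughly two-fold estimate above, and a 32-bit core working on 64-bit limbs additionally has to split each limb operation into several instructions.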

It would be nice if someone who is running a 64-bit operating system on real 3B+ hardware could confirm that this estimate is correct. The program is available in this post. The above mentioned performance results are discussed in subsequent posts of the same thread.

Last edited by ejolson on Wed Dec 05, 2018 6:04 am, edited 2 times in total.

I'm glad to know that it's all working well for you. When I have time, I'll see if I can make Debian LXDE, XFCE, and Plasma images as well. Making the images doesn't take too much time; what takes so much time is uploading them to GitHub!