Sadly, though, it's a very 'basic' assembler supporting only the ARMv4 32-bit instruction set: no Thumb, no NEON, etc. Since BBC BASIC for SDL 2.0 itself requires those extensions to run (it's virtually all Thumb code, and SIMD instructions are used for the SOUND emulation), it would be nice to have a more complete assembler. If anybody is keen on extending it, it's the file bbasmb_arm_32.c in the released source package.

The interpreter runs largely in Thumb? May I ask why?

RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

My primary ARM build is for Android and the development tools I am using default to Thumb(2) so I have simply followed suit in the RPi build for consistency. I think, early on, I did try building it for 32-bit ARM but the result was larger code (as would be expected) and no significant performance improvement.

You can easily repeat the experiment if you are particularly interested. In the RasPi makefile replace '-mthumb' with '-marm' in the CXXFLAGS definition. I think I'm right in saying that when targeting ARMv7 GCC defaults to Thumb anyway.
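
To confirm which instruction set a rebuild actually used after changing the flag, one option is to look at the compiler's predefined macros. The sketch below is only an illustration and is not taken from the BBC BASIC source: `target_isa` and `built_as_thumb` are hypothetical helper names, while `__aarch64__`, `__thumb2__`, `__thumb__` and `__arm__` are macros that GCC and Clang predefine for the corresponding targets.

```c
#include <assert.h>
#include <string.h>

/* Label for the instruction set this translation unit was compiled for.
   Order matters: __thumb2__ implies __thumb__, and both imply __arm__. */
const char *target_isa(void)
{
#if defined(__aarch64__)
    return "AArch64";
#elif defined(__thumb2__)
    return "Thumb-2";
#elif defined(__thumb__)
    return "Thumb";
#elif defined(__arm__)
    return "ARM";
#else
    return "not ARM";
#endif
}

/* Returns 1 when the build uses one of the Thumb encodings, else 0. */
int built_as_thumb(void)
{
    const char *isa = target_isa();
    assert(isa != NULL);
    return strncmp(isa, "Thumb", 5) == 0;
}
```

Printing `target_isa()` from a test program built once with '-mthumb' and once with '-marm' should show the difference, and building with neither flag shows what that particular GCC defaults to.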

Yes, I found about a 25% decrease in code size using Thumb-2 and almost identical performance. In theory, when the instructions are 16-bit the CPU can fetch two at a time, which could help slightly.

It defaults to ARM on the Pi 2, so you can just remove the -mthumb option.
There are some ARM CPUs that only support Thumb-2 (the Cortex-M0?), so it may be the default there.

I found it works very well for ARMv7, but you get lots of warnings for ARMv8, as you can no longer put 32-bit instructions in IT blocks.
So I don't use it for ARMv8 Pis.

Edit: I have just tried -mthumb again on the more recent GCC 8.2 and it works fine on ARMv8 now!

On RISC OS, GCC produces ARM code of course (it is RISC OS). There are, though, some optimizations that Thumb makes impossible, and some things that end up bigger in Thumb (because three or more instruction words are needed to do what a single ARM instruction can do).

As I recall (it has been a while since I looked at Thumb), you cannot combine rotates/shifts into normal operations in Thumb (on ARM that is the only way to do a shift or rotate). I also seem to remember some limits on STM/LDM in Thumb, and we use the stack a lot. Then there is conditional execution.

These are the reasons I ask. I would guess that GCC probably overlooks these things because most other CPUs cannot do them, and there are likely a lot more people targeting Thumb nowadays (which is a sad state of things).

That makes sense then.

there are likely a lot more people targeting Thumb nowadays (which is a sad state of things).

Why sad? The Thumb2 instruction set was presumably introduced because it offers advantages, not disadvantages! Code density is typically increased significantly, and with that comes speed benefits not just because it may be possible to load two instructions at the same time but because smaller code means more efficient cache usage. If those benefits don't, on balance, outweigh the overhead of occasionally needing more instructions to perform the same task, what was the point?

Conditional execution is available in thumb2 with something called an IT block.

As for using the shifter combined with other things, here is some output from the assembler:
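
The assembler listing referred to here is not reproduced. As a stand-in, and not the poster's actual output, the C sketch below shows the kind of expression involved: arithmetic whose second operand is shifted by a constant. Both the ARM and the Thumb-2 encodings of ADD and SUB accept a register operand with a constant shift, so a compiler can use a single instruction for each function in either mode. The instructions named in the comments are what one would typically expect, not a guaranteed result, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* base + index * 4: typically a single "add r0, r0, r1, lsl #2"
   in both ARM and Thumb-2 builds (an expectation, not a guarantee). */
uint32_t scaled_add(uint32_t base, uint32_t index)
{
    return base + (index << 2);
}

/* value - value / 8: typically a single "sub r0, r0, r0, lsr #3". */
uint32_t decay(uint32_t value)
{
    return value - (value >> 3);
}

/* Quick sanity checks on the arithmetic itself. */
void check_shift_examples(void)
{
    assert(scaled_add(100, 3) == 112);
    assert(decay(64) == 56);
}
```

Compiling this with `-O2 -S`, once with -mthumb and once with -marm, lets you compare the generated assembly for the two modes directly.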

Aarch64 is strictly 32-bit only instructions, so thumb will decline in time I guess.

Indeed so. The iOS build of BBC BASIC is 64-bits (that's mandated by Apple) so is entirely 32-bit instructions, and compiled with Clang rather than GCC anyway. However there is in any case no assembler because iOS blocks 'arbitrary code execution'.

With iOS apps being sandboxed I don't know why they feel the need to add the extra 'security' of disallowing code generation at run-time (which must prevent self-modifying code and some kinds of Just In Time compilation) but Apple are a law unto themselves.

Well, I do not much care for Apple nowadays. I liked some of what they had before 1998, then things went more and more downhill.

Thank you for the update on Thumb, though. I thought that Thumb-2 was only an improvement on Thumb; I did not realize it brings ARM features back to Thumb users. Good to know.

I still go with the ARM ISA, not Thumb, personally. One of these days I am going to have to actually take a look at what AArch64 looks like in detail.

One of these days I am going to have to actually take a look at what the AARCH64 looks like in detail

It's interesting, not least in showing that some of the assumptions which led to the ARM 32-bit architecture are no longer valid today. For example a key characteristic of 32-bit ARM is that every instruction can be made conditional, which can eliminate a lot of jumps. Most Aarch64 instructions can't be made conditional: "Benchmarking shows that modern branch predictors work well enough that predicated execution of instructions does not offer sufficient benefit to justify its significant use of opcode space, and its implementation cost in advanced implementations".

It's more than that. Imagine the problems it causes for an out-of-order processor! And the extra 4 bits allowed for double the number of registers, which is far more useful.
The same horrific costs apply to uncontrolled writes to the PC, stm/ldm and so on. These things don't work well with modern CPUs.

Aarch64 replaced all the conditional stuff with some new instructions that are rather clever and very useful.
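
The post does not name the instructions. Presumably it means the conditional-select family (csel, csinc, cset and relatives), which pick or adjust a value according to the flags without a branch. As a rough sketch under that assumption, the C functions below are the sort of code for which an AArch64 compiler commonly emits them. The function names are invented for the example, and the instruction sequences in the comments are typical expectations at -O2, not guaranteed output.

```c
#include <assert.h>
#include <stdint.h>

/* Typically "cmp w0, w1" followed by "csel w0, w0, w1, gt". */
int32_t max_i32(int32_t a, int32_t b)
{
    return a > b ? a : b;
}

/* Typically "cmp w1, w2" followed by "cinc w0, w0, gt"
   (cinc is an alias of csinc). */
uint32_t count_above(uint32_t count, int32_t x, int32_t limit)
{
    return x > limit ? count + 1 : count;
}

/* Typically "cmp w0, w1" followed by "cset w0, eq". */
int is_equal(int32_t a, int32_t b)
{
    return a == b;
}

/* Quick sanity checks on the results, independent of the code generated. */
void check_select_examples(void)
{
    assert(max_i32(3, 7) == 7);
    assert(count_above(10, 5, 3) == 11);
    assert(is_equal(4, 4) == 1);
}
```

On 32-bit ARM the same source would usually come out as conditionally executed instructions (or an IT block in Thumb-2), which makes it a convenient side-by-side comparison.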

Actually, having conditional execution of all instructions has the potential to simplify an out-of-order processor, assuming a reasonable implementation.

I do not know the details of how ARM do out-of-order execution, so I cannot comment further on that.

They would have to be extremely clever to justify the added size of programs that will suddenly have to use branch instructions where load tables or conditional execution were used before.

I definitely look forward to seeing what it looks like. It kind of scares me that your description almost sounds like the level of castration in ability that the 680x0 series suffered. Although the similar and older PDP-11 was more universal in that respect, the 680x0 traded direct access to the program counter as a user register (R7 on the PDP-11) for twice as many registers.

I hope that the overall is positive, and as eloquent as the 32-bit ARM ISA.

Only the thumb mode can beat Aarch64 for size (and that is not using any conditionals at all!).

I hope that the overall is positive, and as eloquent as the 32-bit ARM ISA.

I think ARM know what they are doing. They had good reasons for their choices of what went into AArch64, what was left out, and what was improved. It was streamlined and designed for high performance on future CPUs, which ARM themselves know most about.

As for eloquence, you only have to look at two similar code sequences side by side.

Trivial things are:
AArch64 registers are called Xn (64-bit) and Wn (32-bit), which doesn't seem as nice as the Rn that AArch32 used.
In A64 you don't have to use '#' for immediates (as in "mov w5,42"), which does make it easier to scan.
To return from a function, you just use "ret".
Register 31 reads as zero, and it's called xzr or wzr, which is extremely useful.
The addressing modes are the same.
There is no ldm/stm (they are slow and don't handle interrupts well). Instead there are the very fast ldp and stp (load pair and store pair), so you can write, for example:
"stp xzr, xzr, [sp, -64]!" to write out 16 bytes of zeros in one instruction, or "ldp q0, q1, [x2],32" to load 32 bytes and advance the pointer.
Having 31 general purpose and 32 floating point registers means the stack doesn't need adjusting very often.
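
To see ldp and stp turn up in compiled code rather than hand-written assembly, something like the following C sketch can be compiled for AArch64 with `-O2 -S`. The type and function names are assumptions made for the example. Whether the compiler picks X or Q registers, or these exact instructions at all, depends on the compiler version and flags.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Two adjacent 64-bit fields: 16 bytes in total. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} pair64;

/* Zero both fields; an AArch64 compiler will often emit a single
   "stp xzr, xzr, [x0]" for the two stores. */
void clear_pair(pair64 *p)
{
    assert(p != NULL);
    p->lo = 0;
    p->hi = 0;
}

/* Copy a fixed 32 bytes; commonly lowered to one ldp plus one stp
   (using Q registers) or two of each (using X registers). */
void copy32(void *dst, const void *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, 32);
}
```

The zero register is what makes `clear_pair` cheap: no scratch register has to be loaded with 0 before the store.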