Orange Pi

It is a mess. I won’t pretend it’s not. I’m feeling my way with learning the internals of RO, the build system, writing a HAL, and making the whole mess work with gcc.

Your code is “interesting”, to say the least.

I know you’ve said before that you’ve been using C since the 90’s, but judging by the large number of comments throughout the source I get the feeling that you’re not yet fully comfortable with it. Likewise with the assembler sources (although that’s a bit more forgivable if you’re not familiar with system-level ARM assembler, or interworking C and ARM).

There are some CPU registers which are “banked” so that they map to different physical registers in different CPU modes. In most modes this is R13, R14, and the SPSR. So if u-boot isn’t calling you in SVC mode, your SP setup is useless, because it will only be affecting whatever mode you were entered in.

r3 / sigbits is a mask indicating which bits of the start/end addresses are significant – e.g. if you only had a 16bit address bus you’d set it to &ffff to indicate that only the lower 16 bits of the addresses matter. But all modern machines will have 32bit (or wider) address busses, so just set it to -1.
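To make the sigbits idea a bit more concrete, here is a small sketch in C. The helper name `same_ram_address` is made up for illustration and isn't part of any HAL API; it just compares two addresses under the mask.

```c
#include <stdint.h>
#include <stdbool.h>

/* All 32 address bits significant: the "-1" suggested above. */
#define SIGBITS_ALL 0xFFFFFFFFu

/* Hypothetical helper: do two addresses refer to the same location once
   the non-significant bits have been masked off? */
bool same_ram_address(uint32_t a, uint32_t b, uint32_t sigbits)
{
    return ((a ^ b) & sigbits) == 0;
}
```

With a 16-bit mask (&ffff) two addresses that differ only in their top halves compare equal; with SIGBITS_ALL they don't.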

Under RISC OS, C programs typically use APCS (“ARM Procedure Call Standard”) as their ABI. (Other ARM-based OS’s generally use an APCS derivative as well, but RISC OS sticks to the older versions of the standard). APCS dictates that r0-r3 are used for function inputs; if a function requires more inputs than that, then the caller must push the extra inputs onto the stack. So that’s the explanation for why it’s only using r0-r3.

Similarly, APCS dictates that “small” results are returned in the registers r0-r3. So storing the result to the stack isn’t necessary (and may even confuse the C code, depending on how the compiler treats the stacked parameters). The reason the Pi HAL stores the result onto the stack is because it is in control of how that stacked parameter is handled (if it wanted to call OS_AddRAM again, it could take advantage of the fact that the updated ref value is already stacked at the correct location for the OS call).
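As a rough illustration of the calling convention, here is a hypothetical five-argument C function. The name `add_ram_stub` and its behaviour are invented for the example; only the register/stack split described in the comments comes from APCS.

```c
#include <stdint.h>

/* Hypothetical five-argument function. Under APCS the caller passes
   flags, start, end and sigbits in r0-r3 and pushes ref onto the stack.
   A pointer-sized result comes back in r0, not via the stack. */
uintptr_t add_ram_stub(uint32_t flags, uintptr_t start, uintptr_t end,
                       uint32_t sigbits, uintptr_t ref)
{
    (void)flags;    /* unused in this stand-in */
    (void)sigbits;  /* unused in this stand-in */

    /* Stand-in behaviour: hand back ref advanced by the block size. */
    return ref + (end - start);
}
```

From the C side none of this is visible: you just call the function with five arguments and use the return value. It only matters when the other end of the call is hand-written assembler.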

Using CallOSM will result in an infinite loop, because you’re not preserving LR. CallOSM will set LR so that the OS_AddRAM call will return to just after CallOSM, so then when RISCOS_AddRAM does mov pc, lr it will end up jumping back to the STR that’s after CallOSM. So you’ll either want to save and restore LR within RISCOS_AddRAM (which will then also require you to copy the stacked parameter down as well), or use a “tailcall” version of CallOSM which doesn’t modify LR (so CallOSM will return directly to your C code) – obviously that only works for situations where tailcall optimisation is possible! (or get a working C version of CallOSM)

“I don’t understand this. Is it putting sp -4 in r1?”. With LDR/STR, the address is provided by the expression in square brackets. With LDM/STM the address is provided by the first operand (it is a bit unfortunate that there are two different schemes used, but that’s life). Both sets of instructions have a number of different addressing modes, which adds to the complexity/power – Rick has a handy reference for the LDR/STR addressing modes.

It’s possible the compiler is using r11 for its own purposes (I can’t see any obvious flag/pragma which is protecting it). So your assembler routines might be losing track of where the OS header is. (With Norcroft this is handled by declaring a special __global_reg variable, I’m not sure what the GCC equivalent is)

I’m sure I could find more things to comment on, but I should probably get back to work!

That’s what I was thinking of, yes. Don’t tell me I’ve forgotten something, or that they have decided to re-use it since I left? Surely not…

Not all that tiny, I suppose, but not exactly enormous. Very tangled to decode – okay if you make it all off limits and then use odd bits of it for odd things that each use small chunks of instruction set space I suppose.

…the fact that loads of people still use it for stacking/unstacking single registers?

“LDM or STM of single register is probably slower than LDR or STR” makes 178 appearances in a build of RISC OS, which is pretty bloody good considering there are 156 instances of this when building Twin (which is tiny in comparison).

I’m guilty too. There are times when I STM R14 because… laziness. STM is cuddly and friendly while STR is complicated, what with the #-4 offset and all. ;-)

Very tangled to decode

Down a road that looks like that is the lurking ghost of x86.

Another fun bit…STM (and LDM) with no registers at all…

Well, in theory one could use this to provide a completely new instruction.

I know you’ve said before that you’ve been using C since the 90’s, but judging by the large number of comments throughout the source I get the feeling that you’re not yet fully comfortable with it.

To be fair, it can depend on what you’re doing with it; I’ve been using it for years for Windows app development but had never needed to touch the << operator until I wrote my “Uptime” RISC OS module. A lot of the “low level” stuff is still a mystery to me…

I’ve been using it for years for Windows app development but had never needed to touch the << operator until I wrote my “Uptime” RISC OS module.

To be fair, this is one of the things that really irritated me about VisualBasic: the lack of a simple binary shift. Having to multiply or divide an integer by powers of two is just so horrible, given that one can easily imagine the sort of chundering going on inside VB in order to perform the * or / when a shift is, what, a single instruction on pretty much every processor on the planet?
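For anyone who hasn't met the shift operators, a minimal sketch in C. The function names are just for illustration, and it sticks to unsigned values with n below 32, since shifting by the full width or shifting negative numbers gets murkier.

```c
#include <stdint.h>

/* Multiply an unsigned value by 2^n with a left shift (n must be < 32). */
uint32_t times_pow2(uint32_t value, unsigned n)
{
    return value << n;
}

/* Divide an unsigned value by 2^n with a right shift, rounding down
   (n must be < 32). */
uint32_t div_pow2(uint32_t value, unsigned n)
{
    return value >> n;
}
```

So `times_pow2(5, 3)` is 5 * 8 and `div_pow2(41, 3)` is 41 / 8 with the remainder thrown away.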

A lot of the “low level” stuff is still a mystery to me…

That’s not a surprise given the sorts of fruity stuff you can do in C, like:

return *(void *)&this->that.something;

I made that up. It’s probably wrong. But I’ve seen worse. Then there’s the “let’s make functions look like an array so we can call them programmatically” thing. I’m sure very useful for emulators and the like, but it leads to the sort of code that makes spaghetti GOSUB of the ’80s look tame in comparison.
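The "functions as an array" trick looks roughly like this. It's a toy sketch with invented names (`op_table`, `dispatch`), in the style an emulator might use to pick a handler from an opcode number.

```c
#include <stdint.h>

/* Every handler shares one signature so they can live in one array. */
typedef int32_t (*op_fn)(int32_t, int32_t);

static int32_t op_add(int32_t a, int32_t b) { return a + b; }
static int32_t op_sub(int32_t a, int32_t b) { return a - b; }
static int32_t op_and(int32_t a, int32_t b) { return a & b; }

/* The "array of functions": the index is the opcode. */
static const op_fn op_table[] = { op_add, op_sub, op_and };

/* Call the handler for opcode; unknown opcodes return 0 rather than
   indexing off the end of the table. */
int32_t dispatch(unsigned opcode, int32_t a, int32_t b)
{
    if (opcode >= sizeof op_table / sizeof op_table[0])
        return 0;
    return op_table[opcode](a, b);
}
```

The bounds check is the bit that tends to go missing in the spaghetti versions.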

I’ll be digesting all this info properly when I have a chance. Can’t really do anything useful until next week.

I did something a couple of days ago that had the results I expected unfortunately. Because of USB serial issues with RO, I set up a serial to TCP/IP WiFi bridge using an esp8266. It didn’t work well. I’m amazed it worked at all.
It’s complex, but comes down to a 3 way logic level battle. Or it should. Weird thing was it was dropping data on TX. That is OPiPC → esp8266 → WiFi → terminal. Any bursts only had the beginning transmitted. The rest would disappear into the ether. So I’d only receive the beginning of a directory listing then nothing, for example.

A fun thing that I did notice though is if I used xterm-colour in Nettle, it could correctly display the extended character attributes like colours from the OPiPC. I had suspected it could, but this proved it. Because it was just a bridge it lacked Telnet extensions.

One of the reasons I tried this was the Serial USB drivers for RO end up with an irritating local echo happening so 2x every character was printed.

In the end it was all pointless anyway. I mean I can load binaries from ext4. It takes a few steps, including copying over the network. I would have preferred to use X/YMODEM protocols, but I just can’t seem to get Hearsay or Connector to behave. I was hoping to check for data corruption after doing a transfer with one of them instead.

e: Jeffrey, you can say my code is horrible if you want. I know it is. It’s a mess. I know. I don’t even like looking at it.

the fact that loads of people still use it for stacking/unstacking single registers?

Now this is where my background shows. I wasn’t aware of that at all.

STM is cuddly and friendly while STR is complicated

Of course a clever assembler could automagically assemble any instance of LDM/STM single register as the corresponding LDR/STR. And since the instructions are the same length (!) automagically replacing them in existing code wouldn’t be that difficult either.
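As a sketch of how mechanical that replacement is, here is some C that rewrites one common case: a 32-bit ARM `STMDB Rn!, {Rd}` (e.g. `STMFD R13!, {R14}`) into the equivalent `STR Rd, [Rn, #-4]!`. The function name is invented, it only handles this one addressing mode, and it leaves anything it isn't sure about untouched.

```c
#include <stdint.h>

/* Rewrite STMDB Rn!, {Rd} as STR Rd, [Rn, #-4]!.
   Returns the instruction unchanged if it is anything else. */
uint32_t stm_single_to_str(uint32_t insn)
{
    /* Bits 27..20 == 0x92 means STM, decrement-before, writeback, no S bit.
       The condition field (bits 31..28) is ignored and carried over. */
    if ((insn & 0x0FF00000u) != 0x09200000u)
        return insn;

    /* Exactly one register in the list? (non-zero, single bit set) */
    uint32_t list = insn & 0xFFFFu;
    if (list == 0 || (list & (list - 1u)) != 0)
        return insn;

    /* Find which register it is. */
    uint32_t rd = 0;
    while ((list & 1u) == 0) {
        list >>= 1;
        rd++;
    }

    /* Leave the awkward cases alone: storing the base itself, or PC. */
    uint32_t rn = (insn >> 16) & 0xFu;
    if (rd == rn || rd == 15u)
        return insn;

    /* Keep condition and Rn, swap in the STR pre-indexed/writeback opcode,
       the register number, and an immediate offset of 4 (subtracted). */
    return (insn & 0xF00F0000u) | 0x05200000u | (rd << 12) | 4u;
}
```

For example, &E92D4000 (`STMFD R13!, {R14}`) comes out as &E52DE004 (`STR R14, [R13, #-4]!`). The matching LDMIA to LDR case would follow the same pattern with different opcode bits.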

Down a road that looks like that is the lurking ghost of x86.

There’s a fair few examples of that already in ARMv7. Okay, not so extreme – hence my comment about “tangled”. Some day I’ll take a look at v8. Or maybe not. I find I’m not missing any of the apps I wrote with chunks of assembler in them (yet?) – all the ones I’m still using are BASIC only. Don’t know if I’ll ever write Assembler again.

A lot’s happened recently. I’m also sick. When I’m better and get my brain back I’m going to tackle this some more. With the new useful information and time to think, I believe some of it may have actually penetrated my thick skull.

The problem with AllWinner SBCs has been that some hardware acceleration has only been possible by using linux with a legacy Android kernel which had some nasty bugs and issues. The Mali support for mainline linux kernels comes in the form of a blob still, but at least it’s not a mess that nobody was willing to touch like it has been.

I have absolutely no idea if the blob can be made to work for other OSes, but it’s still interesting.

e: This is for OpenGLES btw, not for the video codecs. Not sure what the status on that is, because tbqh I find the OPiPC better for watching video than the RPi3.

Edit:
Sometimes a break is good. Fresh eyes on forgotten code. Comments are good. Beyond what has been spotted by Jeffrey, I’m finding some really silly mistakes including but not limited to the UART code. I’m borrowing a version of the blink example code for now although it is at odds with what RISC OS wants, i.e. instead of failing silently it’s blocking.

I’m thinking I can borrow a few GPIO pins for a simple output for debugging if I can’t catch the issue via UART, assuming it wasn’t actually the UART code. I’m hoping it’s failing where Jeffrey pointed out and not before.

Jeffrey, I know the AddRAM bit I did is garbage by the way. I fully expected the code to crash before there. I did run into the issue of having no idea what areas could be considered allocatable too.

The USBSerial blockdriver has sped things up immeasurably. I don’t have much time to work on this but I’ve already found a few failure points.
Jeffrey, it appears that R11 is one of the registers that is saved in C’s stack frame. It then immediately reuses r11. I’m not sure yet if it restores r11 before it needs to be used. I’m just slowly shuffling through the disassembly, matching it to the source to work out how things end up at the point of the register dump I just grabbed.

There are some anomalies. Most notably the PC is in a HAL function stub which never gets called. According to the PC the undefined instruction is a very valid looking MOV. So this is an odd one. My bet is on linker issues again.

e: Yep.
I rearranged a few things, fixed a few things I wasn’t happy with and rebuilt. No matter what, the C code is branching to weird landing points. It seems to be missing its target symbol. Weird.

There are some anomalies. Most notably the PC is in a HAL function stub which never gets called. According to the PC the undefined instruction is a very valid looking MOV. So this is an odd one. My bet is on linker issues again.

I’d check the PSR as well; it’s possible the instruction is undefined because it’s dropped into Thumb mode.
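If you want to decode the PSR value from a register dump by hand, the relevant bits are easy to pick out. A small sketch in C; the helper names are made up, the bit positions are the standard ARM ones (T is bit 5, the mode is the bottom five bits).

```c
#include <stdint.h>
#include <stdbool.h>

#define PSR_T_BIT     (1u << 5)  /* set when the CPU was executing Thumb */
#define PSR_MODE_MASK 0x1Fu      /* bottom five bits hold the CPU mode */
#define PSR_MODE_SVC  0x13u      /* supervisor mode */
#define PSR_MODE_UND  0x1Bu      /* undefined instruction mode */

/* True if the saved PSR says the faulting code was running as Thumb. */
bool psr_is_thumb(uint32_t psr)
{
    return (psr & PSR_T_BIT) != 0;
}

/* Extract the CPU mode field from a PSR value. */
uint32_t psr_mode(uint32_t psr)
{
    return psr & PSR_MODE_MASK;
}
```

A PSR ending in &33 would be SVC mode with the T bit set, which would explain a perfectly valid-looking ARM MOV being reported as undefined.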

It looks like GCC does have an equivalent of Norcroft’s __global_reg attribute, so I’d recommend adding the following to a common header which will be included by everything:
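A minimal sketch of what such a declaration could look like using GCC's global register variable syntax. It assumes r11 (v8) is the register holding the value and that RO_Base can be treated as a plain address; the type and the non-ARM fallback are assumptions added purely so the header also compiles on a host machine.

```c
#include <stdint.h>

#if defined(__GNUC__) && defined(__arm__)
/* Ask GCC to keep r11 reserved for RO_Base in every file that includes
   this header, so compiled C code should not reuse it as scratch space.
   (The uintptr_t type is an assumption for this sketch.) */
register uintptr_t RO_Base __asm__("r11");
#else
/* Host-side fallback for experimenting off-target: an ordinary variable,
   one private copy per source file. Not meaningful for the real HAL. */
static uintptr_t RO_Base;
#endif
```

The declaration has to be seen by every C file, otherwise files that don't see it are free to clobber the register.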

However that may end up generating an error due to it conflicting with one of the APCS registers (I think v8 is typically the APCS frame pointer?). Possibly you’ll be able to fix this with -fomit-frame-pointer. Failing that, try changing the code so that RO_Base is stored in v6.

Ultimately it doesn’t really matter what register RO_Base is stored in, since the HAL can always locate the OS image directly. But when the MMU is enabled and you need to keep track of the HAL workspace pointer (which will be in v6) it’s going to be important to have a tried-and-tested way of preserving the register (otherwise you’d need stubs for almost every call into C, to allow the workspace pointer to be supplied as a function argument).

(Note also that using v6 for both RO_Base and the HAL workspace pointer should be fine, since one is only used pre-MMU and the other is only used post-MMU)

It didn’t drop to thumb. Only reason I didn’t include the reg dump was I’m typing this on the PC.

I think before any of these issues are tackled, the big, big issue of the landing points of branches missing their target at build time needs to be tackled.

Not sure if I mentioned but I have a second build setup. It’s the same tree as my C one except the HAL is a different directory. It’s kind of my testbed.
After the feeling of crushing defeat with the linker sunk in I swept away the old files in the assembly HAL to a safe place and started a “proper” one again. It already has usable elements like the H3 Hdr which I cleaned up a little. But I started from the “Top” with a different approach today.
While I was at it I got gcc to spit out gas assembly files for the C I’ve done so I can recycle some bits and pieces. Porting to ObjAsm and back is pretty easy thankfully, as long as macros aren’t involved.

I just want some confirmation on a question I asked at an earlier point.
Does the contents of the HAL affect the contents of the main ROM in any way during build? I hope the answer is “No”.

An idea came to me. I can build a complete ROM. The ROM is large. The HAL which isn’t very capable yet needs RO to make it more than a few instructions in.
Copying the entire ROM over XMODEM every time I wanted to test would be tedious. So what I’m thinking is I build a ROM and zero out the first 96(?)KB of the image then dump that on to the SD card. Then I could just load that then dump the HAL via XMODEM over it at the same base address. I mean I could remove the beginning of the RO ROM and have a different offset but this just seems easier. Does it sound feasible?
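The blanking step could be as simple as the following sketch in C, working on a ROM image already loaded into memory. The function name is invented, and the 96KB constant is only the guess from the post above, so check it against the real HAL size before relying on it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed size of the HAL at the start of the image (the "96(?)KB"). */
#define HAL_REGION_SIZE (96u * 1024u)

/* Zero the first hal_size bytes of the image, clamped to the image
   length. Returns the number of bytes actually cleared. */
size_t blank_hal_region(uint8_t *image, size_t image_len, size_t hal_size)
{
    size_t n = hal_size < image_len ? hal_size : image_len;
    memset(image, 0, n);
    return n;
}
```

Calling `blank_hal_region(rom, rom_len, HAL_REGION_SIZE)` before writing the image to the SD card leaves the rest of the ROM untouched, so the HAL sent over XMODEM later only has to land at the same base address.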

While I’m here. For posterity, the machid for the Orange Pi PC is 0x1029.
I haven’t been able to find the machid anywhere. I spotted it this time in the currently installed version of UBoot while it was loading the Linux image. This is a number I don’t want to lose. Last time I checked there was no entry in the machid database.

Does the contents of the HAL affect the contents of the main ROM in any way during build?

No.

So what I’m thinking is I build a ROM and zero out the first 96(?)KB of the image then dump that on to the SD card. Then I could just load that then dump the HAL via XMODEM over it at the same base address.

Yes, that should work fine.

While I’m here. For posterity, the machid for the Orange Pi PC is 0x1029.
I haven’t been able to find the machid anywhere. I spotted it this time in the currently installed version of UBoot while it was loading the Linux image. This is a number I don’t want to lose. Last time I checked there was no entry in the machid database.