Wednesday, December 14, 2011

Okay. We're almost done with the big bits of scary assembler. Indeed, this post is almost totally assembler free, and will deal with some C functions and definitions we need for later.

We want to do something useful with what we have at the moment. We could simply implement "something useful" as a C language routine called c_entry(), which would run in SVC mode with interrupts off. In some cases, that would be sufficient. But it would hardly count as an OS, let alone a multitasking one.

So, let's look at what we want to do, and make some definitions. We want tasks that run in an unprivileged mode (i.e. user mode) and are either preemptively swapped out by the OS in order to run another task, or which periodically yield control of the processor to another task. They must be able to terminate. For the moment, we won't worry too much about protected memory spaces, IPC, or any of that jazz (which complicates matters, but which will come later).

A task must have its own inviolate set of registers, and its own stack. It must also have some other information - entry point, state, and potentially priority. My implementation is based on Scheme, so a task must also have an environment, but that's not absolutely necessary.

Obviously, we need to know what the current task is, and have a list of other tasks that might want to run. This is not identical to my code, as tasks are actually Scheme objects, but you get the idea.

/* And the bits we need for the actual lists */
task_t * __current_task;
task_list_t * __priority_lists[32]; /* one list per priority, 0 to 31 */

Now, the approach we'll be taking to multitasking is this:

Each task is created with a priority, and positioned as such in one of the priority lists. Every time we need to find a task, we go through the priority lists, starting at zero, and ending at 31. We look at each element of the list in turn by removing it from the head of the list and then grafting it onto the end of the list. This way we round-robin schedule within each priority. Only "runnable" tasks get scheduled, obviously. If the task is actually the placeholder "skip", we will skip onto the next lower priority. This way, all tasks eventually get a bite of CPU, with high priority tasks getting vastly more than the low priority ones.
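To make that scheduling pass concrete, here's a minimal C sketch of it. It is not the lambdapi code (where tasks are Scheme objects): the task_t layout, the TASK_SKIP state and the helper names list_append, list_rotate and pick_next_task are all made up for illustration, and I've assumed 32 lists so that priorities 0 to 31 each get a slot.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PRIORITIES 32            /* assumption: priorities 0..31 inclusive */

/* Hypothetical task states; TASK_SKIP marks the "skip" placeholder. */
typedef enum { TASK_RUNNABLE, TASK_BLOCKED, TASK_SKIP } task_state_t;

typedef struct task {
    const char   *name;
    task_state_t  state;
    struct task  *next;              /* next task in the same priority list */
} task_t;

typedef struct task_list {
    task_t *head;
    task_t *tail;
} task_list_t;

static task_list_t priority_lists[NUM_PRIORITIES];

/* Graft a task onto the end of a list. */
static void list_append(task_list_t *list, task_t *task)
{
    assert(list != NULL && task != NULL);
    task->next = NULL;
    if (list->tail != NULL)
        list->tail->next = task;
    else
        list->head = task;
    list->tail = task;
}

/* Remove the head of the list, graft it onto the end, and return it. */
static task_t *list_rotate(task_list_t *list)
{
    task_t *task = list->head;
    if (task == NULL)
        return NULL;
    list->head = task->next;
    if (list->head == NULL)
        list->tail = NULL;
    list_append(list, task);
    return task;
}

/* Walk the lists from priority 0 towards 31 and return the next runnable task. */
static task_t *pick_next_task(void)
{
    for (int prio = 0; prio < NUM_PRIORITIES; prio++) {
        task_list_t *list = &priority_lists[prio];
        task_t *first_blocked = NULL;

        for (;;) {
            task_t *task = list_rotate(list);
            if (task == NULL || task->state == TASK_SKIP)
                break;               /* empty list or placeholder: next priority */
            if (task->state == TASK_RUNNABLE)
                return task;
            if (first_blocked == NULL)
                first_blocked = task;
            else if (task == first_blocked)
                break;               /* been right round, nothing runnable here */
        }
    }
    return NULL;                     /* nothing runnable anywhere */
}
```

With tasks A and B plus a skip entry at priority 0, and a task C at priority 1, repeated calls hand back A, B, C, A, B, C: the priority 1 list only gets a look in each time the priority 0 list rotates round to its skip entry.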

Obviously, the initial setup of the lists is critical, and should be done in c_entry:

The astute amongst you will notice the use of the C library function malloc() in there, despite not having a C library. Don't worry about it. It'll come later. Do worry about me not checking for errors :)

Note that priority 31 has *no* 'skip' entry, and points to a real task. This task should not, under any circumstances, be missed out.
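For a rough idea of the shape of that setup, here's a hedged sketch in hosted C. It is an illustration only, not the real c_entry: make_list, init_priority_lists, idle_task and the field names are all hypothetical, it leans on the standard malloc(), and unlike my version it does check for allocation failures.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_PRIORITIES  32                      /* assumption: priorities 0..31 */
#define LOWEST_PRIORITY (NUM_PRIORITIES - 1)

typedef enum { TASK_RUNNABLE, TASK_SKIP } task_state_t;

typedef struct task {
    task_state_t  state;
    void        (*entry)(void);                 /* NULL for a skip placeholder */
    struct task  *next;
} task_t;

typedef struct task_list {
    task_t *head;
    task_t *tail;
} task_list_t;

static task_list_t *priority_lists[NUM_PRIORITIES];

/* Hypothetical task of last resort; real hardware would wait for an interrupt. */
static void idle_task(void)
{
}

/* Allocate a list holding a single entry. Returns NULL if malloc() fails. */
static task_list_t *make_list(task_state_t state, void (*entry)(void))
{
    assert(state == TASK_SKIP || entry != NULL); /* real tasks need an entry point */

    task_list_t *list = malloc(sizeof *list);
    task_t *task = malloc(sizeof *task);
    if (list == NULL || task == NULL) {
        free(list);
        free(task);
        return NULL;
    }
    task->state = state;
    task->entry = entry;
    task->next = NULL;
    list->head = list->tail = task;
    return list;
}

/* Lists 0..30 start with just a skip entry; list 31 holds a real task.
   Returns 0 on success, -1 if an allocation failed. */
static int init_priority_lists(void)
{
    for (int prio = 0; prio < NUM_PRIORITIES; prio++) {
        if (prio == LOWEST_PRIORITY)
            priority_lists[prio] = make_list(TASK_RUNNABLE, idle_task);
        else
            priority_lists[prio] = make_list(TASK_SKIP, NULL);
        if (priority_lists[prio] == NULL)
            return -1;
    }
    return 0;
}
```

The point of the real task at the bottom is that the scheduler always has something to fall back on when every other list yields nothing but its skip entry.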

Now, that's all fine and well, but what about setting up tasks and actually making them swap? Ah. That's a bit more complex, and we're gonna have to delve down into assembler again. I'll get into that next time round.

Until then, though, here's a couple of little functions we needed before.

Sunday, December 11, 2011

Now, we have the ARM booting, jumping to a reset handler, and dropping into an endless loop. That's a pretty good start. But really, we'd like to do something more - well - how to put this - "more".

In order to do this, we need to have a bit more understanding about how the ARM itself works.

If we go to the ARMv7AR Architecture Reference Manual (which can be had by registering at arm.com, or by downloading a hooky copy off the internets, either approach is feasible, and one at least of which is recommended), we see, in section B1 (the System Level Programmer's Model) a certain amount of interesting reading. Forget the "privilege" aspect for the moment, and let's skip ahead to section B1.3.

If we look back at the set of vectors we set up earlier, a lot of these "cross over". So when we drop into the IRQ vector, we will be in IRQ mode. FIQ, FIQ mode. Either of the aborts, Abort mode. Undefined instruction, undefined mode, and so on. What's interesting is how the machine registers are shared between modes, and particularly the fact that all but system/user modes have their own stack pointers.

Now, when the ARM starts up, it is in SVC mode. That's the way it is, and you can't change that. And when it starts up, no stack has been defined. So you need to be really damned careful in the first bits of the reset code.

Stacks on the ARM grow downwards, so the best thing to do generally is to put them at the top of memory. As such, a typical reset routine will start by finding out how much memory is available, then setting up stack pointers for each of the operating modes. We're nothing if not typical, so let's look at how we do that.

First thing - sizing memory. On the versatile baseboard as emulated by qemu, this is easy. We try writing to a bit of memory, then read back - if the value is set, there's memory there, if there's not, then we are above the top of memory. It's not quite so simple on the Pi, as trying to write outside of physical RAM will cause an exception. However, we're going to be a bit clever, and try to kill 2 birds with one stone.

Firstly, we need to set up some big fat global variables.

.global __memtop
__memtop: .word 0x00400000 /* Start checking memory from 4MB */
.global __system_ram
__system_ram: .word 0x00000000 /* System memory in MB */
.global __heap_start
__heap_start: .word __bss_end__ /* Start of the dynamic heap */
.global __heap_top
__heap_top: .word __bss_end__ /* Current end of dynamic heap */

__bss_end__ is set up by the linker, and it would be much better of me to use that for the initial value of __memtop (rounded up to the nearest megabyte) as well. But hey, I'm lazy. It'll come back to bite me later, I'm sure.
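For the record, the rounding itself is a one-liner. Here's a small C sketch of it; round_up_to_mb is a hypothetical helper and isn't in the lambdapi code.

```c
#include <assert.h>
#include <stdint.h>

#define MB_MASK 0x000FFFFFu          /* low 20 bits: the offset within a megabyte */

/* Round an address up to the next 1MB boundary (unchanged if already aligned). */
static uint32_t round_up_to_mb(uint32_t addr)
{
    return (addr + MB_MASK) & ~MB_MASK;
}
```

So an end-of-bss address of 0x00012345 would give a starting point of 0x00100000, and an address already sitting on a megabyte boundary is left alone.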

Now, as the Pi causes an exception on writes outside memory, we need to patch in a handler, temporarily. Here's the handler:

/* temporary data abort handler that sets r4 to zero */
/* this will force the "normal" check to work in the */
/* case (as, I believe, on RasPi) where access 'out */
/* of bounds' causes a page fault */
temp_abort_handler:
    mov r4, #0x00000000
    sub lr, lr, #0x04 /* lr is the faulting instruction + 8; resume at the one after it */
    movs pc, lr

Note how the comment indicates I'm not absolutely sure this will work. This is, frankly, because I'm not sure if this will work on a real Pi, and nobody wants to let me get my hands on one. Still, let's pretend, eh?

/* This tries to work out how much memory we have available */
/* Should work on both Pi and qemu targets */
FUNC _size_memory
    /* patch in temporary fault handler */
    ldr r5, =.Ldaha
    ldr r5, [r5]
    ldr r6, [r5]
    ldr r7, =temp_abort_handler
    str r7, [r5]
    DMB r12
    /* Try and work out how much memory we have */
    ldr r0, .Lmemtop
    ldr r1, .Lmem_page_size
    ldr r1, [r1]
    ldr r2, .Lsystem_ram
    ldr r3, [r0]
.Lmem_check:
    add r3, r3, #0x04
    str r3, [r3] /* Try and store a value above current __memtop */
    DMB r12 /* Data memory barrier, in case */
    ldr r4, [r3] /* Test if it stored */
    cmp r3, r4 /* Did it work? */
    bne .Lmem_done
    ldr r3, [r0]
    add r3, r3, r1 /* Add block size onto __memtop and try again */
    str r3, [r0]
    b .Lmem_check
.Lmem_done:
    ldr r3, [r0] /* get final memory size */
    lsr r3, #0x14 /* Get number of megabytes */
    str r3, [r2] /* And store it */
    /* unpatch handlers */
    str r6, [r5]
    DMB r12
    bx lr
.Lmemtop:
    .extern __memtop
    .word __memtop
.Lmem_page_size:
    .extern __mem_page_size
    .word __mem_page_size
.Lsystem_ram:
    .extern __system_ram
    .word __system_ram
.Ldaha:
    .extern data_abort_handler_address
    .word data_abort_handler_address

We see a few things here. Firstly, how to patch the handler in and out. Also, that I've got fed up with doing the whole .code 32; .global foo; foo: rigmarole and defined a macro called FUNC. We also see a macro called DMB, which implements the ARMv6 Data Memory Barrier (ARMv7 has a 'dmb' instruction to do that; ARMv6 doesn't). For what it's worth, these are the macros:

.macro FUNC name
    .text
    .code 32
    .global \name
\name:
.endm

/* Data memory barrier */
/* pass in a spare register */
.macro DMB reg
    mov \reg, #0
    mcr p15, 0, \reg, c7, c10, 5 /* Data memory barrier on ARMv6 */
.endm

So, we can hopefully now find out how much memory we have, with __memtop containing the actual top of memory and __system_ram containing the number of megabytes in case it's useful to know.

So let's look at the start of _reset...

.equ MODE_BITS, 0x1F /* Bit mask for mode bits in CPSR */
.equ USR_MODE, 0x10 /* User mode */
.equ FIQ_MODE, 0x11 /* Fast Interrupt Request mode */
.equ IRQ_MODE, 0x12 /* Interrupt Request mode */
.equ SVC_MODE, 0x13 /* Supervisor mode */
.equ ABT_MODE, 0x17 /* Abort mode */
.equ UND_MODE, 0x1B /* Undefined Instruction mode */
.equ SYS_MODE, 0x1F /* System mode */

FUNC _reset
    /* Do any hardware initialisation that absolutely must be done first */
    /* No stack set up at this point - be careful */
    ldr r0, =.Lsize_memory
    ldr r0, [r0]
    cmp r0, #0
    blxne r0
    /* Assume that at this point, __memtop and __system_ram are populated */
    /* Let's get on with initialising our stacks */
    mrs r0, cpsr /* Original PSR value */
    ldr r1, __memtop /* Top of memory */
    bic r0, r0, #MODE_BITS /* Clear the mode bits */
    orr r0, r0, #IRQ_MODE /* Set IRQ mode bits */
    msr cpsr_c, r0 /* Change the mode */
    mov sp, r1 /* End of IRQ_STACK */
    /* Subtract IRQ stack size */
    ldr r2, __irq_stack_size
    sub r1, r1, r2
    bic r0, r0, #MODE_BITS /* Clear the mode bits */
    orr r0, r0, #SYS_MODE /* Set SYS mode bits */
    msr cpsr_c, r0 /* Change the mode */
    mov sp, r1 /* End of SYS_STACK */
    /* Subtract SYS stack size */
    ldr r2, __sys_stack_size
    sub r1, r1, r2
    bic r0, r0, #MODE_BITS /* Clear the mode bits */
    orr r0, r0, #FIQ_MODE /* Set FIQ mode bits */
    msr cpsr_c, r0 /* Change the mode */
    mov sp, r1 /* End of FIQ_STACK */
    /* Subtract FIQ stack size */
    ldr r2, __fiq_stack_size
    sub r1, r1, r2
    bic r0, r0, #MODE_BITS /* Clear the mode bits */
    orr r0, r0, #SVC_MODE /* Set Supervisor mode bits */
    msr cpsr_c, r0 /* Change the mode */
    mov sp, r1 /* End of stack */
    /* And finally subtract Kernel stack size to get final __memtop */
    ldr r2, __svc_stack_size
    sub r1, r1, r2
    str r1, __memtop
    /*-- Leave core in SVC mode ! */
    /* Zero the memory in the .bss section. */
    mov a2, #0 /* Second arg: fill value */
    mov fp, a2 /* Null frame pointer */
    ldr a1, .Lbss_start /* First arg: start of memory block */
    ldr a3, .Lbss_end
    sub a3, a3, a1 /* Third arg: length of block */
    bl memset
    ldr r2, .Lc_entry /* Let C coder have at initialisation */
    mov lr, pc
    bx r2
    cpsie i /* enable irq */
    cpsie f /* and fiq */
    /* Initialisation done, sleep */
    ldr r2, .Lsleep
    mov lr, pc
    bx r2
.Lbss_start: .word __bss_start__
.Lbss_end: .word __bss_end__
.Lc_entry: .word c_entry
.Lsleep: .word sys_sleep

Note the use of msr cpsr_c, rx - this is how we change mode. We can change mode this way from any mode except user mode. Luckily, the user mode stack pointer is shared with system mode, so we don't need to drop into user mode at all. So we go off, find how much memory we have, then for certain of the operating modes, we set up a stack pointer. We then use a pre-written implementation of memset() to zero out the bss section, let the C code have a go at initialising its stuff via c_entry(), turn on interrupts, and go to sleep via sys_sleep().
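If the bit-twiddling and the stack arithmetic are easier to follow in C, here's a small model of both. It only mirrors the sums the reset code does: cpsr_with_mode, stack_layout_t and carve_stacks are hypothetical names, and the stack sizes used with it are made-up example values, not the ones lambdapi uses.

```c
#include <assert.h>
#include <stdint.h>

#define MODE_BITS 0x1Fu              /* same values as the .equ definitions */
#define FIQ_MODE  0x11u
#define IRQ_MODE  0x12u
#define SVC_MODE  0x13u
#define SYS_MODE  0x1Fu

/* What the bic/orr pair computes before each msr cpsr_c. */
static uint32_t cpsr_with_mode(uint32_t cpsr, uint32_t mode)
{
    assert((mode & ~MODE_BITS) == 0);
    return (cpsr & ~MODE_BITS) | mode;
}

typedef struct {
    uint32_t irq_sp;                 /* initial stack pointer for each mode */
    uint32_t sys_sp;                 /* shared by system and user mode */
    uint32_t fiq_sp;
    uint32_t svc_sp;
    uint32_t memtop;                 /* what is left below the stacks */
} stack_layout_t;

/* Stacks grow downwards, so each one starts where the previous one ends. */
static stack_layout_t carve_stacks(uint32_t memtop, uint32_t irq_size,
                                   uint32_t sys_size, uint32_t fiq_size,
                                   uint32_t svc_size)
{
    stack_layout_t layout;

    layout.irq_sp = memtop;  memtop -= irq_size;
    layout.sys_sp = memtop;  memtop -= sys_size;
    layout.fiq_sp = memtop;  memtop -= fiq_size;
    layout.svc_sp = memtop;  memtop -= svc_size;
    layout.memtop = memtop;
    return layout;
}
```

With 256MB of memory (top at 0x10000000) and example sizes of 4KB, 16KB, 4KB and 16KB, the stack pointers come out at 0x10000000, 0x0FFFF000, 0x0FFFB000 and 0x0FFFA000, leaving __memtop at 0x0FFF6000.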

Wednesday, December 7, 2011

My Scheme OS for the Raspberry Pi SBC is coming along nicely (code at https://gitorious.org/lambdapi), and a couple of comments on the Raspberry Pi forum kinda kicked me into actually documenting some of the process of what I've been doing. So, here goes.

Firstly, the toolset.

The first thing we'll be needing is development tools. Yeah, there's "off the peg" toolsets available, but I wanted to be up at the bleeding edge. So, off to GNU's site, and let's get cracking.

I built and installed the latest versions of binutils (which includes the assembler and linker), gcc, g++, newlib and gdb, all for target arm-none-eabi. If you want to know how to do this, googling "arm bare metal" should elucidate. Otherwise, there's always codesourcery.

Now, booting. Obviously, the first thing we need to do is boot the board. In my case, it's very uncomplicated. No first-stage booters, no relocating stuff from flash, just a bunch of RAM that your binary gets loaded into, starting at address 0x00000000. Easy peasy.

So. How does ARM (specifically, the ARM1176JZF-S processor on the Raspberry Pi) boot? Well, there's chapter and verse on the ARM site, but here's the TL;DR version.

Simples, right? Well, not quite. Address 0x00000000 is the start of what's known as the exception vector table, which contains 4 bytes (one word) for each of its 8 entries (seven exceptions and one reserved slot). One word is enough to store a single instruction: either a branch to the handler, or an instruction to move an address from memory into the program counter. So the simplest vector table would look like this:

The "fast interrupt" code gets to miss an indirection, so it's faster. We simply start the interrupt handler directly at the end of the vector table. I'm not actually doing this at the moment, but it's possible.

The other exceptions load their address from an indirection table, so we can repatch them on the fly.

We have a "generic" handler for unhandled exceptions. The way that gets patched in is to do with the linker. A .weak directive for a symbol will allow us to simply not define a symbol in our code, and the linker will replace it with zero instead of barfing. The .set directive enables us to use a different default to zero. Thus, any of the _undef, _prefetch_abort or _data_abort entry points (in the code above) will redirect to _no_handler unless we define those entry points elsewhere. This is a trick we'll use again later. Note _reset, _swi and _irq have no defaults, and thus must be defined elsewhere (I've defined them to simply jump to _no_handler for the moment).
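The indirection table plus default handler can be modelled in C with plain function pointers. This is only an analogy for the assembler: the table, the VEC_* indices and the function names below are hypothetical, and defaulting the table entries to no_handler stands in for what .weak and .set do at link time.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*handler_fn)(void);

/* Hypothetical indices for the exceptions that go through the table. */
enum { VEC_UNDEF, VEC_PREFETCH_ABORT, VEC_DATA_ABORT, VEC_COUNT };

static int unhandled_count;          /* bumped by the generic handler */
static int data_abort_count;         /* bumped by the replacement handler */

static void no_handler(void)    { unhandled_count++; }
static void my_data_abort(void) { data_abort_count++; }

/* Every entry defaults to the generic handler until somebody repatches it. */
static handler_fn handler_table[VEC_COUNT] = {
    no_handler, no_handler, no_handler
};

/* Install a handler and return the old one so it can be put back later. */
static handler_fn patch_handler(int vec, handler_fn fn)
{
    assert(vec >= 0 && vec < VEC_COUNT && fn != NULL);
    handler_fn old = handler_table[vec];
    handler_table[vec] = fn;
    return old;
}

/* Stand-in for the CPU taking an exception: jump through the table. */
static void raise_exception(int vec)
{
    assert(vec >= 0 && vec < VEC_COUNT);
    handler_table[vec]();
}
```

Keeping hold of the value patch_handler returns is what makes it possible to install a temporary handler and then restore the original afterwards.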

All we need to do is assemble that and link it to load at 0x00000000, and we have a booter. It will do bugger all, but it will work.