RISC OS and the 26-bit question

What's the problem? (a brief history)

RISC OS was originally developed on the ARM 2 processor, whose program counter
also doubled as the processor status, as shown in the table below:

Bit       31   30   29   28   27   26   25..2                                    1..0
Contents  N    Z    C    V    I    F    24-bit program counter (word aligned)    Mode

With this arrangement, it was possible to have a 26-bit address space,
capable of accessing up to 64MBytes of memory. At the time, this was considered
a lot, but with the advent of the ARM6, memory had become a much cheaper
commodity, and so a 32-bit mode was introduced, with a separate register to
store the processor flags in. Existing applications could still run because the
processor could be put into a special 26-bit mode, where the program counter
reflected the processor status bits, as in the ARM 2.
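
The combined register can be illustrated with a short sketch. It is written in Python purely for readability, and the function names split_r15 and join_r15 are made up for this illustration; only the bit positions come from the table above.

```python
FLAG_NAMES = ("N", "Z", "C", "V", "I", "F")   # bits 31 down to 26

ADDRESS_MASK = 0x03FFFFFC   # bits 25..2: word-aligned byte address
MODE_MASK = 0x00000003      # bits 1..0: processor mode


def split_r15(r15):
    """Split a 26-bit-mode R15 value into (flags, address, mode)."""
    r15 &= 0xFFFFFFFF
    flags = {name: (r15 >> (31 - i)) & 1 for i, name in enumerate(FLAG_NAMES)}
    return flags, r15 & ADDRESS_MASK, r15 & MODE_MASK


def join_r15(flags, address, mode):
    """Rebuild the combined R15 value from its three parts."""
    r15 = (address & ADDRESS_MASK) | (mode & MODE_MASK)
    for i, name in enumerate(FLAG_NAMES):
        r15 |= (flags.get(name, 0) & 1) << (31 - i)
    return r15


# 24 address bits plus 2 implied zero bits give 2**26 bytes, i.e. 64 MBytes.
ADDRESS_SPACE_BYTES = 1 << 26
```

For instance, split_r15(0x80008003) yields the N flag set, address 0x8000 and mode 3, and join_r15 puts the same value back together.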

Since RISC OS uses the 26-bit program counter, applications also use this
mode of operation, as RISC OS does not support any kind of mode switching. Only
the FIQ vector operates in 32-bit mode.

As 32-bit mode is generally more desirable (you get up to 4GBytes of memory
space), future processors will not have the 26-bit mode as an option. This means
that RISC OS, as it stands, will not work on those processors.

What are the solutions?

There are five possible solutions:

Use another OS: This would mean that the current applications would not run on
the new OS, and development for RISC OS would just dwindle.

Rewrite RISC OS: This would be a mammoth job - and the existing applications
would have to be rewritten (or possibly just recompiled).

Create some code convertor: This would convert 26-bit applications into 32-bit
applications. There are some problems with this, as discussed below.

Emulate 26-bit operation: This would be slow, but as processors get faster,
the emulation speed would increase.

Have two processors: There would be one fast processor which would perform
32-bit mode operations, and one slower processor that could do some 32-bit
mode operations, but also 26-bit mode operations. This would increase the cost
of machines, and timing would have to be carefully judged, but it is not
impossible. This is not discussed here, but can be later on...

A code convertor

This would be the neatest solution - simply pop your existing code into a
'magic' convertor, and out pops a 32-bit application. However, such a task is
not easy, and may indeed not be possible.

Consider the ARM instruction MOVS pc,lr. This is encoded as the hexadecimal
word 0xE1B0F00E. If an automatic convertor saw this instruction, it would
replace it with whatever 32-bit code performs the same function as the
instruction did when running under 26-bit (a possible method is discussed
below).

Now, if you had a block of raw data, and one of the words in the raw data
just so happened to be 0xE1B0F00E, then that instruction would also
be changed.
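
The problem can be demonstrated with a naive scanner. This is only a sketch in Python: the function name find_candidates and the sample image are invented for the illustration, and the image is assumed to be stored little-endian.

```python
import struct

MOVS_PC_LR = 0xE1B0F00E   # encoding of MOVS pc,lr


def find_candidates(image):
    """Return the byte offset of every aligned word that equals MOVS pc,lr."""
    count = len(image) // 4
    words = struct.unpack("<%dI" % count, image[:count * 4])
    return [i * 4 for i, word in enumerate(words) if word == MOVS_PC_LR]


# One genuine instruction, followed by a three-word data table that happens
# to contain the same bit pattern.
code = struct.pack("<I", MOVS_PC_LR)
data = struct.pack("<3I", 0x00000080, 0xE1B0F00E, 0x00000000)
image = code + data

hits = find_candidates(image)   # offsets 0 (real code) and 8 (data)
```

The scanner reports both offsets, and nothing in the bit pattern itself says which of the two is really an instruction.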

There is the possibility of checking around that instruction, to see if it
looks like it is code, but even that can fail:

In this example, the convertor may correctly identify the
LDMFD r13!,{pc}^, but if the debug variable was not
one, then the MOVS pc,lr would be surrounded by words which 'look'
more like data than ARM instructions. If you were to disassemble them, they
would become:

ANDEQ r0,r0,r0,LSL#1
MOVS pc,lr
ANDEQ r0,r0,r0

You could, of course, have part of the convertor check where program branches
jump to. However, that check would fail if the following occurs:

Here, the automatic convertor wouldn't know what to do with it, unless
it had some form of built-in emulator to work out what is happening. This may
seem like a contrived example, but there could be many more complicated ways of
doing the same thing.

The emulation option

In theory, any computer can emulate any other computer - it may not be as
fast, but it'll still emulate it.

This solution is to emulate 26-bit mode from within 32-bit mode. There are
two ways of doing this: one is a standard emulation, the other is what I'm
calling the "Code Lookahead Optimal Emulator" (or CLOE), which is discussed
later.

In standard emulation, the emulator would 'pretend' to have 15 registers,
and one program counter. Some of these may have a direct relationship with the
actual registers (i.e. they would not be virtual registers), but others would
be virtual registers - the program counter being one of those.

For each instruction, the emulator would work out what the instruction did,
and would perform it on its set of registers. This would be quite slow
(achieving a 20:1 ratio would count as good performance), but the 26-bit
programs would work. When a SWI is called, control is passed from the emulator
to the OS, and when the SWI returns, the emulator resumes. However, parts of
the OS may still be 26-bit and hence use the emulator...
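
The fetch, decode and execute cycle described above might look something like the following. It is a deliberately tiny sketch in Python, not a design: the class name ToyEmulator is made up, memory is assumed to be a dictionary of instruction words, only MOV, ADD and SUB with an immediate operand and the 'always' condition are handled, flags are never updated, and a SWI simply stops the loop where a real emulator would pass control to the OS.

```python
def rotate_right(value, amount):
    """Rotate a 32-bit value right by the given number of bits."""
    amount &= 31
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


class ToyEmulator:
    """Hypothetical interpreter for a very small subset of ARM instructions."""

    def __init__(self, memory, start):
        self.memory = memory   # dict: byte address -> 32-bit instruction word
        self.regs = [0] * 15   # virtual R0..R14
        self.pc = start        # virtual program counter

    def step(self):
        """Emulate one instruction; return False once a SWI is reached."""
        word = self.memory[self.pc]
        self.pc += 4
        if (word >> 24) & 0xF == 0xF:
            return False       # SWI: a real emulator would call the OS here
        rn = (word >> 16) & 0xF
        rd = (word >> 12) & 0xF
        supported = (word >> 28 == 0xE and (word >> 25) & 0x7 == 0b001
                     and rn != 15 and rd != 15)
        if not supported:
            raise NotImplementedError(hex(word))
        # Immediate operand: 8-bit value rotated right by twice the 4-bit field.
        operand = rotate_right(word & 0xFF, 2 * ((word >> 8) & 0xF))
        opcode = (word >> 21) & 0xF
        if opcode == 0b1101:       # MOV
            result = operand
        elif opcode == 0b0100:     # ADD
            result = self.regs[rn] + operand
        elif opcode == 0b0010:     # SUB
            result = self.regs[rn] - operand
        else:
            raise NotImplementedError(hex(word))
        self.regs[rd] = result & 0xFFFFFFFF
        return True

    def run(self):
        while self.step():
            pass


program = {
    0x8000: 0xE3A00005,   # MOV r0,#5
    0x8004: 0xE2800003,   # ADD r0,r0,#3
    0x8008: 0xE2401001,   # SUB r1,r0,#1
    0x800C: 0xEF000011,   # SWI OS_Exit
}
emu = ToyEmulator(program, 0x8000)
emu.run()
```

After the run, the virtual R0 holds 8 and R1 holds 7. Even in this toy, every emulated instruction costs many host operations, which is where the slowdown comes from.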

Code Lookahead Optimal Emulator

Since only a small subset of the ARM instruction set needs to be emulated, it
would be preferable if the code is emulated only where it needs to be
emulated, and run naturally where it does not. In order to describe CLOE, it
is best to give an example:

CLOE starts off at the first instruction. It looks at it, and decides whether
or not it needs to be emulated. In this case, it doesn't, and so CLOE looks at
the next instruction. This continues until it finds one of the following
classes of instruction:

Branch

Branch and link

Any instruction which reads the PC's status bits

Any instruction which modifies the PC

One of the CLOE SWI calls

Any SWI that would cause the program to exit
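
A rough sketch of how such a test might be written is given below, in Python. The function name cloe_must_intercept and the stop_swis parameter are invented for the illustration; CLOE's own SWI numbers are not defined anywhere in this article, so only OS_Exit (SWI 0x11) is listed, and coprocessor instructions are ignored. The check on R15 is deliberately conservative.

```python
X_BIT = 1 << 17   # error-returning ("X") form of a SWI


def cloe_must_intercept(word, stop_swis=frozenset({0x11})):
    """Return True if this instruction word falls in one of the classes above.

    stop_swis holds the SWI numbers (X bit cleared) that end native
    execution; CLOE's own SWIs would be added to it.
    """
    top3 = (word >> 25) & 0x7
    rd = (word >> 12) & 0xF
    if top3 == 0b101:                      # B and BL
        return True
    if (word >> 24) & 0xF == 0xF:          # SWI
        return (word & 0x00FFFFFF & ~X_BIT) in stop_swis
    if top3 in (0b000, 0b001):             # data processing (MUL/SWP lumped in)
        rm_is_register = top3 == 0b000
        reads_pc_and_flags = rm_is_register and (word & 0xF) == 15
        return rd == 15 or reads_pc_and_flags
    if top3 in (0b010, 0b011):             # LDR / STR with R15 as the register
        return rd == 15
    if top3 == 0b100:                      # LDM / STM with R15 in the list
        return bool(word & (1 << 15))
    return False                           # coprocessor: not handled here
```

With this test, MOVS pc,lr (0xE1B0F00E), LDMFD r13!,{pc}^ (0xE8FD8000) and any branch are flagged, while MOV r0,#5 (0xE3A00005) and SWI XOS_Write0 (0xEF020002) are left to run natively.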

When it reaches one of the first four classes of instruction above, it saves
the instruction found at that location in some form of storage, and replaces
it in memory with a SWI. It then performs a branch to the start of the
emulation code. In this example, the code becomes:

Since CLOE knows that the processor will always reach the SWI (because there
is no opportunity for the program counter to change without CLOE's knowledge),
the code up to that point can execute natively in standard 32-bit mode.
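
The scan-and-mark step can be sketched as follows, again in Python. Everything named here is hypothetical: scan_and_patch, the CLOE_TRAP word (its SWI number is invented) and the dictionary used as "some form of storage". The test for which instructions to intercept is passed in as a function, and only branches are caught in this small example.

```python
CLOE_TRAP = 0xEF0C10E0   # hypothetical SWI word; the SWI number is invented


def scan_and_patch(memory, start, must_intercept, saved):
    """Walk forward from start to the first instruction CLOE must handle.

    The original word is kept in saved (address -> word) and replaced in
    memory by the trap SWI. Returns the patched address.
    """
    address = start
    while not must_intercept(memory[address]):
        address += 4
    saved[address] = memory[address]
    memory[address] = CLOE_TRAP
    return address


def is_branch(word):
    """Simplified test used for this example: B and BL only."""
    return (word >> 25) & 0x7 == 0b101


memory = {
    0x8000: 0xE3A00005,   # MOV r0,#5
    0x8004: 0xE2800003,   # ADD r0,r0,#3
    0x8008: 0xEB000010,   # BL to some routine
}
saved = {}
patched_at = scan_and_patch(memory, 0x8000, is_branch, saved)
```

The two instructions before the branch are untouched and can run at full speed; the BL is now held in saved and its place in memory is taken by the trap.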

CLOE then looks up the original instruction, and then decides that the
following needs to take place:

R14 becomes the old program counter, together with the status bits.

The program counter becomes the address of print_text.
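
These two steps amount to emulating a 26-bit branch-with-link. A minimal Python sketch follows; the function names are made up, and the addresses in the example are arbitrary rather than taken from the program discussed here.

```python
ADDRESS_MASK = 0x03FFFFFC   # bits 25..2 of the combined PC/PSR
PSR_MASK = 0xFC000003       # flags in bits 31..26, mode in bits 1..0


def sign_extend_24(value):
    """Interpret a 24-bit field as a signed number."""
    return value - (1 << 24) if value & (1 << 23) else value


def emulate_bl(word, address, psr_bits):
    """Emulate a 26-bit BL found at the given byte address.

    psr_bits carries the current flags and mode in their R15 positions.
    Returns (new R14, new program counter).
    """
    offset = sign_extend_24(word & 0x00FFFFFF) << 2   # word offset to bytes
    target = (address + 8 + offset) & ADDRESS_MASK    # PC is two words ahead
    r14 = ((address + 4) & ADDRESS_MASK) | (psr_bits & PSR_MASK)
    return r14, target


# A BL at 0x8008 with a word offset of 0x10, executed with the C flag set.
r14, target = emulate_bl(0xEB000010, 0x8008, 0x20000000)
```

Here R14 ends up as 0x2000800C (the return address with the C flag folded in) and execution continues at 0x8050.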

CLOE then starts the emulation again at the new program counter. Here,
the first instruction it finds which falls in the above category is
LDMFD r13!,{r0-r2,pc}^, so it marks that with a SWI:

As before, CLOE starts executing in normal mode from print_text, and after
the call to SWI XOS_Write0, CLOE is entered again. It emulates the instruction
which it has stored, and finds out that execution continues after the earlier
SWI that was called. So, it starts the emulation again, this time reaching
BGT loop. The code becomes:

As CLOE needs to know exactly where execution continues, it needs to emulate
every form of branch, so that it can continue emulating where the code left
off. After the first run, R7 is 31, so the routine would repeat 32 times.

After the final CLOE_branch has failed, CLOE recognises SWI XOS_Exit, and
this would return to the 32-bit OS.

There is one problem with CLOE - code which checks itself against
modification. This would have to be addressed...

OS considerations

In order to reduce the amount of emulation, it is vital that a 32-bit kernel
be in place as soon as possible, as well as an active encouragement to get
developers to write 32-bit applications.

There are four main ways ARM code can get executed:

Applications (filetype 0xff8)

Utilities (filetype 0xffc)

Relocatable modules (filetype 0xffa)

Vectors (hardware and software)

To a lesser extent, BASIC programs can contain assembler, but since BASIC is
currently 26-bit, any ARM assembled code would still be running under the
emulator...

In order to allow 32-bit operations of the OS, the kernel would have to be
written in 32-bit, and switch between 32-bit and 26-bit modes on the processor.
To distinguish between 26-bit versions of the above, and 32-bit versions,
different file-types/SWIs could be used:

App32

Util32

RMA32 (*RMLoad32 etc.)

SWI XOS_Claim32, XOS_CallAfter32 etc.

The kernel would normally operate in 32-bit, but when it called a
26-bit module, utility, vector or application, it would jump into 26-bit mode
(either emulated, or using 26-bit mode on the processor). Calling a SWI, a
32-bit vector, or a 32-bit utility from the program would cause the kernel to
jump into 32-bit mode (or out of the emulator), and on return, 26-bit mode
would be restored. Exiting the program would make the kernel enter 32-bit
mode, and carry on where it left off.

Finally...

Just because new processors won't have the 26-bit mode, I don't believe that
RISC OS has to die; it just needs to evolve slightly...