ROLF - An alternative GUI for Linux

ROLF is a GUI framework using the framebuffer device on Linux to provide a look and feel similar to that of the RISC OS operating system.
The RISC OS GUI has been largely unchanged in concept since the late 1980s, not because of any lack of development, but because it just worked.

Monday, May 31, 2010

ArtWorks Renderer

The ARM emulator is now working with 26-bit code as well as 32-bit (although it would be a bad idea to mix them in the same process) and is capable of executing the free renderer for ArtWorks files. Unfortunately, it seems to be a little slower than the pure 32-bit version was, but there's plenty of room for optimisation.

The Viewer window title bar shows the time taken to render the file (strictly, the time to render the whole ArtWorks file and copy it into a bitmap, which the Viewer application then displays). The computer is a dual core with about 5000 BogoMIPS per core (the emulator only uses one core, of course). No modules from RISC OS are needed to run Viewer with AWRender.

Saturday, May 29, 2010

Building on and for Knoppix

The Knoppix Live CD (http://knopper.net/knoppix/) is an excellent way to try out Linux on any PC with a CD-ROM drive. It won't store anything on your PC (unless you ask it to) and doesn't mess up any existing installations of Linux or Windows.

In this article, I'm going to describe how to build ROLF to run on Linux assuming a PC booted from an unmodified Knoppix 6.2 CD.

When done, you should have a working ROLF desktop being displayed in a window, able to run a few native ROLF programs and even some RISC OS programs, in a limited way. Simply by copying the directory into which you've installed ROLF to some permanent storage (a hard disc or a memory stick, for example), you can use the build again without having to go through the steps I describe.

Set up a build environment

Knoppix, unfortunately, doesn't include all the tools needed to build ROLF, so we have to get them from Debian first. (If your Linux system already includes these tools, you can obviously skip this step.)

First, get the full list of packages that are available:

sudo apt-get update

Then update the compiler to include the C++ compiler, needed to build the Server part of ROLF (the libraries are all plain C, but to build them, you need libtool).

After that has been done, the compiler will still refuse to compile C++ programs because it can't find cc1plus, for some reason. The simplest way to fix that is to add the directory containing the just-installed cc1plus file to the PATH environment variable.

Similarly, the build requires a set of include files [This bit needs an update!]

Specifically, the files needed are:

From zlib: zlib.h, zconf.h

From libpng: png.h, pngconf.h

From file: magic.h

From libjpeg: jpeglib.h, jmorecfg.h, jconfig.h

From freetype 2:
./ft2build.h
./freetype/internal/internal.h
./freetype/fttypes.h
./freetype/ftsystem.h
./freetype/ftmoderr.h
./freetype/ftimage.h
./freetype/fterrors.h
./freetype/fterrdef.h
./freetype/freetype.h
./freetype/config
./freetype/config/ftheader.h
./freetype/config/ftconfig.h
./freetype/config/ftstdlib.h
./freetype/config/ftoption.h

sed -i 's|CFLAGS.*$|& -I$(CFGDIR)/my_includes|' config.mak

Change directory into the downloaded code and run configure (not as complicated as a normal configure script) and build ROLF:

Saturday, May 22, 2010

Emulator speedups

The other day I noticed that it took AWRender nearly fifty seconds to render the file celtic_knot3 from here, and I thought that was too long, so I decided to get around to speeding up the emulator. (Had I first tried it on my SARPC, I'd have found that it took 55 seconds on there, so it wasn't really too slow.)

The first step was to disable all the debug output from the compatibility library; that halved the rendering time to a shade over 24s. I found that rather disappointing; I was expecting the debug output to take up at least two-thirds of the time.

Next, I turned on optimisation in the library compilation; -O4 reduced the render time to under 18s.

Two optimisation suggestions from Jake Waskett were to ensure that jump targets were on 16-byte boundaries, and to fix up calls to scan that come from a fixed location so that they jump straight to the code the scan returned; on the second attempt, there is then no relatively expensive scan call. The former shaved about 0.1s off the render time, but the latter gave a significant improvement, taking the render time down to just over 15s.
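To make the call fix-up idea concrete, here is a toy Python model of it. It is only a sketch: the names scan, CallSite and translated are made up for the illustration and are not ROLF's, and the real emulator patches generated x86 code rather than a Python attribute.

```python
# Toy model of call-site fix-up in a dynamic translator.
# All names here are hypothetical; this is not ROLF's code.

scan_calls = 0          # how many times the "expensive" scan ran
translated = {}         # ARM address -> translated block (a Python callable)


def scan(arm_address):
    """Stand-in for the expensive lookup/translation step."""
    global scan_calls
    scan_calls += 1
    if arm_address not in translated:
        # Pretend to translate: the "block" just reports its own address.
        translated[arm_address] = lambda: arm_address
    return translated[arm_address]


class CallSite:
    """A branch in generated code that initially goes through scan()."""

    def __init__(self, arm_address):
        self.arm_address = arm_address
        self.target = None              # not yet fixed up

    def execute(self):
        if self.target is None:
            # First time through: scan, then patch this site so that
            # later executions go straight to the returned block.
            self.target = scan(self.arm_address)
        return self.target()


site = CallSite(0x8000)
results = [site.execute() for _ in range(1000)]
```

After the thousand executions, scan has run only once. In generated machine code the patch would overwrite the call itself, so even the "is it fixed up yet?" test in this model disappears.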

During all this, I noticed the SETcc operations and realised that I could use SETO %al ; LAHF to get all the necessary x86 flags into %ax (previously, I'd been using pushf/popf). Of course, when I googled for that combination, I found a description of someone writing an ARM JIT compiler using pushf, with a comment recommending the seto/lahf combination; it's all about knowing what to look for! Anyway, once the flags are in %ax, it's fairly easy to get the four flags we want into the bottom nibble of %al by rotating %ah, masking and or'ing with %al: ( rol $3, %ah ; and $0xe, %ah ; or %ah, %al ). (LAHF leaves SF, ZF and CF in bits 7, 6 and 0 of %ah, so it is a left rotate by three that brings them into bits 2, 1 and 3.) The flags aren't in the same order used by ARM, but a 16-byte lookup table can translate between the two in one instruction.
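The bit shuffling is easier to check in a small model than in your head. The Python sketch below assumes the documented LAHF layout (SF:ZF:0:AF:0:PF:1:CF from bit 7 down to bit 0 of %ah) and uses a left rotate by three, which is the rotate that lands CF, SF and ZF under the 0xe mask. The function names and the table are mine, and the table is a pure reordering: it ignores the fact that the x86 carry after a subtraction is a borrow, the opposite sense to ARM's.

```python
def lahf(sf, zf, af, pf, cf):
    """Value LAHF would put in AH: SF:ZF:0:AF:0:PF:1:CF (bit 7..bit 0)."""
    return (sf << 7) | (zf << 6) | (af << 4) | (pf << 2) | (1 << 1) | cf


def rol8(value, count):
    """Rotate an 8-bit value left by count bits."""
    count %= 8
    return ((value << count) | (value >> (8 - count))) & 0xFF


def pack_flags(sf, zf, cf, of, af=0, pf=0):
    """Model of: seto %al ; lahf ; rol $3,%ah ; and $0xe,%ah ; or %ah,%al."""
    al = of                         # seto %al
    ah = lahf(sf, zf, af, pf, cf)   # lahf
    ah = rol8(ah, 3)                # SF -> bit 2, ZF -> bit 1, CF -> bit 3
    ah &= 0x0E                      # drop AF, PF and the constant bit
    return al | ah                  # nibble is now CF SF ZF OF


# 16-byte table: x86 nibble (CF SF ZF OF) -> ARM nibble (N Z C V).
X86_TO_ARM = bytes(
    (((n >> 2) & 1) << 3)    # N comes from SF
    | (((n >> 1) & 1) << 2)  # Z comes from ZF
    | (((n >> 3) & 1) << 1)  # C comes from CF (no borrow inversion here)
    | (n & 1)                # V comes from OF
    for n in range(16)
)

nibble = pack_flags(sf=1, zf=0, cf=1, of=0)
arm_nzcv = X86_TO_ARM[nibble]
```

With SF and CF set, nibble is 0b1100 and the table turns it into 0b1010, i.e. N and C in ARM's NZCV order.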

Obviously, there are more flag reads than flag writes (there's no point in setting the flags if they aren't going to be read at least once), so I added a new global variable, set at the same time as the flags, which contains a 16-bit bitmap with one bit for each condition code (EQ, NE, GT, etc.). A conditional instruction then just has to test a known bit in a known variable and use the x86 ZF to behave appropriately.
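As a rough illustration of the bitmap, the Python sketch below precomputes, for each of the sixteen possible NZCV values, a 16-bit mask with one bit per ARM condition code (in the usual encoding order, EQ = 0 up to NV = 15). The names are invented for the example, and NV is treated as "never", as it is on the older ARMs that run 26-bit code.

```python
CONDITIONS = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"]


def condition_passes(cond, n, z, c, v):
    """Evaluate ARM condition number cond against the four flags."""
    checks = [
        z, not z,                        # EQ, NE
        c, not c,                        # CS, CC
        n, not n,                        # MI, PL
        v, not v,                        # VS, VC
        c and not z, (not c) or z,       # HI, LS
        n == v, n != v,                  # GE, LT
        (not z) and n == v, z or n != v, # GT, LE
        True, False,                     # AL, NV (treated as "never")
    ]
    return bool(checks[cond])


# One 16-bit bitmap per NZCV nibble; bit i is set if condition i passes.
CONDITION_BITMAP = [
    sum(1 << cond
        for cond in range(16)
        if condition_passes(cond, (f >> 3) & 1, (f >> 2) & 1,
                            (f >> 1) & 1, f & 1))
    for f in range(16)
]


def should_execute(cond_name, bitmap):
    """What a conditional instruction does: test one known bit."""
    return (bitmap >> CONDITIONS.index(cond_name)) & 1 == 1


bitmap_after_equal_compare = CONDITION_BITMAP[0b0100]   # only Z set
```

Setting the flags costs one table lookup to fetch the bitmap, and every later conditional instruction is a single bit test against it.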

That change was fairly major (but only affected three files), and took the render time down to 13 seconds.

The next thing to try was to eliminate the extra code for each load or store that checks for non-aligned accesses. The idea is to set the x86 flag that causes a SIGBUS signal to be generated for unaligned accesses (the alignment-check flag, AC), and have a signal handler load the registers as necessary before moving on to the next instruction. Unaligned accesses in ARM code will probably be relatively rare, and the speedup in the normal memory accesses should more than make up for the slower signal handling. Since the only routines called from emulated ARM code are scan_arm_code and (when debugging is enabled) dump_regs, those routines would reset the flag on entry and restore its state on exit.

That "optimisation" slowed the render time down to 15s again.

Since the only unaligned access from scan_arm_code is likely to be when setting a 32-bit constant in an instruction, I stopped manipulating the flag in scan_arm_code and tried modifying cache_32bit to write its four bytes one at a time, instead, and the time improved again to a little over 12s. However, the code I was testing didn't include any unaligned accesses, and since I hadn't written the signal handler anyway, I've decided to call it a day for the time being and leave that optimisation out.
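For what it's worth, the byte-at-a-time store looks like this in a Python model. Python has no alignment traps, so this only shows the store pattern, not the effect on SIGBUS; cache_32bit_bytewise is a hypothetical name for the modified routine, and little-endian order is assumed because the generated code is x86.

```python
def cache_32bit_bytewise(buffer, offset, value):
    """Store a 32-bit value little-endian, one byte at a time.

    Four single-byte stores can never be misaligned, whatever the offset.
    """
    for i in range(4):
        buffer[offset + i] = (value >> (8 * i)) & 0xFF


code = bytearray(16)
cache_32bit_bytewise(code, 1, 0xDEADBEEF)   # offset 1 is deliberately odd
```

The four bytes land at offsets 1 to 4 and read back as the same 32-bit value.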

Future optimisation possibilities:
Finish the SIGBUS solution
Improve the hash table lookup
Use mov $constant,arm_emulator_regs[n] for constant loads into registers
Combine consecutive ARM instructions that load a constant into a register
Remember if the flags (or a register's contents) are stored in a register from last time.
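To show what combining constant loads might involve: ARM data-processing immediates are an 8-bit value rotated right by twice a 4-bit rotate field, so a full 32-bit constant is usually built with a MOV followed by ORRs or ADDs. The Python sketch below folds such a run into one value. The tuple format and function names are invented for the example, and it assumes every instruction in the run targets the same register.

```python
MASK32 = 0xFFFFFFFF


def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & MASK32


def arm_immediate(rotate, imm8):
    """Decode an ARM immediate operand: imm8 rotated right by 2 * rotate."""
    return ror32(imm8, 2 * rotate)


def fold_constant(instructions):
    """Fold (op, rotate, imm8) tuples into one constant, or None.

    Assumes all tuples target the same register and the run starts
    with a mov; anything else is left alone (None).
    """
    value = None
    for op, rotate, imm8 in instructions:
        operand = arm_immediate(rotate, imm8)
        if op == "mov":
            value = operand
        elif op == "orr" and value is not None:
            value |= operand
        elif op == "add" and value is not None:
            value = (value + operand) & MASK32
        else:
            return None
    return value


sequence = [("mov", 4, 0x12), ("orr", 8, 0x34), ("orr", 12, 0x56)]
constant = fold_constant(sequence)
```

The three instructions fold to 0x12345600, which could then be emitted as a single constant store instead of three translated operations.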

All of these things have a chance of making the scan_arm_code routine slower and negating their speed improvements, but they're probably worth a try.

The other thing to do is to profile the ARM code somewhat by generating code to increment counters when, for example, flags are set, flags are read, scan_arm_code and get_hash_entry are called, etc. At the moment, I notice that a sequence of ARM instructions leading up to a decision point (conditional jump, swi, etc.) is rarely much more than ten instructions.
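The sort of numbers I have in mind can be mocked up in a few lines of Python. This is only a sketch over a made-up trace of mnemonics; the set of "decision point" mnemonics is an assumption for the example, and the real counters would be incremented by generated x86 code.

```python
from collections import Counter

# Hypothetical set of mnemonics that end a straight-line run.
DECISION_POINTS = {"b", "bl", "swi"}


def run_lengths(mnemonics):
    """Lengths of straight-line runs, each ending at a decision point."""
    lengths, current = [], 0
    for mnemonic in mnemonics:
        current += 1
        if mnemonic in DECISION_POINTS:
            lengths.append(current)
            current = 0
    return lengths      # a trailing unfinished run is not counted


trace = ["mov", "add", "cmp", "b", "ldr", "str", "swi", "mov"]
event_counts = Counter(trace)     # one counter per kind of event
lengths = run_lengths(trace)
```

Here lengths comes out as [4, 3], and event_counts gives the per-event totals.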

About Me

I'm a software engineer, currently staying at home (in Germany where I moved in 2000) to look after my two daughters.
I started programming in the eighties, which might be why I like elegant software which doesn't break and isn't bloated. I still use RISC OS daily for e-mail and Usenet, but have moved over mostly to Linux for web browsing and video.
I still find the RISC OS GUI a pleasure to use, and I also admire the Eiffel programming language, although I don't think it has lived up to its initial promise.