Monthly Archives: January 2015

To date, I have ported Madness from the original 6809 sources to x86 under Linux, BSD, OSX and Windows. These 4 operating systems probably comprise 95% of the x86 operating system market. In fact, off the top of my head the only other mainstream, modern-day x86 operating system I can think of is Solaris x86. So the question is should I port Madness to Solaris x86? Although there is probably very little reason for doing this, the main reason for me is that Solaris is my favorite operating system. Granted, I like Solaris SPARC, but porting to Solaris x86 is a step in the right direction. Besides, being a BSD derivative, the port shouldn’t be too bad. So Madness is now destined to run on Solaris x86.

As expected, porting to Solaris x86 wasn’t much different than the port to BSD. A few internal structures needed to be changed, but other than that, it was pretty straight-forward.

You can find the binaries for Solaris x86, as well as the source code, here:

In this entry I’ll be describing the motivation behind porting Minotaur to x86 as well as describing some of the challenges I encountered along the way. Most of this information can be found in the FAQ and other various documents in the source tree, but I am going to recap everything here in an attempt to keep things in one place.

Probably the first question that comes to mind is why port this archaic game in the first place? To answer, we need to go back to 1982. Back then, I had an original TRS-80 Color Computer with 4k of memory. Shortly after, I acquired Madness and the Minotaur as a present. Unfortunately, this game required 16k, and I only had 4k. Since I wanted to use Extended Color Basic with all its fancy features anyway, I went for the 16k upgrade. After the upgrade, I must have played Minotaur for hours, but never did beat it – partially because the game is so difficult, and partially because back then, we saved our data to cassette tape which wasn’t very reliable. Anyone who has used cassette tape for storage knows what I’m talking about. After spending what seemed like hours trying to load your saved game, listening to sounds similar to that made by a modem, 7 out of 10 times the load would fail with the dreaded “CLOADM ?IOERROR”. Sometimes repeated attempts would successfully load the game, but more often than not it wouldn’t help. So that saved game you just spent 10 hours playing was lost forever. Then sometime in 1982, “cheap” ($1200) IBM XT clones began appearing on the market and I traded in my my trusty CoCo for an 8088 clone, learning all about this new MS-DOS operating system, quickly forgetting about Madness and the Minotaur and the days of cassette storage.

Fast-forward 30 years to the Internet age. Sitting around one day, Minotaur popped into my head for no apparent reason and I decided to google it. Lo & behold, there were a couple of links talking about how hard Madness was to beat. One page, http://www.figmentfly.com/madnessandtheminotaur, even had the binary and information on how to run Madness in a simulator. So I downloaded it and tried some of the simulators, but found them too cumbersome to use, especially around saving and restoring the game. One of the links on the figmentfly page was to a disassembled binary by Chris Cantrell (http://www.computerarcheology.com), complete with full commentary.

Looking over Chris’s source made me think that a port to x86 Linux would be the most beneficial thing for me to do – that way I could easily play the game on my Linux box and also implement a simple save/restore mechanism myself. So with that goal in mind, I set out to port Chris’s commented 6809 sources to x86. Technically, I originally made Chris’s source code compilable and relocatable on the 6809 and then used that code for the port, but in the end the result was the same – Madness and the Minotaur running natively on Linux.

So with the question of why the port to Linux answered, the next logical question is why port to assembler, and not some other high-level language like C? The reason for that is two-fold:

1) Considering the age of the original game and its strong ties to the 6809 processor, porting to x86 assembly might actually have been easier than porting to a high-level language first. Some of the reasons for this are the 8-bit nature of the game, where things such as variable size matter. Another reason is the standard assembly practice (at least back then) of using jumps instead of subroutines, as well as tail recursion. Yet another reason would be the popular tricks used back then such as the LXI Trick (which you can read more about here in item #9: http://www.drdobbs.com/embedded-systems/assembly-language-tricks-of-the-trade/184408315. Such tricks, though very useful back then when memory was tight, are hardly needed today and are fading fast from the memory banks of most programmers. Since all of these tricks and practices would need to be unrolled before porting to a higher-level language anyway, assembly seemed the best choice.

2) The second, and perhaps most important reason, is that I now had an excuse to work with assembly again and sharpen my skills. It’s not often these days that I get to use assembly, never mind two assembly languages at the same time – 6809 and x86.

On the subject of assemblers, I imagine some assembly folks will be upset that I chose the GNU assembler for the port rather than NASM. Simply put, I prefer the AT&T assembly style, and since that is the default style on UNIX, it just made sense. Plus the GNU assembler is available on many platforms, though that argument could be made for NASM as well. If anyone really wants a NASM version, there is a Linux utility named intel2gas that can help translate from AT&T to Intel style. Despite its name, the utility also translates from GAS to NASM.

Another assembly decision I had to make was whether to port to 32-bit or 64-bit. I thought 32-bit would reach the most audiences, as there are still a lot of 32-bit architectures out there. Also, the 32-bit version will run just fine on a 64-bit processor, as there is nothing in the program that requires anything greater than 16-bits. 64-bit assembly would be really overkill for this program, and I imagine the resulting binary would be quite bigger. If for some reason a 64-bit port is needed, it should be a simple matter of changing the register names and taking care of overflow/underflow operations from the original 8-bit design.

I also skipped over 16-bit assembly because it is obsolete and would require working with segments. Segments? Enough said.

With the major x86 UNIX derivatives out of the way (Linux, BSD, OS X), it’s time to bite the bullet and work on a Windows port. Obviously, Windows is unlike any of the UNIX derivatives and so should require the most work. However, after my escapades with OS X, particularly the alignment issue, I’m beginning to think the Windows port may be easier.

As mentioned elsewhere in my blog, during the Linux port I had planned ahead for future porting work and so tried to isolate all the platform specific stuff into a file called hardware.s. Now that I’ve actually ported Madness twice, I’ve decided that platform-unix.s would be a better name than hardware.s for the UNIX derivatives, and I would create a platform-windows.s module for the Windows stuff.

The first obstacle is going to be the tool chain. I originally chose the GNU assembler for the port because I am probably one of the few people who actually likes the AT&T assembly syntax, and because it is available on many platforms. Another obvious choice would have been NASM, as it too runs on many platforms. But since it uses the Intel syntax, I chose GNU.

So how am I going to compile GNU assembly on Windows? Fortunately, Cygwin (http://www.cygwin.com) is the answer. Normally, when programming in C/C++ and using Cygwin, the machine running the binary has to have the cygwin1.dll file installed somewhere on the path, as that is needed to perform the POSIX translation. However, since I’m using assembly, no such translation is needed – a binary created with the GNU assembler under Cygwin is the same as a native binary created with the MinGW assembler (http://www.mingw.org). As a test, I removed all the platform-specific code from Madness (so that it would assemble, since I haven’t ported it yet) and assembled/linked it using the Cygwin tools. Then I used the MinGW assembler and linker to build the same code, producing a native Windows binary without Cygwin. As it turns out, the binaries are just about identical. The file size of the Cygwin version is 12-bytes more and a cmp between the two binaries shows a difference of 788 bytes. However, a comparison of an objdump of each binary shows that the generated assembly is identical. So this means I can simply use Cygwin for assembling Madness and the resulting binary will run natively on Windows without the need for cygwin1.dll. Cygwin also supports git, so I am able to perform all my git operations from a Cygwin terminal as well.

Another plus to using Cygwin is that it provides an ssh server. This means I can ssh into my Windows machine from a Linux terminal, and use make, as, ld and vi as my development tools. Since about the only file I’ll be modifying is platform-windows.s, there’s really no need to have to log in to Windows and use an IDE. Plus I think the command-line tools are much better in Linux (and hence Cygwin) than those available through the Windows command prompt. If not better, at least more familiar to me.

With the infrastructure out of the way, it’s time to start the actual porting work. The way I’ve done all the ports so far was to first port the I/O routines, output first and then input. After that, I get the game timer to work. From there, I just fill in the remaining areas of functionality. Since this plan has worked so far, I’ll take the same approach for Windows.

The first decision to make is which mechanism to use to communicate with the operating system. The Linux and BSD ports are pure assembly, making system calls for all I/O and requiring no external linkage to libraries. The OS X port is almost pure assembly, but because of an issue I had with sigaction, it currently links with the C library so it can use the sigaction() function. Hopefully I will figure out why the sigaction system call didn’t work and when I do, OS X can be pure assembly as well. The decision to use pure assembly was a conscious one, and I’d like to use pure assembly for the Windows port as well. This means using system calls on Windows.

After researching the topic of making direct system calls on Windows, it seems the consensus is “you should never do that”. Apparently, the Windows system call numbers change from version to version, and sometimes even between builds of the same version. Because of this, I couldn’t find anyone who suggested using system calls. Instead, most people say use the Win32 API, as that’s what it’s meant for. Two other options would be to use the NT API, which is a level below Win32, or use the C library. In keeping with the spirit of the UNIX ports, I’d prefer not use the C library, which leaves Win32 or the NT API. Even though the NT API is a level below Win32, most people say to use the Win32 API, as the NT API is largely undocumented and may change. So Win32 it is.

Surprisingly, the port to Windows was quite easy. Once I substituted all of the UNIX system calls with the appropriate Win32 API calls, it just worked. I simply replaced my read syscall with ReadFile and my write syscall with WriteFile. For the game timer, I used CreateTimerQueueTimer in place of sigaction and setitimer. The only snag I ran into was the kbhit-like functionality Madness has when starting and exiting. For UNIX, this wasn’t hard to do – all that had to be done was put the terminal in raw mode. For Windows, this was a nightmare. I wanted kbhit() functionality without having to link with a library to use kbhit(), since I’m trying to write in pure assembly. I tested kbhit() and it did work, but I wanted to roll my own using Win32. There are plenty of resources on the web, including MSDN, on how to do this, but none of them seemed to work for me. After spending hours of trying sample programs using the WaitForSingleObject(), GetAsyncKeyState(), GetNumberOfConsoleInputEvents() and other Win32 API calls to implement kbhit(), I took a step back and thought about why Madness needed this functionality. As it turns out, the sole reason for needing non-blocking input was so that Madness would have a way of generating a seed that could be used in its random number generator. The idea was to use the indeterminate amount of time it takes a user to “press any key to continue” as the seed for the generator. Thinking about it, I realized that such a processing loop could consume 100% of the CPU when left waiting for a key press, which is undesirable on today’s multiprocessing systems. Besides, there is a much easier way to generate a seed – the rdtsc instruction. So I removed the non-blocking input code from Madness and instead substituted it with code that blocked until one character was read. After that, I called rdtsc, took the low-order return value in eax mod 8192 (the size of the random data) and voila – I had a random number generator whose seed was based off of rdtsc rather than user input. Not only is this much cleaner and efficient, but it let me remove one of the hardest pieces of code to port – the non-blocking input routine.

With that out of the way, the rest of the Windows port fell into place, and is now finished. You can find the binaries for Windows, as well as the source code, here:

With the port from Linux to BSD done, I now returned to the OS X port. My thinking that porting from BSD to OS X would be easier than porting from Linux to OS X turned out to be true, but there were still unforeseen obstacles to come.

The first obstacle I ran into was my game timer wasn’t working on OS X. On Linux and BSD, I was using the sigaction and setitimer system calls to implement a timer that would execute a timer_service routine via SIGALRM once per second. Unfortunately, this mechanism wasn’t working on OS X, and I wasn’t sure which syscall was in error – sigaction or setitimer, since both were needed to implement the timer. My gut feeling (and experience) told me it was sigaction, but I wasn’t 100% sure. I’ve run into problems with sigaction in the past (including during the BSD port) and the problem usually turns out to be finding the right sigaction struct to use. While researching this, I learned something I did not know about OS X – it requires the stack to be aligned on 16-byte boundaries when making external calls. I thought this might be the issue, so I made sure to align the stack, but it didn’t help. I spent a lot of time looking into this, even posting the following question on Stack Overflow:

Unfortunately, as of yet, there have been no answers to my question. While waiting for an answer, I discovered that the OS X library function, _sigaction(), did work. Of course, this requires me to link against libc, but while I research the problem and/or wait for an answer, it will allow me to continue work on the port. I’m curious why the other syscalls I’m using work (open, read, write, etc.), but sigaction doesn’t. One thing I noticed is that the disassembly of _sigaction() is quite lengthy compared to other system calls, leading me to believe the library call is doing something extra.

With the sigaction problem solved for the moment, I next had to focus on the 16-byte alignment issue. Most of the reading I did on this subject turned out to be false, causing me to waste enormous amounts of time. For example, I read that OS X would start your program with the stack already 16-bit aligned and that the stack needed to be aligned on a 0xXXXXXX0c boundary at the point of the syscall. Both of these turned out to be false. Initially, my program kept crashing randomly upon startup. I discovered this problem while debugging, when I noticed the starting address of the stack in _start was always 4-byte aligned, but only 1/4 of the time was 16-byte aligned. I solved this problem by adding code to align the stack to 16-bytes upon start-up. As for being aligned at 0xXXXXXX0c before a call, it turns out it needs to be aligned on 0xXXXXXX00.

With the start-up problem out of the way, I focused on aligning the stack on all routines that made external calls (system calls, in the case of Madness) on a 16-byte boundary. This proved to be harder than it sounds and turned out to be very time-consuming. Again, the documentation on this subject led me astray. Supposedly there are two options, -mstackrealign and -mstack-alignment=16, that can be used to solve this problem. Unfortunately, they don’t appear to work. The clang compiler accepts them, but they don’t seem to align the stack. I have a suspicion that these arguments, as well as the guarantee that the stack starts out 16-bit aligned, only applies when writing C code and not assembly.

As if the above problems weren’t enough, I learned that the lldb debugger that I was using was reporting the esp address incorrectly. I aligned all my code by single-stepping through the debugger, but in the end the code still didn’t work. When I substituted printing the esp via the code, I noticed the values were different (usually off by 8). I believe using the debugger skews the stack pointer, and so should not be used to help manually align the stack. After discovering this, I had to go back and realign the stack on all routines that made syscalls.

Another oddity I discovered with the lldb debugger was using it remotely through a terminal. For most of the OS X work, I used OS X Mavericks on my Hackintosh (which I created using the excellent utilities and tutorials over at tonymacx86 http://www.tonymacx86.com). As I started to port to more operating systems, it made sense to me to instead virtualize OS X and stick the VM on one of my servers, just like I was doing with FreeBSD. That way I could free up my laptop to run Linux and ssh into all the various platforms needed to develop Madness. So I created a Snow Leopard VM under Virtual Box, using the same utilities from tonymacx86. I had to start with Snow Leopard for two reasons:

1) VirtualBox doesn’t support booting from USB, so I needed OS X to be on a DVD.
2) The only DVD of OS X I have is for Snow Leopard, and the iBoot (http://www.tonymacx86.com/downloads.php?do=cat&id=3) software doesn’t seem to support booting itself from CD and then installing OS X from USB. As far as I could tell, both iBoot and OS X either need to reside on disc or USB – you cannot have one on CD and the other on USB.

Once Snow Leopard was installed, I upgraded it to OS X Lion using the xMove Lion utility (http://www.tonymacx86.com/downloads.php?do=cat&id=9). Once on Lion, I upgraded to OS X Mountain Lion, using xMove Mountain Lion. There doesn’t seem to be an xMove for Mavericks, so my VM is currently stuck at Mountain Lion. All of this worked fine, but when I tried to use lldb remotely through ssh it kept exiting with one of these errors:

When I ran lldb locally on the VM, it prompted me for my credentials. After digging around, it turns out I had to enable security by running this command directly on the OS X VM (not through ssh):

/usr/sbin/DevToolsSecurity -enable on

Once I did that, I was able to use lldb remotely for debugging.

With all these problems out of the way, Madness on OS X is now a reality. The code seems to work, but I won’t swear that I’ve found and aligned all possible areas where the stack needs to be aligned on 16-bytes. To make tracking this easy, I put in a check for an unaligned stack on OS X only. If the code detects an unaligned stack, it will print an error message and exit.

Forgetting about the not-so-successful OS X port for the moment, I started work on a BSD port. I like BSD and so happen to have two BSD machines available, but one is running FreeNAS and the other is running NetBSD on SPARC. The FreeNAS box doesn’t have the GNU toolchain installed by default, nor did I wish to install it since I use it for storage only. The NetBSD box is a totally different architecture, so it is out of the question. I do have an old x86 box somewhere running NetBSD, but rather than fire it up I instead opted to use a FreeBSD 9.1 VM I had lying around for such occasions.

Unlike OS X, I expected the BSD port to be relatively easy. For the most part, that turned out to be true. Like OS X, I knew I had to change the system call mechanism from passing arguments in registers to using the stack. The one snag I hit was BSD requires 4-bytes of spill space after all parameters are pushed onto the stack, but before the system call is made. The FreeBSD Developers’ Handbook has information on this (http://www.freebsd.org/doc/en/books/developers-handbook/book.html#x86-system-calls). To make things simple, the handbook recommends wrapping the int 0x80 call in a wrapper routine to avoid having to deal with the 4-byte spill area.

After modifying the hardware.s module to pass arguments via the stack (most of which I had done previously in the OS X port attempt), FreeBSD mostly worked. The biggest problem I ran into was using sigaction. In absence of being able to register interrupts like you could on the 6809, the Linux port uses sigaction and setitimer to implement a game timer to control things such as creature movements. For some reason, on BSD, the timer wasn’t working. After a lot of digging, I discovered that there are three BSD sigaction system calls – 0x2e 0x156, and 0x1a0. I was originally using 0x2e, which didn’t work. Once I switched over to using 0x1a0, the timer routine started working. The difference between the system calls is 0x2e and 0x156 are for compatibility and compatibility 4, respectively, each of which require a #define to be set. The 0x1a0 system call is standard and so is always available. With this change in place, only a few more code changes were required, most of which were just oversights from the Linux port that BSD detected (for example, using the full eax register value rather than just al).

So the first request for a port of Madness has arrived, and that request was to port it to OS X. Not quite what I expected, as my money would have been on a request to port it to Windows. However, I was happy that is wasn’t, as porting to OS X would surely be easier as it is a BSD derivative.

I’m not an OS X expert, but I knew that being a BSD derivative, the calling convention was different. Linux system calls are more like DOS in that they take their arguments in registers. BSD, and hence OS X, follow the more traditional UNIX convention of passing arguments by pushing them onto the stack. Since I encapsulated all of the hardware/operating system functionality in a module called hardware.s (which I’m now thinking should be renamed platform.s), adding the pushes before making system calls should be trivial.

So I fired up my trusty Hackintosh system that I created a year or so ago (a Lenovo T530 running OS X Lion), cloned the madness repository and executed make. I didn’t expect it to work, but I certainly didn’t expect to see the errors it was complaining about; errors about unknown pseudo-ops like .equ, .ifdef, and .bss. I thought perhaps it wasn’t using the GNU assembler, so I checked the version:

$ as -v
Apple Inc version cctools-822, GNU assembler version 1.38

Version 1.38? That seems pretty old, as I recall binutils being version 2.x for some time now. Just to be sure, I checked the version on my Linux box:

$ as -v
GNU assembler version 2.22 (x86_64-linux-gnu) using BFD version (GNU Binutils for Ubuntu) 2.

Sure enough, it is version 2.22.

I also have OS X Mavericks running on my Hackintosh, so I fired it up to see what version it is using. To my surprise, it is the same old version. A little digging around revealed that it is well known that Apple’s GNU toolchain is quite old, and the reason for it seems to be it is being replaced with LLVM. So there will be no new versions of the GNU toolchain shipped with OS X. With this revelation, I tried using llvm-gcc-42 – same errors. I imagine that llvm-gcc probably uses /usr/bin/as anyway.

Next thing to try was clang. With clang, the original errors disappeared, but were replaced by complaints of the unknown directives .rept and endr. Checking my clang version:

and cross-checking that with what’s out there, I find clang is outdated too. Rather than upgrade to the latest & greatest version, I opted to try the next version – clang 3.2. After downloading and installing clang 3.2, my preprocessor problems disappeared and in their place were a few assembler errors that I could deal with.

After working out all the assembler errors, adding push instructions before making system calls, and changing a few system call numbers that differed between OS X and Linux, I had Madness running on OS X. However, the timer, which is central to the game, was not working.

I tried to debug the problem on OS X, but ran into problems with gdb – there were no symbols in the binary for it to use. None of the usual command-line options seemed to work with clang. So I switched to the lldb debugger. This debugger had symbols, but no line information, so I couldn’t set breakpoints on line numbers and so had to resort to addresses. After messing with lldb for a while, I decided the tools on OS X weren’t up to par and figured I’d have an easier time porting to BSD first, and then from BSD to OS X. That way I could shake out all the bugs on BSD (where I have the latest GNU toolchain), and after I had it working there I could try porting it to OS X again. So the OS X port was put on hold in favor of a BSD port!

Update: As mentioned elsewhere in my blog, I have recently setup OS X to run as a KVM guest under Linux. In doing that, I have learned that installing Xcode and then the command-line tools for Xcode gives a working tool chain in which I can compile Madness. I have tried this with both Mountain Lion and Mavericks. This means there is no reason to install clang manually. I haven’t tried the debugger yet, but will post back when I do.