I’ve always been fascinated by the early days of the computer revolution. Today we take tremendously powerful machines for granted, but it was not always that way. As a personal project, I decided to implement an early eighties era microcomputer on the Arduino Uno to demonstrate just how powerful even the most basic of our microcontrollers are today.

My microcomputer of choice was the Apple II, the computer responsible for making Apple Computer a household name. With over five million units sold, it was one of the most popular microcomputers of the era.

The Apple II was originally designed in 1977 by Steve Wozniak. To bring the computer into the mass consumer market, Steve made many unique design decisions that reduced the cost and complexity of the machine. One of his goals was to drastically reduce the chip count.

The first Apple II machines featured 4 kilobytes of RAM that was shared with the video frame buffer. The CPU was a MOS 6502 clocked at 1 MHz. The machine was capable of generating text video at a resolution of 40 columns and 24 rows, and it featured two graphics modes capable of indexed colour video at up to 280x192 pixels.

Original Apple II Microcomputer.

The MOS 6502 CPU was a rather revolutionary device in its own right. The 6502 was designed by the fledgeling semiconductor manufacturer MOS Technology in 1975. The project was headed by Chuck Peddle and three other ex-Motorola employees. They sought to produce low cost CPU designs for the broader consumer market, a revolutionary idea at the time, and one that led to them leaving Motorola.

When the 6502 was first released it was priced at $25 USD. At the time this was unheard of, being up to six times cheaper than the nearest competitors. Some people even thought that the low price had to be some form of scam. Ultimately the 6502 was no scam and went on to power many of the early microcomputers, including the Apple I/II and Commodore 64. The 6502 has been referred to as “The original RISC processor”. Its elegant instruction set and historical significance have led to the processor remaining relevant and revered even forty years on.

The first step in emulating an Apple II was to emulate the 6502 processor, which is in itself a significant undertaking. Several of the emulator design decisions were easy to make: it had to be written in plain C, it had to be memory and CPU efficient, it didn’t have to be cycle accurate, and it did not need to support BCD arithmetic (as Steve Wozniak never used it in his BASIC code).

The 6502 instruction set is remarkably simple. Each opcode is a fixed 8 bits in length, and there are 56 different instructions and 13 address modes. Most instructions work with most address modes. This simple (instruction/address mode) relationship is referred to as instruction set orthogonality, and it leads to a much simpler emulation strategy: the majority of the memory fetching and instruction decoding code can be reused.

The MOS 6502 Instruction Set.

Before starting the task of writing my 6502 emulator I took a slightly atypical approach. For programming tasks where you have a predefined input / output relationship it can be handy to use test driven development. In this case test driven development involves first writing a test in 6502 assembly that you expect to perform some action and then running it and observing the output. If the output matches what you expect (from data sheets etc) then your code for that action is correct and functional.

In this case I wrote an exhaustive test for each of the 6502’s opcodes and connected this to my emulator code. By running the test routines I could ensure my code was 6502 compatible throughout the development process. It also helped highlight unimplemented functionality.

I originally wrote the emulation code as a small C application on my OSX box. Running it locally allowed me to quickly test and make changes. My final design used a simple switch statement to decode instructions and a collection of operand decoding utilities. The goal was to keep it as simple as practical.

I decided to use a switch statement for instruction decoding because it is easy for the C compiler to optimise into an efficient jump table. I could have used an array of function pointers instead, which would likely have been optimised as well (and possibly been more readable). However, this would have required some navigating around the AVR memory model and would have risked the overhead of call/return instructions.
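As a rough illustration of the switch-based approach, here is a minimal sketch that decodes a handful of opcodes and shares the operand decoding helpers between them. It is not the project's actual code: the `cpu_t` structure, the helper names and the flat 64 KiB `mem` array (which would only fit on a desktop build, not on the AVR) are all assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical CPU state; field names are illustrative only. */
typedef struct {
    uint8_t a, x, y, sp, status;
    uint16_t pc;
} cpu_t;

static uint8_t mem[65536]; /* flat memory, desktop sketch only */

#define FLAG_Z 0x02
#define FLAG_N 0x80

/* Update the zero and negative flags from a result byte. */
static void set_nz(cpu_t *c, uint8_t v) {
    c->status &= (uint8_t)~(FLAG_Z | FLAG_N);
    if (v == 0) c->status |= FLAG_Z;
    c->status |= (uint8_t)(v & FLAG_N);
}

/* Operand decoders shared by many opcodes: each returns an effective address. */
static uint16_t addr_imm(cpu_t *c) { return c->pc++; }
static uint16_t addr_zp(cpu_t *c)  { return mem[c->pc++]; }
static uint16_t addr_abs(cpu_t *c) {
    uint16_t lo = mem[c->pc++];
    uint16_t hi = mem[c->pc++];
    return (uint16_t)(lo | (hi << 8));
}

/* Execute one instruction. Returns 0 for an opcode this sketch does not handle. */
static int cpu_step(cpu_t *c) {
    uint8_t op = mem[c->pc++];
    switch (op) {
    case 0xA9: c->a = mem[addr_imm(c)]; set_nz(c, c->a); break; /* LDA #imm */
    case 0xA5: c->a = mem[addr_zp(c)];  set_nz(c, c->a); break; /* LDA zp   */
    case 0xAD: c->a = mem[addr_abs(c)]; set_nz(c, c->a); break; /* LDA abs  */
    case 0x85: mem[addr_zp(c)] = c->a;  break;                  /* STA zp   */
    case 0x8D: mem[addr_abs(c)] = c->a; break;                  /* STA abs  */
    case 0xEA: break;                                           /* NOP      */
    default: return 0;
    }
    return 1;
}
```

Because each case is a one-liner built from the same helpers, adding another address mode for an instruction is mostly a matter of adding one more `case`.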

Completing the emulator was a case of reading the 6502 programming guide and consulting the plentiful resources on 6502 emulation. After completing my first compatible build I decided to test the emulator against some of the original Apple firmware. Sadly I was plagued with strange bugs that made no sense. In a pinch I found the source code of another emulator and wired it into my processor unit tests. It turned out I hadn’t properly understood the X-indexed indirect addressing mode, and my unit tests were broken as a result. It was a seriously frustrating one-line fix.
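For reference, the X-indexed indirect mode, written `(zp,X)`, is easy to confuse with the indirect Y-indexed mode `(zp),Y`. The sketch below shows both as commonly documented for the 6502. It is not the project's code and does not claim to reproduce the original bug; the function names and the flat `mem` array are assumptions for illustration.

```c
#include <stdint.h>

static uint8_t mem[65536]; /* flat memory, desktop sketch only */

/* (zp,X): X is added to the zero-page operand with 8-bit wraparound,
   then a 16-bit pointer is read from zero page (also wrapping). */
static uint16_t addr_indexed_indirect(uint8_t zp_operand, uint8_t x) {
    uint8_t ptr = (uint8_t)(zp_operand + x);
    uint8_t lo = mem[ptr];
    uint8_t hi = mem[(uint8_t)(ptr + 1)];
    return (uint16_t)(lo | (hi << 8));
}

/* (zp),Y: the pointer is read from zero page first,
   then Y is added to the resulting 16-bit address. */
static uint16_t addr_indirect_indexed(uint8_t zp_operand, uint8_t y) {
    uint8_t lo = mem[zp_operand];
    uint8_t hi = mem[(uint8_t)(zp_operand + 1)];
    return (uint16_t)((lo | (hi << 8)) + y);
}
```

The two casts to `uint8_t` are the easy part to miss: both the index addition and the pointer fetch stay inside the zero page.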

The original Apple II firmware, totalling 12 kilobytes, was stored on six 2 kilobyte socketed ROMs. These memory chips were mapped to the program address space between $D000 and $FFFF. Originally only four of the ROM sockets were populated.

Apple II Firmware:

$F800-$FFFF System Monitor (Hardware Routines).

$F689-$F7FC Sweet-16 Interpreter (Virtual Machine).

$F500-$F63C Mini-Assembler.

$E000-$F424 Integer Basic.

The system monitor program functions as a sort of simple shell. It includes functionality that allows you to manipulate memory contents, trace/debug programs, and execute memory locations. It also includes a large amount of hardware routines, such as initialising memory, reading characters from the keyboard, displaying characters on the screen, and saving/loading programs. When the Apple II first starts it loads a reset vector from $FFFC which points to the beginning of the system monitor program.
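The reset step can be sketched in a few lines. The 6502 stores vectors low byte first, so the start address is assembled from the bytes at $FFFC and $FFFD. The helper names and the flat `mem` array are assumptions for the example, not the project's code.

```c
#include <stdint.h>

static uint8_t mem[65536]; /* flat memory, desktop sketch only */

#define RESET_VECTOR 0xFFFC

/* Read a little-endian 16-bit value from memory. */
static uint16_t read16(uint16_t addr) {
    uint8_t lo = mem[addr];
    uint8_t hi = mem[(uint16_t)(addr + 1)];
    return (uint16_t)(lo | (hi << 8));
}

/* Program counter value after a reset. */
static uint16_t cpu_reset_pc(void) {
    return read16(RESET_VECTOR);
}
```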

Integer BASIC was a handwritten BASIC interpreter written by Steve Wozniak. It was syntax compatible with HP BASIC and used 16 bit signed numbers for math operations. Steve had originally intended to implement floating point math; however, in order to save several weeks of development time he released it in integer only mode. This was the primary software users encountered when using their Apple II. In fact later models booted directly into BASIC.

The original Apple II shipped with 4 kilobytes of DRAM. Of this, approximately 1 kilobyte was dedicated to the text video frame buffer, which left 3 kilobytes of general purpose memory. The first 768 bytes of this memory were shared between system monitor variables in the first page of memory and the processor stack / input buffer.

Apple II RAM Memory Map:

$0000-$00FF Zeropage (System Monitor Variables).

$0100-$02FF Processor Stack / Line Input Buffer.

$0300-$03FF Free Space.

$0400-$07FF Text Video Buffer.

$0800-$0FFF Free Space.

The Apple II used a rather novel approach for video generation. There are a couple of popular methods for storing frames in memory: rows can either be stored sequentially or interleaved. Interleaving provides advantages when generating NTSC video. The Apple II took this approach one step further, using an 8:1 interleaving scheme in which the first line is followed in memory by the ninth line. This peculiar approach allowed Steve Wozniak to simplify the video generation circuitry. A very smart hack!

As shown in my previous post on the GhettoVGA project, I designed a video interface for the Arduino Uno that uses the secondary USB interface IC to store/generate video. In the Arduino code it made no sense to keep the interleaved video mode. Instead, my emulator decodes screen addresses and converts them into sequential memory locations. In order to save memory on the Arduino, storing the frame buffer is left to the secondary processor. This frees between 512 and 1024 bytes of memory.
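The address conversion can be sketched as follows, using the commonly documented layout of the Apple II text page: each 128-byte block holds three 40-column rows that are eight rows apart on screen, followed by 8 unused bytes. The function name and the `-1` return convention are assumptions for the example rather than the project's code.

```c
#include <stdint.h>

#define TEXT_BASE 0x0400
#define TEXT_END  0x07FF
#define COLS      40

/* Convert an Apple II text-page address into a sequential offset
   (row * 40 + col). Returns -1 for addresses outside the text page
   or inside one of the unused 8-byte gaps. */
static int text_addr_to_linear(uint16_t addr) {
    if (addr < TEXT_BASE || addr > TEXT_END) return -1;
    uint16_t offset = (uint16_t)(addr - TEXT_BASE);
    uint8_t block = (uint8_t)(offset / 128); /* 0..7: row within a group of 8 */
    uint8_t rem   = (uint8_t)(offset % 128);
    if (rem >= 3 * COLS) return -1;          /* 8 unused bytes per block */
    uint8_t third = (uint8_t)(rem / COLS);   /* 0..2: top, middle, bottom */
    uint8_t col   = (uint8_t)(rem % COLS);
    uint8_t row   = (uint8_t)(third * 8 + block);
    return row * COLS + col;
}
```

For example, $0400 is the start of row 0, $0428 is the start of row 8 and $0480 is the start of row 1, which is the "first line followed by the ninth line" pattern described above.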

Apple II Video Character Set Including Inverse And Flashing Modes.

The original Apple II supports a custom video character set that is loosely based on ASCII; however, it adds two custom modes, flashing and inverse text. In the original hardware this is implemented with discrete digital logic that essentially inverts the output of the video generator IC using an exclusive-or gate. The signal for the flashing text is generated from a simple clock divider and flashes at approximately 2 Hz.

Block Diagram Of The Apple II Video Interface Showing XOR On Output.

When I first presented my GhettoVGA project it was focused primarily on the ASCII character set. In order to improve efficiency and simplicity I converted it over to the Apple character set. Within the tight timing constraints of my AVR VGA generator it was not possible to implement the full inverse mode. However, it proved possible to generate flashing characters.

This was achieved by exclusive-or'ing the character lookup address with $80, which had the effect of toggling character lookups above and below the $80 boundary. By keeping remapped normal mode characters constant, but toggling between inverse and non-inverse for flashing, I was able to achieve flashing text with very little CPU time. The clock for the flashing text came from dividing the frame counter.
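A minimal sketch of the idea, under stated assumptions: it uses the standard Apple II screen code ranges ($00-$3F inverse, $40-$7F flashing, $80-$FF normal) and assumes a font table in which the glyph at `code ^ 0x80` is the non-inverse twin of a flashing code. The divider value and the function name are made up for the example, and the real firmware's table layout may well differ.

```c
#include <stdint.h>

/* Assumption: toggle every 16 frames, roughly 1.9 Hz at 60 frames/s. */
#define FLASH_FRAMES 16

/* Return the font-table index to use for a screen code on a given frame. */
static uint8_t glyph_index(uint8_t code, uint16_t frame_counter) {
    uint8_t is_flashing = (uint8_t)((code & 0xC0) == 0x40);   /* $40-$7F */
    uint8_t phase = (uint8_t)((frame_counter / FLASH_FRAMES) & 1);
    if (is_flashing && phase) {
        return (uint8_t)(code ^ 0x80); /* jump across the $80 boundary */
    }
    return code; /* normal and inverse codes are left untouched */
}
```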

Historically keyboard input hasn’t been subject to the same degree of standardisation as ASCII and thus the Apple II uses a custom keyboard protocol. The keyboard itself is based on a modified QWERTY layout. As for sourcing a keyboard for this project, I decided to use an old PS/2 device.

PS/2 is extremely easy to interface with an Arduino: it uses 5 volt TTL logic and a synchronous serial protocol. PS/2 outputs are open collector, which means you must use a pull-up resistor. However, the Arduino has internal pull-ups on many pins, which can easily be used.

PS/2 Keyboard Timing Diagram.

PS/2 packets are 11 bits in length, consisting of a fixed start bit (LOW), 8 data bits, a parity bit, and a fixed stop bit (HIGH). Data is transferred Least Significant Bit first. The parity bit is used to ensure transmission occurred properly. In my project I chose not to implement parity checking. Listening for data bits relies on monitoring the clock line. One could poll for state changes; however, the Arduino provides interrupt-on-change capability on several pins, which is vastly superior.
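The framing described above can be sketched as a small state machine that is fed one data-line sample per falling clock edge. This is an illustrative sketch, not the project's keyboard code: the `ps2_rx_t` type and `ps2_clock_edge` name are assumptions, and parity is ignored to match the choice described in the text.

```c
#include <stdint.h>

typedef struct {
    uint8_t bit_count; /* bits received so far in the current frame (0..10) */
    uint8_t data;      /* data byte being assembled, LSB first */
} ps2_rx_t;

/* Call once per falling clock edge with the sampled data level (0 or 1).
   Returns the completed byte (0..255) when the stop bit arrives, else -1. */
static int ps2_clock_edge(ps2_rx_t *rx, uint8_t data_level) {
    uint8_t n = rx->bit_count++;
    if (n == 0) {                          /* start bit, must be LOW */
        rx->data = 0;
        if (data_level) rx->bit_count = 0; /* not a start bit: stay idle */
        return -1;
    }
    if (n <= 8) {                          /* data bits 0..7, LSB first */
        if (data_level) rx->data |= (uint8_t)(1u << (n - 1));
        return -1;
    }
    if (n == 9) return -1;                 /* parity bit: not checked here */
    rx->bit_count = 0;                     /* stop bit: frame complete */
    return rx->data;
}
```

On an Arduino this function would typically be called from an interrupt handler registered with `attachInterrupt(..., FALLING)` on the clock pin, with the data pin read inside the handler.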

PS/2 keyboards use an interesting protocol for communicating key presses/releases. When a key is pressed, the keyboard sends a scan code corresponding to the key press. When a key is released, the keyboard first sends the byte $F0 and then the scan code value. It’s up to the host to track modifier keys, etc. For extra fun, the scan codes themselves are mostly random, being a product of the key matrix.

Default PS/2 Keyboard Scan Codes.

The only realistic way of mapping PS/2 scan codes to Apple keyboard codes is through the use of a lookup table. The Apple II lacks a key up command, and modifiers are processed within the keyboard hardware, which makes the host side much simpler. The Apple II acknowledges key reads by clearing the uppermost bit of the keyboard register. In the keyboard handling code for my project, you can modify the scan code lookup table for easy ASCII decoding.
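To make the mapping concrete, here is a small illustrative sketch; it is not the project's keyboard handling code. It handles the $F0 release prefix, looks up a few PS/2 scan code set 2 make codes, and stores the result with the top bit set to signal a waiting key. The table is deliberately tiny, modifier keys are ignored, and all names are assumptions for the example.

```c
#include <stdint.h>

#define PS2_BREAK 0xF0 /* prefix sent before the scan code of a released key */

typedef struct { uint8_t scan; uint8_t ascii; } keymap_t;

/* A few scan code set 2 make codes; a real table would cover every key. */
static const keymap_t keymap[] = {
    {0x1C, 'A'}, {0x32, 'B'}, {0x21, 'C'}, {0x29, ' '}, {0x5A, '\r'},
};

static uint8_t keyboard_register; /* emulated keyboard data register */
static uint8_t break_pending;     /* set after $F0 until the next byte */

/* Feed one byte received from the keyboard. */
static void ps2_scancode(uint8_t code) {
    if (code == PS2_BREAK) { break_pending = 1; return; }
    if (break_pending)     { break_pending = 0; return; } /* ignore releases */
    for (unsigned i = 0; i < sizeof keymap / sizeof keymap[0]; i++) {
        if (keymap[i].scan == code) {
            /* top bit set means "key waiting" */
            keyboard_register = (uint8_t)(keymap[i].ascii | 0x80);
            return;
        }
    }
}

/* What the emulated 6502 sees when it reads the keyboard register. */
static uint8_t keyboard_read(void) { return keyboard_register; }

/* Acknowledge the key by clearing the uppermost bit. */
static void keyboard_strobe_clear(void) { keyboard_register &= 0x7F; }
```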

For nonvolatile storage the Apple II originally shipped with a cassette interface. The idea was that you would plug the cassette interface into the earphone and recording jacks of a standard cassette player. Data was stored using a simple Frequency Shift Keying scheme at approximately 1500 baud.

Captured Audio Demonstrating Apple II FSK Modulation.

The idle state of the cassette interface was a 770 Hz square wave. Data began with a 200 µs sync signal followed by a series of bits, each either one full cycle of a 1 kHz square wave (HIGH) or one full cycle of a 2 kHz square wave (LOW). It was the responsibility of the host to keep track of the number of bytes shifted in and terminate the read when appropriate. For most encoding schemes the Apple II saved two records of data to tape. The first record contained a two byte data length indicator and the second record included the data itself.
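The bit encoding can be sketched by turning each byte into a list of half-cycle durations: a 1 kHz cycle is two 500 µs halves and a 2 kHz cycle is two 250 µs halves. This is a sketch under assumptions, not the project's tape code: the function name is hypothetical, and sending the most significant bit first is an assumption that should be checked against the monitor ROM.

```c
#include <stddef.h>
#include <stdint.h>

#define HALF_CYCLE_ONE_US  500 /* 1 kHz: 1000 us per full cycle */
#define HALF_CYCLE_ZERO_US 250 /* 2 kHz:  500 us per full cycle */

/* Fill out[] with the 16 half-cycle durations (in microseconds) for one
   byte. Bit order is MSB first, which is an assumption of this sketch.
   Returns the number of entries written. */
static size_t tape_encode_byte(uint8_t value, uint16_t out[16]) {
    size_t n = 0;
    for (int bit = 7; bit >= 0; bit--) {
        uint16_t half = ((value >> bit) & 1) ? HALF_CYCLE_ONE_US
                                             : HALF_CYCLE_ZERO_US;
        out[n++] = half; /* first half of the cycle  */
        out[n++] = half; /* second half of the cycle */
    }
    return n;
}
```

A writer would toggle the output once per entry and wait for the listed duration. With these timings a byte takes between 4 ms and 8 ms depending on its contents, so the effective bit rate varies between 1000 and 2000 bits per second.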

Tape data is detected using an incredibly simple circuit utilising a single 741 op amp. The headphone jack of the cassette player is passed through an inverting zero crossing detector with approximately 100 mV of hysteresis.

The circuit acts as a sort of comparator: when the input signal is less than -100 mV the op-amp’s output is driven high, and when the input signal is greater than 100 mV the output is driven low (-4 V). R29 limits the maximum output current of the op-amp and the input clamping diode clamps the signal to approximately TTL levels (0 - 4 V).

Effect Of A Zero Crossing Detector On A Sinusoidal Input Signal.

The output of the zero crossing detector is made available as a software register. By using a carefully timed loop and looking for the pin toggling one can detect the incoming frequency and hence extract data. It’s an incredibly elegant approach.
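Once the time between transitions has been measured, recovering the data is a matter of comparing each full cycle against a threshold. The sketch below assumes the cycle periods have already been measured in microseconds, uses a threshold halfway between the 500 µs and 1000 µs periods, and assembles bits MSB first (an assumption). The names are hypothetical and this is not the project's decoder.

```c
#include <stdint.h>

/* Halfway between a 2 kHz cycle (500 us) and a 1 kHz cycle (1000 us). */
#define CYCLE_THRESHOLD_US 750

/* Classify one measured full-cycle period as a 1 (long) or 0 (short). */
static int tape_classify_cycle(uint16_t cycle_us) {
    return cycle_us > CYCLE_THRESHOLD_US ? 1 : 0;
}

/* Assemble one byte from eight measured cycle periods.
   MSB-first ordering is an assumption of this sketch. */
static uint8_t tape_decode_byte(const uint16_t cycles_us[8]) {
    uint8_t value = 0;
    for (int i = 0; i < 8; i++) {
        value = (uint8_t)((value << 1) | tape_classify_cycle(cycles_us[i]));
    }
    return value;
}
```

The wide gap between the two periods is what makes the scheme tolerant of tape speed variation: a cycle can be off by a couple of hundred microseconds and still be classified correctly.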

Schematic Of Flip Flop Based Audio Generators Used In Apple II.

Tape data is also written out using an incredibly simple approach. The Apple II uses a 74LS74 flip flop to generate tape and audio signals. By writing to a register address you can cause the flip flop to change state, essentially toggling its output. By using a carefully timed loop you can toggle the flip flop at the desired frequency and generate an audio signal. R18 and R19 act as a voltage divider to limit the output.

The speaker uses the same approach; however, in order to drive the low impedance load of a speaker, a Darlington transistor was used to provide high current gain.

Early on in the design process I chose not to implement a cycle accurate 6502 CPU. It added extra complexity, and on the AVR any extra speed I could get was a huge bonus. However, the lack of cycle accurate instructions makes keeping tight timing loops for signal generation / decoding impossible.

In order to avoid these complexities I decided to implement the tape encoding/decoding in the native AVR instruction set and implement hooks that would interrupt the 6502’s execution of the system monitor routines. This gave me a huge amount of flexibility in my decoding approach.
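The hooking idea can be sketched as a check on the program counter before each instruction is fetched: if it matches a trapped entry point, a native routine runs and the emulator then behaves as if the 6502 routine had executed an RTS. This is an illustrative sketch only. The entry addresses $FECD (WRITE) and $FEFD (READ) are the commonly documented monitor tape entry points, but treat them, the stub routines and every name here as assumptions rather than the project's code.

```c
#include <stdint.h>

#define HOOK_TAPE_WRITE 0xFECD /* assumed monitor WRITE entry point */
#define HOOK_TAPE_READ  0xFEFD /* assumed monitor READ entry point  */

typedef struct { uint16_t pc; uint8_t sp; } cpu_t;

static uint8_t mem[65536]; /* flat memory, desktop sketch only */
static int native_write_calls, native_read_calls;

/* Stand-ins for the native AVR tape routines. */
static void native_tape_write(void) { native_write_calls++; }
static void native_tape_read(void)  { native_read_calls++; }

/* Emulate RTS: pull the return address from the stack page and add one. */
static void cpu_rts(cpu_t *c) {
    c->sp++;
    uint8_t lo = mem[0x0100 + c->sp];
    c->sp++;
    uint8_t hi = mem[0x0100 + c->sp];
    c->pc = (uint16_t)((lo | (hi << 8)) + 1);
}

/* Call before fetching each opcode. Returns 1 if a hook handled the call. */
static int cpu_check_hooks(cpu_t *c) {
    switch (c->pc) {
    case HOOK_TAPE_WRITE: native_tape_write(); break;
    case HOOK_TAPE_READ:  native_tape_read();  break;
    default: return 0;
    }
    cpu_rts(c); /* return to the caller as the ROM routine would have */
    return 1;
}
```

The ROM code at the hooked addresses is never executed, so its timing loops no longer matter.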

My tape routines generate and decode Apple II formatted cassette data. They use a very similar algorithm to Steve Wozniak’s original approach, but running natively allows me to use the handy delay and time routines. This code could potentially be used for other projects. Instead of using tape, I recorded audio data to my mobile phone.

Analog Input Circuitry For Arduino.

The analog audio interface on the Arduino is equally simple: all that is needed is a 4.7 kΩ resistor and a 10 µF capacitor. In order to increase the input sensitivity of the Arduino’s ADC I configured the Arduino to use its internal 1.1 volt reference. I then enabled the internal pull-up on the ADC pin, which allowed me to use a single biasing resistor and capacitor.

At this point I had everything I needed in order to boot up an emulated Apple II. At first I configured the emulator code to write characters to the serial port. This proved highly successful and paved the way for further experiments.

Performance Measurement

| Device | Instructions | Microseconds |
| --- | --- | --- |
| Real (approx.) | 10,000 | ~30,000 |
| Emulated | 10,000 | 192,000 |

The emulator is approximately 5 - 8x slower than a real MOS 6502 clocked at 1 MHz (192,000 / 30,000 ≈ 6.4 for this measurement).

The ATmega328P on the Arduino Uno comes with 2 kilobytes of RAM, significantly less than the 4 kilobytes of the original Apple II. As established earlier, however, not all of that RAM was available for general use. Through some smart design the Arduino emulator provides 1.5 kilobytes of general purpose memory, leaving nearly 1 kilobyte for BASIC programs. This has proved sufficient to handle fairly complex programs as demonstrated below.

Completed Apple II Emulator Showing The Complete Hardware.

Now that I had completed the hardware for the Apple II it was time to write some software. I needed a demo to run and prove the device was functional. After seeing a recent article on calculating the Mandelbrot set on early mainframe computers I decided I would attempt to replicate the project using integer BASIC.

The algorithm I chose is known as the escape time algorithm. It uses a repeating calculation to count the number of iterations required before the equation begins to diverge. Iteration values above a threshold are considered to belong to the Mandelbrot set and values below are not. It’s brute force but it’s very simple and memory efficient.
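For readers who want to experiment, here is a minimal escape time sketch in C using 16-bit fixed point values. It is not a translation of the Integer BASIC program: the scale factor (6 fractional bits), the iteration cap and the function name are assumptions, and it uses 32-bit intermediates for the multiplications, which Integer BASIC does not have.

```c
#include <stdint.h>

#define FIX_ONE  64 /* assumption: 6 fractional bits, so 1.0 == 64 */
#define MAX_ITER 16 /* assumption: iteration cap */

/* Count iterations of z = z*z + c before |z|^2 exceeds 4.
   cr and ci are the real and imaginary parts of c in fixed point.
   A result equal to MAX_ITER is treated as "inside the set". */
static int mandel_iterations(int16_t cr, int16_t ci) {
    int16_t zr = 0, zi = 0;
    int i;
    for (i = 0; i < MAX_ITER; i++) {
        int32_t zr2 = ((int32_t)zr * zr) / FIX_ONE; /* zr squared */
        int32_t zi2 = ((int32_t)zi * zi) / FIX_ONE; /* zi squared */
        if (zr2 + zi2 > 4 * FIX_ONE) break;         /* escaped */
        int32_t new_zi = (2 * (int32_t)zr * zi) / FIX_ONE + ci;
        zr = (int16_t)(zr2 - zi2 + cr);
        zi = (int16_t)new_zi;
    }
    return i;
}
```

Calling this for each character cell, with `cr` and `ci` stepped across the region of interest, and printing a different character depending on the returned count gives a text-mode rendering of the set.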

Calculating the Mandelbrot set takes a few minutes on the emulated hardware. The end result is fairly neat considering it was done with 16-bit fixed point math on an Arduino Uno.

Displaying The Mandelbrot Fractal.

There are a couple of small bugs and possible improvements I’ve noticed. The keyboard code doesn’t reset the bit counter appropriately, so very occasionally (it is quite rare) it’s possible to get the keyboard out of sync when resetting the machine. I need to implement a timeout. I also think there are some small bugs in the flashing character functionality. I swear I once saw a back to front flashing “R”, and I have absolutely no idea how that happened!

I’d love to add some of the graphics modes functionality but I’ll need more memory for that! Linked below is the source code to the emulator and the video display firmware.