Veronica – ROM Monitor

Booting into a useful state.

A common feature of 1980s computers is that they booted into a useful state. This was one of the main things that separated them from the hobbyist and trainer boards that preceded them. You could plunk an Apple ][ or a TRS-80 on your desk, flip the switch, and it would boot up into a state that allowed you to make it do something. This sounds ridiculously obvious now, but it was a big deal at the time. It was when home computers crossed the threshold from toy to tool.

Veronica needs to cross that threshold now. Her creator is already kind of a tool, but so far Veronica is still a toy. What we need is a ROM monitor. A very basic piece of software that will allow you to do a little something on the machine besides stare at a blinking light. In my case, the significance is even greater. I’ve said from the beginning that I wanted to build a computer from scratch, and that my definition of “computer” was “a machine that you can use to write code for itself, on itself”. At the moment, Veronica is more like an embedded device or a game console. She can only be programmed by tethering another computer to her. Let’s see what we can do about that.

Before we dive into this, I should mention my assembly macros. I’m developing with a ca65-based pipeline, and one of the awesome features is a powerful macro language. That goes a very long way to making assembly easy to write and understand. Listed below is my typical set of macros. They should all be self-explanatory except for a couple.

The first slightly weird one is CALL16. That’s a shortcut for calling functions with a 16-bit argument. I have some standard 6502 zero-page areas specified, and one of those holds arguments for function calls. Implementing a high-level generalized stack-based function calling convention on the 6502 is a pretty complex task, and not something I need for this little bit of ROM code. Instead, I simply pass function parameters in pre-designated places on the zero-page. The code is faster, smaller, and simpler to read this way. It’s more limited, because you don’t have a proper local context for each function call. You need to be aware that you’re stepping on the previous context when calling a new function, so nesting calls must be done carefully. We’re programming without a net at this low level.
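To make the trade-off concrete, here is a toy model of that convention, not the actual macro. The slot addresses `PARAM_LO` and `PARAM_HI` are made up for illustration, and the real zero-page layout may differ. It shows the argument being stored little-endian in fixed slots, and why a nested call steps on the caller's argument unless the caller copies it first.

```python
# Toy model of passing a 16-bit argument through fixed zero-page slots.
# PARAM_LO / PARAM_HI are hypothetical addresses, not Veronica's real layout.
PARAM_LO = 0x10
PARAM_HI = 0x11

ram = bytearray(0x10000)  # 64KB address space


def call16(func, value):
    """Store a 16-bit argument little-endian in zero page, then call func."""
    ram[PARAM_LO] = value & 0xFF
    ram[PARAM_HI] = (value >> 8) & 0xFF
    return func()


def read_param16():
    """Callee side: reassemble the argument from the shared slots."""
    return ram[PARAM_LO] | (ram[PARAM_HI] << 8)


def inner():
    return read_param16()


def outer():
    saved = read_param16()   # copy our argument before nesting a call
    call16(inner, 0x1234)    # the nested call overwrites the shared slots
    return saved, read_param16()


result = call16(outer, 0x4000)
```

After the nested call, the shared slots hold the inner function's argument, so `outer` only still knows its own value because it saved a copy.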

The other slightly odd macros are the GPU commands (static and variable versions). These are convenience routines for passing the two-byte commands through the memory-mapped GPU command register at memory location $EFFF. This is a command byte followed by a parameter byte. Each two-byte command is queued up and processed on the GPU when it has time. This is all documented on the various GPU pages of Veronica’s blog entries.
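As a rough sketch of the idea (a model, not the real macros), the command register can be thought of as a single address that feeds a queue. Only the $EFFF address comes from the description above; the `Bus` class, the function names, and the command codes used in any call are hypothetical.

```python
from collections import deque

GPU_REGISTER = 0xEFFF  # memory-mapped command register, per the text


class Bus:
    """Minimal memory bus where one address feeds a GPU command queue."""

    def __init__(self):
        self.ram = bytearray(0x10000)
        self.gpu_fifo = deque()

    def write(self, addr, value):
        if addr == GPU_REGISTER:
            self.gpu_fifo.append(value & 0xFF)  # queued, not stored in RAM
        else:
            self.ram[addr] = value & 0xFF


def gpu_command(bus, command, param):
    """CPU side: a command byte followed by a parameter byte."""
    bus.write(GPU_REGISTER, command)
    bus.write(GPU_REGISTER, param)


def gpu_drain(bus):
    """GPU side: pull complete (command, param) pairs when it has time."""
    pairs = []
    while len(bus.gpu_fifo) >= 2:
        command = bus.gpu_fifo.popleft()
        param = bus.gpu_fifo.popleft()
        pairs.append((command, param))
    return pairs
```

A "static" version would pass constants for both bytes, while a "variable" version would read the parameter from memory first; both end in the same two writes.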

Okay, with the housekeeping out of the way, let’s get down to business. I already have some input routines that I wrote for my keyboard interface, so we’ll be leveraging that. After checking all the RAM, Veronica will boot up into a loop that takes a command line of input:

Once we have the command line as a null-terminated string in a buffer, we parse the command and match it up to known commands. This is done using a jump-table, which is a very powerful assembly technique to which the 6502 is well-suited.
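For readers who think in higher-level languages, the same dispatch idea can be sketched with a dictionary standing in for the jump table. The single-letter command names and the handler functions here are assumptions for illustration; the monitor's real command spellings are not shown in this text.

```python
def cmd_read(args):
    return ("read", args)


def cmd_write(args):
    return ("write", args)


def cmd_go(args):
    return ("go", args)


# Hypothetical command names mapped to handlers, like a table of addresses.
JUMP_TABLE = {"R": cmd_read, "W": cmd_write, "G": cmd_go}


def dispatch(buffer):
    """Parse a null-terminated command line and jump to the matching handler."""
    text = buffer.split(b"\0", 1)[0].decode("ascii")  # stop at the terminator
    tokens = text.split()
    if not tokens:
        return ("empty", [])
    handler = JUMP_TABLE.get(tokens[0].upper())
    if handler is None:
        return ("unknown", tokens)
    return handler(tokens[1:])
```

Adding a command then means adding one table entry and one handler, which is the same property that makes jump tables attractive in assembly.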

Now to write the commands themselves. The main job of a ROM monitor is to examine, modify, and execute memory. These three things are the minimum requirement to program a computer. My “Read” command takes a starting address and a number of bytes, then displays that memory onscreen in a human-readable format.
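A minimal sketch of what such a Read command does, assuming an output layout of a four-digit address followed by up to eight bytes per row. That layout is a guess for illustration, not necessarily what appears on Veronica's screen.

```python
def read_command(ram, start, count, per_line=8):
    """Format `count` bytes starting at `start` as rows of hex text."""
    lines = []
    for offset in range(0, count, per_line):
        addr = start + offset
        length = min(per_line, count - offset)
        chunk = ram[addr:addr + length]
        lines.append(f"{addr:04X}: " + " ".join(f"{b:02X}" for b in chunk))
    return lines
```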

Here’s a video of booting up into ROM, and using the Read command to look at some memory. At startup, RAM tends to be filled with $42, because that’s a special value used by the RAM diagnostic.

Now we need to be able to modify memory. That’s done with the Write command. I won’t go over all the code for all the commands here, because they’re quite similar. The Write command takes a starting address, then an arbitrary list of space-delimited hex bytes. Here’s a demonstration of examining memory at $4000, then writing some NOPs into that same area. After each modification, the Write command automatically displays the memory you modified so you can see the change.
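In sketch form, the Write command's parsing amounts to the following. This is a model of the behavior described above, with a hypothetical function name, and it takes the arguments already split on spaces.

```python
def write_command(ram, args):
    """Store space-delimited hex bytes starting at a hex address.

    args[0] is the starting address; the remaining tokens are byte values.
    Returns (start, length) so the caller can display the modified range.
    """
    start = int(args[0], 16)
    data = bytes(int(token, 16) for token in args[1:])  # rejects values > FF
    ram[start:start + len(data)] = data
    return start, len(data)
```

For example, `write_command(ram, "4000 EA EA EA".split())` places three NOPs ($EA) at $4000, and the returned range is what would be echoed back to the screen.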

The command to execute code (“Go”) is the last piece of the puzzle, and it’s trivial. Here’s a demonstration that brings it all together. I fire up Veronica’s ROM monitor, enter some code into memory (hand assembled) to display an ASCII character set, then run it. With this act, I have proven that Veronica meets my definition of “computer”. You can use her to write code for herself, entirely untethered to any other devices.

Oh, just one more thing. Since this ROM monitor isn’t very exciting with just those housekeeping commands, and I have those gamepads ready to go, I added this:

Honestly, I doubt there’s more than about $20 in parts in Veronica. At the end of the day, it’s just a handful of common chips and passives. Of course, you have to blissfully ignore the labor to arrive at that number. There’s easily over a thousand hours of work in it by now, if you include all the time spent learning enough to actually do this. I started with nothing but a 6502 and an idea. If I’d known how much I didn’t know, I might never have started. 🙂

How much would the final cost be? Well, it IS true there’s still a few pieces missing: some storage (SD card?) or at the very least some sort of wired connection to a PC as an alternative. But that wouldn’t be MUCH more.

Well, the main benefit of the backplane is modularity, and that has been very valuable for me in learning how to do this. In the future, I could have a chance of developing a single board computer with comparable features all in one go. The backplane has been invaluable for experimenting and mixing and matching elements.

You wouldn’t be doing this in the first place though if the labor (and learning) wasn’t something you enjoyed. 🙂 My current big project has hundreds of hours of labor in it, and I’ve learned so so much in the process that I wouldn’t have learned otherwise. That’s why I keep making up projects that are so far beyond my current abilities. If I knew how to do everything I needed to know for a project up front, it wouldn’t be nearly as much fun. 🙂

Totally. I always try to do something that I think is right at the edge of my abilities. The task is inevitably underestimated, so that pushes me into new territory without seeming so difficult that I get discouraged and give up. It’s a delicate balance though. During the video board development, for example, I definitely got in over my head. I came very close to giving up many times. You just have to step away sometimes, then come back and chip away at it a little more. Eventually, a breakthrough happens.

Yeah, for sure. I’ve been working on my lamp for nearly 2 years. So many times I’ve been frustrated and walked away, sometimes for months.

When I started, I guess I did have a bit of a “how hard can it be” feeling initially, thinking that using 10W LEDs would be just “scaling up” the way I’d done 3W LEDs. That “works”, but 12 of them creates a ridiculous amount of heat, which in a confined space, surrounded by wood… I ended up reading about the temperatures at which various types of wood spontaneously combust.

So I ended up learning about switching power supplies, constant current buck converters, and trying to build one, which was an abject failure. 🙂 But I learned enough about them to be able to pick an open source one someone else made, the Picobuck. At some point, it becomes a matter of deciding whether I really want to focus on learning about switching power supplies for months so I can build my own, or just buy someone else’s so I can focus on other parts of the project.

Yah, there’s a lot of “picking my battles” on Veronica as well. For example, I wanted to support USB keyboards, but didn’t want or care to learn how to build an entire USB host stack from scratch. So I used the PS/2 compatibility mode built into 90% of USB keyboards. Why spend time on that stuff when I’d rather be reverse engineering Nintendo controllers, or other stuff I was interested in. Or when I got stuck with a race condition in my GPU command register, I was very glad to find a dual port SRAM FIFO on a chip that I could stick in there to make the problem go away. It would have taken a crapload of TTL logic to solve the problem, and I knew I could do it, I just wasn’t feeling masochistic enough to bother. 🙂

You know, Quinn, the absolute best parts of this build are the decisions you make and the honesty you display in documenting it all. If I had a young engineer under my wing I would point her here to learn how to ‘think like an engineer’.

Kudos on the ‘Ronny ROM’, yes every computer should play Pong! It’s the post-1980 Hello World!

Nice work as usual! I like the Pong. Reminds me of a Tetris clone that a bunch of friends of mine wrote for the Prime computer back in the day, which also had Pacman built into it: whenever you filled a row of blocks, a little pacman would come and eat them up. And the Pacman clone did something Tetris-y when you finished a level.

On a more serious note, I’d like to point out that it might be useful to generate your hexdump in a format that’s compatible with your commands. I see in the videos that your monitor is more-or-less full screen so this idea might not be compatible with your ideas at all (as I said before, I like that you always find new original ways to solve problems) but if you would ever consider e.g. a serial port for input and output, you could use the monitor to dump your memory and a PC on the serial port to save the hex dump to a file. Then later on, if you want to load that data or program into Veronica again, all you would do is connect that PC again and send the saved hexdump to the ROM monitor.

Many other systems do this, for example the Woz monitor on the Apple 1(*) generates hexdumps in the form “XXXX: AA BB CC DD EE FF GG HH” and the command to enter hexadecimal data into memory is “XXXX: ” followed by the bytes to store. On the PET 2001, the output is even compatible with the built-in editor so you can just dump some memory, move the cursor up on the screen to change some bytes and hit Enter to store the changed data. It’s an old idea (certainly not mine) from the days when printing terminals and paper tape were an easy way to record output and send it back to the input.
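The round trip can be sketched in a few lines, assuming the “XXXX: AA BB …” format quoted above serves as both the dump output and the store command. The function names are hypothetical, and this models the idea rather than any particular monitor.

```python
def dump(ram, start, count, per_line=8):
    """Produce a hexdump whose lines are also valid store commands."""
    lines = []
    for offset in range(0, count, per_line):
        addr = start + offset
        chunk = ram[addr:addr + min(per_line, count - offset)]
        lines.append(f"{addr:04X}: " + " ".join(f"{b:02X}" for b in chunk))
    return "\n".join(lines)


def load(ram, text):
    """Feed a saved hexdump back in, one store command per line."""
    for line in text.splitlines():
        if ":" not in line:
            continue  # ignore anything that is not an address line
        addr_text, data_text = line.split(":", 1)
        addr = int(addr_text, 16)
        data = bytes(int(token, 16) for token in data_text.split())
        ram[addr:addr + len(data)] = data
```

Capturing the output of `dump` on a PC and later replaying it through `load` restores the same bytes, which is the whole trick.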

Of course if you’re going to add an SD card reader (or a disk drive or whatever), it might not be that big of a deal if you can’t save Veronica’s memory through a serial port, and I know you have an EEPROM on board anyway (right?). But I thought I’d mention it anyway…

As always, keep up the good work!

===Jac

(*) I mention the Apple 1 because I got a Replica 1 from Briel Computers the other day so that’s what I’ve been playing with. You might also be interested in Krusader, which is an 8K program for the Replica 1 that lets you enter and edit assembly language programs from the keyboard and assemble them to machine code, without the need to cross-assemble the code on a PC. The program is not mine but I think it’s open source and probably not too difficult to port to Veronica. Google for Replica 1 Krusader for more info…

It has schematics for the whole computer. If you look at the one for the video output, you see that the video driver has RGB output as well as H sync and V sync. It feeds into the RF driver to send to the TV. I’m guessing that if you take the output of the video chip and run the V sync and H sync through frequency multipliers, it would be VGA.

Unfortunately, it isn’t quite that simple. If you run the H sync through a multiplier to double it, you get an H sync that happens in the middle of the otherwise normal video line. At best, this breaks your video line into two pieces; at worst, there’s no proper ‘front porch’ and so a lot of the video signal for the ‘second half’ of the line is lost before the monitor is ready to display it. You’ll have to redesign the signal profile of the video line, and unless you also double the rate of the dot clock, you will effectively halve your horizontal resolution.

Yes, you’ll have doubled the number of lines in a frame by doing this — but you’ll have added another V sync in the middle of the frame as well, which puts you back to the same original number of lines per frame; now they’re just happening twice as fast.

The way the Commodore 128’s 8564 VIC-II did things was by taking advantage of the fact that a 30fps 525-line interlaced NTSC video signal is really equivalent to a 60fps 262.5-line progressive video signal that is offset vertically by half a scan line every other frame. It actually outputs 60fps at 262 lines. The lack of a half-line offset makes the scan lines line up from frame to frame, and accounts for the fact that on a high-quality CRT monitor with good focus, you could see black between the scan lines.

So, the easiest way to make the 128’s VIC-II signal into something that might be able to sync up on a VGA monitor would be to use a line-doubler, which would accept a full line of video, then spit it back out twice, at twice the speed.
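As a rough illustration of the line-doubler idea, using the 262-line, 60fps figures from this comment, the sketch below buffers each scan line and emits it twice, and computes the resulting line rate. The function names are made up, and real hardware would do this with a line buffer clocked out at twice the pixel rate.

```python
def line_double(frame):
    """Emit every buffered scan line twice, in order."""
    doubled = []
    for line in frame:
        doubled.append(list(line))  # first copy
        doubled.append(list(line))  # second copy, sent in the same time slot
    return doubled


def doubled_line_rate_hz(lines_per_frame, frames_per_second):
    """Lines per second after doubling (nominal figures, not exact NTSC)."""
    return 2 * lines_per_frame * frames_per_second


rate = doubled_line_rate_hz(262, 60)  # 31440 lines per second
```

That 31,440 Hz figure lands in the neighborhood of the roughly 31.5 kHz horizontal rate of standard VGA, which is why a monitor might be able to sync to it.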

Thanks for that. You saved me from making a horrible mistake on my own project. Even though I’m a very long way away from that point.

Back on topic, though. Can a ROM monitor be programmed to use a disk drive, or would you have to do something like the C64 that has a kernel ROM and a basic ROM on board with a DOS on a ROM in the disk drive?

Sure! The ROM is just a permanent code/data storage medium that is jumped into automatically by the CPU. That code can do anything you want.

Many ’80s computers did this. One of the key improvements in the Apple ][+ (over the regular Apple ][) was the new Autostart ROM. On startup, it scanned all the slots looking for a disk controller, and if it found one, it started the bootstrapping process from disk. This was an awesome feature at the time, which made the machine very easy to use compared to competitors, which required cryptic command sequences to launch software. That simple bit of code meant you could put in a disk, turn on the machine, and everything would just “go”.

John, I have an entire X3.131-1994 SCSI-2 driver in ROM in my POC V1.1 single-board computer’s firmware, with support for hard disks, CD/DVD-ROMs and tape (seven devices total). There is no “DOS,” since that implies that a filesystem is present somewhere. The firmware does have the ability to load a boot block from SCSI device $00 and execute master boot code. The M/L monitor can issue low-level commands to SCSI devices, such as read or write data, format a disk, etc.

The BIOS in a modern PC doesn’t know anything about filesystems, and only knows enough to read a boot block from a device and execute the code therein. So lack of direct filesystem support in ROM is not an omission, but an intentional characteristic.

As for attaching mass storage to Veronica, from what I gleaned in following this project as it developed, it’s certainly possible, assuming she has enough space in her ROM for the necessary drivers. Attaching an IDE-like device would not be trivial though. The code needed to drive an IDE device is complicated by the fact that the IDE interface is 16 bit, requiring double buffering in the host machine due to the 6502’s inability to load and store words.

When I was mulling over the attachment of mass storage to my POC unit, I decided to go with SCSI because I would not have to double buffer the I/O path, and because SCSI is so much more intelligent than IDE. Once I leaped the hurdle of learning to talk to the SCSI controller ASIC (an NCR 53C94), I was able to develop compact, interrupt-driven code that made SCSI I/O fairly simple for a calling program. I doubt that I could have done so with IDE, which basically has no intelligence.

Wonderful! You know there IS a nice open source tiny basic for the 6502 that you could put in rom, though you might have to relocate it. Google for Tom Pittman’s Tiny BASIC (but you already knew that).
IIRC he org’ed it to run out of low memory, but you could store it in rom at high memory and then block copy it to low memory to make it run.
Another nice addition to your rom monitor would be a simple line by line mini assembler. A single pass assembler can only code ‘backward’ branches, so to jump or call ahead you have to supply the address, but at least the assembler saves you from having to commit all the opcode hex values to memory.
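A line-by-line mini assembler of that kind might look roughly like this sketch. It handles only a handful of 6502 instructions, takes explicit hex addresses rather than labels, and computes the relative offset for branches. The function and table names are hypothetical.

```python
IMPLIED = {"NOP": 0xEA, "RTS": 0x60, "INX": 0xE8, "DEX": 0xCA}
BRANCH = {"BNE": 0xD0, "BEQ": 0xF0}


def assemble_line(pc, text):
    """Assemble one instruction located at address `pc` into bytes."""
    parts = text.split()
    mnemonic = parts[0].upper()
    if mnemonic in IMPLIED:
        return bytes([IMPLIED[mnemonic]])
    operand = parts[1]
    if mnemonic == "LDA" and operand.startswith("#$"):
        return bytes([0xA9, int(operand[2:], 16)])       # immediate mode
    if mnemonic == "JMP" and operand.startswith("$"):
        target = int(operand[1:], 16)
        return bytes([0x4C, target & 0xFF, target >> 8])  # little-endian
    if mnemonic in BRANCH and operand.startswith("$"):
        target = int(operand[1:], 16)
        offset = target - (pc + 2)  # relative to the next instruction
        if not -128 <= offset <= 127:
            raise ValueError("branch target out of range")
        return bytes([BRANCH[mnemonic], offset & 0xFF])
    raise ValueError("unsupported instruction: " + text)
```

Because every operand is a literal address, a forward branch works as long as you already know where the target will land, which matches the limitation described above.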

I love how zero page can be used as a sort of pseudo-register bank. Even the addressing modes for zero page treat them almost as though they were registers. Useful considering the 6502 only has the three actual ones.

Indeed, that’s basically exactly what it is- a memory addressable register set. It’s obscenely clever, and a model still used by a lot of modern microcontrollers.

If the 6502 has one big weakness, it’s that the instruction set is not well suited to modern language calling conventions. Setting up a system to call functions with local contexts all managed on the stack is a lot of work compared to other CPUs. The 68000, for example, has a whole suite of instructions specifically for dealing with high-level language constructs. Of course, that’s a much more modern CPU, and both are likely a product of the time they were developed in, not to mention cost considerations.

If the 6502 has one big weakness, it’s that the instruction set is not well suited to modern language calling conventions. Setting up a system to call functions with local contexts all managed on the stack is a lot of work compared to other CPUs.

This is where the 65C816 has a significant advantage over the 65(c)02. There are instructions for generating stack frames without clobbering the registers (PEA, PEI and PER), and other instructions that use the (16-bit) stack pointer as the base address for indexing into stack frames (e.g., LDA $01,S). The stack pointer can be directly copied to and from the .C accumulator, which is real handy for doing stack shuffles in conjunction with the MVP block copy instruction. All of these together make an optimizing C compiler a practical application on the ‘816.

I make extensive use of these capabilities in the firmware for my POC computer. Indeed, if I had had to stick to 65(c)02 code I could not have built in the level of functionality that I achieved, simply because the code would not have fit into the 8KB ROM address space.

Indeed, the 65C816 is a really under-appreciated chip. I got to know it when I was learning to program my Apple iigs, which remains probably my favorite computer ever.

If I had to pick a favorite CPU in isolation though, I’d probably nominate the 68000. Programming that in assembly is so smooth and easy. There’s pretty much exactly an instruction for everything you want to do. I particularly liked the 68040, which had a quad-word memory move instruction. Many years ago, I wrote a positively beastly sprite blitter with that on an ‘040-based Mac. It was stonking fast, but only ran on a tiny subset of computers that nobody bought. 😀

I actually considered designing and building an SBC powered by a 68K MPU. Two things at the time that put me off were the cost of the 68Ks that had the MMU built in, and the MPU’s terrible interrupt latency. The cost has greatly decreased over the years, but not the interrupt latency. 🙂

I was considering building a SBC around the 65C816. They’re still available as new production today in both dip and smt packages. The only problem is that to make use of the chip’s full potential with expanded memory addressing almost requires custom bus logic implemented in FPLA or similarly exotic parts. I guess you could build it using discrete high speed advanced cmos logic, but you’d probably have to spring for high speed static rams (like 55ns or better) if you wanted to run the cpu at its fastest clock rates. I tried to figure out the timing required for the logic, but the data sheets on the chip are not that clear to me. I don’t think the Apple ][GS took full advantage of the CPU’s extended addressing abilities.

Now if you ever decide on doing a Veronica ][ spin with that chip, I’ll surely pay attention!

I’m not certain if the iigs did leverage all of the 65C816’s tricks, but it did support up to 1.25MB of RAM initially, and several exotic addressing tricks like shadowing writes into multiple banks (including video memory), stack relocation and more. Later hackers extended the address space to 8MB. It was a remarkable machine, sadly intentionally crippled on clock speed so as not to compete too much with the less capable Macs of the day. A GS with an accelerator card like a Transwarp or Zip Chip can perform some astonishing things for its day.

I was considering building a SBC around the 65C816…to make use of the chip’s full potential with expanded memory addressing almost requires custom bus logic implemented in FPLA or similarly exotic parts…

Not necessarily. Several years ago, I did an extensive timing analysis of the bank latching logic to determine if it could be done without use of programmable logic and concluded that it could be. 74ABT devices have single digit nanosecond propagation times, producing a level of performance that would be commensurate with the timing requirements of a 65C816 running at 20 MHz. For example, a 74ABT573 running on 5 volts is fast enough at 20 MHz to latch and propagate the A16-A23 address component prior to the rise of Ø2. A 74ABT245 bus transceiver would have no problem at that clock rate in changing directions according to what the data bus is doing at any given instance. On the other hand, a 74AC138 would be struggling a bit due to its worst-case 10ns prop time but would manage to stay with the MPU if its chip enables were asserted soon enough during Ø2 low.

Your real concern with the use of discrete logic would be the effect of cumulative prop time. Addressing RAM beyond 64KB will not only require the capture and propagation of A16-A23 but possibly the management of multiple chip selects if more than one SRAM is involved. Although selecting the appropriate RAM could be done with a 74AC138 (or 74F138, which is slightly faster, although more power-hungry), I don’t think it’s practical beyond 14 MHz. It all has to happen before Ø2 goes high, which gives your logic a 35ns window in which to perform its magic. When the response time of the addressed device (RAM, ROM or I/O, the latter two which would have to be wait-stated) is added to glue logic prop time there’s just too much involved for reliable high speed operation to be achieved.
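The 35ns figure falls out of simple arithmetic, sketched below under the assumption of a 50% duty cycle clock. The function names and the example delay values are placeholders for illustration, not measurements from any particular part.

```python
def phi2_low_window_ns(clock_mhz):
    """Duration of the low half of the clock, assuming a 50% duty cycle."""
    period_ns = 1000.0 / clock_mhz
    return period_ns / 2.0


def slack_ns(clock_mhz, delays_ns):
    """Time left in the low phase after cumulative propagation delays."""
    return phi2_low_window_ns(clock_mhz) - sum(delays_ns)


window_14 = phi2_low_window_ns(14)   # about 35.7 ns
window_20 = phi2_low_window_ns(20)   # 25.0 ns
example = slack_ns(14, [10.0, 10.0])  # two hypothetical 10 ns stages in series
```

Each device in the chain eats into the same window, so the slack shrinks quickly as stages are added or the clock is raised.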

A high speed 65C816 system’s glue logic is best implemented in a programmable logic device. Doing so not only addresses timing concerns, it results in improved printed circuit board layout density, which is important as the clock speed is ramped up. Currently available PLDs can manage pin-to-pin prop times of 10ns or faster, and use a lot less PCB real estate than the equivalent discrete gates. My next generation 65C816 project (POC V2) will use a CPLD for glue logic, with only a few discrete devices in parts of the circuit where speed is not a concern.

…but you’d probably have to spring for high speed static rams (like 55ns or better) if you wanted to run the cpu at its fastest clock rates.

My POC V2 unit will have 512KB in a single Cypress CY7C1049D static RAM. The CY7C1049D is available in 10ns and 8ns speed grades, and is in an SOIC-36 package (50 mil pin spacing). Garth Wilson makes a plug-in DIMM that uses eight of these SRAMs, producing 4MB that is addressed with A0-A18 and 8 chip selects. That device will find its way into a future ‘816 project I have in mind.
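The address arithmetic behind those numbers can be sketched as follows. It assumes the chip select is simply taken from the address bits above A18, which is a guess for illustration; the actual DIMM may assign its selects differently.

```python
CHIP_ADDRESS_BITS = 19                 # A0-A18
CHIP_SIZE = 1 << CHIP_ADDRESS_BITS     # 512KB per SRAM
CHIP_COUNT = 8
TOTAL_SIZE = CHIP_SIZE * CHIP_COUNT    # 4MB


def decode(addr):
    """Split a linear address into (chip select index, offset within chip)."""
    if not 0 <= addr < TOTAL_SIZE:
        raise ValueError("address outside the 4MB array")
    return addr >> CHIP_ADDRESS_BITS, addr & (CHIP_SIZE - 1)
```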

I tried to figure out the timing required for the logic, but the data sheets on the chip are not that clear to me.

Unfortunately, the timing diagram and specs in the 65C816 data sheet leave something to be desired in clarity and accuracy. A number of us at 6502.org have long suspected that some of the numbers are bogus, as if they were true, the CMD SuperCPU card for the Commodore 64 would not have been able to function (it ran at 20 MHz). An outgrowth of that discussion is that one of our members went to the trouble to create an annotated and properly scaled timing diagram for the ‘816 when running at its maximum “official” speed of 14 MHz (see http://forum.6502.org/download/file.php?id=365&mode=view). It may be of help to you in understanding what the ‘816 is doing, especially during Ø2 low, which is when the bank address (A16-A23) must be captured and propagated by glue logic.

I don’t think the Apple ][GS took full advantage of the CPU’s extended addressing abilities.

The timing diagram you linked to does answer some questions, but I think I’d need to see a logic diagram or equations to answer the issue of how the processor signals are translated into memory access signals. In some cases it looks like delay lines are required to tap off the PH2 and E signals.

As for clock speeds, I’d be content with a lower clock speed than the max to make hw debugging easier, perhaps selecting a clock rate that would also either sync with video signal generation, or baud rate generation. Crystals at such frequencies are probably easy to find as well.

The timing diagram you linked to does answer some questions, but I think I’d need to see a logic diagram or equations to answer the issue of how the processor signals are translated into memory access signals. In some cases it looks like delay lines are required to tap off the PH2 and E signals.

I could publish logic equations (they aren’t all that complicated) but doing that here would be usurping Quinn’s space and is not courteous. Page 45 in the current ‘816 data sheet illustrates a method for capturing A16-A23 from the data bus. As shown, there is a potential error in the circuit, in that it doesn’t account for the state of the VDA and VPA signals (if the expression VDA || VPA is false then the address bus is invalid).

As for Ø2, no delaying is needed or should be present. All timing is based on the rise and fall of Ø2. The ‘816’s internal actions lead or lag Ø2 in a way that obviates the need for clock stretching or contracting. The key to a successful design is in the use of really fast glue logic. A CPLD or FPGA is best at speeds above 14 MHz. My POC unit, which will POST at 14 MHz, uses 74AC logic. Also helping is a strong clock source with a rise and fall time below 5ns. I achieve the latter by running the output of a can oscillator through a 74ABT74 flop.

The 65C816’s E output is mostly unused in present-day implementations. My understanding is that it was a feature requested by Apple when the ][gs was on the drawing board, so hardware and software could determine in which mode the ‘816 was running at any given instance. I don’t use E for anything, haven’t found a good use for it, and am not aware of any other current ‘816 machine that does look at E.

As for clock speeds, I’d be content with a lower clock speed than the max to make hw debugging easier, perhaps selecting a clock rate that would also either sync with video signal generation, or baud rate generation. Crystals at such frequencies are probably easy to find as well.

I’ve found from experience that it is best to keep the Ø2 clock source separate from other clock sources. Accurate baud rate generation, in particular, involves “weird” frequencies, such as 3.6864 MHz or 1.8432 MHz (used with the 65C51 UART). With can oscillators being as cheap as they are, there’s no compelling reason to bind the Ø2 clock to a baud rate or video dot clock. That’s really old school that made sense in the days of the Commodore 64, but not now.
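The reason those frequencies look “weird” is that they divide evenly into standard baud rates. A quick check, assuming a UART that clocks at 16 times the baud rate (a common arrangement, but an assumption here rather than a statement about any specific part):

```python
def uart_divisor(crystal_hz, baud, oversample=16):
    """Divider needed to derive `baud` from the crystal frequency."""
    return crystal_hz / (oversample * baud)


exact = uart_divisor(1_843_200, 9600)      # 12.0, a whole number
also_exact = uart_divisor(3_686_400, 9600)  # 24.0
awkward = uart_divisor(14_000_000, 9600)    # about 91.15, not a whole number
```

A CPU clock chosen for speed rarely divides this cleanly, which is the practical argument for giving the UART its own oscillator.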

How much more (or less?) work would building Veronica with a 68k chip have been, compared to the 6502? Are 68k chips even available in DIP?
I don’t know much about the 68k series, so forgive the naivety of the question. 🙂

The original 68000 is available in a DIP, though I don’t think the later ones are. Mind you, it would be easy enough to grab a breakout board with a socket for whatever form factor the chip comes in (sparkfun, adafruit, etc).

Cost-wise, I don’t think it would be much different. These old chips probably all cost about the same (ie. basically nothing) nowadays. There are a few exceptions, such as the C-64 SID chip which is now really valuable due to the popularity of 8-bit music. To the point where people are buying up C-64s for a couple of bucks, stripping the SID out, and selling it on eBay for $40. A real shame.

The main issue with a 68k build is probably size. The 24-bit address bus and 16-bit data bus would be a bit of a pain, depending on your layout. Your board sizes and complexity will be higher, and your memory map could get a lot more complicated. These are the main reasons I didn’t use one in Veronica (this being my first ever home-brew computer build, after all).

Interestingly, Motorola themselves addressed this “problem” with the 68008. It’s a 68000 core with smaller busses, to make it easy to build into smaller computer designs. Internally, the chip automatically splits up the address and data requests and sends them out in pieces over the smaller bus. In other words, you get all the benefits of 32-bit registers, large word math operations, etc, with a small bus. The tradeoff is of course half the speed in memory operations. Still, a great compromise, and one I would consider for an SBC build.
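The speed trade-off can be sketched by counting bus cycles for the same read on different bus widths. This is a simplified model with a hypothetical function name; it ignores alignment and instruction fetches and only illustrates why a narrower bus takes more cycles for the same data.

```python
def read_value(memory, addr, size_bytes, bus_bytes):
    """Read a big-endian value and count the bus cycles it takes.

    `bus_bytes` is the data bus width in bytes (2 for a 16-bit bus,
    1 for an 8-bit bus). Returns (value, cycles).
    """
    value = 0
    cycles = 0
    for offset in range(0, size_bytes, bus_bytes):
        piece = memory[addr + offset:addr + offset + bus_bytes]
        for byte in piece:
            value = (value << 8) | byte  # 68k family is big-endian
        cycles += 1                      # one bus cycle per transfer
    return value, cycles
```

With `memory = bytes([0x12, 0x34, 0x56, 0x78])`, a 32-bit read takes two cycles on a 16-bit bus and four on an 8-bit bus, while the program sees the same value either way.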

I’ve got a single sample of Digital Equipment’s single chip PDP-11. It’s in a 40 pin DIP and runs the basic PDP-11 instruction set. The chip can be set up to operate in 16 or 8 bit data bus modes, and will handle dynamic ram refresh as well (Though with the density, speed, and price of sram today, that feature is not worth anything anymore.)

I’ve been hankering with the idea of building a PDP-11 with that chip, and maybe making use of an AVR chip to emulate the front panel of the machine. I’d love to build a PDP-11/40 look-a-like! I used to work for DEC and I still have a soft spot in my heart for those machines!

Yes, following that one very closely. I’ve been seriously considering a 68k SBC myself, so it’s great to see someone else do it. It’s looking really nice so far. I’m also excited to see they’re using one of the Yamaha video chips that I’ve been itching to try. I’ll be interested to see how they interface the linear RGB output to a modern display. They’ll need to generate porch signals or some other kind of framing to connect a modern VGA or composite display. Computers of the period that used RGB outputs (such as IIgs, since we’re on the subject) used custom monitors that are nearly unobtainium now. That’s one thing that made me shy away a bit from the Yamaha chips (aside from the difficulty of obtaining them).
