An Umbrella Site for Michael Martin's software experiments

This blog is predominantly about programming old computer hardware, with a focus on using modern tools but as little as possible between our logic and the underlying system. That’s not always talking directly to the hardware—usually we’ll make heavy use of BIOSes or onboard ROM routines—but it is to use the simplest constructs that we can to meet our needs.

It’s time to apply this philosophy to a more modern system. I had some practical problems to solve when I began investigating this, but in the end this has turned into another Bumbershoot Software project where I try to do as much as possible while relying on as little as possible and with as few compromises as possible.

Background

Apple has used fairly sophisticated metadata formats for their executables throughout their history. Classic Mac OS bundled applications into carefully formatted resource forks, and both OS X (now macOS) and iOS use precisely-structured directories to represent much of any given application. In addition to the executable binary, there are separate files representing graphical resources and usually some kind of declarative representation of much or even all of the user interface. These technologies have changed over the past 20 years but for the most part the core logic has remained unchanged.

My goal for this article and its followups is to strip away as much of the automatic frameworks in macOS and iOS applications as I can, so that one might treat developing for the Apple ecosystem in a way consistent with other operating systems.

Simplifying iOS

In some senses, our options on iOS are limited. Apple very aggressively polices the applications in its App Store and will prune anything that does not conform to its latest deprecation levels. That said, we still have a lot of leeway. Most of the newer technologies that Apple has introduced over the life of the iPhone do not replace earlier APIs so much as wrap them. That means we don’t need to significantly change the APIs we use, because those are stable and generally available across all versions of iOS that the App Store would let us support.

The main intrusion Xcode makes into a notionally empty iOS project is two storyboards. Storyboards are an extension of Apple’s earlier Interface Builder system (“NIB” and “XIB”) that declaratively encode not only the UI on the various screens of an application but also the transitions between them (segues). The underlying UI elements are unchanged: UIViewController for individual screens, and UIView for the widgets within them. One of these storyboards is called LaunchScreen.storyboard, and it controls the interim display while iOS itself is loading your application and preparing it to run. This is not really part of our logic, and this one doesn’t need to be stripped out. The other, however, is Main.storyboard, which is supposed to represent the entire workflow throughout the whole system, and this rapidly becomes an unwieldy nightmare. The article I linked there suggests breaking this down so that each storyboard is a single view controller, but why bother with this when iOS still permits working directly with View Controllers and Views?

Well, OK, there is an answer to that, which is that the level of single view controllers is a little inconvenient for doing visual design of a user interface. But for the purposes of this project, we’d like to remove all of the declarative UI and do everything (except the launch screen, which predates us starting anyway) programmatically.

Removing the main storyboard is easy: just delete it from the project and remove the UIMainStoryboardFile key from your project’s Info.plist. Of course, that isn’t quite enough as it is; the main storyboard was responsible for actually setting up the app’s basic display and its first screen of UI. We’ll have to do that part by hand. In our application delegate, we’ll need to extend the application:didFinishLaunchingWithOptions: callback with some code to set up the window and put the root view controller into place:

In the view controller’s loadView method, assign a new instance of your core view to the self.view property.

In the view controller’s viewDidLoad method, set up any extra or cross-view gesture recognizers and actually activate the layout constraints. (If you aren’t working as close to the API as possible, the Masonry library provides some much-needed abstraction over AutoLayout’s systems of linear equations.)

Auto Layout was introduced in iOS 6, and any actually releasable iPhone app will thus at this point be guaranteed to support it. The auto-resizing mask is at this point just an embarrassing legacy of early OS X, and disabling it is just a bit of omnipresent boilerplate. With this, we’ve stripped away pretty much all the tooling around the raw code, and the UI will be built programmatically by the application as it runs.

On iOS, though, this doesn’t really buy you a whole lot beyond the ability to more easily use tools like Masonry and noticeably fewer merge conflicts thanks to a lack of storyboards. There’s not a whole lot of reason to actually do this on iOS.

Doing it on macOS is a little more exciting.

From iOS to macOS

Apple’s research and development of its user interface APIs is very clearly being spearheaded by iOS. Macs have gotten innovations from iOS in waves; the view controller paradigm supplemented OS X 10.0’s window controllers in 10.5 Leopard, Auto Layout was introduced in 10.7 Lion, and 10.10 Yosemite knitted these together more tightly. It’s a perennial fear amongst Mac users that Apple is planning to reduce the Macintosh to an iPad with a keyboard, but these API imports really just make development on the Mac more straightforward. The two main differences when writing Cocoa for Mac instead of iOS are that the classes and constants all start with NS instead of UI and that there is an extra NSWindow and NSWindowController above the view controller level.

Oh, and storyboards aren’t used by default, because those are tied pretty strongly to a single-screen display that evolves over time as opposed to the desktop metaphor. Mac development continues to use the old “Interface Builder” APIs and represents the interfaces so designed as .nib files in the app bundle, or the equivalent XML-based .xib files when part of an Xcode project. (Storyboards are supported in macOS as of 10.10, but there’s very little call for them in a desktop application.)

That aside, the convergence between macOS and iOS is strong enough that the procedure for stripping down a macOS application is very similar to the iOS one. Put all your UI-building code in NSViewControllers and NSViews, set up your NSApplicationDelegate so that it builds your window, and you’re almost home free. There are two complications.

Building a Window From Scratch

The window’s lifecycle is split up a bit more in macOS.

Create the window with [NSWindow windowWithContentViewController:] in the application delegate’s init method.

In the delegate’s applicationWillFinishLaunching: handler, set your window’s contentView property to the window’s contentViewController.view.

In the delegate’s applicationDidFinishLaunching: handler, call [NSApp activateIgnoringOtherApps:YES] and then invoke makeKeyAndOrderFront: on your main window with the application delegate as the parameter.

Note that NSWindows did not get contentViewControllers until 10.10. If you want your application to run on older versions of macOS, you’ll need to create the window with the initWithContentRect:styleMask:backing:defer: initializer, and create your main NSView just before assigning it to the window’s content view.

Dealing With the Menu Bar

Even if we wipe out all the windows from our Interface Builder files, we can’t actually get rid of the last one by default. The project’s Info.plist will list an NSMainNibFile which specifies the initial state of the menu bar, and NSApplicationMain seems to insist on it being there. To get rid of that dependency, we need to replace NSApplicationMain and then build a menu bar ourselves at an appropriate time.

We will also need to set the “activation policy” that tells the rest of the OS how to deal with our windows, and in particular, that it should treat them as ordinary GUI application windows and not popunders or system alerts or whatever. We do this by adding this line to the applicationWillFinishLaunching: handler:

[NSApp setActivationPolicy:NSApplicationActivationPolicyRegular];

This handler is also the correct place to build our main menu. The most basic possible menu just has an application menu with a single option to quit out. Some straightforward code for building that looks like this:

(If you aren’t familiar with Objective-C, lines like [foo bar] are invoking the method bar on the object foo, and method arguments are introduced by colons. And yes, the default indentation style for a method call that spans multiple lines like this is to make the colons line up, leaving ragged edges on both left and right. Don’t look at me, I didn’t do it.)

With this in place we can now remove our reference to a “main NIB file” from our Info.plist. But we can do more than that.

The Punchline

Once we’ve stripped out MainMenu.nib and declined to use any other bundle resources, we have not merely removed files from our app bundle. We have removed the need for an app bundle at all. A Cocoa application that has been stripped down in this way can be run directly from the Terminal, and it will behave almost exactly like an X11 app does in a Unix terminal. The terminal’s control codes can be used to suspend, resume, or terminate the graphical application, and the application can output data to the Terminal’s standard output and error, which may be redirected as usual. Meanwhile, the application itself lives on the Dock and gets its own menu bar to switch to, and otherwise functions just like an ordinary application, albeit one that is using a Terminal-derived icon for its window.

That’s a fun trick to have, and that actually solves the problem I originally had (which was that I needed to pop up a dialog box inside a shell script, and AppleScript was not behaving as cleanly as I’d have liked it to). But this is all just dry theory and snippets, so far. Next time I will go through a worked example, and then we will see how many modern APIs we can abandon without having to compromise anything. Programming and API conveniences are nice, after all, but being able to actually run on older machines is nicer still.

References and Acknowledgements

Stack Overflow is full of people trying to do various parts of the techniques that I’ve presented here, though most of the answers may be summarized as “for the love of God, don’t do this thing.” Matt Gallagher’s Minimalist Cocoa Programming article from 2010 provided the most solid base for the rest of my development here, and where the procedures I have presented differ from his, they are to account for changes in the default macOS programming API, or to account for places where the minimalist approach doesn’t produce a fully well-behaved application when run from the Terminal. But that is no discredit to him—he could not see the future, and his goal was not my goal.

Last year I outlined the math required to provide a consistent aspect-corrected image using OpenGL and Cairo under GTK3. I’ve been working on mastering the SDL 2 library lately—I’ve used SDL 1 on and off for years, but there’s a big gap between the two versions—and this is the first problem I end up facing in pretty much anything I do.

Now, the joke here is that SDL2 automates this, so instead of doing a ton of math I could just name a function and make that be the whole article. But the automation isn’t quite complete, so there’s some subtleties we’ll need to delve into. Let’s start with why I need to actually do some study to pick up the new library in the first place.

How SDL1 and SDL2 See the World

SDL1 was written in the late 1990s to make it easier to port the games of the era to the machines of the era. It is, as a result, built around the kinds of basic graphics abstractions that existed at the time, for games like Doom or Sid Meier’s Alpha Centauri. The core graphical abstraction is thus a simulated framebuffer—machines of this era ultimately had some area of memory that was used as a bitmap that represented the screen, and SDL1 allowed you to capture it and write to it, translating colors into appropriate formats or even doing palette lookups.

For windowed systems like Windows 98 or X11, it could also pretend you had a pointer to “screen memory” but actually be managing a movable 2D rendering context displayed within a window. Unfortunately, this means that resizing a window is functionally equivalent to changing a screen resolution in a non-windowed system, which has implications for how blocks of pixels get copied to or from this simulated screen. Still, this was an issue anyway, so for the most part using SDL meant pretending that you were dealing with a slightly fancier version of the old VGA graphics modes. (Alternately, you could use SDL to set up an OpenGL rendering context and just use that; while this was a very common thing to do, in this mode SDL is no different from GLUT or other wrappers around WGL/GLX/AGL/EGL/etc. I’m ignoring that for this article.) And since “any machine with a framebuffer to draw pixels to” was pretty much “any machine”, this was as good as one could reasonably expect to get for compatibility.

Fast-forward ten years or so, and that assumption has started to get really shaky. By now, 3D-accelerated graphics are the default and the fundamental unit of graphics is no longer the framebuffer. Instead, all rendering systems are working with some kind of shader-based rendering system, and “framebuffers” are just an old-fashioned name for a texture being used to render an entire displayable area. This is a little more inconvenient to set up, but life is much easier when it comes to resizing windows; we’ve seen that already with the aspect-preserving screen-scaling routines. Not only that, but the meaning of a game program “going fullscreen” has changed: by now the default expectation is that a “fullscreen” game will simply be an unmovable, undecorated window the size of your entire desktop, with the desktop running as usual underneath it. This is a noticeable improvement in program stability and desktop coherence from the old world where the monitor’s resolution actually changed and the desktop often got scrambled and other applications had their graphics contexts forcibly destroyed mid-run.

SDL2 acknowledges this shift in basic hardware assumptions, and the graphics API adapted to fit it. The SDL1 data structure SDL_Surface for representing frame buffers still exists, but now it is a block of pixel data marked as being purely under CPU control. Complementing it is SDL_Texture which is under dedicated GPU control if possible, and SDL_Renderer which is the context into which textures and other geometric elements get drawn.

Bridging the Gap

Some of the changes from SDL1 to SDL2 are, on the face of it, simply cosmetic. Most SDL1 applications, for instance, will be switching their single call to SDL_SetVideoMode to a single call to SDL_CreateWindow. This is in one sense just being honest about what it is that the call is actually doing—most SDL1-based applications were never written to be run in environments where SDL_SetVideoMode ever did anything but create or resize a window—but in another sense it’s a major change in how the application works. SDL2, unlike its predecessor, actually does support one application with multiple simultaneous windows.

Likewise, the division of labor between Window, Renderer, Texture and Surface—where previously all operations were either truly global or operations on SDL_Surfaces that might have been the simulated screen—has needed to move functions into new conceptual namespaces. SDL_Flip is now SDL_RenderPresent, for instance.

The more interesting question is how to reorganize your rendering logic to play well with the new world. The SDL Migration Guide outlines a number of scenarios depending on how a program ends up using the core APIs. The two major ones are situations where the system has built up multiple surfaces and is using them like sprites, and situations where the whole screen is built pixel by pixel and then rendered into place with a final blit.

In each case, SDL2 interposes itself between us and the GPU doing the actual rendering, and it provides a decent level of support for handling aspect-corrected scaling on its own. We’ll look at that first, and then fill in the gaps left behind.

A Quick Caveat

At the time of this writing, the latest version of SDL 2 is 2.0.8. This is the first edition of SDL 2 that includes a driver for Apple’s Metal APIs, and my experiments suggest that the driver doesn’t support these calls, at least not when used as it is on Windows, X11, or Wayland. Furthermore, even the non-Metal drivers produced odd results when I tested the APIs using brew’s version of SDL2 instead of the official Framework bundle provided by the SDL team itself.

Logical Screen Sizes

The greatest level of automation is provided by the SDL_RenderSetLogicalSize function. This automatically computes (and recomputes, as the window size changes) appropriate scaling and offset factors so that when you make calls in the 2D Rendering API, you may treat it as a “virtual screen” with those dimensions, and you will get results like those we saw in our previous articles.
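To make the "scaling and offset factors" concrete, here is a minimal sketch in plain C of the kind of arithmetic involved. It is not SDL's actual implementation, and the letterbox struct and fit_logical function are hypothetical names invented for illustration. It picks the largest uniform scale that fits the logical screen inside the window and centers the result.

```c
#include <assert.h>

/* Hypothetical helper type: where the logical screen lands in the window. */
typedef struct {
    int x, y;      /* top-left corner of the scaled image, in window pixels */
    int w, h;      /* size of the scaled image, in window pixels */
    double scale;  /* uniform scale factor applied to logical coordinates */
} letterbox;

/* Fit a logical_w x logical_h screen into a win_w x win_h window,
 * preserving the aspect ratio and centering the result. */
static letterbox fit_logical(int win_w, int win_h, int logical_w, int logical_h)
{
    letterbox lb;
    double sx, sy;

    assert(win_w > 0 && win_h > 0 && logical_w > 0 && logical_h > 0);

    sx = (double)win_w / logical_w;
    sy = (double)win_h / logical_h;
    lb.scale = sx < sy ? sx : sy;               /* the tighter axis wins */
    lb.w = (int)(logical_w * lb.scale + 0.5);   /* round to nearest pixel */
    lb.h = (int)(logical_h * lb.scale + 0.5);
    lb.x = (win_w - lb.w) / 2;                  /* split leftover space evenly */
    lb.y = (win_h - lb.h) / 2;
    return lb;
}
```

For a 320x240 logical screen in a 1920x1080 window, this gives a scale of 4.5 and a 1440x1080 image with 240-pixel bars on the left and right. Those bars are the letterboxing rectangles.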

So, does that get to be the whole article? Not quite. There are several sharp corners we need to file down before this gives us the things we want.

We’d like to be able to control the color of the letterboxing rectangles.

The logical-size feature is implemented in terms of some other parts of the API, which means that using this forbids use of certain other parts of the API.

The logical size is always expressed in square pixels. Our aspect-correction was, in part, because the displays we were hoping to simulate used non-square pixels.

This only works for the 2D Rendering API. If you want to use OpenGL to do actual 3D rendering, none of this will work.
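The square-pixel restriction can be illustrated with a little arithmetic. The sketch below is one possible approach, not necessarily the one this series ends up using, and size2d and square_pixel_size are hypothetical names. Given a source resolution and the width:height ratio of one source pixel, it computes the smallest square-pixel canvas on which every source pixel covers a whole number of canvas pixels.

```c
#include <assert.h>

/* Hypothetical helper type for a pixel size. */
typedef struct { int w, h; } size2d;

/* Greatest common divisor, used to reduce the pixel aspect ratio. */
static int gcd(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* par_w:par_h is the width:height ratio of one source pixel.
 * The result is a square-pixel size with the same displayed shape. */
static size2d square_pixel_size(int src_w, int src_h, int par_w, int par_h)
{
    size2d out;
    int g;

    assert(src_w > 0 && src_h > 0 && par_w > 0 && par_h > 0);

    g = gcd(par_w, par_h);
    out.w = src_w * (par_w / g);   /* each source pixel is par_w/g wide */
    out.h = src_h * (par_h / g);   /* ...and par_h/g tall */
    return out;
}
```

A 320x200 display shown at 4:3 has pixels in a 5:6 ratio, and this yields 1600x1200. One could then pass a size like that as the logical size and stretch the 320x200 source image to fill it, though that usage is an assumption on my part rather than something the API requires.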

We’ll save 3D rendering for another time, but let’s take the rest in turn.

Josh Juran’s Advanced Mac Substitute has been making some very rapid progress lately. This is an approach to Mac emulation where the OS is replaced with a workalike copy, and this is a much heavier lift than the traditional emulation technique of ripping the ROM image from your own Mac and feeding it to a system like Basilisk II. (I haven’t yet done any work for Bumbershoot with the Atari line of home computers, but it turns out that Atari emulation has advanced so far down that path that modern replacement kernels are preferred alternatives even on real hardware, so this isn’t without precedent, but it is the first time I’ve mentioned it here.) This project has been going on for a while, but he’s gotten The Fool’s Errand to run in it. I’d gotten the impression that this game was written in some kind of compiled BASIC, so it’s kind of alarming that its runtime would rely on largely undocumented behavior.

This has been going on for many years now, but Casey Muratori of Molly Rocket has been building a modern Windows game completely from scratch in a series of live-coding sessions with commentary and Q&A. This is the Handmade Hero project and while I’ve mostly just been sampling the four years of archives, it’s definitely rewarded that time. I found it while looking for information on writing runnable programs in C without actually linking in the C runtime.

Mark Brown’s YouTube channel has also been going on for some years, but I’ve only recently discovered it. His videos are mostly game criticism, which is very much not the same thing as game reviews. They are in-depth and knowledgeable works in the vein of what I’ve posted on this blog under the theorycrafting category. In addition to these videos (which he publishes under the series name “Game Maker’s Toolkit”), he also has a series (“Boss Keys”) that systematically analyzes the structure and operation of the dungeons in the Legend of Zelda series. A second season (which focuses on Metroid, Castlevania: Symphony of the Night, and the games they inspired) has just started, and I’m looking forward to it.

As long as I’m linking to YouTube channels that do deep analysis on game subjects, 8-Bit Music Theory has been a long-time favorite of mine. My actual music theory background is a bit weak—nothing formal beyond piano and clarinet lessons as a child, though apparently as a child I took quite quickly to the theoretical aspects of the lessons—but the sheer enthusiasm he brings to these videos is itself worth the price of admission, and I usually end up learning something along the way.

Finally, the annual Interactive Fiction Competition is running. I spent about ten years reviewing every entry that was submitted to that annual comp, but my interests in game criticism started diverging from where IFComp entrants were experimenting, and the competition also started getting much larger. I took a bit of a hiatus from it, but I’m hoping to at least play a dozen or so entries this time around.

As for my own work…

These past few weeks I’ve mostly been focusing on reworking and rewriting some problematic window management code in the VICE emulator. I’m not 100% sure that I can get an interesting article out of that, but there’s a chance I could get one or two. We’ll see.

As a result of that, this is the first version I’ve made that doesn’t use keyboard controls.

It is the oldest platform I have ever targeted, three and a half years older than the ZX81 and four years older than the IBM PC.

It’s the first port I’ve done with sound.

It is the simplest hardware I’ve targeted for a port; in particular, it is the first to not have a frame buffer or character set. (I’d say “or a text mode”, but a reasonable case can be made that the ZX Spectrum did not have a text mode.)

I’m mostly happy with the results of this project, but the deeper I got into writing it up, the more I found myself disliking aspects of the implementation. There’s a lot of code that’s trivially inefficient or bloated in there, but I feel like I shouldn’t really be bothering with improving that because at no point will that shrink code past a power-of-two boundary that would mean a smaller ROM chip, and at no point is my code so slow that it misses a checkpoint it has to hit. Improving execution speed at this point would just mean that I’d be spending more time idling waiting for the next scanline or for VBLANK to finish.

On the other hand, some of the changes would make the code-as-written much cleaner. One of the obvious issues is that I basically forgot that LDX addr,Y existed as an instruction, but it turns out I don’t even need to use the 48-pixel kernel there at all because my logo fits within 40 pixels, and doing a 40-pixel display kernel is completely trivial and requires no use of VDEL trickery at all. That said, one of the whole reasons I put the header in at all was so that I could implement the 48-pixel routine.

The game flow is also a bit awkward—I modeled gameplay on early Atari cartridges, but later games tended to let you start games with the fire button and have in-game credits. Really doing those right would require revisiting a lot of my earlier design decisions.

But as it is, I managed to meet all my initial goals—input, sound, a game board with no compromises at 60Hz, and the 48px kernel—and I did it in a software package that is just about middle of the road in terms of size. At 973 bytes of code and data, it comes in between my ZX81 implementation (903 bytes) and my first C64 implementation (at 977). The data is spent in different ways, but it’s pretty interesting to see how even with the enormous disparity in system power, the ports all end up weighing in at about the same size.

To produce a solid block of centered pixels, we need to set the players into “three copies close” mode. Each of those three copies will be eight pixels wide and have eight pixels of space between them. The first part of the trick is that we will interleave the two players so that there aren’t any gaps. That means that, for a solid centered display, Player 1 needs to be drawn 8 pixels to the right of Player 0 and the fourth player graphic (which is Player 1’s second copy) needs to land on pixel 80, the beginning of the second half of the screen. Working backwards from that we find that our target pixels are 56 and 64. We covered quite early on how to locate our sprites in fixed locations, but there’s a slight wrinkle this time. The closest we can get to our target pixels is to write on cycles 37 and 40. However, if we write on cycles 34 and 37, we save a couple bytes in our placement code and still remain just within the range of a single corrective HMOVE:
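The "working backwards" step is easy to check with a few lines of C. This is only a sketch of the arithmetic, with hypothetical function names, and it assumes the 160-pixel-wide screen implied by pixel 80 being the midpoint. It has nothing to do with the 6502 placement code itself.

```c
#include <assert.h>

#define COPIES      3    /* "three copies close" mode */
#define COPY_WIDTH  8    /* each copy is eight pixels wide */
#define COPY_STRIDE 16   /* eight pixels of graphic plus eight of gap */

/* Where Player 0 must start so that the fourth of the six interleaved
 * blocks begins exactly at mid-screen. */
static int player0_target(int screen_width)
{
    assert(screen_width > 0);
    return screen_width / 2 - 3 * COPY_WIDTH;
}

/* Fill out[] with the left edges of all six blocks in display order:
 * P0, P1, P0, P1, P0, P1. Player 1 sits one copy-width right of Player 0. */
static void interleaved_columns(int p0_x, int out[2 * COPIES])
{
    int i;
    int p1_x = p0_x + COPY_WIDTH;

    assert(out != 0);
    for (i = 0; i < COPIES; i++) {
        out[2 * i]     = p0_x + i * COPY_STRIDE;
        out[2 * i + 1] = p1_x + i * COPY_STRIDE;
    }
}
```

With a 160-pixel screen this puts Player 0 at 56 and Player 1 at 64, and the six blocks start at 56, 64, 72, 80, 88, and 96. That leaves 56 pixels of margin on either side of the 48-pixel image.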

I then remove six extra scanlines from the blank area at the bottom to balance it out. This moves the game board down a bit but (using a placeholder graphic that’s just a solid block of color) shows us a decently centered total display:

Now to turn that into our logo.

The General Approach

The STA GRPn instructions take three cycles to execute, during which time the display will advance nine pixels. We need to time writes to these registers so that the graphics are changed just before they’re used. At the time that’s happening, the other set of graphics are being consulted, so we do have a little leeway here. We also have the advantage of being able to preload some values into the graphics first, so if the first two graphics values are fine, we simply need to update the next four graphics blocks. The challenge here is that we only have the A, X and Y registers to work with, so we can only store three values in a row without some kind of load operation, and load operations all cost unacceptably large amounts of time. The crucial insight to surmounting this challenge (which I see credited primarily to Dave Staugas at Atari) is that there are a few extra places to stash data.

Vertical Delay and the Shadow Registers

GRP0 and GRP1 are the two sprite graphics registers that the TIA chip admits to having, and they’re the only ones that can be directly written. However, it also has some internal registers for storing older values of them that were intended to be used to shift certain graphics down a scanline within a display kernel that took more than one scanline per loop. (Hence the name of the control flags for this, which have names like VDELP0 for Vertical DELay Player 0.) What they actually do is switch to displaying the “older” values of that graphic. This technique—the TIA Hardware Guide I’ve been using for reference calls it “The Venerable Six-Digit Score Trick”, and I’ve also seen it called the “Staugas Score Kernel”—uses those shadow registers to preload enough graphics data to let us display the entire 48 pixels.

These shadow registers don’t simply record the previous value of a write to a graphics register, unfortunately. Instead, the way it works is that any time you write to one of the player’s graphics registers, the “current” value of the other player’s graphics register is copied to its “old” value. This means that for any string of writes to the registers, there can be at most three unique values stored (since writing the third value has wiped out one of the earlier ones). Fortunately, three unique values is all we need, since we’ve got A, X, and Y to cover the pending values. We’ll preload the graphics registers with three values, preload the CPU registers with the remaining three, and then juggle the values so they display properly across the whole scanline.
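The copying rule is easier to follow with a toy model. The C sketch below simulates only the behavior described above. The tia_grp struct and its functions are hypothetical names for illustration, and the real chip has plenty of timing detail that this ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the player graphics latches (hypothetical struct). */
typedef struct {
    uint8_t grp0_new, grp0_old;
    uint8_t grp1_new, grp1_old;
} tia_grp;

/* Writing GRP0 stores the value and, as a side effect, copies GRP1's
 * current value into GRP1's "old" slot. */
static void write_grp0(tia_grp *t, uint8_t value)
{
    assert(t != NULL);
    t->grp0_new = value;
    t->grp1_old = t->grp1_new;
}

/* Writing GRP1 does the mirror image for GRP0. */
static void write_grp1(tia_grp *t, uint8_t value)
{
    assert(t != NULL);
    t->grp1_new = value;
    t->grp0_old = t->grp0_new;
}

/* With the vertical delay flag set, the "old" value is what gets shown. */
static uint8_t shown_grp0(const tia_grp *t, int vdel)
{
    assert(t != NULL);
    return vdel ? t->grp0_old : t->grp0_new;
}

/* Count how many distinct values the four slots hold. */
static int unique_values(const tia_grp *t)
{
    uint8_t v[4];
    int i, j, count = 0;

    assert(t != NULL);
    v[0] = t->grp0_new; v[1] = t->grp0_old;
    v[2] = t->grp1_new; v[3] = t->grp1_old;
    for (i = 0; i < 4; i++) {
        int seen = 0;
        for (j = 0; j < i; j++)
            if (v[j] == v[i])
                seen = 1;
        if (!seen)
            count++;
    }
    return count;
}
```

Writing three distinct values in the order GRP0, GRP1, GRP0 leaves the first value in GRP0's old slot, the second in both of GRP1's slots, and the third in GRP0's current slot. A fourth write pushes one of them out, so the count never rises above three.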

This will be a mostly invisible update; I’m just laying the groundwork to prepare for the final update where I complete the project. There’s a few interesting things in here, but a distressing amount of it works out to me struggling with tools that I myself built to get them to do things I didn’t quite design them to do. Still, it’s got to be done, so let’s get to work.

The Atari 2600’s sound capabilities are rather modest and unambitious, all told. The overall principle is very similar to the one we saw with the SN76489—we have a number of channels, and they produce a square wave or a modified square wave of some kind. We pick our volume independently for each channel, and we select our frequency by picking a divisor for some fundamental tone. The key differences with the SN76489 are:

There are only two channels on the Atari 2600 instead of four.

Each channel on the Atari has a selectable waveform, as opposed to the SN76489’s fixed square waves on three channels and choice of two noise waveforms on the fourth.

The waveform (“distortion”) selector for the Atari has sixteen options. It turns out there is some overlap; only ten of the selections are unique, and two of them are effectively square waves with different base frequencies. The end result is nine different waveforms and ten waveform control options.

There are only 31 available divisors for the base frequency. While there were a handful of heroic attempts to render background music into arcade ports, the results are a bit cringeworthy at best. The 2600’s sound circuitry is intended for sound effects. If you do want to make a little ditty, though, at least you’ve got two waveform options with different frequency ranges.

These options are ultimately sufficiently limited that in order to identify what sounds we want to use we’re better off just exhaustively listing possibilities and experimenting with settings rather than trying to compute tones or effects from first principles. Duane Alan Hahn, aka Random Terrain, not only produced those exhaustive tables but also wrote a very nice program about ten years ago, called Tone Toy 2008, that allows this kind of experimentation.

Playing around with these a bit, I settle on a few sound effects I like.

Setting the waveform to 12 (“Pure Lower Tone”) and frequencies of 10 and 5 made some nice beep sounds I could use for making moves.

Setting the waveform to 15 (“Electronic Rumble”) and then repeatedly counting down the frequency divider made a little electric whooping sound that worked as a victory marker. The Tone Toy will let you do sweeps but it always sweeps at a speed of one value per frame; to get the sound I wanted it would need to be about twice that fast.
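For the curious, the "divisor for some fundamental tone" idea can be sketched in C. The constants here are my assumptions, not figures from this article: an NTSC audio clock of roughly 31.4 kHz, and a waveform 12 pattern that repeats every six clocks. The function name is hypothetical, and experimenting by ear is still the better guide.

```c
#include <assert.h>

/* ASSUMPTION: approximate NTSC audio clock, about 31.4 kHz. */
#define TIA_AUDIO_CLOCK_HZ 31399.5

/* ASSUMPTION: waveform 12 ("Pure Lower Tone") repeats every 6 clocks. */
#define PURE_LOWER_TONE_PERIOD 6

/* Approximate pitch for a pure-tone waveform. The frequency register value
 * audf selects a divisor of audf + 1, and waveform_period is the number of
 * divided clocks in one cycle of the waveform. */
static double tia_tone_hz(int waveform_period, int audf)
{
    assert(waveform_period > 0);
    assert(audf >= 0);
    return TIA_AUDIO_CLOCK_HZ / (waveform_period * (audf + 1));
}
```

Under those assumptions, the two beeps at frequency settings 10 and 5 come out at roughly 476 Hz and 872 Hz, a bit less than an octave apart.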